Seven steps to troubleshooting hard drives
Here’s a quick and proven hard disk troubleshooting
process. With each point, ask yourself the question(s)
that follow.
Physical connectivity - Is the drive
receiving power? Is it plugged into the PC by a
correctly connected ribbon cable? For an IDE drive, are
its jumpers set correctly? For a SCSI drive, are
its termination and ID set correctly?
BIOS setup - Does the BIOS see the
drive?
Viruses - Does the drive contain any
boot sector viruses that need to be removed before continuing?
Partitioning - Does FDISK find a valid
partition on the drive? Is it active?
Formatting - Is the drive formatted
using a file system that the OS can recognize?
Drive errors - Is a physical or logical
drive error causing read/write problems on the drive?
Operating system - Does your OS have
a feature that checks the status of each drive on your
system? If so, what is that status?
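The order of the seven checks matters, because each one assumes the
earlier ones passed. As a rough sketch only (the function name and the
pass/fail dictionary are illustrative assumptions, not part of any tool),
the process can be tracked like this:

```python
# The seven checks, in the order the article lists them.
STEPS = [
    "Physical connectivity",
    "BIOS setup",
    "Viruses",
    "Partitioning",
    "Formatting",
    "Drive errors",
    "Operating system",
]


def first_failed_step(results):
    """Return the first step that has not passed, or None if all passed.

    `results` maps a step name to True (passed) or False (failed).
    A step missing from `results` counts as not yet checked, so it is
    returned as the next thing to look at.
    """
    for step in STEPS:
        if not results.get(step, False):
            return step
    return None


# Example: power and cabling are fine, but the BIOS does not see the drive.
print(first_failed_step({"Physical connectivity": True, "BIOS setup": False}))
```

Working through the list in order keeps you from reformatting a drive
whose only problem is a loose cable.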
Checking physical connectivity
To work properly, a hard drive needs power and a connection
via a ribbon cable to the PC. If a drive doesn’t
work after moving it to a new PC, after physically moving
the PC, or after the cover has been taken off, start
your troubleshooting by checking the physical connectivity.
It’s possible for plugs to jiggle loose when moving
a PC, and it’s easy to uproot a ribbon cable connection
when pulling circuit boards or performing other maintenance
tasks inside the case.
A hard disk works with any Molex connector from the PC’s
power supply. Make sure the plug is fully inserted.
Molex connectors require a lot of pressure to fully
insert, and even more pressure to remove, so don’t
be afraid to push hard or pull, as the case may be.
Just make sure you handle the plastic connector, and
do not try to push or pull the wires.
As the PC starts up, place the palm of your hand on the
flat part of the hard disk. If you can detect any vibration,
the drive probably has power. If there’s no movement
at all, either the drive’s physical mechanism
is shot or the Molex connector you have selected is
faulty. Try using a different connector before assuming
the drive has a problem.
Systems like the AT/LPX have a small connector that runs from
the front of the case to the hard disk. On ATX systems,
it runs from the motherboard to the hard disk. This
enables the LED on the case to illuminate when the hard
disk is in use. Don’t rely on that LED as a positive
indicator as to whether the hard disk is receiving power.
The light could be burned out, the wire disconnected,
or the drive might be receiving power but not be connected
correctly to the PC.
The other physical requirement for a drive is a data connection
to the PC itself.
If it’s an IDE model, the drive should be connected
via a ribbon cable to the IDE bus on the motherboard.
Connections can also be made with a SCSI or proprietary
expansion card. Secure both ends of the ribbon cable
connector and make sure the connector is covering all
pins. On systems where the pins are bare instead of
surrounded by a plastic ridge, it’s easy to offset
the connector by a row or two on the pins. If the drive
is getting power but the BIOS can’t find it, try
a different ribbon cable; the one in use might have
a broken wire or other flaw.
Note that there are different types of hard disk ribbon cables.
Drives that support UltraDMA 66 and above require 80-wire cables.
If you use the 40-wire type, the drive will be limited
to UltraDMA 33 performance.
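The cable rule amounts to a simple ceiling on the transfer rate. The
sketch below is only an illustration; the function name is made up, and
it treats the UltraDMA number as a rate in MB/s (33 for UltraDMA 33, 66
for UltraDMA 66, and so on).

```python
def effective_udma_limit(drive_max_mb_s, cable_wires):
    """Return the transfer-rate ceiling (in MB/s) that the cable allows.

    drive_max_mb_s: the drive's rated UltraDMA speed, e.g. 33, 66 or 100.
    cable_wires: 40 or 80, the number of wires in the ribbon cable.
    """
    if cable_wires == 80:
        return drive_max_mb_s
    if cable_wires == 40:
        # A 40-wire cable holds the drive to UltraDMA 33 performance.
        return min(drive_max_mb_s, 33)
    raise ValueError("expected a 40-wire or 80-wire IDE cable")


print(effective_udma_limit(66, 40))  # an UltraDMA 66 drive on a 40-wire cable
```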
The red stripe on the ribbon cable must match up with Pin
1 on both the drive and the motherboard or expansion
card. Sometimes, though, it’s not easy to locate
Pin 1. Look for tiny numbers at one end of the connector.
If you see a 1 or 2, that’s the end with which
the red stripe should be matched. Some connectors are
notched on one side, and the ribbon cable has a tab
that fits into the notch. However, this isn’t
always the case. With a floppy drive, the drive light
stays on continuously when the ribbon cable is backward;
a hard disk gives no such simple sign. Without the
notched connectors, your only choice is trial and error.
Checking jumper settings
On an IDE hard disk, one or more jumpers on the drive
must be set to determine its Master/Slave status. This
setting isn’t usually an issue in an existing
hard disk installation that suddenly doesn’t work
anymore, but it can cause problems when you move a drive
from one PC to another.
Depending on the drive, the following jumper settings may be available:
Single - Use this setting when the
drive is the only one on that IDE subsystem; that is,
the only one on that ribbon cable. Not all drives have
a Single setting; if there is none, use the Master setting
instead.
Master (MS) - When there are two drives
on the IDE subsystem and the other drive’s jumpers
are set to Slave, or if this is the only drive on the
subsystem and it doesn’t have a separate Single
setting, use this setting.
Slave (SL) - Use this setting when
there are two drives on the IDE subsystem and the other
drive’s jumpers are set to Master.
Cable Select (CS) - If you are using
a cable that relies on the device positioning to determine
its Slave/Master status, use this setting. This setting
is uncommon.
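The jumper rules above can be condensed into a small decision function.
This is a sketch under stated assumptions: the function and parameter
names are invented for illustration, and the actual jumper labels vary
by drive, so check the diagram on the drive itself.

```python
def choose_jumper(drives_on_cable, is_first_drive=True,
                  has_single_setting=False, cable_select_cable=False):
    """Suggest a jumper setting for one drive on an IDE ribbon cable.

    drives_on_cable: 1 or 2 drives sharing the cable.
    is_first_drive: True if this drive should be the Master of a pair.
    has_single_setting: True if the drive offers a separate Single setting.
    cable_select_cable: True if the cable sets Master/Slave by position.
    """
    if cable_select_cable:
        return "Cable Select (CS)"
    if drives_on_cable == 1:
        # Not all drives have a Single setting; fall back to Master.
        return "Single" if has_single_setting else "Master (MS)"
    if drives_on_cable == 2:
        return "Master (MS)" if is_first_drive else "Slave (SL)"
    raise ValueError("expected one or two drives on the cable")


print(choose_jumper(1))                        # lone drive, no Single setting
print(choose_jumper(2, is_first_drive=False))  # second drive of a pair
```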
Checking SCSI termination
If the machine uses a SCSI drive, there are two factors
with which to be concerned: termination and ID. These
settings are not an issue when troubleshooting a drive
that has suddenly gone bad in an existing system, but
if you are moving a drive from one system to another
and it doesn’t work in the new system, improper
SCSI settings may be the culprit.
If this is the last SCSI device in the chain, it must be
terminated. Termination methods vary. On some devices,
you set termination with an extra jumper; on others,
you use a cap or plug over a connector. On most hard
disks, you terminate using a jumper setting.
SCSI-based drives usually have jumpers just like IDE ones, but
instead of setting the Master/Slave status, they assign
a SCSI ID number to the device. Some SCSI devices have
a wheel or button, with a little window indicating the
setting, instead of jumpers, but this is uncommon on a
hard disk.
There can be up to seven SCSI devices on a single narrow SCSI
bus, and up to 15 devices on a wide SCSI bus. There
are either eight or 16 addresses in total, depending
on your system. The host adapter takes one of those
addresses, leaving seven or 15 for the remaining drives.
Usually, the host adapter claims the highest number
for itself.
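The address arithmetic can be checked with a short sketch. The function
name is hypothetical, and the default of giving the host adapter the
highest ID simply follows the statement above; an adapter may be
configured differently, so the ID can be passed in explicitly.

```python
def available_scsi_ids(wide=False, host_adapter_id=None):
    """List the SCSI IDs left for devices once the host adapter has one.

    wide: False for a narrow bus (8 addresses), True for wide (16).
    host_adapter_id: the adapter's ID; defaults to the highest address,
    which is an assumption taken from the text.
    """
    total = 16 if wide else 8
    if host_adapter_id is None:
        host_adapter_id = total - 1
    if not 0 <= host_adapter_id < total:
        raise ValueError("host adapter ID is outside the bus address range")
    return [i for i in range(total) if i != host_adapter_id]


print(len(available_scsi_ids()))           # IDs free on a narrow bus
print(len(available_scsi_ids(wide=True)))  # IDs free on a wide bus
```

When a moved drive does not show up, comparing its ID against this list
and against the IDs already in use is a quick way to spot a conflict.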
The SCSI ID comes from a binary representation of the jumpers.
For example, on a device with three SCSI ID jumpers,
if none of them is set, the ID would
be 000b (b stands for binary here), or 0. An ID of 001b
would be 1; 010b would be 2; and so on.
The complication is that some manufacturers order
the jumpers to be read left to right, while others
use right to left. So on one drive, setting only the
leftmost jumper gives an ID of 1, while on another drive,
setting only the rightmost jumper gives an ID of 1. Check
the drive’s label for information about which way the
drive works. If
all else fails, try the manufacturer’s Web site.
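To make the binary reading concrete, here is a minimal sketch that turns
a row of jumpers into an ID for either reading direction. The function
name and the `msb_first` flag are illustrative assumptions; the drive's
label remains the authority on which direction applies.

```python
def scsi_id_from_jumpers(jumpers, msb_first=True):
    """Convert jumper states into a SCSI ID.

    jumpers: booleans in the order the jumpers appear on the drive,
    left to right; True means a jumper is installed.
    msb_first: True if the leftmost jumper is the highest-value bit,
    False if the leftmost jumper is the lowest-value bit.
    """
    bits = list(jumpers) if msb_first else list(reversed(jumpers))
    value = 0
    for installed in bits:
        # Shift the running value left and add the next bit.
        value = (value << 1) | (1 if installed else 0)
    return value


# No jumpers set: 000b. Only the rightmost set: 001b.
print(scsi_id_from_jumpers([False, False, False]))
print(scsi_id_from_jumpers([False, False, True]))
# The same physical jumper row read in the opposite direction.
print(scsi_id_from_jumpers([True, False, False], msb_first=False))
```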
Checking BIOS setup (IDE only)
In most modern systems, the BIOS can automatically detect
your hard disk, so no special BIOS setup is required.
However, if you are working with an older or quirky
BIOS, you might need to enter the BIOS setup program
and change the setting for the drive’s position on the
IDE channel (such as Primary Master or Primary Slave)
from None to Auto
so the BIOS will attempt to find and identify the drive.
On an old BIOS, you occasionally may need to select User
as the drive type and manually enter the drive’s
settings. Because automatic detection of IDE devices was
part of the ATA-3 standard, released more than 10 years
ago, doing so should rarely be necessary.
Some BIOSes also have a separate Detect IDE Devices utility
built in. If the BIOS contains such a utility, you can
use it to prompt the BIOS to detect the new hard disk.
This comes in handy when you aren’t sure whether
or not the drive is working, because you can get an
answer immediately rather than rebooting and waiting
to see whether the BIOS finds the drive on startup.
Virus checking
If you’ve come this far in the troubleshooting
process and the drive still isn’t working, check
for viruses. A drive containing a boot-sector virus
will not only malfunction but can also spread the virus
to any disk you boot from, such as your emergency startup
disk.
On a system that you know is good and that has an anti-virus
program installed, update the virus definitions, and
then make a virus-checking boot disk. Write-protect
it, and then use it to start the system containing the
nonworking hard disk and check that disk for viruses. If the
drive is not partitioned and formatted, the boot disk
might not be able to check the data area of the drive.
That’s okay for now; just let it get as far as
it can before moving on to the next step, checking the
partition.
Checking for a valid partition
If the BIOS can see the drive but the drive isn’t
working, make sure the drive is partitioned. Use FDISK,
a command-line utility you’ll find on a Windows
9x/Me startup disk, to check. Boot from the write-protected
startup disk and type FDISK. When asked whether or not
you want large disk support, type Y. Then choose option 4
on the main screen to display the partition information.
If the active partition’s type is FAT, FAT32, or
NTFS, it should be recognized by the operating system.
One exception would be if you put an NTFS drive into
a Windows 9x/Me system. The OS wouldn’t recognize
the partition because it doesn’t support NTFS, not
because the drive was partitioned incorrectly.
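What FDISK reports comes from the partition table in the drive's first
sector. The sketch below shows how that table could be read from a
512-byte sector image and matched against an OS. It is illustrative
only: the function names and the OS support table are assumptions drawn
from the paragraph above (and assume a FAT32-capable release of Windows
9x), the type-byte list covers only the common codes, and type 0x07 is
shared by NTFS and a few other file systems.

```python
import struct

# Common MBR partition type bytes for the file systems named in the text.
PARTITION_TYPES = {
    0x01: "FAT", 0x04: "FAT", 0x06: "FAT", 0x0E: "FAT",
    0x0B: "FAT32", 0x0C: "FAT32",
    0x07: "NTFS",  # also used by some other file systems; treat as a hint
}

# Assumed support table based on the text: Windows 9x/Me cannot read NTFS.
OS_SUPPORT = {
    "Windows 9x/Me": {"FAT", "FAT32"},
    "Windows 2k/XP": {"FAT", "FAT32", "NTFS"},
}


def read_partitions(sector):
    """Parse the four primary partition entries of a 512-byte MBR sector."""
    if len(sector) != 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    partitions = []
    for i in range(4):
        # The table starts at byte 446; each entry is 16 bytes long.
        entry = sector[446 + 16 * i: 446 + 16 * (i + 1)]
        boot_flag, type_byte = entry[0], entry[4]
        start_lba, sector_count = struct.unpack_from("<II", entry, 8)
        if type_byte == 0:  # empty slot
            continue
        partitions.append({
            "index": i,
            "active": boot_flag == 0x80,
            "type": PARTITION_TYPES.get(type_byte, "unknown"),
            "start_lba": start_lba,
            "sectors": sector_count,
        })
    return partitions


def active_partition_recognized(sector, os_name):
    """True if the active partition's type is one the named OS can read."""
    for part in read_partitions(sector):
        if part["active"]:
            return part["type"] in OS_SUPPORT[os_name]
    return False  # no active partition at all
```

A result of False for Windows 9x/Me with an NTFS partition matches the
exception described above: the partitioning is fine, but the OS cannot
read that file system.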
If it is a partition problem, you have two choices: Try
to recover the data using a disk recovery program, or
give up on the data, delete the partition, and re-create
it in FDISK. If you want to try recovery first, see
the section below on Advanced Data Recovery Options.
If you want to delete the partition and re-create it, return
to the FDISK main screen by pressing [Esc] and delete
the partition (option 3 on the screen). Then return
to the main screen again and create a partition (option
1 on the screen). After using FDISK to create or delete
partitions, you must reboot the machine before doing
anything else.
Checking drive formatting
If FDISK recognizes the drive and it has a valid partition
type, you should be able to view the drive’s content
from a command prompt via your startup disk, or from
the Recovery Console in Windows 2k or XP. Change to
that drive by typing its drive letter followed by a
colon and pressing [Enter]. Then, display a list of
files on the drive with the DIR command.
If you see a message about an invalid media type, the drive
is probably not formatted using a file system that your
OS recognizes. You can either try a data recovery program,
or you can give up on the drive’s data and reformat
it with the FORMAT command.
Fixing physical and logical drive errors
Let’s assume at this point that your OS finds
the drive and can read some files on it, but not all
of them. Maybe you’re receiving read or write
errors, or certain programs aren’t working right.
The problem is likely a physical or logical disk error.
A physical disk error is a bad spot on the drive. It can
result from physical trauma to the computer, like knocking
it off a table while it’s running.
A logical disk error is a discrepancy between the two
copies of the file allocation table (FAT) on the disk,
or a discrepancy between the FAT’s version of
what is stored on the drive and the reality of actual
storage. Such errors are typically caused by improperly
shutting down the PC or abnormal program termination.
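The first kind of logical error, a mismatch between the two FAT copies,
can be illustrated with a byte-level comparison. This is a simplified
sketch, not how any particular disk utility works: the function name is
made up, the two copies are assumed to be already read into memory, and
the two-byte entry size assumes FAT16.

```python
def fat_mismatches(fat1, fat2, entry_size=2):
    """Return the entry numbers where two FAT copies disagree.

    fat1, fat2: the raw bytes of each FAT copy.
    entry_size: bytes per FAT entry (2 is an assumption for FAT16).
    """
    if len(fat1) != len(fat2):
        raise ValueError("the two FAT copies differ in length")
    mismatches = []
    for offset in range(0, len(fat1), entry_size):
        if fat1[offset:offset + entry_size] != fat2[offset:offset + entry_size]:
            mismatches.append(offset // entry_size)
    return mismatches


# Three entries per copy; the middle entry differs.
copy_a = b"\x01\x00\x02\x00\x03\x00"
copy_b = b"\x01\x00\xff\xff\x03\x00"
print(fat_mismatches(copy_a, copy_b))
```

An empty result means the copies agree, which rules out only this one
kind of logical error; the FAT can still disagree with what is actually
stored on the disk.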
A message about a data error while reading or writing
the drive probably indicates a physical error. Logical errors
are manifested in many different ways, not always directly
attributable to the disk itself. For example, certain
programs might fail to run or might lock up after starting.
Such symptoms could equally point to a memory parity error
or even a bad cooling fan; you never know until you check the
system and eliminate the possibilities.
It’s best to try the simplest solution first, so run a disk-checking
program. Windows 9x/Me comes with ScanDisk, which
will check for both physical and logical errors. Windows
2k and XP come with a similar utility called Check Disk,
which you access from the Tools tab of
the drive’s Properties sheet. In early versions
of DOS, a command-line utility called CHKDSK checks for
logical errors. Use it with the /F switch to fix any errors
it finds.
Checking and reactivating disks in the Windows 2k/XP OS
Windows 2k and Windows XP both have a Disk Management feature
that checks the status of each drive on your system.
This utility allows you to convert to dynamic disks,
change space allocation, and much more.
With Disk Management, the most important thing to check is
the status it displays for each drive. If
a drive reports that it is offline or shows any status other
than Healthy, right-click it and choose Reactivate Disk.
Conclusion
Because so much is stored on hard disks, knowing how to revive
a failed hard drive is a critical skill for technology
professionals. Having an effective guide to the recovery
process might mean the difference between a total loss
and full recovery. With this seven-step process,
you’ll be ready to tackle most hard disk errors
that arise.