On Hard Drive Rotation
By Gare Henderson, 2006

My phone buzzes, and my young assistant's voice interrupts my quiet lunch, somewhat nervously saying...there is a new customer on line 2 who has a data recovery problem. I take a swig of Starbucks' best to wash down the last remnants of a delicious smoked turkey with mango mayo sandwich, and take the call. "Hello, how can I help you?" I intone, with all the warmth of your favorite great uncle. My assistant's nervousness was an implicit signal that this customer sounds edgy...watch out.

As a twenty-year veteran of the
data recovery business...I know to expect anything, from a tearful "Why me? I tried to do my best...but now I'll probably lose my job and my health insurance, and my baby girl is very sick, because the business is shut down and I'm the IT guy," to "OK, you bastard, charge me thousands of dollars that I was going to spend on fixing my car, which I need to make a living, for some silly trick or some other nonsense...because your friends at the manufacturer sold me this piece of crap drive, which crashed after only 3 weeks." I've both cried and wiped away many tears of others as the legacy of a crashed hard drive.

Large commercial systems
are my specialty: Linux boxes, RAID arrays, SCSI drives...but I get my share of crashed laptops, fried desktops, dead DVRs, and flattened thumb drives. I'm not complaining; the misfortunes, hopes, and dreams of corporate computer users have been my business since I ran Computer Help, a tech support firm on 9th Ave at 42nd Street in New York's Times Square, in the '90s. But I am a sensitive soul, who is happy to make his living from those truly unavoidable failures.

Misconception 1: Hard drives are like cars; you use them till you need a new one.

This is the most damning of notions, and the fundamental cause of much, if not most, of the data lost to hard drive or media failure. Back in the '70s, when I was a sales engineer for Otis Elevator,
one fundamental truth of all engineered products was our major selling point: every nut, bolt, spring, or motor has an MTBF (mean time between failures) at the heart of its design. In the elevator business, I could sell you a hoistway door motor that I was willing to guarantee would open the doors 5,000 times, or one that would open them 50,000 times. The difference between these two motors was how carefully they were constructed, which fabrication materials were chosen, what design was used, what maintenance was prescribed, and of course the cost. This is an undeniable truth of all engineered products, especially hard drives, laptops, USB interfaces, DVD disks and players, Zip disks, and anything else you might consider.

A hard drive is composed of thousands of engineered parts, coatings, and synchronized interrelationships. Each part is more or less crucial to the successful operation of the drive.
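
To see why thousands of individually reliable parts still add up to a drive that will eventually fail, consider a deliberately simplified model: assume every part is essential (any single failure kills the drive) and parts fail independently, so the drive's chance of surviving a period is the product of its parts' survival chances. The sketch below is a minimal Python illustration under those assumptions only. The figure of 1,000 parts, each 99.99% likely to survive a year, is hypothetical and not a measured value for any real drive.

```python
import math


def series_reliability(part_reliabilities):
    """Probability that a system survives a period when every part is
    essential and parts fail independently (a "series" system)."""
    return math.prod(part_reliabilities)


if __name__ == "__main__":
    # Hypothetical numbers: 1,000 essential parts, each with a 99.99%
    # chance of surviving the year.
    survive = series_reliability([0.9999] * 1000)
    print(f"drive survives the year: {survive:.3f}")          # 0.905
    print(f"drive fails within the year: {1 - survive:.3f}")  # 0.095
```

Even with each part at 99.99%, roughly one drive in ten fails within the year under these assumptions; more parts, or shorter-lived parts, push the odds higher.
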
A small spring may, due to heat, lose its elasticity and cause another part to move too slowly or not rise rapidly enough. A small bit of insulation, due to heat or electrostatic forces, may chip off and float around the sealed platter chamber. This is difficult to avoid, and particularly pernicious, because even a microscopic bit of material can scratch wide areas as it becomes lodged between the floating heads and the coated platter surface. Whenever this occurs, data is almost surely lost, because the heads are designed never to physically touch the platter surfaces. I could mention any one of the over 10,000 pathological conditions listed in our tech guides, but hopefully you get the idea...a hard drive is a robust but highly sophisticated system of often microscopic parts, which, given enough time, will fail, and the data it contains will be lost. This leads to the first and possibly the most important tip that I can offer...view your hard drive like a tire...not a car.
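
Treating a drive like a tire means tracking its age and retiring it on a schedule, rather than waiting for it to fail. The Python sketch below shows one way to flag drives that are due for rotation, assuming a hypothetical inventory that records each drive's serial number, role, and install date. The rotation periods (365 days for mission-critical drives, 90 days for drives shared by multiple simultaneous users) are illustrative values loosely based on the "a year, or even a few months" schedule discussed in this article, not an industry standard.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical rotation periods per role; tune them to your own risk tolerance.
ROTATION_PERIODS = {
    "mission_critical": timedelta(days=365),
    "multi_user": timedelta(days=90),
}


@dataclass
class Drive:
    serial: str      # hypothetical inventory fields
    role: str        # a key of ROTATION_PERIODS, or anything else for "no schedule"
    installed: date  # date the drive was first put into service


def rotation_date(drive):
    """Return the date by which the drive should be retired, or None if
    its role has no rotation schedule (e.g. an ordinary desktop)."""
    period = ROTATION_PERIODS.get(drive.role)
    return None if period is None else drive.installed + period


def drives_to_rotate(drives, today):
    """Serial numbers of scheduled drives whose rotation date has arrived."""
    due = []
    for drive in drives:
        when = rotation_date(drive)
        if when is not None and when <= today:
            due.append(drive.serial)
    return due


if __name__ == "__main__":
    inventory = [
        Drive("SN-A", "mission_critical", date(2005, 1, 10)),
        Drive("SN-B", "multi_user", date(2006, 2, 1)),
        Drive("SN-C", "mission_critical", date(2006, 3, 1)),
        Drive("SN-D", "desktop", date(2003, 5, 20)),
    ]
    print(drives_to_rotate(inventory, date(2006, 6, 1)))  # ['SN-A', 'SN-B']
```

Giving a role no schedule, as with "desktop" here, is one way to model an older drive that has been moved to less critical duty rather than thrown away.
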
A blown tire and a hard drive crash can be fatal, while a broken-down car is usually just an inconvenience. When your hard drive fails, even the most diligent backup schedule will leave a gap, with at least the last few minutes of data lost. In most cases, losing a few minutes or a single day of data is not mission critical. However, when the failure leaves a key database in an unresolved state, as with an accounting program, or when the system is capturing real-time data, as in trading or monitoring applications, even a minimal loss of data can be catastrophic. The only answer is to make drive rotation schedules mandatory in sensitive applications. I recommend that you note the date when a drive was first installed for a mission-critical application, and replace it on a regular basis. For mission-critical applications, the operational depreciation schedule may be a year, or even a few months for drives that are accessed by multiple users simultaneously.

You
don't have to chuck the old but still usable drives; you can move them to less critical applications like desktops, where user data is actually stored on a corporate server. Desktop hard drive crashes in a well-managed office are simply inconvenient and can quickly be overcome with a new drive and a reinstalled OS, so a less than stellar device may be appropriate. But for your mission-critical applications, employ fresh drives on a regular basis and you will save yourself a lot of sleepless nights.

Misconception 2: Computers are like humans; they just get a little tired.

One of the first questions that I ask a
new client is...was this crash sudden, or has the drive been dying for some time? Most people, out of pride or ignorance, will respond that it just happened all of a sudden. But when I follow up with questions like "Have you noticed slow startups or shutdowns during the last 10 uses of your computer?" the answer is invariably yes. Although this is not always the case, most hard drive failures, especially the most difficult to recover, are the result of a gradual failure of the media that follows a disease model. Your computer takes a couple of seconds longer to start each day, or it fails to shut down after you have asked it to. Many people attribute these symptoms to user error or viral
activity. However, more often than not, these are signals that your media is struggling to survive, and ignoring them is analogous to ignoring a deep bullet wound. The damage grows: in the body through opportunistic infections and loss of vital fluids; in your hard drive as increasing numbers of CRC errors (cyclic redundancy check errors; a CRC is an error-detection method used by computers) and bad sectors. Literally, the usable capacity of your drive is being reduced every second it is in operation. Nor is the problem, which is usually a small shard of plastic or metal, confined to the unused portion of your drive. Instead, the damage will be most severe on the files that are accessed most often, which means transaction databases and system files are the first to suffer. The problem manifests as slow operations, because
the media controller and the OS must constantly try to find new locations to store key data. These progressive errors then become an ever-increasing overhead, as more and more of your computer's power is used to overcome them. The bad news is...this problem is all but inevitable, and the only solution is frank recognition that your hard drive is almost as perishable as bananas. When these symptoms appear, you don't have to run out and buy a new drive. But ignore them at your peril. This is the time to increase your vigilance about putting your important documents in a safe place.

Misconception 3: I should back up to an external hard drive.

External drives are great for moving data from one
machine to another, or adding capacity. However, as any experienced data recovery technician can attest, external hard drives are submitted for recovery at twice the rate of internal drives. The reasons for these failures vary from falls, shorts, and spills to fits of rage. But more common than any of these clear causes of catastrophic failure are the drives that one day just failed...with no warning. The reasons for these failures are common to all hard drives, but are often exacerbated by poor heat distribution and increased exposure to normal tremors from sitting on often-bumped desks. This means that your external drive, despite the advantage of being used less than a typical boot drive, often has a shorter life than its internal counterparts. The stark reality is that an external drive is identical to an internal drive, in a much less sophisticated case.

Store shelves are full of external hard drives which
appear solid and substantial, and some have incredible capacities in comparison to their internal cousins. This is the tender trap of data storage. The drives in those fancy cases are not only manufactured by the same companies as the internal versions...they are the same drives. However, a couple of important concerns should be noted. First, to make the external units as competitive as possible, case manufacturers squeeze the drive OEMs (Maxtor, Seagate, Western Digital, etc.) for the best prices. This often leads to what we call in the industry the orphan drive problem.

Orphan drives are drives made in short production
runs, before something significant about their design is modified. For example, an OEM comes up with a hot new chip to implement part of the drive's data access strategy. Let's say that this hot new chip has the ability to stack commands in a way that allows the drive to be more responsive in a multi-user environment. It works brilliantly in beta testing, and a run of 5,000 drives is put into production. A hundred of the drives are released for further testing by internal users in real-world situations. But to everyone's dismay, the new drives tend to lose instructions when multiple users of disk-intensive applications like CAD or graphics programs try to access the drive simultaneously, resulting in small distortions. This is marginally acceptable, but it puts the OEM at a competitive disadvantage with high-end users, so the design is modified to use an older, more reliable chip. The remaining 4,900 drives are now orphans, and are sold at auction or to a reseller who is fully aware of the problem. These discounted drives are then often sold to large buyers, who enclose them in cases and market them to less demanding end users. This is fine, and no one is harmed, until one of the drives fails.

The failure of an orphan drive is significant
in a data recovery situation because data recovery companies do not generally have access to spare parts. When we have to repair a drive to recover the data, we almost always have to find an identical drive in the aftermarket and harvest the needed parts from it. If a drive is from such a short run, the chances that we will be able to quickly locate a suitable drive for parts are severely limited. This problem often translates into extended recovery times at best and unrecoverable drives at worst.

One last point about external drives may save you tons of grief. If you see an external drive that has more capacity than any of the single drives available, then it is most likely a RAID array. A RAID array is at least two single drives combined into a single data set by clever software
and hardware. It may not appear any larger than a normal desktop IDE or SATA drive, but in those cases it may contain multiple laptop-format drives. This is a potential nightmare if your external unit fails, because your data will most likely be spread across all of the drives, and the failure of one is often enough to make the entire data set unusable. In short, there is no free lunch, and an external unit which contains multiple drives is both more likely to fail and much more expensive to recover.

Misconception 4: Everything is important.

What is a careful data steward to do?
The first thing is not to classify everything as important. Data, like canned goods, is quite often of perishable utility. For example, data about past clients is useful, but its loss is often insignificant when compared to the loss of data about current or even future clients. Often when I ask a client to prioritize the data that needs to be recovered, they will tell me, "I need everything." This tells me that the client has not given their valuable data sufficient scrutiny, which is quite possibly why they need to engage in an expensive recovery at all.

Just
as my few short years in the elevator business found me in more broken elevators than most people will encounter in a lifetime, as a data/hardware specialist I have experienced more crashed hard drives than most small groups of users will ever suffer. But when a drive of mine fails, I generally just toss it in the trash, or try to return it to the manufacturer for a refund. Does this mean that I don't value my data as much as a normal user? Of course not...my cavalier attitude springs from a couple of basic practices which make the loss of any one drive unfortunate, for sure, but rarely catastrophic.

First, I
treat important data differently from mission-critical data. Important data is backed up to either DVD or CD, and these discs are examined periodically, stored carefully, and labeled assiduously. I have also carved out some space on my web-hosting accounts where compressed versions of important data are stored for the long term. This means that even if my hard drives fail, I may have to take some time...but I know that this important data is somewhere in my possession. This is fine for important data which may be useful for litigation or reference. However, for my mission-critical data, the time necessary to search would often be unacceptable. Consequently, I make it a point that mission-critical data exists in at least 2 or 3 accessible places at all times. This sounds difficult, but when this objective is factored into your daily operations it is quite painless. For me, email is where almost all critical
data is generated and stored, so I ensure that I CC myself or one of my colleagues on all important emails, and I make sure that a copy of emails remains on my email server for at least a week. I route my company mail through a big public mail server like Gmail, which retains all correspondence. I also ensure that my email client (Outlook Express) is open on my laptop so that all messages which are sent or received are also stored there. For documents, spreadsheets, and pix, I use web-based systems like Google Writely to create or store my documents in a safe, professionally managed environment.
In short, I free myself from the straitjacket of my own equipment, and use resources where I find them. Of course, you may have many documents which require increased scrutiny or security. However, if your classification policy is intelligent, you can harden your organization against most hardware failures.
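
The "2 or 3 accessible places" rule is only as good as the copies themselves, so it is worth checking them periodically, much as the DVDs and CDs holding important data are examined. The sketch below is a minimal Python illustration, assuming the copies are reachable as ordinary file paths (for example a second disk or a mounted network share). It counts how many copies exist and match the original byte for byte by comparing SHA-256 digests. The function names, file names, and the default minimum of 2 copies are hypothetical choices for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hex SHA-256 digest of a file, read in 1 MiB chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verified_copies(original, copies):
    """Count copies that exist and are byte-for-byte identical to the original."""
    reference = sha256_of(original)
    return sum(
        1 for copy in copies
        if Path(copy).is_file() and sha256_of(copy) == reference
    )


def meets_copy_policy(original, copies, minimum=2):
    """True if at least `minimum` verified copies exist (hypothetical policy)."""
    return verified_copies(original, copies) >= minimum


if __name__ == "__main__":
    # Self-contained demo in a throwaway directory; real copies would live on
    # separate drives or servers.
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        ledger = base / "ledger.db"
        ledger.write_bytes(b"current accounts")
        (base / "copy_on_server.db").write_bytes(b"current accounts")
        (base / "copy_on_laptop.db").write_bytes(b"last month's accounts")
        copies = [base / "copy_on_server.db", base / "copy_on_laptop.db",
                  base / "missing.db"]
        print(verified_copies(ledger, copies))    # 1
        print(meets_copy_policy(ledger, copies))  # False
```

A mismatch only says that a copy differs from the original, not which one is correct, so in practice you would rerun the check after each backup and investigate any copy that falls out of agreement.
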