On Hard Drive Rotation

  

By Gare Henderson 2006

  

My phone buzzes. My young assistant's voice interrupts my quiet lunch and says, somewhat nervously, "There is a new customer on line 2 who has a data recovery problem." I take a swig of Starbucks' best to wash down the last remnants of a delicious smoked turkey with mango mayo sandwich, and take the call. "Hello, how can I help you?" I intone with all the warmth of your favorite great uncle. My assistant's nervousness was an implicit signal that this customer sounds edgy...watch out. As a twenty-year veteran of the data recovery business, I know to expect anything, from a tearful "Why me? I tried to do my best, but now I'll probably lose my job and my health insurance, and my baby girl is very sick, because the business is shut down and I'm the IT guy," to "OK you bastard, charge me thousands of dollars that I was going to spend on fixing my car, which I need to make a living, with some silly trick or some other nonsense, because your friends at the manufacturer have sold me this piece of crap drive, which crashed after only 3 weeks."

  

I've both cried and wiped away the tears of many others as the legacy of a crashed hard drive. Large commercial systems are my specialty (Linux boxes, RAID arrays, SCSI drives), but I get my share of crashed laptops, fried desktops, dead DVRs, and flattened thumb drives. I'm not complaining; the misfortunes, hopes, and dreams of corporate computer users have been my business since I ran Computer Help, a tech support firm on 9th Ave at 42nd Street in New York's Times Square, in the 90's. But I am a sensitive soul, who is happy to make his living from those truly unavoidable disasters like sabotage, logical errors, and legal forensics. I am pained, however, by most of the data disasters that come my way, since they are highly, if not completely, avoidable. We call them ID-10-T errors. The problem is that users harbor some fundamental misconceptions about data storage in general and storage media in particular. Let me take this opportunity to disabuse you of a few of these dangerous misconceptions.

  

Misconception 1: Hard drives are like cars, you use them till you need a new one.

  

This is the most damning of notions, and the fundamental cause of much, if not most, of the data lost to hard drive or media failure. Back in the 70's, when I was a sales engineer for Otis Elevator, one fundamental truth of all engineered products was our major selling point. Every nut, bolt, spring, or motor has an MTBF (mean time between failures) at the heart of its design. In the elevator business, I could sell you a hoist-way door motor that I was willing to guarantee would open the doors 5,000 times, or one that would open the doors 50,000 times. The difference between these two motors was how carefully they were constructed, what fabrication materials were chosen, what design was used, what maintenance was prescribed, and of course the cost. This is an undeniable truth of all engineered products, especially hard drives, laptops, USB interfaces, DVD discs and players, zip disks, and anything else you might consider.
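
To put rough numbers on the idea, here is a minimal sketch in Python. It assumes a constant failure rate (the simple exponential model), which real motors and real hard drives only approximate, and it treats each door motor's rating as its mean number of cycles between failures. The function name and that reading of the ratings are illustrative assumptions, not manufacturer figures.

```python
import math

def survival_probability(cycles_used: float, mean_cycles_between_failures: float) -> float:
    """Chance a part is still working after `cycles_used` cycles.

    Assumption: a constant failure rate (exponential model), which is a
    simplification; real parts also wear out faster as they age.
    """
    if mean_cycles_between_failures <= 0:
        raise ValueError("mean_cycles_between_failures must be positive")
    return math.exp(-cycles_used / mean_cycles_between_failures)

# Hypothetical comparison of the two door motors after 5,000 door openings,
# treating each motor's rating as its mean cycles between failures.
for rating in (5_000, 50_000):
    p = survival_probability(5_000, rating)
    print(f"motor rated {rating:>6} cycles: {p:.0%} chance of still working")
```

Under these assumptions the cheaper motor has a little better than a one-in-three chance of still working after 5,000 openings, while the better motor has about nine chances in ten. The same arithmetic applies to a drive rated in hours instead of cycles.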

  

A hard drive is composed of thousands of engineered parts, coatings, and synchronized interrelationships. Each part is more or less crucial to the successful operation of the drive. A small spring may, due to heat, lose its elasticity and cause another part to move too slowly or not rise rapidly enough. A small bit of insulation, due to heat or electrostatic forces, may chip and float around the sealed platter chamber. This is difficult to avoid and particularly pernicious, in that even a microscopic bit of material can scratch wide areas as it becomes lodged between the floating heads and the coated platter surface. Whenever this occurs data is almost surely lost, because the heads are designed never to physically touch the platter surfaces. I could mention any one of the over 10,000 pathological conditions that are listed in our tech guides, but hopefully you get the idea...a hard drive is a robust but highly sophisticated system of often microscopic parts, which given enough time will fail, and the data it contains will be lost.

  

This leads to the first and possibly the most important tip that I can offer...view your hard drive like a tire, not a car. A blown tire and a hard drive crash can both be fatal, while a broken-down car is usually just an inconvenience. When your hard drive fails, even the most diligent backup schedule will be partially incomplete, with at least the last few minutes of data being lost. In most cases losing a few minutes or a single day of data is not mission critical. However, when the failure leaves a key database in an unresolved state, as with an accounting program, or when the system is capturing real-time data, as in trading or monitoring applications, even a minimal loss of data can be catastrophic.

  

The only answer is that drive rotation schedules are mandatory in sensitive applications. I recommend that you note the date when the drive was first installed for a mission critical application, and change it on a regular basis. For mission critical applications the operational depreciation schedule may be a year, or even a few months for drives that are accessed by multiple users simultaneously. You don't have to chuck the old but still usable drives; you can move them to less critical applications like desktops, where user data is actually stored on a corporate server. Desktop hard drive crashes in a well-managed office are simply inconvenient, and can quickly be overcome with a new drive and a re-installed OS, so a less than stellar device may be appropriate. But for your mission critical applications employ fresh drives on a regular basis and you will save yourself a lot of sleepless nights.
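
A rotation schedule of this kind is easy to track. What follows is a minimal sketch in Python, not a finished tool. The tier names, the service lives of 90 and 365 days, and the drive labels are all assumptions chosen to match "a year, or even a few months"; substitute your own figures.

```python
from datetime import date, timedelta

# Assumed service lives in days; the tiers and numbers are illustrative,
# loosely following "a year, or even a few months" from the text.
SERVICE_LIFE_DAYS = {
    "mission_critical_multi_user": 90,
    "mission_critical": 365,
}

def rotation_due(installed: date, tier: str) -> date:
    """Date by which a drive should leave its mission critical role."""
    return installed + timedelta(days=SERVICE_LIFE_DAYS[tier])

def needs_rotation(installed: date, tier: str, today: date) -> bool:
    """True once the drive has reached the end of its scheduled service life."""
    return today >= rotation_due(installed, tier)

# Hypothetical inventory: (label, install date, tier).
drives = [
    ("db-server-0", date(2006, 1, 15), "mission_critical_multi_user"),
    ("file-server-0", date(2005, 11, 1), "mission_critical"),
]
today = date(2006, 6, 1)
for label, installed, tier in drives:
    status = "rotate to a desktop" if needs_rotation(installed, tier, today) else "ok"
    print(f"{label}: due {rotation_due(installed, tier)}, {status}")
```

The point of the sketch is only that the install date is recorded when the drive goes into service, so the decision to retire it is made by the calendar and not by the first crash.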

  

  

Misconception 2: Computers are like humans, they just get a little tired.

  

One of the first questions that I ask a new client is...was this crash sudden, or has the drive been dying for some time? Most people, out of pride or ignorance, will respond that it just happened all of a sudden. But with further questions like, "Have you noticed slow start-ups or shut-downs during the last 10 uses of your computer?" the answer is invariably yes. Although this is not always the case, most hard drive failures, especially the most difficult to recover, are the result of a gradual failure of the media which follows a disease model.

  

Your computer takes a couple of seconds longer to start each day, or it fails to shut down after you have asked it to. Many people attribute these symptoms to user error or viral activity. However, more often than not, these are signals that your media is struggling to survive, and ignoring them is analogous to ignoring a deep bullet wound. The damage grows. In the body it spreads through opportunistic infections and loss of vital fluids; in your hard drive it manifests itself as increasing numbers of bad sectors and CRC errors (cyclic redundancy check errors; CRC is an error-detection method used by computers). The capacity of your drive is literally being reduced every second it is in operation. Nor does the problem, which is usually a small shard of plastic or metal, confine itself to the unused portion of your drive. Instead the damage will be most severe on the files that are accessed most often, which means transaction databases and system files are the first to suffer.
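
For readers who have never seen a CRC at work, here is a small illustration in Python. It uses the CRC-32 routine from the standard zlib module purely to show the principle; drives and interfaces run their own checks in hardware, and the sample data and helper name are made up for the example.

```python
import zlib

def crc_matches(data: bytes, stored_crc: int) -> bool:
    """Return True if the data still matches the checksum saved with it."""
    return zlib.crc32(data) == stored_crc

# A made-up block of data, standing in for a sector's contents.
sector = b"ACCOUNT 1042 BALANCE 00017350"
stored_crc = zlib.crc32(sector)   # checksum computed when the data was written

# Simulate damage: flip a single bit in the first byte.
damaged = bytes([sector[0] ^ 0x01]) + sector[1:]

print(crc_matches(sector, stored_crc))    # intact data passes the check
print(crc_matches(damaged, stored_crc))   # the flipped bit is detected
```

A check like this tells the system that something is wrong with the data it read back; it does not repair anything, which is why a rising count of such errors is a warning and not a cure.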

  

The problem manifests as slow operations, because the media controller and the OS must constantly try to find new locations to store key data. These progressive errors then become an ever-increasing overhead, as more and more of your computer's power is used to overcome them. The bad news is...this problem is all but inevitable, and the only solution is frank recognition that your hard drive is almost as perishable as bananas. When the latent phenomenon becomes apparent, you don't have to run out and buy a new drive. But ignore it at your peril. This is the time to increase your vigilance about putting your important documents in a safe place.

  

  

Misconception 3: I should back up to an external hard drive

  

External drives are great for moving data from one machine to another, or for adding capacity. However, as any experienced data recovery technician can attest, external hard drives are submitted for recovery at twice the rate of internal drives. The reasons for these failures vary from falls, shorts, and spills to fits of rage. But more common than any of these clear causes of catastrophic failure are the drives that one day just failed, with no warning. The reasons for these failures are common to all hard drives, but are often exacerbated by poor heat distribution and by increased exposure to everyday tremors from sitting on desks that are often bumped. This means that your external drive, despite the advantage of being used less than a typical boot drive, often has a shorter life than its internal counterpart. The stark reality is that an external drive is identical to an internal drive, in a much less sophisticated case.

  

Store shelves are full of external hard drives which appear solid and substantial, and some have incredible capacities in comparison to their internal cousins. This is the tender trap of data storage. The drives in those fancy cases are not only manufactured by the same companies as the internal versions...they are the same drives. However, a couple of important concerns should be noted. First, to make the external units as competitive as possible, case manufacturers squeeze the drive OEMs (Maxtor, Seagate, Western Digital, etc.) to get the best prices. This often leads to what we call in the industry the orphan drive problem.

  

Orphan drives are drives made in short production runs, before something significant about their design is modified. For example, an OEM comes up with a hot new chip to implement a part of the drive's data access strategy. Let's say that this hot new chip has the ability to stack commands in a way that allows the drive to be more responsive in a multi-user environment. It works brilliantly in beta testing, and a run of 5,000 drives is put into production. A hundred of the drives are released for further testing by internal users in real world situations. But to everyone's dismay the new drives tend to lose instructions when multiple users of disk-intensive applications like CAD or graphics programs try to access the drive simultaneously, resulting in small distortions. This is marginally acceptable, but puts the OEM at a competitive disadvantage with high-end users, so the design is modified to use an older, more reliable chip. The remaining 4,900 drives are now orphans, and are sold at auction or to a reseller who is fully aware of the problem. These discounted drives are then often sold to large buyers, who will enclose them in cases and market them to less demanding end users. This is fine, and no one has been harmed until one of the drives fails.

  

The failure of an orphan drive is significant in a data recovery situation because data recovery companies do not generally have access to spare parts. When we have to repair a drive to recover the data, we must almost always find the identical drive in the after-market and harvest the needed parts from it. If a drive is from such a short run, the chances that we will be able to quickly locate a suitable drive for parts are severely limited. This problem often translates into extended recovery times at best and unrecoverable drives at worst.

  

One last point about external drives may save you tons of grief. If you see an external drive that has more capacity than any of the single drives available, then it is most likely a RAID array. A RAID array is at least two single drives combined into a single data-set by clever software and hardware. The unit may not appear any larger than a normal desktop IDE or SATA drive; in that case it may contain multiple laptop-format drives. This is a potential nightmare if your external unit fails, because your data will most likely be spread across all of the drives, and the failure of one is often enough to make the entire data-set unusable. In short there is no free lunch, and an external unit which contains multiple drives is both more likely to fail and much more expensive to recover.
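
The arithmetic behind "more likely to fail" can be sketched in a few lines of Python. The sketch assumes the unit stripes data across its drives with no redundancy, so that one dead drive loses the whole set, and that the drives fail independently. The 95% survival figure is a made-up number for illustration, not a measured failure rate.

```python
def striped_set_survival(per_drive_survival: float, drive_count: int) -> float:
    """Chance that every drive in a striped set survives.

    Assumptions: the data is striped with no redundancy, so one dead drive
    loses the whole set, and drives fail independently of each other.
    """
    if not 0.0 <= per_drive_survival <= 1.0:
        raise ValueError("per_drive_survival must be between 0 and 1")
    if drive_count < 1:
        raise ValueError("drive_count must be at least 1")
    return per_drive_survival ** drive_count

# Hypothetical figure: each drive has a 95% chance of lasting the year.
for n in (1, 2, 4):
    print(f"{n} drive(s): {striped_set_survival(0.95, n):.1%} chance the set survives")
```

Every drive added to such a set multiplies in another chance of failure, so the big-capacity unit starts out with worse odds than any single drive inside it.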

  

Misconception 4: Everything is important

  

What is a careful data steward to do? The first thing is not to classify everything as important. Data, like canned goods, is quite often of perishable utility. For example, data about past clients is useful, but its loss is often insignificant when compared to data about current or even future clients. Often when I ask a client to prioritize the data that needs to be recovered, they will tell me that they need everything. This tells me that the client has not given their valuable data sufficient scrutiny, which is quite possibly why they need to engage in an expensive recovery at all.

  

Just as my years in the elevator business found me in more broken elevators in a few short years than most people will encounter in a lifetime, as a data/hardware specialist I have experienced more crashed hard drives than most small groups of users will ever suffer. But when a drive of mine fails, I generally just toss it in the trash, or try to return it to the manufacturer for a refund. Does this mean that I don't value my data as much as a normal user? Of course not...my cavalier attitude springs from a couple of basic practices which make the loss of any one drive unfortunate, for sure, but rarely catastrophic. First, I treat important data differently from mission critical data. Important data is backed up to either DVD or CD, and these discs are examined periodically, stored carefully, and labeled assiduously. I have also carved out some space on my web-hosting accounts where compressed versions of important data are stored for the long term. This means that even if my hard drives fail, I may have to take some time to find it...but I know that this important data is somewhere in my possession.

  

This is fine for important data which may be useful for litigation or reference. However, for my mission critical data the time necessary to search would often be unacceptable. Consequently I make it a point that mission critical data exists in at least 2 or 3 accessible places at all times. This sounds difficult, but when this objective is factored into your daily operations it is quite painless. For me, email is where almost all critical data is generated and stored, so I ensure that I CC myself or one of my colleagues on all important emails, and I make sure that a copy of each email remains on my email server for at least a week. I route my company mail through a big public mail server like Gmail, and it retains all correspondence. I also ensure that my email client...Outlook Express...is open on my laptop so that all messages which are sent or received are also stored there. For documents, spreadsheets, and pictures I use web-based systems like Google's Writely to create or store my documents in a safe, professionally managed environment. In short, I free myself from the straitjacket of my own equipment, and use resources where I find them.
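
The "2 or 3 accessible places" rule is simple enough to check mechanically. Here is a minimal sketch in Python under stated assumptions: the class names, the minimum of one copy for important data, and the sample inventory are all hypothetical, and only the minimum of two copies for mission critical data comes from the practice described above.

```python
# Assumed minimum number of accessible copies per class. Only the
# "2 or 3 places" rule for mission critical data comes from the text;
# the class names and the minimum of 1 for important data are illustrative.
MIN_COPIES = {"mission_critical": 2, "important": 1}

def under_protected(inventory: dict) -> list:
    """Return the names of items that have fewer copies than their class requires."""
    flagged = []
    for name, (data_class, locations) in inventory.items():
        # Count distinct locations so a duplicate entry is not counted twice.
        if len(set(locations)) < MIN_COPIES.get(data_class, 0):
            flagged.append(name)
    return flagged

# Hypothetical inventory: item -> (class, places where a copy lives).
inventory = {
    "current client email": ("mission_critical", ["mail server", "laptop", "colleague"]),
    "open invoices": ("mission_critical", ["laptop"]),
    "2003 project files": ("important", ["DVD"]),
}
print(under_protected(inventory))
```

In this made-up inventory only the open invoices are flagged, because they are classed as mission critical but live in a single place.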

  

Of course, you may have many documents which require increased scrutiny or security. However, if your classification policy is intelligent, you can harden your organization against most hardware failures.

 

----------