Hard disk drive failure

A hard disk failure occurs when a hard disk drive malfunctions and the stored information cannot be accessed with a properly configured computer.

A hard disk failure may occur in the course of normal operation, or due to external factors such as exposure to fire, water, or strong magnetic fields, sharp impacts, or environmental contamination, which can lead to a head crash.

Data stored on a hard drive may also become inaccessible as a result of data corruption, disruption or destruction of the drive's master boot record, or malware that deliberately destroys the disk's contents.

Cause

Some hard disk drives fail because of worn-out parts, while others fail prematurely. Drive manufacturers typically specify a mean time between failures (MTBF) or an annualized failure rate (AFR); these are population statistics that cannot predict the behavior of an individual unit. They are calculated by running samples of the drive continuously for a short period, analyzing the resulting wear and tear on the drive's physical components, and extrapolating to give a reasonable estimate of its lifespan. Hard disk drive failures tend to follow the bathtub curve. Drives usually fail within a short time if a manufacturing defect is present. If a drive proves reliable for a few months after installation, it has a greater chance of remaining reliable. Therefore, even a drive subjected to several years of heavy daily use may show no notable signs of wear unless closely inspected. On the other hand, a drive can fail at any time and in many different situations.
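
The bathtub curve can be illustrated numerically. The following Python sketch is a toy model under stated assumptions, not a manufacturer's reliability method: it adds a falling Weibull hazard (early manufacturing defects), a constant hazard (random failures), and a rising Weibull hazard (wear-out). Every parameter value, and the function names, are hypothetical and chosen only to show the shape.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Instantaneous failure rate of a Weibull distribution at time t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t_years: float) -> float:
    """Toy bathtub-shaped hazard, in failures per drive-year, for t_years > 0.

    All parameter values are hypothetical and chosen only to show the shape.
    """
    infant_mortality = weibull_hazard(t_years, shape=0.5, scale=400.0)  # falls over time
    random_failures = 0.02                                               # constant rate
    wear_out = weibull_hazard(t_years, shape=5.0, scale=8.0)             # rises over time
    return infant_mortality + random_failures + wear_out


if __name__ == "__main__":
    for t in (0.05, 0.5, 2.0, 4.0, 7.0):
        print(f"age {t:4.2f} years: hazard {bathtub_hazard(t):.3f} per year")
```

With these made-up parameters the hazard is high for a brand-new drive, is lowest in the middle years, and climbs again as the wear-out term dominates.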

The most notorious cause of drive failure is a head crash, in which the device's internal read-and-write head, which normally hovers just above the surface, touches a platter or scratches the magnetic data-storage surface. A head crash usually causes severe data loss, and data recovery attempts can cause further damage if not carried out by specialists with the proper equipment. Drive platters are coated with an extremely thin layer of non-electrostatic lubricant, so that the read-and-write head will simply glance off the platter surface if a collision occurs. However, the heads hover only nanometers above the platter surface, which makes a collision a recognized risk. Another cause of failure is a faulty air filter. The air filters on current drives equalize the atmospheric pressure and humidity between the drive enclosure and its outside environment. If the filter fails to capture a dust particle, the particle can land on the platter and cause a head crash if the head happens to sweep over it. After a head crash, particles from the damaged platter and head media can cause one or more bad sectors. These, in addition to platter damage, will quickly render the drive useless. Drives also include controller electronics, which occasionally fail. In such cases, it may be possible to recover all of the data by replacing the controller board.

Disk failure is not limited to hard drives; it also affects other types of magnetic media. In the late 1990s, Iomega's 100-megabyte Zip disks used in Zip drives were affected by the "click of death", so called because the drives clicked endlessly when accessed, indicating impending failure. 3.5-inch floppy disks can also fall victim to disk failure. If the drive or media is dirty, users may hear a "buzz of death" when trying to access the drive.

Signs of drive failure

Hard disk drive failure can be catastrophic or gradual. Catastrophic failure usually appears as a drive that can no longer be detected in the CMOS setup, or that fails the BIOS POST so that the operating system never sees it. Gradual failure can be harder to diagnose, because its symptoms, such as corrupted data and a slowing PC (caused by failing areas of the drive that need repeated read attempts before succeeding), can also be caused by many other computer problems, such as malware. A rising number of bad sectors can be a sign of a failing hard drive, but because the drive automatically adds them to its own grown defect table, they may not become evident to utilities such as ScanDisk unless the utility catches them before the drive's defect management system does, or until the spare sectors held in reserve by the internal defect management system run out. A cyclical, repetitive pattern of seek activity, such as rapid or slow seek-to-end noises (the "click of death"), can indicate a hard drive problem.

Landing zone and load/unload technology

During normal operation, the heads of an HDD fly above the data recorded on the disks. Modern HDDs prevent power interruptions or other malfunctions from landing the heads in the data zone, either by physically moving (parking) the heads to a dedicated landing zone on the platters that is not used for data storage, or by physically locking the heads in a suspended (unloaded) position raised off the platters. Some early PC HDDs did not park the heads automatically when power was unexpectedly disconnected, and the heads would land on data. In some other early units, the user had to run a program to park the heads manually.

Landing zone

A landing zone is an area of the platter, usually near its inner diameter (ID), where no data is stored. This area is called the Contact Start/Stop (CSS) zone. Disks are designed so that either a spring or, more recently, the rotational inertia of the platters is used to park the heads in case of unexpected power loss. In this case, the spindle motor temporarily acts as a generator, providing power to the actuator.

Spring tension from the head mounting constantly pushes the heads toward the platter. While the disk is spinning, the heads are supported by an air bearing and experience no physical contact or wear. In CSS drives, the sliders carrying the head sensors (often just called heads) are designed to survive a number of landings and takeoffs from the media surface, although wear and tear on these microscopic components eventually takes its toll. Most manufacturers design the sliders to survive 50,000 contact cycles before the chance of damage on startup rises above 50%. However, the decay rate is not linear: a younger disk that has had fewer start-stop cycles has a better chance of surviving the next startup than an older, higher-mileage disk, because the head literally drags along the disk surface until the air bearing is established. For example, the Seagate Barracuda 7200.10 series of hard disk drives is rated for 50,000 start-stop cycles; in other words, no failures attributed to the head-platter interface were seen before at least 50,000 start-stop cycles during testing.
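
The non-linear decay can be sketched with a Weibull model of start-stop survival. This is an illustrative assumption, not a published manufacturer model: the only figure taken from the text is the 50% damage chance at 50,000 cycles, while the shape parameter (greater than 1, so the per-start risk rises with mileage) and the function names are hypothetical.

```python
import math

MEDIAN_CYCLES = 50_000  # from the text: ~50% chance of damage by 50,000 cycles
SHAPE = 3.0             # hypothetical; any value > 1 gives a rising per-start risk
# Weibull scale chosen so that survival(MEDIAN_CYCLES) == 0.5
SCALE = MEDIAN_CYCLES / math.log(2) ** (1 / SHAPE)


def survival(cycles: float) -> float:
    """Probability that the slider survives `cycles` start-stop cycles undamaged."""
    return math.exp(-((cycles / SCALE) ** SHAPE))


def p_survive_next_start(cycles_so_far: int) -> float:
    """Probability of surviving one more start, given `cycles_so_far` survived."""
    return survival(cycles_so_far + 1) / survival(cycles_so_far)


if __name__ == "__main__":
    for n in (1_000, 10_000, 50_000, 80_000):
        risk = 1.0 - p_survive_next_start(n)
        print(f"after {n:>6,} cycles: risk of damage on next start = {risk:.2e}")
```

Under this model a low-mileage slider has a much smaller chance of damage on its next start than one that has already survived tens of thousands of cycles, matching the qualitative claim above.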

Around 1995, IBM pioneered a technology in which the landing zone on the disk is made by a precision laser process (Laser Zone Texture, LZT), producing an array of smooth nanometer-scale "bumps" in the landing zone and thereby improving stiction and wear performance. This technology is still widely used today, especially in desktop and enterprise (3.5-inch) drives. In general, CSS technology can be prone to increased stiction (the tendency of the heads to stick to the platter surface), for example as a consequence of increased humidity. Excessive stiction can cause physical damage to the platter and slider or to the spindle motor.

Unloading

Load/unload technology relies on lifting the heads off the platters into a safe location, thereby eliminating the risks of wear and stiction altogether. The first HDD, the RAMAC, and most early disk drives used complex mechanisms to load and unload the heads. Modern HDDs use ramp loading, first introduced by Memorex in 1967, to load and unload the heads onto plastic "ramps" near the outer edge of the disk.

Addressing shock robustness, IBM also created a technology for its ThinkPad line of laptop computers called the Active Protection System. When a sudden, sharp movement is detected by the ThinkPad's built-in accelerometer, the internal hard disk heads automatically unload themselves to reduce the risk of data loss or scratch defects. Apple later used this technology in its PowerBook, iBook, MacBook Pro, and MacBook lines under the name Sudden Motion Sensor. Sony, HP (with HP 3D DriveGuard), and Toshiba have released similar technology in their notebook computers.
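
The detection step can be sketched as a threshold test on accelerometer samples. The vendors' actual algorithms are not described here; the thresholds, units (multiples of g), and function names below are hypothetical. A reading near 0 g suggests free fall, and a large deviation from the 1 g of gravity suggests a sharp jolt; either would trigger unloading the heads.

```python
import math
from typing import Optional, Sequence, Tuple

FREE_FALL_G = 0.3    # hypothetical: total acceleration below this suggests free fall
SHOCK_DELTA_G = 1.5  # hypothetical: deviation from 1 g suggesting a sharp jolt

Sample = Tuple[float, float, float]  # (x, y, z) acceleration in multiples of g


def should_park(sample: Sample) -> bool:
    """Decide whether to unload the heads for one accelerometer sample."""
    ax, ay, az = sample
    magnitude = math.sqrt(ax * ax + ay * ay + az * az)
    # At rest the sensor reads about 1 g (gravity); near 0 g means falling.
    return magnitude < FREE_FALL_G or abs(magnitude - 1.0) > SHOCK_DELTA_G


def first_park_index(samples: Sequence[Sample]) -> Optional[int]:
    """Index of the first sample that would trigger head unloading, if any."""
    for i, sample in enumerate(samples):
        if should_park(sample):
            return i
    return None


readings = [(0.0, 0.0, 1.0), (0.1, 0.0, 0.98), (0.0, 0.0, 0.1)]
print(first_park_index(readings))  # -> 2 (the third reading looks like free fall)
```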

Failure mode

Hard drives may fail in a number of ways. Failure may be immediate and total, progressive, or limited. Data may be totally destroyed, or partially or totally recoverable.

Earlier drives tended to develop bad sectors with use and wear; these bad sectors could be "mapped out" so that they were not used and did not affect the operation of the drive, and this was considered normal unless many bad sectors developed in a short period. Some early drives even had a table attached to the drive case on which bad sectors were to be listed as they appeared. Later drives map out bad sectors automatically, in a way invisible to the user; a drive with remapped sectors can continue to be used. Statistics and logs available through S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology) provide information about the remapping.
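
Transparent remapping can be pictured as a lookup table kept by the drive firmware. The class below is a hypothetical toy model (real firmware keeps factory and grown defect lists in vendor-specific formats): failing logical sectors are redirected to a reserved pool of spare sectors, and only once that pool is empty does an error reach the host. This is also why, as described under the signs of drive failure, a utility may not see bad sectors until the spares run out.

```python
from typing import Dict, List


class ToyDefectManager:
    """Hypothetical, greatly simplified model of in-drive bad-sector remapping."""

    def __init__(self, user_sectors: int, spare_sectors: int) -> None:
        # Spare physical sectors sit just past the user-visible area.
        self.free_spares: List[int] = list(range(user_sectors, user_sectors + spare_sectors))
        # Grown defect table: logical sector -> spare physical sector now used instead.
        self.grown_defects: Dict[int, int] = {}

    def physical_sector(self, lba: int) -> int:
        """Translate a logical sector number to the physical sector actually accessed."""
        return self.grown_defects.get(lba, lba)

    def report_bad(self, lba: int) -> bool:
        """Remap a failing sector; return False once the spare pool is exhausted."""
        if not self.free_spares:
            return False  # no spares left: the error now becomes visible to the host
        self.grown_defects[lba] = self.free_spares.pop(0)
        return True

    @property
    def remapped_count(self) -> int:
        """Number of logical sectors currently redirected to spares."""
        return len(self.grown_defects)


drive = ToyDefectManager(user_sectors=1_000, spare_sectors=2)
print(drive.report_bad(10), drive.physical_sector(10))  # -> True 1000
print(drive.report_bad(20), drive.report_bad(30))       # -> True False
```

A count like `remapped_count` is the kind of statistic that S.M.A.R.T. exposes, although real attribute names and semantics are defined by the drive, not by this sketch.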

Other failures, which may be progressive or limited, are usually considered a reason to replace the drive; the value of the data potentially at risk usually far outweighs the cost saved by continuing to use a drive that may be failing. Repeated but recoverable read or write errors, unusual noises, excessive and unusual heating, and other abnormalities are warning signs.

Head crash: a head may contact the rotating platter due to mechanical shock or another cause. At best, this causes irreversible damage and data loss where contact was made. In the worst case, debris scraped off the damaged area can contaminate all heads and platters and destroy all data on all platters. If the damage is initially only partial, continued rotation of the drive can extend it until it is total.
Bad sectors: some magnetic sectors may become faulty without rendering the entire drive unusable. This may be a limited occurrence or a sign of imminent failure.
Stiction: after a while, the head may not "take off" at startup because it tends to stick to the platter, a phenomenon known as stiction. This is usually due to unsuitable lubrication properties of the platter surface, a design or manufacturing defect, rather than wear. It occasionally happened with some designs until the early 1990s.
Circuit failure: components of the electronic circuitry may fail, making the drive inoperable.
Bearing and motor failure: electric motors may fail or burn out, and bearings may wear enough to prevent proper operation.
Other mechanical failures: parts, especially moving parts, of any mechanism may break or fail, preventing normal operation, with possible further damage caused by fragments.

Failure metrics

Most hard disk and motherboard vendors support S.M.A.R.T., which measures drive characteristics such as operating temperature, spin-up time, and data error rates. Certain trends and sudden changes in these parameters are thought to be associated with an increased likelihood of drive failure and data loss. However, S.M.A.R.T. parameters alone may not be useful for predicting individual drive failures. While several S.M.A.R.T. parameters affect the probability of failure, a large fraction of failed drives do not produce predictive S.M.A.R.T. parameters. Unpredictable breakdowns can occur at any time in normal use, with the potential loss of all data. Recovering some or even all of the data from a damaged hard disk is sometimes, but not always, possible, and is usually expensive.
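
As an illustration of reading these parameters, the sketch below parses the attribute table printed by `smartctl -A` (part of the smartmontools package) and flags non-zero raw counts for a few sector-related attributes. Assumptions: the sample text is made up rather than captured from a real drive, it follows the ATA attribute-table layout, and watching attributes 5 (reallocated sectors), 197 (pending sectors), and 198 (offline uncorrectable) is only an example choice. Consistent with the limitations above, an empty result does not mean the drive is healthy.

```python
import re
from typing import Dict, List

# Illustrative excerpt in the layout of `smartctl -A` output (values are made up).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       21473
194 Temperature_Celsius     0x0022   036   045   000    Old_age   Always       -       36 (Min/Max 20/45)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""

# Assumption: counters tied to sector remapping are the ones worth watching.
WATCHED_IDS = {5, 197, 198}


def parse_raw_values(text: str) -> Dict[int, int]:
    """Map attribute ID -> leading integer of RAW_VALUE for each table row."""
    values: Dict[int, int] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # skip the header and anything that is not a table row
        match = re.match(r"\d+", fields[9])  # RAW_VALUE may carry extra text
        if match:
            values[int(fields[0])] = int(match.group())
    return values


def flagged_attributes(text: str) -> List[int]:
    """IDs of watched attributes whose raw count is above zero."""
    raw = parse_raw_values(text)
    return sorted(i for i in WATCHED_IDS if raw.get(i, 0) > 0)


print(flagged_attributes(SAMPLE))  # -> [197]
```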

A 2007 study published by Google suggested very little correlation between failure rates and either high temperatures or activity levels. Indeed, the Google study indicated that "one of our key findings has been the lack of a consistent pattern of higher failure rates for higher temperature drives or for those drives at higher utilization levels." Hard drives with S.M.A.R.T.-reported average temperatures below 27 °C (81 °F) had higher failure rates than hard drives with the highest reported average temperature of 50 °C (122 °F), with failure rates at least twice as high as in the optimum S.M.A.R.T.-reported temperature range of 36 °C (97 °F) to 47 °C (117 °F). The correlation between manufacturers, models, and failure rates was relatively strong. Statistics on this are kept secret by most entities; Google did not associate manufacturer names with failure rates, although it has been disclosed that Google uses Hitachi Deskstar drives in some of its servers.

Google's 2007 study found, based on a large field sample of drives, that actual annualized failure rates (AFR) for individual drives ranged from 1.7% for first-year drives to over 8.6% for three-year-old drives. A similar 2007 study at CMU on enterprise drives showed that the measured MTBF was three to four times lower than the manufacturers' specifications, with an estimated average AFR of 3% over one to five years based on replacement logs for a large sample of drives, and that hard drive failures were highly correlated in time.
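
Field studies like these estimate AFR from observed failures rather than from laboratory extrapolation. A minimal sketch, assuming failures are counted against the total drive-days of observed operation; the function name and the fleet numbers in the example are hypothetical.

```python
from typing import Dict, Tuple


def annualized_failure_rate(failures: int, drive_days: float) -> float:
    """AFR in percent: failures per drive-year of observed operation."""
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    return 100.0 * failures / (drive_days / 365.0)


# Hypothetical fleet data bucketed by drive age: age in years -> (failures, drive-days)
fleet: Dict[int, Tuple[int, float]] = {
    1: (12, 365_000.0),  # e.g. 1,000 drives observed for a full year
    3: (25, 182_500.0),  # e.g. 500 older drives observed for a full year
}

for age, (failures, days) in sorted(fleet.items()):
    print(f"age {age}: AFR = {annualized_failure_rate(failures, days):.1f}%")
```

Bucketing by age, as here, is what lets field data reveal a failure rate that changes over a drive's life, which a single MTBF figure cannot show.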

A 2007 study of latent sector errors (as opposed to the studies of complete disk failure above) showed that 3.45% of 1.5 million disks developed latent sector errors over 32 months (3.15% of nearline disks and 1.46% of enterprise-class disks developed at least one latent sector error within twelve months of their ship date), with the annual sector error rate increasing between the first and second years. Enterprise drives showed fewer sector errors than consumer drives. Background scrubbing was found to be effective in correcting these errors.
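
Background scrubbing works by reading data proactively, so that latent errors are found while a good copy still exists. A minimal sketch, assuming a hypothetical layout in which every block has a stored CRC-32 checksum and a mirrored copy (real drives and storage systems use their own ECC and redundancy schemes, and the function name is illustrative):

```python
import zlib
from typing import List


def scrub(primary: List[bytearray], mirror: List[bytes], checksums: List[int]) -> List[int]:
    """Read every primary block, verify its CRC-32, and repair mismatches.

    Returns the indices of repaired blocks. Raises OSError if both copies are bad.
    """
    repaired: List[int] = []
    for i, block in enumerate(primary):
        if zlib.crc32(bytes(block)) == checksums[i]:
            continue  # block reads back correctly
        if zlib.crc32(mirror[i]) != checksums[i]:
            raise OSError(f"block {i}: both copies fail verification")
        primary[i] = bytearray(mirror[i])  # rewrite the good data
        repaired.append(i)
    return repaired


blocks = [b"alpha", b"bravo", b"charlie"]
checksums = [zlib.crc32(b) for b in blocks]
primary = [bytearray(b) for b in blocks]
mirror = list(blocks)
primary[1][0] ^= 0xFF  # simulate a latent sector error in block 1
print(scrub(primary, mirror, checksums))  # -> [1]
```

Without the scrub, the corrupted block would only be noticed when it was next read, possibly after the mirror had also failed.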

SCSI, SAS, and FC drives are more expensive than consumer-grade SATA drives and are typically used in servers and disk arrays, whereas SATA drives were sold to the home computer, desktop computer, and near-line storage markets and were perceived as less reliable. This distinction is now becoming blurred.

The mean time between failures (MTBF) of SATA drives is usually specified at about 1.2 million hours (some drives, such as the Western Digital Raptor, are rated at 1.4 million hours MTBF), while SAS/FC drives are rated for upwards of 1.6 million hours. However, independent research indicates that MTBF is not a reliable estimate of a drive's longevity (service life). MTBF is measured in laboratory test chambers and is an important metric for determining disk drive quality, but it is designed to measure only the relatively constant failure rate over the service life of a drive (the middle of the "bathtub curve") before the final wear-out phase. A more interpretable, but equivalent, metric is the annualized failure rate (AFR), the percentage of drives expected to fail per year. Both AFR and MTBF tend to measure reliability only in the early part of a hard disk drive's life, thereby understating the real probability of failure of a used drive.
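
The equivalence between MTBF and AFR holds under the constant-failure-rate assumption of the flat part of the bathtub curve: the expected fraction of drives failing in a year of continuous operation (8,760 hours) is 1 - exp(-8760/MTBF). A minimal sketch of the conversion in both directions (function names are illustrative):

```python
import math

HOURS_PER_YEAR = 8760  # 365 days * 24 hours, powered on continuously


def afr_from_mtbf(mtbf_hours: float) -> float:
    """AFR (as a fraction) implied by an MTBF, assuming a constant failure rate."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def mtbf_from_afr(afr: float) -> float:
    """Inverse of afr_from_mtbf, for 0 < afr < 1."""
    return -HOURS_PER_YEAR / math.log(1.0 - afr)


for label, mtbf in [("SATA", 1.2e6), ("WD Raptor", 1.4e6), ("SAS/FC", 1.6e6)]:
    print(f"{label:10s} MTBF {mtbf:>9,.0f} h -> AFR {100 * afr_from_mtbf(mtbf):.2f}%")
```

Under this assumption the 1.2 million-hour SATA rating corresponds to an AFR of about 0.73%, and the 1.6 million-hour SAS/FC rating to about 0.55%. Going the other way, the 3% average AFR from the CMU field data corresponds to an MTBF of roughly 290,000 hours, about four times lower than the 1.2 million-hour rating.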

The cloud storage company Backblaze produces an annual report on hard drive reliability. However, the company states that it mainly uses commodity consumer drives, which are deployed in enterprise conditions rather than in their representative conditions and for their intended use. Consumer drives are also not tested to work with enterprise RAID cards of the type used in data centers, and may not respond within the time a RAID controller expects; such drives may then be identified as failed when they have not failed. The results of such tests may or may not be relevant to different users: they accurately represent the performance of consumer drives in enterprise use or under extreme stress, but may not accurately represent their performance in normal or intended use.

Examples of drive families with high failure rates

IBM 3380 DASD, 1984 ca.
Computer Memories Inc. HDD 20MB for PC/AT, 1985 ca.
Fujitsu MPG3 and MPF3 series, 2002 ca.
IBM Deskstar 75GXP, 2001 ca.
Seagate ST3000DM001, 2012 ca.

Mitigation

To avoid data loss due to disk failure, common solutions include:

Data backup
Data scrubbing
Data redundancy
Active hard-drive protection
S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology) included in the hard-drive
Isolation bases used on server racks in data centers
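
Data redundancy can take several forms. As one example (single parity, as used in RAID 5; the text does not prescribe a particular scheme), a parity block is the byte-wise XOR of the data blocks in a stripe, so any one lost block can be rebuilt by XOR-ing the surviving blocks with the parity. A minimal sketch with hypothetical block contents and function names:

```python
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        if len(block) != len(result):
            raise ValueError("all blocks in a stripe must be the same size")
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def rebuild_missing(surviving: List[bytes], parity: bytes) -> bytes:
    """Recover the single lost data block from the surviving blocks and the parity."""
    return xor_blocks(surviving + [parity])


stripe = [b"DISK0", b"DISK1", b"DISK2"]  # hypothetical data blocks on three drives
parity = xor_blocks(stripe)               # stored on a fourth drive
# Suppose the drive holding b"DISK1" fails:
print(rebuild_missing([stripe[0], stripe[2]], parity))  # -> b'DISK1'
```

Single parity survives exactly one failed drive per stripe; the correlated failures noted in the CMU study are one reason schemes tolerating more than one failure are also used.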

Data recovery

Data from a failed drive can sometimes be partially or completely recovered if the magnetic coating of the platters is not completely destroyed. Specialized companies carry out data recovery, at significant cost, by opening the drive in a clean room and using appropriate equipment to read data directly from the platters. If the electronics have failed, it is sometimes possible to replace the electronics board, although drives of nominally the exact same model produced at different times often have different, incompatible circuit boards. In addition, the electronics boards of modern drives typically contain drive-specific adaptation data required to access their system areas, so the related components need to be either reprogrammed (if possible) or unsoldered and transferred between the two electronics boards.

Sometimes operation can be restored for long enough to recover data, which may require reconstruction techniques such as file carving. Risky techniques can be justified if the drive is otherwise dead. A drive that is started up once may continue to run for a shorter or longer time but never start again, so as much data as possible should be recovered as soon as the drive starts. A 1990s drive that will not start because of stiction can sometimes be started by tapping it or by rotating the drive body quickly by hand.
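
File carving recovers files from a raw disk image by their content signatures rather than by file system metadata. A minimal, naive sketch for JPEG images, assuming each file is stored contiguously: it looks for the Start Of Image marker (FF D8 FF) and the next End Of Image marker (FF D9). Real carving tools must also handle fragmented files and embedded thumbnails, which this sketch does not; the function name is illustrative.

```python
from typing import List

JPEG_START = b"\xff\xd8\xff"  # Start Of Image marker plus the first byte of the next marker
JPEG_END = b"\xff\xd9"        # End Of Image marker


def carve_jpegs(image: bytes) -> List[bytes]:
    """Return byte ranges of `image` that look like contiguous JPEG files."""
    found: List[bytes] = []
    pos = 0
    while True:
        start = image.find(JPEG_START, pos)
        if start == -1:
            break  # no further start markers
        end = image.find(JPEG_END, start + len(JPEG_START))
        if end == -1:
            break  # truncated file at the end of the image
        found.append(image[start:end + len(JPEG_END)])
        pos = end + len(JPEG_END)
    return found


raw = (b"\x00" * 16
       + b"\xff\xd8\xff\xe0" + b"first image" + b"\xff\xd9"
       + b"unrelated bytes"
       + b"\xff\xd8\xff\xdb" + b"second image" + b"\xff\xd9")
print(len(carve_jpegs(raw)))  # -> 2
```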

External links

Backblaze: Hard Drive Annual Failure Rate, 2014, 2016
Failure Trends in a Large Disk Drive Population - Google, Inc. February 2007
Net View-Slate on Disk Scrubbing
Hard Drive Failure
Sounds made by damaged and failing hard disk drives
Anatomy of a hard disk drive: logical and physical failures

Source of the article: Wikipedia

Monday, 16 July 2018