Predictive HDD Failure

Discussion in 'Storage & Backup' started by GenoCyber, Jan 9, 2012.

  1. GenoCyber

    GenoCyber Member

    Joined:
    Dec 27, 2001
    Messages:
    170
    Location:
    Sydney
    How do Dell and HP analyse and determine the predictive failure of a hard drive?

    I wasn’t able to find much information on their websites explaining how they determine predictive HDD failure.
     
  2. simmoi

    simmoi Member

    Joined:
    Jul 19, 2001
    Messages:
    32
    via SMART information

    http://en.wikipedia.org/wiki/S.M.A.R.T.

    From Wikipedia:

    S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology; often written as SMART) is a monitoring system for computer hard disk drives to detect and report on various indicators of reliability, in the hope of anticipating failures.
     
  3. garetz

    garetz Member

    Joined:
    Aug 7, 2002
    Messages:
    913
    Location:
    2217
    Looks at MTBF

    Mean time between failures (MTBF) is the predicted elapsed time between inherent failures of a system during operation. MTBF can be calculated as the arithmetic mean (average) time between failures of a system. The MTBF is typically part of a model that assumes the failed system is immediately repaired (MTTR), as a part of a renewal process. This is in contrast to the mean time to failure (MTTF), which measures average time to failures with the modeling assumption that the failed system is not repaired (infinite repair time).

    (From Wikipedia.)

    BTW, SMART is in no way predictive; it is a simple monitoring system.
     
  4. rowan194

    rowan194 Member

    Joined:
    Jan 5, 2009
    Messages:
    2,030
    Semi-intelligent analysis of "dumb" SMART variables can help with predicting some failure modes... but yeah, SMART itself is ultra conservative and will still claim everything is okay when the drive is so full of bad sectors and read errors that it's virtually unusable.
     
  5. simmoi

    simmoi Member

    Joined:
    Jul 19, 2001
    Messages:
    32
    But manufacturers pull SMART information to predict hard drive failures.

    From an HP: "1720 - SMART Hard Drive detects imminent failure. Your hard disk drive is detecting an imminent failure. To ensure no data loss, backup contents and replace this hard disk. Attribute Failed: #10"

    Dell and Toshiba also do this...
     
  6. OP
    GenoCyber

    GenoCyber Member

    Joined:
    Dec 27, 2001
    Messages:
    170
    Location:
    Sydney
    MTBF cannot be the only factor used to determine predictive failure, otherwise all HDDs in a SAN or RAID would be flagged at the same time.
     
  7. simmoi

    simmoi Member

    Joined:
    Jul 19, 2001
    Messages:
    32
    thanks for posting that

    also to note

    MTBF is not accurate, as manufacturers inflate it. The MTBF on some enterprise 10K rpm hard drives is rated at around 200 years...
     
    Last edited: Jan 10, 2012
  8. Concept CBF

    Concept CBF Member

    Joined:
    Nov 16, 2008
    Messages:
    2,221
    Location:
    Behind you
    No shit lol have you read what every other post has said about SMART information...
     
  9. rowan194

    rowan194 Member

    Joined:
    Jan 5, 2009
    Messages:
    2,030
    An MTBF of 200 years doesn't mean the drive is expected to last 200 years.

    "MTBF is commonly confused with a component's useful life, even though the two concepts are not directly related. For example a battery may have a useful life of four hours, and an MTBF of 100,000 hours. These figures indicate that in a population of 1,000,000 batteries, there will be approximately ten battery failures every hour during a single battery's four-hour life span."

    http://en.wikipedia.org/wiki/MTBF

    HD manufacturer supplied MTBF is typically calculated through accelerated testing (imagine if they had to properly test each new model for 2 or 3 years before mass manufacturing it?) so it's not a number you would rely on anyway. :)
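
    A rough sketch of that arithmetic, assuming a constant failure rate (the usual simplification behind MTBF figures). The function names are made up for illustration, and the "200 years" input is just the figure quoted earlier in the thread, not a number from any datasheet.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def failures_per_hour(population: int, mtbf_hours: float) -> float:
    """Expected failures per hour across a whole population of units."""
    return population / mtbf_hours


def annualized_failure_rate(mtbf_hours: float) -> float:
    """Approximate fraction of units failing per year of continuous use.

    Only a reasonable approximation when the MTBF is much longer than a year.
    """
    return HOURS_PER_YEAR / mtbf_hours


# The battery example from the Wikipedia quote: 1,000,000 units, MTBF 100,000 h.
battery_rate = failures_per_hour(1_000_000, 100_000)

# A "200 year" MTBF read as a population statistic rather than a lifespan.
drive_mtbf_hours = 200 * HOURS_PER_YEAR
drive_afr = annualized_failure_rate(drive_mtbf_hours)

print(f"battery failures per hour: {battery_rate:.0f}")
print(f"drive failures per year:   {drive_afr:.2%}")
```

    So a 200 year MTBF works out to roughly 1 drive in 200 failing each year while the drives are within their service life. It says nothing about how long any single drive will last.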
     
  10. OP
    GenoCyber

    GenoCyber Member

    Joined:
    Dec 27, 2001
    Messages:
    170
    Location:
    Sydney
    So no one knows how predictive failure of a hard drive is determined?
     
  11. dr_deathy

    dr_deathy Member

    Joined:
    Apr 27, 2007
    Messages:
    2,592
    You just learn how to read SMART data yourself. It doesn't take long to work out when a drive is about to fail.

    My policy is that ANY reallocated or pending sectors means the drive gets dumped. Even in the middle of the price jump, drives are still too cheap to risk it, and remapping hardly ever lasts long term.

    Dell, HP etc. run their own software that is set to go off when certain values are read, based on what they have found to be warning signs. An increasing reallocated sector count, for example, is a dead giveaway.
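
    A minimal sketch of that kind of rule, assuming the raw SMART counters have already been read into a dict keyed by the attribute names smartctl prints. The helper names and the sample numbers are invented for illustration; this is not what Dell or HP actually ship.

```python
# Raw counters that the "dump it at the first sign" policy watches.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector")


def should_replace(raw: dict[str, int]) -> bool:
    """Zero-tolerance policy: any reallocated or pending sector at all."""
    return any(raw.get(name, 0) > 0 for name in WATCHED)


def growing_counters(previous: dict[str, int], current: dict[str, int]) -> list[str]:
    """Watched counters that increased between two snapshots."""
    return [
        name for name in WATCHED
        if current.get(name, 0) > previous.get(name, 0)
    ]


# Made-up snapshots of the same drive, a week apart.
last_week = {"Reallocated_Sector_Ct": 0, "Current_Pending_Sector": 0}
today = {"Reallocated_Sector_Ct": 8, "Current_Pending_Sector": 2}

print(should_replace(last_week))           # clean drive
print(should_replace(today))               # drive the policy would dump
print(growing_counters(last_week, today))  # counters that got worse
```

    Comparing snapshots over time is what turns plain monitoring into something closer to prediction: the trend matters more than any single reading.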
     
  12. cvidler

    cvidler Member

    Joined:
    Jun 29, 2001
    Messages:
    12,079
    Location:
    Canberra
    Not really.

    1. Because you can't really predict failure anyway. Some drives die slow painful deaths, others just won't turn on one day after a life of excellent service.

    2. The data returned by SMART can be interpreted in numerous ways. It only provides some metrics on things like run time, reallocations, read failures, write failures etc. It doesn't provide any guidance on how many of those things are bad or normal, or a warning of imminent death. It's up to the user (or some software developer) to determine the thresholds. What WD says may be different to what Seagate or Hitachi say in their drive testing tools, which will be different to what IBM/NetApp/HDS etc. say is appropriate for their huge arrays in SAN devices, which of course will be different again for home users.
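
    To illustrate point 2, here is a small sketch where the same SMART readings pass under one policy and get flagged under another. The Policy class, both policies and every number in them are hypothetical; real vendors do not publish their thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """One party's opinion of what counts as a warning sign."""
    name: str
    max_reallocated: int
    max_power_on_hours: int


# Hypothetical thresholds, purely for illustration.
HOME_USER = Policy("home user", max_reallocated=50, max_power_on_hours=50_000)
STORAGE_ARRAY = Policy("storage array", max_reallocated=0, max_power_on_hours=30_000)


def policy_warnings(raw: dict[str, int], policy: Policy) -> list[str]:
    """Return which checks this policy considers exceeded."""
    found = []
    if raw.get("Reallocated_Sector_Ct", 0) > policy.max_reallocated:
        found.append("reallocated sectors")
    if raw.get("Power_On_Hours", 0) > policy.max_power_on_hours:
        found.append("power-on hours")
    return found


# Made-up readings from a single drive.
drive = {"Reallocated_Sector_Ct": 12, "Power_On_Hours": 35_000}

for policy in (HOME_USER, STORAGE_ARRAY):
    print(policy.name, "->", policy_warnings(drive, policy) or "no warnings")
```

    Same drive, same data, two different verdicts. The "prediction" lives in the policy, not in SMART itself.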
     
  13. neo_nick

    neo_nick Member

    Joined:
    Jun 19, 2004
    Messages:
    12
    Location:
    SE Melb
    What software do most people use to monitor the SMART data?
     
  14. PabloEscobar

    PabloEscobar Member

    Joined:
    Jan 28, 2008
    Messages:
    13,269

    I have computers with this that are still running (powered up/down daily).

    The downside is that on the HPs there does not appear to be any way of disabling it, so the user always gets the "F1 to continue" message.
     
  15. rowan194

    rowan194 Member

    Joined:
    Jan 5, 2009
    Messages:
    2,030
    Just in case the OP is not aware, in this case the drive has declared itself failed (smartctl is just reporting what the drive says, not predicting the failure.) In many cases, the drive will die or become unusable before SMART is anywhere near being tripped.
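
    For anyone wondering what "the drive has declared itself failed" means in practice: each attribute has a normalized VALUE and a vendor-set THRESH, and a pre-fail attribute counts as failed once VALUE drops to THRESH or below. Below is a rough sketch that applies that check to text laid out like `smartctl -A` output. The sample table is made up for illustration, and treating a THRESH of 0 as "no threshold" is an assumption of this sketch.

```python
# Made-up sample in the column layout that `smartctl -A` prints.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
 10 Spin_Retry_Count        0x0013   090   090   097    Pre-fail  Always   FAILING_NOW 12
194 Temperature_Celsius     0x0022   035   045   000    Old_age   Always       -       35
"""


def failed_prefail_attributes(table: str) -> list[tuple[int, str]]:
    """Return (id, name) for pre-fail attributes whose VALUE <= THRESH."""
    failed = []
    for line in table.splitlines():
        fields = line.split()
        # Skip the header and anything else that does not start with an ID.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id, name = int(fields[0]), fields[1]
        value, thresh, attr_type = int(fields[3]), int(fields[5]), fields[6]
        # Assumption: a threshold of 0 means the vendor set no threshold.
        if attr_type == "Pre-fail" and thresh > 0 and value <= thresh:
            failed.append((attr_id, name))
    return failed


print(failed_prefail_attributes(SAMPLE))
```

    Note that the check only looks at the normalized value. The raw counters can climb a long way before the normalized value crosses the vendor's threshold, which is why a drive can be practically unusable while SMART still says it passed.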
     
  16. OP
    GenoCyber

    GenoCyber Member

    Joined:
    Dec 27, 2001
    Messages:
    170
    Location:
    Sydney
    The further explanation and the smartctl example are appreciated; they answered my question.
     
