EMC CX4-120 frequent HDD failure. Why?

Discussion in 'Business & Enterprise Computing' started by Multiplexer, Apr 5, 2013.

  1. Multiplexer

    Multiplexer Member

    Joined:
    Feb 26, 2002
    Messages:
    2,127
    Location:
    Home
    Does anyone know the possible cause of frequent HDD failures? I am not even sure whether this would be considered frequent or is simply the norm, but so far we have already had 2 HDDs replaced in 2013.

    The SAN is:
    * EMC CX4-120
    * 26 HDDs
    * 12 LUNs
    * 3 storage groups
    * 10 LUNs are used by the database server, so very intense reads and writes
    * Disks are Seagate STE45085 CLAR450
    * About 3 to 4 years old

    My guess is the database's intense read and write activity. Any input is appreciated.
     
  2. Iceman

    Iceman Member

    Joined:
    Jun 27, 2001
    Messages:
    6,647
    Location:
    Brisbane (nth), Australia
    The drives are 3-4 years old, and SANs will "fail" out a drive at the first sign of trouble - your data is worth way more than the drive.
     
  3. Gecko

    Gecko Member

    Joined:
    Jul 3, 2004
    Messages:
    2,715
    Location:
    Sydney
    I agree with this.

    Another thing to check would be the PSUs, but having 3-4 year old drives is most likely the main problem.
     
  4. Jai

    Jai Member

    Joined:
    Jun 27, 2001
    Messages:
    742
    Clearly the 1.3M-hour MTBF means nothing :) 148 years between failures... hello consumer, please enjoy roughly 2-3% of that!
     
  5. VR4hore

    VR4hore Member

    Joined:
    Sep 8, 2001
    Messages:
    261
    Location:
    Brisbane
    wellll, maybe not the 'first' sign... but fairly close to it.

    Seems about normal to me anyway.
     
  6. CordlezToaster

    CordlezToaster Member

    Joined:
    Nov 3, 2006
    Messages:
    4,080
    Location:
    Melbourne
    You clearly have not had the joy of using a Dell EqualLogic PS4000 with SATA hard drives.

    As stated, that would be expected after 3-4 years.
     
  7. Myst

    Myst Member

    Joined:
    Feb 26, 2004
    Messages:
    1,350
    Location:
    Hobart, Tasmania
    Hi fellow CX4-120 administrator!

    We've had about 3 drives fail in the 3 years we've had 2 x CX4-120s. 3 disk shelves, all fully populated. I think 2 x SATA and 1 Fibre Channel have carked it.

    Ours were under warranty though, I believe. Can't remember how many years they ship with?
     
  8. gwills

    gwills Member

    Joined:
    Jan 14, 2005
    Messages:
    410
    Location:
    Melbourne
    I'd say it's pretty normal.

    I look after a few NetApp systems and find that certain types of drives seem to fail more often than others:

    * 15K FC disks: I think every one of them has been replaced (14/14)
    * 15K 144GB SAS: none (0/80)
    * 7.2K 1TB SATA: probably about 15/48

    All in the space of about 4 years.
     
  9. OP
    Multiplexer

    Multiplexer Member

    Joined:
    Feb 26, 2002
    Messages:
    2,127
    Location:
    Home
    I suspect the HDDs are getting old as well, and being hammered by the database just speeds up the process and reduces their life span. But how do I prove to my manager that this is the case?

    Also, what is the standard practice for a situation like this? Replace all the HDDs? Hard to communicate that to management $$$.
     
  10. Iceman

    Iceman Member

    Joined:
    Jun 27, 2001
    Messages:
    6,647
    Location:
    Brisbane (nth), Australia
    You continue to use the SAN and the vendor continues to replace the disks as they fail under your maintenance contract?
     
  11. obi

    obi Member

    Joined:
    Oct 16, 2004
    Messages:
    127
    You forgot option two: cry into scotch because you were too stingy getting the maintenance contract renewed.
     
  12. OP
    Multiplexer

    Multiplexer Member

    Joined:
    Feb 26, 2002
    Messages:
    2,127
    Location:
    Home
    No option two. The SAN is under warranty and we do organise for Dell to come out and replace HDDs. I was hoping to mitigate HDD failures, because we don't want to go to the data center every 2 months.
     
  13. Renza

    Renza Member

    Joined:
    Dec 1, 2004
    Messages:
    4,929
    Location:
    Melbourne
    You obviously don't know how MTBF works :rolleyes:
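
    For anyone following along, here is a rough sketch of what a datasheet MTBF implies. It assumes a constant failure rate (the usual exponential model), the 1.3 million hour figure quoted earlier in the thread, and the 26 drives from the original post. The function names are made up for illustration.

```python
import math

HOURS_PER_YEAR = 8760  # 24 * 365, ignoring leap years


def annualized_failure_rate(mtbf_hours: float) -> float:
    """Chance that a single drive fails within one year.

    Assumes a constant failure rate (exponential model), which is
    what a datasheet MTBF describes. It does not model wear-out.
    """
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def expected_failures_per_year(mtbf_hours: float, drive_count: int) -> float:
    """Expected failures per year across a population of drives."""
    return drive_count * HOURS_PER_YEAR / mtbf_hours


afr = annualized_failure_rate(1.3e6)            # per-drive yearly failure chance
fleet = expected_failures_per_year(1.3e6, 26)   # 26 drives, as in the first post

print(f"AFR per drive: {afr:.2%}")
print(f"Expected failures per year for 26 drives: {fleet:.2f}")
```

    MTBF is a failure rate across a large population of drives within their rated service life, not a prediction that one drive lasts 148 years. Under these assumptions the datasheet figure works out to well under 1% per drive per year, and it says nothing about drives that are ageing or affected by a known defect.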
     
  14. aza2001

    aza2001 Member

    Joined:
    Sep 14, 2002
    Messages:
    2,016
    Location:
    Northmead
    it could be a bunch of things:

    vibrations caused by other devices or the storage itself
    dirty power
    power supplies on the way out
    age of devices


    either purchase extended warranty, extra drives or replace the array

    would also make sure your backups are intact :)
     
  15. mr.ilford

    mr.ilford Member

    Joined:
    Dec 26, 2007
    Messages:
    101
    Location:
    At work
    As previously stated by a number of people, this isn't too unusual. As long as you have maintenance and hot spares, no big deal. If you don't have hot spares, why not?

    A previous employer of mine had 3-4 disk failures a week. But we had a lot of disks. Got to know the EMC tech very well.
     
  16. GTiRolla

    GTiRolla Member

    Joined:
    Apr 17, 2007
    Messages:
    361
    Location:
    Canberra :(
    Only 2 failures is a blessing. I had to replace 8 disks over the Easter break alone. We usually get 1-2 disk failures a week across the EMC/NetApp gear we use.

    Did you see the EMC Advisory regarding certain 15K RPM drives? The last revision we got was after 27 Mar 2013.

    Have a look at EMC Advisory ID: emc254763

    Problem: As detailed in the earlier ETA released by EMC, EMC quality management teams have determined that certain 15K RPM, 73GB, 146GB, 300GB, 450GB and 600GB disk drives may experience increased replacement rates.
    Affected customers may experience service events related to core software upgrades and drive replacements.

    You'll need an EMC account to access the advisory (which I'd imagine you'd already have?)
     
  17. OP
    Multiplexer

    Multiplexer Member

    Joined:
    Feb 26, 2002
    Messages:
    2,127
    Location:
    Home
  18. GTiRolla

    GTiRolla Member

    Joined:
    Apr 17, 2007
    Messages:
    361
    Location:
    Canberra :(

    Wrong KB - Try this one :)
     
