
ZFS Gurus - advice needed.

Discussion in 'Storage & Backup' started by stewpot, May 4, 2015.

  1. stewpot

    Member; joined Sep 2, 2001; 1,308 messages; location: NW Tas

    Hi all,
    I've got an HP MicroServer running FreeNAS with five 2 TB consumer-grade drives in a RAIDZ2 pool.

    One of these drives became flaky: I was getting lots of checksum errors and ATA timeouts, leading to poor performance and long pauses. I cold-swapped the flaky drive for another unit, and this is where I am now:

    Immediately after the replacement, another disk started exhibiting exactly the same flakiness.
    Code:
    [root@freenas ~]# zpool status
      pool: bucket
     state: DEGRADED
    status: One or more devices could not be opened.  Sufficient replicas exist for
            the pool to continue functioning in a degraded state.
    action: Attach the missing device and online it using 'zpool online'.
       see: http://illumos.org/msg/ZFS-8000-2Q
      scan: scrub in progress since Sun May  3 00:00:05 2015
            4.83T scanned out of 5.92T at 24.5M/s, 12h56m to go
            436M repaired, 81.61% done
    config:

            NAME                                            STATE     READ WRITE CKSUM
            bucket                                          DEGRADED     0     0     0
              raidz2-0                                      DEGRADED     0     0     0
                gptid/4d86db8d-447f-11e3-9669-3cd92b0cfbd1  ONLINE       0     0     0
                gptid/4ed362e1-447f-11e3-9669-3cd92b0cfbd1  ONLINE       0     0    70  (repairing)
                8093132496243193286                         UNAVAIL      0     0     0  was /dev/gptid/503868c4-447f-11e3-9669-3cd92b0cfbd1
                gptid/512fb8d7-447f-11e3-9669-3cd92b0cfbd1  ONLINE       0     0     0
                gptid/52ae9659-447f-11e3-9669-3cd92b0cfbd1  ONLINE       0     0     0

    errors: No known data errors
    The first problem is that I can't utilise the new drive.
    Code:
    [root@freenas ~]# zpool replace bucket 8093132496243193286 ada2
    cannot replace 8093132496243193286 with ada2: no such pool or dataset
    This worked OK on my other, almost identical unit, so I presume it's barfing because the other disk is being repaired.

    I've got about 3.5 TB of data on this machine, and I'm trying to rsync it off ASAP, but with the ATA timeouts it's running like a one-legged dog, at about 4 MB/s, so it'll take about two weeks to finish.
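A quick sanity check on that estimate, assuming decimal units (1 TB = 10^12 bytes, 1 MB/s = 10^6 bytes/s) and a constant rate: 3.5 TB at 4 MB/s is about 875,000 seconds, or roughly 10 days, so two weeks is plausible once the timeouts slow things down further. The helper `copy_days` is a hypothetical name; it only does that integer arithmetic in POSIX shell.

```shell
#!/bin/sh
# copy_days BYTES BYTES_PER_SECOND  (hypothetical helper)
# Prints the transfer time in days to one decimal place, assuming a constant
# rate. Uses integer shell arithmetic, so the result is truncated, not rounded.
copy_days() {
    bytes=$1
    rate=$2
    # Work in tenths of a day to keep one decimal place without floating point.
    tenths=$(( bytes * 10 / rate / 86400 ))
    printf '%d.%d\n' $(( tenths / 10 )) $(( tenths % 10 ))
}
```

For example, `copy_days 3500000000000 4000000` prints `10.1`.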

    My current thinking is one of two alternatives:

    1. Don't touch it. Leave it to limp along for 2 weeks and hope no other drives fail.

    2. zpool remove the sick drive, and hope that this will bring the volume up to speed so I can whip everything off quickly.
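On option 2: `zpool remove` cannot take an individual disk out of a RAIDZ vdev (in 2015 it was limited to hot spares, cache and log devices). The closest operation is `zpool offline`. Note the risk, though: with one member already UNAVAIL, offlining a second disk leaves this RAIDZ2 pool with no redundancy at all, so any further read error during the copy could not be repaired. This sketch assumes the pool and device names from the `zpool status` output above; `offline_member`, `run` and the `DRY_RUN` switch are hypothetical helpers for illustration.

```shell
#!/bin/sh
# Sketch only: take a failing RAIDZ member offline (there is no "remove" for
# RAIDZ members). Set DRY_RUN=1 to print the commands instead of running them.

# run CMD ARGS...: execute, or just print when DRY_RUN is set (hypothetical).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# offline_member POOL DEVICE  (hypothetical helper)
# -t makes the offline temporary: the device reverts after a reboot.
offline_member() {
    pool=$1
    dev=$2
    run zpool offline -t "$pool" "$dev"
    run zpool status "$pool"
}
```

Load it into an sh or bash session (for example `. ./offline.sh`), then try `DRY_RUN=1 offline_member bucket gptid/4ed362e1-447f-11e3-9669-3cd92b0cfbd1` first to see exactly what would run.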

    Can I get thoughts on this?
     
  2. NSanity

    Member; joined Mar 11, 2002; 18,380 messages; location: Brisbane

    The problem is you have a scrub running when you should have a resilver. A scrub whilst you're in a degraded state is going to take forever, because the scrub is trying to check that you have no data corruption, and before it can do that, it has to reconstruct what the array should look like.

    You can trade scrub and resilver speed off against pool performance via ZFS tunables.

    Which ones, and how to set them on FreeNAS specifically, I couldn't tell you.

    Additionally, I'm not sure whether you *should* abort the scrub so that you can resilver and bring the array back to a fault-tolerant state, and THEN scrub.
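The order suggested here can be sketched as: stop the scrub with `zpool scrub -s`, then issue the replace so ZFS resilvers onto the new disk, and only scrub again once the pool is healthy. This is a hedged sketch, not a confirmed fix: the thread never establishes that the running scrub caused the `no such pool or dataset` error. Also, FreeNAS pool members are GPT partitions referenced by gptid (hence the `gptid/...` names in the status output), so replacing with a bare `ada2` from the shell skips that layout; the FreeNAS GUI's replace function is generally the recommended route. `replace_after_scrub`, `run` and `DRY_RUN` are hypothetical names.

```shell
#!/bin/sh
# Sketch: cancel the running scrub, then replace the UNAVAIL member so the
# pool resilvers. Set DRY_RUN=1 to print the commands instead of running them.

# run CMD ARGS...: execute, or just print when DRY_RUN is set (hypothetical).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# replace_after_scrub POOL OLD_DEVICE NEW_DEVICE  (hypothetical helper)
replace_after_scrub() {
    pool=$1
    old=$2
    new=$3
    # Stop the in-progress scrub.
    run zpool scrub -s "$pool"
    # Replacing the missing member starts a resilver onto the new disk.
    run zpool replace "$pool" "$old" "$new"
    # Check that a resilver, not a scrub, is now shown under "scan:".
    run zpool status "$pool"
}
```

With the names from this thread that would be `replace_after_scrub bucket 8093132496243193286 ada2`; a later `zpool scrub bucket` verifies the pool once it is back to a fault-tolerant state.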

    What I've done in the past with Nexenta is set the tunables "live" without changing the startup values, then reboot the storage host after the resilver.
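On FreeNAS, setting tunables "live" might look like this sketch: `sysctl` changes take effect immediately and are lost on reboot, and nothing is written to `/etc/sysctl.conf` or `/boot/loader.conf`, so the startup values stay untouched. Big assumption: the `vfs.zfs.*` names are from the FreeBSD ZFS of the FreeNAS 9.x era and were renamed or removed in later OpenZFS releases, and the values are illustrative, not tuned recommendations. Check what your system exposes with `sysctl -d vfs.zfs` and record the current values before changing anything. `show_resilver_tunables`, `boost_resilver`, `run` and `DRY_RUN` are hypothetical names.

```shell
#!/bin/sh
# Sketch: favour resilver/scrub I/O over normal pool I/O until the next reboot.
# ASSUMPTION: tunable names match FreeBSD ZFS of the FreeNAS 9.x era; verify
# with `sysctl -d vfs.zfs` first. Values below are illustrative only.

# run CMD ARGS...: execute, or just print when DRY_RUN is set (hypothetical).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Print the current values so they can be restored by hand if needed.
show_resilver_tunables() {
    run sysctl vfs.zfs.resilver_delay vfs.zfs.scrub_delay \
        vfs.zfs.top_maxinflight vfs.zfs.resilver_min_time_ms
}

# Runtime-only changes: a reboot restores the startup values.
boost_resilver() {
    run sysctl vfs.zfs.resilver_delay=0
    run sysctl vfs.zfs.scrub_delay=0
    run sysctl vfs.zfs.top_maxinflight=128
    run sysctl vfs.zfs.resilver_min_time_ms=5000
}
```

Running `show_resilver_tunables` before `boost_resilver` keeps a record of the defaults; as described above, a reboot after the resilver puts everything back.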
     
  3. stewpot (OP)

    Ah! I didn't spot that it was mid-scrub. Thanks for the help.

    Anyway the scrub completed, and I was able to get the data off. Win!
     
  4. frenchfries

    Member; joined Apr 5, 2013; 101 messages

    If the replacement exhibits the same behaviour, I would look at the SATA/SAS cables.
     
  5. gr8bob

    Member; joined May 12, 2009; 130 messages

    x2

    From experience, ATA/CRC errors usually indicate a problem with the cable or link interface.
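One way to test the cable theory is SMART attribute 199, `UDMA_CRC_Error_Count`, which counts errors on the link between the drive and the controller rather than on the platters; a non-zero count that keeps rising usually points at the cable, backplane or port. This sketch assumes smartmontools' `smartctl` is available (FreeNAS ships it), that the drive reports the attribute under that name (vendors vary), and that the disks are `ada0` to `ada4`, which is a guess based on the `ada2` device mentioned earlier in the thread. `crc_count` and `report_crc` are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: print the SMART UDMA_CRC_Error_Count raw value for each disk.

# crc_count: read `smartctl -A` output on stdin and print the raw value
# (last column) of the UDMA_CRC_Error_Count row, if present.
crc_count() {
    awk '$2 == "UDMA_CRC_Error_Count" { print $NF }'
}

# report_crc DEVICE...: one "device: count" line per disk.
report_crc() {
    for dev in "$@"; do
        printf '%s: %s\n' "$dev" "$(smartctl -A "$dev" | crc_count)"
    done
}
```

Run `report_crc /dev/ada0 /dev/ada1 /dev/ada2 /dev/ada3 /dev/ada4` once, then again after some heavy I/O; a count that grows on the same port after a drive swap suggests the problem moved with the cable, not the disk.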
     
  6. stewpot (OP)

    Thanks guys, I'll look into it. It struck me as a bit too much of a coincidence that the second drive flaked as soon as the first was replaced; I thought it might be an external cause.
     
