Saving a zfs pool when you try resilvering the wrong drive

Discussion in 'Storage & Backup' started by blackied, Aug 3, 2017.

  1. blackied

    blackied Member

    Joined:
    Sep 10, 2008
    Messages:
    64
    Ok, so here's my situation. I have an ESXi server with a VM running an old version of OmniOS and napp-it. The VM has a 1015 passed through, and on the 1015 are 8 drives in one pool:
    1 raidz1 of 4x4TB.
    1 raidz1 of 4x2TB.

    One of the 4TB drives started erroring out and dropping the pool, so I bought a drive to replace it. However, in a moment of brilliance, I swapped out the wrong drive and started the resilver against it. In theory the resilver has now completed, but the array is still degraded because it's reporting too many errors.

    What I'm about to try: unplug one of the drives from the 2TB array and plug the swapped-out drive into that slot, which will degrade the 2TB array. I'm hoping the first drive then gets redetected, so I can point the replacement at the actually faulty drive instead. I'll then pull the faulty disk and plug the 2TB drive back in.

    Is this crazy enough to work?
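    In rough terms, the rescue boils down to cancelling the wrong replace and starting the right one. A sketch, assuming a pool named "tank" and placeholder device names (c1t2d0 as the genuinely faulty disk, c1t5d0 as the new disk — none of these names come from the thread):

    ```shell
    # See which device is actually FAULTED and which one is resilvering:
    zpool status -v tank

    # A replace that was started against the wrong (healthy) disk can be
    # cancelled by detaching the new disk from that vdev:
    zpool detach tank c1t5d0

    # Then attach the replacement to the disk that is really failing:
    zpool replace tank c1t2d0 c1t5d0
    ```

    The detach works because a disk being replaced forms a temporary mirror with its replacement; detaching the newcomer returns the vdev to its previous state, with no cable-pulling needed.
    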
     
  2. OP
    OP
    blackied

    blackied Member

    Joined:
    Sep 10, 2008
    Messages:
    64
    Ok, I've plugged the drive back in, and it looks like the resilver has started again on its own from the beginning. I'm hoping there is enough parity across the 5 drives to successfully complete the replace, and I can then use that drive to replace the actually faulty drive. The 2TB array is so far behaving despite losing a drive.
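    The restarted resilver can be watched from the "scan:" line of the pool status; a one-liner sketch, with "tank" again a placeholder pool name:

    ```shell
    # The "scan:" line reports resilver progress, speed, and an ETA:
    zpool status tank
    ```
    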
     
  3. gea

    gea Member

    Joined:
    May 22, 2011
    Messages:
    210
    Unless a disk fails completely in a degraded Z1 vdev, ZFS is remarkably forgiving. On a conventional RAID-5, any error in a degraded array will stop the rebuild, because RAID-5 only knows about raid stripes, not the data on them.

    A ZFS Z1 is software raid that knows all the data: thanks to checksums it knows when a data block is corrupted, and it knows which file that block belongs to.

    So additional errors on a degraded Z1, e.g. due to bad blocks, will not lead to a lost pool, only to a damaged file.

    If a second disk is removed from a degraded Z1 vdev, the pool goes to an offline state. If the disk comes back, you are again in a degraded state with the data available.
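    Following this explanation: checksum errors on a degraded Z1 surface as damaged files, and ZFS names them explicitly. A sketch with a placeholder pool name:

    ```shell
    # -v lists any files affected by permanent (unrecoverable) errors,
    # under "errors: Permanent errors have been detected in the following files:"
    zpool status -v tank

    # After restoring or deleting those files and repairing the hardware,
    # the error counters can be reset:
    zpool clear tank
    ```
    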

    What you should prepare is some sort of disk map (bay number with device id and serial number), so you can reliably identify a failed disk.
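    On illumos/OmniOS, `iostat -En` prints a stanza per disk that includes the serial number, which can be turned into such a map. A sketch: the sample stanza below is invented for illustration (device names, serials, and the exact field layout are assumptions), and on a live box you would pipe `iostat -En` straight into the awk filter instead of the heredoc.

    ```shell
    #!/bin/sh
    # Invented sample of `iostat -En` output; on a real system use:
    #   iostat -En | awk '...'
    sample=$(cat <<'EOF'
    c1t0d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
    Vendor: ATA      Product: WDC WD40EFRX     Revision: 80.0 Serial No: WD-WCC4E1234567
    c1t1d0           Soft Errors: 0 Hard Errors: 12 Transport Errors: 3
    Vendor: ATA      Product: WDC WD40EFRX     Revision: 80.0 Serial No: WD-WCC4E7654321
    EOF
    )

    # Remember the device name from each "c..." line, then print it next to
    # the serial found on the following "Serial No:" line:
    printf '%s\n' "$sample" |
      awk '/^c[0-9]/ {dev=$1} /Serial No:/ {sub(/.*Serial No: */,""); print dev, $0}'
    # → c1t0d0 WD-WCC4E1234567
    # → c1t1d0 WD-WCC4E7654321
    ```

    Pairing that output with the physical bay each controller port is cabled to gives the bay → device → serial map; the serial printed on the drive label is then enough to pull the right disk.
    
    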
     
