Overclockers Australia Forums
Old 25th June 2012, 10:00 PM   #1
W0MB13 Thread Starter
Member
 
W0MB13's Avatar
 
Join Date: Apr 2006
Location: Sydney
Posts: 1,169
Help! FreeNAS 8 ZFS status DEGRADED!

My original build thread here
http://forums.overclockers.com.au/sh....php?t=957083&

6x2TB Hitachi drives in RAID-Z2


I fired up my ZFS box to back up some files the other night, and I could not get it to go.

I did some troubleshooting, and the cause seems to be a possibly dodgy video card. For some reason, with this particular motherboard, if no video card is present the PC will not boot past a certain point. I took the card out and put it back in. Voila! Working again. I was about 8 BIOS versions behind, so I decided to do an upgrade. This seemed to fix the problems I had with USB auto-booting, so all good as far as I'm concerned.

Now...

I'm not sure if this is just PURE COINCIDENCE... but once I booted into FreeNAS (embedded), I was greeted with a nasty ZFS pool = DEGRADED..

It looks like one of the drives is "unavailable". I went through the serial numbers that FreeNAS could see and crossed them off one by one until I was left with one missing drive/serial number. I've physically marked the drive with a paint marker so I know which one it is.
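The elimination described above is really just a set difference. A minimal sketch, assuming you kept a list of all six serials from the build and can read off the list FreeNAS currently reports; the serial strings below are made-up placeholders, not real values:

```python
def missing_serials(expected, visible):
    """Return serials that were installed but are no longer reported, sorted."""
    return sorted(set(expected) - set(visible))


# Placeholder serials for illustration only (hypothetical values).
expected = ["SER-A", "SER-B", "SER-C", "SER-D", "SER-E", "SER-F"]
visible = ["SER-A", "SER-B", "SER-C", "SER-E", "SER-F"]

print(missing_serials(expected, visible))  # ['SER-D']
```

Whatever comes back is the drive to go and mark with the paint marker.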

It seems a bit odd to me that this has just suddenly happened. I thought for a moment it might be a problem with one of the onboard SATA controllers, but another drive on the same controller is working fine (6 ports in total: 4 on one controller, 2 on another). I tried bringing the unavailable drive online using "zpool online mypoolsname weirdlongnumbershowingup", but that doesn't seem to work.

Anyone have any ideas? Any further steps or checks I can do? Does it sound like the drive is simply dead and needs to be swapped out, or is something perhaps bugging out?

Your opinions and help greatly appreciated


__________________
Last.fm

Last edited by W0MB13; 25th June 2012 at 10:02 PM.
Old 26th June 2012, 6:25 AM   #2
HobartTas
Member
 
Join Date: Jun 2006
Posts: 192
Default

Greetings

Quote:
Originally Posted by W0MB13 View Post
I'm not sure if this is just PURE COINCIDENCE... but once I booted into FreeNAS (embedded), I was greeted with a nasty ZFS pool = DEGRADED..
I doubt it happened at the same time. I'm guessing it happened a while ago and you only just got around to noticing. Solaris 11 didn't notify me that my pool was degraded either, and I only noticed when I went to do some maintenance (scrubs, snapshots, etc.).

In my case I concluded that I either had a dodgy cable or it wasn't seated properly, and after a sufficient number of I/O errors ZFS offlined the drive.

There are just two possibilities:

(1) The hard drive is actually dead or useless.

Does it no longer appear in the BIOS?

What happens when you swap SATA ports?

What happens when you swap SATA cables?

What happens if you attach the drive to a different PC?

What happens if you put it into an external USB case? Does the PC recognise it's there? Obviously it won't recognise the partition, but Disk Management in Administrative Tools should still pick up that the drive exists.

Can you download a bootable diagnostic diskette/CD-ROM from the manufacturer's website, or something similar, to do a NON-DESTRUCTIVE test of the hard drive? For example, I use ESTOOL for my Samsung drives. Does it pass or fail?

(2) If it's physically OK hardware-wise, then it's probably the scenario I outlined above.

In my case I would run zpool clear to clear the error. Does your version allow you to do the same?

http://docs.oracle.com/cd/E19253-01/...zge/index.html

After I did this, the drive would throw errors again with some more usage and get offlined again. I can't remember exactly whether the state was "offline" or "unavailable"; all I know is that I would clear the error and it would recur within a short period of time.

In your case, if you get it back online, put it through its paces with something like a scrub. Fortunately ZFS tracks when every block was written, so resilvering is very quick: all it needs to do is write the changes made since the drive went offline, which I presume was not all that long ago.
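For what it's worth, the clear-then-scrub sequence can be sketched roughly like this. It only prints the commands unless you switch off the dry run; the pool name "tank" and device name "ada3" are hypothetical placeholders, so substitute whatever zpool status shows on your box:

```python
import subprocess


def recovery_commands(pool, device=None):
    """Build the zpool commands for: clear errors, start a scrub, check status."""
    clear = ["zpool", "clear", pool]
    if device is not None:
        clear.append(device)  # limit the clear to a single device
    return [clear, ["zpool", "scrub", pool], ["zpool", "status", pool]]


def run_all(commands, dry_run=True):
    """Print each command; only execute it when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# "tank" and "ada3" are placeholders, not names from this thread.
run_all(recovery_commands("tank", "ada3"))
```

The scrub runs in the background, so the status command is there to check progress and to see whether the drive picks up new errors.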

If it does go offline again then

(a) try a different cable and make sure you've inserted it properly, if this doesn't work then

(b) export the pool and change the ports around and then re-import the pool, it will re-detect the drives in the new positions.

(c) does the same drive go offline again with a different port and a different cable? If it does, then it's more likely that it's physically faulty. If it's now a different drive, then it's most likely the port (or another dodgy cable).
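Steps (a) to (c) boil down to a small decision rule. A rough sketch of that reasoning, where the function name and drive labels are made up for illustration:

```python
def diagnose(original_drive, offline_after_swap):
    """Interpret what happens after moving the suspect drive to a new port and cable.

    original_drive: label of the drive that went offline first
    offline_after_swap: label of the drive that goes offline afterwards, or None
    """
    if offline_after_swap is None:
        return "no recurrence: most likely a badly seated cable"
    if offline_after_swap == original_drive:
        return "same drive again: the drive is probably faulty"
    return "different drive: suspect the original port or another dodgy cable"


# Drive labels are hypothetical.
print(diagnose("disk3", None))
print(diagnose("disk3", "disk3"))
print(diagnose("disk3", "disk5"))
```

It is only a rule of thumb; two faults at once (bad cable and bad drive) would muddy the result.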

In my case the problem did not recur, so I concluded I hadn't put the cable on the connector properly. I also had an additional problem at the time: for whatever reason ZFS took a second drive offline as well (maybe it could have been dropping bits on the transfer and was spoofing the second drive! Who knows for sure what the hell was going on?)

It's not a fun feeling to initially think you have two dead hard drives in your RAID-Z2 array and that you're only one more drive away from complete disaster.

Anyway, removing the first problem drive from the system meant that the second one resumed working normally, so once I fixed the problem with the first drive as described above, everything was working again.

Since the quality of drives is now a bit more suspect, given that warranties can be as short as a year and there's not a lot to pick from, I'm thinking my next array will most likely be a RAID-Z3 with cheaper (green/5400 RPM) drives rather than a RAID-Z2 with the more expensive enterprise drives. If I was going to get a 24-bay case and completely fill it with drives, I'd rather do one big RAID-Z4 or RAID-Z5 array, if such a thing existed.

Cheers
Old 26th June 2012, 9:37 AM   #3
W0MB13 Thread Starter
Member
 
W0MB13's Avatar
 
Join Date: Apr 2006
Location: Sydney
Posts: 1,169
Default

None of the SATA cabling has been touched since the initial install, so it seems a bit odd if it is a dud cable. I've never really experienced such a thing before, though.

I'll try a different cable and also the ZFS error clearing tonight, and see how I go. It seems unlikely to me that the drive is actually dead, as the pool was totally fine until last night, when this one drive became unavailable. Note that I don't run this box often; it goes online for maybe an hour once every two months. Could just be wishful thinking though.
__________________
Last.fm
Old 26th June 2012, 2:28 PM   #4
davros123
Member
 
Join Date: Jun 2008
Posts: 2,245
Default

not sure why it's happened...

However, you might also try a zpool export and a zpool import to see if that will bring it back online... perhaps with a reboot in between export and import.
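Roughly, that is just two commands in order. A minimal sketch that builds and prints them, with "tank" as a hypothetical pool name; the optional reboot would be done by hand between the two:

```python
def reimport_commands(pool):
    """Return the export and import commands for a pool, in the order to run them."""
    return [["zpool", "export", pool], ["zpool", "import", pool]]


# "tank" is a placeholder pool name.
for cmd in reimport_commands("tank"):
    print(" ".join(cmd))
```

Nothing is executed here; it only shows the order, so you can type the commands yourself and reboot in between if you want to.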

I get some weird things on my ESXi server and my LSI card... must look into it some time... but I assume it's a dud port/cable/drive...
Old 26th June 2012, 2:46 PM   #5
BBITS
Member
 
BBITS's Avatar
 
Join Date: Oct 2007
Location: Brisbane Southside
Posts: 668
Default

You say you changed a video card; perhaps you nudged a cable?
Old 26th June 2012, 7:47 PM   #6
W0MB13 Thread Starter
Member
 
W0MB13's Avatar
 
Join Date: Apr 2006
Location: Sydney
Posts: 1,169
Default

Thanks for all the suggestions guys.

Oddly enough (to me at least) the drive is totally dead. It isn't seen in the BIOS by the server. I put it into my main PC and the onboard SATA controller sits there for a very brief moment, then does not detect the drive.

RMA time I guess

Having only dealt with Samsung HDD RMA in the past (who were EXCELLENT/SO FAST), this could be interesting.

My question now is... I've taken this drive out. When I put a new drive in the server in its place, will FreeNAS/ZFS by design start rebuilding automatically? Or does the replace command somehow still apply? I assumed it only would in hot-swap situations.
__________________
Last.fm
Old 27th June 2012, 9:26 AM   #7
sreg0r
Member
 
sreg0r's Avatar
 
Join Date: Jul 2001
Location: Melbourne
Posts: 1,100
Default

Not sure if the FreeNAS GUI provides any options, but you just need to run a 'zpool replace'; the drive will resilver and you'll be good to go.

http://docs.oracle.com/cd/E19253-01/...cet/index.html
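As far as I know, the resilver does not kick off by itself unless the pool's autoreplace property is set, and that is off by default, so the replace command still applies after a cold swap. A rough sketch of the two forms of the command, printed rather than executed; "tank", "ada3" and the numeric ID are hypothetical placeholders, and the assumption is that the long number zpool status shows for the missing drive is what you pass as the old device:

```python
def replace_command(pool, old_device, new_device=None):
    """Build a 'zpool replace' command.

    With new_device omitted, ZFS is asked to replace old_device with the
    disk that now sits in the same location.
    """
    cmd = ["zpool", "replace", pool, old_device]
    if new_device is not None:
        cmd.append(new_device)
    return cmd


# All names below are placeholders, not values from this thread.
print(" ".join(replace_command("tank", "ada3")))
print(" ".join(replace_command("tank", "1234567890", "ada3")))
```

Once the replace is accepted, zpool status should show the resilver in progress.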
All times are GMT +10. The time now is 3:56 PM.

