Large Storage Build

Discussion in 'Storage & Backup' started by prod, May 9, 2015.

  1. prod

    prod Member

    Joined:
    Jul 7, 2002
    Messages:
    23
    Looking at purchasing some hardware for my FreeNAS build. At this stage I know mostly what hardware I want, except for the motherboard, HBA and power supply.

    Case: Norco RPC-4224 ($579)
    Case Upgrade: Norco 120mm Fan Wall Bracket & Norco OS Drive Bracket ($24)
    SAS to SAS Cable x 6 ($48)
    16GB SSD Drive $49
    Intel Quad Core Xeon CPU E3-1226v3 ($329)

    Will need to Purchase:
    Some sort of HBA to handle 24 drives
    Motherboard
    ECC RAM that will work with the motherboard
    Some sort of power supply
     
  2. frenchfries

    frenchfries Member

    Joined:
    Apr 5, 2013
    Messages:
    101
    You could go for an expander if you don't have enough slots for cards or aren't keen on spending big dollars on a high port count controller.

    If you go for a high port count controller, be advised that the newer ones use a new, smaller SAS connector.
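
    To put rough numbers on that (assuming the six SFF-8087 backplane connectors implied by the six SAS-to-SAS cables in the parts list, and a typical 8-port HBA - both assumptions, not checked specs):

    Code:
    # Rough port math for wiring 24 bays. Lane/connector counts are assumptions
    # based on common SFF-8087 hardware, not a checked spec for this exact case.
    BAYS = 24
    LANES_PER_CONNECTOR = 4            # one SFF-8087 connector carries 4 lanes
    CONNECTORS_PER_HBA = 2             # a typical 8-port HBA has 2 x SFF-8087

    connectors_needed = BAYS // LANES_PER_CONNECTOR                 # 6 backplane links
    hbas_needed = -(-connectors_needed // CONNECTORS_PER_HBA)       # ceiling -> 3

    print(f"{connectors_needed} backplane connectors to feed")
    print(f"Option A: {hbas_needed} x 8-port HBAs (needs {hbas_needed} PCIe slots)")
    print("Option B: 1 x 8-port HBA plus a SAS expander fanning out to all 6")
    print("Option C: one high port count controller (pricier, newer connectors)")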
     
  3. frenchfries

    frenchfries Member

    Joined:
    Apr 5, 2013
    Messages:
    101
    What disks, btw? Don't go for the SMR drives. Until ZFS can handle a resilver well with them, I would steer clear.
     
  4. trash

    trash Member

    Joined:
    Feb 6, 2005
    Messages:
    306
    Location:
    Darling Downs QLD
    PSU

    I did a rough power calculation on what you have mentioned - SSD + 24 HDDs + CPU + 3x 120mm fans + 2 RAID cards - and it comes out near 550 watts.

    I would buy something around the 850W mark - 80 Plus Gold if you can - just to give that little bit of extra headroom.
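
    For what it's worth, a back-of-the-envelope version of that calculation with assumed per-component wattages (typical figures, not measurements for this exact hardware) lands in the same ballpark:

    Code:
    # Back-of-the-envelope power estimate for the build above. Every per-component
    # wattage here is an assumption; real draw varies by drive model and load.
    components = {
        "HDD (active/seek)": (24, 15.0),   # 3.5" drives can pull ~15W when seeking
        "SSD":               (1, 3.0),
        "CPU (E3-1226v3)":   (1, 84.0),    # 84W TDP
        "Motherboard + RAM": (1, 40.0),
        "HBA/RAID card":     (2, 15.0),
        "120mm fan":         (3, 4.0),
    }

    total = sum(qty * watts for qty, watts in components.values())
    print(f"Estimated load: {total:.0f} W")                    # ~530 W with these figures
    print(f"PSU sized for ~65% load: {total / 0.65:.0f} W")    # ~815 W -> an ~850 W unit fits

    Spin-up surge is higher again (spinning rust can briefly pull 25W+ each on the 12V rail) unless the controller staggers spin-up, which is another reason for the headroom.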
     
  5. NSanity

    NSanity Member

    Joined:
    Mar 11, 2002
    Messages:
    17,593
    Location:
    Canberra
    ZFS can and does resilver SMR drives just fine - the way Seagate has built them, there is no "special sauce" required - they don't take TRIM commands or anything like that.

    ZFS can't make a drive with a raw write speed of 30-40MB/sec resilver any faster than it's going to take.

    Remember the general rule of thumb for ZFS/FreeNAS: 8GB RAM for the OS, then 1GB of RAM for every TB of storage (more if you plan to use dedupe) - you may find this makes an E3 Xeon platform unsuitable.
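
    As a worked example of that rule of thumb (the drive size is an assumption, since no disks have been chosen yet):

    Code:
    # Worked example of the 8GB-base + 1GB-per-TB rule of thumb quoted above.
    # Drive size is an assumption - the OP hasn't picked disks yet.
    bays = 24
    drive_tb = 4                       # assumed drive size
    raw_tb = bays * drive_tb           # 96 TB raw

    recommended_gb = 8 + raw_tb        # 8 GB base + 1 GB per TB = 104 GB
    print(f"{raw_tb} TB raw -> ~{recommended_gb} GB RAM by the rule of thumb")

    # Socket-1150 E3 Xeons (like the E3-1226v3) max out at 32 GB of unbuffered
    # ECC, which is why a fully populated 4224 can outgrow that platform.
    print("Exceeds the E3's 32 GB ceiling:", recommended_gb > 32)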
     
    Last edited: May 11, 2015
  6. frenchfries

    frenchfries Member

    Joined:
    Apr 5, 2013
    Messages:
    101
    Actually it could, but it would require some special sauce on ZFS/MD's part.
    i.e. the issue is that the drive uses the "cache" to store writes until it has enough to do a whole "band". This kills performance because it has to buffer on the PMR portion, and then, once enough has been collected (and provided there are no incoming writes), it has to rewrite the SMR band - which actually means another read before the write, leading to excessive seeking.

    They really need 20GB of SLC on these disks.

    However, if a whole "band" were committed at once, the drive would skip the cache and write directly to the SMR portion rather than the PMR cache.

    Ergo, if ZFS/MD collected writes and batched them in chunks that match the bands, the drive could write directly and would not have to double-write every time, which is what kills throughput.

    It's kind of like write coalescing that NVRAM on a controller does.

    That idea is patent pending BTW, unless of course there is prior art for it :p
    Realistically this will come to MD on Linux before anywhere else, as they have already started investigating, and MD's design lends itself to this sort of write coalescing - particularly the changes committed for kernel 4.1.

    Extra info:

    There is a cool paper investigating the Seagate disks using a high-speed camera and a window cut into the drive: https://www.usenix.org/system/files/conference/fast15/fast15-paper-aghayev.pdf
    Also, this is why host-aware is a far better idea than drive-managed SMR, as you then know the geometry without resorting to reverse engineering or heavy testing to ascertain these aspects of the physical disk.
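
    A toy sketch of the band-batching idea described above - accumulating incoming writes into band-aligned chunks before issuing them, so a drive-managed SMR disk never has to stage them in its PMR cache. This is not ZFS or MD code, and the band size is an assumption (the paper linked above reverse-engineered bands in the tens-of-megabytes range on the Seagate Archive drives).

    Code:
    # Toy illustration of band-sized write coalescing for a drive-managed SMR disk.
    # BAND_SIZE is an assumption; this is not ZFS/MD code, just the batching idea.
    BAND_SIZE = 32 * 1024 * 1024       # assumed 32 MiB SMR band

    class BandCoalescer:
        def __init__(self, submit):
            self.submit = submit       # callback that issues one full-band write
            self.buffer = bytearray()

        def write(self, data: bytes):
            # Accumulate sequential writes until a whole band is ready.
            self.buffer.extend(data)
            while len(self.buffer) >= BAND_SIZE:
                # A full, band-aligned write can bypass the drive's PMR cache and
                # go straight to the shingled region - no read-modify-write cycle.
                self.submit(bytes(self.buffer[:BAND_SIZE]))
                del self.buffer[:BAND_SIZE]

        def flush(self):
            # Anything smaller than a band still pays the staging penalty.
            if self.buffer:
                self.submit(bytes(self.buffer))
                self.buffer.clear()

    # Example: 100 x 1 MiB records end up as 3 full-band writes plus one short tail.
    issued = []
    c = BandCoalescer(submit=lambda band: issued.append(len(band)))
    for _ in range(100):
        c.write(b"\0" * (1024 * 1024))
    c.flush()
    print(len(issued), "writes issued,", sum(issued) // (1024 * 1024), "MiB total")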
     
    Last edited: May 12, 2015
  7. NSanity

    NSanity Member

    Joined:
    Mar 11, 2002
    Messages:
    17,593
    Location:
    Canberra
    I truly doubt OpenZFS is going to write an entire module just to handle an idiosyncrasy of a 2nd-gen SMR drive from a single vendor.

    Even Seagate themselves aren't recommending these drives for RAID use.

    There is a thread on the FreeNAS forums about these drives and pretty much nothing is planned (here - https://forums.freenas.org/index.php?threads/seagate-8tb-archive-drive-in-freenas.27740/)

    *edit*

    Nexenta on SMR & ZFS - http://storageconference.us/2014/Presentations/Novak.pdf

    Namely, until the industry calms the fuck down and settles on a standard - not much will be done. "Enterprise" SMR drives aren't here yet - and so the desire to do work on what is likely to be an intermediate standard is probably very low. But yes, ZFS (and BTRFS, probably NetApp's WAFL too) are in the best position to actually deal with their new-fangled magic.
     
    Last edited: May 12, 2015
  8. frenchfries

    frenchfries Member

    Joined:
    Apr 5, 2013
    Messages:
    101
    The three types of SMR disk are known today and work has already begun on how best to deal with each type.

    I was referring solely to resilver/rebuild efforts within multi-disk arrays and attempting to explain that, without a little magic, the current disks perform badly.

    I was not referring to nominal use, which by all accounts seems ok.

    I actually think that MD will gain something like what I described quite quickly, with BTRFS happening somewhat thereafter once they rebase their RAID code properly on MD.

    ZFS will come much later IMO as it appears that everyone is afraid to touch the striping code.

    Hey, maybe everyone will transfer to NILFS2 or F2FS, which would probably be easier to adapt, due to them being log based. ;)

    Anyway, I'm not looking for an argument; I was merely stating that unless you want week-plus rebuilds on full disks, avoid the SMR drives.

    EDIT

    I just looked at your linked article. Good luck with ZFS: it claims BP-rewrite is needed for all the good bits, and you're more likely to catch a unicorn-pegasus cross than see that land.
     
    Last edited: May 12, 2015
  9. NSanity

    NSanity Member

    Joined:
    Mar 11, 2002
    Messages:
    17,593
    Location:
    Canberra
    Word is Seagate has been super present at recent ZFS events. If SMR is to take off in the enterprise (and for physical platter drives to survive, it has to at the rate Flash is dropping in price), then something will have to give.

    There are apparently two standards in the works - and if a standard comes out (along with enterprise drives), ZFS will be there to support it - guaranteed.
     
  10. Dropbear

    Dropbear Member

    Joined:
    Jun 27, 2001
    Messages:
    9,720
    Location:
    Brisbane
    Only if you plan on running dedup...

    ZFS does not have high RAM requirements at all without dedup.
     
  11. NSanity

    NSanity Member

    Joined:
    Mar 11, 2002
    Messages:
    17,593
    Location:
    Canberra
    Bullshit.

    https://forums.freenas.org/index.ph...ning-vdev-zpool-zil-and-l2arc-for-noobs.7775/

     
    Last edited: May 21, 2015
  12. Butcher9_9

    Butcher9_9 Member

    Joined:
    Aug 5, 2006
    Messages:
    2,085
    Location:
    Perth , East Vic Park
  13. Dropbear

    Dropbear Member

    Joined:
    Jun 27, 2001
    Messages:
    9,720
    Location:
    Brisbane
    "For most home users just sharing some files and perhaps some plugins/jails, 16GB of RAM is an excellent place to start"


    Jesus H... I cannot even begin to comment on how ridiculous that is...

    I am running ZFS on an HP MicroServer with 4GB of RAM, using that shitty-arsed processor, and it fills a 1Gbps Ethernet link just fine.

    I think people misunderstand what "home server" means.
     
  14. NSanity

    NSanity Member

    Joined:
    Mar 11, 2002
    Messages:
    17,593
    Location:
    Canberra
    Dedupe does indeed need far more RAM.

    What you probably don't understand is what ZFS does with RAM. Ultimately it uses most of it as a massive read cache (the ARC). If your array is, say, only good for 300-400MB/sec, you can still burst 10-20GB at 800-900MB/sec out of that RAM under certain circumstances.

    Dedupe goes beyond this. To dedupe, you need a database - the dedupe table - tracking every block it has seen. If that table isn't in RAM, performance sucks. If you lose your dedupe table, your data is toast.

    Dedupe on ZFS is there and works - but it has significant performance costs. People generally have no idea what they are doing with ZFS, L2ARC, ZILs and the rest of it (typically carrying over assumptions - often incorrect - from software/hardware RAID), and compression is basically free and often just as effective as dedupe, if not more so, depending on your datasets. So it's probably a good thing that people don't try to run dedupe on ZFS.

    And if you have 100TB, that's an even bigger reason why you should be on ZFS.

    Here are some videos on the topic:





    To summarise: ZFS doesn't trust a drive just because it's "healthy".

    It stores a checksum for every block of your data, and it will periodically scrub your drives - verifying that the data stored on the disks is right, and rebuilding it from redundancy if it isn't. Outside of BTRFS (which currently only does mirrors anyway), nothing else really does this. If a drive silently corrupts your data without tripping a fault on your RAID controller, you have no real way of knowing.
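
    A minimal sketch of that verify-and-repair idea over a two-way mirror, assuming SHA-256 checksums (real ZFS keeps fletcher4/SHA-256 checksums in block pointers rather than a standalone table like this):

    Code:
    # Minimal sketch of checksum-driven scrub/repair on a two-way mirror.
    # Not ZFS code - just the "verify every copy, rebuild the bad one" idea.
    import hashlib

    def checksum(block: bytes) -> str:
        return hashlib.sha256(block).hexdigest()

    # Two "disks" holding the same blocks, plus checksums recorded at write time.
    disk_a = {0: b"important data", 1: b"more data"}
    disk_b = {0: b"important data", 1: b"more data"}
    recorded = {blk: checksum(data) for blk, data in disk_a.items()}

    # Silent corruption on one copy: no I/O error, the drive still looks healthy.
    disk_b[1] = b"more dat\xff"

    def scrub(mirror_sides, recorded):
        for blk, good_sum in recorded.items():
            copies = [side[blk] for side in mirror_sides]
            good = next((c for c in copies if checksum(c) == good_sum), None)
            for side, copy in zip(mirror_sides, copies):
                if checksum(copy) != good_sum:
                    if good is None:
                        print(f"block {blk}: all copies bad - unrecoverable")
                    else:
                        side[blk] = good          # rewrite from the healthy copy
                        print(f"block {blk}: silent corruption repaired")

    scrub([disk_a, disk_b], recorded)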

    Did you miss the part where countless people have come to the FreeNAS forums having lost their pool, and it has come down to RAM?

    I think you completely misunderstand - just because your setup works now, the people who actually develop the fucking platform, do the regression testing, and have seen it crash and burn and die might actually know something about why it won't keep working that way forever.
     
    Last edited: May 21, 2015
  15. davros123

    davros123 Member

    Joined:
    Jun 18, 2008
    Messages:
    2,930
    ...and yet those same smart people seem totally incapable of explaining why this happened or fixing it!
    Oh, I am far too smart and far too busy to write a blog post explaining this constantly asked question... ahhh... bullshit.
    The arrogance and stupidity of these pricks really pisses me off. It's as bad as all these f*ckwits who say oh, you should go get an SSD for a L2ARC\ZIL\slog\whatever-cos-it-must-be-good-as-it's-an-SSD!

    It MAY be a FreeNAS/FreeBSD issue (I doubt it), but it is most certainly NOT a ZFS issue.
    ZFS will run happily (like crap, but happily) in 1GB.

    4-8GB is fine with 100+TB... you may see a slowdown in commercial high-throughput/high-concurrency environments, but not at home, and to suggest 100TB needs 100GB is just absolutely irresponsible bullshit. The 1GB/1TB rule is only for dedupe, as it caches the tables! No one at home is going to run dedupe! No one. And really, if you are that stuck for disk space at home that you need to, you might as well run compression or just buy more disks - it's cheaper than trying to keep all those dedupe tables in memory!

    EDIT: Here are the stats from mine, running 8GB and 30TB:
    Code:
    root@nas:/cloud# mdb -k
    Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc pcplusmp zvpsm scsi_vhci zfs mpt sd 
    ip hook neti arp usba stmf stmf_sbd kssl sockfs md lofs random idm ipc nfs mpt_sas crypto fctl fcp cpc 
    smbsrv fcip sppp nsmb ufs logindmux ptm sdbc nsctl sv rdc ii pmcs lsc scu emlxs qlc ]
    > ::memstat
    Page Summary                 Pages             Bytes  %Tot
    ----------------- ----------------  ----------------  ----
    Kernel                      404669              1.5G   19%
    Guest                            0                 0    0%
    ZFS Metadata                 89135            348.1M    4%
    ZFS File Data              1342609              5.1G   64%
    Anon                        157987            617.1M    8%
    Exec and libs                 4033             15.7M    0%
    Page cache                   25584             99.9M    1%
    Free (cachelist)              8602             33.6M    0%
    Free (freelist)              64420            251.6M    3%
    Total                      2097039              7.9G
    
    Oh, and these are the same people who also berate virtualisation but admit it's mostly because people do not do it correctly (yes, FreeNAS performs poorly and is not optimised for a VM, but that does not make it unstable).
     
    Last edited: May 21, 2015
  16. ae00711

    ae00711 Member

    Joined:
    Apr 9, 2013
    Messages:
    1,329
    link doesn't work.. plus this:

    "
    How Much RAM is needed?

    FreeNAS requires 8 GB of RAM for the base configuration. If you are using plugins and/or jails, 12 GB is a better starting point. There’s a lot of advice about how RAM hungry ZFS is, how it requires massive amounts of RAM, an oft quoted number is 1GB RAM per TB of storage. The reality is, it’s complicated. ZFS does require a base level of RAM to be stable, and the amount of RAM it needs to be stable does grow with the size of the storage. 8GB of RAM will get you through the 24TB range. Beyond that 16GB is a safer minimum, and once you get past 100TB of storage, 32GB is recommended. However, that’s just to satisfy the stability side of things. ZFS performance lives and dies by its caching. There are no good guidelines for how much cache a given storage size with a given number of simultaneous users will need. You can have a 2TB array with 3 users that needs 1GB of cache, and a 500TB array with 50 users that need 8GB of cache. Neither of those scenarios are likely, but they are possible. The optimal cache size for an array tends to increase with the size of the array, but outside of that guidance, the only thing we can recommend is to measure and observe as you go. FreeNAS includes tools in the GUI and the command line to see cache utilization. If your cache hit ratio is below 90%, you will see performance improvements by adding cache to the system in the form of RAM or SSD L2ARC (dedicated read cache devices in the pool)."

    from here:

    http://www.freenas.org/whats-new/20...design-part-i-purpose-and-best-practices.html
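
    On the "measure and observe" point in that quote, the ARC hit ratio it mentions can be pulled from the hit/miss counters FreeBSD exposes through sysctl. A rough sketch, assuming the kstat.zfs.misc.arcstats counter names (check your platform's names before relying on this):

    Code:
    # Rough sketch: compute the ARC hit ratio from FreeBSD's arcstats counters.
    # The sysctl names are assumed from FreeBSD's kstat.zfs.misc.arcstats tree;
    # adjust if your platform exposes them differently.
    import subprocess

    def sysctl(name: str) -> int:
        out = subprocess.run(["sysctl", "-n", name], capture_output=True, text=True)
        return int(out.stdout.strip())

    hits = sysctl("kstat.zfs.misc.arcstats.hits")
    misses = sysctl("kstat.zfs.misc.arcstats.misses")
    ratio = hits / (hits + misses) if (hits + misses) else 0.0

    print(f"ARC hit ratio: {ratio:.1%}")
    if ratio < 0.90:
        print("Below the ~90% the article mentions - more RAM or an L2ARC may help")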
     
  17. davros123

    davros123 Member

    Joined:
    Jun 18, 2008
    Messages:
    2,930
    Yes, and that is utter bullshit with absolutely no supporting evidence other than a bunch of FreeNAS guys who refuse to do anything but quote their own citations....which, oh yes, they wrote! LOL!

    Go find me a SINGLE post or blog or, well, anything from Oracle or Sun that even hints that this is true. Oh, that's right, they cannot... because it is not true.

    Give me five minutes and I'll create a wiki that says oranges are purple, tweet a reference to that wiki, and then refuse to explain it to people because they are clearly too stupid to read the wiki post.

    Whatever.
     
    Last edited: May 21, 2015
  18. NSanity

    NSanity Member

    Joined:
    Mar 11, 2002
    Messages:
    17,593
    Location:
    Canberra
    Who gives a fuck what Oracle has said on the issue. Name a single thing that Oracle has ever contributed to the OpenZFS project - or even to the available source of ZFS. My pool is still v28, like pretty much everyone else's (although I have all the current feature flags).

    When were Sun's changes last actually relevant to ZFS? 2010? That's 5 years ago.

    Since then it has predominantly been maintained by illumos and FreeBSD.

    illumos is predominantly funded by Nexenta - who have some pretty "not home user" oriented goals for the product. Fuck, Nexenta no longer really cares about anything other than pulling business away from NetApp, EMC, HDS, etc. - they do not support SATA at all anymore.

    That leaves FreeBSD for the majority of it in the "not enterprise" space. iXsystems would no doubt make up a large chunk of that maintenance, given that their product - TrueNAS - is built on FreeNAS.

    So yes, what FreeBSD/FreeNAS/TrueNAS says on the topic matters - and given that you're running more than just "ZFS" on a FreeNAS system (you're running FreeBSD, nginx, Samba, etc., etc.), simply saying "well, ZFS only needs 1GB of RAM to operate and everything else is just cache" is stupid.
     
    Last edited: May 22, 2015
  19. Butcher9_9

    Butcher9_9 Member

    Joined:
    Aug 5, 2006
    Messages:
    2,085
    Location:
    Perth , East Vic Park
    I was not saying it would take that much RAM, just pointing out how stupid the idea of 1GB per TB is with no context.

    The article NSanity posted said you need 8GB of RAM as a base (I'm fine with that), but the 1GB per TB part did not say whether that was for stability or for features like dedupe etc.

    I understand the issues with hardware RAID and bit-level corruption etc. The problem is I can't run ZFS on the same box as my Windows machine (without some crazy VM environment that will just cause everything to fail more often and be crazy complex), and I can only take one PC to a LAN, so having two boxes is not an option.

    But generally a huge cache (say 90GB, in the case of 1GB per TB for my 100TB array/s, assuming some is used for the OS etc.) is only required for performance, not stability.
     
  20. NSanity

    NSanity Member

    Joined:
    Mar 11, 2002
    Messages:
    17,593
    Location:
    Canberra
    Dedupe recommendations are 5GB of RAM per TB.

    Note that nowhere is it mentioned whether these are raw or provisioned numbers, either...

    So you'd rather risk 100TB having bit rot and other such failures?

    A large chunk of that 90GB would be cache/performance. The last thing Solaris said about ZFS and memory was this - but again, that was a long time ago in ZFS terms.
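
    For context on where figures like that come from: the usual back-of-the-envelope is roughly 320 bytes of dedupe-table entry per unique block (a commonly quoted figure, not something from this thread), so the per-TB RAM cost swings wildly with average block size:

    Code:
    # Back-of-the-envelope dedupe-table sizing. The ~320 bytes per unique block
    # is the commonly quoted DDT entry size, and the block sizes are assumptions;
    # actual cost depends entirely on the dataset.
    DDT_ENTRY_BYTES = 320

    def ddt_gb_per_tb(avg_block_kb: float) -> float:
        blocks_per_tb = (1024 ** 4) / (avg_block_kb * 1024)
        return blocks_per_tb * DDT_ENTRY_BYTES / (1024 ** 3)

    for block_kb in (128, 64, 8):
        print(f"avg {block_kb:>3} KB blocks -> ~{ddt_gb_per_tb(block_kb):.1f} GB of DDT per TB")

    # 128 KB records work out to ~2.5 GB/TB; smaller average block sizes blow
    # that out fast, which is how guidance like "5 GB of RAM per TB" arises.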
     
