Next gen filesystems (ZFS, BtrFS, ReFS, APFS, etc)

Discussion in 'Storage & Backup' started by elvis, May 20, 2016.

  1. GumbyNoTalent

    GumbyNoTalent Member

    Joined:
    Jan 8, 2003
    Messages:
    4,158
    Location:
    Briz Vegas
    :p most data guys I've met look like they're sleeping even when they're working.
     
  2. Doc-of-FC

    Doc-of-FC Member

    Joined:
    Aug 30, 2001
    Messages:
    2,821
    Location:
    Canberra
    Code:
    time       read   hits   miss   hit%   l2read   l2hits   l2miss   l2hit%   arcsz   l2size
    10:42:56   1.1K    465    659     41      659      601       58       91    167G     676G
    10:43:00   1.6K     94   1.5K      6     1.5K      134     1.3K        9    167G     676G
    10:43:01   1.0K     89    941      8      941      283      658       30    167G     676G
    10:43:09   1.4K    111   1.3K      7     1.3K      168     1.1K       12    167G     676G
    12:21:20   1.3K    696    562     55      562      108      454       19    167G     676G
    13:31:44   1.5K   1.1K    335     77      335      309       26       92    167G     676G
    
    grabbed some moments where >100 hits came from L2ARC

    10:42:56 shows 1,124 requests came in within 1 second: 41% served from RAM, 53% served from L2ARC and 5% served from disk.
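
    Working that breakdown out from the 10:42:56 row:
    Code:
    total reads = hits + miss  = 465 + 659 = 1,124
    from RAM    = 465 / 1,124  = 41%
    from L2ARC  = 601 / 1,124  = 53%
    from disk   =  58 / 1,124  = 5%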

    Granted, L2ARC comes at a cost; there's no free lunch.
    Code:
    L2 ARC Summary: (HEALTHY)
            Passed Headroom:                        45.43m
            Tried Lock Failures:                    4.76m
            IO In Progress:                         0
            Low Memory Aborts:                      550
            Free on Write:                          349.40k
            Writes While Full:                      191.34k
            R/W Clashes:                            0
            Bad Checksums:                          0
            IO Errors:                              0
            SPA Mismatch:                           15.17b
    
    L2 ARC Size: (Adaptive)                         676.71  GiB
            Header Size:                    0.08%   571.56  MiB
    
    L2 ARC Evicts:
            Lock Retries:                           6085
            Upon Reading:                           0
    
    L2 ARC Breakdown:                               461.53m
            Hit Ratio:                      10.29%  47.49m
            Miss Ratio:                     89.71%  414.03m
            Feeds:                                  11.53m
    
    L2 ARC Buffer:
            Bytes Scanned:                          1.65    PiB
            Buffer Iterations:                      11.53m
            List Iterations:                        46.04m
            NULL List Iterations:                   169
    
    L2 ARC Writes:
            Writes Sent:                    100.00% 769.85k
    
    571MiB is a small price to pay for a 676GiB L2ARC
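
    For anyone wanting to pull the same numbers: the first block is from arcstat and the second from arc_summary (script names vary a little between platforms, e.g. arcstat.py / arc_summary.py):
    Code:
    arcstat.py 1      # per-second ARC / L2ARC hit rates
    arc_summary.py    # cumulative counters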
     
  3. wintermute000

    wintermute000 Member

    Joined:
    Jan 23, 2011
    Messages:
    915
    So if you run zfs on vanilla Linux acting as a regular server, i.e. not dedicated storage, can you control how much RAM zfs uses? Can it dynamically change on demand?
     
  4. Doc-of-FC

    Doc-of-FC Member

    Joined:
    Aug 30, 2001
    Messages:
    2,821
    Location:
    Canberra
    /etc/modprobe.d/zfs.conf
    Code:
    options zfs zfs_arc_max=25769803776

    Max ARC I don't think is tunable on demand.
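
    If it is, it'd be via the module parameter under sysfs rather than modprobe.d (untested sketch; same 24GiB value as above):
    Code:
    # ZFS on Linux: write the new cap at runtime; the ARC adjusts over time
    echo 25769803776 > /sys/module/zfs/parameters/zfs_arc_max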
     
  5. wintermute000

    wintermute000 Member

    Joined:
    Jan 23, 2011
    Messages:
    915
    Thanks.
    Waiting for either Coffee Lake E3 Xeons or for Ryzen Linux stability to be sorted in mainline Ubuntu/CentOS.
     
  6. davros123

    davros123 Member

    Joined:
    Jun 18, 2008
    Messages:
    2,829
    Zfs will use available ram. It will release ram as other apps need it. So, it's a win-win. If some app needs some ram, zfs will release it to it. So fear not. It's not some ram-eating monster and there is no need to reserve or limit ram. Chillax, it's all good.
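
    If you want to watch it do that, the current ARC size is in the kstats (ZFS on Linux path shown; on FreeBSD it's a sysctl):
    Code:
    # current ARC size in bytes; watch it shrink under memory pressure
    awk '/^size / {print $3}' /proc/spl/kstat/zfs/arcstats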
     
    Last edited: Nov 5, 2017
  7. theSeekerr

    theSeekerr Member

    Joined:
    Jan 19, 2010
    Messages:
    2,536
    Location:
    Prospect SA
    Thanks autocorrect, for that entirely intelligible post!
     
  8. davros123

    davros123 Member

    Joined:
    Jun 18, 2008
    Messages:
    2,829
    I hate frigging autocorrect! Fixed now.
     
  9. choppa

    choppa Member

    Joined:
    Jan 1, 2002
    Messages:
    2,479
    Location:
    Asia-Pacific
    New build for a friend: 12TB Seagate Ironwolf drives x5. Going to set up a 5-disk RaidZ with ashift=12, compression=lz4, dedup=off (rough create command at the end of this post). No ECC RAM, just 16GB in a microserver. Boots off USB.

    Backup is another system in another building, in a rack that powers up once a month, with 12x 8TB drives in a RaidZ3 + hot spare (8+3+1) config. He's content that the backup is "good enough"; I told him to get another 4TB portable USB drive for the critically important documents, so I've let it go.

    Any ideas on what recordsize I should set (most of his files are raw photos ~40-100MB in size and multi-GB videos)? I'm leaning towards the larger the better, given most (80%) of the files are >1MB.
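
    For reference, the pool create will be something along these lines (device names are placeholders; I'll use the by-id paths on the day):
    Code:
    zpool create -o ashift=12 -O compression=lz4 -O dedup=off tank \
        raidz /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde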
     
  10. OP
    OP
    elvis

    elvis Old school old fool

    Joined:
    Jun 27, 2001
    Messages:
    30,143
    Location:
    Brisbane
    Yup, at that file size, the bigger, the better.
     
  11. HobartTas

    HobartTas Member

    Joined:
    Jun 22, 2006
    Messages:
    574
    Go with the maximum recordsize available, which should be 1MB.
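
    e.g. on the dataset that holds the photos and videos (dataset name is just an example):
    Code:
    # recordsize above 128K needs the large_blocks pool feature
    zfs set recordsize=1M tank/media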
     
  12. waltermitty

    waltermitty Member

    Joined:
    Feb 19, 2016
    Messages:
    166
  13. grs1961

    grs1961 Member

    Joined:
    Jan 21, 2005
    Messages:
    477
    Location:
    Melbourne
    User-mode NFS will run like a dog; what's wrong with the NFS that came with your UNIX or Linux system?
     
  14. waltermitty

    waltermitty Member

    Joined:
    Feb 19, 2016
    Messages:
    166
    Topping out at 500MB/s, and the drives can read/write a lot faster. I know there's probably some tuning I could do, but parallel NFS seemed like a silver bullet.
     
  15. Doc-of-FC

    Doc-of-FC Member

    Joined:
    Aug 30, 2001
    Messages:
    2,821
    Location:
    Canberra
    Although it won't crash your machine when it attempts to write to an uninitialised / already-in-use memory address, the way a kernel-mode service would.

    Protocol development and stabilisation is far easier in userland; it saves you the cycle of kernel stacktrace, modify code, recompile and reload the kernel module.

    I'm not by any means suggesting this won't need to be done; it's more about getting a standards-based and supported capability that can then migrate to a kernel module.

    pNFS is the panacea for distributed storage ;)

    With 2 metadata servers and X backing servers, most of the NFS / ZFS / modern filesystem design flaws are abstracted away nicely.

    Today's NFS (v3) is a Sun standard from circa 1995, when servers were single boxes and scaling meant adding more hard drives.

    Today's need for NFS is far beyond what v3 can provide; funnelling all of your requests through a single processing node to handle storage I/O is ludicrous. Some people need parallel storage at 10 users; for others it's in the thousands.
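
    On the client side it's still just an NFSv4.1 mount and the layouts do the rest (sketch; server name and export path are made up):
    Code:
    mount -t nfs -o vers=4.1 mds.example.com:/export /mnt/pnfs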
     
  16. choppa

    choppa Member

    Joined:
    Jan 1, 2002
    Messages:
    2,479
    Location:
    Asia-Pacific
    Bloody hell, doesn't matter what I try to do, the ZFS partition only wants to write at ~10MB/s. Write caching on, write caching off, doesn't matter. Using dd, or copying over the network or from a USB flash stick (200MB/s read), I get smashed down to ~10MB/s write. Individually the drives (using Disk Utility in Ubuntu 16.04.3 desktop, on the same hardware) benchmark at ~180-190MB/s average write each.

    I have no freaking clue what is causing the bottleneck, as creating an mdadm raid5 also only writes at 10MB/s. I know the CPU can't be it, because the pre-build with 5x 4TB Seagate NAS drives ran at >200MB/s write on this box, and with mdadm the CPU stays under 20% (one core goes to 100% with ZFS). The only things I've changed are flashing from the stock v41 BIOS to the "modified" v41 BIOS, which unlocks port #5 to full-speed SATA2 rather than SATA1, and installing 16.04.3 instead of the original 12.04 it was running.

    Pulling my hair out here...
     
  17. fredhoon

    fredhoon Member

    Joined:
    Jun 27, 2003
    Messages:
    2,130
    Location:
    Brisbane
    Since both ZFS and mdadm have the same symptoms, you'll need to troubleshoot the BIOS and OS changes.

    Was the pre-build running ZFS under 12.04? The easiest thing to test is rolling back the BIOS (although this shouldn't be an issue if the individual drives, including SATA5, are full speed).
     
  18. Doc-of-FC

    Doc-of-FC Member

    Joined:
    Aug 30, 2001
    Messages:
    2,821
    Location:
    Canberra
    FreeBSD / ZFSoL?

    Does iostat -xd show anything interesting: drive ms_w times, busy, q_len?

    Are you forcing sync writes, either in the BIOS or the loader / sysctl phase?
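
    i.e. something along the lines of (pool name is a placeholder):
    Code:
    zpool iostat -v tank 5    # per-vdev ops and bandwidth, 5 second intervals
    zfs get sync tank         # check whether sync writes are being forced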
     
  19. wintermute000

    wintermute000 Member

    Joined:
    Jan 23, 2011
    Messages:
    915
    Just set up a zfs pool in Ubuntu. Holy sh1t it was easier than I thought. It even mounted automagically without fstab intervention or needing to fdisk partitions, wtf. Limited the ARC to 8GB (16TB array in RAIDZ... yeah I like to live dangerously, plus apparently RAIDZ expansion is coming).

    Is the 'correct' way to set up snapshots and scrubs just using normal cron jobs?
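
    Something like this is what I had in mind (pool name and snapshot naming made up):
    Code:
    # /etc/cron.d/zfs-maintenance (sketch)
    0 2 * * *   root   /sbin/zfs snapshot tank@daily-$(date +\%F)
    0 3 * * 0   root   /sbin/zpool scrub tank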
     
  20. waltermitty

    waltermitty Member

    Joined:
    Feb 19, 2016
    Messages:
    166
    I use snapper on btrfs; no idea if it plays nice with ZFS snapshots, but it's a good tool with flexible config.
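
    For what it's worth the setup is roughly (config name and path are whatever you use):
    Code:
    snapper -c data create-config /mnt/data        # writes /etc/snapper/configs/data
    snapper -c data create -d "manual snapshot"    # one-off snapshot; timeline settings live in the config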
     

Share This Page