Is it easy enough to set up in Ubuntu? The platform would be a G8 MicroServer with 8GB ECC RAM, 4x 3TB HDDs and a 250GB SSD for a cache. Thanks, I'll go do some reading and have a think.
I would think so. You can at least do things like set your metadata to "dup" to get a little extra goodness out of it.
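If it helps, here's a rough sketch of what the setup looks like on Ubuntu. The device names and the layout are just placeholders for the G8's four drives, so treat it as a starting point rather than a recipe:

```
# btrfs userspace tools ship in Ubuntu's repos (older releases call the
# package btrfs-tools)
sudo apt-get install btrfs-progs

# One possible layout for the four 3TB drives: mirrored metadata, single
# copy of data. /dev/sd[b-e] are placeholders -- check your own device names.
sudo mkfs.btrfs -L tank -m raid1 -d single /dev/sdb /dev/sdc /dev/sdd /dev/sde
sudo mkdir -p /mnt/tank
sudo mount /dev/sdb /mnt/tank
```

Mounting any one member device brings up the whole multi-device filesystem.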
Agreed. An L2ARC isn't really required at all if all you want to do is saturate 1GbE with assorted large files (e.g. movies) for a handful of clients.
Not really. L2ARC just caches the stuff that doesn't fit in RAM, and it's *read*-only, so if the data isn't hot you're not going to notice it. On a VM host I see about a 5% hit rate on it, vs a 60% hit rate on my ARC.
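For reference, attaching and watching a cache device is only a couple of commands; the pool name and SSD device below are placeholders, so adjust to taste:

```
# Attach the 250GB SSD as an L2ARC (cache vdev) -- 'tank' and /dev/sdf are
# placeholders for your pool and SSD.
sudo zpool add tank cache /dev/sdf

# Watch ARC vs L2ARC hit rates to see whether the cache is earning its keep
# (arcstat ships with ZFS on Linux; 5 = refresh interval in seconds).
arcstat 5
```

If the L2ARC hit rate sits in single digits, the SSD is arguably better spent elsewhere.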
Interesting, but it certainly seems to make sense. I wonder if it would be feasible to develop a tool that tested dedupe performance, and/or other performance metrics, to give some indication of whether the allocated RAM is insufficient and, if so, by roughly how much. Say you had a 15TB zpool containing a lot of VMs. There are going to be a lot of duplicate blocks in there, but perhaps only a handful of the VMs will be live. I'm guessing cache would be very important to performance tuning in that scenario?
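For what it's worth, ZFS can at least estimate this for you before you commit to dedupe. A rough sketch, assuming a pool called 'tank':

```
# Simulate dedupe on an existing pool: prints the would-be DDT histogram and
# the projected dedup ratio without changing any data.
sudo zdb -S tank

# If dedupe is already enabled, the live table statistics show up here:
sudo zpool status -D tank
```

The histogram gives you the entry count, which is what the RAM estimate hangs off.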
BtrFS checksums everything (data and metadata). Checksumming something doesn't let you fix it when it's broken; it merely lets you identify a fault. Keeping a second copy is what allows the filesystem to recover from faults. BtrFS has two modes of duplicating things: "dup" allows the second copy to live on the same device (useful for mdadm users), while "RAID1" mandates that the second copy lives on a separate device. You can tell BtrFS to treat data and metadata differently. Setting your data to dup will use a lot of disk space, but setting metadata to dup doesn't waste much space at all and gives you a bit of extra safety. On top of that, BtrFS offers LZO compression, which will likely claw back whatever space you lost to metadata dup for negligible CPU overhead.
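Roughly what that looks like in practice; the device and mount point below are placeholders:

```
# Single-device filesystem with duplicated metadata and a single copy of data
sudo mkfs.btrfs -m dup -d single /dev/sdb

# LZO compression is just a mount option (works per-mount or via fstab)
sudo mount -o compress=lzo /dev/sdb /mnt/tank
```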
DO NOT EVER USE DEDUPE IN HIGH-IO SITUATIONS. Dedupe is, at its most basic level, a trade-off of IOPS for space. VMs are, by definition, high-IOPS workloads (compared to just shifting files around). ZFS's dedupe is a pretty average implementation on top of that, and comes with some pretty crazy RAM requirements. Add the fact that serving VMs effectively from ZFS already demands some fairly specific pool design and RAM sizing, and it's almost certainly a no-no for that use case, unless you have literally 256GB+ of RAM and maybe an all-flash array.
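To put rough numbers on the RAM point (back-of-envelope only; the ~320 bytes per DDT entry figure is a commonly quoted estimate, not gospel):

```
# Rough DDT sizing for a 15TB pool at the default 128K recordsize.
BYTES_PER_ENTRY=320             # commonly quoted per-block estimate, not gospel
POOL_BYTES=$((15 * 1024**4))    # 15 TiB of allocated, dedup-enabled data
BLOCK=$((128 * 1024))           # 128K records
echo "$(( POOL_BYTES / BLOCK * BYTES_PER_ENTRY / 1024**3 )) GiB of dedupe table"
# -> roughly 37 GiB, before the ARC gets any RAM for actual caching
```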
Thanks for that information. Now to tend to my bleeding eyeballs. Is this the same for all dedupe? What about VSAN/EMC etc?
I can't speak for others, but BtrFS dedup is done via a manual "one shot" command. You can tell it to scan the filesystem and store the hash values either in RAM (default) or in an SQLite3 DB via an option. My crappy old home fileserver with a mere 8GB of RAM can happily dedup with minimal RAM usage. As for high IOPS setups, BtrFS is a bit crappy for that at the moment anyway, so the point is kind of moot. It's much better suited as a generic file store.
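For reference, a typical run looks something like this; the mount point and hashfile path are just placeholders:

```
# Scan a mounted btrfs filesystem and submit duplicate extents for dedupe.
# --hashfile keeps the block hashes in an on-disk SQLite DB instead of RAM,
# which is what keeps memory usage sane on an 8GB box.
sudo duperemove -dr --hashfile=/var/tmp/dedupe.hash /mnt/tank
```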
Fundamentally, it *has* to be the same. Think of the difference between dedupe and compression. Lossless compression has all the information needed to reassemble the file right there; realistically it's just a function of CPU time, and with CPUs as ludicrous as they are (and file access and decompression being easily parallelised), that's a non-issue. The very nature of dedupe means that:

1. There needs to be a table somewhere mapping where all the data is for each reference point (yes, in RAM, or at least flash).
2. That table merely provides a point of reference from which to rebuild the data.

In a way, dedupe is "compression" for an entire volume, whereas compression works on a single object. The quickest way to increase space savings is to shrink your dedupe block size, which makes your dedupe table larger but also concentrates more potential load (in some circumstances, such as VMs) onto particular points on the disk. If you have 500 Windows VMs (or Linux, who cares) all referencing the same blocks, all needing to be reassembled from a dedupe table, and you hit a boot storm, watch your NAS/SAN crumble. Microsoft binned Single Instance Storage in Exchange 2007 for a reason: disk is cheap, IOPS are expensive. Dedupe makes the most sense for things like backups, so long as you can guarantee your dedupe table (remember, without it you're fucked), because it's infrequently accessed data. File servers too (though you're less likely to see savings as great), because realistically file sharing is fairly boring and easy, especially if you have hot-data caching on a faster tier.
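Quick illustration of the block-size point, using the same back-of-envelope figure as above:

```
# Same 15TB of data, shrinking the dedupe block size: the table scales
# inversely with block size, so 128K -> 8K blocks means a 16x bigger table.
for BLOCK_KB in 128 8; do
  blocks=$(( 15 * 1024**4 / (BLOCK_KB * 1024) ))
  echo "${BLOCK_KB}K blocks -> ~$(( blocks * 320 / 1024**3 )) GiB of table"
done
```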
I'm not sure how BtrFS does it, to be honest. It uses COW + reflinks, and I don't think it maintains a dedup table once the deduplication is done. https://lwn.net/Articles/331808/ http://www.pixelbeat.org/docs/unix_links.html As I understand it, a reflink is similar to a hard link, but (ab)uses COW to allow future changes to maintain separate metadata for that block. I have a feeling the data doesn't need a table as such, and instead just uses the native inode semantics to keep track of things. Again, BtrFS is terrible for hosting large disk images and databases, and can't do the equivalent of "zvol" volumes, so testing VM performance is kind of moot. I've not noticed any IOPS loss when deduping the BtrFS filesystems I've been using, but again they're all general-purpose file stores, so I'm not entirely sure. I could be 100% wrong of course; I've not read deeply into it at all, specific to BtrFS.
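If anyone wants to see the reflink behaviour for themselves, coreutils cp can do it on btrfs (filenames are obviously placeholders):

```
# Make a reflinked copy: both names share the same extents on disk until one
# of them is modified, at which point COW gives the changed blocks their own
# extents.
cp --reflink=always bigfile.img bigfile-clone.img

# Compare apparent size vs what is actually shared/exclusive on disk
du -h --apparent-size bigfile.img bigfile-clone.img
sudo btrfs filesystem du bigfile.img bigfile-clone.img
```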
Microsoft Exchange went to a Google-like approach: no underlying RAID system, just lots of spindles and lots of copies. I think the internal install for MS has 16 copies across their 16-server DAG groups, split between their three data centres. Lots of JBOD. The software-defined approach to file systems is removing the need for expensive servers and RAID cards.
I'd be surprised if it wasn't an ultra-bleeding-edge Exchange Online build, backed by Hyper-V 2016 and SOFS on Storage Spaces (but yeah, without RAID cards) with DAGs (well, 365 will all be based on this tech; they'll just be on a much newer iteration of it). MS has a long history of eating their own dogfood, even to their own detriment (there are various leaks about the Windows 8 and 10 productivity losses through the alpha/beta programs).
They're actually different. ZFS is in-band only, meaning it runs in realtime. BTRFS currently only supports out-of-band dedup, so it needs to be run manually (like a defrag). They're working on an inline solution (I think it was marked experimental a few months back), but it'll be like ZFS and eat RAM to do it. Does anyone know how Windows does it? Reality is, spindles are cheap. If you need more storage space with ZFS/BTRFS, rack and stack. That's the whole point: you're not paying the SAN/brand-name premiums of the big products, and you have more flexibility. Out-of-band dedup may be a reasonable compromise, but I'll be waiting a bit longer before trying it.
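To make the in-band vs out-of-band distinction concrete (pool/dataset names are placeholders):

```
# ZFS: dedupe is a per-dataset property; once it's on, every incoming write
# is hashed and checked against the DDT in real time.
sudo zfs set dedup=on tank/vmstore
sudo zpool get dedupratio tank   # read-only pool property showing the payoff

# BtrFS today: nothing happens at write time -- the dedupe pass is a job you
# schedule yourself (duperemove as above), much like a defrag.
```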
I keep getting told this, and then I look at my 2PB of production storage sitting at 90% utilisation, and groan at the idea of asking management for more storage budget.