Ok who broke rsync at the ATO?

Discussion in 'Business & Enterprise Computing' started by link1896, Dec 14, 2016.

  1. GumbyNoTalent

    GumbyNoTalent Member

    Joined:
    Jan 8, 2003
    Messages:
    8,273
    Location:
    Briz Vegas
  2. elvis

    elvis Old school old fool

    Joined:
    Jun 27, 2001
    Messages:
    40,277
    Location:
    Brisbane
    More or less, yeah. If it was something folks could buy and not build, there'd be more of that out in the wild.

    And to be honest, that's how a lot of stuff starts. I've been building storage out of server-grade x86 Linux boxes for decades. Back then, big enterprisey folk told me I was nuts, and I had to go and buy vendor storage instead. Problem was it cost 10 times as much, and the businesses I worked for didn't have that cash.

    Fast forward 20 years, and I'm running "enterprise" storage in the shape of an Oracle ZS3-2 array. What's inside it? A generic x86 server with lots of SATA storage, and a pretty web GUI over the top. Exactly the sort of thing I was building years ago. Between then and now, folks like FreeNAS offered similar things too, until the enterprise market caught up, and realised this was something they could offer people for the same sorts of profits but without the penalty of proprietary hardware costs.

    Same thing goes for other tech. For example, Google weren't happy with the way large-scale data storage at the database/application layer worked, so they made "Bigtable". It was their in-house code for ages, until they released the spec, and other people had a crack at building it. From Google's published designs (Bigtable and the related GFS/MapReduce work) spawned the open source "Hadoop" ecosystem, maintained by the Apache Foundation.

    Who uses Hadoop today? Lots of people (I reckon you'd see it in 75% of Fortune 500s). Heck, Microsoft even SELL it as a managed service!

    The difference between 2016 and 1986 is that we're just more aware of these cycles now. Once upon a time if a company did something crazy in-house, nobody heard of it except for a few industry rumours at the annual trade show in Las Vegas. Today, it gets social media'ed to death, and everyone hears about it. The cycle is the same though - it goes from R&D project to in-house tech to lots of folks tinkering to vendor offering.

    With all of that said, specific to backups and archives, you really are hard pressed to find technology that outlasts tape for long term reliability. Burnable optical media suffers a lot of problems 7+ years into its life, whereas tape can last for 30+ without much effort. Likewise, finding things that can read 30 year old tapes is far less costly/difficult, and once again you've got lots of vendor support there as well for things like robot loaders and other bits that make actually using it faster/easier. I know it's all a bit of a "chicken and egg" argument for that last point, but it honestly does matter when you've got federal legislation demanding you keep this data for a long time (something Facebook doesn't have).
     
    Last edited: Dec 16, 2016
  3. cvidler

    cvidler Member

    Joined:
    Jun 29, 2001
    Messages:
    13,043
    Location:
    Canberra
    No, I didn't misunderstand you; it's still a stupid idea for this purpose. No doubt it works well for Facebook, because no one will care if your years-old cat photos are lost.

    Blu-rays (any optical disc) are unreliable*. Tape is for backups. And you can buy WORM tapes.

    [Image: WORM tape cartridge]
    Funnily enough, they still have the write-protect tab, not that it matters; the write-once behaviour is enforced by the ROM in the tape and the drives.



    *Unreliable, small, slow and expensive for useful data sets. 100GB a disc @ 9MB/s for $50 is rubbish capacity, performance and value. LTO-7 tape holds 6TB each (up to 15TB with compression), runs at 300MB/s, and costs ~$100/tape at low volume; the price rapidly drops as you buy in bulk.
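    To put those numbers side by side, here's a rough back-of-the-envelope sketch in Python using only the figures quoted above (the 1PB archive size is a made-up example, not anything from this thread):

        # Media comparison using the figures quoted above.
        # The 1 PB archive size is hypothetical.
        ARCHIVE_TB = 1000  # 1 PB expressed in TB

        media = {
            # name: (capacity in TB, throughput in MB/s, cost per unit in $)
            "BD-R 100GB": (0.1, 9, 50),
            "LTO-7 (native)": (6.0, 300, 100),
        }

        for name, (cap_tb, mb_s, cost) in media.items():
            units = -(-ARCHIVE_TB // cap_tb)        # ceiling division: discs/tapes needed
            hours = ARCHIVE_TB * 1e6 / mb_s / 3600  # single-drive write time: TB -> MB -> hours
            print(f"{name}: {units:.0f} units, ~{hours:,.0f} drive-hours, ~${units * cost:,.0f} in media")

    Roughly 10,000 discs and half a million dollars in media versus under 200 tapes for a few tens of thousands, before you even account for the 30x throughput gap.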
     
  4. elvis

    elvis Old school old fool

    Joined:
    Jun 27, 2001
    Messages:
    40,277
    Location:
    Brisbane
    Yeah, that often gives folks a laugh, but you really need it there for certain drives, otherwise they freak out.

    Plus from a manufacturing point of view, it's a lot easier/cheaper to just keep making them the same physical shape/size/properties.
     
  5. NSanity

    NSanity Member

    Joined:
    Mar 11, 2002
    Messages:
    17,838
    Location:
    Canberra
  6. GumbyNoTalent

    GumbyNoTalent Member

    Joined:
    Jan 8, 2003
    Messages:
    8,273
    Location:
    Briz Vegas
    NP, I'm always being corrected; being 50+, the grey mush isn't as good at recall as it once was. But I fondly remember DB2 on an S/360, and not so fondly anything Tivoli when it first came out in the 90s.
     
  7. PabloEscobar

    PabloEscobar Member

    Joined:
    Jan 28, 2008
    Messages:
    13,873
    Was this presented seamlessly to the Application layer?

    We've got applications that bitch if we redirect folders, because they expect local disk access times and get slightly slower access times because of the network.

    I'd hate to think what an off-the-shelf application would do if it had to wait for a robot to load up tapes/optical disks and read in data.

    GoogBook et al. have the benefit of systems built specifically for the hardware they build/use, and the benefit of scale to make it all worthwhile. The ATO dataset would just be a drop in the ocean.
     
  8. elvis

    elvis Old school old fool

    Joined:
    Jun 27, 2001
    Messages:
    40,277
    Location:
    Brisbane
    Good ones will keep metadata online forever, so any recursive directory traversal, file size lookups, or other things will work as expected, and only when full file read access is requested will data be pulled back.

    It also stops someone triggering a restore of PB of data with an accidental recursive dir/ls command.

    Disk space wise, and typical for our use case, metadata is one SI prefix smaller than the matching data (i.e. 1GB data = 1MB metadata; 1TB data = 1GB metadata; 1PB data = 1TB metadata, etc.).
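    For anyone curious what that looks like in practice, here's a minimal sketch of walking an HSM-backed tree using metadata only, so nothing gets recalled from the archive tier. The st_blocks-versus-st_size check is just a common heuristic for spotting released (offline) stub files on Unix-like systems, not a guarantee for any particular product, and the /archive path is made up:

        import os

        def walk_metadata_only(root):
            """Traverse an HSM-backed tree touching metadata only (no file opens),
            so nothing should be pulled back from tape. Flags files that look
            'released' (stub on disk, data offline) via a block-count heuristic."""
            total_bytes = 0
            likely_offline = []
            for dirpath, _, filenames in os.walk(root):
                for name in filenames:
                    path = os.path.join(dirpath, name)
                    st = os.lstat(path)              # metadata lookup only, no open()
                    total_bytes += st.st_size
                    # Far fewer blocks allocated than the logical size suggests usually
                    # means the data has been released (sparse files also trip this).
                    if st.st_size > 0 and st.st_blocks * 512 < st.st_size // 2:
                        likely_offline.append(path)
            return total_bytes, likely_offline

        size, offline = walk_metadata_only("/archive")   # hypothetical mount point
        print(f"logical size: {size / 1e12:.2f} TB, files likely offline: {len(offline)}")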
     
    Last edited: Dec 16, 2016
  9. obi

    obi Member

    Joined:
    Oct 16, 2004
    Messages:
    127
    It was seamless*.

    Files would be available via SMB or NFS shares to the end user. It also helped that the sector we were working in wasn't putting dinky databases on a network share and then crapping its pants.

    *Seamless assuming a sane client. Occasionally some operating systems would choke, but we could also perform manual calls to bring the files back into cache.

    EDIT: Also what elvis said. Metadata was always available for a file, so could be browsed without triggering a restore of entire folders.

    Oracle HSM/Sun SAM-QFS for those interested.
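    On the "manual calls to bring files back into cache" bit: SAM-QFS has its own staging tooling, but a vendor-neutral trick that works over a plain NFS/SMB mount is simply to read a byte of each file you're about to need, which blocks until the HSM has recalled it. A rough sketch, with a made-up mount path:

        import os

        def prewarm(paths, chunk=1):
            """Force an HSM to stage files back into the disk cache by reading a
            small chunk of each one. Works with anything that recalls on read;
            the open()/read() just blocks until the robot/tape delivers the data."""
            for path in paths:
                try:
                    with open(path, "rb") as fh:
                        fh.read(chunk)               # triggers the recall, data discarded
                except OSError as err:
                    print(f"skipped {path}: {err}")

        # Hypothetical usage: warm everything under one project folder before a batch job.
        targets = [os.path.join(dp, f)
                   for dp, _, files in os.walk("/mnt/samqfs/project42")   # made-up mount
                   for f in files]
        prewarm(targets)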
     
    Last edited: Dec 16, 2016
  10. Luke212

    Luke212 Member

    Joined:
    Feb 26, 2003
    Messages:
    9,769
    Location:
    Sydney
    It's also the no. 1 reason cloud services fail. I have seen a provider lose all servers and all backups from a firmware update.

    Firmware is a form of inter-dependency. It's like putting all your data on one drive.
     
  11. Daemon

    Daemon Member

    Joined:
    Jun 27, 2001
    Messages:
    5,469
    Location:
    qld.au
    None of the above; I run something fairly industry-specific (Virtuozzo) which will come into the enterprise market soon :) It's production proven, with companies running it into the multiple-PB range.

    Ceph is just a completely different beast compared to what most enterprises deal with. At scale there's no issue, as you have a team working with it 24/7. It's like trying to drive an F1 racecar: if you don't have a full team and the right experience, it's very easy to crash and burn. Get it right, though, and it's incredibly powerful.

    Ironically enough, they're not so different. You can get greater densities with micro-services / small web based systems but they present the same overall (70% read, 30% write) type scenarios as a typical VM environment. It's really the management and network layers where there's differentiation.

    If you want to copy Facebook for archiving, then this is what you want: https://code.facebook.com/posts/1433093613662262/-under-the-hood-facebook-s-cold-storage-system-/

    DCs designed from the ground up, hardware designed from the ground up just for cold storage. Everything down to the controllers only spinning up one drive at a time has been factored in. This at least can be scaled down to a single-server scenario and is therefore useful to the enterprise.
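    As a toy illustration only (nothing to do with Facebook's actual controller design; the function and data below are made up), the "one drive spinning at a time" idea boils down to batching pending reads per drive and draining one drive's queue before powering up the next, trading latency for power and drive wear:

        from collections import defaultdict

        def service_requests(requests):
            """Toy model of a cold-storage controller that powers up one drive at
            a time: group pending reads by drive, then drain each drive's queue in
            turn. The spin up/down prints stand in for the real hardware operations."""
            queues = defaultdict(list)
            for drive_id, block in requests:
                queues[drive_id].append(block)
            for drive_id, blocks in sorted(queues.items()):
                print(f"spin up drive {drive_id}")       # only this drive draws power now
                for block in sorted(blocks):             # sequentialise reads while it's spinning
                    print(f"  read block {block} from drive {drive_id}")
                print(f"spin down drive {drive_id}")

        service_requests([(2, 700), (0, 13), (2, 701), (0, 9), (5, 42)])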

    For the Blu-ray stuff, you'd want to have archives of over 1PB before it was even a consideration. Tape, however, is dead at large scale, and many big companies are using 10TB drives in a cold storage configuration for long-term archiving.
     
  12. obi

    obi Member

    Joined:
    Oct 16, 2004
    Messages:
    127
    I'm not sure everyone agrees with you.

     
  13. elvis

    elvis Old school old fool

    Joined:
    Jun 27, 2001
    Messages:
    40,277
    Location:
    Brisbane
    Specific to our use case, we were on clustered storage for a long time, but were ultimately forced off it because our workloads crippled it.

    We still use it around the place for secondary workloads, but our primary stuff had to move to more "traditional" storage to get the performance we needed.

    "Long term" is a relative statement, and will mean different things to different people. I've worked at places where "long term" meant 12 months (small business, #yolo), and others where it meant 30+ years (large scale architecture, where you can be sued for defects in buildings many decades after the plans were finalised).
     
    Last edited: Dec 16, 2016
  14. Daemon

    Daemon Member

    Joined:
    Jun 27, 2001
    Messages:
    5,469
    Location:
    qld.au
    If there was 100% consensus or only the one answer, then it wouldn't be here to be debated :) Sure, it's not dead for everyone, but for anyone rolling new systems you won't see tape being mentioned often.
    Not all clustered systems have tackled the performance side well, especially those that are focused on data integrity or use object storage as the base product. There are already systems which can achieve 10x the typical Ceph performance in the same tin, which shows how far Ceph and the like have to go when it comes to performance.

    The drives for this are very specific to cold storage / long-term archiving, which is a big change from previous technologies. This is exactly why all of the big providers are using them, and through the magic of automation they can continuously upgrade and replace drives to prolong lifespans well beyond 30 years.
     
  15. ECHO

    ECHO Member

    Joined:
    Jun 17, 2002
    Messages:
    636
    Location:
    Canberra
    Probably a bad time to try and offload a brand new HPE StoreVirtual device I have sitting around at work ... lol
     
  16. PabloEscobar

    PabloEscobar Member

    Joined:
    Jan 28, 2008
    Messages:
    13,873
    ATO might need another spare for N+2 Garbage Replication.
     
  17. looktall

    looktall Working Class Doughnut

    Joined:
    Sep 17, 2001
    Messages:
    25,289
    [image]
     
  18. rainwulf

    rainwulf Member

    Joined:
    Jan 20, 2002
    Messages:
    4,227
    Location:
    bris.qld.aus
    ZFS is the fucking bomb.

    LOVE me some tasty ZFS action.
     
  19. Daemon

    Daemon Member

    Joined:
    Jun 27, 2001
    Messages:
    5,469
    Location:
    qld.au
    ZFS is still single FS / single chassis only. If Oracle hadn't screwed Sun completely, I have no doubt that it'd be a scalable, networked FS.

    While it's great for a contained chassis (and exactly what I use it for), it doesn't come close to competing with a SAN / distributed FS.

    Can anyone go back in time and slap Larry so that he doesn't buy Sun please? :)
     
  20. cvidler

    cvidler Member

    Joined:
    Jun 29, 2001
    Messages:
    13,043
    Location:
    Canberra
    Sun was circling the drain; they wouldn't be here if it wasn't for Larry.

    Their silicon was ancient and outgunned by everyone else's, and they had nothing on the roadmap. They were just short of going all x86_64.
     
