Ok who broke rsync at the ATO?

Discussion in 'Business & Enterprise Computing' started by link1896, Dec 14, 2016.

  1. link1896

    link1896 Member

    Joined:
    Jul 28, 2005
    Messages:
    366
    Location:
    Melbourne
  2. NSanity

    NSanity Member

    Joined:
    Mar 11, 2002
    Messages:
    16,500
    Location:
    Canberra
    I mean, it's back up now (well, parts of it).

    This is why you need a backup.
     
    Last edited: Dec 14, 2016
  3. ^catalyst

    ^catalyst Member

    Joined:
    Jun 27, 2001
    Messages:
    11,660
    Location:
    melbourne
    RAID is not backup
    RAID is not backup
     
  4. NSanity

    NSanity Member

    Joined:
    Mar 11, 2002
    Messages:
    16,500
    Location:
    Canberra
    BUT HP SED IF I RSYNC TO DAT ONE WE GUD?
     
  5. shredder

    shredder Member

    Joined:
    Dec 26, 2001
    Messages:
    9,790
    Location:
    Dec 27, 1991
     
  6. NSanity

    NSanity Member

    Joined:
    Mar 11, 2002
    Messages:
    16,500
    Location:
    Canberra
  7. elvis

    elvis Old school old fool

    Joined:
    Jun 27, 2001
    Messages:
    30,821
    Location:
    Brisbane
    Boss: "Realtime synchronise all of the things!"

    Grunt: "But sir, what if there's an error and we don't have an offline copy of..."

    Boss: "I SAID ALL OF THE THINGS!!!"

    Days later, corruption. Days later, investigation. Days later, result:

    Boss: "Heads will roll! We've sacked all our grunts, and told the new ones never to make this mistake again".
     
  8. Sphinx2000

    Sphinx2000 Member

    Joined:
    Sep 16, 2001
    Messages:
    5,332
    Location:
    Brisbane
    HPE Engineer: "How we tell them it actually failed?"
    HPE Boss: "Tell them this is first time this has ever happened in the world, that always makes them feel better"

    :lol:
     
  9. PabloEscobar

    PabloEscobar Member

    Joined:
    Jan 28, 2008
    Messages:
    10,376
    All the press is coming from the "Acting CIO"...
    Perhaps heads already have.
     
  10. chip

    chip Member

    Joined:
    Dec 24, 2001
    Messages:
    3,404
    Location:
    Perth
    I don't have specifics on either case, but it sounds similar (i.e. catastrophic data loss) to this one: http://www.theregister.co.uk/2016/1...kups_staff_get_order_to_never_make_their_own/

    I've also seen smaller HP SANs with firmware faults that replicated and killed production data a couple of times over the years.
     
  11. power

    power Member

    Joined:
    Apr 20, 2002
    Messages:
    51,981
    Location:
    brisbane
    HP Appreciates your participation in this public beta.
     
  12. cesario

    cesario Member

    Joined:
    Jun 15, 2009
    Messages:
    283
    This story has been the hot topic around my workplace; I'd be very keen to find out exactly what happened.
    Sounds like one of those 'oops' moments.
     
  13. NSanity

    NSanity Member

    Joined:
    Mar 11, 2002
    Messages:
    16,500
    Location:
    Canberra
    Hardly.

    SANs flake out from time to time. If it's at the filesystem level, it typically replicates at the speed of light (I mean, that's what you wanted, right?), but by the time you notice, the damage is done.

    If you have snapshots pre-corruption, and they aren't affected, you can roll back.

    If you don't, well, you're rolling back to a backup taken pre-corruption.
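    A minimal sketch of the snapshot-versus-replication point, assuming a toy in-memory volume. The `ReplicatedVolume` class and its methods are hypothetical and not any SAN vendor's API: replication mirrors a bad write instantly, while a snapshot taken before the corruption leaves something to roll back to.

```python
class ReplicatedVolume:
    """Hypothetical toy volume: every write is mirrored to a replica at once,
    and snapshots are frozen point-in-time copies of the primary."""

    def __init__(self):
        self.primary = {}    # block number -> bytes
        self.replica = {}    # synchronous mirror of the primary
        self.snapshots = []  # list of (label, frozen copy of primary)

    def write(self, block, data):
        self.primary[block] = data
        self.replica[block] = data  # replication copies good and bad writes alike

    def snapshot(self, label):
        # Copy the mapping so later writes cannot alter the snapshot.
        self.snapshots.append((label, dict(self.primary)))

    def rollback(self, label):
        for name, frozen in self.snapshots:
            if name == label:
                self.primary = dict(frozen)
                self.replica = dict(frozen)
                return
        raise KeyError(f"no snapshot {label!r}; restore from backup instead")


vol = ReplicatedVolume()
vol.write(0, b"tax return 1")
vol.snapshot("pre-corruption")
vol.write(0, b"\x00garbage\x00")             # simulated firmware/filesystem fault
assert vol.replica[0] == b"\x00garbage\x00"  # the replica is damaged too
vol.rollback("pre-corruption")
assert vol.primary[0] == b"tax return 1"
```

    Without the snapshot, `rollback` has nothing to offer and you are back to restoring from backup.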

    elvis and I keep saying this, but large datasets need filesystem-level integrity checks, and that means a next-gen filesystem, e.g. ZFS, Btrfs or ReFS (although the latter doesn't work for VM workloads, because you turn integrity streams off).
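    To illustrate what "filesystem-level integrity checks" buy you, here is a rough sketch, assuming a simple key-value block store. `ChecksummedStore` is a hypothetical name, and this is not how ZFS, Btrfs or ReFS actually lay out their metadata: each block's SHA-256 is kept apart from the data and verified on every read, so silent corruption surfaces as an error instead of being served to the application.

```python
import hashlib


class ChecksummedStore:
    """Hypothetical block store that verifies a per-block checksum on read."""

    def __init__(self):
        self.blocks = {}     # block number -> data
        self.checksums = {}  # block number -> SHA-256 digest, stored separately

    def write(self, block, data):
        self.blocks[block] = data
        self.checksums[block] = hashlib.sha256(data).digest()

    def read(self, block):
        data = self.blocks[block]
        if hashlib.sha256(data).digest() != self.checksums[block]:
            raise OSError(f"checksum mismatch on block {block}")
        return data


store = ChecksummedStore()
store.write(1, b"payroll records")
store.blocks[1] = b"payroll recordz"  # simulate corruption below the filesystem
try:
    store.read(1)
except OSError as err:
    print(err)  # checksum mismatch on block 1
```

    A real filesystem would then try a redundant copy (mirror or parity) rather than just raising.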
     
  14. PabloEscobar

    PabloEscobar Member

    Joined:
    Jan 28, 2008
    Messages:
    10,376
    The first problem was buying HPE kit...
     
  15. cesario

    cesario Member

    Joined:
    Jun 15, 2009
    Messages:
    283
    Kinda have to feel bad for Steve Hamilton, though:
    "Acting CIO 28 November to 16 December"
    All this two days before his role ended; now that's some terrible timing.
     
  16. connico

    connico Member

    Joined:
    Jan 30, 2004
    Messages:
    2,977
    Location:
    Sydney
    Lol it was HPE or IBM... both of them shit...
     
  17. bcann

    bcann Member

    Joined:
    Feb 26, 2006
    Messages:
    4,463
    Location:
    NSW
    That won't detect GIGO. I'd imagine they'd have that baby carved up into so many partitions that, sure, if one partition got burnt they might lose some data. But reading the story, it looks like both SANs went to toast (OK, one replicated the GIGO to the other), and that sounds more like hardware/firmware to me.
     
  18. NSanity

    NSanity Member

    Joined:
    Mar 11, 2002
    Messages:
    16,500
    Location:
    Canberra
    I mean, ZFS does. So long as the network didn't fuck it on the way past, whatever lands in RAM is basically gospel.

    If an app wrote junk data, then an app wrote junk data, and it's an app problem.
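    The GIGO distinction can be shown in a few lines. This is a sketch under simple assumptions (a dict standing in for the filesystem, hypothetical helper names `write_block` and `read_block`): a checksum taken at write time faithfully "protects" junk that an application wrote, but it does catch data that changes after it was written.

```python
import hashlib


def write_block(store, block, data):
    """Store data alongside its SHA-256, as a checksumming filesystem would."""
    store[block] = (data, hashlib.sha256(data).digest())


def read_block(store, block):
    """Return data only if it still matches the checksum taken at write time."""
    data, digest = store[block]
    if hashlib.sha256(data).digest() != digest:
        raise OSError(f"block {block}: corrupted after it was written")
    return data


store = {}
# A buggy application writes junk; the filesystem faithfully checksums the junk.
write_block(store, 7, b"\xff" * 16)
assert read_block(store, 7) == b"\xff" * 16  # passes: checksums can't judge content

# Bit rot after the write is a different story.
data, digest = store[7]
store[7] = (b"\xfe" + data[1:], digest)
try:
    read_block(store, 7)
except OSError as err:
    print(err)  # block 7: corrupted after it was written
```

    That is why snapshots and backups are still needed on top of checksums: they are the only defence against valid-looking garbage.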
     
  19. elvis

    elvis Old school old fool

    Joined:
    Jun 27, 2001
    Messages:
    30,821
    Location:
    Brisbane
    Damn right. If you're not checksumming data at every level on every read and write on modern workloads, you're doing it wrong.

    On modern filesystems, I can't see this being an issue at that scale.

    End-to-end checksums and standard hardware failover should have detected this. Snapshots would ensure that even if bad data came in, there's a diff to fail back to in the event of logical data errors or "garbage". Then on top of that, we fall back to tape backup.
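    A rough sketch of what "end-to-end" means here, assuming a made-up write path. The layer names and `Payload` type are hypothetical, not any real storage stack: the client computes the checksum once, it travels with the data, and every layer re-verifies it, so a fault is caught at the layer where it happens instead of being written to disk.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Payload:
    data: bytes
    sha256: bytes  # computed once by the client and carried end to end


def make_payload(data: bytes) -> Payload:
    return Payload(data, hashlib.sha256(data).digest())


def verify(payload: Payload, layer: str) -> Payload:
    """Each layer re-checks the client's checksum before passing data on."""
    if hashlib.sha256(payload.data).digest() != payload.sha256:
        raise OSError(f"corruption detected at {layer}")
    return payload


def write_path(payload: Payload, flip_bit_in=None) -> Payload:
    """Hypothetical write path: client -> network -> controller -> disk."""
    for layer in ("network", "controller", "disk"):
        if layer == flip_bit_in:
            # Simulate a fault inside this layer flipping one bit in flight.
            corrupted = bytes([payload.data[0] ^ 1]) + payload.data[1:]
            payload = Payload(corrupted, payload.sha256)
        verify(payload, layer)
    return payload


ok = write_path(make_payload(b"lodgement"))
try:
    write_path(make_payload(b"lodgement"), flip_bit_in="controller")
except OSError as err:
    print(err)  # corruption detected at controller
```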

    There are several layers of human fuckup here if a PB of data belonging to a federal-level taxation statutory authority in the first world goes missing. While it's fun to play blame the vendor, this just shouldn't happen in 2016, even if a piece of hardware caught on fire.

    Quite frankly, all this does is reinforce my disdain for our public sector. It is quite evident that they cannot get technology right at any level. Whether it's Queensland Health's pay system, the census, the NBN or this, the list of fuckups with very large price tags attached is growing, as is our international embarrassment.

    This. And the way to recover is snapshot/backup, not replication.
     
  20. looktall

    looktall Working Class Doughnut

    Joined:
    Sep 17, 2001
    Messages:
    23,799
    Location:
    brabham.wa.au
    We're in the middle of migrating some hundreds of TB onto 3PAR storage.
    We started moving it on Monday.

    nek minnit.
     
