Overclockers Australia Forums

14th December 2016, 12:16 PM   #16
connico
Member
Join Date: Jan 2004
Location: Sydney
Posts: 1,895

Quote:
Originally Posted by PabloEscobar
The first problem was buying HPE kit...
Lol it was HPE or IBM... both of them shit...
__________________
www.shoepolish.net.au

14th December 2016, 12:38 PM   #17
bcann
Member
Join Date: Feb 2006
Location: NSW
Posts: 3,886

Quote:
Originally Posted by NSanity
Hardly.

SANs flake out from time to time. If the corruption is at the filesystem level, it typically replicates at the speed of light (I mean, that's what you wanted, right?) - but by the time you notice, the damage is done.

If you have snapshots pre-corruption, and they aren't affected, you can roll back.

If you don't, well, you're rolling back to a pre-corruption backup.

elvis and I keep saying this, but large datasets need filesystem-level integrity checks - and well, that's a next-gen filesystem - e.g. ZFS, BTRFS and ReFS (although the latter doesn't work for VM workloads - because you turn integrity streams off).
That won't detect GIGO. I'd imagine they'd have that baby carved up into so many partitions that, sure, if one partition got burnt they might lose some data. But reading the story, it looks like both SANs (OK, one replicated the GIGO to the other) went to toast. That sounds more like hardware/firmware to me.
__________________
Quote:
Originally Posted by elvis
All I do is hand folks the working gun. Up to them as to whether or not they go hunting to bring home the bacon, or shoot themselves in the foot. I am merely the lowly gunsmith, and nothing more.

14th December 2016, 12:59 PM   #18
NSanity
Member
Join Date: Mar 2002
Location: Canberra
Posts: 15,611

Quote:
Originally Posted by bcann
That won't detect GIGO.
I mean, ZFS does. So long as the network didn't fuck it on the way past, whatever lands in RAM is basically gospel.

If an app wrote junk data, then an app wrote junk data and it's an app problem.

14th December 2016, 1:01 PM   #19
elvis
Old school old fool
Join Date: Jun 2001
Location: Brisbane
Posts: 27,897

Quote:
Originally Posted by NSanity
elvis and I keep saying this, but large datasets need filesystem-level integrity checks - and well, that's a next-gen filesystem - e.g. ZFS, BTRFS and ReFS
Damn right. If you're not checksumming data at every level on every read and write on modern workloads, you're doing it wrong.

Quote:
Originally Posted by bcann
That won't detect GIGO. I'd imagine they'd have that baby carved up into so many partitions that, sure, if one partition got burnt they might lose some data. But reading the story, it looks like both SANs (OK, one replicated the GIGO to the other) went to toast. That sounds more like hardware/firmware to me.
On modern filesystems, I can't see this being an issue at that scale.

End-to-end checksums and standard hardware failover should have detected this. Snapshots would ensure that even if bad data came in, there's a diff to fail back to in the event of logical data errors or "garbage". Then on top of that, we fall back to tape backup.
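
To make the checksum point concrete, here's a minimal Python sketch of verify-on-read. Everything in it (the `ChecksummedStore` class, the `corrupt` helper, the block names) is hypothetical and made up for illustration. It is not how ZFS or any SAN actually implements this; it only shows the idea that a digest stored at write time turns silent corruption into a loud error at read time.

```python
import hashlib


class ChecksumMismatch(Exception):
    """Raised when a block's stored checksum no longer matches its data."""


class ChecksummedStore:
    """Toy in-memory block store (hypothetical): a SHA-256 digest sits beside every block."""

    def __init__(self):
        self._blocks = {}  # block_id -> (data, digest)

    def write(self, block_id, data):
        # The digest is computed at write time, from the data as it arrived.
        self._blocks[block_id] = (bytes(data), hashlib.sha256(data).hexdigest())

    def read(self, block_id):
        data, digest = self._blocks[block_id]
        # Verify on every read, so silent corruption becomes a loud error.
        if hashlib.sha256(data).hexdigest() != digest:
            raise ChecksumMismatch(f"block {block_id!r} failed verification")
        return data

    def corrupt(self, block_id):
        # Simulate bit rot: flip one bit of the data without touching the digest.
        data, digest = self._blocks[block_id]
        self._blocks[block_id] = (bytes([data[0] ^ 0x01]) + data[1:], digest)


store = ChecksummedStore()
store.write("ledger", b"tax records")
store.corrupt("ledger")
try:
    store.read("ledger")
    detected = False
except ChecksumMismatch:
    detected = True
```

Note what this doesn't catch: if the application hands `write` junk in the first place, the digest matches the junk and the read verifies fine. That case is what the snapshots and tape are for.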

There are several layers of human fuckup here if a PB of data belonging to a federal-level taxation statutory authority in the first world goes missing. While it's fun to play blame the vendor, this just shouldn't happen in 2016, even if a piece of hardware caught on fire.

Quite frankly all this does is reinforce my disdain for our public sector. It is quite evident that they cannot get technology right at any level. Whether it's Queensland Health's pay system, census, NBN or this, the list of fuckups with very large price tags attached is growing, as is our international embarrassment.

Quote:
Originally Posted by NSanity
I mean, ZFS does. So long as the network didn't fuck it on the way past, whatever lands in RAM is basically gospel.

If an app wrote junk data, then an app wrote junk data and it's an app problem.
This. And the way to recover is snapshot/backup, not replication.
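
A rough Python sketch of why, assuming a toy `Volume` class and a `replicated_write` helper that are both invented for this example (real arrays use copy-on-write snapshots, not deep copies): replication hands the replica whatever the primary got, junk included, while a snapshot taken beforehand gives you something to roll back to.

```python
import copy


class Volume:
    """Toy volume (hypothetical): a dict of blocks plus point-in-time snapshots."""

    def __init__(self):
        self.blocks = {}
        self.snapshots = {}

    def snapshot(self, name):
        # Real snapshots are copy-on-write; a deep copy is enough for the sketch.
        self.snapshots[name] = copy.deepcopy(self.blocks)

    def rollback(self, name):
        # Restore the block map as it was when the snapshot was taken.
        self.blocks = copy.deepcopy(self.snapshots[name])


def replicated_write(primary, replica, block_id, data):
    # Synchronous replication: the replica gets exactly what the primary got,
    # good or bad.
    primary.blocks[block_id] = data
    replica.blocks[block_id] = data


primary, replica = Volume(), Volume()
replicated_write(primary, replica, "ledger", b"good data")
primary.snapshot("before")

# The app writes junk; replication dutifully copies it to the other side.
replicated_write(primary, replica, "ledger", b"garbage")
replica_after_bad_write = replica.blocks["ledger"]

# Only the snapshot still holds the good copy.
primary.rollback("before")
```

After the bad write both copies hold the garbage, so failing over to the replica gains nothing. The rollback is the only step that brings the good block back.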

14th December 2016, 1:11 PM   #20
looktall
Working Class Hero
Join Date: Sep 2001
Location: brabham.wa.au
Posts: 22,569

we're in the middle of migrating some hundreds of TB onto 3PAR storage.
we started moving it on monday.

nek minnit.

14th December 2016, 1:12 PM   #21
NSanity
Member
Join Date: Mar 2002
Location: Canberra
Posts: 15,611

Quote:
Originally Posted by looktall
we're in the middle of migrating some hundreds of TB onto 3PAR storage.
we started moving it on monday.

nek minnit.

14th December 2016, 1:14 PM   #22
looktall
Working Class Hero
Join Date: Sep 2001
Location: brabham.wa.au
Posts: 22,569

Quote:
Originally Posted by NSanity
and this will be my reaction if it all goes tits up due to some unidentified firmware issue etc.


14th December 2016, 1:19 PM   #23
elvis
Old school old fool
Join Date: Jun 2001
Location: Brisbane
Posts: 27,897

Quote:
Originally Posted by looktall
and this will be my reaction if it all goes tits up due to some unidentified firmware issue etc.
There's a lot of 3PAR out there, and it's doing a lot more than 1PB workloads. I'd be very surprised if the ATO stumbled across some bug before other folks did.

With that said, I'm also 100% sure we'll never know the real reason this happened, because public sector arse covering.

14th December 2016, 1:21 PM   #24
looktall
Working Class Hero
Join Date: Sep 2001
Location: brabham.wa.au
Posts: 22,569

Quote:
Originally Posted by elvis
There's a lot of 3PAR out there, and it's doing a lot more than 1PB workloads. I'd be very surprised if the ATO stumbled across some bug before other folks did.

With that said, I'm also 100% sure we'll never know the real reason this happened, because public sector arse covering.
no doubt.

i would not be surprised in the slightest to find out that this was caused by human error rather than a hardware or firmware fault.

14th December 2016, 1:23 PM   #25
NSanity
Member
Join Date: Mar 2002
Location: Canberra
Posts: 15,611

Quote:
Originally Posted by looktall
no doubt.

i would not be surprised in the slightest to find out that this was caused by human error rather than a hardware or firmware fault.
Word on reddit is that the number 1 cause of 3PAR failures is firmware updates, apparently.

14th December 2016, 1:38 PM   #26
elvis
Old school old fool
Join Date: Jun 2001
Location: Brisbane
Posts: 27,897

Quote:
Originally Posted by NSanity
Word on reddit is that the number 1 cause of 3PAR failures is firmware updates, apparently.
As in, data corrupts during the update process? Or it works before the update, folks do the update, then on go-live after the update it all turns to shit?

14th December 2016, 1:41 PM   #27
NSanity
Member
Join Date: Mar 2002
Location: Canberra
Posts: 15,611

Quote:
Originally Posted by elvis
As in, data corrupts during the update process? Or it works before the update, folks do the update, then on go-live after the update it all turns to shit?
Not clear, but I'd imagine that if firmware doesn't match between nodes (or there's only a really, really narrow set of valid versions), then bad things happen.

What no-one is talking about is how much data (and what) they will have lost.

Last edited by NSanity; 14th December 2016 at 1:43 PM.

14th December 2016, 1:43 PM   #28
Dre_
Member
Join Date: May 2014
Posts: 475

Quote:
Originally Posted by PabloEscobar
All the press is coming from the "Acting CIO"...
Perhaps heads already have.
Nah, Steve Hamilton has been acting CIO since the end of November and Ramez Katf is due back on board next Monday.

Ramez Katf is just on holidays or something.


Last edited by Dre_; 14th December 2016 at 1:48 PM.

14th December 2016, 1:48 PM   #29
PabloEscobar
Member
Join Date: Jan 2008
Posts: 8,685

Quote:
Originally Posted by elvis
Quite frankly all this does is reinforce my disdain for our public sector. It is quite evident that they cannot get technology right at any level. Whether it's Queensland Health's pay system, census, NBN or this, the list of fuckups with very large price tags attached is growing, as is our international embarrassment.
Public Sector Mindset - as long as this is someone else's fault, it's all good.

Vendors know this, and accept the blame, because accepting the blame is why they bake in massive margins to their Public Sector quotes.

14th December 2016, 1:54 PM   #30
NSanity
Member
Join Date: Mar 2002
Location: Canberra
Posts: 15,611

http://lets-talk.ato.gov.au/ato-systems-update

Quote:
Have you lost one petabyte of data?
No. The petabyte of data referred to in media reports relates to storage capacity, which includes not only data but applications and systems as well. This figure does not relate to data impacted by the outages. While we experienced some data corruption, we are in the process of fully restoring this data from back-up. No data has been lost.
Ahahahahaha Bullshit.

All times are GMT +10.