Discussion in 'Business & Enterprise Computing' started by link1896, Dec 14, 2016.
And this will be my reaction if it all goes tits up due to some unidentified firmware issue, etc.
There's a lot of 3PAR out there, and it's doing a lot more than 1PB workloads. I'd be very surprised if the ATO stumbled across some bug before other folks did.
With that said, I'm also 100% sure we'll never know the real reason this happened, because of public sector arse-covering.
I wouldn't be surprised in the slightest to find out that this was caused by human error rather than a hardware or firmware fault.
Word on Reddit is that the number one reason 3PARs fail is during firmware updates, apparently.
As in, data corrupts during the update process? Or it works before the update, folks do the update, then on go-live after the update it all turns to shit?
Not clear, but I'd imagine that if firmware doesn't match between nodes (or falls outside a really narrow set of valid version combinations), then bad things happen.
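Something like this, as a toy Python sketch, is the kind of pre-flight guard you'd hope the updater enforces. The node names and compatibility matrix here are invented for illustration; this is obviously not HPE's actual tooling.

```python
# Toy pre-flight check: refuse to update if the firmware mix across nodes
# isn't on a vendor-approved compatibility list. All values are hypothetical.

# Firmware versions reported by each controller node (made-up values)
node_versions = {"node0": "3.2.2", "node1": "3.2.2", "node2": "3.2.1", "node3": "3.2.2"}

# Version pairs allowed to coexist mid-update (made-up matrix)
compatible = {("3.2.2", "3.2.2"), ("3.2.2", "3.2.1"), ("3.2.1", "3.2.1")}

def safe_to_proceed(versions):
    """True only if every pair of nodes runs an approved version combination."""
    vs = list(versions.values())
    return all(
        (a, b) in compatible or (b, a) in compatible
        for i, a in enumerate(vs) for b in vs[i + 1:]
    )

if not safe_to_proceed(node_versions):
    raise SystemExit("Version skew outside the supported window - abort the update")
```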
What no one is talking about is how much data (and what) they will have lost.
Nah, Steve Hamilton has been acting CIO since the end of November, and Ramez Katf is due back on board next Monday.
Ramez Katf is just on holidays or something.
Public sector mindset: as long as this is someone else's fault, it's all good.
Vendors know this and accept the blame; accepting the blame is exactly why they bake massive margins into their public sector quotes.
Well, now. Technically the data probably isn't lost.
They know exactly where it is, they just can't access any of it.
I don't think they brought another SAN (or pair) in to restore to, with a plan to run data recovery on the old one and merge afterwards.
That PB is gone.
"chmod 777" am i rite?
isn't that how everyone does it?
WEBSITE WORKS AGAIN!
It was already 777'd, and someone opened an E-mail from "Ostraya Post" to see where their package was, and why it hadn't been delivered.
And now, someone in Russia has the keys to their encrypted data.
omfg. If it was ransomware, I'll lose my shit.
That would just be too damned hilarious.
Pity that snapshots would simply let you roll it back... unless they changed enough data to expire the existing snapshots...
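Rough toy model in Python of why "just roll back the snapshot" stops working once enough data churns. The reserve size and expiry policy are invented; real arrays handle this differently, but the effect is the same.

```python
# Toy copy-on-write model: each rewritten GB pins old blocks into the newest
# snapshot's delta, and when the snapshot reserve overflows, the oldest
# snapshots (your rollback points) get expired. Numbers are hypothetical.
from collections import deque

SNAP_RESERVE_GB = 100                                       # made-up reserve
snapshots = deque([("mon", 30), ("tue", 30), ("wed", 30)])  # (name, delta GB)

def rewrite(gb_changed):
    name, delta = snapshots.pop()
    snapshots.append((name, delta + gb_changed))
    while sum(d for _, d in snapshots) > SNAP_RESERVE_GB and len(snapshots) > 1:
        gone, _ = snapshots.popleft()
        print(f"expired snapshot '{gone}' - that rollback point is gone")

rewrite(50)  # ransomware encrypting half the volume ages out 'mon' and 'tue'
print(list(snapshots))
```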
GUI only, I fear. Console is where the devil hides.
Anyone rolling out a SAN these days for large data sets is a chump. It's the same category as "nobody got fired for buying Cisco" thinking.
The rest of the world has moved on. There's a reason the Google / AWS world doesn't see these issues: they can't easily happen. SANs traditionally have 1-2 controllers per shelf, and despite all the marketing hype about distributed workloads, most don't distribute data integrity checking. If a controller goes rogue with corrupt data, none of the other controllers has any way to verify it.
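A quick Python sketch of that failure mode. Everything here is illustrative, but it shows why a scrub can't save you when one controller owns both the data and the checksum:

```python
# A single controller is the only authority for a block AND its checksum,
# so a rogue controller that corrupts one can silently "fix" the other.
import hashlib

def checksum(data):
    return hashlib.sha256(data).hexdigest()

block = b"customer tax records"
stored = {"data": block, "sum": checksum(block)}

# Rogue controller corrupts the block and recomputes its own checksum.
stored["data"] = b"\x00" * len(block)
stored["sum"] = checksum(stored["data"])

# A scrub pass sees a self-consistent block, so the corruption is invisible.
assert stored["sum"] == checksum(stored["data"])
print("scrub passed; corruption undetected")
```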
Well designed, distributed block storage takes care of this issue. Ceph (as one example) runs multiple monitor daemons, and the more you have, the greater the fault tolerance / data security (minimum of 3 to form a quorum). Such systems also typically run n+2 replication for block storage, or erasure coding, which gives the ability to recover if one or more nodes goes rogue.
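And the contrast, as a toy quorum read across three replicas (a sketch of the general idea, not Ceph's actual code): with independent copies, one rogue node just gets outvoted and flagged for repair.

```python
# Toy quorum read: take the value the majority of replicas agree on and
# flag dissenters for repair. Node names and data are illustrative.
import hashlib
from collections import Counter

def checksum(data):
    return hashlib.sha256(data).hexdigest()

replicas = {
    "node_a": b"customer tax records",
    "node_b": b"customer tax records",
    "node_c": b"\x00" * 20,   # rogue node returning corrupt data
}

def quorum_read(copies):
    votes = Counter(checksum(d) for d in copies.values())
    winner, count = votes.most_common(1)[0]
    assert count > len(copies) // 2, "no majority - data unrecoverable"
    for node, data in copies.items():
        if checksum(data) != winner:
            print(f"{node} disagrees with the quorum - schedule a repair")
    return next(d for d in copies.values() if checksum(d) == winner)

print(quorum_read(replicas))
```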
That said, because governments and big enterprise love new tech, I expect they'll keep rolling SANs out for the next 10+ years.
How do you see distributed IP storage (Ceph/Gluster etc.) vs hyperconverged? Or is it all the same, just a question of whether you run it on dedicated metal vs sharing it with compute metal?
Is there ANY use case for a traditional SAN still? (Specific dedicated high-performance needs? Mid-market, i.e. where rolling Ceph is too complex for a small IT team with relatively modest requirements?)
Yep. Have seen the inside of the beast up close and personal, and the incompetence/ass-covering/sheer wastage/complete amateur hour is mind-boggling.