Discussion in 'Business & Enterprise Computing' started by elvis, Jul 1, 2008.
Agree, sounds like you are carrying a fair bit of risk currently...
This - the clients won't all hit the server at the same time in any case. And you can always rate limit the IIS side of things if you want it to serve out more slowly. There have been some monster updates lately with Office 2013, though - 1GB+ a month.
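For what it's worth, IIS can throttle a site's bandwidth directly; a rough sketch using appcmd, assuming the default WSUS site name ("WSUS Administration" - adjust for your environment) and that the `limits.maxBandwidth` site attribute is available:

```shell
:: Cap the WSUS web site at roughly 5 MB/s (value is bytes per second).
:: Run from an elevated command prompt on the WSUS server.
%windir%\system32\inetsrv\appcmd.exe set site "WSUS Administration" /limits.maxBandwidth:5242880

:: Check the setting took:
%windir%\system32\inetsrv\appcmd.exe list site "WSUS Administration" /config
```

Client-side BITS throttling via group policy is the other common lever if you'd rather not touch the server.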
Take ownership of the issue.
Admit you disobeyed a direct instruction, but outline the reasons you did so. Don't throw anyone under the bus: don't mention names, and generalise when referring to those responsible. Point out that these workstations are not officially supported by Microsoft, so in an effort to reduce risk to the company you took the initiative to resolve a major issue.
Explain why the restart caused this issue, and why the reboot needed to take place in the first place, and provide steps on changes you will make to ensure this doesn't happen again, i.e. fix the printer issue.
the underlying problem is that it is a fragile system. one action should never cause such a chain of events, but chain reactions occur in fragile systems.
if it wasn't this, it would have been something else.
Like daisy-chaining 10gbps ethernet to avoid buying a switch?
If you're told not to fix an exploit on a system for whatever reason from someone above you, then don't.
Question it, raise concerns, write you don't agree, even go above their head if it's an absolute last resort and you think the company's at risk - but it doesn't give you the right to do it anyway.
After the fact, apologise, say you won't ever do it again and realise it's not your place to do it.
Does sound like you'll be better off in a different environment anyway, so look at it as a blessing and a lesson learnt.
There are times when i have bent the rules to move things into a better place - stuff like rebooting non-critical systems during the day to fix issues, rather than heading through CAB to arrange a reboot (because it was a backup server and actually did nothing during the day anyway - plus i wasn't waiting 3 weeks to reboot a fucking server when we had *real* issues with the backup itself) and dealt with the fallout later - but ultimately i take ownership of this and people who need to know, knew "unofficially" before about 40 people's phones go off w/ a Sev 2.
I don't think what you did was ultimately a bad thing (that said, i probably wouldn't have gone about it that way), however you have to own it completely - including the fallout. If you can state your position and the reasoning behind why you did it and why it needed to be done, you'll probably come out the other side ok.
The fact you were explicitly told not to do something, but did it anyway, yeah well you're gonna have to wear that.
Doing that whilst simultaneously proving that what you did, wasn't actually the cause of the outage is going to be tough though. The people you work with don't sound like they can be reasoned with all too well.
Workstations check for new updates on an "Automatic Updates Detection Frequency" which should be set by group policy. They don't all hit WSUS at startup or shutdown.
Group Policy - Computer Configuration -> Policies -> Administrative Templates -> Classic Administrative Templates (ADM) -> Windows Components -> Windows Updates
Configure Automatic Updates
4 - Auto download and schedule the install
Scheduled Install Day 0 - Every Day
Scheduled Install Time - 22:00
Boom - updates are downloaded across the day as per normal, but not installed until the scheduled install time, so the middle-of-the-day wait for updates as a computer restarts is avoided.
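If you'd rather script it than click through GPMC, the same policy lands in the standard Windows Update registry keys on each workstation; a minimal sketch (verify the resulting values against your own GPO results before relying on it):

```shell
:: Registry equivalents of the GPO settings above:
:: AUOptions 4 = auto download and schedule the install,
:: ScheduledInstallDay 0 = every day, ScheduledInstallTime 22 = 10pm.
reg add "HKLM\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate\AU" /v AUOptions /t REG_DWORD /d 4 /f
reg add "HKLM\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate\AU" /v ScheduledInstallDay /t REG_DWORD /d 0 /f
reg add "HKLM\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate\AU" /v ScheduledInstallTime /t REG_DWORD /d 22 /f
```

Useful for one-off testing on a machine before the GPO is rolled out; the GPO will overwrite these at next policy refresh anyway.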
You approved the updates without ensuring that the policies would prevent this sort of outage.
Sure, 'reboot the environment in the middle of the day to fix a spooler issue' is a stupid way of fixing the problem, and letting computers get 18 months out of date is hardly best practice, but the multi-hour workstation outage wasn't caused by either of those things - it was caused by improper update approval.
If you know you're in an environment held together with string and tin cans, then running through it with scissors isn't a good way to effect change.
I've had to bring environments up to current patch levels before; testing and baby steps is always the way. Split the workstations out into separate WSUS groups and run a staggered release of updates. Split the updates up and only patch 3 months at a time, instead of mass-approving everything ever.
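A rough sketch of that staggered approach using the UpdateServices PowerShell module that ships with WSUS on Server 2012 and later (the 'Pilot-Workstations' group name and the date window are placeholders for your own groups and schedule):

```powershell
# Approve only updates released in a three-month window, and only
# for a pilot computer group - not the whole fleet at once.
# Run on the WSUS server; requires the UpdateServices module.
Import-Module UpdateServices

Get-WsusUpdate -Approval Unapproved -Status Needed |
    Where-Object {
        $_.Update.CreationDate -ge (Get-Date '2013-01-01') -and
        $_.Update.CreationDate -lt (Get-Date '2013-04-01')
    } |
    Approve-WsusUpdate -Action Install -TargetGroupName 'Pilot-Workstations'
```

Once the pilot group has been stable for a week or two, re-run against the next WSUS group, then advance the date window and repeat.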
Have a plan, and within that plan have a backout plan. As much as I hate the rigmarole that is modern change management, borrowing some of its core steps covers plenty of things that are often missed by smaller IT teams.
Literally the only thing i actually give a flying fuck about in ITIL, PRINCE, etc. Also letting actual stakeholders know.
I found out that the best way out of this is not to get into this shit in the first place. All middle management need to look after their asses; no fucks are given how, but they do have to look after their asses. What I do is simple: I write a letter of concern to the middle manager as well as their direct manager, where I state the company policy and how it's not being implemented, and close with a solution and a backup plan for that solution, as well as the risk. If the risk is high I CC the risk manager as well, or the COO/CEO, and then walk away from it. If shit fails it's not my problem.
I learned to NEVER do anything that I have not got specific approval to do. I learned not to give a shit. I know it will fail, and sometimes I can tell exactly when, but if my recommendations were not adhered to then I simply don't give a shit. I catch them when they ask for an RCA; I've got very good at writing long RCAs where I attach the recommendations I made previously.
To be honest you went in like a cowboy because you put too much heart into this and did what you thought was right. Today this is wrong. No shit should be given if you don't have in writing what you need to do. There should be a clear line between project work and BAU. I try and push everything to project work as much as I can, mainly because there are checks and approvals and change control etc. attached to it. BAU is just for simple mindless tasks which don't require anything apart from cutting and pasting logs or images into a support ticket.
Time for the OP to wake up. No offence to the OP but the days of the past are gone. I refuse to reboot or restart a service in a production system without a change control approval. The way I look at it is "you signed for me to fuck up, it's your responsibility not mine"
I've been burnt that way.
You can't win.
I've worked at a place with similar rules, and while I didn't agree with them, they were the rules, and I abided by them.
Given the choice, I would implement them differently.
I don't need to wait for 18 people from 10 different countries, to all sign off on me restarting some crappy service that impacts nobody, just because it has a PROD label on it.
Too often, I see businesses follow the route of
No Change Management
Change Fucks Shit up Big Time
CHANGE MANAGEMENT FOR ALL THE THINGS!
Which ends up with things taking far longer than they should, because the change management process is too onerous for the work. Instead, I'd rather implement change management not as a knee-jerk reaction to a borked change, but as a process improvement. The first things that get brought online are pre-approved changes... 'changes' that are not quite BAU, but need to be tracked somehow.
Change Request: Power from Off To On
Risks of performing: May start a fire due to untidy test
Risks of not doing change: Pablo won't do any work
Approval Request: Denied
So he hasn't been seen since 9:30am, wonder if he got canned......
iirc he was only freshly into that job - with the last one being "merged" into another company that steamrolled his site's stuff with crap that didn't work (from the stuff they had that did).
Can Microsoft release Win7 SP2 already?
243 updates (Office 2010 SP2 + Win7 SP1 + .NET 4.5.1 Base) is ridiculous.
Ditto for server 2k8r2
A slipstreamed roll up would work
ran this up in a vm this week. Took half a day. Bonkers.
slipstream the updates into your install.wim using dism.
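A minimal sketch of that slipstream with the built-in DISM tool (all paths, the image index, and the KB filename are placeholders; run from an elevated prompt with the update .cab/.msu files already downloaded):

```shell
:: Mount the Windows image, inject an update package, then commit.
dism /Mount-Wim /WimFile:D:\sources\install.wim /Index:1 /MountDir:C:\mount

:: Repeat /Add-Package for each downloaded update.
dism /Image:C:\mount /Add-Package /PackagePath:C:\updates\KB1234567.cab

:: Commit the changes back into install.wim and unmount.
dism /Unmount-Wim /MountDir:C:\mount /Commit
```

Future installs from that .wim then come up already patched, so you only pay the half-day update cost once instead of per build.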
If I had to build 50 of them, yeah, sure. But I'm lucky to build 1 server every year. I don't do that crap anymore (where at all avoidable).
if you're only doing it once per year then how annoying can it be really?