Consolidated Business & Enterprise Computing Rant Thread

Discussion in 'Business & Enterprise Computing' started by elvis, Jul 1, 2008.

  1. tobes

    tobes Member

    Joined:
    Dec 23, 2001
    Messages:
    4,099
    Location:
    Melbourne
    Also for TLS don't underestimate the ability of Techs to do the exact opposite of what you want them to do in binary scenarios. I've seen outages caused because they updated route tables to point to systems being taken offline for updates. I've also seen that happen two nights in a row after rolling back the first time....
     
  2. OP
    elvis

    elvis Old school old fool

    Joined:
    Jun 27, 2001
    Messages:
    44,217
    Location:
    Brisbane
    You're being quite harsh on those poor preschoolers.
     
    Hive, millsy, 2SHY and 4 others like this.
  3. wintermute000

    wintermute000 Member

    Joined:
    Jan 23, 2011
    Messages:
    2,540
    I've seen TLS techs roll back a change because the output didn't exactly match the provided sample. Key word being sample.
    The sample was the test results in the model lab... which is obviously not 1:1 exactly the same as prod....

    Again, outsourcing it all to checklist-following script simians gets you outcomes like this.
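To make the "sample vs prod" point concrete, here is a minimal Python sketch of checking a change against acceptance criteria instead of against the literal lab output. Everything in it is an assumption for illustration: the table layout, the interface names, and the helper `all_interfaces_up` are made up and do not come from any real device or from the change described above.

```python
# Hypothetical lab sample and production output. The interface names and
# the table layout are invented for illustration only.
LAB_SAMPLE = """\
Interface  Status  Protocol
Gi0/1      up      up
Gi0/2      up      up
"""

PROD_OUTPUT = """\
Interface  Status  Protocol
Te1/1      up      up
Te1/2      up      up
Te1/3      up      up
"""


def all_interfaces_up(output: str) -> bool:
    """Acceptance criterion: at least one interface row, and every row is up/up."""
    rows = output.strip().splitlines()[1:]  # skip the header line
    return bool(rows) and all(row.split()[1:] == ["up", "up"] for row in rows)


# A literal comparison fails because prod has different hardware...
exact_match = PROD_OUTPUT == LAB_SAMPLE
# ...while the criterion the sample was meant to demonstrate still holds.
criteria_met = all_interfaces_up(PROD_OUTPUT)

print(exact_match)   # False
print(criteria_met)  # True
```

The idea is simply that the change plan states what has to be true afterwards, so a tech following it does not need the prod output to be character-for-character identical to the lab.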
     
    elvis likes this.
  4. BAK

    BAK Member

    Joined:
    Jan 7, 2005
    Messages:
    1,215
    Location:
    MornPen, VIC
    Accurate and amusing.
     
    elvis likes this.
  5. wwwww

    wwwww Member

    Joined:
    Aug 22, 2005
    Messages:
    6,310
    Location:
    Bangkok
    I posted here a while back on how half of Windows Updates seem to fail to install and cause all sorts of problems on our Server 2012 systems.

    Figured out the problem occurs when you have RRAS enabled on a Hyper-V host. The time to shut down VMs becomes extremely long, which causes the shutdown sequence to take a long time, which seems to stuff up the updates.

    Moved RRAS into a VM and updates have been working smoothly since.
     
  6. Frozen_Hell

    Frozen_Hell Member

    Joined:
    Sep 11, 2002
    Messages:
    3,019
    Location:
    Cairns
    But that is a process issue and not a technical one. All sufficiently large organisations have a truckload of process debt, usually as a result of some knee-jerk reaction to a problem that happened just once. Now you put all of these processes together and BAM, no one is thinking about the actual change they are doing anymore; they are spending all of their time and energy following the processes and not doing any actual thinking.

    This is also one of the detrimental effects of technology-silo-based teams: often they will not understand the end-to-end impact of changes outside of their specific technology domain. That is why many companies are trending towards multi-functional teams that own services rather than technologies.
     
    Last edited: May 7, 2018
    fredhoon likes this.
  7. PabloEscobar

    PabloEscobar Member

    Joined:
    Jan 28, 2008
    Messages:
    14,538
    Are you me?
     
  8. j3ll0

    j3ll0 Member

    Joined:
    Jul 13, 2005
    Messages:
    4,794
    Gents, I'm happy to concede you are both massively more knowledgeable around this than I (hence the tongue in cheek 'everybody else's job is easy' hashtag).

    My ignorance stems from memories of when a couple of 6509s used to have enough compute to drive core internet routing. I am hearing from you both that complexity has grown at internet speed. Fair enough.

    *thumbsup*

    .
     
  9. itsmydamnation

    itsmydamnation Member

    Joined:
    Apr 30, 2003
    Messages:
    10,692
    Location:
    Canberra
    6500s are ASIC-based forwarders. If you issued no ip cef (or did something to make it process switch), you would get about 500,000 pps max out of it; its CPUs are anemic. But a 6500 is the perfect example, because the combination of MSFC, DFC, DCEF vs CCEF and all the other die cards that made a 6500 a 7600 could completely change the way some commands functioned (anything to do with a pseudo wire being a perfect example).
     
  10. IACSecurity

    IACSecurity Member

    Joined:
    Jul 11, 2008
    Messages:
    760
    Location:
    ork.sg
    Proof that SOMEONE doesn't have redundant paths. I worked on sections of the revised E000 system (replacing ISDN), and the end provider of E000 calls is state/territory based agencies (in the majority). The carrier can provide full redundancy, but if the call provider elects not to consume this in an intelligent way, say due to sheer incompetence after sitting on notifications for years, and then finally does something last minute without redundancy and expects Telstra to jump a month out from transition, you can't reeeally blame Telstra for that. For example. Not saying this is the case now, but it's not always as obvious as the media makes it out to be.

    Transmission and Retail power providers do this across their distributed power networks with millions of end points feeding back the state of the network, which is nearly as dynamic and certainly as real-time as our little slice of the Internet.

    Just felt like quoting you.
     
  11. Frozen_Hell

    Frozen_Hell Member

    Joined:
    Sep 11, 2002
    Messages:
    3,019
    Location:
    Cairns
    Well it has been stated by TLS that a router that was configured for failover didn't failover. So the intention was to have redundancy and failover, but for whatever reason it didn't failover or didn't failover cleanly.

    As disclosure, yes I do work for TLS (my opinions are my own etc of course), but not in that area and no I haven't looked up the details of the fault - so I don't know more than what has been published in the media. From my experience working in IT in general a number of scenarios can happen that cause failover mechanisms to not function as expected:
    - Bad/missing configuration
    - Vendor bug
    - The failure wasn't a complete failure, e.g. the "health" checking might've indicated OK but no service traffic flow
    - The failover didn't work and ended in a split-brain scenario or even a loss of quorum etc.
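The third scenario in the list above (health check says OK, but no service traffic flows) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how TLS or any vendor implements failover: `Node`, `shallow_check`, `deep_check` and `pick_active` are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    """Hypothetical model of a router or service instance."""
    name: str
    process_up: bool        # control plane answers (e.g. a keepalive)
    forwards_traffic: bool  # data plane actually passes service traffic


def shallow_check(node: Node) -> bool:
    """Only asks 'are you alive?', which a half-dead node still answers."""
    return node.process_up


def deep_check(node: Node) -> bool:
    """Also requires that real service traffic gets through."""
    return node.process_up and node.forwards_traffic


def pick_active(primary: Node, standby: Node,
                check: Callable[[Node], bool]) -> Optional[Node]:
    """Prefer the primary; fail over to the standby only if the check fails."""
    if check(primary):
        return primary
    if check(standby):
        return standby
    return None  # nothing healthy: total outage


# Partial failure: the primary still answers keepalives but drops traffic.
primary = Node("primary", process_up=True, forwards_traffic=False)
standby = Node("standby", process_up=True, forwards_traffic=True)

shallow_choice = pick_active(primary, standby, shallow_check)
deep_choice = pick_active(primary, standby, deep_check)

print(shallow_choice.name)  # primary (failover never triggers)
print(deep_choice.name)     # standby
```

With the shallow check the redundant pair is configured and "working", yet traffic stays pinned to the broken primary, which is the kind of failure that looks like "failover didn't fail over" from the outside.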

    I have personally seen each of those in different environments multiple times. No doubt failover would've been tested before being production ready, but probably hundreds of configuration changes and software versions later (even could've been to adjacent devices, not even the device specifically in question), all bets are off even if it has been lab tested along the way. I can tell you that the weird and wonderful bugs that I've seen from various different vendors, even after said vendors assure you that they have "thoroughly bug scrubbed this version", are just plain stupid.
     
  12. IACSecurity

    IACSecurity Member

    Joined:
    Jul 11, 2008
    Messages:
    760
    Location:
    ork.sg
    I concluded with the above.

    You forgot 'people'. We warm meat sacks are amazing at screwing stuff up at the best of times, let alone 'in a crisis'.
     
    elvis likes this.
  13. tobes

    tobes Member

    Joined:
    Dec 23, 2001
    Messages:
    4,099
    Location:
    Melbourne
    Except that power is essentially directly substitutable, whereas internet is more of a precious little being.
     
  14. fredhoon

    fredhoon Member

    Joined:
    Jun 27, 2003
    Messages:
    2,817
    Location:
    Brisbane
    Also the state and possible routing paths of the power networks are orders of magnitude simpler than layer 2/3 routing. However the configuration and state of multiple redundant and backup protection schemes are a good parallel.

    Protection schemes and electronic relays also suffer from infrequent failure to operate correctly, despite the "simple" (by comparison) network topology.
     
    Last edited: May 8, 2018
  15. Frozen_Hell

    Frozen_Hell Member

    Joined:
    Sep 11, 2002
    Messages:
    3,019
    Location:
    Cairns
    Covered by the "bad/missing configuration" comment. If it was due to a person, it would've been well before the actual event as it wasn't a case of someone was meant to fail it over and follow instructions, it was meant to automagically happen. Unlike the mobile outage early last week that was related to a person executing a change that went bad.
     
  16. OP
    elvis

    elvis Old school old fool

    Joined:
    Jun 27, 2001
    Messages:
    44,217
    Location:
    Brisbane
    Preach!

    We misanthropes would have monthly meetings to sneer at humanity if we didn't hate each other and ourselves so much.
     
    freaky_beeky and ^catalyst like this.
  17. Unframed

    Unframed Member

    Joined:
    Mar 30, 2010
    Messages:
    9,160
    Location:
    Hella south west
    Being asked to rack servers at a datacenter on a Saturday because it's convenient for our American employees.
     
  18. NSanity

    NSanity Member

    Joined:
    Mar 11, 2002
    Messages:
    18,315
    Location:
    Canberra
    phat OT.

    I love Datacenter Saturdays/Sundays.

    Doubles my take home.

    unf
     
  19. Unframed

    Unframed Member

    Joined:
    Mar 30, 2010
    Messages:
    9,160
    Location:
    Hella south west
    Yeah I've never been told about compensation for it so I get TIL which is shit IMO but I'm not sure how to say "want me on weekends then fucking pay"
     
  20. wintermute000

    wintermute000 Member

    Joined:
    Jan 23, 2011
    Messages:
    2,540
    WTF is wrong with people who want training videos (over books/PDFs). Everything is video.

    Brand spanking new vendor stack: 'here's 30 hours of training videos.' HOW ABOUT THE BOOK GODDAMMIT. I can read the book in way less than 30 hours.
     
    tobes and elvis like this.
