Virtualization - a step forwards or backwards?

Discussion in 'Business & Enterprise Computing' started by Cape_Horn, Apr 14, 2011.

  1. Cape_Horn

    Cape_Horn Member

    Joined:
    Dec 23, 2001
    Messages:
    2,454
    Location:
    Shooting Baker
    It looks like IACS managed to stir up a few questions for everyone with his little thread.
    Between that and the discussion over lunch, well, time to bring the question to a larger audience...

    Is Virtualization a step forwards or backwards?

    We are currently in the process of doing a hardware replacement for a lot of our unix machines for one of our clients. Currently we have around 200 physical machines and ~240 logical machines. (AIX on pSeries)
    The current plan is to replace our p5, p4 (and older) machines with approx 15 new physical machines, but we will be keeping the same number of logical partitions.

    The question that was asked: with most or all of the production environment ending up on one physical host, if that host has a major problem, we WILL lose the production environment. With the current setup, we can pull the power on half the DC before we would lose most of the applications. We already have trouble getting outages for OS/firmware upgrades, so dropping an entire environment to upgrade the firmware on one box will be even harder, and yet we are still told this is a step forward.
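
    For a rough sense of scale, here is a minimal Python sketch of the consolidation arithmetic, using the figures quoted above (~240 logical machines on ~200 physical hosts today, ~15 hosts after the refresh). It assumes the partitions are spread as evenly as possible across hosts, which a real layout will not be, and the function name is made up for illustration.

```python
import math

def lpars_per_host(lpars: int, hosts: int) -> int:
    """Most partitions any one host carries if they are spread as evenly as possible."""
    return math.ceil(lpars / hosts)

# Figures from the post; the even spread is an assumption.
current = lpars_per_host(240, 200)
planned = lpars_per_host(240, 15)

print(f"current: up to {current} partitions lost per host failure")
print(f"planned: up to {planned} partitions lost per host failure")
```

    An uneven layout, such as most of production sitting on one frame, makes the worst case considerably larger than this even-spread figure.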

    The second question comes from another client (again, it is time to start planning what the next generation of hardware will look like), who is suggesting that a specific application could be moved off AIX/pSeries and onto Linux on VMware, because it will be a cost saving and 'everything should be virtualized, it is the "way of the future"'. Telling the CIO of that client that his solution is not supported by the application vendor, that the vendor sees AIX/pSeries as best practice, and that the application support teams prefer the current setup and would like to just move from p5 to p7, doesn't seem to get through to the people who make the decisions.

    So - in reality - are we going forwards or backwards?
     
  2. Lethal_Lynx

    Lethal_Lynx Member

    Joined:
    Sep 8, 2005
    Messages:
    266
    Location:
    Brisbane
    What about VMware's High Availability or Fault Tolerance features?

    http://www.vmware.com/products/fault-tolerance/overview.html

    How the hell can it be a step backwards? Taking 200 servers and making 15 do the same thing is simply amazing.
     
    Last edited: Apr 14, 2011
  3. Oppressa

    Oppressa Member

    Joined:
    Jun 5, 2004
    Messages:
    5,573
    Location:
    Sydney
    Yeah, these are some great features that have come in handy for me before, running VMware at work. They make the admins' jobs so much damn easier!


    To an extent. I think cloud-hosted is the way of the future. Sure the vendors may be using virtualised systems but the client doesn't really care as long as their server is running and the content is accessible/being delivered.
    I'm looking forward to handing a lot of the maintenance over to major companies with experience in the area.
     
  4. Cape_Horn (OP)

    Cape_Horn Member

    Joined:
    Dec 23, 2001
    Messages:
    2,454
    Location:
    Shooting Baker
    There is no emulation for pSeries within VMware.
    HACMP on pSeries can take up to 3-4 seconds to swap nodes, and this is classed as an 'unacceptable outage'.
    The more complex the environment, the longer the swapover takes.

    Because it introduces a single point of failure which, instead of taking one or two systems out, could take down 30 or 40.


    Maybe that is my issue - I am in one of the companies that you would be throwing your cloud computing over to.
     
  5. 3t3rna1

    3t3rna1 Member

    Joined:
    Dec 24, 2001
    Messages:
    1,452
    Location:
    Perth
    It can work the opposite way: if you detect a problem with a VM host, you can migrate all the VMs off it and fix the issue without downtime. In my opinion, VMware is shit at this and ESX normally just stops responding. Other hypervisors are a lot less prone to flaky behaviour.

    If you are using SAN storage with some redundancy, what can really go wrong with a host?
     
  6. Soarer GT

    Soarer GT Member

    Joined:
    Sep 26, 2007
    Messages:
    2,836
    Location:
    Melbourne
    The SAN has issues and starts "flapping" with the DR SAN... Big mess comes afterwards and the Vendors all blame each other.
     
  7. one4spl

    one4spl Member

    Joined:
    Dec 9, 2005
    Messages:
    428
    Location:
    Jamboree Hts, Brisbane
    Application clustering like HACMP (or Windows Clustering, Oracle Clusters, whatever) is as good as it gets for HA. You can still virtualise under it; just have rules that make sure the application cluster nodes are spread across multiple physical hosts.
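
    The "spread the cluster nodes across hosts" rule can be sketched as a simple placement check. This is a minimal Python illustration only, not any hypervisor's actual API; the function name, host names and node names are all hypothetical.

```python
from collections import defaultdict

def find_affinity_violations(placement, clusters):
    """Return {cluster: {host: [nodes]}} for each host running more than
    one node of the same application cluster."""
    violations = {}
    for name, nodes in clusters.items():
        by_host = defaultdict(list)
        for node in nodes:
            by_host[placement[node]].append(node)
        # Keep only hosts that carry two or more nodes of this cluster.
        shared = {host: members for host, members in by_host.items() if len(members) > 1}
        if shared:
            violations[name] = shared
    return violations

# Hypothetical layout: both "db" nodes have landed on the same physical host.
placement = {"db-a": "host1", "db-b": "host1", "app-a": "host1", "app-b": "host2"}
clusters = {"db": ["db-a", "db-b"], "app": ["app-a", "app-b"]}

print(find_affinity_violations(placement, clusters))
```

    Here the "db" cluster is flagged because a single host failure would take out both of its nodes, while the "app" cluster passes.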

    If the application doesn't support clustering then you can't get too picky about how highly available it is - but something like VMware HA where a failed VM or Host has the VMs booting on another node automatically in about 15 seconds is a lot better than waiting 4+ hours for the vendor to come out and fix a physical host.

    I see no down side in a well thought out and well understood virtualisation solution.
     
  8. Iceman

    Iceman Member

    Joined:
    Jun 27, 2001
    Messages:
    6,647
    Location:
    Brisbane (nth), Australia
    You either have insufficient redundancy or you have it in the wrong place :)
     
  9. grs1961

    grs1961 Member

    Joined:
    Jan 21, 2005
    Messages:
    514
    Location:
    Melbourne
    If the application(s) is(are) critical, and have to go 24x7, and cluster fail-over times just don't cut it, and so on, then the only real solution is to go to Tandem (HP) NonStop, which "Just Fscking Works"(TM)...

    Of course, each application has to be re-written to fit the NonStop rules, and, of course, the hardware itself costs the proverbial metric manure-load per item, but, unlike the vapid promises of VMware/Oracle/MS salesdroids, it really does just work. Even better, the sales and support contracts specify what "Just Works" means.

    One trick - which I have used before - get the vendor of the AIX application to quote a full custom port from Power to x64 architecture - usually the sheer number of digits gets through the buzzword-induced blindness.
     
  10. 4wardtristan

    4wardtristan Member

    Joined:
    Apr 9, 2008
    Messages:
    1,181
    Location:
    brisbane
    I will happily admit I have no idea what the OP is talking about in regard to pSeries IBM kit etc.

    One thing I learned about VMware FT is that it has huuuge limitations:

    1 vCpu (NO SMP!)
    NO RDM
    no snapshots
    dedicated NIC
    a lot of other things; can't be bothered getting my VCP 4.10 books from my car

    Point is, most of those are killers for extremely small configs (2-3 host setups), let alone at the scale the OP is talking about, I imagine.

    Edit: not sure what vSphere 5 is bringing in terms of those, but those are the current limitations.

    or wrong HA policies in vmware land?
     
    Last edited: Apr 14, 2011
  11. one4spl

    one4spl Member

    Joined:
    Dec 9, 2005
    Messages:
    428
    Location:
    Jamboree Hts, Brisbane
    It may be a step backwards for applications that were written from the beginning on platforms built to be highly available, or applications that are highly available in and of themselves, but full (not para-) virtualisation brings an extra 9 or two to applications that have no idea how to do high availability by themselves.

    Google, Facebook, et al. have figured out how to do high availability on commodity hardware by using software that's been built with high availability in mind. I doubt there's much future in custom hardware like NonStop - not that it's going away any time soon; as mentioned above, it's stupendously expensive to rebuild complex applications for a new platform. You'd want to have an exit strategy though: maintaining software on a dying platform also gets quite expensive after a while.
     
  12. Cape_Horn (OP)

    Cape_Horn Member

    Joined:
    Dec 23, 2001
    Messages:
    2,454
    Location:
    Shooting Baker
    Been there, had that.
    (Client - "What do you mean that three heartbeats are not enough? So they all run on the same piece of copper/fibre, so what?")

    Or you have people trying to save money when buying kit.
    Example:
    One of our machines (3 LPARs, 8? WPARs, nothing too big compared with what they want in the near future) has a PSU die. All good, it has 4.
    So then we get the replacement and get told this by hardware support - "You are missing a piece, which means you need to power down the frame to replace a PSU. Oh, and you want to do that sooner rather than later: these two PSUs power the top of the frame, and these two (indicating the broken one and the one next to it) power the bottom half. For them to share power you need this widget..."

    We ARE getting better at being able to do this, but we still get told off by the client for 1-4 sec outages as we push the work from one system to another. Yes, we need the apps to be better written; most of them should be classed as legacy and were written by the client many years ago.
     
  13. Iceman

    Iceman Member

    Joined:
    Jun 27, 2001
    Messages:
    6,647
    Location:
    Brisbane (nth), Australia
    I'm curious, what client notices a 2 second outage and what is the cost of that outage? Are they running some high volume website where 2 seconds of downtime equates to 40,000 hits of missed ad impressions or something?
     
  14. Cape_Horn (OP)

    Cape_Horn Member

    Joined:
    Dec 23, 2001
    Messages:
    2,454
    Location:
    Shooting Baker
    Government client with high expectations.
    It isn't the ad revenue, it is public expectation (so they tell me).
    Next statement - If you are an Australian, you deal with them. (Not Tax)
     
  15. Iceman

    Iceman Member

    Joined:
    Jun 27, 2001
    Messages:
    6,647
    Location:
    Brisbane (nth), Australia
    Oh right, it's a paper requirement not an actual one. "Website must have 6 nines of uptime, not 5 nines. We can't afford that 6.05 seconds per week!"
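
    For anyone checking the arithmetic, a minimal Python sketch of how a count of nines translates into allowed downtime per week. The function name is made up for illustration, and "N nines" is taken to mean an unavailability of 10^-N.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800

def downtime_per_week(nines: int) -> float:
    """Allowed downtime in seconds per week for an availability of N nines."""
    unavailability = 10 ** -nines  # e.g. 5 nines -> 0.00001
    return SECONDS_PER_WEEK * unavailability

for n in (3, 4, 5, 6):
    print(f"{n} nines: {downtime_per_week(n):.2f} s/week")
```

    Five nines comes out at about 6.05 seconds per week and six nines at about 0.6 seconds, so a 1-4 second swapover each week would fit inside five nines but not six.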

    Show me a government department where website downtime is legitimately their biggest concern :p
     
  16. DavidRa

    DavidRa Member

    Joined:
    Jun 8, 2002
    Messages:
    3,077
    Location:
    NSW Central Coast
    I think you will find they are attempting to use a measurement of availability to specify a desired level of resiliency to failure, and desired MTO/RTO times.

    Instead, they should actually be specifying those MTO and RTO times, and RPOs for data.
     
  17. Cape_Horn (OP)

    Cape_Horn Member

    Joined:
    Dec 23, 2001
    Messages:
    2,454
    Location:
    Shooting Baker
    It isn't a website (specifically) that they want the uptime on.
     
