
My VmWare nightmare, a warning ...

Discussion in 'Business & Enterprise Computing' started by Nyarghnia, Oct 14, 2010.

  1. Nyarghnia

    Nyarghnia (Taking a Break)

    Joined:
    Aug 5, 2008
    Messages:
    1,274
    Hi all,

    I'm going to post this summary of our VmWare nightmare that I have been living ever since the business decided to go this way...

    Just for starters, we're running brand new HP gear, the latest VSphere 4.1 Essentials Plus with VDR, a SAN, all brand new, even new switch gear, the lot.. a ton of money to replace a couple of racks full of various bits of gear.

    Sad to say, VmWare has turned out to be less than spectacular, despite a top-shelf disk subsystem, hosts with multiple CPUs and tons of RAM, all the required NICs and storage controllers, and a high-capacity fabric...

    I've only got about a dozen VMs at present, so the environment should barely be breaking a sweat. We got in some 'experts' who are VmWare certified to get this up and running for us.

    1. Performance is slow. Disk I/O seems to struggle and I can't figure out why; the hardware isn't being stressed. I can take the same hardware, put something else on it and get 4 or 5 times the I/O throughput.

    2. VDR... complete and utter rubbish. It can't seem to back things up, consumes vast amounts of resources and generates numerous warnings, then randomly decides to either freeze, leave snapshots all over the place, or declare the backup repository corrupt.

    3. Backups... oh my god... backups. We've tried VDR, vRangerPro and now VEEAM, and every single time you run a backup job, random VMs just fail to back up. Hell, I don't even have any serious apps running yet (except Exchange on one VM), and it appears to be COMPLETELY RANDOM which VMs decide that today they're not going to be backed up.

    4. Daily phone calls to the 'experts' result in me spending more and more of my own time trying to sort this shit out; now the phone calls are about 'crisis meetings' and 'definitions of scope', etc.
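
    Random per-VM backup failures like these usually come down to snapshot create/commit operations timing out under I/O load. Until the root cause is found, the pragmatic stopgap is to retry failed jobs and report only the stragglers. A rough sketch of that pattern (the job runner here is a fake that fails randomly, purely to mimic the behaviour described; real tooling would drive the backup product's own API):

```python
import random
import time

def run_backup(vm_name):
    """Stand-in for a real backup job (VDR/vRanger/Veeam all
    ultimately snapshot the VM and copy the delta). This fake
    fails ~30% of the time, mimicking the random failures above."""
    return random.random() > 0.3

def backup_with_retries(vms, attempts=3, delay_s=0):
    """Retry each VM's backup up to `attempts` times; return the
    VMs that still failed so they can be flagged for follow-up."""
    failed = []
    for vm in vms:
        for _ in range(attempts):
            if run_backup(vm):
                break
            time.sleep(delay_s)
        else:  # no break: every attempt failed
            failed.append(vm)
    return failed

if __name__ == "__main__":
    random.seed(42)
    print("still failing:", backup_with_retries(["exchange01", "dc01", "files01"]))
```

    This obviously papers over the problem rather than fixing it, but it turns "random VMs missing" into a concrete list you can hand to the consultants.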

    In short, this project should have been a walk in the park: brand new certified hardware, a solutions provider with a good track record, people with all the appropriate certifications, plenty of resources, and modest computing requirements. Instead I'm weeks behind, the solution provider has had to escalate this to their senior internal management, and e-mails between us are now CC'ed to all sorts of 'senior' people.

    I've now got an environment I don't trust. How the hell can I migrate critical apps onto this infrastructure? The simple answer is that I can't, so I'm now trying to work out how to V2P my Exchange box.

    Moral of the story... don't buy into the hype, the presentations or the testimonials...

    Set up a test lab, or go hire some data centre space for a week; put VmWare, Xen and Integrity side by side and put them through their paces.
    Test your recovery strategy, then decide; don't rely solely on scopes of work, budgets and design documents.

    Actually get the technologies running side by side, and then use that in conjunction with the budgets, scope documents and so on.

    I've learned my lesson the hard way... some can flame me all they like, but I think it's a lesson that's best imparted to others who may be about to embark on the same road.

    If this post gives you pause about leaping onto the VmWare band-wagon... possibly at the expense of making me look even more stupid than I already do... then I'll take the hit if it makes you look twice before leaping.

    -NyarghNia
     
  2. elvis

    elvis Old school old fool

    Joined:
    Jun 27, 2001
    Messages:
    41,802
    Location:
    Brisbane
    This mirrors my experiences almost to the letter (and oddly enough, also on HP gear).

    VMWare is relatively simple to roll out for small locations that don't require any sort of decent performance. Past that (and for any average enterprise), it's utter rubbish.

    It's clear they're surviving on CxO mind share and nothing else. Given that there are now at least 3 other commercial hypervisors out there that are enterprise-ready and all outperform VMWare by several orders of magnitude (particularly under high load and density), VMWare really needs to pull its finger out, and soon. They've survived thus far on their name and an easy-to-use GUI, but that's going to change very rapidly once people figure out they have better choices that cost less and perform better.
     
  3. DavidRa

    DavidRa Member

    Joined:
    Jun 8, 2002
    Messages:
    3,069
    Location:
    NSW Central Coast
    Wow, that really sucks. I'm no VMWare expert - sure I've had some experience - but I know it's generally a pretty good product. One of our sister companies is in the VMWare consulting biz, I'd sure like to know if it's them ...

    Are you using the same SAN LUNs when you do your performance tests? Are all the tests done under the same conditions (e.g. MPIO)? Are all the VMs running with the VM additions? I'm assuming you've done all the normal stuff - BIOSes, drivers ...
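
    On the "same conditions" point: a crude like-for-like probe, run identically inside a guest and on bare metal against the same LUN, at least tells you whether the OP's 4-5x gap survives a controlled test. A minimal sketch (the fsync keeps the page cache from flattering the number; a proper tool like Iometer is still the real answer):

```python
import os
import tempfile
import time

def seq_write_mbps(size_mb=64, block_kb=128):
    """Time a sequential write of size_mb using block_kb blocks,
    returning MB/s. fsync forces the data to disk so the result
    reflects the storage path, not the page cache."""
    block = b"\0" * (block_kb * 1024)
    blocks = (size_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        for _ in range(blocks):
            os.write(fd, block)
        os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)
    return size_mb / elapsed

if __name__ == "__main__":
    print("%.1f MB/s" % seq_write_mbps())
```

    Run it in the VM, then on the same host booted bare-metal (or another identical box), and compare; if the gap closes, the problem is in the storage or alignment config rather than the hypervisor itself.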

    Can I ask which storage provider you're working with? FC, iSCSI or NFS storage?

    Are you at least getting the CPU performance you expect? Had a chance to talk to VMWare themselves?

    Edit: I've seen multiple ESX environments. One (IBM kit, local storage) performed poorly. The other much larger deployment (HP kit, IBM SAN) was reasonably snappy. Funnily enough the latter has just been migrated to the Blue camp...
     
    Last edited: Oct 14, 2010
  4. Skitza

    Skitza Member

    Joined:
    Jun 28, 2001
    Messages:
    3,764
    Location:
    In your street
    Here we go :)

    Sorry to hear you're having so many issues. I can honestly say everything runs pretty peachy here, but I'm still on 3.5; planning to move to 4.1 very soon, mainly for the better iSCSI support.
     
  5. elvis

    elvis Old school old fool

    Joined:
    Jun 27, 2001
    Messages:
    41,802
    Location:
    Brisbane
    One thing we noticed after many months was that VMWare for some reason was doing 32MB (yes, MegaByte) I/Os back to our SAN.

    Once 600+ production VMs across 100+ LUNs all started picking up steam, the SAN nearly shat itself.

    There's work on now to fix that (bringing it down to 128KB I/Os), but it's those sorts of stupid things that make you wonder why the hell such defaults exist.
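
    To put those numbers in perspective, here's the back-of-envelope wire-time arithmetic. The ~400 MB/s figure assumes a 4Gb FC link and is an assumption; substitute your fabric's real usable throughput:

```python
def transfer_ms(io_bytes, link_mb_per_s=400):
    """Wire time in milliseconds for a single I/O on a link.
    400 MB/s is roughly a 4Gb FC link - an assumption, adjust
    for your own fabric."""
    return io_bytes / (link_mb_per_s * 1024 * 1024) * 1000

if __name__ == "__main__":
    for size in (128 * 1024, 32 * 1024 * 1024):
        print("%8d KB -> %.2f ms on the wire" % (size // 1024, transfer_ms(size)))
```

    A single 32MB I/O occupies the link 256 times longer than a 128KB one (80ms vs 0.31ms at 400 MB/s), so anything queued behind it sees a latency spike; multiply that across hundreds of VMs and the SAN's behaviour above is unsurprising.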

    And for the record, even post fix performance still sucks. RHEV on the same hardware runs almost three times the maximum workload of VMWare without breaking a sweat, and costs roughly 1/3 the price. What exactly VMWare's selling point is, I have no idea (particularly when RHEV has a pretty GUI, identical features, and a much better hardware compatibility list).

    What size setup do you have? Often I find small shops run it without issue. Again, try to scale it past a given point, and watch it shit itself.
     
  6. DavidRa

    DavidRa Member

    Joined:
    Jun 8, 2002
    Messages:
    3,069
    Location:
    NSW Central Coast
    Hey elvis just out of interest - is RHEV full stack virtualisation (run a full OS per VM) or is it more like a BSD jail (where the services are isolated but run on bare metal)? Or something in between?
     
  7. elvis

    elvis Old school old fool

    Joined:
    Jun 27, 2001
    Messages:
    41,802
    Location:
    Brisbane
    Full virtualisation, using KVM. (KVM is available in many Linux distros, and is also commercially supported in Ubuntu 10.04LTS and newer, and SLES11 and newer).

    And Windows is 100% supported as a guest OS on RHEV by both Red Hat and Microsoft.

    Indeed, even as a 100% Microsoft shop you could run RHEV for virtualisation, and all-Windows VMs if you wanted. You'd save 70% compared to VMWare for licensing, and need 1/2 the infrastructure for the same workload.
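
    For anyone tempted to trial KVM on existing kit, the only hardware prerequisite is Intel VT-x or AMD-V, which show up as the `vmx` and `svm` CPU flags. A quick capability check (Linux-only; it just parses /proc/cpuinfo):

```python
import os

def kvm_capable(cpuinfo_text):
    """KVM needs hardware virtualisation: Intel VT-x ('vmx')
    or AMD-V ('svm') in the CPU flags."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
    return bool(flags & {"vmx", "svm"})

if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        print("KVM-capable:", kvm_capable(f.read()))
```

    Note the flags can also be disabled in the BIOS even when the CPU supports them, so an empty result is worth double-checking there.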
     
    Last edited: Oct 14, 2010
  8. gwills

    gwills Member

    Joined:
    Jan 14, 2005
    Messages:
    410
    Location:
    Melbourne
    Yeah, VmWare always sounds good, but we had a similar experience. VDR is terrible; we only use it at remote sites as a fail-safe in case a machine gets corrupted, and we back up the file system separately.

    You may have misalignment on your LUNs, which can cause performance issues. We have this problem at the moment, and it's worth checking even on top-shelf, no-expense-spared hardware.
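
    Worth checking, since older partitioning tools start the first partition at sector 63 (offset 32,256 bytes), which straddles the stripe boundary on most arrays and can turn one guest I/O into two back-end I/Os. A quick sanity check of a partition's start offset against the array's stripe size (the 64KB default below is an assumption; use your array's actual value):

```python
def is_aligned(start_sector, stripe_kb=64, sector_bytes=512):
    """True if a partition's start offset lands exactly on a
    stripe boundary of the underlying array."""
    return (start_sector * sector_bytes) % (stripe_kb * 1024) == 0

if __name__ == "__main__":
    # Classic MS-DOS default of sector 63 is misaligned; starting at
    # sector 128 (64KB) or 2048 (1MB) lands cleanly on the boundary.
    for sector in (63, 128, 2048):
        print("sector %4d aligned: %s" % (sector, is_aligned(sector)))
```

    You can read the start sector from `fdisk -lu` on the host (or diskpart inside a Windows guest) and feed it in; guest partitions need checking too, not just the VMFS volume.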
     
  9. DavidRa

    DavidRa Member

    Joined:
    Jun 8, 2002
    Messages:
    3,069
    Location:
    NSW Central Coast
    Nyarghnia ... the other question to ask is whether your guests are 32-bit or 64-bit - I found in the past that 32-bit performed OK but 64-bit was beyond the pale. Are they all Windows or a mixed environment?
     
  10. Oxley

    Oxley Member

    Joined:
    May 31, 2006
    Messages:
    484
    Location:
    Old Bar NSW
    Please don't, you're scaring me!

    We're about to virtualise here, and we're going with VMWare.

    But all databases and file sharing will stay physical, with backups to VM images as part of the DR plan; all non-critical systems such as WSUS, AV, remote access etc. will be P2V'd.
    Our main reasoning is that if the fit really hits the shan (and we are a very budget/ghetto set-up here; we have systems that are so old and one-of-a-kind that all the support people have retired), as long as we can get to the VMDK files, the last resort is a PC with VMWare Player, which is something Citrix doesn't have.
    We didn't look seriously at any other providers, due to lack of local support, or vendors with real-world or similar industry experience, or whose name was Bubba but who kept telling me it was Ben Dover; yes, $5k just to talk to us and for a demonstration is rude.
     
  11. Brad2k4

    Brad2k4 Member

    Joined:
    Aug 6, 2004
    Messages:
    152
    Location:
    Rockhampton
    +1
    Except we're going all VMs besides our backup server, which will also run vCenter. Arggghhh!
     
  12. 4wardtristan

    4wardtristan Member

    Joined:
    Apr 9, 2008
    Messages:
    1,181
    Location:
    brisbane
    We're running 3 x decent-ish hosts off a DS3400 with 3 disk shelves, and it's running quite nicely. Definitely none of this performance-issue jazz. This is ESX 3.5, BTW.

    However, Citrix XenServer Enterprise at $98/server/month, unlimited sockets and VMs, is quite tasty...
     
  13. Skitza

    Skitza Member

    Joined:
    Jun 28, 2001
    Messages:
    3,764
    Location:
    In your street
    Ours is small, no doubt, with around 20 VMs spread over two Dells with local storage, but we have Exchange/SQL/Sharepoint/Terminal Servers/Fileshares/AV/everything else all virtual, and it runs fine. The boxes aren't stressed either. Would love to get my hands on a bigger, proper setup to see where it's all failing and what people call a failure.
     
  14. Falkor

    Falkor Member

    Joined:
    Jun 27, 2001
    Messages:
    4,060
    Location:
    Sydney
    Do you have more info?

    How many hosts are we talking? What specs are they? How many guests? 32-bit or 64-bit?

    What storage are you running? FC or iSCSI? Etc.

    Interested to see all your specs.

    Interesting that you put Exchange in VMWare as well; I don't know of many people who recommend that. Not saying it's not possible, just that I don't think many people do it.
     
  15. Nikoy

    Nikoy Member

    Joined:
    Mar 10, 2004
    Messages:
    2,972
    Location:
    Perth WA
    I have a Dell EMC SAN with 2 disk shelves and dual FC switches. I have about 28 virtual servers running on 3 x Dell servers: 2 older ones, approx 2.5 years old, with 32 GB of RAM and Xeon processors, and one new one with 64 GB of RAM and dual Xeons. I am getting no performance issues. I also have Cisco switches with a 10Gb backplane and 6-8 NICs per host. I am running SQL, Exchange and fileshares, most of the environment, on it, running VMware 4.1 and doing backups via Backup Exec.

    All is running to scope.
     
  16. Jimoin

    Jimoin Member

    Joined:
    Jul 26, 2002
    Messages:
    579
    Location:
    Melbourne
    Look, I'm not a vmWare fanboy or anything, but I'd just like to post some balance here.

    We're migrating to vSphere 4.1 right now and have gone through a reasonable testing phase before even starting the migration.

    We're on IBM gear in a H series blade chassis. We're moving from old HS20's to HS22's and using a DS4300 (still) with only 28 disks populated over 2 shelves.

    I'm the first to admit our resource use is quite low, but we run Oracle, SQL & Exchange all from the SAN and in virtually all of our testing there was no large discrepancy between running in a VM vs base hardware and in some instances an improvement for I/O.

    No, vmWare is not for everyone, other technologies are superior in a lot of ways, but for some companies, vmWare is the easy choice much like Windows was quite a few years ago.

    I'm mostly shocked to hear that you actually forked out $$ before getting this sort of problem out of the way. I mean, hardware you had to buy anyway, but paying for software? Crazy.

    Edit: Seems a few positive posts above mine since I started writing.
     
    Last edited: Oct 14, 2010
  17. elvis

    elvis Old school old fool

    Joined:
    Jun 27, 2001
    Messages:
    41,802
    Location:
    Brisbane
    This is what I'm talking about. These are tiny rollouts.

    We're at over 40 physical hosts (more on the way), and well over 700 guests (growing at about 1 per business day), all spread across 3 SANs with 6-8 shelves per SAN.

    At that size, VMWare utterly craps itself. We're now at a point where we're forced to go 10GbE (HP Flex10) just to give enough dedicated bandwidth to VMWare's heartbeat! I've never in my life seen a clustered system that needed that sort of spec just for heartbeat.

    Our issues started at about half this size, and due to the rapid growth of the business have not subsided at all.

    What's worth noting though is that both HP-UX Integrity Virtual Machine and RHEV are running in the data centre, and both are scaling without a problem.

    Sadly we have some die-hard pro-VMWare guys in the team, and along with them management who live by the "it's the most popular so we'll use it" mantra. Thanks to them we're haemorrhaging cash on consultants and support trying to keep the thing stable enough to get through a production week without major incidents.

    Within the Linux/UNIX team (that I run), we've managed to get most of the important stuff off VMWare. Our critical incidents since doing so are zero (compared to the dozens in the same time over in VMWare/Windows land), yet for some reason the powers that be can't seem to correlate the stats with the choices of platform. Instead they blame all sorts of other things. One middle manager has blamed "unnecessary large file copies", and has now banned all copies over 500MB in size during production hours! Yes, it really is getting that ridiculous (particularly when multi-TB copies between UNIX boxes don't even cause a blip on our radar, but he doesn't seem to want to listen to that).

    I swear that management ignorance when it comes to brand names is getting worse as time goes on.

    I personally have chosen to pull right out of supporting anything within the business related to VMWare, much to the pain of my manager. I've told him that as long as he continues to choose software that is clearly substandard, he can pay a third party to struggle with fixes. The proof is in the pudding: we're spending a fraction of the dollars on both licensing and support of alternative platforms with zero issues, while other teams blow millions out of our budgets paying for and supporting useless software that causes more business-affecting issues than it solves.

    I'd have some sympathy if it was a business-critical or business-specific piece of software, but it isn't. VMWare is utterly commodity and can be replaced by no less than three alternative technologies from dozens of other companies. When you look at it that way, there's little argument left (other than close mindedness, incompetence or laziness) to stick with it at our infrastructure size.
     
  18. Iceman

    Iceman Member

    Joined:
    Jun 27, 2001
    Messages:
    6,647
    Location:
    Brisbane (nth), Australia
    That's a serious configuration. Which VMware specialist is responsible for the design of this beast that's shitting itself? Do any of their other similar size sites have the same issue?

    That doesn't sound right for a heartbeat. What do VMware say about this?
     
  19. samus

    samus Member

    Joined:
    Jun 3, 2002
    Messages:
    1,264
    Location:
    Baulkham Hills, Sydney.
    I'll add another positive VMWare experience:

    I run a very small shop, only 60 or so staff, and I use IBM kit, as some of you know. Using vSphere 4.1 and ESX 4.1, running Exchange, AD and SQL all on VMs, and performance is fine. This is Fibre Channel onto a DS3400 with all slots used, so 12 drives. Backups go onto an IBM tape library.

    I think if we were any bigger my setup would not work at all. Even so, in a few years I won't be renewing my VMware licensing.
     
  20. elvis

    elvis Old school old fool

    Joined:
    Jun 27, 2001
    Messages:
    41,802
    Location:
    Brisbane
    Long story, and not one I'm going to repeat on a public forum.

    All I'll say on the matter is that the individual who "designed" (and I use that term lightly) the original configuration has since left (thank God).

    The consultancy mob who have since come on to help gain most of their clients through very dodgy and borderline corrupt methods. I have zero faith in any of them, and quite frankly they're a bunch of snakes.

    We get told by the consultants that we're the odd one out. Yet everyone I speak to falls into one of two categories:

    1) Small setup that works well

    2) Large setup that works poorly.

    I've yet to talk to anyone in person who has a setup as large as ours that works as advertised.

    VMWare recommendations for our size rollout are sparse at best. You would think that they of all people would have tested this size of infrastructure, but despite asking until I'm blue in the face, I've not seen a single whitepaper, best practice guide, or even KB article that details the sort of rollout we're dealing with.

    At best we get random "experts" talk to us (usually once, and then they're never heard from again despite numerous emails and phone calls) who recommend different things. These "experts" get introduced as "the smartest person at VMWare", or "the best tech on the ground" and other nonsense. These guys come in all arrogant and smug, scoff at our setup, make one or two changes (which 3 out of 4 times now have taken down production) and then leave with their tail between their legs.

    No doubt my frustration in these posts and others is becoming quite apparent, but as you can tell it's been 18 months of hell for me dealing with this crap. This is software I didn't choose, I didn't design, and I don't want. Yet somehow it's on my shoulders not only to support it, but to add salt to the wounds I'm supposed to remain positive and chipper about the whole thing to the rest of the business.

    FWIW, I've tried quitting twice now. Typed letter handed to senior management both times. Both times I've been dragged back into CxO offices for "emergency meetings" where they scramble to make any change I want to keep me. As of yet, getting rid of VMWare is the last remaining condition. In all fairness, we've got 18 months left on our current enterprise license. The CIO has made it well known he wants full proof-of-concept RHEV and MS Hyper-V environments running by the end of the year (which we've completed for RHEV). That in itself should give some indication of what's in store for VMWare at our location.

    I agree. Both our consultants and VMWare tell us we're one of the biggest locations, but I don't believe that at all. zVM can handle thousands of systems on a single box. How the hell is 700 VMs "large"? And how the hell does the whole thing choke at such average workloads? Utterly pathetic.
     
    Last edited: Oct 14, 2010
