Random Reboots (BSOD) on Windows 2003 R2 Server

Discussion in 'Business & Enterprise Computing' started by Smileyville, May 5, 2009.

  1. Smileyville

    Smileyville New Member

    Joined:
    May 5, 2009
    Messages:
    4
    I've been getting random BSOD on my Production server which is a:

    HP DL380G5
    2 Dual Core - Intel Xeon 5140@2.33 Ghz
    4 GB of Ram
    Windows 2003 R2 Server

    Applications: Domain Controller, Backup Exec 11d, BES Server (Integrated with Exchange 2007)


    Here are the errors,

    5/1/09 9:00:40 AM System Error, Event ID 1003

    Error code 0000000000000050, parameter1 fffffffffffffff4, parameter2 0000000000000000, parameter3 fffffadf8ffa7bce, parameter4 0000000000000000.

    5/1/09 11:46:45 AM System Error, Event ID 1003

    Error code 0000000000000109, parameter1 a3a03a389044c579, parameter2 0000000000000000, parameter3 344d331fc1d292e4, parameter4 0000000000000101.

    5/4/09 10:33:40 AM System Error, Event ID 1003

    Error code 0000000000000109, parameter1 a3a03a388fa3a509, parameter2 0000000000000000, parameter3 0f9ee1f26eefb8e5, parameter4 0000000000000101.

    4/27/09 10:26:57 AM System Error, Event ID 1003

    Error code 0000000000000050, parameter1 fffffffffffffff4, parameter2 0000000000000000, parameter3 fffffadf8ffa7bce, parameter4 0000000000000000.

    4/21/09 9:59:53 AM System Error, Event ID 1003

    Error code 00000000000000c2, parameter1 0000000000000007, parameter2 000000000000121a, parameter3 0000000000000000, parameter4 fffffadffde3c120.

    4/17/09 10:47:07 AM System Error, Event ID 1003

    Error code 00000000000000c5, parameter1 0000000000000008, parameter2 0000000000000002, parameter3 0000000000000001, parameter4 fffff800011a9b28.


    4/17/09 10:46:01 AM System Error, Event ID 1003

    Error code 00000000000000c2, parameter1 0000000000000007, parameter2 000000000000121a, parameter3 0000000000000000, parameter4 fffffadffe85e010.


    4/13/09 11:00:54 AM System Error, 1003

    Error code 00000000000000c2, parameter1 0000000000000007, parameter2 000000000000121a, parameter3 0000000000000000, parameter4 fffffadffe85e010.

    4/9/09 12:54:24 PM System Error, 1003

    Error code 00000000000000c2, parameter1 0000000000000007, parameter2 000000000000121a, parameter3 0000000000000000, parameter4 fffffadffe85e010.

    4/8/09 8:59:41 AM System Error, 1003

    Error code 00000000000000c2, parameter1 0000000000000007, parameter2 000000000000121a, parameter3 0000000000000000, parameter4 fffffadffe85e010.

    4/2/09 4:55:12 PM System Error, 1003

    Error code 000000000000000a, parameter1 0000000000000000, parameter2 0000000000000002, parameter3 0000000000000001, parameter4 fffff800010e9332.

    4/2/09 3:50:54 PM System Error, 1003

    Error code 00000000000000c2, parameter1 0000000000000007, parameter2 000000000000121a, parameter3 0000000000000000, parameter4 fffffadffe8da010.

    3/30/09 1:46:52 PM System Error, 1003

    Error code 00000000000000c5, parameter1 0000000000000008, parameter2 0000000000000002, parameter3 0000000000000001, parameter4 fffff800011a9b28
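
    For anyone reading along: the stop codes in the events above can be tallied with a short script. This is only a minimal Python sketch, not something from the original post. The helper names (`parse_event`, `tally`) and the small `BUGCHECK_NAMES` table are hypothetical and exist purely for illustration; the names in the table follow Microsoft's bug check code reference and cover only the codes listed above.

```python
import re
from collections import Counter

# Partial, illustrative table: only the stop codes seen in the events above.
BUGCHECK_NAMES = {
    0x0A: "IRQL_NOT_LESS_OR_EQUAL",
    0x50: "PAGE_FAULT_IN_NONPAGED_AREA",
    0xC2: "BAD_POOL_CALLER",
    0xC5: "DRIVER_CORRUPTED_EXPOOL",
    0x109: "CRITICAL_STRUCTURE_CORRUPTION",
}

# Matches the Event ID 1003 text: "Error code <hex>, parameter1 <hex>, ..."
EVENT_RE = re.compile(r"Error code ([0-9a-fA-F]+)((?:, parameter\d [0-9a-fA-F]+)*)")
PARAM_RE = re.compile(r"parameter\d ([0-9a-fA-F]+)")


def parse_event(line):
    """Return (stop_code, [param1..param4]) as integers, or None if no match."""
    match = EVENT_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1), 16)
    params = [int(p, 16) for p in PARAM_RE.findall(match.group(2))]
    return code, params


def tally(lines):
    """Count events per stop code, using the symbolic name when it is known."""
    counts = Counter()
    for line in lines:
        parsed = parse_event(line)
        if parsed is not None:
            code, _params = parsed
            counts[BUGCHECK_NAMES.get(code, hex(code))] += 1
    return counts
```

    Feeding it the event text pasted from Event Viewer gives a count per stop code, which makes it easier to see which one dominates before opening the dumps in WinDbg.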



    Yes, it was stable for months, then this just started happening. The backup application, the Quantum SuperLoader 3 tape drive, the Symantec Endpoint AV client and BES were all installed from day 1. Windows updates and updates to the BES application have been applied, but that is about it.

    I ran a memory test for 4 passes and all came up clean. I have continually been trying to link any changes or errors with the times it drops, but have come up empty-handed.


    I have another DL380 G5 server, purchased at the same time with the same configuration, that is stable. It is running my Exchange 2007 environment. There is a slight difference in the hardware: the Exchange server has two controller cards and 6 drives, compared to this system's 1 controller card and 2 hard drives.


    Any Help is appreciated.
     
  2. scrantic

    scrantic Member

    Joined:
    Apr 8, 2002
    Messages:
    1,682
    Location:
    3350
    Have you installed the latest HP Driver support pack?
     
  3. Spingo

    Spingo Member

    Joined:
    Jun 28, 2001
    Messages:
    1,052
  4. Parra_Boy

    Parra_Boy Member

    Joined:
    Mar 10, 2002
    Messages:
    294
    Location:
    Sydney
  5. Ashpool

    Ashpool Member

    Joined:
    Feb 24, 2003
    Messages:
    3,352
    Location:
    Ye Olde Melbourne Town
    Latest HP support pack is the way to go. You will need to install it if you want to deal with HP tech anyway as this will be their first port of call.
     
  6. bcann

    bcann Member

    Joined:
    Feb 26, 2006
    Messages:
    5,089
    Location:
    NSW
    you need to supply more info, such as the name of the file that caused the BSOD, for starters. But i'd use the MS debug tools, i've solved many a BSOD with them.
     
  7. mrpo

    mrpo Member

    Joined:
    Feb 24, 2005
    Messages:
    229
    Location:
    Oxford UK
    Couple of things you should do first. If you think it is HP hardware related, then do install the latest "ProLiant Support Pack" (PSP), but also run the latest "Firmware Maintenance CD" for your server; this will update *all* firmware on the machine. HP's recent version control for BIOS releases has gone to pot, so make sure you check the date of the latest BIOS version in case it supersedes the one on the firmware disk. You may as well reset the NVRAM for whatever good that might do; that's normally the next step if you log a call with HP.

    If that doesn't fix it then it's Debug time :D
     
    Last edited: May 5, 2009
  8. OP
    OP
    Smileyville

    Smileyville New Member

    Joined:
    May 5, 2009
    Messages:
    4
    Thank you for all your suggestions. I'll answer what I can thus far.

    As for the HP Support Pack, I've been trying to apply the latest (and yes, it's the one for the 64-bit environment), but it always comes back with "Discovery Failed".

    I updated the video driver (ATI ES1000). The BIOS is currently at HP P56 11/1/08; guess I'll be updating that, thought I had already.

    I have been looking at the minidumps using Microsoft WinDbg for x64. Unfortunately, I'm not very good at reading these, and the results vary a bit. I do have a couple I can attach if anyone wants to analyze them.

    I'll update the firmware and see how it goes. It also seems that Symantec Endpoint can cause a crash while doing a full-system scan, so that's something to throw into the mix.
     
  9. looktall

    looktall Working Class Doughnut

    Joined:
    Sep 17, 2001
    Messages:
    24,186
    well you've found the event id, i'm assuming from event viewer.
    if you look a bit closer you'll see an entry for the event source.

    searching the event id and source on www.eventid.net will probably go some way towards finding a cause and solution.
     
  10. rager

    rager Member

    Joined:
    Jun 28, 2001
    Messages:
    650
    Location:
    Melbourne
  11. OP
    OP
    Smileyville

    Smileyville New Member

    Joined:
    May 5, 2009
    Messages:
    4
    Thank you for your continued suggestions.

    All the events noted earlier in this thread use the same Event ID and source. I have looked at Eventid.net; in fact, it's usually one of my first stops. lol... However, the information is not very valuable to me in this particular instance because there are way too many variables.

    I ran the Firmware Maintenance CD and let it update the firmware. Things are so far functioning. I did attempt to apply the 1.82 (17 Apr 2009) Firmware Storage Update as recommended, but it didn't work properly; it kept hanging and would not apply. :Paranoid: Thankfully, the system is functional after this attempt.

    Since this is random, it's a waiting game right now. I'm also looking into updating the Backup Exec application from 11d to 12.5 to see if that helps as well.

    I'll keep you posted if I find the fix. If anyone else has experienced the same issues and has done what I have so far (with everyone's suggestions, thanks), please continue on this thread and I will try your suggestions.
     
  12. therazza

    therazza Member

    Joined:
    Aug 19, 2004
    Messages:
    1,384

    Mate do this, I've hit it twice and was about to post telling you to do this ASAP!
     
  13. OP
    OP
    Smileyville

    Smileyville New Member

    Joined:
    May 5, 2009
    Messages:
    4
    Been sitting tight since the firmware updates, but the server crashed again.

    System Error 102, Event ID 1003

    Error code 0000000000000024, parameter1 000000000019033d, parameter2 fffffadf89389530, parameter3 fffffadf89388f40, parameter4 fffffadf8ffa7ddc.


    I went to look at the Memory.dmp with WinDbg, but it looks like something is messed up. I had moved the page file to another partition and accidentally changed the path to the dump file. I rebooted, then realized what I had done and put it back to %SystemRoot%\MEMORY.DMP. Would that have messed up the dump file?

    So, thoughts on next steps? Any other suggestions on the HP Support Pack, since the HP Update Manager always gets a "Discovery Failed"? I know in older products this utility worked great! Or am I simply executing the wrong file? Or, what do you recommend for the individual cp files?
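
    For reference, the dump settings mentioned above live in the registry under HKLM\SYSTEM\CurrentControlSet\Control\CrashControl. The following is a minimal Python sketch for checking what is currently configured; it is not from the original post. The function names `describe` and `read_crash_control` are hypothetical, and it assumes a Python 3 interpreter is available on the machine, which may not be practical on a Windows 2003 box. It only reads the settings and says nothing about whether an existing dump file is intact.

```python
CRASH_CONTROL_KEY = r"SYSTEM\CurrentControlSet\Control\CrashControl"

# Meaning of the CrashDumpEnabled DWORD on Windows Server 2003.
DUMP_TYPES = {
    0: "none",
    1: "complete memory dump",
    2: "kernel memory dump",
    3: "small memory dump (minidump)",
}


def describe(settings):
    """Turn raw CrashControl values into a readable summary."""
    return {
        "dump_type": DUMP_TYPES.get(settings.get("CrashDumpEnabled"), "unknown"),
        "dump_file": settings.get("DumpFile"),
        "minidump_dir": settings.get("MinidumpDir"),
    }


def read_crash_control():
    """Read the relevant CrashControl values (Windows only)."""
    import winreg  # imported here so describe() stays usable on other platforms

    settings = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CRASH_CONTROL_KEY) as key:
        for name in ("CrashDumpEnabled", "DumpFile", "MinidumpDir"):
            try:
                settings[name], _value_type = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                settings[name] = None  # value not present under the key
    return settings
```

    On the server itself, `describe(read_crash_control())` would show the dump type and the paths Windows will write to on the next crash, which is a quick way to confirm the path really is back at its original value.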

    Thoughts?
     
  14. Kizuka

    Kizuka Member

    Joined:
    Sep 22, 2008
    Messages:
    249
    Location:
    SETI ALPHA FIVE!
    I know you said you ran memtest for 4 hours with no errors, but I've seen memtest run for 24 hours with no errors and still found the memory to have been the culprit.

    This server has been in production for how long with no errors and it's only now that it's become unstable? Latent firmware bugs? Can't say I've come across such things. I've seen bad firmware eat a RAID set up within hours/days of operation, but not after months and months of use (saying that, I've not had much experience with HP servers).

    Buy 4GB of new RAM imo...how much would that cost you? Admittedly going down the hardware path might mean you eventually have to start replacing motherboards, CPUs, controller cards etc, but as a cheap first option, replacing the RAM would be my first choice.
     
  15. looktall

    looktall Working Class Doughnut

    Joined:
    Sep 17, 2001
    Messages:
    24,186
    this is especially the case if you are testing multiple sticks at the same time.
    i've had numerous cases where i've tested multiple sticks of ram on a system and received no errors and then tested each of the same sticks individually and found one of them to be faulty.

    it can be a slow process to test them all individually and in some cases you're better off just replacing all of the ram to get the system up and running and then testing each stick later when you have time.
     
  16. Swathe

    Swathe (Banned or Deleted)

    Joined:
    Mar 23, 2007
    Messages:
    2,512
    Location:
    Rockhampton
    Even if they pass it's always good to have a spare set of ram for your machines imo.
     
  17. novakain

    novakain Member

    Joined:
    Mar 18, 2004
    Messages:
    33
    I had this exact same thing happen to me, BSODs and all, with one of our servers. I found that applying the patches under Windows caused the system to hang, and when I tried booting from the maintenance CD, the CD would come up with a "discovery failed" error. In the end I had to run all the updates from within Windows except the controller update, reboot the server to let it apply the firmware updates to the hard disks, then run the disk controller update from Windows after the reboot.

    This was on a Proliant 385 G2 with a P400 RAID controller though.
     
  18. rager

    rager Member

    Joined:
    Jun 28, 2001
    Messages:
    650
    Location:
    Melbourne
    Sorry, I think you should actually have a P400 controller - not the e200.

    Check your System Management Homepage for hardware on the system, then verify what firmware versions need upgrading.

    On the HP SmartStart CD there is a hardware diagnostics tool that can be pretty handy. It will help diagnose memory faults etc., and you can boot off the CD to run it.

    There are basically 2 CDs you need to update everything on the server: the HP Firmware update CD and the latest ProLiant Support Pack. But some firmware and drivers are updated in between these CD versions.

    Failing all this, log a call with HP support and they can help guide you with what you should update.
     
  19. spokeydokey

    spokeydokey Member

    Joined:
    Jan 29, 2004
    Messages:
    275
    One possibility may be a processor mismatch.

    I've been told recently of a problem with mismatched processors, the stepping code in particular. The processors can be virtually identical except for stepping codes and this will cause problems such as seen in the link below;

    http://social.answers.microsoft.com.../thread/5c2f8f77-e014-4ced-9b47-320b77a19f8b/

    The processors themselves will have a code such as SLABN (core stepping B2) or SLAGB (core stepping G0). The link below shows the details for the SLABN:

    http://processorfinder.intel.com/details.aspx?sSpec=SLABN

    Also, HP Firmware Maintenance is now up to 8.5 (since somewhere in May I believe).

    I hope this helps.

    Edit: Installing the lower stepping code revision processor in processor slot 1 and the higher revision in slot 2 may be a way around this;

    http://h20000.www2.hp.com/bizsuppor...15351&prodSeriesId=1121516&objectID=c01271775
     
    Last edited: Aug 17, 2009
