HP Blade 460c G1 Correctable Error Threshold
So Onboard Administrator has been reporting memory errors in the IML (Integrated Management Log). At first we had 2 DIMMs reporting errors: DIMM 2 was reporting 2 correctable and 2 uncorrectable errors, and DIMM 4 was reporting 2 correctable errors.
I called HP and got 2 "new" memory sticks, replaced them, and loaded up HP Insight Diagnostics again, only to find that errors were still being reported. However, the counts were different: DIMM 2 had 2 correctable errors and DIMM 4 had 1 correctable error.
I called HP again and they sent a tech to replace the system board. After it was replaced, I ran Insight Diagnostics again and errors were still being reported, so I called HP once more and they sent new memory sticks again.
I got the new sticks today and installed them. This time, after running Insight Diagnostics, only 1 of the DIMMs reported 1 correctable error.
The blade has been down since Monday. All my VMs are comfortably running on 2 other blades, so I am in no rush to get this blade running and can afford to have it down.
So my question is: do I really need to bother with these kinds of errors? Is it really that bad?
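Technically, ECC should take care of the errors it catches: single-bit errors are corrected on the fly, while multi-bit errors can only be detected (those are the "uncorrectable" ones). In reality, what I've seen are odd slowdowns and errors on systems that run with ECC errors for any appreciable length of time.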
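To illustrate the correctable vs. uncorrectable distinction, here is a toy sketch of an extended Hamming (SECDED) code over 4 data bits. This is an assumption-laden illustration only: real server DIMMs protect much wider words, and HP's memory controllers may use stronger schemes than plain SECDED. The function names `encode` and `decode` are hypothetical. A single flipped bit is located and fixed; two flipped bits are detected but cannot be fixed.

```python
from typing import List, Tuple

# Positions 3, 5, 6, 7 hold data; 1, 2, 4 are Hamming parity; 0 is overall parity.
DATA_POSITIONS = (3, 5, 6, 7)


def encode(data: List[int]) -> List[int]:
    """Encode 4 data bits into an 8-bit extended Hamming (SECDED) codeword."""
    code = [0] * 8
    for pos, bit in zip(DATA_POSITIONS, data):
        code[pos] = bit
    code[1] = code[3] ^ code[5] ^ code[7]  # covers positions with bit 0 set
    code[2] = code[3] ^ code[6] ^ code[7]  # covers positions with bit 1 set
    code[4] = code[5] ^ code[6] ^ code[7]  # covers positions with bit 2 set
    code[0] = sum(code[1:]) % 2            # makes total parity of all 8 bits even
    return code


def decode(received: List[int]) -> Tuple[str, List[int]]:
    """Return (status, data bits); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(received)
    syndrome = 0
    for i in range(1, 8):
        if code[i]:
            syndrome ^= i                  # XOR of set positions is 0 for a valid word
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        code[syndrome] ^= 1                # single error: syndrome points at it (0 = parity bit)
        status = "corrected"
    else:
        status = "uncorrectable"           # even parity but nonzero syndrome: two bits flipped
    return status, [code[p] for p in DATA_POSITIONS]


if __name__ == "__main__":
    word = encode([1, 0, 1, 1])
    word[5] ^= 1                           # one flipped bit
    print(decode(word))                    # ('corrected', [1, 0, 1, 1])
    bad = encode([1, 0, 1, 1])
    bad[2] ^= 1
    bad[6] ^= 1                            # two flipped bits
    print(decode(bad)[0])                  # uncorrectable
```

The point for this thread: a "correctable" count means the code fixed the data, but a module that keeps producing them is more likely to eventually hit a multi-bit error that ECC can only report.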
Given where you are, I'd check the firmware, swap in known-good parts if available, then phone the vendor (edit: and escalate).
(reply to PToN)
Did you replace all of the RAM?
(reply to PToN)
No. I still have 1 DIMM reporting errors.
My HP support contract was all wacky and I had to get it fixed first. Don't ask me how that happened: the chassis, switches, etc. were under contract, everything except the blades...
Anyway, I am just going to keep ordering replacement parts until all the errors go away. It took 3 rounds of replacements to get from 2 DIMMs with errors down to just 1.
This may be obvious to some, but it never hurts to throw it out there since you never know who is reading these threads. Have you tried swapping DIMMs around? See whether the error counts follow the DIMM or stay with the socket. If the board has been replaced, I would think (and hope) any problem with it has been corrected by now, but you never know. After all, given that you've had the memory replaced twice and the board once, I'd have thought the errors would be gone already.
I'm curious, did the "new" DIMMs start at 0 on the error counts?
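The swap test above boils down to a simple decision, sketched here under stated assumptions: the suspect DIMM is moved from its original slot to a new one, whatever DIMM goes into the original slot is known good, and you read the per-slot correctable-error counts off the IML or Insight Diagnostics after the swap. The function `localize_fault` and its inputs are hypothetical, not part of any HP tool.

```python
from typing import Dict


def localize_fault(original_slot: int, new_slot: int, errors_after: Dict[int, int]) -> str:
    """Classify where errors point after moving the suspect DIMM.

    original_slot: slot the erroring DIMM was in before the swap.
    new_slot: slot the suspect DIMM was moved to.
    errors_after: correctable-error count per slot after the swap.
    Assumes the DIMM placed in original_slot is known good.
    """
    at_dimm = errors_after.get(new_slot, 0) > 0        # errors moved with the module
    at_slot = errors_after.get(original_slot, 0) > 0   # errors stayed in the old slot
    if at_dimm and not at_slot:
        return "follows DIMM: replace the module"
    if at_slot and not at_dimm:
        return "stays with slot: suspect the socket or system board"
    if at_dimm and at_slot:
        return "both: more than one fault, test each DIMM on its own"
    return "no errors reproduced: run the test longer before concluding"


if __name__ == "__main__":
    print(localize_fault(2, 6, {6: 1}))  # follows DIMM: replace the module
```

Treating "no errors" as inconclusive rather than as a pass is deliberate: intermittent correctable errors may simply not show up in a short diagnostic run.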
(reply to JoelC707)
No, the new DIMMs showed errors as well. I did move the DIMMs into other slots, and the errors followed the faulty ones.
Does swapping CPU sockets do anything?