

leibold
Premium,MVM
join:2002-07-09
Sunnyvale, CA
kudos:2

reply to galacticroot

Re: [FreeBSD] Disk read corruption issues on server.

That is what I was looking for. It is always the same bit in a 128-bit (16-byte) word (it would be an even larger address space if it weren't for that one error at 0xb860592). As elegant as your XOR trick is in highlighting the defective bit, it hides whether the change is always of the same kind (0 to 1 or 1 to 0) or random (my guess would be that it is always the same change). If the corruption were happening on a serial bus (such as the SATA cables to your disk drives) or on a narrow parallel bus (e.g. a 32-bit PCI bus), the defect would show up in other bit positions as well.
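To see what the XOR hides, something along these lines could report the direction of each flip alongside its position in the 16-byte word. This is only a rough sketch: the function names, the bit numbering (bit 0 is the least significant bit of the first byte in the word) and the sample data are my own assumptions, not anything taken from your dumps.

```python
from collections import Counter

WORD_BYTES = 16  # 128-bit word, as discussed above


def classify_flips(good: bytes, bad: bytes, base_offset: int = 0):
    """Yield (byte offset, bit position in the 128-bit word, direction) per flipped bit."""
    for i, (g, b) in enumerate(zip(good, bad)):
        diff = g ^ b  # XOR marks which bits differ, but not which way they changed
        for bit in range(8):
            if diff & (1 << bit):
                offset = base_offset + i
                pos_in_word = (offset % WORD_BYTES) * 8 + bit
                # Look at the corrupted byte to recover the direction of the flip.
                direction = "0->1" if b & (1 << bit) else "1->0"
                yield offset, pos_in_word, direction


def summarize(good: bytes, bad: bytes, base_offset: int = 0) -> Counter:
    """Count flips grouped by (bit position in word, direction)."""
    return Counter((pos, d) for _, pos, d in classify_flips(good, bad, base_offset))


# Made-up sample: bit 2 of byte 5 in each 16-byte word is stuck at 1.
good = bytes(32)
bad = bytearray(good)
bad[5] |= 0x04
bad[21] |= 0x04
print(summarize(good, bytes(bad)))
```

If every entry in the summary shares one bit position and one direction, that points to a stuck cell; a mix of directions at the same position would suggest something less stable.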

This is very typical of a single bad memory cell. For it to be anything else, the fault would have to be somewhere with a wide parallel bus (such as a dual-channel memory interface, which is 128 bits wide). However, if it were the main memory interface or one of the CPU caches, I would expect more serious problems in keeping the system running. I would also expect memtest86/memtest86+ to detect those errors.

My guess is either the memory on the RAID controller or a hard disk cache memory chip (neither of which can be tested with memtest). I don't think you will be able to narrow it down further without swapping parts.

P.S.: Rereading your posts, I don't see how I got the wrong impression of what your conclusions were. Sorry!
--
Got some spare cpu cycles ? Join Team Helix or Team Starfire!

over 12.5 years online! © 1999-2012 dslreports.com.