  leibold Premium,MVM join:2002-07-09 Sunnyvale, CA clubs: 
| reply to galacticroot Re: [FreeBSD] Disk read corruption issues on server.
That is what I was looking for. It is always the same bit in a 128-bit (16-byte) word (it would be an even larger address space if it weren't for that one error at 0xb860592). As elegant as your XOR trick is in highlighting the defective bit, it hides whether it is always the same kind of change (0 to 1 or 1 to 0) or whether it is random (my guess would be that it is always the same change). If the corruption were happening on a serial bus (such as the SATA cables to your disk drives) or on a narrow parallel bus (e.g. a 32-bit PCI bus), then the defect would show up in other bit positions as well.
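To see the direction of the change as well as its position, something like the following could be run over a known-good copy and a corrupted copy of the same data. This is only a rough sketch: the function names are made up for illustration, the two buffers are assumed to be the same length and aligned to a 16-byte boundary, and bit 0 is taken to be the least significant bit of the first byte in each 16-byte word (your numbering may differ).

```python
from collections import Counter

WORD_BYTES = 16  # 128-bit word, as discussed above


def classify_bit_flips(good: bytes, bad: bytes):
    """Return (byte_offset, bit_in_word, direction) for every differing bit.

    Hypothetical helper: 'good' is the reference data, 'bad' the corrupted read.
    """
    if len(good) != len(bad):
        raise ValueError("buffers must have the same length")
    flips = []
    for offset, (g, b) in enumerate(zip(good, bad)):
        diff = g ^ b  # XOR shows which bits changed, but not how
        if diff == 0:
            continue
        for bit in range(8):
            mask = 1 << bit
            if diff & mask:
                # The known-good byte tells us the direction of the flip.
                direction = "1->0" if g & mask else "0->1"
                # Assumed numbering: LSB of the first byte in the word is bit 0.
                bit_in_word = (offset % WORD_BYTES) * 8 + bit
                flips.append((offset, bit_in_word, direction))
    return flips


def summarize(flips):
    """Count occurrences of each (bit position in word, direction) pair."""
    return Counter((bit_in_word, direction) for _, bit_in_word, direction in flips)


if __name__ == "__main__":
    # Synthetic data only: the same bit is forced to 1 in two different words.
    good = bytes(48)
    bad = bytearray(good)
    bad[5] |= 0x04
    bad[21] |= 0x04
    print(summarize(classify_bit_flips(good, bytes(bad))))
```

If the summary shows a single bit position with a single direction, that would fit the stuck-cell theory; a mix of directions or positions would point elsewhere.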
This is very typical for a single bad memory cell; for it to be anything else, it would have to be in an area where you have a wide parallel bus (such as a dual-channel memory interface, which is 128 bits wide). However, if it were the main memory interface or one of the CPU caches, I would expect more serious problems in keeping the system running. I would also expect memtest86/memtest86+ to detect those errors.
My guess is either the memory on the RAID controller or a hard disk cache memory chip (neither of which can be tested with memtest). I don't think you will be able to narrow it down further without swapping parts.
P.S.: rereading your posts I don't see how I got the wrong impression on what your conclusions were. Sorry! |