Speaking strictly about the drive with serial number 5RX4060D --
The only anomaly shown here is a high number of CRC errors: 451 accumulated over the course of 1858 hours.
The SMART error log reports a total of 507 errors, but it only has room to store the 5 most recent ones, so there is no way to tell how long this issue has been going on. The most recent error occurred at 13787 power-on hours (roughly 1422 hours before the SMART attribute snapshot was taken). Here is that entry:
Error 507 occurred at disk power-on lifetime: 13787 hours (574 days + 11 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 37 98 c3 42 40 Error: ICRC, ABRT 55 sectors at LBA = 0x0042c398 = 4375448
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 ff b8 17 c3 42 40 00 00:20:02.829 READ DMA EXT
25 ff b8 57 c2 42 40 00 00:20:02.827 READ DMA EXT
25 ff b8 97 c1 42 40 00 00:20:02.826 READ DMA EXT
25 ff b8 d7 c0 42 40 00 00:20:02.825 READ DMA EXT
25 ff a8 27 c0 42 40 00 00:20:02.824 READ DMA EXT
This log entry indicates that the drive was handling a series of 48-bit read requests (READ DMA EXT) for sequential, increasing LBAs when the most recent one (timestamp 00:20:02.829, which is measured from the drive's power-up rather than being a wall-clock time) failed with a protocol-level CRC error and returned ABRT status, as shown by "Error: ICRC, ABRT" in the register line. Your controller driver and/or the OS should have noticed this condition, since the error was reported all the way back to the OS. The OS may then have retried the read successfully.
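The register bytes in an entry like this can be decoded by hand. Below is a minimal Python sketch, assuming the standard ATA Error/Status register bit layout and the 28-bit LBA layout (SN = low byte, then CL, CH, and the low nibble of DH); the helper names are hypothetical and only a subset of the Error register bits is named. Besides confirming ICRC + ABRT and the LBA, it shows that the failing LBA sits 129 sectors into the last READ DMA EXT (which started at 0x42c317), and 129 + 55 = 184 = 0xb8, that command's SC value. That is consistent with the transfer failing partway through with 55 sectors left, if SC is read as the low byte of the sector count.

```python
from typing import Dict, List, Tuple

# Hypothetical helpers for decoding one smartctl error-log entry.
# Only a subset of the ATA Error register bits is named here.
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC", 0x08: "DRQ", 0x01: "ERR"}


def decode_bits(value: int, table: Dict[int, str]) -> List[str]:
    """Return the names of the bits set in `value`, highest bit first."""
    return [name for mask, name in sorted(table.items(), reverse=True) if value & mask]


def lba_from_registers(sn: int, cl: int, ch: int, dh: int) -> int:
    """Rebuild a 28-bit LBA: SN is bits 0-7, CL 8-15, CH 16-23, DH low nibble 24-27."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def split_power_on_hours(hours: int) -> Tuple[int, int]:
    """Convert power-on hours to (days, hours)."""
    return divmod(hours, 24)


if __name__ == "__main__":
    # Register bytes copied from the "Error 507" entry (ER ST SC SN CL CH DH).
    er, st, sc, sn, cl, ch, dh = 0x84, 0x51, 0x37, 0x98, 0xC3, 0x42, 0x40
    print(decode_bits(er, ERROR_BITS))    # ['ICRC', 'ABRT']
    print(decode_bits(st, STATUS_BITS))   # ['DRDY', 'DSC', 'ERR']
    err_lba = lba_from_registers(sn, cl, ch, dh)
    print(hex(err_lba), err_lba)          # 0x42c398 4375448

    # Start LBA and SC of the last READ DMA EXT logged before the error.
    last_start, last_sc = lba_from_registers(0x17, 0xC3, 0x42, 0x40), 0xB8
    offset = err_lba - last_start
    print(offset, offset + sc, last_sc)   # 129 184 184

    print(split_power_on_hours(13787))    # (574, 11)
    print(13787 + 1422)                   # 15209: approx. power-on hours at the snapshot
```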
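The other 4 errors shown are of the same type but at different LBAs, which makes perfect sense given what CRC errors indicate: the corruption happens on the link while the data is in transit, not at a particular spot on the platters, so it is not tied to any one LBA.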
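To illustrate why a link-level CRC error is not tied to an LBA, here is a minimal sketch of a frame check: the sender appends a CRC to each frame, the receiver recomputes it, and a bit flipped in transit makes the two disagree even though the stored data is fine, so retrying the same read can succeed. It uses Python's `zlib.crc32` as a generic stand-in. SATA does protect each frame with a 32-bit CRC, but the framing and function names here are hypothetical and do not reproduce the SATA specification's exact details.

```python
import zlib


def make_frame(payload: bytes) -> bytes:
    """Sender: append the CRC-32 of the payload (4 bytes, little-endian)."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")


def frame_ok(frame: bytes) -> bool:
    """Receiver: recompute the CRC over the payload and compare it with the received one."""
    payload, received = frame[:-4], int.from_bytes(frame[-4:], "little")
    return zlib.crc32(payload) == received


def corrupt_in_transit(frame: bytes, bit: int) -> bytes:
    """Simulate noise on the link by flipping a single bit of the frame."""
    damaged = bytearray(frame)
    damaged[bit // 8] ^= 1 << (bit % 8)
    return bytes(damaged)


if __name__ == "__main__":
    sector = bytes(512)                 # the data as stored on the disk, undamaged
    frame = make_frame(sector)
    print(frame_ok(corrupt_in_transit(frame, 1000)))  # False: CRC mismatch, command fails
    print(frame_ok(make_frame(sector)))               # True: a retry of the same data succeeds
```

Because a 32-bit CRC detects every single-bit error, the damaged frame is always rejected, while the data at that LBA was never the problem.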
Protocol CRC errors are the most difficult type of error to track down because many different things can cause them. Some examples:
* Physical cabling issues (e.g. bad SATA cable), including cables with crappy shielding
* Dust or other debris inside the SATA data connector (on the motherboard or on the drive itself), or a loose connection
* Physical damage to the SATA data connector (on the motherboard or on the disk PCB)
* Physical damage to the disk PCB, particularly near/around the SATA data connector, or traces between the data connector and the PCB's controller; this may also be the result of faulty manufacturing
* Physical damage to the motherboard, particularly near/around the SATA data connector, or traces between the data connector and the motherboard's SATA controller; this may also be the result of faulty manufacturing
* A system which is emitting excess interference/EMI, compounded by one of the above issues
This type of damage is often invisible to the naked eye. What I usually recommend, in this order, is:
1. Unplug the SATA cable from the motherboard and blow air into the motherboard's SATA port, as well as at the end of the cable. Re-plug the cable, keep using the system, and watch for recurring errors.
2. If errors continue: unplug the SATA cable from the disk and blow air into the SATA port on the drive PCB, as well as at the end of the cable. Re-plug the cable, keep using the system, and watch for recurring errors.
3. If errors continue: replace the SATA cable entirely.
4. If errors continue: replace either the motherboard or the disk. (If you have a replacement disk PCB for the exact same model, revision, and firmware, you can try swapping that in instead.)
5. If errors continue: same as #4, but replace the other part (e.g. if you replaced the disk in #4, now replace the motherboard).
6. If errors continue: the issue is almost certainly EMI-related, and I have no advice on how to troubleshoot that.
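For the "watch for recurring errors" part of each step, comparing the drive's CRC counter before and after is more reliable than eyeballing logs, since the counter typically never resets. Below is a minimal sketch that parses saved `smartctl -A` output. It assumes the counter is SMART attribute 199 (commonly named UDMA_CRC_Error_Count) and that RAW_VALUE is the tenth column; attribute IDs and raw-value formats are vendor-specific, so check against your own drive's output. The function names and sample values are hypothetical.

```python
import re
from typing import Optional

# Assumption: the interface CRC counter is attribute 199 on this drive.
CRC_ATTR_ID = "199"


def crc_error_count(smartctl_a_output: str) -> Optional[int]:
    """Return the raw value of attribute 199 from `smartctl -A` text, or None if absent."""
    for line in smartctl_a_output.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns; RAW_VALUE is the 10th.
        if len(fields) >= 10 and fields[0] == CRC_ATTR_ID:
            match = re.match(r"\d+", fields[9])
            return int(match.group()) if match else None
    return None


def new_crc_errors(before: str, after: str) -> int:
    """CRC errors added between two snapshots (the counter is cumulative)."""
    old, new = crc_error_count(before), crc_error_count(after)
    if old is None or new is None:
        raise ValueError("attribute 199 not found in one of the snapshots")
    return new - old


if __name__ == "__main__":
    # Hypothetical snapshots saved before and after one troubleshooting step.
    before = "199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 451"
    after = "199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 451"
    delta = new_crc_errors(before, after)
    print("no new CRC errors" if delta == 0 else f"{delta} new CRC errors: go to the next step")
```

A quiet period with zero new errors is suggestive rather than proof, so give each step enough real use (ideally the same kind of sustained reads that triggered the errors) before concluding it fixed anything.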
The reason I recommend this fairly long, drawn-out procedure is that it lets you figure out where the actual problem was. Most people I encounter just "replace the SATA cable" and report "the issue is gone! It must have been a bad cable!", which is inconclusive at best: it could have been dust in the port, or a loose connection that blowing out the port or reseating the cable would have fixed. Which methodology you follow is up to you, but keep an open and logical mind.
Understand that these are not sector-level ECC errors (which some people erroneously call "CRC errors"); these are ATA protocol-level CRC errors. Think of it like an IP, TCP, or UDP packet: if the checksum included in the packet does not match the checksum the receiver calculates, data integrity can't be verified, hence an error. This check happens between the two SATA controllers (e.g. motherboard and disk, or HBA and disk).
--
Making life hard for others since 1977.
I speak for myself and not my employer/affiliates of my employer.