
trparky
Premium Member
join:2000-05-24
Cleveland, OH
·AT&T U-Verse

trparky

Premium Member

Buggy Intel SSD 520 Series SSD Firmware

According to this thread, users have been experiencing the SSD dropping from the SATA channel.

According to this guy...
said by sandfarce :
Now users are having major issues with the failure of the SSD's SATA controller to properly respond to ATA Sleep/Wakeup and SATA Disconnect/Reconnect events. The result is bad context restoration and corruption of the drive state information resulting in a drive locked in a "panic mode" where the controller will fail to reset and the drive will no longer appear as a SATA device to the host.
After months of working fine, the SSD started dropping from the SATA channel at random. Only a cold boot would fix the issue and Intel doesn't at all want to do anything for me regarding the SSD.

Ended up running to Microcenter to grab a Samsung 840 Series SSD to replace it.
trparky

2 edits

trparky

Premium Member

At this point, my suggestion (in my opinion) is to avoid any SSD that uses a Sandforce controller since the Sandforce controller has been plagued with firmware bugs for months.

Samsung, which recently released its new 840 and 840 Pro Series of SSDs, uses its own firmware, its own in-house developed SSD controller, and its own in-house manufactured NAND flash chips, and those drives have been showing great promise.

Myself, I got an 840 Series (not the Pro) with the TLC (Triple Level Cell) NAND and I hardly see the difference in performance.

I copied a rather large file from one of my HDDs over to the SSD, and write speeds were 130 MB/s according to Windows Explorer's File Copy Dialog. Decent performance if you ask me.

No, I don't work for Samsung!!!!
Chrno
join:2003-12-11

2 edits

Chrno to trparky

Member

to trparky
Personally, I wouldn't worry about it too much. The guy in the post sounds like he knows what he is talking about but if you read between the lines, he doesn't have a clue.

I mean, if your stuff is mission critical, you should have a backup structure in place, but look at what he is saying in the post. Not to mention it sounds like he is using these drives in an enterprise environment, which isn't this drive's target audience. The Sandforce controller does real-time compression, so I wouldn't be surprised if the data on the LBAs doesn't correspond to what it would look like without compression. I am guessing the controller uses something similar to the LZW compression algorithm, which is lossless.

Bottom line: don't leave anything you can't live without on any storage drive without a backup. So back up, back up, and back up your data.
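On the compression point above, here is a minimal sketch of what "lossless" means in practice. It uses Python's zlib (DEFLATE), which is only a stand-in: SandForce's actual scheme is not public, and the LZW guess above is just a guess. The 4 KiB buffer is a made-up example, not data from any drive.

```python
import zlib

# Hypothetical 4 KiB chunk of highly repetitive user data (an assumption
# for illustration, not something read from a real SSD).
logical = b"backup " * 585 + b"x"  # 7 * 585 + 1 = 4096 bytes

# What a compressing controller might physically store. DEFLATE is used
# here as a stand-in; it is NOT SandForce's undisclosed algorithm.
stored = zlib.compress(logical)

# Lossless: decompressing returns exactly the original bytes.
restored = zlib.decompress(stored)

print(len(logical), len(stored), restored == logical)
```

The stored bytes look nothing like the logical bytes, yet the host always reads back the original data.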

trparky
Premium Member
join:2000-05-24
Cleveland, OH
·AT&T U-Verse

trparky

Premium Member

In my case, the drive itself was indeed dropping from the SATA channel.

For instance... this morning I awoke to find my computer with a black screen and a frightening error message. "No Boot Device Found"

So I pressed Control-Alt-Delete and went into BIOS, sure enough, the Intel SSD was completely MIA. Gone. Nowhere to be found. As far as the motherboard was concerned, it wasn't connected to the SATA port.

Only a cold boot would bring the SSD back from wherever it went.

Then, it happened a second time. I was browsing this site, I went to go make myself another cup of coffee this morning. Came back to the same "No Boot Device Found" error message.
trparky

trparky

Premium Member

So I did some troubleshooting. I moved the SSD from SATA port 0 to SATA port 1, which previously had a Western Digital Black 2 TB HDD on it that I've never had a problem with.

So simply put...
The SSD went from port 0 to port 1.
The HDD went from port 1 to port 0.
Basically a port swap and a cable swap too.

Same ugly "No Boot Device Found" error came up again.
Chrno
join:2003-12-11

Chrno

Member

Exercise your right to claim warranty, you have 5 years for the 520 series so use it wisely.

aurgathor
join:2002-12-01
Lynnwood, WA

aurgathor to trparky

Member

to trparky
said by trparky:

According to this thread, users have been experiencing the SSD dropping from the SATA channel.

According to this guy...

said by sandfarce :
Now users are having major issues with the failure of the SSD's SATA controller to properly respond to ATA Sleep/Wakeup and SATA Disconnect/Reconnect events. The result is bad context restoration and corruption of the drive state information resulting in a drive locked in a "panic mode" where the controller will fail to reset and the drive will no longer appear as a SATA device to the host.
After months of working fine, the SSD started dropping from the SATA channel at random. Only a cold boot would fix the issue and Intel doesn't at all want to do anything for me regarding the SSD.

For what it's worth, the exact same thing was happening to my external 2 TB Hitachi drive over the weekend when I was trying to move stuff off of it.

As far as I know it didn't result in corruption, but I had to power-cycle said drive several times.

trparky
Premium Member
join:2000-05-24
Cleveland, OH
·AT&T U-Verse

trparky to Chrno

Premium Member

to Chrno
said by Chrno:

Exercise your right to claim warranty, you have 5 years for the 520 series so use it wisely.

Intel told me, because the drive was an OEM drive, this drive didn't have a warranty on it. I didn't know at the time it was an OEM drive when I bought it so no warranty for me.

Luckily, I had chosen to buy a SquareTrade warranty on it so I'm going to be sending the drive off to them for a claim on the warranty.

rusdi
American V
MVM
join:2001-04-28
Flippin, AR

rusdi to trparky

MVM

to trparky
I have a Mushkin 240GB, (non-Deluxe) SATAIII that has the Sandforce 2281 controller. It's in AHCI single disk mode. Been running great for over a year now 24/7/365. Only restarted after power outage, or shut down for cleaning.

I have seen articles that recommend SSDs NOT be put into hibernation, or sleep, but I wasn't sure why.
Now, I think I know.

Sorry to hear about your bad experience with the Intel SSD. It may be in your best interest to disable "hibernate/sleep" with this drive. Might try it & see if this behavior stops.
Good luck. Hope you can resolve this issue!

trparky
Premium Member
join:2000-05-24
Cleveland, OH

trparky

Premium Member

It's a desktop so it never was hibernated or put to sleep.

rusdi
American V
MVM
join:2001-04-28
Flippin, AR

rusdi

MVM

said by trparky:

It's a desktop so it never was hibernated or put to sleep.

Oops, my mistake. I misread your OP.

pnjunction
Teksavvy Extreme
Premium Member
join:2008-01-24
Toronto, ON

1 edit

pnjunction to trparky

Premium Member

to trparky
said by trparky:

At this point, my suggestion (in my opinion) is to avoid any SSD that uses a Sandforce controller since the Sandforce controller has been plagued with firmware bugs for months.

Might be the thing to do. I thought Intel spent months testing their firmware for these things too. IIRC they were credited with finding bugs in it.

Myself though I have an OCZ vertex 2 and vertex 3 with sandforce and no problems yet. *knock on wood*

Krisnatharok
PC Builder, Gamer
Premium Member
join:2009-02-11
Earth Orbit

Krisnatharok

Premium Member

My Vertex 2 is still humming along nicely as well, but I think for my next drive, I will get either the OCZ Vector or the Samsung 840 Pro.

trparky
Premium Member
join:2000-05-24
Cleveland, OH

trparky

Premium Member

I have the 840 Series (not the Pro) since it was cheaper than the 840 Pro. The guy I bought it off of at Microcenter said that I wouldn't notice the performance difference. He was right.

koitsu
MVM
join:2002-07-16
Mountain View, CA
Humax BGW320-500

2 recommendations

koitsu to trparky

MVM

to trparky
I'm confused, so help me understand. Quote:
said by trparky:

It's a desktop so it never was hibernated or put to sleep.

said by trparky:

According to this guy...

said by sandfarce :

Now users are having major issues with the failure of the SSD's SATA controller to properly respond to ATA Sleep/Wakeup and SATA Disconnect/Reconnect events. The result is bad context restoration and corruption of the drive state information resulting in a drive locked in a "panic mode" where the controller will fail to reset and the drive will no longer appear as a SATA device to the host.

Can you explain how you determined your SSD going bad was absolutely hands down the result of what this random Internet jhonka posted on Intel's forum, and how that determination exclusively caused you to (effectively) boycott a brand? How many drives have you experienced this issue with? (Heck, how many drives has that random Internet jhonka experienced his oddity with? He says "many", but I want all the low-level high-detailed technical specifics). How do you know the power circuitry on the SSD didn't die?

I can point you to lots of SATA controllers (on motherboards, for example), that have all sorts of lovely silicon-level bugs in them (meaning there's no way to fix them, only silly software workarounds at the ATA level). Silicon Image's 3112 and 3512 are two that come to mind. I believe the 3114 also had some bugs but I forget which.

But those are actual chip bugs, while actual SATA drives (of any kind) falling off the bus can happen for a multitude of reasons (MHDD or SSD). For example, a beautiful one that some DSLR folks found/dug up (disks falling off the bus when used in a RAID array) was caused by a buggy Intel MatrixRAID driver (since fixed in the Intel RST drivers). Imagine if someone had said "Yeah! Screw these Seagate drives I got! They keep falling off the bus!" only a few months later to find out it was a driver-level issue.

That said, I cannot confirm nor deny that there may be issues with ATA-level power management with drives (of all kinds -- MHDDs or SSDs). PM in general is one of those "ehhhhh.... sure hope it works" technologies. It's also important to understand that AHCI has its own power management that's fully separate from ATA's PM -- and I believe there are some cases where some SATA controllers in AHCI mode do not do PM correctly (which is one of the reasons why on FreeBSD you can adjust the PM capability -- and be sure to read the last line in that section too, heh heh heh...).

We all know that Intel's 320-series drives did have a firmware-level problem associated with physical power being lost (not sleep/standby, but actual power loss) + power being restored, where the drive might power up in a state (permanently from that point forward) where it reported an 8MByte capacity and all underlying data on the drive was lost. But that particular problem, as I described, could only happen during full power-down and full power-up.

I can talk at length about this subject, all the way down to ATA CDBs if need be and how drives have to have capacitors on them to continue to provide power just long enough to handle queued CDBs which didn't get flushed to the platters (or NAND flash) in the short period of time between the OS submitting the CDB and the drive losing power.

So like I said: if "sandfarce" (the fact he called himself that already indicates his motive, sigh) has the low-level technical details, I want to read them. All of them. And I am certain Intel will want to read them as well. Throwing a tantrum (not you, him) on a forum is not the proper way to go about getting something looked at.

trparky
Premium Member
join:2000-05-24
Cleveland, OH
·AT&T U-Verse

trparky

Premium Member

Oh god, I think you're the one who works for Intel. Crap.

Narrowing down the fact that I believe that the SSD is the part that's failing is that none of the other devices that are connected to the same SATA ports and controller are having issues.

I have two HDDs and one SSD that are connected to the Intel SATA controller that's on my motherboard.

I went through the process of determining if the port is possibly what went bad. I tried the SSD on SATA Port 1 which another drive that has been known to not have any issues in this system of mine has been connected to for months with no issues. I plugged the SSD into that port, SATA Port 1, and that HDD into SATA Port 0 which effectively not only swapped the ports but also the SATA cables themselves. This revealed no change, I still had the SSD drop off the bus.
trparky

trparky

Premium Member

I did do some extra investigation into the issue that I was having, specifically reading the Windows Event Log.

OK, I'll give some information about my setup, along with the scheduled TRIM that I do nightly on the SSD. I have a Windows Scheduled Task that performs a TRIM on the SSD nightly at 3 AM. On those mornings that I found my computer sitting at the "No Boot Device Found" error, the Windows Event Log entries stop shortly after 3:05 AM. So that may indicate that the SSD failed, and thus dropped off of the bus, shortly after the TRIM command was issued to the SSD.

This morning, I deleted almost 10 GBs of data off the SSD. Mainly temporary files, files that I downloaded and already did something with, etc. You know the kinds of files.

After I deleted those files I ran a manual TRIM via the Windows 8 "Defragment and Optimize Drives" tool. I then went to get myself another cup of coffee and came back to my machine sitting at the "No Boot Device Found" error message.
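The Event Log reasoning above (find the point where entries stop, then compare it with the time the TRIM task ran) can be sketched roughly like this. The timestamps are hypothetical placeholders, not copied from the actual log, and `last_event_before_silence` is a made-up helper name.

```python
from datetime import datetime, timedelta

def last_event_before_silence(timestamps, min_gap):
    """Return the last timestamp before the log goes quiet for at least
    min_gap, or None if there is no such gap."""
    ordered = sorted(timestamps)
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier >= min_gap:
            return earlier
    return None

# Hypothetical timestamps for illustration only.
day = datetime(2013, 1, 7)
events = [
    day.replace(hour=2, minute=58),
    day.replace(hour=3, minute=0),   # scheduled TRIM task starts
    day.replace(hour=3, minute=5),
    day.replace(hour=3, minute=6),
    day.replace(hour=8, minute=15),  # first entry after the cold boot
]
trim_start = day.replace(hour=3, minute=0)

last_seen = last_event_before_silence(events, timedelta(hours=1))
print(last_seen, last_seen - trim_start)
```

A short interval between the TRIM start and the last logged entry is only a correlation; it does not by itself show that TRIM caused the drop.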
trparky

trparky

Premium Member

It is interesting that the SSD failed and dropped off of the bus after I issued it a TRIM command. So that may indicate that something occurred in the SSD when the TRIM command was issued which triggered the SSD's controller chip to malfunction in some way which then caused the SSD to drop off of the bus.

koitsu
MVM
join:2002-07-16
Mountain View, CA
Humax BGW320-500

koitsu to trparky

MVM

to trparky
said by trparky:

Oh god, I think you're the one who works for Intel. Crap.

I've never worked for Intel directly or indirectly. My CV/resume is on my home page (see profile) if you want to know where I've worked. And my forum signature here applies universally even if I did work for Intel (though if I did, I would not have touched this topic with a ten foot pole; anyone who has worked for enterprise corps knows you avoid commenting on your own company's products if you value your job).
said by trparky:

Narrowing down the fact that I believe that the SSD is the part that's failing is that none of the other devices that are connected to the same SATA ports and controller are having issues.

I have two HDDs and one SSD that are connected to the Intel SATA controller that's on my motherboard.

I went through the process of determining if the port is possibly what went bad. I tried the SSD on SATA Port 1 which another drive that has been known to not have any issues in this system of mine has been connected to for months with no issues. I plugged the SSD into that port, SATA Port 1, and that HDD into SATA Port 0 which effectively not only swapped the ports but also the SATA cables themselves. This revealed no change, I still had the SSD drop off the bus.

Thanks for explaining -- I understand fully what you went through per your description. You ruled out a bad motherboard SATA port, which is excellent. But all you concluded as a result was: "it's the SSD which is dead". Why it's dead is what's driving my responses here.

The "issue" the Internet dude is talking about pertains to drives falling off the bus as a result of either a) ATA or AHCI-level PM (my guess is ATA-level) or b) being physically disconnected and reconnected to the SATA bus.

You stated your system is a desktop therefore you don't use drive PM (sleep/standby) (which I assume also means you've disabled that capability in (presumably) Windows -- if you haven't, i.e. power management in Windows is actually set to power down the drive after X seconds of it being idle, then you've been using ATA-level PM), and you obviously didn't sit around unplugging the power to your drive (you certainly wouldn't be complaining about the issue had you been doing that).

So how did you determine the root cause for your issue was absolutely what "sandfarce" described?

What I'm getting at here is that it seems to me you jumped on a convenient bandwagon based on little-to-no actual hard data confirming the issue you experienced was the result of this "bug" some guy on a forum said exists.

SSDs die all the time, and the way they die is significantly different than a MHDD solely because they're solid-state. It's akin to a mobile phone ceasing to work, a digital wristwatch ceasing to work, or even a stick of RAM ceasing to work (that's stretching it a bit though; there's a lot more that goes on within an SSD than a DIMM). What you experienced is just a flat out failure, and without an actual engineer to take the drive and do proper analysis of it, I don't think you've provided enough evidence to say "yeah, this random Internet guy said the 520-series drives stop working, and I had my 520-series drive stop working, therefore it must be this thing this guy described".

Generally speaking "tech" people on the Internet are not actual engineers. I'm not talking about you, I'm talking about that guy making wild claims without any hard technical data. They're usually end-users who can do things like, say, build a PC or replace a video card, and therefore believe they're somehow skilled at determining the root cause of an SSD failure. They know how to do more than their grandparents, or more than the average joe who uses a computer, and this somehow makes them a wizard. Those of us who are engineers, whom even the wizards refer to as wizards, look at guys on forums (not you) flailing their arms and making wild claims and say "sigh, dime a dozen".

Please note that I say all of this quite willingly with the admission that when I first heard about the 320-series 8MByte issue I had a very hard time believing it (mainly because I've owned 6 separate 320-series drives during my life, 4 of which were used in server-class hardware which was power-cycled on occasion, and not one experienced the problem). I sat thinking how that problem could actually be a firmware-level problem and it made no sense. I was very, very surprised when Intel announced that they had found the reason for the problem and fixed it (or possibly worked around it -- I don't know) with a firmware update. I'm still left wondering what the real root cause of that problem was, because there still hasn't been anything highly technical released by Intel or anyone else -- only the symptoms. I have some suspicions but they're purely assumptions until I can get proof, thus I keep my mouth shut.

trparky
Premium Member
join:2000-05-24
Cleveland, OH

trparky

Premium Member

Did you read my post about how the failure of the SSD seemed to coincide with an issue of the TRIM command to the SSD?

koitsu
MVM
join:2002-07-16
Mountain View, CA
Humax BGW320-500

koitsu

MVM

said by trparky:

Did you read my post about how the failure of the SSD seemed to coincide with an issue of the TRIM command to the SSD?

No, that comment hadn't shown up by the time I wrote my explanation. This just brings even more nonsense into the picture:

1. How are you issuing "TRIM commands" for the drive on a nightly basis? On Windows 7 onward you should not need to do that (presumably you're running the "Intel SSD Optimizer" using Intel's SSD Toolbox); that's intended for XP systems which do not offer native OS-level TRIM (see below). You should not be doing this daily unless you are doing massive amounts of I/O every day (I'm talking maybe 60-80GBytes of writes and deletes, daily, on a 100-120GB SSD).

2. Why are you doing this on Windows 8? Since Windows 7 the OS has had TRIM capability natively within the ATA I/O subsystem driver. The OS takes care of this natively, and cleanly, for you on every I/O delete operation (or may submit large consecutive/linear LBA blocks in an optimal way, rather than one at a time -- both methods are fine). You should let the OS take care of this for you; it will do a more efficient job, consistently, 100% of the time.

3. Are your partitions on your SSD properly aligned to either 1MByte or 2MByte boundaries (or possibly other multiples of (2^10)*4)? Windows Vista onward has ensured that, but if you did something like use XP back in the day, then installed Vista or newer (including a clean install but without deleting the existing partition; formatting the existing partition is not the same thing and would not change the alignment), then that would explain very bad TRIM performance in general. I'm not talking about 4KByte alignment here, I'm talking about NAND erase block size alignment (which is represented by N number of sequential NAND pages, and the NAND page size varies per SSD brand, model/device, revision, and lithography). Most manufacturers do not disclose this info, which still to this day pisses me off. You can read about the ill effects of non-NAND-erase-block-aligned partitions here (read, do not skim): »wiki.laptop.org/go/How_t ··· e_Device

4. How much free space did your Intel SSD have before you deleted 10GB of data off of it, and what capacity is that SSD? You should always keep roughly 30% of free space on the SSD.

5. Have you ever done something like a "full format" on the SSD, i.e. every LBA ("sector") written to with data (or zeros)? If so, this would explain awful performance of the SSD, especially during TRIM or GC operations -- the FTL is completely maxed out. Do not do this on an SSD. ATA-level Secure Erase is the proper way to do this.

6. Did you ever look at any of the SMART statistics on your Intel SSD? If so, do you have that data somewhere (screenshot, etc.)? It would give me some indication of its internal state.
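For questions 3 and 4 above, the arithmetic can be sketched as follows. This assumes 512-byte logical sectors and uses the 1MByte boundary named in question 3 as a proxy, since the real NAND erase block size is usually undisclosed. The 30% figure is the rule of thumb from question 4, the function names are made up for illustration, and the drive sizes in the example are hypothetical.

```python
MIB = 1024 * 1024

def is_aligned(start_sector, sector_bytes=512, boundary_bytes=MIB):
    """True if the partition's starting byte offset is a multiple of
    boundary_bytes (1 MiB by default, used as a stand-in for the
    undisclosed NAND erase block size)."""
    return (start_sector * sector_bytes) % boundary_bytes == 0

def meets_free_space_rule(free_bytes, capacity_bytes, minimum=0.30):
    """True if at least `minimum` (the 30% rule of thumb) is free."""
    return free_bytes / capacity_bytes >= minimum

print(is_aligned(63))    # XP-era default start sector -> False
print(is_aligned(2048))  # Vista-and-later default start sector -> True

GB = 10**9
print(meets_free_space_rule(60 * GB, 240 * GB))  # 25% free -> False
print(meets_free_space_rule(80 * GB, 240 * GB))  # ~33% free -> True
```

On Windows, the starting offset of a partition can be read from tools such as diskpart or msinfo32 and fed into a check like this.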

TRIM in general is an expensive operation, and likewise, GC (garbage collection) takes even longer. The drive can go catatonic during this state, to the point where kernel/device drivers may think the underlying device has "fallen off the bus" (in actuality the kernel/driver hits an internal I/O timeout and then kicks the drive off the bus itself). Any OS will see this. The timeout on FreeBSD is 30 seconds with a 5-attempt retry count. What Windows uses as a timeout depends on the underlying storage drivers; if you're using Microsoft's AHCI, you would need to ask them or look through MSDN. If you're using Intel's RST drivers (be sure to state what version) you would need to ask Intel what the value is.
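As a toy model of that timeout behaviour (not real driver code): assume the host gives up on the device only once every one of its allowed attempts has exceeded the timeout. The 30-second and 5-attempt defaults are the FreeBSD figures quoted above; the exact retry logic and the function name are assumptions for illustration.

```python
def host_drops_device(response_times_s, timeout_s=30.0, max_attempts=5):
    """Toy model: return True if the host would give up on the device.

    response_times_s lists how long the device took to answer each
    successive attempt. The device is dropped only if max_attempts
    attempts were made and every one of them exceeded timeout_s.
    """
    attempts = response_times_s[:max_attempts]
    return len(attempts) == max_attempts and all(t > timeout_s for t in attempts)

# A drive stuck in a long TRIM/GC pass never answers in time: dropped.
print(host_drops_device([45, 45, 45, 45, 45]))

# A drive that recovers on the third attempt stays on the bus.
print(host_drops_device([45, 45, 2]))
```

The point is that the drive may be alive but busy. From the host's side, a long enough stall is indistinguishable from a dead device.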
WhyMe420
Premium Member
join:2009-04-06

WhyMe420 to trparky

Premium Member

to trparky
Sh*t. I have two 520 SSDs. Guess I'll just cross my fingers. Wonder if they'll ever release a fixed firmware? I thought that Intel's firmware was supposed to be free of the SandForce plague? I know it's the same controller but I thought that the firmware was different.

Hyrules
join:2006-07-19
Gatineau, QC

1 edit

Hyrules to trparky

Member

to trparky
It's sad that so many SSDs have problems. Intel has problems with theirs, Corsair as well. Technology is too new.

DarkLogix
Texan and Proud
Premium Member
join:2008-10-23
Baytown, TX

DarkLogix to koitsu

Premium Member

to koitsu
This is why I always set windows to not turn off the harddrive ever.

Though oddly (and I blame Marvell), my Intel 520 480GB shows a crazy number of unclean shutdowns.

But my 520 is running great.

rusdi
American V
MVM
join:2001-04-28
Flippin, AR

1 edit

rusdi to trparky

MVM

to trparky
The only trouble I have had with mine, is slow boot.
I'm now curious if this might be affecting your Intel drive as well. Maybe on a different level.

If you're willing to make a few Registry changes:
»Slow boot time on your SSD?

This is for Windows 7.

trparky
Premium Member
join:2000-05-24
Cleveland, OH
·AT&T U-Verse

trparky

Premium Member

I thought that TRIM was supposed to be done even if Windows does it itself. I know that Windows performs a TRIM when you delete files but what about when you overwrite a file?

Microsoft hasn't exactly been very forthcoming with their policy on the TRIM command that's built into the OS. When does a TRIM happen?

Ghastlyone
Premium Member
join:2009-01-07
Nashville, TN

Ghastlyone to DarkLogix

Premium Member

to DarkLogix
said by DarkLogix:

This is why I always set windows to not turn off the harddrive ever.

How do you do that?

I must have overlooked that option in there.

Cheese
Premium Member
join:2003-10-26
Naples, FL

Cheese

Premium Member

It's in power options
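For anyone who prefers a script to clicking through Power Options, a minimal sketch: `powercfg /change disk-timeout-ac` and `disk-timeout-dc` set the "turn off hard disk after" value in minutes, and 0 means never. The wrapper function name is made up for illustration, and the commands only run on Windows, from an elevated prompt.

```python
import subprocess
import sys

def disk_timeout_commands(minutes=0):
    """Build powercfg invocations that set 'turn off hard disk after'
    for both AC and battery power. A value of 0 means never."""
    return [
        ["powercfg", "/change", "disk-timeout-ac", str(minutes)],
        ["powercfg", "/change", "disk-timeout-dc", str(minutes)],
    ]

# Only actually run the commands on Windows.
if __name__ == "__main__" and sys.platform == "win32":
    for cmd in disk_timeout_commands(0):
        subprocess.run(cmd, check=True)
```

This changes the currently active power plan only.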

rusdi
American V
MVM
join:2001-04-28
Flippin, AR

rusdi to trparky

MVM

to trparky
said by trparky:

I thought that TRIM was supposed to be done even if Windows does it itself. I know that Windows performs a TRIM when you delete files but what about when you overwrite a file?

Microsoft hasn't exactly been very forthcoming with their policy on the TRIM command that's built into the OS. When does a TRIM happen?

It "should" be.

Here's a way to check if it's enabled.

Open a command prompt, (Administrative level).

Command prompt > fsutil behavior query disabledeletenotify

DisableDeleteNotify = 1 (Windows TRIM commands are disabled)
DisableDeleteNotify = 0 (Windows TRIM commands are enabled)
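If you want to check this from a script, here is a small sketch that interprets the text printed by the command above. It assumes the output contains a line in the `DisableDeleteNotify = N` form shown here; `trim_enabled` is a made-up helper name, and if the output has more than one such line it only reads the first.

```python
import re

def trim_enabled(fsutil_output):
    """Interpret 'fsutil behavior query disabledeletenotify' output.

    Returns True if TRIM is enabled (value 0), False if it is disabled
    (value 1), or None if no DisableDeleteNotify line was found.
    """
    match = re.search(r"DisableDeleteNotify\s*=\s*([01])", fsutil_output)
    if match is None:
        return None
    return match.group(1) == "0"

print(trim_enabled("DisableDeleteNotify = 0"))  # True
print(trim_enabled("DisableDeleteNotify = 1"))  # False
```

Note that this only tells you whether Windows is allowed to send TRIM, not when it actually sends it.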

trparky
Premium Member
join:2000-05-24
Cleveland, OH

trparky

Premium Member

I know about that, but it appears that it may indicate that a TRIM only happens when you delete files. What about an overwrite of a file?