 | degradation of SSDs What can make an SSD's performance degrade over time? Can repeated hard shutdowns cause degradation? The OS is Win8.
TIA -- Wacky Races 2012! |
|
 | Many, many writes cause SSDs to degrade over time... or it's just a bad drive. I've only seen it once since they came out, a drive from 2008.
Hard shutdowns? Not so much. Corruption to your data maybe... |
|
 | Yeah, I know that flash cells, especially MLC, can be written only a limited number of times before they start to fail. For argument's sake, let's assume that the number of writes is less than 1/10 of the max number specified for a given flash part, so end-of-life wearout shouldn't be an issue. -- Wacky Races 2012! |
|
 | reply to aurgathor Write amplification (logical). Thinning of gate oxide layer (physical). |
|
 koitsuPremium,MVM join:2002-07-16 Mountain View, CA kudos:19 | reply to aurgathor
Both what urbanriot and Chrno said are correct, but I'll add something fun to the mix. You said, quote, "Can repeated hard shutdowns cause degradation?" I assume "hard shutdowns" means hard power-off, i.e. holding the power button down for 3+ seconds (ATX power shut-off). The answer to this is no, it will not cause "degradation" -- however, Intel X25-series and 320-series drives suffered from a firmware bug where on power loss (whether or not low-power (standby) mode also applied is unknown) the drives could, when powered back on, show a total capacity of 8MBytes (and consider your data lost) -- permanently.
References:
» www.anandtech.com/show/4646/inte···3x-error
» www.engadget.com/2011/08/17/inte···es-ssds/
» news.softpedia.com/news/Intel-32···09.shtml
Bottom line with PCs in general, regardless of what hardware is in them: at all costs please try to avoid forcing a power-off via the 3+ second ATX power shut-off method. Only do this if after a few minutes of waiting the system *absolutely* won't shut down cleanly (or if it's locked up hard). I've used the same method for 20 years to determine if a system is locked up hard: press NumLock. If the LED toggles, the system is actually still alive; if it doesn't, it's locked up hard. The LED toggling is actually done by the keyboard driver/OS and always has been.
So in summary, treat your hardware with respect. And with SSDs in general, specifically regarding wear levelling / write amplification, try to keep between 30-50% of the SSD empty. The more free space = the longer the drive will live (specifically less NAND erase cycles having to be wasted on a per-page basis). -- Making life hard for others since 1977. I speak for myself and not my employer/affiliates of my employer. |
|
 OctaveanPremium,MVM join:2001-03-31 New York, NY kudos:1 | reply to aurgathor Some models of SSD will have very different performance based on how much storage space is used (not referring to TRIM or GC). There are also some special cases with some SSD controllers where performance can be permanently impeded. |
|
 | reply to aurgathor "Write amplification (logical). Thinning of gate oxide layer (physical)."
Given the short time period (~weeks), I highly doubt that low-level physical effects would play a role, though I can't rule them out for certain. I think it's something logical that happens at the drive FW level, or the result of an interaction between the host and the drive.
Yes, the hard shutdowns are accomplished by randomly cutting power.
said by koitsu: So in summary, treat your hardware with respect.
a) It's not my HW. b) I'm getting paid to break things!
said by koitsu: And with SSDs in general, specifically regarding wear levelling / write amplification, try to keep between 30-50% of the SSD empty. The more free space = the longer the drive will live (specifically less NAND erase cycles having to be wasted on a per-page basis).
That just gave me another idea. -- Wacky Races 2012! |
|
 pnjunctionTeksavvy ExtremePremium join:2008-01-24 Toronto, ON kudos:1 1 edit | reply to koitsu Indeed hard shutdowns can be troublesome. I put a cheap OCZ Vertex Plus into an old laptop to make it usable for a while longer. It's been working well, but one time when the machine ran out of battery and did a hard shutdown, the file system got corrupted and I had to re-format. I suppose the same thing could have happened with a hard drive, though I haven't seen it in a while.
I have a 120GB Intel 320 in my laptop and their toolbox says that I have written 1.6 TB and 'lifetime' is still at 100%.
With wear-levelling you should be able to write the entire capacity of the drive thousands of times, though this number is shrinking and is apparently at 3,000 for 25nm flash. Still, assuming write amplification is kept under control, that is a pretty huge number unless you have some crazy application that is writing huge amounts of data all the time. |
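To put rough numbers on the figures in the post above (a 120GB drive, about 3,000 write cycles for 25nm flash, 1.6 TB written so far), here is a back-of-the-envelope sketch in Python. The function names are made up for illustration, decimal units (1 TB = 1000 GB) are assumed, and the write amplification factor of 1.0 is an idealised assumption; a real drive will do worse, and this is not how Intel's toolbox computes its 'lifetime' figure.

```python
def estimated_endurance_tb(capacity_gb, pe_cycles, write_amplification=1.0):
    """Rough total host writes (in TB) before the rated cycle count is used up.

    Assumes perfect wear levelling; write_amplification is NAND writes
    divided by host writes (1.0 is the idealised best case).
    """
    return capacity_gb * pe_cycles / write_amplification / 1000.0


def fraction_of_life_used(host_writes_tb, capacity_gb, pe_cycles,
                          write_amplification=1.0):
    """Share of the estimated endurance consumed by host_writes_tb."""
    total = estimated_endurance_tb(capacity_gb, pe_cycles, write_amplification)
    return host_writes_tb / total


if __name__ == "__main__":
    total_tb = estimated_endurance_tb(120, 3000)
    used = fraction_of_life_used(1.6, 120, 3000)
    print(f"estimated endurance: {total_tb:.0f} TB, used so far: {used:.2%}")
```

Under these assumptions 1.6 TB is well under 1% of the estimated endurance, which would be consistent with the tool still rounding 'lifetime' to 100%.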
|
 | reply to aurgathor
Here is one thing that I'm seeing. |
|
 koitsuPremium,MVM join:2002-07-16 Mountain View, CA kudos:19 | Is the LBA the same in all of those events?
I'd like to see SMART attributes for that SSD, preferably from smartctl -x / smartmontools.
-- Making life hard for others since 1977. I speak for myself and not my employer/affiliates of my employer. |
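For anyone wanting to script the SMART dump requested above, a minimal Python sketch follows. It assumes smartmontools is installed and `smartctl` is on the PATH; the device name `/dev/sda` is only a placeholder (adjust it for your system), and the helper names are hypothetical. The `-x` option is the one mentioned in the post.

```python
import shutil
import subprocess


def smartctl_command(device):
    """Build the argument list for an extended SMART report (smartctl -x)."""
    return ["smartctl", "-x", device]


def read_smart_report(device):
    """Return smartctl's text output, or None if smartctl is not installed.

    smartctl uses its exit status as a bit mask, so a non-zero status is
    not treated as a failure here (check=False).
    """
    if shutil.which("smartctl") is None:
        return None
    result = subprocess.run(smartctl_command(device),
                            capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    report = read_smart_report("/dev/sda")  # placeholder device name
    print(report if report is not None else "smartctl not found on PATH")
```

Reading SMART data usually needs administrator/root rights, so run it from an elevated prompt.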
|
 | I didn't check all those warnings, but what I've seen seems to indicate that they are all unique.
The device that produced the above has been wiped and re-initialized, so I'm not sure how much of the old data is left in there, but I'll run smartctl -x sometime, especially if I see those errors again. -- Wacky Races 2012! |
|
 koitsuPremium,MVM join:2002-07-16 Mountain View, CA kudos:19 | If the LBAs all vary, then I have a bunch of theories, and it's virtually impossible on Windows to figure out which it is. This is much easier to determine on *IX.
1. The SSD may have (at some point during its lifetime) been utilised fully -- meaning a very high amount of space/usage had been reached. SSDs begin to take longer to respond to I/O requests as free space begins to diminish. This has to do with how wear levelling/write amplification works, and the fact that the drive's firmware has to spend a large amount of time cycling through the FTL map to find the "least used" NAND page for the write being issued to the drive. What you start to see performance-wise is a graph with a lot of Vs -- i.e. repeated dips to zero or low I/O rates. Here's an example (out of context).
If an SSD has ever been heavily utilised like this (and this especially applies if it was used with an OS that doesn't provide TRIM support), the best way to get it back to its factory default performance is to issue a Secure Erase. What this does is zeros the FTL mapping on the drive (it does not zero all the data on the drive -- more on why that shouldn't be done in a moment) which then allows wear levelling to start from the beginning again, i.e. "I have no knowledge of how utilised some of these NAND pages are, so I'll just assume everything has equal wear/tear".
This is not the same thing as writing zero to every LBA -- in fact you don't want to do that with an SSD since it fills the FTL map up (you've now issued I/O to every single LBA and thus every NAND page, so the FTL map is now quite literally maxed out -- it's the same as filling the drive up 100%!), and that forces the drive to rely on GC. No erasing program out there I've seen issues TRIM requests after each LBA or block zeroing (of course it also depends on how the OS implements its TRIM support and at what layer/level the erasing program is operating at).
2. The I/O layer in the OS may have too aggressive internal timeouts for I/O transactions. Windows doesn't TRULY tell you what piece of technology thought the I/O operation took too long -- it just says "System: disk". Was it the I/O subsystem in Windows that thought this? Was it the filesystem driver? Or was it the AHCI driver?
3. Crappy GC implementation in the drive's firmware, which is causing the firmware to spin while iterating over the FTL map for long periods of time. It is well-established and well-known that GC on SSDs can often take a long time (i.e. 10 seconds in some extreme cases). This is why TRIM is significantly better than relying entirely on the GC. 
But in general you're really not giving enough technical information in the thread so far (model of drive, firmware version, etc.) to narrow down the explanation. Sorry man. :/ -- Making life hard for others since 1977. I speak for myself and not my employer/affiliates of my employer. |
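The link between free space and write amplification described in point 1 above can be illustrated with a toy simulation. This is only a sketch under loud assumptions: a tiny device (64 blocks of 32 pages), uniformly random host writes, and a simple greedy garbage collector that always reclaims the block with the fewest valid pages. No real controller is claimed to work this way, and the sketch models extra NAND writes rather than FTL lookup time. All names are made up for illustration.

```python
import random


def simulate_write_amplification(fill_fraction, blocks=64, pages_per_block=32,
                                 host_writes=20000, seed=0):
    """Return NAND page writes / host page writes for a toy greedy-GC FTL."""
    if not 0 < fill_fraction <= 0.95:
        raise ValueError("fill_fraction must be in (0, 0.95] for this toy model")
    rng = random.Random(seed)
    logical_pages = int(blocks * pages_per_block * fill_fraction)
    valid = [set() for _ in range(blocks)]  # live logical pages per block
    used = [0] * blocks                     # pages programmed since last erase
    location = {}                           # logical page -> block holding it
    free_blocks = list(range(1, blocks))    # erased blocks; block 0 starts open
    state = {"open": 0, "nand_writes": 0}

    def program(lpn):
        # Move to a fresh block when the open one is full.
        if used[state["open"]] == pages_per_block:
            state["open"] = free_blocks.pop()
        old = location.get(lpn)
        if old is not None:
            valid[old].discard(lpn)         # the previous copy becomes stale
        blk = state["open"]
        valid[blk].add(lpn)
        used[blk] += 1
        location[lpn] = blk
        state["nand_writes"] += 1

    def collect_garbage():
        # Greedy victim choice: the full block with the fewest live pages.
        full = [b for b in range(blocks)
                if b != state["open"] and used[b] == pages_per_block]
        victim = min(full, key=lambda b: len(valid[b]))
        if len(valid[victim]) == pages_per_block:
            raise RuntimeError("no reclaimable space left")
        for lpn in list(valid[victim]):
            program(lpn)                    # relocation costs extra NAND writes
        used[victim] = 0                    # erase the victim
        free_blocks.append(victim)

    def host_write(lpn):
        while len(free_blocks) < 2:         # keep one block spare for relocation
            collect_garbage()
        program(lpn)

    for lpn in range(logical_pages):        # fill the drive once, sequentially
        host_write(lpn)
    state["nand_writes"] = 0                # measure only the random phase
    for _ in range(host_writes):
        host_write(rng.randrange(logical_pages))
    return state["nand_writes"] / host_writes


if __name__ == "__main__":
    for fill in (0.5, 0.7, 0.9):
        wa = simulate_write_amplification(fill)
        print(f"{fill:.0%} full -> write amplification ~{wa:.2f}")
```

In this model the fuller the device, the more live pages each reclaimed block still holds, so more of the drive's work goes into relocating old data instead of servicing new writes.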
|
 1 edit | I wish I could tell you more, but I work under NDA. 
Timing does play some role, but it is definitely not the root cause as several other SSDs (both from the same mfg and others) work just fine. And this isn't my area either -- I'm supposed to be working on something completely different, but things happen.
Seems to me that with flash-based SSDs, there can be quite a few things going on under the hood, so in certain situations their behavior can be somewhat non-deterministic and unpredictable. -- Wacky Races 2012! |
|
 pandoraPremium join:2001-06-01 Outland kudos:1
| reply to aurgathor We have had SSDs for about 2.5 years. The older 128 GB drives work fine (we have had no data loss using several types of drive).
There have been issues with the drive being nearly full. At about 92-95% the 128 GB Kingston drives seemed to have performance issues. The issues may be related to the file system or the operating system use of the file system.
Whatever the reason, deleting large unused files or products worked as an intermediate solution. A byte by byte copy to a new, larger drive, then expanding the old drive to fit the new drive worked amazingly well.
I was surprised that a byte-for-byte copy of a hard drive, when installed on the PC the old drive ran on, didn't require re-activation of Windows, though on initial boot Windows 7 installed new drivers for the new SSD. -- "If you put the federal government in charge of the Sahara Desert, in 5 years there'd be a shortage of sand." - Milton Friedman |
|
 koitsuPremium,MVM join:2002-07-16 Mountain View, CA kudos:19 | said by pandora: There have been issues with the drive being nearly full. At about 92-95% the 128 GB Kingston drives seemed to have performance issues. The issues may be related to the file system or the operating system use of the file system.
The issue has nothing to do with filesystem or OS use of the filesystem. The issue here, absolutely hands down, has to do with wear levelling/write amplification being ineffective with that much space used on the drive. The FTL map is probably massive and takes a long time to iterate over. There's really nothing you can do about this other than get bigger/larger SSDs (and it sounds like you did).
Like I said earlier, with SSDs, try to keep 30-50% of the drive unused at all times. One way to ensure this is to only create a partition that uses between 50%-70% of the drive's total capacity -- this ensures the LBAs in the remaining 30-50% of the drive never get touched, thus can always be used for wear levelling. Do not believe nonsense you read on the web about "SSDs having special reserved areas for spare LBAs/NAND pages" -- this is false. Unlike MHDDs, SSDs have no such thing. Their "spares" are actually NAND pages which the FTL has listed as either never having been written to, or have been written to the least.
said by pandora: Whatever the reason, deleting large unused files or products worked as an intermediate solution. A byte by byte copy to a new, larger drive, then expanding the old drive to fit the new drive worked amazingly well.
I can't comment on the former part, but the latter part (byte-by-byte copy) can be explained. You didn't state what capacity drive you upgraded to, but I'm going to assume you went from 128GB to 256GB. A full disk copy (byte-by-byte or LBA-by-LBA) from the 128GB to the 256GB would then only issue writes to half the 256GB drive -- the remaining half would therefore be available for efficient wear levelling. (That's also assuming nobody ever did something like "zero the entire SSD" -- shame on them if they did that, bad bad bad)
In the future though, with SSDs, I would not recommend you do the byte-by-byte method (unless these are OS drives and you don't want to deal with setting up boot blocks/etc. -- it isn't that hard on Windows, but nobody documents how to do it very well. It's only 2 commands.). I would recommend you recreate the filesystems and do a file-by-file copy instead, or use a backup/restoration tool that uses Windows' native "shadow copy" / "volume snapshot" capability. I know with OS disks this is a serious PITA with Windows Vista and 7 due to their "System Restore" partition or whatever (not talking about Dell's stuff!).
The reason why I recommend what I do: say you have a 128GB SSD with a 120GB partition on it, and you're using 40GB of that 120GB worth of space. A byte-by-byte or LBA-to-LBA or disk-to-disk copy will actually read all 128GB and write it to the new drive -- that means writing 88GBytes of nothing. The FTL inside of the destination SSD will then mark these LBAs as written to, which has an effect on the wear levelling (the drive now has indication that these LBAs have been used, while originally they hadn't). Meaning: every time you write to an LBA on the drive, no matter if you're writing zeros or not, you're affecting the FTL.
No byte-by-byte or LBA-to-LBA or disk-to-disk copy program I have seen on the market yet issues TRIM commands on the unused portions of an SSD, which means you end up having to rely 100% on the GC to "do the right thing", and as I've already stated, GC can sometimes stall I/O transactions on a drive for 10+ seconds (why? too many factors/reasons to list here), which on some controllers can result in the drive showing I/O timeouts or falling off the bus. This falls under my "treat your hardware with respect" clause -- you can't treat SSDs exactly the same as MHDDs in this regard. (Well, you CAN, but the effects of such can be devastating). -- Making life hard for others since 1977. I speak for myself and not my employer/affiliates of my employer. |
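The arithmetic behind the two examples in the post above (a 128GB source with 40GB in use, and the assumed 256GB destination) can be checked with a small Python sketch. The function name and dictionary keys are made up for illustration, and a file-by-file copy is approximated as writing exactly the used space, which ignores filesystem metadata.

```python
def clone_write_summary(source_gb, used_gb, dest_gb):
    """Compare a whole-disk (LBA-by-LBA) copy with a file-by-file copy.

    All sizes are in GB. Returns how much each method writes to the
    destination and how much of the destination the disk copy never touches.
    """
    if not 0 <= used_gb <= source_gb <= dest_gb:
        raise ValueError("expected 0 <= used_gb <= source_gb <= dest_gb")
    return {
        "disk_copy_written_gb": source_gb,          # every LBA gets written
        "file_copy_written_gb": used_gb,            # only live data (approx.)
        "extra_written_gb": source_gb - used_gb,    # the 'nothing' that is copied
        "dest_untouched_fraction": (dest_gb - source_gb) / dest_gb,
    }


if __name__ == "__main__":
    for key, value in clone_write_summary(128, 40, 256).items():
        print(f"{key}: {value}")
```

With these inputs the disk copy writes 88GB that a file-by-file copy would skip, and half of the 256GB destination receives no writes at all.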
|
 pandoraPremium join:2001-06-01 Outland kudos:1
| Thanks for your helpful description of the problem with a nearly full SSD drive.
If it helps, I'd add that Windows can expand or contract a logical drive (the C: drive for example).
My copy was byte for byte of 128 GB, but after the copy, I enlarged the Windows partition (C: drive) to occupy the entire space.
Performance was pretty good after the migration. I could have reduced the C: partition to the minimal size necessary, but overall, the 128 GB copy was quick and painless. All settings, saved passwords, email, even the activation were preserved. -- "If you put the federal government in charge of the Sahara Desert, in 5 years there'd be a shortage of sand." - Milton Friedman" |
|
 | reply to pandora In this particular case (128 GB drive) the utilization is below 25%, so it's nowhere near being full. Interestingly, other SSDs (64 & 256 GB) from the same manufacturer do not exhibit the issue seen with some of the 128 gig drives. Their latest request is to re-format them before testing.... I kinda doubt that it would make any difference, but we'll see. -- Wacky Races 2012! |
|
 koitsuPremium,MVM join:2002-07-16 Mountain View, CA kudos:19 | If by "reformat" they mean issue a Secure Erase, I can possibly understand it, but otherwise that would be my reaction too. |
|
 | I did ask them if they have a program for that, or something they would recommend, but no answer as yet. It doesn't matter much ATM since I won't be back at work until Thursday at the earliest. -- Wacky Races 2012! |
|
 | reply to koitsu
Still no reply on the format, but here are the SMART attributes from an SSD that had one failure recently, after 2000+ tries. -- Wacky Races 2012! |
|