The paper “A Study of SSD Reliability in Large Scale Enterprise Storage Deployments” is from Feb 2020, but it is interesting reading for anyone wondering how well SSDs are doing in production. I have been retired for a while, so a bunch of the jargon is over my head, but these points stood out to me:
- None of the SSDs ever got close to their write limit.
- RAID does not cause excessive write wear.
- SSDs have spare blocks and will use them as replacements for bad blocks, but none of the SSDs used up these spare blocks.
- If one SSD in a RAID has problems, there is a significant chance that another SSD in the same RAID will have problems within a day or a week. Note that this could be due to controller or power problems or whatever.
- Some SSDs seem to have more problems than others, but it is not clear what the exact problems are. The largest SSD sizes have more problems, but I’m not sure whether the number of problems per terabyte would be higher.
Please note that the paper covers over one million SSDs, and their usage would not match your usage, so these points may not matter to you.
The average replacement rate is 1.2% per year. So keep doing those backups.
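To get a feel for what a 1.2% yearly replacement rate means for a small setup, here is a rough sketch in Python. It assumes every drive fails independently at the same rate, which is a simplification: the paper's point about a second SSD in the same RAID having problems soon after the first suggests failures are correlated. The function names and the 8-drive example are made up for illustration.

```python
# Rough sketch: what a 1.2% annual replacement rate implies for N drives.
# ASSUMPTION: drives fail independently at the same rate. The paper's
# correlated-failure finding means this is only a ballpark figure.

ANNUAL_REPLACEMENT_RATE = 0.012  # 1.2% per year, the average quoted above


def expected_replacements(n_drives, annual_rate=ANNUAL_REPLACEMENT_RATE):
    """Average number of drives replaced per year across n_drives."""
    return n_drives * annual_rate


def prob_at_least_one(n_drives, annual_rate=ANNUAL_REPLACEMENT_RATE):
    """Chance that at least one of n_drives is replaced within a year."""
    return 1.0 - (1.0 - annual_rate) ** n_drives


if __name__ == "__main__":
    n = 8  # hypothetical array size, not taken from the paper
    print(f"{n} drives: expect {expected_replacements(n):.3f} replacements/year")
    print(f"{n} drives: {prob_at_least_one(n):.1%} chance of at least one")
```

Under these assumptions, a hypothetical 8-drive array has roughly a 9% chance of needing at least one replacement in a year, which is small but far from zero.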
Thank you Restic writers and community.
edit: Forgot to include the link: SSD Reliability