With a new innovation arriving in the storage industry almost every quarter, it is no surprise that new storage components are being adopted rapidly in data centre storage systems. Before such a transition takes place, however, the new components (especially SSDs) need to be tested to understand their behaviour and their common reliability issues under various circumstances.
Recently, researchers at Ohio State University and HP Labs conducted tests to examine reliability issues in block-level storage devices. They set up a testing unit comprising specially designed hardware to inject power faults directly into devices, software to stress the devices after each fault, and techniques to identify the various failure modes.
The tests were performed on fifteen commodity SSDs. When power faults were injected into these drives, thirteen of them showed surprising results. The affected SSDs exhibited unexpected behaviour that manifested as five different types of failure: bit corruption, shorn writes, metadata corruption, unserializable writes, and total failure.
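To make two of these failure types concrete, here is a minimal sketch of how a test harness might tell a shorn write apart from bit corruption after a power fault. It is not the researchers' tool: the 4096-byte block, the 512-byte sector granularity, and the names make_block and classify are all assumptions made for illustration. The idea is simply that a shorn write leaves a block that is part old data and part new data, whereas bit corruption leaves bytes that match neither.

```python
BLOCK_SIZE = 4096   # assumed block size for this sketch
SECTOR_SIZE = 512   # assumed granularity at which a write can be "shorn"


def make_block(fill: int) -> bytes:
    """Build a block filled with one byte value (hypothetical test pattern)."""
    return bytes([fill]) * BLOCK_SIZE


def classify(read_back: bytes, old: bytes, new: bytes) -> str:
    """Compare what was read after the fault with the old and new contents."""
    if read_back == new:
        return "new data persisted"
    if read_back == old:
        return "write lost"
    # Shorn write: every sector is intact, but some are old and some are new.
    offsets = range(0, BLOCK_SIZE, SECTOR_SIZE)
    if all(
        read_back[i:i + SECTOR_SIZE] in (old[i:i + SECTOR_SIZE], new[i:i + SECTOR_SIZE])
        for i in offsets
    ):
        return "shorn write"
    # Anything else contains bytes that were never written by the test.
    return "bit corruption"
```

In a real test the old and new contents would come from the harness's own log of issued writes, and the block would be read back from the device once power is restored.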
The results also showed some data loss on every device that encountered a failure. Massive corruption was observed on two SSDs: one failed to register on the SAS bus, while the other had one-third of its blocks severely corrupted after eight fault-injection cycles.
Enthusiasts who claim that SSDs will soon take over the enterprise overlook the reliability challenges these flash devices pose, challenges that have yet to be addressed. The experiments by the researchers from Ohio State University and HP Labs brought to light several aspects of the behaviour of flash storage, which might replace the spinning disks used in data centres in the foreseeable future.
The failures observed with SSDs have implications for anyone designing storage systems. To work around bit corruption and shorn writes, one would need to take particular care with in-place updates to sensitive and highly critical data. Serialization errors are another concern for systems that depend heavily on the order of operations to maintain data integrity. The study should prompt organizations to test their SSD models before deploying them in data centres.
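As an illustration of the ordering concern, the following sketch checks whether the writes that survived a power fault are consistent with the order in which they were issued. It assumes, purely for the example, that the writes were issued one at a time and that each was acknowledged before the next began, so a later write should never survive when an earlier one was lost. The function name lost_before_persisted is hypothetical.

```python
def lost_before_persisted(persisted: list[bool]) -> list[int]:
    """Return indexes of writes that were lost although a later write survived.

    persisted[i] is True if the i-th write (in issue order) was found on the
    device after the fault. An empty result means the surviving writes form a
    prefix of the issue order, which is what an order-dependent system expects.
    """
    # Index of the last write that made it to the device, or -1 if none did.
    last_survivor = max((i for i, ok in enumerate(persisted) if ok), default=-1)
    return [i for i in range(last_survivor) if not persisted[i]]
```

For example, if the third of five writes is missing while the fourth is present, the function reports index 2, which is the kind of result a system relying on write ordering would need to detect before trusting the device.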