The client organization is a large global manufacturing company based in California, with a presence in over 50 countries. Its subsidiary in India had deployed a NetApp Fabric-Attached Storage system (FAS 2020) as Network Attached Storage (NAS).
NetApp® FAS systems are widely used because they combine performance and flexibility with technologies built for storage efficiency. They simplify data management and can be adapted quickly to storage needs across flash, disk, and cloud.
The FAS2000 Series NAS in this case was an old setup running the ONTAP® 7.3.7 (Data ONTAP 7G) operating system, with a total of 12 WD® SATA HDDs configured using NetApp® RAID-DP® technology.
RAID-DP is a standard feature of the Data ONTAP operating system. It is a double-parity RAID 6 implementation that prevents data loss when up to two drives in a RAID group fail.
Each of the 12 HDDs in the RAID, including the one spare hard drive, had 1 TB of storage capacity. Of the total 12 TB ‘aggregate’ capacity, approximately 6.23 TB was used to store company data along with critical information belonging to one of its key customers.
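The capacity figures above can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes that all non-spare disks sit in a single RAID-DP group with two parity disks (the case description does not state the group layout), and it ignores disk right-sizing and file system reserves, so the data figure is an upper bound rather than the real usable space. The function name `capacity_summary` is hypothetical.

```python
# Figures taken from the case description.
TOTAL_DISKS = 12
DISK_SIZE_TB = 1.0
SPARE_DISKS = 1
USED_TB = 6.23

# Assumption (not stated in the source): every non-spare disk belongs to one
# RAID-DP group, which dedicates two disks to parity.
PARITY_DISKS = 2


def capacity_summary(total_disks=TOTAL_DISKS, disk_size_tb=DISK_SIZE_TB,
                     spare_disks=SPARE_DISKS, parity_disks=PARITY_DISKS):
    """Return raw capacity and an upper bound on data capacity, in TB."""
    data_disks = total_disks - spare_disks - parity_disks
    if data_disks < 1:
        raise ValueError("not enough disks left for data")
    return {
        "raw_tb": total_disks * disk_size_tb,
        "data_disks": data_disks,
        # Upper bound only: right-sizing and reserves are ignored.
        "data_tb_upper_bound": data_disks * disk_size_tb,
    }


if __name__ == "__main__":
    summary = capacity_summary()
    print(summary)
    print(f"Used share of raw capacity: {USED_TB / summary['raw_tb']:.1%}")
```

Under these assumptions, the 12 TB quoted in the case is the raw capacity of all 12 disks, and the 6.23 TB of stored data fits within the nine disks left for data.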
One of the hard drives in the RAID failed suddenly, which put the setup into a degraded state. To address this issue, the storage administrator manually removed the failed disk and replaced it with the spare hard drive.
As a result, the RAID was apparently restored to its normal functional state. However, about two days after this initial fix for the disk failure and performance degradation, the RAID hit a major problem: it went into a series of reboot cycles that repeated every 10 minutes.
This was likely due to the failure of multiple hard drives: RAID-DP, with its double parity, can withstand the failure of up to two drives without data loss, but it fails and goes offline if more than two drives fail.
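The fault-tolerance rule described above can be written down as a small decision function. This is a minimal sketch of the rule as stated in this case study, not of how Data ONTAP tracks disk state. The function name `raid_dp_group_state` and the state labels are illustrative and are not ONTAP status strings.

```python
def raid_dp_group_state(failed_disks: int, parity_disks: int = 2) -> str:
    """Classify a double-parity RAID group by its number of failed disks.

    Illustrative labels only:
      "normal"   - no failed disks
      "degraded" - failures are still covered by parity, data is intact
      "failed"   - more failures than parity can cover, group goes offline
    """
    if failed_disks < 0:
        raise ValueError("failed_disks must be non-negative")
    if failed_disks == 0:
        return "normal"
    if failed_disks <= parity_disks:
        return "degraded"
    return "failed"


if __name__ == "__main__":
    for n in range(4):
        print(n, raid_dp_group_state(n))
```

With the default of two parity disks, one or two failed drives leave the group degraded but readable, while a third failure takes it offline, which matches the behavior suspected in this incident.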
Reportedly, the server's uptime between these reboot cycles was too short for the RAID to resync, so each reboot resulted in an ‘unclean’ shutdown. After about 24 hours of these reboot cycles, the RAID went into a permanently degraded (failed) state. In addition, the single aggregate volume in the RAID setup went into the “WAFL inconsistent” state, meaning that the root aggregate had been marked corrupt because of file system corruption.
The RAID failure made the NetApp storage inaccessible, with a risk of losing critically important data unless the RAID could be reconstructed to allow data recovery operations.
Recover data from the crashed NetApp NAS server in the ‘WAFL inconsistent’ state.
Data Recovery Challenges
Data Recovery Approach
Stellar® constituted a dedicated team of data care experts to execute this NAS data recovery project. This team employed the following steps:
Stellar data care experts successfully recovered the required data from the NetApp Fabric-Attached Storage (FAS 2020). The entire data recovery project, from job intake and assignment to execution and final closure, was completed within the committed time.
The data was recovered intact in its original form, with 100% integrity, as verified by the client organization. The quick turnaround and quality of service helped the client organization recover from the downtime and resume normal business operations.
“Our Net App Server was creating WAFL Inconsistent error. It’s containing important data and we were not able to access that data. It was data lost situation and we contacted stellar to recover data. The complete data recovery process was very transparent and the team is very professional. We got 100% data from the server within estimated time. I recommend stellar for any data recovery service.”