The client is a leading enterprise healthcare management solutions provider. It offers solutions to physicians, diagnostic testing facilities, and healthcare systems in the private, public, government, and not-for-profit sectors. The organization used a Dell® PowerEdge R710 storage server with a 6 TB volume built on four Serial Attached SCSI (SAS) hard disk drives (HDDs) of 2 TB each. The volume was configured as RAID Z on the ZFS file system.
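The 6 TB figure is consistent with four 2 TB drives in a single-parity RAID Z group, where roughly one drive's worth of space holds parity. The sketch below works through that arithmetic. The function name is hypothetical, single parity is an assumption inferred from the stated capacities, and ZFS metadata and padding overhead are ignored.

```python
def raidz_usable_tb(disk_count: int, disk_size_tb: float, parity_disks: int = 1) -> float:
    """Approximate usable capacity of one RAID Z group, in TB.

    Assumes equal-sized disks and ignores metadata and padding overhead.
    """
    if disk_count <= parity_disks:
        raise ValueError("a RAID Z group needs more disks than parity devices")
    return (disk_count - parity_disks) * disk_size_tb


# Four 2 TB drives with single parity (assumed), as in this case.
print(raidz_usable_tb(4, 2.0))  # 6.0
```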
The server was hosting crucial data of the organization’s customers, including the employees’ details, patients’ history, health reports, accounting data, images, and other business-critical documents.
Recently, when the client tried to access the data stored on the ZFS volume, it received a file system corruption error message.
The client attempted the following to fix the problem:
- Rebooted the server to clear the file system error, but the error persisted.
- Tried importing the ZFS storage volume pool, but the volume showed up as empty.
Fearing permanent data loss, the organization contacted Stellar® to restore the lost data from the RAID Z drives of the Dell® R710 server.
What made this RAID data recovery case rare and complex?
This was a critical data loss case: the RAID Z volume to be recovered operated in a sophisticated, multi-layered environment.
The client was using the latest Community Edition of NexentaStor, a UNIX-like computer operating system that creates storage virtualization pools consisting of multiple hard disk drives. The OS was integrated and optimized for the VMware infrastructure environment.
The NexentaStor OS uses ZFS, an advanced file system and RAID volume manager that provides pooled storage. ZFS eliminates the volume concept of traditional file systems. Instead, it employs a common storage pool, called a ZFS pool, made up of virtual devices (VDEVs), where each virtual device may contain multiple disks.
Figure: RAID Z Structure on ZFS File System
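To illustrate why a single-parity RAID Z group tolerates the loss of one disk, the following minimal sketch computes an XOR parity block over equal-length data blocks and rebuilds a missing block from the survivors. It is a simplification: real RAID Z uses variable-width stripes and checksummed blocks, and the function name and sample data are hypothetical. In this case the drives themselves were healthy, so the example explains the layout rather than the recovery.

```python
from functools import reduce


def xor_parity(blocks):
    """XOR equal-length byte blocks together, column by column."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must be non-empty and of equal length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks plus one parity block, mirroring a 4-disk single-parity layout.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(data)

# If the second block is lost, XOR of the survivors and the parity restores it.
rebuilt = xor_parity([data[0], data[2], parity])
print(rebuilt == data[1])  # True
```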
The client had lost the data from this ZFS storage pool volume due to file system corruption.
The VMware® ESXi hypervisor (software that runs virtual machines) was installed on this ZFS pool, which further complicated the RAID data recovery process. The client's data was stored in the ESXi volume, which held multiple Virtual Hard Disk (VHDx) files. VHDx is a virtual hard disk file format that stores files and folders the way physical storage does.
The Stellar® team successfully recovered 100% of the data from the corrupted ZFS pool volume. This was a remarkable remote ZFS pool recovery case, a first in India.
Recover data from the inaccessible, corrupt ZFS pool volume that showed the file system error message ‘Metadata Failed while Importing ZFS Pool’.
Data Recovery Challenges
- Repair corrupt ZFS pool volume
- Extract readable data from ESXi volume containing multiple VHDx files
Remote Data Recovery Approach
Stellar® assigned a specialized team of RAID experts to execute this project. The team employed the following steps to recover ZFS pool volume data:
Step 1: Hard Disk Examination & Diagnosis
The Stellar® RAID data expert team examined the RAID Z hard disk drives online. To facilitate the online analysis, the Stellar® data recovery technician emailed the client organization a link to a secure remote access application that connects the client system to Stellar's server. The client opened the link on its computer to establish the remote connection.
After analyzing the four RAID HDDs, the team found that the drives were healthy, but the ZFS storage pool volume was corrupt and had broken inode links.
Note: An inode is a numbered data structure in a file system (ZFS calls its equivalent structure a dnode). It describes a file-system object and stores the object's attributes (such as metadata, owner, and permission data) and the disk block locations of the object's data.
The team concluded that the ZFS volume had suffered logical corruption, which made the data inaccessible.
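As a rough illustration of the note above, the sketch below models an inode as a record of attributes plus block locations, and flags directory entries whose inode link is broken. It is a deliberately simplified model, not the ZFS on-disk format; the class, function, and field names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Inode:
    number: int                      # inode number
    owner: str                       # owning user
    mode: int                        # permission bits, e.g. 0o644
    size: int                        # object size in bytes
    block_ptrs: List[int] = field(default_factory=list)  # disk block locations


def find_broken_links(directory: Dict[str, int],
                      inodes: Dict[int, Inode],
                      total_blocks: int) -> List[str]:
    """Return names whose inode is missing or points outside the volume."""
    broken = []
    for name, inode_number in directory.items():
        node = inodes.get(inode_number)
        if node is None:
            broken.append(name)      # entry references a non-existent inode
        elif any(not 0 <= ptr < total_blocks for ptr in node.block_ptrs):
            broken.append(name)      # block pointer lies outside the volume
    return broken


inodes = {1: Inode(1, "root", 0o644, 4096, [10, 11])}
directory = {"report.pdf": 1, "scan.img": 2}
print(find_broken_links(directory, inodes, total_blocks=100))  # ['scan.img']
```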
Step 2: Disk Imaging
The next step was to image the corrupt ZFS volume to facilitate the data recovery process. The team created the image with a hex editor tool.
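The team's imaging was done with a hex editor. Purely as an illustration of what imaging involves, the sketch below copies a source device or file to an image in fixed-size chunks and computes a SHA-256 digest so the copy can be verified later. This is a swapped-in technique, not the team's actual tooling; the function name, paths, and chunk size are assumptions.

```python
import hashlib
import os
import tempfile


def image_device(src_path, dst_path, chunk_size=1024 * 1024):
    """Copy src_path to dst_path chunk by chunk.

    Returns (bytes_copied, sha256_hex) so the image can be verified.
    """
    digest = hashlib.sha256()
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            digest.update(chunk)
            copied += len(chunk)
    return copied, digest.hexdigest()


# Demo on a small temporary file standing in for a disk.
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "source.bin")
    image = os.path.join(tmp, "image.bin")
    with open(source, "wb") as f:
        f.write(os.urandom(3 * 1024 * 1024 + 17))
    size, sha256_hex = image_device(source, image)
    print(size, sha256_hex)
```

Working on an image rather than the original drives means a failed repair attempt cannot make the source data worse.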
Step 3: Rebuild ZFS Pool
Data could not be recovered directly from the corrupt volume, so the next step was to repair the ZFS pool. This was a tedious task, as the team had to rebuild the broken inodes of the ZFS pool volume manually. It took about two weeks to reconstruct the ZFS volume and restore it to a functional state.
Step 4: Deep Scan and ZFS Data Recovery
After reconstructing the ZFS pool volume, the team performed a deep scan of it.
A byte-wise scan of the ZFS volume image file revealed the presence of the NexentaStor 4.0.3 operating system.
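A byte-wise scan of an image can be sketched as a search for a known byte signature, reading the file in chunks and carrying a small overlap so matches that straddle a chunk boundary are not missed. The signature used here (`b"NexentaStor"`) and the function name are hypothetical; the source does not state which markers the team actually matched.

```python
import os
import tempfile


def find_signature(path, signature, chunk_size=1024 * 1024):
    """Return every file offset at which `signature` occurs in `path`."""
    if not signature:
        raise ValueError("signature must not be empty")
    overlap = len(signature) - 1
    offsets = []
    tail = b""   # last bytes of the previous buffer, kept for boundary matches
    base = 0     # file offset of the first byte of the current buffer
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buf = tail + chunk
            start = 0
            while True:
                idx = buf.find(signature, start)
                if idx == -1:
                    break
                offsets.append(base + idx)
                start = idx + 1
            tail = buf[-overlap:] if overlap else b""
            base += len(buf) - len(tail)
    return offsets


# Demo on a small temporary file standing in for the volume image.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "volume.img")
    with open(sample, "wb") as f:
        f.write(b"\x00" * 10 + b"NexentaStor" + b"\x00" * 20 + b"NexentaStor")
    hits = find_signature(sample, b"NexentaStor")
    print(hits)  # [10, 41]
```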
The scan results showed a VMware ESXi hypervisor volume of 5.5 TB capacity containing 25 VHDx files. The data the client needed was inside these VHDx files.
The Stellar® RAID data expert team was able to extract approximately 12.2 TB of data from the highly compressed VHDx files. The data, comprising SQL files, PDFs, Word documents, Excel and PowerPoint files, images, videos, etc., was recovered in readable format with the original file names.
The Stellar® RAID data expert team recovered 100% of the data from the ZFS-based RAID Z deployed on the Dell® R710 server. It restored the files to their original form and shared a detailed assessment and recovery report with the client organization. The team completed the entire RAID Z data recovery assignment within the specified time. Stellar®, an ISO 9001 and ISO 27001 certified organization, strictly adhered to ISO process standards to ensure the confidentiality, security, and privacy of the client's data.