RAID-5 Array Not Responding Due to One Dropped Drive

An interesting thing happened on the way to backing up the VM server this weekend. And by interesting I mean mindbendingly horrible.

So I’m pulling down the array for a backup. Not that odd in and of itself. And then I start getting spurious IRQs (they’re spamming the crap out of the line) that knock one of the HDDs out of the array and freeze the machine.

OKAY. That’s why we have a RAID-5 array in the first place.

When I bring the machine back up it chokes with some errors during RAID initialization. Apparently it doesn’t think there are enough drives left in the array to bring up /dev/md1, the primary array that holds all our precious data. Excellent, I love spending a weekend de-mucking dead servers :/.

Naturally I don’t want to compound issues, so I pull a backup of each of the bloody 300+ GB drives to a recently verified good 640GB backup drive (fresh off its third RMA, for hardware incompatibility rather than mechanical flaws). This takes roughly a day, but it’s worth it if the drives die in the middle of a recovery effort.
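For anyone wanting to do the same, here is a minimal sketch of imaging each member drive to a file on the backup disk. The post doesn't say which tool was used, so plain dd is an assumption, and the device names (/dev/sda, /dev/sdb, /dev/sdc) and the /mnt/backup mount point are hypothetical. By default it only prints the commands so you can check them before running anything.

```shell
# Hypothetical backup location and device names; adjust to your system.
BACKUP_DIR="${BACKUP_DIR:-/mnt/backup}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually run them

image_drive() {
    # $1 = source block device, e.g. /dev/sda
    src="$1"
    dest="$BACKUP_DIR/$(basename "$src").img"
    # Build the dd command line once, then either print it or run it.
    set -- dd if="$src" of="$dest" bs=1M status=progress
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Image every member drive before touching the array.
for drive in /dev/sda /dev/sdb /dev/sdc; do
    image_drive "$drive"
done
```

Set DRY_RUN=0 (as root) once the printed commands look right. If a drive is throwing read errors, a dedicated rescue tool is probably a better fit than plain dd.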

I’ve got everything pretty straight data-wise, no real fear of doing worse damage at this point. Cracking open mdadm to run an --examine on the array members reveals something a bit weird, though. The drives, two of them anyway, show that they’re still okay…
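The examine step can be sketched like this. The member names (/dev/sda2 and friends) are hypothetical, and the field labels being filtered (Events, State, Array State, Update Time) are an assumption about what your mdadm version prints; they can differ between metadata versions. The block only defines helpers, it doesn't touch any devices on its own.

```shell
examine_summary() {
    # Reads `mdadm --examine` output on stdin and keeps only the status lines.
    grep -E '^[[:space:]]*(Events|State|Array State|Update Time) :'
}

examine_all() {
    # Args: the member devices to inspect. Needs root and mdadm installed.
    for dev in "$@"; do
        echo "== $dev"
        mdadm --examine "$dev" | examine_summary
    done
}

# Example call (hypothetical member names):
# examine_all /dev/sda2 /dev/sdb2 /dev/sdc2
```

Members that agree on the event count and report a sane state are the ones worth trying to assemble from; the odd one out is your suspect.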

Turns out that the system hit the third drive in the array first, saw that it reported a failure of the entire array, and went no further. The other two drives in the array report as working fine, and they are. Since a three-drive RAID-5 can run degraded on any two members, I did an --assemble sans funky drive and the array came right up for me to pull a quick backup.
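A rough sketch of that degraded assemble, assuming the two healthy members are /dev/sda2 and /dev/sdb2 (hypothetical names; only /dev/md1 comes from the actual machine). The --run flag asks mdadm to start the array even though fewer drives are present than last time. Whether it was needed in this particular case is a guess. As written, the script only prints the commands.

```shell
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually run them

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

assemble_degraded() {
    # $1 = array device, remaining args = the members that still look healthy
    array="$1"; shift
    # Clear any half-assembled, inactive array first; failure here is harmless.
    run mdadm --stop "$array" || true
    # Assemble from the good members only, leaving the suspect drive out.
    run mdadm --assemble --run "$array" "$@"
}

assemble_degraded /dev/md1 /dev/sda2 /dev/sdb2
```

Once the array is up, mounting it read-only for the backup keeps the degraded array from taking any more writes than it has to.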

Now I’ll just re-add the “dead” drive to the array and have it re-build once the backup is finished.
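Something along these lines should handle the re-add and let you keep an eye on the rebuild. /dev/sdc2 is a hypothetical name for the dropped member, and using --add is an assumption: mdadm also has a --re-add option, and which one fits depends on the state of the array. The progress helper just pulls the "recovery = N%" figure out of /proc/mdstat. Again, dry-run by default.

```shell
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually run them

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

readd_member() {
    # $1 = array device, $2 = the member that was dropped
    run mdadm "$1" --add "$2"
}

rebuild_progress() {
    # Prints the recovery percentage from an mdstat-style file, if a rebuild is running.
    sed -n 's/.*recovery *= *\([0-9.]*%\).*/\1/p' "${1:-/proc/mdstat}"
}

readd_member /dev/md1 /dev/sdc2
```

With DRY_RUN=0 and the rebuild underway, calling rebuild_progress every so often shows how far along it is.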

So, if you’re staring at an array that won’t come up, take a closer look at the mdadm output to make sure it isn’t just hanging on a single debilitated drive. Although I’ve never seen this happen before, restoring a single drive sure beats restoring off backup media.

New Data Recovery Resource

So Dan and I teamed up to put together a site about data recovery on, and using, Linux. I’ve been looking for a steady writing project to go along with PCBurn, but with a more commercial bent. PCBurn isn’t currently geared toward heavy moneymaking; it’s geared toward giving me and others a platform on which to write and showing users interesting products.

This’ll be largely the same content-wise, but with zan_d’s marketing skills attached and a URL targeted at the subject matter. All in all, it should be self-sufficient, with a wealth of data recovery/Linux information. Check it out at LinuxRecovery.org.