RAID-5 Array Not Responding Due to One Dropped Drive

An interesting thing happened on the way to backing up the VM server this weekend. And by interesting I mean mindbendingly horrible.

So I’m pulling down the array for a backup. Not that odd in and of itself. And then I start getting spurious IRQs (something’s spamming the crap out of the line) that knock one of the HDDs out of the array and freeze the machine.

OKAY. That’s why we have a RAID-5 array in the first place.

When I bring the machine back up it chokes with some errors during RAID initialization. Apparently it doesn’t think there are enough drives left in the array to bring up the primary /dev/md1 array that holds all our precious data. Excellent, I love spending a weekend de-mucking dead servers :/.

Naturally I don’t want to compound issues, so I pull a backup of each of the bloody 300+ GB drives to a recently verified good 640GB backup drive (fresh off its third RMA, for hardware incompatibility rather than mechanical flaws). This takes roughly a day, but it’s worth it if the drives die in the middle of a recovery effort.
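For the curious, the imaging step can be sketched roughly like this. It only prints the `dd` commands rather than running them, and the device names and the `/mnt/backup` mount point are placeholders, not the actual layout of this box.

```shell
#!/bin/sh
# Sketch: image each array member to a backup disk before touching the array.
# Device names and the destination directory are assumptions for illustration.

# image_cmd DEVICE DEST_DIR: print the dd command that would image DEVICE.
# conv=noerror,sync keeps reading past bad sectors and pads failed reads
# with zeros so offsets inside the image stay aligned with the source.
image_cmd() {
    dev=$1
    dest=$2
    name=$(basename "$dev")
    printf 'dd if=%s of=%s/%s.img bs=1M conv=noerror,sync\n' "$dev" "$dest" "$name"
}

# image_all DEST_DIR DEVICE...: print one dd command per member device.
image_all() {
    dest=$1
    shift
    for dev in "$@"; do
        image_cmd "$dev" "$dest"
    done
}

# Example (prints the commands; review them, then pipe to `sh` as root):
#   image_all /mnt/backup /dev/sda /dev/sdb /dev/sdc
```

Printing first is deliberate: swapping `if=` and `of=` on a box in this state is exactly the kind of mistake worth a second look. One caveat with `noerror,sync` is that a failed read pads out the whole block, so a smaller `bs` loses less data around a bad sector at the cost of speed.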

I’ve got everything pretty straight data-wise, no real fear of doing worse damage at this point. Cracking open mdadm to run an --examine on the array members reveals something a bit weird though. The drives, two of them anyway, show that they’re still okay…
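If you want to run the same check, here is a rough sketch that loops `mdadm --examine` over each member and keeps only the state, event-count and update-time lines. The partition names in the usage comment are placeholders; the real ones will differ per machine.

```shell
#!/bin/sh
# Sketch: compare RAID superblocks across member partitions.
# A member whose event count lags the others, or whose state lines
# disagree with them, is the likely odd one out.

# examine_members DEVICE...: print the interesting superblock fields
# for each listed member.
examine_members() {
    for dev in "$@"; do
        echo "== $dev =="
        # Keep only the lines mentioning State, Events or Update Time.
        mdadm --examine "$dev" | grep -E 'State|Events|Update Time'
    done
}

# Example (hypothetical partitions, run as root):
#   examine_members /dev/sda2 /dev/sdb2 /dev/sdc2
```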

Turns out that the system hit the third drive in the array first, saw that it reported a failure of the entire array, and went no further. The other two drives in the array report as working fine, and they are. I did an --assemble sans funky drive and the array came right up for me to pull a quick backup.
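The assemble step looks something like the sketch below. `/dev/md1` is the array from this post; the two member partitions are stand-ins for whichever drives still check out. Setting `DRY_RUN=1` just echoes the command instead of running it.

```shell
#!/bin/sh
# Sketch: bring up a RAID-5 array from its surviving members only.

# assemble_degraded ARRAY MEMBER...: assemble ARRAY from the listed members.
# --run asks mdadm to start the array even though a member is missing.
# With DRY_RUN set to a non-empty value the command is printed, not executed.
assemble_degraded() {
    array=$1
    shift
    ${DRY_RUN:+echo} mdadm --assemble --run "$array" "$@"
}

# Example (hypothetical members, leaving out the suspect third drive):
#   DRY_RUN=1 assemble_degraded /dev/md1 /dev/sda2 /dev/sdb2
```

Once the array is up it is running without redundancy, so mounting it read-only while the backup runs is a sensible precaution.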

Now I’ll just re-add the “dead” drive to the array and have it re-build once the backup is finished.
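Putting the drive back is a one-liner in mdadm's manage mode; a hedged sketch follows. The member name is a placeholder, and this uses `--add`, which triggers a full rebuild onto the drive. mdadm also has `--re-add`, which can get away with a shorter resync when the array has a write-intent bitmap, but whether that applies here is a guess.

```shell
#!/bin/sh
# Sketch: return a dropped member to a running degraded array and
# check on the rebuild.

# add_member ARRAY MEMBER: add MEMBER back to ARRAY.
# With DRY_RUN set to a non-empty value the command is printed, not executed.
add_member() {
    array=$1
    member=$2
    ${DRY_RUN:+echo} mdadm "$array" --add "$member"
}

# rebuild_progress [FILE]: show recovery/resync progress lines.
# Reads /proc/mdstat unless another file is given.
rebuild_progress() {
    grep -E 'recovery|resync' "${1:-/proc/mdstat}"
}

# Example (hypothetical member partition, run as root):
#   add_member /dev/md1 /dev/sdc2
#   rebuild_progress
```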

So, if you’re staring at an array that won’t come up, take a closer look at the mdadm output to make sure it isn’t just hanging on a single debilitated drive. Although I’ve never seen this happen before, rebuilding a single drive sure beats restoring from backup media.

Tuning in to Other People’s Music

Last.fm looks like a fairly interesting service. Check out what other people are listening to. If it’s in the catalog, listen to it too!

Plus it looks like there’s integration in Amarok. That’s what I use to listen to audio these days, what with its exhaustive feature set, so that’s pretty important.

In addition to checking out Last.fm I’m also getting PulseAudio going. It looks like yet another audio daemon, so I’m curious if it’s better than the rest of the (somewhat disappointing) pack.

NAS (Network Audio System) always looked good and may yet be the audio layer of choice, but it’s a bit under the radar at the moment. Methodical is how I’d describe those Xorg folks.

And hey, whaddya know. It’s installed. Time to go play with it :).

New CentOS 5.0 Xen Server Live

Our new virtual server is live! Pound’s handling the connections and routing to the appropriate destination, Xen does the VM’ing, and Apache’s still working its server mojo.

All this being served up off a beefy (and more efficient) Red Hat Linux machine. Now I just have to get the processor power controls up and we’ll be all set.