It’s (still) alive!
Posted by drow on the 12th of August, 2007 at 6:49 pm under tech.
The gloom from my last entry has more or less lifted. To recap: my desktop died shortly after a lightning storm. There’s no definite cause-and-effect relationship, but there is a strong correlation. I shut it down, and on the next boot I got scary errors all over the place, starting with this disclaimer:
HARDWARE ERROR. This is *NOT* a software problem! Please contact your hardware vendor
Now, that may be true, or it may not be. Certainly the disclaimer is sensible, since machine checks are usually caused by failing components. The error persisted for most of a day every time I tried to boot the machine, but there was soon a new addition: different scary errors on ata1. And things got worse quickly. On Friday morning, having let the machine cool and relax overnight, I brought it up from a rescue CD and could get at /boot (a separate partition), but not at /etc (inside my RAID/LVM setup). I went out, bought a new SATA drive, and tried to recover /boot onto it, but I had waited too long: I could still read the partition table, but not the contents of /boot.
Some things to note here.
- As far as I know, you still need a separate /boot if your root uses RAID + LVM.
- Therefore your /boot is obviously not backed up by your carefully constructed RAID.
- But that’s fine because all that really mattered was your kernels, and you have them still lying around on your mirrored disk. Right?
- Of course, right. But dynamically generated initrds, while technically awesome, are awkward to recreate from a rescue CD. Especially a 32-bit rescue CD when your root filesystem is all 64-bit.
- It can’t be hard to find a rescue CD with a 64-bit kernel, right? No, not right either. I did eventually find a 50MB Gentoo install CD that worked serviceably well.
- Your laptop has a convenient CD burner, right? No, not right; it turns out it doesn’t write CDs after all. Oh well, Windows file sharing plus the wireless plus the Mac Mini in the other room… there you go.
- You didn’t need anything else from /boot, right? Like your carefully constructed voodoo incantation to boot Windows… whoops… hopefully you remember how that worked…
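Why the 32-bit rescue CD is a dead end: a 32-bit x86 kernel cannot execute 64-bit binaries, so you can’t chroot into a 64-bit root and let its own tools rebuild the initrds. Here is a minimal Python sketch of a pre-flight check, assuming an x86 machine and a recovered root mounted at a hypothetical /mnt. It reads the ELF class byte (offset 4 of the ELF header: 1 means 32-bit, 2 means 64-bit) and compares it with what the running kernel reports through platform.machine(). The helper names are made up for illustration.

```python
import platform

ELF_MAGIC = b"\x7fELF"
EI_CLASS = 4                   # offset of the class byte within e_ident
ELF_CLASSES = {1: 32, 2: 64}   # ELFCLASS32, ELFCLASS64


def elf_bits_from_header(header: bytes):
    """Return 32 or 64 for an ELF header, or None if it is not ELF."""
    if len(header) <= EI_CLASS or header[:4] != ELF_MAGIC:
        return None
    return ELF_CLASSES.get(header[EI_CLASS])


def elf_bits(path: str):
    """Read just enough of a file to tell ELF32 from ELF64."""
    with open(path, "rb") as f:
        return elf_bits_from_header(f.read(EI_CLASS + 1))


def kernel_bits(machine: str = "") -> int:
    """Assumes x86: 'x86_64' (or 'amd64') is a 64-bit kernel, anything else 32-bit."""
    machine = machine or platform.machine()
    return 64 if machine in ("x86_64", "amd64") else 32


def can_chroot(target_bits: int, machine: str = "") -> bool:
    """A 32-bit kernel can't run 64-bit binaries. The reverse normally works,
    assuming the 64-bit kernel was built with IA32 emulation."""
    return target_bits <= kernel_bits(machine)
```

From the rescue system, something like `can_chroot(elf_bits("/mnt/bin/ls"))` comes back False on a 32-bit kernel. Point it at a regular file rather than a symlink, since an absolute symlink would resolve against the rescue CD instead of the mounted root.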
So, after some futzing around, I was able to boot the Gentoo rescue CD, partition the new hard drive, bring up the array on the remaining disk in degraded mode, reinstall the handy Debian packages of my kernels (which magically recreated the ramdisks), and then reinstall grub. By the way, --no-floppy obviously means something different to grub than it does to me: most of the grub reinstall was still spent waiting for my non-existent fd0 to time out. Then I added the new disk to the array, let it reconstruct for two hours, and rebooted.
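For reference, here is one plausible command sequence for that recovery, written as a Python dry run that only builds command strings and executes nothing. It is a sketch, not a transcript of the actual session: every device name, the /dev/vg0/root volume, the /mnt mount point and the kernel package name are hypothetical placeholders. It assumes a two-disk RAID1 mirror with LVM on top, a plain ext3 /boot on the new drive, and that the kernel .debs are reachable by apt-get inside the chroot. Recreating grub’s menu.lst is left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Layout:
    """Hypothetical disk layout; adjust every field for a real machine."""
    new_disk: str = "/dev/sda"       # blank replacement drive
    array: str = "/dev/md0"          # RAID1 device holding the LVM physical volume
    old_member: str = "/dev/sdb2"    # RAID member on the surviving disk
    new_member: str = "/dev/sda2"    # matching RAID partition on the new drive
    new_boot: str = "/dev/sda1"      # fresh, non-RAID /boot partition
    root_lv: str = "/dev/vg0/root"   # logical volume holding /
    target: str = "/mnt"             # where the rescue system mounts the root


def recovery_plan(layout: Layout, kernel_pkgs: list[str]) -> list[str]:
    """Return the recovery steps as shell command strings, in order (dry run)."""
    t = layout.target
    plan = [
        # Interactive: create the /boot and RAID partitions on the new drive.
        f"fdisk {layout.new_disk}",
        f"mkfs.ext3 {layout.new_boot}",
        # Start the mirror from the surviving member only (degraded mode).
        f"mdadm --assemble --run {layout.array} {layout.old_member}",
        "vgchange -ay",  # activate the LVM volume groups found on the array
        f"mount {layout.root_lv} {t}",
        f"mount {layout.new_boot} {t}/boot",
    ]
    # Give the chroot the kernel interfaces that package scripts expect.
    plan += [f"mount --bind /{d} {t}/{d}" for d in ("dev", "proc", "sys")]
    # On Debian, reinstalling a kernel package reruns its maintainer scripts,
    # which regenerate the initrd from inside the 64-bit chroot.
    plan += [f"chroot {t} apt-get install --reinstall {p}" for p in kernel_pkgs]
    plan += [
        f"chroot {t} grub-install --no-floppy {layout.new_disk}",
        # Adding the new member kicks off the resync onto the new drive.
        f"mdadm {layout.array} --add {layout.new_member}",
    ]
    return plan
```

Printing `recovery_plan(Layout(), ["linux-image-example"])` line by line gives a checklist to walk through by hand, which is safer than piping it into a shell when one typo in a device name can wipe the surviving disk.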
Lo and behold… no more machine checks! I am a happy camper. Despite the tribulations above, I came out remarkably well, since I seem to have lost nothing but my grub menu.lst file. And now I have rdiff-backup configured to back up /boot onto the array in case I ever need to do this again.
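The /boot backup boils down to one rdiff-backup call, which keeps a plain mirror of the source plus reverse increments, so older versions of a file can still be restored. The Python sketch below just builds and runs the classic `rdiff-backup SOURCE DEST` style command lines. The destination /srv/backup/boot stands in for some directory on the array and is hypothetical, as are the helper names. Newer rdiff-backup releases (2.2 and later) prefer an action-word syntax such as `rdiff-backup backup SOURCE DEST`, so check the man page of the installed version.

```python
import subprocess

BOOT = "/boot"
DEST = "/srv/backup/boot"  # hypothetical directory on the RAID/LVM array


def backup_argv(source: str = BOOT, dest: str = DEST) -> list:
    """Mirror source into dest, keeping reverse increments of changed files."""
    return ["rdiff-backup", source, dest]


def restore_argv(dest: str, restore_to: str, when: str = "now") -> list:
    """Restore the backup as it was at `when` into restore_to.

    rdiff-backup refuses to overwrite an existing restore_to without --force.
    """
    return ["rdiff-backup", "--restore-as-of", when, dest, restore_to]


def prune_argv(dest: str, older_than: str = "4W") -> list:
    """Drop increments older than the given age (4W = four weeks)."""
    return ["rdiff-backup", "--remove-older-than", older_than, dest]


def run(argv: list) -> None:
    """Execute one command, raising CalledProcessError if it fails."""
    subprocess.run(argv, check=True)
```

Calling `run(backup_argv())` from a nightly cron job keeps the copy current; after the next disk failure, `run(restore_argv(DEST, "/tmp/boot-restore"))` from a rescue environment (assuming it has rdiff-backup and can see the array) brings back the kernels, the initrds and menu.lst.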