Posted by drow on the 12th of August, 2007 at 6:49 pm under tech.    This post has 4 comments.

The gloom from my last entry has more or less lifted. To refresh, my desktop died shortly after a lightning storm; no definite cause and effect relationship, but a strong correlation. I shut it down, and after the next reboot I got scary errors all over the place, starting with this disclaimer:

HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor

Now, that may be true, or it may not be. Certainly the disclaimer is sensible, since machine checks are usually caused by failing components. And it persisted for most of a day every time I tried to boot the machine, but there was quickly a new addition: different scary errors on ata1. And it got rapidly worse. On Friday morning, having let the machine cool and relax overnight, I brought it up on a rescue CD and could get at /boot (a separate partition), but not at /etc (from inside my RAID/LVM setup). I went out and bought a new SATA drive and attempted to recover /boot onto it, but I had waited too long; at this point I could still get at the partition table, but not the contents of /boot.

Some things to note here.

  • As far as I know, you still need a separate /boot if your root uses RAID + LVM.
  • Therefore your /boot is obviously not backed up by your carefully constructed RAID.
  • But that’s fine because all that really mattered was your kernels, and you have them still lying around on your mirrored disk. Right?
  • Of course, right. But dynamically generated initrds, while technically awesome, are awkward to recreate from a rescue CD. Especially a 32-bit rescue CD when your root filesystem is all 64-bit.
  • It can’t be hard to find a rescue CD with a 64-bit kernel, right? No, not right either. I did eventually find a 50MB Gentoo install CD that worked serviceably well.
  • Your laptop has a convenient CD burner, right? No, not right, turns out it doesn’t write CDs after all. Oh well, Windows file sharing plus the wireless plus the Mac Mini in the other room… there you go.
  • You didn’t need anything else from /boot, right? Like your carefully constructed voodoo incantation to boot Windows… whoops… hopefully you remember how that worked…

So, after some futzing around I was able to boot the Gentoo rescue CD, partition the new hard drive, mount the remaining disk in degraded mode, reinstall the handy Debian packages of my kernels, have them magically recreate the ramdisks, and then reinstall grub. By the way, --no-floppy obviously means something different to grub than it does to me, as most of the grub reinstall was still spent waiting for my nonexistent fd0 to time out. Then I added the new disk to the array, let it reconstruct for two hours, and rebooted.
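
The sequence above can be sketched roughly as the following shell script. It is a reconstruction under assumptions, not the exact commands used: every device name, the volume group, the mount point and the kernel package name are hypothetical placeholders, and the script only prints what it would do unless RUN=1 is set.

```shell
#!/bin/sh
# Dry-run sketch of the recovery sequence; prints commands unless RUN=1 is set.
# Every device, volume group and package name here is a hypothetical placeholder.
GOOD_MEMBER=/dev/sdb1                  # surviving RAID-1 member
NEW_DISK=/dev/sda                      # replacement disk
NEW_BOOT=/dev/sda1                     # new /boot partition
NEW_MEMBER=/dev/sda2                   # new RAID member partition
MD=/dev/md0                            # the mirrored array
ROOT_LV=/dev/vg0/root                  # root logical volume on top of the array
KERNEL_PKG=linux-image-2.6.18-4-amd64  # placeholder kernel package name
MNT=/mnt/root

# Execute the command when RUN=1, otherwise just echo it.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

recover() {
    run fdisk "$NEW_DISK" &&                            # partition by hand
    run mdadm --assemble --run "$MD" "$GOOD_MEMBER" &&  # start the array degraded
    run vgchange -ay &&                                 # activate LVM volumes
    run mkdir -p "$MNT" &&
    run mount "$ROOT_LV" "$MNT" &&
    run mkfs.ext3 "$NEW_BOOT" &&                        # fresh, empty /boot
    run mount "$NEW_BOOT" "$MNT/boot" &&
    run mount --bind /dev "$MNT/dev" &&
    run mount -t proc proc "$MNT/proc" &&
    # Reinstalling the kernel package regenerates its initrd inside the chroot.
    run chroot "$MNT" apt-get install --reinstall "$KERNEL_PKG" &&
    run chroot "$MNT" grub-install --no-floppy "$NEW_DISK" &&
    run mdadm "$MD" --add "$NEW_MEMBER" &&              # start the rebuild
    run cat /proc/mdstat                                # watch reconstruction
}

recover
```

The chroot steps only work from a rescue environment whose kernel can run the binaries on the root filesystem, which is why a 64-bit rescue CD mattered here. The sketch does not recreate menu.lst; that still has to be written by hand.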

Lo and behold… no more machine checks! I am a happy camper. Despite the tribulations above I came out remarkably well, since I seem to have lost nothing but my grub menu.lst file. And now I have rdiff-backup configured to back up /boot onto the array in case I ever need to do this again.
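
A backup job along those lines might look like the sketch below. The destination path, the eight-week retention and the idea of running it daily from cron are all assumptions for illustration, not the actual configuration; as written it only prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch of a periodic /boot backup; prints commands unless RUN=1 is set.
# DEST and the retention period are hypothetical choices.
SRC=/boot
DEST=/var/backups/boot   # assumed to live on the RAID-backed root filesystem

# Execute the command when RUN=1, otherwise just echo it.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

backup_boot() {
    # Mirror /boot and keep reverse increments of older versions.
    run rdiff-backup "$SRC" "$DEST" &&
    # Prune increments older than eight weeks; --force allows removing several at once.
    run rdiff-backup --remove-older-than 8W --force "$DEST"
}

backup_boot
```

Because rdiff-backup keeps increments, an older menu.lst or kernel can be pulled back out of the destination, not just the most recent copy.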




Posted on the 12th of August, 2007 at 8:28 pm.

FWIW you can put your /boot on a RAID-1 partition. LILO supports it out of the box, GRUB needs some coaxing.

Posted on the 12th of August, 2007 at 9:33 pm.

You can have /boot on a RAID array, but not on LVM AFAIK. It does mean that RAID support has to be compiled into the kernel rather than loaded from the ramdisk, however, and that the partitions need type fd (aka Linux raid autodetect).

In Debian this means compiling your own kernel as the stock kernels don’t provide raid built in (maybe they should?). Then you have to jump through some hoops to get GRUB working on both disks (basically use the setup() call twice in grub shell, once with each disk set as root()), but in the end I think it’s all worth it.

Posted on the 13th of August, 2007 at 7:40 am.

And with grub-2 you can boot your kernel from RAID+LVM without the headache of rerunning lilo too ;)

Posted on the 13th of August, 2007 at 11:17 am.

It sounds like a grub (v1) setup here is much more awkward than I’d like. GRUB 2 looks very interesting… I wonder if it’s really stable yet, though.