netfilter.org downtime - moving and updating servers

I've spent the whole Monday in the hosting center where netfilter.org, gnumonks.org and most of my other projects are hosted. The main reasons for this visit were:

  • do kernel updates on two boxes that are known to be difficult with new kernels
  • move all five machines to a new rack, the old one is too crowded (no space for new machines, too hot)
  • add yet another new box (parvati.gnumonks.org), which makes the number of machines now six

As usual, Murphy's law applied, so about everything that could go wrong went wrong. And, confirming Murphy's law, the most important machine (vishnu.netfilter.org) had the longest downtime, something close to 9 hours.

This was mainly due to the last Gentoo update overriding my custom-modified yaboot boot script (for using the serial port, this is a headless XServe cluster node) with the default one, which wants to use the non-existent framebuffer.

That combined with the fact that KDUMP-capable kernels can't be booted from OpenFirmware (why isn't this indicated in the menuconfig help???) and thus the new default boot kernel couldn't be booted from yaboot.

That day I've tried about anything, from attaching a powerbook with bootable cd in firewire target mode to booting yaboot via tftp (which fails to load yaboot.conf via tftp *sigh*).

Now I've learned my lesson: chattr +i on yaboot.conf and the modified boot script for serial console.