Installing Linux on a G5 ClusterNode XServe

Now that I have this decent new dual G5 box, I wanted to install Linux on it. This turned out to be an extremely difficult job, as apparently nobody has ever tried to install Linux on any of the new XServe G5 series machines, with either 32bit or 64bit kernels.

There are a number of challenges:

  • No internal IDE or SCSI CD-ROM
  • Only serial console
  • Very new hardware with little Linux support

First I tried a number of ready-built installation ISO images, including the current sarge Debian-installer image for PPC, and the 32bit and 64bit live images of Gentoo.

The first thing I had to do was to disable autoboot and enable the serial console. Luckily, the box actually ships with a manual that tells you how to put the OF boot console on the serial port. You have to press the admin (!) button at the front of the box a magic number of times.

To permanently make the serial console work, use the following OF commands:

> setenv input-device scca
> setenv output-device scca

Next I had to figure out how to boot from the external FireWire CD-ROM drive. Apparently this depends on your OF device tree and the GUID of your FireWire device. On my particular box it works with:

> devalias cd /ht/pci@5/firewire@e/node@00d04b3c50090210/sbp-2@c000/disk@0
Using commands like
> dir cd:,\
I was then able to list files on the CD-ROM. To boot the yaboot loader on a Debian installer CD image, you can use
> boot cd:,\install\yaboot
sbp2:Open ->login?
speed=ffffffff 2 2 load-size=239a4 adler32=a5cf5aa0 

Loading ELF



sbp2:Open ->login?
speed=ffffffff 2 2 
sbp2:Open ->login?
speed=ffffffff 2 2 
sbp2:Open ->login?
speed=ffffffff 2 2 
sbp2:Open ->login?
speed=ffffffff 2 2 
sbp2:Open ->login?
speed=ffffffff 2 2 Config file read, 2907 bytes


sbp2:Open ->login?
speed=ffffffff 2 2 
sbp2:Open ->login?
speed=ffffffff 2 2 
sbp2:Open ->login?
speed=ffffffff 2 2 
sbp2:Open ->login?
speed=ffffffff 2 2 \
sbp2:Open ->login?
speed=ffffffff 2 2 Welcome to Debian GNU/Linux sarge!

This is a Debian installation CDROM,
built on 20040729.

The default option is 'install'. For maximum
control, you can use the 'expert' option.

If the system fails to boot at all (the typical
symptom is a white screen which doesn't go away),
use 'install video=ofonly' or 'expert video=ofonly'.

The plain options are for the powerpc family of
processors (from 601 to G4). The *-power3 options
are for IBM Power3 boxes, and the *-power4 options
are for IBM Power4 and Apple G5 boxes. Press the tab
key for a list of options, or type 'help' for help.

************************************
If in doubt, just choose 'install', and if that 
doesn't work, try 'install video=ofonly'.
************************************
Welcome to yaboot version 1.3.12
Enter "help" to get some basic usage information

sbp2:Open ->login?
speed=ffffffff 2 2 boot: 
I tried all of the provided images with different options, without success. Because the box only has a serial console, one option I always added was "console=ttyS0,57600". All I got was:
boot: expert-power4
Please wait, loading kernel...

sbp2:Open ->login?
speed=ffffffff 2 2 
sbp2:Open ->login?
speed=ffffffff 2 2 
sbp2:Open ->login?
speed=ffffffff 2 2 
sbp2:Open ->login?
speed=ffffffff 2 2 
sbp2:Open ->login?
speed=ffffffff 2 2    Elf32 kernel loaded...
copying OF device tree...done
starting cpu /cpus/PowerPC,G5...failed: 00000000 
Calling quiesce ...

erasing fff06000  of Micron B1 part
flashing fff06000  of Micron B1 part
swapping blocks
DO-QUIESCE finishedreturning 0x01400000 from prom_init

Playing with the Gentoo live CD images didn't get me any further either.

I then compiled a current 32bit ppc 2.6.8-rc2 kernel by hand (for G5 CPUs). Putting this kernel on the Debian installer ISO didn't get me any further. So apparently either the serial port is not working, or the kernel crashes somewhere.

Using a cross-compiler running on my dual G4 PowerMac, I compiled the same 2.6.8-rc2 kernel for the ppc64 target platform. Putting this on the Debian boot CD helped a lot. I now got as far as:

boot: expert-g5-64 console=ttyS0,57600
Please wait, loading kernel...

sbp2:Open ->login?
speed=ffffffff 2 2 
sbp2:Open ->login?
speed=ffffffff 2 2 
sbp2:Open ->login?
speed=ffffffff 2 2 
sbp2:Open ->login?
speed=ffffffff 2 2 
sbp2:Open ->login?
speed=ffffffff 2 2    Elf64 kernel loaded...
Looking for displays
OF stdout is    : /ht@0,f2000000/pci@3/mac-io@7/escc@13000/ch-a@13020
Opening displays...
Calling quiesce ...

DO-QUIESCE finishedreturning from prom_init
Found U3 memory controller & host bridge, revision: 53
Mapped at 0xe000000080000000                          
Found a K2 mac-io controller, rev: 96, mapped at 0xe000000080041000
PowerMac motherboard: XServe G5                                    
Starting Linux PPC64 2.6.8-rc1 
-----------------------------------------------------
naca                          = 0xc000000000004000   
naca->pftSize                 = 0x17              
naca->debug_switch            = 0x0 
naca->interrupt_controller    = 0x1
systemcfg                     = 0xc000000000005000
systemcfg->processorCount     = 0x2               
systemcfg->physicalMemorySize = 0x20000000
systemcfg->dCacheL1LineSize   = 0x80      
systemcfg->iCacheL1LineSize   = 0x80
htab_data.htab                = 0xc00000001f800000
htab_data.num_ptegs           = 0x10000           
-----------------------------------------------------
[boot]0100 MM Init                                   
[boot]0100 MM Init Done
idle = native_idle     
Linux version 2.6.8-rc1 (laforge@dathomir) (gcc version 3.4.1) #4 SMP Sat Jul 31 16:12:42 CEST 2004
[boot]0012 Setup Arch
via-pmu: Server Mode is disabled
PMU driver 2 initialized for Core99, firmware: 0c
nvram: Checking bank 0...                        
nvram: gen0=204, gen1=205
nvram: Active bank is: 1 
Adding PCI host bridge /pci@0,f0000000
Found U3-AGP PCI host bridge. Firmware bus number: 240->255
Adding PCI host bridge /ht@0,f2000000                      
Can't get bus-range for /ht@0,f2000000, assume bus 0
U3/HT: hole, 0 end at 9fffffff, 1 start at b0000000 
Found U3-HT PCI host bridge. Firmware bus number: 0->239
Can't get bus-range for /ht@0,f2000000                  
PCI Host 0, io start: fffffffffd800000; io end: fffffffffdffffff
PCI Host 1, io start: 0; io end: 3fffff                         
Top of RAM: 0x20000000, Total RAM: 0x20000000
Memory hole size: 0MB                        
On node 0 totalpages: 131072
  DMA zone: 131072 pages, LIFO batch:16
  Normal zone: 0 pages, LIFO batch:1   
  HighMem zone: 0 pages, LIFO batch:1
[boot]0015 Setup Done                
Built 1 zonelists    
Kernel command line: ro debconf_priority=low devfs=mount,dall init=/linuxrc console=ttyS0,57600
PowerMac using OpenPIC irq controller at 0x80040000
[boot]0020 OpenPic Init                            
OpenPIC Version 1.2 (4 CPUs and 120 IRQ sources) at e000000082ccd000
OpenPIC timer frequency is 25.000000 MHz                            
[boot]0021 OpenPic Timer                
[boot]0022 OpenPic IPI  
[boot]0023 OpenPic Ext
[boot]0024 OpenPic Spurious
[boot]0025 OpenPic Done    
Slave OpenPIC at 0xf8040000 hooked on IRQ 56
[boot]0020 OpenPic U3 Init                  
OpenPIC (U3) Version 1.2  
[boot]0025 OpenPic2 Done
PID hash table entries: 16 (order 4: 256 bytes)
time_init: decrementer frequency = 33.333333 MHz
Console: colour dummy device 80x25              
Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)   
Memory: 498688k available (3840k kernel code, 4120k data, 212k init) [c000000000000000,c000000020000000]
Calibrating delay loop... 66.56 BogoMIPS
Mount-cache hash table entries: 256 (order: 0, 4096 bytes)
PowerMac SMP probe found 2 cpus                           
Processor 1 found.             
Synchronizing timebase
Got ack               
score 299, offset 1000
score 299, offset 500 
score 299, offset 250
score 299, offset 125
score 299, offset 62 
score 299, offset 31
score 239, offset 15
score -107, offset 7
score 101, offset 11
score -5, offset 9  
score 63, offset 10
score -51, offset 9
Min 9 (score 5), Max 10 (score 87)
Final offset: 9 (61/300)          
Brought up 2 CPUs       
NET: Registered protocol family 16
PCI: Probing PCI hardware         
U3-DART: table not allocated, using direct DMA
PCI: Probing PCI hardware done                
PCI: no pci dn found for dev=0001:04:0f.0 Apple Computer Inc. K2 GMAC (Sun GEM)
PCI: no pci dn found for dev=0001:05:0c.1 PCI device 1166:0240 (ServerWorks)   
SCSI subsystem initialized                                                  
usbcore: registered new driver usbfs
usbcore: registered new driver hub  
nvram_init: Could not find nvram partition for nvram buffered error logging.
rtasd: no RTAS on system                                                    
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
devfs: 2004-01-31 Richard Gooch (rgooch@atnf.csiro.au)   
devfs: boot_options: 0x1                              
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
Initializing Cryptographic API                          
pmac_zilog: 0.6 (Benjamin Herrenschmidt )
ttyS0 at MMIO 0x80013020 (irq = 22) is a Z85c30 ESCC - Serial port 
ttyS1 at MMIO 0x80013000 (irq = 23) is a Z85c30 ESCC - Serial port
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
loop: loaded (max 8 devices)                                         
sungem.c:v0.98 8/24/03 David S. Miller (davem@redhat.com)
So apparently, there were some issues finding the OpenFirmware dn (device node) for the Ethernet chip and the ServerWorks chip. I put some printk's into the arch/ppc64/pci_dn.c file to see what was going on. This then led me back to the earlier message about the U3-DART. After reading some more code, it appeared that the DART is Apple's IOMMU, and that it is supposed to be needed only when running with more than 2GB of RAM. My box had 512MB, but I tried to force usage of the DART by putting "iommu=force" on the kernel command line.

Great, this was apparently the problem, since I now got up to the point where the kernel wanted to mount the root filesystem. I thought I didn't really need an initrd, since the kernel had all drivers statically linked in. However, the Debian installer seems to run entirely from the initrd.

My first try was to simply use one of the pre-supplied initrd.gz images. Yes, they contain the wrong versions of the modules, but I don't want or need those modules anyway.

Of course this wouldn't work either:

RAMDISK: Compressed image found at block 0                 
Kernel panic: VFS: Unable to mount root fs on unknown-block(0,0)
 <0>Rebooting in 180 seconds..                       
No error message, nothing. So I thought the problem was with devfs, and I tried passing several different root parameters ('root=/dev/ram', 'root=/dev/rd/0') without any success.

In the end I found out that the structure sizes of the cramfs superblock (include/linux/cramfs_fs_sb.h) are arch-dependent, so I cannot use an initrd that was built on a ppc32 machine. Unfortunately the format is also endian-dependent, and at this time I only have 32bit big endian and 64bit little endian boxes at home.
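
To illustrate the byte-order half of that problem, here is a minimal Python sketch (not kernel code). It assumes only that a cramfs image starts with the 32-bit magic number 0x28cd3d45, written in the byte order of the machine that built the image. The helper name detect_cramfs_byte_order is made up for this example, and the sketch says nothing about the word-size issue.

```python
import struct

# Magic number found in the first four bytes of a cramfs image.
CRAMFS_MAGIC = 0x28CD3D45


def detect_cramfs_byte_order(header: bytes) -> str:
    """Guess the byte order of an image from its first four bytes.

    Hypothetical helper for illustration; returns 'big', 'little' or 'unknown'.
    """
    if len(header) < 4:
        return "unknown"
    if struct.unpack(">I", header[:4])[0] == CRAMFS_MAGIC:
        return "big"
    if struct.unpack("<I", header[:4])[0] == CRAMFS_MAGIC:
        return "little"
    return "unknown"


# The same magic number as it would be written by a big-endian
# and by a little-endian build host.
big_image = struct.pack(">I", CRAMFS_MAGIC)
little_image = struct.pack("<I", CRAMFS_MAGIC)

print(detect_cramfs_byte_order(big_image))
print(detect_cramfs_byte_order(little_image))

# A reader that assumes the wrong byte order sees a different number,
# so it does not recognise the superblock at all.
print(hex(struct.unpack("<I", big_image)[0]))
```

A kernel that only checks for the magic in its own byte order will simply not recognise an image built on a host of the other endianness, which would fit the silent "unable to mount root fs" panic above.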

The next step was to use an ext2 initrd, since reasonable filesystems don't have any strange host/byteorder/wordsize dependencies.

Now the kernel is able to load the initrd and mount it... although then some other stuff goes terribly wrong. I have had no time yet to investigate this.

OLS2004 is over

After holding a BOF on GPL-Violations, and the traditional netfilter/iptables BOF, OLS ended with Andrew Morton's Keynote.

Obviously, there was also the traditional OLS Social Event at the Black Thorn Pub, which I left quite early in order to get some more work done on ulogd2 flow accounting.

David Miller survived my 13-patch patch-bomb

This is good news: DaveM accepted all 13 netfilter-related patches that I had pending for 2.6.9. The patches included a number of optimizations, the ctstat, connection-based accounting, TCP window tracking, and some conversions to new in-kernel APIs (seq_file, module_param).

Now let's hope that 2.6.8 will be released soon and we can start the 2.6.9 cycle...

Merging 2.6.8-rc2 changes into patch-o-matic-ng

I just started the boring job of merging 2.6.8-rc2 with patch-o-matic-ng... I'm happy that Jozsef, Martin and Patrick did this for the last couple of kernel releases. However, I need to get more into this job again in order to determine which patches still have to be submitted to the mainline kernel...

Expect some pom-ng breakage over the next couple of days...

IPFIX / ulog integration

After some more in-depth study of the IPFIX IETF drafts, I finally started coding. After writing the first few dozen lines, I discovered that, at an abstract level, IPFIX doesn't do anything too different from my good old ulogd. Ignoring the minor difference that ulogd deals with individual packets and IPFIX with flows, the ulogd_iret_t structure is very similar to what IPFIX templates are trying to describe.

So I now forked a ulogd2 branch off the current ulogd subversion tree and started to reorganize the tree.

For more flexibility, I am going for a stackable plugin infrastructure, where the sysadmin can configure stacks like: ULOG->ulogd_BASE->flow aggregation->IPFIX-over-TCP-export or ctnetlink->IPFIX-over-SCTP-export.
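
The idea of such a stack can be sketched in a few lines of Python. This is only an illustration of the concept, not ulogd2 code (which is C). All names here (build_stack, base_interpreter, flow_aggregation, text_export) and the record layout are made up, and the last stage just formats text instead of speaking IPFIX.

```python
from typing import Any, Callable, Iterable, Iterator, List

# A plugin consumes a stream of records and produces a new stream.
Plugin = Callable[[Iterable[Any]], Iterator[Any]]


def build_stack(plugins: List[Plugin]) -> Plugin:
    """Chain plugins so the output of one becomes the input of the next."""
    def run(records: Iterable[Any]) -> Iterator[Any]:
        for plugin in plugins:
            records = plugin(records)
        return iter(records)
    return run


def base_interpreter(records):
    """Stand-in for an interpreter stage: derive a flow key from raw input."""
    for r in records:
        yield {"flow": (r["src"], r["dst"]), "bytes": r["len"]}


def flow_aggregation(records):
    """Sum packets and bytes per flow key, keeping first-seen order."""
    totals = {}
    for r in records:
        entry = totals.setdefault(
            r["flow"], {"flow": r["flow"], "packets": 0, "bytes": 0})
        entry["packets"] += 1
        entry["bytes"] += r["bytes"]
    yield from totals.values()


def text_export(records):
    """Stand-in for an export stage: one text line per aggregated flow."""
    for r in records:
        src, dst = r["flow"]
        yield f"{src}->{dst} packets={r['packets']} bytes={r['bytes']}"


packets = [
    {"src": "10.0.0.1", "dst": "10.0.0.2", "len": 100},
    {"src": "10.0.0.1", "dst": "10.0.0.2", "len": 60},
    {"src": "10.0.0.3", "dst": "10.0.0.2", "len": 40},
]
stack = build_stack([base_interpreter, flow_aggregation, text_export])
for line in stack(packets):
    print(line)
```

Swapping the first or last element of the list is all it takes to change the input source or the export format, which is the flexibility the stack is meant to provide.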

First day of OLS

OLS started today (well, it started with the official beer-drinking BOF last night). Like at the kernel summit, there are massive problems with the wireless network, forcing me to operate in offline mode most of the time.

The presenters are apparently all running in slow motion, so I can allocate a small time-slice to listen to them and spend most of the time working on some code (conntrack-accounting/ipfix, qsearch, browsing through Rusty's patches). OLS thus starts more productively than I would have thought ;)

Had lunch with Daniel Phillips, who is now working on clustering infrastructure at Red Hat. We noticed a general shift from the 'everything is a filesystem' to the 'everything is a socket' mentality.

Working towards IPFIX based on conntrack

I've written a patch to add 64bit packet and byte counters for both directions of every ip_conntrack. This should enable a clean and efficient implementation of flow based accounting, when combined with ctnetlink events and a userspace daemon picking up those events.

I need to study the IPFIX (IETF Working Group) specifications in more detail before writing the respective daemon...

The patch appears to be working: you can read the counters via /proc/net/ip_conntrack and also use a modified/extended/updated version of the 'connbytes' match.
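
As a rough model of what the patch keeps per connection, here is a small Python sketch. It is not the kernel code. The class and constant names (DirCounter, Conntrack, ORIGINAL, REPLY) are invented for this example, and the explicit masking only imitates the wrap-around of a 64bit unsigned integer.

```python
from dataclasses import dataclass, field
from typing import List

MASK64 = (1 << 64) - 1      # emulate unsigned 64bit wrap-around
ORIGINAL, REPLY = 0, 1      # the two directions of a connection


@dataclass
class DirCounter:
    """Packet and byte counters for one direction."""
    packets: int = 0
    bytes: int = 0

    def account(self, length: int) -> None:
        self.packets = (self.packets + 1) & MASK64
        self.bytes = (self.bytes + length) & MASK64


@dataclass
class Conntrack:
    """A tracked connection with one counter pair per direction."""
    counters: List[DirCounter] = field(
        default_factory=lambda: [DirCounter(), DirCounter()])

    def account(self, direction: int, length: int) -> None:
        self.counters[direction].account(length)


ct = Conntrack()
ct.account(ORIGINAL, 1500)
ct.account(ORIGINAL, 40)
ct.account(REPLY, 60)
print(ct.counters[ORIGINAL], ct.counters[REPLY])
```

A 32bit byte counter would wrap after 2^32 bytes (4 GiB), which a single long-lived connection can exceed, so 64bit counters are the safer basis for accounting.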

Pattern-matching API in the 2.6.x Kernel

There are various places in the kernel where we need to do some kind of pattern matching on the packet contents. Applications range from connection tracking helpers (looking for the FTP PORT command, ...) through the 'string' match to intrusion detection systems.

Two years ago, Philippe Biondi came up with something called libqsearch. It implements a generic pattern matching API, supporting plugin-based algorithm implementations.

I now took the liberty of porting this into a 2.6.x kernel, resulting in lots of changes that make my qsearch port incompatible with what Philippe wrote. Anyway, I'm now in the process of combining this with Rusty's recent work on skb_walk() and skb_iter(), so we can pattern-match against a fragmented/nonlinear skb without any copy.
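
To show why matching without a copy is possible at all, here is a minimal Python sketch. It is neither the libqsearch API nor the skb_walk()/skb_iter() interface. It uses the Knuth-Morris-Pratt algorithm as one example of a matcher that only needs to remember how many pattern bytes have matched so far, and the function names are made up.

```python
from typing import Iterable, List


def build_failure_table(pattern: bytes) -> List[int]:
    """KMP failure table: longest proper prefix that is also a suffix."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def find_in_fragments(pattern: bytes, fragments: Iterable[bytes]) -> int:
    """Return the offset of the first match in the logical byte stream
    formed by the fragments, or -1. The fragments are never joined."""
    if not pattern:
        return 0
    table = build_failure_table(pattern)
    matched = 0     # number of pattern bytes matched so far
    offset = 0      # number of stream bytes consumed so far
    for fragment in fragments:
        for byte in fragment:
            while matched and byte != pattern[matched]:
                matched = table[matched - 1]
            if byte == pattern[matched]:
                matched += 1
            offset += 1
            if matched == len(pattern):
                return offset - len(pattern)
    return -1


# The pattern straddles the boundary between the two fragments.
print(find_in_fragments(b"PORT", [b"USER x\r\nPO", b"RT 10,0,0,1"]))
```

Because the only state carried from one fragment to the next is the integer matched, the matcher can walk the fragments in place, one after the other.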

Day one of the Kernel Summit

So this was day one of the famous kernel summit. Apart from meeting lots of friends, this basically meant lots of in-depth technical discussions on various subjects.

Most notable were long discussions about the deficiencies of the power management API, problems with 3-level page tables on AMD64, and last but not least, the first-hand technical information from AMD, Intel and IBM on their upcoming CPU generations.

My personal favourite (AMD) will be shipping dual core (not hyper-threading, but two real cores) CPUs by mid 2005. The two cores share the same HyperTransport and memory interface, and therefore have to divide the I/O bandwidth between them.

Also had some interesting discussions with Jamal about netfilter performance and the future l3 generalized connection tracking (called nf_conntrack). Maybe I can talk him into attending the netfilter workshop for further discussion of his ideas.

Just arrived in Portland, OR for the Linux networking conference

Getting to Portland (via Frankfurt) turned out to be no problem. But getting through Portland and to the reserved hotel was a bit problematic. Apparently there was a jam of MAX light rail trains, and we had to wait for quite some time before the journey could continue.

Then I decided to get off at Beaverton station. My assumption was that at such a terminal, there would certainly be some cabs nearby. However, there weren't. Maybe I forgot that this is the U.S. and public transportation is not like what I'm used to.

Anyway, I switched to the bus.

However, I went the wrong way. My destination was close to 158th Avenue, but the bus went to 185th Avenue. As it was getting late already, I decided not to go back by bus to Beaverton, but rather to walk from 185th to 158th Avenue. Despite the hot sun (27 degrees centigrade in the shade), my backpack and the suitcase, this was only a 40-minute walk. It reminded me of hiking in southern France with the boy scouts ;)

Well, the hotel had not yet cancelled my reservation, and after a cool shower and checking my email I went to sleep. Due to the jetlag I was awake at 3:30am. Not too bad, considering a 9-hour time shift.

Let's hope the trip from the hotel to the conference venue will be less complicated ;)

Apple Xserve Dual G5 ClusterNode arrived

I still cannot believe it. After waiting only four months, the Dual G5 XServe has just arrived today. Unfortunately I'm leaving for a two week trip (Linux Networking Summit, Linux Kernel Summit, Ottawa Linux Symposium) tomorrow, so I don't really have any time to play with it.

Just had a quick look under the hood, and it seems like putting additional drives in place isn't difficult at all. The mainboard has three SATA connectors (two empty), and if you remove the front panel you can access the two empty 3.5" drive bays. The PCB for Apple's hot swap bays is also present, but unfortunately it is missing the SATA and drive connectors.

The only remaining issue is getting power to the SATA drive, but that should be pretty easy to find out...

So you might ask yourself why I didn't buy the non-ClusterNode in the first place? Because it's way more expensive and apart from those tiny details exactly the same hardware.

Another interesting part will be bootstrapping Debian/ppc onto that box - without any VGA board and only a serial console. Apparently there is no distribution that really supports installation via Serial console (even on x86)... despite being extremely easy to implement... *sigh*

Adding support for multiple data acquisition boards

Tomorrow I'll be hacking GSPC to support multiple data acquisition boards in one system. The main focus will be testing, since GSPC already contains the untested code for this. However, the vendor-supplied driver is hardly able to deal with this situation *lol*.

Make ESTIC compile with recent gcc

I doubt there are many readers who know about ESTIC. It is multi-platform (DOS, Windows, Linux, *BSD, OS/2) configuration software for some ISDN PBX systems (ISTEC 1003/1008) that used to be common in the early to mid 90s in Germany.

The original vendor of the PBX went out of business a long time ago, and the author of the ESTIC tool even removed his ESTIC homepage in 1997.

Anyway, I still have some of the old ISTEC boxes in use, even at friends' places. It turned out that we needed to reconfigure one of these old boxes, so I took the source code (yes, it's open source) and made it compile with a recent gcc. Jeez, I never liked C++...

So if anybody is interested in the patch to compile estic 1.60 with gcc-3.3.4 on Linux: it's located at ftp://ftp.gnumonks.org/pub/patches/es160src-linux-gcc3.3.4.patch.

New all-in-one fileserver

I just put my new all-in-one fileserver [called 'sunbeam'] into production. It's an Athlon64-based system with five 200GB SATA drives. Since there's now a working (but still unofficial) Debian AMD64 port, I can even run a 64bit distribution on it. As far as production machines go, my policy is to run only Debian on them.

This machine replaces the old box [now called 'shiva'] (Athlon 1000 with eight hard drives of 80-120GB), which now serves as the main storage server for my network-wide backup system.

As the new backup solution, I chose Bacula. The architecture just looks like the 'right thing'. The catalog is maintained in the SQL RDBMS of your choice, and every machine to be backed up runs a client (bacula-fd). A director running on one server then directs the clients to write their data to one or more storage servers of their choice. This now also means that I can have one centralized director for three different physical locations (yes, my machines are spread over three distinct locations with low-bandwidth interconnection). Every location just has its own storage server :). Oh yes, of course, this is on-site backup to hard drives only. It won't help if my house catches fire.

The Karlsruhe Cemetery

On Sunday morning the weather was fine (sun shining but not too hot), so I finally went to the Karlsruhe Hauptfriedhof (main cemetery) for some photo shooting. For those of you who don't know it yet: Photography [especially of historical cemeteries] is one of my hobbies.

To my big surprise, there were a number of beautiful graves, angels, statues, ... Normally 99% of all cemeteries in Germany look the same, since most of the graves are from 1950 to present - and apparently nobody has the money (and the taste) for something different than a standard grave stone.

Also, this was one of the first occasions to get some more experience with my new digital SLR camera, the EOS-300D. I really hope that the convenience of digital photography won't prevent me from still doing real chemical b/w photography... Especially with my lack of time, I fear this possibility.

Now you may be asking yourself: Where are the pictures? Well, I really want to show you all of them, but first I need to get the database-enabled photo repository finished. Stay tuned.

LinuxTag 2004

The annual LinuxTag... Germany's biggest Linux / Free Software event. Well, apparently I start to get bored by conferences, especially mostly end-user or sysadmin oriented ones. The only interesting part is meeting with old friends, talking to fellow developers, ...

Unfortunately no Brazilian Linux hackers present, so I ended up talking Portuguese to a German guy who lives in France ;) Discovered that my pt_BR is really really rusty these days. I should find myself some conversation classes at home in Berlin.

If it wasn't for Astaro (whose main office is in Karlsruhe), I guess I wouldn't have been at LinuxTag anymore.

Initiative for Freedom of Information Act in Germany

As I became aware today, there is a new initiative for something like a Freedom of Information Act in Germany at pro-information.de.

Surprisingly, this apparently has not been communicated a lot, considering the small number of about 2000 signatures so far.

If you feel that Germany should enact a FOIA in order to give citizens, journalists and historians access to all kinds of files of the administration, please support the pro-information campaign by signing it.

Interview about my GPL enforcement efforts

The German IT News Portal "golem.de" has just published an Interview with me, entitled "The freedom of the GPL has limitations".

I wish I had used less complex sentence structures in places, but hey, I'm not very used to giving interviews anyway.

Sorry to the English-speaking people out there. I would love to give that interview in English, but no English news portal asked for one ;)

Apparently even shareware authors disrespect the GPL

I just received an email claiming that there is a proprietary shareware program called "DVDxDV" sold for USD 80. The author of the email claims that DVDxDV includes code from the GPL licensed liba52 project (formerly known as ac3dec).

While I didn't do any tests on this alleged infringement (yet), there seems to be more information about this issue on http://gpl-cowboy.blogspot.com/

Maybe I'll find some time to investigate soon...

New iptables-1.2.11 and patch-o-matic-ng-20040621 release

I have just released iptables-1.2.11 and patch-o-matic-ng-20040621 on the netfilter homepage.

Seems like we'll never have an iptables release that doesn't introduce some severe bug that requires releasing another version immediately afterwards. In part, I blame the users. Seems like not enough of them try the CVS snapshots and report bugs back to us.

I'll be speaking at WOS3

Wizards of OS is a conference on the future of the digital commons, to be held from Jun 10 to Jun 12 in Berlin, Germany.

I'll be participating in a panel on the future of copyright, where I'll present my recent success in enforcing the GNU General Public License.

WLAN Router project

I've started to work on a WLAN Router project based on the PC Engines WRAP.1C platform.

I decided to go for the wisp-dist LEAF branch, modified to work with uClibc and a 2.6.x kernel.

The major part, however, is adding the required WDS functionality to the madwifi driver.

But this definitely is a fun project to work on :)