Installing Linux on a G5 ClusterNode XServe
Now that I got this decent new dual G5 box, I wanted to install Linux.
This turned out to be an extremely difficult job, as apparently nobody has ever
tried to install Linux on any of the new XServe G5 series machines, with either
32bit or 64bit kernels.
There are a number of challenges:
- No internal IDE or SCSI CD-ROM
- Only a serial console
- Very new hardware with little Linux support
First I tried a number of prebuilt installation ISO images, including the
current sarge Debian-installer image for PPC and the 32bit and 64bit live
images of Gentoo.
The first thing I had to do was to disable autoboot and enable the serial
console. Luckily, the box actually ships with a manual that explains how to
put the OF boot console on the serial port: you have to press the admin (!)
button on the front of the box a magic number of times.
To make the serial console setting permanent, use the following OF commands:
> setenv input-device scca
> setenv output-device scca
Next I had to figure out how to boot from the external FireWire CD-ROM.
Apparently this depends on your OF device tree and the GUID of your FireWire
device. On my particular box it works with
> devalias cd /ht/pci@5/firewire@e/node@00d04b3c50090210/sbp-2@c000/disk@0
Using commands like
> dir cd:,\
I was then able to list files on the CD-ROM. To boot the yaboot loader on a
Debian installer CD image, you can use
> boot cd:,\install\yaboot
sbp2:Open ->login?
speed=ffffffff 2 2 load-size=239a4 adler32=a5cf5aa0
Loading ELF
sbp2:Open ->login?
speed=ffffffff 2 2
sbp2:Open ->login?
speed=ffffffff 2 2
sbp2:Open ->login?
speed=ffffffff 2 2
sbp2:Open ->login?
speed=ffffffff 2 2
sbp2:Open ->login?
speed=ffffffff 2 2 Config file read, 2907 bytes
sbp2:Open ->login?
speed=ffffffff 2 2
sbp2:Open ->login?
speed=ffffffff 2 2
sbp2:Open ->login?
speed=ffffffff 2 2
sbp2:Open ->login?
speed=ffffffff 2 2 \
sbp2:Open ->login?
speed=ffffffff 2 2 Welcome to Debian GNU/Linux sarge!
This is a Debian installation CDROM,
built on 20040729.
The default option is 'install'. For maximum
control, you can use the 'expert' option.
If the system fails to boot at all (the typical
symptom is a white screen which doesn't go away),
use 'install video=ofonly' or 'expert video=ofonly'.
The plain options are for the powerpc family of
processors (from 601 to G4). The *-power3 options
are for IBM Power3 boxes, and the *-power4 options
are for IBM Power4 and Apple G5 boxes. Press the tab
key for a list of options, or type 'help' for help.
************************************
If in doubt, just choose 'install', and if that
doesn't work, try 'install video=ofonly'.
************************************
Welcome to yaboot version 1.3.12
Enter "help" to get some basic usage information
sbp2:Open ->login?
speed=ffffffff 2 2 boot:
I tried all of the provided images with different options, without success.
Because of the serial console, one option always has to be added:
"console=ttyS0,57600".
All I got was:
boot: expert-power4
Please wait, loading kernel...
sbp2:Open ->login?
speed=ffffffff 2 2
sbp2:Open ->login?
speed=ffffffff 2 2
sbp2:Open ->login?
speed=ffffffff 2 2
sbp2:Open ->login?
speed=ffffffff 2 2
sbp2:Open ->login?
speed=ffffffff 2 2 Elf32 kernel loaded...
copying OF device tree...done
starting cpu /cpus/PowerPC,G5...failed: 00000000
Calling quiesce ...
erasing fff06000 of Micron B1 part
flashing fff06000 of Micron B1 part
swapping blocks
DO-QUIESCE finishedreturning 0x01400000 from prom_init
Playing with the Gentoo live CD images didn't get me any further either.
I then compiled a current 32bit ppc 2.6.8-rc2 kernel by hand (for G5
CPUs). Putting this kernel on the Debian installer ISO didn't get me any
further. So apparently either the serial port was not working, or the kernel
crashed somewhere.
Using a cross-compiler running on my dual G4 PowerMac, I compiled the same
2.6.8-rc2 kernel for the ppc64 target platform. Putting this on the Debian boot
CD helped a lot; I now got as far as:
boot: expert-g5-64 console=ttyS0,57600
Please wait, loading kernel...
sbp2:Open ->login?
speed=ffffffff 2 2
sbp2:Open ->login?
speed=ffffffff 2 2
sbp2:Open ->login?
speed=ffffffff 2 2
sbp2:Open ->login?
speed=ffffffff 2 2
sbp2:Open ->login?
speed=ffffffff 2 2 Elf64 kernel loaded...
Looking for displays
OF stdout is : /ht@0,f2000000/pci@3/mac-io@7/escc@13000/ch-a@13020
Opening displays...
Calling quiesce ...
DO-QUIESCE finishedreturning from prom_init
Found U3 memory controller & host bridge, revision: 53
Mapped at 0xe000000080000000
Found a K2 mac-io controller, rev: 96, mapped at 0xe000000080041000
PowerMac motherboard: XServe G5
Starting Linux PPC64 2.6.8-rc1
-----------------------------------------------------
naca = 0xc000000000004000
naca->pftSize = 0x17
naca->debug_switch = 0x0
naca->interrupt_controller = 0x1
systemcfg = 0xc000000000005000
systemcfg->processorCount = 0x2
systemcfg->physicalMemorySize = 0x20000000
systemcfg->dCacheL1LineSize = 0x80
systemcfg->iCacheL1LineSize = 0x80
htab_data.htab = 0xc00000001f800000
htab_data.num_ptegs = 0x10000
-----------------------------------------------------
[boot]0100 MM Init
[boot]0100 MM Init Done
idle = native_idle
Linux version 2.6.8-rc1 (laforge@dathomir) (gcc version 3.4.1) #4 SMP Sat Jul 31 16:12:42 CEST 2004
[boot]0012 Setup Arch
via-pmu: Server Mode is disabled
PMU driver 2 initialized for Core99, firmware: 0c
nvram: Checking bank 0...
nvram: gen0=204, gen1=205
nvram: Active bank is: 1
Adding PCI host bridge /pci@0,f0000000
Found U3-AGP PCI host bridge. Firmware bus number: 240->255
Adding PCI host bridge /ht@0,f2000000
Can't get bus-range for /ht@0,f2000000, assume bus 0
U3/HT: hole, 0 end at 9fffffff, 1 start at b0000000
Found U3-HT PCI host bridge. Firmware bus number: 0->239
Can't get bus-range for /ht@0,f2000000
PCI Host 0, io start: fffffffffd800000; io end: fffffffffdffffff
PCI Host 1, io start: 0; io end: 3fffff
Top of RAM: 0x20000000, Total RAM: 0x20000000
Memory hole size: 0MB
On node 0 totalpages: 131072
DMA zone: 131072 pages, LIFO batch:16
Normal zone: 0 pages, LIFO batch:1
HighMem zone: 0 pages, LIFO batch:1
[boot]0015 Setup Done
Built 1 zonelists
Kernel command line: ro debconf_priority=low devfs=mount,dall init=/linuxrc console=ttyS0,57600
PowerMac using OpenPIC irq controller at 0x80040000
[boot]0020 OpenPic Init
OpenPIC Version 1.2 (4 CPUs and 120 IRQ sources) at e000000082ccd000
OpenPIC timer frequency is 25.000000 MHz
[boot]0021 OpenPic Timer
[boot]0022 OpenPic IPI
[boot]0023 OpenPic Ext
[boot]0024 OpenPic Spurious
[boot]0025 OpenPic Done
Slave OpenPIC at 0xf8040000 hooked on IRQ 56
[boot]0020 OpenPic U3 Init
OpenPIC (U3) Version 1.2
[boot]0025 OpenPic2 Done
PID hash table entries: 16 (order 4: 256 bytes)
time_init: decrementer frequency = 33.333333 MHz
Console: colour dummy device 80x25
Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
Memory: 498688k available (3840k kernel code, 4120k data, 212k init) [c000000000000000,c000000020000000]
Calibrating delay loop... 66.56 BogoMIPS
Mount-cache hash table entries: 256 (order: 0, 4096 bytes)
PowerMac SMP probe found 2 cpus
Processor 1 found.
Synchronizing timebase
Got ack
score 299, offset 1000
score 299, offset 500
score 299, offset 250
score 299, offset 125
score 299, offset 62
score 299, offset 31
score 239, offset 15
score -107, offset 7
score 101, offset 11
score -5, offset 9
score 63, offset 10
score -51, offset 9
Min 9 (score 5), Max 10 (score 87)
Final offset: 9 (61/300)
Brought up 2 CPUs
NET: Registered protocol family 16
PCI: Probing PCI hardware
U3-DART: table not allocated, using direct DMA
PCI: Probing PCI hardware done
PCI: no pci dn found for dev=0001:04:0f.0 Apple Computer Inc. K2 GMAC (Sun GEM)
PCI: no pci dn found for dev=0001:05:0c.1 PCI device 1166:0240 (ServerWorks)
SCSI subsystem initialized
usbcore: registered new driver usbfs
usbcore: registered new driver hub
nvram_init: Could not find nvram partition for nvram buffered error logging.
rtasd: no RTAS on system
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
devfs: 2004-01-31 Richard Gooch (rgooch@atnf.csiro.au)
devfs: boot_options: 0x1
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
Initializing Cryptographic API
pmac_zilog: 0.6 (Benjamin Herrenschmidt )
ttyS0 at MMIO 0x80013020 (irq = 22) is a Z85c30 ESCC - Serial port
ttyS1 at MMIO 0x80013000 (irq = 23) is a Z85c30 ESCC - Serial port
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
loop: loaded (max 8 devices)
sungem.c:v0.98 8/24/03 David S. Miller (davem@redhat.com)
So apparently there were some issues finding the OpenFirmware dn
(device node) for the Ethernet chip and the ServerWorks chip. I put some
printk()s into the arch/ppc64/pci_dn.c file to see what was going on.
This led me to the earlier error message about the U3-DART. After
reading some more code, it appeared that the DART is Apple's IOMMU, which is
supposedly only needed when running with more than 2GB of RAM. My box had only
512MB, but I nevertheless tried to force use of the DART by adding
"iommu=force" to the kernel command line.
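The decision described above can be modelled in a few lines of C. This is a simplified, hypothetical sketch and not the actual arch/ppc64 code: the helper names are invented, and the 2GB threshold is taken from the description above. It also checks that the `physicalMemorySize = 0x20000000` value from the boot log really corresponds to 512MB.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: does the kernel command line contain "iommu=force"
 * as a whole, space-separated word? */
static bool cmdline_has_iommu_force(const char *cmdline)
{
    const char *opt = "iommu=force";
    size_t len = strlen(opt);
    const char *p = cmdline;

    while ((p = strstr(p, opt)) != NULL) {
        bool starts_word = (p == cmdline) || (p[-1] == ' ');
        bool ends_word = (p[len] == '\0') || (p[len] == ' ');
        if (starts_word && ends_word)
            return true;
        p += len;
    }
    return false;
}

/* Simplified model: use the DART (IOMMU) only with more than 2GB of RAM,
 * unless the user forces it on the command line. */
static bool want_dart(uint64_t ram_bytes, const char *cmdline)
{
    const uint64_t two_gb = 2ULL << 30;
    return ram_bytes > two_gb || cmdline_has_iommu_force(cmdline);
}

/* Convert a RAM size as printed in the boot log to megabytes. */
static uint64_t bytes_to_mb(uint64_t bytes)
{
    return bytes >> 20;
}
```

With the 512MB in this box, `want_dart()` only returns true once `iommu=force` is on the command line, which matches the workaround described above.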
Great, this was apparently the problem, since I now got to the point where
it wanted to mount the root filesystem. I thought I didn't really need an
initrd, since the kernel had all drivers statically linked in. However, the
Debian installer apparently runs only from within an initrd.
My first attempt was to simply use one of the supplied initrd.gz images. Yes,
they contain modules for the wrong kernel version, but I neither want nor need
those modules anyway.
Of course this wouldn't work either:
RAMDISK: Compressed image found at block 0
Kernel panic: VFS: Unable to mount root fs on unknown-block(0,0)
<0>Rebooting in 180 seconds..
No error message, nothing. So I thought the problem was with devfs, and I
tried passing several different root parameters ('root=/dev/ram',
'root=/dev/rd/0'), without any success.
In the end I found out that the structure sizes of the cramfs superblock
(include/linux/cramfs_fs_sb.h) are architecture-dependent, so I cannot use an
initrd that was built on a ppc32 machine. Unfortunately, the format is also
endian-dependent, and at the moment the only boxes I have at home are 32bit
big-endian and 64bit little-endian ones.
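To illustrate why such an image is not portable: a structure built from `unsigned long` fields changes size between 32bit and 64bit targets, and a multi-byte value written in host byte order reads differently on a machine with the opposite endianness. The struct below is a hypothetical example of that shape, not the kernel's actual cramfs definition.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative only: a superblock-like struct made of unsigned long
 * fields. It is 20 bytes on ppc32 (4-byte longs) but 40 bytes on
 * ppc64 (8-byte longs). */
struct example_sb_info {
    unsigned long magic;
    unsigned long size;
    unsigned long blocks;
    unsigned long files;
    unsigned long flags;
};

/* Returns 1 if the host stores multi-byte integers big-endian
 * (e.g. PowerPC), 0 if little-endian (e.g. x86, AMD64). */
static int host_is_big_endian(void)
{
    const uint32_t probe = 0x01020304;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 0x01;
}

/* What a 32bit host-order value looks like when it is read back on a
 * host with the opposite byte order. */
static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000ffU) << 24) |
           ((v & 0x0000ff00U) << 8)  |
           ((v & 0x00ff0000U) >> 8)  |
           ((v & 0xff000000U) >> 24);
}
```

So an image written on a 32bit big-endian G4 matches neither the structure size nor, on the AMD64 box, the byte order that a 64bit big-endian G5 kernel expects.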
The next step was to use an ext2 initrd, since reasonable filesystems don't
have any strange host byte order or word size dependencies.
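ext2 avoids the problem by storing its on-disk fields in a fixed (little-endian) byte order and converting them explicitly. A minimal sketch of that approach, using the ext2 layout where the superblock starts 1024 bytes into the device and its 16bit `s_magic` field (0xEF53) sits at offset 56 within it; the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define EXT2_SB_OFFSET    1024  /* superblock starts 1KB into the device */
#define EXT2_MAGIC_OFFSET 56    /* offset of s_magic inside the superblock */
#define EXT2_SUPER_MAGIC  0xEF53

/* Decode a little-endian 16bit value byte by byte; the result is the
 * same regardless of the host's byte order or word size. */
static uint16_t get_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 1 if the image buffer carries the ext2 superblock magic. */
static int looks_like_ext2(const unsigned char *img, size_t len)
{
    size_t off = EXT2_SB_OFFSET + EXT2_MAGIC_OFFSET;
    if (len < off + 2)
        return 0;
    return get_le16(img + off) == EXT2_SUPER_MAGIC;
}
```

The same image gives the same answer on ppc32, ppc64 and AMD64, which is exactly the property the cramfs initrd lacked.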
Now the kernel is able to load the initrd and mount it... although afterwards something else goes terribly wrong. I haven't had time to investigate this yet.
[ /linux ]
Putting multiple SATA drives into an XServe ClusterNode G5
Apple sells two different models of its Dual G5 XServe: a 'normal' model and
a 'ClusterNode' model. They are pretty much the same, but the ClusterNode
lacks things you usually don't need in a rack-mounted 1U server anyway: a
CD-ROM drive and a VGA card. However, it is also limited to a single hard
drive.
I guess Apple's reason is that in a scientific cluster computing environment,
the node's local storage is insignificant - whereas on a real server you most
likely want multiple (mirrored) drives.
However, the significant price difference (Dual G5 ClusterNode has the same
price as the Single G5 XServe) made me ponder buying a ClusterNode and adding another drive.
Fortunately, the hardware is quite similar. It turns out that the mainboard
has three SATA connectors, and the space for the 2nd and 3rd drive was left
empty. Also, the backplane for Apple's hotplug drives is not fully assembled:
it is missing the connectors for the 2nd and 3rd drive :(
So putting the drive in place and attaching it to the SATA connector with a
fixed cable is no problem at all. Power, however, is a slight problem. The
whole machine doesn't have a single standard power connector, so my only
remaining option was to solder some wires onto the drive backplane PCB. This
is ugly, but well... who cares ;)
I'll put some photos of the modification online soon.
[ /linux ]
David Miller survived my 13-patch patch-bomb
Good news: DaveM accepted all 13 netfilter-related patches that I had
pending for 2.6.9. The patches included a number of optimizations, ctstat,
connection-based accounting, TCP window tracking, and some conversions to new
in-kernel APIs (seq_file, module_param).
Now let's hope that 2.6.8 will be released soon and we can start the 2.6.9 cycle...
[ /linux/netfilter ]
OLS2004 is over
After I held a BOF on GPL violations and the traditional netfilter/iptables
BOF, OLS ended with Andrew Morton's keynote.
Of course, there was also the traditional OLS social event at the Black Thorn
Pub, which I left quite early in order to get some more work done on ulogd2
flow accounting.
[ /linux/conferences ]
Final court opinion on Sitecom appeal released
The court handling the Sitecom appeal has now released its final
opinion. For those of you who happen to understand legal German, the 20-page document is available as a PDF. An English translation will be available soon.
[ /linux/gpl-violations ]
Merging 2.6.8-rc2 changes into patch-o-matic-ng
I just started the boring job of merging 2.6.8-rc2 with patch-o-matic-ng... I'm
glad that Jozsef, Martin and Patrick did this for the last couple of kernel
releases. However, I need to get back into this job in order to
determine which patches still have to be submitted to the mainline kernel...
Expect some pom-ng breakage over the next couple of days...
[ /linux/netfilter ]
IPFIX / ulog integration
After some more in-depth study of the IPFIX IETF drafts, I finally started
coding. Having written the first few dozen lines, I discovered that, at an
abstract level, IPFIX doesn't do anything too different from my good old ulogd.
Ignoring the minor difference that ulogd deals with individual packets and
IPFIX with flows, the ulogd_iret_t structure is very similar to what IPFIX
templates are trying to describe.
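To make the comparison concrete, here is a hypothetical sketch (not ulogd's actual `ulog_iret_t` definition) of a key/length description list and how it maps onto an IPFIX template set: a set header (Set ID 2 for templates, total length), a template record header (template ID of at least 256, field count) and one (Information Element ID, field length) pair per key, all in network byte order. The element IDs in the usage example (8 = sourceIPv4Address, 12 = destinationIPv4Address, 2 = packetDeltaCount, 1 = octetDeltaCount) come from the IANA IPFIX registry that was standardized later, so treat them as an assumption relative to the 2004 drafts.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified ulogd-style key description. */
struct key_desc {
    const char *key;      /* e.g. "ip.saddr" */
    uint16_t len;         /* length of the value in bytes */
    uint16_t ipfix_id;    /* matching IPFIX Information Element ID */
};

/* Append a 16bit value in network byte order; returns the new offset. */
static size_t put16(unsigned char *buf, size_t off, uint16_t v)
{
    buf[off] = (unsigned char)(v >> 8);
    buf[off + 1] = (unsigned char)(v & 0xff);
    return off + 2;
}

/* Serialize the key list as one IPFIX template set.
 * Returns the number of bytes written, or 0 if buf is too small. */
static size_t build_template(const struct key_desc *keys, uint16_t nkeys,
                             uint16_t template_id,
                             unsigned char *buf, size_t buflen)
{
    size_t total = 4 + 4 + 4 * (size_t)nkeys;
    size_t off = 0;
    uint16_t i;

    if (buflen < total)
        return 0;
    off = put16(buf, off, 2);               /* Set ID 2 = template set */
    off = put16(buf, off, (uint16_t)total); /* set length incl. header */
    off = put16(buf, off, template_id);     /* must be >= 256 */
    off = put16(buf, off, nkeys);           /* field count */
    for (i = 0; i < nkeys; i++) {
        off = put16(buf, off, keys[i].ipfix_id);
        off = put16(buf, off, keys[i].len);
    }
    return off;
}
```

For example, four keys (two 4-byte addresses, two 8-byte counters) produce a 24-byte template set.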
So I have now forked a ulogd2 branch off the current ulogd Subversion tree and
started to reorganize the tree.
For more flexibility, I am going for a stackable plugin infrastructure, where
the sysadmin can configure stacks like: ULOG->ulogd_BASE->flow
aggregation->IPFIX-over-TCP-export or ctnetlink->IPFIX-over-SMTP-export.
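A minimal sketch of such a stack, assuming each plugin exposes a single hypothetical `process()` callback that works on the record handed down by the previous stage and returns non-zero to stop the stack. The stage names mirror the first example stack above, but the interface itself is invented for illustration and is not ulogd2's API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record passed between stacked plugins. */
struct record {
    uint64_t packets;
    uint64_t bytes;
    int exported;
};

/* Each stage transforms the record in place; non-zero stops the stack. */
struct plugin {
    const char *name;
    int (*process)(struct record *rec);
};

static int base_decode(struct record *rec)
{
    rec->packets += 1;          /* pretend we decoded one packet */
    return 0;
}

static int flow_aggregate(struct record *rec)
{
    return rec->bytes == 0;     /* drop empty flows */
}

static int ipfix_export(struct record *rec)
{
    rec->exported = 1;          /* stand-in for sending to a collector */
    return 0;
}

/* The sysadmin-configured stack, in processing order. */
static const struct plugin example_stack[] = {
    { "ulogd_BASE",       base_decode },
    { "flow aggregation", flow_aggregate },
    { "IPFIX export",     ipfix_export },
};

/* Run a record through the stack; returns how many stages were run,
 * including the one that stopped it, if any. */
static size_t run_stack(const struct plugin *stack, size_t n,
                        struct record *rec)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (stack[i].process(rec) != 0)
            return i + 1;
    }
    return n;
}
```

Swapping the output stage (say, for a different export transport) only means putting another entry into the array; none of the other stages change.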
[ /linux/netfilter ]
Group Photo of the Kernel Summit
The group photos of this year's Kernel Summit are at http://gnumonks.org/static/photos/ks2004/. You obviously won't find me in those pictures, since I was behind the camera ;)
[ /linux/conferences ]
First day of OLS
OLS started today (well, it started with the official beer-drinking BOF
last night). Like at the Kernel Summit, there are massive problems with
the wireless network, forcing me to work offline most of the time.
The presenters are apparently all running in slow motion, so I can allocate a
small time slice to listening to them and spend most of the time working on
code (conntrack accounting/IPFIX, qsearch, browsing through Rusty's patches). OLS is thus off to a more productive start than I would have thought ;)
Had lunch with Daniel Phillips, who is now working on clustering infrastructure
at Red Hat. We noticed a general shift from the 'everything is a filesystem' to the 'everything is a socket' mentality.
[ /linux/conferences ]
Working towards IPFIX based on conntrack
I've written a patch to add 64bit packet and byte counters for both directions
of every ip_conntrack. This should enable a clean and efficient implementation
of flow based accounting, when combined with ctnetlink events and a userspace
daemon picking up those events.
I need to study the IPFIX (IETF Working Group) specifications in more detail before writing the respective daemon...
The patch apparently works: you can read the counters via
/proc/net/ip_conntrack and also use a modified/extended/updated version of the
'connbytes' match.
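A simplified sketch of what per-direction 64bit counters look like. The struct and function names are hypothetical stand-ins rather than the patch's actual code; the original/reply split follows the usual connection tracking convention.

```c
#include <stdint.h>

/* Direction of a packet relative to the connection. */
enum ct_dir { CT_DIR_ORIGINAL = 0, CT_DIR_REPLY = 1, CT_DIR_MAX = 2 };

/* Hypothetical per-connection accounting. 64bit byte counters avoid the
 * wraparound a 32bit counter hits after 4GB of traffic. */
struct ct_counters {
    uint64_t packets[CT_DIR_MAX];
    uint64_t bytes[CT_DIR_MAX];
};

/* Account one packet of the given length in the given direction. */
static void ct_account(struct ct_counters *c, enum ct_dir dir,
                       uint32_t len)
{
    c->packets[dir] += 1;
    c->bytes[dir] += len;
}

/* Total bytes in both directions, as a flow exporter might report. */
static uint64_t ct_total_bytes(const struct ct_counters *c)
{
    return c->bytes[CT_DIR_ORIGINAL] + c->bytes[CT_DIR_REPLY];
}
```

A userspace daemon listening for ctnetlink events would then only need to read these values when a connection is destroyed to emit one flow record per connection.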
[ /linux/netfilter ]
Day one of the Kernel Summit
So this was day one of the famous kernel summit. Apart from meeting lots of
friends, this basically meant lots of in-depth technical discussions on various
subjects.
Most notable were long discussions about the deficiencies of the power
management API, problems with three-level page tables on AMD64, and last but
not least, first-hand technical information from AMD, Intel and IBM on their
upcoming CPU generations.
My personal favourite (AMD) will be shipping dual-core CPUs (not
hyper-threading, but two real cores) by mid-2005. The two cores share the same
HyperTransport and memory interface, and therefore have to divide the I/O
bandwidth between them.
I also had some interesting discussions with Jamal about netfilter performance
and the future L3-generalized connection tracking (called nf_conntrack). Maybe
I can talk him into attending the netfilter workshop for further discussion of
his ideas.
[ /linux/conferences ]
Pattern-matching API in the 2.6.x Kernel
There are various places in the kernel where we need to do some kind of pattern
matching on packet contents. Applications range from connection tracking
helpers (looking for the FTP PORT command, ...) through the 'string' match to
intrusion detection systems.
Two years ago, Philippe Biondi came up with something called libqsearch. It implements a generic pattern matching API that supports plugin-based algorithm implementations.
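The plugin idea can be sketched as a table of function pointers: callers go through one generic lookup-and-search interface, and each algorithm supplies its own implementation. All names here are hypothetical and not libqsearch's actual API; the only registered algorithm is a naive byte-by-byte search.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical algorithm plugin: finds pat in text, returning the
 * offset of the first match or -1. */
struct search_algo {
    const char *name;
    long (*find)(const unsigned char *text, size_t tlen,
                 const unsigned char *pat, size_t plen);
};

/* Naive O(n*m) search, the simplest possible plugin. */
static long naive_find(const unsigned char *text, size_t tlen,
                       const unsigned char *pat, size_t plen)
{
    size_t i;
    if (plen == 0 || plen > tlen)
        return -1;
    for (i = 0; i + plen <= tlen; i++)
        if (memcmp(text + i, pat, plen) == 0)
            return (long)i;
    return -1;
}

/* Registered algorithms; a faster one would just be another entry. */
static const struct search_algo algos[] = {
    { "naive", naive_find },
};

/* Look up an algorithm by name, so callers never depend on a
 * specific implementation. Returns NULL if it isn't registered. */
static const struct search_algo *get_algo(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof(algos) / sizeof(algos[0]); i++)
        if (strcmp(algos[i].name, name) == 0)
            return &algos[i];
    return NULL;
}
```

A connection tracking helper would call `get_algo("naive")->find(...)` once and never care which algorithm actually sits behind it.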
I have now taken the liberty of porting it to the 2.6.x kernel, which involved
lots of changes that make my qsearch port incompatible with what Philippe wrote.
Anyway, I'm now in the process of combining this with Rusty's recent work on
skb_walk() and skb_iter(), so we can pattern-match against a
fragmented/nonlinear skb without copying it.
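Matching against a fragmented buffer without copying it requires a search algorithm that keeps its state between fragments. A minimal sketch using Knuth-Morris-Pratt, assuming the fragments are simply handed over one at a time as (pointer, length) pairs; it does not use the skb_walk()/skb_iter() interfaces themselves, and all names are hypothetical.

```c
#include <stddef.h>

#define MAX_PAT 64

/* Matcher state carried from one fragment to the next. */
struct kmp_state {
    const unsigned char *pat;
    size_t plen;
    size_t fail[MAX_PAT];   /* KMP failure function */
    size_t matched;         /* pattern bytes matched so far */
    size_t consumed;        /* bytes fed in earlier fragments */
};

/* Precompute the failure function; returns -1 if the pattern is
 * empty or longer than MAX_PAT. */
static int kmp_init(struct kmp_state *s, const unsigned char *pat,
                    size_t plen)
{
    size_t i, k = 0;
    if (plen == 0 || plen > MAX_PAT)
        return -1;
    s->pat = pat;
    s->plen = plen;
    s->matched = 0;
    s->consumed = 0;
    s->fail[0] = 0;
    for (i = 1; i < plen; i++) {
        while (k > 0 && pat[i] != pat[k])
            k = s->fail[k - 1];
        if (pat[i] == pat[k])
            k++;
        s->fail[i] = k;
    }
    return 0;
}

/* Feed one fragment. Returns the absolute start offset (across all
 * fragments) of the first match, or -1 if none has completed yet.
 * The caller stops feeding once a match is reported. */
static long kmp_feed(struct kmp_state *s, const unsigned char *frag,
                     size_t len)
{
    size_t i;
    for (i = 0; i < len; i++) {
        while (s->matched > 0 && frag[i] != s->pat[s->matched])
            s->matched = s->fail[s->matched - 1];
        if (frag[i] == s->pat[s->matched])
            s->matched++;
        if (s->matched == s->plen)
            return (long)(s->consumed + i + 1 - s->plen);
    }
    s->consumed += len;
    return -1;
}
```

For example, "PORT" split as "xxPO" + "RT 1" is found at offset 2 without ever linearizing the two fragments.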
[ /linux/netfilter ]
Just arrived in Portland, OR for the Linux networking conference
Getting to Portland (via Frankfurt) turned out to be no problem. But getting
through Portland to the reserved hotel was a bit problematic. Apparently
there was a jam of MAX light rail trains, and we had to wait for quite some
time before the journey could continue.
Then I decided to get off at Beaverton station. My assumption was that at
such a terminal there would certainly be some cabs nearby. However, there
weren't. Maybe I forgot that this is the U.S. and public transportation is not
what I'm used to.
Anyway, I switched to the bus.
However, I took it in the wrong direction. My destination was close to 158th
Avenue, but the bus went to 185th Avenue. As it was already getting late, I
decided not to take the bus back to Beaverton, but rather to walk from 185th to
158th Avenue.
Despite the hot sun (27 degrees Celsius in the shade), my backpack and the suitcase,
this was only a 40-minute walk. It reminded me of hiking in southern France with the boy scouts ;)
Well, the hotel had not yet cancelled my reservation, and after a cool shower and
checking my email I went to sleep. Due to the jetlag I was awake at 3:30am. Not too bad, considering a 9-hour time shift.
Let's hope the trip from the hotel to the conference venue will be less complicated ;)
[ /linux/conferences ]
Apple Xserve Dual G5 ClusterNode arrived
I still cannot believe it. After waiting only four months, the Dual G5 XServe
arrived today. Unfortunately I'm leaving for a two-week trip (Linux
Networking Summit, Linux Kernel Summit, Ottawa Linux Symposium) tomorrow, so I
don't really have any time to play with it.
Just had a quick look under the hood, and it seems that putting additional
drives in place isn't difficult at all. The mainboard has three SATA connectors
(two empty), and if you remove the front panel you can access the two empty
3.5" drive bays. The PCB for Apple's hot swap bays is also present, but
unfortunately it is missing the SATA and drive connectors.
The only remaining issue is getting power to the SATA drive, but that should be pretty easy to figure out...
So you might ask why I didn't buy the non-ClusterNode model in the first
place. Because it's way more expensive and, apart from those tiny details,
exactly the same hardware.
Another interesting part will be bootstrapping Debian/ppc onto that box,
with no VGA board and only a serial console. Apparently there is no
distribution that really supports installation via serial console (even on
x86)... despite it being extremely easy to implement... *sigh*
[ /linux ]
Adding support for multiple data acquisition boards
Tomorrow I'll be hacking GSPC to support multiple data acquisition boards in
one system. The main focus will be testing, since GSPC already contains the
untested code for this. However, the vendor-supplied driver is hardly able to
deal with this situation *lol*.
[ /linux/gspc ]
Make ESTIC compile with recent gcc
I doubt there are many readers who know about ESTIC. It is a multi-platform
(DOS, Windows, Linux, *BSD, OS/2) configuration tool for some ISDN PBX
systems (ISTEC 1003/1008) that used to be common in Germany in the
early-to-mid 1990s.
The original vendor of the PBX went out of business a long time ago, and the
author of the ESTIC tool even removed his ESTIC homepage in 1997.
Anyway, I still have some of the old ISTEC boxes in use, some even at
friends' places. It turned out that we needed to reconfigure one of these old
boxes, so I took the source code (yes, it's open source) and made it compile
with a recent gcc. Jeez, I never liked C++...
So if anybody is interested in the patch to compile estic 1.60 with gcc-3.3.4 on Linux: it's located at ftp://ftp.gnumonks.org/pub/patches/es160src-linux-gcc3.3.4.patch.
[ /linux ]
New all-in-one fileserver
I just put my new all-in-one fileserver [called 'sunbeam'] into production.
It's an Athlon64-based system with five 200GB SATA drives. Since there's now a working (but still unofficial) Debian AMD64 port, I can even run a 64bit distribution on it. As far as production machines go, my policy is to run only Debian on them.
This machine replaces the old box [now called 'shiva'] (an Athlon 1000 with
eight 80-120GB hard drives), which now serves as the main storage server for my
network-wide backup system.
As my new backup solution, I chose Bacula. The architecture just looks like the
'right thing'. The catalog is maintained in the SQL RDBMS of your choice, and
every to-be-backed-up machine runs a client (bacula-fd). A director running on
one server then directs the clients to write their data to one or more storage
servers of its choice. This also means that I can have one centralized
director for three different physical locations (yes, my machines are spread
over three distinct locations with low-bandwidth interconnections). Every
location simply has its own storage server :). Oh yes, of course, this is
on-site backup to hard drives only. It won't help if my house burns down.
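The resulting topology, one central director and one storage server per location, can be modelled in a few lines. This is a hypothetical sketch of the idea only, assuming each client is always backed up to the storage server at its own location; Bacula itself is configured through its own resource files, not code like this, and the names are invented.

```c
#include <stddef.h>
#include <string.h>

/* One storage server (storage daemon) per physical location. */
struct storage {
    const char *name;
    const char *location;
};

/* Every machine to be backed up runs a client (bacula-fd). */
struct client {
    const char *name;
    const char *location;
};

/* The director's choice in this model: send each client's data to the
 * storage server at the same location, so bulk backup traffic stays
 * off the low-bandwidth links between locations. */
static const struct storage *pick_storage(const struct client *c,
                                          const struct storage *sds,
                                          size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (strcmp(sds[i].location, c->location) == 0)
            return &sds[i];
    return NULL;   /* no local storage server configured */
}
```

Only the director and its catalog need to be reachable from all three locations; the data itself never has to leave the site it was backed up at.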