Intel e1000 (82546) TX performance

After recent discussions with Robert Olsson at the netfilter workshop, I've decided to investigate a bit further why the Intel e1000 gigabit MACs are quite limited when it comes to TX performance at high packet-per-second (pps) rates.

My first assumption was that the in-kernel pktgen.c code might not keep the transmitter busy at all times, resulting in only 760kpps (out of the theoretical maximum of 1480kpps).
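
For reference, the theoretical maximum follows from simple framing arithmetic. The sketch below assumes minimum-size 64-byte frames plus the standard 8 bytes of preamble and 12 bytes of inter-frame gap; the function and constant names are just for illustration. Under those assumptions the limit comes out at roughly 1488kpps, in the same ballpark as the rough figure above.

```python
# Theoretical packet rate of gigabit Ethernet with minimum-size frames.
LINK_BITS_PER_SECOND = 1_000_000_000
MIN_FRAME_BYTES = 64        # minimum Ethernet frame, including FCS
PREAMBLE_SFD_BYTES = 8      # preamble plus start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12   # mandatory idle time between frames


def max_pps(frame_bytes=MIN_FRAME_BYTES):
    """Frames per second that fit on the wire for a given frame size."""
    wire_bytes = frame_bytes + PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES
    return LINK_BITS_PER_SECOND / (wire_bytes * 8)


line_rate = max_pps()
measured = 760_000  # pktgen result quoted above

print(f"line rate: {line_rate:.0f} pps")
print(f"measured:  {measured / line_rate:.1%} of line rate")
```

So the measured 760kpps is only about half of what the wire itself could carry.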

So I hacked the e1000 driver to hardcode a refill of the Tx queue with the same skb over and over again. Using a 2048 Tx descriptor ring, I was able to keep the transmitter busy at all times (E1000_ICR_TXQE interrupts).

Unfortunately, I still didn't get more than 760kpps in this setup (PCI-X at 66MHz, Dual-Opteron 1.4GHz, DDR-333 (PC-2700) RAM). So we're seeing a limitation of either the 82546 chip itself, or of the PCI-X bus, memory latency, or something else along that path.
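
To get a feeling for how tight the timing is, here is a small back-of-the-envelope sketch of the per-packet budget. It only divides the nominal 66MHz bus clock by the packet rate; it does not model descriptor fetches, bus arbitration or memory latency, so treat it as an upper bound on the cycles available per packet, not as an explanation of the limit.

```python
MEASURED_PPS = 760_000
BUS_CLOCK_HZ = 66_000_000   # nominal PCI-X clock of the test machine


def ns_per_packet(pps):
    """Time available per packet at a given packet rate."""
    return 1e9 / pps


def bus_cycles_per_packet(pps, clock_hz=BUS_CLOCK_HZ):
    """Bus clock cycles that elapse per packet at a given packet rate."""
    return clock_hz / pps


print(f"{ns_per_packet(MEASURED_PPS):.0f} ns per packet")
print(f"{bus_cycles_per_packet(MEASURED_PPS):.1f} bus cycles per packet")
```

At 760kpps that is roughly 1.3 microseconds, or about 87 bus cycles, in which the descriptor and the packet data both have to cross the bus.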

I'll try the same experiments on a different machine with PCI-X 100 / 133MHz in order to find out what exactly is causing this limit.

netfilter workshop / Linux Kongress 2004

I've not been able to write any articles for this log over the last few days, since I've been busy with the third netfilter developer workshop and Linux-Kongress 2004.

The netfilter workshop went really well, apparently the

Finishing preparations for upcoming netfilter developer workshop

I've spent a significant amount of time over the last couple of days with the final preparations of the upcoming 3rd netfilter developer workshop. This is the first one where I'm in charge of every tiny bit of the organization, and I hope I got everything right.

The first attendees are scheduled to arrive tomorrow. They might even arrive before me, since I'll be heading the 500km down south tomorrow.

Getting the external VGA of my Apple Powerbook (TiBook IV) working

If you've attended one of my presentations during the last 12 months, you will certainly have noticed the poor quality of the slides. Yes, the content and the presentation are poor, too - but I'm mostly referring to the visual quality.

I've already spent at least a whole day in the past trying to get the external VGA working with Debian/ppc, with little success so far. I really don't care whether the external port mirrors the content of the display, or if it runs in dual head mode.

Today, I spent some three more hours of trial and error with the radeon driver of the dri-trunk XFree86. I tried CloneMode, Dual Head, with and without FBMode, and just about every other parameter within XF86Config-4.

In the end it turned out that the man page was not up-to-date, and the preferred way to get it running was the so-called MergedFB mode. This wasn't as easy to configure as expected, and I still got lots of 'Signal 11' segfault-style crashes.

The crashes seem to be totally unrelated to my graphics setup. In fact, it crashes when eth0 is not configured yet, but works after the network device is up. Now please somebody step up and explain...

Started a new 2.6.x based mini router distribution

I'm in the process of deploying a couple of PC Engines WRAP.1C embedded x86 boards in my apartment. They make neat little playgrounds for Router/NAT/VPN/WLAN/... style appliances.

Unfortunately I didn't find any embedded Linux distribution project that was up to my demands. Apparently they all use age-old kernels (2.4.17 or something ancient like that). And they very rarely come with a decent automatic build system that would allow you to rebuild it from scratch, adding your own patches, ...

So what did I do? I started my own :(. Not that I'm proud of it, but it was necessary. My home VLAN/firewall/PPPoE/NAT/VPN router is now running the very first image of this new distribution I called 'gRouter'.

Its main features are kernel 2.6.8.1, uClibc-0.9.26, busybox-1.00rc3, pppd with in-kernel PPPoE support, quagga, iptables-1.2.11, openvpn-1.6.0, and dropbear for SSH. It all fits in about 8MB of CompactFlash.

The build process is semi-automatic: apart from a few glitches, the whole image compiles itself. I stole some of the build magic from the WISP-DIST project (part of LEAF), although this is all quite simple scripting.

After some more cleanups and testing, I plan to release this distribution. Please don't expect any support, or any configuration tools. It will be available for Linux experts who can configure and set up their system from scratch, and want to have the gadgets of the latest software releases.

On the todo list is cross-compilation support (well, since it is uClibc based, it already does cross-libc-compilation), madwifi support, and especially IPsec using the 2.6.x kernel implementation.

On VIA's failure to provide adequate Linux support

VIA is definitely one of the most innovative producers of PC-hardware. Their EPIA-series mini-ITX and nano-ITX mainboards are ideal for small appliances, such as firewalls, VPN-gateways, and especially home entertainment platforms such as PVR/DVR applications, DVB-Receivers, DVD/VCD/AVI-players, VideoLan receivers and such.

Just two days ago, VIA made a press release on their new VeXP 3.0 release, a VIA-enhanced fork of xine. To the unfamiliar reader, this press release raises the impression that VIA is really involved with Linux and the Free Software community.

This is just terribly wrong. They do anything but support GNU/Linux. Comparing this press release with reality, I think VIA's Linux involvement as a whole is nothing more than a PR strategy.

I've recently investigated the "Linux support" they make available for their EPIA platforms. Even at first glance it was obvious that VIA just doesn't have any idea of what it takes to "Support Linux".

All they do is publish proprietary, pre-compiled kernel frame buffer and XFree86 display drivers for a limited number of particularly old GNU/Linux distributions.

Oh yes, I almost forgot it: They also publish the source to some 'lite' driver which lacks all the functionality needed for hardware-assisted MPEG2 decoding. This is obviously useless, since the whole point of buying a small fan-less board with hardware MPEG acceleration and TV-Out is to use the acceleration.

So their "Linux Support" is so good that a number of people have to spend days and days reverse engineering their binary proprietary drivers. You can find more information about the reverse engineering effort via the related links below. My special thanks go to Ivor Hewitt for doing all this work.

But wait, wasn't that what the Linux folks usually did with Windows drivers? Welcome to the world of "VIA Linux support", where instead of reverse engineering Windows drivers, we now have to do it with Linux drivers.

If VIA were really interested in providing good GNU/Linux support for their EPIA products, they would

  1. write full source code drivers licensed under appropriate Free Software licenses
  2. make those drivers use standard interfaces, follow the respective project's coding style, and contain useful comments
  3. publish those drivers as patches against the latest development version of the respective project (kernel, XFree86, Xine)
  4. work with the respective project maintainers to integrate those patches
  5. not have to care about maintaining RPMs for each and every distribution
  6. not have to care about porting their drivers to ever-changing APIs, since they are included in the respective Free Software projects
  7. provide documentation for their hardware down to the register level, so the Free Software community can continue development and extend it to features not yet covered by the current driver

Related Links:

  • http://lwn.net/Articles/99464/ VIA's original press release
  • http://www.viavpsd.com/ VIA's EPIA homepage
  • http://www.viaarena.com/ VIA's support forum and driver downloads
  • http://www.epiawiki.org/ The comprehensive source of EPIA/Linux related information
  • http://www.ivor.it/cle266/ The reverse engineered driver page

    More Allnet Devices contain Linux

    I've now successfully proven that the ALL0185A, ALL0186, ALL1297, ALL2100, ALL2110 and ALL6100 devices contain the Linux kernel and are not distributed according to the GPL.

    Considering the out-of-court agreement that I concluded with them earlier this year regarding the ALL0277, I have to say I'm a bit disappointed that this happened again. It should be in their own best interest to distribute within the GPL license terms, and not first try to infringe and wait until somebody complains.

    I've contacted them, and they promised to publish the source code and adhere to the license shortly. Let's see how this continues.

    Fujitsu Siemens Corporation not fulfilling amicable agreement

    As part of an amicable agreement, Fujitsu Siemens Corporation (FSC) agreed to make a donation to the German Unix Users Group. It came as a surprise to me that GUUG has not yet received the funds even four months later!

    Again, I am very disappointed by the behaviour of the former GPL violators. It should be in their own best interest not to produce any negative publicity.

    Video Documentation on 21C3

    I've attended a meeting on the subject of providing audio/video documentation at the 21st Chaos Communication Congress. During that meeting, I was appointed as being responsible for this part of the 21C3 conference.

    So we want to do on-the-fly encoding of four video signals from DC1394 cameras to DVD-compatible MPEG2, low-resolution MPEG4 for live-streaming, and OGG audio only for live streaming.

    I did some preliminary experiments with the available experimental x86_64 assembly patches for ffmpeg, and it turns out that at least theoretically a 1.6GHz AMD64 should have enough power to do those three encodings at the same time.

    Unfortunately the dv1394 device at the moment only supports one encoder mmap()ing the ring buffer of incoming 1394 frames - but that should be fixed pretty easily.

    I'll do some more experiments in the next couple of weeks, stay tuned.

    Main netfilter.org server has been replaced

    Yesterday I finally got around to moving almost all netfilter.org services from our old Sun Ultra5 to the new XServe ClusterNode.

    Unfortunately there were lots of complications, so I had to stay awake until 5am in order to get all services running again. At least for now, everything seems to run smoothly.

    Off-the-shelf multi-port serial cards and Linux

    This is now the third time I've bought some PCI serial multi-port card (6 to 8 ports) that claimed to have 'Linux support'. If you then read the documentation, the vendor bluntly tells you that Linux generally doesn't support more than four ports, so if you have two built-in ports, you can only use two more. I've never read such bullshit anywhere else ;)

    So after some minor twiddling, I have now submitted a patch adding support for this particular 6-port device. Apparently there is either a wide variety of such boards, or almost no Linux users... A couple of years ago I added support for an AFAVLAB 8-port serial card to the Linux serial driver.

    I think I now know way too much about the serial driver. Not stopping with those two PCI 8250 based boards, I did lots of serial driver hacking for the XServe G5 and also for my recent ARM embedded work. Let's hope I can again advance to some more exciting work in the future.

    Using a human-based data acquisition plugin

    Why buy expensive data acquisition boards, if you can have a cheap human being entering the data on some terminal? No, just kidding.

    Anyway, GSPC now has a gpsc_acquire_user.c plugin that retrieves measurement data via an ncurses-based dialog instead of any data acquisition board. This is useful for testing, but also in some real-world cases.

    Two hard drives dying in one week

    This week, the second hard drive in one of my workstations died. Both times it was the same model: IBM DTLA-307060, produced Nov 2000 in Hungary. That can hardly be a coincidence. Maybe they have a built-in 'best before' date :(

    So both my main workstations (Dual PIII-733 and a Dual Apple G4-500) were inoperable, isn't that great? The good part is that they've been replaced with silent Samsung SP1213N models, significantly reducing the noise level in my office.

    Attaching a UW-SCSI hard disk to an embedded ARM922T

    No, I'm not doing this for fun, this is part of work. It turned out that nfsroot is a bit of a problem while you're hacking the core network stack (and everything breaks all the time). So I now attached an 18GB UW-SCSI disk to an old aic7xxx controller and plugged this into my ARM development board. Seems to work quite fine, as long as the aic7xxx_old driver is used. The new one apparently calls pci_alloc_consistent from interrupt context ?!?.

    News on the GPL Violation Front

    It's been some time since I last reported news on the GPL violation side... No news is good news, one could think. Unfortunately, the contrary is true: I've been receiving a number of new GPL violation reports, but none of them involve my copyrighted work - and thus I am now looking for the respective copyright holders in order to get these issues sorted out.

    Stay tuned...

    Performance of system logging

    One of my customers recently had a serious performance issue with one of his installations. Surprisingly, it wasn't even the real application software itself that had performance issues, but the mechanism used for logging from this application.

    So I started to think about the way logging usually works within a Linux-based system.

    Server applications can be divided into two groups. One of them logs via syslog(), the other logs directly to its own files. The logging itself happens synchronously, i.e. it blocks the normal code flow until the log line has been written. In the case of syslog, it might block because the syslog pipe is full - in the case of stand-alone files, the file I/O might take some time to complete.

    Even in a multi-threaded or forked model of a network server program, this might pose considerable problems with regard to threads waiting for their log i/o to complete.

    Syslog itself might not be as bad, especially since the 2.6.x pipe implementation works with only the minimal necessary amount of copying, and supports larger pipe sizes to avoid writer blocking.

    Some people however tend to use something like syslogger in order to redirect the log output from programs with no syslog support also into syslog. This means that you have one pipe between your application and syslogger, and another pipe between syslogger and your real syslog daemon.

    Comparing this issue with networking is actually not too problematic. In networking, we have packets that are passed from one process to another... with logging it's not a packet but usually one or more lines of text (that is, about 60 to 240 characters per entry).

    You don't want to copy this data around and around... and in a lot of installations you'd rather lose a couple of log lines than slow down your application just for some statistics that you might collect.

    Of course, you don't want to modify any of the existing applications, either - they should just be able to use syslog() calls as usual. Of course you could load an LD_PRELOAD library and redirect the syslog() calls, if needed.

    So what I came up with is something like a partially mmap()able pipe. The logging process would log to that pipe like it would with any other file descriptor. Internally, that 'pipe' has a ring buffer of configurable size. The pipe-reader could now mmap() this ring buffer into its address space in order to read the log.

    This scheme should have the advantage of not blocking the writer if the pipe is full (it would just wrap around the ring buffer), and it avoids copying the data from some in-kernel pipe buffer into the user-space of the pipe reader.
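
    To make the wrap-around behaviour a bit more concrete, here is a tiny user-space model of such a ring. It is only a sketch of the semantics (the writer never blocks, old data gets overwritten), not of the kernel side: there is no mmap() and no pipe involved, and the class and method names are made up for illustration.

```python
class OverwritingRing:
    """User-space model of the proposed log ring buffer (hypothetical names)."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.size = size
        self.head = 0  # total number of bytes ever written

    def write(self, data):
        """Never blocks: the oldest data is overwritten once the ring wraps."""
        total = len(data)
        tail = data[-self.size:]  # only the newest bytes can survive
        pos = (self.head + total - len(tail)) % self.size
        first = min(len(tail), self.size - pos)
        self.buf[pos:pos + first] = tail[:first]      # up to the end of the ring
        self.buf[0:len(tail) - first] = tail[first:]  # wrapped remainder
        self.head += total

    def snapshot(self):
        """What a reader looking at the buffer would see, oldest byte first."""
        if self.head <= self.size:
            return bytes(self.buf[:self.head])
        pos = self.head % self.size
        return bytes(self.buf[pos:] + self.buf[:pos])


ring = OverwritingRing(8)
ring.write(b"abcdef")
ring.write(b"ghij")     # wraps around and overwrites "ab"
print(ring.snapshot())
```

    The price for the non-blocking writer is visible in the second write: a reader that is too slow simply loses the oldest bytes, which is the trade-off described above.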

    Did you notice, this now looks perfectly like the DMA ring buffer of your Ethernet device and the Linux softirq handler ;)

    Anyway, as I didn't do any vm / vfs hacking in Linux so far, this is not a trivial thing to implement. And I have lots of other work at this point. However, I'd certainly like to investigate the possible performance gains [losses?] of this idea. Comments welcome.

    IETF work on NAT behaviour

    Apparently some people within the IETF have started a new working group called 'BEHAVE'. It is about NAT devices on the internet, and their inconsistent and incompatible behaviour. The working group aims to give guidelines to implementors, in order to assure interoperability with new applications such as VoIP and peer-to-peer protocols, as well as multicast and others.

    This is certainly a topic that is in the main focus of my interest, so I decided this is the right point in time to start participating in the IETF.

    For more information about behave, see the mailing list.

    Upcoming Chaosradio episode on software patents

    The next Chaosradio radio show will be about the ongoing debate on software patents, especially the recent developments within the European Union.

    Being part of the anti software patent movement for about 4-5 years now, I am more than happy to help with the radio show on this subject.

    The radio show will be on air on Sept 01, 10pm GMT+2. If you understand German, there's an MP3 live stream available on the homepage.

    Working on embedded Linux ARM SoC project

    While there hasn't been any update on this weblog for quite some time, I've been buried under a lot of work.

    One of the most interesting projects is an embedded ARM-based SoC project with special network acceleration hardware. Unfortunately I'm not allowed to talk too much about it at this point, but be assured it is very exciting, and of course runs Linux :)

    During development I found it quite comfortable to run the small embedded system with nfsroot mounted from some larger box. The nfsroot contains a debootstrap'ed installation of Debian sarge for ARM.

    The main problem for this kind of operation is the limited on-board memory. But I'm tempted to put a 64MB graphics card into one of the PCI slots and hack the Linux kernel to treat this framebuffer as (somewhat slow) RAM :)

    Booting from a md raid device on powerpc

    Apparently, nobody has ever tried to do this so far, since the mac partition handling code in the Linux kernel had no provisions for enabling auto-detection of md software raid.

    I've now written a patch for Linux 2.6.8, available at http://gnumonks.org/ftp/pub/patches/linux-2.6.8-mac-autoraid.patch implementing this feature. All you need to do is apply that patch, and make sure your md partitions have the type 'Linux_raid_autodetect' in the mac partition table.

    Figured out the fan control on the XServe ClusterNode

    I spent the last couple of hours figuring out the missing bits of the fan/thermal control on Apple's Dual XServe ClusterNode. Luckily it's very similar to the design Apple used in their Desktop G5 machines, so I can build on the work that Benjamin Herrenschmidt did with his thermal_pm72 driver.

    So in case anybody is interested in the technical details: Eight fans are controlled by the FCU (Fan Control Unit), which is attached to an i2c bus of the Apple U3 northbridge.

    There are three RPM controlled fans per CPU. The Left CPU (viewing from the front of the machine) has fans #1,2,3. The right CPU: #4,5,6.

    The other two fans are not RPM controlled, but just PWM controlled... so instead of setting an RPM, you have to set a pulse-width between 10 and 100%. PWM Fan #1 is located between RPM-fan 3 and 4 (between both CPUs) and its job is to keep the U3 chip cool. PWM Fan #2 is located behind the PCI-X slots and thus cools them (too bad in my machine there is no card to be cooled *g*).

    Regulating the CPU fans is quite easy, since there is a per-CPU temperature sensor, and also a voltage and current reading, so we can calculate the power consumption of each CPU and tune the fans accordingly.
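
    The idea can be sketched in a few lines. Power is simply voltage times current; the mapping from power to fan speed below is a plain linear ramp with made-up limits. None of the thresholds or names come from the FCU or from the thermal_pm72 driver, they are placeholders to show the principle.

```python
# All limits below are hypothetical placeholders, not values from real hardware.
IDLE_POWER_W = 10.0
MAX_POWER_W = 80.0
MIN_RPM = 2000
MAX_RPM = 6000


def cpu_power_watts(volts, amps):
    """Power drawn by one CPU, from its voltage and current readings."""
    return volts * amps


def fan_rpm(power_w):
    """Linear ramp between MIN_RPM and MAX_RPM, clamped at both ends."""
    fraction = (power_w - IDLE_POWER_W) / (MAX_POWER_W - IDLE_POWER_W)
    fraction = max(0.0, min(1.0, fraction))
    return round(MIN_RPM + fraction * (MAX_RPM - MIN_RPM))


power = cpu_power_watts(1.5, 30.0)
print(f"{power:.1f} W -> {fan_rpm(power)} rpm")
```

    A real driver would also want the temperature reading in the loop and some hysteresis, so the fans don't constantly change speed.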

    For the U3 it is a bit more difficult. I have not yet found a way to get a temperature reading for it, but I'm quite sure there is a temperature sensor somewhere.

    As for PCI cards, there is apparently some way to read the power consumption - but of course again undocumented and not reverse engineered yet. As I don't have PCI boards in my box anyway, I personally don't care that much. But I should now stop arguing rationally, since a machine hosted in some rack-space is very unlikely to need fan control at all :)

    I'll try to make a somewhat cleaner unified driver for PowerMac7,2 and RackMac3,1 and post a patch in the next couple of days.

    I really wonder why Apple is not releasing their FCU driver source code for Darwin... it's really annoying. And I doubt they can claim that it contains any valuable intellectual property that their competitors are not allowed to see ;)

    Finally the XServe ClusterNode runs Linux!

    Yes, it does. I now have two partitions: One running the experimental Gentoo ppc64 port, and another one running the overly-conservative Debian woody ppc32. The plan is to boot into Gentoo, and run publicly-accessible production services within the Debian woody chroot.

    So how did I do it? Well, I gave up on the idea that the usual installation process of any distribution would work. So instead of trying to fix up whatever goes wrong in the installation scripts, I just escaped to a shell ASAP, ran mac-fdisk and mkfs.ext3, extracted the stage3.tar.gz and did the rest of the Gentoo install.

    Debian was then installed using the convenient debootstrap tool.

    One of the major remaining questions is however: Does the Apple XServe Hardware give you anything similar to Sun boxes, where you could just send break over the serial line and get into OpenFirmware? This is very convenient for remotely resetting machines without any local 'reset-staff' present.

    After some chatting with Benjamin Herrenschmidt, it appears that nobody is working on getting fan rpm/speed/temperature control implemented on the XServe so far. Well, as it's a rack-mounted machine sitting in some hosting center I don't really care about the noise anyway.

    More interestingly, the Apple KeyLargo2 based machines have a Hardware Watchdog. Driver Source code is available within the public part of the Darwin kernel, so it should be easy to implement a Linux driver for this. Maybe I'll find some time to dive into this.

    IPv6 packet filter benchmarking

    It seems like a German university is currently doing feature analysis and benchmarking of IPv6 packet filters. Coincidentally, I'm going to be near that university next week anyway, so I'll stop over for a short visit and help them with their ip6tables evaluation setup.

    I would be very interested to see some numbers on ip6tables... as we just discovered at the networking conference in Portland, nobody seems to be doing benchmarking / profiling on the Linux IPv6 code so far.

    Database Design + Content for GPL-Violations

    In order to keep track of the GPL violations that I am encountering myself or that are reported by fellow users, I really need some semi-automatic system.

    Being an RDBMS geek in my former life, I designed a SQL-based data model to cope with the individual objects such as vendors, products, product-firmware-versions, violations, settlements, compensations, comments, documents, contracts, ...

    It all turned out to be more complex than I thought initially. But I think it was really worth the effort.

    This database is for strictly internal use, since there is a lot of confidential information in there. However, corresponding flags indicating the public/private nature of the data records are included in the data model. At some later point I might extract the public information to create some web pages at www.gpl-violations.org.

    Its main purpose is to allow me to keep track of what's going on, and also of what has been verified where, whether the source code was made available for new upcoming firmware images, whether the source was complete, ...
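
    To give an idea of how such a public/private flag works, here is a heavily simplified sketch using SQLite from Python. This is not the actual schema: the table and column names are invented for illustration, only three of the object types are shown, and the sample rows are placeholders.

```python
import sqlite3

# Hypothetical, heavily reduced schema: vendor -> product -> violation.
SCHEMA = """
CREATE TABLE vendor (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE product (
    id INTEGER PRIMARY KEY,
    vendor_id INTEGER NOT NULL REFERENCES vendor(id),
    name TEXT NOT NULL
);
CREATE TABLE violation (
    id INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES product(id),
    summary TEXT NOT NULL,
    is_public INTEGER NOT NULL DEFAULT 0   -- private unless flagged otherwise
);
"""


def public_violations(conn):
    """Return only the records that are flagged as publishable."""
    return conn.execute(
        "SELECT vendor.name, product.name, violation.summary "
        "FROM violation "
        "JOIN product ON product.id = violation.product_id "
        "JOIN vendor ON vendor.id = product.vendor_id "
        "WHERE violation.is_public = 1 "
        "ORDER BY violation.id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO vendor (id, name) VALUES (1, 'Example Vendor')")
conn.execute("INSERT INTO product (id, vendor_id, name) VALUES (1, 1, 'Example Router')")
conn.execute("INSERT INTO violation (product_id, summary) VALUES (1, 'confidential case')")
conn.execute(
    "INSERT INTO violation (product_id, summary, is_public) VALUES (1, 'published case', 1)"
)
print(public_violations(conn))
```

    Defaulting the flag to private means a record only shows up on a public page after somebody has deliberately marked it as publishable.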

    I've already filled in lots of the existing data I have, but it's far from being complete. This needs some more time of filling in data records.

    And yes, I built some simple forms using GNU Enterprise Designer and Forms. It's still in 0.x stage, but usable for easy tasks.

    Putting multiple SATA drives into an XServe ClusterNode G5

    Apple is selling two different models of their Dual G5 XServe: One 'Normal' model, and another 'ClusterNode' Model. They are pretty much the same, but the ClusterNode doesn't have things you usually don't need in a rack-mounted 1U server anyway: CD-ROM and VGA-Card. However, it is also limited to a single hard drive.

    I guess Apple's reason is that in a scientific cluster computing environment, the node's local storage is insignificant - whereas on a real server you most likely want multiple (mirrored) drives.

    However, the significant price difference (Dual G5 ClusterNode has the same price as the Single G5 XServe) made me ponder buying a ClusterNode and adding another drive.

    Fortunately, the hardware is quite similar. It turns out that the mainboard has three SATA connectors, and the space for the 2nd and 3rd IDE drive was left empty. Also, the backplane for Apple's hotplug drives is not fully assembled - it is missing the connectors for the 2nd and 3rd drive :(

    So putting the drive in place and attaching it via a fixed cable to the SATA connector is no problem at all. However, power is a slight problem. The whole machine doesn't have a single standard power connector, so my only remaining option was to solder some wires onto the drive backplane PCB. This is ugly, but well.. who cares ;)

    I'll put some photos of the modification online soon.