TomTom and your own kernel

I've started to merge the TomTom-specific patches into a plain 2.4.27 kernel. Most of it is quite straightforward, since apparently they backported half of the kernel to 2.4.18-rmk6 (which is what they use as a base). I don't really get why companies still develop new products for 2.4.x, especially for really old versions like 2.4.18. In the Windows world, nobody still writes Windows 3.11 applications, so why do they start this kind of crap with Linux? *sigh*

Anyway, I'm thinking about a 2.6.x kernel port at some point, but obviously this is not an important issue on my agenda and I'd rather get some netfilter stuff running first.

Berlinux 2004

Some time ago I was asked whether I would be able to give a presentation at Berlinux 2004, Berlin's local incarnation of a Linux conference, organized by the Berlin Linux User Group.

This is the first contact with a user group I've had in about five years. I've tried to avoid Linux user groups exactly because of the 'User' part. I have a hard time dealing even with Linux-savvy iptables users, let alone users who need an explanation of how to install a given Linux distribution or even how to use a file manager.

Unfortunately Berlinux seems to be very user-oriented, too. I arrived about 40 minutes early and am now waiting for a presentation explaining the principles of mounting and the Linux file system layout to finish.

I'm surprised that Berlinux is so small, considering that Berlin is about seven times the size of my old hometown of Nuernberg, and the ALIGN Linux Setup Parties were about the same size.

Oh yes, does the idea trouble you that you know somebody at every international Linux conference, from Bangalore to Ottawa - but at an event in your own hometown you have a hard time finding any person whom you know? That's how I feel. Misplaced, at the wrong event :(

Porting PPTP conntrack/nat helpers to 2.6.x

I've always refused to do the port of the PPTP conntrack/NAT helper I wrote for 2.4.x because there are higher-priority items on my agenda.

Apparently it helped, as I was told Mandrake did a port to 2.6.x. I thought that was great news, and I thought it'd take an hour or so to get it merged.

Unfortunately that 'port' was totally incomplete. NAT couldn't have worked at all, and if you sent it a nonlinear TCP packet it would very likely crash your kernel.

In the end I spent the whole afternoon at it, with a resulting patch that is about the same size as the original code :(

The code is now in our subversion repository. I haven't had the time to test it so far, so any testing you (yes, you, the reader) might give it would be appreciated.

I should do more press releases

I'm sorry for that. GPL-enforcement progresses meanwhile. I've been able to obtain amicable agreements with three more vendors (D-Link, Gigabyte, TomTom), and there are two more open / ongoing cases at this point.

Expect more news and even an official press release during next week.

Fun with incompetent BMW employees

So during the repairs of my BMW F650's carburetor, I lost the choke plunger. Not a big deal, just a tiny part regulating the fuel/air ratio at engine startup time.

So I picked up the phone and called the spare parts department of BMW in Berlin, and told them the exact part I wanted. "Chokekolben" cannot possibly be misinterpreted, since there is no other part with the same name. I was told that this part is not available on its own, but only in a set bundled with the linkage/string that actually attaches to the plunger.

One day later I got a call that the part had arrived. It took me about an hour to get to the BMW subsidiary, only to find out that they had ordered the choke string, but it came without the plunger.

They showed me the exploded view of the carburetor, and it was very clear that the plunger is sold separately for about EUR 3. I have no idea how one can misunderstand the exploded view and/or the associated spare parts list.

After ordering the plunger, I asked them if they made the exploded views available to customers, so they could directly order a particular spare part number in order to avoid such misunderstandings. Apparently they only provide those spare parts catalogues to their BMW partners, and they see no way to provide me with a copy. *sigh*. So I will have to rely on some brain-dead spare parts sales assistant who has most likely never disassembled that bike ..

Luckily, there's eBay, and I found somebody who sold the original BMW spare parts catalogue on CD-ROM. What would the world be without eBay?

BMW, this happened about two weeks ago, and I still don't have that spare part.

Yet again more cases coming up

I've authorized my lawyer to act in five more new GPL violation cases. As usual I will not disclose their names until some kind of agreement (or a court order) is in place.

In one of the cases we unfortunately had to go after a reseller, since the warning notice to the Dutch vendor went unanswered. Apparently the strategy is working, since the German reseller has now put pressure on the Dutch vendor, who is suddenly replying to us ;)

Conntrack events for 2.6.x

I've separated out Patrick McHardy's conntrack events from the nfnetlink-ctnetlink patch and ported it to 2.6.x. The patch was posted to netfilter-devel, in case you're interested.

For those of you who don't know what this means: It means that the first part of what is required for a 2.6.x ct_sync port is now done ;)

ct_sync ethereal plugin

While doing some more ct_sync testing/debugging, I found out that for some reason my ctnl_dump program didn't work anymore. Instead of fixing it, and updating it to CTSP (conntrack sync protocol) version 2, I decided to write a plugin for the well-known packet analyzer ethereal.

Due to the nature of the CTSP, it passes arch-, endian- and configuration-dependent data structures between master and slave. This means that it is virtually impossible to write an analyzer that will work for all of those combinations.

My plugin now assumes that you use a little-endian 32-bit machine with the pptp-conntrack-nat patch applied.
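To illustrate why such raw in-memory structures are hard to dissect portably, here is a minimal Python sketch. The record layout (a 32-bit id followed by a 16-bit port) and the function names are purely hypothetical and are not the real CTSP format; the point is only that the same values produce different bytes on the wire depending on the sender's byte order, so a dissector hard-wired to little-endian misreads data from a big-endian sender.

```python
import struct

# Hypothetical record: 32-bit id followed by a 16-bit port.
# This is NOT the real CTSP layout, just an illustration.
def pack_record(conn_id, port, byte_order):
    """Serialize the record the way a sender of the given byte order would."""
    return struct.pack(byte_order + "IH", conn_id, port)

def dissect_little_endian(data):
    """A dissector that assumes the sender was little-endian."""
    return struct.unpack("<IH", data)

from_le_sender = pack_record(0x01020304, 80, "<")
from_be_sender = pack_record(0x01020304, 80, ">")

# Correct when the assumption holds, garbage when it does not.
print(dissect_little_endian(from_le_sender))
print(dissect_little_endian(from_be_sender))
```

Word size and compile-time configuration change field sizes and offsets in the same way, which is why the plugin has to pin down one combination.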

The plugin turned out to provide very useful information, and I was able to fix some issues in ct_sync using it.

No big news this week - I'm in Astaro labs

I'm about to do one week of benchmarking and profiling using an Ixia four-port Gigabit Traffic generator and a Sun Fire v20z dual Opteron box in the Astaro labs. Let's hope I can find some code pieces in the network stack that can be optimized in order to achieve higher performance...

xfrm_user.c doesn't use netlink correctly

If you read the netlink documentation (and look at how existing users such as rtnetlink or ipt_ULOG use it), then all messages that are part of a dump have the NLM_F_MULTI flag set, and the dump is terminated with an NLMSG_DONE message.

The code in net/xfrm/xfrm_user.c however dumps those messages without the NLM_F_MULTI flag. I've hacked a first patch, but apparently it doesn't catch all cases.
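The dump convention described above can be sketched in userspace Python. This is only an illustration, not kernel code: it packs the 16-byte netlink message header (length, type, flags, sequence number, pid) by hand, and EXAMPLE_TYPE as well as the function names are made up for the example. Setting NLM_F_MULTI on the final NLMSG_DONE message is a choice of this sketch.

```python
import struct

NLMSG_DONE = 0x3      # message type that terminates a multipart dump
NLM_F_MULTI = 0x2     # flag: this message is part of a multipart dump
EXAMPLE_TYPE = 0x10   # hypothetical message type, for illustration only

# struct nlmsghdr: u32 len, u16 type, u16 flags, u32 seq, u32 pid
NLMSG_HDR = struct.Struct("=IHHII")

def build_dump(payloads, seq, pid=0):
    """Build a well-formed dump: NLM_F_MULTI everywhere, NLMSG_DONE last."""
    out = b""
    for payload in payloads:
        length = NLMSG_HDR.size + len(payload)   # length excludes padding
        padding = b"\0" * (-len(payload) % 4)    # messages are 4-byte aligned
        out += NLMSG_HDR.pack(length, EXAMPLE_TYPE, NLM_F_MULTI, seq, pid)
        out += payload + padding
    out += NLMSG_HDR.pack(NLMSG_HDR.size, NLMSG_DONE, NLM_F_MULTI, seq, pid)
    return out

def parse_dump(buf):
    """Collect payloads until NLMSG_DONE; reject messages without NLM_F_MULTI."""
    payloads, offset = [], 0
    while offset < len(buf):
        length, mtype, flags, _seq, _pid = NLMSG_HDR.unpack_from(buf, offset)
        if mtype == NLMSG_DONE:
            return payloads
        if not flags & NLM_F_MULTI:
            raise ValueError("dump message without NLM_F_MULTI")
        payloads.append(buf[offset + NLMSG_HDR.size:offset + length])
        offset += (length + 3) & ~3              # skip to next aligned message
    raise ValueError("dump not terminated by NLMSG_DONE")

print(parse_dump(build_dump([b"abc", b"defgh"], seq=1)))
```

A strict receiver like parse_dump() above would reject messages that lack the flag, which is the kind of breakage the missing NLM_F_MULTI can cause for a userspace program following the documented convention.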

Motorbike problems

I wanted to take pictures of a recently detonated old building in Berlin. I wanted to go there by motorbike. Unfortunately the bike developed some problems: About 3 km from my home, it suddenly stopped and refused to start again. While trying to get it running, I suddenly noticed vast amounts of fuel leaking from the air filter. That's a bad sign; it basically says that somehow the carburetor is sending fuel in the wrong direction.

I went home by public transport (no photos taken), and luckily found a truck rental that was open on Sundays. So I managed to get the bike back home, take everything apart and clean the carburetor. I couldn't find anything serious like a worn-out fitting... all I found was a minimal amount of dirt.

I'll put the bike pieces back together tomorrow, let's see whether cleaning the dirt actually helped. Jeez, as if I hadn't enough to do already...

2.4.x backport of neighbour cache rework

I've finished my 2.4.28 and 2.4.21 backports of our recent neighbour cache rework (see netdev of the last two weeks in case you're interested). 2.4.28 was quite straightforward; only the missing per-CPU support hurt a bit. 2.4.21 was pretty hard, since the neighbour cache apparently changed quite a bit between 2.4.21 and 2.4.28.

But well, it's over now. Thank god :)

Generalized Linux network statistics

While working on the neighbour cache, I introduced some generic neighbour cache statistics. They are done in the core, but exported to userspace for every ncache separately (arp, ndisc, atm_clip, decnet). I used the same techniques and file format as rtstat.

Martin Josefsson also recently introduced ctstat, the same kind of statistics for ip_conntrack. He did a copy+paste 'port' of the rtstat userspace program. I would now have needed four more copy+paste 'ports'. And I couldn't do it. Copy+paste style ports are what I have been fighting in the iptables world for two years, so I certainly don't want to introduce them elsewhere..

The result is what I call lnstat. It's a generalized version of rtstat that works with neighbour cache, routing cache and conntrack statistics, either separately or all at the same time. It has user-defined formatting (field width) and key selection, as well as some other bells and whistles. Let's hope this gets integrated with iproute2 soon, so people can benefit from it.
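To give an idea of what such a generalized reader has to do, here is a rough Python sketch. It is not lnstat itself. It assumes a text format with one header line of field names followed by one line of hexadecimal counters per CPU; the field names, sample values and function names below are invented for the example.

```python
# Invented sample in the assumed format: a header line with field names,
# then one line of hexadecimal counters per CPU.
SAMPLE = """\
lookups hits
00000100 00000080
00000010 00000008
"""

def parse_stats(text):
    """Sum the per-CPU counter lines into one total per field name."""
    lines = text.strip().splitlines()
    fields = lines[0].split()
    totals = dict.fromkeys(fields, 0)
    for line in lines[1:]:
        for name, value in zip(fields, line.split()):
            totals[name] += int(value, 16)
    return totals

def format_row(totals, keys, width=10):
    """Key selection plus a user-defined field width."""
    return "|".join(f"{totals[key]:>{width}}" for key in keys)

totals = parse_stats(SAMPLE)
print(format_row(totals, ["hits"], width=8))
```

Because the field names come from the file's own header line, the same reader works for any statistics file in that format, which is what removes the need for one copy+paste program per subsystem.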

I also thought about writing some daemon, but abandoned that idea in favour of writing a ulogd2 plugin for it... this means ulogd2 will be able to log per-packet, per-flow and generic things such as statistics...

Linux Bangalore / 2004

The LB/2004 organizers have officially appointed me as speaker recruiter ;). Apparently they have some trouble contacting various Linux developers due to overreactive spam filters (blocking everything from India, heh?).

This means I end up writing emails trying to convince folks such as Alan Cox, Andrea Arcangeli, Russell King, Erik Andersen, Robert Love, ... to attend this wonderful Indian conference.

Did I mention that I'm going to be there this year, too? ;)

First Solaris-based contract in four years

For more than four years, I did 100% Linux-based work. But apparently there are still people interested in Solaris stuff, since I just got my first Solaris-based contract in quite some time.

I spent an incredible amount of time getting Solaris 9 installed on my Ultra 5, which was only running Linux before. I never understood how Sun could justify Solaris being so much slower than Linux on their own hardware ;)

pkttables finally making some progress

I've found some time to work on pkttables again. Isn't that great news? If my brain is not completely broken, I've now worked out an RCU-powered way to have full table traversal with a completely lock-less reader path, while providing atomicity at either table or chain level.

Also, I ripped the "struct nf_attr" and NFA_xx macros from the nfnetlink core, since they get replaced by my vTLV (Versioned TLV) code.

With some luck I'll be able to continue my pkttables work next week.

CLUSTERIP is in patch-o-matic-ng

About one year ago I did some work for SuSE, implementing load-balancer-less load-balancing clusters ;) This is achieved by replying to ARP requests with a link-layer multicast address, so all nodes receive all packets. Hashing parts of the IP header then determines whether the packet is to be passed up the stack on a given node.
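The hashing step can be illustrated with a small Python sketch. It is not the CLUSTERIP code: CRC32 is only a stand-in hash, hashing just the source address is an assumption, and the function names are made up. What it shows is the principle that every node sees every packet, computes the same hash, and only the node whose number matches passes the packet up.

```python
import ipaddress
import zlib

def responsible_node(src_ip, total_nodes):
    """Map a source address to a node number in 1..total_nodes.

    CRC32 is a stand-in here; the real hash is not shown in this post.
    """
    key = ipaddress.ip_address(src_ip).packed
    return zlib.crc32(key) % total_nodes + 1

def accept_packet(src_ip, local_node, total_nodes):
    """Each node runs this on every packet; exactly one node returns True."""
    return responsible_node(src_ip, total_nodes) == local_node

for ip in ("192.0.2.1", "192.0.2.2", "198.51.100.7"):
    print(ip, "-> node", responsible_node(ip, total_nodes=3))
```

Since the decision is a pure function of packet contents, the nodes need no coordination per packet and no dedicated load balancer in front of them.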

The result is called the iptables CLUSTERIP target, and I've now finally put it in patch-o-matic-ng, since it was only available in my undocumented public CVS tree so far.

Reworking the Linux neighbour cache

Since I've lately had some customer issues with regard to neighbour cache overflows, I studied the current code quite a bit. From my point of view, it has a couple of shortcomings.

The general problem goes like this: What do we do if we're attached to, let's say, a /16 (formerly 'Class B') network that has a theoretical limit of 65535 neighbours at layer 2, and somebody sends us a single packet for every one of those neighbours? We now start to send ARP requests for all those neighbours, and the incomplete entries stay around until those requests time out (1 sec default), thus flooding our neighbour table. The current Linux strategy is to configure a static limit (default: 1024), and as soon as we reach the limit, we start deleting old entries. 'Old' entries are those for real hosts with which we've recently had connectivity... We do not expire any of the incomplete neighbour entries, in order to avoid ARP floods.

So if you want to avoid that, you always have to set the gc_thresh3 value to at least the theoretical number of total machines that could be directly reachable at layer 2. While this is not a problem with /16, it suddenly becomes one with /8, or with the extremely large IPv6 prefixes.

The problem is made worse by the fact that the number of hash buckets is very low (a static number of 32), and the hash algorithm used apparently has a bad distribution. So either we enlarge the hash table by increasing the number of buckets and improve the hash algorithm, or we change the expiration scheme to also drop incomplete entries. But the current situation is definitely not good.
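A quick back-of-the-envelope calculation in Python shows the scale of the problem. The helper names are made up, the address counts are raw prefix sizes (including network and broadcast addresses), and the chain lengths assume a perfectly uniform hash, which by the observation above is optimistic.

```python
def addresses_in_prefix(prefix_len, addr_bits=32):
    """Raw number of addresses covered by a prefix of the given length."""
    return 2 ** (addr_bits - prefix_len)

def avg_chain_length(entries, buckets=32):
    """Average hash chain length, assuming a perfectly uniform hash."""
    return entries / buckets

# Default limit of 1024 entries spread over 32 buckets
print(avg_chain_length(1024))

# Table sized for a full /16 or /8, still with 32 buckets
print(avg_chain_length(addresses_in_prefix(16)))
print(avg_chain_length(addresses_in_prefix(8)))

# An IPv6 /64 makes sizing the limit by prefix hopeless
print(addresses_in_prefix(64, addr_bits=128))
```

Even at the default limit, every lookup walks a chain of about 32 entries on average; raising the limit to cover a /16 without adding buckets pushes that to roughly 2048.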

So I picked up some old 2.4.x patches from Tim Gardner, ported them to 2.6.x and brushed them up. The number of hash buckets is now a kernel boot parameter (if not specified, the hash is dynamically sized, like the TCP syn-queue, fragment queue or ip_conntrack hash). The hashing algorithm is now a Jenkins hash, just like other parts of the kernel use, too. The patch is in testing on my machines at the moment, but I think I'll push it soon.

Siemens is violating the Settlement

Siemens is offering the SE-505 firmware on their homepage without any reference to the source code, the GPL, or the GPL text. This is in violation of the signed settlement agreement that I have concluded with them.

The lawyer is already informed, and we'll see what kind of legal options we now have in pushing Siemens [again *sigh*] for GPL compliance.

NAPIfied natsemi driver

I've now successfully NAPIfied the second NIC driver: natsemi.c... this was the only remaining driver that I care about, since it is used in the PC Engines WRAP embedded systems that I use as routers/bridges/wlan-gateways.

The result is that I can now get about 34 kpps routed on an embedded 266 MHz Geode CPU under a full 148 kpps 64-byte single-flow UDP flood on the input NIC.
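For readers wondering where the 148 kpps figure comes from: it matches the theoretical maximum rate of minimum-size frames on 100 Mbit/s Ethernet. The following Python sketch does the arithmetic, assuming a 100 Mbit/s link and the usual 20 bytes of per-frame overhead on the wire (8 bytes preamble plus 12 bytes inter-frame gap); the function name is made up.

```python
def max_frames_per_second(link_bps, frame_bytes, overhead_bytes=20):
    """Theoretical frame rate: each frame occupies frame + overhead bytes."""
    return link_bps / ((frame_bytes + overhead_bytes) * 8)

line_rate = max_frames_per_second(100_000_000, 64)
print(f"{line_rate:.0f} frames/s at line rate")
print(f"{34_000 / line_rate:.0%} of the flood gets routed")
```

So under a worst-case flood the box still forwards a bit more than a fifth of the incoming packets.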

Adding NAPI support to the sungem.c Ethernet driver

Yesterday I implemented NAPI support for the sungem.c driver. This was done because I was annoyed by the fact that my notebook (Apple Powerbook with on-board Gigabit Ethernet) could still be killed by a machine running pktgen and flooding it with some 700 kpps.

After I submitted the patch, David Miller pointed out that he had added NAPI support for sungem.c to the BitKeeper tree about four days earlier :( So I spent a number of hours duplicating work that was already there... not that I didn't have other stuff to do.

Well, at least I learned a bit more about Linux NIC drivers..

I'm now facing the task of implementing NAPI for the natsemi.c driver, which is used in the PC Engines boards that I've been using recently as embedded Routers / Firewalls.