Since I've lately had some customer issues with regard to neighbour cache
overflows, I studied the current code quite a bit. From my point of view, it has a couple of shortcomings.
The general problem goes like this: What do we do, if we're attached to let's
say a /16 (formerly 'Class B') network that has a theoretical limit of 65535
neighbours at layer 2, and somebody sends us a single packet for every one of
those neighbours. We now start to send ARP requests for all those neighbours,
and until those time out (1sec default), thus flooding our neigbour table.
The current Linux strategy is to configure a static limit (default: 1024), and as soon as we reach the limit, we start deleting old entries. 'old' entries are those for real hosts to which we've recently had connectivity... We do not expire any of the incomplete neighbour entries in order to avoid ARP-floods.
So if you want to avoid that, you always have to set the gc_thresh3 value to at
least the theoretical number of total machines that could be directly reachable
at layer 2. While this is not a problem with /16, it suddenly becomes one with
/8, or with the extremely large IPv6 prefixes.
The problem is further increased, since the number of hash buckets is very low
(static number of 32), and the used hash algorithm apparently has a bad
distribution. So either we increase the hash table, increase the number of
buckets and improve the hash algorithm, or we change the expiration scheme to
also drop incomplete entries. But the current situation is definitely not good.
So I picked up some old 2.4.x patches from Tim Gardner, ported them to 2.6.x
and brushed them up. The number of hash buckets is now a kernel boot
parameter (if not specified, the hash is dynamically sized, like the TCP
syn-queue, fragment queue or ip_conntrack hash). The hashing algorithm now
uses a Jenkins hash, just like all other parts of the kernel use, too. The
patch is in testing at my machines at the moment, but I think I'll push it
soon.