userspace conntrack helper code compiles (yet untested)
Finally, both the kernel side (nfnetlink_helper) and the userspace side (libnetfilter_cthelper) code for userspace conntrack helper support is basically finished and compiles. I didn't yet dare to test it, and I'm rather heading off to bed now. Testing will be done tomorrow.
So how is this supposed to work? Well, basically a new nfnetlink subsystem exists, which can (on behalf of an userspace process) create dummy "nf_conntrack_helper" structures inside the kernel. Such a dummy structure has the usual properties (tuple, mask, timeout, etc.) but a dummy expectfn() which only calls NF_QUEUE() to send the packet to userspace. Userspace can then look at the packet, possibly modify it and re-inject it back into the kernel. Since helpers are now processed at a different netfilter hookfn() than the rest of the conntrack code, this actually works.
Now during the reception of such a packet in userspace, the process is likely going to want to create a new expectations. Expectations can already be created by means of libnetfilter_conntrack/nf_conntrack_netlink. However, in order to create the expectation, a number of things are needed. Mainly the tuple(s) of the master conntrack, but also other ancillary data such as ctinfo are sometimes desired. As long as we don't do NAT, the process could derive the tuple from the packet's IP[v6] header, and query nf_conntrack_netlink for the remaining details. However, this is inefficient since we'd add another kernel/userspace round-trip and the associated latency. So instead, I chose to extend nfnetlink_queue a bit, and allow it to have a new queue_mode (NFQ_MODE_PACKET_CT) in which there is a new nested attribute (NFQA_CT) which in turn contains the tuple, id and ctinfo.
Userspace now has all informations to create a new expectation. But wait, what do we do about expectfn()? We use the same magic as with helpfn(): Userspace tells the kernel to which nfnetlink_queue queue_id packets hitting the expectfn() should be sent. The 'minor' difficulty here is that expectfn() is called from the middle of the conntrack code (init_conntrack() actually), and when we get back from the queue (set_verdict or re-inject), then the netfilter hook code would continue at the next hookfn, skipping most of the conntrack code. But we can also return NF_REPEAT in order to call conntrack again. Since our expectation is already confirmed, expectfn() will not be called and it _SHOULD_ somehow just magically work, maybe with some tiny ugly hack here or there.
The NFQA_CT way is still far from being optimal, since we copy the same conntrack tuple for every packet of the control connection to userspace, no matter that this information never changes, and no matter that we actually only need it in those few cases where we want to raise an expectation. So the mid-term plan is to make userspace keep a small copy of selected conntrack state entries. This can be done by sending NEW and DELETE events for all conntracks that have a helper assigned. We could create a new multicast group specifically for this purpose, in order to keep the overhead and memory usage low. Userspace keeps a hash table indexed by ct->id. Packets sent via nfnetlink_queue will therefore only need a single 32bit ID attribute and not the full tuple(s).
Apart from userspace helper code, I've been working on getting some x_tables / nf_conntrack refcounting / dependency issues sorted out. Again another issue where having a couple of dozens of inter-dependant netfilter modules seems to become a major PITA. Sometimes I want to have back the simplicity of a truly monolithic kernel.