Performance of system logging

One of my customers recently had a serious performance issue with one of his installations. Surprisingly, it wasn't even the real applications software itself that had performance issues, but the mechanism used for logging from this application.

So I started to think about the way logging usually works within a Linux-based system.

The server applications can be divided within two groups. One of them logs via syslog(), the other logs directly to it's own files. The logging itself happens synchronously, i.e. blocking the normal code flow until the log line was written. In the case of syslog, it might block because the syslog pipe is full - in case of stand-alone files, the file/io might take some time to complete.

Even in a multi-threaded or forked model of a network server program, this might pose considerable problems with regard to threads waiting for their log i/o to complete.

Syslog itself might not be as bad, especially since the 2.6.x pipe implementation works with only the minimal necessary amount of copying, and supports larger pipe sizes to avoid writer blocking.

Some people however tend to use something like syslogger in order to redirect the log output from programs with no syslog support also into syslog. This means that you have one pipe between your application and syslogger, and another pipe between syslogger and your real syslog daemon.

Comparing this issue with networking is actually not too problematic. In networking, we have packets that are passed from one process to another... with logging it's not a packet but usually one or more lines of text (that is, about 60 to 240 characters per entry).

You don't want to copy this data around and around... and in a lot of installations you'd rather want to use a couple of log lines than to slow down your application just for some statistics that you might collect.

Of course, you don't want to modify any of the existing applications, too - they should just be able to use syslog() calls as usual. OF course you could load a LD_LIBRARY_PRELOAD lib and redirect the syslog() calls, if needed.

So what I came up with, is something like a partially mmap()able pipe. The logging process would log to that pipe like it would with any other file descriptor. Internally, that 'pipe' has a ring buffer of configurable size. The pipe-reader could now mmap() this ring buffer into his address space in order to read the log.

This scheme should have the advantage of not blocking the writer if the pipe is full (it would just wrap around the ring buffer), and it avoids copying the data from some in-kernel pipe buffer into the user-space of the pipe reader.

Did you notice, this now looks perfectly like the DMA ring buffer of your Ethernet device and the Linux softirq handler ;)

Anyway, as I didn't do any vm / vfs hacking in Linux so far, this is not a trivial thing to implement. And I have lots of other work at this point. However, I'd certainly like to investigate the possible performance gains [losses?] of this idea. Comments welcome.