Logging specific fields with Caddy access logs

tl;dr: I wrote a Caddy log filtering program

It's available on my GitHub and is licensed under the MIT License.

Why?

Caddy has a very robust, verbose logging system which outputs structured (JSON) logs by default. However, it includes way too much information for many use cases. Due to limitations in the Zap logging framework that Caddy uses, it's not possible to log only the fields you're interested in - you've got to exclude the fields you don't care about. This makes it too easy to accidentally log sensitive information.

I wanted to record usage logs in a secure and privacy preserving manner, to see which pages and hosts on my static site are popular, and look out for abuse on some of my reverse proxied services. That would enable me to run the logs through some analysis scripts to look for top pages, hits per day, etc.

For example, I use Caddy as a reverse proxy to a DNS over HTTPS server. I'm only interested in logging:

  • the client IP (to watch for abuse),
  • the hostname they connect to (the server is accessible via multiple hostnames),
  • a timestamp,
  • the user agent (to filter out friendly bots and see desktop/mobile split),
  • how long it takes to service the query,
  • and the size of the response.

However, by default, Caddy will log the query string, which includes the DNS query submitted by the client. This would undermine the decision to turn off logging on my DoH server.

And Caddy's access logs are big by default. They log all of the headers (minus a few common ones that Caddy deems to be sensitive). So my choice would be to have aggressive log rolling (built into Caddy), which would risk losing data before I can prune/transform it, or have very lax log rolling, which would risk filling the disk on my server.

Common approaches to the problem

The transform-encoder plugin lets you specify a formatting string and output only the fields you're interested in (with a helper for Common Log Format). However, you lose the benefit of structured logging which makes ingesting the logs into any analysis tools more complex. And it also complicated the deployment process as you need to compile a custom Caddy build in order to include plugins.

Another alternative is to log the data in the verbose logging format, then strip the fields we're not interested in during the ingestion process. However this means logging potentially sensitive data to disk (which I'd rather avoid).

Caddy Log Filter

After considering the alternatives above, I decided the best way to achieve the balance of flexibility, performance and security that I needed was to write a small network log listener that logged only the fields I was interested in.

The program is available on my GitHub and is licensed under the MIT License.

Caddy Log Filter lets you log only the fields you're interested in, and will automatically strip the query string from the URI by default before persisting anything to disk.

The program is pretty tiny, and tailored very much for my use case. But it's easy to modify (by virtue of its simplicity) to log other fields or perform other transformations before logging to disk.

I decided to write it in JavaScript with Deno for a couple of reasons:

  1. It was a good excuse to sharpen my JavaScript skills and learn more about the Deno standard library.
  2. Deno has some good security defaults out of the box (sensible permissions system).
  3. I wanted to use a dynamic language so that I could avoid rigidly defining the JSON structure to facilitate deserialisation or fight with casting map[string]any (also a rigid approach).

Deno experience

Writing Caddy Log Filter in JavaScript with Deno was pretty straightforward. I appreciated the permission system, it was easy to deploy with Docker and it was very performant.

There was a bit of a gap around file stream documentation, which was a challenge - some of the Deno examples I found online referred to deprecated file writer interfaces, and the Deno docs (in places) seemed to amount to "Use the standardised Web Streams API", but the API docs had a procedurally-generated JavaDoc feel (listing interfaces and methods but very little commentary or guidance on error handling). Although, I'm sure part of the issue was that I'm pretty inexperienced in server-side JavaScript.

That said, I'd definitely consider Deno for my next project.

Performance

There's minimal impact on Caddy server performance when using Caddy Log Filter, based on simple benchmarking of a static site using go-wrk.

Running the benchmark for 10 seconds yielded the following results on my M1 MacBook Pro listening locally:

Without Logging With Logging
Requests 346,533 302,866
Requests/sec 34,867.11 30,451.88
Transfer/sec 17.12MB 14.96MB
Overall Requests/sec 34,646.45 30,279.20
Overall Transfer/sec 17.02MB 14.87MB
Fastest Request 100µs 109µs
Avg Req Time 286µs 327µs
Slowest Request 31.12ms 9.288ms
Number of Errors 0 0

In my case, an average response time of 327µs (that's 0.327ms, or 0.000327) when logging is acceptable. This benchmark was carried out against a simple (and very small) static site. In all likelihood, a more complex site would see the bottleneck occur in serving the page (e.g. from a reverse proxy) and not in logging the request.

Conclusion

I wish Caddy had this functionality built-in (even with a caveat that it might be less performant), and I'm not a huge fan of the Caddy module system, where rebuilding is required to add modules. I'm not sure if the alternative (Mattermost-style separate binaries for modules that communicate with the main server process) is any better, but this is really a limitation of Go and it's statically compiled nature.

Hopefully other people might find this helpful (it seems to be a common request and I've found very little existing work using Caddy's log-to-socket functionality).

It solved my problem for now, so was a worthwhile weekend project.