GELF - A better way of logging

Logging from your application goes roughly like this:

  1. You have some data that you want to log.
  2. You shove that data into a string along with the time, severity and other information.
  3. That string gets written to a log file.
  4. If you have centralised logging (as you should), your log shipper (like Logstash) reads the file and forwards it to the log server.
  5. On the log server, you parse the log lines to extract the separate fields for indexing (for searching purposes).
  6. Profit! (usually not)

So... we start with data structures at the beginning and end up parsing the text back into data structures at the end. It seems kind of silly to create the log file in the middle, doesn't it? And don't get me started on the syslog protocol...

Log files are usually meant for human consumption with grep and other command line tools. They are not well suited to machine processing. There are better formats and protocols for that.
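
To make the round trip concrete, here is a minimal Python sketch of steps 2 and 5. The event, the line layout and the names `format_line` and `parse_line` are all made up for illustration; real applications and log servers use their own formats and parsers.

```python
import re
from datetime import datetime, timezone

# Hypothetical event the application wants to log (step 1).
event = {"user": "alice", "action": "login", "duration_ms": 42}


def format_line(event, severity="INFO", now=None):
    """Step 2: flatten the structured event into one line of text."""
    now = now or datetime.now(timezone.utc)
    fields = " ".join(f"{key}={value}" for key, value in event.items())
    return f"{now.strftime('%Y-%m-%dT%H:%M:%SZ')} {severity} {fields}"


# Assumed line layout: "<timestamp> <severity> key=value key=value ..."
LINE_RE = re.compile(r"^(?P<timestamp>\S+) (?P<severity>\S+) (?P<rest>.*)$")


def parse_line(line):
    """Step 5: recover the separate fields from the text on the log server."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"unparseable log line: {line!r}")
    parsed = {"timestamp": match["timestamp"], "severity": match["severity"]}
    for pair in match["rest"].split():
        key, _, value = pair.partition("=")
        parsed[key] = value  # every value comes back as a string
    return parsed


if __name__ == "__main__":
    line = format_line(event)
    print(line)
    print(parse_line(line))
```

Note that the parsed result is not quite what we started with: `duration_ms` went in as a number and comes back as a string, and any value containing a space would break this naive parser.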

Enter GELF

Graylog has introduced a wonderful new logging format called GELF (Graylog Extended Log Format). It is a simple JSON-based format with a few mandatory fields, plus a protocol for shipping the logs around. The end result is that you start with a data structure, log it as it is, and it ends up indexed exactly like that. Docker, for example, has built-in support for GELF: you get container names, IDs, image names etc. as separate fields.
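
As a rough sketch of what that looks like on the wire, the Python below builds a GELF 1.1 message and sends it as a single zlib-compressed UDP datagram, using only the standard library. The function names, the host name `web01` and the server address are made up for illustration; 12201 is the port commonly used for GELF inputs, but check your own setup. Chunking of large messages is left out.

```python
import json
import socket
import time
import zlib


def build_gelf(short_message, host, level=6, **fields):
    """Build a GELF 1.1 message as a plain dict.

    version, host and short_message are the mandatory fields.
    level uses syslog severities (6 = informational).
    """
    message = {
        "version": "1.1",
        "host": host,
        "short_message": short_message,
        "timestamp": time.time(),  # seconds since the epoch
        "level": level,
    }
    for key, value in fields.items():
        if key == "id":
            # The GELF spec says not to send "_id" as an additional field.
            raise ValueError("'id' is not allowed as an additional field")
        message["_" + key] = value  # additional fields get a "_" prefix
    return message


def encode_gelf(message):
    """Serialise the message to JSON and compress it with zlib."""
    return zlib.compress(json.dumps(message).encode("utf-8"))


def send_gelf_udp(message, server="127.0.0.1", port=12201):
    """Send one message as a single UDP datagram (no chunking).

    The default server address and port are assumptions for this sketch.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(encode_gelf(message), (server, port))


if __name__ == "__main__":
    msg = build_gelf("User logged in", host="web01",
                     user="alice", action="login", duration_ms=42)
    send_gelf_udp(msg)
```

Here `duration_ms` stays a number all the way to the index, and there is no parsing step on the server side.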

We've also tried the GELF logging modules for Python and Apache HTTP Server. A module for Log4j2 is also available, although we haven't tried it yet.