Handling Multiline Stack Traces with Logstash

October 9, 2018

Here at Sematext we use Java a lot, and we rely on Elasticsearch for our hosted ELK logging SaaS. We like them so much that we regularly share our logging experience with everyone and help others with logging, especially with the Elastic Stack.

Centralized log management plays nice with Java (and anything else that can write pretty logs). However, there is one tricky thing that can be hard to get right: properly capturing exception stack traces.


Unlike your typical single-line log events, stack traces have multiple lines and they aren’t always perfectly uniform. As such, most log shippers don’t handle them properly out of the box and typically treat each stack trace line as a separate event – clearly the wrong thing to do (n.b., if you are sending logs to Sematext this way, you are going to hit your plan limits faster, so make sure you send stack traces properly!). Thus, in most cases, a special configuration is needed in order to get stack traces right.

Configuring Logstash for Java Multiline Events

Let’s see how to configure Logstash 2.x or later for this task.

Prerequisites

We start with the following assumptions:

  • you have a working Logstash setup
  • local logs are written to a file named: /var/log/test.log
  • the conversion pattern for log4j/logback/log4j2 is: “%d %p %m%n”
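
For reference, here’s a minimal sketch of an appender configuration that produces this pattern. It uses log4j 1.x properties syntax and is our illustration, not part of the original setup:

# log4j 1.x properties: write to /var/log/test.log using the pattern above
# %d = date, %p = priority (log level), %m = message, %n = newline
log4j.rootLogger=INFO, FILE
log4j.appender.FILE=org.apache.log4j.FileAppender
log4j.appender.FILE.File=/var/log/test.log
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%d %p %m%n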

Based on the above pattern, logs should look like this, for example:

2015-03-23 11:11:50,778 INFO Initializing Systems Cache
2015-03-23 11:11:58,666 ERROR Processing request failed
kafka.common.FailedToSendMessageException: Failed to send messages after 3 tries.
    at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:90)
    at kafka.producer.Producer.send(Producer.scala:77)
    at kafka.javaapi.producer.Producer.send(Producer.scala:42)
    ...

Extracting Exception Stack Traces Correctly with Codecs

Stack traces are multiline messages or events.

Logstash has the ability to parse a log file and merge multiple log lines into a single event. You can do this using either the multiline codec or the multiline filter, depending on the desired effect. The difference is that a codec is attached to a single input, while a filter can process events coming from multiple inputs.

Logstash Multiline Codec Example

NOTE: Logstash used to have a multiline filter as well, but it was removed in version 5.0. The problem was that it wasn’t thread-safe and wasn’t able to handle data from multiple inputs (it wouldn’t know which line belongs to which event).

The most popular example matches log lines that are part of an exception using a regexp. Since the multiline codec doesn’t allow for multiple patterns, that regexp tends to be complicated:

input {
  file {
    path => "/var/log/test.log"
    start_position => "beginning"
    codec => multiline {
      pattern => "(^\d+\serror)|(^.+Exception: .+)|(^\s+at .+)|(^\s+... \d+ more)|(^\s*Caused by:.+)"
      what => "previous"
    }
  }
}

If the log line is part of an exception, it is appended to the last received log line. Once a new log line is received, the complete exception is sent to be processed. The downside of this approach is that it might not match all exceptions correctly.

Sure, you could keep fine-tuning and expanding the regexp and attempt to capture all possible variants of stack trace format, but… there is a better way.

input {
  file {
    path => "/var/log/test.log"
    start_position => "beginning"
    codec => multiline {
      pattern => "^%{TIMESTAMP_ISO8601}"
      negate => true
      what => "previous"
    }
  }
}

The pattern matches log lines that start with a timestamp and, because the negate parameter is set to true, every line that does not start with a timestamp is appended to the previous event. A new timestamped line then starts a new event.
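
To make the effect concrete: with the sample log from the Prerequisites section, this codec emits two events. The INFO line becomes one event, and the ERROR line plus all of the stack trace lines are merged into a second event, tagged multiline. Abbreviated, and with the exact field layout depending on your Logstash version, the merged event looks roughly like this:

"message" => "2015-03-23 11:11:58,666 ERROR Processing request failed\nkafka.common.FailedToSendMessageException: Failed to send messages after 3 tries.\n    at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:90)\n...",
"tags"    => [ "multiline" ]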

Structuring Multiline Logs with Grok

To structure the information before storing the event, a filter section should be used for parsing the logs.

filter {
  mutate {
    gsub => [ "message", "\r", "" ]
  }
  grok {
    match => [ "message", "(?m)%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:severity} %{GREEDYDATA:message}" ]
    overwrite => [ "message" ]
  }
  date {
    match => [ "timestamp" , "yyyy-MM-dd HH:mm:ss,SSS" ]
  }
}

output {
  stdout { codec => rubydebug }
}

The output section, in this case, is only used for debugging.  In the next section, we’ll show how to actually ship your logs.

Using the Mutate Filter

First, the mutate filter removes any “\r” characters from the event. Usually, this is something you want to do, to prevent later issues when storing and visualizing the logs, where “\r” could be interpreted as an “\n”.

Using the Grok Filter on Multiline Events

The grok filter splits the event content into three parts: timestamp, severity, and message (which overwrites the original message field).

The (?m) at the beginning of the regexp enables multiline matching; without it, only the first line would be matched. The patterns used in the regexp ship with Logstash and should be used whenever possible to simplify regexps.

By default, the timestamp of the log line is considered the moment when the log line is read from the file. The date filter fixes the timestamp, by changing it to the one matched earlier with the grok filter.
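
Putting it all together, the ERROR event from our sample log comes out of the pipeline looking roughly like this (abbreviated rubydebug output; the @timestamp value assumes the log’s timezone is UTC):

{
     "timestamp" => "2015-03-23 11:11:58,666",
      "severity" => "ERROR",
       "message" => "Processing request failed\nkafka.common.FailedToSendMessageException: Failed to send messages after 3 tries.\n...",
    "@timestamp" => 2015-03-23T11:11:58.666Z
}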

Shipping the Logs

Finally, now that you’ve got all your logs extracted, even those with multiple lines like exception stack traces, it’s time to ship them to wherever you keep your logs. Here’s how you’d ship them to Logsene (now Sematext Logs):

output {
  elasticsearch {
    hosts => ["logsene-receiver.sematext.com:443"]
    ssl => true
    # the Logsene app token doubles as the index name
    index => "YOUR LOGSENE APP TOKEN GOES HERE"
    manage_template => false
  }
}
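
Assuming you combine the input, filter, and output sections above into a single config file (the file name below is just an example), you can run the whole pipeline with:

bin/logstash -f logstash-multiline.conf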

Final Words

If you are looking for a way to ship logs containing stack traces or other complicated multiline events, Logstash is the simplest way to do it at the moment.
It is written in JRuby, which makes it possible for many people to contribute to the project. The downside of this ease of use and maintainability is that it is not the fastest tool for the job, and it is also quite resource-hungry (both CPU and memory). Still, depending on the log volume that needs to be shipped, this might not be a problem.

For those looking for tips on how to ship multiline logs with rsyslog or syslog-ng, subscribe to this blog or follow @sematext – tips for handling stack traces with rsyslog and syslog-ng are coming.
If your organization needs help with logging (the ELK stack, but also rsyslog, Fluentd, Flume…), let us know. If you’d like to build tools and products around logs and logging, join us – we’re always looking for good toolmakers worldwide.
