Configuring Sematext Docker Agent

9.1 Connection to SPM and Logsene

SPM and Logsene are available in the Cloud (SaaS) or On Premises. Depending on the setup, Sematext Docker Agent needs to be configured to ship data to the appropriate SPM and Logsene data receiver endpoints.

9.1.1 SPM and Logsene in the Cloud (SaaS)

The default configuration of Sematext Docker Agent is to connect via TLS (HTTPS) to the SaaS provided by Sematext using the following API endpoints:

If you are using SPM or Logsene SaaS there is no configuration required to use the above default settings.

To reach the Receiver services mentioned above through a firewall, set the environment variable HTTPS_PROXY to the URL of your proxy server.

9.1.2 SPM and Logsene On Premises

If SPM and Logsene (not just the agents, but the whole SPM and Logsene solution) are deployed in the local network, the servers will have local IP addresses or DNS names. Sematext Docker Agent lets you change the Receiver addresses for SPM and Logsene using environment variables:

  • SPM_RECEIVER_URL – URL to your SPM Receiver
  • EVENTS_RECEIVER_URL – URL to your Events Receiver
  • LOGSENE_RECEIVER_URL – URL to your Logsene Receiver

Detailed installation instructions are included in the SPM and Logsene On Premises package – email sales@sematext.com or call +1 (347) 480 1610 for a free evaluation copy.


9.2 Log Handling Options

9.2.1 Blacklisting and Whitelisting Logs

Not all logs might be of interest, so sooner or later you will have the need to blacklist some log types.  This is one of the reasons why Sematext Docker Agent automatically adds the following tags to all logs:

  • Container ID
  • Container Name
  • Image Name
  • Docker Compose Project Name
  • Docker Compose Service Name
  • Docker Compose Container Number 

Using this “log metadata” you can whitelist or blacklist log outputs by image or container names. The relevant environment variables are:

  • MATCH_BY_NAME — a regular expression to whitelist container names
  • MATCH_BY_IMAGE — a regular expression to whitelist image names
  • SKIP_BY_NAME — a regular expression to blacklist container names
  • SKIP_BY_IMAGE — a regular expression to blacklist image names
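
The effect of the four variables can be sketched in a few lines of JavaScript. This is only an illustration: the helper name shouldShip, the container objects, and the order in which the patterns are combined are assumptions, not the agent's actual code.

```javascript
// Hypothetical helper: decides whether logs from a container are shipped.
// Assumption: blacklist patterns win, and an unset variable means "no restriction".
function shouldShip(container, env) {
  const matches = (pattern, value) => new RegExp(pattern).test(value);
  if (env.SKIP_BY_NAME && matches(env.SKIP_BY_NAME, container.name)) return false;
  if (env.SKIP_BY_IMAGE && matches(env.SKIP_BY_IMAGE, container.image)) return false;
  if (env.MATCH_BY_NAME && !matches(env.MATCH_BY_NAME, container.name)) return false;
  if (env.MATCH_BY_IMAGE && !matches(env.MATCH_BY_IMAGE, container.image)) return false;
  return true;
}

// Example settings: only nginx/mysql images, but never containers ending in "-test".
const filterEnv = { MATCH_BY_IMAGE: 'nginx|mysql', SKIP_BY_NAME: '.*-test$' };

console.log(shouldShip({ name: 'web-1', image: 'nginx:1.11' }, filterEnv));    // true
console.log(shouldShip({ name: 'web-test', image: 'nginx:1.11' }, filterEnv)); // false
console.log(shouldShip({ name: 'cache-1', image: 'redis:3' }, filterEnv));     // false
```

Because the values are regular expressions, several names or images can be covered by a single variable using alternation, as in 'nginx|mysql' above.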

9.2.2 Automatic Parser for Container Logs

In Docker, logs are console output streams from containers. They might be a mix of plain text messages from start scripts and structured logs from applications.  The problem is obvious: you can’t just take a stream of mixed-up log events and treat it like a blob.  You need to be able to tell which container and which app each log event belongs to, and to parse it correctly into a structure from which you can later derive more insight and operational intelligence.

Sematext Docker Agent analyzes the event format, parses out data, and turns logs into structured JSON.  This is important, because the value of logs increases when you structure them — you can then slice and dice them and gain a lot more insight about how your containers, servers, and applications operate.
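
To make the idea of turning a mixed stream into structured JSON concrete, here is a deliberately simplified sketch. It only distinguishes JSON lines from plain text; the real agent detects many more formats. The function name toStructured and the logSource field are hypothetical.

```javascript
// Simplified sketch: JSON lines are merged into the event, anything else
// is kept as a plain "message". Names are illustrative assumptions.
function toStructured(line, source) {
  const base = { logSource: source };
  try {
    const parsed = JSON.parse(line);
    // Only plain objects count as structured logs (not numbers, arrays, null).
    if (parsed !== null && typeof parsed === 'object' && !Array.isArray(parsed)) {
      return Object.assign(base, parsed);
    }
  } catch (err) {
    // Not valid JSON: fall through and treat the line as plain text.
  }
  return Object.assign(base, { message: line });
}

console.log(toStructured('{"level":"info","msg":"listening"}', 'my-app'));
console.log(toStructured('Starting server ...', 'my-app'));
```

Either way, the result is an object tagged with its source, so events from start scripts and from the application can be told apart and queried later.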

Traditionally it was necessary to use log shippers like Logstash, Fluentd or rsyslog to parse log messages.  The problem is that such setups are typically deployed in a very static fashion and configured for each input source. That does not work well in the hyper-dynamic world of containers! We have seen people struggling with syslog drivers, parser configurations, log routing, and more! With its integrated automatic format detection, Sematext Docker Agent eliminates this struggle, along with the computing resources and human time wasted on such things! This integration has a low footprint, doesn’t need retransmissions of logs to external services, and it detects log types for the most popular applications, as well as generic JSON and line-oriented log formats, out of the box!

Example: Apache Access Log fields generated by Sematext Docker Agent

For example, Sematext Docker Agent can parse logs from official images like:

  • Nginx, Apache, Redis, MongoDB, MySQL
  • Elasticsearch, Solr, Kafka, Zookeeper
  • Hadoop, HBase, Cassandra
  • Any JSON output with special support for Logstash or Bunyan format
  • Plain text messages with or without timestamps in various formats
  • Various Linux and Mac OS X system logs

In addition, you can define your own patterns for any log format you need to be able to parse and structure. There are three options to pass individual log parser patterns:

  • Configuration file in a mounted volume:
    -v PATH_TO_YOUR_FILE:/etc/logagent/patterns.yml
  • Content of the configuration file in an environment variable:
    -e LOGAGENT_PATTERNS="$(cat patterns.yml)"
  • Download pattern definitions via HTTP:
    -e PATTERNS_URL=http://yourserver/patterns.yml

The file format for the patterns.yml file is based on JS-YAML, in short:

  • “-” – indicates an array element
  • !!js/regexp – indicates a JavaScript regular expression
  • !!js/function > – indicates a JavaScript function

The file has the following properties:

  • patterns: list of patterns, each pattern starts with “-”
  • match: group of patterns for a specific log source (image / container)
  • regex: JS regular expression
  • fields: field list of extracted match groups from the regex
  • type: type used in Logsene (Elasticsearch Mapping)
  • dateFormat: format of the special field ‘ts’; if the date format matches, a new field @timestamp is generated
  • transform: JS function to manipulate the result of regex and date parsing
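
How regex, fields and type relate can be sketched in plain JavaScript. The pattern object below is hypothetical (the regular expression and field names are made up for illustration, not copied from the default patterns), and applyPattern is an assumed helper rather than part of logagent-js.

```javascript
// Hypothetical pattern mirroring the "regex", "fields" and "type" properties.
const pattern = {
  type: 'access_log',
  regex: /^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\d+|-)/,
  fields: ['client_ip', 'ts', 'method', 'path', 'status_code', 'size']
};

// Map each capture group of the regex to the field name at the same position.
function applyPattern(p, line) {
  const match = p.regex.exec(line);
  if (!match) return null; // the line does not belong to this pattern
  const result = { type: p.type };
  p.fields.forEach((name, i) => { result[name] = match[i + 1]; });
  return result;
}

const line = '192.0.2.10 - - [10/Oct/2016:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326';
console.log(applyPattern(pattern, line));
```

The order of the names in fields has to follow the order of the capture groups in regex; a line that does not match simply yields no result for this pattern.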

The following example shows pattern definitions for web server logs, which is one of the patterns available by default:

Example from https://sematext.github.io/logagent-js/parser/

This example shows a few very interesting features:

  • Masking sensitive data with the “autohash” property, listing fields to be replaced with a hash code. See section 9.2.4.
  • Automatic Geo-IP lookups, including automatic updates for the Maxmind Geo-IP lite database. See section 9.2.5.
  • Post-processing of parsed logs with JavaScript functions. See section 9.2.6.

The component for detecting and parsing log messages — logagent-js — is open source and contributions for even more log formats are welcome.

9.2.3 Log Routing with Docker Labels

Storing all logs in a single searchable index represented by a Logsene App Token might be very convenient for quick troubleshooting.  However, there are common scenarios when you would want to have different logs indexed in separate Logsene Apps, such as:

  • Limit access to different logs to different teams or team members. In Logsene the access permissions can be granted on a per Logsene App basis. This means you can have very fine control over who has the rights to see which logs.
  • Analytics for logs. All data used for structured analytics like web server logs, sensor data or KPIs are much easier to process when stored in their own Logsene Apps without the “noise” from other applications with different log structures. For example, you probably wouldn’t want to mix logs from your custom app running in a container with Nginx and MySQL logs, so you might create separate Logsene Apps for each of them.

Sematext Docker Agent can route logs from different containers to specific Logsene Apps. It builds the log routing table by reading the Logsene App Token from containers’ Docker Labels. This approach is much more dynamic than maintaining a large configuration file that maps container IDs to Logsene App Tokens.

Example:

To route logs from Nginx to a dedicated Logsene App, attach a Docker Label to the Nginx containers:

docker run --label LOGSENE_TOKEN=YOUR-LOGSENE-TOKEN-HERE nginx

Sematext Docker Agent will recognize the Label during the auto-discovery of any new containers and will use the corresponding Logsene App Token to ship logs to that Logsene App. The end result is that all logs from all containers carrying the same LOGSENE_TOKEN label value (here, the Nginx containers) will get aggregated in the same Logsene App.
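
The routing table described above can be pictured as a simple map from container ID to token. The sketch below is an assumption about the general idea, not the agent's implementation: the container objects, the token values, and the fallback to a default token for unlabeled containers are all hypothetical.

```javascript
// Hypothetical container list; only the LOGSENE_TOKEN label name comes from the text.
const containers = [
  { id: 'a1', image: 'nginx', labels: { LOGSENE_TOKEN: 'NGINX-APP-TOKEN' } },
  { id: 'b2', image: 'mysql', labels: { LOGSENE_TOKEN: 'MYSQL-APP-TOKEN' } },
  { id: 'c3', image: 'my-app', labels: {} }
];

// Build a map of container ID -> Logsene App Token.
// Assumption: containers without the label use a default token.
function buildRoutingTable(list, defaultToken) {
  const routes = new Map();
  for (const c of list) {
    const token = (c.labels && c.labels.LOGSENE_TOKEN) || defaultToken;
    routes.set(c.id, token);
  }
  return routes;
}

const routes = buildRoutingTable(containers, 'DEFAULT-APP-TOKEN');
console.log(routes.get('a1')); // NGINX-APP-TOKEN
console.log(routes.get('c3')); // DEFAULT-APP-TOKEN
```

Since the table is derived from labels at discovery time, nothing has to be edited when containers are restarted and receive new IDs.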

9.2.4 Masking Sensitive Data in Logs

Logs can contain sensitive data such as credit card numbers, social security numbers, birthdays, and so on. Sematext Docker Agent lets you mask such sensitive data before shipping it, thus hiding it from overly curious 3rd parties (network proxies, storage providers, etc.).

Replacing content with hash codes has the advantage that the content is not “readable” for 3rd parties, but knowing the original value lets you calculate the hash code for later search. This makes it possible to search for logs with a specific hashed field content.

Consider a scenario with a client phone number that was stored as a SHA hash code in a masked field in Logsene. During an investigation of a problem related to that phone number you would be able to calculate the SHA hash code of the client phone number and then search for that hash code in your logs in Logsene to find all related logs without exposing the actual phone number to 3rd parties (including Sematext).

To use this, list all log fields that need to be masked in the custom pattern definition.  Sematext Docker Agent will then automatically mask all such fields. For example, we could use these settings in patterns.yml:

# Sensitive data can be replaced with a hash code.
# It applies to fields matching the field names by a regular expression
autohash: !!js/regexp /user|password|email|credit_card_number|payment_info/i
#
# set the option to include original log to 'false'
# when autohash is used.
# The original log line might include sensitive data!
originalLine: false
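
The masking and the later search by hash can be sketched with Node.js' built-in crypto module. The text only says "SHA", so SHA-256 with hex output is an assumption here; check which algorithm and encoding your agent version actually uses before relying on computed hashes. The helper names sha and maskFields are hypothetical.

```javascript
const { createHash } = require('crypto');

// Assumption: SHA-256, hex encoded. The agent may use a different SHA variant.
function sha(value) {
  return createHash('sha256').update(String(value), 'utf8').digest('hex');
}

// Replace the value of every field whose name matches the autohash expression.
function maskFields(parsed, autohash) {
  const masked = {};
  for (const [name, value] of Object.entries(parsed)) {
    masked[name] = autohash.test(name) ? sha(value) : value;
  }
  return masked;
}

const autohash = /user|password|email|credit_card_number|payment_info/i;
const masked = maskFields({ user: 'alice', status: 'paid' }, autohash);

// Someone who knows the original value can compute the term to search for.
const searchTerm = sha('alice');
console.log(masked.status);              // paid
console.log(masked.user === searchTerm); // true
```

The same input always produces the same hash, which is what makes the masked field searchable without storing the original value.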

9.2.5 Automatic Geo-IP Enrichment for Container Logs

Getting logs from Docker Containers collected, shipped and parsed out of the box is already a big time saver, but some application logs need additional enrichment with information from other data sources. A common use case is to enrich web server logs (or really any logs with IP addresses) with geographical information derived from those IP addresses.

Sematext Docker Agent supports Geo-IP enrichment, which is activated by setting the environment variable GEOIP_ENABLED=true.

It uses the Maxmind Geo-IP lite database, which is updated automatically in the running container! There is no need to stop the container, mount new volumes with the Geo-IP database, etc.

Visualization of Geo-IP data in Logsene / Kibana

9.2.6 Post-processing Parsed Logs with JavaScript Functions

Log data can be complex! Simple extraction of text might not be sufficient for advanced analytics: you might need to perform simple calculations based on extracted fields. Or, in another case, you might like to transform the most relevant part of a large log entry into a human-readable message. Sematext Docker Agent lets you apply post-processing to the output of the log parser to further restructure logs before they are shipped and indexed.

In custom pattern definitions (patterns.yml for the logagent-js parser), post-processing hooks can be defined in JavaScript (Node.js runtime). Each pattern definition has an optional “transform” property for such JavaScript functions. In the following example we simply overwrite the “message” field in a web server log with the HTTP method and the path to generate short but readable content in the “message” field:

Example from https://sematext.github.io/logagent-js/parser/
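
As a minimal sketch of such a hook, the function below does what the paragraph describes. It assumes the hook receives the parsed object as its single argument and modifies it in place, and that the parsed object has “method” and “path” fields; both are assumptions to verify against the logagent-js documentation. In patterns.yml the function body would sit under the pattern's “transform” property.

```javascript
// Sketch of a per-pattern hook; the single-argument, modify-in-place
// signature and the field names "method" and "path" are assumptions.
function transform(p) {
  p.message = p.method + ' ' + p.path;
}

// Hypothetical parsed web server log entry.
const parsed = {
  method: 'GET',
  path: '/index.html',
  status_code: '200',
  message: '192.0.2.10 - - "GET /index.html HTTP/1.1" 200 2326'
};

transform(parsed);
console.log(parsed.message); // GET /index.html
```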

If you want to apply a function to all logs, and not just to specific patterns, Sematext Docker Agent supports this as well. To do that, use the JavaScript function called “globalTransform” with two parameters: the name of the log source (image_name/container_name/id) and the parsed object to be modified.

The “globalTransform” function is a top level property in the patterns.yml file and not bound to any specific subsection for patterns.
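
A sketch of what such a function might look like follows. The two parameters are the ones named above; the added fields (“environment”, “tier”) and the rule that derives them are purely hypothetical, and modifying the object in place without a return value is an assumption.

```javascript
// Sketch of a global hook applied to every parsed log, whatever its pattern.
// "source" is the log source name, "parsedObject" the object to modify.
function globalTransform(source, parsedObject) {
  // Hypothetical enrichment: tag every log with a fixed environment name.
  parsedObject.environment = 'production';
  // Hypothetical rule: derive a tier from the source name.
  if (/nginx/.test(source)) {
    parsedObject.tier = 'web';
  }
}

const entry = { message: 'GET /index.html' };
globalTransform('nginx/web-1/a1b2c3', entry);
console.log(entry); // { message: 'GET /index.html', environment: 'production', tier: 'web' }
```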

For more information, visit the Sematext Docker Agent page.