Shipping Kubernetes Nginx Ingress Logs Made Easy

September 11, 2019

Kubernetes is gaining popularity every day. Using an Ingress controller is the preferred method of allowing external access to the services in a cluster. This makes ingress logs incredibly important for tracking the performance of your services, issues, bugs, and even the security of your cluster.

You can learn more about logs in Kubernetes from our Kubernetes logging tutorial or check out how you can monitor Kubernetes logs, metrics, and events from our guide on Kubernetes monitoring.

Depending on the traffic volume, logging each request may end up being expensive. How to solve this? There are several methods of reducing log volume and, in doing so, the cost as well. Here are a few:

  1. Remove unnecessary log enrichment. Logagent is very neat. It enriches each log line by adding extra container info that is very useful for understanding the context. In the case of an Ingress this extra info may not help that much and could be removed.
  2. Log fewer fields. Logging everything in hopes it will someday be of help is an admirable goal. Logging only important things is much harder. Choose wisely which fields you think will be important. For example, the HTTP referrer field is useful when debugging a web application, but may not be important at the Ingress level, where the main task is to route requests.
  3. Log less data. Logging all requests may provide various stats if logs are used as metrics. If you already collect those metrics, you could just skip logging successful requests.

Following these three steps helped us reduce our log volume in Sematext Logs by 75%.

Let’s see how you can ship Nginx Ingress logs using Sematext Logagent without breaking the bank.

Sematext Logagent is an open-source, lightweight log shipper that parses many log formats out of the box. With its rich set of input and output plugins, it doubles as a general ETL tool for time-series data such as logs or IoT sensor data. You can read data from sources like files, databases, Elasticsearch, or IoT devices (via MQTT), process it, and store it in files, databases, Apache Kafka, Elasticsearch, InfluxDB, or Sematext Cloud.

In the following examples, we will collect the Nginx Ingress controller log files and ship them to Elasticsearch. We will remove less important fields to reduce log volume.

Shipping Ingress logs

We start by assuming that the following prerequisites are met: 

The first step is to enable JSON logging, by updating the Ingress config section:

defaultBackend:
  replicaCount: 2

controller:
  kind: DaemonSet
  extraEnvs:
    - name: LOGS_TOKEN
      value: "<YOUR_LOGS_TOKEN>"
  config:
    use-forwarded-headers: "true"
    use-geoip: "false"
    use-geoip2: "false"
    log-format-escape-json: "true"
    log-format-upstream: '{ "@timestamp": "$time_iso8601", "remote_addr":
        "$remote_addr", "x-forward-for": "$proxy_add_x_forwarded_for",
        "request_id": "$req_id", "remote_user": "$remote_user", "bytes_sent":
        "$bytes_sent", "request_time": "$request_time", "status": "$status",
        "vhost": "$host", "request_proto": "$server_protocol", "path": "$uri",
        "request_query": "$args", "request_length": "$request_length",
        "duration": "$request_time", "method": "$request_method", "http_referrer":
        "$http_referer", "http_user_agent": "$http_user_agent" }'
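
Hand-editing a long JSON template invites quoting mistakes. As a minimal sketch (not part of the Ingress controller or Logagent), the value for log-format-upstream could be generated from a field-to-variable mapping in Python, so the template is always valid JSON. The helper name build_log_format and the shortened field list are assumptions made for illustration; the Nginx variables are the ones used in the config above.

```python
import json

# Mapping of JSON field name -> Nginx variable, taken from the config above.
# The selection is shortened for illustration; extend it as needed.
FIELDS = {
    "@timestamp": "$time_iso8601",
    "remote_addr": "$remote_addr",
    "status": "$status",
    "vhost": "$host",
    "path": "$uri",
    "method": "$request_method",
}


def build_log_format(fields):
    """Return a single-line JSON template usable as log-format-upstream.

    Hypothetical helper: json.dumps handles quoting and commas, so the
    template always parses as JSON.
    """
    return json.dumps(fields)


log_format = build_log_format(FIELDS)
print(log_format)

# Sanity check: the template itself must be parseable JSON.
assert json.loads(log_format) == FIELDS
```

The printed line can be pasted into the config as a single-quoted YAML string, as in the example above.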

Logagent can now easily parse and ship logs:

helm install --name agent stable/sematext-agent -f agent.yaml

agent.yaml:
region: US
logsToken: "<YOUR_LOGS_TOKEN>"
logagent:
  config:
    MATCH_BY_NAME: .*_(default|ingress)_.*

By using MATCH_BY_NAME we limit log collection to the default and ingress namespaces.
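
To see what the pattern actually selects, it can be tried against a few container names. The sketch below uses Python's re module, which may differ in small ways from the regex engine Logagent uses, and assumes the pattern is applied to container names of the form k8s_<container>_<pod>_<namespace>_<uid>_<attempt>, as seen in the sample log further down. Apart from the first entry, the names are made up.

```python
import re

# Pattern from the MATCH_BY_NAME setting above.
MATCH_BY_NAME = re.compile(r".*_(default|ingress)_.*")

names = [
    # Container name from the sample log in this article.
    "k8s_nginx-ingress-controller_ingress-nginx-ingress-controller-5rss4_ingress_09ac9e9d-b1d2-11e9-b8ba-027092ac4e5e_0",
    # Hypothetical container in the "default" namespace.
    "k8s_echo_echo-7d9f_default_1111_0",
    # Hypothetical container in "kube-system": not matched.
    "k8s_coredns_coredns-5c98_kube-system_2222_0",
    # Hypothetical pod literally named "ingress" in the "prod" namespace:
    # still matched, because "_ingress_" appears somewhere in the name.
    "k8s_app_ingress_prod_3333_0",
]

matched = [name for name in names if MATCH_BY_NAME.match(name)]

for name in names:
    print("collect" if name in matched else "skip   ", name)
```

The last entry shows that the pattern matches on the whole name rather than strictly on the namespace position, so a tighter pattern may be worth considering if pod or container names can collide with namespace names.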

A sample log event looks like this and is about 2.5k (2570 chars) in size:

{
  "@timestamp": "2019-07-29T07:27:32.030Z",
  "severity": "info",
  "os": {
    "host": "ip-10-4-62-243.eu-central-1.compute.internal"
  },
  "timestamp": "2019-07-29T07:27:32+00:00",
  "remote_addr": "188.26.243.229",
  "x-forward-for": "188.26.243.229",
  "request_id": "b7ba683189225e96d7af6b8e42554720",
  "remote_user": "",
  "bytes_sent": 974,
  "request_time": 0.001,
  "status": 200,
  "vhost": "k8s-echo.test.elb.eu-west-1.amazonaws.com",
  "request_proto": "HTTP/1.1",
  "path": "/echo",
  "request_query": "",
  "request_length": 604,
  "duration": 0.001,
  "method": "GET",
  "http_referrer": "",
  "http_user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.1 Safari/605.1.15",
  "logSource": "sha256:02149b6f439fe60efad026f3846452f3cd0861f3b22f9e9b45eb0a5abba314ac_k8s_nginx-ingress-controller_ingress-nginx-ingress-controller-5rss4_ingress_09ac9e9d-b1d2-11e9-b8ba-027092ac4e5e_0_a6b891d21807",
  "container": {
    "id": "a6b891d21807e59124982a0812eecb80c91d57d2eae35086bdb92bc20e2aea88",
    "type": "docker",
    "name": "k8s_nginx-ingress-controller_ingress-nginx-ingress-controller-5rss4_ingress_09ac9e9d-b1d2-11e9-b8ba-027092ac4e5e_0",
    "image": {
      "name": "sha256:02149b6f439fe60efad026f3846452f3cd0861f3b22f9e9b45eb0a5abba314ac"
    },
    "host": {
      "hostname": "ip-10-4-62-243.eu-central-1.compute.internal"
    }
  },
  "labels": {
    "io_kubernetes_container_logpath": "/var/log/pods/09ac9e9d-b1d2-11e9-b8ba-027092ac4e5e/nginx-ingress-controller/0.log",
    "io_kubernetes_container_name": "nginx-ingress-controller",
    "io_kubernetes_docker_type": "container",
    "io_kubernetes_pod_name": "ingress-nginx-ingress-controller-5rss4",
    "io_kubernetes_pod_namespace": "ingress",
    "io_kubernetes_pod_uid": "09ac9e9d-b1d2-11e9-b8ba-027092ac4e5e",
    "io_kubernetes_sandbox_id": "ca9a453a24ddcfbd33834a20769cca82fbff50a8aef1414c8dd1b4846e5d9d33",
    "annotation_io_kubernetes_container_hash": "aebb22e4",
    "annotation_io_kubernetes_container_ports": "[{\"name\":\"http\",\"containerPort\":80,\"protocol\":\"TCP\"},{\"name\":\"https\",\"containerPort\":443,\"protocol\":\"TCP\"}]",
    "annotation_io_kubernetes_container_restartCount": "0",
    "annotation_io_kubernetes_container_terminationMessagePath": "/dev/termination-log",
    "annotation_io_kubernetes_container_terminationMessagePolicy": "File",
    "annotation_io_kubernetes_pod_terminationGracePeriod": "60"
  },
  "@timestamp_received": "2019-07-29T07:27:41.697Z",
  "logsene_orig_type": "logs"
}

Remove log enrichment

We immediately notice three large fields that were added by Logagent: container, labels, and logSource. In this context they can be dropped by adding REMOVE_FIELDS to the configuration:

helm upgrade agent stable/sematext-agent -f agent.yaml

agent.yaml:
region: US
logsToken: "<YOUR_LOGS_TOKEN>"
logagent:
  config:
    MATCH_BY_NAME: .*_(default|ingress)_.*
    REMOVE_FIELDS: container,labels,logSource

The result is only 1k (890 chars) in size:

{
  "@timestamp": "2019-07-29T07:42:39.368Z",
  "severity": "info",
  "os": {
    "host": "ip-10-4-62-243.eu-central-1.compute.internal"
  },
  "timestamp": "2019-07-29T07:42:39+00:00",
  "remote_addr": "188.26.243.229",
  "x-forward-for": "188.26.243.229",
  "request_id": "8074ba0130449bf3eff03655c3e7da5e",
  "remote_user": "",
  "bytes_sent": 971,
  "request_time": 0.001,
  "status": 200,
  "vhost": "k8s-echo.test.elb.eu-west-1.amazonaws.com",
  "request_proto": "HTTP/1.1",
  "path": "/echo",
  "request_query": "",
  "request_length": 604,
  "duration": 0.001,
  "method": "GET",
  "http_referrer": "",
  "http_user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.1 Safari/605.1.15",
  "index": "c7718d77-9d0a-4729-96a9-08d32c6fa07d",
  "@timestamp_received": "2019-07-29T07:42:45.980Z",
  "logsene_orig_type": "logs"
}
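
The effect of dropping top-level fields can be reproduced offline to estimate savings before changing the agent config. This is a minimal Python sketch that mimics what REMOVE_FIELDS does to an event; it is not Logagent's implementation, and both the helper remove_fields and the shortened event are made up for illustration.

```python
import json


def remove_fields(event, fields):
    """Return a copy of the event without the given top-level fields."""
    return {key: value for key, value in event.items() if key not in fields}


# Shortened, made-up event in the shape of the sample above.
event = {
    "status": 200,
    "path": "/echo",
    "logSource": "sha256:..._k8s_nginx-ingress-controller_...",
    "container": {"id": "a6b891d21807", "type": "docker"},
    "labels": {"io_kubernetes_pod_namespace": "ingress"},
}

slim = remove_fields(event, {"container", "labels", "logSource"})

# Compare the serialized sizes of the full and the slimmed-down event.
before = len(json.dumps(event))
after = len(json.dumps(slim))
print(f"{before} chars -> {after} chars")
```

Running the same comparison on a real event exported from your own cluster gives a more realistic estimate than this toy input.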

Log fewer fields

This looks small, but if you process millions of requests daily, it can add up to a lot at the end of the month. Removing some unneeded fields may improve things even further. This time the fields should be removed in the Nginx Ingress log format.

    log-format-upstream: '{ "@timestamp": "$time_iso8601",
        "remote_addr": "$remote_addr", "bytes_sent": "$bytes_sent",
        "request_time": "$request_time", "status": "$status",
        "vhost": "$host", "request_proto": "$server_protocol",
        "path": "$uri", "request_query": "$args",
        "request_length": "$request_length", "duration": "$request_time",
        "method": "$request_method"}'

The result is now only 0.5k (569 chars) in size:

{
  "@timestamp": "2019-07-29T08:00:53.000Z",
  "severity": "info",
  "os": {
    "host": "ip-10-4-62-243.eu-central-1.compute.internal"
  },
  "remote_addr": "188.26.243.229",
  "bytes_sent": 974,
  "request_time": 0.002,
  "status": 200,
  "vhost": "k8s-echo.test.elb.eu-west-1.amazonaws.com",
  "request_proto": "HTTP/1.1",
  "path": "/echo",
  "request_query": "",
  "request_length": 604,
  "duration": 0.002,
  "method": "GET",
  "index": "c7718d77-9d0a-4729-96a9-08d32c6fa07d",
  "@timestamp_received": "2019-07-29T08:00:55.979Z",
  "logsene_orig_type": "logs"
}
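
To put the three event sizes reported so far side by side, the relative savings can be worked out with a few lines of Python. The sizes come from this article; the traffic figure REQUESTS_PER_DAY is a hypothetical value, and the sketch treats one character as one byte, which only approximates what a backend actually stores.

```python
# Event sizes in characters, as reported in this article.
sizes = {
    "enriched (default)": 2570,
    "after REMOVE_FIELDS": 890,
    "after trimming the log format": 569,
}

baseline = sizes["enriched (default)"]


def reduction_pct(size, base=baseline):
    """Percentage saved relative to the baseline event size."""
    return round(100 * (base - size) / base, 1)


for step, size in sizes.items():
    print(f"{step}: {size} chars, {reduction_pct(size)}% smaller")

# Hypothetical traffic figure, purely for illustration.
REQUESTS_PER_DAY = 1_000_000

# Rough daily volume in MB, assuming 1 char == 1 byte.
daily_mb = {step: size * REQUESTS_PER_DAY / 1e6 for step, size in sizes.items()}
print(daily_mb)
```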

Log less data

That is a big change: from 2570 chars down to 569. The only way to reduce volume further is to not ship all of the logs. For example, successful requests with a 20x status can be dropped by filtering them with IGNORE_LOGS_PATTERN in Logagent:

helm upgrade agent stable/sematext-agent -f agent.yaml

agent.yaml:
region: US
logsToken: "<YOUR_LOGS_TOKEN>"
logagent:
  config:
    MATCH_BY_NAME: .*_(default|ingress)_.*
    REMOVE_FIELDS: container,labels,logSource
    IGNORE_LOGS_PATTERN: \"status\":\s20
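
Before dropping logs in production, it helps to check what the pattern would and would not match. The sketch below uses Python's re module as a stand-in for Logagent's regex engine. Whether Logagent applies the pattern to the raw log line or to the parsed event is not covered in this article, so the sample lines are assumptions that only illustrate the behaviour of the regex itself.

```python
import re

# Pattern from IGNORE_LOGS_PATTERN, written with straight quotes.
IGNORE = re.compile(r'"status":\s20')

# Made-up sample lines for illustration.
lines = [
    '{"status": 200, "path": "/echo"}',     # dropped
    '{"status": 204, "path": "/echo"}',     # dropped
    '{"status": 404, "path": "/missing"}',  # kept
    '{"status": 502, "path": "/echo"}',     # kept
    '{"status": "200", "path": "/echo"}',   # kept: quoted value does not match
]

# Keep only the lines the ignore pattern does not match.
kept = [line for line in lines if not IGNORE.search(line)]

for line in kept:
    print(line)
```

Two details stand out. The pattern covers status codes starting with 20 rather than every 2xx code. A line where the status value is wrapped in quotes, which is how the log-format-upstream template above writes it, is not matched by this regex, so it is worth testing the pattern against the lines your setup actually produces.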

Summary

The examples above show that Logagent provides everything needed to reduce log volume and, with it, cost. By using MATCH_BY_NAME you can limit log collection to the desired namespaces. Unneeded fields can be removed using REMOVE_FIELDS in the configuration. Even entire log lines can be ignored with IGNORE_LOGS_PATTERN. Logagent makes it easy to slim down any logs with very little effort.
Don’t hesitate to shoot us a message about any questions you may have.
