
Logagent Meets Apache Kafka

October 16, 2017


This is a guest post from Filippo Balicchia. Filippo contributed the Logagent plugins for Apache Kafka, the details of which he shares in this post. Filippo is a software engineer and a passionate coder interested in distributed and cloud technologies. He works at Openmind, one of the major system integrators in Italy, specializing in the full development life cycle of eCommerce solutions, from inception to maintenance.

Introduction

Logagent is a lightweight, modern log shipper designed to be very simple to use, even for people who have never used a log shipper before. Getting Logagent to ship data to Elasticsearch, Sematext Cloud, or other destinations takes just a couple of minutes. The introduction of Apache Kafka input and output plugins lets Logagent fit into more scalable, Kafka-based logging architectures without sacrificing that ease of use.

We can think of Kafka as a partitioned and replicated distributed log service. It provides the functionality of a messaging system and can be considered a log-based message broker, much like Amazon Kinesis Streams and Apache DistributedLog.

Give me the details!

One of the main concepts in Kafka is a Topic. A topic, of which there can be many in a single Kafka cluster, is made up of partitions. Each partition is a log stored as a group of files on a Kafka Broker, and each partition can be replicated to more than one Broker.
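
For example, the test topic used later in this post could be created with three partitions and a single replica using the Kafka command-line tools. This is just a sketch assuming a local Broker with ZooKeeper on the default port and Kafka 0.11-era syntax:

$ kafka-topics.sh --create --zookeeper localhost:2181 \
    --replication-factor 1 --partitions 3 --topic test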

Kafka Producers are the ones that publish data to topics on Kafka Brokers. A Producer chooses which record to assign to which partition within a topic; this can be done in a round-robin fashion simply to balance load, or according to some semantic partition function. Data is first written to the leader Broker, and the follower Brokers in the ISR (in-sync replica set) replicate records from the leader. The Producer can be configured to wait for a response from the leader Broker alone or from all Brokers in the ISR.  The latter is slower, of course.

At the time of this writing, the requireAcks setting in logagent-output-kafka can be configured to take the following values (a short configuration excerpt follows the list):

  • 0 – Producer never waits for an acknowledgment from the Broker
  • 1 – Producer gets an acknowledgment after the leader replica has received data
  • -1 – Producer gets an acknowledgment after all in-sync replicas have received data
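
As a minimal excerpt (the full working configuration appears later in this post), this is how the setting looks in the output plugin's YAML configuration:

output:
  kafka:
    module: logagent-output-kafka
    # wait for the leader replica only; use 0 for fire-and-forget,
    # or -1 to wait for all in-sync replicas
    requireAcks: 1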

From a security standpoint, Logagent can communicate with Kafka over SSL on a dedicated port, although this is not enabled by default. We’ll show how easy that is to set up in the example below.

Opposite the Producers, on the other side of the Brokers, are Consumers. In the case of Logagent, the logagent-input-kafka plugin acts as a Consumer. There is also a notion of a Consumer Group, and each Consumer Group uses one Broker as its coordinator. The coordinator's main task is assigning partitions when a new member joins the Consumer Group, which in our case happens when a new instance of Logagent with logagent-input-kafka shows up. Kafka Consumers support three messaging semantics:

  • At least once
  • At most once
  • Exactly once, as of the Kafka 0.11 release

At the time of this writing, logagent-input-kafka (version 1.0.4) supports “At least once” semantics with enable.auto.commit=true. This commits consumed offsets periodically, at the interval defined by autoCommitIntervalMs (5000 ms by default). By reducing the commit interval you can limit the amount of re-processing the Consumer has to do in the event of a crash.
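
For example, to commit offsets every second instead of every five seconds, you could lower the interval in the input plugin's configuration. This is a minimal sketch, assuming the plugin passes autoCommitIntervalMs through to the underlying kafka-node consumer:

input:
  kafka:
    module: logagent-input-kafka
    autoCommit: true
    # commit consumed offsets every 1000 ms instead of the default 5000 ms
    autoCommitIntervalMs: 1000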

A duplicate message may occur when, for example, Logagent consumes and forwards a message but crashes before getting a chance to commit the consumed offset back to Kafka.  If that happens, Logagent will, upon restart, re-consume data from the last known/committed offset, which means it may re-process messages it had already consumed prior to the crash.

Cool but show me how!

Let’s look at the example where we first publish data to Kafka, then consume it from there, and ship it to Logsene.

As this picture shows, we’ll use both Logagent plugins for Kafka and we will produce and consume data over SSL, also shipping data to Sematext Cloud or Elasticsearch via SSL to make it searchable.

For configuring Kafka we could follow the official Kafka security configuration guide, but for this post we’ll prepare a Docker Compose file and focus on the Logagent integration.
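
For reference, a minimal Compose file for a local single-Broker Kafka might look like the sketch below. It assumes the commonly used wurstmeister images and exposes only the plaintext port; the Compose file we prepared for this post additionally sets up the SSL listener and certificates:

version: '2'
services:
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - "2181:2181"
  kafka:
    image: wurstmeister/kafka
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      # advertise the Broker on localhost for clients running on the host
      KAFKA_ADVERTISED_HOST_NAME: localhost
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      # create the "test" topic with 1 partition and 1 replica at startup
      KAFKA_CREATE_TOPICS: "test:1:1"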

First, we’ll install Logagent and the Kafka plugins (assuming Node.js is installed already):

npm i -g @sematext/logagent
npm i -g logagent-output-kafka 
npm i -g logagent-input-kafka

Next, we need to define configuration files for the Kafka output and input plugins. For logagent-output-kafka (Kafka Producer) we’ll define logagent-output-kafka.yml:

# Global options
options:
  includeOriginalLine: false

input:
  stdin: true

output:
  kafka: 
   module: logagent-output-kafka
   host: localhost
   port: 9092
   topic: test
   requireAcks: 1
   sslEnable: true
   sslOptions: 
      - rejectUnauthorized: false

For logagent-input-kafka (Kafka Consumer) we’ll define logagent-input-kafka.yml:

# Global options
options:
  includeOriginalLine: false

input:
  kafka: 
    module: logagent-input-kafka
    host: localhost
    port: 9092
    groupId: logagent-consumer-example
    # use 'topic' key for a single topic
    # topic: test
    topics: 
      - test
    autoCommit: true
    sessionTimeout: 15000
    sslEnable: true
    sslOptions: 
      - rejectUnauthorized: false

output:
  # index logs in Elasticsearch or Logsene
  elasticsearch: 
    url: https://logsene-receiver.eu.sematext.com
    # default index (Logsene token) to use:
    index: YOUR-LOGSENE-TOKEN-HERE

After creating these Logagent yaml configuration files we can send a message to Kafka like this:

$ echo "Simple message from Sematext Blog" | logagent --config logagent-output-kafka.yml

And now we just need to consume this message:

$ logagent --config logagent-input-kafka.yml

And that’s it!

Let’s point our browser to Sematext Cloud to check on our data:

Bingo!  We got our data into Kafka via Logagent, consumed it from Kafka via Logagent, and shipped it to Sematext Cloud.  In real life, you would likely be running Logagent as a service and not as a command-line tool, though you can use Logagent CLI and its various parameters if you want.

For your convenience, we’ve made this example available so you can easily grab it and play with it.

Conclusion

In this post, we’ve covered logagent-input-kafka and logagent-output-kafka plugins and demonstrated how easy it is to use them.  If you are currently using Logstash with Kafka and are looking for something lighter, consider Logagent or one of the other Logstash alternatives.
