This is a guest post from Filippo Balicchia. Filippo contributed the Logagent plugins for Apache Kafka, the details of which he shares in this post. Filippo is a software engineer and a passionate coder interested in distributed and cloud technologies. He works at Openmind, one of the major system integrators in Italy, which specializes in the full development life cycle of eCommerce solutions, from inception to maintenance.
Logagent is a lightweight, modern log shipper designed to be very simple to use, even for people who have never used a log shipper before. Getting Logagent to ship data to Elasticsearch, Sematext Cloud, or other destinations takes just a couple of minutes. The introduction of Apache Kafka input and output plugins lets Logagent fit into more scalable, Kafka-based logging architectures without sacrificing that ease of use.
We can think of Kafka as a partitioned and replicated distributed log service. It provides the functionality of a messaging system and can be considered a log-based message broker, much like Amazon Kinesis Streams and Apache DistributedLog.
Give me the details!
One of the main concepts in Kafka is a Topic. A topic, of which there can be many in a single Kafka cluster, is made up of partitions. Each partition is a log stored as a group of files on a Kafka Broker, and each partition can be replicated to more than one Broker.
Kafka Producers are the ones that publish data to the topics in Kafka Brokers. A Producer can choose which record to assign to which partition within a topic. This can be done in a round-robin fashion simply to balance load, or it can be done according to some semantic partition function. Data is first written to the leader Broker; the follower Brokers in the ISR (in-sync replica set) then replicate records from the leader. The Producer can be configured to wait for a response from the leader Broker alone or from all Brokers in the ISR. The latter is slower, of course.
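The two partitioning strategies mentioned above can be sketched as follows. This is a hypothetical illustration, not the partitioner a real Kafka client uses: un-keyed records rotate round-robin across partitions, while keyed records are hashed so the same key always lands on the same partition.

```javascript
// Hypothetical sketch of the two partitioning strategies: round-robin for
// load balancing, or a "semantic partition function" based on a record key.
// Real Kafka client libraries implement this internally.
let rr = 0; // round-robin counter
function choosePartition(key, numPartitions) {
  if (key == null) {
    return rr++ % numPartitions; // no key: spread records evenly
  }
  // simple string hash so that the same key always maps to the same partition
  let h = 0;
  for (const c of key) h = (h * 31 + c.charCodeAt(0)) >>> 0;
  return h % numPartitions;
}

console.log(choosePartition(null, 3));      // 0
console.log(choosePartition(null, 3));      // 1
console.log(choosePartition('user-42', 3) === choosePartition('user-42', 3)); // true
```

Keyed partitioning is what makes per-key ordering possible: all records with the same key end up in the same partition log.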
At the time of this writing, acknowledgments in logagent-output-kafka are configured via the requireAcks setting, which can take one of the following values:
- 0 – Producer never waits for an acknowledgment from the Broker
- 1 – Producer gets an acknowledgment after the leader replica has received data
- -1 – Producer gets an acknowledgment after all in-sync replicas have received data
From a security standpoint Logagent can communicate with Kafka over SSL using a dedicated port, although this is not enabled by default. We’ll show how easy it is to do that via an example.
On the other side of the Brokers, opposite the Producers, are Consumers. In the case of Logagent, the logagent-input-kafka plugin acts as a Consumer. There is also the notion of a Consumer Group, and each Consumer Group uses one Broker as its coordinator. The coordinator's main task is assigning partitions when a new member joins the Consumer Group, which in our case happens when a new instance of Logagent with logagent-input-kafka shows up. Kafka Consumers support three messaging semantics:
- At least once
- At most once
- Exactly once, as of Kafka 0.11 release
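What the group coordinator does on a rebalance can be sketched with a simple round-robin assignment. This is an illustrative model only; real Kafka coordinators support several pluggable assignment strategies.

```javascript
// Hypothetical sketch of a rebalance: spread a topic's partitions across
// the current members of a consumer group, round-robin.
function assignPartitions(partitions, consumers) {
  const assignment = Object.fromEntries(consumers.map(c => [c, []]));
  partitions.forEach((p, i) => {
    assignment[consumers[i % consumers.length]].push(p);
  });
  return assignment;
}

// A single Logagent instance owns all partitions of the topic...
console.log(assignPartitions([0, 1, 2, 3], ['logagent-1']));
// ...until a second instance joins the group and triggers a rebalance,
// after which each instance consumes half of the partitions.
console.log(assignPartitions([0, 1, 2, 3], ['logagent-1', 'logagent-2']));
```

This is also why the partition count caps consumer parallelism: with four partitions, a fifth member of the group would simply sit idle.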
At the time of this writing, logagent-input-kafka (version 1.0.4) supports “At least once” with enable.auto.commit=true. This triggers commits periodically with the interval defined in autoCommitIntervalMs, 5000 ms by default. By reducing the commit interval you can limit the amount of re-processing the Consumer must do in the event of a crash.
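The effect of shortening the commit interval can be estimated with back-of-the-envelope arithmetic. The numbers below are illustrative only and assume a steady consume rate:

```javascript
// Rough worst case: messages re-processed after a crash is approximately
// the consume rate multiplied by the auto-commit interval.
function maxReprocessed(messagesPerSecond, autoCommitIntervalMs) {
  return messagesPerSecond * (autoCommitIntervalMs / 1000);
}

console.log(maxReprocessed(1000, 5000)); // default 5 s interval -> up to 5000 messages
console.log(maxReprocessed(1000, 1000)); // 1 s interval -> up to 1000 messages
```

The trade-off is that committing more often means more traffic to Kafka, so very small intervals have their own cost.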
A duplicate message may occur when, for example, Logagent consumes and forwards a message but crashes before getting a chance to commit its consumer offset to Kafka. If that happens, Logagent will, upon restart, re-consume data from the last known/committed offset, which means it may re-consume messages it had already consumed before the crash.
Cool but show me how!
Let’s look at the example where we first publish data to Kafka, then consume it from there, and ship it to Logsene.
As this picture shows, we’ll use both Logagent plugins for Kafka and we will produce and consume data over SSL, also shipping data to Sematext Cloud or Elasticsearch via SSL to make it searchable.
First, install Logagent and both Kafka plugins:

npm i -g @sematext/logagent
npm i -g logagent-output-kafka
npm i -g logagent-input-kafka
Next, we need to define configuration files for the Kafka output and input plugins. For logagent-output-kafka (Kafka Producer) we'll define logagent-output-kafka.yml:

# Global options
options:
  includeOriginalLine: false

input:
  stdin: true

output:
  kafka:
    module: logagent-output-kafka
    host: localhost
    port: 9092
    topic: test
    requireAcks: 1
    sslEnable: true
    sslOptions:
      - rejectUnauthorized: false
For logagent-input-kafka (Kafka Consumer) we’ll define logagent-input-kafka.yml:
# Global options
options:
  includeOriginalLine: false

input:
  kafka:
    module: logagent-input-kafka
    host: localhost
    port: 9092
    groupId: logagent-consumer-example
    # use 'topic' key for a single topic
    # topic: test
    topics:
      - test
    autoCommit: true
    sessionTimeout: 15000
    sslEnable: true
    sslOptions:
      - rejectUnauthorized: false

output:
  # index logs in Elasticsearch or Logsene
  elasticsearch:
    url: https://logsene-receiver.eu.sematext.com
    # default index (Logsene token) to use:
    index: YOUR-LOGSENE-TOKEN-HERE
After creating these Logagent yaml configuration files we can send a message to Kafka like this:
$ echo "Simple message from Sematext Blog" | logagent --config logagent-output-kafka.yml
And now we just need to consume this message:
$ logagent --config logagent-input-kafka.yml
And that’s it!
Let’s point our browser to Sematext Cloud to check on our data:
Bingo! We got our data into Kafka via Logagent, consumed it from Kafka via Logagent, and shipped it to Sematext Cloud. In real life, you would likely be running Logagent as a service and not as a command-line tool, though you can use Logagent CLI and its various parameters if you want.
For your convenience, we’ve made this example available so you can easily grab it and play with it.
In this post, we’ve covered logagent-input-kafka and logagent-output-kafka plugins and demonstrated how easy it is to use them. If you are currently using Logstash with Kafka and are looking for something lighter, consider Logagent or one of the other Logstash alternatives.