Docker “Swarm Mode”: Full Cluster Monitoring & Logging with 1 Command

Until recently, automating the deployment of performance monitoring agents in Docker Swarm clusters was challenging: monitoring agents had to be deployed to each cluster node, and previous Docker releases (Docker Engine < v1.12 / Docker Swarm 1.2.4) had no global service scheduler (GitHub issue #601). Scheduling services via docker-compose and scheduling constraints required manual updates whenever the number of nodes in the Swarm cluster changed – definitely not convenient for dynamically scaling clusters! In Docker Swarm Monitoring and Logging we shared some Linux shell acrobatics as a workaround for this issue.

The good news: all this has changed with Docker Engine v1.12 and the new Swarm mode. The latest Docker release (v1.12) provides many new features for orchestration, and the new Swarm mode makes it much easier to deploy Swarm clusters.


With Docker v1.12, services can be scheduled globally – similar to a Kubernetes DaemonSet, RancherOS global services or CoreOS global fleet services.
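With global scheduling, deploying a monitoring agent to every node becomes a single command. Below is a minimal sketch using the Sematext Docker Agent image; the token values and the exact set of flags are assumptions you would adapt to your own setup:

# run exactly one agent task on every node of the swarm
docker service create --mode global --name sematext-agent \
  --mount type=bind,source=/var/run/docker.sock,target=/var/run/docker.sock \
  -e SPM_TOKEN=YOUR_SPM_TOKEN \
  -e LOGSENE_TOKEN=YOUR_LOGSENE_TOKEN \
  sematext/sematext-agent-docker

Nodes that join the swarm later automatically get an agent task scheduled on them, so scaling the cluster no longer requires touching the monitoring setup.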


Read More


5 Logstash Alternatives

When it comes to centralizing logs to Elasticsearch, the first log shipper that comes to mind is Logstash. People hear about it even if it’s not clear what it does:
– Bob: I’m looking to aggregate logs
– Alice: you mean… like… Logstash?

When you get into it, you realize centralizing logs often implies a bunch of things, and Logstash isn’t the only log shipper that fits the bill:

  • fetching data from a source: a file, a UNIX socket, TCP, UDP…
  • processing it: appending a timestamp, parsing unstructured data, adding Geo information based on IP
  • shipping it to a destination. In this case, Elasticsearch. And because Elasticsearch can be down or struggling, or the network can be down, the shipper would ideally be able to buffer and retry

In this post, we’ll describe Logstash and five alternative log shippers (Filebeat, Fluentd, rsyslog, syslog-ng and Logagent), so you know which one fits which use case.
Read More


Open Source Docker Monitoring & Logging

Pets ⇒ Cattle ⇒ Orchestration

Docker is growing by leaps and bounds, and along with it its ecosystem.  Being lightweight, the predominant container deployment involves running just a single app or service inside each container.  Most software products and services are made up of at least several such apps/services.  We all want all our apps/services to be highly available and fault tolerant.  Thus, Docker containers in an organization quickly start popping up like mushrooms after the rain.  They multiply faster than rabbits. While in the beginning we play with them like cute little pets, as their numbers quickly grow we realize we are dealing with a herd of cattle, implying we’ve become cowboys.  Managing a herd with your two hands, a horse, and a lasso will get you only so far.  You won’t be able to ride after each and every calf that wanders in the wrong direction.  To get back to containers from this zoological analogy – operating so many moving pieces at scale is impossible without orchestration – this is why we’ve seen the rise of Docker Swarm, Kubernetes, Mesos, CoreOS, RancherOS and so on.


Containers multiply faster than Gremlins

Pets ⇒ Cattle ⇒ Orchestration + Operational Insights

Container orchestration helps you manage your containers, their placement, their resources, and their whole life cycle.  While containers and the applications in them are running, we also need container monitoring and log management so we can troubleshoot performance or stability issues, debug or tune applications, and so on.  Just like with orchestration, there are a number of open-source container monitoring and logging tools.  It’s great to have choices, but having lots of them means you need to evaluate and compare them to pick the one that best matches your needs.

DevOps Tools Comparison

We’ve open-sourced our Sematext Docker Agent (SDA for short), which works with SPM for monitoring and Logsene for log management (think of it as ELK as a Service), and wanted to provide a high-level comparison of SDA and several popular Docker monitoring and logging tools, like CAdvisor, Logspout, and others.  In the following table we group tools by functionality and include monitoring agents, log collectors and shippers, storage backends, and tools that provide the UI and visualizations.  For each functionality we list in the “Common Tools” column one or more popular open-source tools that provide it.  An empty “Common Tools” cell means there are no popular open-source tools that provide it, or at least we are not aware of them — if we messed something up, please leave a comment or tweet @sematext.

Functionality | Common Tools | Sematext Tools
Collect Logs from Docker API (including auto-discovery of new containers) | Logspout | Sematext Docker Agent
Log routing | Logspout (routing setup for containers via HTTP API to syslog, redis, kafka, logstash); Docker Logging Drivers (e.g. syslog, journald, fluentd, etc.) | Sematext Docker Agent (routing of logs to different indices based on container labels)
Automatic log tagging (with Docker Compose or Swarm or Kubernetes metadata) | For Kubernetes: fluentd-elasticsearch, assumes Elasticsearch deployed locally | Sematext Docker Agent
Collect Docker Metrics | CAdvisor | Sematext Docker Agent
Collect Docker Events | ? | Sematext Docker Agent
Logs format detection (most tools need a static setup per logfile/application) | ? | Sematext Docker Agent (out-of-the-box format detection and parsing; the parser and the logagent-js pattern library are open source)
Logs parsing and shipping | Fluentd | Sematext Docker Agent
Logs storage and indexing | Elasticsearch | Logsene (exposes the Elasticsearch API)
Logs anomaly detection and alerting | ? | Logsene
Log search and analytics | Kibana | Logsene (Logsene’s own UI or integrated Kibana, or Grafana connected to Logsene via the Elasticsearch data source)
Metrics storage and aggregation | Graphite | SPM
Metrics charts and dashboards | Grafana | SPM
Metrics anomaly detection and alerting | Influxdb | SPM
Correlation of Metrics, Logs and Events | ? | SPM & Logsene integration

This table shows a few things:

  • Some of the functionality provided by SPM and Logsene is not available in some of the most popular open-source monitoring and logging tools included here
  • Some of the SPM and Logsene functionality is indeed provided by some of the open-source tools, however none of them seems to encompass all the features, forcing one to mix and match and head down the tech debt-ridden Franken-monitoring path
  • Try it yourself in the MindMap below – pick a few functionalities and see how many different tools you might have to use.


Avoid building technical-debt & Franken-monitoring by using a limited number of Docker monitoring & logging tools

Again, if we missed something, please leave a comment or tweet @sematext.
If you want to try Sematext Docker Agent sign up for a free trial.

P.S.: Sematext Docker Agent is available in the RancherOS Community Catalog and shows up with our new mascot “Octi” (just one more pet 🙂) – so if you use RancherOS, search for “sematext” in the RancherOS Catalog and within a few clicks you’ll have the Sematext Docker Agent deployed to your RancherOS clusters!



5 Minute Recipe: Heroku Log Drain Setup

Since we wrote about how to ship Heroku Logs to ELK we’ve received good feedback from Heroku users and, encouraged by that feedback, deployed a log ingestion service for apps running on Heroku. This makes it super easy to get structured Heroku Logs into Logsene, the hosted ELK logging service.  Let’s see how that’s done in under five minutes (check the current time!):

Step 1 – Create your Logsene App

If you don’t have a Logsene account already simply get a free account and create a Logsene App. This will get you a Logsene Application Token.

Step 2 – Configure Log Drain for your Heroku App

Once you create your Logsene app you’ll see a command to set up the Heroku Log Drain including the Logsene Token.

Simply copy that command and run it in one of two places:

  1. in the Heroku app directory, like this:

heroku drains:add YOUR_LOGSENE_DRAIN_URL

  2. alternatively, specify your app name in the command instead of calling the command from your Heroku app directory:

heroku drains:add YOUR_LOGSENE_DRAIN_URL -a YOUR_HEROKU_APP_NAME

(YOUR_LOGSENE_DRAIN_URL stands for the drain URL, including your Logsene Token, shown in the Logsene UI.)
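To confirm the drain was registered, you can list your app’s drains – a quick, optional sanity check:

heroku drains -a YOUR_HEROKU_APP_NAME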

Step 3 – Watch your Logs in Logsene

If you now access your Heroku App, Heroku should log your HTTP request and a few seconds later the logs will be visible in Logsene.  And not in just any format!  You’ll see PERFECTLY STRUCTURED HEROKU LOGS:


Parsed Heroku Logs in Logsene


Check the time!  Under five minutes?  If you like your Heroku app logs in Logsene tweet us your setup time. 🙂


Scalable and Flexible Elasticsearch Reindexing via rsyslog

Earlier on, we posted a recipe on reindexing data from within an Elasticsearch 2.3+ cluster. But this doesn’t work if you want to reindex in a different cluster or if your Elasticsearch is older than 2.3. Or both, when you’re trying to migrate from 1.x to 2.x or later.

For such cases, we posted a Logstash reindexing recipe. However, Logstash can sometimes become a bottleneck, so we needed something faster for indexing lots of data. We turned to rsyslog, a log shipper with performance as its #1 feature.

The plan

As rsyslog doesn’t have an Elasticsearch input like Logstash does, we’ve used an external application to scroll through Elasticsearch documents and push them to rsyslog via TCP. The flow would be:

rsyslog to Elasticsearch reindex flow

This is an easy way to extend rsyslog, using whichever language you’re comfortable with, to support more inputs. Here, we piggyback on the TCP input. You can do a similar job with filters/parsers – you can find some examples here – by piggybacking on the mmexternal module, which uses stdin and stdout for communication. The same is possible for outputs, normally added via the omprog module: we did this to add a Solr output and one for SPM custom metrics.

The custom script in question doesn’t have to be multi-threaded; you can simply spin up more instances of it, scrolling different indices. In this particular case, using two scripts gave us slightly better throughput, saturating the network:

rsyslog to Elasticsearch reindex flow multiple scripts

Writing the custom script

Before starting to write the script, one needs to know what the messages sent to rsyslog should look like. To be able to index data, rsyslog needs an index name, a type name and, optionally, an ID. In this particular case, we were dealing with logs, so the ID wasn’t necessary.

With this in mind, I see a number of ways of sending data to rsyslog:

  • one big JSON per line. One can use mmnormalize to parse that JSON, which then allows rsyslog to use values from within it as the index name, type name, and so on
  • for each line, begin with the bits of “extra data” (like index and type names), then put the JSON document that you want to reindex. Again, you can use mmnormalize to parse the extra data, but this time you can simply trust that the last part of the line is JSON and send it to Elasticsearch directly, without parsing it
  • if you only need to pass two variables (index and type name, in this case), you can piggyback on the vague spec of RFC3164 syslog and send something like
    destination_index document_type:{"original": "document"}

This last option makes rsyslog put the provided index name in the hostname variable, the type in syslogtag and the original document in msg. A bit hacky, I know, but quite convenient (it keeps the rsyslog configuration straightforward) and very fast, since the RFC3164 parser is very quick and it runs on all messages anyway. No need for mmnormalize, unless you want to change the document in-flight with rsyslog.
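To see what such a message looks like on the wire, you can push a test line to rsyslog by hand – a minimal sketch, assuming the TCP listener described below runs locally on port 5514 and using made-up index and type names:

echo 'destination_index document_type:{"message": "hello reindex"}' | nc localhost 5514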

Below you can find the Python code that can scan through existing documents in an index (or index pattern, like logstash_2016.05.*) and push them to rsyslog via TCP. You’ll need the Python Elasticsearch client (pip install elasticsearch). Assuming you save the script as, say,, you’d run it like this:

python source_index destination_index

The script being:

from elasticsearch import Elasticsearch
import json, socket, sys

source_cluster = ['server1', 'server2']  # source Elasticsearch hosts
rsyslog_address = ''            # rsyslog host (adjust as needed)
rsyslog_port = 5514

es = Elasticsearch(source_cluster)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((rsyslog_address, rsyslog_port))

# start a scan through the source index; hits come back scroll by scroll
result =[1], scroll='1m', search_type='scan', size=500)

while True:
  result = es.scroll(scroll_id=result['_scroll_id'], scroll='1m')
  if not result['hits']['hits']:
    break
  for hit in result['hits']['hits']:
    # each line is: destination_index document_type:{original JSON document}
    s.send(sys.argv[2] + ' ' + hit["_type"] + ':' + json.dumps(hit["_source"]) + '\n')


If you need to modify messages, you can parse them in rsyslog via mmjsonparse and then add/remove fields through rsyslog’s scripting language. However, I couldn’t find a nice way to change field names – for example, to remove the dots that are forbidden since Elasticsearch 2.0 – so I did that in the Python script:

def de_dot(my_dict):
  # recursively replace dots in field names, which Elasticsearch 2.0+ rejects
  for key in my_dict.keys():  # keys() returns a copy in Python 2, so the dict can be modified safely
    value = my_dict.pop(key)
    if type(value) is dict:
      value = de_dot(value)
    my_dict[key.replace('.', '_')] = value
  return my_dict

And then the “send” line becomes:

s.send(sys.argv[2] + ' ' + hit["_type"] + ':' + json.dumps(de_dot(hit["_source"]))+'\n')

Configuring rsyslog

The first step here is to make sure you have the latest rsyslog, though the config below works with versions all the way back to 7.x (which can be found in most Linux distributions). You just need to make sure the rsyslog-elasticsearch package is installed, because we need the Elasticsearch output module.
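On Debian/Ubuntu, for example, the Elasticsearch output ships as a separate package (package names may vary slightly across distributions and rsyslog repositories):

# Debian/Ubuntu
apt-get install rsyslog rsyslog-elasticsearch

# RHEL/CentOS (EPEL or the official rsyslog repository)
yum install rsyslog rsyslog-elasticsearch

With the output module in place, the configuration looks like this: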

# messages bigger than this are truncated (~10MB)
$maxMessageSize 10000000

# load the TCP input and the ES output modules
module(load="imtcp")
module(load="omelasticsearch")

main_queue(
  queue.size="1000000"           # buffer up to 1M messages in memory
  queue.workerThreads="4"        # these threads process messages and send them to Elasticsearch
  queue.dequeueBatchSize="1000"  # rsyslog processes messages in batches to avoid queue contention
                                 # this will also be the Elasticsearch bulk size
)

# we use templates to specify what the data sent to Elasticsearch looks like
template(name="document" type="list"){
  # the "msg" variable contains the document
  property(name="msg")
}
template(name="index" type="list"){
  # "hostname" has the index name
  property(name="hostname")
}
template(name="type" type="list"){
  # "syslogtag" has the type name
  property(name="syslogtag")
}

# start the TCP listener on the port we pointed the Python script to
input(type="imtcp" port="5514")

# sending data to Elasticsearch, using the templates defined earlier
action(type="omelasticsearch"
  template="document"                        # the document to index
  dynSearchIndex="on" searchIndex="index"    # index name comes from the "index" template
  dynSearchType="on" searchType="type"       # type name comes from the "type" template
  server="localhost"            # destination Elasticsearch host
  serverport="9200"             # and port
  bulkmode="on"                 # use the bulk API
  action.resumeretrycount="-1"  # retry indefinitely if Elasticsearch is unreachable
)

This configuration doesn’t have to disturb your local syslog (i.e. by replacing /etc/rsyslog.conf). You can put it someplace else and run a different rsyslog process:

rsyslogd -i /var/run/ -f /home/me/rsyslog_reindexer.conf

And that’s it! With rsyslog started, you can start the Python script(s) and do the reindexing.
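As mentioned earlier, you can simply run several copies of the script in parallel, each scrolling a different index or index pattern. A minimal sketch, assuming the script was saved as and using placeholder index names:

# two reindexing scripts running in parallel, each scrolling a different source index
python logstash-2016.05.01 logstash-reindexed &
python logstash-2016.05.02 logstash-reindexed &
wait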

If you need any help with Elasticsearch, rsyslog, Logstash and the like, check out our Elasticsearch consulting, Logging consulting, Elasticsearch production support and Elasticsearch and Logging training info.


Elasticsearch Ingest Node vs Logstash Performance

Starting from Elasticsearch 5.0, you’ll be able to define pipelines within it that process your data, in the same way you’d normally do it with something like Logstash. We decided to take it for a spin and see how this new functionality (called Ingest) compares with Logstash filters in both performance and functionality.

Specifically, we tested the grok processor on Apache common logs (we love logs here), which can be parsed with a single rule, and on CISCO ASA firewall logs, for which we have 23 rules. This way we could also check how both Ingest and Logstash scale when you start adding more rules.

Baseline performance

To get a baseline, we pushed logs with Filebeat 5.0alpha1 directly to Elasticsearch, without parsing them in any way. We used an AWS c3.large for Filebeat (2 vCPU) and a c3.xlarge for Elasticsearch (4 vCPU). We also installed SPM to monitor Elasticsearch’s performance.

It turned out that network was the bottleneck, which is why pushing raw logs doesn’t saturate the CPU:
raw logs CPU

Even though we got a healthy throughput rate of 12-14K EPS:
raw logs throughput

But raw, unparsed logs are rarely useful. Ideally, you’d log in JSON and push directly to Elasticsearch. Conveniently, Filebeat can parse JSON since 5.0. That said, throughput dropped to about 4K EPS because JSON logs are bigger and saturate the network:
Throughput of JSON logs

CPU dropped as well, but not that much because now Elasticsearch has to do more work (more fields to index):

This 4K EPS throughput / 40 percent CPU ratio is the most efficient way to send logs to Elasticsearch – if you can log in JSON. If you can’t, you’ll need to parse the logs. So we added another c3.xl instance (4 vCPUs) to do the parsing, first with Logstash, then with a separate, dedicated Elasticsearch Ingest node.


Logstash

With Logstash 5.0 in place, we pointed Filebeat to it, while tailing the raw Apache logs file. On the Logstash side, we have a beats listener, a grok filter and an Elasticsearch output:

input {
  beats {
    port => 5044
  }
}

filter {
  grok {
    match => ["message", "%{COMMONAPACHELOG}%{GREEDYDATA:additional_fields}"]
  }
}

output {
  elasticsearch {
    hosts => ""    # Elasticsearch host goes here
    workers => 4
  }
}
The default number of 2 pipeline workers seemed enough, but we’ve specified more output workers to make up for the time each of them waits for Elasticsearch to reply. That said, network was again the bottleneck so throughput was capped at 4K EPS like with JSON logs:
Logstash apache logs throughput

Meanwhile, Logstash used just about the same amount of CPU as Elasticsearch, at 40-50%:
Logstash apache logs CPU usage

Then we parsed CISCO ASA logs. The config looks similar, except there were 23 grok rules instead of one. Logstash handled the load surprisingly well – throughput was again capped by the network, slightly lower than before because JSONs were bigger:
Logstash CISCO ASA grok throughput

While CPU usage only increased to 60-70%:
Logstash CISCO ASA CPU usage

This means the throughput-to-CPU ratio only went down by about 1.5x after adding a lot more rules. However, in both cases Logstash proved pretty heavy, using about the same CPU to parse the data as Elasticsearch used for indexing it. Let’s see if the Ingest node can do better.

Ingest node

We used the same c3.xl instance for the Ingest node tests: we set node.master and to false in its elasticsearch.yml, to make sure it only does grok and nothing else. We also set node.ingest to false on the data node, so it can focus on indexing.
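Concretely, that means flipping the node role settings in elasticsearch.yml on each box. A minimal sketch, assuming the default config location of a package install:

# on the dedicated Ingest node: no master or data duties, ingest only
cat >> /etc/elasticsearch/elasticsearch.yml <<'EOF'
node.master: false false
node.ingest: true
EOF

# on the data node: let it focus on indexing
cat >> /etc/elasticsearch/elasticsearch.yml <<'EOF'
node.ingest: false
EOF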

Next step was to define a pipeline that does the grok processing on the Ingest node:

curl -XPOST localhost:9200/_ingest/pipeline/apache?pretty -d '{
  "description": "grok apache logs",
  "processors": [
    {
      "grok": {
        "field": "message",
        "pattern": "%{COMMONAPACHELOG}%{GREEDYDATA:additional_fields}"
      }
    }
  ]
}'
Then, to trigger the pipeline for a certain document/bulk, we added the name of the defined pipeline to the HTTP parameters like pipeline=apache. We used curl this time for indexing, but you can add various parameters in Filebeat, too.
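For example, indexing a single Apache log line through the pipeline looks roughly like this (the index and type names are placeholders):

curl -XPOST 'localhost:9200/logs/apache?pipeline=apache' -d '{
  "message": " - - [20/May/2016:14:22:01 +0000] \"GET /index.html HTTP/1.1\" 200 2326"
}'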

With Apache logs, the throughput numbers were nothing short of impressive (12-16K EPS):
ingest node apache logs grok throughput

This used up all the CPU on the data node, while the ingest node was barely breaking a sweat at 15%:
ingest node grok apache logs CPU usage

Because Filebeat only sent raw logs to Elasticsearch (specifically, the dedicated Ingest node), there was less strain on the network. The Ingest node, on the other hand, also acted like a client node, distributing the logs (now parsed) to the appropriate shards, using the node-to-node transport protocol. Overall, the Ingest node provided ~10x better CPU-to-throughput ratio than Logstash.

Things still look better, but not this dramatic, with CISCO ASA logs. We have multiple sub-types of logs here, and therefore multiple grok rules. With Logstash, you can specify an array of match directives:

grok {
  match => [
    "cisco_message", "%{CISCOFW106001}",
    "cisco_message", "%{CISCOFW106006_106007_106010}",
    ...
  ]
}
There’s no such thing for Ingest node yet, so you need to define one rule, and then use the on_failure block to define another grok rule (effectively saying “if this rule doesn’t match, try that one”) and keep nesting like that until you’re done:

"grok": {
  "field": "cisco_message",
  "pattern": "%{CISCOFW106001}",
  "on_failure": [
    {
      "grok": {
        "field": "cisco_message",
        "pattern": "%{CISCOFW106006_106007_106010}",
        "on_failure": [ ... ]
      }
    }
  ]
}

The other problem is performance. Because now there are up to 23 rules to evaluate, throughput goes down to about 10K EPS:
Ingest node CISCO ASA grok throughput

And the CPU bottleneck shifts to the Ingest node:
Ingest node CISCO ASA grok CPU

Overall, the throughput-to-CPU ratio of the Ingest node dropped by a factor of 9 compared to the Apache logs scenario.


Conclusions

  • Logstash is easier to configure, at least for now, and performance didn’t deteriorate as much when adding rules
  • Ingest node is lighter across the board. For a single grok rule, it was about 10x faster than Logstash
  • Ingest nodes can also act as “client” nodes
  • Define the grok rules matching most logs first, because both Ingest and Logstash exit the chain on the first match by default

You’ve made it all the way down here? Bravo! If you need any help with Elasticsearch – don’t forget @sematext does Elasticsearch Consulting, Production Support, as well as Elasticsearch Training.


Automatic Geo-IP Enrichment for Docker Logs

In “Innovative Docker Log Management” we wrote about an alternative (and better?) method for Docker logging compared to log drivers, which only do log forwarding. Getting logs from Docker containers collected, shipped and parsed out of the box is already a big time saver, but some application logs need additional enrichment with information from other data sources. A common use case is to enrich web server logs (or really any logs with IP addresses) with geographical information derived from those IP addresses. Over the last few weeks, we’ve added Geo-IP support to logagent-js (blog post), which is used by the Sematext Docker Agent.

Use Sematext Docker Agent for out of the box Geo-IP support!

Here’s how to enable GeoIP lookups for your logs:

  1. Enable the feature with -e GEOIP_ENABLED=true in the docker run command for sematext/sematext-agent-docker
  2. Geo-IP lookups are enabled for web server logs out of the box (SDA v1.29.32 and above)
  3. Any new pattern in the rich pattern library could use Geo-IP lookup just by adding the setting
    geoIP: fieldName in the pattern. See for example the web server patterns here.

Things you do not need to think about at all:

  1. The Maxmind Geo-IP lite database is downloaded automatically (on each start of the agent)

  2. Automatic updates for the Geo-IP database are integrated too (an update check runs every hour)

  3. The Elasticsearch mapping for the geo-coordinates is set up in Logsene, for geographic queries and map displays

So if you install Sematext Docker Agent on Docker, Docker Cloud, Docker UCP or Docker Swarm, all your web server logs will automatically get new fields: geoip.location (longitude/latitude),,,, …
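Because Logsene exposes the Elasticsearch API, you can then query those fields directly. A minimal sketch – the endpoint host is a placeholder, and the Logsene app token acts as the index name:

curl -XPOST 'https://YOUR_LOGSENE_ES_API_HOST/LOGSENE_TOKEN/_search?pretty' -d '{
  "query": { "match": { "": "Germany" } }
}'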

The new Geo-IP lookup feature for web server logs needs ZERO configuration for Docker users. Getting Geo-IP information into logs traditionally required administrative work like downloading the Geo-IP database and setting up cron jobs to keep it up to date, then adding configuration to web servers to include this information in logs, or configuring log shippers like Logstash to do so. This is not the case when you use Sematext Agent for Docker – the setup is easy. Here is a complete example to check it out:

  1. Run Sematext Docker Agent as usual:
    docker run -d --name sematext-agent --restart=always \
    -e SPM_TOKEN=YOUR_SPM_TOKEN \
    -e GEOIP_ENABLED=true \
    -v /var/run/docker.sock:/var/run/docker.sock sematext/sematext-agent-docker

  2. Start nginx (or jwilder/nginx-proxy or Apache if you like … )
    docker run -p 80:80 -v $PWD/content:/usr/share/nginx/html:ro -d nginx

  3. Open a web browser to access nginx http://your-docker-host/

Sematext Docker Agent collects, parses and enriches the nginx logs and then ships them to Logsene. We made a little dashboard in Logsene’s integrated Kibana showing the logs, the image names of the Docker containers and a map with the locations of the clients:

The example above needs no configuration for web server logs or the Geo-IP lookups! It’s never been this simple to get a web server set up, including web analytics and performance metrics: one command to run the web server and another to get structured logs, metrics and events!

We think such an easy setup is a good reason to run web servers on Docker, Docker-Swarm or Docker Cloud!

Do you need support for Geo-IP in other applications running on Docker? Please let us know and get in touch with us on Twitter @sematext or via GitHub for feature requests. If you like what you have seen here, give SPM for Docker and Logsene a go!


Docker Cloud: Monitoring & Logging

Docker Cloud is a hosted service for Docker Container Management, originally based on Tutum Cloud, which was acquired by Docker in October 2015. Sematext supported the deployment of Sematext Docker Agent on Tutum Cloud from the get-go, so naturally we were quick to add support for Docker Cloud as well.

What is Docker Cloud?

Docker Cloud is a container management service that supports multiple cloud providers such as Amazon, DigitalOcean, IBM Softlayer, MS Azure and others. This makes it much easier to switch Docker deployments to different cloud providers or use a mix of providers, including on-premises nodes for hybrid cloud applications. The user interface in Docker Cloud makes it easy to manage nodes on all supported cloud platforms and can deploy application stacks in containers, defined in a “Stack YAML” file. These Stack files are very similar to Docker Compose files, but with additional options, e.g. to define deployment strategies for the containers. The graphical user interface helps to view and modify container configurations.

Read More

Monitoring rsyslog with Kibana and SPM

A while ago we published this post where we explained how you can get stats about rsyslog, such as the number of messages enqueued, the number of output errors and so on. The point was to send them to Elasticsearch (or Logsene, our logging SaaS, which exposes the Elasticsearch API) in order to analyze them.

This is part 2 of that story, where we share how we process these stats in production. We’ll cover:

  • an updated config, working with Elasticsearch 2.x
  • what Kibana dashboards we have in Logsene to get an overview of what rsyslog is doing
  • how we send some of these metrics to SPM as well, in order to set up alerts on their values: both threshold-based alerts and anomaly detection

Read More

How to Ship Heroku Logs to Logsene / Managed ELK Stack

Update: We made it much easier to ship Heroku logs. Please read 5 Minute Recipe: Heroku Log Drain Setup for more details. 

Heroku is a cloud platform based on a managed container system, with integrated data services and a powerful ecosystem for deploying and running modern apps.  In this post we’ll show how you can ship logs from Heroku to Logsene, where you can then search your logs, get alerts based on log data, share log dashboards with your team, etc.

Watching Heroku logs in real-time in the terminal is easy using the “heroku logs” command, which is fine for ad-hoc log checks, but not for a serious production system.  For production, you want to collect, parse, and ship logs to a log management system, where rich reporting and troubleshooting can be done.  To do that, setting up a Heroku Log Drain is a must. What is a Heroku Log Drain and what does it do? In short, a Heroku Log Drain streams the logs of your applications deployed on Heroku to either a syslog or an HTTPS server.

When you have to deal with a large log volume, scalable log storage is required.  This is where Logsene comes into play: Logsene provides a hosted ELK Stack and is available both On Premises and in the Cloud. Logagent-js is a smart log parser written in Node.js, taking advantage of async I/O to receive, parse and ship logs – including routing different application logs to different full-text indices. We made the Logagent-js deployment on Heroku very easy, and scaling out for a very high log volume is just one “heroku scale web=N” command away.
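For example, assuming you deployed the agent under the hypothetical app name logsene-agent, scaling it out to three dynos is a single Heroku CLI command (heroku ps:scale):

heroku ps:scale web=3 --app logsene-agent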

Let’s have a look at the architecture of this setup:

  1. Heroku Apps configured with a Heroku Log Drain
  2. logagent-js to receive, parse and ship logs
  3. Logsene as backend to store all your logs

Step 1 – Create your Logsene App

If you don’t have a Logsene account already simply get a free account and create a Logsene App. This will get you a Logsene Application Token, which we’ll use in Step 3.

Step 2 – Deploy Logagent-js to Heroku


We’ve prepared a  “Deploy to Heroku” button – just click on it and enter a name for the deployed log agent in the Heroku UI:


Remember this name because we’ll need it later as the URL for the Log Drain.
Logagent-js can handle multiple Logsene tokens, which means it could be used for more than 1 Logsene app, simply addressed by /LOGSENE_TOKEN in the URL.

To run a short test without deploying logagent-js, feel free to use the one we deployed for demos with the name “logsene-test”, reachable via

Step 3 – Configure Log Drain for your App

To configure the Heroku Log Drain we need the following information:

  1. The Logsene App Token
  2. The URL for the deployed logagent-js (e.g.
  3. The Heroku App ID or name of your Application on Heroku (e.g. web-app-1 in the example below)

Then we can use the Heroku command line tool, for example like this:

heroku drains:add --app web-app-1

Or we could use the Heroku API to activate the Log Drain:

curl -n -X POST \
  -d '{"url": ""}' \
  -H "Content-Type: application/json" \
  -H "Accept: application/vnd.heroku+json; version=3"

Step 4 – Watch your Logs in Logsene

If you now access your App, Heroku should log your HTTP request and a few seconds later the logs will be visible in Logsene. And not in just any format!  You’ll see PERFECTLY STRUCTURED HEROKU LOGS:



Like what you saw here? To get started with Logsene, get a free account here, drop us an email, or hit us up on Twitter.  Logagent-js is open source – if you find any bugs, please create an issue on GitHub with suggestions, questions or comments.