Recipe: rsyslog + Elasticsearch + Kibana

In this post you'll see how you can take the logs collected by rsyslog and ship them directly to Elasticsearch (running on your own servers, or the one behind the Logsene Elasticsearch API) in such a way that you can use Kibana to search, analyze and make pretty graphs out of them.

This is especially useful when you have a lot of servers logging [a lot of data] to their syslog daemons and you want a way to search them quickly or do statistics on the logs. You can use rsyslog's Elasticsearch output to get your logs into Elasticsearch, and Kibana to visualize them. The only challenge is to get your rsyslog configuration right, so your logs end up where Kibana is expecting them. In this post, we'll get you through the basics; if you want a more complete tutorial (including how to parse unstructured data and use rsyslog as a high-performance buffer), we suggest having a look at our updated recipe here.

Note: if this sort of stuff excites you, we are both hiring (from devops and core product engineering to marketing and sales) and working on Logsene – a log and data analytics product/service to complement SPM.

Getting all the ingredients

Here’s what you’ll need:

  • a recent version of rsyslog (v7+, if you ask me; the Elasticsearch output has been available since 6.4.0). You can download and compile it yourself, or you can get it from the RHEL/CentOS or Ubuntu repositories provided by the maintainers
  • the Elasticsearch output plugin for rsyslog. If you compile rsyslog from sources, you'll need to add the --enable-elasticsearch parameter to the configure script. If you use the repositories, just install the rsyslog-elasticsearch package
  • Elasticsearch :). There's a DEB and an RPM there, which should get you started in no time. If you choose the tar.gz archive, you might find the installation instructions useful
  • Kibana 3 and a web server to serve it. There are installation instructions on the GitHub page. To get started quickly, you can just clone the repository somewhere, then go into the “kibana” directory:

git clone https://github.com/elasticsearch/kibana.git

cd kibana

Then, you’ll probably need to edit config.js to change the Elasticsearch host name from “localhost” to the actual FQDN of the host that’s running Elasticsearch. This applies even if Kibana is on the same machine as Elasticsearch. “localhost” only works if your browser is on the same machine as Elasticsearch, because Kibana talks to Elasticsearch directly from your browser.
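
For reference, the setting in question in config.js looks roughly like this (the exact default varies between Kibana 3 releases, so treat this as a sketch):

elasticsearch: "http://localhost:9200",

and you would change it to something like:

elasticsearch: "http://your-elasticsearch-host.example.com:9200",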

Finally, you can serve the Kibana page with any HTTP server you prefer. If you want to get started quickly, you can try SimpleHTTPServer, which is bundled with any recent Python 2, by running this command from the "kibana" directory:

python -m SimpleHTTPServer
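
SimpleHTTPServer is the Python 2 module name; if you only have Python 3 around, the equivalent command is:

python3 -m http.server 8000

Either way, the current directory is served on port 8000, which is the port we'll use to reach Kibana below.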

Putting them all together

Kibana is, by default, expecting Logstash to send logs to Elasticsearch. So “putting them all together” here means “configuring rsyslog to send logs to Elasticsearch in the same manner Logstash does”. And Logstash, by default, has some particular conventions for naming indices and formatting logs:

  • indices should be named like logstash-YYYY.MM.DD. You can change the pattern Kibana is looking for, but we won't do that here
  • logs must have a timestamp, and that timestamp must be stored in the @timestamp field. It's also nice to put the message part in the message field, because Kibana shows it by default (see the example document right after this list)
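
Put together, an event that Kibana will be happy with looks roughly like the document below (the field names match the template we'll define in a moment; the values are made up), and it would live in an index named logstash-2014.12.09:

{
  "@timestamp": "2014-12-09T18:13:50+02:00",
  "host": "web01",
  "severity": "info",
  "facility": "daemon",
  "tag": "sshd[4711]:",
  "message": "Server listening on :: port 22."
}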

To satisfy the requirements above, here's an rsyslog configuration that should work for sending your local syslog logs to Elasticsearch in a Logstash/Kibana-friendly way:

module(load="imuxsock")             # for listening to /dev/log
module(load="omelasticsearch") # for outputting to Elasticsearch
# this is for index names to be like: logstash-YYYY.MM.DD
template(name="logstash-index"
  type="list") {
    constant(value="logstash-")
    property(name="timereported" dateFormat="rfc3339" position.from="1" position.to="4")
    constant(value=".")
    property(name="timereported" dateFormat="rfc3339" position.from="6" position.to="7")
    constant(value=".")
    property(name="timereported" dateFormat="rfc3339" position.from="9" position.to="10")
}

# this is for formatting our syslog in JSON with @timestamp
template(name="plain-syslog"
  type="list") {
    constant(value="{")
      constant(value="\"@timestamp\":\"")     property(name="timereported" dateFormat="rfc3339")
      constant(value="\",\"host\":\"")        property(name="hostname")
      constant(value="\",\"severity\":\"")    property(name="syslogseverity-text")
      constant(value="\",\"facility\":\"")    property(name="syslogfacility-text")
      constant(value="\",\"tag\":\"")   property(name="syslogtag" format="json")
      constant(value="\",\"message\":\"")    property(name="msg" format="json")
    constant(value="\"}")
}

# this is where we actually send the logs to Elasticsearch (localhost:9200 by default)
action(type="omelasticsearch"
    template="plain-syslog"
    searchIndex="logstash-index"
    dynSearchIndex="on"
    bulkmode="on"  # use the bulk API
    action.resumeretrycount="-1"  # retry indefinitely if Logsene/Elasticsearch is unreachable
)

After restarting rsyslog, you can go to http://host-serving-Kibana:8000/ in your browser and start searching and graphing your logs:

[Kibana screenshot: digging into syslog with Kibana]
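
If you'd rather check from the command line before opening the browser, you can send a test message through syslog and then query Elasticsearch directly. This is just a quick sanity check; "localhost" assumes you run the commands on the Elasticsearch host:

logger "hello from rsyslog and omelasticsearch"

curl "localhost:9200/logstash-*/_search?q=omelasticsearch&pretty"

The second command searches all logstash-* indices for the test message and prints the matching JSON documents.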

Tips

Now that you have the essentials working, here are some tips that might help you go even further with your centralized logging setup:

  • you might not want to put the new rsyslog and omelasticsearch on all your servers. In this case you can forward logs over the network to a central rsyslog that has omelasticsearch, and push them to Elasticsearch from there (there's a small sketch right after this list). Some information on forwarding logs via TCP can be found here and here
  • you might want rsyslog to buffer your logs (in memory, on disk, or some combination of the two), in case Elasticsearch is not available for some reason. Buffering will also help performance, as you can send messages in bulks instead of one by one. There's a reference on buffers with rsyslog & omelasticsearch here
  • you might want to parse JSON-formatted (CEE) syslog messages. If you’re using them, check our earlier post on the subject
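
For the forwarding scenario, here's a minimal sketch (the host name, port and paths are placeholders; adjust them to your setup). On each application server, something like this would ship all local syslog over TCP and buffer to disk while the central server is unreachable:

module(load="imuxsock")                      # local syslog input
global(workDirectory="/var/spool/rsyslog")   # where disk queue files go
action(type="omfwd" target="central-syslog.example.com" port="514" protocol="tcp"
       queue.type="linkedList" queue.filename="fwd_queue" queue.maxDiskSpace="1g"
       action.resumeRetryCount="-1")

On the central server, you'd receive those messages over TCP and keep the omelasticsearch action from the recipe above:

module(load="imtcp")
input(type="imtcp" port="514")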

If you don’t want to worry about any of that, you might want to check out Logsene. This is our new data & log analytics service, where you can just send your syslog messages (CEE-formatted or not) and not worry about running and maintaining a logging cluster in house. We’ll index them for you, and provide a nice interface to search and graph those logs. We also expose an Elasticsearch HTTP API, so Logsene plays nicely with Logstash, rsyslog+omelasticsearch, Kibana, and virtually any other logging tool that can send logs to Elasticsearch.

41 thoughts on "Recipe: rsyslog + Elasticsearch + Kibana"

  1. template(name="idapindex"
    type="list") {
    constant(value="Idaphealthcheckup_")
    property(name="timereported" dateFormat="rfc3339" position.from="1" position.to="4")
    constant(value=".")
    property(name="timereported" dateFormat="rfc3339" position.from="6" position.to="7")
    constant(value=".")
    property(name="timereported" dateFormat="rfc3339" position.from="9" position.to="10")
    }
    *.* action(type="omelasticsearch" dynSearchIndex="on" searchIndex="idapindex" searchType="test" bulkmode="on" server="10.0.28.162" serverport="9200" template="customTemplate")

    but indexing is not happening on the Elasticsearch server node
    and messages are not passed to the ES node

    if I use the same thing as follows
    ##config

    template(name="idapindex"
    type="list") {
    constant(value="Idaphealthcheckup_")
    property(name="hostname")
    }

    *.* action(type="omelasticsearch" dynSearchIndex="on" searchIndex="idapindex" searchType="test" bulkmode="on" server="10.0.28.162" serverport="9200" template="customTemplate")

    it works fine with the hostname property

    I need the date-based version above to work.
    Can you help me out to solve this problem?

    1. This is interesting, because the template looks OK. Do you have any errors on the Elasticsearch side? If not, can you try an “omfile” action with that template, to see what it generates? Or to start rsyslog in debug mode and see where it pushes the logs and if it gets any interesting message back from Elasticsearch?

  2. This recipe makes rsyslogd on Ubuntu 16.04 segfault (!) I've narrowed it down to something in the second template (after trying with just the first and then the first and second). I was originally thinking some kind of escaping error due to all the quoted quotes, but I replaced them with \x22 and it still segfaults. Any ideas?

      1. I don’t get why the segfault happened but the config was wrong. In the template there are quite a few quotes that should be escaped – somehow the backslashes were gone 🙂 It should work now.

        I also recommend using the latest version of rsyslog, they have a PPA here that you can use: http://www.rsyslog.com/ubuntu-repository/

  3. Hi Radu, I have a question about the reliability of rsyslog + ES.
    When my ES server is down, the rsyslogd process ends up with high CPU and memory usage. And after I restart my ES server, part of the logs is lost because the spool directory is always full.
    How can I fix my configuration to avoid this issue? I can monitor the ES server and restart it soon.

    1. Hi Louis,

      By default rsyslog won’t retry on a message failure, it will drop the message. I’m not sure why you have high CPU and memory usage, maybe you did configure retrying, the in-memory queue gets full and then it starts dropping messages. This would explain both the high CPU (because it keeps retrying) and the high memory (because the queue gets full).

      That said, you have control over a few things:
      – how many times it retries before dropping a message (action.resumeretrycount). Set it to "-1" in your omelasticsearch action to retry forever
      – how long to wait between retries (action.resumeInterval)
      – how much memory the queue can take before it spills to disk (if you want it to) and eventually drops messages

      Some references:
      – a more complete blog post about using rsyslog with Elasticsearch, explaining all of the above: https://sematext.com/blog/2015/10/05/recipe-apache-logs-rsyslog-parsing-elasticsearch/
      – rsyslog docs on action parameters (including the above): http://www.rsyslog.com/doc/master/configuration/actions.html
      – rsyslog docs on queue parameters: http://www.rsyslog.com/doc/master/rainerscript/queue_parameters.html
      – old but good rsyslog doc explaining how queues work (I mention “old” because you’ll see old-style directives instead of the new ones that we normally use. Names are consistent, though, you can easily translate): http://www.rsyslog.com/doc/master/concepts/queues.html
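
      For illustration only, here's roughly where those settings would sit on the omelasticsearch action (the values are placeholders, not recommendations):

      action(type="omelasticsearch"
          template="plain-syslog"
          searchIndex="logstash-index"
          dynSearchIndex="on"
          bulkmode="on"
          action.resumeRetryCount="-1"   # retry forever instead of dropping
          action.resumeInterval="30"     # seconds to wait between retries
          queue.type="linkedList"        # in-memory queue...
          queue.filename="es_queue"      # ...that can spill to disk
          queue.maxDiskSpace="1g"        # cap on the disk part of the queue
          queue.saveOnShutdown="on")     # persist queued messages on shutdown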

      1. Hi Radu, we have an issue with this. We set "queue.maxdiskspace=1G" and "action.resumeretrycount=-1" for "omelasticsearch", and we found that if ES is down, rsyslog retries to send a message to ES whose size is 1G (seen with tcpdump). I understand rsyslog should send messages one by one, not the whole disk queue (1G). Is there any other setting for resolving this issue?
        we set the following settings for "omelasticsearch":
        action.resumeInterval => is not set
        queue.type="linkedlist"
        queue.filename = XXX
        queue.size="5000"
        queue.highwatermark = "4500"
        queue.lowwatermark = "3750"
        Could you give some advice on the detailed settings for this?
        Thanks a lot.

        1. You mean it tries to send a bulk of 1GB? That doesn't sound right. Normally you'd set bulkmode="on" in the omelasticsearch action and then the maximum bulk size (in number of messages) would be determined by the queue.dequeueBatchSize value.

          1. Thanks for your quick reply.
            In our setting bulkmode="on" and "queue.dequeueBatchSize=300", "queue.size=5000", but the rsyslog process sent a bulk of 1GB to ES (when ES is restarted after being down for a long time, such as 7*24 hours), so ES drops this bulk and the response code is 413. And then rsyslog loops retrying to send (action.resumeretrycount=-1).
            In the end rsyslog exhausts the CPU and memory.
            I'm not sure whether there is another setting related to this issue.

          2. Hi Louis,

            I’m not sure why that 1GB bulk/message was being sent, I’ve never encountered that. Normally, it should obey dequeueBatchSize. If you can reproduce this scenario, I suggest you open an issue at https://github.com/rsyslog/rsyslog with details like your operating system and rsyslog version.

  4. Hi guys,
    my omelasticsearch rsyslog module is not creating indexes.

    How can I debug if it is working or not?

    1. Hi Roberto,

      First of all, let me point you to the updated recipe. For example, omelasticsearch can be configured to retry, and you can keep messages in the queue in the meantime: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-missing-query.html

      When I have rsyslog->Elasticsearch issues, I normally do three things:
      – check the Elasticsearch logs for errors (if you want to centralize those, here's a blog post: https://sematext.com/blog/2015/01/19/grok-elasticsearch-logs-with-logstash/ and we'll make an rsyslog config available for it soon)
      – start rsyslog in the foreground (rsyslogd -n). This will show any errors that you may have in the config (note that it won't stop with CTRL+C, you'll need to kill the process from another terminal)
      – start rsyslog in debug mode (rsyslogd -dn). Then you'll see what happens when the omelasticsearch action is triggered

      If you still can’t figure out what the problem is, you can make rsyslog write down the responses it got from Elasticsearch. Check the errorFile references in the documentation: http://www.rsyslog.com/doc/v8-stable/configuration/modules/omelasticsearch.html
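
      For reference, that error file is just another parameter on the action; something along these lines (the path is an arbitrary example) makes rsyslog write Elasticsearch's error replies to a file you can inspect:

      action(type="omelasticsearch"
          template="plain-syslog"
          searchIndex="logstash-index"
          dynSearchIndex="on"
          errorfile="/var/log/omelasticsearch-errors.log")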

      1. I think the problem is that the omelasticsearch module is not able to create the index.
        Here is an excerpt of the output of:

        rsyslogd -nd

        6055.457310049:7fa099b35700: omelasticsearch: pData reply: '{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","resource.type":"index_expression","resource.id":"logstash-2016.04.06","index":"logstash-2016.04.06"}],"type":"index_not_found_exception","reason":"no such index","resource.type":"index_expression","resource.id":"logstash-2016.04.06","index":"logstash-2016.04.06"},"status":404}'

        elasticsearch log

        [2016-04-06 15:11:46,308][WARN ][action.index ] [node-1] unexpected error during the primary phase for action [indices:data/write/index], request [index {[logstash-2016.04.06][events][null], source[{"@timestamp":"2016-04-06T14:58:56+02:00","host":"itnu1lp07","severity":"info","facility":"local3","tag":"rgmanager[102340]:","message":" [script] Executing /usr/local/bin/change_src_address.sh status"}]}]
        [logstash-2016.04.06] IndexNotFoundException[no such index]

        But I don’t understand why the index file is not created.

        Some suggestions?

        Thank You

        1. I think this is down to Elasticsearch’s configuration. By default, indices are created automatically if they’re not there, but that can be changed. omelasticsearch doesn’t create indices, it only pushes messages. Normally, you’ll add a template to Elasticsearch that allows one to push logs to a new index – so that the new index gets created with the settings you want. Here’s a blog post that explains a bit about templates: https://sematext.com/blog/2015/02/09/elasticsearch-mapping-types-for-json-logging/
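
          As a rough sketch (against the Elasticsearch 1.x API that was current at the time; the settings are just examples), such a template could be added like this:

          curl -XPUT 'localhost:9200/_template/logstash' -d '{
            "template": "logstash-*",
            "settings": { "number_of_shards": 1, "number_of_replicas": 1 }
          }'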

          1. Thank you very much Gheorghe for your support.
            Now I have a very clear idea of how this stuff works.
            So to close this post, I can only and simply say that there was a BIG error in elasticsearch.yml, my mistake:

            index.mapper.dynamic: false

            was not commented out as it is in the default config.
            Once I commented the line out (#index.mapper.dynamic: false), the Elasticsearch indexes work like a charm!! So simple 😉
            Thank you again.
            Roberto

  5. Thanks for writing this, it's great. I have an issue in that some of our hostnames have dashes in them and I guess because the host field is analyzed it's splitting them in two when I do visualizations, queries, etc. I would like to change that field to not analyzed. Is that something that I can change in rsyslog.conf, in this line?
    constant(value="\",\"host\":\"") property(name="hostname")

    thanks,
    mark

    1. You’re right, not analyzed is probably a better setting for the hostname field. That’s how we have it in Logsene as well. To do that on your own Elasticsearch, you can’t do it from rsyslog’s conf: you’d need to put a mapping that specifies index=not_analyzed in the definition of your hostname field. Here’s how you’d do it for one index http://www.elastic.co/guide/en/elasticsearch/reference/master/indices-put-mapping.html

      There are two problems with this approach: one is that you can’t change the definition of an existing field from analyzed to not_analyzed (you need to remove the index, put it back with the new mapping, re-insert your logs). The other one is that each new index needs to have this setting (in this post I’ve set one index per day in the form of logstash-YYYY.MM.DD).

      Normally, you’d put an index template that will apply to all newly created indices. Here’s a link to the docs for index template: http://www.elastic.co/guide/en/elasticsearch/reference/current/indices-templates.html

      If you need some inspiration as to what to put in that template, here’s what Logstash 1.4 puts by default: https://github.com/elastic/logstash/blob/1.4/lib/logstash/outputs/elasticsearch/elasticsearch-template.json
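
      As a sketch of the relevant part of such a template (Elasticsearch 1.x syntax; "events" is the type name omelasticsearch uses by default), making the host field not_analyzed for all future logstash-* indices could look like:

      curl -XPUT 'localhost:9200/_template/logstash_notanalyzed' -d '{
        "template": "logstash-*",
        "mappings": {
          "events": {
            "properties": {
              "host": { "type": "string", "index": "not_analyzed" }
            }
          }
        }
      }'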

  6. Using the newest version of rsyslogd and the Elasticsearch module. The logs that get into Elasticsearch are missing facility and priority. Any idea why?
    The _source field only contains this:
    {"@timestamp":"2014-12-03T14:48:46.980334+01:00","host":"riakcs","message":" Server listening on :: port 22."}

    1. Hi Stefan,

      That’s usually because of the template – can you make sure the facility and severity fields are there, like:

      constant(value="\",\"facility\":\"") property(name="syslogfacility-text")

      From your JSON it looks like those fields are completely missing. If only the property were wrong, the field would still be there but empty, so the constant part must be missing, too.

      Note that I wrote “severity” instead of “priority” because priority is a numerical field from the originating syslog message that encodes severity and facility together. I assumed you’re interested in severity and facility.

      I’ve tested this recipe with rsyslog 8.5.0, Elasticsearch 1.4.0 and Kibana 3.1.2 and things worked as expected. So if the problem persists for you, can you post your rsyslog and ES versions?

      1. Hi Radu,
        Thank you for answering!

        I have created a file with the following content, and placed it in /etc/rsyslog.d

        # START

        module(load="omelasticsearch") # for outputting to Elasticsearch
        # this is for index names to be like: logstash-YYYY.MM.DD
        template(name="logstash-index"
        type="list"
        option.json="on") {
        constant(value="logstash-")
        property(name="timereported" dateFormat="rfc3339" position.from="1" position.to="4")
        constant(value=".")
        property(name="timereported" dateFormat="rfc3339" position.from="6" position.to="7")
        constant(value=".")
        property(name="timereported" dateFormat="rfc3339" position.from="9" position.to="10")
        }

        # this is for formatting our syslog in JSON with @timestamp
        template(name="plain-syslog"
        type="list") {
        constant(value="{")
        constant(value="\"@timestamp\":\"") property(name="timereported" dateFormat="rfc3339")
        constant(value="\",\"host\":\"") property(name="hostname")
        constant(value="\",\"testhost\":\"") property(name="hostname")
        constant(value="\",\"severity\":\"") property(name="syslogseverity-text")
        constant(value="\",\"facility\":\"") property(name="syslogfacility-text")
        constant(value="\",\"tag\":\"") property(name="syslogtag" format="json")
        constant(value="\",\"message\":\"") property(name="msg" format="json")
        constant(value="\"}")
        }
        # this is where we actually send the logs to Elasticsearch (localhost:9200 by default)
        action(type="omelasticsearch"
        template="plain-syslog"
        # server="10.0.2.15"
        # serverport="9090"
        searchIndex="logstash-index"
        dynSearchIndex="on")

        #END

        I tried to insert an HTTP recording piece of software between rsyslogd and elasticsearch,
        and the fields weren't there. I tried to insert another field 'testhost' in the template above, and it appeared in the input to elasticsearch. But no severity or facility..

        My rsyslogd versions are
        ii rsyslog 8.6.0-2 amd64 reliable system and kernel logging daemon
        ii rsyslog-elasticsearch 8.6.0-2 amd64 Elasticsearch output support for rsyslog

        1. Hello Stefan and sorry for the late reply,

          I’ve just tried with 8.6.0 and the config file you provided (after I fixed the copy-paste errors because of the way characters were handled) and it worked. Here’s a sample log:

          {
          "@timestamp": "2014-12-09T18:13:50.005905+02:00",
          "host": "rgheorghe-suse",
          "testhost": "rgheorghe-suse",
          "severity": "info",
          "facility": "syslog",
          "tag": "rsyslogd:",
          "message": " [origin software=\"rsyslogd\" swVersion=\"8.6.0\" x-pid=\"4617\" x-info=\"http://www.rsyslog.com\"] start"
          }

          The “testhost” field is there, but so is severity and facility.

          The only difference I can think of is that you’re saying you’ve put your config in /etc/rsyslog.d. I’m putting this in /etc/rsyslog.conf with this at the beginning (as written in the post):

          module(load="imuxsock")

          The safer way is to put everything in one file to be sure what’s actually processed. You might have another config file that’s processed first.

          So maybe you can try with the same config in /etc/rsyslog.conf (and add the imuxsock line first, and whatever else you may use for listening). If it still doesn’t work, maybe we can move this to the rsyslog mailing list? A debug output might be useful in that case (i.e. start rsyslog via `rsyslogd -dn > debug.log`)

  7. Hello, I have a question: can we connect the Elasticsearch output to a Java application by a connector, and how? Thanks for your help, I need it so much

    1. Hello! You mean to send your Java logs to Elasticsearch by using this recipe? If yes, you can do that in two main ways:
      1) have your java app log to syslog somehow. For example, using the log4j syslog appender: http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/net/SyslogAppender.html
      2) log to a file, and have rsyslog tail that file with the imfile input module: http://www.rsyslog.com/doc/imfile.html

      I would go with 1) if it’s possible for your use-case
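
      For 1), a log4j 1.2 configuration could look roughly like the snippet below (a sketch; the appender name and facility are arbitrary). Note that SyslogAppender sends over the network via UDP, so on the rsyslog side you'd also load imudp and add a UDP input:

      # log4j.properties
      log4j.rootLogger=INFO, SYSLOG
      log4j.appender.SYSLOG=org.apache.log4j.net.SyslogAppender
      log4j.appender.SYSLOG.SyslogHost=localhost
      log4j.appender.SYSLOG.Facility=LOCAL0
      log4j.appender.SYSLOG.layout=org.apache.log4j.PatternLayout
      log4j.appender.SYSLOG.layout.ConversionPattern=%p %c - %m%n

      # rsyslog side: listen for UDP syslog
      module(load="imudp")
      input(type="imudp" port="514")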
