Black Friday log management (with the Elastic Stack) checklist

For this Black Friday, Sematext wishes you:

  • more products sold
  • more traffic and exposure
  • more logs 🙂

Now seriously, applications tend to generate a lot more logs on Black Friday, and they also tend to break down more – making those logs even more precious. If you’re using the Elastic Stack for centralized logging, in this post we’ll share some tips and tricks to prepare you for this extra traffic.

If you’re still grepping through your logs via ssh, doing that on Black Friday might be that much more painful, so you have two options:

  • get started with the Elastic Stack now. Here’s a complete ELK howto. It should take you about an hour to get started and you can move on from there. Don’t forget to come back to this post for tips! 🙂
  • use Logsene, which takes care of the E(lasticsearch) and K(ibana) from ELK for you. Most importantly for this season, we take care of scaling Elasticsearch. You can get started in 5 minutes with Logstash or choose another log shipper. Anything that can push data to Elasticsearch via HTTP can work with Logsene, since it exposes the Elasticsearch API. So you can log directly from your app or from a log shipper (here are all the documented options).

Either way, let’s move to the tips themselves.

Tips for Logstash and Friends

The big question here is: can the pipeline easily max out Elasticsearch, or will it become the bottleneck itself? If your logs go directly from your servers to Elasticsearch, there’s little to worry about: as you spin up more servers for Black Friday, your pipeline capacity for processing and buffering will grow as well.

You may get into trouble if your logs are funnelled through one (or a few) Logstash instances, though. If you find yourself in that situation you might check the following:

  • Bulk size. The ideal size depends on your Elasticsearch hardware, but usually you want to send a few MB at a time. Gigantic batches will put unnecessary strain on Elasticsearch, while tiny ones will add too much overhead. Calculate how many logs (of your average size) make up a few MB and you should be good.
  • Number of threads sending data. When one thread goes through a bulk reply, Elasticsearch shouldn’t be idling – it should get data from another thread. The optimal number of threads depends on whether these threads are doing something else (in Logstash, for example, pipeline threads also take care of parsing, which can be expensive) and on your destination hardware. As a rule of thumb, about 4 threads with few things to do (e.g. no grok or geoip in Logstash) per Elasticsearch data node should be enough to keep them busy. If threads have more processing to do, you may need more of them.
  • The same applies for processing data: many shippers work on logs in batches (recent versions of Logstash included) and can do this processing on multiple threads.
  • Distribute the load between all data nodes. This will prevent any one data node from becoming a hotspot. In Logstash specify an array of destination hosts. Or, you can start using Elasticsearch “client” nodes (with both and node.master set to false in elasticsearch.yml) and point Logstash to two of those (for failover).
  • The same applies for the shipper sending data to the central Logstash servers – the load needs to be balanced between them. For example, in Filebeat you can specify an array of destination Logstash hosts or you can use Kafka as a central buffer.
  • Make sure there’s enough memory to do the processing (and buffering, if the shipper buffers in memory). For Logstash, the default 1GB of heap may not cope with heavy load – depending on how much processing you do, it may need 2GB or more (monitoring Logstash’s heap usage will tell for sure).
  • If you use grok and have multiple rules, put the rules matching more logs and the cheaper ones earlier in the array. Or use Ingest Nodes to do the grok instead of Logstash.
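
To make the bulk size and thread count rules of thumb above concrete, here is a small Python sketch. The 5 MB target is only an assumption standing in for “a few MB”, and the function names are made up for illustration; they are not part of Logstash or Elasticsearch.

```python
import math


def docs_per_bulk(avg_log_bytes: int, target_bulk_mb: float = 5.0) -> int:
    """How many logs of average size add up to roughly target_bulk_mb.

    target_bulk_mb=5.0 is an assumed value for "a few MB"; tune it for
    your own Elasticsearch hardware.
    """
    if avg_log_bytes <= 0:
        raise ValueError("avg_log_bytes must be positive")
    target_bytes = target_bulk_mb * 1024 * 1024
    # Always send at least one document per bulk.
    return max(1, math.floor(target_bytes / avg_log_bytes))


def sender_threads(data_nodes: int, threads_per_node: int = 4) -> int:
    """Rule of thumb from the list above: about 4 lightly loaded sending
    threads per Elasticsearch data node."""
    return data_nodes * threads_per_node


if __name__ == "__main__":
    print(docs_per_bulk(500))   # logs per bulk for 500-byte logs
    print(sender_threads(3))    # threads for a 3-data-node cluster
```

If the sending threads also run grok or geoip, treat the thread count as a lower bound rather than a target.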

Tips for Elasticsearch

Let’s just dive into them:

  • Refresh interval. There’s an older blog post on how refresh interval influences indexing performance. The conclusions from it are still valid today: for Black Friday at least, you might want to relax the real-time-ness of your searches to get more indexing throughput.
  • Async transaction log. By default, Elasticsearch will fsync the transaction log after every operation (2.x) or request (5.x). You can relax this safety guarantee by setting index.translog.durability to async. This way it will fsync every 5s (default value for index.translog.sync_interval) and save you some precious IOPS.
  • Size based indices. If you’re using strict time-based indices (like one index every day), Black Friday traffic may cause a drop in indexing throughput like this (mainly because of merges):


Indexing throughput graph from SPM Elasticsearch monitor

In order to continue writing at that top speed, you’ll need to rotate indices before they reach that “wall size”, which is usually at 5-10GB per shard. The point is to rotate when you reach a certain size, and not purely by time, and use an alias to always write to the latest index (in 5.x this is made easier with the Rollover Index API).
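
As a rough sketch of size-driven rotation, the Python snippet below builds a request body for the Rollover Index API. It uses a document-count condition (max_docs) as a proxy for size, which assumes you know your average document size on disk. The function name and the sample numbers are hypothetical.

```python
def rollover_body(target_shard_gb: float, primary_shards: int,
                  avg_doc_bytes: int) -> dict:
    """Build a rollover request body that rotates the index once it holds
    roughly target_shard_gb of data per primary shard.

    Assumption: avg_doc_bytes is the average on-disk size of one document,
    which you would have to measure on your own indices.
    """
    if avg_doc_bytes <= 0 or primary_shards <= 0:
        raise ValueError("avg_doc_bytes and primary_shards must be positive")
    docs_per_shard = int(target_shard_gb * 1024 ** 3 // avg_doc_bytes)
    return {"conditions": {"max_docs": docs_per_shard * primary_shards}}


if __name__ == "__main__":
    # 5GB per shard, 4 primary shards, about 1KB per document (made-up numbers)
    print(rollover_body(5, 4, 1024))
```

You would send this body to the _rollover endpoint of your write alias on a schedule, so the alias moves to a fresh index as soon as the condition is met.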

  • Ensure load is balanced across data nodes. Otherwise some nodes will become bottlenecks. This requires your number of shards to be proportional to the number of data nodes. Feel free to twist Elasticsearch’s arm into balancing shards by configuring index.routing.allocation.total_shards_per_node: for example, if you have 4 shards and one replica on a 4-data-node cluster, you’ll want a maximum of 2 shards per node.
  • Overshard so you can scale out if you need to, while keeping your cluster balanced. You’d do this by setting a [reasonable] number of shards that has enough divisors. For example, if you have 4 data nodes then 12 shards and 1 replica per shard might work well. You could scale up to 6, 8, 12 or even 24 nodes and your cluster will still be perfectly balanced.
  • Relax the merge policy. In exchange for more indexing throughput, this will slow down your full-text searches a bit (though aggregations would perform about the same) and use some more heap and open files. Setting segments_per_tier to 50, max_merge_at_once to 20 and max_merged_segment to 500mb should give you a good boost.
  • Don’t store what you don’t need. Disable _all and search in specific fields (you can make “message” or some other general field the default search field by pointing index.query.default_field to it). Skip indexing fields not used for full-text search and skip doc values for fields on which you don’t aggregate.
  • Use doc values for aggregations (instead of the in-memory field data) – this is the default for all fields except analyzed strings since 2.0, but you’ll need to be extra careful if you’re still on 1.x. Otherwise you risk running out of heap and crashing or slowing down your cluster.
  • Use dedicated masters. This is also a stability measure that helps your cluster remain consistent even if load makes your data nodes unresponsive.
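
The shard arithmetic behind the balancing and oversharding tips can be checked with a few lines of Python. This is only an illustrative sketch; the helper names are made up.

```python
import math


def max_shards_per_node(primary_shards: int, replicas: int,
                        data_nodes: int) -> int:
    """Smallest total_shards_per_node value that still fits every shard copy."""
    total_copies = primary_shards * (1 + replicas)
    return math.ceil(total_copies / data_nodes)


def balanced_node_counts(primary_shards: int, replicas: int,
                         max_nodes: int) -> list:
    """Cluster sizes (up to max_nodes) on which every data node would hold
    exactly the same number of shard copies."""
    total_copies = primary_shards * (1 + replicas)
    return [n for n in range(1, max_nodes + 1) if total_copies % n == 0]


if __name__ == "__main__":
    # 4 shards + 1 replica on 4 data nodes, as in the example above
    print(max_shards_per_node(4, 1, 4))
    # 12 shards + 1 replica: which cluster sizes stay perfectly balanced?
    print(balanced_node_counts(12, 1, 24))
```

With 12 shards and 1 replica there are 24 shard copies, which is why 4, 6, 8, 12 and 24 nodes all divide evenly.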

You’ll find even more tips and tricks, as well as more details on implementing the above, in our Velocity 2016 presentation. But the ones described above should give you the most bang per buck (or rather, per time, but you know what they say about time) for this Black Friday.
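
As a rough illustration of the index-level tips above, the following Python sketch assembles the relaxed settings into one body. The 30s refresh interval is an assumed value (the tips only say to relax it), and the function name is hypothetical.

```python
import json


def relaxed_index_settings(refresh_interval: str = "30s") -> dict:
    """Index settings trading search freshness and durability for
    indexing throughput.

    refresh_interval="30s" is an assumption, not a recommendation from
    the post; the merge policy values are the ones listed above.
    """
    return {
        "index.refresh_interval": refresh_interval,
        # fsync the transaction log periodically instead of per request
        "index.translog.durability": "async",
        "index.merge.policy.segments_per_tier": 50,
        "index.merge.policy.max_merge_at_once": 20,
        "index.merge.policy.max_merged_segment": "500mb",
    }


if __name__ == "__main__":
    print(json.dumps(relaxed_index_settings(), indent=2))
```

The resulting JSON could go into an index template or an index settings update. Whether each setting can be changed on a live index depends on your Elasticsearch version, so check that before applying it.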

Final Words

Tuning & scaling Elasticsearch isn’t rocket science, but it often requires time, money or both. So if you’re not into taking care of all this plumbing, we suggest delegating this task to us by using Logsene, our log analytics SaaS. With Logsene, you’d get:

  • The same Elasticsearch API when it comes to indexing and querying. We have Kibana, too, in addition to our own UI, plus you can use Grafana’s Elasticsearch integration.
  • Free trials for any plan, even the Black Friday-sized ones. You can sign up for them without any commitment or credit card details.
  • No lock in – because of the Elasticsearch API, you can always go [back] to your own ELK Stack if you really want to manage your own Elasticsearch clusters. We can even help you with that via Elastic Stack consulting, training and production support.
  • A lot of extra goodies on top of Elasticsearch, like role-based authentication, alerting and integration with SPM for your application monitoring. This way you can have your metrics and logs in one place.

If, on the other hand, you are passionate about this stuff and work with it, you might like to hear that we’re hiring worldwide, on a wide range of positions (at the time of this writing there are openings for backend, frontend (UX, UI, ReactJS, Redux…), sales, work on Docker, consulting and training). 🙂

Elastic Stack Training

Elasticsearch Training, San Francisco & New York, October

If you are using Elasticsearch and are looking for Elasticsearch training to quickly improve your Elastic Stack skills, we’re running several Elasticsearch classes this October in San Francisco and New York.

All classes are also available virtually. This means you get to participate in the class, see the whiteboard, see and hear the instructor as well as other attendees, and they get to see and hear you, all without having to travel.

Have two people attend the training from the same company? The second one gets 25% off.

To see the full course details, full outline and information about the instructor, click on the class names below.

San Francisco:

New York City:

All classes include breakfast and lunch. If you have special dietary needs, please let us know ahead of time. If you have any questions about any of the classes you can use our live chat (see bottom-right of the page), email us or call us at 1-347-480-1610.

5 Logstash Alternatives

When it comes to centralizing logs to Elasticsearch, the first log shipper that comes to mind is Logstash. People hear about it even if it’s not clear what it does:
– Bob: I’m looking to aggregate logs
– Alice: you mean… like… Logstash?

When you get into it, you realize centralizing logs often implies a bunch of things, and Logstash isn’t the only log shipper that fits the bill:

  • fetching data from a source: a file, a UNIX socket, TCP, UDP…
  • processing it: appending a timestamp, parsing unstructured data, adding Geo information based on IP
  • shipping it to a destination. In this case, Elasticsearch. And because Elasticsearch can be down or struggling, or the network can be down, the shipper would ideally be able to buffer and retry

In this post, we’ll describe Logstash and its alternatives – 5 “alternative” log shippers (Filebeat, Fluentd, rsyslog, syslog-ng and Logagent), so you know which fits which use-case.

Elastic Stack Import-Export with Logstash & Logsene

In earlier posts, we explained how one can reindex data from one Elasticsearch cluster to another, or within the same Elasticsearch cluster, via tools like Logstash and rsyslog.

The same recipes apply to Logsene, as it exposes the Elasticsearch API. Not only can you push data to Logsene with everything that talks to Elasticsearch (such as Logstash), but you can also use Elasticsearch’s Scroll API to export data from Logsene. All you need to remember is that with Logsene, you need to specify your app token as the index name.

Migrating data from your in-house ELK stack to Logsene

Let’s say you already have an Elastic stack deployed, but you want to migrate existing logs to Logsene. Maybe because you’re spending too much time and money on managing and scaling Elasticsearch, and you’d like to outsource that. Or because you’d like built-in features of Logsene like role-based access control or anomaly detection. Either way, you can migrate your data and keep using Elasticsearch-focused tools:

input {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash-*"
  }
}

output {
  elasticsearch {
    hosts => ""
    manage_template => false
  }
}

NOTE: Since Logsene plans are based on ingestion volume and retention, that initial import throughput spike may influence your costs. That shouldn’t be a problem if you just started and have a big enough trial plan. Even if the trial is over and you go over the selected plan, you’ll pay at the same per-GB rate.

Reindexing data from one Logsene app to another

Let’s say you’re prototyping, you’re tweaking your Logstash grok rules, but you’d like to use a custom template. For the new template to apply, you’ll need a new index (i.e. a new Logsene app). So you can go ahead and create it, and then reindex the data from the first app with Logstash. Here’s a sample config (though you can also add filters to change data along the way). Except now, the source is not your in-house Elasticsearch cluster, but a Logsene app that already has logs you want to reindex:

input {
  elasticsearch {
    hosts => [""]
  }
}

output {
  elasticsearch {
    hosts => ""
    manage_template => false
  }
}

NOTE: If you want SSL encryption, just add ssl => true and change the port to 443.

Exporting data from Logsene

Even though Logsene comes with Amazon S3 log archiving, you might need to export your logs somewhere else using – you guessed it! – a similar config:

input {
  elasticsearch {
    hosts => [""]
    index => "LOGSENE_APP_TOKEN"
  }
}

output {
  file {
    path => "/mnt/big_disk/big_log"
  }
}

See? No lock-in! With Logsene you can also easily go back to self-hosted, if you want to build something custom around your ELK stack for example. We can actually help you with that, through Elasticsearch and logging trainings and through logging consulting.

Elasticsearch / Elastic Stack Training – NYC June 13-16

Next month, June 13-16, 2016, we will be running three Elastic Stack (aka ELK Stack) classes in New York City:

  1. June 13 & 14: Elasticsearch for Developers Training Workshop
  2. June 15: Elasticsearch Operations Training Workshop
  3. June 16: Elasticsearch for Logging Training Workshop

All classes cover Elasticsearch 2.x as well as Elasticsearch 5.x!

You can see the complete course outlines under Training Overview.  All three classes include lots of valuable hands-on exercises.  Be prepared to learn a lot!


  • 2-day course: $1,200 early bird rate (valid through June 1) and $1,500 afterwards.
  • 1-day course: $700 early bird rate (valid through June 1) and $800 afterwards.

There’s also a 50% discount for the purchase of a 2nd seat!

462 7th Avenue, New York, NY 10018 – see map

If you have any questions please get in touch.

Elasticsearch Ingest Node vs Logstash Performance

Starting from Elasticsearch 5.0, you’ll be able to define pipelines within it that process your data, in the same way you’d normally do it with something like Logstash. We decided to take it for a spin and see how this new functionality (called Ingest) compares with Logstash filters in both performance and functionality.

Specifically, we tested the grok processor on Apache common logs (we love logs here), which can be parsed with a single rule, and on CISCO ASA firewall logs, for which we have 23 rules. This way we could also check how both Ingest and Logstash scale when you start adding more rules.

Baseline performance

To get a baseline, we pushed logs with Filebeat 5.0alpha1 directly to Elasticsearch, without parsing them in any way. We used an AWS c3.large for Filebeat (2 vCPU) and a c3.xlarge for Elasticsearch (4 vCPU). We also installed SPM to monitor Elasticsearch’s performance.

It turned out that network was the bottleneck, which is why pushing raw logs doesn’t saturate the CPU:
raw logs CPU

Even though we got a healthy throughput rate of 12-14K EPS:
raw logs throughput

But raw, unparsed logs are rarely useful. Ideally, you’d log in JSON and push directly to Elasticsearch. Conveniently, Filebeat can parse JSON since 5.0. That said, throughput dropped to about 4K EPS because JSON logs are bigger and saturate the network:
Throughput of JSON logs

CPU dropped as well, but not that much because now Elasticsearch has to do more work (more fields to index):

This 4K EPS throughput/40 percent CPU ratio is the most efficient way to send logs to Elasticsearch – if you can log in JSON. If you can’t, you’ll need to parse them. So we added another c3.xl instance (4 vCPUs) to do the parsing, first with Logstash, then with a separate Elasticsearch dedicated Ingest node.

Logstash


With Logstash 5.0 in place, we pointed Filebeat to it, while tailing the raw Apache logs file. On the Logstash side, we have a beats listener, a grok filter and an Elasticsearch output:

input {
  beats {
    port => 5044
  }
}

filter {
  grok {
    match => ["message", "%{COMMONAPACHELOG}%{GREEDYDATA:additional_fields}"]
  }
}

output {
  elasticsearch {
    hosts => ""
    workers => 4
  }
}

The default number of 2 pipeline workers seemed enough, but we’ve specified more output workers to make up for the time each of them waits for Elasticsearch to reply. That said, network was again the bottleneck so throughput was capped at 4K EPS like with JSON logs:
Logstash apache logs throughput

Meanwhile, Logstash used just about the same amount of CPU as Elasticsearch, at 40-50%:
Logstash apache logs CPU usage

Then we parsed CISCO ASA logs. The config looks similar, except there were 23 grok rules instead of one. Logstash handled the load surprisingly well – throughput was again capped by the network, slightly lower than before because JSONs were bigger:
Logstash CISCO ASA grok throughput

While CPU usage only increased to 60-70%:
Logstash CISCO ASA CPU usage

This means the throughput-to-CPU ratio only went down by about 1.5x after adding a lot more rules. However, in both cases Logstash proved pretty heavy, using about the same CPU to parse the data as Elasticsearch used for indexing it. Let’s see if the Ingest node can do better.

Ingest node

We used the same c3.xl instance for Ingest node tests: we’ve set node.master and to false in its elasticsearch.yml, to make sure it only does grok and nothing else. We’ve also set node.ingest to false on the data node, so it can focus on indexing.

Next step was to define a pipeline that does the grok processing on the Ingest node:

curl -XPOST localhost:9200/_ingest/pipeline/apache?pretty -d '{
  "description": "grok apache logs",
  "processors": [
    {
      "grok": {
        "field": "message",
        "pattern": "%{COMMONAPACHELOG}%{GREEDYDATA:additional_fields}"
      }
    }
  ]
}'

Then, to trigger the pipeline for a certain document/bulk, we added the name of the defined pipeline to the HTTP parameters like pipeline=apache. We used curl this time for indexing, but you can add various parameters in Filebeat, too.

With Apache logs, the throughput numbers were nothing short of impressive (12-16K EPS):
ingest node apache logs grok throughput

This used up all the CPU on the data node, while the ingest node was barely breaking a sweat at 15%:
ingest node grok apache logs CPU usage

Because Filebeat only sent raw logs to Elasticsearch (specifically, the dedicated Ingest node), there was less strain on the network. The Ingest node, on the other hand, also acted like a client node, distributing the logs (now parsed) to the appropriate shards, using the node-to-node transport protocol. Overall, the Ingest node provided ~10x better CPU-to-throughput ratio than Logstash.
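
The ~10x figure can be reproduced with a quick back-of-the-envelope calculation. The sketch below assumes the midpoints of the ranges reported above (14K EPS for Ingest, 45% CPU for Logstash); those midpoints are an assumption, not measured values, and the function name is made up.

```python
def eps_per_cpu_percent(events_per_second: float, cpu_percent: float) -> float:
    """Throughput-to-CPU ratio: events per second for each percent of CPU."""
    if cpu_percent <= 0:
        raise ValueError("cpu_percent must be positive")
    return events_per_second / cpu_percent


# Logstash: about 4K EPS at 40-50% CPU (midpoint 45% assumed)
logstash_ratio = eps_per_cpu_percent(4000, 45)
# Ingest node: 12-16K EPS (midpoint 14K assumed) at about 15% CPU
ingest_ratio = eps_per_cpu_percent(14000, 15)

improvement = ingest_ratio / logstash_ratio

if __name__ == "__main__":
    print(round(improvement, 1))
```

Both ratios count only the CPU of the node doing the parsing, which matches how the two setups are compared in the text.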

Things still look better with CISCO ASA logs, but not as dramatically. We have multiple sub-types of logs here, and therefore multiple grok rules. With Logstash, you can specify an array of match directives:

grok {
  match => [
   "cisco_message", "%{CISCOFW106001}",
   "cisco_message", "%{CISCOFW106006_106007_106010}",

There’s no such thing for Ingest node yet, so you need to define one rule, and then use the on_failure block to define another grok rule (effectively saying “if this rule doesn’t match, try that one”) and keep nesting like that until you’re done:

"grok": {
  "field": "cisco_message",
  "pattern": "%{CISCOFW106001}",
  "on_failure": [
    {
      "grok": {
        "field": "cisco_message",
        "pattern": "%{CISCOFW106006_106007_106010}",
        "on_failure": [...
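
Writing 23 levels of nesting by hand is error-prone, so one option is to generate the structure. The Python sketch below builds the nested on_failure chain from an ordered list of patterns. The helper name is hypothetical, and it keeps the same "field" and "pattern" keys as the snippet above, which may differ in other Elasticsearch versions.

```python
import json


def nested_grok(field: str, patterns: list) -> dict:
    """Chain grok processors so that each one falls back to the next
    pattern through its on_failure block."""
    if not patterns:
        raise ValueError("at least one pattern is required")
    processor = None
    # Build from the last pattern outwards, so the first pattern ends up
    # as the outermost (first tried) processor.
    for pattern in reversed(patterns):
        body = {"field": field, "pattern": pattern}
        if processor is not None:
            body["on_failure"] = [processor]
        processor = {"grok": body}
    return processor


if __name__ == "__main__":
    chain = nested_grok("cisco_message", [
        "%{CISCOFW106001}",
        "%{CISCOFW106006_106007_106010}",
    ])
    print(json.dumps(chain, indent=2))
```

Since the first matching rule wins, it pays to order the list with the most frequently matching patterns first.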

The other problem is performance. Because now there are up to 23 rules to evaluate, throughput goes down to about 10K EPS:
Ingest node CISCO ASA grok throughput

And the CPU bottleneck shifts to the Ingest node:
Ingest node CISCO ASA grok CPU

Overall, the throughput-to-CPU ratio of the Ingest node dropped by a factor of 9 compared to the Apache logs scenario.

Conclusions


  • Logstash is easier to configure, at least for now, and performance didn’t deteriorate as much when adding rules
  • Ingest node is lighter across the board. For a single grok rule, it was about 10x faster than Logstash
  • Ingest nodes can also act as “client” nodes
  • Define the grok rules matching most logs first, because both Ingest and Logstash exit the chain on the first match by default

You’ve made it all the way down here? Bravo! If you need any help with Elasticsearch – don’t forget @sematext does Elasticsearch Consulting, Production Support, as well as Elasticsearch Training.

Sending your Windows Event Logs to Logsene using NxLog and Logstash

There are a lot of sources of logs these days. Some may come from mobile devices, some from your Linux servers used to host data, while others can be related to your Docker containers. They are all supported by Logsene. What’s more, you can also ship logs from your Microsoft Windows based hosts and visualize them using Logsene. In this blog post we’ll show how to send your Windows Event Logs to Logsene in a way that will let you build great visualizations and really see what is happening on your Windows-based systems.

Recipe: rsyslog + Kafka + Logstash

This recipe is similar to the previous rsyslog + Redis + Logstash one, except that we’ll use Kafka as a central buffer and connecting point instead of Redis. You’ll have more of the same advantages:

  • rsyslog is light and crazy-fast, including when you want it to tail files and parse unstructured data (see the Apache logs + rsyslog + Elasticsearch recipe)
  • Kafka is awesome at buffering things
  • Logstash can transform your logs and connect them to N destinations with unmatched ease

There are a couple of differences to the Redis recipe, though:

  • rsyslog already has Kafka output packages, so it’s easier to set up
  • Kafka has a different set of features than Redis (trying to avoid flame wars here) when it comes to queues and scaling

As with the other recipes, I’ll show you how to install and configure the needed components. The end result would be that local syslog (and tailed files, if you want to tail them) will end up in Elasticsearch, or a logging SaaS like Logsene (which exposes the Elasticsearch API for both indexing and searching). Of course you can choose to change your rsyslog configuration to parse logs as well (as we’ve shown before), and change Logstash to do other things (like adding GeoIP info).

Getting the ingredients

First of all, you’ll probably need to update rsyslog. Most distros come with ancient versions and don’t have the plugins you need. From the official packages you can install:

If you don’t have Kafka already, you can set it up by downloading the binary tar. And then you can follow the quickstart guide. Basically you’ll have to start Zookeeper first (assuming you don’t have one already that you’d want to re-use):

bin/ config/

And then start Kafka itself and create a simple 1-partition topic that we’ll use for pushing logs from rsyslog to Logstash. Let’s call it rsyslog_logstash:

bin/ config/
bin/ --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic rsyslog_logstash

Finally, you’ll need Logstash. At the time of writing this, we have a beta of 2.0, which comes with lots of improvements (including huge performance gains of the GeoIP filter I touched on earlier). After downloading and unpacking, you can start it via:

bin/logstash -f logstash.conf

Though you also have packages, in which case you’d put the configuration file in /etc/logstash/conf.d/ and start it with the init script.

Configuring rsyslog

With rsyslog, you’d need to load the needed modules first:

module(load="imuxsock")  # will listen to your local syslog
module(load="imfile")    # if you want to tail files
module(load="omkafka")   # lets you send to Kafka

If you want to tail files, you’d have to add definitions for each group of files like this:


Then you’d need a template that will build JSON documents out of your logs. You would publish these JSON documents to Kafka and consume them with Logstash. Here’s one that works well for plain syslog and tailed files that aren’t parsed via mmnormalize:

template(name="json_lines" type="list" option.json="on") {
  property(name="timereported" dateFormat="rfc3339")

By default, rsyslog has a memory queue of 10K messages and has a single thread that works with batches of up to 16 messages (you can find all queue parameters here). You may want to change:
– the batch size, which also controls the maximum number of messages to be sent to Kafka at once
– the number of threads, which would parallelize sending to Kafka as well
– the size of the queue and its nature: in-memory (default), disk or disk-assisted

In a rsyslog->Kafka->Logstash setup I assume you want to keep rsyslog light, so these numbers would be small, like:

  queue.workerthreads="1"      # threads to work on the queue
  queue.dequeueBatchSize="100" # max number of messages to process at once
  queue.size="10000"           # max queue size

Finally, to publish to Kafka you’d mainly specify the brokers to connect to (in this example we have one listening to localhost:9092) and the name of the topic we just created:


Assuming Kafka is started, rsyslog will keep pushing to it.

Configuring Logstash

This is the part where we pick the JSON logs (as defined in the earlier template) and forward them to the preferred destinations. First, we have the input, which will use the Kafka topic we created. To connect, we’ll point Logstash to Zookeeper, and it will fetch all the info about Kafka from there:

input {
  kafka {
    zk_connect => "localhost:2181"
    topic_id => "rsyslog_logstash"
  }
}

At this point, you may want to use various filters to change your logs before pushing to Logsene or Elasticsearch. For this last step, you’d use the Elasticsearch output:

output {
  elasticsearch {
    hosts => "" # it used to be "host" and "port" pre-2.0
    ssl => "true"
    index => "your Logsene app token goes here"
    manage_template => false
    #protocol => "http" # removed in 2.0
    #port => "443" # removed in 2.0
  }
}

And that’s it! Now you can use Kibana (or, in the case of Logsene, either Kibana or Logsene’s own UI) to search your logs!

Recipe: rsyslog + Redis + Logstash

OK, so you want to hook up rsyslog with Logstash. If you don’t remember why you want that, let me give you a few hints:

  • Logstash can do lots of things, it’s easy to set up but tends to be too heavy to put on every server
  • you have Redis already installed so you can use it as a centralized queue. If you don’t have it yet, it’s worth a try because it’s very light for this kind of workload.
  • you have rsyslog on pretty much all your Linux boxes. It’s light and surprisingly capable, so why not make it push to Redis in order to hook it up with Logstash?

In this post, you’ll see how to install and configure the needed components so you can send your local syslog (or tail files with rsyslog) to Redis for buffering, then use Logstash to ship the logs to Elasticsearch or to a logging SaaS like Logsene (which exposes the Elasticsearch API for both indexing and searching), where you can search and analyze them with Kibana:



Replaying Elasticsearch Slowlogs with Logstash and JMeter

Sometimes we just need to replay production queries – whether it’s because we want a realistic load test for the new version of a product or because we want to reproduce, in a test environment, a bug that only occurs in production (isn’t it lovely when that happens? Everything is fine in tests but when you deploy, tons of exceptions in your logs, tons of alerts from the monitoring system…).

With Elasticsearch, you can enable slowlogs to make it log queries taking longer (per shard) than a certain threshold. You can change settings on demand. For example, the following request will record all queries for test-index:

curl -XPUT localhost:9200/test-index/_settings -d '{
  "" : "1ms"
}'

You can run those queries from the slowlog in a test environment via a tool like JMeter. In this post, we’ll cover how to parse slowlogs with Logstash to write only the queries to a file, and how to configure JMeter to run queries from that file on an Elasticsearch cluster.
