Elasticsearch Monitoring Guide

This article (the first of a four-part series) explains how to get started developing an Elasticsearch monitoring strategy. In subsequent articles (part 2, part 3 and part 4), we’ll discuss the top 10 Elasticsearch metrics to monitor, followed by Elasticsearch open source monitoring tools, and then explore how to monitor Elasticsearch with Sematext.

Let’s dive in, starting from the very beginning with the question “What is Elasticsearch?”

What is Elasticsearch?

Search and analytics are key features of modern software applications. Scalability and the capability to handle large volumes of data in near real-time are demanded by many applications such as mobile apps, web, and data analytics applications. Today, autocomplete in text fields, search suggestions, location search, and faceted navigation are standards in usability.

Elasticsearch is an open-source text search engine based on Lucene, initially published by Shay Banon in 2010. It follows a “shared nothing architecture” and has features like easy scalability, near real-time search, and aggregations (facets), paired with developer-friendly APIs and client libraries for many programming languages. These features made Elasticsearch widely used for the development and integration of search and analytics functions, as well as for DevOps uses, such as log searching. The availability of special data types such as IP addresses and geo-shapes supports a broad range of possible applications.

Distributed systems are complex, but Elasticsearch makes many decisions automatically and provides a clear API for client processes. Scaling Elasticsearch is, therefore, much easier than with many other systems, though large Elasticsearch clusters come with their own set of issues and often require Elasticsearch expertise. Every application has a different requirement profile for Elasticsearch optimizations:

               E-Commerce Systems    Centralized Logging    Geo-Location Apps
Geo-Queries    Medium                Low                    High


Please note: Tuning is essential! Any system tuning must be supported by performance measurements; that’s why a clear understanding of monitoring and the implications of changed metrics is essential for anyone using Elasticsearch seriously.

This article introduces the methods for monitoring and tuning, explains how Elasticsearch works and puts the focus on the most relevant system settings and metrics.

Performance Monitoring

Before digging into the details of Elasticsearch, cluster setups, and performance metrics, we need a methodology for putting the information from this blog post into practice. As in other engineering disciplines, there is development, quality assurance, and operations, executed in iterative steps for all components. All phases involve measuring metrics (you’ll learn about the top 10 Elasticsearch metrics to watch in part 2 of this blog post series) to meet given specifications or prove assumptions made by the engineers or their spiky-haired bosses. In the software industry the specifications are often unclear or change rapidly; nevertheless, failures in backend services should not reach end users. This is why DevOps professionals like to know the capabilities and the current state of the systems they operate.


Iteration for system tuning

Where exactly is detailed application performance monitoring required?

  • Development and POCs
    • See how the use case and implementation affect system performance. The result might lead to better algorithms or different concepts in the application or an improved setup for the specific use case.
  • QA – Benchmark and stress tests
    • See how the setup performs under load. The result might lead to the tuning of specific settings, changes in infrastructure or optimized algorithms.
  • Operation of production clusters
    • See how reliable the current setup is. It might help to detect bottlenecks, plan for new resources and support root cause analysis.
    • Proactive monitoring, combined with acting on what it reveals, is an effective way to prevent system failures before they happen.

To complete the picture on this often underestimated issue, here are a few more hints to make these procedures effective:

  • Keep monitoring and logging independent from the production cluster – metrics and logs continuously generate large data volumes at a high rate. During a critical system state, this extra load could cause a system overload or data loss. Imagine what happens when many processes log a growing number of errors to an already unhealthy cluster!
  • Correlate Events, Logs and Metrics – metrics indicate that something happened; the evidence of what exactly happened is often available in logs and system events.
  • Include monitoring of surrounding systems – the best case would be to cover the whole application stack, including OS and network metrics. For example, Elasticsearch integrates with Hadoop and is often used behind reverse proxy servers. If one of these components causes trouble, only a side effect might be visible when monitoring coverage is partial.
  • Share reports in the organization – it’s much easier to communicate when everybody on the team has direct access to the facts:
    • save the results of measurements before and after testing
    • take screenshots or store links to the graphs
    • add them to the documentation.
  • Use anomaly detection and intelligent alerts – save time that would otherwise be spent watching lots of dashboards. Simple threshold-based alerts don’t work well in operational systems where the load goes up and down, because they trigger each time a limit is passed. Good monitoring solutions provide machine learning algorithms for anomaly detection and let you adjust their settings.
  • Integrate alerting with workflows in your organization – a typical use case is forwarding to incident management platforms to route alerts to the right support desk.
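
To make the difference between threshold-based alerts and anomaly detection more concrete, here is a minimal sketch in Python. It is not how any particular monitoring product works: the rolling-window rule, the window size, the factor k and the latency samples are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def static_alerts(values, limit):
    """Return the positions where a fixed threshold is exceeded."""
    return [i for i, value in enumerate(values) if value > limit]


def anomaly_alerts(values, window=5, k=3.0):
    """Return the positions where a value deviates by more than
    k standard deviations from the mean of the preceding window."""
    alerts = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(values[i] - mu) > k * sigma:
            alerts.append(i)
    return alerts


# Hypothetical search latency samples in milliseconds.
latency = [90, 110, 95, 105, 100, 108, 92, 104, 98, 250, 101]

print(static_alerts(latency, 100))  # fires on every small fluctuation above 100
print(anomaly_alerts(latency))      # fires only on the spike
```

With these samples the fixed limit of 100 ms fires six times, while the rolling rule reports only the spike to 250 ms.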

Elasticsearch Working Principles

Setup, tuning, and troubleshooting of Elasticsearch require a basic understanding of how Elasticsearch works and a deeper knowledge of the important functions and settings. The basic principle of Elasticsearch is the “shared nothing” architecture:

“A shared nothing architecture (SN) is a distributed computing architecture in which each node is independent and self-sufficient, and there is no single point of contention across the system. More specifically, none of the nodes share memory or disk storage.” Definition by Wikipedia

In other words, different nodes keep their own data and perform distributed computing. This is the base for scalability by ‘simply’ adding additional nodes to a cluster. The data structure used by Elasticsearch is an inverted index (using Lucene). An inverted index is a mapping of each unique ‘word’ (content) to the list of documents (location) containing that word, which makes it possible to locate documents with given keywords very quickly.
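
The idea of an inverted index can be sketched in a few lines of Python. This is only a toy illustration under simplifying assumptions (whitespace tokenization, lowercase terms, made-up sample documents); Lucene's real index structures are far more elaborate.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each unique term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain all query terms."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents.
docs = {
    1: "Elasticsearch is a search engine",
    2: "Lucene is a search library",
    3: "Monitoring a cluster",
}
index = build_inverted_index(docs)
print(search(index, "search engine"))
```

Looking up a keyword is a dictionary access followed by set intersections, which is why locating documents by keyword does not require scanning every document.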

Index information is stored in one or multiple partitions, also called shards. Elasticsearch is able to distribute and allocate shards dynamically to the nodes in a cluster. This mechanism makes it flexible with regard to data distribution. Redundancy can be provided by distributing replica shards (‘copies’ of the primary shards) to the cluster nodes. Index operations use primary shards and search queries use both shard types. Having multiple nodes and replicas increases query performance. Various options, defined either in the static configuration file or set dynamically via the exposed HTTP API, are available to control the behavior of Elasticsearch.
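
As a rough sketch of how shards and replicas are chosen, the following Python snippet builds the request for creating an index with a given number of primary shards and replicas. The helper names and the index name "customer" are assumptions for illustration; the snippet only constructs the request and does not talk to a cluster.

```python
import json


def create_index_request(name, primary_shards, replicas):
    """Build method, path and JSON body for creating an index."""
    body = {
        "settings": {
            "index": {
                "number_of_shards": primary_shards,   # fixed at creation time
                "number_of_replicas": replicas,       # can be changed later
            }
        }
    }
    return "PUT", f"/{name}", json.dumps(body)


def total_shards(primary_shards, replicas):
    """Each primary shard plus its replica copies must be placed on nodes."""
    return primary_shards * (1 + replicas)


method, path, body = create_index_request("customer", 3, 1)
print(method, path, body)
print(total_shards(3, 1))
```

Three primary shards with one replica each result in six shards that Elasticsearch distributes across the available nodes.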

A lot of new expressions appeared in the previous two paragraphs. Let’s walk through the following ‘vocabulary’ list to make the next sections easier to understand.

Elasticsearch Vocabulary: Understanding Elasticsearch Engineers

JVM – Java Virtual Machine, a runtime engine that executes bytecode on many operating system platforms.
Index – In terms of data modeling, it could be compared to a collection in MongoDB or CouchDB. A single index can hold one data type, with its own data structure. The schema is defined by the Mapping. An index is built from 1-N primary shards, which can have 0-N replica shards.
Shard – A shard is a single Lucene index instance, which is managed by Elasticsearch. Elasticsearch knows two types of shards:

  • primary shards, or active shards that hold the data
  • replica shards, or copies of the primary shard

Mapping – A schema definition for the index. Existing field definitions can only be changed as long as no documents are indexed. Extending the mapping with new fields or adding sub-fields is possible at any time, but changing the type of a field is a more complex operation that includes re-indexing the data.

When no mapping is defined, Elasticsearch tries to detect the type of each field (String, Number, IP, Geo-Point). It then creates an automatic mapping for the data type, sets default analyzers for strings and adds the “keyword” sub-field (not analyzed). By default you get a string mapped as both text and a keyword sub-field, so you can do full-text search on one hand, and exact matches, sorting and aggregations on the other. It’s important to define a correct mapping to avoid problems at query time, e.g., when a wrong analyzer is used, or when a field gets automatically identified as Number and, later on, that same field contains text, causing the indexing to fail.

Segments – Chunks of a shard (Lucene index).
Document – A document is the main entity in Elasticsearch. Documents are represented in JSON format. Documents are stored and indexed. The original is represented as “_source” in the API besides the actual indexed fields of a document. Search is only possible in indexed fields, and retrieving the original field content is only possible in fields defined as “stored” in the Mapping (aside from the mentioned “_source” object that holds the complete document values). For efficient field-based display, the stored flag should be set when the “_source” objects are large; this can reduce network traffic and speed up the display of results.
Node – A node is a single running Elasticsearch process. Nodes discover other nodes in the cluster by their shared cluster name. Depending on the node configuration, multicast or unicast discovery is used. Multiple nodes can run on a single physical server, VM, or container.
Cluster – A cluster consists of one or more nodes. Each cluster has a single master node, which is automatically elected (e.g., when the current master node fails).
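
To illustrate the Mapping entry above, here is a sketch of an explicit mapping that mirrors what the automatic mapping does for strings (text plus a “keyword” sub-field). The field names are hypothetical, and the typeless request body shown here assumes Elasticsearch 7.x or later; older versions nest the properties under a type name.

```python
import json


def text_with_keyword():
    """A string field searchable as full text, with an exact-match sub-field."""
    return {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    }


# Hypothetical fields for a customer index.
mapping = {
    "mappings": {
        "properties": {
            "name": text_with_keyword(),        # full text + exact match
            "visits": {"type": "integer"},      # explicit, so text is rejected early
            "client_ip": {"type": "ip"},
            "location": {"type": "geo_point"},
        }
    }
}

# This body would be sent when creating the index, e.g. PUT /customer
print(json.dumps(mapping, indent=2))
```

Full-text queries would target "name", while sorting and aggregations would use "name.keyword".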

Scaling an Elasticsearch Cluster

Installing an Elasticsearch server is simple. Understanding its working principles and how configuration influences its metrics is the basis for safe operation. Looking at typical Elasticsearch projects, we see the following setup demands:

  • High index rate
  • Fast query response
  • Large capacity

In many use cases a combination of the above may be demanded, which leads to a setup with nodes dedicated to different tasks: master nodes, client nodes (load balancers) and data nodes for storage. Each node type can be tuned to its specific task and the system can scale by increasing the number of data, client or master nodes. Having three dedicated master nodes, not used for anything other than cluster management, reduces the chance of instability (e.g., the “Split Brain Problem”). Nevertheless, balanced tuning is the only way to achieve good results with conflicting requirements like high index rate, fast queries, and high capacity.

Example: Scaling Elasticsearch with different node types

  • Define different node types – To create different node types, set the following values in the Elasticsearch configuration file (these are the legacy settings; Elasticsearch 7.9 and later express the same thing with node.roles):
  • Client Node / Load Balancer – this node can serve as the HTTP endpoint for client applications. It holds no data and is not a master node.
    • node.data: false
    • node.master: false
  • Master Node – is a master node and holds no data
    • node.master: true
    • node.data: false
  • Data Node – holds data and is not a master node
    • node.master: false
    • node.data: true

Elasticsearch and Java

Elasticsearch is written in the Java programming language. The source code is compiled to portable bytecode for the Java Virtual Machine (JVM), which is available on many operating system platforms. The JVM has its own concept of memory management. First of all, each Java process has a limit on the amount of heap memory it can use. Memory limits can be configured via command line options to the JVM, e.g., minimum and maximum heap. In addition, Java uses a garbage collector. Unlike memory management in C/C++, where unused memory needs to be released by a free/delete instruction, Java programs don’t need to free memory explicitly. The garbage collector tracks objects that are no longer referenced anywhere and frees their memory according to its garbage collection strategy, which typically causes a delay. Depending on the Java Runtime Environment in use, garbage collection might behave differently.

Elasticsearch runs in a JVM, so the optimal settings for the JVM and monitoring of the garbage collector and memory usage are critical. There are several things to consider with regard to JVM and operating system memory settings:

  • Avoid the JVM process getting swapped to disk.
  • Define the heap memory for Elasticsearch.
  • Monitor memory metrics and merge times of indices to see the actual demand of the Elasticsearch server. Setting the heap size too high might cause the index merge times (relates to disk IO) and garbage collection times to increase.
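
The memory metrics mentioned above are exposed by the node stats API (GET /_nodes/stats/jvm). The following sketch summarizes heap usage and old-generation garbage collection per node from such a response. The sample data is made up, and the 75% warning level is an assumption for illustration, not an official recommendation.

```python
def summarize_jvm(node_stats, heap_warn_percent=75):
    """Extract heap usage and old-generation GC figures for each node."""
    report = []
    for node_id, node in node_stats["nodes"].items():
        jvm = node["jvm"]
        heap_pct = jvm["mem"]["heap_used_percent"]
        old_gc = jvm["gc"]["collectors"]["old"]
        report.append({
            "node": node.get("name", node_id),
            "heap_used_percent": heap_pct,
            "old_gc_count": old_gc["collection_count"],
            "old_gc_time_ms": old_gc["collection_time_in_millis"],
            "heap_warning": heap_pct >= heap_warn_percent,  # assumed threshold
        })
    return report


# Hypothetical, heavily shortened response of GET /_nodes/stats/jvm.
sample = {
    "nodes": {
        "node-id-1": {
            "name": "data-1",
            "jvm": {
                "mem": {"heap_used_percent": 42},
                "gc": {"collectors": {"old": {
                    "collection_count": 3, "collection_time_in_millis": 210}}},
            },
        },
        "node-id-2": {
            "name": "data-2",
            "jvm": {
                "mem": {"heap_used_percent": 88},
                "gc": {"collectors": {"old": {
                    "collection_count": 57, "collection_time_in_millis": 9400}}},
            },
        },
    }
}

for entry in summarize_jvm(sample):
    print(entry)
```

Tracking these values over time, rather than looking at a single snapshot, shows whether the heap size matches the actual demand.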

Please refer to Chapter „5.5 Java — Heap Usage and Garbage Collection“ for further details of the tips above.

How Indexing Works

Documents are indexed by HTTP POST requests containing the JSON documents. The URL specifies the index name, the type name and an optional document ID. Using the curl command-line tool, an index operation looks like this:

curl -XPOST -H "Content-Type: application/json" 'localhost:9200/customer/_doc/1?pretty' -d '
{
  "name": "John Doe"
}'

The response:

{
  "_index" : "customer",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

But what happens inside?

Elasticsearch looks for the node with the primary shard to index the document. In more detail: the node collects the data and writes it first to the write-ahead log (called the transaction log), and then to immutable segment files via Lucene. The refresh interval configuration parameter (refresh_interval) defines when these segments are added to the shard and become visible for search. Aside from the primary shard, all replica shards are updated as well. In the case of a document update operation, Elasticsearch marks the current document as deleted and writes the new document to disk. A merge later combines smaller segments into larger ones and, in the process, cleans up deleted documents. This means that document updates are costly operations, which is why many systems have indexing as the last step of data processing or use preprocessing queues.
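
The lifecycle described above can be pictured with a small toy model. This is deliberately simplified and is not how Lucene or Elasticsearch are implemented: the class and method names are invented for illustration, segments are plain dictionaries, and replicas and disk I/O are left out.

```python
class ToyShard:
    """Toy model of one shard: transaction log, buffer, segments, merge."""

    def __init__(self):
        self.translog = []    # write-ahead log of index operations
        self.buffer = {}      # indexed, but not yet visible to search
        self.segments = []    # immutable, searchable segments
        self.deleted = set()  # (segment number, doc ID) marked as deleted

    def index(self, doc_id, doc):
        self.translog.append((doc_id, doc))
        self.buffer[doc_id] = doc

    def refresh(self):
        """Turn buffered documents into a new searchable segment."""
        if not self.buffer:
            return
        for seg_no, segment in enumerate(self.segments):
            for doc_id in self.buffer:
                if doc_id in segment:
                    # An update marks the old copy deleted; it stays on disk.
                    self.deleted.add((seg_no, doc_id))
        self.segments.append(dict(self.buffer))
        self.buffer.clear()

    def get_searchable(self, doc_id):
        for seg_no, segment in enumerate(self.segments):
            if doc_id in segment and (seg_no, doc_id) not in self.deleted:
                return segment[doc_id]
        return None

    def merge(self):
        """Combine all segments into one and drop deleted documents."""
        merged = {}
        for seg_no, segment in enumerate(self.segments):
            for doc_id, doc in segment.items():
                if (seg_no, doc_id) not in self.deleted:
                    merged[doc_id] = doc
        self.segments = [merged] if merged else []
        self.deleted.clear()


shard = ToyShard()
shard.index(1, {"name": "John Doe"})
print(shard.get_searchable(1))  # not visible before the refresh
shard.refresh()
shard.index(1, {"name": "Jane Doe"})  # update of the same document
shard.refresh()
print(len(shard.segments), shard.get_searchable(1))
shard.merge()
print(len(shard.segments), shard.segments)
```

After the update, both versions of the document occupy space until the merge removes the one marked as deleted, which is the reason updates are comparatively expensive.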

The throughput of merges is auto-throttled to avoid disk I/O problems. The merge scheduler runs in multiple threads when needed. The maximum number of merge threads can be configured to control the maximum number of parallel merges.


Elasticsearch index operation

When each document is sent in a separate HTTP request, it creates overhead for transmission and for opening, writing, and closing files.

That’s why the bulk indexing API (_bulk) is very relevant to indexing performance. Using the bulk index format, many documents can be transmitted to Elasticsearch in a single HTTP request, and Elasticsearch can optimize the index operation. Please note that the default limit for the size of an HTTP POST request is 100 MB (this number can be adjusted in the Elasticsearch configuration). Clients need to be aware of this setting because larger bulk index requests will fail.
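
A bulk request body consists of newline-delimited JSON: one action line followed by one document line per document. The sketch below builds such bodies and splits them so that each stays below a size limit. The helper name and the 10 MB default are assumptions for illustration; the body would be sent to the _bulk endpoint with the content type application/x-ndjson.

```python
import json


def bulk_bodies(index_name, docs, max_bytes=10 * 1024 * 1024):
    """Yield bulk request bodies, each at most max_bytes in size
    (a single oversized document is still sent on its own)."""
    batch, size = [], 0
    for doc_id, doc in docs:
        action = json.dumps({"index": {"_index": index_name, "_id": doc_id}})
        entry = action + "\n" + json.dumps(doc) + "\n"
        entry_size = len(entry.encode("utf-8"))
        if batch and size + entry_size > max_bytes:
            yield "".join(batch)
            batch, size = [], 0
        batch.append(entry)
        size += entry_size
    if batch:
        yield "".join(batch)  # every body ends with a newline, as _bulk requires


# Hypothetical documents.
docs = [("1", {"name": "John Doe"}), ("2", {"name": "Jane Doe"})]
for body in bulk_bodies("customer", docs):
    print(body)
```

Keeping each body well below the HTTP request size limit avoids rejected requests while still saving the per-request overhead.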

A best practice for faster indexing is to increase the “refresh interval,” which is one second by default. This means that documents are “committed” and available for search after one second. During a bulk index, this creates overhead because the underlying Lucene index is reopened every second during the operation. Raising the refresh interval from the default of one second to five or thirty seconds might show results like this:

  • refresh_interval: 1s   – 2.0K docs/s
  • refresh_interval: 5s   – 2.5K docs/s
  • refresh_interval: 30s – 3.4K docs/s
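
The refresh interval is a dynamic index setting, so it can be changed on an existing index. Here is a minimal sketch of the corresponding request. The function names, the index name and the base URL are assumptions, and the send helper is only defined here, not called, since it needs a reachable cluster.

```python
import json
import urllib.request


def refresh_interval_request(index_name, interval):
    """Build the request that changes the refresh interval of an index."""
    body = {"index": {"refresh_interval": interval}}
    return "PUT", f"/{index_name}/_settings", json.dumps(body)


def send(base_url, method, path, body):
    """Send a prepared request to a cluster, e.g. http://localhost:9200."""
    request = urllib.request.Request(
        base_url + path,
        data=body.encode("utf-8"),
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


method, path, body = refresh_interval_request("customer", "30s")
print(method, path, body)
```

The throughput numbers above depend on hardware and data, so it is worth measuring the indexing rate before and after such a change.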

Defining the right number of shards and replicas is, in fact, a crucial question. Replica shards have to be updated during indexing, which decreases the indexing throughput. On the other hand, having multiple replicas supports distributed search operations. The number of replicas can be changed dynamically. Splitting and merging existing indices is possible under some conditions, but it is neither easy nor cheap in terms of required system resources.

As long as the shards are not too large, it might be faster to reduce the number of replicas, perform large imports and then increase the number of replicas again. In this case, less CPU is used, while network traffic and I/O go up when the new replica copy process starts.
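
The sequence of reducing replicas, importing and restoring replicas could be planned as follows. This is only a sketch: the function names are invented, the bulk body is a placeholder, and the requests are built but not sent.

```python
import json


def replica_request(index_name, replicas):
    """Build the request that changes the number of replicas of an index."""
    body = {"index": {"number_of_replicas": replicas}}
    return "PUT", f"/{index_name}/_settings", json.dumps(body)


def bulk_import_plan(index_name, normal_replicas, bulk_body):
    """Ordered requests for a large import with replicas temporarily at 0."""
    return [
        replica_request(index_name, 0),                # 1. drop replicas
        ("POST", f"/{index_name}/_bulk", bulk_body),   # 2. import the data
        replica_request(index_name, normal_replicas),  # 3. restore replicas
    ]


# Placeholder body; a real one is newline-delimited JSON.
plan = bulk_import_plan("customer", 1, "<bulk body>")
for step in plan:
    print(step)
```

While the replica count is zero there is no redundancy, so this approach suits imports that can be repeated from the source data if a node fails.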

Disabling the index refresh during a one-time import (setting the refresh interval to -1) and enabling it again after indexing might save some time, but in fact a very costly operation is just executed ‘later’, and the approach is not applicable when the system continuously receives new input.

Search Operations

Search operations are distributed to all nodes and executed using the primary and replica shards. The client sends a search request to a node which forwards the request to other nodes and returns the combined result back to the client. Elasticsearch has a feature to control routing of requests to make the query forwarding much more efficient.
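
The scatter-gather pattern behind a distributed search can be sketched like this: every shard computes its own top hits, and the coordinating node merges them into the final result. The scoring function (counting a term) and the sample data are assumptions for illustration and have nothing to do with Lucene's actual relevance scoring.

```python
import heapq


def shard_top_hits(shard_docs, score, size):
    """Each shard returns only its own best hits as (score, doc ID) pairs."""
    scored = ((score(text), doc_id) for doc_id, text in shard_docs.items())
    return heapq.nlargest(size, scored)


def coordinate_search(shards, score, size):
    """The coordinating node merges the partial results of all shards."""
    partial = []
    for shard_docs in shards:
        partial.extend(shard_top_hits(shard_docs, score, size))
    best = heapq.nlargest(size, partial)
    return [doc_id for hit_score, doc_id in best if hit_score > 0]


def count_search_term(text):
    """Hypothetical relevance score: occurrences of the word 'search'."""
    return text.split().count("search")


# Two hypothetical shards holding different documents.
shards = [
    {"a": "search engine search", "b": "monitoring"},
    {"c": "search cluster", "d": "search search search"},
]
print(coordinate_search(shards, count_search_term, size=2))
```

Because every shard only returns its local top hits, the amount of data sent back to the coordinating node stays small even when the index is large.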


Distributed search request over several nodes and shards

Optimizations for routing or for Elasticsearch DSL queries, such as using cached filters, would be a topic in and of itself and can’t be covered in this space. From an operations point of view, the following settings are the most relevant for search performance:

  • Tune Request Cache size and Query Cache size – Monitor the utilization of the request and query cache of your application to find appropriate values.
  • Maximum shard size and number of shards – shards have overhead; therefore the number of shards should be balanced. A single shard might look like the most efficient choice, but it limits the distributed search capability. The number of primary shards can only be set at index creation time. By using time-based indices and index aliases the ‘pressure’ to make the right choice is taken away: the next time-based index can be created with different settings. “Curator” from Elastic is a tool that helps with automatic setups for time-based indices and aliases.
  • Number of replicas – only replicas on additional nodes improve distributed search, but they decrease the index performance. Unlike primary shards, the number of replicas can be changed at any time using the “number_of_replicas” setting in the index settings API.
  • Client nodes for load balancing of requests – load balancer nodes are neither master nodes nor data nodes. They interface with clients and data nodes to serve index and search requests.
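
Time-based indices with an alias, as mentioned in the list above, could look like the following sketch. The naming scheme (prefix plus date), the alias name and the helper functions are assumptions for illustration, and the request is only built, not sent. It also assumes that yesterday’s index currently carries the alias.

```python
import json
from datetime import date, timedelta


def daily_index_name(prefix, day):
    """Name of the index for a given day, e.g. logs-2019.03.27."""
    return f"{prefix}-{day:%Y.%m.%d}"


def alias_switch_request(prefix, alias, day):
    """Build the request that moves an alias from yesterday's index to today's."""
    today = daily_index_name(prefix, day)
    yesterday = daily_index_name(prefix, day - timedelta(days=1))
    body = {
        "actions": [
            {"remove": {"index": yesterday, "alias": alias}},
            {"add": {"index": today, "alias": alias}},
        ]
    }
    return "POST", "/_aliases", json.dumps(body)


method, path, body = alias_switch_request("logs", "logs-current", date(2019, 3, 27))
print(method, path, body)
```

Clients keep writing to the alias, while each new daily index can be created with a different number of shards if the previous choice turned out to be wrong.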


The setup and tuning of Elasticsearch require a good knowledge of configuration options and performance metrics. In this post, we showed how Elasticsearch works; in part 2 of this blog series, we discuss the top 10 Elasticsearch metrics to monitor.

Finally, if you need additional help, keep in mind that Sematext offers a full range of services for Elasticsearch.
