Sometimes we just need to replay production queries – whether it’s because we want a realistic load test for the new version of a product or because we want to reproduce, in a test environment, a bug that only occurs in production (isn’t it lovely when that happens? Everything is fine in tests but when you deploy, tons of exceptions in your logs, tons of alerts from the monitoring system…).
With Elasticsearch, you can enable the search slowlog, which logs every query that takes longer than a given threshold on a shard. These settings can be changed on the fly. For example, the following request makes Elasticsearch log practically every query that runs against test-index (anything taking at least 1ms on a shard):
curl -XPUT localhost:9200/test-index/_settings -H 'Content-Type: application/json' -d '{ "index.search.slowlog.threshold.query.warn" : "1ms" }'
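If you'd rather script this step, here is a minimal Python sketch, using only the standard library, that builds the same settings request. The helper name `slowlog_settings_request`, the host and the index are assumptions for illustration. Setting the threshold back to "-1" disables it again, which is worth doing once you've captured enough queries, because logging almost every query is not free.

```python
import json
import urllib.request


def slowlog_settings_request(host: str, index: str, threshold: str) -> urllib.request.Request:
    """Build (but don't send) a PUT /<index>/_settings request for the query slowlog.

    Hypothetical helper: threshold is e.g. "1ms" to log almost everything, or "-1" to disable.
    """
    body = json.dumps({"index.search.slowlog.threshold.query.warn": threshold}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}/{index}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

# Usage, when a cluster is reachable:
# urllib.request.urlopen(slowlog_settings_request("localhost:9200", "test-index", "1ms"))
# ... capture traffic, then turn the threshold off again:
# urllib.request.urlopen(slowlog_settings_request("localhost:9200", "test-index", "-1"))
```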
You can then replay the queries from the slowlog against a test environment with a tool like JMeter. In this post, we’ll cover how to parse slowlogs with Logstash to write only the queries to a file, and how to configure JMeter to run queries from that file on an Elasticsearch cluster.
Parsing slowlogs with Logstash
The first step is to get the actual query JSON from the slowlogs and put it in a file, one query per line. This will make it easy for JMeter to pick up the queries and replay them. Below you can find a Logstash config derived from the one described in an earlier post about parsing Elasticsearch logs with Logstash. The field we’re interested in is source_body, which contains each query. We’ll put it into the queries file (/var/tmp/just_queries) one per line via the json_lines codec.
input {
  file {
    path => "/var/log/elasticsearch/elasticsearch-test_index_search_slowlog.log"
    start_position => "beginning"
  }
}
filter {
  grok {
    # parses the common bits; the query itself is captured into source_body
    match => [ "message", "\[%{TIMESTAMP_ISO8601}\]\[%{DATA}%{SPACE}\]\[%{DATA}%{SPACE}\]%{SPACE}\[%{DATA}\]%{SPACE}\[%{DATA}\]\[%{DATA}\] took\[%{DATA}\], took_millis\[%{DATA}\], types\[%{DATA}\], stats\[%{DATA}\], search_type\[%{DATA}\], total_shards\[%{DATA}\], source\[(?<source_body>(.|\r|\n)*\], extra_source\[)%{DATA}" ]
  }
  mutate {
    # strip the trailing "], extra_source[" that the capture above includes
    gsub => [ "source_body", "\], extra_source\[$", "" ]
  }
}
output {
  file {
    path => "/var/tmp/just_queries"
    codec => "json_lines"
    message_format => "%{source_body}"
  }
}
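If you don't have Logstash at hand, or just want to check what the grok pattern should extract, here is a minimal Python sketch that pulls the text between `source[` and `], extra_source[` out of each slowlog line and writes one query per line. It assumes the Elasticsearch 1.x slowlog layout shown in the code comment and one log entry per line; the function names are hypothetical, and it writes raw lines instead of going through a JSON codec.

```python
import re
from typing import Iterable, List

# Assumed slowlog layout (Elasticsearch 1.x style), e.g.:
# [ts][WARN ][index.search.slowlog.query] [node] [test-index][2] took[1.8ms], ...,
#   total_shards[5], source[{...}], extra_source[],
# The greedy (.*) makes "]" characters inside the query JSON not end the match early.
SOURCE_RE = re.compile(r", source\[(.*)\], extra_source\[")


def extract_queries(lines: Iterable[str]) -> List[str]:
    """Return the query JSON from each slowlog line, skipping lines without one."""
    queries = []
    for line in lines:
        match = SOURCE_RE.search(line)
        if match and match.group(1):
            queries.append(match.group(1))
    return queries


def write_queries(queries: Iterable[str], path: str) -> None:
    """Write one query per line, the layout JMeter will read later."""
    with open(path, "w", encoding="utf-8") as out:
        for query in queries:
            out.write(query + "\n")

# Usage:
# with open("/var/log/elasticsearch/elasticsearch-test_index_search_slowlog.log") as log:
#     write_queries(extract_queries(log), "/var/tmp/just_queries")
```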
Building the JMeter test plan
Next, we need a JMeter test plan that replays these queries. Download and start JMeter, and it will open a new, empty test plan. Right-click the Test Plan icon on the left and go to Add -> Threads (Users) -> Thread Group. There, you can specify how many threads (think of them as concurrent users) will run queries and how many queries each thread will run (the loop count).
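To make the thread group's two settings concrete, here is a rough Python analogue (not how JMeter is implemented): `threads` workers each call a sampler `loops` times, so the total number of requests is threads × loops. For example, 10 threads with a loop count of 100 send 1,000 queries. `run_thread_group` and `sampler` are hypothetical names.

```python
import threading
from typing import Callable


def run_thread_group(threads: int, loops: int, sampler: Callable[[int, int], None]) -> None:
    """Start `threads` workers; each calls sampler(thread_no, iteration) `loops` times."""

    def worker(thread_no: int) -> None:
        for iteration in range(loops):
            sampler(thread_no, iteration)

    workers = [threading.Thread(target=worker, args=(n,)) for n in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()  # wait until every simulated "user" has finished its loops
```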
Under this thread group, we’ll create a config element that reads the queries, and a sampler that sends them to Elasticsearch over HTTP.
Reading queries into a variable
For reading queries, you can use the CSV Data Set Config: right-click the thread group and go to Add -> Config Element -> CSV Data Set Config. There, enter the name of the file containing your queries (/var/tmp/just_queries in this example) and the name of the variable that will hold each query (BODY in this example), and set the delimiter to a newline (\n) instead of the default comma, because the query JSON itself contains commas.
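Conceptually, a CSV Data Set Config shared by all threads hands each request the next line of the file. The sketch below imitates that with a lock-protected iterator that starts over when it reaches the end of the file, which corresponds to JMeter's "Recycle on EOF" option being enabled. The class name `QueryFeeder` is hypothetical.

```python
import itertools
import threading
from typing import List


class QueryFeeder:
    """Hand out queries one at a time to any number of threads, wrapping around at the end."""

    def __init__(self, queries: List[str]) -> None:
        if not queries:
            raise ValueError("no queries to replay")
        self._cycle = itertools.cycle(queries)
        self._lock = threading.Lock()

    @classmethod
    def from_file(cls, path: str) -> "QueryFeeder":
        # One query per line, as in /var/tmp/just_queries; blank lines are skipped.
        with open(path, encoding="utf-8") as f:
            return cls([line.rstrip("\n") for line in f if line.strip()])

    def next_query(self) -> str:
        # Serialize access so two threads never advance the shared iterator at once.
        with self._lock:
            return next(self._cycle)
```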
Using the variable in the HTTP requests
With queries in the BODY variable, the last step is to create an HTTP Request sampler that would run those queries. Right-click the thread group, go to Add -> Sampler -> HTTP Request. Then, you need to make the following changes:
– specify the host and port to run queries on. If you run JMeter on an Elasticsearch node itself, this will most likely be localhost:9200
– change Method to POST. This is important, because with GET the request body, which is where your query goes, would be ignored
– set the Path, typically INDEX-NAME/_search
– specify that ${BODY} will be the Post Body
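Each HTTP Request sample boils down to a POST of one line from the file to the _search endpoint. Here is a minimal standard-library sketch of a single sample; the helper names are hypothetical, and the timing is measured on the client side, similar in spirit to the response times JMeter records.

```python
import time
import urllib.request


def search_request(host: str, index: str, body: str) -> urllib.request.Request:
    """Build a POST <index>/_search request carrying one query in its body."""
    return urllib.request.Request(
        url=f"http://{host}/{index}/_search",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",  # POST so the query travels in the request body
    )


def timed_search(host: str, index: str, body: str, timeout: float = 30.0) -> float:
    """Send one query and return the client-side response time in milliseconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(search_request(host, index, body), timeout=timeout) as resp:
        resp.read()  # read the whole reply so the timing covers the full response
    return (time.perf_counter() - start) * 1000.0
```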
Run queries remotely with JMeter
If your cluster is in a remote datacenter (or in a cloud), I’d avoid running queries over the WAN: the extra network latency gets added to every measurement and can invalidate your tests. Instead, you can download JMeter on the remote machine, copy the test plan over and run it there:
bin/jmeter -n -t /path/to/test-plan.jmx -l /path/to/output-file.jtl
Then you can copy the result file back to your machine and analyze it using JMeter’s GUI.
Checking results
To check results, JMeter offers quite a few ways to aggregate them. Often I just need the aggregate numbers, and for that I use the Summary Report (right-click on Test Plan -> Add -> Listener -> Summary Report). There, I can browse for the JTL file and open it to see average query times, throughput, and so on.
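If you want the headline numbers without opening the GUI, here is a rough Python sketch. It assumes the JTL was saved as CSV with a header row containing timeStamp (sample start, in ms since the epoch), elapsed (ms) and success columns, and it approximates throughput as samples divided by the span from the first sample's start to the last sample's end. The function name is hypothetical, and the numbers may differ slightly from JMeter's own Summary Report.

```python
import csv
from typing import Dict, Iterable


def summarize_jtl(lines: Iterable[str]) -> Dict[str, float]:
    """Average/min/max response time (ms), error rate and throughput (req/s) from a CSV JTL."""
    rows = list(csv.DictReader(lines))
    if not rows:
        raise ValueError("empty JTL file")
    elapsed = [int(r["elapsed"]) for r in rows]
    starts = [int(r["timeStamp"]) for r in rows]
    ends = [s + e for s, e in zip(starts, elapsed)]
    duration_s = (max(ends) - min(starts)) / 1000.0
    # Rows without a "success" column are treated as successful.
    errors = sum(1 for r in rows if r.get("success", "true").lower() != "true")
    return {
        "samples": float(len(rows)),
        "avg_ms": sum(elapsed) / len(rows),
        "min_ms": float(min(elapsed)),
        "max_ms": float(max(elapsed)),
        "error_pct": 100.0 * errors / len(rows),
        "throughput_per_s": len(rows) / duration_s if duration_s > 0 else float("inf"),
    }

# Usage:
# with open("/path/to/output-file.jtl", newline="") as f:
#     print(summarize_jtl(f))
```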
Checking individual query results
If you think something is fishy with the results (e.g. query times that are suspiciously low), you can check what Elasticsearch replies to each query with the View Results Tree listener. You’ll probably want JMeter’s GUI for that, and to send requests from your own machine, you can forward the remote Elasticsearch port over SSH:
ssh -L 9200:localhost:9200 user@elasticsearch01
And then you’d run queries from your local JMeter against localhost:9200.
To view the results as you run the test, right-click on the thread group and go to Add -> Listener -> View Results Tree. There, you can select any query and see both the request that went to Elasticsearch and its reply, to check whether it matches your expectations.
Final words
There are some limitations to this procedure – here are the two big ones that I’m aware of:
– we’re running the queries against a hardcoded index. Ideally, we’d extract the index name from the slowlog and use it as a variable in JMeter
– slowlogs are per shard, so if a client query hits five shards and all of them cross the slowlog threshold, you’ll have five identical queries in the log, and JMeter will replay them all. You could keep only one copy (e.g. by filtering on the shard number in Logstash), but with load tests the extra load on the cluster is often acceptable.
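Both limitations can be tackled at parsing time. Here is a minimal Python sketch, assuming the same Elasticsearch 1.x slowlog layout, that extracts the index name and shard number from the "[index][shard] took[" part of each line, keeps only one shard's copy of each query and writes "index<TAB>query" lines. Keeping only shard 0 is a simplification: it drops queries whose shard-0 copy stayed under the threshold or that never ran on shard 0. With such a file, a CSV Data Set Config using a \t delimiter and two variables (say INDEX,BODY) could feed a path like ${INDEX}/_search. All names here are hypothetical.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# "[test-index][2] took[" -> index name and shard number (assumed ES 1.x slowlog layout)
INDEX_SHARD_RE = re.compile(r"\[([^\[\]]+)\]\[(\d+)\] took\[")
SOURCE_RE = re.compile(r", source\[(.*)\], extra_source\[")


def index_and_query(lines: Iterable[str], shard: Optional[int] = 0) -> Iterator[Tuple[str, str]]:
    """Yield (index, query) pairs, keeping only entries from `shard` (None keeps every shard)."""
    for line in lines:
        head = INDEX_SHARD_RE.search(line)
        body = SOURCE_RE.search(line)
        if not head or not body:
            continue  # not a slowlog query line
        if shard is not None and int(head.group(2)) != shard:
            continue  # duplicate logged by another shard
        yield head.group(1), body.group(1)


def to_tab_separated(pairs: Iterable[Tuple[str, str]]) -> Iterator[str]:
    """One "index<TAB>query" line per entry; assumes the query JSON has no literal tabs."""
    for index, query in pairs:
        yield f"{index}\t{query}"
```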
I hope you found this post useful, and if this kind of stuff excites you, I’m glad to say that we’re hiring worldwide. You can also follow us @sematext.