
Recipe: Reindexing Elasticsearch Documents with Logstash

Radu Gheorghe

If you’re working with Elasticsearch, it’s very likely that you’ll need to reindex data at some point. The most popular reason is that you need a mapping change that is incompatible with your current mapping. New fields can be added by default, but many changes are not allowed, for example:

  • Want to switch to doc values because field data is taking too much heap? Reindex!
  • Want to change the analyzer of a given field? Reindex!
  • Want to break one great big index into time-based indices? Reindex!

Enter Logstash

A while ago I was using stream2es for reindexing, but if you look at its GitHub page it now recommends using Logstash instead. Why? In general, Logstash can do more; here are my top three reasons:

  1. On the input side, you can filter only a subset of documents to reindex
  2. You can add filters to transform documents on their way to the new index (or indices)
  3. It should perform better, as you can add more filter threads (using the -w parameter) and multiple output worker threads (using the workers configuration option)
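For example, the first point works by passing a query to the elasticsearch input. A minimal sketch, assuming your documents have a status field to filter on (both the field and the value are hypothetical):

```
input {
  elasticsearch {
    hosts => ["localhost"]
    index => "old-index"
    # hypothetical filter: reindex only a subset of documents
    query => '{ "query": { "term": { "status": "active" } } }'
  }
}
```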

Show Me the Configuration!

In short, you’ll use the elasticsearch input to read existing data and the elasticsearch output to write it. In between, you can use various filters to change what the documents look like.


To read documents, you’ll use the elasticsearch input. You’ll probably want to specify the host(s) to connect to and the index (check the documentation for more options like query):

input {
  elasticsearch {
    hosts => ["localhost"]
    index => "old-index"
  }
}

By default, this will run a match_all query that scrolls through all the documents of the index, fetching pages of 1000, and times out in a minute (i.e. after a minute of inactivity it won’t know where it left off). All of this is configurable, but the defaults are sensible. Scan is good for deep paging (normally, when you fetch a page from 1,000,000 to 1,000,020, Elasticsearch fetches 1,000,020 documents, sorts them, and gives back the last 20) and it also works on a “snapshot” of the index (updates made after the scan started won’t be taken into account).
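The page size and scroll timeout mentioned above map to the size and scroll options of the elasticsearch input. A sketch with illustrative values (picked here only as an example, not as recommendations):

```
input {
  elasticsearch {
    hosts => ["localhost"]
    index => "old-index"
    size => 500      # documents per scroll page (the default is 1000)
    scroll => "5m"   # keep the scroll context alive longer, e.g. for slow filters
  }
}
```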


Next, you might want to change documents on their way to the new index. For example, if the data you’re reindexing wasn’t originally indexed with Logstash, you probably want to remove the @version and/or @timestamp fields that are automatically added. To do that, you’ll use the mutate filter:

filter {
  mutate {
    remove_field => [ "@version" ]
  }
}


Finally, you’ll use the elasticsearch output to send data to a new index. The defaults are once again geared towards the logging use-case. If this is not your setup, you might want to disable the default Logstash template (manage_template => false) and use your own:

output {
  elasticsearch {
    hosts => ["localhost"]
    manage_template => false
    index => "new-index"
    document_type => "new-type"
  }
}

Final Remarks

If you want to use time-based indices, you can change index to something like “logstash-%{+YYYY.MM.dd}” (this is the default), and the date will be taken from the @timestamp field. By default this field is populated with the time Logstash processes the document, but you can use the date filter to replace it with a timestamp from the document itself:

filter {
  date {
    match => [ "custom_timestamp", "MM/dd/YYYY HH:mm:ss" ]
    target => "@timestamp"
  }
}

If your Logstash configuration contains only these snippets, it will nicely shut down when it’s done reindexing.
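Assembled from the snippets above, a complete minimal reindexing configuration would look like this (host and index names are placeholders):

```
input {
  elasticsearch {
    hosts => ["localhost"]
    index => "old-index"
  }
}

filter {
  mutate {
    remove_field => [ "@version" ]
  }
}

output {
  elasticsearch {
    hosts => ["localhost"]
    manage_template => false
    index => "new-index"
  }
}
```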

That’s it! We are happy to answer questions or receive feedback – please drop us a line or reach us @sematext. And, yes, we’re hiring!

26 thoughts on “Recipe: Reindexing Elasticsearch Documents with Logstash”

  1. Hi Radu,
    For a field which is unindexed, can you please guide me through the steps?
    I am trying to get the location field indexed as type geo_point in my records.
    I am able to get them as geo_point but unfortunately they are showing as unindexed.
    Any help would be highly appreciated.

    1. If a field is unindexed but stored (usually in _source), you can still do this: (1) create a new index with the new mapping (geo_point, for example), (2) reindex as shown in this recipe – the new field will then get the data – and (3) remove the old index. Or are you saying that you can get the field as geo_point but it says unindexed? Where does it say that? Because if it’s geo_point you should be able to run queries on it. Especially on ES 5.x and later (or Logsene), the field either is geo_point or it isn’t; if it is, you can query it.

    1. Hello,

      You wouldn’t normally specify the mapping in Logstash, you’d create the new index with the correct mapping upfront. Like you did in the link you showed. I see you found a solution, I think the initial problem was caused by syntax – Elasticsearch thought you were defining a field when you were trying to define types (parent, child and grandchild would be types).

  2. Is it possible to also ‘copy’ the mapping with logstash, in order to migrate an index from an old ES server to a new one by reindexing?

  3. Since Elasticsearch does not support aggregation + pagination, we are planning to put aggregated data in another index and query data from that index. For this, we need to re-index data every time an aggregation query is fired. Can you please help us with how we can do this with Logstash, given that our query would be dynamic every time?

    1. Hi Akash,

      Recent versions of Elasticsearch do support pagination for aggregations (at least the terms aggregation).

      If you want to programmatically generate a config for Logstash, I think you have two options: one is to start Logstash with -e and provide the configuration string; the other is to change the config file and enable config reloading. Then you can send a SIGHUP to Logstash every time you need it to reload the config.

  4. Somebody help with this error:

    {:timestamp=>"2016-09-15T09:33:12.980000+0100", :message=>"Failed parsing date from field", :field=>"timestamp", :value=>"2016-09-15 08:33:05.813000", :exception=>"Invalid format: \"2016-09-15 08:33:05.813000\"", :config_parsers=>"ISO8601,yyyy-MM-dd'T'HH:mm:ss.SSSSSSZZ,yyyy-MM-dd HH:mm:ss,SSSSSS,MMM dd YYYY HH:mm:ss", :config_locale=>"default=en_US", :level=>:warn}

    I noticed that when I changed the comma ("yyyy-MM-dd HH:mm:ss,SSSSSS") to a dot ("yyyy-MM-dd HH:mm:ss.SSSSSS"), Logstash started and did not return the error, but Kibana stopped visualizing. Once I put the comma back, Kibana started working again and Logstash started giving the error again.

    My Logstash filter config:

    grok {
      add_tag => [ "valid" ]
      match => { "message" => "%{TIMESTAMP_ISO8601:log_timestamp} %{DATA} Processed (?:inbound|outbound) message for ([^\s]+): %{GREEDYDATA:json_data}" }
    }

    json {
      source => "json_data"
    }

    date {
      match => [ "timestamp", "ISO8601", "yyyy-MM-dd'T'HH:mm:ss.SSSZZ", "yyyy-MM-dd HH:mm:ss,SSS", "MMM dd YYYY HH:mm:ss" ]
      remove_field => ["timestamp"]
      target => "@timestamp"
    }

    Could this issue be related to the Elasticsearch mapping?
    Elasticsearch mapping:
    "@timestamp": {
      "format": "yyyy-MM-dd'T'HH:mm:ss.SSSZ",
      "index": "not_analyzed",
      "type": "date"
    }

    1. I think there are two problems here: one is date parsing within Logstash – if your field contains a dot and your date filter doesn’t (because it has a comma), it throws an error.

      The other is when your Logstash filter contains a dot (so it parses correctly – no error) but you don’t see the logs in Elasticsearch/Kibana. I believe that would be a parsing error on the Elasticsearch side, because Logstash likely produces microsecond-level timestamps (like your original log) and Elasticsearch is configured to only accept millisecond-level ones.

      Elasticsearch logs can confirm/deny my hypothesis, but either way, I think if you remove the “format” line from the mapping then Elasticsearch will accept whatever Logstash throws at it by default.

      1. Hello RADU,

        Thanks for the response. I have removed the format from the Elasticsearch mapping but still have the same issue.
        What else can I do?


        1. Hello Temitope,

          Anything interesting in the Elasticsearch logs? I think they will give us a better indication on what’s going on than trial and error.

          1. Hi Radu,

            Unfortunately there is no error coming from Elasticsearch.
            Unless you can tell me what I should look for in Elasticsearch.


        2. Hi Temitope,

          How about adding a stdout output next to your Elasticsearch one. You can use the JSON codec to see what it sends to Elasticsearch. If there’s no error, you can try indexing one of those documents manually and see what you get.

          1. Hello Radu,

            I really appreciate your support so far. I got it fixed by adding GMT as the timezone and adding the milliseconds.


    1. With Logstash you can’t segment the reading part yet like your reindexer does (though that will likely become possible with 5.x and Sliced Scroll), but this is rarely a bottleneck. If it is, I guess the not-so-nice workaround is to start multiple inputs, one for each slice. Either way, you can tune the number of pipeline threads and output threads, to make use of more CPU while transforming and sending data.
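The multiple-inputs workaround could be sketched like this, assuming the documents can be partitioned by a numeric id field (both the field name and the boundary value are hypothetical):

```
input {
  elasticsearch {
    hosts => ["localhost"]
    index => "old-index"
    # hypothetical slice: documents with id below the boundary
    query => '{ "query": { "range": { "id": { "lt": 5000000 } } } }'
  }
  elasticsearch {
    hosts => ["localhost"]
    index => "old-index"
    # hypothetical slice: the remaining documents
    query => '{ "query": { "range": { "id": { "gte": 5000000 } } } }'
  }
}
```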
