When and How to Delete an Elasticsearch Index?

April 6, 2023

As an Elasticsearch administrator, you will inevitably have to delete an index at some point. There could be several reasons why this might be necessary:

  • A field is mapped as an incorrect data type.
  • A change in document payload fields has resulted in mapping conflicts.
  • An index has become too large.
  • Index data is no longer needed.
  • The index is old and has not been deleted by an ILM policy (ISM in OpenSearch).

This article explains how to delete an Elasticsearch index safely and gives an overview of the methods available for backing up and restoring index data before deletion.

How to Delete an Index in Elasticsearch?

In any of these cases, you will need to delete a single index or multiple indices from Elasticsearch as part of fixing whatever issue you are dealing with. This can be done using a command like the following:

DELETE movies-000001

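This request can also be made with a wildcard expression and optional accompanying parameters. Note that wildcard deletes are only accepted when the action.destructive_requires_name cluster setting is false; from Elasticsearch 8.0 onward it defaults to true: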

DELETE movies*?allow_no_indices=false&ignore_unavailable=true

Before deleting an Elasticsearch index, there are a few things you should consider.

First, is data still actively being written to the index you are about to delete? You should be able to check this using disk I/O monitoring applications or by leveraging the index stats API:

GET movies-000001/_stats
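
One way to use the stats API for this check is to sample it twice and compare the indexing counters. The following is a minimal Python sketch, assuming the responses have already been fetched and parsed into dictionaries; the helper name docs_indexed_between and the trimmed-down sample responses are hypothetical, and only the indices.<name>.primaries.indexing.index_total path is read from each response.

```python
def docs_indexed_between(stats_before: dict, stats_after: dict, index: str) -> int:
    """Return how many documents were indexed into `index` between two _stats samples."""
    def index_total(stats: dict) -> int:
        # Cumulative count of indexing operations on the primary shards.
        return stats["indices"][index]["primaries"]["indexing"]["index_total"]

    return index_total(stats_after) - index_total(stats_before)


# Hypothetical, trimmed-down responses from two calls made some time apart.
before = {"indices": {"movies-000001": {"primaries": {"indexing": {"index_total": 1200}}}}}
after = {"indices": {"movies-000001": {"primaries": {"indexing": {"index_total": 1275}}}}}

delta = docs_indexed_between(before, after, "movies-000001")
# A non-zero delta means something is still writing to the index.
print(f"{delta} documents indexed between samples")
```

A delta of zero over a reasonably long interval suggests that ingestion has stopped, although it cannot rule out a writer that is merely idle.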

In any case, you need to stop the data ingestion to this index or disable automatic index creation to prevent Elasticsearch from recreating the index automatically and, potentially, incorrectly. If you are attempting to delete the current write index of a data stream, you will need to roll the data stream over first so that a new write index takes its place.

Second, you need to consider any active applications that will be making search requests to the index in question. They could encounter search errors due to unavailable data, which can result in application downtime.

The last and most important consideration is whether the index data you are deleting needs to be kept, because this operation results in the loss of all documents, shards, and metadata associated with the index. If the data does need to be kept, back up the index first to prevent permanent data loss.

Index Data Backup Methods

The following methods can be used to back up an index’s documents before deletion.

  • Snapshot: Create and restore a snapshot of an index
  • Reindex: Reindex all index documents to a different index
  • Clone: Clone all data in an existing index
  • Split: Split an index into a new one with more primary shards
  • Shrink: Shrink an index into a new index with fewer primary shards

Although all of these methods can be used to back up an index, each has unique characteristics that can come in useful in different situations. Let’s take a deeper look at each one.

Snapshot

Snapshots are typically used as data backups for disaster recovery because they can store cluster state as well as index metadata along with your index documents. However, snapshots can also be useful when fixing other issues by providing the following options:

  • Backing up or restoring only specified indices using a wildcard expression
  • Including or excluding an index alias
  • Including or excluding the global cluster state
  • Renaming indices when restoring them using a renaming pattern
  • Specifying new settings when restoring indices
  • Ignoring old settings when restoring indices
  • Performing operations on multiple indices using a single request

Before you can start creating snapshots, you need to ensure that your repository is configured correctly. For example, if you are using a shared file system repository, all your Elasticsearch nodes need to have access to the same shared file system that you have dedicated for snapshots. Elasticsearch also requires that the path to this file system is specified by path.repo: ["/mnt/snapshots"] in your elasticsearch.yml or opensearch.yml file.

You also need to register a snapshot repository in this location:

PUT _snapshot/my_snapshot_repository
{
    "type": "fs",
    "settings": { 
        "location": "/mnt/snapshots"
    }
}

Once that is set up, you can take a snapshot of any index as follows:

PUT _snapshot/my_snapshot_repository/my_snapshot
{
    "indices": "movies-000001",
    "ignore_unavailable": true,
    "include_global_state": false,
    "partial": false
}

To restore a particular index from an existing snapshot using some of the options discussed earlier, the following example request can be made:

POST _snapshot/my_snapshot_repository/my_snapshot/_restore
{
    "indices": "movies-*",
    "ignore_unavailable": true,
    "include_global_state": false,
    "include_aliases": false,
    "partial": false,
    "rename_pattern": "movies-(.+)",
    "rename_replacement": "movies-$1_restored",
    "index_settings": {
        "index.blocks.read_only": false
    },
    "ignore_index_settings": [
        "index.refresh_interval"
    ]
}
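
To preview what rename_pattern and rename_replacement will do before running a restore, the substitution can be approximated locally. This is a rough Python sketch, not the way Elasticsearch itself evaluates the pattern: Elasticsearch uses Java regular expressions, so the hypothetical helper preview_restored_name below only converts $1-style group references into Python's syntax and assumes the pattern uses features common to both regex dialects.

```python
import re


def preview_restored_name(index: str, rename_pattern: str, rename_replacement: str) -> str:
    """Approximate the index name a restore would produce for `index`."""
    # Java-style "$1" group references become Python-style "\g<1>".
    python_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", rename_replacement)
    # Names that do not match the pattern are returned unchanged.
    return re.sub(rename_pattern, python_replacement, index)


for name in ["movies-000001", "movies-000002", "books-000001"]:
    print(name, "->", preview_restored_name(name, "movies-(.+)", "movies-$1_restored"))
```

Checking the resulting names this way helps confirm that restored indices will not collide with indices that already exist in the cluster.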

Restoring an index using a snapshot does have its limits. If you need to change the mappings of the index, you should consider the reindex operation.

Reindex

With the reindex operation, documents are recreated in a target index with its own settings and mappings. This is useful if you need to change the data type of a specific field or any index settings that cannot be dynamically updated.

Documents from one particular index can be reindexed to another as follows:

POST /_reindex?wait_for_completion=false
{
    "source":{
        "index":"movies-000001"
    },
    "dest":{
        "index":"movies-000002"
    }
}

Before you perform this operation, ensure that the target index has been created correctly, either manually or using index templates, with the desired mappings and index settings. Adding the wait_for_completion=false parameter makes the request return immediately with a task ID that can be used to monitor reindexing progress. It is very important to check that all documents have been reindexed successfully before deleting the source index.
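
The success check can be sketched in Python as follows. The example assumes the task result has already been fetched with GET _tasks/<task_id> and parsed into a dictionary, and that the expected document count comes from a separate count of the source index. The helper name reindex_succeeded and the sample response are hypothetical, and the fields read here (completed, response.failures, response.timed_out, response.created, response.updated) should be checked against the response your cluster version actually returns.

```python
def reindex_succeeded(task_response: dict, expected_docs: int) -> bool:
    """Return True only if the reindex task finished and wrote every expected document."""
    if not task_response.get("completed", False):
        return False  # the task is still running
    result = task_response.get("response", {})
    if result.get("failures") or result.get("timed_out"):
        return False  # at least one document or shard failed
    written = result.get("created", 0) + result.get("updated", 0)
    return written == expected_docs


# Hypothetical, trimmed-down task response for a finished reindex.
finished = {
    "completed": True,
    "response": {"timed_out": False, "total": 500, "created": 500, "updated": 0, "failures": []},
}
print(reindex_succeeded(finished, expected_docs=500))
```

Only when a check like this passes is it reasonable to move on to deleting the source index.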

Clone

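If you simply need to rename an existing index while retaining its original settings and mappings, you should consider using the clone operation. Note that the source index must be made read-only (for example, by setting index.blocks.write to true) before it can be cloned. An index can be cloned using a request like the one in this example: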

POST /movies-000001/_clone/movies-000001_cloned

It is important to keep your index naming conventions in mind when cloning an index if you want the data in question to remain searchable under existing index patterns. For example, if you name the new index cloned_movies-000001, it will not match a search request based on movies-*. In contrast, movies-000001_cloned will still match existing saved index patterns, provided that they use the appropriate wildcard expressions.
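
The naming point can be checked quickly before cloning. The sketch below uses Python's fnmatch module as a stand-in for index wildcard matching; this is an approximation, since fnmatch also gives special meaning to ? and [ ] while index patterns here only rely on *. The helper name matches_index_pattern is hypothetical.

```python
from fnmatch import fnmatchcase


def matches_index_pattern(index_name: str, pattern: str) -> bool:
    """Case-sensitive check of whether an index name matches a wildcard pattern."""
    return fnmatchcase(index_name, pattern)


for candidate in ["movies-000001_cloned", "cloned_movies-000001"]:
    print(candidate, matches_index_pattern(candidate, "movies-*"))
```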

Split

If an index’s primary shards have become too large, you should consider splitting each of them into two or more primary shards to better fit your cluster architecture. In our example, splitting an index can be done using the following request:

PUT /movies-000001/_split/movies-000001_split
{
    "settings": {
        "index": {
            "number_of_shards": 6,
            "number_of_replicas": 1
        }
    },
    "aliases": {
        "movies": {}
    }
}

It is important to note that the number of shards an index can be split into is determined by the number of original primary shards as well as the index.number_of_routing_shards setting. This is a static index setting that can only be set at index creation time or on a closed index. Using default settings, index shards can be split by factors of 2 up to 1024 shards. In the case that an index consists of only one primary shard, it can be split into any arbitrary number of primary shards.
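
The relationship between the original shard count and index.number_of_routing_shards can be made concrete with a small calculation. The Python sketch below assumes the rule that a split target must be a multiple of the current primary shard count and a factor of the routing shard count; the helper name valid_split_targets and the sample values are hypothetical.

```python
def valid_split_targets(primary_shards: int, routing_shards: int) -> list:
    """List the shard counts an index could be split into under the assumed rule."""
    return [
        target
        for target in range(primary_shards + 1, routing_shards + 1)
        # The target must be a multiple of the source and a factor of the routing shards.
        if target % primary_shards == 0 and routing_shards % target == 0
    ]


# A hypothetical index with 5 primary shards and number_of_routing_shards set to 30.
print(valid_split_targets(5, 30))
# A hypothetical index with 3 primary shards and number_of_routing_shards of 768.
print(valid_split_targets(3, 768))
```

Running a check like this before issuing the split request shows whether the shard count you have in mind, such as the 6 shards in the example above, is reachable from the current index.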

Shrink

If an existing index has multiple primary shards that are perhaps too small and might cause an unnecessarily high shard count, you should consider using the shrink operation. An index can be shrunk using a request like the following:

POST /movies-000001/_shrink/movies-000001_shrunken
{
    "settings": {
        "index.number_of_shards": 1
    },
    "aliases":{
        "movies": {}
    }
}
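
The target shard count of a shrink is also constrained: it must be a factor of the number of primary shards in the source index. A minimal Python sketch of that rule follows; the helper name valid_shrink_targets is hypothetical.

```python
def valid_shrink_targets(primary_shards: int) -> list:
    """List the smaller shard counts that evenly divide the source shard count."""
    return [target for target in range(1, primary_shards) if primary_shards % target == 0]


print(valid_shrink_targets(8))
print(valid_shrink_targets(15))
```

An index whose primary shard count is a prime number can therefore only be shrunk to a single primary shard.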

There are many ways to back up Elasticsearch index data before deletion. Knowing the unique traits of each method lets you choose the one that best suits your specific needs when troubleshooting and fixing Elasticsearch issues.

Delete Index – Best Practices

Delete is by far the highest-risk request, so it is essential to take extra care when deleting Elasticsearch indices. First and foremost, proper access control should be implemented to prevent “non-admin” users from deleting data they aren’t supposed to, either on purpose or by accident.

It is always recommended to back up an index before deletion unless you are absolutely certain the data is no longer needed. Furthermore, if you are using Kibana Dev Tools (or something similar) to make cluster API requests, clear any delete requests from the console after you are done with them so that you cannot accidentally rerun them in the future.
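
If delete requests are issued from scripts rather than typed by hand, a client-side guard can reduce the chance of an overly broad delete. The Python sketch below is one possible approach, not a feature of Elasticsearch: the helper name is_explicit_index_name is hypothetical, and it simply rejects empty targets, _all, wildcards, and comma-separated lists so that a script can only delete one explicitly named index at a time.

```python
def is_explicit_index_name(target: str) -> bool:
    """Return True only for a single, fully spelled-out index name."""
    if not target or target == "_all":
        return False
    # Wildcards and comma-separated lists can match more than one index.
    return "*" not in target and "," not in target


for target in ["movies-000001", "movies*", "_all", "movies-000001,movies-000002"]:
    print(target, is_explicit_index_name(target))
```

On the cluster side, the action.destructive_requires_name setting provides a comparable safeguard by rejecting wildcard and _all deletes.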

Here are a few other considerations when working with the different index data backup methods described above.

Snapshot Recommendations

If you plan on using snapshots as your primary data backup method, you need to consider how often your data is backed up and how long a particular snapshot needs to be stored based on your cluster’s specific needs. Indefinitely creating snapshots without deleting old ones will ultimately result in the snapshot volume reaching its storage capacity. Deleting old snapshots can be done either manually or by automating the process using snapshot lifecycle management.
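
A retention rule of this kind can be sketched in a few lines of Python. The example below is an illustration of the idea only, assuming snapshot names and creation times have already been listed; the helper name snapshots_to_delete, the 30-day limit, and the sample snapshot names are all hypothetical. It always keeps the newest min_keep snapshots and flags the rest once they are older than max_age.

```python
from datetime import datetime, timedelta, timezone


def snapshots_to_delete(snapshots, max_age, min_keep, now):
    """Return names of snapshots older than max_age, always keeping the newest min_keep."""
    newest_first = sorted(snapshots, key=lambda item: item[1], reverse=True)
    candidates = newest_first[min_keep:]  # the newest min_keep are never deleted
    return [name for name, taken_at in candidates if now - taken_at > max_age]


now = datetime(2023, 4, 6, tzinfo=timezone.utc)
snapshots = [
    ("snap-2023-04-05", datetime(2023, 4, 5, tzinfo=timezone.utc)),
    ("snap-2023-03-01", datetime(2023, 3, 1, tzinfo=timezone.utc)),
    ("snap-2023-01-15", datetime(2023, 1, 15, tzinfo=timezone.utc)),
]
expired = snapshots_to_delete(snapshots, timedelta(days=30), min_keep=1, now=now)
print(expired)
```

Keeping a minimum number of snapshots regardless of age guards against ending up with no backups at all if snapshot creation silently stops working.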

Reindex Recommendations

When reindexing documents from one index to a new one, take special note of the index’s mappings. If you do not specify the mapping for every field, you run the risk of fields being mapped incorrectly, which can result in some documents being rejected from the index due to mapping conflicts. Using the task ID to monitor the progress and results of a reindexing operation helps with checking that all documents are reindexed successfully. To ensure that mappings are correct, you can compare the old and new mappings using syntax similar to the following example request:

GET movies-000001/_mapping
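
Comparing two mapping responses by eye gets tedious for large indices, so the comparison can be scripted. The Python sketch below assumes both mapping responses have already been fetched and parsed into dictionaries shaped like {index: {"mappings": {"properties": {...}}}}; the helper names field_types and mapping_differences and the sample mappings are hypothetical. It flattens nested object fields into dotted paths and reports every field whose type differs or that exists in only one index.

```python
def field_types(properties: dict, prefix: str = "") -> dict:
    """Flatten a mapping's properties into {dotted.field.path: type}."""
    types = {}
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if "properties" in spec:
            # Object field: recurse into its sub-fields.
            types.update(field_types(spec["properties"], f"{path}."))
        else:
            types[path] = spec.get("type", "object")
    return types


def mapping_differences(old_mapping: dict, new_mapping: dict, old_index: str, new_index: str) -> dict:
    """Return {field: (old_type, new_type)} for every field that differs."""
    old = field_types(old_mapping[old_index]["mappings"].get("properties", {}))
    new = field_types(new_mapping[new_index]["mappings"].get("properties", {}))
    return {
        field: (old.get(field), new.get(field))
        for field in sorted(old.keys() | new.keys())
        if old.get(field) != new.get(field)
    }


# Hypothetical mappings: "year" was text in the old index and is integer in the new one.
old_mapping = {"movies-000001": {"mappings": {"properties": {
    "title": {"type": "text"}, "year": {"type": "text"},
    "director": {"properties": {"name": {"type": "keyword"}}}}}}}
new_mapping = {"movies-000002": {"mappings": {"properties": {
    "title": {"type": "text"}, "year": {"type": "integer"},
    "director": {"properties": {"name": {"type": "keyword"}}}}}}}

print(mapping_differences(old_mapping, new_mapping, "movies-000001", "movies-000002"))
```

Every difference reported should be one you intended; anything unexpected points to a field that was mapped dynamically rather than explicitly.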

Clone Recommendations

Cloning an index does not really change much other than its name and a few settings. This makes it quite useful for archiving old data under different index naming patterns and/or on cold storage architecture if the data in question will no longer be queried regularly.

Split and Shrink Recommendations

Splitting or shrinking an index can be useful and is sometimes necessary. If an index gets rolled over too frequently by its ILM policy (ISM in OpenSearch), this will eventually lead to an increased shard count. If an index consists of multiple primary shards that are quite small and can be combined to reduce shard count, it can be shrunk to fewer shards to improve cluster performance.

On the other hand, if an index has become too large and thus difficult to move around in your Elasticsearch cluster, it might be useful to split it into multiple smaller primary shards. Both of these situations can be avoided by correct cluster and index design, but it can sometimes be quite difficult to predict if or when they will eventually occur.

Conclusion

Deleting an Elasticsearch index is easy, but it is important to understand why it needs to be deleted and how to deal with the data it contains before and after doing so. Hopefully, this article provided some useful information to help you safely back up, delete, and restore indices to solve your index-related Elasticsearch issues.
