Migration from Elasticsearch to OpenSearch

July 4, 2023

Introduction

In this tutorial, we will guide you through the process of migrating from Elasticsearch to OpenSearch. OpenSearch is an open-source search and analytics suite that is compatible with Elasticsearch.

There are several reasons why people choose to migrate, such as taking advantage of new features or differences in governance. Our OpenSearch vs Elasticsearch article further discusses how the two compare against each other.

In the following sections, we will discuss version compatibility considerations, and guide you through the migration process.

This method assumes that you will want to initially run both clusters in parallel until confidence in the new cluster has been gained and upstream apps have been migrated to replace the existing Elasticsearch client with the appropriate OpenSearch client.

As a side note: Sematext provides services for both Elasticsearch and OpenSearch. If you’re interested, be sure to check them out. Additionally, if you’d like a little help migrating from Elasticsearch to OpenSearch, let’s chat! Sematext provides a full range of consulting, training, and support services for OpenSearch.

Version Compatibility of Elasticsearch and OpenSearch

Before starting the migration process, it’s important to consider the version compatibility between Elasticsearch and OpenSearch. The following support matrix table outlines the compatible paths for migration:

Elasticsearch  | OpenSearch | Migration path
---------------|------------|---------------
v5.x, pre v6.8 | -          | Upgrade to ES v6.8 in place and reindex. Will need an upgrade iteration per major version.
v6.8           | -          | Upgrade to ES v7.10.2 in place and reindex.
v7.10.2        | v1.1       | Rolling or Cluster Migration
v7.11.x+       | v1.1-v2.x  | Migrate using remote Reindexing API
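
To make the matrix easier to apply across several clusters, here is a minimal Python sketch that maps an Elasticsearch version string to the next step from the table. The function name `migration_path` is hypothetical, and the handling of 7.0 through 7.10.1 (treated as "upgrade to 7.10.2") is our assumption, since the table does not list those versions explicitly.

```python
def migration_path(es_version: str) -> str:
    """Return the next migration step for a dotted numeric Elasticsearch version."""
    # Pad to three components so "6.8" compares like "6.8.0".
    parts = [int(p) for p in es_version.split(".")[:3]]
    parts += [0] * (3 - len(parts))
    version = tuple(parts)

    if version < (6, 8, 0):
        return "upgrade to ES 6.8 in place and reindex"
    if version < (7, 10, 2):
        # Assumption: 7.0-7.10.1 follow the same path as 6.8.
        return "upgrade to ES 7.10.2 in place and reindex"
    if version == (7, 10, 2):
        return "rolling or cluster migration to OpenSearch 1.1"
    return "remote reindex into OpenSearch 1.1-2.x"
```

The sketch only accepts plain numeric versions such as "6.8" or "7.17.9"; suffixes like "-SNAPSHOT" would need to be stripped first.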

Alternatively, the Remote Reindexing API on OpenSearch can be used to migrate your data to a new cluster. See the section further down.

If you are using X-Pack, the Kibana Upgrade Assistant can be used to identify obsolete features in your schemas and help you upgrade them to the next major version.

Note: Elasticsearch index schemas and templates created prior to v6.x may contain mapping types. These will need to be adjusted to remove the mapping types before starting the upgrades. Refer to the Elastic documentation on this.

Upgrade Elasticsearch to 7.10.2

To begin the migration process, you need to upgrade your existing Elasticsearch cluster to version 7.10.2. There are two methods to accomplish this: the rolling upgrade method and the cluster upgrade method.

Rolling Upgrade Method

The rolling upgrade method allows you to upgrade your Elasticsearch cluster without downtime.

Upgrade the inactive masters first, then the active master, then the data nodes.

Refer to the Elasticsearch Documentation for a detailed description.

Steps:

  1. Disable shard allocation;
  2. Stop one Elasticsearch node and upgrade it to version 7.10.2;
  3. Start the upgraded node and wait for it to join the cluster;
  4. Repeat steps 2 and 3 for each remaining Elasticsearch node until all nodes are upgraded;
  5. Enable shard allocation.
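
Steps 1 and 5 can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the helper names are hypothetical, and the choice of "primaries" while upgrading (with null to restore the default afterwards) follows our reading of the Elasticsearch rolling upgrade documentation.

```python
import json
import urllib.request


def allocation_settings(enable: bool) -> dict:
    """Build the _cluster/settings body that disables or re-enables shard allocation."""
    # None serializes to null, which resets the setting to its default ("all").
    value = None if enable else "primaries"
    return {"persistent": {"cluster.routing.allocation.enable": value}}


def put_cluster_settings(base_url: str, body: dict) -> dict:
    """Send the body to PUT <base_url>/_cluster/settings and return the parsed reply."""
    request = urllib.request.Request(
        f"{base_url}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `put_cluster_settings("http://localhost:9200", allocation_settings(False))` would be run before stopping the first node, and the same call with `allocation_settings(True)` once every node is back in the cluster.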

Cluster Upgrade Method

The cluster upgrade method involves bringing down the entire Elasticsearch cluster, upgrading all nodes simultaneously, and then restarting the cluster. Here are the steps:

  1. Disable shard allocation;
  2. Stop all Elasticsearch nodes in the cluster;
  3. Upgrade all nodes to Elasticsearch version 7.10.2;
  4. Start all Elasticsearch nodes simultaneously;
  5. Re-enable shard allocation once all nodes have rejoined the cluster.

Reindex

After each major-version upgrade of Elasticsearch, you need to reindex your indices to ensure that the on-disk format is updated.

The reindexing process involves disabling writes, selecting the appropriate reindexing method, and executing the necessary steps.

  1. Reindexing creates a new copy of each index. Ensure you have enough disk space for each new index;
  2. To block writes to the existing indices while reindexing, execute the following command:
    curl -XPUT 'http://localhost:9200/_settings' -H 'Content-Type: application/json' -d '{
        "index.blocks.write": true
    }'
  3. To reindex:
    curl -XPOST 'http://localhost:9200/_reindex' -H 'Content-Type: application/json' -d '{
        "source": {
            "remote": {
                "host": "http://elasticsearch:9200"
            },
            "index": "source_index"
        },
        "dest": {
            "index": "destination_index"
        }
    }'
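
When there are many indices, the reindex bodies from step 3 can be generated rather than written by hand. The sketch below is illustrative only: `reindex_plan` and the "-reindexed" suffix are hypothetical names, and skipping indices that start with a dot (usually system indices) is our assumption.

```python
def reindex_plan(indices, suffix="-reindexed"):
    """Return one _reindex request body per user index, in name order."""
    plan = []
    for name in sorted(indices):
        if name.startswith("."):
            # Assumption: dot-prefixed indices are system indices and are left alone.
            continue
        plan.append({
            "source": {"index": name},
            "dest": {"index": name + suffix},
        })
    return plan
```

Each entry can then be sent as the body of a POST to _reindex, one index at a time, so that disk usage can be checked between runs.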

We’ve written a whole post about Elasticsearch reindexing if you’re interested in learning more.

How to Create Your OpenSearch Cluster

OpenSearch is packaged for Docker, Helm, RPM, Debian, and Ansible, along with a Tar file for both Linux and Windows.

Usually this is a packaged release on Linux under systemd with Java bundled. The choice of package type will usually follow from your in-house orchestration or managed cloud platform.

You should refer to the official documentation for more information on options.

You will usually want to match the architecture you are using with Elasticsearch, unless you are rescaling onto nodes with a different spec or moving to a managed OpenSearch service.

OpenSearch has an expanded set of security features to consider, provided by the OpenSearch Security Plugin and security analytics system.

You may already be using these if you have Platinum or Enterprise-level Elasticsearch licensing, or you may be moving to OpenSearch to access these features. If this is the case, it’s likely to be in the following areas:

  • SAML SSO;
  • LDAP/OpenID/Active Directory integration;
  • Token based authentication / JWT;
  • IP filtering;
  • On-disk encryption;
  • ML anomaly detection;
  • Audit Logging;
  • Field and Document level security.

With OpenSearch, encryption at rest is performed by the operating system, usually LUKS. The other features are provided out of the box.

How to Extract Your Source Data From Elasticsearch

There are two stages used in performing the migration to minimise downtime:

  1. Transfer the bulk of your data by taking a snapshot from Elasticsearch and restoring it on your OpenSearch cluster.
  2. If you will write to your source indices after the snapshot, use the Remote Reindexing API on OpenSearch with a selection query to transfer the last remaining delta. Note: This requires a timestamp field. Anything that writes to an index must set this value, at least for the delta. If not present it will need to be added.

We will discuss both methods and guide you through the necessary steps.

Option: On-Prem File System Snapshot

To prepare your data using a shared file system snapshot, follow these steps:

  1. Create an NFS directory share on the target for backups:
    sudo mkdir -p /var/nfs/snapshots
    sudo chmod -R 777 /var/nfs/snapshots

Note: In production, the actual directory and NFS permissions would be adjusted to suit your context.

  2. Add an entry to /etc/exports on the backup target for each one of your nodes:
    /var/nfs/snapshots   <client_ip>(rw,sync,no_subtree_check)
  3. Create a local mount point on each node:
    mkdir -p /var/nfs/snapshots
  4. Mount the NFS share in /etc/fstab on all nodes:
    # <file system>     <dir>           <type>   <options>       <dump>        <pass>
    <target_ip>:/var/nfs/snapshots /var/nfs/snapshots  nfs      defaults 0  0
  5. Restart the NFS service (or run sudo exportfs -ra to re-export the shares):
    sudo systemctl enable --now nfs-server
  6. Configure the snapshot repository in Elasticsearch. The location must match the mount point and must also be listed under path.repo in elasticsearch.yml on every node:
    curl -XPUT 'http://localhost:9200/_snapshot/fs_snapshot_repository' -H 'Content-Type: application/json' -d '{
     "type": "fs",
     "settings": {
      "location": "/var/nfs/snapshots"
     }
    }'
  7. Take a snapshot of your Elasticsearch data:
    curl -XPUT 'http://localhost:9200/_snapshot/fs_snapshot_repository/snapshot_1?wait_for_completion=true'

Option: AWS S3 Snapshot

For data preparation using an S3 snapshot, follow these steps:

  1. Set up an S3 bucket and configure the necessary permissions and roles.
  2. Configure the snapshot repository in Elasticsearch:
    curl -XPUT 'http://localhost:9200/_snapshot/s3_snapshot_repository' -H 'Content-Type: application/json' -d '{
     "type": "s3",
     "settings": {
       "bucket": "your-s3-bucket",
       "region": "your-region",
       "role_arn": "arn:aws:iam::your-account-id:role/your-role-name"
     }
    }'
  3. Take a snapshot of your Elasticsearch data:
    curl -XPUT 'http://localhost:9200/_snapshot/s3_snapshot_repository/snapshot_1?wait_for_completion=true'
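
Whichever repository type you use, it is worth confirming that the snapshot finished cleanly before restoring it elsewhere. Below is a minimal Python sketch that inspects the JSON returned by GET _snapshot/<repository>/<snapshot>; the function name `snapshot_is_complete` is hypothetical, and the check assumes the usual response shape with a "snapshots" list carrying "state" and "shards" fields.

```python
def snapshot_is_complete(response: dict, name: str) -> bool:
    """Return True if the named snapshot succeeded with no failed shards."""
    for snapshot in response.get("snapshots", []):
        if snapshot.get("snapshot") == name:
            shards = snapshot.get("shards", {})
            return (
                snapshot.get("state") == "SUCCESS"
                and shards.get("failed", 0) == 0
            )
    # The snapshot is not listed at all, so treat it as not complete.
    return False
```

A state of PARTIAL or a non-zero failed shard count would be a reason to retake the snapshot rather than carry on with the restore.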

How to Import Data into OpenSearch

The assumption is that you want to transfer your data with the least interruption to service and have both Elasticsearch and OpenSearch running in parallel while you transition the rest of your production.

First, the bulk of the data is transferred offline by restoring the snapshot taken in the previous section to OpenSearch.

Then a small remote reindexing task is run periodically to transfer data changes and keep OpenSearch up to date until you are ready to transition your production. It selects documents for transfer based on a timestamp: either the time of the last snapshot or the time the sync process was last run.

Bulk Transfer

Follow these steps to send the bulk of your data to OpenSearch:

  1. On your backup server, add an entry to /etc/exports for each one of your OpenSearch nodes:
    /var/nfs/snapshots   <client_ip>(rw,sync,no_subtree_check)
  2. Restart the NFS service (or run sudo exportfs -ra to re-export the shares):
    sudo systemctl enable --now nfs-server
  3. Create a local mount point on each OpenSearch node:
    mkdir -p /var/nfs/snapshots
  4. Mount the NFS share in /etc/fstab on all OpenSearch nodes:
    # <file system>     <dir>       <type>   <options>   <dump>        <pass>
    <backup_ip>:/var/nfs/snapshots /var/nfs/snapshots  nfs  defaults 0  0
    sudo mount /var/nfs/snapshots
  5. Create a snapshot repository in OpenSearch:

Option: Shared NFS Filesystem – when destination OpenSearch is on-prem:

    curl -XPUT 'http://<opensearch_ip>:9200/_snapshot/fs_snapshot_repository' -H 'Content-Type: application/json' -d '{
     "type": "fs",
     "settings": {
       "location": "/var/nfs/snapshots"
     }
    }'

Option: S3 – when destination OpenSearch is located in AWS:

If Elasticsearch is already in AWS then your source snapshot will most likely already be in Amazon S3.

If you want to copy your local on-prem snapshot to S3 you can install the AWS CLI tools and run:

    aws s3 cp /var/nfs/snapshots s3://your-s3-bucket/ --recursive

  6. Register the snapshot repository with OpenSearch in Dashboards;
  7. Restore the snapshot to OpenSearch using Dashboards.
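
If you would rather script the restore than click through Dashboards, the snapshot restore API can be used instead. The Python sketch below only builds the request path and body; the helper names are hypothetical, and excluding `.kibana*` indices by default is our assumption (system indices from the source cluster often clash with ones the new cluster has already created).

```python
def restore_path(repository: str, snapshot: str) -> str:
    """Path for the restore request, to be sent with POST."""
    return f"/_snapshot/{repository}/{snapshot}/_restore"


def restore_body(indices="*", exclude=(".kibana*",)):
    """Body for the restore request: which indices to bring over."""
    # A leading "-" removes matching indices from the selection.
    patterns = [indices] + [f"-{pattern}" for pattern in exclude]
    return {
        "indices": ",".join(patterns),
        # Keep the new cluster's own settings and templates.
        "include_global_state": False,
    }
```

With the repository and snapshot names used earlier, `restore_path("fs_snapshot_repository", "snapshot_1")` gives the path to POST `restore_body()` to on the OpenSearch cluster.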

How to Sync OpenSearch with Elasticsearch

Once you have the bulk data transferred to OpenSearch, you may want to sync data changes made in Elasticsearch through to OpenSearch while you migrate the rest of your production.

This method runs a selective remote reindexing on OpenSearch, identifying relevant changes using an indexing timestamp. Note that this timestamp field needs to exist in Elasticsearch and must be set by whatever upstream indexing process you use to write to Elasticsearch.

In Dashboards Dev Tools:

POST _reindex
{
    "source": {
        "remote": {
            "host": "http://<elasticsearch_ip>:9200",
            "username": "elastic",
            "password": "your_password"
        },
        "index": "my_index",
        "query": {
            "bool": {
                "filter": [
                    { "range": { "@timestamp": { "gte": "<last run or snapshot timestamp>" } } }
                ]
            }
        }
    },
    "dest": {
        "index": "dest"
    }
}
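
Because the sync is meant to run repeatedly, the request body can be generated from the time of the previous run. The following Python sketch is one way to do that under stated assumptions: `delta_reindex_body` is a hypothetical helper, the timestamp field defaults to `@timestamp`, and the destination index defaults to the source index name.

```python
from datetime import datetime, timezone


def delta_reindex_body(es_host, index, since, dest_index=None,
                       username=None, password=None, ts_field="@timestamp"):
    """Build a remote _reindex body that copies documents written at or after `since`."""
    remote = {"host": es_host}
    if username is not None:
        remote["username"] = username
        remote["password"] = password
    return {
        "source": {
            "remote": remote,
            "index": index,
            # `since` should be timezone-aware so the ISO string carries an offset.
            "query": {"range": {ts_field: {"gte": since.isoformat()}}},
        },
        "dest": {"index": dest_index or index},
    }
```

For instance, `delta_reindex_body("http://<elasticsearch_ip>:9200", "my_index", datetime(2023, 7, 4, tzinfo=timezone.utc))` selects everything written since midnight UTC on that date. Record the start time of each run and pass it as `since` on the next one; overlapping windows should be harmless because reindexing a document with the same ID overwrites the earlier copy by default.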

Note this works only when Elasticsearch and OpenSearch both reside on-prem.

Remote reindexing into AWS OpenSearch from an on-prem or self-managed Elasticsearch cluster depends on your particular networking environment. The AWS blog describes several options for doing this.

Refer to official documentation for details on options for tuning remote reindexing.

It is possible to perform the migration using just remote reindexing. The advantage is that upgrades through different versions of Elasticsearch can be avoided, so long as obsolete schema features (usually mapping types) have been removed prior to migration. This method is slow, so it is more appropriate when downtime is not an issue, the data is fairly static, and a longer initial bulk transfer can be tolerated.

Client and Tool Compatibility

When migrating to OpenSearch, it’s important to consider client and tool compatibility.

OpenSearch provides a backward compatibility flag that ensures compatibility with some Elasticsearch clients, notably older Logstash and Filebeat. It’s recommended to update your clients and tools to the latest versions to take advantage of new features and improvements.

PUT _cluster/settings
{
    "persistent" : {
        "compatibility.override_main_response_version" : true
    }
}

Migrate Your Apps

Once you have migrated your data and set up your OpenSearch cluster, it’s time to migrate your applications to use the OpenSearch client appropriate to the language(s) you are using.

In your staging environment, switch clients and application configurations to point to the OpenSearch cluster and test the functionality to ensure a successful migration.

If you are using other ELK stack products, you will need to migrate these as well:

  • Kibana is replaced with OpenSearch Dashboards;
  • Logstash is migrated to OpenSearch Data Prepper.

Note: OpenSearch has an output plugin for Logstash that could be used to either integrate the two systems and/or buy you time while you migrate other parts of your platform.

Post Migration

After completing the migration, it’s essential to perform post-migration tasks before retiring the Elasticsearch cluster. These include:

  • Functional confidence testing;
  • Load confirmation testing and performance tuning;
  • Configuration for scheduled Snapshot backups;
  • Index rollovers (for timeseries based indices);
  • Configure Monitoring and Alerting;
  • Migrate index lifecycle management from Elasticsearch ILM to OpenSearch ISM.
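
As a starting point for functional confidence testing, document counts can be compared index by index between the two clusters. The Python sketch below assumes you have already collected the counts (for example from GET <index>/_count on each side) into two dictionaries; `count_mismatches` is a hypothetical helper name.

```python
def count_mismatches(es_counts: dict, os_counts: dict) -> dict:
    """Return {index: (elasticsearch_count, opensearch_count)} where the two differ."""
    mismatches = {}
    for index, expected in es_counts.items():
        # None means the index is missing on the OpenSearch side.
        actual = os_counts.get(index)
        if actual != expected:
            mismatches[index] = (expected, actual)
    return mismatches
```

An empty result does not prove the data is identical, but a non-empty one is a quick signal that a restore or sync run needs another look before Elasticsearch is retired.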

By following the steps outlined in this tutorial, you can successfully migrate from Elasticsearch to OpenSearch, leveraging the new features and capabilities provided by OpenSearch while maintaining compatibility with your existing applications.
