
Avoid common mistakes when assigning Elasticsearch Mappings in your cluster

May 18, 2023


Elasticsearch is a search and analytics engine that allows for complex searches on large datasets of different types and formats. Elasticsearch mappings are the blueprints that define how data is indexed and searched to support these data-related features. Understanding how Elasticsearch mappings work is essential to an effective Elasticsearch deployment.

In this article, we’ll explore the key concepts of Elasticsearch mappings and common Elasticsearch mapping pitfalls to avoid.


Summary of key Elasticsearch mapping concepts

The table below summarizes the key Elasticsearch mapping concepts this article will explain in detail.

| Concept | Description |
|---|---|
| Data types | Elasticsearch data types are comparable to database data types. One of the significant differences with Elasticsearch data types is the variety of string types. |
| Field properties | Field properties such as index and doc_values add more detail to the data model. |
| Mapping types | Elasticsearch supports dynamic and static mapping types (and templates). Dynamic Elasticsearch mappings automatically detect data types. Static mappings define the data type in advance. |
| Analyzers | Elasticsearch analyzers are text-processing mechanisms that enable you to break text into tokens. |

What are Elasticsearch mappings?

Elasticsearch mappings are data structure definitions that determine how storage and searching work within an Elasticsearch index.

When you create an index, you can define the mappings for each field in the index, including its data type and how it is indexed. In the following sections, we’ll take a look at four key Elasticsearch mapping concepts:

  • Data types: Indicators of the type of data stored in a field.
  • Field properties: Per-field settings that control how a field is indexed and stored.
  • Mapping types: Define whether mappings are static or dynamic.
  • Analyzers: Text-processing mechanisms that define how text is broken into searchable tokens.

Data types

Elasticsearch has many of the same data types supported by other databases, including string, numeric, boolean, and date data types. However, Elasticsearch has more string types than typical databases. This variety makes it easy to pick the wrong type, which can lead to incorrect results and performance problems.

The sections below explain the different Elasticsearch text data types and when to use them.

Keyword

Keywords are strings that are only matchable exactly as entered. A keyword is not broken down into individual words based on delimiters such as spaces and does not have the full-text search features enabled. A typical use for keyword data types is matching unique identifiers for a document.

Text

Text fields have analyzers, which are the key to full-text search, as we’ll explore later. Text fields are analyzed and broken down into individual words. This enables us to use different kinds of queries for full-text search (e.g., simple_query_string, query_string, and match_phrase_prefix).

Completion

The completion data type is used for indexing terms that support fast autocomplete suggestions.

Common data type pitfall to avoid: Using the wrong data type

It is easy to choose the wrong data type for a use case. Data types determine matching results, so getting the data type right is essential. For example, you should use the keyword type only for fields that you query with exact matches.

To understand the implications of different data types, consider the data set below:

[
 {
   "id": 1,
   "category": "dress"
 },
 {
   "id": 2,
   "category": "long dress"
 },
 {
   "id": 3,
   "category": "dress for women"
 }
]

And this corresponding query:

GET my-index/_search
{
 "query": {
   "term": {
     "category": {
       "value": "dress"
     }
   }
 }
}

You can use the query for both keyword and text fields. However, the behavior is different. For example, consider these two potential outcomes for our data set and query:

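  • If the data type is text, the results will include all categories containing the word “dress” (all three documents), because the field value is split into individual terms at index time.
  • If the data type is keyword, the results will include only the category called “dress” (id=1 in the example dataset), because the whole value is stored as a single term.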
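
To make the difference concrete, here is a minimal Python sketch of the two outcomes. It is a simplified model, not how Elasticsearch is implemented: the standard analyzer is approximated by lowercasing and splitting on non-word characters, and the function names are our own.

```python
import re

DOCS = [
    {"id": 1, "category": "dress"},
    {"id": 2, "category": "long dress"},
    {"id": 3, "category": "dress for women"},
]

def index_terms(value, field_type):
    """Return the terms that would be stored in the index for one field value."""
    if field_type == "keyword":
        # keyword: the whole string is kept as a single term
        return [value]
    # text: rough stand-in for the standard analyzer (assumption, see lead-in)
    return [t for t in re.split(r"\W+", value.lower()) if t]

def term_query(docs, field, value, field_type):
    """Return ids of documents whose indexed terms contain `value` exactly."""
    return [d["id"] for d in docs if value in index_terms(d[field], field_type)]

print(term_query(DOCS, "category", "dress", "text"))     # [1, 2, 3]
print(term_query(DOCS, "category", "dress", "keyword"))  # [1]
```

The same model also shows why a term query for “long dress” finds nothing on a text field: the index holds the separate terms “long” and “dress”, never the full phrase.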

Another type that can be used for text is “completion”, which is mainly used for text auto-completion (search-as-you-type) features.

Example query: completion data type

GET test-completion/_search
{ 
  "suggest": {
    "item-suggest": {
      "prefix": "tele",
      "completion": {
        "field": "my_completion_field", 
        "size": 5           
      }
    }
  }
}

If the index contains suggestions such as “television” and “telephone”, you’ll see them in the response, because this query returns values that start with the prefix “tele”.

One thing to remember is that the completion type builds an in-memory data structure, which can be expensive on lots of data (i.e., you don’t want to use it for indexes that store logs).
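
For the suggest query to work, the field has to be mapped as completion before any data is indexed. The Python sketch below only builds and prints the request bodies; it does not talk to a cluster. The index and field names are taken from the query above, while the two sample suggestions are assumptions for illustration.

```python
import json

# Index body that declares the field used by the suggest query as `completion`.
create_index_body = {
    "mappings": {
        "properties": {
            "my_completion_field": {"type": "completion"}
        }
    }
}

def suggestion_doc(text):
    """Build a document whose completion field holds one suggestion input."""
    return {"my_completion_field": {"input": [text]}}

# Hypothetical sample data matching the "tele" prefix.
docs = [suggestion_doc(t) for t in ("television", "telephone")]

print("PUT test-completion")
print(json.dumps(create_index_body, indent=2))
for doc in docs:
    print("POST test-completion/_doc")
    print(json.dumps(doc))
```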

When choosing a data type for a field, consider both the kind of data that will be stored in it and the queries that will run on it. For instance, if a field will only store dates, the date data type is appropriate, because it enables Elasticsearch to perform date-based queries such as range queries with date math. Before indexing a date field as an epoch long, take some time to think about the use case and whether you’ll need date-specific queries in the future.
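
As a rough illustration, the Python sketch below builds a date mapping and a date-math range query as plain dictionaries. The field name created_at is hypothetical. The format shown accepts both ISO 8601 strings and epoch milliseconds, so the incoming values can stay numeric while the field still behaves as a date.

```python
import json

# Mapping sketch: `created_at` is a hypothetical field name.
mapping = {
    "mappings": {
        "properties": {
            "created_at": {
                "type": "date",
                "format": "strict_date_optional_time||epoch_millis",
            }
        }
    }
}

def last_days_query(field, days):
    """Range query using date math: documents from the last `days` days."""
    return {"query": {"range": {field: {"gte": f"now-{days}d/d", "lte": "now"}}}}

print(json.dumps(mapping, indent=2))
print(json.dumps(last_days_query("created_at", 7)))
```

Date math expressions such as now-7d/d are only available on date fields; with a long field you would have to compute the epoch boundaries yourself.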

Pro-tip: Consider your use case before indexing. Always thoroughly analyze your use case before indexing your data. You may not easily be able to change your data types later.

If you need to change types, you can reindex the data using the Reindex API that ships with Elasticsearch. Here is an example:

POST _reindex
{
 "source": {
   "index": "old-index",
   "_source": ["name_of_field"]
 },
 "dest": {
   "index": "new-index"
 }
}

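Keep in mind that this means you’re recreating the index. Create the new index with the corrected mappings before you reindex; otherwise, Elasticsearch will create it for you with dynamic mappings. The new index is going to be empty initially, and the API may take hours to move your documents, depending on the data volume.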
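
The order of operations matters here, so the following Python sketch lays out the requests as data. It only prints them and does not contact a cluster. The helper name reindex_plan and the keyword mapping for category are assumptions for illustration.

```python
import json

def reindex_plan(old_index, new_index, new_mappings):
    """Return the (method, path, body) requests for a mapping change, in order."""
    return [
        # 1. Create the destination first so it gets the corrected mappings.
        ("PUT", f"/{new_index}", {"mappings": new_mappings}),
        # 2. Copy the documents; wait_for_completion=false runs it as a background task.
        ("POST", "/_reindex?wait_for_completion=false",
         {"source": {"index": old_index}, "dest": {"index": new_index}}),
    ]

plan = reindex_plan("old-index", "new-index",
                    {"properties": {"category": {"type": "keyword"}}})
for method, path, body in plan:
    print(method, path)
    print(json.dumps(body, indent=1))
```

With wait_for_completion=false, the response contains a task ID that you can poll through the Tasks API instead of holding an HTTP connection open for hours.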

Field properties

In Elasticsearch, apart from the data type, you can also set properties on a field, as in this example:

{
 "mappings": {
   "properties": {
     "images": {
       "type": "text",
       "index": false
     },
     "title": {
       "type": "text",
       "index": false,
       "copy_to": "searchableField"
     },
   .....
   }
 }
}

Index

If a field is not indexed, it’s impossible to use this field to search. Unindexed fields are only available for storage and sometimes (based on type) aggregation purposes. An image path field that isn’t often searched is an excellent example of a use case for disabling indexing. Disabling indexing for unsearched fields can allow you to be more efficient with your hardware.

To reduce the number of indexed fields, compare your existing indexed fields to those used by your filters. Then, set the unused field’s index value to false and perform a reindex.

If you use a high volume of fields for text searches, it is common practice to use “copy_to” to copy them into a single field and index only that field. Keep the caveats in mind, though: for example, you can no longer boost specific fields, such as weighting matches in the item title at x2 while keeping the category at the default x1.
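
A small Python sketch of this pattern follows. It generates the mapping body for a list of source fields and only prints it. The helper name copy_to_mapping and the description field are hypothetical; searchableField and title come from the example above.

```python
import json

def copy_to_mapping(source_fields, target):
    """Build mapping properties where only `target` is indexed for search."""
    props = {
        name: {"type": "text", "index": False, "copy_to": target}
        for name in source_fields
    }
    props[target] = {"type": "text"}  # the single indexed, searchable field
    return {"mappings": {"properties": props}}

body = copy_to_mapping(["title", "description"], "searchableField")
print(json.dumps(body, indent=2))
```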

doc_values

Elasticsearch doc_values enable aggregations and sorting. They store the values of a field in a column-oriented data structure, which is helpful for aggregations. They are enabled by default for all aggregation-eligible fields.
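
To see why a column-oriented layout helps, consider this Python sketch. It is a conceptual model only, with made-up sample data; the real on-disk format is far more compact.

```python
from collections import Counter

# Row-oriented view: one dict per document (hypothetical sample data).
docs = [
    {"id": 1, "category": "dress", "price": 40},
    {"id": 2, "category": "shoes", "price": 60},
    {"id": 3, "category": "dress", "price": 55},
]

def to_columns(rows):
    """Pivot documents into one list of values per field (column-oriented)."""
    columns = {}
    for row in rows:
        for field, value in row.items():
            columns.setdefault(field, []).append(value)
    return columns

columns = to_columns(docs)

# A terms-style aggregation only needs to scan one column.
category_counts = Counter(columns["category"])

# Sorting by price likewise reads the price column, then looks up the ids.
order = sorted(range(len(docs)), key=lambda i: columns["price"][i])
ids_by_price = [columns["id"][i] for i in order]

print(dict(category_counts))  # {'dress': 2, 'shoes': 1}
print(ids_by_price)           # [1, 3, 2]
```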

Mapping types

Elasticsearch supports two types of mappings: dynamic and static mappings.

Dynamic mappings

A dynamic mapping allows Elasticsearch to detect field data types automatically as documents are added to the index. In some cases, dynamic mappings are useful. However, dynamic mappings can often create problems.

One potential problem is that Elasticsearch cannot know whether a string you provide should be a text field or a keyword. By default, it maps new string fields as text with a keyword sub-field, which may not match how you intend to query them and can cause some of your queries to return unexpected results. Dynamic mapping can also create fields you did not intend to have, since anyone can change the mapping simply by indexing a document that includes a new field.

Since dynamic mapping is the default for any new index, you have to disable it explicitly, as in this mapping:

{
 "my-index": {
   "mappings": {
     "dynamic": "false",
     "properties": {
       "name": {
         "type": "text"
       },
       "blood_group": {
         "type": "keyword"
       },
       ....
     }
   }
 }
}
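
The dynamic parameter accepts more than one value, and the differences are easy to mix up. The Python sketch below models three of them (true, false, and strict) in a deliberately simplified way; apply_dynamic is our own helper, not an Elasticsearch API.

```python
def apply_dynamic(mapping, doc):
    """Return the set of fields that end up searchable for `doc`.

    Simplified model of the `dynamic` mapping parameter:
    true   -> unknown fields are added to the mapping,
    false  -> unknown fields are ignored (kept in _source, not searchable),
    strict -> the document is rejected.
    """
    dynamic = str(mapping.get("dynamic", "true")).lower()
    known = mapping["properties"]
    unknown = [f for f in doc if f not in known]
    if unknown and dynamic == "strict":
        raise ValueError(f"strict mapping: unknown fields {unknown}")
    if dynamic == "true":
        for field in unknown:
            known[field] = {"type": "<detected>"}  # placeholder for the detected type
    return {f for f in doc if f in known}

mapping = {
    "dynamic": "false",
    "properties": {"name": {"type": "text"}, "blood_group": {"type": "keyword"}},
}
print(apply_dynamic(mapping, {"name": "Ann", "nickname": "Annie"}))  # {'name'}
```

With "dynamic": "false", the document is still accepted and nickname is still returned in _source; it just cannot be searched or aggregated on.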

Static mappings

Unlike dynamic mappings, Elasticsearch static mappings (called explicit mappings in the Elasticsearch documentation) require you to define the mapping for each field in advance. This is more work upfront, but it ensures that the data is indexed and searched the way you intend.

Dynamic templates

Elasticsearch dynamic templates allow you to define custom mappings for fields that are not explicitly mapped in your index. This is especially useful when dealing with dynamic and evolving data structures or when you want to apply certain mapping rules to fields that match a specific pattern.

To create dynamic templates, you need to define them in the index template or update the mappings of an existing index. Dynamic templates consist of a name, a match rule, and the mapping settings to be applied.

Here is an example of a dynamic template:

PUT my_index
{
 "mappings": {
   "dynamic_templates": [
     {
       "full_text": {
         "match_mapping_type": "string",
         "match": "*_text",
         "mapping": {
           "type": "text",
           "analyzer": "english"
         }
       }
     },
     {
       "long_numbers": {
         "match_mapping_type": "long",
         "match": "*_long",
         "mapping": {
           "type": "long"
         }
       }
     }
   ]
 }
}

In the example above, we have defined two dynamic templates:

full_text: This template matches any field with a name that ends in _text and has a string data type. The mapping settings for this template configure the field as a text field with the english analyzer.

long_numbers: This template matches any field with a name that ends in _long and has a long data type. The mapping settings for this template configure the field as a long field.

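When a document with new fields is indexed, Elasticsearch checks each new field against the templates in order, comparing its detected data type and its name with the conditions in each template, and applies the first template that matches. If a field does not match any dynamic template, Elasticsearch applies its default dynamic mapping rules.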
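
The matching logic can be sketched in a few lines of Python. This is a simplified model that covers only the match_mapping_type and match conditions used in the example, with type detection limited to strings and integers; resolve and detected_type are our own helper names.

```python
from fnmatch import fnmatchcase

TEMPLATES = [
    {"full_text": {"match_mapping_type": "string", "match": "*_text",
                   "mapping": {"type": "text", "analyzer": "english"}}},
    {"long_numbers": {"match_mapping_type": "long", "match": "*_long",
                      "mapping": {"type": "long"}}},
]

def detected_type(value):
    """Very rough JSON type detection (strings and integers only)."""
    if isinstance(value, str):
        return "string"
    if isinstance(value, int) and not isinstance(value, bool):
        return "long"
    return "other"

def resolve(field, value, templates=TEMPLATES):
    """Return (template_name, mapping) for the first matching template."""
    for template in templates:
        (name, rule), = template.items()
        if rule.get("match_mapping_type") not in (None, detected_type(value)):
            continue
        # fnmatchcase is used as a stand-in for the simple * wildcard
        if not fnmatchcase(field, rule.get("match", "*")):
            continue
        return name, rule["mapping"]
    return None, None  # Elasticsearch would fall back to its default dynamic rules

print(resolve("body_text", "hello world"))  # ('full_text', {...})
print(resolve("count_long", 42))            # ('long_numbers', {'type': 'long'})
print(resolve("title", "hello"))            # (None, None)
```

Note that both conditions have to hold: a field named count_long that arrives as the string "42" matches neither template in this model.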

Many prefer dynamic templates because they regard them as a middle ground between static and dynamic mappings.

Analyzers

Analyzers are text-processing mechanisms that enable you to introduce rules for processing strings. These rules are part of the field-level mapping settings. Therefore, changing them requires reindexing the data. Each field can have an analyzer configured, which determines how the text is broken down into tokens that can then be searched. The same process may also be applied to the query string.

Analyzers consist of three kinds of components: zero or more character filters, exactly one tokenizer, and zero or more token filters. Character filters preprocess the raw text before it is tokenized. Tokenizers are responsible for breaking down the input text into individual tokens, while token filters are used to modify or remove tokens based on specific criteria. Examples of token filters include lowercase filters, stop word filters, and word delimiter filters. These components enable a wide range of text-processing features to improve the search experience.

You can define a custom analyzer as in this example:

{
 "my_index": {
   "settings": {
     "analysis": {
       "analyzer": {
         "default": {
           "filter": [
             "french_lowercase",
             "french_asciifolding"
           ],
           "char_filter": [
             "french_char_filter"
           ],
           "type": "custom",
           "tokenizer": "standard"
         }
       }
     }
   },
   …..
 }
}

The benefit of using analyzers is that the data can be stored and retrieved as is, while remaining searchable according to the rules and filters you define. For example, a search for “francais” can return “français” without you having to manipulate the word and store it in every possible form.
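
The Python sketch below imitates such a pipeline to show how the “francais” example works. It is an approximation under stated assumptions: Unicode normalization stands in for a character filter, a regular expression stands in for the standard tokenizer, and ascii_fold and analyze are our own helper names.

```python
import re
import unicodedata

def ascii_fold(token):
    """Strip diacritics, roughly like an ASCII-folding token filter."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def analyze(text):
    """Run a character filter, a tokenizer, and two token filters in order."""
    text = unicodedata.normalize("NFC", text)   # character filter: normalize input
    tokens = re.findall(r"\w+", text)           # tokenizer: split into words
    tokens = [t.lower() for t in tokens]        # token filter: lowercase
    return [ascii_fold(t) for t in tokens]      # token filter: fold accents

stored_value = "Cours de Fran\u00e7ais"         # the document keeps the accent
indexed = analyze(stored_value)
query = analyze("francais")

print(indexed)                             # ['cours', 'de', 'francais']
print(all(t in indexed for t in query))    # True
```

Only the tokens are folded. The stored value is untouched, which is why the original spelling still comes back in the search results.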

Common analyzer pitfalls: Incorrect selections and misconfigurations

Two common analyzer pitfalls are:

  • Using the wrong analyzer: Elasticsearch provides many built-in analyzers for different languages and use cases. For example, using a French language analyzer for the Turkish language can cause misinterpretation and false matches for some of the characters that are different between the two languages.
  • Failing to configure the analyzer: Elasticsearch provides many options for configuring analyzers, such as stop words, stemming rules, and synonyms. Make sure that the default values for these options match your product needs. In particular, check the default stop word list for words that your product may actually need to search on. The same goes for any configuration you set yourself. Otherwise, you could end up with poor-quality results.
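
The stop word pitfall is easy to reproduce. In the Python sketch below, the stop word set is a small subset of the default English list, and the analyze function is a simplified stand-in for an analyzer, not Elasticsearch code.

```python
import re

# Subset of the default English stop word list, for illustration only.
STOP_WORDS = frozenset({"a", "an", "and", "be", "is", "not", "of", "or", "the", "to"})

def analyze(text, stop_words=frozenset()):
    """Lowercase and tokenize, then drop any token found in `stop_words`."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in stop_words]

print(analyze("To be or not to be", STOP_WORDS))  # []
print(analyze("To be or not to be"))              # ['to', 'be', 'or', 'not', 'to', 'be']
```

With the stop word filter enabled, the whole phrase disappears at analysis time, so a product that sells books or films could never match this title.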

Key takeaways

When working with Elasticsearch mappings, it is important to follow best practices to ensure optimal search and indexing performance. Here are some recommendations to keep in mind:

  • Make sure mapping creation is done correctly: When creating mappings, define each field’s data type and other properties correctly. Elasticsearch can only index and search the data correctly if the mapping is correct.
  • Use the Reindex API: If you need to change the mapping of an index, use the Reindex API to create a new index with the correct mapping and copy the data from the old index to the new one.
  • Only index the fields that you use: Indexing unnecessary fields can slow down indexing and increase the index size. Only index fields that are necessary for searching and filtering.

Understanding Elasticsearch mappings is crucial to ensure that data is indexed and queried efficiently. By selecting the appropriate data types, field properties, and analyzers, and avoiding common mistakes, it is possible to optimize Elasticsearch’s performance and improve search results.
