Intro to Elasticsearch, September 25-26, 2017

Days: September 25-26, 2017

Time: 9:00 AM to 1:00 PM EDT each day

Cost: $800 / participant



Radu Gheorghe

Comprehensive 1-day Elasticsearch class taught by Radu Gheorghe, a seasoned Elasticsearch instructor and consultant from Sematext, author of Elasticsearch in Action, and frequent conference speaker. After taking this course you will understand all core Elasticsearch concepts – data, master, and client nodes and their differences, sharding, replication, mapping, search relevance scores, etc. You will be able to index data into Elasticsearch and retrieve it using the search and realtime get APIs. You will have a solid grasp of the underlying query parsing, analysis, tokenization, and various types of queries. Finally, you will learn about a number of different types of Elasticsearch aggregations. See the course outline below for more. Each section is followed by a lab with multiple hands-on exercises.

Who Should Attend

The course is designed for technical attendees of any knowledge level. No prior Elasticsearch experience or knowledge is required. Experience with Linux systems is not a must, but a basic familiarity with running shell commands (e.g., using the curl command) will make the course more enjoyable.


Prerequisites

None, just a desire to learn!

Things To Remember

  • Arrive at least 20 minutes early to class and on time after each break.
  • Participants must bring their own laptop with OSX, Linux or Windows to the workshop. Laptops should have the latest version of Java installed. You should be comfortable using a terminal / command line.

  • If you have any dietary restrictions, be sure to let us know at least a week prior to the training.

What We Provide

For this training Sematext provides:
  • A digital copy of the training material
  • A VM with all configs, scripts, exercises, etc.
  • Breakfast, lunch, snacks, coffee, tea, juices, soft drinks, and water

Course Outline


  1. Basic flow of data in Elasticsearch
    • what is Elasticsearch and typical use-cases
    • shards and replicas; packaging
    • installation; configuration files
    • indexing; what is an index, type and ID
    • mappings; stored and indexed fields; _source and _all
    • analysis basics
    • realtime get
    • search; how searches are distributed to shards
    • ranking by TF/IDF and BM25
    • aggregations and doc values introduction
    • updates; versioning
    • deletes; introduction to Lucene segment merges
    • Lab
      • CRUD operations
      • query and filter
      • pagination
  2. Controlling how data is indexed and stored
    • mappings and mapping types
    • multi-field definitions
    • default mappings; dynamic mappings
    • texts, keywords, integers and other core types
    • date formats
    • predefined fields; when to store fields separately vs using _source
    • analyzers; using the Analyze API
    • char filters
    • tokenizers: standard vs whitespace
    • token filters: lowercase, stopwords, synonyms, ngrams and shingles
    • Lab
      • exact match vs full-text search
      • using the asciifolding token filter for better internationalization
      • using language analyzers to support stemming
  3. Searching through your data
    • selecting fields, source filtering and fielddata fields
    • sorting and pagination
    • search basics: term, range and bool queries
    • enable caching through the filter context
    • match query: configuring the analyzer, operator, common terms and fuzziness
    • query string and simple query string queries
    • Lab
      • using various ways of selecting fields
      • configure sorting and pagination
      • using a bool query to combine different match, range and term queries
      • boosting exact matches above stemmed ones
  4. Aggregations
    • relationships between queries and aggregations; post filter, global aggregations
    • general optimizations: avoid script fields, set result size to 0 so that results can be cached
    • metrics aggregations: stats, cardinality, percentiles
    • why terms, cardinality and percentiles are approximate
    • multi-bucket aggregations: terms, ranges and histograms
    • single-bucket aggregations and nesting; how nesting works
    • Lab
      • configure sizes of results, per-shard and overall buckets
      • computing the cardinality of a field
      • sorting buckets by results of sub-aggregations
      • optimizing terms aggregations by configuring collect mode
      • nest the sum and histogram aggregations
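
To give a feel for the kind of request the labs work with, here is a minimal sketch in Python of a search body that combines a bool query (match, range and term clauses) with a terms aggregation sorted by a sub-aggregation. It only builds and prints the JSON; nothing is sent to a cluster. The field names (title, price, category, author) and the function name are hypothetical and not taken from the course material.

```python
import json


def build_search_body(text, min_price, category):
    """Build a search request body as a plain dict (nothing is sent).

    The field names used here are made up for illustration.
    """
    return {
        # size 0 asks for aggregation results only, without any hits
        "size": 0,
        "query": {
            "bool": {
                # "must" clauses run in query context and affect the score
                "must": [{"match": {"title": text}}],
                # "filter" clauses run in filter context: no scoring
                "filter": [
                    {"range": {"price": {"gte": min_price}}},
                    {"term": {"category": category}},
                ],
            }
        },
        "aggs": {
            "by_author": {
                "terms": {
                    "field": "author",
                    "size": 5,
                    # sort buckets by the result of the sub-aggregation below
                    "order": {"avg_price": "desc"},
                },
                "aggs": {"avg_price": {"avg": {"field": "price"}}},
            }
        },
    }


if __name__ == "__main__":
    body = build_search_body("elasticsearch", 10, "books")
    print(json.dumps(body, indent=2))
```

The printed JSON could then be used as the body of a search request against an index's _search endpoint, for example with curl.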