Advanced Elasticsearch


Radu Gheorghe

A comprehensive 1-day Elasticsearch class taught by Radu Gheorghe, a seasoned Elasticsearch instructor and consultant from Sematext, author of Elasticsearch in Action, and frequent conference speaker. After taking this course you will understand all core Elasticsearch concepts – data, master, and client nodes and their differences, sharding, replication, mapping, search relevance scores, etc. You will be able to index data into Elasticsearch and retrieve it using the search and realtime get APIs. You will have a solid grasp of the underlying query parsing, analysis, tokenization, and various types of queries. Finally, you will learn about a number of different types of Elasticsearch aggregations. See the course outline below for more. Each section is followed by a lab with multiple hands-on exercises.

Who Should Attend

The course is designed for technical attendees with basic Elasticsearch experience. Attendees should be able to index data into Elasticsearch, run queries and aggregations, and work with mappings and analysis. Experience with Linux systems is not a must, but basic familiarity with running shell commands (e.g., using the curl command) will make the course more enjoyable. If you do not have prior Elasticsearch experience, we strongly suggest you consider attending our Intro to Elasticsearch class first.


Prerequisites: Intro to Elasticsearch, or pre-existing knowledge of the Elasticsearch concepts covered in Intro to Elasticsearch.

Why Attend

  • Experienced instructor: The class is taught by an experienced Solr and Elasticsearch consultant who has worked on well over 100 Solr and Elasticsearch deployments.
  • Objective: We don’t have a natural bias. We’ll point out not only the good, but also the bad and the ugly about the subject we teach.
  • Pragmatic: We work with Solr / Elasticsearch on a daily basis and share our hard-earned tips, tricks, and gotchas.
  • Hands-on: A lot of emphasis is put on the “lab” part of the class. Every section is followed by several hands-on exercises to help you learn better.
  • Comprehensive: We don’t give you just high-level overviews. We go deep, and if you want to go even deeper, we answer any questions you have.

Things to Remember

  • Arrive at least 20 minutes early to class and on time after each break.
  • Participants must bring their own laptop running OS X, Linux, or Windows to the workshop. Laptops should have the latest version of Java installed. You should be comfortable using a terminal / command line.
  • If you have any dietary restrictions, be sure to let us know at least a week prior to the training.

Course Outline


  1. Relevancy tuning
    • analysis: stopwords, synonyms, ngrams and shingles and their alternatives
    • using the Reindex API when mappings need to be changed
    • a deep look into BM25
    • multi-match query: choosing between best fields, most fields and cross fields modes
    • tweaking the score with the function score query
    • Lab
      • using the letter tokenizer as an option for URL matching
      • using ngrams to tolerate typos
      • using shingles to match compound words
      • implementing hashtag search via the word delimiter token filter
      • searching across multiple fields
      • boosting documents based on date and number of views
      • typo tolerance without using ngrams
      • reducing the impact of common words without using stopwords
  2. Advanced aggregations
    • finding trends and outliers with the significant terms aggregation
    • cheaper and more representative results with the sampler aggregation
    • field collapsing with the top hits aggregation
    • pipeline aggregations; moving averages
    • Lab
      • checking trends with the significant terms aggregation
      • showing the latest hit per category
      • using the moving average aggregation
  3. Working with relational data
    • arrays and objects; why they offer the best performance and when they fail
    • nested documents
    • nested queries; using inner hits
    • parent-child relations
    • denormalizing and application-side joins
    • deciding on which feature/technique to use
    • Lab
      • model a one-to-one relationship
      • model a query-heavy one-to-many relationship
      • model an update-heavy one-to-many relationship
      • model a many-to-many relationship
  4. Percolator
    • percolator basics
    • configuring mappings for percolation
    • using routing, filters, sorting and aggregations with the Percolator Query
    • Lab
      • using Percolator to trigger alerts
      • using metadata to filter and aggregate matching queries
  5. Suggesters
    • overview of types and requests
    • term vs. phrase suggester
    • how the phrase suggester collects candidates
    • using a shingle field to score candidate phrases
    • completion vs context suggesters
    • completion suggesters vs prefix queries
    • mapping for completion suggesters
    • weights and fuzzy matches
    • payloads for instant-search kind of autocomplete
    • Lab
      • using the term suggester to suggest single word corrections
      • using the phrase suggester against a shingle field for multi-word suggestions
      • using a separate index for autocomplete
      • using the _suggest endpoint instead of _search
      • boosting suggestions via static weights
      • adding fuzzy support for suggestions
      • filtering suggestions
      • using metadata for ranking suggestions (terms, location)
  6. Geo-spatial search
    • basics: geo-point and geo-shape types
    • how shape matching is done via geohashes
    • distance, distance range and bounding box queries
    • Lab
      • indexing geo-points and searching them via bounding box and polygon queries
      • filtering and aggregating geo-points by distance
      • matching a shape against a point
  7. Highlighting
    • how the default highlighter works
    • common highlighter options: size, order and number of fragments
    • postings highlighter: overhead, use-cases, mapping
    • fast vector highlighter: using term vectors for extra flexibility
    • Lab
      • selecting fields to highlight and disabling _source from the response
      • choosing highlight tags, number of fragments, their size and order
      • using the postings highlighter for long natural language fields
      • using the fast vector highlighter for multi-fields
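
As a taste of the relevancy tuning section, the BM25 scoring mentioned in the outline can be sketched in a few lines of Python. This is a minimal textbook illustration, not the course material and not the code Elasticsearch or Lucene actually runs. The function name `bm25_score`, the toy corpus, and the parameter values `k1=1.2` and `b=0.75` are assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_score(query_terms, doc, corpus, k1=1.2, b=0.75):
    """Score one tokenized document against a query with textbook BM25.

    Hypothetical helper for illustration only. k1 controls term-frequency
    saturation and b controls how strongly document length is normalized.
    """
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    term_freqs = Counter(doc)
    score = 0.0
    for term in query_terms:
        freq = term_freqs[term]
        if freq == 0:
            continue  # terms absent from the document contribute nothing
        docs_with_term = sum(1 for d in corpus if term in d)
        # Rare terms get a higher inverse document frequency.
        idf = math.log(1 + (n_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
        # Documents longer than average are penalized, scaled by b.
        length_norm = 1 - b + b * len(doc) / avg_len
        score += idf * freq * (k1 + 1) / (freq + k1 * length_norm)
    return score


# Toy corpus of pre-tokenized documents (made up for this example).
corpus = [
    ["elasticsearch", "in", "action"],
    ["search", "relevance", "with", "elasticsearch", "and", "bm25"],
    ["cooking", "for", "beginners"],
]
scores = [bm25_score(["elasticsearch", "bm25"], d, corpus) for d in corpus]
```

In this toy corpus the second document matches both query terms and outranks the first, while the third matches neither and scores zero. Raising `b` penalizes long documents more, and raising `k1` lets repeated terms keep adding to the score for longer.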