Advanced Elasticsearch

Go beyond Keyword Search

If you’re a developer looking to do more, in this course you will learn to index data into Elasticsearch and retrieve it using the search and realtime get APIs. You will gain a solid grasp of the underlying query parsing, analysis, tokenization, and various types of queries.

Your trainer is an active Elasticsearch consultant who has worked with clients from 20+ different industries and is the author of Elasticsearch in Action.

Here are some problems Radu Gheorghe, your Elasticsearch trainer, solved for Sematext clients recently:

  • Improved search relevancy using Learning to Rank
  • Optimized multiple petabyte-scale clusters, some with up to 400 nodes
  • Designed Elasticsearch index and cluster architecture for dozens of clients
  • Optimized log ingestion pipelines to parse and enrich 100K+ events/second
  • Helped clients reduce production Elasticsearch and ingestion pipeline costs by as much as 10x

A word from Radu Gheorghe

“Attendees come in highly motivated, making the class feel more ‘alive’ than I expected. They constantly look for takeaways to improve their setup, from tweaking a boost to changing the sharding strategy. Their use cases are very diverse, too, so we end up covering a lot of material.”

Radu Gheorghe, Sematext Elasticsearch Training Instructor

8-hour online class available upon request

Looking for an extended knowledge-based introduction to Elasticsearch training? You’ve come to the right place.

Why attend?

  • Small, interactive, instructor-led classes
  • Lots of hands-on exercises
  • Customized learning experience
  • More flexible – no need to travel
  • Certificate of Completion included

Who should attend?

This Elasticsearch course is designed for technical attendees with basic Elasticsearch experience. Attendees should be able to index data into Elasticsearch, run queries and aggregations, and work with mappings and analysis.

Experience with Linux systems is not a must, but basic familiarity with running shell commands (e.g., using the curl command) will make the course more enjoyable. If you do not have prior Elasticsearch experience, we strongly suggest you consider attending our Intro to Elasticsearch class first.

What attendees say

Sematext was an ideal training partner for Parse.ly. We had just recently adopted Elasticsearch on a new project, and they gave us two days of solid training that was tailored to our team’s needs. The material was built atop strong foundations and moved quickly into advanced areas around querying, Lucene internals, and cluster performance. It was clear that it was all informed by real-world experience operating these systems at scale.

Andrew Montalenti, CTO/Founder – Parse.ly

Course Outline

Relevancy tuning
  • Analysis: stopwords, synonyms, ngrams and shingles and their alternatives
  • Using the Reindex API when mappings need to be changed
  • A deep look into BM25
  • Multi-match query: choosing between best fields, most fields and cross fields modes
  • Tweaking the score with the function score query
  • Lab
    • Using the letter tokenizer as an option for URL matching
    • Using ngrams to tolerate typos
    • Using shingles to match compound words
    • Implement hashtag search via the word delimiter token filter
    • Searching across multiple fields
    • Boosting documents based on date and number of views
    • Typo tolerance without using ngrams
    • Reducing the impact of common words without using stopwords
Advanced aggregations
  • Finding trends and outliers with the significant terms aggregation
  • Cheaper and more representative results with the sampler aggregation
  • Field collapsing with the top hits aggregation
  • Pipeline aggregations; moving averages
  • Lab
    • Checking trends with the significant terms aggregation
    • Show the latest hit per category
    • Using the moving average aggregation
Working with relational data
  • Arrays and objects; why they offer the best performance and when they fail
  • Nested documents
  • Nested queries; using inner hits
  • Parent-child relations
  • Denormalizing and application-side joins
  • Deciding on which feature/technique to use
  • Lab
    • Model a one-to-one relationship
    • Model a query-heavy one-to-many relationship
    • Model an update-heavy one-to-many relationship
    • Model a many-to-many relationship
Percolator
  • Percolator basics
  • Configuring mappings for percolation
  • Using routing, filters, sorting and aggregations with the Percolator Query
  • Lab
    • Using Percolator to trigger alerts
    • Using metadata to filter and aggregate matching queries
Suggesters
  • Overview of types and requests
  • Term vs. phrase suggester
  • How the phrase suggester collects candidates
  • Using a shingle field to score candidate phrases
  • Completion vs context suggesters
  • Completion suggesters vs prefix queries
  • Mapping for completion suggesters
  • Weights and fuzzy matches
  • Payloads for instant-search-style autocomplete
  • Lab
    • Using the term suggester to suggest single word corrections
    • Using the phrase suggester against a shingle field for multi-word suggestions
    • Using a separate index for autocomplete
    • Using the _suggest endpoint instead of _search
    • Boosting suggestions via static weights
    • Add fuzzy support for suggestions
    • Filtering suggestions
    • Using metadata for ranking suggestions (terms, location)
Geo-spatial search
  • Basics: geo-point and geo-shape types
  • How shape matching is done via geohashes
  • Distance, distance range and bounding box queries
  • Lab
    • Indexing geo-points and searching them via bounding box and polygon queries
    • Filtering and aggregating geo-points by distance
    • Matching a shape against a point
Highlighting
  • How the default highlighter works
  • Common highlighter options: size, order and number of fragments
  • Postings highlighter: overhead, use-cases, mapping
  • Fast vector highlighter: using term vectors for extra flexibility
  • Lab
    • Selecting fields to highlight and disabling _source from the response
    • Choosing highlight tags, number of fragments, their size and order
    • Using the postings highlighter for long natural language fields
    • Using the fast vector highlighter for multi-fields

Main Topics

  • Tuning for Relevancy
  • Aggregations 202: Significant Terms, Top Hits, Pipeline Aggregations
  • Relational Data in Elasticsearch
  • Percolator
  • Did-You-Mean and Autocomplete Suggesters
  • Highlighting
  • Geo Search and Aggregations

Elasticsearch Training

Course key takeaways

After taking this course you will know:

  • How to implement product search like a pro: from type-ahead and highlighting to boosting recent products and promotions
  • How to do advanced data analysis: from showing the categories most relevant to a search to exploring trends
  • How to implement a large-scale notification or tagging system using Percolator
  • What kind of joins you can do in Elasticsearch and how

Things to remember

Participants must use their own computer with OSX, Linux, or Windows, with a recent version of Java installed.

Participants should be comfortable using a terminal/command line.

Sematext provides:
  • A digital copy of the training material
  • A VM with all configs, scripts, exercises, etc.

Want to master your Elasticsearch use case faster?

Pick from a wide range of short (2h), use-case-focused classes to fit your exact needs.

  • Online
  • 2 hours
  • Use-case focused
  • Instructor-led

Elasticsearch Fundamentals

Understand how Elasticsearch works and get started with setting it up for either search or log aggregation.

Kibana and Logstash Fundamentals

Get started with Logstash and Kibana, so you can build an ELK stack: from parsing logs to building dashboards.

Elasticsearch Scaling 101

Learn how nodes and shards work, so you can scale your Elasticsearch cluster from PoC to as much as your hardware can hold.

Elasticsearch Scaling 202

Learn index and cluster architectures that make clusters scale, from time- and size-based indices, to cross-cluster search.

Elasticsearch Tuning 101

From caches and refreshes to routing, learn about the most important knobs that influence both indexing and search performance.

Elasticsearch Tuning 202

From hardware choices, to garbage collection, merge policy and thread pool tuning – learn how to squeeze even more performance from your cluster.

Monitoring Elasticsearch

Bridge the gap between having proper Elasticsearch monitoring in place and understanding how to diagnose and troubleshoot the cluster.

Administering Elasticsearch

Fully understand Elasticsearch’s management capabilities: from pre-configuring index settings and mappings to how to safely perform upgrades.

Need On-Site or Remote Training?

Get in touch with us
