Elasticsearch for Developers

June 13-14 (Mon & Tue) – 9:00 am to 5:00 pm

This training takes place at New Horizons Computer Learning Center in Midtown Manhattan. Once you register, we will send you a confirmation email with the information you need to attend.

Cost: $1,200 “early bird rate” (valid through June 1) and $1,500 afterwards. There’s also a 50% discount on a 2nd seat (limit of 1 discounted seat per full-price seat).


For those of you interested in comprehensive Elasticsearch and ELK Stack training taught by an expert from Sematext who knows it inside and out, we’re running a hands-on training workshop in New York City on June 13-14. In two days of training, Radu will:
  1. Bring Elasticsearch novices to the level where they are comfortable taking Elasticsearch to production
  2. Give experienced Elasticsearch users proven and practical advice based on years of experience designing, tuning, and operating numerous Elasticsearch clusters to help with their most advanced and pressing issues

This two-day workshop will be taught by Radu Gheorghe, a Sematext engineer and author of Elasticsearch books.

Audience / Level / Pre-requisites


Developers and DevOps engineers who want to configure, tune and manage Elasticsearch and the ELK Stack at scale.


Attendees are encouraged to arrive at least 20 minutes early to class and on time after each break.

Important: All participants must bring their own laptop to the workshop, running Mac, Linux or Windows with the latest version of Java installed. You should be comfortable using a terminal or command line.

Course Structure

Each section is followed by a lab with multiple hands-on exercises.

What We Provide

For this training Sematext provides:
  • A digital copy of the training materials, available on the portal 48 hours prior to the training course. Please read our Public Training Agreement.
  • Refreshments. This usually includes coffee, tea, juices, soft drinks, and water to keep you hydrated.
  • Snacks. This usually includes croissants, bagels, danishes, or other pastries.

Course Outline


  1. Basic flow of data in Elasticsearch
    • what is Elasticsearch and typical use-cases
    • shards and replicas
    • packaging; installation; configuration files
    • indexing; what is an index, type and ID
    • mappings; stored and indexed fields; _source and _all
    • analysis basics
    • realtime get
    • search; how searches are distributed to shards
    • ranking by TF/IDF and BM25
    • aggregations and doc values introduction
    • updates; versioning
    • deletes; introduction to Lucene segment merges
    • Lab
      • CRUD operations
      • query and filter
      • pagination
  2. Controlling how data is indexed and stored
    • mappings and mapping types
    • multi-field definitions
    • default mappings; dynamic mappings
    • texts, keywords, integers and other core types
    • predefined fields; when to store fields separately vs using _source
    • analyzers; using the Analyze API
    • char filters
    • tokenizers: standard vs whitespace
    • token filters: lowercase, stopwords, synonyms, ngrams and shingles
    • using the Reindex API when mappings need to be changed
    • Lab
      • exact match vs full-text search
      • using the asciifolding token filter for better internationalization
      • using language analyzers to support stemming
      • using the letter tokenizer as an option for URL matching
      • using ngrams to tolerate typos
      • using shingles to match compound words
      • implementing hashtag search via the word delimiter token filter
  3. Searching through your data
    • a deeper look into BM25
    • selecting fields, source filtering and fielddata fields
    • sorting and pagination
    • search basics: term, range and bool queries
    • enable caching through the filter context
    • match query: configuring the analyzer, operator, common terms and fuzziness
    • multi-match query: choosing between best fields, most fields and cross fields modes
    • query string and simple query string queries
    • tweaking the score with the function score query
    • Lab
      • using various ways of selecting fields
      • configuring sorting and pagination
      • using a bool query to combine different match, range and term queries
      • boosting exact matches above stemmed ones
      • searching across multiple fields
      • boosting documents based on date and number of views
      • typo tolerance without using ngrams
      • reducing the impact of common words without using stopwords
  4. Aggregations
    • relationships between queries and aggregations; post filter, global aggregations
    • general optimizations: avoid script fields; set the result size to 0 so results can be cached
    • metrics aggregations: stats, cardinality, percentiles
    • multi-bucket aggregations: terms, significant terms, sampler, ranges and histograms
    • single-bucket aggregations and nesting; how nesting works
    • filter and top hits aggregations
    • pipeline aggregations; moving averages
    • Lab
      • configuring sizes of results, per-shard and overall buckets
      • using the global aggregation and post filters
      • using the filters aggregation
      • computing the cardinality of a field
      • sorting buckets by results of sub-aggregations
      • optimizing terms aggregations by configuring collect mode
      • using the sum, histogram, average bucket and significant terms aggregations
  5. Working with relational data
    • arrays and objects
    • nested documents
    • parent-child relations
    • denormalizing and application-side joins
    • Lab
      • model a one-to-one relationship
      • model a query-heavy one-to-many relationship
      • model an update-heavy one-to-many relationship
      • model a many-to-many relationship
  6. Beyond keyword search
    • percolator basics
    • configuring mappings for percolation
    • using routing, filters, sorting and aggregations with the Percolator Query
    • suggesters: overview of types and requests
    • term vs phrase suggester
    • how the phrase suggester collects candidates
    • using a shingle field to score candidate phrases
    • completion vs context suggesters
    • completion suggesters vs prefix queries
    • mapping for completion suggesters
    • weights and fuzzy matches
    • payloads for instant-search style autocomplete
    • geo-spatial search basics: geo-point and geo-shape types
    • using geohashes for geo-point filtering and aggregations
    • how shape matching is done via geohashes
    • distance, distance range and bounding box queries
    • highlighting: how the default highlighter works
    • common highlighter options: size, order and number of fragments
    • postings highlighter: overhead, use-cases, mapping
    • fast vector highlighter: using term vectors for extra flexibility
    • Lab
      • indexing geo-points and searching them via bounding box and polygon queries
      • filtering and aggregating geo-points by distance
      • filtering and aggregating by geohash
      • matching a shape against a point
      • using Percolator to trigger alerts
      • using metadata to filter and aggregate matching queries
      • using the term suggester to suggest single word corrections
      • using the phrase suggester against a shingle field for multi-word suggestions
      • using a separate index for autocomplete
      • using the _suggest endpoint instead of _search
      • boosting suggestions via static weights
      • adding fuzzy support for suggestions
      • filtering suggestions
      • using metadata for ranking suggestions (terms, location)
      • selecting fields to highlight and disabling _source from the response
      • choosing highlight tags, number of fragments, their size and order
      • using the postings highlighter for long natural language fields
      • using the fast vector highlighter for multi-fields
  7. Performance and Scaling
    • bulk, multiget and multisearch APIs
    • JVM vs OS caches
    • field data vs doc values
    • how often to commit: translog, index buffer and refresh interval
    • how data and queries are distributed: routing, search type and shard preference
    • using scroll for deep paging
    • choosing the number of shards and replicas
    • network settings
    • locking all memory on startup
    • node roles; minimum master nodes
    • time-based indices and aliases
    • rolling indices by size
    • tribe node: how it works and when to use it
    • Lab
      • using the Bulk API
      • using routing
      • getting global document frequencies for more accurate scoring
      • using scroll to go over all documents matching a query
      • adjusting the refresh interval
      • starting a two-node cluster; adjusting index settings
      • using a tribe node to search in two clusters