Intro to Elasticsearch, December 4-5, 2017

Days: December 4-5, 2017

Time: 9:00 AM to 1:00 PM ET each day

Cost: $800 per participant

Overview

This comprehensive two-day Elasticsearch online class (two 4-hour sessions) is taught by Radu Gheorghe, a seasoned Elasticsearch instructor and consultant at Sematext, author of “Elasticsearch in Action”, and a frequent conference speaker. The training is held online from 9:00 AM to 1:00 PM ET each day.

After taking this course you will:
  • understand all core Elasticsearch concepts – data, master, and client nodes and their differences, sharding, replication, mapping, search relevance scores, etc.
  • be able to index data into Elasticsearch and retrieve it using search and realtime get APIs
  • have a solid grasp of the underlying query parsing, analysis, tokenization, and various types of queries
  • be familiar with a number of different types of Elasticsearch aggregations
Each section is followed by a lab with multiple hands-on exercises. See the course outline below for more.

Who Should Attend

This Elasticsearch online course is designed for technical attendees of any knowledge level. No prior Elasticsearch experience or knowledge is required. Experience with Linux systems is not a must, but basic familiarity with running shell commands (e.g., using the curl command) will make the course more enjoyable.

Prerequisites

None, just a desire to learn!

Why Attend

The virtual Elasticsearch training gives you and your team the skills needed to use Elasticsearch successfully, improving your workflow and increasing efficiency. Further benefits:
  • a customized learning experience
  • same high-quality instruction as our public or private Elasticsearch classes
  • more affordable than public training
  • more flexible – no need to travel

Things to Remember

For the online training, all participants must use their own computer running macOS, Linux, or Windows, with the latest version of Java installed. Participants should be comfortable using a terminal / command line. Sematext provides:
  • a digital copy of the training material
  • a VM with all configs, scripts, exercises, etc.

Course Outline

Modules

  1. Basic flow of data in Elasticsearch
    • what is Elasticsearch and typical use-cases
    • shards and replicas; packaging
    • installation; configuration files
    • indexing; what is an index, type and ID
    • mappings; stored and indexed fields; _source and _all
    • analysis basics
    • realtime get
    • search; how searches are distributed to shards
    • ranking by TF/IDF and BM25
    • aggregations and doc values introduction
    • updates; versioning
    • deletes; introduction to Lucene segment merges
    • Lab
      • CRUD operations
      • query and filter
      • pagination
  2. Controlling how data is indexed and stored
    • mappings and mapping types
    • multi-field definitions
    • default mappings; dynamic mappings
    • texts, keywords, integers and other core types
    • date formats
    • predefined fields; when to store fields separately vs using _source
    • analyzers; using the Analyze API
    • char filters
    • tokenizers: standard vs whitespace
    • token filters: lowercase, stopwords, synonyms, ngrams and shingles
    • Lab
      • exact match vs full-text search
      • using the asciifolding token filter for better internationalization
      • using language analyzers to support stemming
  3. Searching through your data
    • selecting fields, source filtering and fielddata fields
    • sorting and pagination
    • search basics: term, range and bool queries
    • enable caching through the filter context
    • match query: configuring the analyzer, operator, common terms and fuzziness
    • query string and simple query string queries
    • Lab
      • using various ways of selecting fields
      • configure sorting and pagination
      • using a bool query to combine different match, range and term queries
      • boosting exact matches above stemmed ones
  4. Aggregations
    • relationships between queries and aggregations; post filter, global aggregations
    • general optimizations: avoid script fields, set result size to 0 so results can be cached
    • metrics aggregations: stats, cardinality, percentiles
    • why terms, cardinality and percentiles are approximate
    • multi-bucket aggregations: terms, ranges and histograms
    • single-bucket aggregations and nesting; how nesting works
    • Lab
      • configure sizes of results, per-shard and overall buckets
      • computing the cardinality of a field
      • sorting buckets by results of sub-aggregations
      • optimizing terms aggregations by configuring collect mode
      • nest the sum and histogram aggregations
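
To give a feel for the kind of request the labs in modules 3 and 4 work toward, here is a minimal sketch in Python of a search body that combines a bool query (a match clause in the query context, range and term clauses in the filter context) with a histogram aggregation that nests a sum. It is an illustration only, not taken from the course material: the helper name build_search_body and the field names title, price, category and quantity are hypothetical, and the exact exercises may differ.

```python
import json


def build_search_body(text, min_price, category, interval=50):
    """Build an Elasticsearch search request body as a plain dict.

    Field names (title, price, category, quantity) are hypothetical
    and only serve the illustration.
    """
    return {
        # Return no hits, only aggregation results.
        "size": 0,
        "query": {
            "bool": {
                # Scored full-text clause (query context).
                "must": [{"match": {"title": text}}],
                # Non-scored clauses (filter context).
                "filter": [
                    {"range": {"price": {"gte": min_price}}},
                    {"term": {"category": category}},
                ],
            }
        },
        "aggs": {
            # Multi-bucket aggregation: one bucket per price interval.
            "price_buckets": {
                "histogram": {"field": "price", "interval": interval},
                # Nested metric aggregation, computed per bucket.
                "aggs": {"total_quantity": {"sum": {"field": "quantity"}}},
            }
        },
    }


if __name__ == "__main__":
    body = build_search_body("elasticsearch", 10, "books")
    print(json.dumps(body, indent=2))
```

The printed JSON could then be sent to an index's _search endpoint, for example with curl. Keeping the range and term clauses under filter means they do not contribute to the relevance score, which is what the "filter context" item in module 3 refers to.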