Elasticsearch Operations, September 27-28, 2017

Days: September 27-28, 2017

Time: 9:00 AM to 1:00 PM EDT each day

Cost: $720 per participant (early bird, until July 25); $800 per participant after July 25

Overview

This comprehensive two-day online Elasticsearch class (two 4-hour sessions) is taught by Radu Gheorghe, a seasoned Elasticsearch instructor and consultant at Sematext, author of “Elasticsearch in Action”, and a frequent conference speaker. Both sessions are held online from 9:00 AM to 1:00 PM (ET).

In this course, you will learn about:
  • everything you need to run Elasticsearch clusters in production, from tuning the OS and JVM for performance to commits, merge policies and caches, query routing, scrolling, thread pools, and more.
  • tips and tricks for scaling out your cluster, the different node types and deployment topologies, the best way to handle time-based indices, and more.
  • the Elasticsearch APIs that matter for keeping your cluster healthy, as well as backups, hot threads, logging, monitoring tools, and more.
Each section is followed by a lab with multiple hands-on exercises. See the course outline below for details.

Who Should Attend

The course is designed for technical attendees with basic Elasticsearch experience. Attendees should be able to index data into Elasticsearch, run queries and aggregations, and work with mappings and analysis. Linux experience is not required, but basic familiarity with running shell commands (e.g., the curl command) will make the course more enjoyable. If you have no prior Elasticsearch experience, we strongly suggest attending our Intro to Elasticsearch class first.

Prerequisites

Completion of Intro to Elasticsearch, or prior knowledge of the Elasticsearch concepts it covers

Why Attend

This virtual Elasticsearch training gives you and your team the skills needed to operate Elasticsearch successfully, improving your workflow and efficiency. Further benefits:
  • a customized learning experience
  • the same high-quality instruction as our public or private Elasticsearch classes
  • more affordable than public training
  • more flexible: no need to travel

Things to Remember

For the online training, all participants must use their own computer running OS X, Linux, or Windows, with the latest version of Java installed. Participants should be comfortable using a terminal/command line. Sematext provides:
  • a digital copy of the training material
  • a VM with all configs, scripts, exercises, etc.

Course Outline

Modules

  1. Performance tuning
    • bulk, multiget and multisearch APIs
    • OS cache vs JVM heap
    • locking memory on startup
    • sizing the heap to allow for just enough overhead: how big is your live set?
    • dealing with high GC, especially with big heaps
    • the controversial G1 GC: when to use it, when not to
    • managing field data if it’s needed: size, circuit breakers, filtering
    • eager loading of field data and global ordinals
    • query and request cache sizing
    • page recycler cache: what it is and how to size it
    • how often to commit: translog, index buffer and refresh interval
    • merge policies: when to tune, when to force merges
    • how data and queries are distributed: routing, search type and shard preference
    • using scroll for deep paging
    • a closer look at doc values
    • thread pools: when to tune them and how
    • hardware considerations: CPU vs. RAM vs. disk throughput and latency
    • the controversial network storage: when to use it and how
    • Lab
      • using the bulk, multiget and multisearch APIs
      • using routing
      • getting global document frequencies for more accurate scoring
      • using scroll to go over all documents matching a query
      • querying locally stored shards
      • adjusting heap size and GC settings
      • tuning query cache, field data and index buffer sizes
      • adjusting the refresh interval and translog thresholds
      • tuning the merge policy for heavy indexing
      • sizing thread pool queues for use cases with many shards
  2. Scaling out
    • unicast settings; network settings
    • minimum master nodes
    • choosing the number of shards and replicas
    • node roles
    • when to use dedicated masters and dedicated load balancer nodes
    • time-based indices and aliases
    • rolling indices by size
    • using shard allocation for high availability between racks and availability zones
    • setting up a tiered cluster
    • using the Cluster Reroute API
    • configuring the delayed allocation timeout and synced flush for when a node leaves
    • using tribe nodes
    • Lab
      • setting up a two-node cluster
      • configuring delayed allocation
      • adjusting index settings on creation and at runtime
      • setting up a tiered cluster
      • setting up a dedicated master and a dedicated load balancer
      • configuring allocation awareness for two availability zones
      • setting up recovery to start only when a quorum of nodes is up
      • tuning recovery and relocation settings to the network bandwidth
      • setting up a tribe node
  3. Monitor and administer your cluster
    • index and search templates
    • order of applying index templates
    • snapshot and restore
    • how incremental snapshots work
    • index and cluster stats APIs
    • segments and recovery APIs
    • cluster health API: what the color codes mean
    • cat APIs: health, indices, shards
    • monitoring products
    • important metrics to watch and alert on; what they indicate
    • logging; changing settings on the fly
    • hot threads API
    • Lab
      • working with index templates
      • using the cat APIs to check node and thread pool stats
      • exploring the cluster state
      • gathering query slow logs
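Two of the outline topics come down to simple rules that are easy to check by hand: the quorum used for minimum master nodes (half the master-eligible nodes, rounded down, plus one) and the meaning of the cluster health colors. The Python sketch below is illustrative only. The function names are hypothetical, it does not call any Elasticsearch API, and it assumes the zen discovery setting `discovery.zen.minimum_master_nodes` used by Elasticsearch versions of this period.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum of master-eligible nodes: floor(N / 2) + 1.

    Hypothetical helper; the result is the value one would assume for
    discovery.zen.minimum_master_nodes to avoid split-brain.
    """
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def health_color(unassigned_primaries: int, unassigned_replicas: int) -> str:
    """Map unassigned shard counts to a cluster health color (hypothetical helper)."""
    if unassigned_primaries > 0:
        return "red"     # at least one primary shard is not allocated
    if unassigned_replicas > 0:
        return "yellow"  # all primaries are allocated, but some replicas are not
    return "green"       # every primary and replica shard is allocated


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(n, "master-eligible ->", minimum_master_nodes(n))
    print(health_color(0, 0), health_color(0, 2), health_color(1, 0))
```

With two master-eligible nodes the quorum is 2, so losing either node leaves the cluster unable to elect a master; this is one reason three dedicated master nodes are a common choice.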