Elasticsearch for Logging, San Francisco – Fall 2016

Oct 7, 2016 (Fri) – 9:00 am to 5:00 pm

Completed

Overview

Comprehensive 1-day Elasticsearch class taught by Radu Gheorghe, a seasoned Elasticsearch instructor and consultant from Sematext, author of Elasticsearch in Action, and frequent conference speaker. After taking this course you will know how to set up and use Kibana and Timelion, build different types of visualizations, create dashboards, dig deeper into data with sub-aggregations, and use Kibana to search through your data. We’ll also cover log shipping with Logstash, various Beats, Logagent, and rsyslog: their inputs and outputs, parsing with grok, using the Elasticsearch Ingest node, and more. See the course outline below for details. Each section is followed by a lab with multiple hands-on exercises.


Who Should Attend

The course is designed for technical attendees with basic Elasticsearch experience, as we’ll focus on the tooling around Elasticsearch. You should be able to index data into Elasticsearch, run queries and aggregations, and work with mappings and analysis.

Experience with Linux systems is not a must, but basic familiarity with running shell commands (e.g., using the curl command) will make the course more enjoyable.
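For example, you should already be comfortable with basic calls along these lines (the index and type names here are made up for illustration; the syntax matches the Elasticsearch 2.x/5.x era):

    # index a document
    curl -XPOST 'localhost:9200/logs/event' -d '{"message": "hello, logs"}'

    # search for it
    curl 'localhost:9200/logs/_search?q=message:hello'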

Running a logging setup in production with a non-trivial volume of logs requires a good understanding of performance, scaling, monitoring, and administration of the components involved. While we cover these aspects for the ETL tools (Logstash, Logagent, etc.) here, the equivalent Elasticsearch material is covered in our Elasticsearch Operations course.

If you do not have prior Elasticsearch experience, we strongly suggest you consider attending our Intro to Elasticsearch class first.

Prerequisites

Intro to Elasticsearch, or pre-existing knowledge of the Elasticsearch concepts covered in that class

Things To Remember

  • Arrive at least 20 minutes early to class and on time after each break.
  • Participants must bring their own laptop with OS X, Linux, or Windows to the workshop. Laptops should have the latest version of Java installed. You should be comfortable using a terminal / command line.
  • If you have any dietary restrictions, be sure to let us know at least a week prior to the training.

What We Provide

For this training Sematext provides:
  • A digital copy of the training material
  • A VM with all configs, scripts, exercises, etc.
  • Breakfast, lunch, snacks, coffee, tea, juices, soft drinks, and water

Course Outline

Modules

  1. Data visualization through Kibana
    • installation and configuration
    • index patterns; refreshing the fields list
    • discovering and searching raw data
    • Lucene query syntax
    • visualizing data; types of visualizations and their use
    • Timelion charts; using the Timelion query language
    • building dashboards
    • Lab
      • building complex queries through the Lucene query syntax
      • digging deeper into data through sub-aggregations
      • building dashboards on top of saved searches and visualizations
      • comparing different data series in Timelion (raw average vs. moving average; see the sketch after this module)
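    As a taste of the Timelion lab, an expression comparing a raw series with its smoothed counterpart might look like the following (the index pattern and field name are assumptions for illustration):

      .es(index=logstash-*, metric=avg:bytes).label('raw average'),
      .es(index=logstash-*, metric=avg:bytes).mvavg(10).label('moving average')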
  2. Data ingestion through Logstash
    • installation
    • inputs: popular input plugins and their configuration options
    • codecs: parsing JSON and multiline logs
    • filters: using grok and geoip to parse and enrich data
    • outputs: popular output plugins and their options
    • pipeline pattern: using Logstash on every logging box
    • using Logstash with Kafka and Redis as a buffer
    • adjusting pipeline workers and batch sizes
    • adjusting Logstash heap size
    • specific plugin tunables
    • Lab
      • configuring Logstash to parse and enrich Apache logs (see the config sketch after this module)
      • tuning Logstash for throughput
      • using Logstash with Kafka
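    A minimal sketch of the pipeline this lab builds (the file path and index name are assumptions): tail Apache access logs, parse them with grok, enrich them with geoip, and index them into Elasticsearch.

      input {
        file {
          path => "/var/log/apache2/access.log"
          start_position => "beginning"
        }
      }

      filter {
        # parse the standard combined log format into structured fields
        grok {
          match => { "message" => "%{COMBINEDAPACHELOG}" }
        }
        # enrich with location data based on the client IP
        geoip {
          source => "clientip"
        }
      }

      output {
        elasticsearch {
          hosts => ["localhost:9200"]
          index => "apache-%{+YYYY.MM.dd}"
        }
      }

    For the buffered variant, the elasticsearch output is replaced (or complemented) by Logstash's kafka output, with a second Logstash instance consuming from the topic.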
  3. Data collection using Beats
    • installation: Packetbeat, Topbeat, Filebeat
    • Filebeat tunables
    • parsing JSON logs
    • sending logs directly to Elasticsearch
    • using Ingest nodes
    • sending logs directly to Logstash
    • sending logs to Logstash via Kafka
    • Lab
      • setting up Topbeat to push metrics to Elasticsearch
      • shipping and parsing Apache logs via Filebeat and Ingest node (see the sketch after this module)
      • shipping and parsing Apache logs via Filebeat and Logstash
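    A rough sketch of the Filebeat + Ingest node path (the pipeline name, paths, and hosts are assumptions; the Ingest node requires Elasticsearch 5.x): first define a pipeline with a grok processor, then point Filebeat at it.

      # define an Ingest pipeline that groks Apache access logs
      curl -XPUT 'localhost:9200/_ingest/pipeline/apache' -d '{
        "processors": [
          { "grok": { "field": "message", "patterns": ["%{COMBINEDAPACHELOG}"] } }
        ]
      }'

      # filebeat.yml (Filebeat 5.x syntax)
      filebeat.prospectors:
        - input_type: log
          paths:
            - /var/log/apache2/access.log

      output.elasticsearch:
        hosts: ["localhost:9200"]
        pipeline: apache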
  4. Data collection using rsyslog
    • installation
    • plugins: main input modules and their configurations
    • message modifiers: using mmnormalize to parse unstructured data in a scalable way
    • parsing JSON logs
    • using grok in rsyslog
    • tuning queues, workers and batch sizes
    • Rainerscript: variables, conditionals, loops and lookup tables
    • using rulesets to manage multiple data flows
    • writing data to Elasticsearch
    • coupling rsyslog with Logstash via Redis/Kafka
    • Lab
      • sending local syslog to Elasticsearch (see the config sketch after this module)
      • tailing files with rsyslog and sending them to Kafka
      • using rulesets to separate local and remote logs
      • parsing logs with mmnormalize and sending them to Elasticsearch
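    A minimal sketch of the "local syslog to Elasticsearch" exercise in RainerScript (the index name and JSON template are assumptions):

      module(load="omelasticsearch")

      # build a small JSON document out of each message
      template(name="logLine" type="list") {
        constant(value="{\"timestamp\":\"")
        property(name="timereported" dateFormat="rfc3339")
        constant(value="\",\"message\":\"")
        property(name="msg" format="json")
        constant(value="\"}")
      }

      action(type="omelasticsearch"
             server="localhost"
             template="logLine"
             searchIndex="syslog"
             bulkmode="on")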
  5. Data collection using Logagent-js
    • installation
    • running on-demand or as a service
    • parsing rules
    • GeoIP matching and database updates
    • UDP syslog and other listeners
    • Lab
      • parsing and sending local Apache and syslog to Elasticsearch
      • building a pipeline from rsyslog to Elasticsearch through Logagent (see the sketch after this module)
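    To give a flavor of how Logagent is driven from the command line (the flags and paths below follow the Logagent docs of the era and should be treated as assumptions, not a definitive reference):

      # dry run: parse a log file and print the structured result as YAML
      cat /var/log/apache2/access.log | logagent --yaml

      # ship parsed logs to a local Elasticsearch index
      logagent -e http://localhost:9200 -i logs /var/log/apache2/access.log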