Publications by Sematext Engineers

Reference Architecture: Monitoring and Logging for Docker Datacenter

Docker Datacenter (DDC) simplifies container orchestration and increases the flexibility and scalability of application deployments. However, its high level of automation creates new challenges for monitoring and log management. Why? Because each container typically runs a single process, has its own environment, uses virtual networks, and may rely on any of several methods of managing storage.

Performance Monitoring Essentials – Elasticsearch Edition

Elasticsearch is booming. Together with Logstash, a tool for collecting and processing logs, and Kibana, a tool for searching and visualizing data in Elasticsearch (collectively they comprise the “ELK stack”), adoption of Elasticsearch continues to grow by leaps and bounds. In this detailed booklet, Sematext’s DevOps Evangelist, Stefan Thies, walks readers through Elasticsearch and ELK stack basics and supplies numerous graphs, diagrams, and infographics to clearly explain the essential elements. There is also a “Top 10 Elasticsearch Metrics” list with corresponding explanations and screenshots. The booklet will be especially helpful to readers new to Elasticsearch and the ELK stack, and also to experienced users who want a quick start with performance monitoring.

Log Management & Analytics – A Quick Guide to Logging Basics

This all-things-logging booklet will especially appeal to readers who are looking to replace Splunk or a similar commercial application with Elasticsearch, Logstash, and Kibana (aka the “ELK stack”) or an alternative logging stack. It collects how-to instructions, screenshots, code, and more from logging experts, including: 5-Minute Logstash: Parsing and Sending a Log File; Encrypting Logs on Their Way to Elasticsearch; Recipe: rsyslog + Elasticsearch + Kibana; and Structured Logging with rsyslog and Elasticsearch. For more information about logging, see the logging posts on the Sematext Blog.

Lucene in Action – Second Edition

When Lucene first appeared, this superfast search engine was nothing short of amazing. Today, Lucene still delivers. Its high-performance, easy-to-use API, features like numeric fields, payloads, and near-real-time search, and huge increases in indexing and searching speed make it the leading search tool. With clear writing, reusable examples, and unmatched advice, Lucene in Action, Second Edition is still the definitive guide to effectively integrating search into your applications. It introduces you to searching, sorting, and filtering, and covers the numerous improvements to Lucene since the first edition. The book’s source code targets Lucene 3.0.1.

Mastering Elasticsearch – Second Edition

“Mastering Elasticsearch – Second Edition” covers intermediate and advanced Elasticsearch functionality and walks you through its internals, including caches, the Apache Lucene library, and its monitoring capabilities. You’ll learn about the practical use of Elasticsearch configuration parameters and how to use the monitoring API. With this book, you’ll delve into Elasticsearch’s query rewrite, query templates, bulk operations, document grouping, and function score queries. You will also learn how to improve the user search experience, index distribution, segment statistics, and merging. By the end of the book, you will be able to enhance Elasticsearch’s performance and create your own Elasticsearch plugins.

Apache Solr 4 Cookbook

Apache Solr is a blazing-fast, scalable, open source enterprise search server built on Apache Lucene. Solr is wildly popular because it supports complex search criteria, faceting, result highlighting, query completion, query spell-checking, and relevancy tuning, among numerous other features. “Apache Solr 4 Cookbook” will show you how to get the most out of your search engine. Full of practical recipes and examples, this book will show you how to set up Apache Solr, tune and benchmark its performance, and index and analyze your data to provide better, more precise, and more useful search results.

Elasticsearch Server – Second Edition

This book begins by introducing the most commonly used Elasticsearch server functionality, from creating your own index structure, through querying, faceting, and aggregations, and ends with cluster monitoring and problem diagnosis. As you progress through the book, you will cover topics such as starting Elasticsearch, creating a new index, and designing its proper structure. After that, you’ll read about the query API that Elasticsearch exposes, as well as its filtering, aggregation, and faceting capabilities. Last but not least, you will learn how to find similar documents and how to implement application alerts using the prospective search functionality called the percolator. Some advanced topics, such as shard allocation control, gateway configuration, and the discovery module, are also discussed. Finally, the book shows you how to monitor cluster state and health, as well as how to use third-party tools.

Elasticsearch in Action

Elasticsearch makes it easy to add efficient and scalable search to your enterprise applications. Busy administrators and developers love this open source real-time search and analytics engine because they can simply install it, make a few tweaks, and get on with their work. And once Elasticsearch is up and running, you’ll discover that it’s miles deep, so you can build nearly any custom search solution you can imagine. The book focuses on Elasticsearch’s REST API over HTTP. Code snippets are written mostly in bash using curl, which makes them easy to translate to other languages.

Spark in Action

Spark in Action teaches you to use Spark for stream and batch data processing. It starts with an introduction to the Spark architecture and ecosystem, followed by a taste of Spark’s command-line interface. You then discover the most fundamental concepts and abstractions of Spark, particularly Resilient Distributed Datasets (RDDs) and the basic data transformations that RDDs provide. The first part of the book also introduces you to writing Spark applications using the core APIs. Next, you learn about the different Spark components: how to work with structured data using Spark SQL, how to process near-real-time data with Spark Streaming, how to apply machine learning algorithms with Spark MLlib, and how to apply graph algorithms to graph-shaped data using Spark GraphX. The book also provides a clear introduction to Spark clustering.

Apache Solr 3.1 Cookbook

This cookbook will show you how to get the most out of your search engine. Each chapter covers a different aspect of working with Solr, from analyzing your text data through querying, performance improvement, and developing your own modules. The practical recipes will help you quickly solve common data analysis problems and show you how to use faceting to collect data and how to speed up Solr. With recipes for Apache Solr 3.1, this practical guide shows you how to improve your search engine’s performance, analyze data quickly and efficiently, and customize the search server with your own modules.

The Essential Apache HBase Cheat Sheet

HBase is the Hadoop database: a distributed, scalable Big Data store that lets you host very large tables (billions of rows by millions of columns) on clusters built with commodity hardware. Just as Google Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.