Publications
Books
Sematext Books
Lucene in Action – Second Edition
When Lucene first appeared, this superfast search engine was nothing short of amazing. Today, Lucene still delivers. Its high-performance, easy-to-use API, features like numeric fields, payloads, near-real-time search, and huge increases in indexing and searching speed make it the leading search tool. And with clear writing, reusable examples, and unmatched advice, Lucene in Action, Second Edition is still the definitive guide to effectively integrating search into your applications. It introduces you to searching, sorting, and filtering, and covers the numerous improvements to Lucene since the first edition. Source code is for Lucene 3.0.1.
Solr Cookbook – Third Edition
As with the previous edition of the cookbook, we took the time to rebuild the book: all recipes were updated, half of the previous content was thrown away, and new content was added. Most importantly, Solr Cookbook Third Edition covers Solr 4.x (based on the newest 4.10.3 version of Solr) and Solr 5.0, which should be released very soon.
The book targets beginners and intermediate users working with Apache Solr. You’ll find recipes that should make your life easier when you take your first steps with Solr and when you encounter the common problems that intermediate users tend to struggle with. However, I don’t recommend the book for those of you who know everything about Solr; you may find parts of it interesting, but this book is not aimed at you.
Mastering Elasticsearch – Second Edition
“Mastering Elasticsearch – Second Edition” covers intermediate and advanced functionalities of Elasticsearch and walks you through its internals including caches, the Apache Lucene library, and its monitoring capabilities. You’ll learn about practical usage of Elasticsearch configuration parameters and how to use the monitoring API. With this book, you’ll delve into Elasticsearch’s query rewrite, query template, bulk operation, document grouping, and function score queries. You will also learn how to improve user search experience, index distribution, segment statistics, and merging. By the end of the book, you will be able to enhance Elasticsearch’s performance and create your own Elasticsearch plugins.
Apache Solr 4 Cookbook
Apache Solr is a blazing fast, scalable, open source enterprise search server built upon Apache Lucene. Solr is wildly popular because it supports complex search criteria, faceting, result highlighting, query completion, query spell-checking, and relevancy tuning, among numerous other features. “Apache Solr 4 Cookbook” will show you how to get the most out of your search engine. Full of practical recipes and examples, this book will show you how to set up Apache Solr, tune and benchmark performance, and index and analyze your data to provide better, more precise, and more useful search results.
Elasticsearch Server – Second Edition
This book begins by introducing the most commonly used Elasticsearch server functionalities, from creating your own index structure, through querying, faceting, and aggregations, and ends with cluster monitoring and problem diagnosis. As you progress through the book, you will cover topics such as starting Elasticsearch, creating a new index, and designing its proper structure. After that, you’ll read about the query API that Elasticsearch exposes, as well as about filtering capabilities, aggregations, and faceting. Last but not least, you will get to know how to find similar documents by using similar functionalities and how to implement application alerts by using the prospective search functionality called percolator. Some advanced topics such as shard allocation control, gateway configuration, and how to use the discovery module will also be discussed. This book will also show you the possibilities of cluster state and health monitoring as well as how to use third-party tools.
Elasticsearch in Action
Elasticsearch makes it easy to add efficient and scalable searches to your enterprise applications. Busy administrators and developers love this open source real-time search and analytics engine because they can simply install it, make a few tweaks, and go on with their work. And once Elasticsearch is up and running, you’ll discover that it’s miles deep, so you can build nearly any custom search solution you can imagine. The book focuses on Elasticsearch’s REST API via HTTP. Code snippets are written mostly in bash using curl, which makes them easily translatable to other languages.
Spark in Action
Spark in Action teaches you to use Spark for stream and batch data processing. It starts with an introduction to the Spark architecture and ecosystem, followed by a taste of Spark’s command line interface. You then discover the most fundamental concepts and abstractions of Spark, particularly Resilient Distributed Datasets (RDDs) and the basic data transformations that RDDs provide. The first part of the book also introduces you to writing Spark applications using the core APIs. Next, you learn about different Spark components: how to work with structured data using Spark SQL, how to process near-real-time data with Spark Streaming, how to apply machine learning algorithms with Spark MLlib, and how to apply graph algorithms on graph-shaped data using Spark GraphX. The book closes with a clear introduction to Spark clustering.
Apache Solr 3.1 Cookbook
This cookbook will show you how to get the most out of Apache Solr 3.1. Each chapter covers a different aspect of working with Solr, from analyzing your text data through querying and performance improvement to developing your own modules. The practical recipes will help you quickly solve common problems with data analysis, use faceting to collect data, and speed up Solr. You will learn how to improve your search engine’s performance, analyze data quickly and efficiently, and customize the search server with your own modules.