Search

OpenSearch RAG Tutorial with ml-commons

RAG is all the RAGe these days. If you don't know the term, it's short for "retrieval-augmented generation", which means that we're using retrieval (in our case OpenSearch) to augment

Scaling Elasticsearch by Cleaning the Cluster State

We often get questions like: How much data can I put in an Elasticsearch cluster? How many nodes can an Elasticsearch cluster have? What's the biggest cluster that you've seen?

Elasticsearch to OpenSearch Migration Facilitated by Sematext Cloud

OK, so you've decided to move from Elasticsearch to OpenSearch. Maybe our comparison helped you decide and maybe you've checked our guide on how to perform the migration. But how

Running OpenSearch on Kubernetes With Its Operator

If you’re thinking of running OpenSearch on Kubernetes, you have to check out the OpenSearch Kubernetes Operator. It’s by far the easiest way to get going: you can configure pretty

Using Solr Operator to Autoscale Solr on Kubernetes

In this tutorial, you'll see how to deploy Solr on Kubernetes. You'll also see how to use the Solr Operator to autoscale a SolrCloud cluster based on CPU with the

11 Small Search Platforms: Powerful Alternatives to Elasticsearch, OpenSearch, and Solr

Introduction: In the ever-evolving world of search engines, Elasticsearch, OpenSearch, and Solr have long held the spotlight. However, there are several smaller search platforms that pack a punch and offer

Migration from Elasticsearch to OpenSearch

Introduction: In this tutorial, we will guide you through the process of migrating from Elasticsearch to OpenSearch. OpenSearch is an open-source search and analytics suite that is compatible with Elasticsearch.

OpenSearch vs Solr: Which One Is Better to Use?

If you’re looking for a short answer on OpenSearch vs Solr, here’s a flow chart: We normally recommend the one you (or your team) already know or prefer because,

OpenSearch vs Elasticsearch: Which One Is Better to Use?

Whenever we start a search consulting project from scratch, the obvious question is: which search engine to use? We’ve talked about Elasticsearch vs Solr before, but here we’ll compare Elasticsearch

All About Solr Replica Placement Plugins

With Solr 9 the Autoscaling Framework was removed - for being too complex and not terribly reliable - and instead we have Replica Placement Plugins. Unlike Autoscaling, replica placement only

Writing a Custom Sort Plugin for Solr

OK, so you want to sort your documents by something that can’t be implemented with Solr’s built-in functions. This calls for a custom function, which you can implement through your

Autoscaling Elasticsearch Clusters for Logs: Using a Kubernetes Operator to Scale Up or Down

When we say “logs” we really mean any kind of time-series data: events, social media, you name it. See Jordan Sissel’s definition of time + data. And when we talk

OpenSearch 2.1 Release Highlights

OpenSearch 2.1 was recently released and here are the highlights: Snapshot Management: you could back up indices using Index Management before, but this only works well for time-series use-cases, like

solr-reindexer: Quick Way to Reindex to a New Collection

If you’re using Solr, for sure there are times when you change the schema and need to reindex. Quite often the source of truth is a database, so you can

Solr vs Elasticsearch: Performance Differences & More. How to Decide Which One Is Best for You

“Solr or Elasticsearch?”…well, at least that is the common question we hear from Sematext’s consulting services clients and prospects. Which one is better, Solr or Elasticsearch? Which one is faster?

Working with Solr Plugins System

Apache Solr was always ready to be extended. All that was needed was a binary with the code and a modification of the Solr configuration file, solrconfig.xml, and we

Solr-diagnostics: How to use it and what it collects

If you’re running Solr and have to troubleshoot it (or maybe you just want a good overview!), then you’d probably want to collect logs, configs, maybe a snapshot of metrics

Elasticsearch security: Authentication, Encryption, Backup

There’s no need to look outside the ELK Stack for apps to ensure data protection. Basic Elasticsearch Security features are free and include a lot of functionality to help you

Entity Extraction for Product Searches

What is Entity Extraction? Entity extraction is, in the context of search, the process of figuring out which fields a query should target, as opposed to always hitting all fields.

Entity Extraction with spaCy

What is Entity Extraction? Entity extraction is, in the context of search, the process of figuring out which fields a query should target, as opposed to always hitting all fields.

Open Distro for Elasticsearch Review

Over the years the adoption of Elasticsearch and its ecosystem of tools positioned them as the leaders in the time series data management and analysis market. With strong search capabilities,

Entity Extraction with Scikit-learn Classifiers

What is entity extraction? Entity extraction is the process of figuring out which fields a query should target, as opposed to always hitting all fields. For example: how to tell,

Elastic Stack Features (formerly X-Pack) Alternatives Comparison

Elastic Stack Features (formerly X-Pack) is an Elastic Stack extension that bundles security, alerting, monitoring, reporting, and graph capabilities. One could use either all or specific components. Elastic Stack Features as

Using Solr to Tag Text

Over the years, natural language processing, in the world of search, went from interesting detail to a must have, especially in areas such as e-commerce. Engineers started incorporating classification, synonym

Search Relevance – Solr & Elasticsearch Similarities

What is Search Relevance Similarity? Lucene has a lot of options for configuring similarity. By extension, Solr and Elasticsearch have the same options. Similarity makes the base of your relevancy

Solr Learning To Rank and Streaming Expressions

During the Entity Extraction For Product Searches talk that Radu Gheorghe and I gave at Activate conference in Montreal last year, we talked about various natural language processing and machine learning algorithms. We

Generating Word Embeddings with Gensim’s word2vec

During our Activate presentation, we talked about how to do query expansion by dynamically generating synonyms. Instead of statically defining synonyms lists, we showed a demo of how you could

Field Stats for Elasticsearch 6.x

We're excited to announce the release of the Field Stats API plugin for Elasticsearch. The Field Stats API used to be present from Elasticsearch 1.6 to 5.6, to provide efficient

Named Entity Extraction with OpenNLP

We recently had a presentation at Activate 2018 about entity extraction in the context of a product search. For example: how to tell, when the user typed in Activate 2018,

Garbage Collection Settings for Elasticsearch Master Nodes

Elasticsearch comes with good out-of-the-box Garbage Collection settings. So good in fact that the Definitive Guide recommends not changing them. While we agree that most use-cases wouldn’t benefit from GC

AWS Elasticsearch Service vs. Elasticsearch on EC2

Many of our customers use AWS EC2. In the context of Elasticsearch consulting or support, one question we often get is: should we use AWS Elasticsearch Service instead of deploying Elasticsearch ourselves? The

Solr Streaming Expressions for Collection auto-updating

One of the things that were extensively changed in Solr 6.0 is the Streaming Expressions and what we can do with them (hint: amazing stuff!). We already described Solr SQL

Solr 6, SolrCloud and SQL Queries

With the recent release of Apache Lucene and Solr 6, we should familiarize ourselves with the juicy features that come with them. We have the new default Similarity implementation -

Solr 7 – New Replica Types

With the release of Solr 7 the community around it produced yet another great version of this search engine. As usual, there is an extensive list of changes, bug fixes

Solr: Optimize Is (Not) Bad for You – Video & Slides

Another Lucene/Solr Revolution happened on September 12-15, 2017 in Las Vegas. Sematext was there, exhibiting AND giving two talks! Thanks to everyone who stopped by our booth and attended our two talks: Optimize Is (Not) Bad

Java 9 Elasticsearch Benchmark

TL;DR: The main question here is: How Does Java 9 Work with Elasticsearch 6? It works well, but don't expect miracles. Unless you're using G1, then there are some miracles. With

Search Guard – Security for Elasticsearch

Note: This is a guest post by Jochen Kressin, the CTO of floragunn GmbH, the makers of Search Guard, an open-source X-Pack Security alternative. Elasticsearch is a great piece of software.

Securing Elasticsearch and Kibana with Search Guard for free

Note: This is a guest post by Jochen Kressin, the CTO of floragunn GmbH, the makers of Search Guard, an open-source X-Pack Security alternative. In this article, we show you how

Solr V2 API – Quick Look

Last updated on Jan 11, 2018. We are all used to the Solr API that has been present in Solr from its beginnings. We send the data using the HTTP protocol,

Sematext Solr AutoComplete: Introduction and Howto

Sematext Solr AutoComplete is an open-source Solr add-on that provides suggest-as-you-type functionality. In this post we'll explain how you can install it, load the autocomplete collection/core with suggestions and how

Solr New Metrics API: Quick look at Solr 6.4

As you know, in Sematext we looooove logs and metrics and we enjoy playing with them on a daily basis. We have our Logsene, which is all about logs and

Making Elasticsearch in Docker Swarm Elastic

Running Elasticsearch in Docker containers sounds like a natural fit - both technologies promise elasticity. However, running a truly elastic Elasticsearch cluster on Docker Swarm became somewhat difficult with Docker

Running Solr in Docker: How & Why

Docker is all the rage these days, but one doesn't hear about running Solr on Docker very much. Last month, we gave a talk on the topic of running containerized

Handling Shards in SolrCloud

Last updated on Jan 10, 2018. One of the things you learn when attending Sematext Solr training is how to scale Solr. We discuss various topics regarding leader shards and

DocValues Reindexing with Solr Streaming Expressions

Last updated on Jan 8, 2018. Last time, when talking about Solr 6 we learned how to use streaming expressions to automatically update data in a collection. You can imagine

Reindexing Data with Elasticsearch

Last updated on Jan 8, 2018. SIDE NOTE: We run Elasticsearch and ELK trainings, which may be of interest to you and your teammates. Sooner or later, you'll run into

Presentation: Large Scale Log Analytics with Solr

In this presentation from Lucene/Solr Revolution 2015, Sematext engineers -- and Solr and centralized logging experts -- Radu Gheorghe and Rafal Kuć talk about searching and analyzing time-based data at

Presentation: Log Analysis with Elasticsearch

Fresh from the Velocity NYC conference is the latest presentation from Sematext engineers Rafal Kuć and Radu Gheorghe — “From zero to production hero: Log Analysis with Elasticsearch.” The talk

SolrCloud: Dealing with Large Tenants and Routing

Last updated on Jan 10, 2018. Many Solr users need to handle multi-tenant data. There are different techniques that deal with this situation: some good, some not-so-good. Using routing to handle such

Replaying Elasticsearch Slowlogs with Logstash and JMeter

Sometimes we just need to replay production queries - whether it's because we want a realistic load test for the new version of a product or because we want to