Search
OpenSearch RAG Tutorial with ml-commons
RAG is all the RAGe these days. If you don't know the term, it's short for "retrieval-augmented generation", which means that we're using retrieval (in our case OpenSearch) to augment
Scaling Elasticsearch by Cleaning the Cluster State
We often get questions like: How much data can I put in an Elasticsearch cluster? How many nodes can an Elasticsearch cluster have? What's the biggest cluster that you've seen?
Elasticsearch to OpenSearch Migration Facilitated by Sematext Cloud
OK, so you've decided to move from Elasticsearch to OpenSearch. Maybe our comparison helped you decide and maybe you've checked our guide on how to perform the migration. But how
Running OpenSearch on Kubernetes With Its Operator
If you’re thinking of running OpenSearch on Kubernetes, you have to check out the OpenSearch Kubernetes Operator. It’s by far the easiest way to get going, you can configure pretty
Using Solr Operator to Autoscale Solr on Kubernetes
In this tutorial, you'll see how to deploy Solr on Kubernetes. You'll also see how to use the Solr Operator to autoscale a SolrCloud cluster based on CPU with the
11 Small Search Platforms: Powerful Alternatives to Elasticsearch, OpenSearch, and Solr
Introduction: In the ever-evolving world of search engines, Elasticsearch, OpenSearch, and Solr have long held the spotlight. However, there are several smaller search platforms that pack a punch and offer
Migration from Elasticsearch to OpenSearch
Introduction: In this tutorial, we will guide you through the process of migrating from Elasticsearch to OpenSearch. OpenSearch is an open-source search and analytics suite that is compatible with Elasticsearch.
OpenSearch vs Solr: Which One Is Better to Use?
If you’re looking for a short answer on OpenSearch vs Solr, here’s a flow chart: We normally recommend the one you (or your team) already know or prefer because,
OpenSearch vs Elasticsearch: Which One Is Better to Use?
Whenever we start a search consulting project from scratch, the obvious question is: which search engine to use? We’ve talked about Elasticsearch vs Solr before, but here we’ll compare Elasticsearch
All About Solr Replica Placement Plugins
With Solr 9 the Autoscaling Framework was removed - for being too complex and not terribly reliable - and instead we have Replica Placement Plugins. Unlike Autoscaling, replica placement only
Writing a Custom Sort Plugin for Solr
OK, so you want to sort your documents by something that can’t be implemented with Solr’s built-in functions. This calls for a custom function, which you can implement through your
Autoscaling Elasticsearch Clusters for Logs: Using a Kubernetes Operator to Scale Up or Down
When we say “logs” we really mean any kind of time-series data: events, social media, you name it. See Jordan Sissel’s definition of time + data. And when we talk
OpenSearch 2.1 Release Highlights
OpenSearch 2.1 was recently released and here are the highlights: Snapshot Management: you could back up indices using Index Management before, but this only works well for time-series use-cases, like
solr-reindexer: Quick Way to Reindex to a New Collection
If you’re using Solr, for sure there are times when you change the schema and need to reindex. Quite often the source of truth is a database, so you can
Solr vs Elasticsearch: Performance Differences & More. How to Decide Which One Is Best for You
“Solr or Elasticsearch?”…well, at least that is the common question we hear from Sematext’s consulting services clients and prospects. Which one is better, Solr or Elasticsearch? Which one is faster?
Working with Solr Plugins System
Apache Solr was always ready to be extended. All that was needed was a binary with the code and a modification of the Solr configuration file, solrconfig.xml, and we
Solr-diagnostics: How to use it and what it collects
If you’re running Solr and have to troubleshoot it (or maybe you just want a good overview!), then you’d probably want to collect logs, configs, maybe a snapshot of metrics
Elasticsearch security: Authentication, Encryption, Backup
There’s no need to look outside the ELK Stack for apps to ensure data protection. Basic Elasticsearch Security features are free and include a lot of functionality to help you
Entity Extraction for Product Searches
What is Entity Extraction? Entity extraction is, in the context of search, the process of figuring out which fields a query should target, as opposed to always hitting all fields.
Entity Extraction with spaCy
What is Entity Extraction? Entity extraction is, in the context of search, the process of figuring out which fields a query should target, as opposed to always hitting all fields.
Open Distro for Elasticsearch Review
Over the years the adoption of Elasticsearch and its ecosystem of tools positioned them as the leaders in the time series data management and analysis market. With strong search capabilities,
Entity Extraction with Scikit-learn Classifiers
What is entity extraction? Entity extraction is the process of figuring out which fields a query should target, as opposed to always hitting all fields. For example: how to tell,
Elastic Stack Features (formerly X-Pack) Alternatives Comparison
Elastic Stack Features (formerly X-Pack) is an Elastic Stack extension that bundles security, alerting, monitoring, reporting, and graph capabilities. One could use either all or specific components. Elastic Stack Features as
Using Solr to Tag Text
Over the years, natural language processing, in the world of search, went from interesting detail to a must have, especially in areas such as e-commerce. Engineers started incorporating classification, synonym
Search Relevance – Solr & Elasticsearch Similarities
What is Search Relevance Similarity? Lucene has a lot of options for configuring similarity. By extension, Solr and Elasticsearch have the same options. Similarity makes the base of your relevancy
Solr Learning To Rank and Streaming Expressions
During the Entity Extraction For Product Searches talk that Radu Gheorghe and I gave at Activate conference in Montreal last year, we talked about various natural language processing and machine learning algorithms. We
Generating Word Embeddings with Gensim’s word2vec
During our Activate presentation, we talked about how to do query expansion by dynamically generating synonyms. Instead of statically defining synonyms lists, we showed a demo of how you could
Field Stats for Elasticsearch 6.x
We're excited to announce the release of the Field Stats API plugin for Elasticsearch. The Field Stats API used to be present from Elasticsearch 1.6 to 5.6, to provide efficient
Named Entity Extraction with OpenNLP
We recently had a presentation at Activate 2018 about entity extraction in the context of a product search. For example: how to tell, when the user typed in Activate 2018,
Garbage Collection Settings for Elasticsearch Master Nodes
Elasticsearch comes with good out-of-the-box Garbage Collection settings. So good in fact that the Definitive Guide recommends not changing them. While we agree that most use-cases wouldn’t benefit from GC
AWS Elasticsearch Service vs. Elasticsearch on EC2
Many of our customers use AWS EC2. In the context of Elasticsearch consulting or support, one question we often get is: should we use AWS Elasticsearch Service instead of deploying Elasticsearch ourselves? The
Solr Streaming Expressions for Collection auto-updating
One of the things that were extensively changed in Solr 6.0 is the Streaming Expressions and what we can do with them (hint: amazing stuff!). We already described Solr SQL
Solr 6, SolrCloud and SQL Queries
With the recent release of Apache Lucene and Solr 6, we should familiarize ourselves with the juicy features that come with them. We have the new default Similarity implementation -
Solr 7 – New Replica Types
With the release of Solr 7 the community around it produced yet another great version of this search engine. As usual, there is an extensive list of changes, bug fixes
Solr: Optimize Is (Not) Bad for You – Video & Slides
Another Lucene/Solr Revolution happened on September 12-15, 2017 in Las Vegas. Sematext was there, exhibiting AND giving two talks! Thanks to everyone who stopped by our booth and attended our two talks: Optimize Is (Not) Bad
Java 9 Elasticsearch Benchmark
TL;DR: The main question here is: How Does Java 9 Work with Elasticsearch 6? It works well, but don't expect miracles. Unless you're using G1, then there are some miracles. With
Search Guard – Security for Elasticsearch
Note: This is a guest post by Jochen Kressin, the CTO of floragunn GmbH, the makers of Search Guard, an open-source X-Pack Security alternative. Elasticsearch is a great piece of software.
Securing Elasticsearch and Kibana with Search Guard for free
Note: This is a guest post by Jochen Kressin, the CTO of floragunn GmbH, the makers of Search Guard, an open-source X-Pack Security alternative. In this article, we show you how
Solr V2 API – Quick Look
Last updated on Jan 11, 2018. We are all used to the Solr API that has been present in Solr from its beginnings. We send the data using HTTP protocol,
Sematext Solr AutoComplete: Introduction and Howto
Sematext Solr AutoComplete is an open-source Solr add-on that provides suggest-as-you-type functionality. In this post we'll explain how you can install it, load the autocomplete collection/core with suggestions and how
Solr New Metrics API: Quick look at Solr 6.4
As you know, in Sematext we looooove logs and metrics and we enjoy playing with them on a daily basis. We have our Logsene, which is all about logs and
Making Elasticsearch in Docker Swarm Elastic
Running Elasticsearch in Docker containers sounds like a natural fit - both technologies promise elasticity. However, running a truly elastic Elasticsearch cluster on Docker Swarm became somewhat difficult with Docker
Running Solr in Docker: How & Why
Docker is all the rage these days, but one doesn't hear about running Solr on Docker very much. Last month, we gave a talk on the topic of running containerized
Handling Shards in SolrCloud
Last updated on Jan 10, 2018. One of the things you learn when attending Sematext Solr training is how to scale Solr. We discuss various topics regarding leader shards and
DocValues Reindexing with Solr Streaming Expressions
Last updated on Jan 8, 2018. Last time, when talking about Solr 6 we learned how to use streaming expressions to automatically update data in a collection. You can imagine
Reindexing Data with Elasticsearch
Last updated on Jan 8, 2018. SIDE NOTE: We run Elasticsearch and ELK trainings, which may be of interest to you and your teammates. Sooner or later, you'll run into
Presentation: Large Scale Log Analytics with Solr
In this presentation from Lucene/Solr Revolution 2015, Sematext engineers -- and Solr and centralized logging experts -- Radu Gheorghe and Rafal Kuć talk about searching and analyzing time-based data at
Presentation: Log Analysis with Elasticsearch
Fresh from the Velocity NYC conference is the latest presentation from Sematext engineers Rafal Kuć and Radu Gheorghe — “From zero to production hero: Log Analysis with Elasticsearch.” The talk
SolrCloud: Dealing with Large Tenants and Routing
Last updated on Jan 10, 2018. Many Solr users need to handle multi-tenant data. There are different techniques that deal with this situation: some good, some not-so-good. Using routing to handle such
Replaying Elasticsearch Slowlogs with Logstash and JMeter
Sometimes we just need to replay production queries - whether it's because we want a realistic load test for the new version of a product or because we want to