If you’re looking for a short answer on OpenSearch vs Solr, here’s a flow chart:
We normally recommend the one you (or your team) already know or prefer because, for most projects, there’s not much between them in terms of features. Both search engines are well supported and have strong communities behind them.
That said, there are significant differences, too. The feature gap is bigger than in the case of OpenSearch vs Elasticsearch, closer to that between Elasticsearch and Solr. And it’s not all about features: it’s also about governance, or whether the engine is more developer- or DevOps-friendly. Let’s expand on each of these.
OpenSearch vs Solr Features
If you run an e-commerce site, you’ll look for different features than if you want to extract interesting metrics from social media. So let’s break it down by category.
Full Text Search
In enterprise search use-cases, you’ll want to get the correct results for your search, ranked in the right order. Both OpenSearch and Solr rely on the same underlying Lucene functionality to retrieve documents and give them an initial score. By and large, you’ll be able to achieve the same results with both, but there are two major differences:
- Parsing your query into a Lucene query. OpenSearch has a nice JSON query DSL forked from that of Elasticsearch, which makes it both feature-rich and easy to use. Solr, on the other hand, has a ton of query parsers and you can combine them in many ways. Potentially more powerful, but not as easy to use.
- Functionality on top of Lucene. Here I’m referring to ways of manipulating document scores or their ranking. From functions that allow you to boost the score of more recent documents to Learning to Rank. Here, Solr is more mature (e.g., it has had a Learning to Rank module for many years) but it’s worth noting that OpenSearch made a lot of progress recently (e.g., by introducing a model-serving framework).
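To make the first difference concrete, here’s a hedged sketch of the same intent – match on a title field while boosting recent documents – expressed both ways. The field names (`title`, `published_at`) and the query term are made up for illustration:

```python
import json

# OpenSearch: one JSON query DSL request covers both the matching and the
# recency boost. "title" and "published_at" are illustrative field names.
opensearch_query = {
    "query": {
        "function_score": {
            "query": {"match": {"title": "laptop"}},
            "functions": [
                # Gaussian decay: fresher documents score higher
                {"gauss": {"published_at": {"origin": "now", "scale": "30d"}}}
            ],
        }
    }
}

# Solr: the same intent via the edismax query parser, expressed as URL
# parameters; the boost is a function query over the date field.
solr_params = {
    "defType": "edismax",
    "q": "laptop",
    "qf": "title",
    "boost": "recip(ms(NOW,published_at),3.16e-11,1,1)",
}

print(json.dumps(opensearch_query, indent=2))
print(solr_params)
```

Notice the trade-off in miniature: the OpenSearch body is self-describing JSON, while the Solr version packs a function query into a parameter – terser, more flexible, harder to read.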
To sum up, if I had to choose between OpenSearch and Solr purely on full-text search functionality, Solr would probably have the edge for now.
In enterprise search, data is often relational: t-shirts have different sizes and colors, documents may be written by users who belong to certain groups and so on. Here, both OpenSearch and Solr expose Lucene’s join functionality, but – as always – the devil is in the details:
- While both OpenSearch and Solr expose Lucene’s index-time join (a.k.a. nested documents), Solr takes a more flexible – and also more dangerous – route of allowing the user to manipulate “child” documents separately. This means you can search and retrieve them on their own, but you could also leave orphaned child documents if you’re not careful with delete-by-query. That said, we recommend avoiding delete-by-query in any case.
- For query-time joins, the two engines diverge even more. Once again, OpenSearch takes the “clean” route via join fields: under the hood, parent documents and their children go to the same shard, which allows the join to be local. With Solr, you can do the same on the fly with the Join query parser, but you’ll effectively define the relationship at query-time. That’s more flexible. The Join query parser can also work across collections. Here, the join can be local (if you can replicate one side of the relationship to all nodes) or remote, using streaming expressions (we’ll come back to them later).
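The contrast between the two join styles can be sketched as follows; the index, relation and field names (`product`, `sku`, `color`, `product_id`) are illustrative:

```python
# OpenSearch: a "join" field fixes the parent/child relation in the mapping;
# children are routed to the parent's shard, so the join stays local.
opensearch_mapping = {
    "mappings": {
        "properties": {
            "doc_relation": {"type": "join", "relations": {"product": "sku"}}
        }
    }
}

# Finding parent products that have at least one red SKU:
opensearch_query = {
    "query": {"has_child": {"type": "sku", "query": {"term": {"color": "red"}}}}
}

# Solr: the Join query parser defines the relationship at query time instead,
# so the same data can be joined in different ways by different queries.
solr_join_query = "{!join from=product_id to=id}color:red"

print(opensearch_query)
print(solr_join_query)
```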
To sum up, Solr wins in this category as well, because it offers more flexibility. That said, neither search engine can replace a relational database or a graph database for dealing with complex relationships.
From application logs to social media metadata, a lot of organizations use search engines to extract insights in real-time. What are the most frequent errors in the last hour? How are people liking our new product? To answer these questions, you need faceted search: filter the data you need, then do some computation on top of it.
With Solr, there are N ways of doing facets, though serious faceting consumers will use JSON facets, which are very fast and – for the most part – the most feature-rich. But OpenSearch’s aggregations, derived from Elasticsearch, offer a lot more flexibility. So if you need to do more complex real-time analytics, OpenSearch is likely to be the search engine for you.
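As a concrete sketch, here’s “top error messages in the last hour” expressed as a Solr JSON facet and as an OpenSearch aggregation. Field names (`severity`, `message`, `timestamp`) are illustrative:

```python
# Solr JSON facets: filter down to recent errors, then a terms facet
# over the message field
solr_json_facet = {
    "query": "severity:ERROR",
    "filter": ["timestamp:[NOW-1HOUR TO NOW]"],
    "facet": {"top_errors": {"type": "terms", "field": "message", "limit": 10}},
}

# OpenSearch aggregations: same filter, a terms aggregation, and size=0
# so no hits are returned, only the aggregation results
opensearch_aggs = {
    "size": 0,
    "query": {
        "bool": {
            "filter": [
                {"term": {"severity": "ERROR"}},
                {"range": {"timestamp": {"gte": "now-1h"}}},
            ]
        }
    },
    "aggs": {"top_errors": {"terms": {"field": "message", "size": 10}}},
}

print(solr_json_facet)
print(opensearch_aggs)
```

The simple cases look similar; OpenSearch pulls ahead when you start nesting aggregations or piping their results into further computations.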
What if your analytics jobs go beyond the computations you can do on every shard? For example, if you want to join your query results and facets with results from an external database or with results from a different collection? Or if you want to group on a very high cardinality field? Or graph traversal beyond a simple join?
One could argue that you need to use the right tool for the job: a graph database for graph traversal, a batch/streaming framework for high cardinality data and so on. But then there’s the operational complexity… which can be avoided by many Solr users by using Streaming Expressions: a framework that allows you to get data out of queries, facets and other data-sources in a streaming fashion and perform all sorts of computations – from simple counts to training a model to graph traversal.
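For a feel of what a streaming expression looks like, here’s a hedged sketch of an inner join across two collections; the collection and field names (`orders`, `users`, `user_id`) are illustrative, and both sides must be sorted on the join key:

```python
# A Solr streaming expression joining two collections on user_id,
# merging the two sorted streams as they flow
streaming_expression = """\
innerJoin(
  search(orders, q="*:*", fl="user_id,total", sort="user_id asc"),
  search(users, q="*:*", fl="user_id,country", sort="user_id asc"),
  on="user_id"
)"""

print(streaming_expression)
```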
OpenSearch vs Solr Learning Curve
Not all important functionality is directly related to the use-case, though. You’ll have to learn how to use your search engine to make the most out of it, you’ll have to operate it in production and you might want to extend its functionality as well.
My rule of thumb here is that Solr is a “geeky” search engine: its internals are quite exposed, making it overwhelming at the beginning, but then you can enjoy its transparency. For example, you can see exactly what happens at index-time by looking at your update processor chain, but that information can be too much if you’re new.
On the other hand, we have Elasticsearch. And no, it’s not a typo: I’m bringing Elasticsearch up here because it’s the basis for OpenSearch. In contrast to Solr, Elasticsearch is easier to start with and has better defaults (e.g., pre-configured circuit breakers and other limits). It’s more user-friendly. But when things eventually get challenging, you may not have the knowledge to troubleshoot them. Plus, it’s not as easy to see the implementation beyond the docs. Which, in all fairness, are very good.
OpenSearch started from where Elasticsearch was (in its Apache-licensed variant) at 7.10. It’s been a while, and in the meantime the general feeling is that OpenSearch went a little closer to where Solr is: more exposed, more developer-friendly, but also sacrificing a little bit of that consistency and quality that we came to expect of Elasticsearch. For example, OpenSearch is moving to a more modular codebase, but I still have to refer to Elasticsearch’s documentation to see the details of many features.
If the statements above feel a little too general, don’t worry: we’ll move to more specific areas.
Is My Engine Opinionated?
OpenSearch is opinionated in the sense that, for the most part, there’s a “correct” way to do a task. It also comes with a lot of pre-configured safety nets, from the number of fields you can have in an index to circuit breakers that make OpenSearch error out before it runs out of heap. You also get a feeling of consistency across APIs.
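A few of those safety nets, as a sketch. The setting names are real OpenSearch/Elasticsearch settings; the values are the defaults to the best of my knowledge – verify against your version’s documentation before relying on them:

```python
# Pre-configured limits that make OpenSearch fail fast instead of
# falling over; values shown are assumed defaults, not gospel
opensearch_safety_defaults = {
    "index.mapping.total_fields.limit": 1000,  # max fields per index mapping
    "indices.breaker.total.limit": "95%",      # parent circuit breaker vs heap
    "search.max_buckets": 65536,               # max buckets per aggregation
}

print(opensearch_safety_defaults)
```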
Solr, on the other hand, has N ways of doing pretty much everything. You can query using URL parameters, but you can also send JSON or XML payloads. Stats are exposed by the stats component, but there are also JSON facets and streaming aggregations. You have a lot of flexibility but you may get confused and even shoot yourself in the foot in the process.
I don’t see a right and a wrong here, just a trade-off. If you’re just getting started, opinionated software is much easier to use but may be limiting if you’re trying to do something outside the original design.
Easy to Customize?
Along the same lines, both OpenSearch and Solr have modules and plugins. But few people write plugins for OpenSearch (or Elasticsearch, for that matter), while many of our clients write plugins for Solr as if it’s nothing special.
Say you want to build a new way to parse a query. At the time of writing this, searching “how to write an OpenSearch plugin” on Google doesn’t give any interesting results. We have a nice documentation page on Elasticsearch plugins, which should apply to OpenSearch as well. There are examples on writing new rescoring, custom settings and so on, but it’s not obvious how to create a new query type. Meanwhile, if you google “writing a custom Solr query parser”, you’ll find tons of tutorials, even conference talks on the topic.
In short, Solr wins on the customizability front…
…but loses on the operations front. At the time of writing this, there’s no API to automatically rebalance shards in Solr (it’s on the way). Elasticsearch did that from the beginning and OpenSearch inherited the same functionality. And while ZooKeeper is great, OpenSearch’s built-in cluster coordination feels just as reliable in production, if not more so.
While OpenSearch isn’t as transparent as Solr when it comes to how it does things, it’s very good at showing you what it does at any given moment. You have rich, consistent and easy-to-use stats APIs showing you how much CPU time was spent indexing, merging, refreshing, searching, fetching, etc. This information can be broken down per node, per index or both. You even have a built-in on-demand profiler in the form of the Hot Threads API. Last but not least, the Cat APIs show you a nice table format of the same info – very useful for live troubleshooting from the terminal.
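The endpoints mentioned above, sketched as URLs you’d GET against a running cluster (the `localhost:9200` address and `my-index` name are assumptions):

```python
# OpenSearch monitoring endpoints for live troubleshooting
base = "http://localhost:9200"

stats_per_node = f"{base}/_nodes/stats/indices"  # indexing/merge/search time per node
stats_per_index = f"{base}/my-index/_stats"      # the same breakdown for one index
hot_threads = f"{base}/_nodes/hot_threads"       # on-demand profiler
cat_indices = f"{base}/_cat/indices?v"           # table format, handy in the terminal

for url in (stats_per_node, stats_per_index, hot_threads, cat_indices):
    print(url)
```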
It’s not like you’re flying blind with Solr. It has a nice Metrics API, for example. But it’s nowhere near as DevOps-friendly as OpenSearch.
Monitoring-wise, it also depends on the tool you’re using. Here’s where Sematext Cloud comes in handy, because it can collect, out of the box, metrics and logs from both OpenSearch and Solr. For example, if you’re interested in indexing throughput, here are two built-in charts. See if you can spot which is from OpenSearch and which is from Solr:
Hint: both of these graphs are taken from the demo account.
When Elasticsearch became more popular, there was a concern that Solr was going to die. When OpenSearch forked from Elasticsearch, there was a concern that it wouldn’t take off. By now, these concerns are outdated, to say the least 🙂
Commits and Committers
It’s hard to compare the number of commits on GitHub, because Solr is mostly a monorepo (the Solr Operator for Kubernetes is a notable exception) while OpenSearch has separate repositories for important modules like security. Still, we can get a sense of each project’s health by looking at contributions over time:
Notice how both numbers dropped in recent years. In OpenSearch’s case, it’s mostly because of the fork from Elasticsearch, which is a monorepo (more on that in the Elasticsearch vs OpenSearch post), while for Solr it’s mostly because, for a while, Lucene and Solr were the same Apache project.
Still, if you look at the number of commits in the last year, you can see that OpenSearch has more activity (especially if you take the monorepo aspect into account):
It’s again hard to compare apples to apples, because some discussions happen in GitHub issues, some in Jira (for Solr), some on the forum (OpenSearch) and some on the mailing lists (Solr). Still, the average of about 25 forum posts per week for OpenSearch (counting only the modules comparable to what Solr offers) matches the number of dev+user emails on the Solr mailing lists last week.
In short, both developer and user interest seems to be similar at this point.
If we look at Google interest over time, we can see that OpenSearch just surpassed Solr in the last year:
We can safely assume that OpenSearch will become more popular in the future. Meanwhile, Solr seems to have had a steady interest in the past 3 years, and I would expect that trend to continue.
License and Governance
Both search engines use the Apache 2.0 license, which effectively means you can do anything you want to with the code – use it, modify it, embed it in your product and so on.
When it comes to governance, it’s not the same: OpenSearch is stewarded by AWS, much like Elasticsearch is stewarded by Elastic. Meanwhile, Solr has been an Apache project for many years now, meaning that committers are chosen based on merit (i.e., contributions). So if open-source “community over code” is important to you, Solr is likely to be more appealing. And it’s not only a moral stance; there are practical implications:
- Because of the number of committers and entities that contribute to Solr, once a piece of code gets in, it’s likely to stay there for quite a while. Backwards compatibility has always been Solr’s strong point.
- If you (or your company) want to get involved, it’s easier to contribute over time, because you can become a committer, effectively “owning” the part of Solr you contribute to most.
There are advantages to the “stewarding” model as well: it ensures the project stays consistent – we brought up consistency and “clean code” a few times by now – and also that it’s funded.
We’ve discussed mailing lists and forums, but then you might also need:
- Expert consulting, to help you develop a project
- Training to level up your expertise throughout the project
- Production support, for when there’s a production fire
For both OpenSearch and Solr you have a number of third parties offering the above. We’d recommend Sematext, of course: we might be biased, sure, but there are some objective points to consider:
- We’ve been offering all of the above since 2010 – 13+ years as of this writing – for both Elasticsearch and Solr, and we’ve supported OpenSearch since its inception. We’ve worked with many clients of all shapes, sizes and industries.
- We also offer Sematext Cloud, a monitoring SaaS that can aggregate the metrics and logs of both Solr and OpenSearch (and Elasticsearch) out-of-the-box. This comes in very handy when you’re operating them in production or when you’re performance testing. Sematext Cloud can also monitor other technologies, from Kubernetes to PostgreSQL to synthetic website testing and everything in between.
Conclusion: Which One Is Better to Use?
OpenSearch and Solr have a lot of similarities: they’re both open-source, built on Apache Lucene, and serve similar use-cases, namely enterprise search and real-time analytics.
While there are some feature differences between them, most people choose based on other criteria: trendiness (OpenSearch is hotter), open-source purity (Solr wins here) or pre-existing knowledge. I hope that the details above put enough flesh on the overly-simplified diagram from the beginning 🙂