Solr Query Segmenter: How to Provide Better Search Experience

One way to create a better search experience is to understand user intent. One phase of that process is query understanding, and a simple first step in that direction is query segmentation. In this post, we’ll cover what query segmentation is and when it is useful. We will also introduce Solr Query Segmenter, an open-source Solr component we developed to improve the search experience.
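To make the idea concrete, here is a minimal sketch of one common segmentation approach: greedy longest-match lookup of query tokens against a dictionary of known phrases. The `PHRASES` dictionary and the `segment` function are hypothetical illustrations, not Solr Query Segmenter's API, and the component's own matching logic may differ.

```python
"""Greedy longest-match query segmentation against a phrase dictionary.

Generic illustration only: PHRASES is hypothetical example data,
not anything shipped with Solr Query Segmenter.
"""

# Hypothetical dictionary mapping known phrases to a segment type.
PHRASES = {
    "new york": "city",
    "pizza": "cuisine",
    "italian restaurant": "category",
}

def segment(query, phrases=PHRASES, max_len=3):
    """Split a query into (text, type) segments, preferring longer phrases."""
    tokens = query.lower().split()
    segments = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shrink the window.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in phrases:
                segments.append((candidate, phrases[candidate]))
                i += n
                break
        else:
            # No dictionary match: keep the single token as free text.
            segments.append((tokens[i], None))
            i += 1
    return segments

if __name__ == "__main__":
    print(segment("Italian restaurant New York"))
```

A search application could then, for example, turn the `city` segment into a filter and search only the remaining text. Preferring the longest match keeps multi-word phrases such as "new york" together instead of treating them as independent terms.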


Parameterizing Queries in Solr and Elasticsearch

We all know how useful abstraction layers are in the software we create. We use interfaces to separate implementation from method contracts, and we use n-tier architectures to divide system layers and isolate them from each other. This pays off: when we change one piece, we don’t need to touch the other parts that only know about method contracts, APIs, etc. Why not do the same with search queries? Can we even do that in Elasticsearch and Solr? We can, and I’ll show you how.
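As a rough sketch of the idea, assuming Solr's parameter dereferencing (`v=$param` inside local params) and Elasticsearch's stored mustache search templates as found in recent versions: the application passes only the user's text, while the query structure lives in Solr request parameters (or request-handler defaults) and in a stored Elasticsearch template. The field names `title` and `body`, the parameter name `uq`, and the template id `title-search` are hypothetical. The code only builds request data and does not contact a server.

```python
import json
from urllib.parse import urlencode

def solr_params(user_query):
    """Solr params where the query structure references the user text via $uq.

    The "q" and "qf" values could instead be request-handler defaults in
    solrconfig.xml, so clients would send only "uq".
    """
    return {
        # Local params: edismax parser, with qf and the query text dereferenced.
        "q": "{!edismax qf=$qf v=$uq}",
        "qf": "title^2 body",  # hypothetical fields and boost
        "uq": user_query,
    }

# Stored Elasticsearch search template, registered once via PUT _scripts/title-search.
STORED_TEMPLATE = {
    "script": {
        "lang": "mustache",
        "source": {"query": {"match": {"title": "{{query_string}}"}}},
    }
}

def es_template_request(user_query, template_id="title-search"):
    """Body for GET <index>/_search/template: only the template id and params."""
    return {"id": template_id, "params": {"query_string": user_query}}

if __name__ == "__main__":
    print(urlencode(solr_params("solr query segmenter")))
    print(json.dumps(es_template_request("solr query segmenter")))
```

With this split, changing boosts, fields, or even the query parser means editing server-side configuration or the stored template rather than every client.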


Announcement: Coming Up in Site Search Analytics

Have you checked out Site Search Analytics (SSA) yet? If not, and if you think insight into user search behavior and experience is valuable, then we’ve got something for you that’s battle-tested and ready to go.

This year we are adding some killer new features that will make SSA even more useful. So, if you want to enjoy benefits like:

  • Viewing real-time graphs showing search and click-through rates
  • Awareness of your top queries, top zero-hit queries, most seen and clicked on hits, etc.
  • Having mechanisms for search relevance A/B testing and relevance feedback
  • Not having to develop, set up, manage or scale all the infrastructure needed for query and click log analysis
  • And many others — here is a full list of features and benefits

…then you will love the new functionality we have on the way.  After all, how can you improve search quality if you don’t measure it first and keep track of it?
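Two of these numbers, click-through rate and top zero-hit queries, are easy to state precisely. The sketch below computes them from a hypothetical in-memory search log; the log format and the definition of click-through rate used here (share of searches with at least one click) are assumptions for illustration, not how SSA computes its reports.

```python
from collections import Counter

# Hypothetical search log: one entry per search, with result count and clicks.
SEARCH_LOG = [
    {"query": "solr faceting", "hits": 42, "clicks": 1},
    {"query": "solr faceting", "hits": 42, "clicks": 0},
    {"query": "elasticsearch percolator", "hits": 0, "clicks": 0},
    {"query": "hbase monitoring", "hits": 7, "clicks": 2},
    {"query": "elasticsearch percolator", "hits": 0, "clicks": 0},
]

def click_through_rate(log):
    """Fraction of searches that received at least one click (assumed definition)."""
    if not log:
        return 0.0
    return sum(1 for entry in log if entry["clicks"] > 0) / len(log)

def top_queries(log, n=10):
    """Most frequent queries with their counts."""
    return Counter(entry["query"] for entry in log).most_common(n)

def top_zero_hit_queries(log, n=10):
    """Most frequent queries that returned no results."""
    return Counter(entry["query"] for entry in log if entry["hits"] == 0).most_common(n)

if __name__ == "__main__":
    print(click_through_rate(SEARCH_LOG))
    print(top_zero_hit_queries(SEARCH_LOG))
```

Zero-hit queries are often the quickest win: each one points to missing content, a missing synonym, or a spelling variant the engine does not handle.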

Site Search Analytics

Sound interesting?  Then check out a live demo.  SSA is 100% focused on helping you to improve the search experience of your customers and prospects.  And a better search experience translates into more traffic to your web site and greater awareness of your business.

What's New in Sematext Search Analytics

We’ve been busy adding functionality to SPM, our Performance Monitoring service, and improving it, but we’ve also been quietly working on our free Search Analytics service (internally known as SA). Not coincidentally, SPM and SA share a lot of backend components, as well as UI-level pieces. This allows a good amount of software reuse and lets us improve both services without doubling the effort.

Here are some of the new things in Search Analytics:

Live Demo. Before you create your Sematext Apps account (it’s free, and no credit card is needed), you can check out the live demo and see both SPM and SA in action.

Real-time. Previously, SA used MapReduce jobs to process the collected data and make it available as reports. That is no longer the case: we’ve put SA on the same real-time OLAP engine that powers SPM. This means you’ll see your graphs refresh and change before your eyes.

Dashboards. Just as we added Dashboards to SPM, we’ve added them to SA, too. You can now create custom Dashboards and choose which graphs go on them and where. This is great if you want to display your search stats and trends on a large office monitor, as some SA users are already doing. Moreover, you can put widgets from multiple SA and SPM Apps on the same Dashboard, so you can see your performance metrics, SPM custom metrics (e.g. your KPIs), and your SA metrics side by side.

Report/URL Sharing. Just like in SPM, you can now copy the URL from the browser and give it to anyone who has access to your SA reports. When that URL is opened, any filters and time selection are applied automatically. This makes it easy for multiple people to share their “SA view” by sharing the URL, instead of telling others which report to look at and which time range and filters to select.
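One straightforward way to implement this kind of sharing is to serialize the whole view state (report, time range, filters) into the URL's query string and parse it back when the link is opened. The sketch below uses hypothetical parameter names and an example base URL; it does not describe SA's actual URL format.

```python
from urllib.parse import parse_qs, urlencode, urlparse

def encode_view(base_url, view):
    """Serialize a report view (filters, time range) into a shareable URL."""
    # Sorting keeps the URL stable for identical views.
    return base_url + "?" + urlencode(sorted(view.items()))

def decode_view(url):
    """Restore the view state from a shared URL (single-valued params only)."""
    return {key: values[0] for key, values in parse_qs(urlparse(url).query).items()}

if __name__ == "__main__":
    view = {
        "report": "top-queries",
        "from": "2013-05-01T00:00",
        "to": "2013-05-02T00:00",
        "filter.country": "DE",
    }
    url = encode_view("https://apps.example.com/sa/report", view)
    print(url)
```

Because the URL carries the full state, the server needs no stored "saved view" for sharing; the trade-off is that long filter lists make long URLs.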

Graph Embedding and Sharing. Similar to URL Sharing (but different!), you can now share and embed individual SA graphs.  Each graph has a short URL that you can Tweet or share elsewhere.  You can also get a URL/HTML snippet and embed SA graphs in your blog, wiki, web site, etc.

User Sessions. We’ve added a User Sessions report. It shows the number of search sessions over time, the number of queries per session, and the number of distinct users using your search. If anyone asked you for these numbers for your site, would you know them? Most people would say no. These metrics are good to know, and with SA everyone can now answer yes.
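Session counts depend on how a session is defined. A common convention, assumed here rather than taken from SA, is to start a new session after a period of inactivity. The sketch below uses a hypothetical 30-minute timeout, groups `(user_id, timestamp, query)` events into sessions, and then computes the three numbers the report shows.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # assumed inactivity gap, in seconds

def sessionize(events, timeout=SESSION_TIMEOUT):
    """Group (user_id, timestamp, query) events into per-user sessions."""
    by_user = defaultdict(list)
    for user, ts, query in events:
        by_user[user].append((ts, query))
    sessions = []
    for user, user_events in by_user.items():
        user_events.sort()
        current = [user_events[0]]
        for prev, event in zip(user_events, user_events[1:]):
            # A gap longer than the timeout closes the current session.
            if event[0] - prev[0] > timeout:
                sessions.append((user, current))
                current = []
            current.append(event)
        sessions.append((user, current))
    return sessions

def session_stats(events):
    """Number of sessions, average queries per session, and distinct users."""
    sessions = sessionize(events)
    total_queries = sum(len(s) for _, s in sessions)
    return {
        "sessions": len(sessions),
        "queries_per_session": total_queries / len(sessions) if sessions else 0.0,
        "distinct_users": len({user for user, _ in sessions}),
    }

if __name__ == "__main__":
    events = [("u1", 0, "solr"), ("u1", 600, "solr cloud"),
              ("u1", 5000, "zookeeper"), ("u2", 100, "kafka")]
    print(session_stats(events))
```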

Distinct Queries. We’ve added the number of Distinct Queries to the Rate & Volume report.  Another nice metric.

Hourly Granularity. All graphs in SA now go down to hourly granularity. This lets you see how trends change over the course of each day, which can reveal differences in how your users search in the morning vs. during work hours vs. in the evening.
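Hourly granularity amounts to bucketing events by hour before aggregating. The sketch below, with a hypothetical `(unix_timestamp, query)` event format, counts query volume and distinct queries per UTC hour. Normalizing queries by trimming and lowercasing before counting distinct values is an assumption, not necessarily what SA does.

```python
from collections import defaultdict
from datetime import datetime, timezone

def hourly_stats(events):
    """Per-UTC-hour query volume and number of distinct (normalized) queries."""
    buckets = defaultdict(list)
    for ts, query in events:
        # Truncate the timestamp to the start of its hour.
        hour = datetime.fromtimestamp(ts, tz=timezone.utc).replace(
            minute=0, second=0, microsecond=0)
        buckets[hour].append(query.strip().lower())
    return {
        hour: {"queries": len(queries), "distinct": len(set(queries))}
        for hour, queries in sorted(buckets.items())
    }

if __name__ == "__main__":
    events = [(0, "Solr"), (60, "solr "), (3599, "kafka"), (3600, "solr")]
    for hour, stats in hourly_stats(events).items():
        print(hour.isoformat(), stats)
```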

HTTPS/SSL. The SA JavaScript beacon can now use HTTPS. This is important if your site uses HTTPS when displaying search results. To send your search and clickstream data via HTTPS, just replace http:// with https:// in the SA JavaScript beacon.

We hope you like these changes.  Please leave a comment or let us know if you have suggestions for other improvements or new features you would like to see in Sematext Search Analytics.


Berlin Buzzwords 2013 – Two Talks from Sematext

Last year at Berlin Buzzwords we were proud to give three talks. Alex talked about “Real-time Analytics with HBase” (slides, video), Otis covered large-scale monitoring in his talk titled “Large Scale ElasticSearch, Solr & HBase Performance Monitoring” (slides, video), and Rafał explained how we scale ElasticSearch clusters in his “Scaling Massive ElasticSearch Clusters” talk (slides, video). We were also very happy to be one of the sponsors of this great conference 🙂 Because we enjoyed the conference so much, we submitted a few proposals this year, and they were accepted. In this year’s schedule, we will be giving the following talks:

Radu: JSON Logging with ElasticSearch

This talk is about aggregating loooots of logs and searching seriously big data. We’ll cover as much as we possibly can in 20 minutes. We’ll look at how, where, when, why, and what to log. We’ll show how to use Elasticsearch as a data store for logs and what the benefits of doing so are. We’ll discuss the advantages and disadvantages of logging in JSON, which is easily processed by machines, versus traditional logging, which is easily processed by humans. Finally, we’ll explore how you can get your logs, JSON or not, into Elasticsearch, run searches and statistics on them, and create pretty graphs you can’t stop staring at.
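As a small illustration of logging in JSON and getting it into Elasticsearch, here is a sketch using Python's standard `logging` module: a custom formatter writes one JSON object per line, and a helper wraps those lines in the newline-delimited format of Elasticsearch's `_bulk` API. The `fields` convention for structured data and the index name are assumptions made for this example.

```python
import io
import json
import logging
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Render each log record as one line of JSON."""

    def format(self, record):
        doc = {
            "@timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Structured fields passed as extra={"fields": {...}} (a convention assumed here).
        doc.update(getattr(record, "fields", {}))
        return json.dumps(doc)

def to_bulk(json_lines, index):
    """Elasticsearch _bulk body: an action line before each JSON document."""
    out = []
    for line in json_lines:
        out.append(json.dumps({"index": {"_index": index}}))
        out.append(line)
    return "\n".join(out) + "\n"  # the bulk body must end with a newline

# Log into an in-memory stream so the JSON lines can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())
log = logging.getLogger("search")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.propagate = False

log.info("query took %d ms", 12, extra={"fields": {"took_ms": 12, "hits": 42}})
bulk_body = to_bulk(stream.getvalue().splitlines(), "logs-2013.05.27")
print(bulk_body)
```

The resulting body could be POSTed to `/_bulk` with the `Content-Type: application/x-ndjson` header. Because each line is already a JSON document, fields such as `took_ms` become searchable and aggregatable without any parsing on the Elasticsearch side.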

Rafał: Battle of the Giants, Round 2


Learn how both of these great enterprise search servers are evolving and adding new features. We will compare the latest and greatest versions of Solr and ES, both of which use Lucene 4.x and bring different approaches to handling codecs, per-field similarities, and more. We won’t look only at the technical aspects of Apache Solr and ElasticSearch; we’ll also dig into the makeup of their contributors and compare their code and user communities. By the end of the talk you’ll know the main differences between these two search servers, including how they handle shard and replica distribution, automatic data replication, and different query types. In addition, you’ll learn what the admin APIs of Solr and ElasticSearch look like and how to use them to control and alter your cluster state. Last, but not least, you’ll learn what to avoid when using ElasticSearch or Apache Solr.
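To make shard and replica distribution concrete, the sketch below shows the basic idea both engines build on: route each document to a shard by hashing its id, then place each shard's copies on distinct nodes. It is a simplification: CRC32 stands in for the real hash (recent versions of both engines use MurmurHash3, and Solr maps hash values to per-shard ranges instead of taking a modulo), and the round-robin placement is a hypothetical strategy, not either engine's allocator.

```python
import zlib

def shard_for(doc_id, num_shards):
    """Pick a shard by hashing the document id (CRC32 as a stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards

def place_copies(num_shards, replicas, nodes):
    """Assign each shard's primary and replicas to distinct nodes, round-robin."""
    if replicas + 1 > len(nodes):
        raise ValueError("need at least replicas + 1 nodes")
    layout = {}
    for shard in range(num_shards):
        # Consecutive nodes (mod cluster size) never repeat within one shard.
        layout[shard] = [nodes[(shard + i) % len(nodes)] for i in range(replicas + 1)]
    return layout

if __name__ == "__main__":
    print(shard_for("doc-1", 3))
    print(place_copies(3, 1, ["n1", "n2", "n3"]))
```

Keeping a shard's copies on different nodes is what lets a cluster lose a node without losing data; the hash-based routing is why changing the number of shards later is expensive.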

[Note: for those of you who don’t have the time or inclination to go through all the technical details, here’s a high-level, up-to-date (2015) Solr vs. Elasticsearch overview]

We hope to see some of you in Berlin. If these topics interest you but you won’t be coming to Berlin, feel free to get in touch, leave comments, or ping @sematext. As usual, we’ll post slides after the talks, and the organizers will probably record the talks and publish the videos after the conference. And if you love working with the things our talks are about, we are hiring worldwide!

Poll: Using SolrCloud or Not?

We know that, as of February 2013, about 75% of the Solr users who follow the Sematext Blog use some version of Solr 4.x. But today we are trying to get another interesting stat:

What portion of Solr 4.x users use SolrCloud?

Let’s find out!  Please tweet this to help us get more votes and better stats.

Please vote only if you are using Solr 4.x. Please do NOT vote if you are using a 1.x or 3.x version of Solr.

Solr vs. ElasticSearch: Part 6 – User & Dev Communities

[Note: for those of you who don’t have the time or inclination to go through all the technical details, here’s a high-level, up-to-date (2015) Solr vs. Elasticsearch overview]

One of the questions after my talk at the recent ApacheCon EU was what I thought about the communities of the two search engines I was comparing. Not surprisingly, this is also a question we often address in our consulting engagements. As part of our Apache Solr vs. ElasticSearch post series, we decided to step away from the technical aspects of SolrCloud vs. ElasticSearch and look at the communities gathered around these two projects. If you haven’t read the previous posts about Apache Solr vs. ElasticSearch, here are pointers to all of them:


Solr vs ElasticSearch: Part 5 – Management API Capabilities

[Note: for those of you who don’t have the time or inclination to go through all the technical details, here’s a high-level, up-to-date (2015) Solr vs. Elasticsearch overview]

In the previous posts, all listed below, we discussed general architecture, full text search capabilities, and facet aggregation possibilities. Until now, however, we have not discussed administration and management options, or the things you can change on a live cluster without a restart. So let’s dig in and see what Apache Solr and ElasticSearch have to offer.
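For a flavor of what such live changes look like, the sketch below builds two kinds of requests without sending them: updating the replica count of an existing Elasticsearch index through the index settings API, and calling Solr's Collections API (for example `RELOAD`, or `ADDREPLICA` in later 4.x releases). The index and collection names are hypothetical, and the exact set of supported actions depends on the version you run.

```python
import json
from urllib.parse import urlencode

def es_set_replicas(index, replicas):
    """Request that changes the replica count of a live Elasticsearch index."""
    body = {"index": {"number_of_replicas": replicas}}
    return "PUT", f"/{index}/_settings", json.dumps(body)

def solr_collections_call(action, **params):
    """Request for the Solr Collections API, e.g. RELOAD or ADDREPLICA."""
    query = urlencode({"action": action, **params})
    return "GET", f"/solr/admin/collections?{query}", None

if __name__ == "__main__":
    print(es_set_replicas("logs", 2))
    print(solr_collections_call("RELOAD", name="products"))
    print(solr_collections_call("ADDREPLICA", collection="products", shard="shard1"))
```

Both calls act on a running cluster: no node restart is needed, and the engine moves or creates shard copies in the background.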


Slides: Battle of the Giants – Solr 4.0 vs ElasticSearch 0.20.0

[Note: for those of you who don’t have the time or inclination to go through all the technical details, here’s a high-level, up-to-date (2015) Solr vs. Elasticsearch overview]

Slides for the Battle of the Giants talk Rafał Kuc (@kucrafal) gave at ApacheCon EU 2012 are now up!

If you like working with Solr and/or ElasticSearch, HBase, Hadoop, Kafka, Flume, etc., if you use and/or develop highly scalable distributed applications and frameworks, or if you like working on analytics and Big Data applications and services, we’re looking for good, smart, and fun people!

And if you liked the above presentation, you may also want to read our ElasticSearch vs. Solr series and see Scaling Massive ElasticSearch Clusters.

Solr vs ElasticSearch: Part 4 – Faceting

[Note: for those of you who don’t have the time or inclination to go through all the technical details, here’s a high-level, up-to-date (2015) Solr vs. Elasticsearch overview]

Solr 4 (aka SolrCloud) has just been released, so it’s the perfect time to continue our ElasticSearch vs. Solr series. In the first three parts of the series we gave a general overview of the two search engines and covered data handling and their full text search capabilities. In this part we look at how these two engines handle faceting.
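At its core, a facet is a count of matching documents per field value. The sketch below computes that in memory for a hypothetical result set, with `limit` and `mincount` parameters named after Solr's `facet.limit` and `facet.mincount`. The request dictionaries show roughly how each engine is asked for the same counts: Solr's `facet.field` parameter, and an Elasticsearch `terms` aggregation, which later superseded the facets API available when this post was written. The field name `category` is hypothetical.

```python
from collections import Counter

# Request shapes for the same facet (field name "category" is hypothetical).
SOLR_PARAMS = {"q": "*:*", "facet": "true", "facet.field": "category", "facet.mincount": "1"}
ES_BODY = {"size": 0, "aggs": {"by_category": {"terms": {"field": "category"}}}}

def facet_counts(docs, field, limit=10, mincount=1):
    """Count matching documents per value of `field`, most frequent first."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        # A multi-valued field counts each distinct value once per document.
        values = value if isinstance(value, list) else [value]
        for v in set(values):
            if v is not None:
                counts[v] += 1
    return [(v, c) for v, c in counts.most_common(limit) if c >= mincount]

if __name__ == "__main__":
    docs = [
        {"category": "book", "tags": ["solr", "search"]},
        {"category": "book", "tags": ["elasticsearch", "search"]},
        {"category": "video", "tags": ["solr"]},
    ]
    print(facet_counts(docs, "category"))
    print(facet_counts(docs, "tags"))
```

The real engines compute these counts from their index structures rather than by scanning documents, which is where their approaches, and their performance, differ.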
