Sematext Solr AutoComplete is an open-source Solr add-on that provides suggest-as-you-type functionality. In this post we’ll explain how to install it, load the autocomplete collection/core with suggestions, and run queries to get those suggestions back.
Why Sematext Solr AutoComplete?
Before we start, you might wonder how Sematext Solr AutoComplete differs from Solr’s built-in Suggesters. The main advantages of AutoComplete are:
- query flexibility. For example, with built-in suggesters you can choose an implementation that allows for fuzzy matches (vashin can return washington) or one that matches infixes (wash can return the washington times), but you can’t have both. AutoComplete supports both at once (vashin can return the washington times)
- ranking flexibility. Besides static boosts, you can boost based on word order (washington ti can return the washington times above time in washington) or completed words (new can return new york above newton). You can also group suggestions based on a field, for example to rank sponsored suggestions higher
- it comes with a few tools that help you load suggestions into the collection/core used for autocomplete. You can load suggestions from a file, another index or via the DataImportHandler
- it comes with a GUI component: an AJAX-based front end that can be attached to an HTML search form to query the backend
Solr’s built-in suggesters are easier to maintain when it comes to upgrades and are potentially faster, depending on the selected implementation and number of suggestions that have to be queried. We suggest checking them out as well as Sematext Solr AutoComplete so you can choose what’s best for your use-case. In general, AutoComplete helps when you need more control over your suggestions, especially since it makes this customization easier via import tools and GUI code.
First, you’ll need to clone the AutoComplete repository and package it. At the time of this writing, the latest supported Solr version is 6.3. Make sure you have Java 8 and Maven installed, then run:
git clone https://github.com/sematext/solr-autocomplete.git
cd solr-autocomplete
mvn clean package
Once the build process is done, copy the AutoComplete jar to Solr’s installation:
cp target/st-AutoComplete-220.127.116.11.1-SNAPSHOT.jar /opt/solr-6.3.0/server/solr-webapp/webapp/WEB-INF/lib/
AutoComplete depends on Sematext Solr ReSearcher’s core jar. ReSearcher is another Solr add-on that complements and extends Solr’s built-in spellcheckers, much like what AutoComplete does to Solr’s suggesters. We’ll explain ReSearcher in another post, but for now let’s treat it as a dependency:
git clone https://github.com/sematext/solr-researcher
cd solr-researcher/core
mvn clean package
cp target/st-ReSearcher-core-18.104.22.168.1-SNAPSHOT.jar /opt/solr-6.3.0/server/solr-webapp/webapp/WEB-INF/lib/
With all the jars in place, start Solr. Here we start it in SolrCloud mode, but AutoComplete works with Master-Slave Solr as well:
/opt/solr-6.3.0/bin/solr start -c
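Create an AutoComplete collection (or a core, if you’re not running SolrCloud). Here we’ll name it autocomplete, but it can be anything. The -d path below is relative, presumably to the solr-autocomplete checkout, so run the command from there: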
/opt/solr-6.3.0/bin/solr create -c autocomplete -d ./solr/collection1/conf/
Index and query suggestions
Everything is now ready to load some suggestions. AutoComplete comes with a few example files that we can load with the FileLoader tool. In production, it’s likely that the tools you use for manual or automatic curation of suggestions will output to a file, so you may end up using FileLoader as more than just a test script:
cat example/exampledocs/just-phrases.txt | java -cp /opt/solr-6.3.0/dist/*:/opt/solr-6.3.0/server/lib/ext/*:/opt/solr-6.3.0/server/solr-webapp/webapp/WEB-INF/lib/* com.sematext.autocomplete.loader.FileLoader http://localhost:8983/solr/autocomplete
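Since the same long classpath is needed every time FileLoader runs, it can help to wrap it in a small helper. The following is only a sketch: the function names fileloader_classpath and load_suggestions are made up for illustration, and the paths and URL are the ones used in this post.

```shell
SOLR_DIR="/opt/solr-6.3.0"                        # install path used in this post
AC_URL="http://localhost:8983/solr/autocomplete"  # collection created earlier

# Print the classpath FileLoader needs. The * wildcards stay literal
# (they are never unquoted), so java expands them, not the shell.
fileloader_classpath() {
  printf '%s/dist/*:%s/server/lib/ext/*:%s/server/solr-webapp/webapp/WEB-INF/lib/*\n' \
    "$SOLR_DIR" "$SOLR_DIR" "$SOLR_DIR"
}

# Feed a suggestions file to FileLoader on standard input
# (equivalent to the cat ... | java ... pipeline above).
load_suggestions() {
  java -cp "$(fileloader_classpath)" \
    com.sematext.autocomplete.loader.FileLoader "$AC_URL" < "$1"
}

# Show the classpath that would be used.
fileloader_classpath
```

With Solr running, load_suggestions example/exampledocs/just-phrases.txt would then load the example file.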
To ask for a suggestion, we’ll just run a query on the autocomplete collection with the prefix, while specifying the dismax_ac query handler:
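A minimal sketch of such a query follows. It assumes that Solr listens on localhost:8983, that the collection is named autocomplete, and that the dismax_ac handler is selected with the qt parameter on /select; check the handler definition in your solrconfig.xml if this returns an error. The helper name ac_url is hypothetical.

```shell
SOLR_URL="http://localhost:8983/solr"   # assumed host and port
COLLECTION="autocomplete"               # collection name used in this post

# Print the suggestion query URL for a typed prefix.
# Assumption: dismax_ac is reachable as qt=dismax_ac on /select.
# Only spaces are percent-encoded here; other special characters are not.
ac_url() {
  q=$(printf '%s' "$1" | sed 's/ /%20/g')
  printf '%s/%s/select?q=%s&qt=dismax_ac&wt=json\n' "$SOLR_URL" "$COLLECTION" "$q"
}

# URL for the prefix "new"
ac_url "new"
```

With Solr running, curl -s "$(ac_url new)" fetches the suggestions as JSON.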
Tweaking queries and ranking
From the previous query, you’ll normally get back newton and new york. You can boost completed words (i.e. the new york suggestion) by setting ac_matchFullWords:
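As a sketch, the parameter can be appended to the query URL like this. The value true, the qt=dismax_ac way of selecting the handler, and the host, port and collection name are assumptions; ac_full_words_url is a hypothetical helper name.

```shell
BASE="http://localhost:8983/solr/autocomplete/select"   # assumed host, port and collection

# Print a dismax_ac query URL with full-word boosting switched on.
# Assumption: ac_matchFullWords is a boolean flag that accepts "true".
# The prefix is used as-is, so it must not need URL encoding.
ac_full_words_url() {
  printf '%s?q=%s&qt=dismax_ac&ac_matchFullWords=true\n' "$BASE" "$1"
}

# URL for the prefix "new"
ac_full_words_url "new"
```

Passing the printed URL to curl should then rank new york above newton, since new is a completed word in the former.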
Typos can be tolerated too, via ac_spellcheck:
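The sketch below shows one way to add ac_spellcheck, alone or combined with other parameters. As before, the value true, the qt=dismax_ac handler selection and the URL are assumptions, the helper ac_url is hypothetical, and the misspelled prefixes are only illustrations that may not match anything in the example data.

```shell
BASE="http://localhost:8983/solr/autocomplete/select"   # assumed host, port and collection

# Print a dismax_ac query URL. The first argument is the prefix; any
# further arguments are appended as name=value request parameters.
# The prefix is used as-is, so it must not need URL encoding.
ac_url() {
  url="$BASE?q=$1&qt=dismax_ac"
  shift
  for param in "$@"; do
    url="$url&$param"
  done
  printf '%s\n' "$url"
}

# Typo-tolerant query (assumes ac_spellcheck accepts "true")
ac_url "vashin" "ac_spellcheck=true"

# Typo tolerance combined with full-word boosting
ac_url "nwe" "ac_spellcheck=true" "ac_matchFullWords=true"
```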
Lastly, let’s make some suggestions sponsored:
$ cat example/exampledocs/phrases-sponsored.txt
phrase:First Item Here is_sponsored:false
phrase:Second Item Here is_sponsored:true
phrase:Here Item is_sponsored:false
$ cat example/exampledocs/phrases-sponsored.txt | java -cp /opt/solr-6.3.0/dist/*:/opt/solr-6.3.0/server/lib/ext/*:/opt/solr-6.3.0/server/solr-webapp/webapp/WEB-INF/lib/* com.sematext.autocomplete.loader.FileLoader http://localhost:8983/solr/autocomplete
Now we can group them so that sponsored items come first:
Once you get AutoComplete working as you wish, you can use the HTML and JS examples that come with it to wire suggestions into your search form. More details can be found in the GitHub README, and the result should be similar to what you see on our search sites: search-lucene.com, search-hadoop.com and search-devops.com.