Sematext Solr Autocomplete

Sematext Solr AutoComplete: Introduction and Howto

Sematext Solr AutoComplete is an open-source Solr add-on that provides suggest-as-you-type functionality. In this post we’ll explain how to install it, how to load the autocomplete collection/core with suggestions, and how to run queries to get those suggestions back.

Why Sematext Solr AutoComplete?

Before we start, you might wonder how Sematext Solr AutoComplete differs from Solr’s built-in suggesters. Its main advantages are:

  • query flexibility. For example, with built-in suggesters you can choose an implementation that allows for fuzzy matches (vashin can return washington) or one that matches infixes (wash can return the washington times), but you can’t have both. AutoComplete can combine the two (vashin can return the washington times)
  • ranking flexibility. Besides static boosts, you can boost based on word order (washington ti can return the washington times above time in washington) or completed words (new can return new york above newton). You can also group suggestions based on a field, for example to have sponsored suggestions higher
  • it comes with a few tools that help you load suggestions into the collection/core used for autocomplete. You can load suggestions from a file, another index or via the DataImportHandler
  • it comes with a GUI component – AJAX-based JavaScript that can be attached to an HTML search form to query the backend

Solr’s built-in suggesters are easier to maintain when it comes to upgrades and are potentially faster, depending on the selected implementation and number of suggestions that have to be queried. We suggest checking them out as well as Sematext Solr AutoComplete so you can choose what’s best for your use-case. In general, AutoComplete helps when you need more control over your suggestions, especially since it makes this customization easier via import tools and GUI code.

Installation

First, you’ll need to clone the AutoComplete repository and package it. At the time of this writing, the latest supported Solr version is 6.3. Make sure you have Java 8 and Maven installed, then run:

git clone https://github.com/sematext/solr-autocomplete.git
cd solr-autocomplete
mvn clean package

Once the build process is done, copy the AutoComplete jar to Solr’s installation:

cp target/st-AutoComplete-1.6.6.3.1-SNAPSHOT.jar /opt/solr-6.3.0/server/solr-webapp/webapp/WEB-INF/lib/

AutoComplete depends on Sematext Solr ReSearcher’s core jar. ReSearcher is another Solr add-on that complements and extends Solr’s built-in spellcheckers, much like AutoComplete does for Solr’s suggesters. We’ll explain ReSearcher in another post, but for now let’s treat it as a dependency:

git clone https://github.com/sematext/solr-researcher
cd solr-researcher/core
mvn clean package
cp target/st-ReSearcher-core-1.12.6.3.1-SNAPSHOT.jar /opt/solr-6.3.0/server/solr-webapp/webapp/WEB-INF/lib/
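
The jar file names carry version numbers that change between builds, so it can help to confirm that both jars landed in Solr’s lib directory before starting Solr. The following is a minimal sketch, assuming the installation path used in this post; check_ac_jars is a hypothetical helper, not something that ships with AutoComplete or ReSearcher:

```shell
#!/bin/sh
# Hypothetical helper: verify that both jars were copied into Solr's
# webapp lib directory. The default path is the one used in this post.
check_ac_jars() {
    lib_dir="${1:-/opt/solr-6.3.0/server/solr-webapp/webapp/WEB-INF/lib}"
    status=0
    for pattern in 'st-AutoComplete-*.jar' 'st-ReSearcher-core-*.jar'; do
        # Unquoted $pattern expands to the matching files, or stays a
        # literal (non-existent) path when nothing matches.
        set -- "$lib_dir"/$pattern
        if [ -e "$1" ]; then
            echo "found: $1"
        else
            echo "missing: $pattern in $lib_dir" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage: check_ac_jars && /opt/solr-6.3.0/bin/solr start -c
```

Matching on a wildcard rather than the exact file name keeps the check valid when the SNAPSHOT version in the jar name changes.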

With all the jars in place, start Solr. Here it’s in Cloud mode, but it works with Master-Slave Solr as well:

/opt/solr-6.3.0/bin/solr start -c

Create an AutoComplete collection (or a core, if you are not running SolrCloud). Here we’ll name it autocomplete, but the name can be anything. The configuration comes from the cloned solr-autocomplete directory, so run the command from there:

/opt/solr-6.3.0/bin/solr create -c autocomplete -d ./example/solr/collection1/conf/

Index and query suggestions

Everything is now ready to load some suggestions. AutoComplete comes with a few example files that we can load with the FileLoader tool. In production, it’s likely that the tools you use for manual or automatic curation of suggestions will output to a file, so you may end up using FileLoader as more than just a test script:

cat example/exampledocs/just-phrases.txt | java -cp "/opt/solr-6.3.0/dist/*:/opt/solr-6.3.0/server/lib/ext/*:/opt/solr-6.3.0/server/solr-webapp/webapp/WEB-INF/lib/*" com.sematext.autocomplete.loader.FileLoader http://localhost:8983/solr/autocomplete

To ask for suggestions, we’ll run a query on the autocomplete collection with the prefix typed so far, selecting the dismax_ac handler through the qt parameter:

curl 'localhost:8983/solr/autocomplete/select?q=new&qt=dismax_ac&indent=true'
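
Prefixes typed by real users often contain spaces (washington ti, for example), which have to be URL-encoded before they go into the query string. Here is a minimal sketch that lets curl do the encoding through its -G and --data-urlencode options; ac_suggest, SOLR_URL and CURL are hypothetical names introduced for this example, and the URL assumes the collection created above:

```shell
#!/bin/sh
# Hypothetical helper (not part of AutoComplete): ask the autocomplete
# collection for suggestions matching a prefix.
# SOLR_URL and CURL are assumptions made for this sketch.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr/autocomplete}"
CURL="${CURL:-curl}"

ac_suggest() {
    prefix="$1"
    # -G sends the --data-urlencode pairs as a GET query string, so
    # multi-word prefixes such as "washington ti" are encoded safely.
    "$CURL" -sG "$SOLR_URL/select" \
        --data-urlencode "q=$prefix" \
        --data-urlencode "qt=dismax_ac" \
        --data-urlencode "indent=true"
}

# Usage (needs a running Solr with the autocomplete collection):
#   ac_suggest "new"
#   ac_suggest "washington ti"
```

Keeping the curl binary in a variable is only a convenience for dry runs: pointing CURL at a function that prints its arguments shows the request without contacting Solr.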

Tweaking queries and ranking

From the previous query, you’ll normally get back newton and new york. You can boost completed words (i.e. the new york suggestion) by setting ac_matchFullWords:

curl 'localhost:8983/solr/autocomplete/select?q=new&qt=dismax_ac&ac_matchFullWords=true&indent=true'

Typos can be tolerated too, via ac_spellcheck:

curl 'localhost:8983/solr/autocomplete/select?q=nee&qt=dismax_ac&ac_spellcheck=true&indent=true'

Lastly, let’s make some suggestions sponsored. Note that this file is tab delimited:

$ cat example/exampledocs/phrases-sponsored.txt
phrase:First Item Here   is_sponsored:false
phrase:Second Item Here  is_sponsored:true
phrase:Here Item         is_sponsored:false
$ cat example/exampledocs/phrases-sponsored.txt | java -cp "/opt/solr-6.3.0/dist/*:/opt/solr-6.3.0/server/lib/ext/*:/opt/solr-6.3.0/server/solr-webapp/webapp/WEB-INF/lib/*" com.sematext.autocomplete.loader.FileLoader http://localhost:8983/solr/autocomplete
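
If your suggestions come out of a script rather than a hand-edited file, the tab-delimited layout above is easy to generate. The sketch below reproduces the three example lines; ac_line and make_sponsored_file are hypothetical helpers, and it assumes phrases contain no tab characters, since the tab is the field separator:

```shell
#!/bin/sh
# Hypothetical helper: print one suggestion in the tab-delimited
# field:value layout of phrases-sponsored.txt.
ac_line() {
    # $1 = phrase text, $2 = true or false (defaults to false)
    printf 'phrase:%s\tis_sponsored:%s\n' "$1" "${2:-false}"
}

# Rebuild the three example suggestions shown above.
make_sponsored_file() {
    ac_line "First Item Here" false
    ac_line "Second Item Here" true
    ac_line "Here Item"
}

# Usage: make_sponsored_file > my-phrases.txt
```

The resulting file can then be piped into FileLoader in the same way as the bundled examples.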

Now we can group them so that sponsored items come first:

curl 'localhost:8983/solr/autocomplete/select?q=ite&qt=dismax_ac&ac_grouping_field=is_sponsored&indent=true'
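
To see what each tweak does to the ranking, it can be handy to run the same prefix through all of them and compare the responses side by side. The following sketch sends one request per parameter from this section, applying them one at a time (whether they can be combined in a single request is not covered here). ac_compare, SOLR_URL and CURL are hypothetical names introduced for this example:

```shell
#!/bin/sh
# Hypothetical helper: run one prefix through each tweak from this
# section, one parameter at a time, to compare the rankings.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr/autocomplete}"
CURL="${CURL:-curl}"

ac_compare() {
    prefix="$1"
    for tweak in "" "ac_matchFullWords=true" "ac_spellcheck=true" \
                 "ac_grouping_field=is_sponsored"; do
        echo "== ${tweak:-no tweak} =="
        # ${tweak:+...} adds the extra parameter only when tweak is set.
        "$CURL" -sG "$SOLR_URL/select" \
            --data-urlencode "q=$prefix" \
            --data-urlencode "qt=dismax_ac" \
            ${tweak:+--data-urlencode "$tweak"}
    done
}

# Usage (needs a running Solr with suggestions loaded):
#   ac_compare "ite"
```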


Next steps

Once AutoComplete returns the suggestions you want, the remaining step is to wire it into your search form; the HTML and JS examples that come with AutoComplete make this part easier. More details can be found in the GitHub README, and the result should be similar to that on our search sites: search-lucene.com, search-hadoop.com and search-devops.com.

 

2 thoughts on “Sematext Solr AutoComplete: Introduction and Howto”

  1. Does your autocomplete support context? In other words, can we include info in the autocomplete to show the document type or other field info associated with the term? If yes, can you explain how it is supported?

    1. Yes, that’s supported. Autocomplete uses a separate Solr core/collection, which comes with a predefined schema. You can use the fields from that schema as metadata (for filtering) or you can add your own. To filter, you should be able to use fq like with any other query.
