Solr Digest, August 2010

August brought a lot of activity into Solr world. There were many important developments, so we again compiled the most interesting ones for you, grouped into 4 categories:

Some new (and already committed) features

  • We already wrote about new work done on CollapsingComponent in June’s digest under SOLR-1682. A lot of work was done on this component and it appears that it is very close to being committed. Patches attached to the issue are functional, so you can give it a try.
  • SpellCheckComponent got improvement related to recent Lucene changes –  Add support for specifying Spelling SuggestWord Comparator to Lucene spell checkers for SpellCheckComponent. Issue SOLR-2053 is already fixed, patch is attached if you need it, but it is also committed to trunk and 3_x branch.
  • Another minor feature is improvement of WordDelimiterFilter in SOLR-2059Allow customizing how WordDelimiterFilter tokenizes text. Patch is already there and committed to trunk and 3_x.
  • Performance boost for faceting can be found in SOLR-2089Faceting: order term ords before converting to values. Behind this intimidating title hides a very decent speedup in cases when facet.limit is high. Patch is available, trunk and branch 3_x also got this magic committed.

Some new features being discussed and implemented

  • One very important (and probably much wanted) feature just got its Jira issue – SOLR-2080Create a Related Search Component. The issue was created by Grant Ingersoll, so we can expect some quality work do be done here. There are no patches (or even discussions) yet as the issue is in its infancy, but you can watch its progress in Jira. In the meantime, if you’re interested in such functionality, you can check Sematext’s RelatedSearches product.
  • Jira issue SOLR-2026Need infrastructure support in Solr for requests that perform multiple sequential queries – might add some interesting capabilities to search components, especially if you’re writing some of them on your own. We at Sematext have plenty of experience with writing of custom Solr components (check, for instance, our DYM ReSearcher or its Relaxer sibling), so we know that sometimes it is not a very pleasant task. If Solr gets better support for execution of multiple queries during a single request, writing custom components will become easier. One patch is already posted to this issue, so you can check it out, however, it is still unclear in which way this feature will evolve. We’re hoping for a flexible and comprehensive solution which would be easily extensible to many other features.
  • Defining QueryComponent’s default query parser can be made configurable with the patch attached to the issue SOLR-2031. You probably didn’t encounter many cases where you needed this functionality, but if you needed it, you had a problem before, and now that problem will become history.
  • It appears that QueryElevationComponent might get an improvement : Distinguish Editorial Results from “normal” results in the QueryElevationComponent. Jira issue SOLR-2037 will be the place to watch the progress.

Some newly found bugs

  • DataImportHandler has a bug – Multivalued fields with dynamic names does not work properly with DIH – the fix isn’t available, but if you have such problems, you check the status here.
  • Another bug in DataImportHandler points to a connection-leak issues – DIH doesn’t release JDBC connections in conjunction with DB2. There is no fix at the moment but, as usual, you can check the status in Jira.

Other interesting news

  • One potentially useful tool we recommend checking is SolrMeter. It is a standalone tool for stress testing of you Solr. From their site: The main goal of this open source project is to bring to the solr user community a “generic tool to interact specifically with solr”, firing queries and adding documents to make sure that your Solr implementation will support the real use. With SolrMeter you can simulate your work load over solr index and retrieve statistics graphically.
  • In which IDEs do you work with Solr/Lucene? Here at Sematext, we use both Eclipse and IntelliJ IDEA. If you use the latter and you want to set up Lucene or Solr in it, you can check a very useful description and patch in LUCENE-2611 IntelliJ IDEA setup.

We hope you enjoyed another Solr Digest from @sematext.  Come back and read us next month!

Leave a Reply