Solr Digest, October 2010

Another busy month is behind us.  There were plenty of interesting topics, so let’s get started:

Already committed functionality

Interesting functionality in development

  • Faceting is heavily used functionality, but occasionally people find they’re missing some form of faceting. Hierarchical faceting is one such thing. It has been in development for a very long time but, despite a few posted patches, is still not a part of Solr distribution, plus it hasn’t seen much activity lately. There is another similar issue – Pivot (aka Decision Tree) Faceting Component which should come to life as a separate component. However, there is renewed effort to make it usable, so eventually we’ll see expanded faceting support in Solr.

Interesting new functionality

  • Extending SchemaField with custom attributes is being dealt with in the issue Custom SchemaField object.
  • Improving search relevance is always a big issue (and represents a good part of what Sematext does in client engagements), no matter how good out of the box Solr and Lucene relevance is.  One very useful addition to our search relevance arsenal could come from the Anti-phrasing feature. The idea is that some word sequences in a query are irrelevant to the query meaning (like “Where can I find” or “Where is.”) and could/should be ignored while searching the index. This JIRA issue is still very fresh, so don’t hold your breath waiting for the implementation to become available next week, although we are bound to see this feature in one of the future Solr releases.
  • If you often working with financial data, you might find patches from issue Money FieldType useful. The new field type will support point and range queries, sorting, and exchange rates.
  • Lucene’s ICUTokenizer is useful for multilingual tokenizing but until recently there was no support for it in Solr. The issue Provide Solr FilterFactory for Lucene ICUTokenizer will provide a filter factory which will enable us to use it from Solr. Bingo! The patch already exists, so it can be tried already. Additional new functionality will be added over time.  If you need multilingual support in Solr, have a look at Sematext‘s popular Multilingual Indexer.

Miscellaneous

  • One of the favorite topics, which we also cover frequently, is related to the ongoing confusion about Solr versions. October didn’t disappoint, this topic was discussed on mailing lists again. So, here is one such thread – Which version of Solr to use?. Let us summarize the key parts. Solr 1.5 will probably never be released.  The branch_3x is a stable version from which the next Solr 3.1 version will likely be released.  The trunk contains relatively stable, but still development version of what will become Solr 4.0 one day.
  • If you provide faceting functionality in your application, here is a small (but interesting) discussion that might give you a few ideas about how to optimize it – Faceting and first letter of fields.
  • It appears that Solr has problems running on Tomcat 7. These problems are not related to a particular version of Solr, but to all versions. To learn more, start with Problems running on tomcat and SOLR-2022 .
  • The replication between Solr master and slave when they’re running different versions of Solr is broken, as you can see in issue Cross-version replication broken by new javabin format. The cause is the new javabin format, so in cases like the one described in this issue (master 1.4.1, slave 3x), you’ll encounter problems. Keep that in mind if you plan cross-version replication for some reason.

These were the most interesting highlights for the month of October. Thank you for reading Sematext Blog and following @sematext on Twitter.

Solr Digest, September 2010

Leave a Reply