Solr Digest, September 2010

It is a busy time of year here at Sematext – we have 3 different presentations to prepare for 3 different conferences (2 down, 1 more to go!), so we’re a bit late with our digests. Nevertheless, we managed to compile a list of interesting topics in the Solr world:

Already committed functionality

  • Solr was upgraded to use Tika 0.7 (SOLR-1819) – the fix was applied to the 1.4.2, 3.1 and 4.0 versions.  Of course, Tika 0.8 is going to happen in the not very distant future.
  • If you’re still using the old rsync-based replication and need to throttle the transfer rate, have a look at the patch contributed in JIRA issue SOLR-2099. Unfortunately, if you’re using the Java-based replication introduced in 1.4, there is currently no way to throttle replication.
  • If you are using the new spatial capabilities in Solr, you might have noticed some incorrect calculations. One of them is fixed – Spatial filter is not accurate – on the 3.1 and 4.0 branches.
  • Another minor but useful addition – function queries can now be defined in terms of other request parameters. Check JIRA issue “full parameter dereferencing for function queries”. It is already implemented in 3.1 and 4.0 and is ready to be used. Here is a short example from JIRA (check how the add function is defined and note the v1 and v2 request parameters):

http://localhost:8983/solr/select?defType=func&fl=id,score&q=add($v1,$v2)&v1=mul(2,3)&v2=10

Can we say, Solr Calculator, eh?
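
If you would rather assemble such a request from code than by hand, here is a minimal sketch in Python. It only builds the URL shown above and does not contact Solr; the helper name build_func_query_url is our own invention for illustration, and the host, port and path are simply the ones from the JIRA example.

```python
from urllib.parse import urlencode

def build_func_query_url(base_url, expression, **sub_params):
    """Build a function-query URL (hypothetical helper, not part of Solr).

    `expression` may reference other request parameters as $name;
    each referenced name is passed as a keyword argument.
    """
    params = {
        "defType": "func",   # parse q as a function query
        "fl": "id,score",    # return the document id and the computed score
        "q": expression,
    }
    params.update(sub_params)
    # urlencode percent-escapes characters such as $ ( ) and ,
    return base_url + "?" + urlencode(params)

url = build_func_query_url(
    "http://localhost:8983/solr/select",  # address taken from the example above
    "add($v1,$v2)",
    v1="mul(2,3)",
    v2="10",
)
print(url)
```

The printed URL is percent-encoded, so it looks a bit noisier than the hand-written one, but it decodes to the same parameters.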

Interesting functionalities in development for some time

  • Ever wanted to add some custom fields to a response, although they were not stored in your Solr index? You could always create a custom response writer which would add those fields (although it would probably be a “dirty” copy of one of Solr’s existing response writers). However, we all know that doesn’t sound like the right way to code.  One JIRA issue might deliver a correct way some day – Allow components to add fields to outgoing documents. We say “some day“, since this functionality has been in development for quite some time now and, although it has some patches (currently non-functional, it seems), it is probably not very near completion.  But it will be handy to have once it’s done.

Interesting new functionalities

  • Highlighter could get one frequently requested improvement – Highlighter fragement/formatter for returning just the matching terms – we believe this will be a useful addition, although we don’t expect it very soon.
  • One potentially useful feature for all of you who use HDFS – DIH should be able to read data directly from HDFS for indexing.  This issue already contains some working code, although it is unclear whether the fix will become part of the standard Solr distribution.  Still, if you’re using Solr 1.4.1 and you have data in HDFS that you want to index with Solr, have a look at this contribution.
  • Another improvement related to replication is in SOLR-2117 – Allow slaves to replicate at different times. This should be useful to anyone who has long (and therefore heavy) warmup periods on their slaves after replication. This way, you can have your slaves replicate at different times and, while one of them is replicating, just take that slave offline (to avoid degradation of response times). Be careful though, there is a downside: for some time (limited, but still), your slaves will serve different data. A patch is available for the 4.0 version.

Miscellaneous

So, we had a little bit of everything from Solr this month. Until late October (or the start of November), when the new Solr Digest arrives, stay tuned to @sematext, where we tweet other interesting stuff on a wider set of topics from time to time.
