Subject: Re: Solr-recommender


Solr uses cosine similarity for its queries. The implementation on GitHub uses Mahout's LLR to calculate the item-item similarity matrix, but when you run the more-like-this query at runtime Solr uses cosine. This could be changed in Solr; I'm not sure how much work it would take.
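
To make the difference concrete, here is a minimal Python sketch of both measures for one item pair A/B. It is written from the standard 2x2 contingency-table formulation of Dunning's log-likelihood ratio, in the same entropy form Mahout uses, but it is not Mahout's code; the function names are illustrative. The key point: cosine on binary user vectors never looks at k22 (users who touched neither item), while LLR does.

```python
import math


def x_log_x(x: int) -> float:
    # 0 * log(0) is taken as 0, so empty cells contribute nothing.
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts: int) -> float:
    # Unnormalized Shannon entropy: N*log(N) - sum(k*log(k)).
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11: int, k12: int, k21: int, k22: int) -> float:
    """Log-likelihood ratio (G^2) for a 2x2 table of user counts.

    k11: users with both A and B    k12: users with A but not B
    k21: users with B but not A     k22: users with neither
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))


def binary_cosine(k11: int, k12: int, k21: int) -> float:
    # Cosine of two 0/1 user vectors: |A and B| / sqrt(|A| * |B|).
    return k11 / math.sqrt((k11 + k12) * (k11 + k21))


if __name__ == "__main__":
    # Both pairs always co-occur, so cosine scores them identically;
    # LLR rates the pair seen against a much larger background far higher.
    print(binary_cosine(10, 0, 0), llr(10, 0, 0, 10), llr(10, 0, 0, 10000))
```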

It sounds like you are doing item-item similarities for recommendations, not actually calculating user-history based recs, is that true?

You bring up a point that we're finding as well. I'm not so sure we need or want a recommender query API that is separate from the Solr query API. On our demo site we put the output of the Solr-recommender where Solr can index it. Our web app framework then allows very flexible queries against Solr: using simple user history to produce the typical user-history-based recommendations, or mixing and boosting based on metadata or contextual data. If we leave the recommender query API to Solr, we get web app framework integration for free.
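
Roughly, a history-driven query could look like the Python sketch below, which only builds the request. It assumes a hypothetical `indicators` field on each item document holding the IDs of its LLR-similar items, plain alphanumeric item IDs (real IDs may need query-syntax escaping), and the standard `/select` handler; none of this is taken from the demo site's actual schema.

```python
from urllib.parse import urlencode


def history_params(user_history: list[str], rows: int = 10) -> dict:
    """Build Solr request parameters that turn a user's history into recs."""
    ids = " OR ".join(user_history)
    return {
        # Items whose (hypothetical) indicators field overlaps most with
        # the user's history score highest.
        "q": f"indicators:({ids})",
        # Pure negative filter: don't recommend items the user already has.
        "fq": f"-id:({ids})",
        "rows": rows,
        "fl": "id,score",
    }


def to_url(params: dict) -> str:
    # Relative URL for the default /select handler of some core.
    return "/select?" + urlencode(params)


if __name__ == "__main__":
    print(to_url(history_params(["i1", "i2"])))
```

Because this is an ordinary Solr query, anything the web framework already does with Solr (paging, faceting, filters) applies to the recommendations unchanged.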

Another point is where the data is stored for the running system. If we let Solr index from any storage service it supports, we also get free integration with almost any web app framework and storage service. For the demo site we put the data in a DB and have Solr index it from there. We also store the user history and metadata there. Most web app frameworks support this out of the box. You could go a different route and use almost any storage system, file system, or content format, since Solr supports a wide variety.
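
To picture the DB-to-Solr step, here is a push-style Python sketch that joins item metadata with recommender output and emits documents in the JSON array shape Solr's JSON update handler accepts. It is a stand-in, not the demo's setup (there Solr indexes from the DB itself); the in-memory SQLite DB, the table names, and the `category`/`indicators` fields are all hypothetical.

```python
import json
import sqlite3


def demo_connection() -> sqlite3.Connection:
    # In-memory stand-in for the app DB; schema is hypothetical.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE items (id TEXT PRIMARY KEY, category TEXT);
        CREATE TABLE item_indicators (item_id TEXT, indicator TEXT);
        INSERT INTO items VALUES ('i1', 'books'), ('i2', 'video');
        INSERT INTO item_indicators VALUES ('i1', 'i2'), ('i1', 'i3'), ('i2', 'i1');
    """)
    return conn


def load_docs(conn: sqlite3.Connection) -> list[dict]:
    """Merge item metadata and recommender output into Solr documents."""
    docs = {}
    for item_id, category in conn.execute("SELECT id, category FROM items ORDER BY id"):
        docs[item_id] = {"id": item_id, "category": category, "indicators": []}
    # One row per (item, similar item) pair written by the recommender job.
    for item_id, indicator in conn.execute(
        "SELECT item_id, indicator FROM item_indicators ORDER BY item_id, indicator"
    ):
        if item_id in docs:
            docs[item_id]["indicators"].append(indicator)
    return list(docs.values())


if __name__ == "__main__":
    print(json.dumps(load_docs(demo_connection()), indent=2))
```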

Given a fully flexible, standard Solr query and indexing scheme, all you need to do is tweak the query or data source a bit and you have an item-set recommender (shopping cart), a contextual recommender (for example, boosting recs from a category), or a pure metadata/content-based recommender.
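
A Python sketch of how those variants can differ only in the query. The field names (`indicators`, `description`, `category`) and the 2.0 boost are illustrative assumptions; `bq` is the dismax/edismax boost-query parameter, which raises the score of matching documents without filtering the rest out.

```python
from typing import Optional, Sequence


def rec_params(
    item_ids: Sequence[str] = (),
    boost_category: Optional[str] = None,
    content_terms: Sequence[str] = (),
    rows: int = 10,
) -> dict:
    """Edismax parameters for several recommender flavors.

    item_ids: a user's history, or only the items in a shopping cart
    boost_category: favor one category (contextual recommender)
    content_terms: metadata terms (content-based recommender)
    """
    clauses = []
    if item_ids:
        clauses.append("indicators:(" + " OR ".join(item_ids) + ")")
    if content_terms:
        clauses.append("description:(" + " OR ".join(content_terms) + ")")
    if not clauses:
        raise ValueError("need item_ids or content_terms")
    params = {"defType": "edismax", "q": " OR ".join(clauses), "rows": rows}
    if boost_category:
        params["bq"] = f"category:{boost_category}^2.0"
    return params


if __name__ == "__main__":
    print(rec_params(["cart1", "cart2"]))                   # item-set (cart)
    print(rec_params(["h1", "h2"], boost_category="books"))  # contextual
    print(rec_params(content_terms=["mahout", "solr"]))      # content-based
```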

If the query and storage are left to Solr plus the web app framework, then the GitHub version is close to complete, if not done. Solr still needs LLR in the more-like-this queries. Term weights to encode strength scores would also be nice, and I agree that both of these could use some work.
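
Until Solr scores on such weights natively, here are two hedged ways a document's indicator field could carry strength scores. The first is a workaround, not a Solr feature: repeat each indicator in proportion to its score so ordinary term-frequency scoring notices (Lucene's default similarity takes the square root of TF, so the effect is sublinear). The second emits the `item|weight` tokens that Solr's DelimitedPayloadTokenFilterFactory can store as float payloads; querying on those still needs payload-aware scoring. Function names and the `max_repeats` scaling are assumptions.

```python
def weighted_field(scores: dict[str, float], max_repeats: int = 5) -> str:
    """Encode strengths as term repetitions (hypothetical TF workaround)."""
    if not scores:
        return ""
    top = max(scores.values())
    if top <= 0:
        return " ".join(sorted(scores))
    tokens = []
    for item, score in sorted(scores.items()):
        # Scale into 1..max_repeats so every indicator appears at least once.
        repeats = max(1, round(max_repeats * score / top))
        tokens.extend([item] * repeats)
    return " ".join(tokens)


def payload_field(scores: dict[str, float]) -> str:
    # '|' is the payload filter's default delimiter; use the float encoder.
    return " ".join(f"{item}|{score:.3f}" for item, score in sorted(scores.items()))


if __name__ == "__main__":
    llr_scores = {"i2": 10.0, "i3": 2.0}
    print(weighted_field(llr_scores))
    print(payload_field(llr_scores))
```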

BTW, lest we forget, this does not imply that the Solr-recommender is better than Myrrix or the Mahout-only recommenders. The results need careful comparison. Michael, did you do offline or A/B tests during your implementation?
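
For the offline side, one common approach is to hold out part of each user's history, generate recommendations from the rest, and score the result with precision@k. A minimal Python sketch; the split strategy and the value of k are choices left open here, and running the same harness over Solr-recommender, Myrrix, and Mahout output would make the numbers comparable.

```python
def precision_at_k(recommended: list[str], held_out: set[str], k: int) -> float:
    """Fraction of the top-k recommendations found in the held-out items."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for item in recommended[:k] if item in held_out) / k


def mean_precision_at_k(runs: dict[str, tuple[list[str], set[str]]], k: int) -> float:
    # runs maps user -> (ranked recs, held-out items); average over users.
    if not runs:
        return 0.0
    return sum(precision_at_k(recs, held, k) for recs, held in runs.values()) / len(runs)


if __name__ == "__main__":
    runs = {"u1": (["a", "b"], {"b"}), "u2": (["c", "d"], {"c", "d"})}
    print(mean_precision_at_k(runs, k=2))
```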

On Oct 9, 2013, at 6:13 AM, Michael Sokolov <[EMAIL PROTECTED]> wrote:

Just to add a note of encouragement for the idea of better integration between Mahout and Solr:

On safariflow.com, we've recently converted our recommender, which computes similarity scores with Mahout, from storing scores and running queries in Postgres to doing all of that in Solr. It's been a big improvement, both in indexing speed and, more importantly, in the flexibility of the queries we can write. I believe that having scoring built into the query engine is a key feature for recommendations. More and more I am coming to believe that recommendation should just be considered another facet of search: one among many variables the system may take into account when presenting relevant information to the user. In our system, we still clearly separate search from recommendations, and we probably always will to some extent, but I think we will start to blend the queries more so that there will be essentially a continuum of query options including more or less "user preference" data.
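
One way to picture that continuum is a single edismax request where the preference boost is a dial. The Python sketch below assumes hypothetical `title`, `description`, and `indicators` fields; `bq` only adds score to matching documents, so at `pref_weight=0` the request is plain keyword search and larger weights push results toward the user's tastes.

```python
def blended_params(user_text: str, history: list[str], pref_weight: float,
                   rows: int = 10) -> dict:
    """Blend keyword search with user-preference data in one query."""
    params = {
        "defType": "edismax",
        "q": user_text,
        "qf": "title description",
        "rows": rows,
    }
    if history and pref_weight > 0:
        # Boost (never filter) items whose indicators match the history.
        params["bq"] = "indicators:(" + " OR ".join(history) + f")^{pref_weight}"
    return params


if __name__ == "__main__":
    for w in (0, 0.5, 2.0):
        print(blended_params("solr recommender", ["i1", "i2"], w))
```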

I think what I'm talking about may be a bit different from what Pat is describing (in implementation terms), since we do LLR calculations offline in Mahout and then bulk load them into Solr. We took one of Ted's earlier suggestions to heart and simply ignored the actual numeric scores: we index the top N similar items for each item. Later we may incorporate numeric scores in Solr as term weights. If people are looking for things to do :) I think that would be a great software contribution that could spur this effort onward, since it's difficult to accomplish right now given the Solr/Lucene indexing interfaces, but is already supported by the underlying data model and query engine.
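
Indexing the top N similar items while ignoring the scores amounts to a top-N cut over each row of the similarity matrix. A minimal Python sketch, assuming the Mahout output has already been parsed into a nested dict (that parsing step and the `similar` field name are hypothetical):

```python
import heapq


def top_n_docs(similarities: dict[str, dict[str, float]], n: int) -> list[dict]:
    """Keep the N highest-scoring neighbors per item and drop the scores.

    similarities maps item -> {other item: LLR score}.
    """
    docs = []
    for item in sorted(similarities):
        candidates = [(o, s) for o, s in similarities[item].items() if o != item]
        # Highest score first; ties broken deterministically by item id.
        best = heapq.nlargest(n, candidates, key=lambda kv: (kv[1], kv[0]))
        docs.append({"id": item, "similar": [other for other, _ in best]})
    return docs


if __name__ == "__main__":
    sims = {"a": {"b": 3.0, "c": 9.0, "d": 1.0}, "b": {"a": 3.0}}
    print(top_n_docs(sims, 2))
```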
-Mike

On 10/2/13 12:19 PM, Pat Ferrel wrote: