Subject: Re: Solr-recommender

1) Using the user history for the current user in a more-like-this query against the item-item similarity matrix will produce a user-history based recommendation. Simply fetching the item-item similarity row for a particular item will give you the item-similarity based recs, which take no account of user history. One could imagine a user-user similarity setup, but that's not what we did.
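
To make the distinction concrete, here is a minimal plain-Python sketch. It is only a stand-in for the idea: the `SIMILARITY` rows, the item IDs, and both function names are made up for illustration, and the additive scoring is an assumption, not what Solr's more-like-this actually computes.

```python
from collections import defaultdict

# Hypothetical item-item similarity rows (item -> {similar item: score}).
SIMILARITY = {
    "a": {"b": 0.9, "c": 0.4},
    "b": {"a": 0.9, "d": 0.7},
    "c": {"a": 0.4, "d": 0.2},
    "d": {"b": 0.7, "c": 0.2},
}

def item_recs(item, k=3):
    """Item-similarity recs: read one row, ignoring user history."""
    row = SIMILARITY.get(item, {})
    return sorted(row, key=lambda other: (-row[other], other))[:k]

def history_recs(history, k=3):
    """User-history recs: combine the rows of every item in the history."""
    seen = set(history)
    scores = defaultdict(float)
    for item in seen:
        for other, score in SIMILARITY.get(item, {}).items():
            if other not in seen:  # don't recommend what the user already has
                scores[other] += score
    return sorted(scores, key=lambda other: (-scores[other], other))[:k]
```

With this toy data, `item_recs("a")` only looks at the row for "a", while `history_recs(["a", "b"])` pools the rows for both items and drops anything already in the history.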

2) What you are doing is something else that I was calling a shopping-cart recommender. You are using the item-set in the current cart and finding similar, what, items? A different way to tackle this is to store all other shopping carts then use the current cart contents as a more-like-this query against past carts. This will give you items-purchased-together by other users. If you have enough carts it might give even better results. In any case they will be different.
But if you already have the item-item similarity matrix indexed, this project won't add much. If you have purchase events and view-details events identified by user, you might try out the cross-recommender part. We've been searching for a data set to try this on.
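
The past-carts idea from point 2 could be sketched roughly like this. The cart contents, the `cart_recs` name, and the overlap-count weighting are all assumptions for illustration; a real more-like-this query against indexed carts would score matches differently.

```python
from collections import Counter

# Hypothetical stored carts from other users.
PAST_CARTS = [
    {"bread", "butter", "jam"},
    {"bread", "butter", "milk"},
    {"milk", "cereal"},
    {"bread", "jam"},
]

def cart_recs(current, past_carts=PAST_CARTS, k=3):
    """Recommend items that appear in past carts resembling the current cart."""
    current = set(current)
    scores = Counter()
    for cart in past_carts:
        overlap = len(current & cart)  # how similar this past cart is
        if overlap == 0:
            continue
        for item in cart - current:  # only items not already in the cart
            scores[item] += overlap
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]
```

Here the current cart acts as the query and each past cart as a document, so the output is items bought together with the cart contents rather than items similar to them.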

On Oct 9, 2013, at 12:54 PM, Michael Sokolov <[EMAIL PROTECTED]> wrote:

On 10/9/13 3:08 PM, Pat Ferrel wrote:
> Solr uses cosine similarity for its queries. The implementation on github uses Mahout LLR for calculating the item-item similarity matrix, but when you do the more-like-this query at runtime Solr uses cosine. This can be fixed in Solr, though I'm not sure how much work it would be.
It's not clear to me whether it's worth "fixing" this or not.  It would certainly complicate scoring calculations when mixing with traditional search terms.
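
For reference, the log-likelihood ratio on a 2x2 co-occurrence table can be written in a few lines. This is a sketch of the standard entropy-based formulation, not a copy of the Mahout code; the function names and the count layout (k11 = both items together, k12 and k21 = one without the other, k22 = neither) are assumptions for illustration.

```python
import math

def x_log_x(x):
    # Convention: 0 * log(0) is treated as 0.
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    # Unnormalized Shannon entropy of a set of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 table of co-occurrence counts."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))
```

The score is near zero when the two items occur independently and grows as co-occurrence departs from what chance would predict, which is a different scale from a cosine score bounded by 1.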
> It sounds like you are doing item-item similarities for recommendations, not actually calculating user-history based recs, is that true?
Yes that's true so far.  Our recommender system has the ability to provide recs based on user history, but we have not deployed this in our app yet.  My plan was simply to query based on all the items in the user's "basket" - not sure that this would require a different back end?  We're not at the moment considering user-user similarity measures.
> You bring up a point that we're finding. I'm not so sure we need or want a recommender query API that is separate from the Solr query API. What we are doing on our demo site is putting the output of the Solr-recommender where Solr can index it. Our web app framework then allows very flexible queries against Solr, using simple user history, producing the typical user-history based recommendations, or mixing/boosting based on metadata or contextual data. If we leave the recommender query API in Solr we get web app framework integration for free.
> Another point is where the data is stored for the running system. If we allow Solr to index from any storage service that it supports then we also get free integration with most any web app framework and storage service. For the demo site we put the data in a DB and have Solr index it from there. We also store the user history and metadata there. This is supported by most web app frameworks out of the box. You could go a different route and use almost any storage system/file system/content format since Solr supports a wide variety.
> Given a fully flexible Solr standard query and indexing scheme all you need do is tweak the query or data source a bit and you have an item-set recommender (shopping cart) or a contextual recommender (for example boost recs from a category) or a pure metadata/content based recommender.
> If the query and storage are left to Solr+web app framework then the github version is complete if not done. Solr still needs LLR in the more-like-this queries. Term weights to encode strength scores would also be nice, and I agree that both of these could use some work.
I would like to take a look at that version - I may have missed some discussion about it; would you mind posting a link, please?
> BTW lest we forget this does not imply the Solr-recommender is better than Myrrix or the Mahout-only recommenders. There needs to be some careful comparison of results. Michael, did you do offline or A/B tests during your implementation?

I ran some offline tests using our historical data, but I don't have a lot of faith in these beyond the fact they indicate we didn't make any obvious implementation errors.  We haven't attempted A/B testing yet since our site is so new, and we need to get a meaningful baseline going and sort out a lot of other more pressing issues on the site - recommendations are only one piece, albeit an important one.
Actually there was an interesting idea for an article posted recently about the difficulty of comparing results across systems in this field, but that's no excuse not to do better.  I'll certainly share when I know more :)