On Fri, Jun 21, 2013 at 10:59 AM, Dan Filimon
Luduan document classification.
Adaptive search engines.
The question of how similar items are is much harder to attack than the
question of roughly which items are very similar. You can reliably pick out
the most closely related items, but in the mid-range even the ordering is
very fuzzy.
I'm essentially working with a custom-tailored RowSimilarityJob after
Not that it matters much, but I tend to filter out user x item entries based
on both the item *and* the user prevalence. This gives me a nicely bounded
number of occurrences for every user and every item.
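For illustration only, here is one simple way to get hard per-user and per-item bounds: shuffle the entries, then keep each one only while neither its user nor its item has hit a cap. This is a sketch under my own assumptions, not necessarily the exact filter described above; the function and the cap names (`max_per_user`, `max_per_item`) are hypothetical.

```python
import random
from collections import Counter


def downsample_interactions(pairs, max_per_user, max_per_item, seed=0):
    """Keep a random subset of (user, item) pairs such that no user appears
    more than max_per_user times and no item more than max_per_item times.
    Hypothetical helper, not a Mahout API."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)  # random order, so heavy users/items lose random entries
    user_kept = Counter()
    item_kept = Counter()
    kept = []
    for user, item in shuffled:
        # Keep the entry only if both its user and its item are under their caps.
        if user_kept[user] < max_per_user and item_kept[item] < max_per_item:
            kept.append((user, item))
            user_kept[user] += 1
            item_kept[item] += 1
    return kept


# One very active user plus one very popular item.
pairs = [("u1", f"i{k}") for k in range(100)] + [(f"u{k}", "i0") for k in range(2, 50)]
sample = downsample_interactions(pairs, max_per_user=10, max_per_item=5)
```

Light users and rare items pass through untouched; only the heavy hitters get thinned, which is what keeps the counts bounded.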
If you don't want to count the item frequency in advance, then down-sampling
just the crazy users is fine.
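If you only cap users, you can do it in a single pass without knowing any counts up front. The sketch below swaps in per-user reservoir sampling (Algorithm R) as one way to do that; it's my illustration, not necessarily how anyone here does it, and `cap_users` / `max_per_user` are hypothetical names.

```python
import random
from collections import defaultdict


def cap_users(pairs, max_per_user, seed=0):
    """Keep a uniform random sample of at most max_per_user items per user,
    in one pass, using reservoir sampling (Algorithm R) per user.
    Item frequencies are never counted. Hypothetical helper."""
    rng = random.Random(seed)
    seen = defaultdict(int)        # entries seen so far for each user
    reservoir = defaultdict(list)  # currently sampled items for each user
    for user, item in pairs:
        seen[user] += 1
        n = seen[user]
        if n <= max_per_user:
            reservoir[user].append(item)   # fill the reservoir first
        else:
            j = rng.randrange(n)           # uniform index in [0, n)
            if j < max_per_user:
                reservoir[user][j] = item  # replace with probability k/n
    return [(u, i) for u, items in reservoir.items() for i in items]


history = [("crazy", f"i{k}") for k in range(1000)] + [("normal", "i1"), ("normal", "i2")]
capped = cap_users(history, max_per_user=50)
```

Normal users keep their whole history; the crazy user is cut down to a random 50 entries.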
The reason that it doesn't much matter is that very few elements are