As you say, this essentially never happens in practice, and it is even rarer
to have large numbers of identical users.
Small numbers of identical users add nothing to the recommendations, but
(other than taking up space) they also cost nothing.
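To make the point concrete, here is a minimal sketch (with made-up item sets) of why a neighbour with similarity 1.0 contributes nothing: the candidate items are the neighbour's items minus the target's items, and for an identical user that difference is empty.

```python
# Hypothetical data: item sets for a target user and two neighbours.
target_items = {"item1", "item2", "item3"}
identical_neighbor_items = {"item1", "item2", "item3"}   # similarity 1.0
partial_neighbor_items = {"item2", "item3", "item4"}     # similar, not identical

# Candidate recommendations are the neighbour's items the target hasn't seen.
candidates_from_identical = identical_neighbor_items - target_items
print(candidates_from_identical)   # set() -- nothing new to recommend

candidates_from_partial = partial_neighbor_items - target_items
print(candidates_from_partial)     # {'item4'} -- only non-identical neighbours help
```

So identical users in the neighbourhood are harmless, just useless as a source of new items.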
I could imagine certain use cases where many users had an identical
introductory user experience, though. In that situation, your suggestion
would have significant merit.
My own feeling is that nearest-neighbor algorithms in user space are not a
good idea except for very small systems. It is better to do an off-line
reduction of all of the data, which gives better smoothing over all cases.
Then you can do real-time recommendations using a latent form of the
nearest-neighbor algorithm. The advantages are two-fold. First, you get much
better speed and significantly lower memory residency. Second, you can get
much better results because the analysis you do off-line can expend
significantly more resources than you can afford to expend in real time.
On Mon, Jun 1, 2009 at 2:06 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> I was stepping through Taste and noticed that users with 1.0 similarity to
> the target user end up in that user's neighbourhood. 1.0 similarity between
> users means users are exactly the same, so is there a point in collecting
> them? Since they are exactly the same as the target user, we can't really
> get any new items to recommend from them. Is this correct?
> It's probably not a frequent case to have users with identical item
> preferences, but imagine a case where you are computing recommendations from
> top 10 most similar users and those 10 most similar users happen to be all
> perfectly similar users, thus yielding no recommendations.
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
Ted Dunning, CTO
111 West Evelyn Ave. Ste. 202
Sunnyvale, CA 94086
http://www.deepdyve.com