Subject: recommendations with Hadoop and RecommenderJob in Amazon EC2, suggestions for performance?


Hi Stefano, happy new year too!

The running time of RecommenderJob is proportional neither to the number
of users you want to compute recommendations for nor to the number of
recommendations per user. Those parameters only influence the last step
of the job. Most of the time is spent earlier, computing the item-item
similarities, and that step runs independently of both the number of
users you want recommendations for and the number of recommendations
per user.
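
To illustrate why the similarity phase dominates, here is a minimal
in-memory sketch (not Mahout's actual MapReduce code) of item-item
co-occurrence counting. A user with n items contributes n*(n-1)/2 item
pairs, so the cost depends only on the users' preference histories;
neither the set of users to recommend for nor the number of
recommendations per user appears anywhere. The class and method names
and the sample data are hypothetical, and each history is assumed to
contain no duplicate items.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical in-memory sketch of item-item co-occurrence counting.
 * Not Mahout's implementation; it only shows where the work comes from.
 */
public class CooccurrenceSketch {

    /** Counts, for every unordered item pair, how many users have both items. */
    static Map<String, Integer> countCooccurrences(List<List<Long>> userHistories) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<Long> items : userHistories) {
            // A user with n (distinct) items emits n * (n - 1) / 2 pairs,
            // so users with long histories dominate the cost of this step.
            for (int i = 0; i < items.size(); i++) {
                for (int j = i + 1; j < items.size(); j++) {
                    long a = Math.min(items.get(i), items.get(j));
                    long b = Math.max(items.get(i), items.get(j));
                    counts.merge(a + ":" + b, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    /** Total number of pair emissions: the sum of n * (n - 1) / 2 over all users. */
    static long pairWork(List<List<Long>> userHistories) {
        long work = 0;
        for (List<Long> items : userHistories) {
            long n = items.size();
            work += n * (n - 1) / 2;
        }
        return work;
    }

    public static void main(String[] args) {
        // Made-up preference histories for three users.
        List<List<Long>> histories = List.of(
                List.of(1L, 2L, 3L),
                List.of(2L, 3L),
                List.of(1L, 3L, 4L, 5L));
        System.out.println(countCooccurrences(histories).get("2:3")); // 2
        System.out.println(pairWork(histories)); // 3 + 1 + 6 = 10
    }
}
```

Note that the number of recommendations only matters after all of this
work is done, which is why lowering it barely changes the running time.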

We have some parameters to control the amount of data considered in the
recommendation process. Have you tried adjusting them to your needs? If
you haven't, I think experimenting with those is the best place to
start:

  --maxPrefsPerUser maxPrefsPerUser
Maximum number of preferences considered per user in final
recommendation phase

  --maxSimilaritiesPerItem maxSimilaritiesPerItem
Maximum number of similarities considered per item

  --maxCooccurrencesPerItem (-o) maxCooccurrencesPerItem
Try to cap the number of cooccurrences per item to this number
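
The idea behind a per-item cap such as --maxSimilaritiesPerItem can be
sketched as keeping only the N highest-scoring entries of each item's
row, so that later phases touch at most N entries per item. This is a
hypothetical illustration, not RecommenderJob's code: the class and
method names and the scores are made up, and the job's actual selection
strategy (especially for --maxCooccurrencesPerItem) may differ.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch of a per-item top-N cap. Only the n entries with
 * the largest scores survive; everything else is dropped before the
 * (more expensive) downstream phases.
 */
public class TopNCapSketch {

    /** Returns the ids of the n entries with the largest scores, highest first. */
    static List<Long> keepTopN(Map<Long, Double> row, int n) {
        return row.entrySet().stream()
                .sorted(Map.Entry.<Long, Double>comparingByValue().reversed())
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up similarities of one item to four other items.
        Map<Long, Double> row = new HashMap<>();
        row.put(10L, 0.9);
        row.put(11L, 0.2);
        row.put(12L, 0.7);
        row.put(13L, 0.5);
        System.out.println(keepTopN(row, 2)); // [10, 12]
    }
}
```

Lower caps trade some recommendation quality for less data shuffled
between phases, so it is worth measuring both when tuning them.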

It would be very cool if you could keep us up to date with your progress
and maybe share some numbers. There are probably a lot of things in
RecommenderJob that we could optimize to increase its performance and
scalability, and we'd be happy to patch it for you if you run into a
problem.

--sebastian
On 02.01.2011 10:36, Stefano Bellasio wrote: