It sounds like the original poster isn't clear about the division between
off-line and on-line work.

Almost all production recommendation systems have a large off-line
component which analyzes logs of behavior and produces a recommendation
model. This model typically consists of item-item relationships stored in
a form that is usable by the on-line component of the system. This part is
preparation for recommendation, but is not itself recommendation. This
off-line component can run sequentially or in parallel using map-reduce.
In my experience, with decent down-sampling of excessively active users
and excessively popular items, it isn't unreasonable to reach 100M
non-zeros in the user x item history in the off-line component.
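
To make the off-line step concrete, here is a minimal Python sketch of what such a job might look like on a single machine. It is only an illustration under stated assumptions: the function names (downsample, build_item_item_model), the caps, and the use of raw cooccurrence counts as the item-item relationship are all hypothetical choices, not a description of any particular system. A real job would read (user, item) pairs from logs and would likely use a better-behaved similarity score.

```python
import random
from collections import Counter, defaultdict
from itertools import combinations


def downsample(events, max_per_user=500, max_per_item=1000, seed=0):
    """Cap the number of interactions kept per user and per item.

    `events` is an iterable of (user, item) pairs. The caps are
    illustrative defaults (assumptions), not recommended values.
    """
    rng = random.Random(seed)
    unique = sorted(set(events))  # dedupe; sort so the shuffle is repeatable
    rng.shuffle(unique)
    user_count, item_count = Counter(), Counter()
    kept = []
    for user, item in unique:
        if user_count[user] < max_per_user and item_count[item] < max_per_item:
            user_count[user] += 1
            item_count[item] += 1
            kept.append((user, item))
    return kept


def build_item_item_model(events, top_n=50):
    """Build item -> {related item: cooccurrence count} from (user, item) pairs.

    Raw cooccurrence counting is an assumption made for brevity.
    """
    histories = defaultdict(set)
    for user, item in events:
        histories[user].add(item)

    cooc = defaultdict(Counter)
    for items in histories.values():
        # Every pair of items in one user's history cooccurs once.
        for a, b in combinations(sorted(items), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1

    # Keep only the strongest relationships per item, in a plain dict
    # that is easy to write to flat files or a database.
    return {item: dict(c.most_common(top_n)) for item, c in cooc.items()}


if __name__ == "__main__":
    log = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "b"), ("u2", "c")]
    print(build_item_item_model(downsample(log)))
```

The output of build_item_item_model is the "recommendation model" in the sense above: nothing in it is a recommendation yet, it is just the item-item table that the on-line side will read.
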
The actual recommendations are produced using the on-line component. This
component reads in the recommendation model, possibly all at once, possibly
on demand and possibly as the model is changed. The model may be read from
a database or from flat files or many other sources. To make a
recommendation, a user history or user id is presented to the
recommendation system. If an id is presented, it is presumed that the
history is available somewhere or that the recommendations have been
pre-computed for that user. In any case, the history is combined with the
recommendation model to produce a recommendation list for the user of the
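
The on-line step described above can be sketched in a few lines of Python. This is a hedged illustration, not a definitive implementation: the model is assumed to be a dict of item -> {related item: weight} already loaded into memory, the scoring (summing weights over the history) is one simple choice among many, and the names recommend and recommend_for_user are hypothetical.

```python
from collections import Counter


def recommend(model, history, k=10):
    """Combine a user history with an item-item model.

    `model` maps item -> {related item: weight}. Scores are summed
    over the items in the history, and items the user has already
    seen are excluded (both are assumptions of this sketch).
    """
    seen = set(history)
    scores = Counter()
    for item in history:
        for other, weight in model.get(item, {}).items():
            if other not in seen:
                scores[other] += weight
    return [item for item, _ in scores.most_common(k)]


def recommend_for_user(model, histories, user_id, k=10):
    """Variant that takes a user id and looks the history up.

    `histories` stands in for wherever histories are stored;
    an unknown user simply gets an empty list here.
    """
    return recommend(model, histories.get(user_id, []), k)


if __name__ == "__main__":
    toy_model = {
        "a": {"b": 2, "c": 1},
        "b": {"a": 2, "c": 3},
        "c": {"a": 1, "b": 3, "d": 1},
    }
    print(recommend(toy_model, ["a", "b"]))
    print(recommend_for_user(toy_model, {"u1": ["c"]}, "u1"))
```

Note that the per-request work touches only the rows of the model for items in the user's history, which is why the on-line side can stay cheap even when the off-line input is large.
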
On Sun, Mar 25, 2012 at 12:25 PM, Sean Owen <[EMAIL PROTECTED]> wrote: