That is a form of dimensionality reduction, but it is not what is normally
meant by the term. Normally, dimensionality reduction is done by transforming
the input space (whose dimension is the number of items considered) into an
abstract representation that has far fewer variables (usually 10-300, down
from tens of thousands to millions in the original).
This can be done using some form of linear transformation (a.k.a. singular
value decomposition, a.k.a. latent semantic analysis), a non-linear
transformation (a.k.a. a bottleneck network or autoencoder), or a mixture
distribution (including, inter alia, latent Dirichlet allocation, PLSA, and
radial basis functions).
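As a hedged sketch of the linear case, here is a truncated SVD of a sparse user-item matrix using SciPy. The matrix shape, density, and rank k are invented for illustration, not taken from this thread.

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# Sparse user x item matrix: most entries are zero, as in real interaction data.
A = sparse_random(1000, 500, density=0.01, random_state=42, format="csr")

k = 50  # reduced dimension, far smaller than the 500 original item dimensions
U, s, Vt = svds(A, k=k)

# Each item is now a dense k-dimensional vector instead of a sparse
# 1000-dimensional column of the original matrix.
item_vectors = Vt.T * s
print(item_vectors.shape)  # (500, 50)
```

The same reduced item vectors are what the item-item reconstruction below starts from.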
Dimensionality reduction can be really, really bad for runtimes because you
suddenly have a dense matrix problem instead of a sparse matrix problem.
That means search becomes exhaustive search, because you can no longer use
indexes.
You can get around that by reconstructing an item-item mapping from your
reduced-dimensional representation and truncating small results. That gives
you a (pretty) sparse result that is amenable to indexing and such. At that
point, you can use an indexed retrieval engine such as Lucene to do
recommendation in one step.
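That reconstruction step can be sketched as follows; the item-factor matrix, similarity measure (cosine), and truncation threshold are all assumptions for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)
item_vectors = rng.standard_normal((500, 50))  # stand-in for SVD item factors

# Cosine similarity between all item pairs: dense (500, 500) as a raw product.
norms = np.linalg.norm(item_vectors, axis=1, keepdims=True)
unit = item_vectors / norms
sim = unit @ unit.T

# Truncate small entries so the result is sparse again and can be indexed.
threshold = 0.2
sim[np.abs(sim) < threshold] = 0.0
np.fill_diagonal(sim, 0.0)  # an item trivially matches itself
sparse_sim = csr_matrix(sim)

density = sparse_sim.nnz / float(np.prod(sim.shape))
print(f"density after truncation: {density:.3f}")
```

The rows of `sparse_sim` are short lists of (item, weight) pairs, which is exactly the shape an inverted index wants.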
The contrast is with nearest-neighbor methods, which require two retrievals
(first users, then items). The two retrievals aren't just twice the cost,
because there are often many more users than there are items. Nearest-neighbor
methods also cannot easily make use of global information exposed by the
dimensionality reduction, nor can they use external information such as that
derived by mixed task learning.
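To make the two-retrieval shape concrete, here is a toy user-based neighborhood recommender; the users, items, and overlap-count similarity are all invented for illustration.

```python
# user -> set of items that user liked
likes = {
    "alice": {"a", "b", "c"},
    "bob":   {"b", "c", "d"},
    "carol": {"c", "d", "e"},
}

def recommend(target, likes, n_neighbors=2):
    # Retrieval 1: find the users most similar to the target (by item overlap).
    others = [(len(likes[target] & likes[u]), u)
              for u in likes if u != target]
    neighbors = [u for _, u in sorted(others, reverse=True)[:n_neighbors]]
    # Retrieval 2: gather items those neighbors liked that the target has not.
    candidates = set().union(*(likes[u] for u in neighbors)) - likes[target]
    return candidates

print(recommend("alice", likes))  # {'d', 'e'} in some order
```

The one-step alternative above skips retrieval 1 entirely: the item-item mapping is precomputed, so a user's liked items index directly into candidate items.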
On Mon, Jun 1, 2009 at 2:31 PM, Sean Owen <[EMAIL PROTECTED]> wrote:
Ted Dunning, CTO