Subject: Tastify

Hierarchical modeling techniques work well on structures like this if you
have good resolution of your meta-data.  Resolving and disambiguating artist
and track names can be difficult unless you have total control over the
meta-data source.

The basic idea is that you model an artist as a distribution over "concept
space", which is just a fancy name for latent  variables you don't plan to
understnad.   then an album is sampled from the artists and is another
distribution and finally a track is sampled from the album.  This is similar
to the way that in LDA, documents and words are distributions over your
latent concept variables.  Specific meanings are chosen at each point in a
document and the word you observe is chosen based on the concept at that

Since you only observe which word appears in which document, you have to
reverse-engineer what the latent concepts might have been by getting a
compromise between the word and document distributions.

In your case, you have a simpler generative model, but similar techniques
should apply.

On Sat, Jun 13, 2009 at 8:53 AM, Karl Wettin <[EMAIL PROTECTED]> wrote:
Ted Dunning, CTO