Subject: Vectorizing arbitrary value types with seq2sparse


Yeah... that doesn't work at all.

At the very least you need different analyzers per field, and some fields are
numeric while others are textual.  The same words in different fields
(usually) need to be considered separately.  N-grams raise all kinds of
crazy issues.

For instance, what does an n-gram of tags mean?  Are tags even ordered?
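
To make the per-field point concrete, here is a rough sketch in plain Python.
It is not Mahout or seq2sparse code.  The schema, the field names, and the
tokenize() helper are all made up for illustration; a real setup would plug in
a proper analyzer per field.  Text fields get their tokens prefixed with the
field name so the same word in two fields stays separate, tags are treated as
an unordered set (so no n-grams), and numeric fields keep their value.

```python
import re
from collections import Counter

# Hypothetical field schema; names and kinds are assumptions for illustration.
SCHEMA = {"title": "text", "body": "text", "tags": "set", "price": "numeric"}


def tokenize(text):
    """Lowercase and split on non-alphanumerics; a stand-in for a real analyzer."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def featurize(record, schema=SCHEMA):
    """Map one record to a {feature_name: weight} dict, one namespace per field."""
    features = Counter()
    for field, kind in schema.items():
        if field not in record:
            continue
        value = record[field]
        if kind == "text":
            # Prefix with the field name so "title:mahout" != "body:mahout".
            for token in tokenize(value):
                features[f"{field}:{token}"] += 1.0
        elif kind == "set":
            # Tags are unordered: presence only, duplicates collapsed, no n-grams.
            for tag in set(value):
                features[f"{field}:{tag}"] = 1.0
        elif kind == "numeric":
            # Numeric fields keep their value instead of being tokenized.
            features[field] = float(value)
    return dict(features)


doc = {"title": "Apache Mahout", "body": "Mahout vectorizes text",
       "tags": ["ml", "java", "ml"], "price": 9.5}
vec = featurize(doc)
```

The resulting dict would still have to be hashed or dictionary-encoded into
vector indices; that step is left out here.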

Some fields contain dates, but different dates need to be treated
differently: some as ages, others as points in time.
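
A small sketch of the two readings of a date, again in plain Python rather
than anything Mahout provides.  The function names and the example dates are
assumptions for illustration only.  Which encoding fits depends on the field:
something like a birth date is usually an age, while something like a publish
date is usually a position on the timeline.

```python
from datetime import date


def age_in_years(d, today):
    """Treat the date as an age: elapsed time relative to a reference day."""
    return (today - d).days / 365.25


def days_since_epoch(d, epoch=date(1970, 1, 1)):
    """Treat the date as a point in time: an absolute position on the timeline."""
    return (d - epoch).days


# Hypothetical example values.
today = date(2011, 5, 6)
birth = date(1981, 5, 6)

age_feature = age_in_years(birth, today)   # changes as "today" moves
time_feature = days_since_epoch(today)     # fixed once the date is known
```

Note that the age feature depends on when you vectorize, so it has to be
recomputed for new runs, whereas the point-in-time feature does not.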

It gets wacky fast.

On Fri, May 6, 2011 at 1:52 PM, Frank Scholten <[EMAIL PROTECTED]> wrote: