BTW you may be able to just run the same csv through multiple times, picking a different item-ID column for each “action”. Here “csv” means a text file with some delimiter, not the full-spec csv with headers, quoted values, and escaped characters.
On Dec 8, 2014, at 4:11 PM, Pat Ferrel <[EMAIL PROTECTED]> wrote:
No classifier, just turn the one csv into several, each being a collection for one action.
user ID,item ID
Where the item ID is whatever the action corresponds to. For instance, <user ID>,<location ID> for being at a location, or <user ID>,<item ID> for a purchase, etc. These files can go directly into the command line of spark-itemsimilarity: --input will always be the file with purchases, and --input2 will be the file with the secondary action.
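A rough sketch of that splitting step in Python, in case it helps. It is not part of Mahout: the function name split_by_action, the column positions, and the output file names are all made up for illustration, and it assumes the simple delimited text described above (no header row, no quoting or escaping).

```python
import os

def split_by_action(src_path, out_dir, user_col, action_cols, delimiter=","):
    """Write one '<user ID>,<item ID>' file per action and return {action: path}.

    action_cols maps a (hypothetical) action name to the index of the column
    that holds that action's item ID, e.g. {"purchase": 1, "location": 2}.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = {name: os.path.join(out_dir, name + ".csv") for name in action_cols}
    outs = {name: open(path, "w") for name, path in paths.items()}
    try:
        with open(src_path) as src:
            for line in src:
                # Plain split: assumes no quoted or escaped delimiters.
                fields = [f.strip() for f in line.rstrip("\r\n").split(delimiter)]
                for name, col in action_cols.items():
                    # Skip short rows and rows with no user or no value for this action.
                    if max(col, user_col) >= len(fields):
                        continue
                    if not fields[user_col] or not fields[col]:
                        continue
                    outs[name].write(fields[user_col] + delimiter + fields[col] + "\n")
    finally:
        for f in outs.values():
            f.close()
    return paths
```

The file written for the purchase column would then be the one passed as --input, and the file for the secondary action the one passed as --input2.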
On Dec 8, 2014, at 1:22 AM, Yash Patel <[EMAIL PROTECTED]> wrote:
Most columns have different values. When you say preprocess, do you mean
using classifiers?
My dataset is highly structured in nature, so I don't understand how a
classifier would work.
On Dec 8, 2014 2:20 AM, "Pat Ferrel" <[EMAIL PROTECTED]> wrote: