Subject: "LLR with time"


BTW you should use time buckets that are relatively free of daily cycles, like 3-day, week, or month buckets, for “hot”. This removes cyclical effects from the frequencies as much as possible, since you need 3 buckets to see the change in change, 2 for the change, and 1 for the event volume.
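
To make that concrete, here is a minimal sketch (not from the original mail; the names and the raw (item, timestamp) input are just illustrative) of bucketing events into the 3 most recent weekly buckets:

from collections import defaultdict

SECONDS_PER_WEEK = 7 * 24 * 3600

def weekly_counts(events, now):
    """Return {item_id: [newest, middle, oldest]} weekly event counts."""
    counts = defaultdict(lambda: [0, 0, 0])
    for item_id, ts in events:
        age_weeks = int((now - ts) // SECONDS_PER_WEEK)
        if 0 <= age_weeks < 3:  # only the 3 most recent buckets are needed
            counts[item_id][age_weeks] += 1
    return counts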
On Nov 10, 2017, at 4:12 PM, Pat Ferrel <[EMAIL PROTECTED]> wrote:

So your idea is to find anomalies in event frequencies to detect “hot” items?

Interesting, maybe Ted will chime in.

What I do is take the frequency and its first and second derivatives as measures of popularity, increasing popularity, and increasingly increasing popularity. Put another way: popular, trending, and hot. This is simple to do by taking 1, 2, or 3 time buckets and looking at the number of events, the derivative (difference), and the second derivative. Ranking all items by these values gives various measures of popularity or its increase.
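
A minimal sketch of that bucket arithmetic (my reading of the idea, not code from the mail; it assumes the three newest bucket counts per item, newest first):

def popularity_scores(buckets):
    """buckets = [newest, middle, oldest] event counts for one item."""
    newest, middle, oldest = buckets
    popular = newest                                 # event volume in the latest bucket
    trending = newest - middle                       # first difference (change)
    hot = (newest - middle) - (middle - oldest)      # second difference (change in change)
    return popular, trending, hot

# Ranking all items by any one of these values gives the corresponding list, e.g.
# sorted(counts, key=lambda i: popularity_scores(counts[i])[2], reverse=True) for "hot".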

If your use is in a recommender you can add a ranking field to all items and query for “hot” by using the ranking you calculated.

If you want to bias recommendations by hotness, query with the user history and boost by your hot field. I suspect the hot field will tend to overwhelm your user history in this case, as it would if you used anomalies, so you’d also have to normalize the hotness to a range closer to the one produced by the user-history matching score. I haven’t found a very good way to mix these in a model, so I use hot as a method of backfill when you cannot return enough recommendations, or in places where you want to show just hot items. There are several benefits to using hot to rank all items, including the fact that you can apply business rules to them just as with normal recommendations: you can ask for hot in “electronics” if you know categories, or hot "in-stock" items, or ...
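
To illustrate the normalization concern, a hypothetical sketch (the min-max scaling and the 0.2 weight are assumptions, not something from this thread):

def normalize(values):
    """Scale hotness scores into [0, 1] so they are roughly comparable
    to the recommender's matching-score range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

def boosted_score(match_score, hotness_normalized, weight=0.2):
    # Without normalization (and a modest weight) the hot term tends to
    # overwhelm the user-history matching score.
    return match_score + weight * hotness_normalized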

Still, anomaly detection does sound like an interesting approach.
On Nov 10, 2017, at 3:13 PM, Johannes Schulte <[EMAIL PROTECTED]> wrote:

Hi "all",

I am wondering what would be the best way to incorporate event time
information into the calculation of the G-Test.

There is a claim here
https://de.slideshare.net/tdunning/finding-changes-in-real-data

saying "Time aware variant of G-Test is possible"
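
For context, the plain (not time-aware) G-test on a 2x2 contingency table of cooccurrence counts can be computed like this; this is a sketch of the standard statistic, not of the time-aware variant the slides hint at:

import math

def _x_log_x(x):
    return x * math.log(x) if x > 0 else 0.0

def _info(*counts):
    # Unnormalized entropy of counts: (sum) * log(sum) - sum of k * log(k)
    return _x_log_x(sum(counts)) - sum(_x_log_x(k) for k in counts)

def g_test(k11, k12, k21, k22):
    """G statistic (log-likelihood ratio) for a 2x2 contingency table."""
    row = _info(k11 + k12, k21 + k22)
    col = _info(k11 + k21, k12 + k22)
    mat = _info(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))  # clamp tiny negative round-off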

I remember I experimented with exponentially decayed counts some years ago,
which involved changing the counts to doubles, but I suspect there is
some smarter way. What I don't get is the relation to a data structure like
t-digest when working with a lot of counts / cells for every combination of
items. Keeping a t-digest for every combination seems unfeasible.
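
As a rough sketch of what I mean by exponentially decayed counts (the half-life and the names are just placeholders):

import math

class DecayedCount:
    """One cooccurrence cell kept as a double that fades with a half-life."""
    def __init__(self, half_life_seconds=7 * 24 * 3600):
        self.decay = math.log(2) / half_life_seconds
        self.value = 0.0
        self.last_update = None

    def add(self, ts, weight=1.0):
        # fade the old value down to "now", then add the new event
        if self.last_update is not None:
            self.value *= math.exp(-self.decay * (ts - self.last_update))
        self.value += weight
        self.last_update = ts

The decayed cell values could then be fed into the 2x2 G-test in place of raw counts, which is one possible reading of "time aware", though the statistical interpretation gets fuzzier with fractional counts.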

How would one incorporate event time into recommendations to detect
"hotness" of certain relations? I'd be glad if someone has an idea...

Cheers,

Johannes