Otis Gospodnetic

2009-06-30, 17:27

Ted Dunning

2009-06-30, 21:35

Otis Gospodnetic

2009-07-02, 16:32

Ted Dunning

2009-07-02, 17:51

Otis Gospodnetic

2009-07-02, 21:40

Ted Dunning

2009-07-05, 00:12

Hello,

This jittering and your sample output look good, and intuitively makes sense, though it looks like I'm not interpreting your recipe correctly.

Here is the code snippet that pretends to be looping through top 20 recommendations:

// exp(-n/5) + rexp() * 0.1

for (int i=1; i < 20; i++) {

float exp = (float) i / 5; // not sure why you used -i /n

float rexp = (float) Math.log(i-Math.random()); // tried with 1 instead of i like you said, too

float rank = exp + rexp * 0.1f;

float round = Math.round(rank);

System.out.println("EXP: " + exp + "\tREXP: " + rexp + "\RANK: " + rank + "\tROUND: " + round);

}

But the output doesn't quite look like yours, so I must be misinterpreting something.

EXP: 0.2 REXP: -0.59645164 RANK: 0.14035484 ROUND: 0.0

EXP: 0.4 REXP: 0.48764116 RANK: 0.44876412 ROUND: 0.0

EXP: 0.6 REXP: 0.89331275 RANK: 0.6893313 ROUND: 1.0

EXP: 0.8 REXP: 1.2796263 RANK: 0.92796266 ROUND: 1.0

EXP: 1.0 REXP: 1.5976489 RANK: 1.1597649 ROUND: 1.0

EXP: 1.2 REXP: 1.6399297 RANK: 1.363993 ROUND: 1.0

EXP: 1.4 REXP: 1.8479612 RANK: 1.5847961 ROUND: 2.0

EXP: 1.6 REXP: 1.9524398 RANK: 1.795244 ROUND: 2.0

EXP: 1.8 REXP: 2.0999322 RANK: 2.009993 ROUND: 2.0

EXP: 2.0 REXP: 2.218352 RANK: 2.2218351 ROUND: 2.0

EXP: 2.2 REXP: 2.3666646 RANK: 2.4366665 ROUND: 2.0

EXP: 2.4 REXP: 2.4445183 RANK: 2.6444519 ROUND: 3.0

EXP: 2.6 REXP: 2.5367393 RANK: 2.853674 ROUND: 3.0

EXP: 2.8 REXP: 2.633957 RANK: 3.0633957 ROUND: 3.0

EXP: 3.0 REXP: 2.7041435 RANK: 3.2704144 ROUND: 3.0

EXP: 3.2 REXP: 2.751766 RANK: 3.4751766 ROUND: 3.0

EXP: 3.4 REXP: 2.8236845 RANK: 3.6823685 ROUND: 4.0

EXP: 3.6 REXP: 2.854854 RANK: 3.8854854 ROUND: 4.0

EXP: 3.8 REXP: 2.9142253 RANK: 4.0914226 ROUND: 4.0

Thanks,

Otis

--

Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----

> From: Ted Dunning <[EMAIL PROTECTED]>

> To: [EMAIL PROTECTED]

> Sent: Wednesday, June 3, 2009 5:23:15 PM

> Subject: Re: Inconsistent recommendations

>

> My experience is that users like to see recommendations that change.

>

> In fact, this preference is strong enough that I now typically add jitter to

> the recommendations that I return. Typically I do this by computing a

> synthetic score used only to permute results lists:

>

> exp(-n/5) + rexp() * 0.1

>

> Here rexp is an exponentially distributed random deviate that can be

> generated using Math.log(1-Math.random()). The value n is the rank of the

> item (offset 0 or 1, doesn't matter). The magic constants (/ 5 and * 0.1)

> must be tuned to fit the number of results you show and how you want to

> trade off stability versus novelty. Usually I implement this as a

> meta-recommendation engine that uses the output of another engine as input

> and which returns permuted results.

>

> The idea here is that the first few items will generally appear in the

> "correct" order. Items beyond the top 10 will be dramatically shuffled.

> Here is a sequence of 20 draws for the top 20 items out of 200 (after

> permutation by synthetic score):

>

> 2 1 4 3 5 8 7 6 14 13 10 15 37 30 146 40 94 76 172

> 125

> 1 2 4 6 3 8 9 5 11 13 15 7 169 190 174 44 90 171 95

> 74

> 1 2 3 4 5 6 9 7 11 8 15 14 16 17 53 81 34 33 37

> 30

> 1 2 3 4 5 6 8 10 7 9 15 12 19 88 121 30 55 43 200

> 168

> 1 2 4 3 5 7 8 10 6 13 11 19 17 133 139 124 194 123 79

> 186

> 1 2 3 4 5 6 9 11 8 12 10 13 14 19 151 53 102 48 117

> 169

> 1 3 2 5 7 4 6 10 12 15 24 59 83 61 156 148 109 28 188

> 126

> 1 2 3 4 6 7 5 8 10 12 14 11 192 57 54 182 38 158 128

> 123

> 1 3 4 6 5 7 8 11 9 10 13 15 2 12 24 28 43 179 180

Otis,

There are several substantive problems with your code, mostly due, I am

sure, to my posting R code which is unfamiliar. The most important that I

see off-hand is that the exponential random variable must be defined as:

- Math.log(1 - Math.random())

The idea is that the argument to log must be in the range (0, 1] so that the

result will be in the range [0, inf). The 1-Math.random() is that way

because the range of Math.random() is [0, 1) instead of (0, 1].

I have a few style beefs that I hope you will take in good humor as well.

I will make comments about both of these in-line.

On Tue, Jun 30, 2009 at 10:27 AM, Otis Gospodnetic <

[EMAIL PROTECTED]> wrote:

>

> // exp(-n/5) + rexp() * 0.1

> for (int i=1; i < 20; i++) {

You should use double for all of this code, otherwise your code may be

considerably slower than desired due to float/double conversions and also

since we are doing exp of some potentially good sized numbers, it is very

easy to run out of dynamic range for floats leading to very surprising

results. This is essentially a style question, but I find it to be a very

bad idea to do this kind of premature optimization of floating point

arithmetic.

float exp = (float) i / 5; // not

> sure why you used -i /n

-i/n was used because that will lead to doing exp(negative number). For

large negative numbers, the slope of exp() becomes very flat which makes

large rearrangements possible. Without the negation, the randomization will

have a very different effect.

float rexp = (float) Math.log(i-Math.random()); // tried with 1

> instead of i like you said, too

The 1 is critical as mentioned above.

float rank = exp + rexp * 0.1f;

I don't see a call to Math.exp anywhere. Perhaps it got lost? That would

probably explain a large part of the problems. Also, this is not the rank,

but rather the synthetic score. Thus this is a misleading name.

float round = Math.round(rank);

I don't think that you want to round like this. Instead, what you should be

doing is accumulating the scores in an array and then sorting the scores.

What I displayed was a permutation that resulted from sorting.

>

> System.out.println("EXP: " + exp + "\tREXP: " + rexp + "\RANK: " +

> rank + "\tROUND: " + round);

> }

>

There are several substantive problems with your code, mostly due, I am

sure, to my posting R code which is unfamiliar. The most important that I

see off-hand is that the exponential random variable must be defined as:

- Math.log(1 - Math.random())

The idea is that the argument to log must be in the range (0, 1] so that the

result will be in the range [0, inf). The 1-Math.random() is that way

because the range of Math.random() is [0, 1) instead of (0, 1].

I have a few style beefs that I hope you will take in good humor as well.

I will make comments about both of these in-line.

On Tue, Jun 30, 2009 at 10:27 AM, Otis Gospodnetic <

[EMAIL PROTECTED]> wrote:

>

> // exp(-n/5) + rexp() * 0.1

> for (int i=1; i < 20; i++) {

considerably slower than desired due to float/double conversions and also

since we are doing exp of some potentially good sized numbers, it is very

easy to run out of dynamic range for floats leading to very surprising

results. This is essentially a style question, but I find it to be a very

bad idea to do this kind of premature optimization of floating point

arithmetic.

float exp = (float) i / 5; // not

> sure why you used -i /n

-i/n was used because that will lead to doing exp(negative number). For

large negative numbers, the slope of exp() becomes very flat which makes

large rearrangements possible. Without the negation, the randomization will

have a very different effect.

float rexp = (float) Math.log(i-Math.random()); // tried with 1

> instead of i like you said, too

The 1 is critical as mentioned above.

float rank = exp + rexp * 0.1f;

I don't see a call to Math.exp anywhere. Perhaps it got lost? That would

probably explain a large part of the problems. Also, this is not the rank,

but rather the synthetic score. Thus this is a misleading name.

float round = Math.round(rank);

I don't think that you want to round like this. Instead, what you should be

doing is accumulating the scores in an array and then sorting the scores.

What I displayed was a permutation that resulted from sorting.

>

> System.out.println("EXP: " + exp + "\tREXP: " + rexp + "\RANK: " +

> rank + "\tROUND: " + round);

> }

>

Thanks Ted, I got it working now.

I was casting to floats just to make the output easier on the eyes for emailing.

I was missing Math.exp(...) and once I had that in, it all started working.

It then became obvious the final output was not a rank, but a score that dictated the output/ordering.

I even put that in a separate class and called it TedsJitter (implements Jitter, since I am trying some other jittering approaches).

Thanks,

Otis

--

Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----

> From: Ted Dunning <[EMAIL PROTECTED]>

> To: [EMAIL PROTECTED]

> Sent: Tuesday, June 30, 2009 5:35:36 PM

> Subject: Re: Inconsistent recommendations

>

> Otis,

>

> There are several substantive problems with your code, mostly due, I am

> sure, to my posting R code which is unfamiliar. The most important that I

> see off-hand is that the exponential random variable must be defined as:

>

> - Math.log(1 - Math.random())

>

> The idea is that the argument to log must be in the range (0, 1] so that the

> result will be in the range [0, inf). The 1-Math.random() is that way

> because the range of Math.random() is [0, 1) instead of (0, 1].

>

> I have a few style beefs that I hope you will take in good humor as well.

>

> I will make comments about both of these in-line.

>

> On Tue, Jun 30, 2009 at 10:27 AM, Otis Gospodnetic <

> [EMAIL PROTECTED]> wrote:

>

> >

> > // exp(-n/5) + rexp() * 0.1

> > for (int i=1; i < 20; i++) {

>

>

> You should use double for all of this code, otherwise your code may be

> considerably slower than desired due to float/double conversions and also

> since we are doing exp of some potentially good sized numbers, it is very

> easy to run out of dynamic range for floats leading to very surprising

> results. This is essentially a style question, but I find it to be a very

> bad idea to do this kind of premature optimization of floating point

> arithmetic.

>

> float exp = (float) i / 5; // not

> > sure why you used -i /n

>

>

> -i/n was used because that will lead to doing exp(negative number). For

> large negative numbers, the slope of exp() becomes very flat which makes

> large rearrangements possible. Without the negation, the randomization will

> have a very different effect.

>

> float rexp = (float) Math.log(i-Math.random()); // tried with 1

> > instead of i like you said, too

>

>

> The 1 is critical as mentioned above.

>

> float rank = exp + rexp * 0.1f;

>

>

> I don't see a call to Math.exp anywhere. Perhaps it got lost? That would

> probably explain a large part of the problems. Also, this is not the rank,

> but rather the synthetic score. Thus this is a misleading name.

>

> float round = Math.round(rank);

>

>

> I don't think that you want to round like this. Instead, what you should be

> doing is accumulating the scores in an array and then sorting the scores.

> What I displayed was a permutation that resulted from sorting.

>

>

> >

> > System.out.println("EXP: " + exp + "\tREXP: " + rexp + "\RANK: " +

> > rank + "\tROUND: " + round);

> > }

> >

On Thu, Jul 2, 2009 at 9:32 AM, Otis Gospodnetic <[EMAIL PROTECTED]

> wrote:

> Thanks Ted, I got it working now.

>

Excellent.

I even put that in a separate class and called it TedsJitter (implements

> Jitter, since I am trying some other jittering approaches).

>

Blush.

Hopefully the name will change before production!

--

Ted Dunning, CTO

DeepDyve

> wrote:

> Thanks Ted, I got it working now.

>

Excellent.

I even put that in a separate class and called it TedsJitter (implements

> Jitter, since I am trying some other jittering approaches).

>

Blush.

Hopefully the name will change before production!

--

Ted Dunning, CTO

DeepDyve

----- Original Message ----

> From: Ted Dunning <[EMAIL PROTECTED]>

> To: [EMAIL PROTECTED]

> Sent: Thursday, July 2, 2009 1:51:16 PM

> Subject: Re: Inconsistent recommendations

> I even put that in a separate class and called it TedsJitter (implements

> > Jitter, since I am trying some other jittering approaches).

> >

>

> Blush.

>

> Hopefully the name will change before production!

I haven't been able to come up with a better one. It's going to production on Monday. Got suggestions? :)

Otis

--

Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

Exponential random jitter?

On Thu, Jul 2, 2009 at 2:40 PM, Otis Gospodnetic <[EMAIL PROTECTED]

> wrote:

> I haven't been able to come up with a better one. It's going to production

> on Monday. Got suggestions? :)

On Thu, Jul 2, 2009 at 2:40 PM, Otis Gospodnetic <[EMAIL PROTECTED]

> wrote:

> I haven't been able to come up with a better one. It's going to production

> on Monday. Got suggestions? :)