Svetlomir Dimitrov Kasabo...

2011-06-05, 15:08

Hector Yee

2011-06-05, 17:04

Ted Dunning

2011-06-06, 10:12

Svetlomir Kasabov

2011-06-06, 13:36

Svetlomir Dimitrov Kasabo...

2011-06-06, 13:46

Ted Dunning

2011-06-06, 13:53

Josh Patterson

2011-06-06, 20:30

Svetlomir Kasabov

2011-06-14, 14:14

Hello,

I plan to use Apache Mahout's Logistic Regression (LR) implementation in my master's thesis. We plan to use time series to predict whether a particular patient will soon have unstable blood flow. That is why I want to ask whether it is possible to use Mahout with time series. Do you see any potential problems or risks?

Many thanks and best regards!

Svetlomir Kasabov.

--

Svetlomir Dimitrov Kasabov

----------------------------------------------------------------

This message was sent using IMP, the Internet Messaging Program.


You can also try HMMs:

https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/classifier/sequencelearning/hmm/package-tree.html

If you want to do it with a classifier, you can window your time series and make a training set, e.g.

label, feature
stable, (last X seconds of time series)
unstable, (last X seconds of time series)
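The windowing idea above can be sketched in a few lines of plain Python. This is only an illustration, not Mahout code: the function name `make_windows`, the sample heart-rate readings, and the per-reading labels are all hypothetical, and the window is counted in readings rather than seconds.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float],
    labels: Sequence[str],
    window: int,
) -> List[Tuple[str, List[float]]]:
    """Turn a time series into (label, feature-window) training rows.

    Row i uses the `window` readings ending at index i as features and
    labels[i] as the class, mirroring the "label, feature" layout.
    """
    if len(series) != len(labels):
        raise ValueError("series and labels must have the same length")
    if window < 1:
        raise ValueError("window must be at least 1")
    rows = []
    # The first full window ends at index window - 1.
    for end in range(window - 1, len(series)):
        features = list(series[end - window + 1 : end + 1])
        rows.append((labels[end], features))
    return rows


# Hypothetical heart-rate readings with one label per reading.
heart_rate = [72.0, 75.0, 74.0, 90.0, 110.0]
state = ["stable", "stable", "stable", "unstable", "unstable"]
training_rows = make_windows(heart_rate, state, window=3)
```

Each row can then be handed to whichever classifier is used; the window length is a tuning choice.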


--

Yee Yang Li Hector

http://hectorgon.blogspot.com/ (tech + travel)

http://hectorgon.com (book reviews)


What Hector said.

You will need to extract features from your time history.

The question also comes up of how large your data set is. If it is less than 100,000 training examples or so, you will probably be better off using a system like R, which handles that much data easily and has essentially every kind of classifier available for you to try.

If you have 1 million training examples or more, Mahout begins to dominate the alternatives. Even there, Mahout is currently optimized for sparse data, which is not what you have. My guess is that using OnlineLogisticRegression or some of Hector's recent patches is the way to go. AdaptiveLogisticRegression is heavily oriented around per-term annealing and magic-knob tuning in the context of sparse data.

Can you post your data?
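For readers unfamiliar with what an online logistic regression does with dense features, here is a minimal sketch in plain Python. It is not Mahout's OnlineLogisticRegression and does not reproduce its API, priors, or learning-rate schedule; the class name `OnlineLogit`, the fixed learning rate, and the toy data are assumptions made for illustration.

```python
import math
from typing import Sequence


class OnlineLogit:
    """Binary logistic regression trained one example at a time (SGD)."""

    def __init__(self, num_features: int, learning_rate: float = 0.1) -> None:
        self.weights = [0.0] * num_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x: Sequence[float]) -> float:
        """Return the estimated probability of the positive class."""
        z = self.bias + sum(w * v for w, v in zip(self.weights, x))
        # Clamp the score so math.exp cannot overflow.
        z = max(-30.0, min(30.0, z))
        return 1.0 / (1.0 + math.exp(-z))

    def train(self, y: int, x: Sequence[float]) -> None:
        """Take one gradient step on a single (label, features) example."""
        error = y - self.predict(x)
        self.bias += self.learning_rate * error
        for i, v in enumerate(x):
            self.weights[i] += self.learning_rate * error * v


# Toy data (hypothetical): one scaled feature, negative -> 0, positive -> 1.
model = OnlineLogit(num_features=1)
for _ in range(200):
    model.train(0, [-1.0])
    model.train(1, [1.0])
```

Because each example is seen once and then discarded, memory use does not grow with the number of training examples, which is why this style of learner suits very large data sets.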



Thanks for the useful replies, I really appreciate that!

@Ted and Hector: My initial parameters (predictors) are blood pressures, heart rates, etc.; they arrive every minute from a patient's monitor. In my implementation, I plan to refer to this paper: http://www.multi-science.co.uk/acce-free.pdf. On page 7 (Table 1) you can see the parameters used. On page 17 (Figure 4) you can see a visualization of the prediction using time series.

I think I still plan to use the logistic regression implementation (since I have already familiarized myself with it), but I am confused about how to implement time series with Mahout. Should I periodically (for example, every 15 minutes) create a new logistic regression model in order to predict the probability of instability? Then the amount of training data depends on the 'time window for the past' that I will be using. For example, with data from only the past two hours, I will have only about 60 * 2 = 120 examples for creating a temporal model (I assume that I will need one compound data vector per minute, including blood pressures, heart rates, etc.).

Or should I implement the time series so that I train the model only once, with old data from many patients, and the training algorithm checks what the patient's hemodynamic stability is two hours later (since this data is also known during training)? In this case, I will potentially have many more examples (one million or more).

Many thanks, best regards and sorry for the long post.

Svetlomir.



Thanks for the useful replies, I really appreciate that!

@Ted and Hector: My initial parameters (predictors) are blood pressures, heart rates, etc.; they arrive every minute from a patient's monitor. In my implementation, I plan to refer to this paper: http://www.multi-science.co.uk/acce-free.pdf. On page 7 (Table 1) you can see the parameters used. On page 17 (Figure 4) you can see a visualization of the prediction using time series.

I think I still plan to use the logistic regression implementation (since I have already familiarized myself with it), but I am confused about how to implement time series with Mahout. Should I periodically (for example, every 15 minutes) create a new logistic regression model in order to predict the probability of instability? Then the amount of training data depends on the 'time window for the past' that I will be using. For example, with data from only the past two hours, I will have only about 60 * 2 = 120 examples for creating a temporal model (I assume that I will need one compound data vector per minute, including blood pressures, heart rates, etc.).

Or should I implement the time series so that I train the model only once, with old data from many patients, and the training algorithm checks what the patient's hemodynamic stability is two hours later (since this information is also known during training)? In this case, I will potentially have many more examples (one million or more).

Many thanks, best regards and sorry for the long post.

Svetlomir.


--

Svetlomir Dimitrov Kasabov



Each training example should have recent historical predictor variables and the future state.

I would generate training data relatively often. One way to do that is to take all points where you see instability and all minutes where you don't see near-future instability. There will be a fair bit of repetition, which you could decrease by looking at only a sampling of the negative examples.

Millions of data points would be excellent.
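One possible reading of this advice, sketched in plain Python: pair each minute's recent history with a label describing the near future, keep every positive example, and keep only a random fraction of the negatives. The names `build_examples`, `history`, `horizon`, and `negative_rate`, as well as the toy readings, are hypothetical; how "near future" is defined is an assumption here.

```python
import random
from typing import List, Sequence, Tuple

Example = Tuple[List[float], int]


def build_examples(
    readings: Sequence[float],
    unstable: Sequence[bool],
    history: int,
    horizon: int,
    negative_rate: float,
    rng: random.Random,
) -> List[Example]:
    """Pair recent history with the future state for each minute.

    For minute t the features are the last `history` readings, and the label
    is 1 if any of the next `horizon` minutes is unstable, else 0.
    Positives are always kept; negatives are kept with probability
    `negative_rate` to reduce repetition.
    """
    examples = []
    # Stop early enough that a full horizon of future minutes is known.
    for t in range(history - 1, len(readings) - horizon):
        features = list(readings[t - history + 1 : t + 1])
        label = int(any(unstable[t + 1 : t + 1 + horizon]))
        if label == 1 or rng.random() < negative_rate:
            examples.append((features, label))
    return examples


# Toy data (hypothetical): ten one-minute readings, instability at minute 7.
readings = [float(i) for i in range(10)]
unstable = [i == 7 for i in range(10)]
only_positives = build_examples(readings, unstable, 3, 2, 0.0, random.Random(0))
everything = build_examples(readings, unstable, 3, 2, 1.0, random.Random(0))
```

With real monitor data the same loop would run per patient, and the resulting examples from all patients would be pooled into one training set.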



I've done a bit of time series data mining with Hadoop; I've written up some basics on time series and MapReduce at our blog:

http://www.cloudera.com/blog/2011/03/simple-moving-average-secondary-sort-and-mapreduce-part-1/

http://www.cloudera.com/blog/2011/03/simple-moving-average-secondary-sort-and-mapreduce-part-2/

http://www.cloudera.com/blog/2011/04/simple-moving-average-secondary-sort-and-mapreduce-part-3/

While these articles won't help you on the LR end of things, they do give you working code on GitHub to work from as a basis with respect to time series and secondary sort (and sliding window).

Josh


--

Twitter: @jpatanooga

Solution Architect @ Cloudera

hadoop: http://www.cloudera.com

blog: http://jpatterson.floe.tv

Many thanks to all of you for the replies!

OK, I have now developed a rough concept for how to train Mahout's
OnlineLogisticRegression model using time series (correct me if you
spot any issues):

Given the following observations for patient 1, where the predictor is
'Heart Rate' and the target variable is 'State':

Hour | Heart Rate (mean) | State
-----+-------------------+----------
   1 |                90 | stable
   2 |                92 | stable
   3 |                94 | stable
   4 |                98 | stable
   5 |               100 | unstable

I want to train Mahout to predict the 'State' 1 hour in the future
(future window), based on the data from 1 hour in the past (past
window). Assume we are at hour 2 in the table. We take the 'Heart Rate'
(or some deltas derived from the heart rates) from hour 1 and the
'State' from hour 3 to create a training example. The next training
example uses the 'Heart Rate' from hour 2 and the 'State' from hour 4,
and so on.
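
A minimal sketch of this pairing in plain Python (not Mahout code). The
function name make_examples and its parameters past and future are
hypothetical, and the sketch assumes one row per hour, sorted by time,
with no gaps:

```python
# Hourly observations for patient 1, copied from the table above:
# (hour, mean heart rate, state). The last label is spelled "unstable" here.
OBSERVATIONS = [
    (1, 90, "stable"),
    (2, 92, "stable"),
    (3, 94, "stable"),
    (4, 98, "stable"),
    (5, 100, "unstable"),
]


def make_examples(rows, past=1, future=1):
    """Build (feature, label) pairs for every hour that has both windows.

    For each "current" hour, the feature is the heart rate `past` hours
    earlier and the label is the state `future` hours later.
    Assumes rows are sorted by hour with no missing hours.
    """
    examples = []
    for now in range(past, len(rows) - future):
        heart_rate = rows[now - past][1]   # predictor from the past window
        state = rows[now + future][2]      # target from the future window
        examples.append((heart_rate, state))
    return examples


examples = make_examples(OBSERVATIONS)
for heart_rate, state in examples:
    print(heart_rate, state)
```

With the five rows above this yields three training examples: (90, stable),
(92, stable) and (94, unstable). The first and last hours produce no example
of their own, because one of their two windows falls outside the table.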

My question is: how does Mahout discover the 'time' aspect of the
training? Won't I get the same result if I swap the training
examples? Am I missing something? Are there other issues with the concept?

Thanks and best regards,

Svetlomir.

