Hi,

I was reading about Facebook Beringei when I spotted this:

- Extremely efficient streaming compression algorithm. Our streaming compression algorithm is able to compress real world time series data by over 90%. The delta of delta compression algorithm used by Beringei is also fast - we see that a single machine is able to compress more than 1.5 million datapoints/second.

That "*delta of delta*" caught my attention. Delta of delta encoding is one of the tricks that lets Facebook Gorilla compress each 16-byte data point (an 8-byte timestamp plus an 8-byte value) to 1.37 bytes on average. Section 4.1 of the Gorilla paper describes it: http://www.vldb.org/pvldb/vol8/p1816-teller.pdf

The scheme in that section seems to cover both time fields (delta of delta) and numerical values (XOR against the previous value).
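
To make the idea concrete, here is a minimal sketch of the delta of delta step for timestamps only. The class and method names are hypothetical and are not taken from Beringei, Gorilla, or gorilla-tsc. It also leaves out the variable-length bit packing that the paper applies afterwards, which is where the actual space saving comes from.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not the Beringei or gorilla-tsc API. */
final class DeltaOfDeltaSketch {

    /**
     * Encodes timestamps as [first timestamp, first delta, delta of delta, ...].
     * For regularly spaced points, most entries after the second are 0 or close to 0.
     */
    static long[] encode(long[] timestamps) {
        long[] out = new long[timestamps.length];
        if (timestamps.length == 0) {
            return out;
        }
        out[0] = timestamps[0];
        long prevDelta = 0;
        for (int i = 1; i < timestamps.length; i++) {
            long delta = timestamps[i] - timestamps[i - 1];
            // The first delta is stored as-is; later entries store the change in delta.
            out[i] = (i == 1) ? delta : delta - prevDelta;
            prevDelta = delta;
        }
        return out;
    }

    /** Reverses encode() by accumulating the deltas back into timestamps. */
    static long[] decode(long[] encoded) {
        long[] out = new long[encoded.length];
        if (encoded.length == 0) {
            return out;
        }
        out[0] = encoded[0];
        long delta = 0;
        for (int i = 1; i < encoded.length; i++) {
            delta = (i == 1) ? encoded[1] : delta + encoded[i];
            out[i] = out[i - 1] + delta;
        }
        return out;
    }

    public static void main(String[] args) {
        // Points roughly every 60 seconds, with a little jitter.
        long[] timestamps = {1000L, 1060L, 1120L, 1181L, 1240L};
        long[] encoded = encode(timestamps);
        System.out.println(Arrays.toString(encoded));         // [1000, 60, 0, 1, -2]
        System.out.println(Arrays.toString(decode(encoded))); // [1000, 1060, 1120, 1181, 1240]
    }
}
```

With points arriving at a near-constant interval, almost every encoded entry is tiny, so a bit-level encoder can spend very few bits on each one.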

Would Lucene benefit from this?

https://github.com/burmanm/gorilla-tsc seems to be a fresh Java implementation.

Otis

--

Monitoring - Log Management - Alerting - Anomaly Detection

Solr & Elasticsearch Consulting Support Training - http://sematext.com/
