I think it makes sense indeed for time-series databases. The time field
should grow by regular increments, and numerical values of consecutive
documents are likely to be close to each other. Both cases should compress
efficiently with delta-of-delta encoding.
We haven't really started exploring how to leverage the fact that doc
values have an iterator API for compression. I think this delta-of-delta
approach would be interesting to explore. Maybe we could encode values in
blocks, like postings, and decide how to encode each block based on the
actual data: delta-of-delta would be one option, but sometimes RLE or FOR
might suit the data better.
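To make the idea concrete, here is a minimal sketch of delta-of-delta encoding over longs (class and method names are hypothetical, not any actual Lucene codec): timestamps that grow by regular increments collapse to runs of zeros, which a per-block RLE or bit-packing pass could then compress very well.

```java
import java.util.Arrays;

public class DeltaOfDeltaSketch {

    // Encode each value as (delta - previous delta); regular increments
    // produce mostly zeros after the first two entries.
    static long[] encode(long[] values) {
        long[] out = new long[values.length];
        long prev = 0, prevDelta = 0;
        for (int i = 0; i < values.length; i++) {
            long delta = values[i] - prev;
            out[i] = delta - prevDelta;
            prevDelta = delta;
            prev = values[i];
        }
        return out;
    }

    // Inverse transform: accumulate deltas of deltas back into values.
    static long[] decode(long[] encoded) {
        long[] out = new long[encoded.length];
        long prev = 0, prevDelta = 0;
        for (int i = 0; i < encoded.length; i++) {
            long delta = prevDelta + encoded[i];
            out[i] = prev + delta;
            prevDelta = delta;
            prev = out[i];
        }
        return out;
    }

    public static void main(String[] args) {
        // Timestamps arriving at a (mostly) regular 60-unit interval.
        long[] timestamps = {1000, 1060, 1120, 1180, 1245};
        long[] dod = encode(timestamps);
        System.out.println(Arrays.toString(dod));              // mostly zeros
        System.out.println(Arrays.equals(decode(dod), timestamps));
    }
}
```

A block-wise encoder could compute this transform per block and only keep it when the result is cheaper to store than plain deltas or FOR.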
On Tue, Apr 25, 2017 at 04:43, Otis Gospodnetić <[EMAIL PROTECTED]>
wrote: