Thank you for your response.

My goal is to create a tool capable of grepping through large sets of logs
(1 TB and up) and answering queries in a reasonable amount of time (from
seconds to several minutes, at most 1 hour). The logs are stored in S3 (they
are produced by EMR jobs) in a compressed format (e.g. gzip or LZO). I
expect that some performance tuning will be needed at the end in order to
meet these targets.

I don't know your current roadmap, but I would like to contribute to Chukwa
by adding support for reading and storing compressed logs in different
formats (e.g. gzip, bzip2, LZO, Snappy). I will also test Chukwa with S3 as
an input source to see if it works, and contribute there too if necessary.
Are you interested in this kind of contribution? Does your roadmap include
any performance tuning tasks?


On 9 January 2018 at 18:33, Popa Nicolae <[EMAIL PROTECTED]> wrote: