Subject: [DISCUSS] KIP-405: Kafka Tiered Storage


Harsha, Sriharsha, Suresh, a couple thoughts:

- How could this be used to leverage fast key-value stores, e.g. Couchbase,
which can serve individual records but maybe not entire segments? Or is the
idea to only support writing and fetching entire segments? Would it make
sense to support both?
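
Just to make that concrete, something like the following is what I have in
mind -- purely illustrative, none of these names are from the KIP:

    import java.io.IOException;
    import java.io.InputStream;
    import java.util.Optional;
    import org.apache.kafka.common.TopicPartition;

    // Hypothetical read interface supporting both granularities: whole
    // segments for object stores, individual records for key-value
    // stores like Couchbase.
    public interface RemoteReadSupport {
        // Whole-segment read, natural for S3/HDFS-style backends.
        InputStream fetchSegment(TopicPartition tp, long baseOffset) throws IOException;

        // Single-record read, natural for key-value backends; empty if
        // the backend only stores whole segments.
        Optional<byte[]> fetchRecord(TopicPartition tp, long offset) throws IOException;
    }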

- Instead of defining a new interface and/or mechanism to ETL segment files
from brokers to cold storage, can we just leverage Kafka itself? In
particular, we can already ETL records to HDFS via Kafka Connect, Gobblin,
etc. -- we really just need a way for brokers to read these records back.
I'm wondering whether the new API could be limited to the fetch, and then
existing ETL pipelines could be more easily leveraged. For example, if you
already have an ETL pipeline from Kafka to HDFS, you could leave that in
place and just tell Kafka how to read these records/segments from cold
storage when necessary.
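
As a rough sketch (again, hypothetical names, and greatly simplified compared
to real broker internals), the only new pluggable piece would then be on the
read path, while the write path stays with whatever pipeline already ships
segments to cold storage:

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import org.apache.kafka.common.TopicPartition;

    class SegmentReadPath {
        // The one new plugin point: fetch a segment the broker no longer
        // has locally, e.g. whatever your Connect/Gobblin pipeline wrote
        // to HDFS or S3.
        interface RemoteSegmentReader {
            InputStream fetchSegment(TopicPartition tp, long baseOffset) throws IOException;
        }

        private final File logDir;
        private final RemoteSegmentReader remoteReader;

        SegmentReadPath(File logDir, RemoteSegmentReader remoteReader) {
            this.logDir = logDir;
            this.remoteReader = remoteReader;
        }

        InputStream readSegment(TopicPartition tp, long baseOffset) throws IOException {
            File local = new File(logDir, tp + "-" + baseOffset + ".log");
            if (local.exists()) {
                return new FileInputStream(local);            // hot path: still on local disk
            }
            return remoteReader.fetchSegment(tp, baseOffset); // cold path: delegate to the plugin
        }
    }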

- I'm wondering if we could just add support for loading segments from
remote URIs instead of only from local files, i.e. via plugins for s3://,
hdfs://, etc.
I suspect less broker logic would change in that case -- the broker
wouldn't necessarily care if it reads from file:// or s3:// to load a given
segment.

Combining the previous two comments, I can imagine a URI resolution chain
for segments. For example, first try file:///logs/{topic}/{segment}.log,
then s3://mybucket/{topic}/{date}/{segment}.log, etc., leveraging your
existing ETL pipeline(s).
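
Concretely (the property name below is made up, just to illustrate the idea),
the broker could take an ordered list of URI templates and try each in turn
when a segment isn't found locally:

    # hypothetical broker config -- not proposing these property names
    remote.segment.uri.chain=file:///logs/{topic}/{segment}.log, \
        s3://mybucket/{topic}/{date}/{segment}.log, \
        hdfs://namenode/kafka/{topic}/{segment}.log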

Ryanne
On Mon, Feb 4, 2019 at 12:01 PM Harsha <[EMAIL PROTECTED]> wrote: