Subject: Running a standalone MirrorMaker 2.0 cluster and issues


>> You are correct. I'm working on a KIP and PoC to introduce
>> transactions to Connect for this exact purpose :)

That is awesome. Any time frame?
In the meantime, is the following the SLA as of now?

1. It is conceivable that we flush the producer to the target cluster but
fail to commit the offsets. If there is a restart before the next successful
offset commit, there will be duplicates and part of the data is replayed
(at least once)? (See the sketch below.)

2. The same can be said about partial flushes, though I am not sure how
Kafka handles a flush (is a flush either a success or a failure, with
nothing in between?).
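
To make (1) concrete, here is a rough sketch of the sequence as I
understand it. All names here are made up for illustration; this is not
the actual Connect code:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.ByteArraySerializer;

    public class FlushThenCommitSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "target:9092"); // hypothetical target cluster
            props.put("key.serializer", ByteArraySerializer.class.getName());
            props.put("value.serializer", ByteArraySerializer.class.getName());

            try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
                // Step 1: replicated records are handed to the producer asynchronously.
                producer.send(new ProducerRecord<>("replicated-topic", "payload".getBytes()));

                // Step 2: flush() blocks until every buffered send has completed.
                // Individual sends can still fail; failures surface through each
                // send's Future/Callback, not as an all-or-nothing result of flush().
                producer.flush();

                // <-- crash window: records above are durable on the target, but
                // the offsets below are not yet committed. A restart here resumes
                // from the last committed offsets and replays them: at-least-once.

                // Step 3: source offsets are written separately (Connect uses an
                // internal, compacted offsets topic), so steps 2 and 3 are not atomic.
                producer.send(new ProducerRecord<>("mm2-offsets",
                        "my-topic-0".getBytes(), "12345".getBytes()));
                producer.flush();
            }
        }
    }

So, until transactions land, the window between step 2 and step 3 is
exactly where the replays would come from.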

Thanks.

On Tue, Oct 15, 2019 at 12:34 PM Ryanne Dolan <[EMAIL PROTECTED]> wrote:

> Hey Vishal, glad to hear you're making progress.
>
> > 1. It seems though that flushing the producer and setting the offset in
> > the compacting topic is not atomic, OR do we use transactions here?
>
> You are correct. I'm working on a KIP and PoC to introduce transactions to
> Connect for this exact purpose :)
>
> > I think these are 4 threads (b'coz tasks.max=4), and I have 2 topics
> > with 1 partition each. Do I assume this right, as in there are 4
> > consumer groups (one CG per thread)?
>
> Some details here:
> - tasks.max controls the maximum number of tasks created per Connector
> instance. Both MirrorSourceConnector and MirrorCheckpointConnector will
> create multiple tasks (up to tasks.max), but MirrorHeartbeatConnector only
> ever creates a single task. Moreover, there cannot be more tasks than
> topic-partitions (for MirrorSourceConnector) or consumer groups (for
> MirrorCheckpointConnector). So if you have two topics with one partition
> each and one consumer group total, you'll have two MirrorSourceConnector
> tasks, one MirrorHeartbeatConnector task, and one MirrorCheckpointConnector
> task, for a total of four (see the breakdown after this list). And that's
> in one direction only: if you have multiple source->target herders enabled,
> each will create tasks independently.
> - There are no consumer groups in MM2, technically. The Connect framework
> uses the Coordinator API and internal topics to divide tasks among workers
> -- not a consumer group per se. The MM2 connectors use the assign() API,
> not the subscribe() API, so there are no consumer groups there either. In
> fact, they don't commit() either. This is nice, as it eliminates a lot of
> the rebalancing problems legacy MirrorMaker has been plagued with. With
> MM2, rebalancing only occurs when the number of workers changes or when the
> assignments change (e.g. new topics are discovered); see the sketch below.
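>
> To make the arithmetic concrete, here is the breakdown for your setup
> (tasks.max=4, two topics with one partition each, one consumer group),
> per source->target direction:
>
>   MirrorSourceConnector:     min(tasks.max, topic-partitions) = min(4, 2) = 2
>   MirrorCheckpointConnector: min(tasks.max, consumer-groups)  = min(4, 1) = 1
>   MirrorHeartbeatConnector:  always exactly one                           = 1
>                                                                     total = 4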
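>
> And a minimal sketch of the assign() vs subscribe() distinction (plain
> consumer API, illustrative only -- not the MM2 code itself):
>
>     import java.util.Arrays;
>     import java.util.Properties;
>     import org.apache.kafka.clients.consumer.KafkaConsumer;
>     import org.apache.kafka.common.TopicPartition;
>     import org.apache.kafka.common.serialization.ByteArrayDeserializer;
>
>     public class AssignVsSubscribe {
>         public static void main(String[] args) {
>             Properties props = new Properties();
>             props.put("bootstrap.servers", "source:9092"); // hypothetical source cluster
>             props.put("key.deserializer", ByteArrayDeserializer.class.getName());
>             props.put("value.deserializer", ByteArrayDeserializer.class.getName());
>
>             // subscribe() would join a consumer group (requires group.id) and
>             // rebalance on membership changes -- the legacy MirrorMaker approach:
>             //   consumer.subscribe(Arrays.asList("my-topic"));
>
>             // assign() takes static partitions: no group membership, no
>             // rebalancing, no commit() needed -- what the MM2 connectors do.
>             try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
>                 consumer.assign(Arrays.asList(new TopicPartition("my-topic", 0)));
>                 consumer.seek(new TopicPartition("my-topic", 0), 12345L); // resume at stored offset
>             }
>         }
>     }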
>
> Ryanne
>
> On Tue, Oct 15, 2019 at 10:23 AM Vishal Santoshi <
> [EMAIL PROTECTED]>
> wrote:
>
> > Hey Ryanne,
> >
> >
> >            The test was on topics that had a 7-day retention, which
> > generally implies that the batch size for a flush is pretty high (until
> > the consumption becomes current). offset.flush.timeout.ms defaults to 5
> > seconds, and the code will not commit the offsets if the flush has not
> > completed within that timeout. Increasing that timeout did solve the "not
> > sending the offset to topic" issue.
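> >
> > (For reference: the override is one line in the MM2 properties file. The
> > value below is only illustrative; tune it to your flush sizes.)
> >
> >     offset.flush.timeout.ms = 60000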
> >
> > Two questions (I am being greedy here :))
> >
> > 1. It seems though that flushing the producer and setting the offset in
> > the compacting topic is not atomic, OR do we use transactions here?
> >
> > 2. I see
> >
> >  WorkerSourceTask{id=MirrorHeartbeatConnector-0} flushing 956435 outstanding messages
> >
> >  WorkerSourceTask{id=MirrorSourceConnector-1} flushing 356251 outstanding messages
> >
> >  WorkerSourceTask{id=MirrorCheckpointConnector-2} flushing 0 outstanding messages
> >
> >  WorkerSourceTask{id=MirrorCheckpointConnector-3} flushing 0 outstanding messages
> >
> >
> > I think these are 4 threads (b'coz tasks.max=4), and I have 2 topics
> > with 1 partition each. Do I assume this right, as in there are 4
> > consumer groups (one CG per thread)?
> >
> > THANKS A LOT
> >
> >
> > Vishal.
> >
> >
> >
> > On Mon, Oct 14, 2019 at 3:42 PM Ryanne Dolan <[EMAIL PROTECTED]>
> > wrote: