Subject: Running a standalone MirrorMaker 2.0 cluster and issues


I misspoke:

>> I now have 8 VMs with 8 CPUs and 48 max tasks, and it did spread to the
>> 8 VMs. I then upscaled to 12 VMs and the tasks *have not* migrated as I
>> would expect.
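
For reference, tasks.max is the knob being discussed here; in the MM2
properties file it looks roughly like this (the cluster aliases and topic
pattern below are illustrative, only tasks.max is the point):

    clusters = source, target
    source->target.enabled = true
    source->target.topics = .*
    # upper bound on the number of tasks each connector may spawn
    tasks.max = 48
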
On Fri, Oct 18, 2019 at 8:00 PM Vishal Santoshi <[EMAIL PROTECTED]>
wrote:

> OK, you will have to explain :)
>
> I had 12 VMs with 8 CPUs and 8 max tasks. I thought: let me give a CPU to
> each task, which I presumed is a Java thread (even though I know the
> thread would be mostly I/O bound). I saw the issue I pointed out above.
> *I now have 8 VMs with 8 CPUs and 48 max tasks and it did spread to the 8
> VMs. I then upscaled to 12 VMs and the tasks migrated as I would expect.*
>
>  I know that a VM will have MirrorSourceConnector and
> MirrorHeartbeatConnector tasks up to tasks.max. So a few questions:
>
>
>
> * When we say there are 48 max tasks, are we saying there are 48 threads
> (in fact 96, one set for each of the 2 connectors above, worst case + 2)?
> * When we talk about a Connector, are we talking about a JVM process, as in
> a Connector is a JVM process?
> * Why does a larger tasks.max help the spread? As in, I would assume there
> are up to 8 tasks (or 16) per VM, but that should not have prevented
> reassignment on a scale-up (as it clearly did)?
>
> The reason I ask is that I plan to run the MM2 cluster on k8s, and I want
> to make sure that I use a JVM version that is more Docker friendly vis-à-vis
> how many CPUs it believes it has, as explained here:
> https://blog.softwaremill.com/docker-support-in-new-java-8-finally-fd595df0ca54
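>
> For what it's worth, a sketch of how I'd pass the container-awareness flags
> (assuming the stock connect-mirror-maker.sh launcher, which honors
> KAFKA_OPTS, and Java 8u191+ where these flags exist):
>
>     KAFKA_OPTS="-XX:+UseContainerSupport -XX:ActiveProcessorCount=8" \
>       ./bin/connect-mirror-maker.sh mm2.properties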
>
>
>
>
> On Fri, Oct 18, 2019 at 4:15 PM Ryanne Dolan <[EMAIL PROTECTED]>
> wrote:
>
>> What is tasks.max? Consider bumping to something like 48 if you're running
>> on a dozen nodes.
>>
>> Ryanne
>>
>> On Fri, Oct 18, 2019, 1:43 PM Vishal Santoshi <[EMAIL PROTECTED]>
>> wrote:
>>
>> > Hey Ryanne,
>> >
>> >
>> >             I see a definite issue. I am doing an intense test: I bring
>> > up 12 VMs (they are 12 pods with 8 CPUs each), replicating about 1200+
>> > topics (fairly heavy, 100 Mbps). They are acquired and staggered as they
>> > come up. I see a fraction of these nodes not assigned any replication,
>> > even though there is plenty to go around (more than a couple of thousand
>> > partitions). Is there something I am missing? In my current case, 5 of
>> > the 12 VMs are idle.
>> >
>> > Vishal
>> >
>> > On Fri, Oct 18, 2019 at 7:05 AM Vishal Santoshi <[EMAIL PROTECTED]>
>> > wrote:
>> >
>> > > Oh sorry, a COUNTER... is more like it.
>> > >
>> > > On Fri, Oct 18, 2019, 6:58 AM Vishal Santoshi <[EMAIL PROTECTED]>
>> > > wrote:
>> > >
>> > >> Will do.
>> > >>     One more thing: the age/latency metrics seem to be analogous, as
>> > >> in they seem to be calculated using similar routines. I would think a
>> > >> metric tracking the number of flush failures (as a GAUGE), given
>> > >> offset.flush.timeout.ms, would be highly beneficial.
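>> > >>
>> > >> For reference, the worker setting in question, with its stock default
>> > >> (just a sketch to pin down what I mean):
>> > >>
>> > >>     # max time a task gets to flush outstanding offset data before the
>> > >>     # commit attempt is abandoned and retried on the next interval
>> > >>     offset.flush.timeout.ms = 5000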
>> > >>
>> > >> Regards..
>> > >>
>> > >>
>> > >> On Thu, Oct 17, 2019 at 11:53 PM Ryanne Dolan <[EMAIL PROTECTED]>
>> > >> wrote:
>> > >>
>> > >>> Ah, I see you are correct. Also I misspoke saying "workers" earlier,
>> > >>> as the consumer is not created by the worker, but by the task.
>> > >>>
>> > >>> I suppose the put() could be changed to putIfAbsent() here to enable
>> > >>> this property to be changed. Maybe submit a PR?
>> > >>>
>> > >>> Ryanne
>> > >>>
>> > >>> On Thu, Oct 17, 2019 at 10:00 AM Vishal Santoshi <
>> > >>> [EMAIL PROTECTED]>
>> > >>> wrote:
>> > >>>
>> > >>> > Hmm (I did both):
>> > >>> >
>> > >>> > another->another_test.enabled = true
>> > >>> >
>> > >>> > another->another_test.topics = act_post
>> > >>> >
>> > >>> > another->another_test.emit.heartbeats.enabled = false
>> > >>> >
>> > >>> > another->another_test.consumer.auto.offset.reset = latest
>> > >>> >
>> > >>> > another->another_test.sync.topic.acls.enabled = false