Subject: Source Connector Task in a distributed env


Venkata, the example you linked creates a single task config, so there is
no parallelism: a single task runs on the cluster, regardless of the
number of nodes. To introduce parallelism, your SourceConnector needs to
divide all known files into N groups (with N no larger than the maxTasks
value passed to taskConfigs()) and return N task configs, one per task.
You can use ConnectorUtils.groupPartitions() for this. In each task
config, specify the group of files assigned to that task, as returned by
groupPartitions().
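
A rough sketch of that grouping step, in case it helps. It is plain Java
with no Kafka dependency so it can be run on its own: the local
groupPartitions() is a stand-in that mimics what
ConnectorUtils.groupPartitions(elements, numGroups) does, and the "files"
config key, the class name, and passing the file list as an argument are
all assumptions made for illustration. In a real connector this logic
would live in taskConfigs(int maxTasks).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FileTaskConfigSketch {

    // Hypothetical config key; each task would read its file list from it.
    static final String FILES_KEY = "files";

    // Stand-in for ConnectorUtils.groupPartitions(elements, numGroups):
    // contiguous groups whose sizes differ by at most one.
    static <T> List<List<T>> groupPartitions(List<T> elements, int numGroups) {
        if (numGroups <= 0) {
            throw new IllegalArgumentException("numGroups must be positive");
        }
        List<List<T>> result = new ArrayList<>(numGroups);
        int perGroup = elements.size() / numGroups;
        int leftover = elements.size() % numGroups;
        int assigned = 0;
        for (int g = 0; g < numGroups; g++) {
            // The first `leftover` groups take one extra element.
            int size = g < leftover ? perGroup + 1 : perGroup;
            result.add(new ArrayList<>(elements.subList(assigned, assigned + size)));
            assigned += size;
        }
        return result;
    }

    // Mirrors the shape of SourceConnector.taskConfigs(maxTasks), with the
    // known files passed in explicitly so the sketch stays self-contained.
    static List<Map<String, String>> taskConfigs(List<String> files, int maxTasks) {
        List<Map<String, String>> configs = new ArrayList<>();
        if (files.isEmpty()) {
            return configs; // nothing to read yet, so no tasks
        }
        // Never create more groups than files, or some tasks would be idle.
        int numGroups = Math.min(files.size(), maxTasks);
        for (List<String> group : groupPartitions(files, numGroups)) {
            Map<String, String> config = new HashMap<>();
            // Comma-joined for simplicity; assumes file names contain no commas.
            config.put(FILES_KEY, String.join(",", group));
            configs.add(config);
        }
        return configs;
    }

    public static void main(String[] args) {
        List<String> files = List.of("a.csv", "b.csv", "c.csv", "d.csv", "e.csv");
        taskConfigs(files, 2).forEach(System.out::println);
    }
}
```

Each task would then split its "files" value in start() and read only
those files.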

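Then your SourceConnector can watch for new files. Whenever a new file is
detected, call context.requestTaskReconfiguration(). That causes the
framework to call taskConfigs() again, so the files are regrouped and the
tasks are restarted with the new configs.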
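
One way the watching part could look, again as a sketch rather than
anything taken from the example you linked. NewFileDetector, its listing
function, and the polling approach are all hypothetical; the callback is
a plain Runnable so the code runs without Kafka, and in a connector you
would pass something like context::requestTaskReconfiguration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class NewFileDetector {
    private final Supplier<Set<String>> lister; // returns the current file names
    private final Runnable onNewFiles;          // e.g. context::requestTaskReconfiguration
    private final Set<String> known;            // file names seen so far

    NewFileDetector(Supplier<Set<String>> lister, Runnable onNewFiles) {
        this.lister = lister;
        this.onNewFiles = onNewFiles;
        // Files present at construction time are treated as already known.
        this.known = new HashSet<>(lister.get());
    }

    // Convenience factory that lists the regular files in a directory.
    static NewFileDetector forDirectory(Path dir, Runnable onNewFiles) {
        return new NewFileDetector(() -> {
            try (Stream<Path> entries = Files.list(dir)) {
                return entries.filter(Files::isRegularFile)
                        .map(p -> p.getFileName().toString())
                        .collect(Collectors.toSet());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, onNewFiles);
    }

    // Performs one scan; fires the callback once if any unseen file appeared.
    boolean pollOnce() {
        boolean added = known.addAll(lister.get());
        if (added) {
            onNewFiles.run();
        }
        return added;
    }

    public static void main(String[] args) {
        Set<String> files = new HashSet<>(Set.of("a.csv"));
        NewFileDetector detector = new NewFileDetector(
                () -> new HashSet<>(files),
                () -> System.out.println("new files detected, requesting reconfiguration"));
        files.add("b.csv");
        detector.pollOnce();
    }
}
```

In a connector you would probably call pollOnce() periodically from a
background thread started in start() and stopped in stop(). This sketch
ignores deleted files; if those matter, the same callback can be fired
when a known file disappears.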

Ryanne

On Wed, Apr 24, 2019 at 3:08 AM Venkata S A <[EMAIL PROTECTED]> wrote: