As I mentioned earlier, file-based synchronization has been deprecated. We
strongly recommend that you use Zookeeper-based synchronization.
I am confused by your claim that you can run jobs on specific cluster
members. Job work is distributed among all cluster members, and you should
see the same jobs no matter which Tomcat webapp you use to view them,
since they are stored in the database. The global database
configuration should be in Zookeeper, and the same Zookeeper instance
should be referenced by all cluster members.
In any case, if your goal in all of this was parallel execution of jobs,
that work was probably based on a false assumption. ManifoldCF will crawl
jobs in parallel even with only one agents process, but there will be a
delay before the "second" job's documents get served. The delay has
nothing to do with the number of cluster members; it occurs because
ManifoldCF keeps its job queue in the database.
Documents are given a "docpriority", which is a number, at the time they
are queued. The query that pulls documents out of the queue for servicing
orders documents by docpriority. What that means in practice is that when
you start your second job, all the documents already queued for
processing must be processed before any documents from the second job
get looked at. This is, unfortunately, unavoidable. You can, however,
reset the document priorities for a job by pausing it and resuming it -- so
if you start your second job, and then pause and resume the first, the
documents for the first job get reprioritized.
Reprioritization is expensive when the job queue is large, so it is
unlikely we'd consider "automatically" reprioritizing all documents
whenever a job is started.
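
To make the queue behavior above concrete, here is a rough toy model in
Python. It is not ManifoldCF code: the ToyQueue class, its method names,
and the idea that priorities come from a simple increasing counter are all
assumptions made for illustration. The real docpriority calculation is
more involved and is not reproduced here. The sketch only shows why
ordering by a number assigned at queue time makes the second job wait, and
why giving the first job's documents fresh priorities changes the order.

```python
import itertools


class ToyQueue:
    """Toy model of one shared document queue ordered by docpriority.

    Hypothetical illustration only; lower docpriority is served first.
    """

    def __init__(self):
        # Assumption: each newly assigned priority is larger than the last.
        self._next_priority = itertools.count()
        self._docs = {}  # doc id -> (docpriority, job name)

    def queue_document(self, job, doc_id):
        # The priority is fixed at the moment the document is queued.
        self._docs[doc_id] = (next(self._next_priority), job)

    def reprioritize(self, job):
        # Stand-in for pausing and resuming a job: every queued document
        # of that job receives a fresh priority, keeping its relative order.
        ordered = sorted(self._docs.items(), key=lambda item: item[1][0])
        for doc_id, (_, owner) in ordered:
            if owner == job:
                self._docs[doc_id] = (next(self._next_priority), owner)

    def service_order(self):
        # Mirrors a query that orders the queue by docpriority.
        ordered = sorted(self._docs.items(), key=lambda item: item[1][0])
        return [doc_id for doc_id, _ in ordered]


queue = ToyQueue()
for i in range(3):
    queue.queue_document("job1", f"job1-doc{i}")
for i in range(2):
    queue.queue_document("job2", f"job2-doc{i}")

before = queue.service_order()   # job1's backlog is ahead of job2
queue.reprioritize("job1")       # "pause and resume" job1
after = queue.service_order()    # job2's documents are now served first

print(before)
print(after)
```

In this toy model the fresh priorities simply land behind everything
already queued, so the second job's documents move to the front. Every
queued document of the job has to be touched, which also hints at why
reprioritization gets expensive when the queue is large.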
Hope this helps,
On Fri, Mar 23, 2018 at 8:24 AM, Shashank Raj <[EMAIL PROTECTED]>