That's right: SEMI_AUTO will only change the role of a replica. It will
never move replicas.
Instead of answering each question, I will try to explain what happens
under the hood.
- Each participant maintains a persistent connection with Zookeeper and
sends the heartbeat every X seconds. I think this is called tick time.
- When the participant fails to send a heartbeat, the local ZK client code
fires a disconnect callback. Note that this callback does not come from
the ZK server; it occurs as soon as the participant fails to send the
heartbeat.
- Let's say the participant reconnects to ZK after a period T. There are
two cases:
  - T < session timeout: the participant gets a "connected" callback, its
  session is still valid, and nothing has changed from the point of view
  of the ZK server, the Helix controller, or the spectators.
  - T > session timeout: the participant gets a "session expiry" callback
  from ZK. Note that this happens only after the participant reconnects
  to ZK, so it might be minutes or even hours (depending on the cause of
  the disconnection) before the participant gets this callback. The
  outside world (ZK server, controller, spectators), however, will know
  about the session expiry immediately after the session timeout.
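To make the two cases concrete, here is a minimal sketch with no Helix or
ZK dependency. The class and enum names are made up for illustration. In
the ZooKeeper Java client, the corresponding events arrive on your Watcher
as KeeperState.Disconnected, KeeperState.SyncConnected and
KeeperState.Expired.

```java
/** Hypothetical illustration only; not part of Helix or ZooKeeper. */
class ReconnectOutcome {
    /** The callback the participant sees when it reaches ZK again. */
    enum Callback { CONNECTED, SESSION_EXPIRED }

    /**
     * disconnectedForMs is the period T from the text above.
     * If T stays below the session timeout the old session survives;
     * otherwise the server has already expired it.
     */
    static Callback onReconnect(long disconnectedForMs, long sessionTimeoutMs) {
        return disconnectedForMs < sessionTimeoutMs
                ? Callback.CONNECTED
                : Callback.SESSION_EXPIRED;
    }
}
```

Keep in mind that the server decides expiry on its own clock, so the
participant only learns which case it is in once it reconnects.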
Helix gets to know about the session expiry and will initiate a mastership
transfer from the old master to a new master. It cannot send a
Master -> Slave transition message to the old master because the old
master is disconnected from ZK and unreachable. Helix will automatically
change the external view to reflect that the old master is offline for all
the replicas it owns. The clients (spectators) will immediately know about
this and can stop sending requests to the old master.
Similarly, once the new master has successfully processed the
Slave -> Master transition, the external view will be updated and the
clients (spectators) can start routing requests to the new master.
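On the spectator side, routing boils down to looking up which instance is
currently MASTER for a partition. A rough sketch follows, using a plain
Map (instance name -> state) as a stand-in for the per-partition state map
you would read from the external view. MasterRouter and findMaster are
hypothetical names, not Helix APIs.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical spectator-side helper; not part of Helix. */
class MasterRouter {
    /**
     * stateByInstance maps instance name to its state for one partition.
     * Returns the instance in MASTER state, or empty while no master is
     * visible (for example, between the old master going offline and the
     * new master finishing its transition).
     */
    static Optional<String> findMaster(Map<String, String> stateByInstance) {
        return stateByInstance.entrySet().stream()
                .filter(e -> "MASTER".equals(e.getValue()))
                .map(Map.Entry::getKey)
                .findFirst();
    }
}
```

An empty result is the window in which clients should hold or retry
requests rather than send them to the old master.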
As you pointed out in your email, you can start a timer in the participant
when you get a disconnected event and stop processing requests once the
session timeout has elapsed. We could have done this automatically in
Helix, but it really depends on the application. This is typically needed
only in the master-slave state model, and we could not come up with a
fully automatic way. We could potentially do this based on a config
variable, though. It would be awesome if you could contribute this
feature.
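A rough sketch of what such self-fencing could look like in the
participant. This is not existing Helix code: DisconnectFence and its
methods are hypothetical, and instead of a literal timer thread it records
the disconnect time and checks a deadline before each request, which has
the same effect. It assumes you wire onDisconnected/onReconnected into
your own ZK connection-state callbacks.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper, not part of Helix. */
class DisconnectFence {
    private static final long CONNECTED = -1L;

    private final long sessionTimeoutMs;
    // Time of the disconnect event, or CONNECTED while the link to ZK is up.
    private final AtomicLong disconnectedAtMs = new AtomicLong(CONNECTED);

    DisconnectFence(long sessionTimeoutMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
    }

    /** Call from the "disconnected" callback. */
    void onDisconnected(long nowMs) {
        // Keep the earliest disconnect time if the callback fires repeatedly.
        disconnectedAtMs.compareAndSet(CONNECTED, nowMs);
    }

    /** Call from the "connected" callback when the session is still valid. */
    void onReconnected() {
        disconnectedAtMs.set(CONNECTED);
    }

    /** Check before handling each request as master. */
    boolean mayServe(long nowMs) {
        long since = disconnectedAtMs.get();
        return since == CONNECTED || nowMs - since < sessionTimeoutMs;
    }
}
```

Since the ZK server counts the session timeout from the last heartbeat it
received, which is earlier than the local disconnect event, you would
probably want to pass a value somewhat smaller than the real session
timeout to stay on the safe side.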
The controller will change all the relevant data structures in ZK when the
node goes down (session expires). There is no need for any extra work here.
On Tue, Jan 2, 2018 at 7:03 PM, Bo Liu <[EMAIL PROTECTED]> wrote: