1. I had a standalone job cluster on k8s with
env.getCheckpointConfig().setFailOnCheckpointingErrors(true) // the default.
The job failed on a checkpoint, and I would have expected that under HA the
job would restore from the last checkpoint, but it did not. (The UI showed
the job had restarted without a restore.) The state was wiped out and the
job was relaunched, but with no state.
2. I still had the in-progress files from that failed instance, which is
consistent with no restored state.
This raises a few questions:
1. On k8s with a standalone job cluster, have we tested the scenario of the
*container* failing (with the pod remaining intact) followed by a restore?
In this case the pod remained up and running, but it was definitely a clean
relaunch of the container the pod was executing.
2. Was I missing any configuration, given the below?
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(30 * 60000); // checkpoint every 30 minutes
StateBackend stateBackEnd = new FsStateBackend(
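For reference, here is a minimal sketch of the full setup I was running, with one addition I suspect may be the missing piece: retaining externalized checkpoints so a relaunched container has something to restore from. The enableExternalizedCheckpoints call and the checkpoint path "hdfs:///checkpoints" are my assumptions, not part of my original job (the real path was elided above).

```java
import org.apache.flink.runtime.state.StateBackend;
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointSetupSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint every 30 minutes, as in my job.
        env.enableCheckpointing(30 * 60000);

        // Fail the job when a checkpoint fails (the default).
        env.getCheckpointConfig().setFailOnCheckpointingErrors(true);

        // Assumption: retain checkpoints on cancellation/failure so a clean
        // container relaunch can still find the last checkpoint. Without
        // this, checkpoints may be cleaned up and the job starts empty.
        env.getCheckpointConfig().enableExternalizedCheckpoints(
                CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

        // "hdfs:///checkpoints" is a placeholder path for illustration only.
        StateBackend stateBackend = new FsStateBackend("hdfs:///checkpoints");
        env.setStateBackend(stateBackend);
    }
}
```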
3. What is the nature of the RollingFileSink? How does it enable
exactly-once semantics (or does it not)?
Any help will be appreciated.
On Mon, Feb 11, 2019 at 5:00 AM Fabian Hueske <[EMAIL PROTECTED]> wrote: