Subject: flatMapGroupsWithState not timing out (spark 2.2.1)


Aah okay!

How are you testing whether there is a timeout? The situation that would
lead to the *EventTimeTimeout* is the following.
1. Send a bunch of data to group1, to set the timeout timestamp using
event time.
2. Then send more data to group2 only, to advance the watermark (since it's
based on event time across all the groups), and see whether the timeout
occurs for group1.
Note that you have to keep sending some data to other groups so that
microbatches are triggered continuously and the watermark is recalculated.
If you send a bunch of data and then stop sending and just wait, the
watermark will not advance (as there is no new data to recalculate it from)
and therefore may never hit the condition watermark > timeout timestamp.
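
To make the event-time case concrete, here is a minimal sketch of how such
a query could be wired up. It is not taken from your job: the Event and
GroupInfo types, the socket source on localhost:9999, the "group,epochMillis"
line format, the 5 second watermark delay and the 10 second timeout are all
assumptions chosen for illustration, and very late events are not handled.

```scala
import java.sql.Timestamp
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

// Hypothetical record types, for illustration only.
case class Event(group: String, eventTime: Timestamp)
case class GroupInfo(group: String, count: Long, timedOut: Boolean)

object EventTimeTimeoutSketch {
  // Invoked per group in a microbatch, either with new data or on timeout.
  def updateGroup(
      group: String,
      events: Iterator[Event],
      state: GroupState[Long]): Iterator[GroupInfo] = {
    if (state.hasTimedOut) {
      // Only reached in a microbatch where watermark > timeout timestamp.
      val count = state.get
      state.remove()
      Iterator(GroupInfo(group, count, timedOut = true))
    } else {
      val batch = events.toSeq
      val count = state.getOption.getOrElse(0L) + batch.size
      state.update(count)
      // Time out 10 seconds past the latest event time seen in this batch.
      val maxEventTimeMs = batch.map(_.eventTime.getTime).max
      state.setTimeoutTimestamp(maxEventTimeMs, "10 seconds")
      Iterator(GroupInfo(group, count, timedOut = false))
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("EventTimeTimeoutSketch").getOrCreate()
    import spark.implicits._

    // Assumed input: socket lines of the form "group,epochMillis".
    val events = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()
      .as[String]
      .map { line =>
        val Array(group, millis) = line.split(",")
        Event(group.trim, new Timestamp(millis.trim.toLong))
      }

    val updates = events
      .withWatermark("eventTime", "5 seconds") // required for EventTimeTimeout
      .groupByKey(_.group)
      .flatMapGroupsWithState[Long, GroupInfo](
        OutputMode.Update, GroupStateTimeout.EventTimeTimeout)(updateGroup)

    updates.writeStream
      .outputMode("update")
      .format("console")
      .start()
      .awaitTermination()
  }
}
```

With something like this running, sending a few group1 lines and then only
group2 lines with later timestamps should eventually produce a group1 row
with timedOut = true. Since the watermark is recalculated from data that has
already been processed, it may take an extra microbatch or two of group2
data before the timeout shows up.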

For *ProcessingTimeTimeout* the situation is different. That should
rely solely on the wall-clock time, not on any watermark.
In that case, you still have to keep sending data to continuously trigger
microbatches, as without any data there won't be any microbatches triggered
and therefore no timeouts will be processed. This is a known issue that we
will fix. It should work fine if you keep pushing data to group2; group1
should time out.
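
The processing-time variant differs only in how the timeout is set and in
the timeout configuration passed to the operator. A minimal sketch, again
with hypothetical names (Hit, GroupStatus, withTimeouts) and an arbitrary
10 second duration:

```scala
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

// Hypothetical record types, for illustration only.
case class Hit(group: String)
case class GroupStatus(group: String, count: Long, timedOut: Boolean)

object ProcessingTimeTimeoutSketch {
  def updateGroup(
      group: String,
      hits: Iterator[Hit],
      state: GroupState[Long]): Iterator[GroupStatus] = {
    if (state.hasTimedOut) {
      // Still only evaluated when a microbatch actually runs.
      val count = state.get
      state.remove()
      Iterator(GroupStatus(group, count, timedOut = true))
    } else {
      val count = state.getOption.getOrElse(0L) + hits.size
      state.update(count)
      // Wall-clock timeout; no watermark is involved.
      state.setTimeoutDuration("10 seconds")
      Iterator(GroupStatus(group, count, timedOut = false))
    }
  }

  // Applies the stateful operator to an already-parsed streaming Dataset.
  def withTimeouts(hits: Dataset[Hit]): Dataset[GroupStatus] = {
    import hits.sparkSession.implicits._
    hits
      .groupByKey(_.group)
      .flatMapGroupsWithState[Long, GroupStatus](
        OutputMode.Update, GroupStateTimeout.ProcessingTimeTimeout)(updateGroup)
  }
}
```

No withWatermark call is needed here, but as described above the timed-out
branch only runs when new data (for any group) triggers a microbatch.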

Did that make sense?

TD

On Fri, Jan 12, 2018 at 3:43 PM, daniel williams <[EMAIL PROTECTED]>
wrote: