Subject: flatMapGroupsWithState not timing out (spark 2.2.1)

Hello Dan,

From your code, it seems like you are setting the timeout timestamp based
on the current processing time (wall-clock time), while the watermark is
being calculated on the event time (the "when" column). The semantics of
EventTimeTimeout are that when the last set timeout timestamp of a group
becomes older than the watermark (which is calculated across all groups)
because that group did not get any new data for a while, the group times
out and the function is called with hasTimedOut set to true. However, in
this case, the timeout timestamp comes from a different source of time
(the wall clock) than the watermark (event time), so the two may not
correlate correctly. For example, if the event time in the test data is
always one hour behind the wall-clock time, the watermark will be at
least one hour older than the set timeout timestamp, and the group would
have to go without data for more than an hour before it times out.
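
To make the arithmetic concrete, here is a rough sketch in plain Scala
(no Spark involved) that steps through wall-clock minutes and reports the
first minute at which a silent group would time out. The object and
method names are made up for illustration, and the 10-minute timeout and
5-minute watermark delay are assumed values, not taken from your code. It
also assumes other groups keep sending data so the watermark keeps moving.

```scala
object TimeoutGapSketch {
  // All values are in minutes. The silent group received its last record at
  // wall-clock minute 0, so that record's event time is -eventLagMin.
  // Returns the first wall-clock minute at which the group's timeout
  // timestamp is strictly older than the watermark.
  def firstTimeoutMinute(timeoutTimestampMin: Long,
                         eventLagMin: Long,
                         watermarkDelayMin: Long): Long =
    Iterator.from(0).map(_.toLong).find { wallClockMin =>
      // Newest event time seen across all groups, minus the watermark delay.
      val watermarkMin = (wallClockMin - eventLagMin) - watermarkDelayMin
      timeoutTimestampMin < watermarkMin
    }.get

  def main(args: Array[String]): Unit = {
    val eventLag = 60L       // event time is one hour behind the wall clock
    val watermarkDelay = 5L  // assumed watermark delay
    val timeout = 10L        // assumed timeout duration

    // Timeout timestamp taken from the wall clock: 0 + 10.
    println(firstTimeoutMinute(0L + timeout, eventLag, watermarkDelay))        // 76
    // Timeout timestamp taken from the event time: -60 + 10.
    println(firstTimeoutMinute(-eventLag + timeout, eventLag, watermarkDelay)) // 16
  }
}
```

With the wall-clock-based timestamp, the one-hour lag is added on top of
the timeout and the watermark delay. With the event-time-based timestamp,
the lag cancels out.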

So I would check the gap between the event time in the data and the
wall-clock time that is being used to set the timeout timestamp, to
understand what is going on. Or even better, just use the event time in
the data to calculate the timeout timestamp and do not use processing
time anywhere.
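
Something along these lines is what I mean by using event time for the
timeout. This is only a sketch: the Event, SessionState and Result case
classes, the "id" field, the 10-minute timeout and the 5-minute watermark
delay are all assumptions for illustration, since I do not know the exact
shape of your data. Only the "when" column comes from your code.

```scala
import java.sql.Timestamp
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

// Hypothetical record and state types; only "when" is from the original code.
case class Event(id: String, when: Timestamp)
case class SessionState(count: Long, lastEventMs: Long)
case class Result(id: String, count: Long, timedOut: Boolean)

object EventTimeTimeoutSketch {
  def updateState(id: String,
                  events: Iterator[Event],
                  state: GroupState[SessionState]): Iterator[Result] = {
    if (state.hasTimedOut) {
      // The watermark has passed the timeout timestamp: emit and clear state.
      val finalState = state.get
      state.remove()
      Iterator(Result(id, finalState.count, timedOut = true))
    } else {
      val batch = events.toSeq
      val previous = state.getOption.getOrElse(SessionState(0L, 0L))
      // Track the newest event time seen for this group.
      val lastEventMs = math.max(previous.lastEventMs, batch.map(_.when.getTime).max)
      state.update(SessionState(previous.count + batch.size, lastEventMs))
      // Timeout is derived from event time, the same clock as the watermark.
      state.setTimeoutTimestamp(lastEventMs, "10 minutes")
      Iterator.empty
    }
  }

  def sessions(events: Dataset[Event]): Dataset[Result] = {
    import events.sparkSession.implicits._
    events
      .withWatermark("when", "5 minutes") // assumed delay
      .groupByKey(_.id)
      .flatMapGroupsWithState(OutputMode.Append(), GroupStateTimeout.EventTimeTimeout())(updateState _)
  }
}
```

Note that setTimeoutTimestamp rejects a timestamp that is not ahead of
the current watermark, so the added duration needs to be positive.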

Let me know how it goes.


On Fri, Jan 12, 2018 at 2:36 PM, daniel williams <[EMAIL PROTECTED]>