I had a spark streaming app that reads from kafka running for a few hours after which it failed with error
*17/02/07 20:04:10 ERROR JobScheduler: Error generating jobs for time 1486497850000 ms java.lang.IllegalStateException: No current assignment for partition mt_event-5 at org.apache.kafka.clients.consumer.internals.SubscriptionState.assignedState(SubscriptionState.java:251) at org.apache.kafka.clients.consumer.internals.SubscriptionState.needOffsetReset(SubscriptionState.java:315) at org.apache.kafka.clients.consumer.KafkaConsumer.seekToEnd(KafkaConsumer.java:1170) at org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.latestOffsets(DirectKafkaInputDStream.scala:179) at org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.compute(DirectKafkaInputDStream.scala:196) at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341) at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)*
17/02/07 20:04:10 ERROR ApplicationMaster: User class threw exception: java.lang.IllegalStateException: No current assignment for partition mt_event-5 java.lang.IllegalStateException: No current assignment for partition mt_event-5
17/02/07 20:05:00 WARN JobGenerator: Timed out while stopping the job generator (timeout = 50000) Driver did not recover from this error and failed. The previous batch ran 5sec back. There are no indications in the logs that some rebalance happened. As per kafka admin, kafka cluster health was good when this happened and no maintenance was being done.
Any idea what could have gone wrong and why this is a fatal error?
This is running in YARN cluster mode. It was restarted automatically and continued fine. I was trying to see what went wrong. AFAIK there were no task failure. Nothing in executor logs. The log I gave is in driver.
After some digging, I did see that there was a rebalance in kafka logs around this time. So will driver fail and exit in such cases? I've seen drivers exit after a job has hit max retry attempts. This is different though rt?
Srikanth On Tue, Feb 7, 2017 at 5:25 PM, Tathagata Das <[EMAIL PROTECTED]> wrote: