I have a question regarding the ordering of consumed messages. We timestamp our messages and send them into Kafka in order. I wrote a simple consumer that consumes the messages and prints out each timestamp. I see messages from all seven days' worth of data being consumed at once.
The consumer has 10 threads; it simply connects, consumes, and prints timestamps. It is set to the "smallest" offset so that it reads from the beginning. There are many millions of messages, so I think I can rule out some partitions not having messages for certain days as the cause. I know that Kafka doesn't guarantee ordering across partitions, but I would expect that with this volume of messages I would see the timestamps for the first day, followed by the second day, and so on. Instead I see them all print at once.
I think we figured this out. It looks like the order in which partitions are consumed is wildly unpredictable. We see a single partition being consumed almost halfway through before the consumer switches to another partition. Since ordering only holds within a partition, this causes us to read messages from a range of dates out of order.
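To illustrate the effect described above, here is a small simulation in plain Python. It does not use Kafka at all: the partition layout (round-robin), the chunk size, and the helper names are all assumptions made up for the sketch. It only shows that each partition can be perfectly ordered while the consumed stream is not, and that a merge by timestamp would restore the order.

```python
import heapq
from itertools import islice

def make_partitions(num_partitions, messages_per_partition):
    """Hypothetical data: integer timestamps assigned round-robin to partitions."""
    total = num_partitions * messages_per_partition
    partitions = [[] for _ in range(num_partitions)]
    for ts in range(total):
        # Messages are "sent" in timestamp order, so each partition stays sorted.
        partitions[ts % num_partitions].append(ts)
    return partitions

def consume_in_chunks(partitions, chunk_size):
    """Read up to chunk_size messages from one partition before moving to the next."""
    iterators = [iter(p) for p in partitions]
    consumed = []
    while iterators:
        still_open = []
        for it in iterators:
            chunk = list(islice(it, chunk_size))
            if chunk:
                consumed.extend(chunk)
                still_open.append(it)
        iterators = still_open
    return consumed

def is_sorted(seq):
    return all(a <= b for a, b in zip(seq, seq[1:]))

partitions = make_partitions(4, 6)
consumed = consume_in_chunks(partitions, 3)

# Merging the already-sorted partitions by timestamp restores the global order.
merged = list(heapq.merge(*partitions))

print("each partition sorted:", all(is_sorted(p) for p in partitions))
print("consumed stream sorted:", is_sorted(consumed))
print("merged stream sorted:", is_sorted(merged))
```

In a real consumer a merge like this would need buffering across partitions, so treat it as an illustration of the idea rather than a recipe.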
Interesting at least. Thanks for your help.

It may be hard to reason about ordering across 1400 partitions. Could you use the SimpleConsumerShell to consume messages from 1 partition and see if messages are ordered?
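For the single-partition check, something like the following could be run over the timestamps once they have been pulled out of one partition's output. This is only a sketch: how the timestamps are extracted depends on your message format and is not shown, and `first_out_of_order` is a hypothetical helper name.

```python
def first_out_of_order(timestamps):
    """Return the index of the first timestamp smaller than its predecessor, or None."""
    previous = None
    for index, ts in enumerate(timestamps):
        if previous is not None and ts < previous:
            return index
        previous = ts
    return None

# Example with made-up timestamps from a single partition.
sample = [1395300000, 1395300005, 1395300005, 1395300010]
print(first_out_of_order(sample))
```

A result of None means the partition was read in non-decreasing timestamp order; any index points at the first message that arrived out of order.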
On Fri, Mar 21, 2014 at 1:04 AM, Tom Amon <[EMAIL PROTECTED]> wrote: