Make seek(); commit(); work without commit discarding the seek change #148
Once upon a time I was moving between Kafka 0.8.0 and Kafka 0.8.1 rather freely. As such, I forked kafka-python. In particular, I've enabled offset management via the Kafka API, added a bunch of performance tweaks, and in general made it work. The results are visible here: https://github.com/wizzat/kafka-python/tree/easykafka.
However, during a load test we ran Kafka out of disk space. In response to this, we set the TTL on the cluster to about an hour, which deleted all the data out of the cluster. The offsets the application had stored in ZooKeeper (via the Kafka server) were now wrong - and worse, pointed at data that no longer existed. This caused the application to raise lots of OFFSET_OUT_OF_RANGE errors.
In response to these OFFSET_OUT_OF_RANGE errors, I devised a brilliant workaround. At the start of the application, I would advance the topic offsets something like this:
As it turns out, however, no data actually went through the application, and thus the commit was lost due to the following line:
I thought about splitting commit() into two methods, with commit() delegating to force_commit(); however, it seemed a smaller change to make seek() count toward count_since_commit. Thus, I propose this patch.
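To make the intended behaviour concrete, here is a minimal, self-contained sketch. It is not kafka-python's code: `SeekAwareConsumer` and its attributes are hypothetical stand-ins, and the only thing borrowed from the description above is the idea that commit() skips its work when `count_since_commit` is zero.

```python
class SeekAwareConsumer:
    """Toy model of a consumer (hypothetical, not the real kafka-python class)."""

    def __init__(self, start_offset=0):
        self.offset = start_offset    # next offset to read
        self.committed = None         # last offset stored on the server
        self.count_since_commit = 0   # changes made since the last commit

    def consume(self, n=1):
        # Reading messages advances the offset and counts as a change.
        self.offset += n
        self.count_since_commit += n

    def seek(self, offset):
        self.offset = offset
        # Proposed behaviour: a seek also counts as a change worth committing.
        self.count_since_commit += 1

    def commit(self):
        # Short-circuit when nothing has changed since the last commit.
        if self.count_since_commit == 0:
            return False
        self.committed = self.offset
        self.count_since_commit = 0
        return True


consumer = SeekAwareConsumer()
consumer.seek(500)
committed_after_seek = consumer.commit()  # no messages consumed, only a seek
```

With the increment in seek(), the commit goes through even though no messages were consumed; without it, commit() would return early and the new offset would never be stored.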
Let me know if you'd prefer to go with commit() and force_commit(), or perhaps commit(partitions=None, force=False).
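For comparison, the `commit(partitions=None, force=False)` alternative might look roughly like the sketch below. Again, this is a toy model under stated assumptions, not the library's implementation: `ForceCommitConsumer`, its per-partition dictionaries, and the boolean return value are all invented for illustration.

```python
class ForceCommitConsumer:
    """Toy model of the force-flag alternative (hypothetical class)."""

    def __init__(self, offsets):
        self.offsets = dict(offsets)  # partition -> current offset
        self.committed = {}           # partition -> last committed offset
        self.count_since_commit = 0

    def seek(self, partition, offset):
        # In this variant seek() leaves count_since_commit untouched.
        self.offsets[partition] = offset

    def commit(self, partitions=None, force=False):
        # Skip the commit only when nothing changed and the caller did not force it.
        if not force and self.count_since_commit == 0:
            return False
        if partitions is None:
            partitions = list(self.offsets)
        for partition in partitions:
            self.committed[partition] = self.offsets[partition]
        self.count_since_commit = 0
        return True


consumer = ForceCommitConsumer({0: 0, 1: 0})
consumer.seek(0, 500)
plain = consumer.commit()                              # skipped: counter is zero
forced = consumer.commit(partitions=[0], force=True)   # stored despite zero counter
```

The trade-off is that callers have to remember to pass `force=True` after a seek, whereas counting the seek as a change keeps `seek(); commit()` working with no API change.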