Sample code pulled from one of our internal applications:
```python
# kafka_producer is configured with:
#   "key_serializer": json.dumps,
#   "value_serializer": json.dumps,
key = None  # None produces round-robin
if Const.FIELD_USER in message:
    key = message[Const.FIELD_USER]
kafka_producer.send(topic, key=key, value=message)
```
Unsurprisingly, using `json.dumps` will serialize `key=None` to `'null'`.
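A quick check in plain Python (no Kafka involved) shows the conversion the serializer performs on a `None` key:

```python
import json

# json.dumps turns Python's None into the JSON literal null, returned as a str
serialized_key = json.dumps(None)

print(repr(serialized_key))    # 'null'
print(serialized_key is None)  # False: the key is no longer None
```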
Surprisingly, this results in `key=None` behaving as if it were a keyed message: it is always sent to a single partition rather than being distributed round-robin.
This is because the serialization layer runs before the partitioning logic, so by the time https://github.com/dpkp/kafka-python/blob/1.4.4/kafka/partitioner/default.py#L24 is hit, the key is already the string `'null'`.
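The ordering problem can be reproduced with a simplified, self-contained model (a sketch, not kafka-python's actual code). `toy_partition` and `toy_send` are hypothetical names; `zlib.crc32` stands in for the murmur2 hash the real default partitioner uses, and a `None` key picks a random partition in this model. The point is only the order of operations: serialize first, then partition.

```python
import json
import random
import zlib
from typing import Callable, Optional


def toy_partition(key: Optional[bytes], num_partitions: int) -> int:
    """Hypothetical stand-in for a default partitioner.

    A None key may go to any partition; any other key is hashed, so the same
    key always lands on the same partition. crc32 replaces murmur2 here.
    """
    if key is None:
        return random.randrange(num_partitions)
    return zlib.crc32(key) % num_partitions


def toy_send(key, serializer: Optional[Callable], num_partitions: int = 6) -> int:
    # Same ordering as described above: the key is serialized BEFORE partitioning.
    key_bytes = serializer(key) if serializer else key
    return toy_partition(key_bytes, num_partitions)


def json_bytes(obj) -> bytes:
    # Mimics a json.dumps-based serializer, encoded to bytes.
    return json.dumps(obj).encode("utf-8")


random.seed(0)  # reproducible run

# No serializer: the None key reaches the partitioner and is spread out.
unserialized = {toy_send(None, None) for _ in range(200)}
# JSON serializer: None becomes b"null", which always hashes to one partition.
serialized = {toy_send(None, json_bytes) for _ in range(200)}

print(sorted(unserialized))
print(sorted(serialized))
```

With the JSON serializer in place, every "unkeyed" message collapses onto the single partition that `b"null"` hashes to.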
I found this extremely surprising. At a minimum, we need to call this out in the docs.
Alternatively, we could offer default helpers that handle null keys/values (for deleting messages in compacted topics) in a less surprising way.
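One possible shape for such a helper, as a sketch: `null_safe_json_serializer` is a hypothetical name, not part of kafka-python. It assumes the producer hands `None` straight to the serializer callable (which is how the `'null'` key above arises) and returns bytes, since kafka-python documents serializers as converting keys and values to bytes. Passing `None` through unchanged keeps an unkeyed message unkeyed and lets a `None` value remain a real null (tombstone) for compacted topics, instead of the bytes `b"null"`.

```python
import json
from typing import Any, Optional


def null_safe_json_serializer(obj: Any) -> Optional[bytes]:
    """Hypothetical helper: JSON-encode obj, but pass None through untouched."""
    if obj is None:
        # Leave None alone so the partitioner still sees an unkeyed message
        # and a None value stays a genuine null/tombstone.
        return None
    return json.dumps(obj).encode("utf-8")


# Possible usage with kafka-python (not executed here; needs a running broker):
# kafka_producer = KafkaProducer(
#     key_serializer=null_safe_json_serializer,
#     value_serializer=null_safe_json_serializer,
# )

print(null_safe_json_serializer(None))        # None
print(null_safe_json_serializer({"a": 1}))    # b'{"a": 1}'
```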
Related: #913.