Description
I'm getting roughly 10x lower producer throughput on kafka-python 1.4.1 than on 1.3.5: about 2.5 MB/s for the former and about 25 MB/s for the latter.
The setup is very simple: one broker (Kafka 1.0.0), one producer, no consumers. The producer records the time, then in a small loop send()s 250 messages of 40,000 bytes each (10 MB total) to a topic with a single partition. Afterwards, flush() is called and the total time and rate are calculated. (This measurement is repeated in a loop, and the results are stable over time.) Whether produced locally or from a second computer over a 10 Gbit/s link, this takes about 0.4 sec with kafka-python 1.3.4 or 1.3.5 (I haven't tested earlier versions), and about 4 sec with kafka-python 1.4.0 or 1.4.1. To switch between these results I only need to uninstall one kafka-python version and install a different one.
For this, I've set acks=1, compression=none, plaintext. The producer is running under Python 2.7. The config files on the brokers are very nearly the standard ones distributed with Kafka. The topic has no overrides. This has been tested on two machines running Debian as well as one machine running SL7.
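A minimal sketch of the measurement described above, in case it helps with reproducing it. This is a reconstruction, not the exact script: the function names, the broker address `localhost:9092`, the topic name `throughput-test`, and the number of rounds are placeholders. The producer settings are assumed to map onto the `KafkaProducer` keyword arguments `acks=1`, `compression_type=None`, and `security_protocol="PLAINTEXT"`.

```python
import time

MESSAGE_SIZE = 40000   # bytes per message, as in the report
MESSAGE_COUNT = 250    # messages per measurement (10 MB total)


def measure_throughput(producer, topic, count=MESSAGE_COUNT, size=MESSAGE_SIZE):
    """Send `count` messages of `size` bytes, flush, and return (seconds, MB/s)."""
    payload = b"x" * size
    start = time.time()  # time.time() is available on both Python 2.7 and 3
    for _ in range(count):
        producer.send(topic, payload)
    producer.flush()  # block until all buffered messages have been sent
    elapsed = time.time() - start
    megabytes = count * size / 1e6
    # Guard against a zero interval when the producer returns almost instantly.
    rate = megabytes / elapsed if elapsed > 0 else float("inf")
    return elapsed, rate


def run_benchmark(bootstrap_servers="localhost:9092",  # placeholder address
                  topic="throughput-test",             # placeholder topic name
                  rounds=5):                           # placeholder repeat count
    # Imported here so measure_throughput() works without kafka-python installed.
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=bootstrap_servers,
        acks=1,
        compression_type=None,
        security_protocol="PLAINTEXT",
    )
    try:
        for i in range(rounds):
            elapsed, rate = measure_throughput(producer, topic)
            print("round %d: %.2f s, %.1f MB/s" % (i, elapsed, rate))
    finally:
        producer.close()
```

Calling `run_benchmark()` against a running broker, once per installed kafka-python version, should be enough to compare the two rates. Nothing runs on import, so the timing function can also be exercised on its own with any object that provides `send()` and `flush()`.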
Let me know if there's any further information I can provide. (I'll be the first to admit I'm a bit of a newbie in both the Kafka universe and Python, but I'll do my best.) :)