Description
We have been using kafka-python to interact with our Kafka brokers since 2020.
However, during our most recent Kafka broker upgrade (3.3 => 3.5), we saw spikes in CPU utilization on the broker nodes. The same happens when a hardware failure in the Kafka cluster forces a broker to be replaced with another node.
Restarting the consumers in our Python apps mitigates the issue. Aiven told us this is a known issue with apps that rely on kafka-python and recommended switching to https://github.com/confluentinc/confluent-kafka-python.
Here is their full response:
We also noticed that the impacted consumers are those based on the client library kafka-python-2.0.2, which was last updated 3 years ago and is based on an older version of the Kafka protocol. In Kafka 3.3.1, we are aware that this older protocol has become less efficient, which generally leads to more latency and more consumer rebalances. The short-term solution is to relax your configuration as follows:
- Increase the broker value of group.initial.rebalance.delay.ms (default: 3s).
- Increase the consumer value of session.timeout.ms (default: 45s).
- Make sure that the consumer value of heartbeat.interval.ms (default: 3s) is lower than session.timeout.ms and no higher than 1/3 of that value.
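For reference, here is a minimal sketch of how the two consumer-side suggestions could be applied with kafka-python, whose KafkaConsumer takes them as the keyword arguments session_timeout_ms and heartbeat_interval_ms. The helper name relaxed_consumer_settings is hypothetical, and the 45000/3000 defaults simply mirror the values quoted above; as far as we can tell from its docs, kafka-python 2.0.2's own default for session_timeout_ms is 10000 ms, not 45 s. group.initial.rebalance.delay.ms is a broker setting and cannot be set from the client.

```python
# Hypothetical helper: builds kafka-python KafkaConsumer keyword arguments
# that follow the relaxed timeouts suggested in Aiven's response.
# Note: group.initial.rebalance.delay.ms is broker-side and is NOT covered here.


def relaxed_consumer_settings(session_timeout_ms=45000, heartbeat_interval_ms=3000):
    """Return session/heartbeat kwargs for kafka.KafkaConsumer.

    Enforces the rule quoted above: the heartbeat interval must be no higher
    than 1/3 of the session timeout (which also keeps it strictly lower).
    """
    if session_timeout_ms <= 0 or heartbeat_interval_ms <= 0:
        raise ValueError("timeouts must be positive")
    # Compare with multiplication to avoid integer-division rounding issues.
    if heartbeat_interval_ms * 3 > session_timeout_ms:
        raise ValueError(
            f"heartbeat_interval_ms={heartbeat_interval_ms} exceeds 1/3 of "
            f"session_timeout_ms={session_timeout_ms}"
        )
    return {
        "session_timeout_ms": session_timeout_ms,
        "heartbeat_interval_ms": heartbeat_interval_ms,
    }
```

Usage would look like `KafkaConsumer("my-topic", bootstrap_servers="broker:9092", group_id="my-group", **relaxed_consumer_settings(session_timeout_ms=60000))`, where the topic, broker address and group name are placeholders. Validating the ratio up front turns a misconfiguration into an immediate error instead of unexplained rebalances later.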
The long-term solution is to migrate these consumers to the client library confluent-kafka-python 2.2.0, which offers a similar SDK and is up to date.
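If we do migrate, one practical difference is that confluent_kafka.Consumer takes a single dict of librdkafka-style dotted keys instead of keyword arguments. Below is a rough sketch, under our own assumptions, of translating a handful of kafka-python kwargs and running a basic poll loop. The mapping table, to_confluent_config and consume_forever are hypothetical names covering only keys we believe exist in both clients; anything else (deserializers, SSL file options, etc.) would need manual handling and should be checked against the librdkafka configuration reference.

```python
# Hypothetical translation table: kafka-python KafkaConsumer kwargs ->
# librdkafka config keys used by confluent_kafka.Consumer. Deliberately partial.
KAFKA_PYTHON_TO_LIBRDKAFKA = {
    "bootstrap_servers": "bootstrap.servers",
    "group_id": "group.id",
    "client_id": "client.id",
    "auto_offset_reset": "auto.offset.reset",
    "enable_auto_commit": "enable.auto.commit",
    "session_timeout_ms": "session.timeout.ms",
    "heartbeat_interval_ms": "heartbeat.interval.ms",
    "max_poll_interval_ms": "max.poll.interval.ms",
}


def to_confluent_config(kafka_python_kwargs):
    """Convert a dict of kafka-python kwargs into a confluent-kafka config dict."""
    config = {}
    for key, value in kafka_python_kwargs.items():
        if key not in KAFKA_PYTHON_TO_LIBRDKAFKA:
            raise ValueError(f"no mapping defined for {key!r}")
        if key == "bootstrap_servers" and not isinstance(value, str):
            # kafka-python accepts a list; librdkafka expects "host1:port,host2:port".
            value = ",".join(value)
        config[KAFKA_PYTHON_TO_LIBRDKAFKA[key]] = value
    return config


def consume_forever(config, topics, handle):
    """Minimal poll loop with confluent_kafka.Consumer (needs a reachable broker)."""
    # Imported lazily so the translation helper above works without the package.
    from confluent_kafka import Consumer

    consumer = Consumer(config)
    consumer.subscribe(topics)
    try:
        while True:
            msg = consumer.poll(1.0)  # wait up to 1 second for a message
            if msg is None:
                continue
            if msg.error():
                # Sketch only: a real app would branch on msg.error().code().
                print(f"consumer error: {msg.error()}")
                continue
            handle(msg.value())
    finally:
        # Leave the group cleanly so the broker can rebalance promptly.
        consumer.close()
```

With this, existing settings such as `{"bootstrap_servers": ["a:9092", "b:9092"], "group_id": "my-group", "session_timeout_ms": 45000}` (placeholder values) could be passed through `to_confluent_config` before calling `consume_forever`. Failing loudly on unmapped keys is intentional, so that no kafka-python option is silently dropped during the switch.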
Is there any development in progress to address this issue?
If not, we will have to switch to another Kafka library.