Describe the bug
When a non-leader pod is shut down, leader-only jobs such as telemetry reporting and status updating are started during the shutdown, even though the pod never held the leader lease.
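For context, controller-runtime separates runnables into a leader-election group and a non-leader group via a NeedLeaderElection() method. The sketch below is a minimal, self-contained illustration of that contract (the interface shape and the startRunnables helper are assumptions for illustration, not code from NGF or controller-runtime); the bug is that during a non-leader pod's shutdown the leader group gets started anyway:

```go
package main

import "fmt"

// LeaderElectionRunnable mirrors the contract controller-runtime uses to
// decide whether a runnable belongs to the leader-election group
// (assumed shape; the real interface lives in
// sigs.k8s.io/controller-runtime/pkg/manager).
type LeaderElectionRunnable interface {
	NeedLeaderElection() bool
}

// telemetryJob stands in for a leader-only job like NGF's telemetry
// reporter or status updater.
type telemetryJob struct{}

func (telemetryJob) NeedLeaderElection() bool { return true }

// startRunnables models the expected behavior: runnables that require
// leadership must be skipped on a pod that never became leader.
// It returns, per runnable, whether it was started.
func startRunnables(isLeader bool, rs []LeaderElectionRunnable) []bool {
	started := make([]bool, len(rs))
	for i, r := range rs {
		if r.NeedLeaderElection() && !isLeader {
			continue // non-leader pod: leader-only jobs stay stopped
		}
		started[i] = true
	}
	return started
}

func main() {
	// On a non-leader pod, the leader-only job must not start.
	fmt.Println(startRunnables(false, []LeaderElectionRunnable{telemetryJob{}})) // prints [false]
}
```

The logs below show the opposite happening: the "leader election runnables" group is stopped (and therefore was started) on a pod that was never the leader.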
To Reproduce
- Deploy NGF with multiple replicas.
- Watch the logs of a non-leader pod (kubectl logs -f).
- Shut the pod down with kubectl delete pod.
- See errors like the ones below in the logs:
{"level":"info","ts":"2024-03-20T19:28:58Z","msg":"Stopping and waiting for non leader election runnables"}
{"level":"info","ts":"2024-03-20T19:28:58Z","msg":"Shutdown signal received, waiting for all workers to finish","controller":"secret","controllerGroup":"","controllerKind":"Secret"}
. . .
{"level":"info","ts":"2024-03-20T19:28:58Z","msg":"Stopping and waiting for leader election runnables"}
(the two lines below correspond to the status updater, which should not have been started)
{"level":"info","ts":"2024-03-20T19:28:58Z","logger":"statusUpdater","msg":"Writing last statuses"}
{"level":"info","ts":"2024-03-20T19:28:58Z","logger":"statusUpdater","msg":"Updating Gateway API statuses"}
. . .
(the two lines below correspond to the telemetry reporter, which should not have been started)
{"level":"info","ts":"2024-03-20T19:28:58Z","logger":"telemetryJob","msg":"Starting cronjob"}
{"level":"error","ts":"2024-03-20T19:28:58Z","logger":"telemetryJob","msg":"Failed to collect telemetry data"," ...
. . .
{"level":"info","ts":"2024-03-20T19:28:58Z","logger":"telemetryJob","msg":"Stopping cronjob"}
. . .
{"level":"info","ts":"2024-03-20T19:28:58Z","msg":"Wait completed, proceeding to shutdown the manager"}
Expected behavior
Leader jobs should not start during the shutdown of a non-leader pod.
Your environment
NGF: edge, commit 5b13734