
[flagd] improve error logging for reconnect scenarios #1010

Closed
@aepfli

Description

Our flagd in-process provider is built with resilience in mind. When the connection is lost, it tries to reconnect, which is excellent and the right thing to do in a distributed world, because we want to keep serving flags properly. The problem is that every reconnect attempt logs an error. But is it really an error if we recover from that state? IMHO, it only becomes an error once we reach our maximum reconnect delay.

I propose changing the log level here to info and additionally logging an error once we reach the maximum delay.

    } else {
        // Current behavior: every failed attempt is logged at error level,
        // including attempts that are later followed by a successful reconnect.
        log.error(String.format("Error initializing stream or metadata, retrying in %dms",
                retryDelay), response.getError());
        if (!writeTo.offer(
                new QueuePayload(QueuePayloadType.ERROR, "Error from stream or metadata",
                        metadataResponse))) {
            log.error("Failed to convey ERROR status, queue is full");
        }
    }

This way, we handle reconnects gracefully but still learn about connection issues promptly when they turn into a real error.
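
A minimal sketch of the proposed rule, assuming the retry delay doubles after each failure and is capped at a maximum (the issue only confirms that a maximum reconnect delay exists). `ReconnectLogPolicy`, its `LogLevel` enum, and the doubling backoff are hypothetical, not the provider's actual API; the returned level would decide between `log.info(...)` and `log.error(...)` in the `else` branch above.

```java
// Hypothetical sketch, not the flagd provider's actual code.
// Assumption: the reconnect delay doubles after each failure, capped at maxDelayMs.
final class ReconnectLogPolicy {
    enum LogLevel { INFO, ERROR }

    private final long maxDelayMs;

    ReconnectLogPolicy(long maxDelayMs) {
        this.maxDelayMs = maxDelayMs;
    }

    // Retries below the cap are expected and usually recover: log at INFO.
    // Once the delay has reached the cap, the outage is persistent: log at ERROR.
    LogLevel levelFor(long retryDelayMs) {
        return retryDelayMs >= maxDelayMs ? LogLevel.ERROR : LogLevel.INFO;
    }

    // Next delay: double the current one without exceeding maxDelayMs
    // (compared against maxDelayMs / 2 so the multiplication cannot overflow).
    long nextDelay(long retryDelayMs) {
        return retryDelayMs > maxDelayMs / 2 ? maxDelayMs : retryDelayMs * 2;
    }
}
```

Comparing with `>=` means the first attempt that reaches the cap is already logged as an error, so a persistent outage shows up without waiting for yet another retry.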

Optionally, we could also log an error immediately when the initial connection attempts fail. In that case, we probably don't want to wait until the maximum delay is reached.
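
The optional first-connection rule can be sketched by remembering whether the provider has ever connected. This assumes a hook that runs after a successful connection; `InitialConnectionTracker`, `onConnected()`, and `escalateImmediately()` are hypothetical names. The flag is `volatile` on the assumption that connection callbacks and the retry loop may run on different threads.

```java
// Hypothetical sketch: decides whether a failed attempt should be logged as an
// error right away, instead of waiting for the backoff to reach its maximum.
final class InitialConnectionTracker {
    // volatile: written by the connection callback, read by the retry loop,
    // which may run on different threads (assumption).
    private volatile boolean everConnected = false;

    // Call once a stream (and its metadata) has been established successfully.
    void onConnected() {
        everConnected = true;
    }

    // Before the first successful connection there is nothing to recover to,
    // so a failure at this point is reported as an error immediately.
    boolean escalateImmediately() {
        return !everConnected;
    }
}
```

A failure would then be logged as an error when `escalateImmediately()` returns true or the delay has reached its maximum, and at info otherwise.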

Goals

  • Reduce normal reconnection logs to info or warn
  • Use separate logs with dedicated messages for metadata and stream errors
  • Log an error once we reach the maximum delay
  • Optional: log an error immediately if the initial connection fails
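
The goals above can be sketched as two small helpers. This assumes the call site knows whether the failure came from the stream or the metadata request, the current and maximum retry delay, and whether this is the initial connection; `ReconnectLogMessages`, `FailureSource`, and the message texts are hypothetical, not existing flagd identifiers. Choosing between `log.info(...)` and `log.error(...)` based on `level(...)` would replace the single `log.error(...)` call quoted in the issue.

```java
// Hypothetical sketch combining the goals: dedicated messages per failure source,
// INFO for routine retries, ERROR at the maximum delay or on the initial connection.
final class ReconnectLogMessages {
    enum FailureSource { STREAM, METADATA }
    enum LogLevel { INFO, ERROR }

    // Dedicated message per failure source, so stream and metadata problems
    // can be told apart in the logs.
    static String message(FailureSource source, long retryDelayMs) {
        String what = source == FailureSource.STREAM
                ? "Error in flag sync stream"
                : "Error fetching sync metadata";
        return String.format("%s, retrying in %dms", what, retryDelayMs);
    }

    // INFO for routine retries; ERROR on the initial connection or once the
    // retry delay has reached its maximum.
    static LogLevel level(long retryDelayMs, long maxDelayMs, boolean initialConnection) {
        return initialConnection || retryDelayMs >= maxDelayMs
                ? LogLevel.ERROR
                : LogLevel.INFO;
    }
}
```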

Metadata

Labels

enhancement (New feature or request)
