Description
Is your feature request related to a problem? Please describe.
My colleague chatted with you on Discord about this. Our main use case is disparate clusters, each behind a load balancer. We want to use one of those clusters as the primary location and only swap to the other cluster when the primary is completely down or unavailable. We think this would be easily achievable if externally pluggable logic, supplied when setting up the ConnectionFactory, were consulted a little more during connection attempts.
Describe the solution you'd like
Ultimately, something like the RetryListener, but for Connections and not just for Topology. Alternatively, it could be done with lambdas such as Predicates that are also given the connection that failed. A connection retry count could also be passed in to help make judgments.
We envision setting cluster tags in our servers that inform the client about which cluster they're connected to, and perhaps additionally a cluster tag to indicate that the address used was behind a load balancer. So, we could check to see if the server tags indicate a load balancer address, combined with the reason the connection was shut down.
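To make the tag idea a bit more concrete, here is a minimal sketch of such a check. Everything in it is an assumption made for illustration: the tag key `behind_load_balancer`, the `FailoverDecision` class, and the idea that the client can tell an expected shutdown from an unexpected one are all hypothetical and not part of any existing API.

```java
import java.util.Map;

// Hypothetical helper: decides whether the next attempt should prefer the secondary cluster.
final class FailoverDecision {
    // Hypothetical tag key that our servers would advertise; not something the server defines today.
    static final String LB_TAG = "behind_load_balancer";

    private FailoverDecision() {
    }

    static boolean shouldFailOver(Map<String, String> serverTags,
                                  boolean shutdownWasExpected,
                                  int retryCount,
                                  int maxRetries) {
        // An unexpected shutdown on a load balancer address suggests the whole
        // cluster behind it may be unavailable, not just a single node.
        boolean viaLoadBalancer = "true".equals(serverTags.get(LB_TAG));
        boolean unexpectedLoss = viaLoadBalancer && !shutdownWasExpected;

        // Independently of the tags, stop insisting on the primary once the
        // retry count goes past the tolerated level.
        boolean tooManyRetries = retryCount > maxRetries;

        return unexpectedLoss || tooManyRetries;
    }
}
```

With this shape, a clean shutdown on a load balancer address stays on the primary, while an unexpected one, or too many retries, moves the client to the secondary.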
An easy way to plug this in might be an interface that returns an AddressResolver. The default implementation would return the current AddressResolver unconditionally, which would preserve current behavior.
So, maybe, all notional:
    public interface ConnectionRetryListener {
        AddressResolver onRetry(Connection failed, Exception cause, int retryCount);
    }

    /* somewhere in initialization */
    if (this.connectionRetryListener == null) {
        this.connectionRetryListener = (conn, cause, count) -> this.addressResolver;
    }
Then we could send a non-shuffling list of [secondary, primary] when there's an unexpected issue, or if the retry count goes higher than some tolerable level, and otherwise ask the system to attempt [primary, secondary] in standard scenarios. Or we could even skip sending primary and secondary together and let the new implementation determine whether the primary or secondary should be tried by itself, i.e. try primary three times, then try secondary three times, then give up.
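The "primary three times, secondary three times, then give up" idea could look roughly like the sketch below. It is notional and uses stand-in types so that it compiles on its own: `AddressSource` stands in for AddressResolver, `StagedRetryListener` is a simplified variant of the interface proposed above (the failed Connection parameter is left out), and the empty `Optional` as a "give up" signal is an assumption that the proposal above does not yet cover. The retry count is assumed to be zero-based.

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for the client's address-resolution hook.
interface AddressSource {
    List<String> getAddresses();
}

// Hypothetical variant of the proposed listener; an empty Optional means "stop retrying".
interface StagedRetryListener {
    Optional<AddressSource> onRetry(Exception cause, int retryCount);
}

final class PrimaryThenSecondary implements StagedRetryListener {
    private final String primary;
    private final String secondary;
    private final int attemptsPerCluster;

    PrimaryThenSecondary(String primary, String secondary, int attemptsPerCluster) {
        if (attemptsPerCluster < 1) {
            throw new IllegalArgumentException("attemptsPerCluster must be at least 1");
        }
        this.primary = primary;
        this.secondary = secondary;
        this.attemptsPerCluster = attemptsPerCluster;
    }

    @Override
    public Optional<AddressSource> onRetry(Exception cause, int retryCount) {
        // Assumption: retryCount is zero-based, so 0 is the first attempt.
        if (retryCount < attemptsPerCluster) {
            // First stage: only the primary cluster is offered.
            return Optional.of(() -> Collections.singletonList(primary));
        }
        if (retryCount < 2 * attemptsPerCluster) {
            // Second stage: only the secondary cluster is offered.
            return Optional.of(() -> Collections.singletonList(secondary));
        }
        // Both stages exhausted: signal that the client should give up.
        return Optional.empty();
    }
}
```

For example, `new PrimaryThenSecondary("primary-lb:5672", "secondary-lb:5672", 3)` would offer the primary for attempts 0 through 2, the secondary for attempts 3 through 5, and nothing after that. The host names are placeholders.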
I have not yet looked at the downstream impacts of wiring this through the existing code. First, we just want to hash out ideas and hear what you guys like / don't like. We're willing to do the legwork to contribute.
Describe alternatives you've considered
Currently we override AddressResolver to always return a fixed list and skip shuffling. This mostly works well, but there are edge cases where a client may cascade to the more distant cluster while its primary is still up.
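For reference, our current workaround has roughly the shape sketched below. The `ResolverLike` interface is a self-contained stand-in, not the client's real type: it assumes the resolver exposes a shuffle hook with a shuffling default, which we override to keep the order fixed. The names and the use of plain strings for addresses are illustrative only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the resolver interface, assumed to shuffle by default.
interface ResolverLike {
    List<String> getAddresses();

    default List<String> maybeShuffle(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        Collections.shuffle(copy);
        return copy;
    }
}

// Always returns the same addresses in the same order: primary first, then secondary.
final class FixedOrderResolver implements ResolverLike {
    private final List<String> ordered;

    FixedOrderResolver(List<String> ordered) {
        // Defensive copy so later changes to the caller's list cannot reorder ours.
        this.ordered = Collections.unmodifiableList(new ArrayList<>(ordered));
    }

    @Override
    public List<String> getAddresses() {
        return ordered;
    }

    @Override
    public List<String> maybeShuffle(List<String> input) {
        // Skip shuffling so the primary is always attempted before the secondary.
        return input;
    }
}
```

Because the list is static, the client walks [primary, secondary] on every attempt, which is how a single failed attempt against the primary can push a client to the secondary even though the primary is still up.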
Additional context
No response