-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Default reconnect strategy uses exponential backoff and jitter #2736
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Both are recommended parts of client reconnect strategies to prevent thundering herd problems when many clients lose their connection at once (for example, during a Redis upgrade).
Hi @qsymmachus, thanks for contributing! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good, just a few small comments and questions. You can ignore the nits and I'll do them myself.
docs/client-configuration.md
Outdated
@@ -13,7 +13,7 @@ | |||
| socket.keepAlive | `true` | Toggle [`keep-alive`](https://nodejs.org/api/net.html#socketsetkeepaliveenable-initialdelay) functionality | | |||
| socket.keepAliveInitialDelay | `5000` | If set to a positive number, it sets the initial delay before the first keepalive probe is sent on an idle socket | | |||
| socket.tls | | See explanation and examples [below](#TLS) | | |||
| socket.reconnectStrategy | `retries => Math.min(retries * 50, 500)` | A function containing the [Reconnect Strategy](#reconnect-strategy) logic | | |||
| socket.reconnectStrategy | `((retries^2) * 50 ms) + 0-200 ms` | A function containing the [Reconnect Strategy](#reconnect-strategy) logic | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the ^
operator is "Bitwise XOR" in JavaScript, not power.. we need to find a better way to document this..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with a plain English explanation of the strategy in 38eb603, rather than try to figure out how to pseudo-code this in a legible way.
packages/client/lib/client/socket.ts
Outdated
@@ -82,6 +82,15 @@ export default class RedisSocket extends EventEmitter { | |||
} | |||
|
|||
#createReconnectStrategy(options?: RedisSocketOptions): ReconnectStrategyFunction { | |||
const defaultStrategy = (retries: number) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be moved to a #defaultReconnectStrategy
function and used from lines 109 and 114
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in e7304ad 👍
@leibale I responded to your latest feedback; sorry for the delay! |
@leibale bumping this one! Thanks for taking another look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code LGTM, but I'm not too sure about the documentation.. @guyroyse can you help us with it please?
Docs look good to me. |
I'll leave the that @leibale as he is the primary owner. I just pop in and do the occasional review. ;) |
@qsymmachus done! thanks for taking the time to contribute, and sorry for the delays.. |
Description
The current default reconnect strategy increases delay linearly with no jitter. This can lead to a thundering herd problem if many clients lose their connection at once, which is common during events like a Redis upgrade.
A reconnect strategy that uses exponential backoff plus a random jitter helps mitigate this risk and is recommended in many best practices documents for Redis clients, so I propose making this the default.
The default strategy I implemented will do the following:
(n^2) * 50 ms
wheren
is the current retry count, with a maximum value of 2000 ms.The millisecond values I choose are somewhat arbitrary, and I'm happy to change them if we think they're not right.
Checklist
npm test
pass with this change (including linting)?reconnectStrategy
still pass. We have no tests for the default strategy, nor am I sure if and how we'd want to test it. Any thoughts on this?