createCluster clients don't handle on('error') correctly #2721

Closed
@kseth

Description

We use cluster-mode with redis for sharded pub-sub (we have 3 masters and 3 replicas in a kubernetes cluster).

We have the following args for the clients:

    const clusterArgs = {
      rootNodes: [
        {
          url: `redis://${REDIS_CLUSTER_PUBSUB_HOST}:${redisPort}`,
        },
      ],
      defaults: {
        username: REDIS_CLUSTER_PUBSUB_NAME,
        password: REDIS_CLUSTER_PUBSUB_PASS,
        socket: {
          reconnectStrategy(retries: number) {
            if (retries >= 10) {
              console.error(
                `lost connection to redis cluster-pubsub cluster: tried ${retries} times`
              );
            } else {
              console.warn(
                `retrying redis cluster-pubsub cluster connection: tried ${retries} times`
              );
            }

            // reconnect after
            return Math.min(retries * 200, 2000);
          },
          connectTimeout: 10000,
          keepAlive: 60000,
        },
      },
    };

and then we create the client(s) like this:

    const client = createCluster(clusterArgs);
    await client.connect();
    client.on('error', (err) => {
      console.error(`[PUB-SUB ERROR]: ${err}`);
    });
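
One thing worth ruling out in the snippet above: the `error` listener is attached only after `connect()` resolves, so an error emitted during the initial connect would have no listener. Below is a minimal sketch that registers the listener first. `ClusterLike` and `connectWithErrorHandler` are hypothetical names introduced for illustration, not part of node-redis; the interface assumes only the `on` and `connect` methods already used above. Whether this ordering has any effect on the reconnect behavior reported here is not established.

```typescript
// Hypothetical minimal shape: only the two methods used in the snippet above.
interface ClusterLike {
  on(event: "error", listener: (err: unknown) => void): unknown;
  connect(): Promise<unknown>;
}

// Attach the error listener before connecting, so that errors emitted
// during the initial connect already have a handler.
async function connectWithErrorHandler<T extends ClusterLike>(
  client: T,
  onError: (err: unknown) => void
): Promise<T> {
  client.on("error", onError);
  await client.connect();
  return client;
}
```

With the code from this report, the call would presumably look like `await connectWithErrorHandler(createCluster(clusterArgs), (err) => console.error(err))`.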

Sometimes our redis pub-sub cluster goes down (e.g. for maintenance or when we upgrade to a new version, since we run it in kubernetes), and we receive the following error:

Error: Socket closed unexpectedly

We correctly log the error by catching it in the error handler, but we never seem to retry / reconnect -- the only way I can get a reconnect to actually happen is to continually restart the process until the reconnection succeeds.
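
For reference, the delay schedule that the `reconnectStrategy` shown earlier asks for can be checked in isolation. The sketch below repeats the same formula without the logging; `reconnectDelayMs` is a hypothetical name used only here. Note that the strategy always returns a number, so as written it never signals that retrying should stop.

```typescript
// Same formula as the reconnectStrategy in clusterArgs, without the logging.
function reconnectDelayMs(retries: number): number {
  return Math.min(retries * 200, 2000);
}

// Delays for retries 0..11; the value stops growing once it reaches 2000 ms.
const schedule = Array.from({ length: 12 }, (_, retries) => reconnectDelayMs(retries));
console.log(schedule.join(", "));
// prints "0, 200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000, 2000"
```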

Also, if the process tries to issue a command while the cluster is down, an internal error sometimes kills the process as an uncaught Node.js exception, even though I've added the client.on('error') handler above.

I followed the findings from #2120 and #2302, but those don't really seem to solve our problems.

What I'd like is to be able to specify a reconnect strategy so that the client keeps retrying (according to the reconnectStrategy) if we lose our TLS connection or fail to talk to a node in the cluster. I'd also like messages to be queued while we're offline instead of throwing an error and taking down the process.
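
To make the second request concrete, here is a rough application-level sketch of the queueing behavior described above. It is not a node-redis feature: `PublishFn` and `BufferedPublisher` are hypothetical names, and the `send` function is assumed to wrap whatever publish call the cluster client exposes. It does not handle concurrent `publish` and `flush` calls, so ordering is only preserved when they are not interleaved.

```typescript
// Hypothetical publish function; with node-redis this would wrap a publish
// call on the cluster client.
type PublishFn = (channel: string, message: string) => Promise<unknown>;

interface PendingMessage {
  channel: string;
  message: string;
}

class BufferedPublisher {
  private pending: PendingMessage[] = [];
  private readonly send: PublishFn;
  private readonly maxPending: number;

  constructor(send: PublishFn, maxPending = 1000) {
    this.send = send;
    this.maxPending = maxPending;
  }

  // Try to send; on failure keep the message instead of throwing.
  async publish(channel: string, message: string): Promise<boolean> {
    try {
      await this.send(channel, message);
      return true;
    } catch {
      // Bound memory use by dropping the oldest buffered message.
      if (this.pending.length >= this.maxPending) this.pending.shift();
      this.pending.push({ channel, message });
      return false;
    }
  }

  // Resend buffered messages in order; stop at the first failure.
  async flush(): Promise<number> {
    let sent = 0;
    for (;;) {
      const next = this.pending.shift();
      if (next === undefined) break;
      try {
        await this.send(next.channel, next.message);
        sent++;
      } catch {
        this.pending.unshift(next); // put it back and retry on a later flush
        break;
      }
    }
    return sent;
  }

  get size(): number {
    return this.pending.length;
  }
}
```

In this sketch, `flush()` would be called by the application once it believes the connection is back, for example on a timer. Which client event, if any, reliably signals that for a cluster is left open here.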

Node.js Version

20.11.1

Redis Server Version

7.0.10

Node Redis Version

4.6.13

Platform

linux

Logs

No response
