Skip to content

Race condition in function clusterNodes.Addrs() #3218

Closed
@shawnwgit

Description

@shawnwgit

A race condition exists in the redis cluster client involving c.activeAddrs(a slice that stores redis node addresses). Specifically, the cluster client’s GC function writes to c.activeAddrs, while concurrently, callers may invoke the Addrs function to read from it.

Write path

Function GC overwrites the activeAddrs from its beginning.

// GC removes unused nodes.
func (c *clusterNodes) GC(generation uint32) {
	//nolint:prealloc
	var collected []*clusterNode

	c.mu.Lock()

	c.activeAddrs = c.activeAddrs[:0] // <-------------
	for addr, node := range c.nodes {
		if node.Generation() >= generation {
			c.activeAddrs = append(c.activeAddrs, addr) // <-------------
			if c.opt.RouteByLatency {
				go node.updateLatency()
			}
			continue
		}

		delete(c.nodes, addr)
		collected = append(collected, node)
	}

	c.mu.Unlock()

	for _, node := range collected {
		_ = node.Client.Close()
	}
}

Read path

The function Addrs() returns a reference to a string slice. The returned addrs references either c.activeAddrs or c.addrs from the original object.

func (c *clusterNodes) Addrs() ([]string, error) {
	var addrs []string

	c.mu.RLock()
	closed := c.closed //nolint:ifshort
	if !closed {
		if len(c.activeAddrs) > 0 {
			addrs = c.activeAddrs // <-------------
		} else {
			addrs = c.addrs // <-------------
		}
	}
	c.mu.RUnlock()

	if closed {
		return nil, pool.ErrClosed
	}
	if len(addrs) == 0 {
		return nil, errClusterNoNodes
	}
	return addrs, nil // <-------------
}

Expected Behavior

There is no race condition between the "addrs" consumers and the GC function.

Current Behavior

WARNING: DATA RACE
Write at 0x00c00068a5d0 by goroutine 248:
  github.com/redis/go-redis/v9.(*clusterNodes).GC()
      /home/user/work/go-redis-shawnw/osscluster.go:521 +0x284
  github.com/redis/go-redis/v9.newClusterState.func1()
      /home/user/work/go-redis-shawnw/osscluster.go:690 +0x4c

Previous read at 0x00c00068a5d0 by goroutine 247:
  github.com/redis/go-redis/v9.(*ClusterClient).loadState()
      /home/user/work/go-redis-shawnw/osscluster.go:1221 +0x2be
  github.com/redis/go-redis/v9.(*ClusterClient).loadState-fm()
      <autogenerated>:1 +0x47
  github.com/redis/go-redis/v9.(*clusterStateHolder).Reload()
      /home/user/work/go-redis-shawnw/osscluster.go:848 +0x50
  github.com/redis/go-redis/v9.(*clusterStateHolder).LazyReload.func1()
      /home/user/work/go-redis-shawnw/osscluster.go:863 +0x8f

Possible Solution

Return a new copy of the addr slice instead of a reference to the original slice.

Steps to Reproduce

  1. Apply the attached patch: repro.patch
  2. Run the new test in the patch: go test -race -v --timeout 1m -run "TestGinkgoSuite"

Context (Environment)

The race condition has been detected in the real Redis cluster and is also reproducible with the above steps.

Possible Implementation

One possible solution is to modify the func (c *clusterNodes) Addrs() ([]string, error) function to return a copied slice instead of a reference.

func (c *clusterNodes) Addrs() ([]string, error) {
...
	if !closed {
		if len(c.activeAddrs) > 0 {
			addrs = c.activeAddrs  // make a copy
		} else {
			addrs = c.addrs // make a copy
		}
	}

The fix might slightly increase the runtime of the Addrs() function, as copying the original slice generally takes more time than creating a simple reference. However, this impact appears to be negligible. Details are as follows:

  • Benchmarking on an "AMD EPYC 7B13" host shows that copying a slice of 500 addresses takes approximately 2.5 microseconds.
  • The existing callers of the Addr() function, such as ClusterClient.loadState, clusterNodes.Random, and ClusterClient.cmdsInfo, are not on the perf critical path. Therefore, adding a few microseconds is unlikely to have any significant impact.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions