Open
Description
We use spring-data-redis with lettuce as the redis connection lib.
The version is: spring-data-redis-2.4.9.jar.
Recently we found a lot of BLOCKING in production, all threads of servlet thread pool are blocked waiting for connection from connection pool. Then we removed connection pool conf and it goes well.
After tracking and analysing the problem, we found there is a Lettuce Connection Leak in spring-data-redis while using execute(SessionCallback) and executeWithStickyConnection(SessionCallback) in same thread by random turn. For example:
//step1
redisTemplate.execute(new SessionCallback<Map<String, Integer>>() {
@Override
public Map<String, Integer> execute(RedisOperations redisOperations) throws DataAccessException {
Cursor<String> cursor = redisTemplate.opsForSet().scan("TEST_KEY",
new ScanOptions.ScanOptionsBuilder().match("*").count(1000).build());
try (cursor) {
while (cursor.hasNext()) {
String next = cursor.next();
String value = next == null ? "" : next;
}
} catch (IOException e) {
}
return null;
}
});
//step2
redisTemplate.executePipelined(new SessionCallback<Object>() {
@Override
public <K, V> Object execute(RedisOperations<K, V> operations) throws DataAccessException {
redisTemplate.hasKey("testDefault" );
redisTemplate.hasKey("testSecond");
return null;
}
});
//step3
redisTemplate.execute(new SessionCallback<Map<String, Integer>>() {
@Override
public Map<String, Integer> execute(RedisOperations redisOperations) throws DataAccessException {
Cursor<String> cursor = redisTemplate.opsForSet().scan("TEST_KEY",
new ScanOptions.ScanOptionsBuilder().match("*").count(1000).build());
try (cursor) {
while (cursor.hasNext()) {
String next = cursor.next();
String value = next == null ? "" : next;
}
} catch (IOException e) {
}
return null;
}
});
As you can see in the code above:
- step1: Use redistemplate to execute a sscan. sscan is implemented by executeWithStickyConnection by the way. This would cause
TransactionSynchronizationManager
bind aLettuceConnection
(contains aasyncSharedConn
which is a share connection) to current thread and not released(unbind) because thereferenceCount
is not zero. Both execute and sscan causereferenceCount
+ 1 but only execute triggerreferenceCount
minus 1 at the end. As cursor is closed finally after try block, cursor close will causeLettuceConnection
close, this would setisClosed = true
- step2: Use redistemplate to executePipeline. As in step1, current thread is bound to a
LettuceConnection
and this time would use thisLettuceConnection
to execute pipeline commands. As pipeline commands should use Dedicated connection, fetch a new Dedicated connection and put inLettuceConnection
(cantains aasyncSharedConn
and aasyncDedicatedConn
now). After executePipeline, theLettuceConnection
is not released becausereferenceCount
is still not zero. - step3: Use redistemplate to execute a sscan again. As cursor is closed finally after try block, cursor close will cause connection close. But
LettuceConnection
'sisClosed
is already true this time, nothing will be done according to the source of below. AndasyncDedicatedConn
would never be returned back:
@Override
public void close() throws DataAccessException {
super.close();
if (isClosed) {
return;
}
isClosed = true;
if (asyncDedicatedConn != null) {
try {
if (customizedDatabaseIndex()) {
potentiallySelectDatabase(defaultDbIndex);
}
connectionProvider.release(asyncDedicatedConn);
} catch (RuntimeException ex) {
throw convertLettuceAccessException(ex);
}
}
if (subscription != null) {
if (subscription.isAlive()) {
subscription.doClose();
}
subscription = null;
}
this.dbIndex = defaultDbIndex;
}
Never using scan within execute(sessioncallback) would help, but is this a bug or drawback of design?