Deadlock when using async monitor persistence

I think my team and I stumbled upon a deadlock bug in LDK. It goes like this:

1. We call `ChainMonitor::channel_monitor_updated()`
2. `ChannelManager::get_and_clear_pending_msg_events()` eventually gets called, takes a read lock on `total_consistency_lock` and calls `process_pending_monitor_events()`
3. One of the pending monitor events is `MonitorEvent::Completed`, so `ChannelManager::channel_monitor_updated()` is called, which also takes a read lock on `total_consistency_lock`

If another concurrent task tries to take a write lock between the two read locks in steps 2 and 3, a deadlock can occur, depending on the queuing policy of the OS: the writer waits for the first read lock to be released, while the second read lock queues behind the waiting writer. On my machine (macOS) I never experienced this, but on Linux machines we get random hangs. The writer is likely the `BackgroundProcessor` calling `persist_manager`, which takes a write lock on `total_consistency_lock` inside the `write()` method of `ChannelManager`.
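To make the interleaving concrete, here is a minimal sketch of it as a toy state model. It assumes a writer-preferring lock, meaning that new readers are refused once a writer is queued. `WriterPreferringLockModel`, `Outcome`, and `replay_interleaving` are hypothetical names made up for illustration; this is not LDK's code or any real lock implementation, and it never actually blocks, so it is safe to run.

```rust
/// Toy model of a writer-preferring read-write lock. It only tracks state and
/// never blocks, so the interleaving can be replayed deterministically.
/// Hypothetical: not LDK's lock and not the standard library's implementation.
#[derive(Debug, Default)]
struct WriterPreferringLockModel {
    active_readers: usize,
    writer_waiting: bool,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Acquired,
    WouldBlock,
}

impl WriterPreferringLockModel {
    /// A new reader is admitted only if no writer is already queued.
    fn read(&mut self) -> Outcome {
        if self.writer_waiting {
            Outcome::WouldBlock
        } else {
            self.active_readers += 1;
            Outcome::Acquired
        }
    }

    /// In this toy model a writer succeeds only when there are no active
    /// readers; otherwise it queues behind them.
    fn write(&mut self) -> Outcome {
        if self.active_readers == 0 {
            Outcome::Acquired
        } else {
            self.writer_waiting = true;
            Outcome::WouldBlock
        }
    }
}

/// Replays steps 2 and 3 with a writer arriving in between.
fn replay_interleaving() -> (Outcome, Outcome, Outcome) {
    let mut lock = WriterPreferringLockModel::default();
    // Step 2: get_and_clear_pending_msg_events() takes the first read lock.
    let outer_read = lock.read();
    // A concurrent task (for example persist_manager) asks for the write lock.
    let writer = lock.write();
    // Step 3: channel_monitor_updated() asks for a second read lock
    // while the first one is still held by the same thread.
    let inner_read = lock.read();
    (outer_read, writer, inner_read)
}
```

Calling `replay_interleaving()` yields `(Acquired, WouldBlock, WouldBlock)`: the writer cannot proceed until the outer read lock is released, and the outer read lock is never released because its holder is stuck on the inner read. Under a reader-preferring policy the inner read would be admitted instead, which would be consistent with the hang showing up on only some platforms.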
