
Deadlock when using async monitor persistence #2000

Closed
@danielgranhao

Description

I think my team and I stumbled upon a deadlock bug in LDK. It goes like this:

  1. We call ChainMonitor::channel_monitor_updated()
  2. ChannelManager::get_and_clear_pending_msg_events() is eventually called; it takes a read lock on total_consistency_lock and calls process_pending_monitor_events()
  3. One of the pending monitor events is MonitorEvent::Completed, so ChannelManager::channel_monitor_updated() is called, which takes a second read lock on total_consistency_lock while the first one from step 2 is still held

If another concurrent task requests a write lock between the two read-lock acquisitions in steps 2 and 3, a deadlock can occur, depending on the OS's lock queuing policy. With a writer-preferring lock, the queued writer blocks the second read lock, while the writer itself waits for the first read lock to be released, so neither side can make progress.

On my machine (macOS) I never experienced this, but on Linux machines we get random hangs. The writer is most likely the BackgroundProcessor calling persist_manager, which takes a write lock on total_consistency_lock inside ChannelManager's write() method.
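Rust's std::sync::RwLock documents that its priority policy depends on the OS and that a waiting writer might or might not block new readers, so taking the same read lock twice on one thread is not safe in general. A common way to avoid this is to acquire the lock once at the public entry point and route nested work through an internal helper that assumes the lock is already held. The sketch below illustrates that pattern under stated assumptions: Manager, channel_monitor_updated_internal, pending_completions and run_demo are hypothetical names, and this is a simplified model, not LDK's actual code or its actual fix.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex, RwLock};
use std::thread;

// Hypothetical, heavily simplified model of the locking pattern in the issue.
// Names mirror the report, but this is NOT LDK's real ChannelManager.
struct Manager {
    total_consistency_lock: RwLock<()>,
    // Stand-in for queued MonitorEvent::Completed events (channel ids).
    pending_completions: Mutex<Vec<u64>>,
    completed: AtomicU64,
}

impl Manager {
    fn new() -> Self {
        Manager {
            total_consistency_lock: RwLock::new(()),
            pending_completions: Mutex::new(Vec::new()),
            completed: AtomicU64::new(0),
        }
    }

    // Public entry point: acquires the read lock exactly once.
    fn channel_monitor_updated(&self, id: u64) {
        let _guard = self.total_consistency_lock.read().unwrap();
        self.channel_monitor_updated_internal(id);
    }

    // Hypothetical helper: the caller must already hold the read lock,
    // so it never touches total_consistency_lock itself.
    fn channel_monitor_updated_internal(&self, _id: u64) {
        self.completed.fetch_add(1, Ordering::SeqCst);
    }

    fn get_and_clear_pending_msg_events(&self) {
        let _guard = self.total_consistency_lock.read().unwrap();
        self.process_pending_monitor_events();
    }

    fn process_pending_monitor_events(&self) {
        // Drain the queue first so the Mutex is not held while handling events.
        let events = std::mem::take(&mut *self.pending_completions.lock().unwrap());
        for id in events {
            // Calling self.channel_monitor_updated(id) here would take a second
            // read lock; a queued writer could then block it forever.
            self.channel_monitor_updated_internal(id);
        }
    }

    // Stand-in for persistence (ChannelManager::write) taking the write lock.
    fn write(&self) {
        let _guard = self.total_consistency_lock.write().unwrap();
    }
}

// Runs a reader thread and a writer thread concurrently, then returns how
// many completion events were handled.
fn run_demo(iterations: u64) -> u64 {
    let m = Arc::new(Manager::new());
    let reader = {
        let m = Arc::clone(&m);
        thread::spawn(move || {
            for id in 0..iterations {
                m.pending_completions.lock().unwrap().push(id);
                m.get_and_clear_pending_msg_events();
            }
        })
    };
    let writer = {
        let m = Arc::clone(&m);
        thread::spawn(move || {
            for _ in 0..iterations {
                m.write();
            }
        })
    };
    reader.join().unwrap();
    writer.join().unwrap();
    // One extra call through the public entry point.
    m.channel_monitor_updated(iterations);
    m.completed.load(Ordering::SeqCst)
}

fn main() {
    // Prints "completed: 1001": 1000 queued events plus one direct call.
    println!("completed: {}", run_demo(1_000));
}
```

The split between a locking entry point and a non-locking internal helper keeps each code path at exactly one read acquisition, so the outcome no longer depends on whether the platform's RwLock favors readers or writers.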
