Hold `PeerState` in an `RwLock` rather than a `Mutex` #2968
Conversation
Force-pushed from 5c6cf90 to 4f4fa58.
I'd like to better understand the reported use for this - `list_channels`, specifically, should be plenty fast enough that the time under a single lock is ~instant, and this doesn't change much of anything. If there are other places where we can be read-only for longer (e.g. I/O) then I'd say let's do it.
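For context, a rough sketch of the kind of iteration a `list_channels`-style call performs, with entirely hypothetical, simplified types (not the actual LDK API). Under the pre-change `Mutex` layout, each per-peer lock is held only long enough to copy out that peer's data, which is why the time under any single lock stays short:

```rust
use std::collections::HashMap;
use std::sync::{Mutex, RwLock};

// Hypothetical stand-ins for the real types; keys and channel ids are simplified.
struct ChannelDetails { channel_id: [u8; 32] }
struct PeerState { channels: Vec<ChannelDetails> }

struct Manager {
    // Outer map guarded by an RwLock, each peer's state by its own Mutex
    // (the pre-change layout this comment refers to).
    per_peer_state: RwLock<HashMap<u64, Mutex<PeerState>>>,
}

impl Manager {
    // Each per-peer lock is only held while that peer's channel list is
    // copied out, so no single lock is held for long.
    fn list_channels(&self) -> Vec<[u8; 32]> {
        let per_peer_state = self.per_peer_state.read().unwrap();
        let mut res = Vec::new();
        for peer_state_mutex in per_peer_state.values() {
            let peer_state = peer_state_mutex.lock().unwrap();
            res.extend(peer_state.channels.iter().map(|c| c.channel_id));
        }
        res
    }
}

fn main() {
    let mgr = Manager { per_peer_state: RwLock::new(HashMap::new()) };
    println!("{} channels", mgr.list_channels().len());
}
```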
Force-pushed from 4f4fa58 to 3658581.
Codecov Report

@@            Coverage Diff             @@
##             main    #2968      +/-   ##
==========================================
+ Coverage   89.31%   89.37%   +0.05%
==========================================
  Files         117      117
  Lines       95618    95618
  Branches    95618    95618
==========================================
+ Hits        85404    85456      +52
+ Misses       7974     7926      -48
+ Partials     2240     2236       -4

View full report in Codecov by Sentry.
Right, the users who reported this should definitely investigate why it was blocking for so long. I now went through all usages and found several places where we could also just use a read lock. IMO, the other cases make this a worthwhile improvement, independently of the concrete reported issue.
Writing the manager basically halts the world no matter what due to the consistency write lock. I'm not entirely convinced the remaining changes are worth it.
Yes, for many operations, but we don't acquire it everywhere. What am I missing? Is not acquiring it a lock-order violation we need to fix? The docs seem to indicate so, but I'm not sure taking it is required.
Ah, fair point, duh. Note that …
No, there's no lock-order violation if we skip a lock, only if they're taken in the wrong order.
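A toy illustration of that rule, using plain `Mutex`es rather than the actual ChannelManager locks: skipping a lock in an agreed hierarchy breaks no rule, while taking two locks in inverted order is what can deadlock:

```rust
use std::sync::Mutex;

// Toy lock hierarchy: the agreed order is `outer` first, then `inner`.
struct Node {
    outer: Mutex<()>,
    inner: Mutex<()>,
}

impl Node {
    // Fine: respects the order by taking `outer` before `inner`.
    fn both_in_order(&self) {
        let _o = self.outer.lock().unwrap();
        let _i = self.inner.lock().unwrap();
    }

    // Also fine: "skipping" `outer` and taking only `inner` violates nothing,
    // because no two locks are ever held in the wrong order.
    fn inner_only(&self) {
        let _i = self.inner.lock().unwrap();
    }

    // Not fine: taking `inner` first and `outer` second inverts the order
    // and can deadlock against a thread running `both_in_order`.
    fn both_reversed(&self) {
        let _i = self.inner.lock().unwrap();
        let _o = self.outer.lock().unwrap();
    }
}

fn main() {
    let node = Node { outer: Mutex::new(()), inner: Mutex::new(()) };
    node.both_in_order();
    node.inner_only();
    node.both_reversed(); // safe single-threaded, dangerous when racing both_in_order()
}
```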
I don't think so? During channel data serialization we take the `per_peer_state` read lock. Then later we take the `per_peer_state` write lock. So, IIUC, we can have concurrent access during parts of the serialization, but not for the full write.
Ah, okay, the docs' explanation around 'different branches' had me confused about whether we're allowed to skip the root or not. Logically it makes sense; the docs aren't fully clear on that, though, IMO.
Yea, I guess I was figuring that loop would be pretty quick, but it does write all the channels, so it's not free, even if pretty quick.
Yea, it's... delicate. IIRC this is the only place we actually take per-peer locks recursively, so the `total_consistency_guard` saves us, but of course it is a little less fragile to keep relying on the `per_peer` write lock for that.
Previously, we would hold the `PeerState` in a `Mutex`, which disallows concurrent read-only operations. Here we switch to an `RwLock` making this possible.
Force-pushed from 3658581 to cf2d418.
Rebased on `main`.
Still not entirely convinced this improves parallelism in any practical cases, honestly.
let mut peer_state_lock = peer_state_rwlock.write().unwrap();
let peer_state = &mut *peer_state_lock;
let peer_state_lock = peer_state_rwlock.read().unwrap();
let peer_state = &*peer_state_lock;
Is there a race condition here where we decide whether or not to include the channel at this step but then the channel gets updated and we write the wrong number of channels? Doesn't seem like it was introduced in this PR but still worth fixing.
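A heavily simplified, hypothetical sketch of the two-pass pattern in question (made-up types, not the real serialization code), showing how the count written in the first pass can drift from what the second pass actually serializes:

```rust
use std::collections::HashMap;
use std::sync::{Mutex, RwLock};

// Hypothetical, heavily simplified types; the real ChannelManager
// serialization is far more involved.
struct Channel { funded: bool }
struct PeerState { channels: Vec<Channel> }

fn write_manager(per_peer_state: &RwLock<HashMap<u64, Mutex<PeerState>>>, out: &mut Vec<u8>) {
    // Pass 1: count the channels we intend to serialize, then drop the lock.
    let count: usize = {
        let peers = per_peer_state.read().unwrap();
        peers
            .values()
            .map(|p| p.lock().unwrap().channels.iter().filter(|c| c.funded).count())
            .sum()
    };
    out.extend_from_slice(&(count as u64).to_be_bytes());

    // Pass 2: serialize the channels. If a channel was added, removed, or
    // changed funding state while no lock was held between the passes, the
    // count written above no longer matches what we serialize here, which
    // is the race the comment describes.
    let peers = per_peer_state.read().unwrap();
    for peer in peers.values() {
        let peer_state = peer.lock().unwrap();
        for chan in peer_state.channels.iter().filter(|c| c.funded) {
            out.push(chan.funded as u8); // stand-in for real channel serialization
        }
    }
}

fn main() {
    let peers = RwLock::new(HashMap::from([(
        0u64,
        Mutex::new(PeerState { channels: vec![Channel { funded: true }] }),
    )]));
    let mut out = Vec::new();
    write_manager(&peers, &mut out);
    println!("wrote {} bytes", out.len());
}
```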
Yeah, I noticed that as well and created a TODO for it, but so far intended to fix it elsewhere. Looking at it again, it seems this could be fixed by a) only iterating the peer states once and persisting to a temporary buffer while keeping track of serializable_peer_count and number_of_funded_channels, or b) taking the per_peer_state.write() lock earlier and holding it for the remainder of the method.

Seems a) is rather inefficient and b) would mean dropping any chance of concurrent access during at least parts of write. I think switching to an RwLock might still be nice in general and going forward, but if you don't agree, possibly we should just do b) in a separate PR and close this one?
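A hedged sketch of option (b), reusing the same kind of made-up, simplified types as the sketch above: taking the top-level write lock up front and holding it for the whole method means no other thread can reach any per-peer lock in the meantime (assuming all access goes through `per_peer_state`), so both passes see one consistent snapshot:

```rust
use std::collections::HashMap;
use std::sync::{Mutex, RwLock};

// Made-up, minimal types standing in for the real ones.
struct Channel { funded: bool }
struct PeerState { channels: Vec<Channel> }

fn write_manager_option_b(
    per_peer_state: &RwLock<HashMap<u64, Mutex<PeerState>>>,
    out: &mut Vec<u8>,
) {
    // Held until the end of the function. Because every other code path has
    // to go through `per_peer_state` to reach a peer's Mutex, holding the
    // write lock here means nothing can change between the two passes below.
    let peers = per_peer_state.write().unwrap();

    let funded_count =
        |p: &Mutex<PeerState>| p.lock().unwrap().channels.iter().filter(|c| c.funded).count();
    let count: usize = peers.values().map(funded_count).sum();
    out.extend_from_slice(&(count as u64).to_be_bytes());

    for peer in peers.values() {
        for chan in peer.lock().unwrap().channels.iter().filter(|c| c.funded) {
            out.push(chan.funded as u8); // stand-in for real channel serialization
        }
    }
}
```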
IIRC we discussed offline and concluded to abandon this in favor of fixing the issue at #2968 (comment) by holding the top-level write lock for longer. Leaving this open until we move that to an issue or open the follow-up PR.
Now opened to this effect: #2998. Closing this, as the benefits of switching to an `RwLock` remain unclear.
Previously, we would hold the `PeerState` in a `Mutex`, which disallows concurrent read-only operations. Here we switch to an `RwLock`, making this possible.

To this end, we first switch all instances of `Mutex::lock` to `RwLock::write`, and then selectively adapt some usages that permit it to `RwLock::read`.
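For illustration, a minimal sketch of the two-step migration the description outlines, with made-up simplified types standing in for the real `PeerState` and `ChannelManager`:

```rust
use std::collections::HashMap;
use std::sync::{Mutex, RwLock};

// Made-up, minimal stand-ins; the real PeerState and ChannelManager are far richer.
struct PeerState { channel_count: usize }

// Before: each peer's state behind a Mutex, so every access is exclusive.
#[allow(dead_code)]
struct Before { per_peer_state: RwLock<HashMap<u64, Mutex<PeerState>>> }

// After: each peer's state behind an RwLock, so read-only accesses to the
// same peer can run concurrently.
struct After { per_peer_state: RwLock<HashMap<u64, RwLock<PeerState>>> }

impl After {
    // Step 1 of the described migration: every former Mutex::lock becomes
    // RwLock::write, preserving exclusive-access semantics.
    fn mutate_peer(&self, peer: u64) {
        if let Some(peer_state_rwlock) = self.per_peer_state.read().unwrap().get(&peer) {
            let mut peer_state = peer_state_rwlock.write().unwrap();
            peer_state.channel_count += 1;
        }
    }

    // Step 2: call sites that only read are selectively relaxed to RwLock::read.
    fn peer_channel_count(&self, peer: u64) -> Option<usize> {
        self.per_peer_state.read().unwrap().get(&peer)
            .map(|peer_state_rwlock| peer_state_rwlock.read().unwrap().channel_count)
    }
}

fn main() {
    let mgr = After {
        per_peer_state: RwLock::new(HashMap::from([(7u64, RwLock::new(PeerState { channel_count: 0 }))])),
    };
    mgr.mutate_peer(7);
    assert_eq!(mgr.peer_channel_count(7), Some(1));
}
```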