-
Notifications
You must be signed in to change notification settings - Fork 402
Disallow dual-sync-async persistence without restarting #3737
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Disallow dual-sync-async persistence without restarting #3737
Conversation
TheBlueMatt
commented
Apr 15, 2025
👋 Thanks for assigning @valentinewallace as a reviewer! |
We may be able to remove the test |
4e41cb7
to
70f9409
Compare
Would it be a good take-a-Friday issue to fix the legacy tests that use both sync + async so we don't need to gate the panics anymore? |
/// We only support using one of [`ChannelMonitorUpdateStatus::InProgress`] and | ||
/// [`ChannelMonitorUpdateStatus::Completed`] without restarting. Because the API does not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some thoughts: is there a way to enforce this with the compiler? And is it a goal of the project to continue supporting both sync + async persistence forever?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose we could have some kind of flag you set on startup which indicates which persistence method you're using, but I'm sure really sure that it would be an issue in practice - we expect users to use either Persist
or some future AsyncPersist
which will handle this stuff for them, and I don't really see why someone would ever persist sometimes-sync-sometimes-async - either your disk is slow and you need async or you don't.
We definitely want to support both at the project level, though, for the same reason we want to support async and sync everywhere in the project - Rust-async-native projects are becoming more common, but Rust-non-async projects also exist (as well as platforms where that's required) as well as our current bindings are not async.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We definitely want to support both at the project level, though, for the same reason we want to support async and sync everywhere in the project - Rust-async-native projects are becoming more common, but Rust-non-async projects also exist (as well as platforms where that's required) as well as our current bindings are not async.
To clarify, by sync + async persistence I mean supporting either returning ChannelMonitorUpdateStatus::Completed
or ::InProgress
. It feels like we expect most production users to persist data asynchronously and then later call channel_monitor_updated
, so it could be reasonable to drop support for the ::Completed
option down the line. Seems like users that are the exception to that can just call channel_monitor_updated
immediately, as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe? I imagine many small nodes will still want the simplicity of sync.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, seems worth exploring though. Bookmarking to discuss further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, maybe when we're super 100% on the stability of async persist? It wouldn't free up a lot of code though cause it's all the same pathway now anyway.
fuzz/src/chanmon_consistency.rs
Outdated
let empty_node_a_ser = node_a.encode(); | ||
let empty_node_b_ser = node_b.encode(); | ||
let empty_node_c_ser = node_c.encode(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These don't seem to be used anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, part of a later patch :)
70f9409
to
ddccbda
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #3737 +/- ##
==========================================
+ Coverage 89.12% 89.82% +0.69%
==========================================
Files 156 156
Lines 123514 129338 +5824
Branches 123514 129338 +5824
==========================================
+ Hits 110086 116176 +6090
+ Misses 10749 10554 -195
+ Partials 2679 2608 -71 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Yea, I imagine its a good bit of work, though. |
Needs a squash |
In general, we don't expect users to persist `ChannelMonitor[Update]`s both synchronously and asynchronously for a single `ChannelManager` instance. If a user has implemented asynchronous persistence, they should generally always use that, as there is then no advantage to them to occasionally persist synchronously. Even still, in 920d96e we fixed some bugs related to such operation, and noted that "there isn't much cost to supporting it". Sadly, this is not true. Specifically, the dual-sync-async persistence flow is ill-defined and difficult to define in away that a user can realistically implement. Consider the case of a `ChannelMonitorUpdate` which is persisted asynchronously and while it is still being persisted a new `ChannelMonitorUpdate` is created. If the second `ChannelMonitorUpdate` is persisted synchronously, the `ChannelManager` will be left with a single pending `ChannelMonitorUpdate` which is not the latest. If we were to then restart, the latest copy of the `ChannelMonitor` would be that without any updates, but the `ChannelManager` has a pending `ChannelMonitorUpdate` for the next update, but not the one after that. The user would then have to handle the replayed `ChannelMonitorUpdate` and then find the second `ChannelMonitorUpdate` on disk and somehow know to replay that one as well. Further, we currently have a bug in handling this scenario as we'll complete all pending post-update actions when the second `ChannelMonitorUpdate` gets persisted synchronously, even though the first `ChannelMonitorUpdate` is still pending. While we could rather trivially fix these issues, addressing the larger API question above is difficult and as we don't anticipate this use-case being important, we just disable it here. Note that we continue to support it internally as some 39 tests rely on it. We do, however, remove test_sync_async_persist_doesnt_hang which was added specifically to test for this use-case, and we now do not support it. Issue highlighted by (changes to the) chanmon_consistency fuzz target (in the next commit).
When we reload a node in the `chanmon_consistency` fuzzer, we always reload with the latest `ChannelMonitor` state which was confirmed as persisted to the running `ChannelManager`. This is nice in that it tests losing the latest `ChannelMonitor`, but there may also be bugs in the on-startup `ChannelMonitor` replay. Thus, here, we optionally reload with a newer `ChannelMonitor` than the last-persisted one.
`chanmon_consistency` was originally written with lots of macros due to some misguided concept of code being unrolled at compile-time. This is, of course, a terrible idea not just for compile times but also for performance. Here, we make `reload_node` a function in anticipation of it being used in additional places in future work.
ddccbda
to
d009b39
Compare
Squashed with a minor comment fix: $ git diff-tree -U1 ddccbdabd d009b3918
diff --git a/lightning/src/ln/channelmanager.rs b/lightning/src/ln/channelmanager.rs
index c86eed77b..54b607d88 100644
--- a/lightning/src/ln/channelmanager.rs
+++ b/lightning/src/ln/channelmanager.rs
@@ -2573,4 +2573,4 @@ where
/// [`ChannelMonitorUpdateStatus::Completed`] without restarting. Because the API does not
- /// otherwise directly enforce this, we enforce it in debug builds here by storing which one is
- /// in use.
+ /// otherwise directly enforce this, we enforce it in non-test builds here by storing which one
+ /// is in use.
#[cfg(not(test))] |
These externalized tests seem to be failing:
|