
Disallow dual-sync-async persistence without restarting #3737


Open
TheBlueMatt wants to merge 3 commits into main from 2025-04-no-dual-sync-async

Conversation

TheBlueMatt
Collaborator

In general, we don't expect users to persist
`ChannelMonitor[Update]`s both synchronously and asynchronously for
a single `ChannelManager` instance. If a user has implemented
asynchronous persistence, they should generally always use that,
as there is then no advantage to occasionally persisting
synchronously.

Even still, in 920d96edb6595289902f287419de2d002e2dc2ee we fixed
some bugs related to such an operation, and noted that "there isn't
much cost to supporting it". Sadly, this is not true.

Specifically, the dual-sync-async persistence flow is ill-defined
and difficult to define in a way that a user can realistically
implement.

Consider the case of a `ChannelMonitorUpdate` which is persisted
asynchronously and while it is still being persisted a new
`ChannelMonitorUpdate` is created. If the second
`ChannelMonitorUpdate` is persisted synchronously, the
`ChannelManager` will be left with a single pending
`ChannelMonitorUpdate` which is not the latest.

If we were to then restart, the latest copy of the `ChannelMonitor`
would be that without any updates, but the `ChannelManager` has a
pending `ChannelMonitorUpdate` for the next update, but not the one
after that. The user would then have to handle the replayed
`ChannelMonitorUpdate` and then find the second
`ChannelMonitorUpdate` on disk and somehow know to replay that one
as well.

Further, we currently have a bug in handling this scenario as we'll
complete all pending post-update actions when the second
`ChannelMonitorUpdate` gets persisted synchronously, even though
the first `ChannelMonitorUpdate` is still pending. While we could
rather trivially fix these issues, addressing the larger API
question above is difficult and, as we don't anticipate this
use-case being important, we simply disable it here.

Note that we continue to support it internally as some 39 tests
rely on it.

Issue highlighted by (changes to the) chanmon_consistency fuzz
target (in the next commit).
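
To make the described race concrete, here is a minimal, self-contained model of the interleaving; the types below are illustrative stand-ins for the manager's pending-update tracking, not LDK's actual API:

```rust
/// Stand-in for `ChannelMonitorUpdateStatus`.
enum PersistStatus {
    /// Persistence continues in the background (asynchronous).
    InProgress,
    /// Persistence finished before returning (synchronous).
    Completed,
}

/// Stand-in for the manager's view of unconfirmed monitor updates.
struct Manager {
    pending_update_ids: Vec<u64>,
}

impl Manager {
    fn on_persist_result(&mut self, update_id: u64, status: PersistStatus) {
        match status {
            PersistStatus::InProgress => self.pending_update_ids.push(update_id),
            // The bug described above: synchronous completion of a *later*
            // update releases all pending post-update actions, including
            // those of an earlier update that is still in flight.
            PersistStatus::Completed => self.pending_update_ids.clear(),
        }
    }
}

fn main() {
    let mut mgr = Manager { pending_update_ids: Vec::new() };
    // Update 1 is persisted asynchronously and is still being written...
    mgr.on_persist_result(1, PersistStatus::InProgress);
    // ...when update 2 is created and persisted synchronously.
    mgr.on_persist_result(2, PersistStatus::Completed);
    // The manager now believes nothing is pending, even though update 1
    // may never reach disk before a crash.
    assert!(mgr.pending_update_ids.is_empty());
}
```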

@TheBlueMatt TheBlueMatt added this to the 0.2 milestone Apr 15, 2025
@ldk-reviews-bot

ldk-reviews-bot commented Apr 15, 2025

👋 Thanks for assigning @valentinewallace as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@valentinewallace
Contributor

We may be able to remove the test test_sync_async_persist_doesnt_hang since it specifically covers async + sync persistence.

@TheBlueMatt TheBlueMatt force-pushed the 2025-04-no-dual-sync-async branch from 4e41cb7 to 70f9409 Compare April 16, 2025 14:37
@valentinewallace
Contributor

Would it be a good take-a-Friday issue to fix the legacy tests that use both sync + async so we don't need to gate the panics anymore?

Comment on lines +2572 to +2573
/// We only support using one of [`ChannelMonitorUpdateStatus::InProgress`] and
/// [`ChannelMonitorUpdateStatus::Completed`] without restarting. Because the API does not
Contributor

Some thoughts: is there a way to enforce this with the compiler? And is it a goal of the project to continue supporting both sync + async persistence forever?

Collaborator Author

I suppose we could have some kind of flag you set on startup which indicates which persistence method you're using, but I'm not really sure that it would be an issue in practice - we expect users to use either `Persist` or some future `AsyncPersist` which will handle this stuff for them, and I don't really see why someone would ever persist sometimes-sync-sometimes-async - either your disk is slow and you need async or you don't.

We definitely want to support both at the project level, though, for the same reason we want to support async and sync everywhere in the project - Rust-async-native projects are becoming more common, but Rust-non-async projects also exist (as do platforms where non-async is required), and our current bindings are not async.
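
For illustration, a startup-flag approach might look roughly like the sketch below; `PersistenceModeGuard` and its methods are invented for this example, not the enforcement this PR actually adds (which stores the in-use mode and checks it outside of test builds):

```rust
use std::sync::atomic::{AtomicU8, Ordering};

const MODE_UNSET: u8 = 0;
const MODE_ASYNC: u8 = 1;
const MODE_SYNC: u8 = 2;

/// Records the first persistence mode observed and panics if a later
/// persist call reports the other one.
struct PersistenceModeGuard {
    mode: AtomicU8,
}

impl PersistenceModeGuard {
    fn new() -> Self {
        Self { mode: AtomicU8::new(MODE_UNSET) }
    }

    /// Call with the mode of every persist result.
    fn observe(&self, is_async: bool) {
        let observed = if is_async { MODE_ASYNC } else { MODE_SYNC };
        match self.mode.compare_exchange(
            MODE_UNSET, observed, Ordering::AcqRel, Ordering::Acquire,
        ) {
            // First persist call: the mode is now locked in.
            Ok(_) => {},
            // Later calls must match the recorded mode.
            Err(existing) => assert_eq!(
                existing, observed,
                "mixed sync and async ChannelMonitor persistence is not supported"
            ),
        }
    }
}

fn main() {
    let guard = PersistenceModeGuard::new();
    guard.observe(true); // first update persisted asynchronously
    guard.observe(true); // fine: same mode as before
    // guard.observe(false); // would panic: mixed modes
}
```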

Contributor

> We definitely want to support both at the project level, though, for the same reason we want to support async and sync everywhere in the project - Rust-async-native projects are becoming more common, but Rust-non-async projects also exist (as do platforms where non-async is required), and our current bindings are not async.

To clarify, by sync + async persistence I mean supporting either returning `ChannelMonitorUpdateStatus::Completed` or `::InProgress`. It feels like we expect most production users to persist data asynchronously and then later call `channel_monitor_updated`, so it could be reasonable to drop support for the `::Completed` option down the line. Seems like users that are the exception to that can just call `channel_monitor_updated` immediately, as well.
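
For illustration, a persister in that world could do its write synchronously yet still report `InProgress` and immediately confirm completion, so only one code path is ever exercised. A hypothetical sketch; the type and method names below are invented, not LDK's actual `Persist` trait signatures:

```rust
/// Stand-in for LDK's status enum; the real one also has `Completed`,
/// which this persister never returns.
enum ChannelMonitorUpdateStatus {
    InProgress,
}

struct MyPersister;

impl MyPersister {
    /// Persist a serialized monitor, always via the "asynchronous" path.
    fn persist_monitor(&self, serialized_monitor: &[u8]) -> ChannelMonitorUpdateStatus {
        // The write itself completes synchronously...
        self.write_to_disk(serialized_monitor);
        // ...but we still report `InProgress` and immediately signal
        // completion, so the node only ever exercises one flow.
        self.signal_monitor_updated();
        ChannelMonitorUpdateStatus::InProgress
    }

    fn write_to_disk(&self, _data: &[u8]) {
        // Stand-in for the actual storage backend.
    }

    fn signal_monitor_updated(&self) {
        // In a real node this is roughly where you would call
        // `ChainMonitor::channel_monitor_updated` for the persisted update.
    }
}

fn main() {
    let persister = MyPersister;
    let _status = persister.persist_monitor(&[0u8; 32]);
}
```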

Collaborator Author

Maybe? I imagine many small nodes will still want the simplicity of sync.

Contributor

Maybe, seems worth exploring though. Bookmarking to discuss further.

Collaborator Author

Yea, maybe when we're 100% sure of the stability of async persist? It wouldn't free up a lot of code, though, because it's all the same pathway now anyway.

Comment on lines 968 to 970
let empty_node_a_ser = node_a.encode();
let empty_node_b_ser = node_b.encode();
let empty_node_c_ser = node_c.encode();
Contributor

These don't seem to be used anywhere?

Collaborator Author

Oops, part of a later patch :)

@TheBlueMatt TheBlueMatt force-pushed the 2025-04-no-dual-sync-async branch from 70f9409 to ddccbda Compare April 16, 2025 19:58

codecov bot commented Apr 16, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.82%. Comparing base (83e9e80) to head (ddccbda).
Report is 19 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3737      +/-   ##
==========================================
+ Coverage   89.12%   89.82%   +0.69%     
==========================================
  Files         156      156              
  Lines      123514   129338    +5824     
  Branches   123514   129338    +5824     
==========================================
+ Hits       110086   116176    +6090     
+ Misses      10749    10554     -195     
+ Partials     2679     2608      -71     


@TheBlueMatt
Collaborator Author

> Would it be a good take-a-Friday issue to fix the legacy tests that use both sync + async so we don't need to gate the panics anymore?

Yea, I imagine it's a good bit of work, though.

@wpaulino
Contributor

Needs a squash

In general, we don't expect users to persist
`ChannelMonitor[Update]`s both synchronously and asynchronously for
a single `ChannelManager` instance. If a user has implemented
asynchronous persistence, they should generally always use that,
as there is then no advantage to occasionally persisting
synchronously.

Even still, in 920d96e we fixed
some bugs related to such an operation, and noted that "there isn't
much cost to supporting it". Sadly, this is not true.

Specifically, the dual-sync-async persistence flow is ill-defined
and difficult to define in a way that a user can realistically
implement.

Consider the case of a `ChannelMonitorUpdate` which is persisted
asynchronously and while it is still being persisted a new
`ChannelMonitorUpdate` is created. If the second
`ChannelMonitorUpdate` is persisted synchronously, the
`ChannelManager` will be left with a single pending
`ChannelMonitorUpdate` which is not the latest.

If we were to then restart, the latest copy of the `ChannelMonitor`
would be that without any updates, but the `ChannelManager` has a
pending `ChannelMonitorUpdate` for the next update, but not the one
after that. The user would then have to handle the replayed
`ChannelMonitorUpdate` and then find the second
`ChannelMonitorUpdate` on disk and somehow know to replay that one
as well.

Further, we currently have a bug in handling this scenario as we'll
complete all pending post-update actions when the second
`ChannelMonitorUpdate` gets persisted synchronously, even though
the first `ChannelMonitorUpdate` is still pending. While we could
rather trivially fix these issues, addressing the larger API
question above is difficult and, as we don't anticipate this
use-case being important, we simply disable it here.

Note that we continue to support it internally as some 39 tests
rely on it. We do, however, remove
`test_sync_async_persist_doesnt_hang`, which was added specifically
to test this now-unsupported use-case.

Issue highlighted by (changes to the) chanmon_consistency fuzz
target (in the next commit).

When we reload a node in the `chanmon_consistency` fuzzer, we
always reload with the latest `ChannelMonitor` state which was
confirmed as persisted to the running `ChannelManager`. This is
nice in that it tests losing the latest `ChannelMonitor`, but there
may also be bugs in the on-startup `ChannelMonitor` replay.

Thus, here, we optionally reload with a newer `ChannelMonitor` than
the last-persisted one.

`chanmon_consistency` was originally written with lots of macros
due to some misguided concept of code being unrolled at
compile-time. This is, of course, a terrible idea not just for
compile times but also for performance.

Here, we make `reload_node` a function in anticipation of it being
used in additional places in future work.
@TheBlueMatt TheBlueMatt force-pushed the 2025-04-no-dual-sync-async branch from ddccbda to d009b39 Compare April 29, 2025 13:42
@TheBlueMatt
Collaborator Author

Squashed with a minor comment fix:

$ git diff-tree -U1 ddccbdabd d009b3918
diff --git a/lightning/src/ln/channelmanager.rs b/lightning/src/ln/channelmanager.rs
index c86eed77b..54b607d88 100644
--- a/lightning/src/ln/channelmanager.rs
+++ b/lightning/src/ln/channelmanager.rs
@@ -2573,4 +2573,4 @@ where
 	/// [`ChannelMonitorUpdateStatus::Completed`] without restarting. Because the API does not
-	/// otherwise directly enforce this, we enforce it in debug builds here by storing which one is
-	/// in use.
+	/// otherwise directly enforce this, we enforce it in non-test builds here by storing which one
+	/// is in use.
 	#[cfg(not(test))]

@wpaulino
Contributor

These externalized tests seem to be failing:

- test_batch_funding_close_after_funding_signed
- test_batch_channel_open
