Fix Nagle, other minor optimizations #8222
Merged
rustyrussell merged 21 commits into ElementsProject:master from rustyrussell:guilt/fix-tcp-cork on May 8, 2025
Conversation
Force-pushed from 63a2a8b to 545970f
Force-pushed from 545970f to b301e5d
Force-pushed from 3eeff79 to 821e111
Once we speed things up, we can *receive* commitment_signed before we disconnect due to sending ours: in this case, we *will* have a scratch tx.

Signed-off-by: Rusty Russell <[email protected]>
Once we speed things up, this can receive COMMITMENT_SIGNED, and hence succeed. Let's drop on receive, not send!

```
2025-05-06T06:07:51.5997333Z # Peers try RBF, break on initial COMMITMENT_SIGNED
2025-05-06T06:07:51.5997855Z bump = l1.rpc.openchannel_bump(chan_id, chan_amount, initpsbt['psbt'])
2025-05-06T06:07:51.5998214Z > with pytest.raises(RpcError):
2025-05-06T06:07:51.5998542Z E Failed: DID NOT RAISE <class 'pyln.client.lightning.RpcError'>
```

Signed-off-by: Rusty Russell <[email protected]>
Presumably we want to wait for htlcs to be all resolved here. With the speedup, this failed under Postgres:

```
> wait_for(lambda: only_one(l2.rpc.listpeerchannels(l3.info['id'])['channels'])['htlcs'] != [])
tests/test_closing.py:4191:
```

Signed-off-by: Rusty Russell <[email protected]>
Signed-off-by: Rusty Russell <[email protected]>
…e a short signature.

```
> assert txinfo['weight'] == prep['estimated_final_weight']
E assert 755 == 756
```

Signed-off-by: Rusty Russell <[email protected]>
This happens one time in 256:

```
2025-05-06T11:00:43.2201832Z > assert feerate - 1 < total_feerate_perkw < feerate + 1
2025-05-06T11:00:43.2202133Z E assert 12005.233318796338 < (12000 + 1)
```

Signed-off-by: Rusty Russell <[email protected]>
We have to make sure the new tx has been received by bitcoind, otherwise we could be examining the previous:

```
> check_feerate(l2, total_feerate_perkw, feerate)
tests/test_closing.py:4053:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
node = <fixtures.LightningNode object at 0x7f645789ee90>
actual_feerate = 14000.0, expected_feerate = 15000

    def check_feerate(node, actual_feerate, expected_feerate):
        # Feerate can't be lower.
>       assert actual_feerate > expected_feerate - 2
```

Signed-off-by: Rusty Russell <[email protected]>
We can fail to connect because l3 connects to l2 at the same time l2 connects to l3:

```
# The plugin in l2 fixes up the connection, so this works!
> l1.rpc.fetchinvoice(offer['bolt12'])
tests/test_connection.py:4792:
...
elif "error" in resp:
>     raise RpcError(method, payload, resp['error'])
E     pyln.client.lightning.RpcError: RPC call failed: method: fetchinvoice, payload: {'offer': 'lno1qgsqvgnwgcg35z6ee2h3yczraddm72xrfua9uve2rlrm9deu7xyfzrcgqgqjczs7w3jhxazldahxjmmwd4jhxumpvaj47en0wfmkzunytanxz6tvzcssxhftzxfdlwsnfcgw2sy8t5mxa0ytcdfat2nkdwqvpy9nnsa9mzza'}, error: {'code': 1005, 'message': 'Timeout waiting for response'}
```

From logs:

```
lightningd-2 2025-05-07T21:20:54.367Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-connectd: peer_out WIRE_INIT
lightningd-2 2025-05-07T21:20:54.367Z DEBUG 035d2b1192dfba134e10e540875d366ebc8bc353d5aa766b80c090b39c3a5d885d-connectd: Connect IN
```

Signed-off-by: Rusty Russell <[email protected]>
Force-pushed from 821e111 to fde04af
We need to make sure the lease starts at the block we expect:

```
# Note that l3 has the whole lease delay (minus blocks already mined)
_, _, l3blocks = l3.wait_for_onchaind_tx('OUR_DELAYED_RETURN_TO_WALLET', 'OUR_UNILATERAL/DELAYED_OUTPUT_TO_US')
> assert l3blocks == 4032 - 6 - 2 - 1
E assert 4022 == (((4032 - 6) - 2) - 1)

tests/test_closing.py:985: AssertionError
```

Signed-off-by: Rusty Russell <[email protected]>
Signed-off-by: Rusty Russell <[email protected]>
Our CORK logic was wrong, and it's better to use Nagle anyway: we disable Nagle just before sending timing-critical messages.

Time for 100 (failed) payments:

Before: 148.8573575
After: 10.7356977

Note this revealed a timing problem in test_reject_invalid_payload: we would miss the cause of the sendonion failure, and waitsendpay would be called *after* it had failed, so simply returns "Payment failure reason unknown".

Signed-off-by: Rusty Russell <[email protected]>
Changelog-Fixed: Protocol: Removed 200ms latency from sending commit/revoke messages.
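For illustration, here is a minimal POSIX sketch of the general pattern this commit describes (suppress Nagle only around a latency-critical send); the `send_critical` helper and the re-enable step afterwards are assumptions for the example, not the actual connectd code:

```
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: push one timing-critical message out immediately.
 * TCP_NODELAY=1 disables Nagle, so this write is not held back waiting
 * for ACKs of earlier small segments; re-enabling Nagle afterwards lets
 * ordinary small writes be coalesced again. */
ssize_t send_critical(int fd, const void *msg, size_t len)
{
	int on = 1, off = 0;
	ssize_t ret;

	setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
	ret = write(fd, msg, len);
	setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &off, sizeof(off));
	return ret;
}
```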
To run, use:

```
VALGRIND=0 TEST_BENCH=1 eatmydata pytest tests/test_connection.py::test_bench
```

Example of running on my laptop (without --enable-debugbuild, over 10 runs):

```
FAILED tests/test_connection.py::test_bench - assert 47.403406-49.191765(48.343+/-0.48) == 0
```

With eatmydata:

```
FAILED tests/test_connection.py::test_bench - assert 25.867860-27.549878(26.5762+/-0.5) == 0
```

You can also run perf on l1 once it's running:

```
perf record --call-graph dwarf -q -p $(cat /tmp/ltests-*/test_bench_1/lightning-1/lightningd-regtest.pid)
```

Then ^C after 10 seconds and run "perf report".

Things which stood out:

1. Tracing in db_exec_prepared_v2 (fixed in another PR)

```
31.12%  0.04%  lightningd  lightningd  [.] db_exec_prepared_v2
- 31.08% db_exec_prepared_v2
   + 22.96% db_sqlite3_exec
   + 4.46% trace_span_end
   + 1.77% trace_span_start
   + 1.11% trace_span_tag
   + 0.72% tal_free
```

2. Logging:

```
- 16.03% logv
   - 8.15% maybe_print
      - log_to_files
         + 4.51% __GI__IO_fflush (inlined)
         + 1.97% tal_fmt_
         + 0.51% __GI___strftime_l (inlined)
```

3. Notification (when nothing is listening) in notify_log:

```
- 6.84% maybe_notify_log
   - notify_log
      + 3.37% notify_send
      + 1.75% notify_start
      + 1.71% log_notification_serialize
        0.56% new_log_entry
```

Signed-off-by: Rusty Russell <[email protected]>
…hecking.

Before:
92.26076173782349
91.9576666355133
90.92732524871826

After:
90.5989830493927
88.10309219360352
90.07689118385315
Signed-off-by: Rusty Russell <[email protected]>
If nobody is subscribed, have notify_start return NULL and the caller can skip serialization. This is particularly useful for the "log" notification which can get called a lot.

Signed-off-by: Rusty Russell <[email protected]>
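As a rough illustration of that pattern (the types and helpers below are invented for the sketch, not lightningd's actual notification code):

```
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Sketch only: illustrative stand-ins for the real plugin machinery. */
struct plugins {
	size_t num_log_subscribers;	/* plugins subscribed to "log" */
};

struct notification {
	char json[256];
};

/* Return NULL when nobody is subscribed, so callers can skip building
 * and serializing the notification entirely. */
struct notification *notify_start(const struct plugins *plugins)
{
	if (plugins->num_log_subscribers == 0)
		return NULL;
	return calloc(1, sizeof(struct notification));
}

void notify_log(const struct plugins *plugins, const char *msg)
{
	struct notification *n = notify_start(plugins);
	if (!n)
		return;		/* nothing to serialize or send */
	snprintf(n->json, sizeof(n->json), "{\"log\":\"%s\"}", msg);
	/* ...send to the subscribed plugins, then clean up... */
	free(n);
}

int main(void)
{
	struct plugins p = { .num_log_subscribers = 0 };
	/* With zero subscribers this is nearly free: no JSON is built. */
	notify_log(&p, "hello");
	return 0;
}
```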
This means we don't have to iterate through all plugins, making our "do we even have to construct this notification" optimization much more efficient.

Signed-off-by: Rusty Russell <[email protected]>
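Purely illustrative bookkeeping for that idea (names invented for the example, not the actual plugin registry): keep a per-topic subscriber tally, updated at subscribe time, so the "is anyone listening?" check is a small lookup rather than a walk over every plugin's subscription list.

```
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative only: a table of notification topics with a subscriber
 * tally, bumped whenever a plugin subscribes. */
struct topic_count {
	const char *topic;
	size_t subscribers;
};

static struct topic_count topics[] = {
	{ "log", 0 },
	{ "forward_event", 0 },
};

void plugin_subscribed(const char *topic)
{
	for (size_t i = 0; i < sizeof(topics) / sizeof(topics[0]); i++)
		if (strcmp(topics[i].topic, topic) == 0)
			topics[i].subscribers++;
}

/* Cheap per-topic check: no iteration over all plugins each time a
 * notification might be emitted. */
bool anyone_listening(const char *topic)
{
	for (size_t i = 0; i < sizeof(topics) / sizeof(topics[0]); i++)
		if (strcmp(topics[i].topic, topic) == 0)
			return topics[i].subscribers > 0;
	return false;
}

int main(void)
{
	plugin_subscribed("log");
	return anyone_listening("log") ? 0 : 1;
}
```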
This speeds logging slightly, but most of the time is actually spent in fflush() (which we need).

Result (10 runs, eatmydata):

```
FAILED tests/test_connection.py::test_bench - assert 24.086638-25.347475(24.6901+/-0.4) == 0
```

This is an 8% performance improvement (over the entire series), which is not bad.

Signed-off-by: Rusty Russell <[email protected]>
If there are pending channel announcements, they'll look like a leak unless we scan into the maps.

```
lightningd-2 2025-05-01T07:27:03.922Z **BROKEN** gossipd: MEMLEAK: 0x60d000000478
lightningd-2 2025-05-01T07:27:03.923Z **BROKEN** gossipd: label=gossipd/gossmap_manage.c:595:struct pending_cannounce
lightningd-2 2025-05-01T07:27:03.923Z **BROKEN** gossipd: alloc:
lightningd-2 2025-05-01T07:27:03.923Z **BROKEN** gossipd: /home/runner/work/lightning/lightning/ccan/ccan/tal/tal.c:488 (tal_alloc_)
lightningd-2 2025-05-01T07:27:03.923Z **BROKEN** gossipd: /home/runner/work/lightning/lightning/gossipd/gossmap_manage.c:595 (gossmap_manage_channel_announcement)
lightningd-2 2025-05-01T07:27:03.923Z **BROKEN** gossipd: /home/runner/work/lightning/lightning/gossipd/gossipd.c:205 (handle_recv_gossip)
lightningd-2 2025-05-01T07:27:03.923Z **BROKEN** gossipd: /home/runner/work/lightning/lightning/gossipd/gossipd.c:300 (connectd_req)
lightningd-2 2025-05-01T07:27:03.923Z **BROKEN** gossipd: /home/runner/work/lightning/lightning/common/daemon_conn.c:35 (handle_read)
lightningd-2 2025-05-01T07:27:03.923Z **BROKEN** gossipd: /home/runner/work/lightning/lightning/ccan/ccan/io/io.c:60 (next_plan)
lightningd-2 2025-05-01T07:27:03.923Z **BROKEN** gossipd: /home/runner/work/lightning/lightning/ccan/ccan/io/io.c:422 (do_plan)
lightningd-2 2025-05-01T07:27:03.923Z **BROKEN** gossipd: /home/runner/work/lightning/lightning/ccan/ccan/io/io.c:439 (io_ready)
lightningd-2 2025-05-01T07:27:03.923Z **BROKEN** gossipd: /home/runner/work/lightning/lightning/ccan/ccan/io/poll.c:455 (io_loop)
lightningd-2 2025-05-01T07:27:03.923Z **BROKEN** gossipd: /home/runner/work/lightning/lightning/gossipd/gossipd.c:660 (main)
lightningd-2 2025-05-01T07:27:03.923Z **BROKEN** gossipd: ../sysdeps/nptl/libc_start_call_main.h:58 (__libc_start_call_main)
lightningd-2 2025-05-01T07:27:03.924Z **BROKEN** gossipd: ../csu/libc-start.c:392 (__libc_start_main_impl)
lightningd-2 2025-05-01T07:27:03.924Z **BROKEN** gossipd: parents:
lightningd-2 2025-05-01T07:27:03.924Z **BROKEN** gossipd: gossipd/gossmap_manage.c:475:struct gossmap_manage
```

Signed-off-by: Rusty Russell <[email protected]>
This significantly slows down the cycle when we have flakes (it's good if the tree is completely broken though!).

Signed-off-by: Rusty Russell <[email protected]>
It's disabled.

Signed-off-by: Rusty Russell <[email protected]>
We don't care about data persistence, and this suppresses all sync calls (speeding sqlite3 specifically).

Signed-off-by: Rusty Russell <[email protected]>
Force-pushed from fde04af to ad11fe6
I noticed we were not going as fast as we should in a 100-payments test. Turns out our Nagle-suppressing code was wrong!

Then I continued with minor optimizations until our profiling basically only contained sqlite3 operations, resulting in an 8% speedup in bouncing payments (when eatmydata was used).

(Dropped the patch which speeds up CI using eatmydata, as it broke too much!)