-
Notifications
You must be signed in to change notification settings - Fork 407
Include short channel id in PaymentPathFailed #1077
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Include short channel id in PaymentPathFailed #1077
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1077 +/- ##
==========================================
+ Coverage 90.64% 92.06% +1.42%
==========================================
Files 65 66 +1
Lines 34226 41090 +6864
==========================================
+ Hits 31024 37831 +6807
- Misses 3202 3259 +57
Continue to review full report at Codecov.
|
lightning/src/routing/router.rs
Outdated
/// A path that included the failing [`RouteHop`]. | ||
pub path: Vec<RouteHop>, | ||
|
||
/// The index of the failing [`RouteHop`] within the path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The semantics of this are really confusing. Most of that is just cause RouteHop
confusingly refers to both a node and a channel, which is itself confusing, but its compounded here.
In general, I think we have three choices here
- take the complicated approach and have logic here to indicate if we're blaming a node or a channel.
- take the moderate and decide which of the inbound or outbound channels to blame based on what the responding node said or,
- take the easy approach and always blame the outbound channel from the node that we received the failure from.
I think the easy approach is basically what you do here, and ok for a firstpass - we'll never take the same path, which is good, but we may disable a channel that we really shouldn't.
In any case, I think this needs to be much clearer, and its unclear to me why we need this full struct when we could just return a short channel id.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case, I think this needs to be much clearer, and its unclear to me why we need this full struct when we could just return a short channel id.
We'll need the path for retry (to compute the amount), but I tend to agree it may be better to keep the path separate. It would at least be symmetric with the forthcoming event for a successful payment path.
As far as the scoring API goes, I had planned on giving a reference to the path and to the failing hop. But the latter could be the short channel id instead. Users could always scan the path if they want to find the matching hop, I suppose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I guess could go either way, but it does feel cleaner imo to just give the SCID instead of a reference to a path id - you're just going to go update the SCID in the graph anyway, right?
lightning/src/ln/onion_utils.rs
Outdated
failing_route_hop = if payment_failed { None } else { | ||
Some(FailingRouteHop { | ||
path: path.clone(), | ||
index: if is_from_final_node { route_hop_idx } else { route_hop_idx + 1 } as u16, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should at least make a tiny bit of effort to decide if the error code seems to be blaming the inbound or outbound channel, instead of always saying outbound for not-final-node and inbound for final-node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was primarily matching the logic used for the NetworkUpdate
above. Would we want to do something similar there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, we should maybe do something similar there, and also map some things that are currently channel failures to other types of failures. The code here is pretty old and admittedly a bit lazy. This comment is maybe on the wrong line, though - I guess the bigger question is later on in the non-PERM blocks. Maybe its ok to handle this stuff later and just land it as-is for now, as long as we get the API where we want.
Could use a rebase, not that we have the path itself now in PaymentPathFailed, just not the failing hop. |
02b4f09
to
6c23617
Compare
735414a
to
b6536a1
Compare
lightning/src/routing/scorer.rs
Outdated
pub struct Scorer<S: routing::Score> { | ||
scoring: S, | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why have this struct rather than just always using the generic inner object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I'm a bit confused why we have two structs here instead of just defining a new trait that requires both EventHandler
and routing::Score
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally did this, but then the user would have to implement EventHandler
on their own. Figure we can correctly implement the boilerplate for them and let them focus on implementing their scoring logic. One less thing for them to do and get wrong. Especially if the events need to be processed differently in the future.
There are other advantages to this approach. It lets us vary Scorer
's interface without users needing to update their code. Any "default" scoring could be pushed directly into Scorer
if desired. Plus, Scorer
doesn't require payment_path_failed
and payment_path_successful
, so those doesn't need to be part of the interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I guess as a user it just looks confusing to expose all the wrapper guts - I've got a score object and a scorer trait and I have to construct the scorer object with my score-implementing thing then use the score wrapper that does.... magic?
Does it make more sense to do the adapting between events and path-fail-succeed calls in a wrapper in the background processor? Then we'd still get the nice adapted interface but wouldn't have to expose an awkward wrapper to the world.
More generally, though, I think we're optimizing for the wrong thing here - the common user behavior will be one of (a) constant penalty, (b) our default that uses payment data, (c) data fetched from a server that does active probing.
I don't know how much demand there really is for using payment data with a user-provider scorer that we need to go through the effort to have a public event-score-thing wrapper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I guess as a user it just looks confusing to expose all the wrapper guts - I've got a score object and a scorer trait and I have to construct the scorer object with my score-implementing thing then use the score wrapper that does.... magic?
That's overcomplicating what's happening, IMHO. Our interfaces take Scorer
. Docs say it is parameterized by scoring logic. User either (a) decides they want custom logic and implement the parameterized trait or (b) use Scorer::default()
.
Does it make more sense to do the adapting between events and path-fail-succeed calls in a wrapper in the background processor? Then we'd still get the nice adapted interface but wouldn't have to expose an awkward wrapper to the world.
Hmmm... if anything given InvoicePayer
takes a Router
maybe it's best to push the event handling logic there? The Scorer
(or whatever it ends up being) will need to make its way there somehow. So then it seems a Router
implementation would need the scoring parameterization and path-fail-succeed functions, and InvoicePayer
could call the latter as needed.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's overcomplicating what's happening, IMHO.
Yea, that's probably right, I just wanted to highlight by being a bit facetious that we've seen some user confusion and we should probably lean hard towards "simpler" APIs where possible.
Hmmm... if anything given InvoicePayer takes a Router maybe it's best to push the event handling logic there?
I'd buy that. That feels right.
@@ -203,6 +203,8 @@ pub enum Event { | |||
all_paths_failed: bool, | |||
/// The payment path that failed. | |||
path: Vec<RouteHop>, | |||
/// The channel responsible for the failed payment path. | |||
short_channel_id: Option<u64>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, it seems like network_update
already effectively contains the short_channel_id
(s). Explicitly if NetworkUpdate::ChannelUpdateMsg
or ::ChannelClosed
, and implicitly if NodeFailure
(i.e. the relevant SCIDs would be all of that node's channels)
Possible to omit this field for those reasons?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like there are a few cases in test_onion_failure
that test None
for network_update
and Some
for short_channel_id
. Then again, maybe I'm setting it when not appropriate?
@TheBlueMatt For NodeFailure
, would negatively scoring all the nodes channels be appropriate here? We'd need to either include them all or have Scorer
have a reference to the NetworkGraph
so that it can call Score::payment_path_failed
for each channel. But that would be strange since the given path wouldn't necessarily contain the short_channel_id
. And there would be no simple way to improve the score for all the node's channels if a path containing the node is eventually successful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, CHannelUpdateMsg
is set when the failure-generating node sets the UPDATE
flag and we were able to parse a valid ChannelUpdate
from that peer. eg if the peer fails to return a ChannelUpdate
we definitely want to de-score the channel, cause they're returning garbage :).
For NodeFailure, would negatively scoring all the nodes channels be appropriate here?
I don't think there's a "right" answer for any of this. I don't have a gut feel for whether nodes will return, eg, temporary node failures that are per-channel or per-node, so for now lets keep it simple?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to do the scorer and get_route->get_route_and_payment_hash changes in a new PR, to better parallelize landing the first few commits?
lightning/src/routing/scorer.rs
Outdated
pub struct Scorer<S: routing::Score> { | ||
scoring: S, | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I'm a bit confused why we have two structs here instead of just defining a new trait that requires both EventHandler
and routing::Score
?
SGTM |
f050f64
to
44156ba
Compare
With channel scoring and payment retries, it is no longer necessary to have expiry_too_far imply a payment failure.
This simplifies failing route hop calculation, which will be useful for later identifying the failing hop for PaymentFailed events.
44156ba
to
d4c85b3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, one comment then I'm ACK, will need some test tweaks, I think.
lightning/src/ln/onion_utils.rs
Outdated
@@ -446,10 +452,14 @@ pub(super) fn process_onion_failure<T: secp256k1::Signing, L: Deref>(secp_ctx: & | |||
}); | |||
} | |||
|
|||
if short_channel_id.is_none() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, I don't think we want to set scid when payment_failed
as its things like "unknown payment hash" which shouldn't factor into scoring (when they come from the last-hop). However, we'll need to exempt the 18|19 cases from this which imply someone on the route screwed with the payment, probably the second-to-last-hop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, latest fixup should reflect this, IIUC. Think I'm getting a better feel for this code.
run_onion_failure_test("fee_insufficient", 0, &nodes, &route, &payment_hash, &payment_secret, |msg| { | ||
msg.amount_msat -= 1; | ||
}, || {}, true, Some(UPDATE|12), Some(NetworkUpdate::ChannelClosed { short_channel_id: channels[0].0.contents.short_channel_id, is_permanent: true})); | ||
}, || {}, true, Some(UPDATE|12), Some(NetworkUpdate::ChannelClosed { short_channel_id, is_permanent: true}), None); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd expect fee_insufficient
to include an SCID, especially if we're marking the channel as closed - a few things here:
- are we generating invalid fee_insufficient responses (ie not including the ChannelUpdate object)?
- whether we are or not, we probably should have a SCID to blame if we get a failure with UPDATE but no ChannelUpdate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd expect
fee_insufficient
to include an SCID, especially if we're marking the channel as closed - a few things here:
- are we generating invalid fee_insufficient responses (ie not including the ChannelUpdate object)?
Looks like we are here: https://github.com/rust-bitcoin/rust-lightning/blob/d66574803e08c65db0e5cec66a3ac7ba5a68bf4f/lightning/src/ln/channelmanager.rs#L1847
- whether we are or not, we probably should have a SCID to blame if we get a failure with UPDATE but no ChannelUpdate.
Updated accordingly.
@@ -362,7 +373,7 @@ fn test_onion_failure() { | |||
let session_priv = SecretKey::from_slice(&[3; 32]).unwrap(); | |||
let onion_keys = onion_utils::construct_onion_keys(&Secp256k1::new(), &route.paths[0], &session_priv).unwrap(); | |||
msg.reason = onion_utils::build_first_hop_failure_packet(&onion_keys[0].shared_secret[..], NODE|2, &[0;0]); | |||
}, ||{}, true, Some(NODE|2), Some(NetworkUpdate::NodeFailure{node_id: route.paths[0][0].pubkey, is_permanent: false})); | |||
}, ||{}, true, Some(NODE|2), Some(NetworkUpdate::NodeFailure{node_id: route.paths[0][0].pubkey, is_permanent: false}), None); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here - temporary_node_failure
definitely needs an SCID to blame, otherwise we'll probably retry the same path through a node we dont know how to talk to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right. That should have been an obvious one lol
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
otherwise we'll probably retry the same path through a node we dont know how to talk to.
Why does this apply to temporary node failures but not e.g. the BADONION permanent ones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It definitely applies to BADONION as well! Good catch. It should apply any time the payment is retryable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sg, FYI I interpret retryable == "not rejected by dest"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, we may need to debate changing that name, but the docs basically say "you can retry unless this is set"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added handling of BADONION
but couldn't figure out a way to test this. 😕
Will update the docs but I'm afraid the logic is complicated enough that I'm not certain short_channel_id
is always set when process_onion_failure
returns that it is retryable. Seems there are a few test cases where it is true but we expect None
, in fact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not too familiar with this code area but it's looking pretty sane! The redundant short_channel_id
field makes me uncomfy but I can live with it for expediency :P
} else { | ||
Some(NetworkUpdate::ChannelUpdateMessage { | ||
if failing_route_hop.short_channel_id == chan_update.contents.short_channel_id { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a comment here could be useful, to know the reasoning for when to set short_channel_id
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
18|19 => Some(route_hop.short_channel_id), | ||
_ => None, | ||
}; | ||
} else { | ||
// We can't understand their error messages and they failed to | ||
// forward...they probably can't understand our forwards so its | ||
// really not worth trying any further. | ||
network_update = Some(NetworkUpdate::NodeFailure { | ||
node_id: route_hop.pubkey, | ||
is_permanent: true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does seem odd to generate a permanent node failure without banning all of that node's channels. Should there be a todo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I think of it, processing this NetworkUpdate
should result in the node and all its channels being removed from NetworkGraph
when the event is handled by NetGraphMsgHandler
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I'm mistaken here. This happens through NetworkGraph
's interface but our code for processing NodeFailure
just has a TODO already.
6553205
to
38a928f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed the onion tests in more detail, seem to make sense. Two small questions then I'm ACK 💯
Appreciate the cleanups in process_onion_failure
! Readability 🚀
let (_, payment_hash, payment_secret) = get_payment_preimage_hash!(nodes[2]); | ||
|
||
run_onion_failure_test("invalid_onion_version", 0, &nodes, &route, &payment_hash, &payment_secret, |msg| { msg.onion_routing_packet.version = 1; }, ||{}, true, | ||
Some(BADONION|PERM|4), None); | ||
Some(BADONION|PERM|4), None, None); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, these tests seem to have retryable
but no update, so we'd always retry the same path in a loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops! Jeff pointed out these errors are coming from our immediate peer, not a few-hops away one. Ideally we'd have tests for both cases, but these specific cases should not include an SCID I think - punishing our own channel isn't really helpful, we should be closing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment here as discussed offline.
b301aae
to
cfb8030
Compare
@@ -2985,6 +2987,7 @@ impl<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref, L: Deref> ChannelMana | |||
network_update: None, | |||
all_paths_failed, | |||
path: path.clone(), | |||
short_channel_id: Some(path.first().unwrap().short_channel_id), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really a huge fan of "punishing" our own channels - on one hand its nice that the payment goes through, but on the other hand it seems like we're addressing the problem in the wrong way - we have a peer who isn't able to read our HTLC-add messages, we shouldn't just stop sending payments their way, we should close the channel with them and disconnect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Punishment only occurs during event handling, so this is just signaling which is the responsible channel. The event handler could check if short_channel_id
is the same as the first hop in the path and act accordingly. For instance, InvoicePayer
could not pass the hop to the scorer in this case. And another event handler (perhaps in BackgroundProcessor
?) could close the channel. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would make sense, I suppose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could make it an enum
OurChannel(u64)
NetworkChannel(u64)
NotRetryable
Just some inspo, I'm fine with it as is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with this either way, but do want to discuss punishing our own channels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with this either way, but do want to discuss punishing our own channels.
run_onion_failure_test("fee_insufficient", 0, &nodes, &route, &payment_hash, &payment_secret, |msg| { | ||
msg.amount_msat -= 1; | ||
}, || {}, true, Some(UPDATE|12), Some(NetworkUpdate::ChannelClosed { short_channel_id: channels[0].0.contents.short_channel_id, is_permanent: true})); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see this was already here but why would UPDATE|12
result in NetworkUpdate::ChannelClosed
instead of NetworkUpdate::ChannelUpdate
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh, I had to go look up the same cause it was confusing - its because the update we got back shouldn't have resulted in a payment failure. They're telling us the fee they charged was, in fact, enough for the payment, implying the previous-hop node malleated the payment before forwarding, taking funds they weren't supposed to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. @jkczyz think we could add a comment for this one since we're here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, this is the case where fee_insufficient
is returned but shouldn't have been based on our calculation using the data given to us in the update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, didn't notice last two comments before replying. I'll add a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, I think after these comments + one outstanding I'm happy
@@ -333,9 +343,10 @@ fn test_onion_failure() { | |||
// describing a length-1 TLV payload, which is obviously bogus. | |||
new_payloads[0].data[0] = 1; | |||
msg.onion_routing_packet = onion_utils::construct_onion_packet_bogus_hopdata(new_payloads, onion_keys, [0; 32], &payment_hash); | |||
}, ||{}, true, Some(PERM|22), Some(NetworkUpdate::ChannelClosed{short_channel_id: channels[1].0.contents.short_channel_id, is_permanent: true})); | |||
}, ||{}, true, Some(PERM|22), Some(NetworkUpdate::ChannelClosed{short_channel_id, is_permanent: true}), Some(short_channel_id)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also already here but I think NetworkUpdate::ChannelClosed
needs a rename or a docs update at some point if it's used in situations like these, given no channel was closed iiuc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. ChannelFailure
would be a better name. We either disable or remove it from the NetworkGraph
depending on is_permanent
, which I guess affects gossip. But, indeed, the channel may not actually be closed.
let (_, payment_hash, payment_secret) = get_payment_preimage_hash!(nodes[2]); | ||
|
||
// Our immediate peer sent UpdateFailMalformedHTLC because it couldn't understand the onion in | ||
// the UpdateAddHTLC that we sent. Thus, short_channel_id is None because there's no sense |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment needs updating for Some short_channel_id
Could also add a note for why NetworkUpdate is None (is it the same reason, i think?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... @TheBlueMatt for these test cases presumably we don't give a NetworkUpdate
since it's our immediate peer. But for the case of BADONION
inside process_onion_failure
, should we return NetworkUpdate::ChannelClosed
? The latter case is currently untested as noted earlier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, this corresponds to the TODO at https://github.com/rust-bitcoin/rust-lightning/blob/main/lightning/src/ln/channelmanager.rs#L3099 noting that updating the network graph won't do anything. For BADONION from a later node, I'd agree a ChannelClosed
makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Come to think of it, since all BADONION
failures are also PERM
then we could probably drop the BADONION
handling code in process_onion_failure
and let the PERM
handling code take care of this.
let (_, payment_hash, payment_secret) = get_payment_preimage_hash!(nodes[2]); | ||
|
||
run_onion_failure_test("final_expiry_too_soon", 1, &nodes, &route, &payment_hash, &payment_secret, |msg| { | ||
let height = msg.cltv_expiry - CLTV_CLAIM_BUFFER - LATENCY_GRACE_PERIOD_BLOCKS + 1; | ||
connect_blocks(&nodes[0], height - nodes[0].best_block_info().1); | ||
connect_blocks(&nodes[1], height - nodes[1].best_block_info().1); | ||
connect_blocks(&nodes[2], height - nodes[2].best_block_info().1); | ||
}, || {}, true, Some(17), None); | ||
}, || {}, true, Some(17), None, None); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final_expiry_to_soon
case should be retryable/have a Some
scid IIUC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, yea, I think we should have a Some
here. Specifically, lightning/bolts#922 may cause a node to bounce our payment because of simple incompatibility, but we should retry the payment via another path. This means we should punish the node, but sadly not because they're actually misbehaving, just cause lightning sucks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is using test case 1
which indicates the final node failed, so it is not retryable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is why our definition of retryable
/rejected_by_dest
needs a little bit of nuance/description since that still sounds retryable to me, even though it was the final node that made that call. Fine for now, though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC we already give a buffer of a block or two, so its possible we hit that because some intermediary node delayed the payment by two blocks, but short of that the recipient is rejecting the payment for a bogus reason. Agree all of this is super subtle and some things can have multiple meanings. Can't solve that here, though :/
run_onion_failure_test("fee_insufficient", 0, &nodes, &route, &payment_hash, &payment_secret, |msg| { | ||
msg.amount_msat -= 1; | ||
}, || {}, true, Some(UPDATE|12), Some(NetworkUpdate::ChannelClosed { short_channel_id: channels[0].0.contents.short_channel_id, is_permanent: true})); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. @jkczyz think we could add a comment for this one since we're here?
Created #1117 to track any followup stuff. |
Ok, I think I addressed the remaining comments, but let me know if I missed something. Lots of moving parts. 😬 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fielding all this feedback! LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs squash.
This will be useful for scoring channels when a payment fails.
32c89cc
to
202acd9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One issue, but lets look at it in a followup, we need to move forward here.
Update
process_onion_failure
to return a short channel id of the responsible channel and include it inPaymentPathFailed
events. This will allow for scoring channels based on past performance when finding the best payments paths inget_route
.