-
Notifications
You must be signed in to change notification settings - Fork 492
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BOLT 04: Add failure code for invalid payload. #627
Conversation
04-onion-routing.md
Outdated
@@ -724,6 +724,12 @@ General permanent failure of the processing node. | |||
|
|||
The processing node has a required feature which was not in this onion. | |||
|
|||
1. type: BADONION|PERM (`invalid_onion_payload`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I treat this as the default BADONION
error, but maybe it would make more sense to assign it a specific bit (e.g. BADONION|PERM|1
or BADONION|PERM|7
).
Let me know what you think would be best.
490bfd2
to
9f8d44e
Compare
Hmm, we used to have a scheme where the offset of the error was in the error message, but I can't find it so maybe it was in my imagination! Would be nice to have, to offer a debugging hint as to what field you disliked. This shouldn't be BADONION I think; that's for when we literally can't make a reply due to parsing errors. This is a normal error we can generate ourselves, so that bit probably shouldn't be used. |
Got it, it makes sense to reserve BADONION for cryptographic failures and general onion support (onion version check). Then what about using Should I include some data in the error (should we define an enum for TLV error codes, containing things like We should note as well that another reason why this could fail is that onion payload might be missing some required information (amount, expiry or channel_id). We can't really provide a byte offset of the error in that case so I don't know if it's worth adding. |
Gentle ping @cfromknecht @rustyrussell, I'm curious to know what error code your implementation uses in that case? It would be nice to align before merging variable-onion to implementations' master branches. |
@t-bast is it right that this new message would be just for debugging? The downside of another message is that it may increase the node fingerprinting surface. Especially if the contents of it is not very strict. It may make it easier to probe the network and find out what implementations and (potentially vulnerable) versions are running. |
I thought it could be useful for debugging (especially at the beginning since we're introducing both tlv and variable-length onion to the network). |
The specification currently doesn't specify the case where the onion per-hop payload can't be correctly decoded. This is somewhat fine with the fixed frames because every field of the payload can always be interpreted as a numeric value from the input bytes, so it leads to application errors in upper layers when those values are actually interpreted (and we realize that for instance we have an invalid short_channel_id` value). With variable-length tlv streams in the onion payloads, we will encounter decoding errors (duplicate tlv types, invalid ordering, etc) and the spec should define the failure code to use in that case.
e8498b1
to
60c88fa
Compare
One comment along a different axis: Do we want to move to tlv for the failure message? If yes, we could now define the last failure code we ever need. Its value would be a tlv stream. Any new failure codes like the one proposed in this pr can then be defined as tlv types in that stream. This also opens up the possibility to send back types from the custom range. It could be an option to use those types for (temporary) debugging if needed. It addition to that, it could carry (custom) application-specific failures, like for example "unknown deposit address" (to be used in tandem with a custom record in |
Good feedback, I considered this too. However, I could use a TLV stream in the |
If you then embed into the stream a 0-length type that signals |
Merging as decided in the IRC meeting: http://www.erisian.com.au/meetbot/lightning-dev/2019/lightning-dev.2019-09-02-20.06.log.html |
|
||
The decrypted onion per-hop payload was not understood by the processing node | ||
or is incomplete. If the failure can be narrowed down to a specific tlv type in | ||
the payload, the erring node may include that `type` and its byte `offset` in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i missed the last spec meeting, so forgive me if i'm missing something, but atm this seems underspecified.
if we couldn't parse a record's type, what should type
be? all possible values of varint
correspond to valid types, so how can we distinguish this case?
also, in most cases i would think that offset
is redundant information, as if the sender knows the invalid type they can inspect the payload they sent to figure out where the error is.
the main thing tho, is that it doesn't seem clear from this tho how the sender should choose between interpreting the offset
or the type
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the failure can be narrowed down to a specific tlv type
So if you can't parse a record's type, just leave the default value here (0).
That's where the offset might be useful: you can indicate the starting bit where your parser failed to even extract a valid type.
I don't think sender should dynamically interpret this and take an action based on this: this error code indicates a bug in one of the nodes in the route. This is only meant to be logged for further investigation.
Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just my two cents since I wasn't in attendance.
There are five classes of failures I can think of:
- stream not canonical
- unknown required type
- unable to decode field (wrong encoding by sender or receiver)
- field was omitted or included improperly
- unable to parse type
IMO 1 should never happen, we have unit tests and test vectors to ensure this, plus it would be easy to decode your own payload and verify that it's not canonical. Either way, we only need the type to figure out where we went wrong.
For 2, 3, and 4 we already know the type, and that should be enough for the sender to narrow down where the error occurred.
If you squint tho, 5 is really just a variant of 3. The reason a type would fail to parse is likely due to the preceding record being read short/long. The only reason it wouldn't is if our varint encoding is messed up, which also shouldn't happen since we have comprehensive test vectors for BigSize as well.
Is it not sufficient to send the previously parsed type (or 0 if it occurs on the first type)? The sender can look for the next type in the stream >= to that value that is where the failure occurred. This works even if the type being parsed is 0 or 1.
I don't see much value being added by the index, given that we can identify the record in all of these cases just from the type and the payload that the sender included in the onion. If we are sorely missing it, we can always extend the message later and include it.
That being said, I think we could better specify as in the above which types to send back under which failure conditions. Is that something people are interested in? It'd be nice to have some sort of agreement as what the receiver should place in event of each failure, otherwise everyone needs to be aware of the nuisances of how other implementations pass back these debug values. Just adding them for the sake of logging isn't compelling as they're likely to get lost, however we could have actionable responses if we have some standardization, e.g. try to locate a newer node announcement or explain to the user that the node might not support a certain feature.
On a higher level, I am somewhat concerned in general about including the type at all, as it will allow people to fingerprint the set of required types a node supports assuming they themselves are properly encoding their payloads. An empty failure message would be preferable imo until the privacy implications have been considered more thoroughly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The consensus was honestly to not over-engineer this since the contents (type
and offset
are only optionally included in the error). This error shouldn't come up at all between valid implementations.
I agree that if your node supports required types that aren't in the spec that may allow probing that your node runs a specific version of a specific implementation. But feature bits likely let you do that probing as well so you're not learning anything new!
@@ -875,6 +875,16 @@ The channel from the processing node has been disabled. | |||
|
|||
The CLTV expiry in the HTLC is too far in the future. | |||
|
|||
1. type: PERM|22 (`invalid_onion_payload`) | |||
2. data: | |||
* [`varint`:`type`] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the first place in the wire protocol where we use this BigSize, is that intentional/what we want?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hum yeah that's intentional since this is supposed to be the tlv type, can you detail why that wouldn't be what we want?
|
||
The decrypted onion per-hop payload was not understood by the processing node | ||
or is incomplete. If the failure can be narrowed down to a specific tlv type in | ||
the payload, the erring node may include that `type` and its byte `offset` in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just my two cents since I wasn't in attendance.
There are five classes of failures I can think of:
- stream not canonical
- unknown required type
- unable to decode field (wrong encoding by sender or receiver)
- field was omitted or included improperly
- unable to parse type
IMO 1 should never happen, we have unit tests and test vectors to ensure this, plus it would be easy to decode your own payload and verify that it's not canonical. Either way, we only need the type to figure out where we went wrong.
For 2, 3, and 4 we already know the type, and that should be enough for the sender to narrow down where the error occurred.
If you squint tho, 5 is really just a variant of 3. The reason a type would fail to parse is likely due to the preceding record being read short/long. The only reason it wouldn't is if our varint encoding is messed up, which also shouldn't happen since we have comprehensive test vectors for BigSize as well.
Is it not sufficient to send the previously parsed type (or 0 if it occurs on the first type)? The sender can look for the next type in the stream >= to that value that is where the failure occurred. This works even if the type being parsed is 0 or 1.
I don't see much value being added by the index, given that we can identify the record in all of these cases just from the type and the payload that the sender included in the onion. If we are sorely missing it, we can always extend the message later and include it.
That being said, I think we could better specify as in the above which types to send back under which failure conditions. Is that something people are interested in? It'd be nice to have some sort of agreement as what the receiver should place in event of each failure, otherwise everyone needs to be aware of the nuisances of how other implementations pass back these debug values. Just adding them for the sake of logging isn't compelling as they're likely to get lost, however we could have actionable responses if we have some standardization, e.g. try to locate a newer node announcement or explain to the user that the node might not support a certain feature.
On a higher level, I am somewhat concerned in general about including the type at all, as it will allow people to fingerprint the set of required types a node supports assuming they themselves are properly encoding their payloads. An empty failure message would be preferable imo until the privacy implications have been considered more thoroughly.
The specification currently doesn't specify the case where the onion per-hop
payload can't be correctly decoded.
This is somewhat fine with the fixed frames because every field of the payload
can always be interpreted as a numeric value from the input bytes, so it leads
to application errors in upper layers when those values are actually
interpreted (and we realize that for instance we have an invalid
short_channel_id
value).With variable-length tlv streams in the onion payloads, we will encounter
decoding errors (duplicate tlv types, invalid ordering, etc) and the spec
should define the failure code to use in that case.