BOLT 04: Add failure code for invalid payload. #627

t-bast · 2019-07-08T09:51:58Z

The specification currently doesn't specify the case where the onion per-hop
payload can't be correctly decoded.

This is somewhat fine with the fixed frames because every field of the payload
can always be interpreted as a numeric value from the input bytes, so it leads
to application errors in upper layers when those values are actually
interpreted (and we realize that for instance we have an invalid
short_channel_id value).

With variable-length tlv streams in the onion payloads, we will encounter
decoding errors (duplicate tlv types, invalid ordering, etc) and the spec
should define the failure code to use in that case.

t-bast · 2019-07-08T09:53:20Z

04-onion-routing.md

@@ -724,6 +724,12 @@ General permanent failure of the processing node.

 The processing node has a required feature which was not in this onion.

+1. type: BADONION|PERM (`invalid_onion_payload`)


I treat this as the default BADONION error, but maybe it would make more sense to assign it a specific bit (e.g. BADONION|PERM|1 or BADONION|PERM|7).
Let me know what you think would be best.

t-bast · 2019-07-12T12:21:27Z

I'm curious to get your feedback on this @cdecker @Roasbeef.
It feels to me that we really need to define this failure code otherwise our implementations will use different errors for that scenario.

rustyrussell · 2019-07-22T20:37:59Z

Hmm, we used to have a scheme where the offset of the error was in the error message, but I can't find it so maybe it was in my imagination! Would be nice to have, to offer a debugging hint as to what field you disliked.

This shouldn't be BADONION I think; that's for when we literally can't make a reply due to parsing errors. This is a normal error we can generate ourselves, so that bit probably shouldn't be used.

t-bast · 2019-07-23T08:13:39Z

This shouldn't be BADONION I think

Got it, it makes sense to reserve BADONION for cryptographic failures and general onion support (onion version check).

Then what about using PERM | 22 (invalid_onion_payload)? I don't think it really makes sense to include the onion's sha256 since it wasn't a BADONION error.

Should I include some data in the error (should we define an enum for TLV error codes, containing things like UnknownEvenType, DuplicateType, InvalidOrdering, etc)?

We should note as well that another reason why this could fail is that onion payload might be missing some required information (amount, expiry or channel_id). We can't really provide a byte offset of the error in that case so I don't know if it's worth adding.

t-bast · 2019-07-31T07:24:36Z

Gentle ping @cfromknecht @rustyrussell, I'm curious to know what error code your implementation uses in that case? It would be nice to align before merging variable-onion to implementations' master branches.

joostjager · 2019-08-05T14:11:14Z

@t-bast is it right that this new message would be just for debugging? The downside of another message is that it may increase the node fingerprinting surface. Especially if the contents of it is not very strict. It may make it easier to probe the network and find out what implementations and (potentially vulnerable) versions are running.

t-bast · 2019-08-05T14:19:17Z

I thought it could be useful for debugging (especially at the beginning since we're introducing both tlv and variable-length onion to the network).
However, if you think this doesn't provide enough benefits, I can retract this and simply send a PERM error without any information (which is good to avoid finger-printing).

The specification currently doesn't specify the case where the onion per-hop payload can't be correctly decoded. This is somewhat fine with the fixed frames because every field of the payload can always be interpreted as a numeric value from the input bytes, so it leads to application errors in upper layers when those values are actually interpreted (and we realize that for instance we have an invalid short_channel_id` value). With variable-length tlv streams in the onion payloads, we will encounter decoding errors (duplicate tlv types, invalid ordering, etc) and the spec should define the failure code to use in that case.

joostjager · 2019-08-22T10:15:24Z

One comment along a different axis:

Do we want to move to tlv for the failure message? If yes, we could now define the last failure code we ever need. Its value would be a tlv stream. Any new failure codes like the one proposed in this pr can then be defined as tlv types in that stream.

This also opens up the possibility to send back types from the custom range. It could be an option to use those types for (temporary) debugging if needed.

It addition to that, it could carry (custom) application-specific failures, like for example "unknown deposit address" (to be used in tandem with a custom record in UpdateAddHtlc).

t-bast · 2019-08-22T11:13:32Z

Do we want to move to tlv for the failure message?

Good feedback, I considered this too.
This is a bigger change though and can be done in many ways (and I expect a lot of debate about it) which is why I chose to do a smaller and simpler change to address a short-term whole in the spec.

However, I could use a TLV stream in the data part of invalid_onion_payload, that could make sense and pave the way for the future.

joostjager · 2019-08-22T11:31:37Z

If you then embed into the stream a 0-length type that signals invalid_onion_payload, we only have to rename it at a later stage to fail_generic :)

t-bast · 2019-09-03T06:53:55Z

Merging as decided in the IRC meeting: http://www.erisian.com.au/meetbot/lightning-dev/2019/lightning-dev.2019-09-02-20.06.log.html
Exploring the use of TLV in onion failure messages is deferred to a later PR.

cfromknecht · 2019-09-04T21:37:26Z

04-onion-routing.md

+
+The decrypted onion per-hop payload was not understood by the processing node
+or is incomplete. If the failure can be narrowed down to a specific tlv type in
+the payload, the erring node may include that `type` and its byte `offset` in


i missed the last spec meeting, so forgive me if i'm missing something, but atm this seems underspecified.

if we couldn't parse a record's type, what should type be? all possible values of varint correspond to valid types, so how can we distinguish this case?

also, in most cases i would think that offset is redundant information, as if the sender knows the invalid type they can inspect the payload they sent to figure out where the error is.

the main thing tho, is that it doesn't seem clear from this tho how the sender should choose between interpreting the offset or the type.

If the failure can be narrowed down to a specific tlv type

So if you can't parse a record's type, just leave the default value here (0).
That's where the offset might be useful: you can indicate the starting bit where your parser failed to even extract a valid type.

I don't think sender should dynamically interpret this and take an action based on this: this error code indicates a bug in one of the nodes in the route. This is only meant to be logged for further investigation.

Does that make sense?

Just my two cents since I wasn't in attendance.

There are five classes of failures I can think of:

stream not canonical

unknown required type

unable to decode field (wrong encoding by sender or receiver)

field was omitted or included improperly

unable to parse type

IMO 1 should never happen, we have unit tests and test vectors to ensure this, plus it would be easy to decode your own payload and verify that it's not canonical. Either way, we only need the type to figure out where we went wrong.

For 2, 3, and 4 we already know the type, and that should be enough for the sender to narrow down where the error occurred.

If you squint tho, 5 is really just a variant of 3. The reason a type would fail to parse is likely due to the preceding record being read short/long. The only reason it wouldn't is if our varint encoding is messed up, which also shouldn't happen since we have comprehensive test vectors for BigSize as well.

Is it not sufficient to send the previously parsed type (or 0 if it occurs on the first type)? The sender can look for the next type in the stream >= to that value that is where the failure occurred. This works even if the type being parsed is 0 or 1.

I don't see much value being added by the index, given that we can identify the record in all of these cases just from the type and the payload that the sender included in the onion. If we are sorely missing it, we can always extend the message later and include it.

That being said, I think we could better specify as in the above which types to send back under which failure conditions. Is that something people are interested in? It'd be nice to have some sort of agreement as what the receiver should place in event of each failure, otherwise everyone needs to be aware of the nuisances of how other implementations pass back these debug values. Just adding them for the sake of logging isn't compelling as they're likely to get lost, however we could have actionable responses if we have some standardization, e.g. try to locate a newer node announcement or explain to the user that the node might not support a certain feature.

On a higher level, I am somewhat concerned in general about including the type at all, as it will allow people to fingerprint the set of required types a node supports assuming they themselves are properly encoding their payloads. An empty failure message would be preferable imo until the privacy implications have been considered more thoroughly.

The consensus was honestly to not over-engineer this since the contents (type and offset are only optionally included in the error). This error shouldn't come up at all between valid implementations.
I agree that if your node supports required types that aren't in the spec that may allow probing that your node runs a specific version of a specific implementation. But feature bits likely let you do that probing as well so you're not learning anything new!

cfromknecht · 2019-09-05T17:49:41Z

04-onion-routing.md

@@ -875,6 +875,16 @@ The channel from the processing node has been disabled.

 The CLTV expiry in the HTLC is too far in the future.

+1. type: PERM|22 (`invalid_onion_payload`)
+2. data:
+   * [`varint`:`type`]


this is the first place in the wire protocol where we use this BigSize, is that intentional/what we want?

Hum yeah that's intentional since this is supposed to be the tlv type, can you detail why that wouldn't be what we want?

cfromknecht · 2019-09-05T19:37:41Z

04-onion-routing.md

+
+The decrypted onion per-hop payload was not understood by the processing node
+or is incomplete. If the failure can be narrowed down to a specific tlv type in
+the payload, the erring node may include that `type` and its byte `offset` in


Just my two cents since I wasn't in attendance.

There are five classes of failures I can think of:

stream not canonical

unknown required type

unable to decode field (wrong encoding by sender or receiver)

field was omitted or included improperly

unable to parse type

IMO 1 should never happen, we have unit tests and test vectors to ensure this, plus it would be easy to decode your own payload and verify that it's not canonical. Either way, we only need the type to figure out where we went wrong.

For 2, 3, and 4 we already know the type, and that should be enough for the sender to narrow down where the error occurred.

If you squint tho, 5 is really just a variant of 3. The reason a type would fail to parse is likely due to the preceding record being read short/long. The only reason it wouldn't is if our varint encoding is messed up, which also shouldn't happen since we have comprehensive test vectors for BigSize as well.

Is it not sufficient to send the previously parsed type (or 0 if it occurs on the first type)? The sender can look for the next type in the stream >= to that value that is where the failure occurred. This works even if the type being parsed is 0 or 1.

I don't see much value being added by the index, given that we can identify the record in all of these cases just from the type and the payload that the sender included in the onion. If we are sorely missing it, we can always extend the message later and include it.

That being said, I think we could better specify as in the above which types to send back under which failure conditions. Is that something people are interested in? It'd be nice to have some sort of agreement as what the receiver should place in event of each failure, otherwise everyone needs to be aware of the nuisances of how other implementations pass back these debug values. Just adding them for the sake of logging isn't compelling as they're likely to get lost, however we could have actionable responses if we have some standardization, e.g. try to locate a newer node announcement or explain to the user that the node might not support a certain feature.

On a higher level, I am somewhat concerned in general about including the type at all, as it will allow people to fingerprint the set of required types a node supports assuming they themselves are properly encoding their payloads. An empty failure message would be preferable imo until the privacy implications have been considered more thoroughly.

t-bast requested a review from cdecker July 8, 2019 09:51

t-bast commented Jul 8, 2019

View reviewed changes

t-bast changed the title ~~Bolt04: Add failure code for invalid payload.~~ BOLT 04: Add failure code for invalid payload. Jul 8, 2019

t-bast force-pushed the b4-malformed-onion-failure-message branch from 490bfd2 to 9f8d44e Compare July 9, 2019 07:34

t-bast added the Meeting Discussion Raise at next meeting label Jul 12, 2019

niftynei removed the Meeting Discussion Raise at next meeting label Jul 22, 2019

t-bast mentioned this pull request Jul 23, 2019

Activate support for variable-length onion ACINQ/eclair#1087

Merged

t-bast added the Meeting Discussion Raise at next meeting label Aug 5, 2019

cfromknecht mentioned this pull request Aug 21, 2019

multi: update the switch+router to be aware of the new TLV EOB format lightningnetwork/lnd#3362

Merged

3 tasks

t-bast added 2 commits August 22, 2019 10:55

Add error data to the invalid_onion_payload error

60c88fa

t-bast force-pushed the b4-malformed-onion-failure-message branch from e8498b1 to 60c88fa Compare August 22, 2019 09:21

t-bast merged commit db92932 into master Sep 3, 2019

t-bast deleted the b4-malformed-onion-failure-message branch September 3, 2019 06:54

cfromknecht reviewed Sep 4, 2019

View reviewed changes

cfromknecht reviewed Sep 5, 2019

View reviewed changes

lightning-developer mentioned this pull request Dec 5, 2021

Add payment metadata to payment request (feature 48) #912

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BOLT 04: Add failure code for invalid payload. #627

BOLT 04: Add failure code for invalid payload. #627

t-bast commented Jul 8, 2019

t-bast Jul 8, 2019

t-bast commented Jul 12, 2019

rustyrussell commented Jul 22, 2019

t-bast commented Jul 23, 2019 •

edited

Loading

t-bast commented Jul 31, 2019

joostjager commented Aug 5, 2019

t-bast commented Aug 5, 2019

joostjager commented Aug 22, 2019

t-bast commented Aug 22, 2019

joostjager commented Aug 22, 2019

t-bast commented Sep 3, 2019

cfromknecht Sep 4, 2019 •

edited

Loading

t-bast Sep 5, 2019

cfromknecht Sep 5, 2019

t-bast Sep 6, 2019

cfromknecht Sep 5, 2019

t-bast Sep 6, 2019

cfromknecht Sep 5, 2019

		@@ -724,6 +724,12 @@ General permanent failure of the processing node.

		The processing node has a required feature which was not in this onion.

		1. type: BADONION\|PERM (`invalid_onion_payload`)

BOLT 04: Add failure code for invalid payload. #627

BOLT 04: Add failure code for invalid payload. #627

Conversation

t-bast commented Jul 8, 2019

t-bast Jul 8, 2019

Choose a reason for hiding this comment

t-bast commented Jul 12, 2019

rustyrussell commented Jul 22, 2019

t-bast commented Jul 23, 2019 • edited Loading

t-bast commented Jul 31, 2019

joostjager commented Aug 5, 2019

t-bast commented Aug 5, 2019

joostjager commented Aug 22, 2019

t-bast commented Aug 22, 2019

joostjager commented Aug 22, 2019

t-bast commented Sep 3, 2019

cfromknecht Sep 4, 2019 • edited Loading

Choose a reason for hiding this comment

t-bast Sep 5, 2019

Choose a reason for hiding this comment

cfromknecht Sep 5, 2019

Choose a reason for hiding this comment

t-bast Sep 6, 2019

Choose a reason for hiding this comment

cfromknecht Sep 5, 2019

Choose a reason for hiding this comment

t-bast Sep 6, 2019

Choose a reason for hiding this comment

cfromknecht Sep 5, 2019

Choose a reason for hiding this comment

t-bast commented Jul 23, 2019 •

edited

Loading

cfromknecht Sep 4, 2019 •

edited

Loading