Add guidance on handling of PTO #467

Yanmei-Liu · 2024-11-11T13:21:00Z

To fix issue #457.
We could add some implementation guidance, but how aggressive it should be depends on the implementation (per the discussion at IETF 121).

gloinul · 2024-11-11T15:30:47Z

draft-ietf-quic-multipath.md

+An implementation should follow the mechanism specified in {{QUIC-RECOVERY}}
+for detecting packet loss on each individual path.
+When an endpoint transmits a significant number of packets on a specific path,
+and the path turned into a blackhole while acknowledgements can not be received from the path,


I think the part after "into a blackhole" is a bit strange. Acknowledgements can be missing for two reasons: either ACKs sent on this path can't be received, or, equally relevant, no ACKs are sent because no forward packets on this path reach the endpoint. Thus I suggest:

When an endpoint transmits a significant number of packets on a specific path,
and the path turned into a blackhole, so that either no ACKs are sent because no
packets are received, or no ACKs sent on this path arrive at the packet sender,
then the packet sender's probe timeout (PTO) will trigger following {{QUIC-RECOVERY}}.
However, no packets will be declared lost until the packet sender receives an ACK
for this path. To utilise the advantages of the multipath extension, when endpoints detect
that one of the paths has turned into a blackhole, endpoints could choose to
retransmit on other available paths if the congestion control window allows.

huitema · 2024-11-12 (edited)

I think "black hole" is jargon. How about:

An implementation should follow the mechanism specified in {{QUIC-RECOVERY}}
for detecting packet loss on each individual path. A special case happens when
the PTO timer expires. According to {{QUIC-RECOVERY}}, no packet will be declared
lost until either the packet sender receives a new ACK for this path, or the path itself is finally declared
broken. This cautious process minimizes the risk of spurious retransmissions,
but it may cause significant delivery delay for the frames contained in these "lost packets".

Endpoints could take advantage of the multipath extension, and retransmit the content
of the delayed packets on other available paths if the congestion control window on these
paths allows.
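
A minimal sketch of the behavior described in the proposed text above, assuming a toy model in which each path tracks its own congestion window and its own unacknowledged packets. `Path`, `SentPacket`, and `on_pto_expired` are hypothetical names used for illustration; they come neither from the draft nor from any QUIC implementation, and the first-fit choice of target path is also an assumption. The point it shows is that the frames are copied to another path while the original packets stay unacknowledged, and not declared lost, on the stalled path.

```python
from dataclasses import dataclass, field


@dataclass
class SentPacket:
    number: int
    size: int      # bytes counted against the congestion window
    frames: list   # retransmittable frames (hypothetical representation)


@dataclass
class Path:
    path_id: int
    cwnd: int                                    # congestion window, in bytes
    bytes_in_flight: int = 0
    unacked: dict = field(default_factory=dict)  # packet number -> SentPacket
    next_pn: int = 0                             # per-path packet number space

    def can_send(self, size):
        return self.bytes_in_flight + size <= self.cwnd

    def send(self, frames, size):
        pkt = SentPacket(self.next_pn, size, list(frames))
        self.unacked[pkt.number] = pkt
        self.bytes_in_flight += size
        self.next_pn += 1
        return pkt


def on_pto_expired(stalled, others):
    """Copy the frames of unacked packets from `stalled` onto other paths.

    The original packets remain in `stalled.unacked`: they are NOT declared
    lost here, so per-path loss detection is left untouched.
    Returns a list of (original packet number, path_id used).
    """
    resent = []
    for pn in sorted(stalled.unacked):
        pkt = stalled.unacked[pn]
        # Assumption: pick the first path whose congestion window has room.
        target = next((p for p in others if p.can_send(pkt.size)), None)
        if target is None:
            continue  # this packet does not fit on any other path right now
        target.send(pkt.frames, pkt.size)
        resent.append((pn, target.path_id))
    return resent
```

How soon after the PTO such a function would be called is left open here, since that is an implementation tradeoff.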

gloinul · 2024-11-13

This proposal works for me.

mirjak · 2024-11-12T10:32:13Z

Thanks @huitema, your proposed text looks good. I would suggest not only saying "retransmit on another path" but saying "retransmit on another path earlier, even before the originally used path is abandoned" or something similar. Do we even want to recommend or hint when to retransmit?

Also, do we need to add some normative language in the connection closure section saying that all packets need to be declared lost after a path is abandoned?

huitema · 2024-11-13T08:46:08Z

We already have this text in Path Close:

...However, knowledge of the connection identifiers received from the peer and of the state of the number space associated to the path SHOULD be retained while packets from the peer might still be in transit, i.e., for a delay of 3 PTO after the PATH_ABANDON frame has been received from the peer, both to avoid generating spurious stateless packets as specified in {{spurious-stateless-reset}} and to be able to acknowledge the last packets received from the peer as specified in {{ack-after-abandon}}.

After receiving or sending a PATH_ABANDON frame, the endpoints SHOULD promptly send PATH_ACK frames to acknowledge all packets received on the path and not yet acknowledged, as specified in {{ack-after-abandon}}. When an endpoint finally deletes all resource associated with the path, the packets sent over the path and not yet acknowledged MUST be considered lost.

I think that's pretty clear, no need to add more details on when packets are considered lost.
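
As a rough illustration of the quoted Path Close text, the sketch below retains path state for 3 PTO after a PATH_ABANDON frame is received and, when the state is finally deleted, reports every still-unacknowledged packet as lost. `PathState`, `on_path_abandon_received`, and `maybe_delete_path` are hypothetical names, and representing a packet by its list of frames is an assumption made only to keep the example small.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PathState:
    path_id: int
    pto: float                                   # current PTO, in seconds
    unacked: dict = field(default_factory=dict)  # packet number -> frames
    abandon_received_at: Optional[float] = None


def on_path_abandon_received(path, now):
    # Remember when PATH_ABANDON arrived; state is retained from here on.
    path.abandon_received_at = now


def maybe_delete_path(paths, path_id, now):
    """Delete path state once 3 PTO have passed since PATH_ABANDON arrived.

    Returns the frames of all packets now considered lost, in packet number
    order, or None if the state is still being retained.
    """
    path = paths[path_id]
    if path.abandon_received_at is None:
        return None
    if now - path.abandon_received_at < 3 * path.pto:
        return None  # packets from the peer might still be in transit
    # Deleting the path: everything sent and not yet acknowledged is lost.
    lost = [frames for _, frames in sorted(path.unacked.items())]
    del paths[path_id]
    return lost
```

The returned frames would then be handed to whatever retransmission logic the implementation uses on its remaining paths.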

mirjak · 2024-11-13T12:14:49Z

Ah, thanks, I missed that last MUST. So that is pretty clear and already addressed. No change needed.

Then only my other question remains: Do we want to give clearer advice on when to retransmit? Like after one PTO?

huitema · 2024-11-14T02:27:31Z

I don't think we should give more guidance, mostly because we don't really know. When we have lots of deployment experience, maybe. But for now, just trust implementers. They cannot merely translate a spec into code, they have to think, and I really expect that different implementers will need to make different tradeoffs based on their deployments.

huitema · 2024-11-14T03:30:17Z

oops! Wrong window. I was working on a PR on a different project and just merged this one. Sorry.

mirjak · 2024-11-14T11:20:16Z

There is a "revert" button if needed. However, I guess this was anyway more or less ready to merge.

I would just have had a few more editorial things but I guess I can also create another (editorial) PR.

For this sentence I think we should say somehow that you can retransmit earlier before the packet is declared lost, otherwise the term "retransmit" might be confusing:

OLD
Endpoints could take advantage of the multipath extension, and retransmit the content
of the delayed packets on other available paths if the congestion control window on these
paths allows.

NEW
Endpoints can take advantage of the multipath extension, and retransmit the content
of the delayed packets on other available paths before they are declared lost
if the congestion control window on these paths allows.

Also not sure if we need the "if the congestion control window on these paths allows" part but I guess it doesn't hurt.

And then I would propose to replace "the path itself is finally declared broken" with "the path is explicitly abandoned". However, also note that this is exactly the issue: because we don't have or require an idle timeout, you can basically keep a broken path open forever (potentially hoping it will come back one day) and then never declare those packets as lost. I think we need to mention that explicitly somehow.

[+] Add guidance on handling of PTO

53189b9

Yanmei-Liu requested review from huitema, mirjak and gloinul November 11, 2024 13:21

Yanmei-Liu linked an issue Nov 11, 2024 that may be closed by this pull request

Handling of PTO #457

Closed

gloinul reviewed Nov 11, 2024

[+] Updating description, thanks to Christian and Magnus's comments.

d691073

Yanmei-Liu added the clarification label Nov 12, 2024

gloinul approved these changes Nov 13, 2024

huitema merged commit df0563b into main Nov 14, 2024
3 checks passed
