Add guidance on handling of PTO #467

Merged
merged 2 commits into from
Nov 14, 2024

Conversation

Yanmei-Liu
Contributor

To fix issue #457.
We could add some implementation guidance, but how aggressive it should be depends on the implementation (per the discussion at IETF 121).

@Yanmei-Liu Yanmei-Liu linked an issue Nov 11, 2024 that may be closed by this pull request
An implementation should follow the mechanism specified in {{QUIC-RECOVERY}}
for detecting packet loss on each individual path.
When an endpoint transmits a significant number of packets on a specific path,
and the path turned into a blackhole while acknowledgements can not be received from the path,

I think the part after "into a blackhole" is a bit strange. Either ACKs sent on this path can't be received, or, equally relevant, they are never sent because no forward packets on this path reach the endpoint. Thus I suggest:

When an endpoint transmits a significant number of packets on a specific path,
and the path turns into a blackhole, so that either no ACKs are sent because no packets are received, or no ACKs sent on this path arrive at the packet sender, then the packet sender's probe timeout (PTO) will trigger following {{QUIC-RECOVERY}}. However, no packets will be declared lost until the packet sender receives an ACK for this path. To utilise the advantages of the multipath extension, when endpoints detect
that one of the paths has turned into a blackhole, endpoints could choose to
retransmit on other available paths if the congestion control window allows.

Contributor

@huitema huitema Nov 12, 2024

I think "black hole" is jargon. How about:

An implementation should follow the mechanism specified in {{QUIC-RECOVERY}}
for detecting packet loss on each individual path. A special case happens when
the PTO timer expires. According to {{QUIC-RECOVERY}}, no packet will be declared
lost until either the packet sender receives a new ACK for this path, or the path itself is finally declared
broken. This cautious process minimizes the risk of spurious retransmissions,
but it may cause significant delivery delay for the frames contained in these "lost packets".

Endpoints could take advantage of the multipath extension, and retransmit the content
of the delayed packets on other available paths if the congestion control window on these
paths allows.
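
The proposed text can be illustrated with a minimal sketch of how a sender might choose another path once the PTO timer has fired on one path. This is not taken from the draft or from any implementation: the `Path` class, its fields, and `pick_retransmit_path` are hypothetical names, and the selection rule (skip paths that are themselves in PTO, prefer the one with the most free congestion window) is only one possible policy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Path:
    """Hypothetical per-path sender state (names are illustrative only)."""
    path_id: int
    cwnd: int              # congestion window in bytes
    bytes_in_flight: int   # bytes sent on this path and not yet acknowledged
    pto_count: int = 0     # consecutive PTO expirations without a new ACK

    def available_window(self) -> int:
        # Room left in the congestion window of this path.
        return max(0, self.cwnd - self.bytes_in_flight)


def pick_retransmit_path(paths: List[Path], stalled: Path, size: int) -> Optional[Path]:
    """Return another path whose congestion window can carry `size` bytes, or None.

    Assumed policy: ignore the stalled path and any path that is itself
    in PTO, then prefer the path with the most free window.
    """
    candidates = [
        p for p in paths
        if p.path_id != stalled.path_id
        and p.pto_count == 0
        and p.available_window() >= size
    ]
    return max(candidates, key=Path.available_window, default=None)


paths = [
    Path(path_id=0, cwnd=12000, bytes_in_flight=12000, pto_count=2),  # stalled
    Path(path_id=1, cwnd=12000, bytes_in_flight=11500),               # nearly full
    Path(path_id=2, cwnd=24000, bytes_in_flight=4000),                # has room
]
chosen = pick_retransmit_path(paths, stalled=paths[0], size=1200)
```

If no path has room, the function returns `None` and the frames simply stay queued, which matches the "if the congestion control window on these paths allows" condition in the text.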

This proposal works for me.

@mirjak
Collaborator

mirjak commented Nov 12, 2024

Thanks @huitema, your proposed text looks good. I would suggest not only saying "retransmit on another path" but saying "retransmit on another path earlier, even before the originally used path is abandoned" or something like that. Do we even want to recommend or hint when to retransmit?

Also, do we need to add some normative language in the connection closure section saying that all packets need to be declared lost after a path is abandoned?

@huitema
Contributor

huitema commented Nov 13, 2024

We already have this text in Path Close:

...However, knowledge of the connection identifiers received from the peer and of the state of the number space associated to the path SHOULD be retained while packets from the peer might still be in transit, i.e., for a delay of 3 PTO after the PATH_ABANDON frame has been received from the peer, both to avoid generating spurious stateless packets as specified in {{spurious-stateless-reset}} and to be able to acknowledge the last packets received from the peer as specified in {{ack-after-abandon}}.

After receiving or sending a PATH_ABANDON frame, the endpoints SHOULD promptly send PATH_ACK frames to acknowledge all packets received on the path and not yet acknowledged, as specified in {{ack-after-abandon}}. When an endpoint finally deletes all resource associated with the path, the packets sent over the path and not yet acknowledged MUST be considered lost.

I think that's pretty clear, no need to add more details on when packets are considered lost.
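
As an illustration of the two rules quoted above (retain path state for 3 PTO after receiving PATH_ABANDON, then treat every unacknowledged packet as lost when the path is deleted), here is a minimal sketch. The `PathState` class, its fields, and `delete_path` are hypothetical names for this example, and time is modelled as plain seconds; a real stack would track considerably more state.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PathState:
    """Hypothetical bookkeeping for one path (illustrative names only)."""
    path_id: int
    pto: float                                   # current PTO, in seconds
    # packet number -> frames carried by that still-unacknowledged packet
    unacked: Dict[int, List[str]] = field(default_factory=dict)
    abandon_received_at: Optional[float] = None  # when PATH_ABANDON arrived

    def can_delete(self, now: float) -> bool:
        # State is retained for 3 PTO after PATH_ABANDON was received.
        if self.abandon_received_at is None:
            return False
        return now - self.abandon_received_at >= 3 * self.pto


def delete_path(path: PathState) -> List[str]:
    """Drop the path's resources; every unacked packet is considered lost.

    Returns the frames of those packets, in packet-number order, so the
    caller can queue them for retransmission on another path.
    """
    lost_frames = [f for pn in sorted(path.unacked) for f in path.unacked[pn]]
    path.unacked.clear()
    return lost_frames


p = PathState(path_id=1, pto=0.25, unacked={7: ["STREAM(0)"], 3: ["MAX_DATA"]})
p.abandon_received_at = 10.0
if p.can_delete(now=11.0):
    to_resend = delete_path(p)
```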

@mirjak
Collaborator

mirjak commented Nov 13, 2024

Ah, thanks, I missed that last MUST. So that is pretty clear and already addressed. No change needed.

Then only my other question remains: Do we want to give clearer advice on when to retransmit? Like after one PTO?

@huitema
Contributor

huitema commented Nov 14, 2024

I don't think we should give more guidance, mostly because we don't really know. Once we have lots of deployment experience, maybe. But for now, just trust implementers. They cannot merely translate a spec into code, they have to think, and I really expect that different implementers will need to make different tradeoffs based on their deployments.

@huitema huitema merged commit df0563b into main Nov 14, 2024
3 checks passed
@huitema
Contributor

huitema commented Nov 14, 2024

oops! Wrong window. I was working on a PR on a different project and just merged this one. Sorry.

@mirjak
Collaborator

mirjak commented Nov 14, 2024

There is a "revert" button if needed. However, I guess this was more or less ready to merge anyway.

I would just have had a few more editorial things but I guess I can also create another (editorial) PR.

For this sentence I think we should say somehow that you can retransmit earlier before the packet is declared lost, otherwise the term "retransmit" might be confusing:

OLD
Endpoints could take advantage of the multipath extension, and retransmit the content
of the delayed packets on other available paths if the congestion control window on these
paths allows.

NEW
Endpoints can take advantage of the multipath extension, and retransmit the content
of the delayed packets on other available paths before they are declared lost
if the congestion control window on these paths allows.
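
The distinction in the NEW text (the content is sent again, but the original packet is not yet declared lost) can be sketched as follows. Everything here is a hypothetical illustration: `SentPacket`, `Sender`, and `on_pto` are invented names, frames are plain strings, and the congestion window check is deliberately left out to keep the bookkeeping visible.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SentPacket:
    path_id: int
    packet_number: int
    frames: List[str]
    declared_lost: bool = False        # only set by loss detection or path deletion
    early_retransmitted: bool = False  # content already copied to another path


@dataclass
class Sender:
    """Hypothetical sender with one packet number space per path."""
    outstanding: Dict[Tuple[int, int], SentPacket] = field(default_factory=dict)
    next_pn: Dict[int, int] = field(default_factory=dict)

    def send(self, path_id: int, frames: List[str]) -> SentPacket:
        pn = self.next_pn.get(path_id, 0)
        self.next_pn[path_id] = pn + 1
        pkt = SentPacket(path_id, pn, list(frames))
        self.outstanding[(path_id, pn)] = pkt
        return pkt

    def on_pto(self, stalled_path: int, alt_path: int) -> List[SentPacket]:
        """Copy the frames of unacked packets from the stalled path to alt_path.

        The originals stay outstanding and are NOT declared lost; a late
        ACK on the stalled path would simply make the copies redundant.
        """
        copies = []
        for pkt in list(self.outstanding.values()):
            if pkt.path_id == stalled_path and not pkt.early_retransmitted:
                pkt.early_retransmitted = True
                copies.append(self.send(alt_path, pkt.frames))
        return copies


s = Sender()
s.send(0, ["STREAM(id=4, off=0)"])
copies = s.on_pto(stalled_path=0, alt_path=1)
```

The `early_retransmitted` flag keeps a second PTO from copying the same frames again, which is one simple way to limit duplicate traffic.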

Also not sure if we need the "if the congestion control window on these paths allows" part but I guess it doesn't hurt.

And then I would propose to replace "the path itself is finally declared broken" with "the path is explicitly abandoned". However, note that this is exactly the issue: because we don't have or require an idle timeout, you can basically keep a broken path open forever (potentially hoping it will come back one day) and then never declare those packets lost. I think we need to mention that explicitly somehow.

Development

Successfully merging this pull request may close these issues.

Handling of PTO
4 participants