Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mptcp: add_addr: more tolerant with retrans timeout after errors #98

Merged

Conversation

matttbe
Copy link
Member

@matttbe matttbe commented Oct 20, 2022

ADD_ADDR restransmissions are expected to come ~1s after the last sent packet.

In packetdrill, the timing is compared to the last packet. We would like to tell Packetdrill that the packet is expected to come "1s not after the previous packet but the one before" but that's not possible.

In this test the last packet before the ADD_ADDR retransmission is not the ADD_ADDR that is sent before but it is the injected and faulty echo ADD_ADR. With a (very) slow host, it can takes a bit of time to inject the packet. Because of that, the retransmitted packet can arrive earlier than expected: if the faulty echo ADD_ADDR injection takes X sec, the retransmission of the ADD_ADDR by the kernel will take (1 - X) sec. If X is bigger than the tolerance, the test fails which seems to happen regularly on the public CI.

add_addr_retry_v4.pkt:22: error handling packet: timing error: expected outbound packet at 3.214190 sec but happened at 2.384757 sec; tolerance 0.800000 sec
script packet:  3.214190 . 1:1(0) ack 1 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 10354569113664661296>
actual packet:  2.384757 . 1:1(0) ack 1 win 256 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 10354569113664661296>

We should then accept packets being sent less than one second before the injected and faulty echo ADD_ADDR.

Even worst with a debug kernel config:

add_addr_retry_v4.pkt:22: error handling packet: timing error: expected outbound packet at 10.335930 sec but happened at 5.908318 sec; tolerance 2.000000 sec
script packet: 10.335930 . 1:1(0) ack 1 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 14923819917020444620>
actual packet:  5.908318 . 1:1(0) ack 1 win 256 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 14923819917020444620>

Injecting the faulty echo and processing packets sent by the kernel can take time and we have situations where packets arrive a few seconds before the expected time by Packetdrill!

Sadly, we cannot tell Packetdrill the packet is expected to be sent in the past. So we need to increment the tolerance a bit. But that's find to do that because a new test has been added in the parent commit: it is focussing on the ADD_ADDR retransmissions without injecting other packets in between. This other test can have stricter expected time.

Closes: multipath-tcp/mptcp_net-next#312
Suggested-by: Paolo Abeni
Signed-off-by: Matthieu Baerts

The only difference was the injected and faulty ADD_ADDR: if it is
faulty, it doesn't really matter if it is in v4 or v6, the main point is
to send an ADD_ADDR with the wrong IP.

While at it, also add 'errors' in the name in preparation to other
'retry' scenarios.

Signed-off-by: Matthieu Baerts <[email protected]>
On very slow hosts, injected packets cause troubles because it reset
timers in Packetdrill, see:

  multipath-tcp/mptcp_net-next#312

Add a new "safer" version without these injected packets to monitor if
everything is OK like that.

Signed-off-by: Matthieu Baerts <[email protected]>
ADD_ADDR restransmissions are expected to come ~1s after the last sent
packet.

In packetdrill, the timing is compared to the last packet. We would like
to tell Packetdrill that the packet is expected to come "1s not after
the previous packet but the one before" but that's not possible.

In this test the last packet before the ADD_ADDR retransmission is not
the ADD_ADDR that is sent before but it is the injected and faulty echo
ADD_ADR. With a (very) slow host, it can takes a bit of time to inject
the packet. Because of that, the retransmitted packet can arrive earlier
than expected: if the faulty echo ADD_ADDR injection takes X sec, the
retransmission of the ADD_ADDR by the kernel will take (1 - X) sec. If X
is bigger than the tolerance, the test fails which seems to happen
regularly on the public CI.

  add_addr_retry_v4.pkt:22: error handling packet: timing error: expected outbound packet at 3.214190 sec but happened at 2.384757 sec; tolerance 0.800000 sec
  script packet:  3.214190 . 1:1(0) ack 1 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 10354569113664661296>
  actual packet:  2.384757 . 1:1(0) ack 1 win 256 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 10354569113664661296>

We should then accept packets being sent less than one second before the
injected and faulty echo ADD_ADDR.

Even worst with a debug kernel config:

  add_addr_retry_v4.pkt:22: error handling packet: timing error: expected outbound packet at 10.335930 sec but happened at 5.908318 sec; tolerance 2.000000 sec
  script packet: 10.335930 . 1:1(0) ack 1 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 14923819917020444620>
  actual packet:  5.908318 . 1:1(0) ack 1 win 256 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 14923819917020444620>

Injecting the faulty echo and processing packets sent by the kernel can
take time and we have situations where packets arrive a few seconds
before the expected time by Packetdrill!

Sadly, we cannot tell Packetdrill the packet is expected to be sent in
the past. So we need to increment the tolerance a bit. But that's find
to do that because a new test has been added in the parent commit: it is
focussing on the ADD_ADDR retransmissions without injecting other
packets in between. This other test can have stricter expected time.

Closes: multipath-tcp/mptcp_net-next#312
Suggested-by: Paolo Abeni <[email protected]>
Signed-off-by: Matthieu Baerts <[email protected]>
@matttbe
Copy link
Member Author

matttbe commented Oct 31, 2022

@pabeni / @dcaratti : is this PR OK for you? :)

@dcaratti dcaratti merged commit c90ef14 into multipath-tcp:mptcp-net-next Nov 4, 2022
@matttbe matttbe deleted the gh-312-add-addr-unstable branch November 4, 2022 09:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

packetdrill: add_addr test is regularly failing: packets arriving before the expected time
3 participants