Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

packetdrill: add_addr test is regularly failing: packets arriving before the expected time #312

Closed
matttbe opened this issue Oct 11, 2022 · 4 comments · Fixed by multipath-tcp/packetdrill#98
Assignees

Comments

@matttbe
Copy link
Member

matttbe commented Oct 11, 2022

It is usually failing when executing the retry job, in v4, v6mapped and v6):

# FAIL [/opt/packetdrill/gtests/net/mptcp/add_addr/add_addr_retry_v4.pkt (ipv4-mapped-v6)]
# stdout: 
# stderr: 
# add_addr_retry_v4.pkt:22: error handling packet: timing error: expected outbound packet at 3.214190 sec but happened at 2.384757 sec; tolerance 0.800000 sec
# script packet:  3.214190 . 1:1(0) ack 1 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 10354569113664661296>
# actual packet:  2.384757 . 1:1(0) ack 1 win 256 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 10354569113664661296>

Note that on the public CI, the tests are running with a higher tolerance than what is written in the scripts:

  • times 4 with a non debug kernel config (0.8s here with this test)
  • times 10 with a debug kernel config (2s here with this test)

Reproduced quite regularly these last days on the public CI:

@matttbe
Copy link
Member Author

matttbe commented Oct 11, 2022

Less regularly, the "server" test fails as well

# FAIL [/opt/packetdrill/gtests/net/mptcp/add_addr/add_addr4_server.pkt (ipv6)]
# stdout: 
# stderr: 
# add_addr4_server.pkt:20: error handling packet: timing error: expected outbound packet at 7.925402 sec but happened at 14.886888 sec; tolerance 2.000000 sec
# script packet:  7.925402 . 1:1(0) ack 1 <add_address address_id: 1 ipv4: 198.51.100.2 hmac: 4866319833858278738>
# actual packet: 14.886888 . 1:1(0) ack 1 win 256 <add_address address_id: 1 ipv4: 198.51.100.2 hmac: 4866319833858278738>

This one is more surprising because the ADD_ADDR is supposed to be sent just after the accept and here it is sent a very long time after! (debug kernel config)

@pabeni
Copy link

pabeni commented Oct 12, 2022

It looks like the root cause is the CI being very slow, so that:

<event starting the timeout, at time t=0>
<next event expected t=0, arriving at t=~tolerance && t< tolerance, ok>
<another event, expected t=tolerance (prev. event time), arriving at t=~2tolerance && t< 2tolerance, ok>
<add_add retransmission, expected at t=1s+2*tolerance (prev event time plus drill time) , correctly arriving in any time in the range [~1s, ~1s+tolerance], error>

No idea how to fix it...

@pabeni
Copy link

pabeni commented Oct 18, 2022

a possible fix would be replacing

+1 <timeout pkt>

with:

+0.9~+1.0 <timeout pkt>

and adjust the latter with the current tolerance with sed, before starting the script

(With the assumption that the timeout must be greater then the tolerance)

matttbe added a commit to matttbe/packetdrill that referenced this issue Oct 20, 2022
On very slow hosts, injected packets cause troubles because it reset
timers in Packetdrill, see:

  multipath-tcp/mptcp_net-next#312

Add a new "safer" version without these injected packets to monitor if
everything is OK like that.

Signed-off-by: Matthieu Baerts <[email protected]>
matttbe added a commit to matttbe/packetdrill that referenced this issue Oct 20, 2022
ADD_ADDR restransmissions are expected to come ~1s after the last sent
packet.

In packetdrill, the timing is compared to the last packet. We would like
to tell Packetdrill that the packet is expected to come "1s not after
the previous packet but the one before" but that's not possible.

In this test the last packet before the ADD_ADDR retransmission is not
the ADD_ADDR that is sent before but it is the injected and falty echo
ADD_ADR. With a (very) slow host, it can takes a bit of time to inject
the packet. Because of that, the retransmitted packet can arrive earlier
than expected: if the falty echo ADD_ADDR injection takes X sec, the
retransmission of the ADD_ADDR by the kernel will take (1 - X) sec. If X
is bigger than the tolerance, the test fails which seems to happen
regularly on the public CI.

  add_addr_retry_v4.pkt:22: error handling packet: timing error: expected outbound packet at 3.214190 sec but happened at 2.384757 sec; tolerance 0.800000 sec
  script packet:  3.214190 . 1:1(0) ack 1 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 10354569113664661296>
  actual packet:  2.384757 . 1:1(0) ack 1 win 256 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 10354569113664661296>

We should then accept packets being sent less than one second before the
injected and falty echo ADD_ADDR.

Even worst with a debug kernel config:

  add_addr_retry_v4.pkt:22: error handling packet: timing error: expected outbound packet at 10.335930 sec but happened at 5.908318 sec; tolerance 2.000000 sec
  script packet: 10.335930 . 1:1(0) ack 1 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 14923819917020444620>
  actual packet:  5.908318 . 1:1(0) ack 1 win 256 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 14923819917020444620>

Injecting the falty echo and processing packets sent by the kernel can
take time and we have situations where packets arrive a few seconds
before the expected time by Packetdrill!

Sadly, we cannot tell Packetdrill the packet is expected to be sent in
the past. So we need to increment the tolerance a bit. But that's find
to do that because a new test has been added in the parent commit: it is
focussing on the ADD_ADDR retransmissions without injecting other
packets in between. This other test can have stricter expected time.

Closes: multipath-tcp/mptcp_net-next#312
Suggested-by: Paolo Abeni <[email protected]>
Signed-off-by: Matthieu Baerts <[email protected]>
matttbe added a commit to matttbe/packetdrill that referenced this issue Oct 20, 2022
On very slow hosts, injected packets cause troubles because it reset
timers in Packetdrill, see:

  multipath-tcp/mptcp_net-next#312

Add a new "safer" version without these injected packets to monitor if
everything is OK like that.

Signed-off-by: Matthieu Baerts <[email protected]>
matttbe added a commit to matttbe/packetdrill that referenced this issue Oct 20, 2022
ADD_ADDR restransmissions are expected to come ~1s after the last sent
packet.

In packetdrill, the timing is compared to the last packet. We would like
to tell Packetdrill that the packet is expected to come "1s not after
the previous packet but the one before" but that's not possible.

In this test the last packet before the ADD_ADDR retransmission is not
the ADD_ADDR that is sent before but it is the injected and faulty echo
ADD_ADR. With a (very) slow host, it can takes a bit of time to inject
the packet. Because of that, the retransmitted packet can arrive earlier
than expected: if the faulty echo ADD_ADDR injection takes X sec, the
retransmission of the ADD_ADDR by the kernel will take (1 - X) sec. If X
is bigger than the tolerance, the test fails which seems to happen
regularly on the public CI.

  add_addr_retry_v4.pkt:22: error handling packet: timing error: expected outbound packet at 3.214190 sec but happened at 2.384757 sec; tolerance 0.800000 sec
  script packet:  3.214190 . 1:1(0) ack 1 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 10354569113664661296>
  actual packet:  2.384757 . 1:1(0) ack 1 win 256 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 10354569113664661296>

We should then accept packets being sent less than one second before the
injected and faulty echo ADD_ADDR.

Even worst with a debug kernel config:

  add_addr_retry_v4.pkt:22: error handling packet: timing error: expected outbound packet at 10.335930 sec but happened at 5.908318 sec; tolerance 2.000000 sec
  script packet: 10.335930 . 1:1(0) ack 1 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 14923819917020444620>
  actual packet:  5.908318 . 1:1(0) ack 1 win 256 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 14923819917020444620>

Injecting the faulty echo and processing packets sent by the kernel can
take time and we have situations where packets arrive a few seconds
before the expected time by Packetdrill!

Sadly, we cannot tell Packetdrill the packet is expected to be sent in
the past. So we need to increment the tolerance a bit. But that's find
to do that because a new test has been added in the parent commit: it is
focussing on the ADD_ADDR retransmissions without injecting other
packets in between. This other test can have stricter expected time.

Closes: multipath-tcp/mptcp_net-next#312
Suggested-by: Paolo Abeni <[email protected]>
Signed-off-by: Matthieu Baerts <[email protected]>
@matttbe matttbe self-assigned this Oct 20, 2022
@matttbe
Copy link
Member Author

matttbe commented Oct 20, 2022

This PR should help to reduce the false positives seen by the CI: multipath-tcp/packetdrill#98

@matttbe matttbe changed the title packetdrill: add_addr test is regularly failing: packets arriving after the expected time packetdrill: add_addr test is regularly failing: packets arriving before the expected time Oct 20, 2022
matttbe added a commit to multipath-tcp/packetdrill that referenced this issue Nov 7, 2022
On very slow hosts, injected packets cause troubles because it reset
timers in Packetdrill, see:

  multipath-tcp/mptcp_net-next#312

Add a new "safer" version without these injected packets to monitor if
everything is OK like that.

Signed-off-by: Matthieu Baerts <[email protected]>
(cherry picked from commit 59cd5dd)
Signed-off-by: Matthieu Baerts <[email protected]>
matttbe added a commit to multipath-tcp/packetdrill that referenced this issue Nov 7, 2022
ADD_ADDR restransmissions are expected to come ~1s after the last sent
packet.

In packetdrill, the timing is compared to the last packet. We would like
to tell Packetdrill that the packet is expected to come "1s not after
the previous packet but the one before" but that's not possible.

In this test the last packet before the ADD_ADDR retransmission is not
the ADD_ADDR that is sent before but it is the injected and faulty echo
ADD_ADR. With a (very) slow host, it can takes a bit of time to inject
the packet. Because of that, the retransmitted packet can arrive earlier
than expected: if the faulty echo ADD_ADDR injection takes X sec, the
retransmission of the ADD_ADDR by the kernel will take (1 - X) sec. If X
is bigger than the tolerance, the test fails which seems to happen
regularly on the public CI.

  add_addr_retry_v4.pkt:22: error handling packet: timing error: expected outbound packet at 3.214190 sec but happened at 2.384757 sec; tolerance 0.800000 sec
  script packet:  3.214190 . 1:1(0) ack 1 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 10354569113664661296>
  actual packet:  2.384757 . 1:1(0) ack 1 win 256 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 10354569113664661296>

We should then accept packets being sent less than one second before the
injected and faulty echo ADD_ADDR.

Even worst with a debug kernel config:

  add_addr_retry_v4.pkt:22: error handling packet: timing error: expected outbound packet at 10.335930 sec but happened at 5.908318 sec; tolerance 2.000000 sec
  script packet: 10.335930 . 1:1(0) ack 1 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 14923819917020444620>
  actual packet:  5.908318 . 1:1(0) ack 1 win 256 <add_address address_id: 1 ipv4: 192.168.0.3 hmac: 14923819917020444620>

Injecting the faulty echo and processing packets sent by the kernel can
take time and we have situations where packets arrive a few seconds
before the expected time by Packetdrill!

Sadly, we cannot tell Packetdrill the packet is expected to be sent in
the past. So we need to increment the tolerance a bit. But that's find
to do that because a new test has been added in the parent commit: it is
focussing on the ADD_ADDR retransmissions without injecting other
packets in between. This other test can have stricter expected time.

Closes: multipath-tcp/mptcp_net-next#312
Suggested-by: Paolo Abeni <[email protected]>
Signed-off-by: Matthieu Baerts <[email protected]>
(cherry picked from commit 7e14dec)
Signed-off-by: Matthieu Baerts <[email protected]>
matttbe pushed a commit that referenced this issue Mar 27, 2024
In case when is64 == 1 in emit(A64_REV32(is64, dst, dst), ctx) the
generated insn reverses byte order for both high and low 32-bit words,
resuling in an incorrect swap as indicated by the jit test:

[ 9757.262607] test_bpf: #312 BSWAP 16: 0x0123456789abcdef -> 0xefcd jited:1 8 PASS
[ 9757.264435] test_bpf: #313 BSWAP 32: 0x0123456789abcdef -> 0xefcdab89 jited:1 ret 1460850314 != -271733879 (0x5712ce8a != 0xefcdab89)FAIL (1 times)
[ 9757.266260] test_bpf: #314 BSWAP 64: 0x0123456789abcdef -> 0x67452301 jited:1 8 PASS
[ 9757.268000] test_bpf: #315 BSWAP 64: 0x0123456789abcdef >> 32 -> 0xefcdab89 jited:1 8 PASS
[ 9757.269686] test_bpf: #316 BSWAP 16: 0xfedcba9876543210 -> 0x1032 jited:1 8 PASS
[ 9757.271380] test_bpf: #317 BSWAP 32: 0xfedcba9876543210 -> 0x10325476 jited:1 ret -1460850316 != 271733878 (0xa8ed3174 != 0x10325476)FAIL (1 times)
[ 9757.273022] test_bpf: #318 BSWAP 64: 0xfedcba9876543210 -> 0x98badcfe jited:1 7 PASS
[ 9757.274721] test_bpf: #319 BSWAP 64: 0xfedcba9876543210 >> 32 -> 0x10325476 jited:1 9 PASS

Fix this by forcing 32bit variant of rev32.

Fixes: 1104247 ("bpf, arm64: Support unconditional bswap")
Signed-off-by: Artem Savkov <[email protected]>
Tested-by: Puranjay Mohan <[email protected]>
Acked-by: Puranjay Mohan <[email protected]>
Acked-by: Xu Kuohai <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants