nixos/tests/networking: fix flaky scripted.dhcpSimple test #361834

Merged
misuzu merged 1 commit into NixOS:master from misuzu:flaky-dhcpSimple-fix
Dec 5, 2024

Contributor

@misuzu misuzu commented Dec 4, 2024

The underlying issue is unknown, but starting the router first and then the client stops the test from being flaky.

It's also weird that I can't reproduce this issue on my machine (Ryzen 7 5700U laptop, kernel 6.6.63) while it's reliably reproducible on our aarch64 community box (kernel 5.15.163).

#360089
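
To illustrate the ordering described above, here is a minimal sketch and not the actual diff from this PR. `StubMachine` is a hypothetical stand-in for the NixOS test driver's machine objects so the snippet runs outside a VM test; `start()` and `wait_for_unit()` mirror driver methods of the same names, while waiting on `network-online.target` is an assumption made purely for illustration.

```python
# Hypothetical stand-in for the NixOS test driver's machine objects, so the
# ordering can be run outside a real VM test. Only the call order matters.
events = []


class StubMachine:
    def __init__(self, name):
        self.name = name

    def start(self):
        # The real driver would boot the VM here; the stub only records it.
        events.append(f"{self.name}: start")

    def wait_for_unit(self, unit):
        # The real driver would block until the systemd unit is active.
        events.append(f"{self.name}: wait_for_unit {unit}")


router = StubMachine("router")
client = StubMachine("client")

# Bring the router (the DHCP server side) up first ...
router.start()
router.wait_for_unit("network-online.target")  # assumed unit name

# ... and only then boot the client, instead of starting both at once.
client.start()
client.wait_for_unit("network-online.target")  # assumed unit name

print("\n".join(events))
```

In a real test script the `router` and `client` objects are provided by the driver, so only the four calls in the middle would be relevant.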

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 25.05 Release Notes (or backporting 24.11 and 25.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@github-actions github-actions bot added the 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS label Dec 4, 2024
Contributor Author

misuzu commented Dec 4, 2024

@ofborg test networking.scripted.dhcpSimple networking.networkd.dhcpSimple

@github-actions github-actions bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin. 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux. labels Dec 4, 2024
@misuzu misuzu requested a review from vcunat December 4, 2024 18:35
Member

vcunat commented Dec 4, 2024

Testing looks good to me. On an idle larger machine I did a few runs both before and after this PR, getting 100% failures before and 100% successes after.

@misuzu misuzu merged commit 2f12b59 into NixOS:master Dec 5, 2024
@misuzu misuzu deleted the flaky-dhcpSimple-fix branch December 5, 2024 12:52
Contributor

nixpkgs-ci bot commented Dec 6, 2024

Successfully created backport PR for release-24.11:

Contributor

numinit commented Dec 13, 2024

I may be reproducing something like this on bare metal (24.11.20241205.4dc2fc4 (Vicuna)). Unsure if it's the same issue, but dhcpcd does this and now I'm suspicious of this fix:

Dec 12 20:14:12 monomyth systemd[1]: Started DHCP Client.
Dec 12 20:14:12 monomyth dhcpcd[168047]: lan: probing for an IPv4LL address
Dec 12 20:14:14 monomyth dhcpcd[168047]: arp_announce1: Invalid argument
Dec 12 20:14:17 monomyth dhcpcd[168047]: backbone: using IPv4LL address 169.254.246.184
Dec 12 20:14:17 monomyth dhcpcd[168047]: backbone: adding route to 169.254.0.0/16
Dec 12 20:14:17 monomyth dhcpcd[168047]: backbone: adding default route
Dec 12 20:14:17 monomyth dhcpcd[168047]: lan: using IPv4LL address 169.254.246.184
Dec 12 20:14:17 monomyth dhcpcd[168047]: lan: adding route to 169.254.0.0/16
Dec 12 20:14:18 monomyth dhcpcd[168047]: backbone: deleting default route

I have two virtual interfaces configured using VLANs, lan and backbone. When I unplug and replug the ethernet cable, it suddenly starts working again.

Further, when dhcpcd renews, I get this:

Dec 12 20:14:09 monomyth dhcpcd[168047]: arp_probe1: Invalid argument
Dec 12 20:14:09 monomyth dhcpcd[168047]: arp_probe1: Invalid argument
Dec 12 20:14:10 monomyth dhcpcd[168047]: arp_probe1: Invalid argument
Dec 12 20:14:10 monomyth dhcpcd[168047]: arp_probe1: Invalid argument
Dec 12 20:14:11 monomyth dhcpcd[168047]: backbone: probing for an IPv4LL address
Dec 12 20:14:12 monomyth dhcpcd[168047]: arp_announce1: Invalid argument
Dec 12 20:14:12 monomyth dhcpcd[168047]: arp_announce1: Invalid argument

The sequencing here, ARP errors, and failure to get a DHCP address are making me wonder why this test got flaky.

Contributor Author

misuzu commented Dec 13, 2024

The permission errors might be because of #336988.
Maybe you could try reverting it locally and check whether it makes any difference.

Contributor

numinit commented Dec 13, 2024

That's what I figured it was, and reverting it was going to be my next investigative step.
