I had a random connection issue in Podman containers which affected only Enterprise Linux installations. Sometimes, I was unable to connect to another totally unrelated host (even google.com).
Upon further inspection I could see broken TCP communication while handshaking: `SYN`, `SYN-ACK`, but no `ACK`.
This issue only affects communication under the following circumstances:
- You use Enterprise Linux (Alma, Rocky, RHEL)
- You have two bridge networks, net0 and net1 (name arbitrary)
- You have one container on net0 which publishes ports
- You have a second container on net0 and net1
- The issue occurs for any outbound communication that leaves on the network interface corresponding to net0 (there is a random component here: sometimes the packets leave on one interface, sometimes on the other one!)
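For readers who want to see the layout from the list above without the Vagrant environment, here is a minimal sketch. The image names, container names and port numbers are my own illustrative assumptions, not taken from the actual setup script, and repeating `--network` assumes Podman 4 or newer.

```shell
#!/bin/sh
# Sketch of the container layout described in the list above.
# Image names, container names and ports are illustrative assumptions.

# Run a command, or only print it when DRY_RUN=1 is set.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_repro() {
    # Two bridge networks (the names are arbitrary)
    run podman network create net0
    run podman network create net1
    # First container: attached to net0 only, publishes a port
    run podman run -d --name publisher --network net0 -p 8080:80 docker.io/library/nginx:alpine
    # Second container: attached to net0 and net1, makes an outbound request
    run podman run --rm --name client --network net0 --network net1 docker.io/library/alpine:latest wget -q -T 5 -O /dev/null http://google.com
}
```

Calling `DRY_RUN=1 setup_repro` only prints the commands; calling `setup_repro` runs them. Since the outgoing interface is chosen at random, the last command may have to be repeated a few times before the failure shows up.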
This scenario can be set up with `vagrant up rocky93`. The other VMs are included for reference and testing purposes; I can confirm that openSUSE, Fedora and Ubuntu work as-is.
After some time you should see the result of the test stage:
It is clear that communication works on one interface, but not the other one. Because of containers/podman#12850, the affected interface name may differ.
After comparing the `sysctl`s of Rocky 9.3 and Ubuntu 23.10, I was able to isolate a few interesting differences. After a bit of trial and error I was able to boil it down to `net.ipv4.conf.default.rp_filter`, which is set to `1` in Enterprise Linux and `2` in Ubuntu.
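A comparison like this can be done by saving the output of `sysctl -a` on each host and printing only the keys whose values differ. The helper below is a minimal sketch of that idea; the function name and the file names in the usage note are hypothetical.

```shell
#!/bin/sh
# Print every key whose value differs between two `sysctl -a` dumps.
# Output format: key: value_in_first_dump -> value_in_second_dump
sysctl_diff() {
    awk -F ' = ' '
        # First file: remember the value of each key
        NR == FNR { first[$1] = $2; next }
        # Second file: report keys present in both dumps with different values
        ($1 in first) && first[$1] != $2 {
            print $1 ": " first[$1] " -> " $2
        }
    ' "$1" "$2"
}
```

With dumps created by `sysctl -a > rocky.txt` and `sysctl -a > ubuntu.txt` on the respective hosts, `sysctl_diff rocky.txt ubuntu.txt` lists the candidates. Keys that exist on only one host are ignored, and values that themselves contain ` = ` are compared only up to that point.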
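This `sysctl` controls reverse path filtering, i.e. source address validation for incoming packets. As far as I understood RFC 3704 from a few minutes of reading, the value `1` (strict mode) makes the kernel drop an incoming packet unless the best route back to its source address goes out through the interface the packet arrived on. The value `2` (loose mode) accepts the packet as long as its source address is reachable through any interface. The value `0` disables any path checks.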
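To check which mode actually applies to an interface, keep in mind that, according to the kernel's ip-sysctl documentation, the kernel uses the maximum of `net.ipv4.conf.all.rp_filter` and the per-interface value. The two helpers below are a small sketch of that rule; the function names are my own.

```shell
#!/bin/sh
# Translate an rp_filter value into the name of its mode.
rp_filter_mode() {
    case "$1" in
        0) echo "off" ;;
        1) echo "strict" ;;
        2) echo "loose" ;;
        *) echo "unknown" ;;
    esac
}

# The kernel validates with max(conf/all, conf/<interface>).
# $1: value of net.ipv4.conf.all.rp_filter
# $2: value of net.ipv4.conf.<interface>.rp_filter
effective_rp_filter() {
    if [ "$1" -gt "$2" ]; then
        echo "$1"
    else
        echo "$2"
    fi
}
```

On a live system the inputs can be read with `sysctl -n net.ipv4.conf.all.rp_filter` and `sysctl -n net.ipv4.conf.<interface>.rp_filter`, where `<interface>` is whichever interface you want to inspect.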
Now if you uncomment line 4 of the VM setup script and run `vagrant provision rocky93`, you will see everything works as expected.
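Outside of the Vagrant environment, the same setting can be applied by hand. The setup script is not reproduced here, so the following is only a sketch of one way to make the change persistent, assuming a system that reads drop-in files from `/etc/sysctl.d`; the function name and the file name are arbitrary choices of mine.

```shell
#!/bin/sh
# Write a sysctl drop-in that switches the default reverse path
# filtering mode to loose (2).
# $1: target directory, defaults to /etc/sysctl.d
# The file name 90-rp-filter-loose.conf is an arbitrary choice.
write_rp_filter_dropin() {
    dir="${1:-/etc/sysctl.d}"
    printf '%s\n' 'net.ipv4.conf.default.rp_filter = 2' > "$dir/90-rp-filter-loose.conf"
}
```

As root, `write_rp_filter_dropin && sysctl --system` writes the file and reloads all sysctl configuration. The `default` key only applies to interfaces created afterwards, so already existing interfaces keep their current value until it is changed explicitly.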