FRR fails to install route received for an unknown but later-created VRF #13708
Comments
This issue is stale because it has been open 180 days with no activity. Comment or remove the
This issue will be automatically closed in the specified period unless there is further activity.
This issue still persists.
piotrsuchy
added a commit
to piotrsuchy/frr
that referenced
this issue
May 22, 2024
Fix for a bug, where FRR fails to install route received for an unknown but later-created VRF - detailed description FRRouting#13708
piotrsuchy
added a commit
to piotrsuchy/frr
that referenced
this issue
May 22, 2024
Fix for a bug, where FRR fails to install route received for an unknown but later-created VRF - detailed description can be found here FRRouting#13708 Signed-off-by: Piotr Suchy <[email protected]>
piotrsuchy
added a commit
to piotrsuchy/frr
that referenced
this issue
Jun 23, 2024
Fix for a bug, where FRR fails to install route received for an unknown but later-created VRF - detailed description can be found here FRRouting#13708 Signed-off-by: Piotr Suchy <[email protected]>
piotrsuchy
added a commit
to piotrsuchy/frr
that referenced
this issue
Jun 24, 2024
Fix for a bug, where FRR fails to install route received for an unknown but later-created VRF - detailed description can be found here FRRouting#13708 Signed-off-by: Piotr Suchy <[email protected]>
mergify bot
pushed a commit
that referenced
this issue
Jun 28, 2024
Fix for a bug, where FRR fails to install route received for an unknown but later-created VRF - detailed description can be found here #13708 Signed-off-by: Piotr Suchy <[email protected]> (cherry picked from commit 8044d73)
It should be fixed already by #16306.
Describe the bug
To Reproduce
In this setup we have two hosts, host1 and host2. Each host has the same VRF device added and configured in exactly the same way. We add static routes to the hosts through our FRR configuration file and exchange them via FRR (with a route server as the intermediary). We then repeat the test below asynchronously on the two hosts until we see a failure:
A key point here is that the two hosts are doing these steps in a non-synchronized way. This means any host can receive a route in a VRF that it doesn't know of yet, but the VRF will be created soon thereafter. This seems to be a key condition in triggering the issue in this ticket.
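The ordering described above can be sketched as a toy model. This is only an illustration of the behavior we would expect (a route that arrives for a not-yet-known VRF is held and then installed once the VRF appears); it is not Zebra's code, and the class `ToyRib`, its methods, and the VRF name `vrf-red` are all made up for this example.

```python
from collections import defaultdict


class ToyRib:
    """Toy model of the ordering problem; NOT Zebra's actual data structures."""

    def __init__(self):
        self.known_vrfs = set()
        self.pending = defaultdict(list)    # vrf name -> routes waiting for the VRF
        self.installed = defaultdict(list)  # vrf name -> routes pushed to the "kernel"

    def receive_route(self, vrf, prefix):
        if vrf in self.known_vrfs:
            self.installed[vrf].append(prefix)
        else:
            # The VRF device does not exist yet, so the route cannot be installed.
            self.pending[vrf].append(prefix)

    def create_vrf(self, vrf):
        self.known_vrfs.add(vrf)
        # Expected behavior: replay every route that arrived too early.
        for prefix in self.pending.pop(vrf, []):
            self.installed[vrf].append(prefix)


rib = ToyRib()
rib.receive_route("vrf-red", "10.5.89.3/32")  # arrives before the VRF exists
rib.create_vrf("vrf-red")                     # VRF shows up shortly afterwards
print(rib.installed["vrf-red"])               # ['10.5.89.3/32']
```

In the failing runs, the equivalent of the replay step in `create_vrf` appears not to result in a kernel install.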
At some point some or all of the static routes on host1 will fail to get installed on host2 (or vice versa). FRR will believe the routes have been installed, but at least one is not installed in the kernel. Note that:
Note that the test is stopped as soon as we detect this failure. The log files will show many runs of the above test, and the last run is the failure. Also, we are not doing an FRR reload to alleviate the failure in the log file.
Expected behavior
The routes should always get installed into the kernel by FRR, i.e. the FRR RIB should never fall out of sync with the kernel for an indefinite amount of time.
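To make "never out of sync for an indefinite amount of time" testable, a check can be polled against a deadline. The following is a minimal sketch, not the script we actually ran: the function name `wait_for_sync`, the timeout and interval values, and the `is_synced` callback are assumptions for illustration.

```python
import time


def wait_for_sync(is_synced, timeout_s=30.0, interval_s=1.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll is_synced() until it returns True or timeout_s elapses.

    Returns True if the check passed before the deadline, False otherwise.
    """
    deadline = clock() + timeout_s
    while True:
        if is_synced():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Stand-in check that succeeds on the third poll.
results = iter([False, False, True])
print(wait_for_sync(lambda: next(results), timeout_s=5.0, interval_s=0.01))  # True
```

A `False` result after a generous timeout corresponds to the failure reported here, where the route is still missing hours later.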
Versions
with two cherry-picked PRs on top of it:
lib, zebra: Fix EVPN nexthop config order #12524
zebra: re-install NHG on interface up
Additional context
Host1 - FRR VRF RIB routes:
Host1 - routes installed in the VRF in the kernel (ip route show vrf ...):
Note: The route to 10.5.89.3 is not installed in the kernel even though FRR claims it is.
Host2 - FRR VRF RIB routes:
Host2 - routes installed in the VRF in the kernel (ip route show vrf ...):
On host2 the routes are installed correctly, but we have seen cases where the routes were not installed properly on host2.
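The comparison between the FRR view and the kernel view can be automated along these lines. This is a hedged sketch: it assumes the prefixes have already been extracted from the FRR and `ip route show vrf ...` output into plain strings, the helper name `missing_from_kernel` is hypothetical, and `10.5.89.4` is a made-up address used only to give the example a second route. Note that `ip route` prints host routes without a `/32` suffix, so both sides are normalized before comparing.

```python
import ipaddress


def missing_from_kernel(frr_prefixes, kernel_prefixes):
    """Return the prefixes FRR reports as installed that the kernel lacks."""
    def norm(prefix):
        # "10.5.89.3" and "10.5.89.3/32" both become the same network object.
        return ipaddress.ip_network(prefix, strict=False)

    kernel = {norm(p) for p in kernel_prefixes}
    return sorted(str(norm(p)) for p in frr_prefixes if norm(p) not in kernel)


frr_view = ["10.5.89.3/32", "10.5.89.4/32"]  # what FRR claims is installed
kernel_view = ["10.5.89.4"]                  # what the kernel actually has
print(missing_from_kernel(frr_view, kernel_view))  # ['10.5.89.3/32']
```

Entries such as `default` are not valid input for `ipaddress.ip_network` and would need to be mapped to `0.0.0.0/0` by the caller first.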
Attached files
host1.tar.gz
host2.tar.gz
They are compressed as they are fairly large (~10-20MB).
Partial log analysis
We believe the key part of the logs related to the issue is the below part in host1's FRR log file.
As far as we can tell, this part means that the route 10.5.89.3 is in an unknown VRF, i.e. the VRF device is not yet present on host1. This means Zebra will not (and cannot) install the route.
3 seconds later, the VRF device is now present, and the route is processed again. However, Zebra does absolutely nothing with the route.
The last line in the log is from several hours later, and the route has still not been installed:
Netdevices
Probably not relevant but it is here for completeness.
Host2:
Host1: