frr service restart fails with zebra error- zclient_send_message: buffer_write failed to zclient error #17475

Open
2 tasks done
arjunramu opened this issue Nov 21, 2024 · 6 comments
Labels
triage (Needs further investigation), unsupported-version (The version of FRR is unsupported)

Comments

@arjunramu

arjunramu commented Nov 21, 2024

Description

frr.log

Our setup:

Site A: Ubuntu 22.04 Linux VM and FRR for BGP
Site B: Catalyst Router
FRR is configured with a minimal config, just enough to exchange routes with neighbors.

A BGP session is established between A and B.
systemctl restart frr fails after the BGP session is established.

Issue: the FRR service restart fails, and bgpd logs the error "zclient_send_message: buffer_write failed to zclient".

root@10-1-1-1:~# systemctl restart frr
root@10-1-1-1:~# systemctl status frr
× frr.service - FRRouting
     Loaded: loaded (/lib/systemd/system/frr.service; enabled; vendor preset: enabled)
     Active: failed (Result: start-limit-hit) since Tue 2024-11-12 07:48:55 UTC; 57min ago
       Docs: https://frrouting.readthedocs.io/en/latest/setup.html
    Process: 1494090 ExecStart=/usr/lib/frr/frrinit.sh start (code=exited, status=0/SUCCESS)
    Process: 1495628 ExecStop=/usr/lib/frr/frrinit.sh stop (code=exited, status=0/SUCCESS)
   Main PID: 1494100 (code=exited, status=0/SUCCESS)
        CPU: 1.890s

BGP configuration:

master-10-1-1-1# sh running-config 
Building configuration...

Current configuration:
!
frr version 8.1
frr defaults traditional
hostname master-10-1-1-1
log file /var/log/frr/bgpd.log
log syslog
no ipv6 forwarding
bgp no-rib
service integrated-vtysh-config
username root nopassword
!
router bgp 64512
 bgp router-id 10.1.1.1
 neighbor 10.1.1.2 remote-as 64512
 neighbor 10.1.1.2 update-source 10.1.1.1
 neighbor 10.1.1.2 timers 60 180
 !
 address-family ipv4 unicast
  neighbor 10.1.1.2 next-hop-self
  neighbor 10.1.1.2 route-map DENYALL in
 exit-address-family
exit
!
access-list all seq 5 permit any
!
ip prefix-list denyall seq 5 deny 0.0.0.0/0 le 32
!
route-map DENYALL permit 10
 match ip address prefix-list denyall
exit
!
end

Version

root@10-1-1-1:~# vtysh

Hello, this is FRRouting (version 8.1).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

10-1-1-1# show version
FRRouting 8.1 (master-10-1-1-1).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
configured with:
    '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-option-checking' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--libexecdir=${prefix}/lib/x86_64-linux-gnu' '--disable-maintainer-mode' '--localstatedir=/var/run/frr' '--sbindir=/usr/lib/frr' '--sysconfdir=/etc/frr' '--with-vtysh-pager=/usr/bin/pager' '--libdir=/usr/lib/x86_64-linux-gnu/frr' '--with-moduledir=/usr/lib/x86_64-linux-gnu/frr/modules' '--disable-dependency-tracking' '--enable-rpki' '--disable-scripting' '--with-libpam' '--enable-doc' '--enable-doc-html' '--enable-snmp' '--enable-fpm' '--disable-protobuf' '--disable-zeromq' '--enable-ospfapi' '--enable-bgp-vnc' '--enable-multipath=256' '--enable-user=frr' '--enable-group=frr' '--enable-vty-group=frrvty' '--enable-configfile-mask=0640' '--enable-logfile-mask=0640' 'build_alias=x86_64-linux-gnu' 'PYTHON=python3'
10-1-1-1#

How to reproduce

Steps to reproduce -

  1. Add the neighbor router configuration and establish a bgp session
  2. Ensure the bgp session is established between A and B
  3. Restart the frr service
  4. The frr service restart fails.

Reproduction script and its output:

root@10-1-1-1:~# cat a.sh
while true ; do sleep 3 ; systemctl restart frr ; systemctl status frr | grep running; if [ $? -eq 1 ]; then     exit 1; fi; done
root@10-1-1-1:~#

root@10-1-1-1:~# cat /tmp/a.log
     Active: active (running) since Thu 2024-11-21 08:54:31 UTC; 5ms ago
     Active: active (running) since Thu 2024-11-21 08:54:39 UTC; 5ms ago
     Active: active (running) since Thu 2024-11-21 08:54:48 UTC; 5ms ago
Job for frr.service failed.
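
The loop in a.sh can also be written so that it reports which restart failed. A minimal sketch, assuming a POSIX shell; `restart_until_failure`, `RESTART_DELAY`, and `MAX_ATTEMPTS` are illustrative names, not part of FRR or systemd:

```shell
#!/bin/sh
# Hypothetical variant of a.sh: restart frr until a restart fails,
# then report which attempt failed.
# RESTART_DELAY and MAX_ATTEMPTS are illustrative knobs (assumptions).
RESTART_DELAY="${RESTART_DELAY:-3}"
MAX_ATTEMPTS="${MAX_ATTEMPTS:-100}"

restart_until_failure() {
    attempt=0
    while [ "$attempt" -lt "$MAX_ATTEMPTS" ]; do
        sleep "$RESTART_DELAY"
        attempt=$((attempt + 1))
        # systemctl restart exits non-zero when the restart job fails.
        if ! systemctl restart frr; then
            echo "restart $attempt failed"
            return 1
        fi
        # is-active exits 0 only while the unit is in the active state.
        if ! systemctl is-active --quiet frr; then
            echo "frr not active after restart $attempt"
            return 1
        fi
    done
    echo "no failure in $attempt restarts"
}

# Usage (as root): restart_until_failure
```

Since the journal in this report shows start-limit-hit, raising `RESTART_DELAY` is one way to check whether the failure depends on how quickly the restarts are issued.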

Expected behavior

Steps -

  1. Add the neighbor router configuration and establish a bgp session
  2. Ensure the bgp session is established between A and B
  3. Restart the frr service
  4. frr restart should be successful

Actual behavior

Steps -

  1. Add the neighbor router configuration and establish a bgp session
  2. Ensure the bgp session is established between A and B
  3. Restart the frr service
  4. frr restart failed

Additional context

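The workaround is to stop and then start the frr service instead of restarting it. The journal from the failed restart, followed by the stop/start sequence: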

Nov 21 08:44:25 10-1-1-1 bgpd[42189]: [YAF85-253AP][EC 100663299] buffer_write: write error on fd 15: Broken pipe
Nov 21 08:44:25 10-1-1-1 bgpd[42189]: [X6B3Y-6W42R][EC 100663302] zclient_send_message: buffer_write failed to zclient fd 15, closing
Nov 21 08:44:25 10-1-1-1 zebra[42184]: [QS0NJ-H5QKJ] Zebra final shutdown
Nov 21 08:44:25 10-1-1-1 frrinit.sh[42335]:  * Stopped staticd
Nov 21 08:44:25 10-1-1-1 frrinit.sh[42336]:  * Stopped bgpd
Nov 21 08:44:25 10-1-1-1 frrinit.sh[42337]:  * Stopped zebra
Nov 21 08:44:25 10-1-1-1 systemd[1]: frr.service: Deactivated successfully.
Nov 21 08:44:25 10-1-1-1 systemd[1]: Stopped FRRouting.
Nov 21 08:44:25 10-1-1-1 systemd[1]: frr.service: Start request repeated too quickly.
Nov 21 08:44:25 10-1-1-1 systemd[1]: frr.service: Failed with result 'start-limit-hit'.
Nov 21 08:44:25 10-1-1-1 systemd[1]: Failed to start FRRouting.
Nov 21 08:44:25 10-1-1-1 systemd[1]: frr.service: Triggering OnFailure= dependencies.
Nov 21 08:44:25 10-1-1-1 systemd[1]: frr.service: Failed to enqueue OnFailure= job, ignoring: Unit [email protected] not f>
Nov 21 08:44:52 10-1-1-1 systemd[1]: frr.service: Start request repeated too quickly.
Nov 21 08:44:52 10-1-1-1 systemd[1]: frr.service: Failed with result 'start-limit-hit'.
Nov 21 08:44:52 10-1-1-1 systemd[1]: Failed to start FRRouting.
root@10-1-1-1 :~#
root@10-1-1-1 :~#
root@10-1-1-1 :~# systemctl stop frr
root@10-1-1-1 :~# systemctl start frr
root@10-1-1-1 :~# systemctl status frr
● frr.service - FRRouting
     Loaded: loaded (/lib/systemd/system/frr.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2024-11-21 08:50:16 UTC; 2s ago
       Docs: https://frrouting.readthedocs.io/en/latest/setup.html
    Process: 47341 ExecStart=/usr/lib/frr/frrinit.sh start (code=exited, status=0/SUCCESS)
   Main PID: 47350 (watchfrr)
     Status: "FRR Operational"
      Tasks: 13 (limit: 23695)
     Memory: 17.2M
        CPU: 435ms
     CGroup: /system.slice/frr.service
             ├─47350 /usr/lib/frr/watchfrr -d -F traditional zebra bgpd staticd
             ├─47366 /usr/lib/frr/zebra -d -F traditional -A 127.0.0.1 -s 90000000
             ├─47372 /usr/lib/frr/bgpd -d -F traditional --daemon -A 127.0.0.1 -l 10.1.1.1
             └─47379 /usr/lib/frr/staticd -d -F traditional -A 127.0.0.1

Nov 21 08:50:12 10-1-1-1  zebra[47366]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
Nov 21 08:50:12 10-1-1-1  bgpd[47372]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
Nov 21 08:50:12 10-1-1-1  staticd[47379]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
Nov 21 08:50:12 10-1-1-1  watchfrr[47350]: [ZJW5C-1EHNT] restart all process 47351 exited with non-zero status 13
Nov 21 08:50:16 10-1-1-1  watchfrr[47350]: [QDG3Y-BY5TN] bgpd state -> up : connect succeeded
Nov 21 08:50:16 10-1-1-1  watchfrr[47350]: [QDG3Y-BY5TN] zebra state -> up : connect succeeded
Nov 21 08:50:16 10-1-1-1  watchfrr[47350]: [QDG3Y-BY5TN] staticd state -> up : connect succeeded
Nov 21 08:50:16 10-1-1-1  watchfrr[47350]: [KWE5Q-QNGFC] all daemons up, doing startup-complete notify
Nov 21 08:50:16 10-1-1-1 frrinit.sh[47341]:  * Started watchfrr
Nov 21 08:50:16 10-1-1-1 systemd[1]: Started FRRouting.
root@10-1-1-1:~# 
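
The journal above reports start-limit-hit, which is systemd's start rate limit rather than an FRR exit code. A hedged sketch of the stop/start workaround as a script, with two additions that are not in the original report: `systemctl reset-failed`, which per the systemctl manual also resets the unit's start rate counter, and a `systemctl show` query for the configured limits. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch of the stop/start workaround. reset-failed is an addition
# (assumption): it clears the failed state and the start rate counter.
recover_frr() {
    systemctl stop frr || return 1
    systemctl reset-failed frr || return 1
    systemctl start frr || return 1
    # Prints the unit state; exits 0 only if the unit is active.
    systemctl is-active frr
}

# Print the start rate limit settings systemd applies to the unit.
show_start_limits() {
    systemctl show frr -p StartLimitBurst -p StartLimitIntervalUSec
}

# Usage (as root):
#   show_start_limits
#   recover_frr
```

Comparing the values from `show_start_limits` with the restart frequency of the reproduction loop may show whether the loop simply exceeds the allowed number of starts per interval.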

Checklist

  • I have searched the open issues for this bug.
  • I have not included sensitive information in this report.

arjunramu added the triage label (Needs further investigation) on Nov 21, 2024

@ton31337
Member

Could you enable debug logging and show us the logs? debug bgp updates, debug bgp neighbor.
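
A minimal sketch of turning these on from the shell through vtysh. It assumes that "debug bgp neighbor" refers to FRR's `debug bgp neighbor-events` command; `enable_bgp_debug` is a hypothetical helper name.

```shell
#!/bin/sh
# Enable BGP debugging through vtysh.
# Assumption: "debug bgp neighbor" means "debug bgp neighbor-events".
enable_bgp_debug() {
    vtysh -c 'configure terminal' \
          -c 'debug bgp updates' \
          -c 'debug bgp neighbor-events'
}

# Usage (as root or a member of the vty group): enable_bgp_debug
```

With the configuration from the report, the debug output would be expected in the file named by the `log file` line (/var/log/frr/bgpd.log), provided the log level allows debugging messages.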

@arjunramu
Author

arjunramu commented Nov 25, 2024

I enabled debug logging; here are the logs:

bgpd.log
frr.log
journalctl_-xeu_frr_service.log

@arjunramu
Author

@ton31337 Any update on the issue?

@ton31337
Member

ton31337 commented Dec 9, 2024

What about newer versions? 8.1 is way too old, unfortunately.


ton31337 added the unsupported-version label (The version of FRR is unsupported) on Dec 9, 2024

@arjunramu
Author

> What about newer versions? 8.1 is way too old, unfortunately.

Is there a specific version that has the fix? We can give it a try.

@ton31337
Member

Let's try at least 10.0.
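
A quick way to check an installed version against that minimum. This is a sketch, assuming `sort -V` (version sort, as in GNU coreutils) is available and that the first line of `show version` has the form shown in the report ("FRRouting 8.1 (...)"); both function names are hypothetical.

```shell
#!/bin/sh
# Read `show version` output on stdin and print the version number,
# e.g. "FRRouting 8.1 (master-10-1-1-1)." -> "8.1".
frr_version_from_banner() {
    sed -n 's/^FRRouting \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Exit 0 if version $1 is at least the minimum $2.
# Relies on sort -V: the minimum must sort first (or be equal).
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Usage:
#   v=$(vtysh -c 'show version' | frr_version_from_banner)
#   version_at_least "$v" 10.0 && echo "FRR $v is new enough"
```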
