-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[services] BGP service depends on SwSS service #3078
Conversation
if we restart swss service, does it restart bgp service? |
Yes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jleveque @lguohan Since GR is negotiated by default, this change doesn't achieve anything when swss restarts (traffic will be blackholed anyway since we have GR). In fact, a second restart of swss, will kill GR and that's undesirable. This isn't the right thing to do. In general, such dependency shouldn't exist between the routing application and swss.
if swss goes down (crashes), then we need to stop bgp since the asic is not synced with the bgp fib and we are going to or is dropping packets. In this case, we would like to bring down the bgp service. Meanwhile, if we restart the swss (in warm reboot), we should not restart bgp. So, I think we need more sophiscated mechansim to differentiate these two cases. |
@lguohan The first assumption is not right. Other services running on the device will still need the bgp routes in the kernel. Bringing down bgp can isolate the device completely from the network and make it unreachable (that precludes us from running bgp over the mgmt port or have bgp install routes). If the behavior you are describing is needed, it needs to be configurable and can't be the default. |
@xinliu-seattle This PR needs to be closed and must not be committed. As I pointed out already, it will break existing functionality. |
…lly (#18451) #### Why I did it src/sonic-swss ``` * f1aad1f0 - (HEAD -> 202311, origin/202311) [acl] Add IN_PORTS qualifier for L3 table (#3078) (10 hours ago) [Neetha John] * 3b953ae7 - [Mellanox] Fix inconsistence in the shared headroom pool initialization (#3057) (22 hours ago) [Stephen Sun] ``` #### How I did it #### How to verify it #### Description for the changelog
…commits * c96a2f84 - Revert "[acl] Add IN_PORTS qualifier for L3 table (sonic-net#3078)" (sonic-net#3092) (6 days ago) [Neetha John] * 80e0b57d - [Copp]Refactor coppmgr tests (sonic-net#3093) (8 days ago) [Sudharsan Dhamal Gopalarathnam] * a4647299 - [portsorch] process only updated APP_DB fields when port is already created (sonic-net#3025) (10 days ago) [Stepan Blyshchak] * 91bacca5 - [buffermgrd] Move switch-statement outside of if-statement in BufferMgr::doTask (sonic-net#3055) (2 weeks ago) [Amir] * 04912ad0 - [bulker] add support for neighbor bulking (sonic-net#2768) (2 weeks ago) [Nikola Dancejic] * 9d4a3add - [acl] Add IN_PORTS qualifier for L3 table (sonic-net#3078) (2 weeks ago) [Neetha John] * a13e081f - [Mellanox] Fix inconsistence in the shared headroom pool initialization (sonic-net#3057) (3 weeks ago) [Stephen Sun] * ff2b2b85 - Add basic fabric link monitoring counters and states handling. (sonic-net#2988) (3 weeks ago) [jfeng-arista] * 0c620910 - Add port flap count and last flap timestamp to APPL_DB (sonic-net#3052) (3 weeks ago) [Prince George] * e9931f31 - [EVPN] Skip EVPN routes with invalid VNI or router mac field (sonic-net#3073) (3 weeks ago) [Lior Avramov] * 600d5e80 - Set HOST_TX_READY_NOTIFY attribute only after query capabilities(sonic-net#3070) (3 weeks ago) [noaOrMlnx]
…commits (#18576) * c96a2f84 - Revert "[acl] Add IN_PORTS qualifier for L3 table (#3078)" (#3092) (6 days ago) [Neetha John] * 80e0b57d - [Copp]Refactor coppmgr tests (#3093) (8 days ago) [Sudharsan Dhamal Gopalarathnam] * a4647299 - [portsorch] process only updated APP_DB fields when port is already created (#3025) (10 days ago) [Stepan Blyshchak] * 91bacca5 - [buffermgrd] Move switch-statement outside of if-statement in BufferMgr::doTask (#3055) (2 weeks ago) [Amir] * 04912ad0 - [bulker] add support for neighbor bulking (#2768) (2 weeks ago) [Nikola Dancejic] * 9d4a3add - [acl] Add IN_PORTS qualifier for L3 table (#3078) (2 weeks ago) [Neetha John] * a13e081f - [Mellanox] Fix inconsistence in the shared headroom pool initialization (#3057) (3 weeks ago) [Stephen Sun] * ff2b2b85 - Add basic fabric link monitoring counters and states handling. (#2988) (3 weeks ago) [jfeng-arista] * 0c620910 - Add port flap count and last flap timestamp to APPL_DB (#3052) (3 weeks ago) [Prince George] * e9931f31 - [EVPN] Skip EVPN routes with invalid VNI or router mac field (#3073) (3 weeks ago) [Lior Avramov] * 600d5e80 - Set HOST_TX_READY_NOTIFY attribute only after query capabilities(#3070) (3 weeks ago) [noaOrMlnx]
- What I did/How I did it
Make BGP service dependent upon SwSS service. BGP service should never run unless SwSS is running and healthy, otherwise the switch could blackhole traffic. By adding SwSS to the
Requires=
andAfter=
options, the BGP service will now start only after SwSS successfully starts.Also, with the addition of #2845, I also added SwSS to the
WantedBy=
option, which will ensure the BGP service gets stopped if SwSS ever stops, and it will also get restarted if SwSS gets restarted.- How to verify it
pkill -11 orchagent
). This will cause the SwSS service to stop and restart. Ensure that the BGP service stops shortly after the SwSS service stops. Wait until SwSS automatically restarts, then ensure the BGP service also gets restarted afterward.