Mp info changes #16961
Conversation
Force-pushed from ef0d50e to 328d823.
Force-pushed from 328d823 to 050093c.
Force-pushed from 050093c to 12b1f2c.
The mpath data structure has data that is only relevant for the first mpath in the list. It is not being used anywhere else. Let's document that a bit more.

Signed-off-by: Donald Sharp <[email protected]>
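For context, a hedged sketch of what "only relevant for the first mpath" means in practice. The structure below is an approximation of FRR's struct bgp_path_info_mpath (field names and types are recalled from bgp_mpath.h and may not match exactly), with the first-element-only fields called out:

```c
/* Approximation of struct bgp_path_info_mpath (bgp_mpath.h); field
 * names and types are illustrative and may not match FRR exactly. */
struct bgp_path_info_mpath {
	/* Linkage of paths in the multipath set. */
	struct bgp_path_info_mpath *mp_next;
	struct bgp_path_info_mpath *mp_prev;

	/* Backpointer to the bgp_path_info this node belongs to. */
	struct bgp_path_info *mp_info;

	/*
	 * The fields below are only meaningful on the first node in the
	 * list (the one attached to the bestpath); they are never filled
	 * in or consulted on any other node.
	 */
	uint32_t mp_count;    /* number of paths in the multipath set */
	struct attr *mp_attr; /* aggregated attribute for advertisement */
	uint32_t mp_flags;
};
```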
Test was confusing. Add some useful data and clean up some debugs.

Signed-off-by: Donald Sharp <[email protected]>
Currently BGP multipath has these properties:

a) mp_info may or may not be on a single path, based upon path perturbations in the past.
b) mp_info->count started counting at 0 (meaning 1), since the bestpath path_info was never included in the count.
c) The first mp_info in the list held the multipath data associated with the multipath; at any other node that data was not filled in.
d) As such, the mp_info's that are not first on the list were basically just pointers to the corresponding bgp_path_info in the multipath.
e) On bestpath calculation, a linklist (struct linklist *) of bgp_path_info's was created.
f) This linklist was passed to a comparison function that took the old mp_info list and compared it item by item to the linklist, doing magic to figure out how to create a new mp_info list.
g) The old mp_info list and the linklist had to be memory managed and freed up.
h) BGP_PATH_MULTIPATH is only set on non-bestpath nodes in the multipath.

This is really complicated. Let's change the algorithm to this:

a) When running bestpath, mark a bgp_path_info node that could be in the ECMP path as BGP_PATH_MULTIPATH_NEW.
b) When running multipath, just walk the list of bgp_path_info's and, if a node has BGP_PATH_MULTIPATH_NEW set, decide whether it is in BGP_MULTIPATH. If we run out of space in the ECMP, clear the flag on the rest.
c) Clean up the counting of sometimes adding 1 to the mpath count.
d) Only allocate an mpath_info node for the bestpath, and clean it up when done with it.
e) Remove the unneeded list management associated with the linklist and the mp_list.

This greatly simplifies multipath computation for BGP and reduces memory load for large-scale deployments.

2 full feeds in work_queue_run prior:

    0  56367.471  1123  50193  493695  50362  493791  0  0  0  TE  work_queue_run
    BGP multipath info : 1941844 48 110780992 1941844 110780992

2 full feeds in work_queue_run after change:

    1  52924.931  1296  40837  465968  41025  487390  0  0  1  TE  work_queue_run
    BGP multipath info : 970860 32 38836880 970866 38837120

Approximately 4 seconds of saved CPU time for convergence and a ~75 MB smaller runtime footprint.

Signed-off-by: Donald Sharp <[email protected]>
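To make the new flow concrete, here is a minimal C sketch of the second pass described above (the single walk over the path list). Everything in it is an assumption for illustration: the stand-in types, the SET_FLAG/UNSET_FLAG/CHECK_FLAG macro definitions, the bgp_mpath_select name, and the maxpaths parameter. Only the BGP_PATH_MULTIPATH and BGP_PATH_MULTIPATH_NEW flag names come from the commit message itself.

```c
#include <stdint.h>

/* Stand-ins for FRR's flag helpers; FRR defines similar macros. */
#define SET_FLAG(V, F)   ((V) |= (F))
#define UNSET_FLAG(V, F) ((V) &= ~(F))
#define CHECK_FLAG(V, F) ((V) & (F))

/* Illustrative flag values; only the names appear in the commit message. */
#define BGP_PATH_MULTIPATH     (1u << 0) /* path is in the final ECMP set */
#define BGP_PATH_MULTIPATH_NEW (1u << 1) /* candidate marked during bestpath */

/* Pared-down stand-in for FRR's struct bgp_path_info. */
struct bgp_path_info {
	struct bgp_path_info *next;
	uint32_t flags;
};

/*
 * Hypothetical second pass: bestpath has already tagged candidates with
 * BGP_PATH_MULTIPATH_NEW; walk the list once, accept candidates until
 * the ECMP limit is reached, and clear the flags on the rest.
 */
static uint32_t bgp_mpath_select(struct bgp_path_info *list,
				 struct bgp_path_info *bestpath,
				 uint32_t maxpaths)
{
	uint32_t mp_count = 0;

	for (struct bgp_path_info *pi = list; pi; pi = pi->next) {
		if (!CHECK_FLAG(pi->flags, BGP_PATH_MULTIPATH_NEW))
			continue;

		/* The candidate mark is consumed either way. */
		UNSET_FLAG(pi->flags, BGP_PATH_MULTIPATH_NEW);

		if (pi == bestpath)
			continue; /* bestpath carries the mpath_info itself */

		if (mp_count < maxpaths) {
			SET_FLAG(pi->flags, BGP_PATH_MULTIPATH);
			mp_count++;
		} else {
			/* Out of ECMP space: clear the rest. */
			UNSET_FLAG(pi->flags, BGP_PATH_MULTIPATH);
		}
	}

	return mp_count;
}
```

The point of the sketch is structural: one linear walk with flag bookkeeping replaces the old linklist build, the item-by-item comparison against the previous mp_info list, and all the allocation and freeing that went with them.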
This function is not doing any work. Let's remove it.

Signed-off-by: Donald Sharp <[email protected]>
Force-pushed from 12b1f2c to 5b8bca8.
LGTM, just one question to be sure I got the point. Did I understand correctly that we get rid of managing the bgp_path_info_mpath linked list entirely and just use the bestpath's information for all the multipaths?
Yes, exactly. The only mp_info allocated is for the bestpath bgp_path_info node.
Why I did it (…net#21199): Added patches from FRR to support scaling BGP neighbors to 256/514. Below are the patches and their FRR pull requests:

0069-lib-nexthop-code-should-use-uint16_t-for-nexthop-cou.patch (FRRouting/frr#16967)
0070-Allow-16-bit-size-for-nexthops.patch (FRRouting/frr#17023)
0071-zebra-Only-notify-dplane-work-pthread-when-needed.patch (FRRouting/frr#17062)
0072-Fix-up-improper-handling-of-nexthops-for-nexthop-tra.patch (FRRouting/frr#17076)
0073-remove-in6addr-cmp.patch (FRRouting/frr#17312)
0074-bgp-best-port-reordering.patch (FRRouting/frr#15572)
0075-bgp-mp-info-changes.patch (FRRouting/frr#16961)
0076-Optimizations-and-problem-fixing-for-large-scale-ecmp-from-bgp.patch (FRRouting/frr#17229)
See the last commit for full data. Effectively, though, mp_info is expensive, and this change reduces the cost by a decent amount.