-
Notifications
You must be signed in to change notification settings - Fork 549
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Mux] Route handling based on mux status, kernel tunnel support #1615
Conversation
retest vs please |
cfgmgr/tunnelmgr.cpp
Outdated
|
||
if (it == m_tunnelCache.end()) | ||
{ | ||
SWSS_LOG_NOTICE("Tunnel %s not found", tunnelName.c_str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible that we can ever receive a delete request for a tunnel that is not there? If not, may have to set syslog with Error or Warning for the least...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to Error
/* Decrement ref count for ECMP NH members */ | ||
l_fn (nh_key, num_routes); | ||
|
||
neigh = NeighborEntry(it->first, alias_); | ||
if (!gNeighOrch->disableNeighbor(neigh)) | ||
{ | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Log error here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
IpPrefix pfx = it->first.to_string(); | ||
if (create_route(pfx, it->second) != SAI_STATUS_SUCCESS) | ||
{ | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Log error here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
create_route has ERR log in case of failure
orchagent/routeorch.cpp
Outdated
@@ -371,6 +374,7 @@ bool RouteOrch::invalidnexthopinNextHopGroup(const NextHopKey &nexthop) | |||
return false; | |||
} | |||
|
|||
++ count; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the purpose of this "count" variable? Does not look like it is being used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the return value as part of function param.
orchagent/routeorch.cpp
Outdated
while(rt != it->second.end()) | ||
{ | ||
SWSS_LOG_INFO("Updating route %s", (*rt).prefix.to_string().c_str()); | ||
next_hop_id = m_neighOrch->getNextHopId(nextHop); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really intended? Doe not look like nexchop is modified in this while loop therefore next_hop_id will always be the same each loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, during mux state change, the nexthop id would have been changed to tunnel nexthop
orchagent/routeorch.cpp
Outdated
@@ -333,6 +334,7 @@ bool RouteOrch::validnexthopinNextHopGroup(const NextHopKey &nexthop) | |||
return false; | |||
} | |||
|
|||
++ count; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove the space here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -250,11 +259,11 @@ bool NeighOrch::setNextHopFlag(const NextHopKey &nexthop, const uint32_t nh_flag | |||
} | |||
|
|||
nhop->second.nh_flags |= nh_flag; | |||
|
|||
uint32_t count; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the purpose of this count here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only place where valid/invalid nexthop group is used and the API is modified to return number of nexthop groups that were modified. This is a no-op here. The caller can determine what to do with returned value
orchagent/routeorch.h
Outdated
@@ -132,6 +150,7 @@ class RouteOrch : public Orch, public Subject | |||
|
|||
RouteTables m_syncdRoutes; | |||
NextHopGroupTable m_syncdNextHopGroups; | |||
NextHopRouteTable m_nextHops; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is better to explain what does the nexthop router table store? the nexthop -> route map? does this includes all the routes to T1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is mentioned at line 67. This is generic to be useful for other use-cases
orchagent/routeorch.cpp
Outdated
RouteKey r_key = { vrf_id, ipPrefix }; | ||
auto nexthop = NextHopKey(nextHops.to_string()); | ||
addNextHopRoute(nexthop, r_key); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the purpose of add the <nexthop, r_key> ? is this used for track the nexthop change, so that we can find the route to be changed?
however, what about if there is a T1 route has only single nexthop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, what is the implication if we have this nexthop route table on T1? feel it is totally unnecessary.
In reply to: 568514575 [](ancestors = 568514575)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a common data structure for holding Single NH -> Routes. It includes both T1 and Mux routes. I think it is hard to differentiate with T1 routes. Do you have a suggestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
first, if the muxorch is not enabled, we should not enable this nexthop -> route tracking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't true we will know which port has muxcable, and only track nexthop if it is pointing that port?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The nexthop map is required for evpn case. Also to clarify, this does not contain T1 routes since in most cases its ECMP. This datastructure is only if a route has single nexthop all the time.
orchagent/routeorch.cpp
Outdated
auto rt = it->second.begin(); | ||
while(rt != it->second.end()) | ||
{ | ||
SWSS_LOG_INFO("Updating route %s", (*rt).prefix.to_string().c_str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updating route %s for what purpose? I think it is better to finish the sentence, it is unclear what do you update.
orchagent/routeorch.cpp
Outdated
|
||
if (it == m_nextHops.end()) | ||
{ | ||
SWSS_LOG_INFO("No routes found for NH %s", nextHop.ip_address.to_string().c_str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this debug level info?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
orchagent/muxorch.cpp
Outdated
auto it = neighbors_.getIpAddresses().begin(); | ||
while (it != neighbors_.getIpAddresses().end()) | ||
auto l_fn = [&] (NextHopKey nh_key, uint32_t val) { | ||
SWSS_LOG_INFO("Inc ref for %s by %d", nh_key.ip_address.to_string().c_str(), val); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this debug level?
bool NeighOrch::hasNextHop(const NextHopKey &nexthop) | ||
{ | ||
// First check if mux has NH |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why neighbor orch has to care if mux has NH or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When a route is programmed and if the neighbor is standby, then it will be a tunnel nexthop and not regular nexthop which will be owned by muxorch. So for mux neighbors, the nexthop management is by mux orch. For non-mux neighbor, it will follow the existing flow.
orchagent/muxorch.cpp
Outdated
auto it = neighbors_.getIpAddresses().begin(); | ||
while (it != neighbors_.getIpAddresses().end()) | ||
auto l_fn = [&] (NextHopKey nh_key, uint32_t val) { | ||
SWSS_LOG_INFO("Dec ref for %s by %d", nh_key.ip_address.to_string().c_str(), val); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
debug
orchagent/muxorch.cpp
Outdated
for (uint32_t cnt = 0; cnt < val; ++cnt) | ||
{ | ||
gNeighOrch->decreaseNextHopRefCount(nh_key); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of looping, can we just decrease the ref count by val refcount - val?
orchagent/muxorch.cpp
Outdated
SWSS_LOG_INFO("Inc ref for %s by %d", nh_key.ip_address.to_string().c_str(), val); | ||
for (uint32_t cnt = 0; cnt < val; ++cnt) | ||
{ | ||
gNeighOrch->increaseNextHopRefCount(nh_key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we increase the refcnt by value instead of looping here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is existing API and would need to define new if we have to increase by value. Also, considering the number of routes, prefer we can use this loop instead of defining new.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that will be cleaner, correct?
orchagent/muxorch.cpp
Outdated
return false; | ||
} | ||
|
||
if (!gRouteOrch->validnexthopinNextHopGroup(nh_key, num_routes)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both line 574 and 580 has num_routes, this cause readability of the code.
i think it is more readable to get del_route_num in line 574 and get add_route_num in line 580, and you can get their substraction add_route_num - del_route_num as refcnt change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In enable flow, it is moving from Tunnel NH -> Direct NH (Neighbor). In this case, the ref count must be incremented only for the Direct NH. Similarly for disable flow, ref count is decremented for Direct NH. However, there is no ref_cnt management for tunnel NH as it is one time created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🕐
orchagent/routeorch.cpp
Outdated
SWSS_LOG_NOTICE("Remove overlay Nexthop %s", ol_nextHops.to_string().c_str()); | ||
removeOverlayNextHops(vrf_id, ol_nextHops); | ||
} | ||
else if (ol_nextHops.getSize() == 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is better to also check if the nexthop is pointing to the muxcable port or not.
cfgmgr/tunnelmgr.cpp
Outdated
const std::string & op = kfvOp(t); | ||
TunnelInfo tunInfo; | ||
|
||
for (auto idx : kfvFieldsValues(t)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: idx
-> fieldValue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed
cfgmgr/tunnelmgr.cpp
Outdated
ret = cmdIpTunnelIfAddress(ipPrefix.to_string(), res); | ||
if (ret != 0) | ||
{ | ||
SWSS_LOG_WARN("Failed to assign IP addr for tun if %s", ipPrefix.to_string().c_str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe printout of res
would help with debugging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed
cfgmgr/tunnelmgr.cpp
Outdated
ret = cmdIpTunnelRouteAdd(prefix, res); | ||
if (ret != 0) | ||
{ | ||
SWSS_LOG_WARN("Failed to add route %s", prefix.c_str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same regarding res here and throughout the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed
cfgmgr/tunnelmgrd.cpp
Outdated
@@ -42,11 +42,15 @@ int main(int argc, char **argv) | |||
|
|||
try | |||
{ | |||
vector<string> cfg_tun_tables = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: camelCase style
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed
orchagent/muxorch.h
Outdated
@@ -94,7 +101,7 @@ class MuxCable | |||
bool stateStandby(); | |||
|
|||
bool aclHandler(sai_object_id_t, bool = true); | |||
bool nbrHandler(bool = true); | |||
bool nbrHandler(bool, bool = true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
args name would?? Documentation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed
As part of dual-tor features, kernel tunnel interface (tun0) is being created. The routes over this is intended for kernel forwarding and not expected to be installed in ASIC. Ref PR - sonic-net/sonic-swss#1615
/Azurepipelines run |
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Program/Reprogram routes to hardware based on Mux status Create tunnel interface (tun0) in kernel Add/remove tunnel routes in kernel
As part of dual-tor features, kernel tunnel interface (tun0) is being created. The routes over this is intended for kernel forwarding and not expected to be installed in ASIC. Ref PR - sonic-net/sonic-swss#1615
…c-net#1615) Program/Reprogram routes to hardware based on Mux status Create tunnel interface (tun0) in kernel Add/remove tunnel routes in kernel
As part of dual-tor features, kernel tunnel interface (tun0) is being created. The routes over this is intended for kernel forwarding and not expected to be installed in ASIC. Ref PR - sonic-net/sonic-swss#1615
…c-net#1615) Program/Reprogram routes to hardware based on Mux status Create tunnel interface (tun0) in kernel Add/remove tunnel routes in kernel
…c-net#1615) Program/Reprogram routes to hardware based on Mux status Create tunnel interface (tun0) in kernel Add/remove tunnel routes in kernel
* [show] add support for muxcable metrics What I did Added support for show muxcable metrics. This essentially records what events came to different modules per se for toggling the mux from one state to another. for example admin@sonic$ show muxcable metrics Ethernet0 --json { "linkmgrd_switch_active_start": "2021-May-13 10:00:21.420898", "linkmgrd_switch_standby_end": "2021-May-13 10:01:15.696728", "linkmgrd_switch_unknown_end": "2021-May-13 10:00:26.123319", "xcvrd_switch_standby_end": "2021-May-13 10:01:15.696051", "xcvrd_switch_standby_start": "2021-May-13 10:01:15.690835" } or admin@sonic:$ show muxcable metrics Ethernet0 PORT EVENT TIME --------- ---------------------------- --------------------------- Ethernet0 linkmgrd_switch_active_start 2021-May-13 10:00:21.420898 Ethernet0 linkmgrd_switch_standby_end 2021-May-13 10:01:15.696728 Ethernet0 linkmgrd_switch_unknown_end 2021-May-13 10:00:26.123319 Ethernet0 xcvrd_switch_standby_end 2021-May-13 10:01:15.696051 Ethernet0 xcvrd_switch_standby_start 2021-May-13 10:01:15.690835 How I did it added changes in show/muxcable.py by reading and publishing the state DB contents for the corresponding table Signed-off-by: vaibhav-dahiya <[email protected]>
As part of dual-tor features, kernel tunnel interface (tun0) is being created. The routes over this is intended for kernel forwarding and not expected to be installed in ASIC. Ref PR - sonic-net/sonic-swss#1615
What I did
Why I did it
Support dual-tor SLB scenarios
How I verified it
Details if related