Skip to content
Permalink

Comparing changes

Choose two branches to see what’s changed or to start a new pull request. If you need to, you can also or learn more about diff comparisons.

Open a pull request

Create a new pull request by comparing changes across two branches. If you need to, you can also . Learn more about diff comparisons here.
base repository: sonic-net/sonic-linkmgrd
Failed to load repositories. Confirm that selected base ref is valid, then try again.
Loading
base: e191338e863d561d201ff1e272ca37e48441cbc4
Choose a base ref
...
head repository: sonic-net/sonic-linkmgrd
Failed to load repositories. Confirm that selected head ref is valid, then try again.
Loading
compare: 3e7a9df8a4b3c33fa0af42cd84f6ad5428bdc0e1
Choose a head ref
  • 3 commits
  • 3 files changed
  • 2 contributors

Commits on Feb 21, 2023

  1. Fix mux config when gRPC connection is lost (#166)

    Approach
    What is the motivation for this PR?
    Fix issue: #165
    
    Signed-off-by: Longxiang Lyu lolv@microsoft.com
    
    How did you do it?
    Reset the link prober state if the mux state is unknown when the mux config is changed back from active or standby to auto.
    
    How did you verify/test it?
    lolyu authored and yxieca committed Feb 21, 2023
    Copy the full SHA
    df862ad View commit details
  2. [active-active] fix issue that interfaces get stuck in active if se…

    …rvice starts up with link state down (#169)
    
    Description of PR
    Summary:
    Fixes # (issue)
    
    To fix issue #168.
    
    In the failure scenario, the event sequence was
    
    write_standby set state to "active" when services restarted
    mux config auto
    init link state down
    init mux state unknown
    state machine activated, re-init link prober state to (wait, wait, down)
    link prober state event: unknown, supposed to switch to standby, but there wasn't transaction handler defined for "unknown, unknown, down" or "unknown, wait, down"
    The fix here is to define no-op handlers to enforce a standby state or a state probing.
    
    sign-off: Jing Zhang zhangjing@microsoft.com.
    
    Type of change
     Bug fix
    
    Approach
    What is the motivation for this PR?
    To switch active interface into standby if link state is down after booting up.
    
    How did you do it?
    Define link down related state transition handlers.
    
    How did you verify/test it?
    Tested on active-active dualtor, show mux status return standby eventually after service was restarted.
    zjswhhh authored and yxieca committed Feb 21, 2023
    Copy the full SHA
    8ab1b2b View commit details
  3. [active-active] Toggle to standby if default route is missing (#171)

    Approach
    What is the motivation for this PR?
    To avoid extra disruption after a BGP shutdown.
    The reason is that, after a BGP shutdown, linkmgrd will stop heartbeats, and both local ToR and peer ToR will receive the heartbeat missing and toggle the local ToR to standby.
    There is a chance that the peer ToR toggles local ToR to standby first, and the local ToR will probe the mux finding that itself mux state as standby, the local ToR will toggle itself back to active. When the local ToR receives the link prober unknown event, it will toggle itself into standby at last.
    
    Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
    
    How did you do it?
    When linkmgrd receives default route missing event, make an extra toggle to standby, so local ToR's toggle to standby should be earlier than peer ToR's.
    And the disruption total time could be reduced because linkmgrd makes the toggle immediately after receiving the default route missing event instead of waiting for the link prober unknown event after shutdown the heartbeat sending.
    lolyu authored and yxieca committed Feb 21, 2023
    Copy the full SHA
    3e7a9df View commit details
69 changes: 66 additions & 3 deletions src/link_manager/LinkManagerStateMachineActiveActive.cpp
Original file line number Diff line number Diff line change
@@ -243,6 +243,9 @@ void ActiveActiveStateMachine::handleMuxConfigNotification(const common::MuxPort
switchMuxState(nextState, mux_state::MuxState::Label::Active, true);
} else if (mode == common::MuxPortConfig::Mode::Standby && ms(mCompositeState) != mux_state::MuxState::Label::Standby) {
switchMuxState(nextState, mux_state::MuxState::Label::Standby, true);
} else if (mode == common::MuxPortConfig::Mode::Auto && ms(mCompositeState) == mux_state::MuxState::Label::Unknown) {
MUXLOGINFO(boost::format("%s: reset link prober state") % mMuxPortConfig.getPortName());
initLinkProberState(nextState);
} else {
// enforce a state transtion calculation based on current states
mStateTransitionHandler[ps(nextState)][ms(nextState)][ls(nextState)](nextState);
@@ -703,6 +706,24 @@ void ActiveActiveStateMachine::initializeTransitionFunctionTable()
this,
boost::placeholders::_1
);

mStateTransitionHandler[link_prober::LinkProberState::Label::Unknown]
[mux_state::MuxState::Label::Wait]
[link_state::LinkState::Label::Down] =
boost::bind(
&ActiveActiveStateMachine::LinkProberUnknownMuxWaitLinkDownTransitionFunction,
this,
boost::placeholders::_1
);

mStateTransitionHandler[link_prober::LinkProberState::Label::Unknown]
[mux_state::MuxState::Label::Unknown]
[link_state::LinkState::Label::Down] =
boost::bind(
&ActiveActiveStateMachine::LinkProberUnknownMuxUnknownLinkDownTransitionFunction,
this,
boost::placeholders::_1
);
}

//
@@ -828,6 +849,36 @@ void ActiveActiveStateMachine::LinkProberUnknownMuxWaitLinkUpTransitionFunction(
switchMuxState(nextState, mux_state::MuxState::Label::Standby);
}

//
// ---> LinkProberUnknownMuxWaitLinkDownTransitionFunction(CompositeState &nextState)
//
// transition function when entering {LinkProberUnknown, MuxWait, LinkDown} state
//
void ActiveActiveStateMachine::LinkProberUnknownMuxWaitLinkDownTransitionFunction(CompositeState &nextState)
{
MUXLOGINFO(mMuxPortConfig.getPortName());
if (ps(mCompositeState) != link_prober::LinkProberState::Unknown) {
switchMuxState(nextState, mux_state::MuxState::Label::Standby);
} else {
startMuxProbeTimer();
}
}

//
// ---> LinkProberUnknownMuxUnknownLinkDownTransitionFunction(CompositeState &nextState)
//
// transition function when entering {LinkProberUnknown, MuxUnknown, LinkDown} state
//
void ActiveActiveStateMachine::LinkProberUnknownMuxUnknownLinkDownTransitionFunction(CompositeState &nextState)
{
MUXLOGINFO(mMuxPortConfig.getPortName());
if (ps(mCompositeState) != link_prober::LinkProberState::Unknown) {
switchMuxState(nextState, mux_state::MuxState::Label::Standby);
} else {
startMuxProbeTimer();
}
}

/*--------------------------------------------------------------------------------------------------------------
| utility methods to check/modify state
---------------------------------------------------------------------------------------------------------------*/
@@ -1193,6 +1244,18 @@ void ActiveActiveStateMachine::handleDefaultRouteStateNotification(const Default

mDefaultRouteState = routeState;
shutdownOrRestartLinkProberOnDefaultRoute();

if (mComponentInitState.all() &&
mMuxPortConfig.getMode() != common::MuxPortConfig::Mode::Active &&
mDefaultRouteState == DefaultRoute::NA) {
// Add a switch to standby here, so the self ToR should be able to
// toggle itself to standby earlier than the peer ToR.
CompositeState nextState = mCompositeState;
switchMuxState(nextState, mux_state::MuxState::Label::Standby);
LOGWARNING_MUX_STATE_TRANSITION(mMuxPortConfig.getPortName(), mCompositeState, nextState);
mCompositeState = nextState;
}

updateMuxLinkmgrState();
}

@@ -1208,9 +1271,9 @@ void ActiveActiveStateMachine::shutdownOrRestartLinkProberOnDefaultRoute()

if (mComponentInitState.all()) {
if ((mMuxPortConfig.getMode() == common::MuxPortConfig::Mode::Auto ||
mMuxPortConfig.getMode() == common::MuxPortConfig::Mode::Detached ||
mMuxPortConfig.getMode() == common::MuxPortConfig::Mode::Standby)
&& mDefaultRouteState != DefaultRoute::OK) {
mMuxPortConfig.getMode() == common::MuxPortConfig::Mode::Detached ||
mMuxPortConfig.getMode() == common::MuxPortConfig::Mode::Standby) &&
mDefaultRouteState != DefaultRoute::OK) {
mShutdownTxFnPtr();
} else {
// If mux mode is in manual/active mode, we should restart link prober.
18 changes: 18 additions & 0 deletions src/link_manager/LinkManagerStateMachineActiveActive.h
Original file line number Diff line number Diff line change
@@ -356,6 +356,24 @@ class ActiveActiveStateMachine : public LinkManagerStateMachineBase,
*/
void LinkProberUnknownMuxWaitLinkUpTransitionFunction(CompositeState &nextState);

/**
* @method LinkProberUnknownMuxWaitLinkDownTransitionFunction
*
* @brief transition function when entering {LinkProberUnknown, MuxWait, LinkDown} state
*
* @param nextState reference to composite state
*/
void LinkProberUnknownMuxWaitLinkDownTransitionFunction(CompositeState &nextState);

/**
* @method LinkProberUnknownMuxUnknownLinkDownTransitionFunction
*
* @brief transition function when entering {LinkProberUnknown, MuxUnknown, LinkDown} state
*
* @param nextState reference to composite state
*/
void LinkProberUnknownMuxUnknownLinkDownTransitionFunction(CompositeState &nextState);

private: // utility methods to check/modify state
/**
* @method setLabel
119 changes: 119 additions & 0 deletions test/LinkManagerStateMachineActiveActiveTest.cpp
Original file line number Diff line number Diff line change
@@ -461,12 +461,36 @@ TEST_F(LinkManagerStateMachineActiveActiveTest, MuxActivDefaultRouteState)
postDefaultRouteEvent("na", 1);
EXPECT_EQ(mFakeMuxPort.mFakeLinkProber->mShutdownTxProbeCallCount, 1);
EXPECT_EQ(mFakeMuxPort.mFakeLinkProber->mRestartTxProbeCallCount, 2);
EXPECT_EQ(mDbInterfacePtr->mSetMuxStateInvokeCount, 2);
EXPECT_EQ(mDbInterfacePtr->mLastSetMuxState, mux_state::MuxState::Label::Standby);
VALIDATE_STATE(Active, Standby, Up);

postDefaultRouteEvent("ok", 1);
EXPECT_EQ(mFakeMuxPort.mFakeLinkProber->mShutdownTxProbeCallCount, 1);
EXPECT_EQ(mFakeMuxPort.mFakeLinkProber->mRestartTxProbeCallCount, 3);
}

TEST_F(LinkManagerStateMachineActiveActiveTest, MuxActivDefaultRouteStateMuxConfigActive)
{
setMuxActive();
VALIDATE_STATE(Active, Active, Up);
EXPECT_FALSE(mMuxConfig.getIfEnableDefaultRouteFeature());

mMuxConfig.enableDefaultRouteFeature(true);
postDefaultRouteEvent("ok", 1);
EXPECT_EQ(mFakeMuxPort.mFakeLinkProber->mShutdownTxProbeCallCount, 0);
EXPECT_EQ(mFakeMuxPort.mFakeLinkProber->mRestartTxProbeCallCount, 1);

handleMuxConfig("active", 2);
EXPECT_EQ(mFakeMuxPort.mFakeLinkProber->mRestartTxProbeCallCount, 2);

postDefaultRouteEvent("na", 1);
EXPECT_EQ(mFakeMuxPort.mFakeLinkProber->mShutdownTxProbeCallCount, 0);
EXPECT_EQ(mFakeMuxPort.mFakeLinkProber->mRestartTxProbeCallCount, 3);
EXPECT_EQ(mDbInterfacePtr->mSetMuxStateInvokeCount, 1);
VALIDATE_STATE(Active, Active, Up);
}

TEST_F(LinkManagerStateMachineActiveActiveTest, LinkmgrdBootupSequenceHeartBeatFirst)
{
activateStateMachine();
@@ -792,4 +816,99 @@ TEST_F(LinkManagerStateMachineActiveActiveTest, MuxActivePeriodicalCheck)
VALIDATE_STATE(Active, Active, Up);
}

TEST_F(LinkManagerStateMachineActiveActiveTest, MuxActiveConfigStandbygRPCError)
{
setMuxActive();

handleMuxConfig("standby", 1);
EXPECT_EQ(mDbInterfacePtr->mSetMuxStateInvokeCount, 2);
EXPECT_EQ(mDbInterfacePtr->mLastSetMuxState, mux_state::MuxState::Label::Standby);
VALIDATE_STATE(Active, Standby, Up);

handleMuxState("unknown", 3);
VALIDATE_STATE(Active, Unknown, Up);

handleProbeMuxState("unknown", 4);
VALIDATE_STATE(Active, Unknown, Up);

postLinkProberEvent(link_prober::LinkProberState::Active, 3);
EXPECT_EQ(mDbInterfacePtr->mSetMuxStateInvokeCount, 2);
VALIDATE_STATE(Active, Unknown, Up);

handleMuxConfig("auto", 1);
VALIDATE_STATE(Wait, Unknown, Up);

postLinkProberEvent(link_prober::LinkProberState::Active, 3);
EXPECT_EQ(mDbInterfacePtr->mSetMuxStateInvokeCount, 3);
VALIDATE_STATE(Active, Active, Up);

handleMuxState("unknown", 4);
VALIDATE_STATE(Active, Unknown, Up);
}

TEST_F(LinkManagerStateMachineActiveActiveTest, MuxStandbyConfigStandbygRPCError)
{
setMuxStandby();

handleMuxConfig("active", 1);
EXPECT_EQ(mDbInterfacePtr->mSetMuxStateInvokeCount, 2);
EXPECT_EQ(mDbInterfacePtr->mLastSetMuxState, mux_state::MuxState::Label::Active);
VALIDATE_STATE(Unknown, Active, Up);

handleMuxState("unknown", 3);
VALIDATE_STATE(Unknown, Unknown, Up);

handleProbeMuxState("unknown", 4);
VALIDATE_STATE(Unknown, Unknown, Up);

postLinkProberEvent(link_prober::LinkProberState::Unknown, 3);
EXPECT_EQ(mDbInterfacePtr->mSetMuxStateInvokeCount, 2);
VALIDATE_STATE(Unknown, Unknown, Up);

handleMuxConfig("auto", 1);
VALIDATE_STATE(Wait, Unknown, Up);

postLinkProberEvent(link_prober::LinkProberState::Unknown, 3);
EXPECT_EQ(mDbInterfacePtr->mSetMuxStateInvokeCount, 3);
VALIDATE_STATE(Unknown, Standby, Up);

handleMuxState("unknown", 4);
VALIDATE_STATE(Unknown, Unknown, Up);
}

TEST_F(LinkManagerStateMachineActiveActiveTest, StateMachineActivatedLinkDownLinkProberFirst)
{
activateStateMachine();
VALIDATE_STATE(Wait, Wait, Down);
EXPECT_EQ(mDbInterfacePtr->mSetMuxStateInvokeCount, 0);

postLinkProberEvent(link_prober::LinkProberState::Unknown, 2);
EXPECT_EQ(mDbInterfacePtr->mSetMuxStateInvokeCount, 1);
EXPECT_EQ(mDbInterfacePtr->mLastSetMuxState, mux_state::MuxState::Label::Standby);
VALIDATE_STATE(Unknown, Standby, Down);

handleMuxState("unknown", 3);
VALIDATE_STATE(Unknown, Unknown, Down);
EXPECT_EQ(mDbInterfacePtr->mSetMuxStateInvokeCount, 1);
}

TEST_F(LinkManagerStateMachineActiveActiveTest, StateMachineActivatedLinkDownMuxFirst)
{
activateStateMachine();
VALIDATE_STATE(Wait, Wait, Down);

handleMuxState("unknown", 3);
VALIDATE_STATE(Wait, Unknown, Down);
EXPECT_EQ(mDbInterfacePtr->mSetMuxStateInvokeCount, 0);

postLinkProberEvent(link_prober::LinkProberState::Unknown, 2);
EXPECT_EQ(mDbInterfacePtr->mSetMuxStateInvokeCount, 1);
EXPECT_EQ(mDbInterfacePtr->mLastSetMuxState, mux_state::MuxState::Label::Standby);
VALIDATE_STATE(Unknown, Standby, Down);

handleMuxState("unknown", 3);
VALIDATE_STATE(Unknown, Unknown, Down);
EXPECT_EQ(mDbInterfacePtr->mSetMuxStateInvokeCount, 1);
}

} /* namespace test */