Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automated cherry pick of #4126: Add OFSwitch connection check to Agent's liveness probes #4447

Conversation

tnqn
Copy link
Member

@tnqn tnqn commented Dec 7, 2022

Cherry pick of #4126 on release-1.7.

#4126: Add OFSwitch connection check to Agent's liveness probes

For details on the cherry pick process, see the cherry pick requests page.

@tnqn tnqn added the kind/cherry-pick Categorizes issue or PR as related to the cherry-pick of a bug fix from the main branch to a release label Dec 7, 2022
@tnqn tnqn force-pushed the automated-cherry-pick-of-#4126-upstream-release-1.7 branch from 748df8c to 239c28f Compare December 7, 2022 03:50
@codecov
Copy link

codecov bot commented Dec 7, 2022

Codecov Report

Merging #4447 (00d545e) into release-1.7 (693f568) will decrease coverage by 1.61%.
The diff coverage is 76.78%.

Impacted file tree graph

@@               Coverage Diff               @@
##           release-1.7    #4447      +/-   ##
===============================================
- Coverage        64.06%   62.44%   -1.62%     
===============================================
  Files              293      298       +5     
  Lines            43417    44428    +1011     
===============================================
- Hits             27815    27743      -72     
- Misses           13362    14464    +1102     
+ Partials          2240     2221      -19     
Flag Coverage Δ
kind-e2e-tests 49.68% <86.20%> (-0.76%) ⬇️
unit-tests 44.14% <73.21%> (-0.54%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
cmd/antrea-agent/agent.go 0.00% <0.00%> (ø)
cmd/antrea-agent/options.go 4.50% <82.35%> (ø)
pkg/agent/apiserver/apiserver.go 71.27% <100.00%> (+4.25%) ⬆️
pkg/ovs/openflow/ofctrl_bridge.go 57.85% <100.00%> (+0.99%) ⬆️
pkg/agent/controller/networkpolicy/packetin.go 16.90% <0.00%> (-52.82%) ⬇️
pkg/agent/flowexporter/exporter/exporter.go 49.00% <0.00%> (-28.97%) ⬇️
pkg/agent/flowexporter/utils.go 44.68% <0.00%> (-27.66%) ⬇️
pkg/controller/networkpolicy/tier.go 50.00% <0.00%> (-5.00%) ⬇️
pkg/agent/util/ipset/ipset.go 67.56% <0.00%> (-2.71%) ⬇️
pkg/ovs/openflow/ofctrl_packetin.go 69.62% <0.00%> (-1.27%) ⬇️
... and 32 more

@wenyingd
Copy link
Contributor

wenyingd commented Dec 7, 2022

The unit test case "TestOFBridgeIsConnected" is failed, because ofnet is stuck at waiting for tlvMap reply in the versions earlier than this patch is merged antrea-io/ofnet#31 . To resolve the ut failure, we should modify some functions like this,

func newFakeOFSwitch(stream *util.MessageStream, app ofctrl.AppInterface) *ofctrl.OFSwitch {
	dpid, _ := net.ParseMAC("01:02:03:04:05:06:07:08")
	connCh := make(chan int)
	sw := ofctrl.NewSwitch(stream, dpid, app, connCh, 100)
	return sw
}

func sendTlvMapReply(stream *util.MessageStream) {
	reply := &openflow13.TLVTableReply{
		MaxSpace:  248,
		MaxFields: 62,
		TlvMaps: []*openflow13.TLVTableMap{
			{
				OptClass:  0xffff,
				OptType:   0,
				OptLength: 16,
				Index:     0,
			},
			{
				OptClass:  0xffff,
				OptType:   1,
				OptLength: 16,
				Index:     1,
			},
		},
	}
	tlvReplyMessage := openflow13.NewNXTVendorHeader(openflow13.Type_TlvTableReply)
	tlvReplyMessage.VendorData = reply
	stream.Inbound <- tlvReplyMessage
}

// TestOFBridgeIsConnected verifies it's thread-safe to call OFBridge's IsConnected method.
func TestOFBridgeIsConnected(t *testing.T) {
	stream := util.NewMessageStream(&fakeConn{}, nil)
	b := NewOFBridge("test-br", GetMgmtAddress(ovsconfig.DefaultOVSRunDir, "test-br"))
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		newFakeOFSwitch(stream, b)
	}()
	go func() {
		select {
		case <-time.After(time.Millisecond * 100):
			sendTlvMapReply(stream)
		}
	}()
	wg.Add(1)
	go func() {
		defer wg.Done()
		b.IsConnected()
	}()
	wg.Wait()
}

OFBridge.Connected is called automatically in funciton ofctrl.NewSwitch, and no need to explicitly call it in the test.

@tnqn
Copy link
Member Author

tnqn commented Dec 8, 2022

Thanks @wenyingd for the suggestion, will apply it.

This helps automatic recovery if some issues cause OFSwitch reconnection to
not work properly.

It also fixes a race condition between the IsConnected and SwitchConnected
methods of OFBridge and makes necessary changes to the constructor of
APIServer to allow testing.

For antrea-io#4092

Signed-off-by: Quan Tian <[email protected]>
@xliuxu xliuxu force-pushed the automated-cherry-pick-of-#4126-upstream-release-1.7 branch from 239c28f to 00d545e Compare December 15, 2022 10:30
@tnqn
Copy link
Member Author

tnqn commented Dec 15, 2022

/test-latest-all

@xliuxu
Copy link
Contributor

xliuxu commented Dec 15, 2022

/test-latest-e2e

@tnqn
Copy link
Member Author

tnqn commented Dec 16, 2022

e2e failed on #4477, which should be unrelated to the backport.

@tnqn tnqn merged commit 40bebc4 into antrea-io:release-1.7 Dec 16, 2022
@tnqn tnqn deleted the automated-cherry-pick-of-#4126-upstream-release-1.7 branch December 16, 2022 02:50
@xliuxu xliuxu mentioned this pull request Apr 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/cherry-pick Categorizes issue or PR as related to the cherry-pick of a bug fix from the main branch to a release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants