Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SecondaryNetwork] Attach OVS uplink after removing flow-restore-wait flag #6504

Merged
merged 1 commit into from
Jul 10, 2024

Conversation

wenyingd
Copy link
Contributor

@wenyingd wenyingd commented Jul 4, 2024

Physcial network interface is attached on the OVS bridge with SecondaryNetwork feature enabled. Antrea uses a global configuration flow-restore-wait='true' to ensure that OVS OpenFlow entries can start working after the dependencies are ready. A connectivity issue exists if a setup uses the Node NIC as secondary network interface and connects the NIC to OVS bridge before removing the flow-restore-wait option.

This change ensures agent attaches the physical network interface to the secondary OVS bridge after the global flow-restore-wait option is removed.

Fix: #6448

cmd/antrea-agent/agent.go Outdated Show resolved Hide resolved
pkg/agent/secondarynetwork/init_linux.go Outdated Show resolved Hide resolved
pkg/agent/secondarynetwork/podwatch/controller.go Outdated Show resolved Hide resolved
@wenyingd wenyingd force-pushed the issue_6448 branch 2 times, most recently from 10c00aa to 331a31c Compare July 5, 2024 02:09
@wenyingd
Copy link
Contributor Author

wenyingd commented Jul 5, 2024

/test-all

@tnqn tnqn added this to the Antrea v2.1 release milestone Jul 5, 2024
@tnqn tnqn added action/backport Indicates a PR that requires backports. action/release-note Indicates a PR that should be included in release notes. labels Jul 5, 2024
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, minor comments, will defer to @jianjuns for final approval

agentconfig "antrea.io/antrea/pkg/config/agent"
"antrea.io/antrea/pkg/util/channel"
)

func Initialize(
type Controller struct {
PodController *podwatch.PodController
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why adding this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to call Controller.PodController.Run() in cmd/antrea-agent/agent.go after exposing PodController as public. If I don't add this property in init_windows.go, it returns compile error on Windows.

Copy link
Member

@tnqn tnqn Jul 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit strange to call a struct's member's method from the struct's consumer, I think you should keep the Controller's Run method.

When I suggested to make podController public, I mean making it public so pkg/agent/secondary can use call the struct's method directly, instead of having to relying on another interface. I don't mean to calling podController's method directly in pkg/cmd directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I would update.

@@ -864,6 +865,14 @@ func run(o *Options) error {
return fmt.Errorf("failed to connect uplink to OVS bridge: %w", err)
}
}
// secondaryNetworkController Initialize must be run after FlowRestoreComplete.
if features.DefaultFeatureGate.Enabled(features.SecondaryNetwork) {
defer secondaryNetworkController.RestoreHostInterfaceConfiguration()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe name it secondaryNetworkController.Restore() to hide details from the program entry and to be consistent with the other functions: Initialize, Run.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, updated.

@wenyingd wenyingd force-pushed the issue_6448 branch 2 times, most recently from dde8a35 to 4c26b7a Compare July 5, 2024 14:54
tnqn
tnqn previously approved these changes Jul 5, 2024
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@jianjuns jianjuns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should revise the commit message "Attach OVS uplink for after FlowRestoreComplete".

pkg/agent/secondarynetwork/init_linux.go Outdated Show resolved Hide resolved
cmd/antrea-agent/agent.go Show resolved Hide resolved
cmd/antrea-agent/agent.go Outdated Show resolved Hide resolved
@wenyingd wenyingd changed the title [SecondaryNetwork] Attach OVS uplink for after FlowRestoreComplete [SecondaryNetwork] Attach OVS uplink after FlowRestoreComplete Jul 8, 2024
@wenyingd wenyingd changed the title [SecondaryNetwork] Attach OVS uplink after FlowRestoreComplete [SecondaryNetwork] Attach OVS uplink after removing flow-restore-wait flag Jul 8, 2024
@wenyingd
Copy link
Contributor Author

wenyingd commented Jul 8, 2024

/test-all

@wenyingd
Copy link
Contributor Author

wenyingd commented Jul 8, 2024

/test-latest-all

podController *podwatch.PodController
}

func NewController(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we move this one to init.go? It seems have no OS specific code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the original logic, Windows does not supported SecondaryNetwork feature, so we would expect to return error on Windows when calling function NewController. If we move to init.go, we can't satisfy this precheck on Windows.

I can move the struct definition of Controller to init.go, and prefer to leave different functions of NewController in init_linux.go and init_windows.go.

What is your thought @jianjuns ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update: I tried to move the struct of Controller to init.go, then I would hit golangci-lint failures, because the properties of ovsBridgeClient and secNetConfig are not used in init.go. So I moved it back.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want not Linux feature code to be in common files, then there are many more code / funcs to move to _linux.go. For example, all secondary network should be moved to _linux.go then.

So, I feel it does not help just to put NewController() to init_linux.go. For consideration of supporting Windows in future, it is also good make common code compliable on both Linux and Windows?

Copy link
Contributor Author

@wenyingd wenyingd Jul 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So your suggestion is we modify the functions in secondaryNetwork module more common in this bug fix? If this is the purpose, I feel most code in the current init_linux.go can move to init.go, the OS specific code are only for calls "util.PrepareHostInterfaceConnection" in Initialize and "RestoreHostInterfaceConfiguration" in Restore.

If we follow this direction, I would move most of the init_linux.go to init.go, and leave the two calls in _linux.go and _windows.go, is this what you want? @jianjuns

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Humm.. I was just suggesting moving NewController() to init.go, if it does not have any Linux dependency. Moving Initialize() and Restore() requires more code changes and probably more dummy funcs for Windows, and so not worthwhile.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, I will update accordingly.

if len(secNetConfig.OVSBridges) != 0 && len(secNetConfig.OVSBridges[0].PhysicalInterfaces) == 1 {
util.RestoreHostInterfaceConfiguration(secNetConfig.OVSBridges[0].BridgeName, secNetConfig.OVSBridges[0].PhysicalInterfaces[0])
// Run starts the Pod controller for secondary networks.
func (c *Controller) Run(stopCh <-chan struct{}) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a separate Run() func? Why not run the controller in Initialize()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we must keep it, it can be moved to init.go.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a separate Run() func? Why not run the controller in Initialize()?

From the function content perspective, "Initialize" is a one-time function, which is to initialize the secondary OVS bridge configurations including attaching the uplink to OVS bridge, and Run() is a long-run function which runs the PodController until the a signal receives by stopCh. So I would prefer to separate them into two functions.

I can move the Run function to init.go

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. We can follow the philosophy you suggested.

@wenyingd wenyingd force-pushed the issue_6448 branch 3 times, most recently from 6bd1a4c to ad15933 Compare July 10, 2024 05:47
… flag

Physcial network interface is attached on the OVS bridge with SecondaryNetwork
feature enabled. Antrea uses a global configuration flow-restore-wait='true'
to ensure that OVS OpenFlow entries can start working after the dependencies are
ready. A connectivity issue exists if a setup uses the Node NIC as secondary
network interface and connects the NIC to OVS bridge before removing the
flow-restore-wait option.

This change ensures agent attaches the physical network interface to the secondary
OVS bridge after the global flow-restore-wait option is removed.

Signed-off-by: Wenying Dong <[email protected]>
@wenyingd
Copy link
Contributor Author

/test-all

@jianjuns jianjuns merged commit f4ef620 into antrea-io:main Jul 10, 2024
53 of 58 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action/backport Indicates a PR that requires backports. action/release-note Indicates a PR that should be included in release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SecondaryNetwork breaks network connection of compute nodes
3 participants