-
Notifications
You must be signed in to change notification settings - Fork 385
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Windows] Reconcile host-network Pods after agent is restarted #6944
[Windows] Reconcile host-network Pods after agent is restarted #6944
Conversation
@wenyingd there is a golang-lint issue, please fix it. |
547d964
to
da952ef
Compare
updated. |
/test-all |
@@ -98,3 +99,13 @@ func getInfraContainer(containerID, netNS string) string { | |||
func (c *CNIConfig) getInfraContainer() string { | |||
return getInfraContainer(c.ContainerId, c.Netns) | |||
} | |||
|
|||
// getPodsListOptions returns the Pods running on the current Node. Note, the host-network Pods are not filtered | |||
// out on Windows because they are also managed by antrea as long as "spec.SecurityContext.windowsOptions.hostProcess" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what will happen if "spec.SecurityContext.windowsOptions.hostProcess" is configured and the Pod is not filtered out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing was performed since we didn't create OVS interfaces for such Pods.
Having offline sync with @tnqn , we would not perform other filters on Windows in reconcile logic except for the node name to get local Pods and resourceVersion=0 to avoid performance issue, this is to try our best to ensure the reconciled Pod set can cover those Antrea have handled.
da952ef
to
065af4a
Compare
/test-all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, one nit.
@tnqn @antoninbas can you take a look? thanks.
065af4a
to
1193dc7
Compare
/test-all |
/test-windows-e2e |
/test-windows-networkpolicy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we add an e2e test for this as well?
@@ -741,11 +758,19 @@ func TestReconcile(t *testing.T) { | |||
go cniServer.podConfigurator.Run(stopCh) | |||
|
|||
// Re-install Pod1 flows | |||
podFlowsInstalled := make(chan string, 2) | |||
expReInstalledPodCount := 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: expectedReinstalledPodCount
reinstall is one word
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated.
I will add an e2e case to verify the host-network Pod traffic is working even if antrea-agent is restarted on Windows.
1193dc7
to
f6c3466
Compare
/test-windows-all |
@@ -98,3 +99,15 @@ func getInfraContainer(containerID, netNS string) string { | |||
func (c *CNIConfig) getInfraContainer() string { | |||
return getInfraContainer(c.ContainerId, c.Netns) | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Pod Type | Spec.HostNetwork | windowsOptions.hostProcess | Kubelet CMDADD | OVS interface needed? | Reconcile needed? | | |
| 1. Normal Pod (non-HostNetwork) | false | false / not set | Yes | Yes | Yes | | |
| 2. Windows HostNetwork Pod | true | false / not set | Yes | Yes | Yes | | |
| 3. Windows HostProcess Pod | (any) | true | No | No | No (no real network config) | |
@wenyingd Can we add a table to the code comments to explain the different behaviors of Windows Pods for developers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated.
f6c3466
to
c3d9724
Compare
/test-windows-e2e |
1 similar comment
/test-windows-e2e |
/test-windows-all |
c3d9724
to
127aaf4
Compare
/test-windows-e2e |
test/e2e/connectivity_test.go
Outdated
assert.NoErrorf(t, err, "The Openflow entries should be consistent after Antrea agent restarts") | ||
|
||
data.runPingMesh(t, podInfos, toolboxContainerName, true) | ||
time.Sleep(10 * time.Minute) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you had this for debugging purposes, that's a very long sleep :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I will remove it after finishing the debug on the new e2e case.
// This case only run on Windows. The Pods in the test includes both host-network Pod and none host-network Pod, | ||
// because CNI on Windows is also responsible for the host-network Pod's networking as long as it is not using | ||
// host-process containers. | ||
func testWindowsPodConnectivityAfterAntreaRestart(t *testing.T, data *TestData) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess one issue with the test as currently written is that if there is a failure, it's not clear which one of the 2 Pods may be responsible (assuming only 1 Pod is broken after restart).
Maybe we could have 1 Linux Pod as the client, and then use it to ping each Windows Pod independently? This way we could split this test into 2 subtests, one for the regular Pod and one for the hostNetwork Pod. Would that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense, updated.
127aaf4
to
1cf8dcf
Compare
1cf8dcf
to
6234815
Compare
/test-windows-all |
6234815
to
471d2a7
Compare
/test-windows-all |
/test-all |
The new added e2e is succeeded, the failed case in windows-e2e is See: http://10.164.243.223/view/Windows/job/antrea-windows-e2e-for-pull-request/89 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
couple nits, otherwise LGTM
if err := data.createToolboxPodOnNode(clientPod.Name, clientPod.Namespace, clientPod.NodeName, false); err != nil { | ||
t.Fatalf("Error when creating Pod '%s': %v", clientPod.Name, err) | ||
} | ||
defer deletePodWrapper(t, data, clientPod.Namespace, clientPod.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
feel free to omit this, especially for a simple Linux Pod like this one, the Pod will be automatically deleted when the test Namespace is deleted at the end of the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would keep this code since personally I prefer to clean up the new
resources created dedicated for the case after the test.
test/e2e/connectivity_test.go
Outdated
data.runPingMesh(t, testPodInfos, toolboxContainerName, true) | ||
|
||
// Count the OVS flows. | ||
oriOVSFlows := data.dumpOVSFlows(t, winWorkerNode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spell out original
originalOVSFlows
or initialOVSFlows
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, updated accordingly.
test/e2e/connectivity_test.go
Outdated
@@ -121,6 +125,100 @@ func (data *TestData) runPingMesh(t *testing.T, podInfos []PodInfo, ctrname stri | |||
} | |||
} | |||
|
|||
// verifyWindowsPodConnectivity checks the Pods' connectivity after antrea-agent is restarted on Windows. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checks Pod connectivity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated.
test/e2e/connectivity_test.go
Outdated
@@ -121,6 +125,100 @@ func (data *TestData) runPingMesh(t *testing.T, podInfos []PodInfo, ctrname stri | |||
} | |||
} | |||
|
|||
// verifyWindowsPodConnectivity checks the Pods' connectivity after antrea-agent is restarted on Windows. | |||
// The Pods in the test includes both host-network Pod and the generic Pod, because CNI on Windows is also |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We test both the generic Pod case and the host-network Pod case, because ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated.
This change is to support reconciling the host-network Pods on Windows because k8s expects to let CNI manage such Pods as long as they are not using host-process containers. Antrea has received the CmdAdd request for such Pods when they are created, so they should be included in the Pod reconcile list after agent is restarted. Signed-off-by: Wenying Dong <[email protected]>
471d2a7
to
a8a7ef7
Compare
/test-all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@wenyingd will you backport this? |
Sure, I will back port it to release-2.1 and release-2.2. |
This change is to support reconciling the host-network Pods on Windows because k8s expects to let CNI manage such Pods as long as they are not using host-process containers. Antrea received the CmdAdd request for such Pods when they were created, so they should be included in the Pod reconcile list after agent is restarted. Signed-off-by: Wenying Dong <[email protected]>
This change is to support reconciling the host-network Pods on Windows because k8s expects to let CNI manage such Pods as long as they are not using host-process containers. Antrea received the CmdAdd request for such Pods when they were created, so they should be included in the Pod reconcile list after agent is restarted. Signed-off-by: Wenying Dong <[email protected]>
This change is to support reconciling the host-network Pods on Windows because k8s expects to let CNI manage such Pods as long as they are not using host-process containers.
Antrea has received the CmdAdd request for such Pods when they are created, so they should be included in the Pod reconcile list after agent is restarted.
Fix: #6943