[Windows]Check HostNamespace value when removing HcnEndpoint #2306

wenyingd · 2021-06-24T12:32:38Z

For docker on Windows, no HostComputeNamespace is created before calling CNI to add network configuration. But containerd would do it. If a HnsEndpoint is created without HostComputeNamespace, a default value "00000000-0000-0000-0000-000000000000" should be set. HCN API should return an Error if a HnsEndpoint is removed from host Namespace "00000000-0000-0000-0000-000000000000". Hence, we skip invocation of "hcn.RemoveNamespaceEndpoint" for docker case.

Signed-off-by: wenyingd [email protected]

Fixes #2301

wenyingd · 2021-06-24T13:03:41Z

/test-all

codecov-commenter · 2021-06-24T13:34:55Z

Codecov Report

Merging #2306 (b96fdac) into main (ef14ad1) will increase coverage by 3.43%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main    #2306      +/-   ##
==========================================
+ Coverage   62.05%   65.48%   +3.43%     
==========================================
  Files         277      277              
  Lines       21588    21588              
==========================================
+ Hits        13396    14137     +741     
+ Misses       6792     6008     -784     
- Partials     1400     1443      +43

Flag	Coverage Δ
e2e-tests	`55.81% <ø> (?)`
kind-e2e-tests	`52.41% <ø> (+0.06%)`	⬆️
unit-tests	`41.87% <ø> (+0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
...gent/controller/networkpolicy/status_controller.go	`75.34% <0.00%> (-2.74%)`	⬇️
pkg/ovs/openflow/ofctrl_bridge.go	`50.36% <0.00%> (+0.36%)`	⬆️
pkg/agent/openflow/network_policy.go	`76.41% <0.00%> (+0.59%)`	⬆️
pkg/agent/cniserver/pod_configuration.go	`54.68% <0.00%> (+0.78%)`	⬆️
pkg/agent/flowexporter/exporter/exporter.go	`79.58% <0.00%> (+0.91%)`	⬆️
pkg/controller/egress/store/egressgroup.go	`1.38% <0.00%> (+1.38%)`	⬆️
pkg/ovs/openflow/ofctrl_flow.go	`47.36% <0.00%> (+1.75%)`	⬆️
pkg/flowaggregator/certificate.go	`79.27% <0.00%> (+2.70%)`	⬆️
pkg/agent/route/route_linux.go	`47.87% <0.00%> (+3.19%)`	⬆️
pkg/agent/openflow/cookie/allocator.go	`67.74% <0.00%> (+3.22%)`	⬆️
... and 39 more

antoninbas · 2021-06-24T16:56:41Z

@wenyingd please backport to 0.13, 1.0 and 1.1 once this is merged (see https://github.com/antrea-io/antrea/blob/main/docs/contributors/cherry-picks.md for directions)

tnqn

@wenyingd I feel the commit message is a bit hard to understand.

If a HnsEndpoint is created without HostComputeNamespace, a default
value "00000000-0000-0000-0000-000000000000" should be set.

In which situation the if condition could happen? does it depend on user input or it's always like this. And do you mean "If ..., a default value ... will be set."?

But we can't find and HnsEndpoint in this namespace.

s/and/the/ ?

tnqn · 2021-06-25T03:37:08Z

pkg/agent/cniserver/interface_configuration_windows.go

@@ -363,6 +363,10 @@ func (ic *ifConfigurator) removeHNSEndpoint(endpoint *hcsshim.HNSEndpoint, conta
 	return nil
 }

+func isValidHostNamespace(hostNamespace string) bool {
+	return hostNamespace != "" && hostNamespace != "00000000-0000-0000-0000-000000000000"


In which case hostNamespace will be "00000000-0000-0000-0000-000000000000"? Does it apply to all Pods we create or it's because the pod is removed in some specific order?
Is it safe to not call RemoveNamespaceEndpoint at all in this case?

"00000000-0000-0000-0000-000000000000" is used if we use "docker" to create container on Windows. It is because docker doesn't create host namespace when creating the sandbox container. But for "containerd", creating host namespace is a step before calling CNI to configure container network, so there should be a valid value.
To support containerd, we have a check here to ensure we could remove the HcnEndpoint(HnsEndpoint) from the host namespace.
However, if we remove HcnEndpoint from host namespace "00000000-0000-0000-0000-000000000000", Windows HCN API might return an Error, and it will break the continual steps (deleting HndEndpoint), and kept the HnsEndpoint on the host.

Thanks for the explanation. could we also add the comment to the code for future reference? and maybe add the runtime information to #2301 so people know it's specific to docker.

wenyingd · 2021-06-28T05:39:39Z

/test-windows-conformance
/test-ipv6-conformance
/test-ipv6-e2e
/test-ipv6-networkpolicy
/test-ipv6-only-conformance
/test-ipv6-only-e2e
/test-ipv6-only-networkpolicy

tnqn · 2021-06-29T06:19:17Z

Do you think it's possible to have a windows version of TestDeletePod to ensure all resources associated with a Pod is deleted after the Pod is deleted to avoid regression in the future?

antrea/test/e2e/basic_test.go

Line 182 in e8ca544

func TestDeletePod(t *testing.T) {

If a HnsEndpoint is created without HostComputeNamespace, a default value "00000000-0000-0000-0000-000000000000" should be set. But we can't find and HnsEndpoint in this namespace. Signed-off-by: wenyingd <[email protected]>

wenyingd · 2021-06-29T07:33:33Z

/test-all

wenyingd · 2021-06-29T07:35:00Z

Do you think it's possible to have a windows version of TestDeletePod to ensure all resources associated with a Pod is deleted after the Pod is deleted to avoid regression in the future?

antrea/test/e2e/basic_test.go

Line 182 in e8ca544

func TestDeletePod(t *testing.T) {

That would be great. But I would like to add it in next release. Besides, I am just confused do we have Windows e2e testbed now?

tnqn

LGTM

tnqn · 2021-06-29T08:46:22Z

Do you think it's possible to have a windows version of TestDeletePod to ensure all resources associated with a Pod is deleted after the Pod is deleted to avoid regression in the future?

antrea/test/e2e/basic_test.go

Line 182 in e8ca544

func TestDeletePod(t *testing.T) {

That would be great. But I would like to add it in next release. Besides, I am just confused do we have Windows e2e testbed now?

There are already some Windows specific e2e tests. I believe test-windows-e2e runs e2e on windows testbed?

tnqn · 2021-06-29T08:48:31Z

/test-windows-e2e

lzhecheng · 2021-06-29T08:51:48Z

/test-windows-e2e

lzhecheng · 2021-06-29T08:53:01Z

/test-windows-e2e

lzhecheng · 2021-06-29T08:54:03Z

/test-windows-e2e

tnqn · 2021-06-29T09:11:18Z

/test-windows-networkpolicy
/test-windows-conformance

wenyingd · 2021-06-29T09:48:55Z

/test-e2e

tnqn · 2021-06-29T10:38:57Z

/test-windows-e2e

tnqn · 2021-06-29T11:35:41Z

windows-e2e failed once but I don't think it's related to this PR, and the same test passed in normal e2e test: https://jenkins.antrea-ci.rocks/job/antrea-e2e-for-pull-request/2780/.

=== CONT  TestClusterIP
    service_test.go:59: 
        	Error Trace:	service_test.go:59
        	            				service_test.go:81
        	Error:      	Received unexpected error:
        	            	nc stdout: <>, stderr: <nc: 10.99.89.26 (10.99.89.26:80): Connection timed out
        	            	nc: 10.99.89.26 (10.99.89.26:80): Connection timed out
        	            	nc: 10.99.89.26 (10.99.89.26:80): Connection timed out
        	            	nc: 10.99.89.26 (10.99.89.26:80): Connection timed out
        	            	nc: 10.99.89.26 (10.99.89.26:80): Connection timed out
        	            	>, err: <command terminated with exit code 1>
        	Test:       	TestClusterIP
        	Messages:   	Pod client-on-same-node should be able to connect 10.99.89.26:80, but was not able to connect
=== CONT  TestClusterIP/ClusterIP/Linux_Pod_on_same_Node_can_access_the_Service
    testing.go:1103: test executed panic(nil) or runtime.Goexit: subtest may have called FailNow on a parent test
=== CONT  TestClusterIP
    fixtures.go:220: Exporting test logs to '/var/lib/jenkins/workspace/antrea-windows-e2e-for-pull-request/antrea-test-logs/TestClusterIP/beforeTeardown.Jun29-09-18-08'
    fixtures.go:324: Error when exporting kubelet logs: error when running journalctl on Node 'a-ms-0005-0', is it available? Error: <nil>
    fixtures.go:345: Deleting 'antrea-test' K8s Namespace
--- FAIL: TestClusterIP (74.15s)
    --- FAIL: TestClusterIP/ClusterIP (0.00s)
        --- PASS: TestClusterIP/ClusterIP/Same_Linux_Node_can_access_the_Service (0.75s)
        --- PASS: TestClusterIP/ClusterIP/Different_Linux_Node_can_access_the_Service (0.90s)
        --- PASS: TestClusterIP/ClusterIP/Windows_host_can_access_the_Service (5.51s)
        --- PASS: TestClusterIP/ClusterIP/Linux_Pod_on_different_Node_can_access_the_Service (18.12s)
        --- FAIL: TestClusterIP/ClusterIP/Linux_Pod_on_same_Node_can_access_the_Service (36.10s)

Merging it to start backporting, will watch latest windows-e2e test for this PR to see if the failure is consistent.

tnqn · 2021-06-29T12:46:28Z

Weird, windows-e2e failed another time for same reason:

=== CONT  TestClusterIP
    service_test.go:59: 
        	Error Trace:	service_test.go:59
        	            				service_test.go:81
        	Error:      	Received unexpected error:
        	            	nc stdout: <>, stderr: <nc: 10.106.134.164 (10.106.134.164:80): Connection timed out
        	            	nc: 10.106.134.164 (10.106.134.164:80): Connection timed out
        	            	nc: 10.106.134.164 (10.106.134.164:80): Connection timed out
        	            	nc: 10.106.134.164 (10.106.134.164:80): Connection timed out
        	            	nc: 10.106.134.164 (10.106.134.164:80): Connection timed out
        	            	>, err: <command terminated with exit code 1>
        	Test:       	TestClusterIP
        	Messages:   	Pod client-on-same-node should be able to connect 10.106.134.164:80, but was not able to connect
=== CONT  TestClusterIP/ClusterIP/Linux_Pod_on_same_Node_can_access_the_Service
    testing.go:1048: test executed panic(nil) or runtime.Goexit: subtest may have called FailNow on a parent test
=== CONT  TestClusterIP
    fixtures.go:220: Exporting test logs to '/var/lib/jenkins/workspace/antrea-windows-e2e-for-pull-request/antrea-test-logs/TestClusterIP/beforeTeardown.Jun29-11-59-00'
    fixtures.go:324: Error when exporting kubelet logs: error when running journalctl on Node 'a-ms-0001-0', is it available? Error: <nil>
    fixtures.go:345: Deleting 'antrea-test' K8s Namespace
--- FAIL: TestClusterIP (73.73s)
    --- FAIL: TestClusterIP/ClusterIP (0.00s)
        --- PASS: TestClusterIP/ClusterIP/Same_Linux_Node_can_access_the_Service (0.74s)
        --- PASS: TestClusterIP/ClusterIP/Windows_host_can_access_the_Service (5.23s)
        --- PASS: TestClusterIP/ClusterIP/Different_Linux_Node_can_access_the_Service (0.60s)
        --- PASS: TestClusterIP/ClusterIP/Linux_Pod_on_different_Node_can_access_the_Service (20.13s)
        --- FAIL: TestClusterIP/ClusterIP/Linux_Pod_on_same_Node_can_access_the_Service (36.16s)

tnqn · 2021-06-29T17:12:56Z

Figured out the reason why it failed in windows-e2e only, filed #2330 to track the issue. It's a regression when Egress and AntreaProxy are enabled together.

wenyingd requested review from lzhecheng and tnqn June 24, 2021 12:33

tnqn reviewed Jun 25, 2021

View reviewed changes

wenyingd requested a review from tnqn June 28, 2021 03:52

wenyingd force-pushed the issue_2301 branch from b96fdac to ad46d87 Compare June 29, 2021 07:28

[Windows]Check HostNamespace value when removing HcnEndpoint

ec03cfe

If a HnsEndpoint is created without HostComputeNamespace, a default value "00000000-0000-0000-0000-000000000000" should be set. But we can't find and HnsEndpoint in this namespace. Signed-off-by: wenyingd <[email protected]>

wenyingd force-pushed the issue_2301 branch from ad46d87 to ec03cfe Compare June 29, 2021 07:29

tnqn approved these changes Jun 29, 2021

View reviewed changes

tnqn merged commit 4405aaa into antrea-io:main Jun 29, 2021

tnqn mentioned this pull request Jun 29, 2021

Cannot access ClusterIP service if the endpoint is on local Node when both AntreaProxy and Egress are enabled #2330

Closed

wenyingd deleted the issue_2301 branch October 8, 2021 14:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Windows]Check HostNamespace value when removing HcnEndpoint #2306

[Windows]Check HostNamespace value when removing HcnEndpoint #2306

wenyingd commented Jun 24, 2021 •

edited

Loading

wenyingd commented Jun 24, 2021

codecov-commenter commented Jun 24, 2021 •

edited

Loading

antoninbas commented Jun 24, 2021

tnqn left a comment

tnqn Jun 25, 2021 •

edited

Loading

wenyingd Jun 28, 2021 •

edited

Loading

tnqn Jun 29, 2021

wenyingd commented Jun 28, 2021

tnqn commented Jun 29, 2021

wenyingd commented Jun 29, 2021

wenyingd commented Jun 29, 2021

tnqn left a comment

tnqn commented Jun 29, 2021

tnqn commented Jun 29, 2021

lzhecheng commented Jun 29, 2021

lzhecheng commented Jun 29, 2021

lzhecheng commented Jun 29, 2021

tnqn commented Jun 29, 2021

wenyingd commented Jun 29, 2021

tnqn commented Jun 29, 2021

tnqn commented Jun 29, 2021

tnqn commented Jun 29, 2021

tnqn commented Jun 29, 2021

[Windows]Check HostNamespace value when removing HcnEndpoint #2306

[Windows]Check HostNamespace value when removing HcnEndpoint #2306

Conversation

wenyingd commented Jun 24, 2021 • edited Loading

wenyingd commented Jun 24, 2021

codecov-commenter commented Jun 24, 2021 • edited Loading

Codecov Report

antoninbas commented Jun 24, 2021

tnqn left a comment

Choose a reason for hiding this comment

tnqn Jun 25, 2021 • edited Loading

Choose a reason for hiding this comment

wenyingd Jun 28, 2021 • edited Loading

Choose a reason for hiding this comment

tnqn Jun 29, 2021

Choose a reason for hiding this comment

wenyingd commented Jun 28, 2021

tnqn commented Jun 29, 2021

wenyingd commented Jun 29, 2021

wenyingd commented Jun 29, 2021

tnqn left a comment

Choose a reason for hiding this comment

tnqn commented Jun 29, 2021

tnqn commented Jun 29, 2021

lzhecheng commented Jun 29, 2021

lzhecheng commented Jun 29, 2021

lzhecheng commented Jun 29, 2021

tnqn commented Jun 29, 2021

wenyingd commented Jun 29, 2021

tnqn commented Jun 29, 2021

tnqn commented Jun 29, 2021

tnqn commented Jun 29, 2021

tnqn commented Jun 29, 2021

wenyingd commented Jun 24, 2021 •

edited

Loading

codecov-commenter commented Jun 24, 2021 •

edited

Loading

tnqn Jun 25, 2021 •

edited

Loading

wenyingd Jun 28, 2021 •

edited

Loading