Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle CRI Device.HostPath on Windows #6618

Merged
merged 3 commits into from
Mar 11, 2022

Conversation

TBBle
Copy link
Contributor

@TBBle TBBle commented Mar 4, 2022

Fixes: #4878

This implements a mapping from CRI's Device.HostPath into OCI's Windows.Devices, per discussion in #4878 and kubernetes/kubernetes#97739.

The HostPath is interpreted as IDType://ID and those fields are mapped into the same-named fields in WindowsDevice. class/X is special-cased to IDType: class, ID: X as there is existing code in the wild that both generates and consumes that particular format.

oci.WithWindowsDevice helper is added for adding Window Device mounts into OCI container specs for direct users of the oci API; on Windows oci.WithDevices is not implemented as its behaviour depends on Unix host-specific details.

And as a bonus, ctr run --device idType://id now works on Windows. ctr run --device class/GUID does not work though.

@k8s-ci-robot
Copy link

Hi @TBBle. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@TBBle

This comment was marked as resolved.

@theopenlab-ci
Copy link

theopenlab-ci bot commented Mar 4, 2022

Build succeeded.

  • containerd-build-arm64 : RETRY_LIMIT in 3m 34s (non-voting)
  • containerd-test-arm64 : RETRY_LIMIT in 5m 53s (non-voting)
  • containerd-integration-test-arm64 : RETRY_LIMIT in 24m 24s (non-voting)

@TBBle TBBle force-pushed the handle-device-host_path-on-windows branch 3 times, most recently from fe810fe to f1ea5de Compare March 4, 2022 15:10
@TBBle
Copy link
Contributor Author

TBBle commented Mar 4, 2022

Okay. CRI Integration test passed in https://github.com/containerd/containerd/actions/runs/1934414068. Even the symlink tests which have been failing before now... 😳

https://github.com/containerd/containerd/actions/runs/1934549500 is without the actual code change, so I'm happy the test is effective.

2022-03-04T16:01:34.0228677Z     container_log_test.go:150: 
2022-03-04T16:01:34.0229226Z         	Error Trace:	container_log_test.go:150
2022-03-04T16:01:34.0230308Z         	            				windows_device_test.go:89
2022-03-04T16:01:34.0230894Z         	Error:      	Not equal: 
2022-03-04T16:01:34.0232972Z         	            	expected: "stdout F /Windows/System32/HostDriverStore/FileRepository"
2022-03-04T16:01:34.0234514Z         	            	actual  : "stderr F ls: /Windows/System32/HostDriverStore/*: No such file or directory"
2022-03-04T16:01:34.0235403Z         	            	
2022-03-04T16:01:34.0236146Z         	            	Diff:
2022-03-04T16:01:34.0236962Z         	            	--- Expected
2022-03-04T16:01:34.0237429Z         	            	+++ Actual
2022-03-04T16:01:34.0238115Z         	            	@@ -1 +1 @@
2022-03-04T16:01:34.0239406Z         	            	-stdout F /Windows/System32/HostDriverStore/FileRepository
2022-03-04T16:01:34.0241646Z         	            	+stderr F ls: /Windows/System32/HostDriverStore/*: No such file or directory
2022-03-04T16:01:34.0242775Z         	Test:       	TestWindowsDevice
2022-03-04T16:01:34.0243334Z         	Messages:   	log content should match

@TBBle TBBle force-pushed the handle-device-host_path-on-windows branch from 1515c24 to a5cf682 Compare March 4, 2022 16:10
oci/spec_opts_windows.go Outdated Show resolved Hide resolved
@TBBle TBBle force-pushed the handle-device-host_path-on-windows branch 2 times, most recently from 0851ce8 to 92ecec8 Compare March 4, 2022 16:59
@theopenlab-ci
Copy link

theopenlab-ci bot commented Mar 4, 2022

Build succeeded.

  • containerd-build-arm64 : RETRY_LIMIT in 3m 47s (non-voting)
  • containerd-test-arm64 : RETRY_LIMIT in 5m 54s (non-voting)
  • containerd-integration-test-arm64 : RETRY_LIMIT in 24m 27s (non-voting)

@TBBle TBBle force-pushed the handle-device-host_path-on-windows branch 2 times, most recently from c2d8a1e to 85cc85a Compare March 4, 2022 20:00
oci/spec_opts_windows.go Outdated Show resolved Hide resolved
oci/spec_opts_windows.go Outdated Show resolved Hide resolved
@TBBle TBBle marked this pull request as ready for review March 4, 2022 20:10
@theopenlab-ci
Copy link

theopenlab-ci bot commented Mar 4, 2022

Build succeeded.

  • containerd-build-arm64 : RETRY_LIMIT in 3m 50s (non-voting)
  • containerd-test-arm64 : RETRY_LIMIT in 5m 51s (non-voting)
  • containerd-integration-test-arm64 : RETRY_LIMIT in 24m 50s (non-voting)

@dcantah
Copy link
Member

dcantah commented Mar 4, 2022

fyi @katiewasnothere @kevpar

@TBBle TBBle force-pushed the handle-device-host_path-on-windows branch from 85cc85a to 0e1815e Compare March 6, 2022 09:15
@theopenlab-ci
Copy link

theopenlab-ci bot commented Mar 6, 2022

Build succeeded.

  • containerd-build-arm64 : RETRY_LIMIT in 3m 36s (non-voting)
  • containerd-test-arm64 : RETRY_LIMIT in 5m 40s (non-voting)
  • containerd-integration-test-arm64 : RETRY_LIMIT in 29m 58s (non-voting)

@TBBle
Copy link
Contributor Author

TBBle commented Mar 7, 2022

Another stray thought: Since there isn't actually a spec for interpreting CRI Devices on Windows, perhaps we should limit it to only ://-separated because that's easier to distinguish. And maybe for Linux Sandbox device-mounts, introuce a new type rather than pinging off a POSIX path, although then containerd needs to understand more about the host_path than I'd hoped.

However, that would mean that Kubernetes users would have to use a different path for CRI-based runtimes compared to dockershim. I assume if there's a Docker-based CRI implementation, they've solved this in some way too (probably just passthrough host_path into /-splitting code in Docker), but I should poke around and find out.

I'm not aware of any other CRI implementations that would run on Windows, but would definitely welcome advice that such exists, particularly if they manage this already.

Edit: cri-dockerd just passes the CRI Device fields into the matching Docker DeviceMapping fields, and hence (I assume) ends up in the dockerd code that looks specifically for class/<ID>.

@TBBle TBBle force-pushed the handle-device-host_path-on-windows branch from 0e1815e to 9a6e3df Compare March 8, 2022 14:45
@theopenlab-ci
Copy link

theopenlab-ci bot commented Mar 8, 2022

Build succeeded.

  • containerd-build-arm64 : RETRY_LIMIT in 3m 38s (non-voting)
  • containerd-test-arm64 : RETRY_LIMIT in 5m 47s (non-voting)
  • containerd-integration-test-arm64 : RETRY_LIMIT in 29m 14s (non-voting)

@TBBle TBBle force-pushed the handle-device-host_path-on-windows branch from 9a6e3df to 50a4ecb Compare March 9, 2022 12:01
@theopenlab-ci
Copy link

theopenlab-ci bot commented Mar 9, 2022

Build succeeded.

  • containerd-build-arm64 : RETRY_LIMIT in 3m 36s (non-voting)
  • containerd-test-arm64 : RETRY_LIMIT in 5m 47s (non-voting)
  • containerd-integration-test-arm64 : RETRY_LIMIT in 25m 36s (non-voting)

TBBle added a commit to TBBle/containerd that referenced this pull request Mar 9, 2022
There's two mappings of hostpath to IDType and ID in the wild:
- dockershim and dockerd-cri (implicitly via docker) use class/ID
-- The only supported IDType in Docker is 'class'.
-- https://github.com/aarnaud/k8s-directx-device-plugin generates this form
- https://github.com/jterry75/cri (windows_port branch) uses IDType://ID

`://` is much more easily distinguishable, so I've gone with that one as
the generic separator, with `class/` as a special-case.

This also includes support for setting OCI LinuxDevices on LCOW OCI
specs, but does not expose a mapping for that from CRI.

While https://github.com/jterry75/cri supports a syntax for this,
discussion on PR containerd#6618 indicates that this feature may be
better-implemented as an OCI WindowsDevice IDType recognised at a lower
level, so this syntax is being left unimplemented.

Signed-off-by: Paul "TBBle" Hampson <[email protected]>
@TBBle TBBle force-pushed the handle-device-host_path-on-windows branch from 50a4ecb to 777a77c Compare March 9, 2022 14:40
TBBle added a commit to TBBle/containerd that referenced this pull request Mar 9, 2022
There's two mappings of hostpath to IDType and ID in the wild:
- dockershim and dockerd-cri (implicitly via docker) use class/ID
-- The only supported IDType in Docker is 'class'.
-- https://github.com/aarnaud/k8s-directx-device-plugin generates this form
- https://github.com/jterry75/cri (windows_port branch) uses IDType://ID

`://` is much more easily distinguishable, so I've gone with that one as
the generic separator, with `class/` as a special-case.

This also includes support for setting OCI LinuxDevices on LCOW OCI
specs, but does not expose a mapping for that from CRI.

While https://github.com/jterry75/cri supports a syntax for this,
discussion on PR containerd#6618 indicates that this feature may be
better-implemented as an OCI WindowsDevice IDType recognised at a lower
level, so this syntax is being left unimplemented.

Signed-off-by: Paul "TBBle" Hampson <[email protected]>
@TBBle TBBle force-pushed the handle-device-host_path-on-windows branch from 1414cf6 to 2a43af5 Compare March 11, 2022 18:37
@TBBle
Copy link
Contributor Author

TBBle commented Mar 11, 2022

Not sure why CI failed. staticcheck is correct that on Windows, those functions always return an error... but I didn't change that, and this seems like idiomatic Go for that use-case. I guess it's not clear why it's only complaining now.

Copy link
Member

@kzys kzys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Let me re-run the tests just in case.

@kevpar
Copy link
Member

kevpar commented Mar 11, 2022

Not sure why CI failed. staticcheck is correct that on Windows, those functions always return an error... but I didn't change that, and this seems like idiomatic Go for that use-case. I guess it's not clear why it's only complaining now.

We've observed this on some other PRs as well. I have no idea how something like staticcheck can be inconsistent on the same piece of code.

@TBBle
Copy link
Contributor Author

TBBle commented Mar 11, 2022

My best guess is that there's a cached pass for that code in the cache, and for some reason it's getting a cache-miss here and hence reanalysing it. That would suggest that running it on the code without any cache would always fail, and it needs some markup to except it.

Edit: Looking more closely, and looking at what the SA4023 failure is supposed to mean, it's a faulty test. It's complaining that we're checking nil-ness of a interface{} return, but the type in question is an error, not an interface{}. So this just looks like a bug in the linter, possibly it's confused the first return value (which is an interface{}).

Edit 2: Oops, re-read the FAQ. error is an interface here. It's not just about interface{} specifically.

Edit 3: But I still think it's faulty, in that we're not returning an interface with non-nil-T and nil-V as described in the issue, we're returning an interface with non-nil-T and non-nil-V, so it's the wrong check to fail.

@theopenlab-ci
Copy link

theopenlab-ci bot commented Mar 11, 2022

Build succeeded.

@kevpar
Copy link
Member

kevpar commented Mar 11, 2022

My best guess is that there's a cached pass for that code in the cache, and for some reason it's getting a cache-miss here and hence reanalysing it. That would suggest that running it on the code without any cache would always fail, and it needs some markup to except it.

I was wondering if it could be cache related as well. I'll poke around a little and see what I can find out.

kzys added a commit to kzys/containerd that referenced this pull request Mar 11, 2022
The GitHub Action is unstable especially on Windows (see containerd#6618).
This change may not address the issue itself, but using the latest
version makes reporting the upstream the issue easier.

Signed-off-by: Kazuyoshi Kato <[email protected]>
@kzys
Copy link
Member

kzys commented Mar 11, 2022

@TBBle @kevpar Can you take a look #6666? Once we have this change in main, we can rebase this PR to see whether we still have the lint issue or not.

kzys added a commit to kzys/containerd that referenced this pull request Mar 11, 2022
The GitHub Action is unstable especially on Windows (see containerd#6618).
This change may not address the issue itself, but using the latest
version makes reporting the upstream the issue easier.

Signed-off-by: Kazuyoshi Kato <[email protected]>
This test takes advantage of the fact that when you tell Windows to
mount the GUID_DEVINTERFACE_DISPLAY_ADAPTER class, it will also mount
the host's device store into the container, even if there is no real GPU
on the host.

Signed-off-by: Paul "TBBle" Hampson <[email protected]>
There's two mappings of hostpath to IDType and ID in the wild:
- dockershim and dockerd-cri (implicitly via docker) use class/ID
-- The only supported IDType in Docker is 'class'.
-- https://github.com/aarnaud/k8s-directx-device-plugin generates this form
- https://github.com/jterry75/cri (windows_port branch) uses IDType://ID
-- hcsshim's CRI test suite generates this form

`://` is much more easily distinguishable, so I've gone with that one as
the generic separator, with `class/` as a special-case.

Signed-off-by: Paul "TBBle" Hampson <[email protected]>
Also fixes the issue that `ctr run` on Windows offered help for the
non-Windows implementation, but was silently ignored.

Signed-off-by: Paul "TBBle" Hampson <[email protected]>
@TBBle TBBle force-pushed the handle-device-host_path-on-windows branch from 2a43af5 to 2a42599 Compare March 11, 2022 21:16
@TBBle
Copy link
Contributor Author

TBBle commented Mar 11, 2022

Well, it passed the linter after rebasing on #6666 so either the issue was fixed in a new release, the flakiness has flaked in my favour this time, or it only fails if the commit hash has an odd number of 7's in it or something equally obscure.

@theopenlab-ci
Copy link

theopenlab-ci bot commented Mar 11, 2022

Build succeeded.

@kzys
Copy link
Member

kzys commented Mar 11, 2022

The only test failure is a know issue.


Summarizing 1 Failure:

[Fail] [k8s.io] Streaming runtime should support streaming interfaces [It] runtime should support exec with tty=true and stdin=true [Conformance] 
github.com/kubernetes-sigs/cri-tools/pkg/validate/streaming.go:203

Ran 32 of 39 Specs in 255.375 seconds
FAIL! -- 31 Passed | 1 Failed | 0 Pending | 7 Skipped

--- FAIL: TestCRISuite (255.40s)

@jterry75
Copy link
Contributor

I say ship it!

@kzys kzys merged commit d4641e1 into containerd:main Mar 11, 2022
@TBBle TBBle deleted the handle-device-host_path-on-windows branch March 11, 2022 23:34
estesp pushed a commit to estesp/containerd that referenced this pull request Mar 12, 2022
The GitHub Action is unstable especially on Windows (see containerd#6618).
This change may not address the issue itself, but using the latest
version makes reporting the upstream the issue easier.

Signed-off-by: Kazuyoshi Kato <[email protected]>
(cherry picked from commit 622a35a)
dmcgowan pushed a commit to dmcgowan/containerd that referenced this pull request Mar 23, 2022
The GitHub Action is unstable especially on Windows (see containerd#6618).
This change may not address the issue itself, but using the latest
version makes reporting the upstream the issue easier.

Signed-off-by: Kazuyoshi Kato <[email protected]>
(cherry picked from commit 622a35a)
(cherry picked from commit 06985e7)
Signed-off-by: Derek McGowan <[email protected]>
dmcgowan pushed a commit to dmcgowan/containerd that referenced this pull request Mar 23, 2022
The GitHub Action is unstable especially on Windows (see containerd#6618).
This change may not address the issue itself, but using the latest
version makes reporting the upstream the issue easier.

Signed-off-by: Kazuyoshi Kato <[email protected]>
(cherry picked from commit 622a35a)
Signed-off-by: Derek McGowan <[email protected]>
dmcgowan pushed a commit to dmcgowan/containerd that referenced this pull request Mar 23, 2022
The GitHub Action is unstable especially on Windows (see containerd#6618).
This change may not address the issue itself, but using the latest
version makes reporting the upstream the issue easier.

Signed-off-by: Kazuyoshi Kato <[email protected]>
(cherry picked from commit 622a35a)
Signed-off-by: Derek McGowan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support device mount for windows container
8 participants