Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/api: honor cdi devices from the hostconfig #25171

Merged

Conversation

giuseppe
Copy link
Member

Disclaimer: I am not very familiar with all the details of CDI, so there might still be something missing, but the following patch seems to be enough for the test cases that were reported so I open a PR to gather some feedback.

Pass down the devices specifies in the resources block so that CDI devices in the compose file are honored.

Tested manually with the following compose file:

services:
testgpupodman_count:
image: ubuntu:latest
command: ["nvidia-smi"]
profiles: [gpu]
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
testgpupodman_deviceid:
image: docker.io/ubuntu:latest
command: ["nvidia-smi"]
deploy:
resources:
reservations:
devices:
- driver: cdi
device_ids: ['nvidia.com/gpu=all']
capabilities: [gpu]

Does this PR introduce a user-facing change?

Now Podman honors CDI devices passed from the api

@giuseppe giuseppe added the No New Tests Allow PR to proceed without adding regression tests label Jan 30, 2025
@openshift-ci openshift-ci bot added release-note approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jan 30, 2025
@github-actions github-actions bot added the kind/api-change Change to remote API; merits scrutiny label Jan 30, 2025
@baude
Copy link
Member

baude commented Jan 30, 2025

LGTM

@mheon
Copy link
Member

mheon commented Jan 30, 2025

This fixes #19338 right?

@mheon
Copy link
Member

mheon commented Jan 30, 2025

Code LGTM but we really need CI testing for this

@Luap99
Copy link
Member

Luap99 commented Jan 31, 2025

This fixes #19338 right?

If so make sure to add the fixes to the commit please.

As for tests I guess someone needs to take care of #21448 first so we can create a small example cdi spec in a tmp dir foe a test to use. Then use that in either the compose or apiv2 tests and ensure the cdi spec was parsed and the right device was added.

But unless the other PR is in I see no way to correctly test this. I really, really hate the practise of tests dumping stuff in my /etc... and that never works rootless anyway so I would be fine without a test for now.

@giuseppe giuseppe force-pushed the pass-device-requests-to-cli branch from d8347cf to f4bf555 Compare January 31, 2025 10:57
@giuseppe
Copy link
Member Author

But unless the other PR is in I see no way to correctly test this. I really, really hate the practise of tests dumping stuff in my /etc... and that never works rootless anyway so I would be fine without a test for now.

yeah I've avoided a test because it is not easy to add one, and the current one we have for CDI writes to /etc.

I've added a test for compose, it works only for root, and it is slightly better than the integration test, as it creates a tmpfs instead of rm -rf /etc/cdi to cleanup, no matter what was there before

@giuseppe
Copy link
Member Author

This fixes #19338 right?

If so make sure to add the fixes to the commit please.

I didn't initially add it because I was not sure it covers all that was discussed there

@giuseppe giuseppe force-pushed the pass-device-requests-to-cli branch 2 times, most recently from 9d2910a to 23654c6 Compare January 31, 2025 11:01
test/compose/cdi_device/teardown.sh Outdated Show resolved Hide resolved
test/compose/cdi_device/setup.sh Show resolved Hide resolved
test/compose/cdi_device/tests.sh Outdated Show resolved Hide resolved
@giuseppe giuseppe force-pushed the pass-device-requests-to-cli branch 2 times, most recently from 707ab97 to 9223c57 Compare January 31, 2025 14:01
pass down the devices specifies in the resources block so that CDI
devices in the compose file are honored.

Tested manually with the following compose file:

services:
  testgpupodman_count:
    image: ubuntu:latest
    command: ["nvidia-smi"]
    profiles: [gpu]
    deploy:
      resources:
        reservations:
          devices:
          - driver: nvidia
            count: 1
            capabilities: [gpu]
  testgpupodman_deviceid:
      image: docker.io/ubuntu:latest
      command: ["nvidia-smi"]
      deploy:
        resources:
          reservations:
            devices:
            - driver: cdi
              device_ids: ['nvidia.com/gpu=all']
              capabilities: [gpu]

Closes: containers#19338

Signed-off-by: Giuseppe Scrivano <[email protected]>
@giuseppe giuseppe force-pushed the pass-device-requests-to-cli branch from 9223c57 to 18e2907 Compare January 31, 2025 14:26
@giuseppe
Copy link
Member Author

@Luap99 comments addressed

Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

openshift-ci bot commented Jan 31, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: giuseppe, Luap99

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Luap99
Copy link
Member

Luap99 commented Jan 31, 2025

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 31, 2025
@openshift-merge-bot openshift-merge-bot bot merged commit a923409 into containers:main Jan 31, 2025
80 checks passed
@@ -0,0 +1,9 @@
if is_rootless; then
reason=" - can't write to /etc/cdi"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would using /var/run/cdi allow for these to run in rootless environments?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICS, we are not currently using that in Podman (probably we should).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but even when we add that a rootless user would still have no write privs there.
Someone should finish #21448 so the test can pick their own private dir for the configs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't remember if the handling of /etc/cdi and /var/run/cdi respect the relevant XDG_ envvars.

@mheon
Copy link
Member

mheon commented Feb 3, 2025

Is this a good 5.4 candidate? I think I'd like it in

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/api-change Change to remote API; merits scrutiny lgtm Indicates that a PR is ready to be merged. No New Tests Allow PR to proceed without adding regression tests release-note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants