Skip to content

build(cuda): Add CUDA_VERSION build arg to adapters dockerfile#16234

Closed
bdice wants to merge 2 commits intofacebookincubator:mainfrom
bdice:make-cuda-configurable
Closed

build(cuda): Add CUDA_VERSION build arg to adapters dockerfile#16234
bdice wants to merge 2 commits intofacebookincubator:mainfrom
bdice:make-cuda-configurable

Conversation

@bdice
Copy link
Copy Markdown
Collaborator

@bdice bdice commented Feb 4, 2026

Summary

We would like to make it easier to configure the CUDA version used in building Velox containers.

This PR adds a CUDA_VERSION build arg to the adapters stage in centos-multi.dockerfile which allows overriding the CUDA version at build time. As before, it defaults to 12.9.

This mirrors the approach in prestodb/presto#27074.

@netlify
Copy link
Copy Markdown

netlify bot commented Feb 4, 2026

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 81adc2d
🔍 Latest deploy log https://app.netlify.com/projects/meta-velox/deploys/6983b3f798d903000831a400

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 4, 2026
COPY scripts/setup-centos-adapters.sh /

ARG CUDA_VERSION
ENV CUDA_VERSION=${CUDA_VERSION:-12.9}
Copy link
Copy Markdown
Collaborator Author

@bdice bdice Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This environment variable is read by bash /setup-centos-adapters.sh install_cuda in the next step.

@karthikeyann karthikeyann requested a review from czentgr February 9, 2026 18:02
Copy link
Copy Markdown
Collaborator

@karthikeyann karthikeyann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.
@czentgr to review this PR since it adds a new ARG to docker image.

Copy link
Copy Markdown
Collaborator

@zoltan zoltan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bdice bdice added the ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall label Feb 11, 2026
@meta-codesync
Copy link
Copy Markdown

meta-codesync bot commented Feb 11, 2026

@peterenescu has imported this pull request. If you are a Meta employee, you can view this in D92988821.

@meta-codesync
Copy link
Copy Markdown

meta-codesync bot commented Feb 12, 2026

@jainxrohit merged this pull request in 9ea2545.

@kgpai
Copy link
Copy Markdown
Contributor

kgpai commented Feb 12, 2026

@bdice Folks looks like the docker publish scripts arent working : https://github.com/facebookincubator/velox/actions/runs/21961775674/job/63448645977 so your cuda version will not be updated.
Can you folks fix the script ? I am going to revert this change in the meantime - We can reland this after the fix.

@karthikeyann
Copy link
Copy Markdown
Collaborator

karthikeyann commented Feb 13, 2026

@kgpai could you elaborate, what is the issue with the script?

@bdice
Copy link
Copy Markdown
Collaborator Author

bdice commented Feb 13, 2026

Looks like this:

ERROR: ghcr.io/facebookincubator/velox-dev:buildx.build.warnings-arm64: not found

Nothing I changed here is arm64-specific. Do the images build successfully without this change? It’s not a network error or something temporary?

@karthikeyann
Copy link
Copy Markdown
Collaborator

karthikeyann commented Feb 13, 2026

@kgpai Triaged the issue.
This error comes because of buildx.build.warnings artifact appears from "Build ci Image on arm64" job, and it is generated from
https://github.com/facebookincubator/velox/actions/runs/21961775674/job/63440855278#step:8:321

It comes from

          "filename": "scripts/docker/centos-multi.dockerfile",
              "line": 167

which is

ENV LD_LIBRARY_PATH="/usr/local/lib:/usr/local/lib64:$LD_LIBRARY_PATH"

This seems to be unrelated to this line change, only connection is this PR changes the same file scripts/docker/centos-multi.dockerfile but different line.

Fix could be either

ENV LD_LIBRARY_PATH="/usr/local/lib:/usr/local/lib64:${LD_LIBRARY_PATH:-}"

or

ENV LD_LIBRARY_PATH=/usr/local/lib:/usr/local/lib64

Or Suggested alternative fix in issue #16386

@bdice
Copy link
Copy Markdown
Collaborator Author

bdice commented Feb 13, 2026

To add to @karthikeyann's analysis -- the Dockerfile warning is the trigger, but the reason it breaks the build is a bug in the "Export digests" step of docker.yml. The jq filter iterates over all top-level keys in the bake metadata and creates a file for each one. When buildx emits a buildx.build.warnings key in the metadata (due to the LD_LIBRARY_PATH lint warning), that key gets turned into a file alongside the real targets (adapters, centos9, pyvelox). Then the merge job's for file in * loop tries to create a multi-arch manifest for it and fails.

Fixing the Dockerfile warning resolves the immediate issue, but the jq filter should also be hardened so that any future buildx metadata key (e.g., buildx.build.ref, buildx.build.provenance) doesn't cause the same failure. The real image targets all have an image.name field in their metadata value -- the filter could select only entries where that field exists:

'def skip: ["ubuntu", "fedora"];
  . | to_entries[] | select(.value | has("image.name")) |
  .key | split("-")[0] |
  if . as $name | skip | index($name) != null then empty else . end'

@kgpai
Copy link
Copy Markdown
Contributor

kgpai commented Feb 20, 2026

Cool Thank you @bdice & @karthikeyann ; Not sure when these warnings started appearing in the digest , but as suggested it makes sense to use ld.conf and also fix the jq script.

meta-codesync bot pushed a commit that referenced this pull request Feb 25, 2026
…16387)

Summary:
Fixes issues discussed in #16234.

The "Export digests" jq filter processes all top-level bake metadata keys, including non-image keys like `buildx.build.warnings`. The merge job then fails trying to create a manifest for them. Filter to only entries with `image.name` (real image targets), and fix the undefined `$LD_LIBRARY_PATH` that triggers the warning.

Analysis: #16234 (comment) and #16234 (comment)

I implemented the proposal in #16386 with `ldconfig`. Closes #16386.

Pull Request resolved: #16387

Reviewed By: kagamiori

Differential Revision: D93810988

Pulled By: kgpai

fbshipit-source-id: b8d5d94c5677340aca851153eceb55834dc7cf10
sperlingxx pushed a commit to sperlingxx/velox that referenced this pull request Mar 5, 2026
…ookincubator#16234)

Summary:
We would like to make it easier to configure the CUDA version used in building Velox containers.

This PR adds a `CUDA_VERSION` build arg to the adapters stage in `centos-multi.dockerfile` which allows overriding the CUDA version at build time. As before, it defaults to 12.9.

This mirrors the approach in prestodb/presto#27074.

Pull Request resolved: facebookincubator#16234

Reviewed By: kevinwilfong

Differential Revision: D92988821

Pulled By: jainxrohit

fbshipit-source-id: f9756fb577bbb8aa4b7d4b775b2cb1c806aa97ac
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged ready-to-merge PR that have been reviewed and are ready for merging. PRs with this tag notify the Velox Meta oncall

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants