
docker: Add CUDA 13.2 Docker containers#2843

Merged
yongwww merged 1 commit into flashinfer-ai:main from bkryu:cu132-container on Mar 22, 2026

Conversation

@bkryu (Collaborator) commented Mar 20, 2026

📌 Description

  • Add Dockerfile.cu132 and Dockerfile.cu132.dev based on nvidia/cuda:13.2.0-devel-ubuntu24.04
  • Replace cu131 with cu132 in the CI Docker release workflow, since PyTorch skipped cu131 wheels
  • Use PyTorch nightly (nightly/cu132) until a stable release is available for cu132
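The nightly-wheel approach described above can be sketched as a minimal Dockerfile fragment (an illustrative sketch only, assuming PyTorch's usual nightly index URL layout; in this PR the actual installation is driven by docker/install/install_python_packages.sh):

```dockerfile
# Base image named in the PR description
FROM nvidia/cuda:13.2.0-devel-ubuntu24.04

# Install a PyTorch nightly wheel built against CUDA 13.2.
# The index URL follows PyTorch's standard nightly layout (assumption);
# swap nightly/cu132 for cu132 once a stable release is published.
RUN pip3 install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu132
```

Once stable cu132 wheels ship, only the index suffix needs to change, which is why the Dockerfiles carry the "change to cu132 when torch releases stable version" note.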

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

Summary by CodeRabbit

  • Chores
    • Updated CI/CD pipeline to support CUDA 13.2, replacing the previous 13.1 version for all image builds
    • Added Docker runtime container image for CUDA 13.2 with Python 3.12 and production dependencies
    • Added Docker development container image for CUDA 13.2 with comprehensive developer tools, MPI support, and shell utilities

@gemini-code-assist (Contributor)

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates support for CUDA 13.2 into the project's Docker build system. It introduces new Dockerfiles for both standard and development environments, enabling the use of the latest NVIDIA GPU capabilities and PyTorch versions. This update ensures compatibility with current hardware and software stacks, facilitating ongoing development and testing.

Highlights

  • New CUDA 13.2 Dockerfiles: Introduced Dockerfile.cu132 for production and Dockerfile.cu132.dev for development, both based on nvidia/cuda:13.2.0-devel-ubuntu24.04.
  • PyTorch Integration: Configured Docker images to use PyTorch nightly builds for CUDA 13.2, with plans to switch to stable releases once available.
  • CI Workflow Update: Updated the CI Docker release workflow to use cu132 instead of cu131, aligning with PyTorch's wheel availability.


Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/release-ci-docker.yml

@bkryu added the run-ci label Mar 20, 2026
@coderabbitai (Contributor) commented Mar 20, 2026

📝 Walkthrough

This pull request updates the CI workflow to target CUDA 13.2 instead of 13.1 and adds two new Docker images: a production image and a development image based on CUDA 13.2 with Ubuntu 24.04. The workflow matrix and manifest generation now reference the new cu132 variant for all image builds.

Changes

  • CI Workflow Configuration (.github/workflows/release-ci-docker.yml): Updated the build matrix, manifest creation matrix, and generated tag entries to replace cu131 with cu132 for CUDA variant targeting across all image build jobs and downstream manifest operations.
  • Production Docker Image (docker/Dockerfile.cu132): Added a new CUDA 13.2 production Docker image based on NVIDIA CUDA 13.2.0 on Ubuntu 24.04 with a Conda py312 environment, cuBLAS path configuration, and nightly cu132 Python dependencies including tilelang, cuda-tile, and mpi4py.
  • Development Docker Image (docker/Dockerfile.cu132.dev): Added a new CUDA 13.2 development Docker image with developer tools (clang-format, clangd, git, curl), a non-root user with sudo access, pre-commit hooks, oh-my-zsh shell customization, and the same py312 environment and dependencies as the production image.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

docker

Suggested reviewers

  • yzh119
  • cyx-6

Poem

🐰 Hopping through CUDA lands so new,
cu132 shines in every view,
Docker containers, dev and prod,
Ubuntu 24.04, tools that are mod,
Python py312, a merry crew! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check: ✅ Passed. The title 'docker: Add CUDA 13.2 Docker containers' accurately and concisely summarizes the main changes: adding two new CUDA 13.2 Dockerfiles and updating the CI workflow for cu132 support.
  • Description check: ✅ Passed. The PR description covers the key changes (new Dockerfiles, the cu131→cu132 migration, PyTorch nightly usage) and marks the PR checklist items, but lacks explicit links to related issues in the 'Related Issues' section.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage; skipping the docstring coverage check.


@gemini-code-assist (Contributor) left a comment


Code Review

This pull request adds Dockerfiles for CUDA 13.2. The changes look good overall. I've made a few suggestions to improve the Dockerfiles by following best practices, such as optimizing apt-get commands to reduce image size and improve build performance. Specifically, I've recommended cleaning up the apt cache in one file and consolidating package installations in the other to reduce redundant operations and image layers.

Comment on lines +6 to +9
RUN apt-get update && apt-get install -y \
    curl \
    git \
    wget

Severity: medium

It's a Docker best practice to clean up the apt cache within the same RUN layer that apt-get update is called. This reduces the final image size. Please add && rm -rf /var/lib/apt/lists/* to this command.

RUN apt-get update && apt-get install -y \
    curl \
    git \
    wget \
    && rm -rf /var/lib/apt/lists/*

Comment on lines +6 to +14
RUN apt-get update && apt-get install -y \
    curl \
    git \
    wget \
    clang-format \
    clangd-19 \
    vim \
    zsh \
    && rm -rf /var/lib/apt/lists/*

Severity: medium

To optimize Docker layers and reduce build time, it's best to install all apt packages in a single RUN instruction. Consider adding sudo here, as it's installed in a separate layer later on. This will also allow removing a redundant apt-get update.

RUN apt-get update && apt-get install -y \
    curl \
    git \
    wget \
    clang-format \
    clangd-19 \
    vim \
    zsh \
    sudo \
    && rm -rf /var/lib/apt/lists/*

Comment on lines +22 to +29
RUN groupadd --gid $USER_GID $USERNAME \
    && useradd --uid $USER_UID --gid $USER_GID -m $USERNAME \
    # [Optional] Add sudo support
    && apt-get update \
    && apt-get install -y sudo \
    && echo $USERNAME ALL=\(root\) NOPASSWD:ALL > /etc/sudoers.d/$USERNAME \
    && chmod 0440 /etc/sudoers.d/$USERNAME \
    && rm -rf /var/lib/apt/lists/*

Severity: medium

Following the previous suggestion to install sudo earlier, this RUN instruction can be simplified to only handle user creation and sudoers configuration. This avoids an unnecessary apt-get update and apt-get install, making the Dockerfile more efficient.

RUN groupadd --gid $USER_GID $USERNAME \
    && useradd --uid $USER_UID --gid $USER_GID -m $USERNAME \
    # [Optional] Add sudo support
    && echo $USERNAME ALL=\(root\) NOPASSWD:ALL > /etc/sudoers.d/$USERNAME \
    && chmod 0440 /etc/sudoers.d/$USERNAME

@yongwww (Member) left a comment


looks good to me

@coderabbitai (Contributor) left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.github/workflows/release-ci-docker.yml (1)

39-61: ⚠️ Potential issue | 🟠 Major

The new .dev Dockerfile variant has no build/publish automation.

The workflow builds only docker/Dockerfile.${{ matrix.cuda }} (line 61), which resolves to the standard variants (cu132, not cu132.dev). No other automation path in the codebase references .dev Dockerfiles or -dev images. If docker/Dockerfile.cu132.dev is being added, either add it to the matrix or implement a separate build path for it.

🧹 Nitpick comments (2)
.github/workflows/release-ci-docker.yml (1)

37-40: Add fail-fast: true to this matrix.

With another CUDA entry in the fan-out, this job burns more self-hosted time after the first hard failure. Failing fast seems like the better default here.

Suggested diff
   build:
     runs-on: [self-hosted, linux, x64, cpu, on-demand]
     needs: generate-tag
     strategy:
+      fail-fast: true
       matrix:
         cuda: [cu126, cu128, cu129, cu130, cu132]
         arch: [amd64, arm64]

Based on learnings, in GitHub Actions workflow files under .github/workflows, set fail-fast: true for matrix jobs to reduce overall test time.

docker/Dockerfile.cu132 (1)

6-9: Trim this apt layer.

Please add --no-install-recommends and clear /var/lib/apt/lists in the same layer. The dev variant already does the cleanup, and keeping it here avoids extra image size and pull latency on every CI pull.

Suggested diff
-RUN apt-get update && apt-get install -y \
+RUN apt-get update && apt-get install -y --no-install-recommends \
     curl \
     git \
-    wget
+    wget \
+    && rm -rf /var/lib/apt/lists/*


📥 Commits

Reviewing files that changed from the base of the PR and between 8e1642a and d16eb34.

📒 Files selected for processing (3)
  • .github/workflows/release-ci-docker.yml
  • docker/Dockerfile.cu132
  • docker/Dockerfile.cu132.dev

Comment on lines +43 to +51
RUN echo "source activate py312" >> ~/.bashrc
ENV PATH="/home/$USERNAME/conda/bin:$PATH"
ENV PATH="/home/$USERNAME/conda/envs/py312/bin:$PATH"

# Install torch and other python packages
# use nightly/cu132 temporarily and change to cu132 when torch releases stable version
COPY requirements.txt /install/requirements.txt
COPY docker/install/install_python_packages.sh /install/install_python_packages.sh
RUN bash /install/install_python_packages.sh nightly/cu132 && pip3 install pre-commit

⚠️ Potential issue | 🟠 Major

Keep the dev image on the same CUDA library search order as the CI image.

docker/Dockerfile.cu132 adds the cu13 wheel libs to LD_LIBRARY_PATH, but this dev variant installs the same nightly/cu132 stack without that override. That can make local repro load a different CUDA user-space than CI.

Suggested diff
 RUN echo "source activate py312" >> ~/.bashrc
 ENV PATH="/home/$USERNAME/conda/bin:$PATH"
 ENV PATH="/home/$USERNAME/conda/envs/py312/bin:$PATH"
+ENV LD_LIBRARY_PATH="/home/$USERNAME/conda/envs/py312/lib/python3.12/site-packages/nvidia/cu13/lib/:$LD_LIBRARY_PATH"

@johnnynunez (Contributor)

super... this adds Jetson AGX Orin SBSA Support in flashinfer

@yongwww yongwww merged commit ff86ea0 into flashinfer-ai:main Mar 22, 2026
64 of 97 checks passed
