
ci(docker): install triton_kernels for ATOM_USE_TRITON_MOE #687

Merged
valarLip merged 1 commit into main from ci/dockerfile-triton-kernels
May 6, 2026

@valarLip (Collaborator) commented May 4, 2026

Summary

Install the companion triton_kernels package in the ATOM Docker base image so models that opt into the Triton MoE path (e.g. DeepSeek-V4 with ATOM_USE_TRITON_MOE=1) can import it.

triton_kernels is shipped as a sibling pure-Python package under python/triton_kernels in the same ROCm/triton checkout we already clone in the build_triton stage; it is not installed by pip install . of the main triton package and was therefore missing from images.

Without this change, launching V4-Pro fails at import:

ModuleNotFoundError: No module named 'triton_kernels'

(callers: atom/model_ops/fused_moe_triton.py, atom/model_ops/moe.py:692-697).
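The failure only surfaces when the Triton MoE path is actually imported. As an illustration, a preflight check could report the missing package before model load. The helper name `missing_triton_moe_deps` is hypothetical and not part of ATOM, and treating only the exact value `"1"` as "enabled" is an assumption; it resolves module specs with `importlib.util.find_spec` rather than importing anything.

```python
import importlib.util
import os
from typing import List, Mapping, Optional, Sequence


def missing_triton_moe_deps(
    env: Optional[Mapping[str, str]] = None,
    required: Sequence[str] = ("triton", "triton_kernels"),
) -> List[str]:
    """Return required top-level modules that cannot be located.

    Hypothetical preflight helper (not part of ATOM). It only resolves module
    specs via importlib.util.find_spec, so nothing is actually imported.
    """
    env = os.environ if env is None else env
    # Assumption: the Triton MoE path is opt-in and only "1" enables it.
    if env.get("ATOM_USE_TRITON_MOE") != "1":
        return []
    return [name for name in required if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_triton_moe_deps()
    if missing:
        raise SystemExit("ATOM_USE_TRITON_MOE=1 but missing: " + ", ".join(missing))
    print("Triton MoE dependencies found (or the path is disabled)")
```

In an image built before this change, running this with `ATOM_USE_TRITON_MOE=1` would list `triton_kernels` as missing while `triton` resolves.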

Changes

  • build_triton stage: pip install ./python/triton_kernels after the main triton install (pure-Python, no extra build deps).
  • atom_image final stage: extend the existing --mount=from=build_triton cp block to also copy triton_kernels/ and triton_kernels-*.dist-info/ alongside triton/.
  • Both stages get a python -c "import triton_kernels; ..." validation so the build fails early if the copy regresses.
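The final-stage copy moves two entries: the package directory and its `*.dist-info` metadata directory. The real step is a shell `cp` under `RUN --mount=from=build_triton`. The Python function below is only an illustrative equivalent, and the name `copy_triton_kernels` and the site-packages paths it takes are assumptions. It raises if the package directory is absent, mirroring the "fail the build early" intent of the validation step.

```python
import glob
import os
import shutil
import tempfile

# Entry patterns taken from the PR description.
PATTERNS = ("triton_kernels", "triton_kernels-*.dist-info")


def copy_triton_kernels(src_site: str, dst_site: str) -> list:
    """Copy triton_kernels/ and its dist-info dir(s) from src_site into dst_site.

    Returns the sorted list of copied entry names. Raises RuntimeError if the
    package directory itself is missing, so a broken copy fails loudly.
    """
    copied = []
    for pattern in PATTERNS:
        # glob.escape guards against glob metacharacters in the directory path.
        for src in glob.glob(os.path.join(glob.escape(src_site), pattern)):
            name = os.path.basename(src)
            # dirs_exist_ok (Python 3.8+) makes re-running the copy safe.
            shutil.copytree(src, os.path.join(dst_site, name), dirs_exist_ok=True)
            copied.append(name)
    if "triton_kernels" not in copied:
        raise RuntimeError(f"triton_kernels not found under {src_site}")
    return sorted(copied)


if __name__ == "__main__":
    # Throwaway directories stand in for the two stages' site-packages.
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        os.makedirs(os.path.join(src, "triton_kernels"))
        os.makedirs(os.path.join(src, "triton_kernels-0.0.0.dist-info"))
        os.makedirs(os.path.join(src, "triton"))  # handled by the existing triton copy, not here
        print(copy_triton_kernels(src, dst))
        # → ['triton_kernels', 'triton_kernels-0.0.0.dist-info']
```

Copying the `dist-info` directory as well keeps `pip list` and `importlib.metadata` aware of the package in the final image, not just the importable module. The version in the demo is a placeholder.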

Derived stages — verified safe

  • atom_oot: triton backup/restore globs only match triton/ and triton-*.dist-info; triton_kernels (separate package) survives untouched.
  • atom_sglang: reinstalls triton==3.6.0, does not touch triton_kernels.
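The atom_oot claim rests on shell-glob semantics: `triton` matches only that exact name, and `triton-*.dist-info` requires a hyphen right after `triton`, whereas the companion package uses an underscore. The check below reproduces this with `fnmatch.fnmatchcase`, assuming the globs are spelled as quoted above. The real Dockerfile lines may differ, and the `triton_kernels` version is a placeholder.

```python
from fnmatch import fnmatchcase

# Assumed atom_oot backup/restore globs, as quoted in the PR description.
TRITON_GLOBS = ("triton", "triton-*.dist-info")


def matched_by_triton_globs(entries):
    """Return the site-packages entries the triton backup/restore globs would touch."""
    return [e for e in entries if any(fnmatchcase(e, g) for g in TRITON_GLOBS)]


site_packages = [
    "triton",
    "triton-3.6.0.dist-info",
    "triton_kernels",
    "triton_kernels-1.0.0.dist-info",  # placeholder version
    "mori",
]
print(matched_by_triton_globs(site_packages))
# → ['triton', 'triton-3.6.0.dist-info']
```

A broader glob such as `triton*` would also match `triton_kernels`, so the narrow patterns are what keep the companion package intact.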

Test plan

  • Local docker build --target atom_image reaches the new validation step and prints triton_kernels: ....
  • After build, python -c "import triton_kernels; print(triton_kernels.__file__)" resolves inside the image.
  • Launch DeepSeek-V4-Pro with ATOM_USE_TRITON_MOE=1 in the freshly built image — no ModuleNotFoundError and GSM8K-50 ≥ 0.94 (matches V4 baseline noted in PR feat(deepseek_v4): PR1 skeleton — end-to-end inference with triton MoE #650).
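The checks in the test plan can be scripted. The sketch below generalizes the `python -c "import triton_kernels; print(triton_kernels.__file__)"` step and spells out the accuracy threshold. It assumes that GSM8K-50 means a 50-question subset scored as plain accuracy, which the PR does not state. All function names are hypothetical.

```python
import importlib


def module_location(name: str) -> str:
    """Import `name` and return the file it resolved to, as the test plan prints."""
    module = importlib.import_module(name)
    # Namespace packages have no usable __file__; report that instead of crashing.
    return getattr(module, "__file__", None) or "<namespace package>"


def min_correct(num_questions: int, threshold_pct: int) -> int:
    """Smallest correct-answer count whose accuracy is >= threshold_pct percent."""
    # Integer ceiling division avoids floating-point rounding at the boundary.
    return -((-num_questions * threshold_pct) // 100)


def passes_gsm8k(correct: int, num_questions: int = 50, threshold_pct: int = 94) -> bool:
    """True if correct / num_questions >= threshold_pct / 100, compared in integers."""
    return correct * 100 >= threshold_pct * num_questions
```

Inside the built image, `module_location("triton_kernels")` should point into the venv's site-packages. Under the 50-question reading, ≥ 0.94 means at least `min_correct(50, 94) == 47` correct answers.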

`triton_kernels` is the companion pure-Python package shipped under
`python/triton_kernels` in the ROCm/triton checkout. ATOM imports it
when `ATOM_USE_TRITON_MOE=1` (DeepSeek-V4 launch path,
fused_moe_triton.py, moe.py). Without it the model fails at import
with `ModuleNotFoundError: No module named 'triton_kernels'`.

- build_triton stage: pip install python/triton_kernels after the main
  triton install (pure-Python, no extra build deps).
- atom_image final stage: copy `triton_kernels` + `triton_kernels-*.dist-info`
  alongside the existing triton mount-copy.

Derived stages are unaffected:
- atom_oot only uninstalls/restores `triton` (its globs do not match `triton_kernels`).
- atom_sglang reinstalls `triton==3.6.0`, does not touch triton_kernels.
Copilot AI review requested due to automatic review settings May 4, 2026 16:55

Copilot AI left a comment


Pull request overview

This PR updates the ATOM Docker base image build to ensure the sibling triton_kernels Python package (from the same ROCm/triton checkout) is installed and present in the final atom_image, preventing runtime ModuleNotFoundError: No module named 'triton_kernels' for Triton-MoE code paths.

Changes:

  • Install ./python/triton_kernels in the build_triton stage after installing Triton, and validate it imports.
  • Copy triton_kernels/ and its *.dist-info/ from build_triton into the final atom_image venv alongside triton/, and validate both imports.


Comment thread docker/Dockerfile
Comment on lines +314 to +315
# Companion `triton_kernels` package (pure-Python). Required by ATOM when
# `ATOM_USE_TRITON_MOE=1` (e.g. DeepSeek-V4 launch path). Lives under
Comment thread docker/Dockerfile
# Triton: copy package from build stage into current venv
# Use RUN --mount to avoid COPY glob issues, and preserve mori already in venv
# Also copy the companion `triton_kernels` package (pure-Python, built in the
# same stage) — required by ATOM_USE_TRITON_MOE=1 model paths.
@valarLip valarLip merged commit 08b268f into main May 6, 2026
28 of 34 checks passed
@valarLip valarLip deleted the ci/dockerfile-triton-kernels branch May 6, 2026 13:59

2 participants