
[AMD] Fix aiter import failure in ROCm Docker images#22363

Open
vvagaytsev wants to merge 1 commit into sgl-project:main from vvagaytsev:fix/rocm-images-aiter-module-load

vvagaytsev commented Apr 8, 2026

Motivation

This should fix #22279.

Problem

The v0.5.10-rocm*-mi35x Docker images fail on startup with the following error when SGLANG_USE_AITER=1:

ImportError: cannot import name 'dynamic_per_tensor_quant' from 'aiter' (unknown location)

Both the rocm700 and rocm720 images are affected in 0.5.10 because, starting with #19203 (merged after the 0.5.9 release), both images are built from the same rocm.Dockerfile.

Root Cause

Commit 2e76824 (see #20195) changed the aiter installation from `python setup.py develop` to `python setup.py build_ext --inplace` followed by `pip install -e .`.

The old `python setup.py develop` used the egg-link mechanism, which added /sgl-workspace/aiter directly to sys.path via a .pth file. A modern `pip install -e .` (setuptools' default PEP 660 editable mode) instead registers a custom import finder that is appended to the end of sys.meta_path.
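A minimal sketch of the path-entry mechanism, using the standard library's `site.addsitedir`, which processes `.pth` files the same way the interpreter does for site-packages at startup. The temporary directories and the `aiter-dev.pth` file name are hypothetical stand-ins, not the files setuptools actually writes:

```python
import os
import site
import sys
import tempfile


def pth_adds_path_entry() -> bool:
    """Show that a plain-path line in a .pth file becomes a sys.path entry."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-ins for site-packages and the aiter checkout.
        fake_site = os.path.join(tmp, "site-packages")
        checkout = os.path.join(tmp, "sgl-workspace", "aiter")
        os.makedirs(fake_site)
        os.makedirs(checkout)

        # A .pth line holding an existing directory path is appended to sys.path.
        with open(os.path.join(fake_site, "aiter-dev.pth"), "w") as f:
            f.write(checkout + "\n")

        saved_path = list(sys.path)
        try:
            site.addsitedir(fake_site)  # processes every *.pth file in fake_site
            return checkout in sys.path
        finally:
            sys.path[:] = saved_path  # undo the demo's changes


print(pth_adds_path_entry())  # True
```

The entry lands on sys.path itself, so the ordinary PathFinder handles it; no extra finder on sys.meta_path is involved.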

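Since the image's WORKDIR is /sgl-workspace, the default PathFinder (which runs before the appended editable finder) discovers /sgl-workspace/aiter/, the git repo root directory, via cwd ('' in sys.path). Because the repo root has no __init__.py, it only qualifies as a namespace package portion. PathFinder keeps scanning the rest of sys.path for a regular aiter package, finds none (the editable install added no sys.path entry for it), and returns a namespace package spec. The import system stops at the first finder that returns a spec, so the editable finder, which has the correct mapping (aiter → /sgl-workspace/aiter/aiter), is never consulted.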

The result: aiter imports as an empty namespace package (__file__=None, loader=None), and none of the actual package contents (including dynamic_per_tensor_quant) are available.
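The shadowing can be reproduced in isolation with the sketch below. It assumes a temporary layout mirroring the image (a repo root with no `__init__.py` that contains the real `aiter/` package), and `EditableStyleFinder` is a hypothetical, simplified stand-in for the finder setuptools generates, not its actual code:

```python
import importlib
import importlib.abc
import importlib.util
import os
import sys
import tempfile


class EditableStyleFinder(importlib.abc.MetaPathFinder):
    """Hypothetical stand-in for an editable-install finder that maps a
    top-level package name to its real source directory."""

    def __init__(self, name, package_dir):
        self.name = name
        self.package_dir = package_dir

    def find_spec(self, fullname, path=None, target=None):
        if fullname != self.name:
            return None
        init = os.path.join(self.package_dir, "__init__.py")
        return importlib.util.spec_from_file_location(
            fullname, init, submodule_search_locations=[self.package_dir]
        )


def import_with_cwd_shadowing():
    with tempfile.TemporaryDirectory() as tmp:
        workspace = os.path.join(tmp, "sgl-workspace")
        repo_root = os.path.join(workspace, "aiter")    # git repo root, no __init__.py
        package_dir = os.path.join(repo_root, "aiter")  # the real package
        os.makedirs(package_dir)
        with open(os.path.join(package_dir, "__init__.py"), "w") as f:
            f.write("def dynamic_per_tensor_quant():\n    return 'real'\n")

        saved_path, saved_meta = list(sys.path), list(sys.meta_path)
        try:
            # Appended, so it is only consulted if PathFinder returns no spec.
            sys.meta_path.append(EditableStyleFinder("aiter", package_dir))
            sys.path.insert(0, workspace)  # simulates cwd == WORKDIR
            sys.modules.pop("aiter", None)
            importlib.invalidate_caches()
            mod = importlib.import_module("aiter")
            return getattr(mod, "__file__", None), hasattr(mod, "dynamic_per_tensor_quant")
        finally:
            sys.path[:] = saved_path
            sys.meta_path[:] = saved_meta
            sys.modules.pop("aiter", None)


print(import_with_cwd_shadowing())  # (None, False)
```

PathFinder returns a namespace spec for the repo root, so the appended finder is never called and the imported module has no `__file__` and none of the package's contents.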

Modifications

Use `pip install --config-settings editable_mode=compat -e .` instead of `pip install -e .`. Compat editable mode writes a plain .pth path entry (/sgl-workspace/aiter) into site-packages, so the repo root becomes an ordinary sys.path entry. PathFinder still encounters the cwd namespace portion first, but because it keeps scanning sys.path and a regular package takes precedence over namespace portions, it now resolves /sgl-workspace/aiter/aiter/__init__.py correctly. This is the same mechanism that setup.py develop relied on previously.
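The fix can be checked with a similar sketch, again assuming a hypothetical temporary layout and `.pth` file name. Here the repo root is added through a `.pth` file processed by `site.addsitedir`, so it lands at the end of `sys.path`, after the simulated cwd entry:

```python
import importlib
import os
import site
import sys
import tempfile


def import_with_compat_pth():
    with tempfile.TemporaryDirectory() as tmp:
        workspace = os.path.join(tmp, "sgl-workspace")
        repo_root = os.path.join(workspace, "aiter")    # git repo root, no __init__.py
        package_dir = os.path.join(repo_root, "aiter")  # the real package
        fake_site = os.path.join(tmp, "site-packages")  # hypothetical site-packages
        os.makedirs(package_dir)
        os.makedirs(fake_site)
        with open(os.path.join(package_dir, "__init__.py"), "w") as f:
            f.write("def dynamic_per_tensor_quant():\n    return 'real'\n")
        # Compat-style .pth: a plain path entry pointing at the repo root.
        with open(os.path.join(fake_site, "aiter-compat.pth"), "w") as f:
            f.write(repo_root + "\n")

        saved_path = list(sys.path)
        try:
            sys.path.insert(0, workspace)  # cwd entry still comes first
            site.addsitedir(fake_site)     # appends repo_root near the end of sys.path
            sys.modules.pop("aiter", None)
            importlib.invalidate_caches()
            mod = importlib.import_module("aiter")
            real_init = os.path.join(package_dir, "__init__.py")
            return os.path.samefile(mod.__file__, real_init), mod.dynamic_per_tensor_quant()
        finally:
            sys.path[:] = saved_path
            sys.modules.pop("aiter", None)


print(import_with_compat_pth())  # (True, 'real')
```

Even though the cwd entry is scanned first, the regular package found later in sys.path wins over the namespace portion.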


Review and Merge Process

  1. Ping Merge Oncalls to start the process. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
  4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

`pip install -e .` in the default editable mode registers a custom import
finder at the end of `sys.meta_path`. Since `WORKDIR` is `/sgl-workspace`,
the default `PathFinder` finds the git repo directory
`/sgl-workspace/aiter/` (no `__init__.py`) via cwd first and resolves
it as a namespace package, shadowing the editable finder that has the
correct path mapping.

Switch to `pip install --config-settings editable_mode=compat -e .`
which writes a `.pth` path entry instead of a finder, matching the
behavior of the previous `setup.py develop` approach.

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request updates the pip install commands in docker/rocm.Dockerfile to use --config-settings editable_mode=compat for editable installations. The review comments highlight that the sh -c wrappers are redundant and potentially unsafe due to unquoted variable expansion, and suggest that the manual PYTHONPATH export may no longer be necessary. However, as these comments lack specific code suggestions to address the identified risks, I have no further feedback to provide.

vvagaytsev (Author)

/tag-and-rerun-ci


Successfully merging this pull request may close these issues.

[Bug] aiter module crash in SGLang MI350X Docker images — aiter.dtypes AttributeError
