Merged
11 changes: 2 additions & 9 deletions .github/workflows/_e2e_test.yaml
@@ -68,15 +68,6 @@ jobs:
           pip install -r requirements-dev.txt
           pip install -v -e .
 
-      - name: Run vllm-project/vllm-ascend test (non triton)
-        env:
-          VLLM_WORKER_MULTIPROC_METHOD: spawn
-          PYTORCH_NPU_ALLOC_CONF: max_split_size_mb:256
-        if: ${{ inputs.type == 'full' }}
-        run: |
-          pytest -sv --durations=0 tests/e2e/singlecard/test_aclgraph_mem.py
-          pytest -sv --durations=0 tests/e2e/singlecard/test_camem.py
-
- name: Install Ascend toolkit & triton_ascend
shell: bash -l {0}
run: |
@@ -94,6 +85,8 @@
         run: |
           # pytest -sv --durations=0 tests/e2e/singlecard/test_aclgraph_accuracy.py
           # pytest -sv --durations=0 tests/e2e/singlecard/test_quantization.py
+          pytest -sv --durations=0 tests/e2e/singlecard/test_aclgraph_mem.py
+          pytest -sv --durations=0 tests/e2e/singlecard/test_camem.py
           pytest -sv --durations=0 tests/e2e/singlecard/test_vlm.py::test_multimodal_vl
           pytest -sv --durations=0 tests/e2e/singlecard/pooling/test_classification.py::test_qwen_pooling_classify_correctness

5 changes: 1 addition & 4 deletions vllm_ascend/ops/triton/activation/swiglu_quant.py
@@ -1,8 +1,5 @@
 import torch
-from vllm.triton_utils import HAS_TRITON, tl, triton
-
-if HAS_TRITON:
-    import torch_npu._inductor  # noqa: F401
+from vllm.triton_utils import tl, triton
 
 from vllm_ascend.ops.triton.triton_utils import get_vectorcore_num
 
5 changes: 1 addition & 4 deletions vllm_ascend/ops/triton/fla/fused_qkvzba_split_reshape.py
@@ -10,10 +10,7 @@
# ruff: noqa: E501
# mypy: ignore-errors
 import torch
-from vllm.triton_utils import HAS_TRITON, tl, triton
-
-if HAS_TRITON:
-    import torch_npu._inductor  # noqa: F401
+from vllm.triton_utils import tl, triton
Member:

It seems we need to add a check that forbids importing torch_npu._inductor anywhere under vllm_ascend/ops/triton/, similar to:
https://github.com/vllm-project/vllm/blob/main/.pre-commit-config.yaml#L132
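Such a check could be a small repository-local script wired into a pre-commit hook. This is a hypothetical sketch, not part of this PR or of vllm's actual config; the script and the `find_violations` helper are illustrative names:

```python
import re
from pathlib import Path

# Pattern for direct imports of torch_npu._inductor, which (per this PR)
# should happen only once at worker init, not in individual op modules.
FORBIDDEN = re.compile(r"^\s*(import|from)\s+torch_npu\._inductor\b")


def find_violations(root: Path) -> list[str]:
    """Return 'path:line: message' entries for each forbidden import found."""
    hits: list[str] = []
    for path in sorted(root.rglob("*.py")):
        for lineno, line in enumerate(path.read_text().splitlines(), start=1):
            if FORBIDDEN.match(line):
                hits.append(f"{path}:{lineno}: direct import of torch_npu._inductor")
    return hits
```

A pre-commit entry would then run this over vllm_ascend/ops/triton/ and fail on any non-empty result.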

Collaborator (author):

This import of torch_npu._inductor fixes graph-mode execution errors with triton-ascend. With the import done once at worker init, the issue no longer occurs, so future Triton ops won't need similar imports under ops/triton. A dedicated pre-commit check for this therefore isn't needed.
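The pattern described — a guarded, one-time import at worker init so individual Triton op modules stay import-free — can be sketched generically. This is a hypothetical helper (the function name and return values are illustrative); only the idea of importing torch_npu._inductor once, and only when Triton is available, comes from the PR:

```python
import importlib
import importlib.util

_INDUCTOR_LOADED = False


def maybe_enable_npu_inductor() -> bool:
    """Import torch_npu._inductor exactly once, and only when Triton is
    present, so Triton op modules need no import of their own."""
    global _INDUCTOR_LOADED
    if _INDUCTOR_LOADED:
        return True
    if importlib.util.find_spec("triton") is None:
        return False  # no Triton available, nothing to enable
    try:
        importlib.import_module("torch_npu._inductor")
    except ImportError:
        return False  # torch_npu absent or without an _inductor module
    _INDUCTOR_LOADED = True
    return True
```

Keeping the import lazy (inside worker init rather than at module top level) also avoids touching torch_npu during the patching phase, as the PR's code comment notes.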



@triton.jit
5 changes: 1 addition & 4 deletions vllm_ascend/ops/triton/rope.py
@@ -14,10 +14,7 @@
# limitations under the License.
# This file is a part of the vllm-ascend project.
#
-from vllm.triton_utils import HAS_TRITON, tl, triton
-
-if HAS_TRITON:
-    import torch_npu._inductor  # noqa: F401
+from vllm.triton_utils import tl, triton

from vllm_ascend.ops.triton.triton_utils import get_vectorcore_num

5 changes: 5 additions & 0 deletions vllm_ascend/worker/worker.py
@@ -88,6 +88,11 @@ def __init__(
         # register patch for vllm
         from vllm_ascend.utils import adapt_patch
         adapt_patch()
+        # Import _inductor for graph mode execution with triton.
+        # This lazy import avoids torch_npu re-initialization during patching.
+        from vllm.triton_utils import HAS_TRITON
+        if HAS_TRITON:
+            import torch_npu._inductor  # noqa: F401
Member:

Would you mind adding a note explaining why this import is needed?

Collaborator (author):

Done! I've added a comment explaining it.

# Register ops when worker init.
from vllm_ascend import ops
ops.register_dummy_fusion_op()