
[main][test] Refactor the mtp and eagle test case#5326

Merged
wangxiyuan merged 3 commits into vllm-project:main from lilinsiman:refactor_mtp_eagle
Dec 31, 2025

Conversation

@lilinsiman
Collaborator

@lilinsiman lilinsiman commented Dec 24, 2025

What this PR does / why we need it?

  1. Refactor the current MTP and Eagle test cases
  2. Add new necessary MTP and Eagle cases

Does this PR introduce any user-facing change?

no

How was this patch tested?

ut

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the MTP and Eagle speculative decoding test cases, which is a good improvement for maintainability. However, I've identified several areas with code duplication and misuse of context managers. Specifically, there are redundant calls to cleanup functions and del statements for objects managed by with statements. These issues affect code clarity and could lead to subtle bugs. I've provided suggestions to refactor the duplicated code and remove the unnecessary calls.

print(f"golden: {golden}")

assert match
cleanup_dist_env_and_memory()

high

The VllmRunner context manager automatically handles resource cleanup, including calling cleanup_dist_env_and_memory(), in its __exit__ method. This explicit call is redundant and should be removed to improve code clarity.
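As an aside, the guarantee the bot is relying on can be illustrated with a minimal, hypothetical stand-in for `VllmRunner` (this is a sketch of the context-manager protocol, not the real class):

```python
class Runner:
    """Minimal sketch (not the real VllmRunner) of a context manager
    whose __exit__ performs cleanup, making explicit teardown redundant."""

    def __init__(self):
        self.cleaned_up = False

    def __enter__(self):
        # The object bound by `with ... as spec_llm` is whatever
        # __enter__ returns.
        return self

    def __exit__(self, exc_type, exc, tb):
        # Cleanup runs here automatically when the `with` block exits,
        # even on exceptions, so callers never need `del runner` or a
        # manual cleanup call afterwards.
        self.cleaned_up = True
        return False  # do not swallow exceptions


with Runner() as runner:
    pass
print(runner.cleaned_up)  # True
```

Because `__exit__` always runs, any `cleanup_dist_env_and_memory()` or `del` placed after the `with` block duplicates work the runner already did.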

Comment on lines +136 to +179
    if not enforce_eager:
        if cudagraph_mode == "FULL_DECODE_ONLY":
            pytest.skip("This case will be supported in future")
        with VllmRunner(model_name,
                        tensor_parallel_size=4,
                        max_model_len=4096,
                        gpu_memory_utilization=0.8,
                        distributed_executor_backend="mp",
                        speculative_config={
                            "method":
                            "mtp",
                            "num_speculative_tokens":
                            num_speculative_tokens,
                            "disable_padded_drafter_batch":
                            disable_padded_drafter_batch,
                        },
                        enforce_eager=enforce_eager,
                        compilation_config=CompilationConfig(
                            cudagraph_mode=cudagraph_mode,
                            cudagraph_capture_sizes=[12],
                        )) as spec_llm:
            spec_outputs = spec_llm.generate_greedy(example_prompts, max_tokens)
        del spec_llm

    else:
        if cudagraph_mode == "PIECEWISE":
            pytest.skip("skipping the repeating case")
        with VllmRunner(model_name,
                        tensor_parallel_size=4,
                        max_model_len=4096,
                        gpu_memory_utilization=0.8,
                        distributed_executor_backend="mp",
                        speculative_config={
                            "method":
                            "mtp",
                            "num_speculative_tokens":
                            num_speculative_tokens,
                            "disable_padded_drafter_batch":
                            disable_padded_drafter_batch,
                        },
                        enforce_eager=enforce_eager,
                        ) as spec_llm:
            spec_outputs = spec_llm.generate_greedy(example_prompts, max_tokens)
        del spec_llm

high

The if/else block for enforce_eager contains a lot of duplicated code for initializing VllmRunner. This can be refactored to reduce redundancy and improve readability. Additionally, the del spec_llm calls are unnecessary when using a with statement, as the context manager handles cleanup.

    compilation_config = None
    if not enforce_eager:
        if cudagraph_mode == "FULL_DECODE_ONLY":
            pytest.skip("This case will be supported in future")
        compilation_config = CompilationConfig(
            cudagraph_mode=cudagraph_mode,
            cudagraph_capture_sizes=[12],
        )
    else:
        if cudagraph_mode == "PIECEWISE":
            pytest.skip("skipping the repeating case")

    with VllmRunner(model_name,
                    tensor_parallel_size=4,
                    max_model_len=4096,
                    gpu_memory_utilization=0.8,
                    distributed_executor_backend="mp",
                    speculative_config={
                        "method":
                        "mtp",
                        "num_speculative_tokens":
                        num_speculative_tokens,
                        "disable_padded_drafter_batch":
                        disable_padded_drafter_batch,
                    },
                    enforce_eager=enforce_eager,
                    compilation_config=compilation_config) as spec_llm:
        spec_outputs = spec_llm.generate_greedy(example_prompts, max_tokens)

# Heuristic: expect at least 66% of the prompts to match exactly
# Upon failure, inspect the outputs to check for inaccuracy.
assert matches > int(0.66 * len(ref_outputs))
cleanup_dist_env_and_memory()
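The 66% heuristic in the snippet above can be sketched standalone. The `ref_outputs` and `spec_outputs` values here are made-up placeholders standing in for the `(token_ids, text)` pairs the runners produce:

```python
# Placeholder reference and speculative-decoding outputs; each entry
# stands in for one prompt's (token_ids, text) result.
ref_outputs = [([1, 2], "a"), ([3], "b"), ([4], "c")]
spec_outputs = [([1, 2], "a"), ([3], "b"), ([9], "x")]

# Count prompts whose speculative output matches the reference exactly.
matches = sum(1 for ref, spec in zip(ref_outputs, spec_outputs)
              if ref == spec)

# Heuristic: expect at least 66% of the prompts to match exactly.
assert matches > int(0.66 * len(ref_outputs))  # 2 > 1 here
```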

high

The VllmRunner context manager automatically handles resource cleanup, including calling cleanup_dist_env_and_memory(), in its __exit__ method. This explicit call is redundant and should be removed.

Comment on lines +66 to +111
    if not enforce_eager:
        with VllmRunner(model_name,
                        tensor_parallel_size=1,
                        max_num_seqs=256,
                        gpu_memory_utilization=0.7,
                        distributed_executor_backend="mp",
                        enable_expert_parallel=True,
                        speculative_config={
                            "method":
                            "mtp",
                            "num_speculative_tokens":
                            num_speculative_tokens,
                            "disable_padded_drafter_batch":
                            disable_padded_drafter_batch,
                        },
                        enforce_eager=enforce_eager,
                        max_model_len=2000,
                        compilation_config=CompilationConfig(
                            cudagraph_mode=cudagraph_mode,
                            cudagraph_capture_sizes=[12],
                        )) as spec_llm:
            sampling_config = SamplingParams(temperature=0,
                                             max_tokens=256,
                                             ignore_eos=False)
            spec_outputs = spec_llm.generate(example_prompts, sampling_config)

    else:
        if cudagraph_mode == "PIECEWISE":
            pytest.skip("skipping the repeating case")
        with VllmRunner(model_name,
                        tensor_parallel_size=1,
                        max_num_seqs=256,
                        gpu_memory_utilization=0.7,
                        distributed_executor_backend="mp",
                        enable_expert_parallel=True,
                        speculative_config={
                            "method":
                            "mtp",
                            "num_speculative_tokens":
                            num_speculative_tokens,
                            "disable_padded_drafter_batch":
                            disable_padded_drafter_batch,
                        },
                        enforce_eager=enforce_eager,
                        max_model_len=2000,
                        ) as spec_llm:
            sampling_config = SamplingParams(temperature=0,
                                             max_tokens=256,
                                             ignore_eos=False)
            spec_outputs = spec_llm.generate(example_prompts, sampling_config)

high

This if/else block for enforce_eager has a lot of duplicated code for VllmRunner initialization. It can be refactored to be more concise and maintainable.

    compilation_config = None
    if not enforce_eager:
        compilation_config = CompilationConfig(
            cudagraph_mode=cudagraph_mode,
            cudagraph_capture_sizes=[12],
        )
    else:
        if cudagraph_mode == "PIECEWISE":
            pytest.skip("skipping the repeating case")

    with VllmRunner(model_name,
                    tensor_parallel_size=1,
                    max_num_seqs=256,
                    gpu_memory_utilization=0.7,
                    distributed_executor_backend="mp",
                    enable_expert_parallel=True,
                    speculative_config={
                        "method":
                        "mtp",
                        "num_speculative_tokens":
                        num_speculative_tokens,
                        "disable_padded_drafter_batch":
                        disable_padded_drafter_batch,
                    },
                    enforce_eager=enforce_eager,
                    max_model_len=2000,
                    compilation_config=compilation_config) as spec_llm:
        sampling_config = SamplingParams(temperature=0, max_tokens=256, ignore_eos=False)
        spec_outputs = spec_llm.generate(example_prompts, sampling_config)

Comment on lines +136 to +137
cleanup_dist_env_and_memory()
del spec_llm

high

The VllmRunner context manager handles resource cleanup. The explicit calls to cleanup_dist_env_and_memory() and del spec_llm are redundant and should be removed.

Comment on lines +192 to +193
cleanup_dist_env_and_memory()
del llm

high

The VllmRunner context manager (with statement) automatically handles cleanup. The explicit calls to cleanup_dist_env_and_memory() and del llm are redundant. Placing them inside the with block is particularly confusing and they should be removed.

Comment on lines +206 to +207
cleanup_dist_env_and_memory()
del llm

high

The VllmRunner context manager (with statement) automatically handles cleanup. The explicit calls to cleanup_dist_env_and_memory() and del llm are redundant and should be removed.

@wangxiyuan wangxiyuan added the ready (read for review) and ready-for-test (start test by label for PR) labels Dec 24, 2025
Comment thread on .github/workflows/_e2e_test.yaml (Outdated)
VLLM_WORKER_MULTIPROC_METHOD: spawn
if: ${{ inputs.type == 'full' }}
run: |
pytest -sv --durations=0 tests/e2e/multicard/spec_decode_v1/test_mtp_qwen3_next.py
Collaborator

move to 4-card test part

@wangxiyuan wangxiyuan removed the ready-for-test (start test by label for PR) label Dec 24, 2025
@github-actions
Contributor

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by fulfilling the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.


@pytest.mark.parametrize("model_name", MODELS)
@pytest.mark.parametrize("num_speculative_tokens", [1, 2, 3])
@pytest.mark.parametrize("enforce_eager", [True, False])
Collaborator

remove enforce_eager=True
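For context on this suggestion: pytest generates one test per element of the cartesian product of all `parametrize` lists, so dropping `True` from `enforce_eager` halves the grid. A small sketch with made-up placeholder lists (the single-entry `MODELS` is hypothetical, not the real model list):

```python
import itertools

# Hypothetical parameter grid mirroring the decorators quoted above.
MODELS = ["example-model"]          # placeholder, not the real MODELS list
num_speculative_tokens = [1, 2, 3]
enforce_eager_before = [True, False]
enforce_eager_after = [False]       # after applying the reviewer's suggestion

before = len(list(itertools.product(MODELS, num_speculative_tokens,
                                    enforce_eager_before)))
after = len(list(itertools.product(MODELS, num_speculative_tokens,
                                    enforce_eager_after)))
print(before, after)  # 6 3
```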

@pytest.mark.parametrize(
"cudagraph_mode",
[
CUDAGraphMode.NONE,
Collaborator

remove

[
CUDAGraphMode.NONE,
CUDAGraphMode.PIECEWISE,
CUDAGraphMode.FULL_DECODE_ONLY,
Collaborator

add it back once FULL_DECODE_ONLY works

@pytest.mark.parametrize("method", ["eagle", "eagle3"])
@pytest.mark.parametrize("disable_padded_drafter_batch", [True, False])
@pytest.mark.parametrize("async_scheduling", [True, False])
def test_offline_eagle_correctness(model_name: str,
Collaborator

test_<model_name>

test_deepseek_mtp_accuracy

@lilinsiman lilinsiman force-pushed the refactor_mtp_eagle branch 2 times, most recently from 80d4d99 to b38bde5 Compare December 25, 2025 11:24
@MengqingCao MengqingCao added the ready-for-test start test by label for PR label Dec 25, 2025
@github-actions
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@wjunLu
Collaborator

wjunLu commented Dec 26, 2025

There are some conflicts, please rebase


Signed-off-by: lilinsiman <lilinsiman@gmail.com>
@lilinsiman lilinsiman force-pushed the refactor_mtp_eagle branch 2 times, most recently from f6fce45 to bba8a82 Compare December 29, 2025 03:21

Signed-off-by: lilinsiman <lilinsiman@gmail.com>
@wangxiyuan wangxiyuan merged commit 46862ce into vllm-project:main Dec 31, 2025
14 checks passed
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Dec 31, 2025
…to FIA_rebase

* 'main' of https://github.com/vllm-project/vllm-ascend:
  [feature] mooncake support pcp/dcp in common conditions (vllm-project#5224)
  [Bugfix] Fix mm_merge (vllm-project#5249)
  [Main2Main] Upgrade vllm commit to 1230 (vllm-project#5495)
  [Feature] Refactor PCP &DCP related code (vllm-project#5214)
  [main][test] Refactor the mtp and eagle test case (vllm-project#5326)
  [smoke][bugfix] moe_init_routing_v2 active_expert_range use int type (vllm-project#5521)
  [2/N] Upgrade nightly doc (vllm-project#5534)
  [Doc] Add new contributors. (vllm-project#5537)
  [3/N][Nightly] Move ops tests to nightly (vllm-project#5538)
wangyibo1005 pushed a commit to wangyibo1005/vllm-ascend that referenced this pull request Dec 31, 2025
### What this PR does / why we need it?
1. Refactor the current test with mtp and eagle cases
2. Add new necessary cases with mtp and eagle

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
ut

- vLLM version: release/v0.13.0
- vLLM main:
vllm-project/vllm@5fbfa8d

---------

Signed-off-by: lilinsiman <lilinsiman@gmail.com>
Rozwel-dx pushed a commit to Rozwel-dx/vllm-ascend that referenced this pull request Jan 8, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026

Labels

ci/build, module:tests, ready (read for review), ready-for-test (start test by label for PR)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants