
[Test][LoRA] Add e2e test for base model inference#6624

Merged
paulyu12 merged 1 commit into vllm-project:main from paulyu12:main
Feb 9, 2026

Conversation

@paulyu12
Collaborator

@paulyu12 paulyu12 commented Feb 9, 2026

What this PR does / why we need it?

This PR adds an end-to-end test case to verify the correctness of base model inference when LoRA is enabled. This is to ensure that after a LoRA base model request issue was fixed, the functionality remains correct and does not regress. The new test case calls do_sample with lora_id=0 to target the base model and asserts the output against expected SQL queries.
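The assertion pattern described above can be sketched as follows. This is a hedged, self-contained illustration: the names `do_sample`, `generate_and_test`, and `EXPECTED_BASE_MODEL_OUTPUT` follow the PR description, but the real test in `tests/e2e/singlecard/test_llama32_lora.py` drives an actual vLLM engine rather than the stand-in used here.

```python
# Minimal sketch of the assertion this PR adds. Helper names follow the
# PR description; FakeLLM is a stand-in so the sketch runs anywhere.

EXPECTED_BASE_MODEL_OUTPUT = [
    "SELECT count(*) FROM singer",  # illustrative expected SQL string
]

class FakeLLM:
    """Stand-in for the real vLLM engine."""

    def generate(self, lora_id: int) -> list[str]:
        if lora_id == 0:
            # lora_id=0 selects the base model: no adapter weights applied.
            return ["SELECT count(*) FROM singer"]
        return [f"-- adapter {lora_id} output"]

def do_sample(llm: FakeLLM, lora_id: int) -> list[str]:
    """Generate outputs; lora_id=0 targets the base model."""
    return llm.generate(lora_id)

def generate_and_test(llm: FakeLLM) -> None:
    # The new check: with LoRA enabled but lora_id=0, the output must
    # still match the base model's expected SQL queries.
    assert do_sample(llm, lora_id=0) == EXPECTED_BASE_MODEL_OUTPUT

generate_and_test(FakeLLM())
```

The point of testing `lora_id=0` alongside the adapter IDs is regression coverage: the base-model path shares batching and weight-application code with the LoRA path, so a LoRA fix can silently corrupt base-model output.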

Does this PR introduce any user-facing change?

No.

How was this patch tested?

CI passed with the new test case. The test can be run with:

```bash
pytest -sv tests/e2e/singlecard/test_llama32_lora.py
```

Signed-off-by: paulyu12 <507435917@qq.com>
@paulyu12 paulyu12 requested a review from wangxiyuan as a code owner February 9, 2026 03:01
@gemini-code-assist
Contributor

Summary of Changes

Hello @paulyu12, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the end-to-end testing suite for the Llama 3.2 LoRA integration. Following a fix for the LoRA base model request issue, this change introduces a dedicated test case to verify the continued correctness and expected behavior of the base model when LoRA functionality is present but not actively applied. This ensures the stability of the core model's output generation.

Highlights

  • New Expected Output Definition: A new constant, EXPECTED_BASE_MODEL_OUTPUT, has been added to test_llama32_lora.py. This list contains expected SQL query strings that the base model should generate.
  • Base Model Test Case: A new assertion has been introduced within the generate_and_test function to validate the output of the base model. This test ensures that when LoRA is enabled but lora_id is set to 0 (indicating the base model), the model produces the EXPECTED_BASE_MODEL_OUTPUT.


Changelog
  • tests/e2e/singlecard/test_llama32_lora.py
    • Added EXPECTED_BASE_MODEL_OUTPUT list containing expected SQL query strings.
    • Introduced a new assertion in the generate_and_test function to test the base model's output when lora_id is 0, comparing it against EXPECTED_BASE_MODEL_OUTPUT.
Activity
  • The pull request was created by paulyu12 to add a test case for the base model when LoRA is enabled, following a previous fix for a LoRA base model request issue.
  • The changes were tested using pytest -sv test_llama32_lora.py.
  • No other review comments or activity have been recorded for this pull request based on the provided context.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds an end-to-end test case to verify base model inference when LoRA is enabled. The changes are correct and improve test coverage for LoRA functionality.

Following the repository's style guide, here are suggestions for the pull request title and summary:

Suggested PR Title:

[Test][LoRA] Add e2e test for base model inference

Suggested PR Summary:

### What this PR does / why we need it?

This PR adds an end-to-end test case to verify the correctness of base model inference when LoRA is enabled. This is to ensure that after a LoRA base model request issue was fixed, the functionality remains correct and does not regress. The new test case calls `do_sample` with `lora_id=0` to target the base model and asserts the output against expected SQL queries.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

CI passed with the new test case. The test can be run with:
```bash
pytest -sv tests/e2e/singlecard/test_llama32_lora.py
```

@paulyu12 paulyu12 changed the title from "[e2e][LoRA] Add testcase of base model when LoRA enabled" to "[e2e][LoRA] Add a testcase of base model when LoRA enabled" Feb 9, 2026
@github-actions
Contributor

github-actions bot commented Feb 9, 2026

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@paulyu12 paulyu12 changed the title from "[e2e][LoRA] Add a testcase of base model when LoRA enabled" to "[Test][LoRA] Add e2e test for base model inference" Feb 9, 2026
@paulyu12 paulyu12 added the ready (read for review) and ready-for-test (start test by label for PR) labels Feb 9, 2026
```python
    lora_id=2,
) == EXPECTED_LORA_OUTPUT)

print("base model")
```
Collaborator


The print here can be removed.

Collaborator Author


This print line refers to L97 and L105 in this file, so I'd like to preserve it.

@paulyu12 paulyu12 merged commit 8d44dda into vllm-project:main Feb 9, 2026
56 of 57 checks passed
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Feb 11, 2026
…to qwen3next_rebase

* 'main' of https://github.com/vllm-project/vllm-ascend:
  [Feat] 310p support MoE W8A8 quantizaition (vllm-project#6641)
  [TEST]add a qwen3-30b acc case with mooncake mempool (vllm-project#6244)
  [MOE Refactor] Remove QuantType in prepare_finalize.py (vllm-project#6534)
  [EPLB] Avoiding eplb's dependency on a specified model (vllm-project#6528)
  [Doc][Misc] Restructure tutorial documentation (vllm-project#6501)
  implement batch invariant with ascendc (vllm-project#6590)
  [Refact]Refact MLA/SFA weight prefetch to consist with moe weight prefetch (vllm-project#6629)
  [Misc] upgrade to vllm main (vllm-project#6646)
  [main][Docs] Fix spelling errors across documentation (vllm-project#6649)
  [bugfix]Fix no attribute 'data' when MLAPO is enable  (vllm-project#6601)
  [DOC]Add Memcache Usage Guide (vllm-project#6476)
  [main][bugfix] Fix spec acceptance rate problem in vllm_0.15.0 (vllm-project#6606)
  [Test][LoRA] Add e2e test for base model inference (vllm-project#6624)
  [refactor]Optimized the kvcache usage of Deepseek v3.2 (vllm-project#6610)
  [Feat](sfa,dcp) support dcp for sfa (vllm-project#6563)
  [BugFix] Add support for rotary_dim parameter when using partial rope in rotary_embedding (vllm-project#6581)
  [fix bug] fix tensor mismatch bug in sigmoid operate test case (vllm-project#6619)
  [Kernel]: Optimize DispatchFFNCombine performance (vllm-project#6468)
  [MISC] Clean up useless env USE_OPTIMIZED_MODEL (vllm-project#6618)
chenchuw886 pushed a commit to chenchuw886/vllm-ascend that referenced this pull request Feb 12, 2026
@zhangxinyuehfad
Collaborator

After adapting to the vLLM update, the LoRA e2e tests report accuracy errors, which are currently skipped as a workaround. The failure is shown here: https://github.com/vllm-project/vllm-ascend/actions/runs/22377770119/job/64771873534
Can you fix this accuracy issue?

@paulyu12
Collaborator Author

> After adapting to the vLLM update, the LoRA e2e tests report accuracy errors, which are currently skipped as a workaround. The failure is shown here: https://github.com/vllm-project/vllm-ascend/actions/runs/22377770119/job/64771873534 Can you fix this accuracy issue?

How to reproduce this?

@zhangxinyuehfad
Collaborator

> After adapting to the vLLM update, the LoRA e2e tests report accuracy errors, which are currently skipped as a workaround. The failure is shown here: https://github.com/vllm-project/vllm-ascend/actions/runs/22377770119/job/64771873534 Can you fix this accuracy issue?
>
> How to reproduce this?

You can use vllm version (hash:83b47f67b1dfad505606070ae4d9f83e50ad4ebd) and vllm-ascend main to reproduce it.

ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
@paulyu12
Collaborator Author

paulyu12 commented Mar 3, 2026

> After adapting to the vLLM update, the LoRA e2e tests report accuracy errors, which are currently skipped as a workaround. The failure is shown here: https://github.com/vllm-project/vllm-ascend/actions/runs/22377770119/job/64771873534 Can you fix this accuracy issue?
>
> How to reproduce this?
>
> You can use vllm version (hash:83b47f67b1dfad505606070ae4d9f83e50ad4ebd) and vllm-ascend main to reproduce it.

I'm trying to fix it with a workaround in #6958.

ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026

Labels

module:tests, ready (read for review), ready-for-test (start test by label for PR)


3 participants