
[Test][LoRA] Add e2e test for base model inference#6624

Merged
paulyu12 merged 1 commit into vllm-project:main from paulyu12:main
Feb 9, 2026

Conversation

@paulyu12
Collaborator

@paulyu12 paulyu12 commented Feb 9, 2026

What this PR does / why we need it?

This PR adds an end-to-end test case to verify the correctness of base model inference when LoRA is enabled. This is to ensure that after a LoRA base model request issue was fixed, the functionality remains correct and does not regress. The new test case calls do_sample with lora_id=0 to target the base model and asserts the output against expected SQL queries.
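The assertion pattern described above can be sketched as follows. This is a hedged, self-contained illustration: the names `do_sample`, `generate_and_test`, and `EXPECTED_BASE_MODEL_OUTPUT` follow the PR description, but the real test in `tests/e2e/singlecard/test_llama32_lora.py` drives an actual vLLM engine rather than the stand-in used here.

```python
# Minimal sketch of the assertion this PR adds. Helper names follow the
# PR description; FakeLLM is a stand-in so the sketch runs anywhere.

EXPECTED_BASE_MODEL_OUTPUT = [
    "SELECT count(*) FROM singer",  # illustrative expected SQL string
]

class FakeLLM:
    """Stand-in for the real vLLM engine."""

    def generate(self, lora_id: int) -> list[str]:
        if lora_id == 0:
            # lora_id=0 selects the base model: no adapter weights applied.
            return ["SELECT count(*) FROM singer"]
        return [f"-- adapter {lora_id} output"]

def do_sample(llm: FakeLLM, lora_id: int) -> list[str]:
    """Generate outputs; lora_id=0 targets the base model."""
    return llm.generate(lora_id)

def generate_and_test(llm: FakeLLM) -> None:
    # The new check: with LoRA enabled but lora_id=0, the output must
    # still match the base model's expected SQL queries.
    assert do_sample(llm, lora_id=0) == EXPECTED_BASE_MODEL_OUTPUT

generate_and_test(FakeLLM())
```

The point of testing `lora_id=0` alongside the adapter IDs is regression coverage: the base-model path shares batching and weight-application code with the LoRA path, so a LoRA fix can silently corrupt base-model output.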

Does this PR introduce any user-facing change?

No.

How was this patch tested?

CI passed with the new test case. The test can be run with:

```bash
pytest -sv tests/e2e/singlecard/test_llama32_lora.py
```

Signed-off-by: paulyu12 <507435917@qq.com>
@paulyu12 paulyu12 requested a review from wangxiyuan as a code owner February 9, 2026 03:01
@gemini-code-assist
Contributor

Summary of Changes

Hello @paulyu12, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the end-to-end testing suite for the Llama 3.2 LoRA integration. Following a fix for the LoRA base model request issue, this change introduces a dedicated test case to verify the continued correctness and expected behavior of the base model when LoRA functionality is present but not actively applied. This ensures the stability of the core model's output generation.

Highlights

  • New Expected Output Definition: A new constant, EXPECTED_BASE_MODEL_OUTPUT, has been added to test_llama32_lora.py. This list contains expected SQL query strings that the base model should generate.
  • Base Model Test Case: A new assertion has been introduced within the generate_and_test function to validate the output of the base model. This test ensures that when LoRA is enabled but lora_id is set to 0 (indicating the base model), the model produces the EXPECTED_BASE_MODEL_OUTPUT.


Changelog
  • tests/e2e/singlecard/test_llama32_lora.py
    • Added EXPECTED_BASE_MODEL_OUTPUT list containing expected SQL query strings.
    • Introduced a new assertion in the generate_and_test function to test the base model's output when lora_id is 0, comparing it against EXPECTED_BASE_MODEL_OUTPUT.
Activity
  • The pull request was created by paulyu12 to add a test case for the base model when LoRA is enabled, following a previous fix for a LoRA base model request issue.
  • The changes were tested using pytest -sv test_llama32_lora.py.
  • No other review comments or activity have been recorded for this pull request based on the provided context.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds an end-to-end test case to verify base model inference when LoRA is enabled. The changes are correct and improve test coverage for LoRA functionality.

Following the repository's style guide, here are suggestions for the pull request title and summary:

Suggested PR Title:

[Test][LoRA] Add e2e test for base model inference

Suggested PR Summary:

### What this PR does / why we need it?

This PR adds an end-to-end test case to verify the correctness of base model inference when LoRA is enabled. This is to ensure that after a LoRA base model request issue was fixed, the functionality remains correct and does not regress. The new test case calls `do_sample` with `lora_id=0` to target the base model and asserts the output against expected SQL queries.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

CI passed with the new test case. The test can be run with:
```bash
pytest -sv tests/e2e/singlecard/test_llama32_lora.py
```

@paulyu12 paulyu12 changed the title from "[e2e][LoRA] Add testcase of base model when LoRA enabled" to "[e2e][LoRA] Add a testcase of base model when LoRA enabled" Feb 9, 2026
@github-actions
Contributor

github-actions bot commented Feb 9, 2026

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@paulyu12 paulyu12 changed the title from "[e2e][LoRA] Add a testcase of base model when LoRA enabled" to "[Test][LoRA] Add e2e test for base model inference" Feb 9, 2026
@paulyu12 paulyu12 added the ready (read for review) and ready-for-test (start test by label for PR) labels Feb 9, 2026
```python
    lora_id=2,
) == EXPECTED_LORA_OUTPUT)

print("base model")
```
Collaborator


The print here can be removed.

Collaborator Author


This print line refers to L97 and L105 in this file, so I'd like to preserve it.

@paulyu12 paulyu12 merged commit 8d44dda into vllm-project:main Feb 9, 2026
56 of 57 checks passed
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Feb 11, 2026
…to qwen3next_rebase

* 'main' of https://github.com/vllm-project/vllm-ascend:
  [Feat] 310p support MoE W8A8 quantizaition (vllm-project#6641)
  [TEST]add a qwen3-30b acc case with mooncake mempool (vllm-project#6244)
  [MOE Refactor] Remove QuantType in prepare_finalize.py (vllm-project#6534)
  [EPLB] Avoiding eplb's dependency on a specified model (vllm-project#6528)
  [Doc][Misc] Restructure tutorial documentation (vllm-project#6501)
  implement batch invariant with ascendc (vllm-project#6590)
  [Refact]Refact MLA/SFA weight prefetch to consist with moe weight prefetch (vllm-project#6629)
  [Misc] upgrade to vllm main (vllm-project#6646)
  [main][Docs] Fix spelling errors across documentation (vllm-project#6649)
  [bugfix]Fix no attribute 'data' when MLAPO is enable  (vllm-project#6601)
  [DOC]Add Memcache Usage Guide (vllm-project#6476)
  [main][bugfix] Fix spec acceptance rate problem in vllm_0.15.0 (vllm-project#6606)
  [Test][LoRA] Add e2e test for base model inference (vllm-project#6624)
  [refactor]Optimized the kvcache usage of Deepseek v3.2 (vllm-project#6610)
  [Feat](sfa,dcp) support dcp for sfa (vllm-project#6563)
  [BugFix] Add support for rotary_dim parameter when using partial rope in rotary_embedding (vllm-project#6581)
  [fix bug] fix tensor mismatch bug in sigmoid operate test case (vllm-project#6619)
  [Kernel]: Optimize DispatchFFNCombine performance (vllm-project#6468)
  [MISC] Clean up useless env USE_OPTIMIZED_MODEL (vllm-project#6618)
chenchuw886 pushed a commit to chenchuw886/vllm-ascend that referenced this pull request Feb 12, 2026
@zhangxinyuehfad
Collaborator

After adapting to the vLLM update, the LoRA e2e tests report accuracy errors, which are currently skipped as a workaround. The failure is shown here: https://github.com/vllm-project/vllm-ascend/actions/runs/22377770119/job/64771873534
Can you fix this accuracy issue?

@paulyu12
Collaborator Author

> After adapting to the vLLM update, the LoRA e2e tests report accuracy errors, which are currently skipped as a workaround. The failure is shown here: https://github.com/vllm-project/vllm-ascend/actions/runs/22377770119/job/64771873534 Can you fix this accuracy issue?

How to reproduce this?

@zhangxinyuehfad
Collaborator

> After adapting to the vLLM update, the LoRA e2e tests report accuracy errors, which are currently skipped as a workaround. The failure is shown here: https://github.com/vllm-project/vllm-ascend/actions/runs/22377770119/job/64771873534 Can you fix this accuracy issue?
>
> How to reproduce this?

You can use vllm version (hash:83b47f67b1dfad505606070ae4d9f83e50ad4ebd) and vllm-ascend main to reproduce it.

ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
@paulyu12
Collaborator Author

paulyu12 commented Mar 3, 2026

> After adapting to the vLLM update, the LoRA e2e tests report accuracy errors, which are currently skipped as a workaround. The failure is shown here: https://github.com/vllm-project/vllm-ascend/actions/runs/22377770119/job/64771873534 Can you fix this accuracy issue?
>
> How to reproduce this?
>
> You can use vllm version (hash:83b47f67b1dfad505606070ae4d9f83e50ad4ebd) and vllm-ascend main to reproduce it.

I'm trying to fix it with a workaround in #6958.

ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026

Labels

module:tests, ready (read for review), ready-for-test (start test by label for PR)


3 participants