Add e2e smoke tests for the new datagen system by fynnsu · Pull Request #378 · vllm-project/speculators

fynnsu · 2026-04-02T20:24:38Z


Purpose

#353 adds new online and offline training data generation (datagen) built on vLLM's hidden-states extraction feature. This PR adds end-to-end (e2e) smoke tests for both workflows. It also fixes a bug in train.py that the tests uncovered.

Description

Add the two tests under:
tests/e2e/vllm/test_offline_training.py
tests/e2e/vllm/test_online_training.py

Add utilities for launching and stopping the vLLM server and for polling it until it is ready, plus a helper for running the prepare_data.py step.
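A minimal, standard-library sketch of what such server helpers could look like. The names `launch_server`, `wait_for_server` and `stop_server` are hypothetical stand-ins, not the actual `launch_vllm_server` / `wait_for_server` / `stop_vllm_server` in `tests/e2e/vllm/utils.py`. The `/health` default path assumes the vLLM OpenAI-compatible server answers readiness checks there.

```python
from __future__ import annotations

import http.client
import subprocess
import time
import urllib.request


def launch_server(cmd: list[str]) -> subprocess.Popen:
    """Start the server command in the background and return its process handle."""
    # Output is discarded for brevity; a real test helper would likely keep it for debugging.
    return subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)


def wait_for_server(
    base_url: str,
    path: str = "/health",  # assumed readiness endpoint of the vLLM server
    timeout: float = 600.0,
    interval: float = 1.0,
    proc: subprocess.Popen | None = None,
) -> bool:
    """Poll base_url + path until it answers HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # Give up early if the server process has already exited (bad args, OOM, ...).
        if proc is not None and proc.poll() is not None:
            return False
        try:
            with urllib.request.urlopen(base_url + path, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # URLError/HTTPError, refused connections and timeouts are OSError subclasses.
            pass
        time.sleep(interval)
    return False


def stop_server(proc: subprocess.Popen, grace: float = 30.0) -> None:
    """Ask the server to exit (terminate), escalating to kill if it hangs."""
    if proc.poll() is not None:
        return  # already exited
    proc.terminate()
    try:
        proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
```

Passing `proc` to `wait_for_server` lets a test fail quickly when the server crashes during startup instead of waiting out the full timeout.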

Update launch_vllm.py so that it launches vLLM with the same Python environment it is run from, and improve its argument handling.
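One way to get both behaviours, sketched under assumptions: the launcher parses only the flags it owns with `argparse.ArgumentParser.parse_known_args()` and forwards everything else unchanged, and it starts vLLM with `sys.executable` so the child runs in the caller's environment. The helper name `build_vllm_command` and the `--model`-only launcher interface are hypothetical; `vllm.entrypoints.openai.api_server` is vLLM's OpenAI-compatible server module.

```python
from __future__ import annotations

import argparse
import sys


def build_vllm_command(argv: list[str]) -> list[str]:
    """Split argv into launcher-owned args and pass-through vLLM args."""
    # allow_abbrev=False stops argparse from treating a forwarded flag as an
    # abbreviation of one of the launcher's own flags.
    parser = argparse.ArgumentParser(description="Launch a vLLM server (sketch)", allow_abbrev=False)
    parser.add_argument("--model", required=True)  # hypothetical launcher-owned flag
    known, passthrough = parser.parse_known_args(argv)
    return [
        sys.executable,  # same interpreter / virtualenv that runs this script
        "-m",
        "vllm.entrypoints.openai.api_server",
        "--model",
        known.model,
        *passthrough,  # unknown flags are forwarded in their original order
    ]
```

A script would then run something like `subprocess.call(build_vllm_command(sys.argv[1:]))`.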

Fix a bug in train.py regarding hidden states dtype handling.
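According to the CodeRabbit walkthrough, the fix propagates `hidden_states_dtype` to the dataset constructors in both the legacy and the Arrow code paths. The sketch below only illustrates that pattern. `LegacyHiddenStatesDataset`, `ArrowHiddenStatesDataset` and `build_dataset` are hypothetical stand-ins, and dtypes are plain strings to keep it dependency-free (the real code presumably uses torch dtypes).

```python
from __future__ import annotations

from dataclasses import dataclass

SUPPORTED_DTYPES = {"float32", "float16", "bfloat16"}


@dataclass
class LegacyHiddenStatesDataset:
    path: str
    hidden_states_dtype: str


@dataclass
class ArrowHiddenStatesDataset:
    path: str
    hidden_states_dtype: str


def build_dataset(path: str, use_arrow: bool, hidden_states_dtype: str):
    """Construct the dataset for either code path with the same dtype.

    Guards against a typical failure mode: one branch forgets the argument
    and silently falls back to its own default dtype.
    """
    if hidden_states_dtype not in SUPPORTED_DTYPES:
        raise ValueError(f"unsupported hidden_states_dtype: {hidden_states_dtype!r}")
    cls = ArrowHiddenStatesDataset if use_arrow else LegacyHiddenStatesDataset
    return cls(path=path, hidden_states_dtype=hidden_states_dtype)
```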

Related Issue

Tests

Run e2e:
https://github.com/neuralmagic/llm-compressor-testing/actions/runs/23920324667

I have filled in:

- [x] The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
- [x] The test plan/results, such as providing test command and pasting the results.
- [ ] (Optional) The necessary documentation update.
- [x] I (a human) have written or reviewed the code in this pr to the best of my ability.

Summary by CodeRabbit

Release Notes

Tests
- Added comprehensive end-to-end tests for offline and online training workflows to validate complete training pipelines.
Improvements
- Enhanced data preparation with improved hidden states dtype handling throughout the training pipeline.
- Refined vLLM server integration and management for more reliable model serving during training.

Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>

github-actions · 2026-04-02T20:25:21Z

📦 Build Artifacts Available
The build artifacts (`.whl` and `.tar.gz`) have been successfully generated and are available for download: https://github.com/vllm-project/speculators/actions/runs/24140149515/artifacts/6329351737.
They will be retained for up to 30 days.
Commit: 657fe92

mergify · 2026-04-02T20:25:22Z

The quality checks have failed. Please run make style and make quality under
the root directory to address the lint failures. You will need to install the
dev optional install to get the required linting packages:
https://github.com/vllm-project/speculators/blob/main/CONTRIBUTING.md


shanjiaz

Just curious, are we testing acceptance rate for these trained models?

fynnsu · 2026-04-06T14:34:23Z

> Just curious, are we testing acceptance rate for these trained models?

Not for these tests; they're just smoke tests. I'll add regression tests as a follow-up. They will be essentially the same, but will use more samples and train for longer.

shanjiaz

Looks good! Maybe we can make these tests configurable so it's easier to add regression tests in the future.
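A minimal sketch of that suggestion, assuming a hypothetical `E2E_PROFILE` environment variable and made-up sample/epoch counts: smoke and regression runs read their sizes from one config object, so a regression variant changes numbers rather than test code. The `Qwen/Qwen3-0.6B` Hugging Face id is assumed from the model named in the walkthrough diagrams.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TrainingTestConfig:
    model: str = "Qwen/Qwen3-0.6B"
    num_samples: int = 16  # illustrative smoke-test size
    epochs: int = 1
    check_acceptance: bool = False


SMOKE = TrainingTestConfig()
# Same pipeline, with more data, longer training and an acceptance-rate check.
REGRESSION = replace(SMOKE, num_samples=5000, epochs=3, check_acceptance=True)

PROFILES = {"smoke": SMOKE, "regression": REGRESSION}


def get_test_config(env: Mapping[str, str] | None = None) -> TrainingTestConfig:
    """Select a profile from the (hypothetical) E2E_PROFILE variable, defaulting to smoke."""
    env = os.environ if env is None else env
    name = env.get("E2E_PROFILE", "smoke")
    try:
        return PROFILES[name]
    except KeyError:
        raise ValueError(f"unknown E2E_PROFILE {name!r}; expected one of {sorted(PROFILES)}") from None
```

A pytest test could call `get_test_config()` at the top, or parametrize over `PROFILES.values()` and mark the regression profile as slow.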

dsikka

LGTM. Two comments

coderabbitai · 2026-04-08T14:15:14Z

📝 Walkthrough

This PR modifies the vLLM launcher to execute the Python module via the current interpreter, updates the training script to propagate hidden-states dtype to dataset constructors, and introduces comprehensive end-to-end tests for both offline and online training workflows with supporting server orchestration utilities.

Changes

Cohort / File(s)	Summary
Launch & Training Scripts `scripts/launch_vllm.py`, `scripts/train.py`	Updated argument parsing to use `parse_known_args()` and Python interpreter invocation; added `hidden_states_dtype` parameter propagation to dataset constructors in both legacy and arrow paths.
E2E Test Suite `tests/e2e/vllm/test_offline_training.py`, `tests/e2e/vllm/test_online_training.py`	Added two new end-to-end tests orchestrating complete offline and online training workflows, including vLLM server startup, data preparation, subprocess invocation of training scripts, and checkpoint validation.
Test Utilities `tests/e2e/vllm/utils.py`	Expanded with process lifecycle management functions (`launch_vllm_server`, `stop_vllm_server`, `wait_for_server`), data preparation helper, and configuration constants (`SCRIPTS_DIR`, `VLLM_PYTHON`) for vLLM e2e orchestration.

Sequence Diagram(s)

sequenceDiagram
    participant Test as test_offline_training
    participant Server as vLLM Server
    participant DataGen as data_generation_offline2.py
    participant Trainer as train.py
    participant Engine as run_vllm_engine

    Test->>Server: launch_vllm_server(Qwen3-0.6B)
    activate Server
    Test->>Test: prepare_data()
    Test->>DataGen: subprocess.run with HTTP endpoint
    DataGen->>Server: POST requests for hidden states
    Server-->>DataGen: hidden state responses
    DataGen-->>Test: return (exit code 0)
    Test->>Server: stop_vllm_server()
    deactivate Server
    Test->>Trainer: subprocess.run with prepared data
    Trainer-->>Test: return (exit code 0, checkpoint saved)
    Test->>Engine: run_vllm_engine(checkpoint/0)
    Engine-->>Test: validation results

sequenceDiagram
    participant Test as test_online_training
    participant Server as vLLM Server
    participant Trainer as train.py
    participant Engine as run_vllm_engine

    Test->>Server: launch_vllm_server(Qwen3-0.6B)
    activate Server
    Test->>Test: prepare_data()
    Test->>Trainer: subprocess.run with live endpoint
    activate Trainer
    Trainer->>Server: query hidden states during training
    Server-->>Trainer: streaming responses
    Trainer-->>Test: return (exit code 0, checkpoint saved)
    deactivate Trainer
    Test->>Server: stop_vllm_server()
    deactivate Server
    Test->>Engine: run_vllm_engine(checkpoint/0)
    Engine-->>Test: validation results
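The two diagrams differ mainly in when the server is stopped. The offline test stops it before train.py runs; the online test keeps it up for the whole training run. A context manager is one way to guarantee the stop step runs even if a step fails. In this sketch the launch/stop callables are parameters so it runs without vLLM; `running_server`, the workflow functions and the step names are hypothetical.

```python
from __future__ import annotations

from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def running_server(launch: Callable[[], object], stop: Callable[[object], None]) -> Iterator[object]:
    handle = launch()
    try:
        yield handle
    finally:
        stop(handle)  # runs on success and when a step raises


def offline_workflow(launch, stop, log: list[str]) -> None:
    with running_server(launch, stop):
        log.append("prepare_data")
        log.append("datagen")  # datagen requests hidden states from the live server
    log.append("train")  # server already stopped; train from the saved hidden states
    log.append("validate")


def online_workflow(launch, stop, log: list[str]) -> None:
    with running_server(launch, stop):
        log.append("prepare_data")
        log.append("train")  # trainer queries hidden states while the server is up
    log.append("validate")
```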

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A hop through servers and tests so grand,
With hidden states in hand,
We launch and train, both near and far,
While vLLM shines like a data star! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main objective of the PR: adding end-to-end smoke tests for the new datagen system.

coderabbitai

🧹 Nitpick comments (1)

tests/e2e/vllm/utils.py (1)

147-147: Minor: VLLM_PYTHON shadows module-level constant.

Line 147 redefines VLLM_PYTHON locally, shadowing the module-level constant defined at line 24. Consider reusing the module-level constant for consistency.

♻️ Suggested fix

 def run_vllm_engine(
     model_path: str,
     tmp_path: Path,
     prompts: list[list[dict[str, str]]],
     disable_compile_cache: bool = False,
     max_tokens: int = 50,
     ignore_eos: bool = True,
     acceptance_thresholds: Iterable[float] | None = None,
 ):
-    VLLM_PYTHON = os.environ.get("VLLM_PYTHON", sys.executable)
     logger.info("vLLM Python executable: {}", VLLM_PYTHON)
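Applied, the suggestion amounts to resolving the interpreter once at module level and reading that constant inside functions instead of re-assigning a same-named local. A minimal sketch; `vllm_python_command` is a hypothetical helper, not the real `run_vllm_engine`:

```python
import os
import sys

# Module-level constant: the interpreter used for vLLM subprocesses.
# Overridable via the VLLM_PYTHON environment variable, read once at import time.
VLLM_PYTHON = os.environ.get("VLLM_PYTHON", sys.executable)


def vllm_python_command(*args: str) -> list[str]:
    # No local `VLLM_PYTHON = ...` here, so every caller sees the same value.
    return [VLLM_PYTHON, *args]
```

One consequence of this design: because the variable is read at import time, changing `os.environ["VLLM_PYTHON"]` after the module is imported has no effect.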

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/vllm/utils.py` at line 147, A local assignment redefines
VLLM_PYTHON and shadows the module-level constant; remove the local `VLLM_PYTHON
= os.environ.get("VLLM_PYTHON", sys.executable)` and use the existing
module-level `VLLM_PYTHON` constant instead (or, if a different value is
required, rename the local variable to something like `vllm_python_override`),
ensuring any code that referenced the local name now references the module-level
`VLLM_PYTHON` constant.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f31d5945-e839-4614-837c-d3a1c4a44871

📥 Commits

Reviewing files that changed from the base of the PR and between e2abceb and 657fe92.

📒 Files selected for processing (5)

scripts/launch_vllm.py
scripts/train.py
tests/e2e/vllm/test_offline_training.py
tests/e2e/vllm/test_online_training.py
tests/e2e/vllm/utils.py

Purpose

Currently, we don't store the layer IDs in the Eagle3 model configs; instead, we just match the defaults that vLLM uses. We would like to set these explicitly, which will also allow users to use custom layers.

Description

Update launch.py and train.py with a `--target-layer-ids` arg. Explicitly add layer IDs to the Eagle3 config, even when they are automatically inferred from num_hidden_layers. Add warnings reminding users that custom layer IDs must be passed to both scripts.

Related Issue

Tests

~~WIP. I need to test that this still loads into vLLM correctly. I also want to merge #378 first, because it fixes an issue with `launch_vllm.py` arg processing.~~ Tested on the merge commit between this PR and #378. Works as expected.

I have filled in:

- [x] The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
- [x] The test plan/results, such as providing test command and pasting the results.
- [ ] (Optional) The necessary documentation update.
- [x] I (a human) have written or reviewed the code in this pr to the best of my ability.

Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
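The explicit layer-ID handling described in that PR could be sketched as follows. This assumes the default EAGLE-3 auxiliary layers for a target with `n` hidden layers are `(2, n // 2, n - 3)`, the default choice in vLLM's EAGLE-3 support (verify against the vLLM version you run). `resolve_target_layer_ids` and the warning text are hypothetical.

```python
from __future__ import annotations

import warnings


def resolve_target_layer_ids(num_hidden_layers: int, target_layer_ids=None) -> list[int]:
    """Return the layer IDs to store explicitly in the Eagle3 config."""
    if target_layer_ids is None:
        # Assumed default: an early, a middle and a late layer of the target model.
        n = num_hidden_layers
        return [2, n // 2, n - 3]
    ids = [int(i) for i in target_layer_ids]
    bad = [i for i in ids if not 0 <= i < num_hidden_layers]
    if bad:
        raise ValueError(f"layer ids {bad} out of range for {num_hidden_layers} layers")
    # Custom IDs only work if the serving side extracts the same layers.
    warnings.warn(
        "Custom target layer ids must be passed to both the vLLM launch script and train.py",
        UserWarning,
        stacklevel=2,
    )
    return ids
```

Writing the resolved list into the config even in the default case means a checkpoint no longer depends on the default staying the same across vLLM versions.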

Purpose

vllm-project#353 adds new online/offline training datagen using vLLM's hidden states extraction feature. This PR adds e2e smoke tests for both workflows. It also fixes an issue in `train.py` that the tests found.

Description

Add the two tests under:
`tests/e2e/vllm/test_offline_training.py`
`tests/e2e/vllm/test_online_training.py`

Add utils for launching and stopping the vLLM server and polling it to see when it's ready, plus a util fn for running the prepare_data.py step.

Update `launch_vllm.py` so that it uses the Python env it is run with, and improve its handling of args.

Fix a bug in `train.py` regarding hidden states dtype handling.

Related Issue

Tests

Run e2e: https://github.com/neuralmagic/llm-compressor-testing/actions/runs/23920324667

I have filled in:

- [x] The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
- [x] The test plan/results, such as providing test command and pasting the results.
- [ ] (Optional) The necessary documentation update.
- [x] I (a human) have written or reviewed the code in this pr to the best of my ability.

Summary by CodeRabbit

Release Notes

Tests
- Added comprehensive end-to-end tests for offline and online training workflows to validate complete training pipelines.
Improvements
- Enhanced data preparation with improved hidden states dtype handling throughout the training pipeline.
- Refined vLLM server integration and management for more reliable model serving during training.

Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>

fynnsu added 5 commits April 2, 2026 19:07

Update launch_vllm.py to use calling python env

df8499c

Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>

Add e2e online training smoke test

34516cc

Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>

Fix hidden states dtype issue

c5209cb

Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>

Refactor to move logic from test_online_training into utils

057efbd

Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>

Add e2e offline training smoke test

4ca1580

Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>

mergify Bot added the quality-failed label Apr 2, 2026

Format

7d172ad

Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>

mergify Bot removed the quality-failed label Apr 3, 2026

fynnsu mentioned this pull request Apr 3, 2026

Add explicit target-layer-ids handling #379

Merged

4 tasks

shanjiaz reviewed Apr 6, 2026

shanjiaz approved these changes Apr 6, 2026

dsikka reviewed Apr 7, 2026

Comment thread tests/e2e/vllm/test_offline_training.py

Comment thread tests/e2e/vllm/test_online_training.py

fynnsu requested a review from dsikka April 7, 2026 18:51

dsikka approved these changes Apr 8, 2026

Merge branch 'main' into e2e_new_datagen

657fe92

coderabbitai Bot mentioned this pull request Apr 8, 2026

Docs: Add depth to online training workflow documentation #383

Closed

coderabbitai Bot reviewed Apr 8, 2026

fynnsu merged commit f84ff30 into main Apr 8, 2026
15 checks passed

fynnsu deleted the e2e_new_datagen branch April 8, 2026 14:32

This was referenced Apr 8, 2026

[Testings] Reorganize regression and smoke testing #386

Merged

Expand E2E testing #388

Merged

coderabbitai Bot mentioned this pull request Apr 21, 2026

fully deprecate old data generation system #433

Merged

4 tasks

fynnsu merged 7 commits into main from e2e_new_datagen

3 participants
