
Add e2e smoke tests for the new datagen system #378

Merged: fynnsu merged 7 commits into main from e2e_new_datagen on Apr 8, 2026

Collaborator

@fynnsu fynnsu commented Apr 2, 2026


Purpose

#353 adds new online/offline training datagen using vLLM's hidden states extraction feature. This PR adds e2e smoke tests for both workflows and fixes an issue in train.py that the tests uncovered.

Description

Add the two tests under:
tests/e2e/vllm/test_offline_training.py
tests/e2e/vllm/test_online_training.py

Add utils for launching and stopping the vLLM server and for polling it until it is ready. Also add a util function for running the prepare_data.py step.
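As an illustration of the readiness polling described above, here is a minimal sketch using only the standard library. The function signature, the default timeouts, and the choice of a `/health` endpoint are assumptions made for this example; the actual helper in tests/e2e/vllm/utils.py may differ.

```python
import time
import urllib.request


def wait_for_server(base_url: str, timeout_s: float = 120.0, interval_s: float = 1.0) -> bool:
    """Poll `<base_url>/health` until it returns HTTP 200 or the timeout expires.

    Hypothetical signature: the endpoint path and defaults are assumptions.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Covers connection refused, timeouts, and HTTP error responses
            # (urllib.error.URLError is an OSError subclass): not ready yet.
            pass
        time.sleep(interval_s)
    return False
```

Returning a boolean rather than raising lets the caller decide how to fail, for example by asserting on the result so that a server that never comes up surfaces as a test failure.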

Update launch_vllm.py so that it uses the python env it is run with, and improve its handling of args.
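A rough sketch of what this launcher change could look like follows. It is not the actual contents of launch_vllm.py: the function name `build_vllm_command`, the `--model` and `--port` flags, and the forwarded example flag are assumptions for illustration. The two ideas it shows are invoking vLLM through `sys.executable` so the server runs in the same Python environment as the launcher, and using `parse_known_args()` so unrecognized flags are forwarded instead of rejected.

```python
import argparse
import sys


def build_vllm_command(argv: list[str]) -> list[str]:
    """Build a vLLM server command line (hypothetical helper for illustration)."""
    parser = argparse.ArgumentParser(description="Launch a vLLM server (sketch)")
    parser.add_argument("--model", default="Qwen3-0.6B")
    parser.add_argument("--port", type=int, default=8000)
    # parse_known_args returns (namespace, leftovers) rather than exiting
    # with an error when it meets a flag the parser does not define.
    args, passthrough = parser.parse_known_args(argv)
    return [
        sys.executable,  # same interpreter (and virtualenv) as this script
        "-m",
        "vllm.entrypoints.openai.api_server",
        "--model",
        args.model,
        "--port",
        str(args.port),
        *passthrough,  # unknown flags are handed to vLLM unchanged
    ]


if __name__ == "__main__":
    print(build_vllm_command(sys.argv[1:]))
```

Using `sys.executable` instead of a bare `python` or `vllm` on `PATH` avoids accidentally starting the server from a different environment than the one the tests were installed into.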

Fix a bug in train.py regarding hidden states dtype handling.

Related Issue

Tests

Run e2e:
https://github.com/neuralmagic/llm-compressor-testing/actions/runs/23920324667

I have filled in:

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan/results, such as providing test command and pasting the results.
  • (Optional) The necessary documentation update.
  • I (a human) have written or reviewed the code in this pr to the best of my ability.

Summary by CodeRabbit

Release Notes

  • Tests

    • Added comprehensive end-to-end tests for offline and online training workflows to validate complete training pipelines.
  • Improvements

    • Enhanced data preparation with improved hidden states dtype handling throughout the training pipeline.
    • Refined vLLM server integration and management for more reliable model serving during training.

fynnsu added 5 commits April 2, 2026 19:07
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
@github-actions

github-actions Bot commented Apr 2, 2026

📦 Build Artifacts Available
The build artifacts (`.whl` and `.tar.gz`) have been successfully generated and are available for download: https://github.com/vllm-project/speculators/actions/runs/24140149515/artifacts/6329351737.
They will be retained for up to 30 days.
Commit: 657fe92

@mergify

mergify Bot commented Apr 2, 2026

The quality checks have failed. Please run make style and make quality under
the root directory to address the lint failures. You will need to install the
dev optional install to get the required linting packages:
https://github.com/vllm-project/speculators/blob/main/CONTRIBUTING.md

Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
@mergify mergify Bot removed the quality-failed label Apr 3, 2026
@fynnsu fynnsu mentioned this pull request Apr 3, 2026
4 tasks
Collaborator

@shanjiaz shanjiaz left a comment


Just curious, are we testing acceptance rate for these trained models?

@fynnsu
Collaborator Author

fynnsu commented Apr 6, 2026

Just curious, are we testing acceptance rate for these trained models?

Not for these tests; they're just smoke tests. I will add regression tests as a follow-up, which will basically be the same but use more samples and run for longer.

Collaborator

@shanjiaz shanjiaz left a comment


Looks good! Maybe we can make these tests configurable so it's easier to add regression tests in the future.

Collaborator

@dsikka dsikka left a comment


LGTM. Two comments

Comment thread tests/e2e/vllm/test_offline_training.py
Comment thread tests/e2e/vllm/test_online_training.py
@fynnsu fynnsu requested a review from dsikka April 7, 2026 18:51
@coderabbitai
Contributor

coderabbitai Bot commented Apr 8, 2026

📝 Walkthrough


This PR modifies the vLLM launcher to execute the Python module via the current interpreter, updates the training script to propagate hidden-states dtype to dataset constructors, and introduces comprehensive end-to-end tests for both offline and online training workflows with supporting server orchestration utilities.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Launch & Training Scripts: scripts/launch_vllm.py, scripts/train.py | Updated argument parsing to use parse_known_args() and Python interpreter invocation; added hidden_states_dtype parameter propagation to dataset constructors in both legacy and arrow paths. |
| E2E Test Suite: tests/e2e/vllm/test_offline_training.py, tests/e2e/vllm/test_online_training.py | Added two new end-to-end tests orchestrating complete offline and online training workflows, including vLLM server startup, data preparation, subprocess invocation of training scripts, and checkpoint validation. |
| Test Utilities: tests/e2e/vllm/utils.py | Expanded with process lifecycle management functions (launch_vllm_server, stop_vllm_server, wait_for_server), data preparation helper, and configuration constants (SCRIPTS_DIR, VLLM_PYTHON) for vLLM e2e orchestration. |

Sequence Diagram(s)

sequenceDiagram
    participant Test as test_offline_training
    participant Server as vLLM Server
    participant DataGen as data_generation_offline2.py
    participant Trainer as train.py
    participant Engine as run_vllm_engine

    Test->>Server: launch_vllm_server(Qwen3-0.6B)
    activate Server
    Test->>Test: prepare_data()
    Test->>DataGen: subprocess.run with HTTP endpoint
    DataGen->>Server: POST requests for hidden states
    Server-->>DataGen: hidden state responses
    DataGen-->>Test: return (exit code 0)
    Test->>Server: stop_vllm_server()
    deactivate Server
    Test->>Trainer: subprocess.run with prepared data
    Trainer-->>Test: return (exit code 0, checkpoint saved)
    Test->>Engine: run_vllm_engine(checkpoint/0)
    Engine-->>Test: validation results
sequenceDiagram
    participant Test as test_online_training
    participant Server as vLLM Server
    participant Trainer as train.py
    participant Engine as run_vllm_engine

    Test->>Server: launch_vllm_server(Qwen3-0.6B)
    activate Server
    Test->>Test: prepare_data()
    Test->>Trainer: subprocess.run with live endpoint
    activate Trainer
    Trainer->>Server: query hidden states during training
    Server-->>Trainer: streaming responses
    Trainer-->>Test: return (exit code 0, checkpoint saved)
    deactivate Trainer
    Test->>Server: stop_vllm_server()
    deactivate Server
    Test->>Engine: run_vllm_engine(checkpoint/0)
    Engine-->>Test: validation results
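Both diagrams start the server, do some work against it, and then stop it. One way to make sure the server is stopped even when a step in between fails is a context manager, sketched below. The names `stop_process` and `running_server` are hypothetical and are not the PR's `launch_vllm_server` / `stop_vllm_server` helpers, whose implementations are not shown in this conversation.

```python
import contextlib
import subprocess
import sys
from collections.abc import Iterator


def stop_process(proc: subprocess.Popen, grace_s: float = 10.0) -> None:
    """Ask the process to exit, then force-kill it if it ignores the request."""
    if proc.poll() is not None:
        return  # already exited
    proc.terminate()
    try:
        proc.wait(timeout=grace_s)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()


@contextlib.contextmanager
def running_server(cmd: list[str]) -> Iterator[subprocess.Popen]:
    """Start `cmd` as a child process and guarantee cleanup on exit."""
    proc = subprocess.Popen(cmd)
    try:
        yield proc
    finally:
        stop_process(proc)


if __name__ == "__main__":
    # Stand-in for a real server command: a child that just sleeps.
    with running_server([sys.executable, "-c", "import time; time.sleep(60)"]) as proc:
        print("child running:", proc.poll() is None)
```

The `finally` clause runs whether the body succeeds or raises, so a failed data-generation or training step would not leave a server process holding the GPU or the port.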

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A hop through servers and tests so grand,
With hidden states in hand,
We launch and train, both near and far,
While vLLM shines like a data star!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main objective of the PR: adding end-to-end smoke tests for the new datagen system. |


Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
tests/e2e/vllm/utils.py (1)

147-147: Minor: VLLM_PYTHON shadows module-level constant.

Line 147 redefines VLLM_PYTHON locally, shadowing the module-level constant defined at line 24. Consider reusing the module-level constant for consistency.

♻️ Suggested fix
 def run_vllm_engine(
     model_path: str,
     tmp_path: Path,
     prompts: list[list[dict[str, str]]],
     disable_compile_cache: bool = False,
     max_tokens: int = 50,
     ignore_eos: bool = True,
     acceptance_thresholds: Iterable[float] | None = None,
 ):
-    VLLM_PYTHON = os.environ.get("VLLM_PYTHON", sys.executable)
     logger.info("vLLM Python executable: {}", VLLM_PYTHON)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/vllm/utils.py` at line 147, A local assignment redefines
VLLM_PYTHON and shadows the module-level constant; remove the local `VLLM_PYTHON
= os.environ.get("VLLM_PYTHON", sys.executable)` and use the existing
module-level `VLLM_PYTHON` constant instead (or, if a different value is
required, rename the local variable to something like `vllm_python_override`),
ensuring any code that referenced the local name now references the module-level
`VLLM_PYTHON` constant.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f31d5945-e839-4614-837c-d3a1c4a44871

📥 Commits

Reviewing files that changed from the base of the PR and between e2abceb and 657fe92.

📒 Files selected for processing (5)
  • scripts/launch_vllm.py
  • scripts/train.py
  • tests/e2e/vllm/test_offline_training.py
  • tests/e2e/vllm/test_online_training.py
  • tests/e2e/vllm/utils.py

@fynnsu fynnsu merged commit f84ff30 into main Apr 8, 2026
15 checks passed
@fynnsu fynnsu deleted the e2e_new_datagen branch April 8, 2026 14:32
fynnsu added a commit that referenced this pull request Apr 8, 2026
## Purpose

Currently we aren't storing the layer ids in the Eagle3 model configs
and instead just match the defaults vLLM uses. We would instead like to
explicitly set these, which will also allow users to use custom layers.

## Description

Update launch.py and train.py with a `--target-layer-ids` arg.
Explicitly add layer ids to the Eagle3 config, even if they are
automatically inferred from num_hidden_layers.

Add user warnings to remind users that custom layer ids must be
passed into both scripts.

## Related Issue

## Tests

~~WIP. I need to test that this still loads into vLLM well. I also want
to merge #378 first, because it fixes an issue with `launch_vllm.py` arg
processing.~~

Tested on the merge commit between this PR and #378. Works as expected.


I have filled in:

- [x] The purpose of the PR, such as "Fix some issue (link existing
issues this PR will resolve)".
- [x] The test plan/results, such as providing test command and pasting
the results.
- [ ] (Optional) The necessary documentation update.
- [x] I (a human) have written or reviewed the code in this pr to the
best of my ability.

---------

Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
my-other-github-account pushed a commit to my-other-github-account/speculators that referenced this pull request May 15, 2026
Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
my-other-github-account pushed a commit to my-other-github-account/speculators that referenced this pull request May 15, 2026
3 participants