
Conversation


@rahul-tuli rahul-tuli commented Jan 8, 2026

Adds parameterized pytest tests to detect acceptance length regressions in EAGLE3 speculative decoding, guarding against commits that degrade speculative decoding performance.

Changes

  • tests/v1/spec_decode/test_acceptance_length.py: New test file with parameterized tests for EAGLE3 model pairs

Models Tested

| Verifier | Drafter |
| --- | --- |
| meta-llama/Llama-3.1-8B-Instruct | RedHatAI/Llama-3.1-8B-Instruct-speculator.eagle3 |
| Qwen/Qwen3-8B | RedHatAI/Qwen3-8B-speculator.eagle3 |
| openai/gpt-oss-20b | RedHatAI/gpt-oss-20b-speculator.eagle3 |

Test Design

  • Uses philschmid/mt-bench dataset (80 prompts)
  • Runs inference with 3 speculative tokens
  • Extracts acceptance length via llm.get_metrics()
  • Asserts within 2% relative tolerance of expected baseline
  • Tests are parameterized for easy addition of new model configurations
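The tolerance check in the design above can be sketched in isolation. The helper names below are hypothetical (the actual test derives draft/accepted token counts from `llm.get_metrics()`); this only illustrates the mean-acceptance-length arithmetic and the 2% relative-tolerance assertion:

```python
import math

# Hypothetical helpers -- names are illustrative, not the actual test's API.
def mean_acceptance_length(num_drafts: int, num_accepted_tokens: int) -> float:
    # Each verification step emits 1 verified token plus any accepted drafts,
    # so the mean acceptance length is 1 + accepted drafts per step.
    return 1.0 + num_accepted_tokens / num_drafts

def assert_within_tolerance(actual: float, expected: float,
                            rel_tol: float = 0.02) -> None:
    # Relative tolerance check, mirroring the 2% bound described above.
    assert math.isclose(actual, expected, rel_tol=rel_tol), (
        f"acceptance length {actual:.4f} deviates more than "
        f"{rel_tol:.0%} from baseline {expected:.4f}"
    )
```

For example, 100 verification steps with 160 accepted draft tokens give a mean acceptance length of 2.60, which passes against a 2.60 baseline.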

Test Plan

  • Run baseline commands to determine expected acceptance lengths
  • Update EAGLE3_MODEL_CONFIGS with baseline values
  • Run tests: CUDA_VISIBLE_DEVICES=3 pytest tests/v1/spec_decode/test_acceptance_length.py -v -s
  • Verify all tests pass within tolerance
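The `EAGLE3_MODEL_CONFIGS` table that the plan updates with baselines might look like the following sketch. The field names and `EagleConfig` shape are hypothetical; the baseline values are the ones reported in this PR:

```python
from dataclasses import dataclass

# Hypothetical config shape; the real EAGLE3_MODEL_CONFIGS may differ.
@dataclass(frozen=True)
class EagleConfig:
    verifier: str      # target model that verifies drafted tokens
    drafter: str       # EAGLE3 speculator that proposes tokens
    expected_al: float # baseline mean acceptance length

EAGLE3_MODEL_CONFIGS = [
    EagleConfig("meta-llama/Llama-3.1-8B-Instruct",
                "RedHatAI/Llama-3.1-8B-Instruct-speculator.eagle3", 2.60),
    EagleConfig("Qwen/Qwen3-8B",
                "RedHatAI/Qwen3-8B-speculator.eagle3", 2.26),
    EagleConfig("openai/gpt-oss-20b",
                "RedHatAI/gpt-oss-20b-speculator.eagle3", 2.56),
]
```

Keeping configs in a flat list like this is what makes `pytest.mark.parametrize` additions a one-line change per new model pair.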

Usage

```bash
# Run all acceptance length tests
CUDA_VISIBLE_DEVICES=3 pytest tests/v1/spec_decode/test_acceptance_length.py -v -s

# Run a specific model
CUDA_VISIBLE_DEVICES=3 pytest tests/v1/spec_decode/test_acceptance_length.py -v -s -k "llama3"
```

@rahul-tuli force-pushed the add-acceptance-length-tests branch 3 times, most recently from 1f8b8e0 to d425f1c on January 9, 2026 14:42
@rahul-tuli force-pushed the add-acceptance-length-tests branch 2 times, most recently from 977d295 to 2e12d39 on January 19, 2026 14:56
…n validation

  Add parameterized pytest tests to detect acceptance length regressions
  in EAGLE3 speculative decoding. Tests run inference on MT-Bench dataset
  (80 prompts) and assert both mean and per-position acceptance lengths
  are within 2% tolerance of baseline.

  Models tested:
  - Llama-3.1-8B-Instruct (AL: 2.60)
  - Qwen3-8B (AL: 2.26)
  - GPT-OSS-20B (AL: 2.56)

Signed-off-by: rahul-tuli <[email protected]>
  - Use VllmRunner context manager instead of direct LLM instantiation
  - Use monkeypatch.context() for proper env var scoping
  - Use AcceptanceMetrics TypedDict in return statement
  - Remove docstrings from TypedDict and dataclass definitions
  - Remove inline comments from constants
  - Remove prototyping skip condition (all configs have baselines)
  - Fix gpt-oss-20b expected position 2 value (0.3220 -> 0.3337)

Signed-off-by: rahul-tuli <[email protected]>
@rahul-tuli force-pushed the add-acceptance-length-tests branch from 2e12d39 to 5db9a9a on January 19, 2026 15:00
