[Spec Decoding] Add DFlash e2e tests and Buildkite CI#1870

Open
aaronzhfeng wants to merge 1 commit into vllm-project:main from aaronzhfeng:pr_dflash_1c

@aaronzhfeng

Description

Add e2e tests and Buildkite CI for DFlash block-diffusion speculative decoding. The DFlash model and proposer were added in #1868, and the pipeline integration in #1869; this PR adds the test coverage and CI.

Verified on both TPU v4 and v5p across 9 datasets (math, code, chat) with a Qwen3-4B target and the z-lab/Qwen3-4B-DFlash-b16 draft, achieving a 3x average speedup.

Files:

  • tests/e2e/test_speculative_decoding.py -- add test_dflash_correctness (Qwen3-4B + DFlash draft, output correctness) and test_dflash_performance (1.5x speedup threshold)
  • .buildkite/features/Speculative_Decoding-_DFlash.yml -- Buildkite CI pipeline for DFlash correctness and performance, modeled after Eagle3's Speculative_Decoding-_Eagle3.yml
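
As a rough illustration of the two kinds of checks named above (output correctness and a 1.5x speedup threshold), here is a minimal sketch. It is not the code in tests/e2e/test_speculative_decoding.py: the helper names (exact_match_rate, speedup, meets_speedup_threshold) are hypothetical, and exact string equality is an assumed correctness criterion that the real test may not use. Only the 1.5x threshold comes from the description.

```python
from typing import Sequence

# Threshold quoted in the PR description for test_dflash_performance.
MIN_SPEEDUP = 1.5


def exact_match_rate(reference: Sequence[str], candidate: Sequence[str]) -> float:
    """Fraction of prompts whose candidate output equals the reference output.

    Hypothetical helper: assumes correctness means identical text per prompt.
    """
    if len(reference) != len(candidate):
        raise ValueError("output lists must have the same length")
    if not reference:
        raise ValueError("need at least one output to compare")
    matches = sum(ref == cand for ref, cand in zip(reference, candidate))
    return matches / len(reference)


def speedup(baseline_seconds: float, spec_seconds: float) -> float:
    """Ratio of baseline wall-clock time to speculative-decoding wall-clock time."""
    if baseline_seconds <= 0 or spec_seconds <= 0:
        raise ValueError("timings must be positive")
    return baseline_seconds / spec_seconds


def meets_speedup_threshold(
    baseline_seconds: float,
    spec_seconds: float,
    min_speedup: float = MIN_SPEEDUP,
) -> bool:
    """True when the measured speedup is at least the required threshold."""
    return speedup(baseline_seconds, spec_seconds) >= min_speedup
```

In an actual e2e test, the two timings and the two output lists would come from generating the same prompts once with the target model alone and once with the DFlash draft enabled; that wiring is omitted here because it depends on the engine configuration.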

Tests

pytest tests/e2e/test_speculative_decoding.py::test_dflash_correctness
pytest tests/e2e/test_speculative_decoding.py::test_dflash_performance

Checklist

  • I have performed a self-review of my code.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have made or will make corresponding changes to any relevant documentation.

Signed-off-by: aaronzhfeng <fzx333578@gmail.com>
@bvrockwell
Collaborator

Thanks so much for this contribution and for breaking up the PRs like this. Once the other two are in good shape, could you please review this guidance for landing the feature in our support matrices:

https://github.com/vllm-project/tpu-inference/tree/main/.buildkite#adding-a-new-feature-to-ci

cc @jcyang43 @yarongmu-google @kyuyeunk @vkantamani-cienet

@Lumosis
Collaborator

Lumosis commented Apr 1, 2026

Consider adding the tests to pipeline_jax.yml so that they run before every PR is merged.

@Lumosis self-requested a review on April 1, 2026 01:37

aaronzhfeng added a commit to aaronzhfeng/tpu-inference that referenced this pull request on Apr 10, 2026
…ject#1869 + vllm-project#1870)

Integration from dev workspace for full e2e validation:
- tpu_runner.py: DFlash + DFlash torchax dispatch
- speculative_decoding_manager.py: DFlash propose flow
- model_loader.py: DFlash model registration
- compilation_manager.py: DFlash precompilation
- test_speculative_decoding.py: e2e test

Signed-off-by: Aaron Feng <aaronzhfeng@gmail.com>
Signed-off-by: aaronzhfeng <fzx333578@gmail.com>
