[Spec Decoding] Add DFlash e2e tests and Buildkite CI by aaronzhfeng · Pull Request #1870 · vllm-project/tpu-inference

aaronzhfeng · 2026-03-05T22:05:52Z

Description

Add e2e tests and Buildkite CI for DFlash block-diffusion speculative decoding. The DFlash model/proposer were added in #1868, and pipeline integration in #1869. This PR adds the test coverage and CI.

Verified on both TPU v4 and v5p across 9 datasets (math, code, chat) with a Qwen3-4B target and the z-lab/Qwen3-4B-DFlash-b16 draft, achieving a 3x average speedup.

Files:

tests/e2e/test_speculative_decoding.py -- add test_dflash_correctness (Qwen3-4B + DFlash draft, output correctness) and test_dflash_performance (1.5x speedup threshold)
.buildkite/features/Speculative_Decoding-_DFlash.yml -- Buildkite CI pipeline for DFlash correctness and performance, modeled after Eagle3's Speculative_Decoding-_Eagle3.yml
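
For readers unfamiliar with how a speedup threshold of this kind is usually asserted, here is a minimal sketch. It is not the code in test_speculative_decoding.py: the helper names (`timed`, `speedup`, `meets_threshold`) are hypothetical, and it assumes speedup is measured as baseline wall-clock time divided by speculative-decoding wall-clock time for the same prompts.

```python
import time
from typing import Callable

# Threshold quoted in the PR description for test_dflash_performance.
MIN_SPEEDUP = 1.5


def timed(fn: Callable[[], object]) -> float:
    """Return the wall-clock seconds taken by one call to fn."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def speedup(baseline_s: float, speculative_s: float) -> float:
    """Ratio of baseline time to speculative time (assumed definition)."""
    if baseline_s <= 0 or speculative_s <= 0:
        raise ValueError("timings must be positive")
    return baseline_s / speculative_s


def meets_threshold(baseline_s: float,
                    speculative_s: float,
                    minimum: float = MIN_SPEEDUP) -> bool:
    """True when the measured speedup is at least the required minimum."""
    return speedup(baseline_s, speculative_s) >= minimum
```

In a real test, `timed` would wrap a generation call with and without the draft model, and the assertion would be `assert meets_threshold(baseline_s, speculative_s)`.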

Tests

pytest tests/e2e/test_speculative_decoding.py::test_dflash_correctness
pytest tests/e2e/test_speculative_decoding.py::test_dflash_performance

Checklist

I have performed a self-review of my code.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have made or will make corresponding changes to any relevant documentation.

Signed-off-by: aaronzhfeng <fzx333578@gmail.com>

bvrockwell · 2026-03-31T16:54:16Z

Thanks so much for this contribution and for breaking up the PRs like this. Once the other two are in good shape, could you please review this guidance for landing the feature in our support matrices:

https://github.com/vllm-project/tpu-inference/tree/main/.buildkite#adding-a-new-feature-to-ci

cc @jcyang43 @yarongmu-google @kyuyeunk @vkantamani-cienet

Lumosis · 2026-04-01T01:36:49Z

Consider adding the tests to pipeline_jax.yml so that they run before every PR is merged.

…ject#1869 + vllm-project#1870)

Integration from dev workspace for full e2e validation:
- tpu_runner.py: DFlash + DFlash torchax dispatch
- speculative_decoding_manager.py: DFlash propose flow
- model_loader.py: DFlash model registration
- compilation_manager.py: DFlash precompilation
- test_speculative_decoding.py: e2e test

Signed-off-by: Aaron Feng <aaronzhfeng@gmail.com>
Signed-off-by: aaronzhfeng <fzx333578@gmail.com>

[Spec Decoding] Add DFlash e2e tests and Buildkite CI

bdc64e9

Signed-off-by: aaronzhfeng <fzx333578@gmail.com>

aaronzhfeng requested review from QiliangCui, jcyang43 and vipannalla as code owners March 5, 2026 22:05

kyuyeunk mentioned this pull request Mar 31, 2026

[Spec Decoding] Integrate DFlash into speculative decoding pipeline #1869

Open

3 tasks

Lumosis self-requested a review April 1, 2026 01:37

aaronzhfeng wants to merge 1 commit into vllm-project:main from aaronzhfeng:pr_dflash_1c
