
Add openai/gpt-oss-20b recipe and Laguna-XS.2 DFlash spec decoding #447

Merged
esmeetu merged 1 commit into main from add-gpt-oss-20b-and-laguna-spec-decoding on May 8, 2026

Conversation


@esmeetu esmeetu commented May 8, 2026

Summary

  • New recipe openai/gpt-oss-20b: a 21B-parameter / 3.6B-active MoE with native MXFP4 that fits in 16GB of VRAM. Defaults to single-node TP=1, with Hopper / Blackwell / AMD tuning ported from the gpt-oss-120b sibling (the two models share kernel paths). Removes the now-redundant "20b" variant from gpt-oss-120b.yaml so each HF id maps to exactly one recipe page.
  • poolside/Laguna-XS.2 spec decoding: adds a spec_decoding feature that wires --speculative-config to poolside/Laguna-XS.2-speculator.dflash (DFlash method, num_speculative_tokens=7); see the command sketch after this list. The guide documents the VLLM_USE_DEEP_GEMM=0 requirement and the dependency on vLLM PR #41880.
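
The two items above imply launch commands along these lines. This is a sketch assembled from the flags named in this PR, not copied verbatim from the generated recipe pages:

```bash
# gpt-oss-20b with the single-node TP=1 default described above:
vllm serve openai/gpt-oss-20b --tensor-parallel-size 1

# Laguna-XS.2 with the spec_decoding opt-in (DFlash draft, 7 speculative
# tokens); VLLM_USE_DEEP_GEMM=0 is required per the recipe guide:
VLLM_USE_DEEP_GEMM=0 vllm serve poolside/Laguna-XS.2 \
  --trust-remote-code \
  --speculative-config '{"model":"poolside/Laguna-XS.2-speculator.dflash","num_speculative_tokens":7,"method":"dflash"}'
```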

Test plan

  • node scripts/build-recipes-api.mjs passes ✓. The JSON API now reports 91 models and 8 strategies (was 90 models before).
  • The recommended command in public/openai/gpt-oss-20b.json matches the YAML for the H200 default.
  • public/poolside/Laguna-XS.2.json shows spec_decoding in features and lists it under opt_in_features.
  • Reviewer to eyeball the Hopper/Blackwell tuning copied to gpt-oss-20b and confirm it still holds on the smaller model.

🤖 Generated with Claude Code

- Add standalone openai/gpt-oss-20b recipe (21B/3.6B-A MoE, MXFP4,
  16GB VRAM, single_node_tp with tp=1), with hardware tuning ported
  from the 120b sibling for the shared gpt-oss kernel paths.
- Remove the now-redundant "20b" variant from gpt-oss-120b.yaml so
  the 20b page is the single source of truth.
- Add spec_decoding feature to poolside/Laguna-XS.2 using the
  Laguna-XS.2-speculator.dflash draft model (DFlash, 7 tokens,
  greedy); document the VLLM_USE_DEEP_GEMM=0 requirement and
  PR #41880 dependency in the guide.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: yasong.wang <yasong.wang@inferact.ai>

@esmeetu esmeetu merged commit 56922ce into main May 8, 2026
4 checks passed
@esmeetu esmeetu deleted the add-gpt-oss-20b-and-laguna-spec-decoding branch May 8, 2026 11:24

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1bafcf0e1f

Comment on lines +40 to +44
```yaml
spec_decoding:
  description: "DFlash speculative decoding with the Laguna-XS.2 draft model (7 tokens, greedy)"
  args:
    - "--speculative-config"
    - '{"model":"poolside/Laguna-XS.2-speculator.dflash","num_speculative_tokens":7,"method":"dflash"}'
```

P1: Include the required DeepGEMM env guard for spec decoding

This spec_decoding feature only adds --speculative-config, but the same recipe explicitly documents that DFlash requires VLLM_USE_DEEP_GEMM=0 to work. In this codebase, feature toggles are rendered as CLI args only, so users who enable this toggle via the command builder will get a command that is missing the required environment setting and can fail at runtime unless they manually edit it. Please wire the required env guard into the generated configuration path, not only the guide text.
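
One way the fix could look, as a purely hypothetical sketch: the env key below is an assumption about the feature schema, not a field this repo is known to support, so the real fix may need to live in the command builder instead:

```yaml
spec_decoding:
  description: "DFlash speculative decoding with the Laguna-XS.2 draft model (7 tokens, greedy)"
  # Hypothetical field: assumes feature toggles could carry environment
  # variables that the command builder renders as a VAR=value prefix.
  env:
    VLLM_USE_DEEP_GEMM: "0"
  args:
    - "--speculative-config"
    - '{"model":"poolside/Laguna-XS.2-speculator.dflash","num_speculative_tokens":7,"method":"dflash"}'
```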

@gemini-code-assist (bot) left a comment

Code Review

This pull request refactors the OpenAI model configurations by moving the GPT-OSS 20B variant into its own dedicated file and updates the Poolside Laguna-XS.2 configuration to support DFlash speculative decoding. Feedback indicates that the newly added strategies for Laguna-XS.2 are missing their definition files, which will prevent them from appearing in the UI. Additionally, there is an inconsistency in the --max-model-len value provided in the Laguna-XS.2 guide that requires clarification or correction.

Comment on lines +57 to +58
```yaml
- single_node_tep
- single_node_dep
```

medium

The strategies single_node_tep and single_node_dep have been added to compatible_strategies, but the corresponding strategy definition files (e.g., strategies/single_node_tep.yaml) appear to be missing from this pull request. Without these files, these strategies will be filtered out by the CommandBuilder logic and will not be available for selection in the UI.
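
For reference, a strategy definition might look roughly like the hypothetical sketch below; every field here is guessed from the strategy name alone, since the repo's actual schema is not shown in this PR:

```yaml
# Hypothetical strategies/single_node_tep.yaml. The field names, and the
# reading of "tep" as tensor + expert parallelism, are assumptions.
name: single_node_tep
description: "Single node, tensor parallelism with expert parallelism for MoE layers"
args:
  - "--enable-expert-parallel"
```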

```bash
VLLM_USE_DEEP_GEMM=0 vllm serve poolside/Laguna-XS.2 \
  --trust-remote-code \
  --max-model-len 16384 \
```

medium

There is an inconsistency in the --max-model-len value between the base launch command (131072 on line 103) and the speculative decoding example (16384 on line 159). If this reduction is a technical requirement for DFlash or due to memory constraints when VLLM_USE_DEEP_GEMM=0 is set, it should be explicitly noted in the guide to avoid confusing users who expect the full 128K context.

