Add Qwen3NextForCausalLM to mamba_like_arch #1450

Merged
jbyczkow merged 1 commit into vllm-project:main from rsmyrek:qwen3-next-mamba-like-arch
May 19, 2026

@rsmyrek commented May 15, 2026

Qwen3Next uses a hybrid GDN+attention architecture that requires separate KV cache groups for GDN vs standard attention layers. Add it to the mamba_like_arch list so maybe_set_mamba_kv_cache_groups_ids() sets up the cache groups correctly.
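
To illustrate the idea, here is a minimal sketch of how an architecture allowlist could gate per-layer-type KV cache grouping. It is not the plugin's implementation of maybe_set_mamba_kv_cache_groups_ids(), which this PR does not show. The names MAMBA_LIKE_ARCH, LayerSpec, is_mamba_like and assign_kv_cache_group_ids are hypothetical, as are the placeholder entry "ExampleMambaForCausalLM", the layer names, and the assumption that non-listed architectures fall back to a single group.

```python
from dataclasses import dataclass

# Hypothetical allowlist. Only "Qwen3NextForCausalLM" is confirmed by this PR;
# the other entry is a placeholder for illustration.
MAMBA_LIKE_ARCH = ["ExampleMambaForCausalLM", "Qwen3NextForCausalLM"]


@dataclass
class LayerSpec:
    name: str
    kind: str  # e.g. "gdn" or "attention"


def is_mamba_like(architectures):
    """True if any of the model's architecture names is on the allowlist."""
    return any(arch in MAMBA_LIKE_ARCH for arch in architectures)


def assign_kv_cache_group_ids(architectures, layers):
    """Map each layer name to a KV cache group id.

    Assumption: models off the allowlist keep every layer in group 0,
    while allowlisted hybrid models get one group per layer kind.
    """
    if not is_mamba_like(architectures):
        return {layer.name: 0 for layer in layers}

    group_of_kind = {}
    group_ids = {}
    for layer in layers:
        # The first kind seen gets id 0, the next new kind gets id 1, and so on.
        gid = group_of_kind.setdefault(layer.kind, len(group_of_kind))
        group_ids[layer.name] = gid
    return group_ids


if __name__ == "__main__":
    layers = [
        LayerSpec("layers.0", "gdn"),
        LayerSpec("layers.1", "gdn"),
        LayerSpec("layers.2", "attention"),
    ]
    print(assign_kv_cache_group_ids(["Qwen3NextForCausalLM"], layers))
```

In this sketch, a model whose architecture is missing from the allowlist would have its GDN and attention layers placed in the same group, which is the kind of misconfiguration the allowlist entry is meant to avoid.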

Signed-off-by: Radoslaw Smyrek <radoslawx.smyrek@intel.com>

Copilot AI left a comment

Pull request overview

Note: Copilot was unable to run its full agentic suite in this review.

Updates Gaudi HPU model runner logic to treat an additional Qwen model architecture as “mamba-like” for KV-cache group ID configuration.

Changes:

  • Add Qwen3NextForCausalLM to the mamba_like_arch allowlist used by maybe_set_mamba_kv_cache_groups_ids.

@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
54f548e9e58087f0155e4e164e416ad7efdfde6d

@jbyczkow jbyczkow merged commit d999b2e into vllm-project:main May 19, 2026
2 checks passed