Add registration API for external linear attention backend by charlotte12l · Pull Request #21983 · sgl-project/sglang

charlotte12l · 2026-04-02T23:25:43Z

Motivation

SGLang currently hardcodes hybrid model support (GDN, KDA, Mamba2, Lightning) across 5 core files via isinstance checks and architecture name lists. Adding a new linear attention hybrid model requires modifying all 5 files:

model_runner.py — mambaish_config property
attention_registry.py — backend dispatch in attn_backend_wrapper
scheduler.py — is_hybrid_ssm flag for MambaRadixCache
server_args.py — _handle_model_specific_adjustments
triton_backend.py — v_head_dim check

This makes it difficult for external or experimental linear attention models (e.g., custom KDA variants) to integrate with SGLang without forking the core. A registration API lets external models self-register and plug into all 5 integration points with zero modifications to SGLang source.

Modifications

New file: python/sglang/srt/configs/linear_attn_model_registry.py

LinearAttnModelSpec dataclass — captures config class, backend class name, arch names, and cache behavior flags
register_linear_attn_model() — appends a spec to the global registry (called at import time)
get_linear_attn_config() — isinstance-based lookup, returns (spec, resolved_config) or None
get_linear_attn_spec_by_arch() — arch name lookup for server_args dispatch
import_backend_class() — lazy import of the backend class from a fully-qualified dotted name
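
As a rough illustration of how the pieces listed above could fit together, here is a minimal sketch. It is not the code from this PR: the field defaults, the module-level list `_REGISTRY`, and the use of a `text_config` attribute for unwrapping are assumptions, and `DemoConfig` plus the `collections.OrderedDict` backend path are placeholders so the snippet runs without SGLang installed.

```python
import importlib
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple


@dataclass
class LinearAttnModelSpec:
    config_class: type
    backend_class_name: str  # fully-qualified dotted name, imported lazily
    arch_names: List[str] = field(default_factory=list)  # default is an assumption
    uses_mamba_radix_cache: bool = False  # default is an assumption
    unwrap_text_config: bool = False


# Assumed storage: a plain module-level list, searched in registration order.
_REGISTRY: List[LinearAttnModelSpec] = []


def register_linear_attn_model(spec: LinearAttnModelSpec) -> None:
    """Append a spec to the global registry (intended to run at import time)."""
    _REGISTRY.append(spec)


def get_linear_attn_config(hf_config: Any) -> Optional[Tuple[LinearAttnModelSpec, Any]]:
    """Return (spec, resolved_config) for the first isinstance match, else None."""
    for spec in _REGISTRY:
        config = hf_config
        if spec.unwrap_text_config:
            # Assumption: wrapper configs expose the language model's config
            # under a `text_config` attribute.
            config = getattr(hf_config, "text_config", hf_config)
        if isinstance(config, spec.config_class):
            return spec, config
    return None


def get_linear_attn_spec_by_arch(arch: str) -> Optional[LinearAttnModelSpec]:
    """Look a spec up by architecture name."""
    for spec in _REGISTRY:
        if arch in spec.arch_names:
            return spec
    return None


def import_backend_class(dotted_name: str) -> type:
    """Import the module part of a dotted name and return the named attribute."""
    module_name, _, class_name = dotted_name.rpartition(".")
    return getattr(importlib.import_module(module_name), class_name)


class DemoConfig:  # placeholder for a real model config class
    pass


register_linear_attn_model(
    LinearAttnModelSpec(
        config_class=DemoConfig,
        backend_class_name="collections.OrderedDict",  # placeholder backend path
        arch_names=["DemoForCausalLM"],
        uses_mamba_radix_cache=True,
    )
)
match = get_linear_attn_config(DemoConfig())
backend_cls = import_backend_class(match[0].backend_class_name)
```

Storing the backend as a dotted string rather than a class object means registering a spec does not import the backend module; the import only happens when `import_backend_class()` is called.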

Integration (all purely additive, existing behavior unchanged):

model_runner.py — new linear_attn_model_spec property; mambaish_config falls back to registry after existing hardcoded checks
attention_registry.py — attn_backend_wrapper falls back to registry when no hardcoded backend matches
scheduler.py — is_hybrid_ssm checks registry for uses_mamba_radix_cache
server_args.py — _handle_model_specific_adjustments calls _handle_mamba_radix_cache for registered models
triton_backend.py — v_head_dim check includes linear_attn_model_spec
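
The "fall back to the registry" pattern shared by the five integration points can be sketched in isolation. Everything here is hypothetical: `BUILTIN_BACKENDS`, `EXTERNAL_BACKENDS`, `resolve_backend`, and the string values are stand-ins for the hardcoded checks and the registry, not names from SGLang.

```python
from typing import Dict, Optional

# Hypothetical stand-in for the existing hardcoded dispatch.
BUILTIN_BACKENDS: Dict[str, str] = {
    "gdn": "builtin-gdn-backend",
    "mamba2": "builtin-mamba2-backend",
}

# Hypothetical stand-in for the registry; empty by default.
EXTERNAL_BACKENDS: Dict[str, str] = {}


def resolve_backend(model_kind: str) -> Optional[str]:
    # 1. Existing hardcoded checks always run first.
    backend = BUILTIN_BACKENDS.get(model_kind)
    if backend is not None:
        return backend
    # 2. The registry is consulted only when nothing built-in matched.
    return EXTERNAL_BACKENDS.get(model_kind)


before = resolve_backend("my_kda")  # nothing registered yet
EXTERNAL_BACKENDS["my_kda"] = "external-kda-backend"
EXTERNAL_BACKENDS["gdn"] = "external-override-attempt"
after = resolve_backend("my_kda")   # now resolved through the fallback
builtin = resolve_backend("gdn")    # built-in entry still takes precedence
```

Because the fallback runs only after the built-in lookup misses, an external registration cannot change how an already-supported model is dispatched in this sketch.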

Usage:

from sglang.srt.configs.linear_attn_model_registry import (
    register_linear_attn_model, LinearAttnModelSpec,
)

register_linear_attn_model(LinearAttnModelSpec(
    config_class=MyLinearAttnConfig,
    backend_class_name="sglang.srt.layers.attention.linear.kda_backend.KDAAttnBackend",
    arch_names=["MyModelForCausalLM"],
    uses_mamba_radix_cache=True,
))

Accuracy Tests

Unit tests covering all registry functions (11 tests, all passing):

test_register_and_lookup_by_config ... ok
test_lookup_no_match ... ok
test_lookup_empty_registry ... ok
test_unwrap_text_config ... ok
test_unwrap_text_config_no_match ... ok
test_lookup_by_arch ... ok
test_lookup_by_arch_empty_registry ... ok
test_multiple_registrations ... ok
test_first_match_wins ... ok
test_import_backend_class ... ok
test_spec_defaults ... ok

No accuracy regression is possible — the registry is purely additive. All existing hardcoded checks execute first; the registry is only consulted as a fallback when no existing model matches. When the registry is empty (default), all code paths are identical to before this change.
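
One consequence of combining isinstance-based lookup with first-match-wins is that registration order matters when config classes inherit from each other. The sketch below assumes the lookup walks specs in registration order; `BaseLinearConfig`, `VariantConfig`, `register`, and `lookup` are illustrative names, not part of the PR.

```python
from typing import List, Optional, Tuple


class BaseLinearConfig:
    pass


class VariantConfig(BaseLinearConfig):
    pass


# (config_class, backend label) pairs, kept in registration order.
registry: List[Tuple[type, str]] = []


def register(config_class: type, label: str) -> None:
    registry.append((config_class, label))


def lookup(config: object) -> Optional[str]:
    # First entry whose class matches via isinstance wins.
    for config_class, label in registry:
        if isinstance(config, config_class):
            return label
    return None


register(BaseLinearConfig, "base-backend")
register(VariantConfig, "variant-backend")

# VariantConfig is a subclass of BaseLinearConfig, so the earlier,
# more general entry matches first and shadows the specific one.
shadowed = lookup(VariantConfig())

# Registering the more specific class first avoids the shadowing.
registry.clear()
register(VariantConfig, "variant-backend")
register(BaseLinearConfig, "base-backend")
specific = lookup(VariantConfig())
```

Under these assumptions, an external package that subclasses another registered config would need to register its own spec before the parent's.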

Checklist

Format your code according to "Format code with pre-commit".
Add unit tests according to "Run and add unit tests".
Update documentation according to "Write documentations".
Provide accuracy and speed benchmark results according to "Test the accuracy" and "Benchmark the speed".
Follow the SGLang code style guidance.

Review and Merge Process

Ping Merge Oncalls to start the process. See the PR Merge Process.
Get approvals from CODEOWNERS and other reviewers.
Trigger CI tests with comments or contact authorized users to do so.
- Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

gemini-code-assist · 2026-04-02T23:25:47Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

merrymercy · 2026-04-03T02:21:39Z

+    unwrap_text_config: bool = False
+
+    # If True, asserts the model is not used with MLA backends.
+    mla_incompatible: bool = False


When is this used?

charlotte12l · 2026-04-03

This check is inspired by https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/layers/attention/attention_registry.py#L194-L196

merrymercy · 2026-04-03T02:27:50Z

/tag-and-rerun-ci

SGLang currently hardcodes hybrid model support (GDN, KDA, Mamba2, Lightning) across 5 files via isinstance checks and architecture name lists. Adding a new linear attention hybrid model requires modifying all 5 core files.

This adds a `register_linear_attn_model()` API so external models can self-register without modifying SGLang source:

    from sglang.srt.configs.linear_attn_model_registry import (
        register_linear_attn_model, LinearAttnModelSpec,
    )

    register_linear_attn_model(LinearAttnModelSpec(
        config_class=MyConfig,
        backend_class_name="sglang.srt...KDAAttnBackend",
        arch_names=["MyModelForCausalLM"],
        uses_mamba_radix_cache=True,
    ))

All 5 integration points now check the registry as a fallback after existing hardcoded checks, so this is purely additive with zero behavior change for existing models:

- model_runner.py: mambaish_config + new linear_attn_model_spec property
- attention_registry.py: backend dispatch fallback
- scheduler.py: is_hybrid_ssm fallback
- server_args.py: _handle_model_specific_adjustments fallback
- triton_backend.py: v_head_dim check fallback

Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>

charlotte12l · 2026-04-06T18:31:20Z

@merrymercy I've addressed the comments; those timeout errors are not related...

merrymercy · 2026-04-06T22:33:11Z

/tag-and-rerun-ci

charlotte12l requested review from Fridge003, HaiShaw, Qiaolin-Yu, Ying1123, hebiao064, hnyls2002, ispobock, merrymercy and xiezhq-hermann as code owners April 2, 2026 23:25

charlotte12l force-pushed the lxy/hybrid-model-registry branch 2 times, most recently from b1161f2 to 8492037 Compare April 2, 2026 23:40

merrymercy requested changes Apr 3, 2026

View reviewed changes

github-actions Bot added the run-ci label Apr 3, 2026

charlotte12l force-pushed the lxy/hybrid-model-registry branch from 8492037 to 46e8768 Compare April 3, 2026 05:41

charlotte12l force-pushed the lxy/hybrid-model-registry branch from 46e8768 to 3258b3e Compare April 3, 2026 05:48

[Hybrid] Clean up comments in linear attention model registry

83d132d

Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>

charlotte12l force-pushed the lxy/hybrid-model-registry branch from 3258b3e to 83d132d Compare April 3, 2026 06:13

Merge branch 'main' into lxy/hybrid-model-registry

e83985f

charlotte12l changed the title ~~[Hybrid] Add registration API for external linear attention models~~ Add registration API for external linear attention backend Apr 5, 2026

[Hybrid] Remove mla_incompatible field per reviewer feedback

224ee85

Remove the redundant mla_incompatible field from LinearAttnModelSpec and its associated assertion in attention_registry.py, as requested in PR sgl-project#21983 review. Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>

charlotte12l and others added 2 commits April 6, 2026 11:58

Merge branch 'main' into lxy/hybrid-model-registry

bd4c52d

[Hybrid] Remove mla_incompatible from test_spec_defaults

fe83db9

Follow-up to removing the mla_incompatible field from LinearAttnModelSpec per reviewer feedback. Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>

merrymercy approved these changes Apr 6, 2026

View reviewed changes

Merge branch 'main' into lxy/hybrid-model-registry

73502a4

merrymercy merged commit 98f38b1 into sgl-project:main Apr 7, 2026
134 of 145 checks passed

yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026

Add registration API for external linear attention backend (sgl-proje…

46904a2

…ct#21983) Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>

caitengwei pushed a commit to caitengwei/sglang that referenced this pull request Jun 1, 2026

Add registration API for external linear attention backend (sgl-proje…

bed69c0

…ct#21983) Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
