Add registration API for external linear attention backend #21983
Merged
merrymercy merged 7 commits on Apr 7, 2026
b1161f2 to
8492037
Compare
merrymercy requested changes on Apr 3, 2026
    unwrap_text_config: bool = False

    # If True, asserts the model is not used with MLA backends.
    mla_incompatible: bool = False
8492037 to
46e8768
Compare
SGLang currently hardcodes hybrid model support (GDN, KDA, Mamba2,
Lightning) across 5 files via isinstance checks and architecture name
lists. Adding a new linear attention hybrid model requires modifying
all 5 core files.
This adds a `register_linear_attn_model()` API so external models can
self-register without modifying SGLang source:
    from sglang.srt.configs.linear_attn_model_registry import (
        register_linear_attn_model, LinearAttnModelSpec,
    )
    register_linear_attn_model(LinearAttnModelSpec(
        config_class=MyConfig,
        backend_class_name="sglang.srt...KDAAttnBackend",
        arch_names=["MyModelForCausalLM"],
        uses_mamba_radix_cache=True,
    ))
All 5 integration points now check the registry as a fallback after
existing hardcoded checks, so this is purely additive with zero
behavior change for existing models:
- model_runner.py: mambaish_config + new linear_attn_model_spec property
- attention_registry.py: backend dispatch fallback
- scheduler.py: is_hybrid_ssm fallback
- server_args.py: _handle_model_specific_adjustments fallback
- triton_backend.py: v_head_dim check fallback
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
46e8768 to
3258b3e
Compare
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
3258b3e to
83d132d
Compare
Remove the redundant mla_incompatible field from LinearAttnModelSpec and its associated assertion in attention_registry.py, as requested in PR sgl-project#21983 review. Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Contributor (Author):
@merrymercy I've addressed the comments, those timeout errors are not related...
Follow-up to removing the mla_incompatible field from LinearAttnModelSpec per reviewer feedback. Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
merrymercy approved these changes on Apr 6, 2026
Motivation
SGLang currently hardcodes hybrid model support (GDN, KDA, Mamba2, Lightning) across 5 core files via `isinstance` checks and architecture name lists. Adding a new linear attention hybrid model requires modifying all 5 files:

- `model_runner.py`: `mambaish_config` property
- `attention_registry.py`: backend dispatch in `attn_backend_wrapper`
- `scheduler.py`: `is_hybrid_ssm` flag for MambaRadixCache
- `server_args.py`: `_handle_model_specific_adjustments`
- `triton_backend.py`: `v_head_dim` check

This makes it difficult for external or experimental linear attention models (e.g., custom KDA variants) to integrate with SGLang without forking the core. A registration API lets external models self-register and plug into all 5 integration points with zero modifications to SGLang source.
Modifications
New file: `python/sglang/srt/configs/linear_attn_model_registry.py`

- `LinearAttnModelSpec` dataclass: captures config class, backend class name, arch names, and cache behavior flags
- `register_linear_attn_model()`: appends a spec to the global registry (called at import time)
- `get_linear_attn_config()`: isinstance-based lookup, returns `(spec, resolved_config)` or `None`
- `get_linear_attn_spec_by_arch()`: arch name lookup for `server_args` dispatch
- `import_backend_class()`: lazy import of the backend class from a fully-qualified dotted name

Integration (all purely additive, existing behavior unchanged):

- `model_runner.py`: new `linear_attn_model_spec` property; `mambaish_config` falls back to registry after existing hardcoded checks
- `attention_registry.py`: `attn_backend_wrapper` falls back to registry when no hardcoded backend matches
- `scheduler.py`: `is_hybrid_ssm` checks registry for `uses_mamba_radix_cache`
- `server_args.py`: `_handle_model_specific_adjustments` calls `_handle_mamba_radix_cache` for registered models
- `triton_backend.py`: `v_head_dim` check includes `linear_attn_model_spec`

Usage:
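A minimal, self-contained sketch of the registration flow, mirroring the call shown in the commit message. The registry internals below are a stand-in written for illustration and are an assumption, not SGLang's actual implementation; only the names `LinearAttnModelSpec`, `register_linear_attn_model`, `get_linear_attn_config`, `get_linear_attn_spec_by_arch`, and the four spec fields come from this PR. `MyConfig`, `MyModelForCausalLM`, and the dotted backend path are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple, Type


@dataclass
class LinearAttnModelSpec:
    # Field names follow the PR; defaults are assumptions for this sketch.
    config_class: Type[Any]
    backend_class_name: str
    arch_names: List[str] = field(default_factory=list)
    uses_mamba_radix_cache: bool = False


# Stand-in for the global registry (assumed to be a plain list).
_REGISTRY: List[LinearAttnModelSpec] = []


def register_linear_attn_model(spec: LinearAttnModelSpec) -> None:
    """Append a spec to the registry; intended to be called at import time."""
    _REGISTRY.append(spec)


def get_linear_attn_config(config: Any) -> Optional[Tuple[LinearAttnModelSpec, Any]]:
    """isinstance-based lookup. Config resolution is simplified here:
    the config is returned unchanged as the 'resolved' config."""
    for spec in _REGISTRY:
        if isinstance(config, spec.config_class):
            return spec, config
    return None


def get_linear_attn_spec_by_arch(arch: str) -> Optional[LinearAttnModelSpec]:
    """Look up a spec by architecture name."""
    for spec in _REGISTRY:
        if arch in spec.arch_names:
            return spec
    return None


class MyConfig:
    """Hypothetical external model config."""


register_linear_attn_model(LinearAttnModelSpec(
    config_class=MyConfig,
    backend_class_name="my_pkg.backends.MyAttnBackend",  # hypothetical path
    arch_names=["MyModelForCausalLM"],
    uses_mamba_radix_cache=True,
))

spec, resolved_config = get_linear_attn_config(MyConfig())
```

In a real integration the `register_linear_attn_model(...)` call would live in the external package and run when that package is imported, before the server inspects the model config.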
Accuracy Tests
Unit tests covering all registry functions (11 tests, all passing):
No accuracy regression is possible — the registry is purely additive. All existing hardcoded checks execute first; the registry is only consulted as a fallback when no existing model matches. When the registry is empty (default), all code paths are identical to before this change.
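The fallback ordering described above can be sketched as follows. This is an illustration under stated assumptions, not the code in `attention_registry.py`: `BuiltinConfig`, `ExternalConfig`, `HARDCODED`, and `select_backend` are hypothetical names, and the registry is simplified to a list of `(config_class, dotted_name)` pairs. `import_backend_class` shows one plausible way to do the lazy import from a dotted name using `importlib`; a standard-library class stands in for a real backend so the sketch runs without SGLang.

```python
import importlib
from typing import Any, List, Optional, Tuple


def import_backend_class(dotted_name: str) -> type:
    """Resolve 'pkg.module.ClassName' to the class object at call time."""
    module_name, _, class_name = dotted_name.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a fully-qualified name, got {dotted_name!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


class BuiltinConfig:
    """Hypothetical config handled by an existing hardcoded check."""


class ExternalConfig:
    """Hypothetical config known only through the registry."""


# Hypothetical stand-in for the existing hardcoded isinstance checks.
HARDCODED: List[Tuple[type, str]] = [(BuiltinConfig, "builtin_backend")]


def select_backend(config: Any, registry: List[Tuple[type, str]]) -> Optional[Any]:
    # 1. Existing hardcoded checks run first and win on a match.
    for config_class, backend in HARDCODED:
        if isinstance(config, config_class):
            return backend
    # 2. The registry is consulted only when nothing above matched.
    for config_class, dotted_name in registry:
        if isinstance(config, config_class):
            return import_backend_class(dotted_name)
    # 3. With an empty registry this is the same result as before.
    return None


# "collections.OrderedDict" is only a placeholder for a backend class path.
external = select_backend(ExternalConfig(), [(ExternalConfig, "collections.OrderedDict")])
```

With an empty registry, `select_backend` never reaches the import step, which is the property the paragraph above relies on.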