Add Liquid Foundation Model (LFM2) #16890
Merged
Changes from all commits (16 commits)
- ef89773: Add version 0 of LFM2 (tugot17)
- 3ae717b: Add LFM2 to test_generation_models (tugot17)
- ac91a83: Add function calling (tugot17)
- de77456: Clean up the LFM2 implementation (tugot17)
- dafdb3b: LFM2: optimize the conv1d forward pass (tugot17)
- 6b3ff94: Use the optimized RMSNorm kernel (tugot17)
- a168c42: Handle the CUDA graph caching issue stemming from the default dtype (tugot17)
- 27a59cd: Add LFM2 to the tool_choice tests (tugot17)
- 493eb5b: Apply pre-commit formatting fixes (isort, black, typo fix) (tugot17)
- 123e04e: Remove unused _find_matching_bracket (tugot17)
- 97e6138: Use the model dtype for the LFM2 conv state cache (tugot17)
- b82020a: Merge branch 'main' into add-LFM2 (tugot17)
- d66681f: Merge branch 'main' into add-LFM2 (JustinTong0323)
- e6d0060: Fix LFM2 on Blackwell (SM100) GPUs (tugot17)
- e159dfe: Update the way the conv dtype is set up in Mamba2StateDType (tugot17)
- c5ee6ea: Set the conv dtype variable to match the model dtype in tests (tugot17)
New file added by this PR (102 lines):

```python
# coding=utf-8
# Copyright 2024 Liquid AI and the HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""LFM2 (Liquid Foundation Model 2) configuration"""

from typing import List, Optional

from transformers import CONFIG_MAPPING
from transformers import Lfm2Config as HFLfm2Config
from transformers.utils import logging

from sglang.srt.configs.mamba_utils import Mamba2CacheParams, Mamba2StateShape

logger = logging.get_logger(__name__)


class Lfm2Config(HFLfm2Config):
    """
    SGLang configuration for LFM2 models.

    Extends HuggingFace's Lfm2Config with hybrid model properties needed by SGLang.
    LFM2 uses a hybrid architecture mixing full attention and ShortConv layers.
    """

    @property
    def full_attention_layer_ids(self) -> List[int]:
        """Return indices of attention layers for the KV cache."""
        return [i for i, lt in enumerate(self.layer_types) if lt == "full_attention"]

    @property
    def linear_layer_ids(self) -> List[int]:
        """Return indices of conv layers for the conv state cache."""
        return [
            i for i, lt in enumerate(self.layer_types) if lt in ("conv", "short_conv")
        ]

    @property
    def mamba_chunk_size(self) -> int:
        """Return chunk size for the Mamba2 backend. LFM2 doesn't use chunking, so return 1."""
        return 1

    @property
    def mamba2_cache_params(self) -> Optional[Mamba2CacheParams]:
        """
        Get cache params for HybridReqToTokenPool initialization.

        LFM2 uses ShortConv layers with a small fixed-size cache (kernel_size - 1).
        Unlike full Mamba2 models, LFM2 only uses the conv state, not the SSM temporal state.
        """
        from sglang.srt.layers.dp_attention import get_attention_tp_size

        conv_layer_ids = self.linear_layer_ids
        if not conv_layer_ids:
            return None

        hidden_size = self.hidden_size
        # conv_L_cache in the config is the kernel size (e.g., 3)
        conv_kernel = int(self.conv_L_cache)
        L_cache = conv_kernel - 1  # actual cache size (e.g., 2 for kernel=3)

        # get_attention_tp_size() requires initialization; default to 1 if not available
        try:
            tp_size = get_attention_tp_size()
        except (AssertionError, RuntimeError):
            tp_size = 1

        # For ShortConv layers, we use a simplified Mamba2StateShape.
        # LFM2 doesn't use SSM state (state_size=0), only conv state.
        shape = Mamba2StateShape.create(
            tp_world_size=tp_size,
            intermediate_size=hidden_size,
            n_groups=1,  # ShortConv doesn't use grouping
            num_heads=1,  # ShortConv is not multi-head
            head_dim=hidden_size,  # Conv operates on the full hidden dim
            state_size=0,  # No SSM temporal state for ShortConv
            conv_kernel=conv_kernel,
        )

        # Uses the default mamba2_state_dtype(), which reads the SGLANG_MAMBA_CONV_DTYPE
        # env var (defaults to bfloat16). Set SGLANG_MAMBA_CONV_DTYPE=float16 for fp16 inference.
        return Mamba2CacheParams(
            shape=shape,
            layers=conv_layer_ids,
        )


# Override HuggingFace's Lfm2Config with our extended version.
# Cannot use .register() because lfm2 is already registered by transformers;
# directly modify the internal _extra_content dict instead.
CONFIG_MAPPING._extra_content["lfm2"] = Lfm2Config
logger.info("Registered SGLang Lfm2Config to override HuggingFace's version")
```
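The hybrid-layer bookkeeping in this config boils down to list comprehensions over `layer_types` plus the `kernel_size - 1` cache arithmetic. The sketch below is a standalone illustration (not sglang code) using a hypothetical layer layout and example kernel size; the variable values are assumptions chosen for demonstration:

```python
# Hypothetical LFM2-style layout: ShortConv layers interleaved with full attention.
layer_types = ["conv", "conv", "full_attention", "conv", "full_attention", "conv"]

# Attention layers get KV-cache slots; conv layers get conv-state cache slots.
full_attention_layer_ids = [
    i for i, lt in enumerate(layer_types) if lt == "full_attention"
]
linear_layer_ids = [
    i for i, lt in enumerate(layer_types) if lt in ("conv", "short_conv")
]

# A causal conv with kernel size k only needs the last k - 1 inputs as state
# between decode steps, hence the kernel_size - 1 cache size.
conv_kernel = 3  # example value matching the comment in the config
L_cache = conv_kernel - 1

print(full_attention_layer_ids)  # [2, 4]
print(linear_layer_ids)          # [0, 1, 3, 5]
print(L_cache)                   # 2
```

This shows why the two index sets partition the layers: each layer is either an attention layer (KV cache) or a conv layer (tiny fixed-size state cache), which is what lets HybridReqToTokenPool allocate the two cache kinds separately.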
I think we need to refactor this later, but it is OK for the current PR. Mixing ShortConv-only models in with Mamba models is tricky here. cc @ispobock @hebiao064
Yes, we need to do some refactoring later.