Merged
Commits
34 commits
c8b0c83
Add support for sliding window mechanism in `CohereAttention` for v2 …
ljw-mc Jan 2, 2026
06926f6
Merge branch 'sgl-project:main' into ljw-mc/support_cohere2
ljw-mc Jan 6, 2026
b5ab1a5
Merge branch 'main' into ljw-mc/support_cohere2
ljw-mc Jan 12, 2026
9a8e735
add `logit_scale` attribute from Cohere2
ljw-mc Jan 12, 2026
7423bf1
Merge branch 'ljw-mc/support_cohere1' of https://github.com/ljw-mc/sg…
ljw-mc Jan 12, 2026
21ed2ff
make conditions for instantiating `self.sliding_window_size` more rea…
ljw-mc Jan 13, 2026
27b8fd6
Merge branch 'sgl-project:main' into ljw-mc/support_cohere2
ljw-mc Jan 13, 2026
28ac624
make it clear there is v1 and v2
ljw-mc Jan 13, 2026
02cda49
Merge branch 'ljw-mc/support_cohere2' of https://github.com/ljw-mc/sg…
ljw-mc Jan 13, 2026
7243ce5
Merge branch 'sgl-project:main' into ljw-mc/support_cohere2
ljw-mc Jan 13, 2026
fe7e5bb
comments about the RoPE conditions
ljw-mc Jan 13, 2026
544cbff
Merge branch 'main' into ljw-mc/support_cohere2
ljw-mc Jan 14, 2026
0847c04
precommit
ljw-mc Jan 14, 2026
e11cb20
Merge branch 'ljw-mc/support_cohere2' of https://github.com/ljw-mc/sg…
ljw-mc Jan 14, 2026
a6ccf25
Merge branch 'sgl-project:main' into ljw-mc/support_cohere2
ljw-mc Jan 14, 2026
b3c457e
Merge branch 'sgl-project:main' into ljw-mc/support_cohere2
ljw-mc Jan 16, 2026
db03abd
Merge branch 'main' into ljw-mc/support_cohere2
ljw-mc Jan 16, 2026
2469767
add model to docs
ljw-mc Jan 16, 2026
2f75792
Merge branch 'ljw-mc/support_cohere2' of https://github.com/ljw-mc/sg…
ljw-mc Jan 16, 2026
c9f1d68
docs upd
ljw-mc Jan 16, 2026
400d2b5
docs upd
ljw-mc Jan 16, 2026
be1f77f
docs upd
ljw-mc Jan 16, 2026
2760bcd
docs upd
ljw-mc Jan 16, 2026
bdce48b
docs upd
ljw-mc Jan 16, 2026
534a2a3
add command-r command-a to docs
ljw-mc Jan 16, 2026
c9bbcda
Merge branch 'main' into ljw-mc/support_cohere2
ljw-mc Jan 17, 2026
9e8d9f6
Merge branch 'main' into ljw-mc/support_cohere2
ljw-mc Jan 17, 2026
6d94ea8
Merge branch 'main' into ljw-mc/support_cohere2
ljw-mc Jan 17, 2026
3e93328
Merge branch 'main' into ljw-mc/support_cohere2
ljw-mc Jan 18, 2026
084a2a8
Merge branch 'main' into ljw-mc/support_cohere2
ljw-mc Jan 18, 2026
f4a8fd5
Merge branch 'main' into ljw-mc/support_cohere2
ljw-mc Jan 20, 2026
e51c81f
Merge branch 'main' into ljw-mc/support_cohere2
ljw-mc Jan 20, 2026
a9fef41
Merge branch 'main' into ljw-mc/support_cohere2
ljw-mc Jan 21, 2026
d08e22c
Merge branch 'main' into ljw-mc/support_cohere2
ljw-mc Jan 21, 2026
4 changes: 2 additions & 2 deletions docs/supported_models/generative_models.md
@@ -39,7 +39,7 @@ in the GitHub search bar.
 | **OLMoE** (Open MoE) | `allenai/OLMoE-1B-7B-0924` | Allen AI’s open Mixture-of-Experts model (7B total, 1B active parameters) delivering state-of-the-art results with sparse expert activation. |
 | **MiniMax-M2** (M2, M2.1) | `minimax/MiniMax-M2`, `minimax/MiniMax-M2.1` | MiniMax’s SOTA LLM for coding & agentic workflows. |
 | **StableLM** (3B, 7B) | `stabilityai/stablelm-tuned-alpha-7b` | StabilityAI’s early open-source LLM (3B & 7B) for general text generation; a demonstration model with basic instruction-following ability. |
-| **Command-R** (Cohere) | `CohereForAI/c4ai-command-r-v01` | Cohere’s open conversational LLM (Command series) optimized for long context, retrieval-augmented generation, and tool use. |
+| **Command-(R,A)** (Cohere) | `CohereLabs/c4ai-command-r-v01`, `CohereLabs/c4ai-command-r7b-12-2024`, `CohereLabs/c4ai-command-a-03-2025` | Cohere’s open conversational LLM (Command series) optimized for long context, retrieval-augmented generation, and tool use. |
 | **DBRX** (Databricks) | `databricks/dbrx-instruct` | Databricks’ 132B-parameter MoE model (36B active) trained on 12T tokens; competes with GPT-3.5 quality as a fully open foundation model. |
 | **Grok** (xAI) | `xai-org/grok-1` | xAI’s grok-1 model known for vast size(314B parameters) and high quality; integrated in SGLang for high-performance inference. |
 | **ChatGLM** (GLM-130B family) | `THUDM/chatglm2-6b` | Zhipu AI’s bilingual chat model (6B) excelling at Chinese-English dialogue; fine-tuned for conversational quality and alignment. |
@@ -64,4 +64,4 @@ in the GitHub search bar.
 | **NVIDIA Nemotron Nano 2.0** | `nvidia/NVIDIA-Nemotron-Nano-9B-v2` | The [NVIDIA Nemotron](https://www.nvidia.com/en-us/ai-data-science/foundation-models/nemotron/) family of multimodal models provides state-of-the-art reasoning models specifically designed for enterprise-ready AI agents. `Nemotron-Nano-9B-v2` is a hybrid Mamba-Transformer language model designed to increase throughput for reasoning workloads while achieving state-of-the-art accuracy compared to similarly-sized models. |
 | **StarCoder2** (3B-15B) | `bigcode/starcoder2-7b` | StarCoder2 is a family of open large language models (LLMs) specialized for code generation and understanding. It is the successor to StarCoder, jointly developed by the BigCode project (a collaboration between Hugging Face, ServiceNow Research, and other contributors). |
 | **Jet-Nemotron** | `jet-ai/Jet-Nemotron-2B` | Jet-Nemotron is a new family of hybrid-architecture language models that surpass state-of-the-art open-source full-attention language models, while achieving significant efficiency gains. |
-| **Trinity** (Nano, Mini) | `arcee-ai/Trinity-Mini` | Arcee's foundational MoE Trinity family of models, open weights under Apache 2.0. |
\ No newline at end of file
+| **Trinity** (Nano, Mini) | `arcee-ai/Trinity-Mini` | Arcee's foundational MoE Trinity family of models, open weights under Apache 2.0. |
20 changes: 17 additions & 3 deletions python/sglang/srt/models/commandr.py
@@ -43,7 +43,7 @@
 import torch.utils.checkpoint
 from torch import nn
 from torch.nn.parameter import Parameter
-from transformers import PretrainedConfig
+from transformers import Cohere2Config, CohereConfig, PretrainedConfig

 from sglang.srt.distributed import (
     get_tensor_model_parallel_rank,
@@ -198,12 +198,23 @@ def __init__(
             rope_scaling=self.rope_scaling,
             is_neox_style=False,
         )
+
+        self.v1 = isinstance(config, CohereConfig)
+        self.v2 = isinstance(config, Cohere2Config)
+
+        # Model v2 has interleaved sliding windows, v1 does not
+        if self.v2 and config.layer_types[layer_id] == "sliding_attention":
+            self.sliding_window_size = config.sliding_window
+        else:
+            self.sliding_window_size = -1
+
         self.attn = RadixAttention(
             self.num_heads,
             self.head_dim,
             self.scaling,
             num_kv_heads=self.num_kv_heads,
             layer_id=layer_id,
+            sliding_window_size=self.sliding_window_size,
             quant_config=quant_config,
             prefix=add_prefix("attn", prefix),
         )
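The per-layer branch above keys off `config.layer_types`, which v2-style configs use to interleave sliding-window and full-attention layers. A minimal standalone sketch of that selection (`select_sliding_window` and the `SimpleNamespace` stand-in config are hypothetical; the real check uses `isinstance(config, Cohere2Config)` rather than probing for the attribute):

```python
from types import SimpleNamespace

def select_sliding_window(config, layer_id: int) -> int:
    # v2-style configs tag each layer "sliding_attention" or
    # "full_attention"; -1 disables the window for that layer.
    if getattr(config, "layer_types", None) is None:
        return -1  # v1-style config: no interleaved sliding windows
    if config.layer_types[layer_id] == "sliding_attention":
        return config.sliding_window
    return -1

# Hypothetical v2-style config: three sliding-window layers,
# then one global (full-attention) layer.
cfg = SimpleNamespace(
    sliding_window=4096,
    layer_types=["sliding_attention"] * 3 + ["full_attention"],
)
windows = [select_sliding_window(cfg, i) for i in range(4)]
print(windows)  # [4096, 4096, 4096, -1]
```

`RadixAttention` then receives `-1` to mean "no window", so full-attention layers fall through to the default dense path.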
@@ -235,7 +246,9 @@ def forward(
         q, k, v = qkv.split([self.q_size, self.kv_size, self.kv_size], dim=-1)
         if self.use_qk_norm:
             q, k = self._apply_qk_norm(q, k)
-        q, k = self.rotary_emb(positions, q, k)
+        # Model v1 uses RoPE throughout, Model v2 uses RoPE only for SWA layers
+        if self.v1 or self.sliding_window_size > 0:
+            q, k = self.rotary_emb(positions, q, k)
         attn_output = self.attn(q, k, v, forward_batch)
         output, _ = self.o_proj(attn_output)
         return output
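The guard above encodes one policy: v1 applies rotary embeddings on every layer, while v2 applies them only on sliding-window layers and leaves its global-attention layers position-free. A truth-table sketch (`applies_rope` is a hypothetical helper, not part of the patch):

```python
def applies_rope(is_v1: bool, sliding_window_size: int) -> bool:
    # Mirrors `if self.v1 or self.sliding_window_size > 0`:
    # v1 -> always RoPE; v2 -> RoPE only where a window is active.
    return is_v1 or sliding_window_size > 0

print(applies_rope(True, -1))     # True  (v1 layer, no window)
print(applies_rope(False, 4096))  # True  (v2 sliding-window layer)
print(applies_rope(False, -1))    # False (v2 global-attention layer)
```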
@@ -348,7 +361,8 @@ def __init__(
         super().__init__()
         self.config = config
         self.quant_config = quant_config
-        self.logits_processor = LogitsProcessor(config)
+        self.logit_scale = getattr(config, "logit_scale", None)
+        self.logits_processor = LogitsProcessor(config, logit_scale=self.logit_scale)
         self.model = CohereModel(
             config, quant_config, prefix=add_prefix("model", prefix)
         )
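Forwarding `logit_scale` matters because Cohere models multiply the final tied-embedding logits by that config value (Hugging Face's `CohereConfig` documents a small default, e.g. 0.0625). In effect the processor computes `logits = logit_scale * (h @ E^T)`. A toy sketch of just the scaling step, with a hypothetical `scaled_logits` helper and a made-up 3-token vocabulary:

```python
def scaled_logits(hidden, embedding, logit_scale):
    # logits[v] = logit_scale * dot(hidden, embedding[v]); the tied
    # embedding matrix doubles as the LM head, and the scale tempers
    # its output before softmax.
    return [
        logit_scale * sum(h * e for h, e in zip(hidden, row))
        for row in embedding
    ]

hidden = [1.0, 2.0]                                 # final hidden state
embedding = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # 3-token toy vocab
print(scaled_logits(hidden, embedding, 0.25))       # [0.25, 0.5, 0.75]
```

Passing `None` when the attribute is absent lets `LogitsProcessor` fall back to its default (unscaled) behavior for non-Cohere configs.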