[new-model] Add support for Cohere2ForCausalLM behind Command-A and Command-R Models#16927

Merged
Kangyan-Zhou merged 34 commits into sgl-project:main from ljw-mc:ljw-mc/support_cohere2
Jan 21, 2026

Conversation

@ljw-mc
Contributor

@ljw-mc ljw-mc commented Jan 12, 2026

Motivation

This PR addresses #4570 and provides support for the Cohere2ForCausalLM architecture behind Cohere's Command-A and Command-R7B model variants for SGLang.

Background

Compared to CohereForCausalLM, Cohere2ForCausalLM uses a hybrid attention approach:

  • Alternates between 3 sliding-window attention (SWA) layers and 1 global attention layer.
  • RoPE is applied only to the SWA layers; the global attention layers use NoPE (no positional embedding).

For more details, refer to the original Command-A paper by Cohere.
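The alternating layout above can be sketched as follows. This is an illustrative sketch, not the actual SGLang implementation; the names `SLIDING_WINDOW_PATTERN` and `uses_sliding_window` are assumptions, and the pattern length of 4 follows from the "3 SWA + 1 global" description above.

```python
# Hypothetical sketch of Cohere2's hybrid attention layout: every 4th layer
# uses global attention (NoPE); the other three use sliding-window attention
# (SWA) with RoPE applied.

SLIDING_WINDOW_PATTERN = 4  # 3 SWA layers followed by 1 global layer

def uses_sliding_window(layer_id: int) -> bool:
    """Return True if this layer uses SWA (and therefore RoPE)."""
    return (layer_id + 1) % SLIDING_WINDOW_PATTERN != 0

# Layers 0-2 are SWA, layer 3 is global, and the pattern repeats.
layout = ["SWA" if uses_sliding_window(i) else "global" for i in range(8)]
print(layout)
# → ['SWA', 'SWA', 'SWA', 'global', 'SWA', 'SWA', 'SWA', 'global']
```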

Benchmarking and Profiling

Benchmarked on MMLU using the following commands:

python3 -m sglang.launch_server \
    --model-path CohereLabs/c4ai-command-r7b-12-2024 \
    --trust-remote-code

python3 bench_sglang.py 

SGLang model accuracy (66.1) vs. the accuracy reported in the Cohere paper (65.2) for Command R7B:

mmlu:
Total latency: 133.511   (on an L40 GPU)
Average accuracy: 66.1   (Cohere official paper: 65.2)


@gemini-code-assist
Contributor

Summary of Changes

Hello @ljw-mc, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request introduces comprehensive support for Cohere's second-generation models, specifically the Cohere2ForCausalLM architecture, which encompasses models like Command-A and Command-R. The core changes involve adapting the attention mechanism to correctly utilize Cohere2's sliding window attention (SWA) by integrating a sliding_window_size parameter. Additionally, the application of rotary embeddings has been refined to ensure proper behavior and compatibility across both Cohere v1 and v2 models, particularly in the context of SWA layers.

Highlights

  • New Model Support: Added support for the Cohere2ForCausalLM architecture, enabling compatibility with Cohere's Command-A and Command-R model variants.
  • Sliding Window Attention (SWA) Integration: Implemented the sliding_window_size attribute within CohereAttention to properly handle and support Sliding Window Attention (SWA) in Cohere2 models.
  • Conditional Rotary Embeddings: Modified the application of rotary embeddings to be conditional, applying them only for Cohere v1 models or Cohere2 layers that utilize a sliding window attention mechanism.
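The conditional rotary-embedding behavior from the last highlight can be sketched as below. This is a minimal illustration under stated assumptions, not the PR's actual code: the function name `apply_rope_if_needed` and its parameters are hypothetical, and the condition mirrors the description above (RoPE for Cohere v1, or for Cohere2 layers that have a sliding window; NoPE otherwise).

```python
# Hypothetical sketch: apply RoPE only for Cohere v1, or for Cohere2 layers
# that use sliding-window attention. Cohere2 global-attention layers have no
# sliding window and therefore skip rotation (NoPE).

def apply_rope_if_needed(version, sliding_window_size, q, k, rotate):
    """Apply `rotate` to (q, k) only when the layer calls for RoPE."""
    use_rope = version == 1 or sliding_window_size is not None
    if use_rope:
        q, k = rotate(q, k)
    return q, k

# Example: a Cohere2 global layer (sliding_window_size=None) leaves q/k as-is.
q, k = apply_rope_if_needed(2, None, "q", "k", lambda a, b: ("rq", "rk"))
print(q, k)  # → q k
```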


@gemini-code-assist gemini-code-assist bot left a comment
Code Review

This pull request adds support for the Cohere2ForCausalLM architecture, which is used by Cohere's Command-A and Command-R models. The changes correctly identify the model version and configure the sliding window attention for Cohere2 models. However, I've found a critical issue in how the layer type is determined, which could lead to an IndexError at runtime. My review includes a suggested fix for this issue.

@ljw-mc ljw-mc changed the title [new-model] Adds support for arch Cohere2ForCausalLM [new-model] Add support for Cohere2ForCausalLM behind Command-A and Command-R Models Jan 14, 2026
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Jan 16, 2026
@ljw-mc ljw-mc marked this pull request as ready for review January 16, 2026 23:23
@JustinTong0323 JustinTong0323 self-assigned this Jan 16, 2026
@JustinTong0323
Collaborator

/tag-and-rerun-ci

@JustinTong0323
Collaborator

Needed to cherry-pick PR #17236 (dfc03db) to launch the server; the test result looks reasonable.

mmlu:
Total latency: 35.161
Average accuracy: 0.660

@JustinTong0323
Collaborator

/tag-and-rerun-ci

@Kangyan-Zhou Kangyan-Zhou merged commit cf89351 into sgl-project:main Jan 21, 2026
90 of 104 checks passed

Labels

documentation — Improvements or additions to documentation
run-ci
