Conversation

@jacobkahn
Contributor

Adds the Code World Model (CWM) - https://ai.meta.com/research/publications/cwm-an-open-weights-llm-for-research-on-code-generation-with-world-models/

High-level implementation details:

  • This is a GQA model with interleaved local/global sliding-window attention
  • Built on the HF Llama3 implementation plus interleaved sliding-window attention (a minimal sketch of the local/global masking follows this list)
  • Inheriting from Gemma2/3 would require weight remapping, which breaks VLLM compatibility and other components, so this is implemented with the existing causal mask utils from HF instead
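
As a rough illustration of the local/global pattern (plain PyTorch, not the code added in this PR; the `layer_types` values and window size below are placeholders, not the released CWM configuration):

```python
import torch

def build_causal_mask(seq_len: int, window: int | None = None) -> torch.Tensor:
    """Boolean mask where True means the query may attend to the key.

    window=None -> full causal attention (global layer)
    window=W    -> causal attention restricted to the last W keys (local layer)
    """
    positions = torch.arange(seq_len)
    mask = positions[None, :] <= positions[:, None]                # causal constraint
    if window is not None:
        mask &= positions[:, None] - positions[None, :] < window   # sliding-window constraint
    return mask

# Placeholder interleaving pattern and window size, for illustration only.
layer_types = ["sliding_attention", "sliding_attention", "sliding_attention", "full_attention"]
sliding_window = 8192

per_layer_masks = [
    build_causal_mask(seq_len=16, window=sliding_window if t == "sliding_attention" else None)
    for t in layer_types
]
```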

The model repos are:

Note that, for VLLM compatibility, the model config.json files still refer to Llama3ForCausalLM and a llama model_type (see example). vllm-project/vllm#25611 adds support for mapping CwmForCausalLM to the Llama3 model class in VLLM, since VLLM already supports Llama3 + layer_types with local/global attention (see docs). The model_type in config.json will be updated on HF (and the special automapping condition removed) once this PR is merged and a Transformers release containing the CwmForCausalLM model class has shipped.
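
As a hedged sketch of what this interim setup means when loading the checkpoints from Python (the repo id below is a placeholder, and the comments restate the PR description rather than verified behavior):

```python
from transformers import AutoConfig, AutoModelForCausalLM

repo_id = "org/cwm-checkpoint"  # placeholder repo id, not an actual checkpoint name

config = AutoConfig.from_pretrained(repo_id)
# Until the Hub config.json is updated, model_type resolves to "llama" so that
# VLLM keeps routing the checkpoint through its Llama3 code path.
print(config.model_type, getattr(config, "layer_types", None))

# Once a Transformers release ships CwmForCausalLM and the config is updated,
# the same call dispatches to the new class instead.
model = AutoModelForCausalLM.from_pretrained(repo_id)
```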

@ArthurZucker, @zucchini-nlp

Supersedes #41188 due to some fork misery

Member

@zucchini-nlp zucchini-nlp left a comment

Thanks, left some comments to clean up. Btw, do we already have converted weights that we can use for the integration tests?

@zucchini-nlp
Member

Can we update the slow test ids? Then I will trigger the slow CI. Overall LGTM and we can merge.
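
For reference, the rough shape such a slow test takes (hypothetical sketch: the repo id, prompt, and reference values are placeholders, not the ids or values used in the merged tests):

```python
import torch

from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.testing_utils import require_torch, slow


@slow
@require_torch
def test_cwm_token_level_logits():
    repo_id = "org/cwm-checkpoint"  # placeholder: the "slow test id" pointing at the converted weights
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

    inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # Per the commit history, the slow tests simply compare logits against stored references.
    expected_slice = torch.zeros(5, dtype=torch.bfloat16)  # placeholder reference values
    torch.testing.assert_close(logits[0, -1, :5].cpu(), expected_slice, rtol=1e-2, atol=1e-2)
```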

@zucchini-nlp
Member

run-slow: cwm

@github-actions
Contributor

github-actions bot commented Oct 1, 2025

This comment contains run-slow, running the specified jobs:

models: ['models/cwm']
quantizations: [] ...

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@ArthurZucker ArthurZucker left a comment

LGTM, I think you can completely inherit from Qwen2Model though
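
(For context, a hypothetical sketch of what inheriting from Qwen2Model in a modular_cwm.py could look like; the class bodies and import paths are illustrative, and the merged PR may be organized differently.)

```python
# Hypothetical modular-style definition, not the code merged in this PR.
from transformers.models.llama.configuration_llama import LlamaConfig
from transformers.models.qwen2.modeling_qwen2 import Qwen2ForCausalLM, Qwen2Model


class CwmConfig(LlamaConfig):
    model_type = "cwm"


class CwmModel(Qwen2Model):
    pass


class CwmForCausalLM(Qwen2ForCausalLM):
    pass
```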

Collaborator

@ArthurZucker ArthurZucker left a comment

Let's go! 🤗

@github-actions
Copy link
Contributor

github-actions bot commented Oct 8, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, cwm

@ArthurZucker ArthurZucker enabled auto-merge (squash) October 9, 2025 15:01
auto-merge was automatically disabled October 9, 2025 15:30

Pull Request is not mergeable

@ArthurZucker ArthurZucker merged commit 0eae41a into huggingface:main Oct 9, 2025
25 checks passed
AhnJoonSung pushed a commit to AhnJoonSung/transformers that referenced this pull request Oct 12, 2025
* [wip][cwm] Code World Model stubs and setup in HF Transformers

* [wip] Get other things working

* [wip] Working

* Tokenizer pad

* fix: cwm window attn

* temp remove test

* temp remove test

* Fixes

* Temporarily add auto config remapping option until VLLM 0.11 is out

* Fix model type and add layer validation

* Lint, remove CwmForSequenceClassification

* Lint, tests

* Remove CwmForSequenceClassification

* Lint

* Remove intermediary layer exports/doc errors, fix tests

* Lint

* run python utils/sort_auto_mappings.py --check_only

* Remove Cwm processor mapping, get check_repo passing

* Remove CwmTextConfig from test

* Add docstring for CwmConfig

* remove global_window and window_pattern params from config

* Fix docstrings

* Revert change to auto docstring util

* lint

* Fixes minus test improvements

* Alter tests to simply check logits

* lint

* Have slow tests use repo, make CwmPretrainedModel passthrough

* Remove decoder layer implementation, use Llama3Decoder + CwmAttention

* Use linear w/o bias for CwmAttention, add token-level integration test

* Don't ignore config attention bias

* Remove attention bias parameter entirely from config

---------

Co-authored-by: galco <[email protected]>