Conversation

@jacobkahn
Contributor

Adds the Code World Model (CWM) - https://ai.meta.com/research/publications/cwm-an-open-weights-llm-for-research-on-code-generation-with-world-models/

High-level implementation details:

  • This is a GQA model with interleaved local/global sliding-window attention
  • Built on the HF Llama3 implementation plus interleaved sliding-window attention (a minimal sketch of the local/global masking follows this list)
  • Inheriting from Gemma2/3 would require weight remapping, which breaks VLLM compatibility and other components, so this is implemented with the existing causal mask utils from HF instead
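
As a rough illustration of the local/global pattern (plain PyTorch, not the code added in this PR; the `layer_types` values and window size below are placeholders, not the released CWM configuration):

```python
import torch

def build_causal_mask(seq_len: int, window: int | None = None) -> torch.Tensor:
    """Boolean mask where True means the query may attend to the key.

    window=None -> full causal attention (global layer)
    window=W    -> causal attention restricted to the last W keys (local layer)
    """
    positions = torch.arange(seq_len)
    mask = positions[None, :] <= positions[:, None]                # causal constraint
    if window is not None:
        mask &= positions[:, None] - positions[None, :] < window   # sliding-window constraint
    return mask

# Placeholder interleaving pattern and window size, for illustration only.
layer_types = ["sliding_attention", "sliding_attention", "sliding_attention", "full_attention"]
sliding_window = 8192

per_layer_masks = [
    build_causal_mask(seq_len=16, window=sliding_window if t == "sliding_attention" else None)
    for t in layer_types
]
```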

The model repos are:

Note that, for VLLM compatibility, the model config.json files still refer to Llama3ForCausalLM and a llama model_type (see example). vllm-project/vllm#25611 adds support for mapping CwmForCausalLM to the Llama3 model class in VLLM, since VLLM already supports Llama3 + layer_types with local/global attention (see docs). The model_type in config.json will be updated on HF (and the special automapping condition removed) once this PR is merged and a Transformers release containing the CwmForCausalLM model class has shipped.
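
As a hedged sketch of what this interim setup means when loading the checkpoints from Python (the repo id below is a placeholder, and the comments restate the PR description rather than verified behavior):

```python
from transformers import AutoConfig, AutoModelForCausalLM

repo_id = "org/cwm-checkpoint"  # placeholder repo id, not an actual checkpoint name

config = AutoConfig.from_pretrained(repo_id)
# Until the Hub config.json is updated, model_type resolves to "llama" so that
# VLLM keeps routing the checkpoint through its Llama3 code path.
print(config.model_type, getattr(config, "layer_types", None))

# Once a Transformers release ships CwmForCausalLM and the config is updated,
# the same call dispatches to the new class instead.
model = AutoModelForCausalLM.from_pretrained(repo_id)
```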

@ArthurZucker, @zucchini-nlp

Supersedes #41188 due to some fork misery

Member

@zucchini-nlp zucchini-nlp left a comment

Thanks, left some comments to clean up. Btw, do we already have converted weights that we can use for the integration tests?

@zucchini-nlp
Member

Can we update the slow test ids? Then I will trigger the slow CI. Overall LGTM and we can merge.
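
For reference, the rough shape such a slow test takes (hypothetical sketch: the repo id, prompt, and reference values are placeholders, not the ids or values used in the merged tests):

```python
import torch

from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.testing_utils import require_torch, slow


@slow
@require_torch
def test_cwm_token_level_logits():
    repo_id = "org/cwm-checkpoint"  # placeholder: the "slow test id" pointing at the converted weights
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

    inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # Per the commit history, the slow tests simply compare logits against stored references.
    expected_slice = torch.zeros(5, dtype=torch.bfloat16)  # placeholder reference values
    torch.testing.assert_close(logits[0, -1, :5].cpu(), expected_slice, rtol=1e-2, atol=1e-2)
```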

@zucchini-nlp
Member

run-slow: cwm

@github-actions
Contributor

github-actions bot commented Oct 1, 2025

This comment contains run-slow, running the specified jobs:

models: ['models/cwm']
quantizations: [] ...

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@ArthurZucker ArthurZucker left a comment

LGTM, I think you can completely inherit from Qwen2Model though
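
(For context, a hypothetical sketch of what inheriting from Qwen2Model in a modular_cwm.py could look like; the class bodies and import paths are illustrative, and the merged PR may be organized differently.)

```python
# Hypothetical modular-style definition, not the code merged in this PR.
from transformers.models.llama.configuration_llama import LlamaConfig
from transformers.models.qwen2.modeling_qwen2 import Qwen2ForCausalLM, Qwen2Model


class CwmConfig(LlamaConfig):
    model_type = "cwm"


class CwmModel(Qwen2Model):
    pass


class CwmForCausalLM(Qwen2ForCausalLM):
    pass
```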

Collaborator

@ArthurZucker ArthurZucker left a comment

Let's go! 🤗

@github-actions
Copy link
Contributor

github-actions bot commented Oct 8, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, cwm

@ArthurZucker ArthurZucker enabled auto-merge (squash) October 9, 2025 15:01
auto-merge was automatically disabled October 9, 2025 15:30

Pull Request is not mergeable

@ArthurZucker ArthurZucker merged commit 0eae41a into huggingface:main Oct 9, 2025
25 checks passed
AhnJoonSung pushed a commit to AhnJoonSung/transformers that referenced this pull request Oct 12, 2025
* [wip][cwm] Code World Model stubs and setup in HF Transformers

* [wip] Get other things working

* [wip] Working

* Tokenizer pad

* fix: cwm window attn

* temp remove test

* temp remove test

* Fixes

* Temporarily add auto config remapping option until VLLM 0.11 is out

* Fix model type and add layer validation

* Lint, remove CwmForSequenceClassification

* Lint, tests

* Remove CwmForSequenceClassification

* Lint

* Remove intermediary layer exports/doc errors, fix tests

* Lint

* run python utils/sort_auto_mappings.py --check_only

* Remove Cwm processor mapping, get check_repo passing

* Remove CwmTextConfig from test

* Add docstring for CwmConfig

* remove global_window and window_pattern params from config

* Fix docstrings

* Revert change to auto docstring util

* lint

* Fixes minus test improvements

* Alter tests to simply check logits

* lint

* Have slow tests use repo, make CwmPretrainedModel passthrough

* Remove decoder layer implementation, use Llama3Decoder + CwmAttention

* Use linear w/o bias for CwmAttention, add token-level integration test

* Don't ignore config attention bias

* Remove attention bias parameter entirely from config

---------

Co-authored-by: galco <[email protected]>