[BREAKING][model, data] feat: add support for Mistral3 #2338
diqiuzhuanzhuan wants to merge 7 commits into verl-project:main
min_num_params: 0

# list of transformer layer classes to wrap with FSDP
transformer_layer_cls_to_wrap: []
This configuration does not seem to take effect in the code?
That's right!
@@ -0,0 +1,155 @@
# Copyright 2024 Bytedance Ltd. and/or its affiliates
Thanks for the contribution. Currently, when adding a new model, the adaptors are scattered around and many files are touched. I am thinking about reorganizing the files that need to change so that we have one folder per model. For instance:
verl/models/transformers/llama
verl/models/transformers/qwen2_5_vl
verl/models/transformers/qwen2
verl/models/transformers/[model_name] # model name should be the same as the one in https://github.com/huggingface/transformers/tree/main/src/transformers/models
In each model folder, the structure would look like this (taking mistral3 as the example):
mistral3_collate_utils.py
mistral3_flops_counter.py
mistral3_any_other_change_required.py
What do you think? cc @hiyouga @Fazziekey
BTW, I am also not sure we want verl/models/transformers and verl/models/mcore as two folders that both contain model-specific code. Maybe model-related code should live at this level instead:
verl/models/transformers # common registry utils. No model-specific code
verl/models/mcore # common registry utils specific to mcore. No model-specific code
verl/models/llama
verl/models/qwen2_5_vl
verl/models/qwen2
verl/models/[model_name]
@ISEEKYAN what do you think?
Similarly, tests can be standardized:
tests/models/test_llama.py
tests/models/test_[model].py
With a better code structure, it will be easier to write onboarding documentation for new models and let the community add new SOTA models.
I think a unified model unit test like tests/models/test_[model].py makes sense once the unified training engine APIs have been refactored.
For Megatron, LLMs of different architectures share the same GPTModel API. Supporting a new model mostly takes config mapping, weight mapping, and perhaps a few patches. VLMs need more definition files because development of LLaVaModel is slow.
Mbridge/megatron-hub is the official solution for supporting new Megatron models. We recommend retiring verl/models/mcore once the code is fully migrated to mbridge, and using verl/models/[model_name] for transformers.
If we need to define a Megatron model in verl anyway, the solution is to inherit from LLMBridge for that model, as slime does. That code would live inside the directory verl/models/[model_name].

What does this PR do?
This PR adds support for Mistral3. It also introduces a registry-based mechanism for managing dataset collate functions, which lets the batch collation logic be selected by name and supports both the default collation and PixtralProcessor-specific collation.
Checklist Before Starting
[{modules}] {type}: {description} (This will be checked by the CI)
{modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, like [megatron, fsdp, doc]
{type} is in feat, fix, refactor, chore, test
[BREAKING] to the beginning of the title. [BREAKING][fsdp, megatron] feat: dynamic batching
Test
wandb log: https://wandb.ai/diqiuzhuanzhuan/verl_grpo_example_geo3k/runs/84fvuv9s
API and Usage Example
Usage Example (One node with 8 A800/A100 GPUs)
Add 'get_collate_fn_manager_cls' in verl/utils/dataset/dataset_utils.py and use it to get the 'collate_fn' specific to each model.
In main_ppo.py,
High-Level Design
Design and Purpose of get_collate_fn_manager_cls
The get_collate_fn_manager_cls function, together with the register_collate_fn decorator and the COLLATE_FN_MANAGER_REGISTRY dictionary, provides a flexible and extensible mechanism for managing and retrieving collate functions in the dataset pipeline.
Key Points:
Registration Mechanism
The register_collate_fn decorator allows different collate functions to be registered under a unique string key (e.g., "default", "PixtralProcessor"). This enables easy extension and modularization of batch collation logic for different data formats or model requirements.
Centralized Retrieval
The get_collate_fn_manager_cls(name) function serves as a unified interface to retrieve the appropriate collate function by name. If the requested name is not registered, it falls back to the "default" collate function, ensuring robustness.
Extensibility
New collate functions can be added simply by defining them and registering with the decorator, without modifying the main data pipeline logic. This design supports future expansion for new models or data modalities.
Maintainability
By decoupling collate function registration and retrieval, the codebase becomes easier to maintain and reason about, especially as the number of supported data types grows.
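The registration and fallback behaviour described in the key points can be sketched as follows. This is a minimal illustration, not the code from the PR: only the names register_collate_fn, get_collate_fn_manager_cls, COLLATE_FN_MANAGER_REGISTRY and the keys "default" and "PixtralProcessor" come from the description, while the function bodies, the sample batch layout, and the "collated_by" marker are assumptions made for the example.

```python
from typing import Any, Callable, Dict, List

Batch = List[Dict[str, Any]]
CollateFn = Callable[[Batch], Dict[str, Any]]

# Maps a string key to a collate function.
COLLATE_FN_MANAGER_REGISTRY: Dict[str, CollateFn] = {}


def register_collate_fn(name: str) -> Callable[[CollateFn], CollateFn]:
    """Decorator that stores the decorated function under `name`."""

    def decorator(fn: CollateFn) -> CollateFn:
        COLLATE_FN_MANAGER_REGISTRY[name] = fn
        return fn

    return decorator


def get_collate_fn_manager_cls(name: str) -> CollateFn:
    """Return the collate function for `name`, or the "default" one if unknown."""
    return COLLATE_FN_MANAGER_REGISTRY.get(name, COLLATE_FN_MANAGER_REGISTRY["default"])


@register_collate_fn("default")
def default_collate_fn(batch: Batch) -> Dict[str, Any]:
    # Assumed behaviour: gather each field of the samples into a list.
    out: Dict[str, Any] = {}
    for sample in batch:
        for key, value in sample.items():
            out.setdefault(key, []).append(value)
    return out


@register_collate_fn("PixtralProcessor")
def pixtral_collate_fn(batch: Batch) -> Dict[str, Any]:
    # Placeholder for processor-specific handling; the real logic is not shown in the PR text.
    out = default_collate_fn(batch)
    out["collated_by"] = "PixtralProcessor"  # hypothetical marker for illustration only
    return out


collate_fn = get_collate_fn_manager_cls("SomeUnregisteredProcessor")
batch = collate_fn([{"input_ids": [1, 2]}, {"input_ids": [3]}])
```

In this sketch, adding support for another processor only takes one more decorated function; the lookup code stays unchanged.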
Example Usage
Design Summary
Specific Changes
Add 'transformer_layer_cls_to_wrap' to fsdp_config.wrap_policy
Add 'model_dtype' to fsdp_config.
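One way a list such as transformer_layer_cls_to_wrap could be consumed is sketched below in plain Python. It assumes the entries are class names matched against each module's type name; that matching rule, the helper make_layer_wrap_predicate, and the stand-in classes are hypothetical and are not taken from the PR. How an empty list is treated by the actual code is not stated in the description; in this sketch it simply matches nothing.

```python
from typing import Callable, Iterable


def make_layer_wrap_predicate(layer_cls_names: Iterable[str]) -> Callable[[object], bool]:
    """Build a predicate that is True for modules whose class name is listed."""
    names = frozenset(layer_cls_names)

    def should_wrap(module: object) -> bool:
        return type(module).__name__ in names

    return should_wrap


# Stand-in classes for illustration; real layers would be torch.nn.Module subclasses.
class MistralDecoderLayer:
    pass


class Embedding:
    pass


should_wrap = make_layer_wrap_predicate(["MistralDecoderLayer"])
wrapped = [type(m).__name__ for m in (MistralDecoderLayer(), Embedding()) if should_wrap(m)]
```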
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
ci-request channel in the verl Slack workspace.