[FEAT] Add support for more VLMs (Gemma3) #1613
### Checklist Before Starting

- [x] Search for similar PR(s). Some code will conflict with PR #1613.

### What does this PR do?

Add initial support for Kimi-VL; add a sequence-parallel (sp) patch for Kimi-VL.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

- Minor changes for compatibility with Kimi-VL
- A patch to support ulysses_sequence_parallel

### API

> Demonstrate how the API changes if any.

### Usage Example

```bash
python3 -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    data.train_files=$DATA_PATH/geo3k/test.parquet \
    data.val_files=$DATA_PATH/geo3k/test.parquet \
    data.train_batch_size=16 \
    data.max_prompt_length=2048 \
    data.max_response_length=4096 \
    data.filter_overlong_prompts=True \
    data.truncation='error' \
    data.image_key=images \
    data.shuffle=False \
    +data.trust_remote_code=True \
    actor_rollout_ref.model.path=moonshotai/Kimi-VL-A3B-Instruct \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ulysses_sequence_parallel_size=2 \
    actor_rollout_ref.actor.ppo_mini_batch_size=8 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.01 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.model.enable_gradient_checkpointing=False \
    actor_rollout_ref.model.trust_remote_code=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=True \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=True \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=8 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
    actor_rollout_ref.rollout.enable_chunked_prefill=False \
    actor_rollout_ref.rollout.enforce_eager=False \
    actor_rollout_ref.rollout.free_cache_engine=False \
    actor_rollout_ref.rollout.n=8 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.use_kl_in_reward=False \
    trainer.val_before_train=False \
    trainer.critic_warmup=0 \
    trainer.logger=['console','wandb'] \
    trainer.project_name='Kimi_VL_test' \
    trainer.experiment_name='kimi_vl_grpo_geo3k_cp2' \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1 \
    trainer.save_freq=50 \
    trainer.test_freq=5 \
    trainer.total_epochs=15 $@
```

### Test & Problem

During development I ran into some issues, but they do not affect the code in this PR.

Existing problems (with vllm==0.8.5.post1):

- Occasional vLLM errors:

  ```python
  File "/home/sharele/anaconda3/lib/python3.11/site-packages/vllm/v1/attention/backends/mla/common.py", line 504, in build
      self.page_size)
      ^^^^^^^^^^^^^^
  AttributeError: 'MLACommonMetadataBuilder' object has no attribute 'page_size'
  ```

  Related: vllm-project/vllm#16908. This method can be used to avoid the problem temporarily: vllm-project/vllm#16908 (comment)

- Garbled output from vLLM under specific circumstances. During testing I found that when `SamplingParams.n > 1`, vLLM sometimes outputs meaningless characters or keeps repeating, which affects GRPO. Related: vllm-project/vllm#18378. Note: using a Hopper-architecture GPU avoids this problem, but it is unclear whether potential issues remain.

Training curve: the training curve will come soon, after I solve the second problem.

### Additional Info.

- **Issue Number**: #1428
- **Training**: FSDP
- **Inference**: vLLM

### Checklist Before Submitting

- [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.

---------

Signed-off-by: ShareLer <ShareLe@163.com>
Thanks for the PR! We noticed that HF Transformers is refactoring the VLM model interface. We will temporarily pause the merging effort to avoid too much ad hoc, model-specific integration code brought in by VLMs.
Yeah, I've also noticed that. The latest Transformers has some conflicts with vLLM. I will try to add code to resolve these conflicts and ensure version compatibility.
Thank you for your brilliant work! I'm interested in using Gemma3 as the base to perform RL training, and I have tested this PR's code on my own machine (training configuration and machine environment details omitted here).

**vllm==0.8.6**: I could execute the training job in this environment, but got gibberish output from my rollout model.

**vllm==0.8.5.post1**: I also tested the same configuration in this environment, but this time I could not execute the training job and got an exception.

**Params key mismatch of Gemma3 between transformers and vllm**: I noticed a mismatch between the state_dict keys of the HF Gemma3 model and the vLLM Gemma3 model, and added code to work around it.

**More info**: I could successfully launch the Gemma3 model with either transformers or vllm alone in my environment and get reasonable output.
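The key mismatch described above can usually be worked around with a small remapping pass over the state_dict before syncing weights into vLLM. A minimal sketch, assuming the mismatch is only a prefix difference — the prefixes `model.language_model.` and `language_model.model.` here are hypothetical; check your own error message for the real names:

```python
def remap_gemma3_keys(state_dict,
                      old_prefix="model.language_model.",
                      new_prefix="language_model.model."):
    """Rename HF-style parameter keys to the names the vLLM model expects.

    Keys that do not start with ``old_prefix`` pass through unchanged.
    """
    remapped = {}
    for name, tensor in state_dict.items():
        if name.startswith(old_prefix):
            name = new_prefix + name[len(old_prefix):]
        remapped[name] = tensor
    return remapped
```

Applied to the actor's `state_dict()` right before the rollout-weight sync, this keeps the workaround in one place instead of touching vLLM's loader.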
You need to add a configuration.
Thank you for your suggestion! I downgraded my transformers to 4.51.3 and observed reasonable rollout output after setting the configuration you suggested.

Also, I noticed your latest commit and have rebased onto it to test my experiments. It's awesome! I found the code in …, and I have confirmed in my vllm==0.8.6 and nightly vllm that ….

Moreover, I observed a warning during training about the attention implementation of Gemma3: …. I tried to add an if-statement in …. It seems ….
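If forcing a specific attention implementation for Gemma3 turns out to be necessary, one low-touch option may be to override the HF model config from the launch command instead of patching verl. This is a sketch only: it assumes verl's `override_config` passthrough forwards `attn_implementation` to `from_pretrained`, which I have not verified for Gemma3.

```bash
# Hypothetical override appended to the main_ppo launch command above
+actor_rollout_ref.model.override_config.attn_implementation=eager
```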
I think using the …
Please advise: how should I merge the trained Gemma3 weights? I currently encounter an error:

```
ValueError: Unrecognized configuration class <class 'transformers.models.gemma3.configuration_gemma3.Gemma3Config'> for this kind of AutoModel: AutoModelForVision2Seq.
```

I don't know how to solve it. Could you give me some guidance?
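For what it's worth, this error means the auto class being used does not cover `Gemma3Config`; in recent transformers releases Gemma3 is typically reachable via `AutoModelForImageTextToText` (or `Gemma3ForConditionalGeneration` directly) rather than `AutoModelForVision2Seq`. A hedged sketch of a fallback loader — which classes to try, and in what order, is up to the caller:

```python
def load_with_fallback(path, auto_classes, **kwargs):
    """Try each auto class in order; return the first successful load.

    Re-raises the last "Unrecognized configuration class" ValueError if
    none of the classes accept the checkpoint's config.
    """
    last_err = None
    for cls in auto_classes:
        try:
            return cls.from_pretrained(path, **kwargs)
        except ValueError as err:
            last_err = err
    raise last_err
```

Usage would look like `load_with_fallback(ckpt_dir, [AutoModelForVision2Seq, AutoModelForImageTextToText])`.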
Thank you again for the repo. I'm wondering if I could get some advice. I have tried the solution you proposed of freezing the image tower, but the issue persists. I have also tried other processors, such as the InternVLProcessor from transformers and the InternVLProcessor from vllm, but that does not fix the issue. Do you have any suggestions?

Note: the verl version you are using is more than three months newer than mine. My environment is: ….
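For reference, the freezing itself is just flipping `requires_grad` off on the vision tower's parameters before the trainer wraps the model. The attribute name `vision_tower` below is an assumption; it differs across VLM implementations, so check the model's module names first:

```python
def freeze_vision_tower(model, attr="vision_tower"):
    """Disable gradients for the vision tower, if the model has one.

    Returns the number of parameters frozen (0 if the attribute is absent).
    """
    tower = getattr(model, attr, None)
    if tower is None:
        return 0
    count = 0
    for param in tower.parameters():
        param.requires_grad_(False)
        count += 1
    return count
```

Calling this before FSDP wrapping matters: once the model is sharded, toggling `requires_grad` per-submodule is no longer straightforward.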
Have you solved this problem? @dle666
### Checklist Before Starting

### What does this PR do?

### High-Level Design

### Specific Changes

### API

### Usage Example

### Test

The training curve for InternVL2.5-1B:

*(training curve image)*

The training curve for InternVL3-1B:

*(training curve image)*

### Additional Info.

### Checklist Before Submitting

- [ ] Add `[BREAKING]` to the PR title if it breaks any API.