[feat] support kimi_vl VLM model #1639
Merged
hiyouga merged 5 commits into verl-project:main, May 30, 2025
Conversation
Signed-off-by: ShareLer <ShareLe@163.com>
hiyouga
approved these changes
May 28, 2025
Collaborator
hiyouga
left a comment
Overall looks good to me!
hiyouga
requested changes
May 28, 2025
Signed-off-by: ShareLer <ShareLe@163.com>
Contributor
Author
@hiyouga It seems that there are some problems with the CI environment. Could you help retrigger the failed CI?
hiyouga
approved these changes
May 30, 2025
Collaborator
hiyouga
left a comment
No problem, let's merge
wwwjn pushed a commit to wwwjn/verl that referenced this pull request, Jun 10, 2025
### Checklist Before Starting

- [x] Search for similar PR(s). Some code will conflict with PR verl-project#1613.

### What does this PR do?

Add initial support for Kimi_vl; add a sequence parallelism (SP) patch for kimi_vl.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

- Add some minor changes to be compatible with kimi_vl
- Add a patch to support ulysses_sequence_parallel

### API

> Demonstrate how the API changes if any.

### Usage Example

```bash
python3 -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    data.train_files=$DATA_PATH/geo3k/test.parquet \
    data.val_files=$DATA_PATH/geo3k/test.parquet \
    data.train_batch_size=16 \
    data.max_prompt_length=2048 \
    data.max_response_length=4096 \
    data.filter_overlong_prompts=True \
    data.truncation='error' \
    data.image_key=images \
    data.shuffle=False \
    +data.trust_remote_code=True \
    actor_rollout_ref.model.path=moonshotai/Kimi-VL-A3B-Instruct \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ulysses_sequence_parallel_size=2 \
    actor_rollout_ref.actor.ppo_mini_batch_size=8 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.01 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.model.enable_gradient_checkpointing=False \
    actor_rollout_ref.model.trust_remote_code=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=True \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=True \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=8 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
    actor_rollout_ref.rollout.enable_chunked_prefill=False \
    actor_rollout_ref.rollout.enforce_eager=False \
    actor_rollout_ref.rollout.free_cache_engine=False \
    actor_rollout_ref.rollout.n=8 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.use_kl_in_reward=False \
    trainer.val_before_train=False \
    trainer.critic_warmup=0 \
    trainer.logger=['console','wandb'] \
    trainer.project_name='Kimi_VL_test' \
    trainer.experiment_name='kimi_vl_grpo_geo3k_cp2' \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1 \
    trainer.save_freq=50 \
    trainer.test_freq=5 \
    trainer.total_epochs=15 $@
```

### Test & Problem

During development I discovered some issues, but they do not affect the code in this PR.

Existing problems (with vllm==0.8.5.post1):

- Occasional errors from vllm:

  ```python
  File "/home/sharele/anaconda3/lib/python3.11/site-packages/vllm/v1/attention/backends/mla/common.py", line 504, in build
      self.page_size)
      ^^^^^^^^^^^^^^
  AttributeError: 'MLACommonMetadataBuilder' object has no attribute 'page_size'
  ```

  related: vllm-project/vllm#16908. A temporary workaround is described in vllm-project/vllm#16908 (comment).

- Garbled output from vllm under specific circumstances. During testing I found that when `SamplingParams.n > 1`, vllm's output is meaningless characters or keeps repeating, which affects GRPO. related: vllm-project/vllm#18378. Note: using a Hopper-architecture GPU avoids this problem, but it is not clear whether potential issues remain.

Training curve: the training curve will be added once the second problem is solved.

### Additional Info.

- **Issue Number**: verl-project#1428
- **Training**: FSDP
- **Inference**: vLLM

### Checklist Before Submitting

- [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.

---------

Signed-off-by: ShareLer <ShareLe@163.com>
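For readers unfamiliar with what the ulysses_sequence_parallel patch does at a high level, the core idea can be sketched in plain Python: each SP rank keeps a contiguous slice of the sequence dimension, with padding first so the length divides evenly across ranks. This is a minimal illustration only; the function names below are hypothetical and not verl's actual API.

```python
# Illustrative sketch of sequence-parallel sharding (hypothetical names, not verl's API).

def pad_to_multiple(seq, sp_size, pad_value=0):
    """Right-pad a sequence so its length is divisible by sp_size."""
    remainder = len(seq) % sp_size
    if remainder:
        seq = seq + [pad_value] * (sp_size - remainder)
    return seq

def slice_for_rank(seq, rank, sp_size):
    """Return the contiguous chunk of the (padded) sequence owned by `rank`."""
    seq = pad_to_multiple(seq, sp_size)
    chunk = len(seq) // sp_size
    return seq[rank * chunk : (rank + 1) * chunk]

tokens = list(range(10))   # a 10-token sequence
sp_size = 4                # e.g. ulysses_sequence_parallel_size=4
shards = [slice_for_rank(tokens, r, sp_size) for r in range(sp_size)]
# shards == [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 0, 0]]; the last shard holds the padding
```

In real implementations the shards are tensors and attention uses all-to-all communication to gather the full sequence per attention head; the slicing rule above is just the layout.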
vermouth1992 pushed a commit that referenced this pull request, Jun 10, 2025
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Fix a sequence parallelism conflict in the KimiVL patch.

Background: a recent VLM-related PR (#1739) changed the sequence parallelism logic for VLMs: split `inputs_embeds` after the model's embedding layer instead of splitting `input_ids` and `position_ids` before the forward pass. However, the SP logic I implemented in KimiVL's PR (#1639) still followed the old logic, and split the image tokens at the point where image tokens and text tokens are combined, to avoid the "image features and image tokens do not match" error. Because the two PRs were developed in parallel, a logical conflict arose after both were merged.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

- Delete the patch for `_merge_with_image_features`, which assigned image tokens to the corresponding SP rank.
- Adjust the `position_ids` handling in `_ulysses_flash_attn_forward`.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this
```

### Test

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.

---------

Signed-off-by: ShareLer <ShareLe@163.com>
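The ordering change described above can be illustrated with a simplified sketch (all names here are hypothetical, not verl's or KimiVL's actual code): merge the image features into the token embeddings first, then shard the fused `inputs_embeds` across SP ranks, so the split can never misalign image features with image-token positions.

```python
# Hypothetical illustration: fuse image features into embeddings, THEN shard.

def merge_image_features(text_embeds, image_embeds, image_mask):
    """Replace embedding slots flagged by image_mask with image features, in order."""
    fused, img_iter = [], iter(image_embeds)
    for emb, is_img in zip(text_embeds, image_mask):
        fused.append(next(img_iter) if is_img else emb)
    return fused

def shard(embeds, rank, sp_size):
    """Contiguous per-rank slice (length assumed divisible by sp_size)."""
    chunk = len(embeds) // sp_size
    return embeds[rank * chunk : (rank + 1) * chunk]

text_embeds = ["t0", "t1", "<img>", "<img>", "t2", "t3"]
image_embeds = ["i0", "i1"]
image_mask = [False, False, True, True, False, False]

fused = merge_image_features(text_embeds, image_embeds, image_mask)
halves = [shard(fused, r, 2) for r in range(2)]  # sp_size=2
# halves == [["t0", "t1", "i0"], ["i1", "t2", "t3"]]
```

Sharding before the merge, by contrast, would require each rank to know exactly which image tokens fall inside its slice, which is the bookkeeping the deleted `_merge_with_image_features` patch was doing.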
yellowbee686 pushed a commit to yellowbee686/verl that referenced this pull request, Jun 10, 2025
whatadayG pushed a commit to whatadayG/verl that referenced this pull request, Sep 5, 2025
chenjiaoAngel added a commit to chenjiaoAngel/verl that referenced this pull request, Nov 14, 2025
chenjiaoAngel added a commit to chenjiaoAngel/verl that referenced this pull request, Nov 14, 2025
TimurTaepov pushed a commit to giorgossideris/verl that referenced this pull request, Dec 20, 2025
TimurTaepov pushed a commit to giorgossideris/verl that referenced this pull request, Dec 20, 2025
vyomakesh0728 added a commit to vyomakesh0728/verl that referenced this pull request, Jan 22, 2026
vyomakesh0728 added a commit to vyomakesh0728/verl that referenced this pull request, Jan 22, 2026