[feat] support kimi_vl VLM model#1639

Merged
hiyouga merged 5 commits into verl-project:main from ShareLer:kimi_vl_support
May 30, 2025

Conversation

@ShareLer
Contributor

@ShareLer ShareLer commented May 22, 2025

Checklist Before Starting

  • Search for similar PR(s). Some code will conflict with PR verl-project#1613.

What does this PR do?

Add initial support for Kimi-VL;
add a sequence-parallelism (SP) patch for Kimi-VL.

High-Level Design

Demonstrate the high-level design if this PR is complex.

Specific Changes

  • Add minor changes for compatibility with Kimi-VL
  • Add a patch to support ulysses_sequence_parallel
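For context, a Ulysses SP patch typically follows this pattern: each rank holds a slice of the sequence with all attention heads, and an all-to-all before attention exchanges this for the full sequence with a slice of the heads. A toy, framework-free sketch of that exchange (all names here are illustrative, not the actual patch code):

```python
def ulysses_all_to_all(shards, num_heads):
    """Simulate the Ulysses all-to-all: input is one shard per rank,
    each holding (seq_chunk, num_heads); output is one shard per rank,
    each holding (full_seq, heads_chunk)."""
    world_size = len(shards)
    assert num_heads % world_size == 0
    heads_per_rank = num_heads // world_size
    out = []
    for rank in range(world_size):
        # Each rank gathers its head slice from every sequence chunk.
        head_lo = rank * heads_per_rank
        head_hi = head_lo + heads_per_rank
        full_seq = []
        for shard in shards:      # concatenate along the sequence dim
            for token in shard:   # token: list of per-head values
                full_seq.append(token[head_lo:head_hi])
        out.append(full_seq)
    return out

# 2 ranks, sequence length 4 (2 tokens per rank), 4 heads
shards = [
    [[f"t{t}h{h}" for h in range(4)] for t in (0, 1)],
    [[f"t{t}h{h}" for h in range(4)] for t in (2, 3)],
]
exchanged = ulysses_all_to_all(shards, num_heads=4)
# rank 0 now sees all 4 tokens but only heads 0-1
```

In the real patch the exchange is a collective (e.g. torch all-to-all) inside the attention forward, but the data movement is the same.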

API

Demonstrate how the API changes if any.

Usage Example

python3 -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    data.train_files=$DATA_PATH/geo3k/test.parquet \
    data.val_files=$DATA_PATH/geo3k/test.parquet \
    data.train_batch_size=16 \
    data.max_prompt_length=2048 \
    data.max_response_length=4096 \
    data.filter_overlong_prompts=True \
    data.truncation='error' \
    data.image_key=images \
    data.shuffle=False \
    +data.trust_remote_code=True \
    actor_rollout_ref.model.path=moonshotai/Kimi-VL-A3B-Instruct \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ulysses_sequence_parallel_size=2 \
    actor_rollout_ref.actor.ppo_mini_batch_size=8 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.01 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.model.enable_gradient_checkpointing=False \
    actor_rollout_ref.model.trust_remote_code=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=True \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=True \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=8 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
    actor_rollout_ref.rollout.enable_chunked_prefill=False \
    actor_rollout_ref.rollout.enforce_eager=False \
    actor_rollout_ref.rollout.free_cache_engine=False \
    actor_rollout_ref.rollout.n=8 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.use_kl_in_reward=False \
    trainer.val_before_train=False \
    trainer.critic_warmup=0 \
    trainer.logger=['console','wandb'] \
    trainer.project_name='Kimi_VL_test' \
    trainer.experiment_name='kimi_vl_grpo_geo3k_cp2' \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1 \
    trainer.save_freq=50 \
    trainer.test_freq=5 \
    trainer.total_epochs=15 $@

Test & Problem

During development, I discovered some issues, but they do not affect the code in this PR.
Existing problems (with vllm==0.8.5.post1):

  • Occasional errors from vLLM
  File "/home/sharele/anaconda3/lib/python3.11/site-packages/vllm/v1/attention/backends/mla/common.py", line 504, in build
    self.page_size)
    ^^^^^^^^^^^^^^
AttributeError: 'MLACommonMetadataBuilder' object has no attribute 'page_size'

related: vllm-project/vllm#16908
As a temporary workaround, see the approach in vllm-project/vllm#16908 (comment).
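I have not verified the exact fix in that comment, but the general shape of such a workaround is to monkey-patch the missing attribute onto the class before vLLM builds the metadata. A generic sketch with a stand-in class (`Builder` here is illustrative, not the real `MLACommonMetadataBuilder`):

```python
# Generic monkey-patch pattern: supply a missing attribute on a
# third-party class before any instance uses it. `Builder` stands in
# for vLLM's MLACommonMetadataBuilder; the real fix may differ.
class Builder:                      # imagine this lives in the library
    def build(self):
        return self.page_size       # crashes if the attribute is never set

def apply_workaround(cls, page_size=16):
    # Set a class-level default so every instance sees the attribute.
    if not hasattr(cls, "page_size"):
        cls.page_size = page_size

apply_workaround(Builder)
result = Builder().build()          # no AttributeError once patched
```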

  • Garbled output from vLLM under specific circumstances
    During testing, I found that when SamplingParams.n > 1, vLLM sometimes outputs meaningless characters or keeps repeating itself. This affects GRPO.

related: vllm-project/vllm#18378
Note: Using a Hopper-architecture GPU avoids this problem, but it is unclear whether other potential issues remain.
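Until the upstream fix lands, one cheap guard is to flag degenerate rollouts before they poison a GRPO group. A hedged sketch (the heuristic and thresholds are my own, not part of this PR):

```python
def looks_degenerate(text, max_repeat_ratio=0.5, min_len=20):
    """Flag outputs dominated by a single repeated token -- a rough
    heuristic for the garbled/looping vLLM outputs described above."""
    tokens = text.split()
    if len(tokens) < min_len:
        return False                # too short to judge
    most_common = max(tokens.count(t) for t in set(tokens))
    return most_common / len(tokens) > max_repeat_ratio

good = "the area of the triangle is 24 so x equals 6 " * 3
bad = "6 6 6 6 6 " * 10
```

Flagged samples could be resampled or dropped before advantage computation; a real filter would use token IDs and n-gram repetition rather than whitespace splitting.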

Training curve:
The training curve is coming soon, after I solve the second problem.

Additional Info.

  • Issue Number: verl-project#1428
  • Training: FSDP
  • Inference: vLLM

Checklist Before Submitting

  • Read the Contribute Guide.
  • Apply pre-commit checks.
  • Add [BREAKING] to the PR title if it breaks any API.
  • Update the documentation about your changes in the docs.
  • Add CI test(s) if necessary.

ShareLer added 3 commits May 21, 2025 18:41
Signed-off-by: ShareLer <ShareLe@163.com>
Signed-off-by: ShareLer <ShareLe@163.com>
Signed-off-by: ShareLer <ShareLe@163.com>
Collaborator

@hiyouga hiyouga left a comment


Overall looks good to me!

ShareLer added 2 commits May 29, 2025 14:53
Signed-off-by: ShareLer <ShareLe@163.com>
Signed-off-by: ShareLer <ShareLe@163.com>
@ShareLer
Contributor Author

@hiyouga It seems that there are some problems with the CI environment. Could you help retrigger the failed CI?

Collaborator

@hiyouga hiyouga left a comment


No problem, let's merge

@hiyouga hiyouga merged commit 96903e0 into verl-project:main May 30, 2025
32 of 36 checks passed
wwwjn pushed a commit to wwwjn/verl that referenced this pull request Jun 10, 2025
vermouth1992 pushed a commit that referenced this pull request Jun 10, 2025
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?
Fix sequence parallelism conflict in the KimiVL patch.

Background:
A recent VLM-related PR (#1739) modified the sequence parallelism logic for VLMs: inputs_embeds is now split after the model's embedding layer, instead of splitting input_ids and position_ids before the forward pass. However, the SP logic I implemented in KimiVL's PR (#1639) still followed the old approach, splitting at the boundary where image tokens and text tokens are combined to avoid the 'Image features and image tokens do not match' error. Because these two PRs were developed in parallel, their logic conflicted once both were merged.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

- Delete the patch for _merge_with_image_features, which assigned image tokens to the corresponding SP rank.
- Adjust the position_ids handling in _ulysses_flash_attn_forward.
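Splitting the already-merged inputs_embeds means image and text tokens are fused before the sequence is sharded, so a chunk boundary can no longer separate an image placeholder from its features. A minimal sketch of that per-rank slicing (pure Python with illustrative names, not the actual verl code):

```python
def slice_inputs_embeds(embeds, rank, world_size):
    """Pad the merged embedding sequence to a multiple of world_size,
    then return this rank's contiguous sequence chunk. Splitting after
    the embedding layer sidesteps image/text token alignment entirely."""
    pad = (-len(embeds)) % world_size
    padded = embeds + [0] * pad          # zero stands in for a pad vector
    chunk = len(padded) // world_size
    return padded[rank * chunk:(rank + 1) * chunk]

seq = list(range(10))                    # 10 "embedding vectors"
parts = [slice_inputs_embeds(seq, r, 4) for r in range(4)]
# every rank gets an equal-length chunk; the last one is padded
```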

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test


![image](https://github.com/user-attachments/assets/82ef7a74-66f8-4bb0-a0fc-3702b215c8c0)


### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.

---------

Signed-off-by: ShareLer <ShareLe@163.com>
yellowbee686 pushed a commit to yellowbee686/verl that referenced this pull request Jun 10, 2025
whatadayG pushed a commit to whatadayG/verl that referenced this pull request Sep 5, 2025
chenjiaoAngel added a commit to chenjiaoAngel/verl that referenced this pull request Nov 14, 2025
chenjiaoAngel added a commit to chenjiaoAngel/verl that referenced this pull request Nov 14, 2025
TimurTaepov pushed a commit to giorgossideris/verl that referenced this pull request Dec 20, 2025
TimurTaepov pushed a commit to giorgossideris/verl that referenced this pull request Dec 20, 2025
vyomakesh0728 added a commit to vyomakesh0728/verl that referenced this pull request Jan 22, 2026
### Checklist Before Starting

- [x] Search for similar PR(s).
Some code will conflict with this PR verl-project#1613 

### What does this PR do?

Add initial support for Kimi_vl;
Add sp patch for kimi_vl.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

- Add some minor changes to be compatible with kimi_vl
- Add patch to support ulysses_sequence_parallel

### API

> Demonstrate how the API changes if any.

### Usage Example

```bash

python3 -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    data.train_files=$DATA_PATH/geo3k/test.parquet \
    data.val_files=$DATA_PATH/geo3k/test.parquet \
    data.train_batch_size=16 \
    data.max_prompt_length=2048 \
    data.max_response_length=4096 \
    data.filter_overlong_prompts=True \
    data.truncation='error' \
    data.image_key=images \
    data.shuffle=False \
    +data.trust_remote_code=True \
    actor_rollout_ref.model.path=moonshotai/Kimi-VL-A3B-Instruct \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ulysses_sequence_parallel_size=2 \
    actor_rollout_ref.actor.ppo_mini_batch_size=8 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.01 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.model.enable_gradient_checkpointing=False \
    actor_rollout_ref.model.trust_remote_code=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=True \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=True \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=8 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
    actor_rollout_ref.rollout.enable_chunked_prefill=False \
    actor_rollout_ref.rollout.enforce_eager=False \
    actor_rollout_ref.rollout.free_cache_engine=False \
    actor_rollout_ref.rollout.n=8 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.use_kl_in_reward=False \
    trainer.val_before_train=False \
    trainer.critic_warmup=0 \
    trainer.logger=['console','wandb'] \
    trainer.project_name='Kimi_VL_test' \
    trainer.experiment_name='kimi_vl_grpo_geo3k_cp2' \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1 \
    trainer.save_freq=50 \
    trainer.test_freq=5 \
    trainer.total_epochs=15 $@

```
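The flag `actor_rollout_ref.actor.kl_loss_type=low_var_kl` above selects a low-variance KL estimator. As a reference point, here is a minimal sketch of the k3 estimator from Schulman's "Approximating KL Divergence" note, which this option is commonly based on (the exact verl implementation may differ, e.g. by clamping the result):

```python
import torch

def low_var_kl(ref_logprob: torch.Tensor, logprob: torch.Tensor) -> torch.Tensor:
    # k3 estimator: exp(log r) - log r - 1 with r = p_ref / p_actor.
    # Non-negative for any log-ratio and lower variance than the
    # naive -log r per-sample estimate of KL(p_actor || p_ref).
    log_ratio = ref_logprob - logprob
    return torch.exp(log_ratio) - log_ratio - 1
```

When actor and reference log-probs coincide, the estimate is exactly zero.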


### Test & Problem
During development I discovered some issues, but they do not affect the
code in this PR.
Known problems (with vllm==0.8.5.post1):
- Occasional vLLM errors
```python
  File "/home/sharele/anaconda3/lib/python3.11/site-packages/vllm/v1/attention/backends/mla/common.py", line 504, in build
    self.page_size)
    ^^^^^^^^^^^^^^
AttributeError: 'MLACommonMetadataBuilder' object has no attribute 'page_size'
```
Related: vllm-project/vllm#16908
This workaround avoids the problem temporarily:
vllm-project/vllm#16908 (comment)

- Garbled output from vLLM under specific circumstances
During testing I found that when SamplingParams.n > 1, vLLM sometimes
emits meaningless characters or keeps repeating itself. This affects GRPO.

Related: vllm-project/vllm#18378
Note: using a Hopper-architecture GPU avoids this problem, but it is
unclear whether other latent issues remain.
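Until the upstream fix lands, a simple rollout-side heuristic can flag the degenerate repeating outputs described above (this helper is purely illustrative and not part of verl or vLLM):

```python
def looks_repetitive(text: str, window: int = 20, min_repeats: int = 3) -> bool:
    # Flag outputs whose tail is one short chunk repeated several times,
    # a common symptom of degenerate sampling.
    if len(text) < window * min_repeats:
        return False
    tail = text[-window * min_repeats:]
    chunk = tail[:window]
    return all(tail[i * window:(i + 1) * window] == chunk for i in range(min_repeats))
```

Flagged samples could then be logged or resampled before they skew the GRPO group statistics.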


Training curve:
The training curve will be added once the second problem is resolved.


### Additional Info.

- **Issue Number**: verl-project#1428 
- **Training**: FSDP
- **Inference**: vLLM

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.

---------

Signed-off-by: ShareLer <ShareLe@163.com>
vyomakesh0728 added a commit to vyomakesh0728/verl that referenced this pull request Jan 22, 2026
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?
Fix a sequence-parallelism conflict in the KimiVL patch.

Background:
A recent VLM-related PR (verl-project#1739) changed the sequence-parallelism
logic for VLMs: inputs_embeds is now split after the model's embedding
layer, instead of splitting input_ids and position_ids before the forward
pass. However, the SP logic I implemented in the KimiVL PR
(verl-project#1639) still followed the old approach, splitting at the
boundary between image tokens and text tokens to avoid the
'image features and image tokens do not match' error. Because the two PRs
were developed in parallel, merging them introduced a logical conflict.
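For illustration, the post-embedding split described above can be sketched roughly as follows (a minimal sketch assuming the sequence is already padded to a multiple of the SP world size; the function name is hypothetical, not verl's actual API):

```python
import torch

def split_embeds_for_sp(inputs_embeds: torch.Tensor, sp_size: int, sp_rank: int) -> torch.Tensor:
    # Split along the sequence dimension AFTER the embedding layer, so the
    # multimodal embeddings (text and image features already merged) are
    # sharded uniformly across SP ranks. Shape: (batch, seq_len, hidden).
    seq_len = inputs_embeds.size(1)
    assert seq_len % sp_size == 0, "pad the sequence to a multiple of sp_size"
    chunk = seq_len // sp_size
    return inputs_embeds[:, sp_rank * chunk:(sp_rank + 1) * chunk, :]
```

Because the split happens on the merged embeddings, no special handling of image-token boundaries is needed, which is what makes the old `_merge_with_image_features` patch unnecessary.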

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

- Remove the `_merge_with_image_features` patch that assigned image tokens
to the corresponding SP rank.
- Adjust the position_ids handling in `_ulysses_flash_attn_forward`.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test


![image](https://github.com/user-attachments/assets/82ef7a74-66f8-4bb0-a0fc-3702b215c8c0)


### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that covers the code path.

---------

Signed-off-by: ShareLer <ShareLe@163.com>