[FSDP, VLM] feat: add vlm training for FSDP #501
Conversation
slime/rollout/sglang_rollout.py
Outdated
```python
# Process images for training (like tokenization for images)
if images_for_training and state.processor is not None:
    processed = state.processor(images=images_for_training, return_tensors="pt")
    sample.pixel_values = processed["pixel_values"]
```
we shouldn't try to pass the pixel values from sglang to megatron, but instead maybe re-process the image from the training side is better.
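A minimal sketch of that suggestion, assuming the rollout keeps only the raw images on each sample and the trainer re-runs its own processor (the `Sample` dataclass and `reprocess_images` helper below are illustrative names, not slime's actual API):

```python
from dataclasses import dataclass, field
from typing import Any, List

@dataclass
class Sample:
    images: List[Any] = field(default_factory=list)  # raw images (e.g. PIL), not tensors
    pixel_values: Any = None  # filled in on the training side only

def reprocess_images(samples: List[Sample], processor) -> None:
    """Re-run an HF-style processor on the training side instead of
    shipping pixel_values tensors over from the sglang rollout."""
    for sample in samples:
        if sample.images:
            out = processor(images=sample.images, return_tensors="pt")
            sample.pixel_values = out["pixel_values"]
```

This keeps the rollout payload small and makes the trainer's processor the single source of truth for `pixel_values`, so rollout-side and training-side image preprocessing can never drift apart.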
Hi @nanjiangwill, glad to see this important feature! I'd love to help with this PR if you're open to collaborating. I can help with supporting qwen-vl models and geo3k training example. Please LMK if you are happy with collaborating! |
hi @coding-famer, that will be amazing! can i have ur email/contact number so i can reach out to you? |
Have sent you an email! |
Hi @nanjiangwill and @coding-famer, I'd love to help with this PR about the multi-turn part if you are open to this. |
heyy thanks for reaching out! can i have your email? |
Have sent the email! |
My god, @nanjiangwill 😭 |
Awesome! |
Roughly when will this be merged?
Definitely before the Stargate cluster is finished.
This PR will be merged in the next few days. Please give us some time to do a final check and review. |
Invincible! |
We ran three experiments comparing different reward model configurations on 8×H100.
Results showed that all three configurations perform similarly.
I will remove them later; just leaving a note here to mark this. |
zhuzilin left a comment:
LGTM! Left some minor comments.
```python
        flat_rollout_log_probs, dtype=torch.float32, device=torch.cuda.current_device()
    ),
    "multimodal_inputs": multimodal_data,
    "multimodal_num_items": multimodal_num_items,
```
here we need sth like:

```python
packed_batch = {
    "tokens": ...,
}
if multimodal_inputs:
    for key, mm_tensor in multimodal_inputs[i].items():
        ...
packed_batch.extend({
    "multimodal_inputs": multimodal_data,
    "multimodal_num_items": multimodal_num_items,
})
result.append(packed_batch)
```
Used `.update()` instead of `.extend()`, since dictionaries merge key-value pairs with `update` (and have no `extend` method).
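A tiny illustration of the point, with placeholder values standing in for the real tensors:

```python
# dict has no .extend(); .update() is the method that merges key/value
# pairs from another mapping into the dict, in place.
packed_batch = {"tokens": [1, 2, 3]}
multimodal_extras = {
    "multimodal_inputs": {"pixel_values": "tensor-placeholder"},
    "multimodal_num_items": 2,
}
packed_batch.update(multimodal_extras)  # merges in place, returns None
```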
slime/ray/rollout.py
Outdated
```diff
 ):
     # group norm
-    rewards = torch.tensor(raw_rewards, dtype=torch.float)
+    rewards = torch.tensor(raw_rewards, dtype=torch.float16)
```
hmm.. it seems better to move this type conversion into the custom reward model? Otherwise it may affect other users who are using a dense RM for RLHF.
Moved the float16 conversion into the notes of the geo3k_vlm example specifically.
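A sketch of that resolution, assuming an example-specific reward post-processing hook (the `postprocess_rewards` name is illustrative, not slime's actual API):

```python
import torch

def postprocess_rewards(raw_rewards):
    # Cast to float16 inside the geo3k_vlm-specific reward path, so the
    # shared group-norm code in slime/ray/rollout.py can keep its float32
    # default and dense-RM RLHF users are unaffected.
    return torch.tensor(raw_rewards, dtype=torch.float16)
```

Keeping dtype choices in example-specific code is the design choice the review settled on: the shared path stays conservative, and each example opts into narrower precision only where it has verified it is safe.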
Hi @nanjiangwill, this is awesome! I am also working on RL with VLMs. Would love to contribute and collaborate on further works! My email is simondong0919 at gmail.com; love to get in touch!
Co-authored-by: Chenhe Gu <chenhegu0109@gmail.com> Co-authored-by: Jin Pan <jpan236@wisc.edu> Co-authored-by: Jinn <47354855+jhinpan@users.noreply.github.com>
It looks like this commit changed the behavior of the `apply_chat_template` parameter. As a result, this parameter becomes useless during dataset construction: instead of applying the chat template, the data from `prompt_key` is fed directly into something like:

```python
messages = [{"role": "user", "content": messages}]
```

In custom generate scenarios, do you realize what this can lead to? You're effectively replacing what used to be a `str` with a `dict` (or even a list of dicts). If downstream code doesn't explicitly validate types, it will just get stuffed into the prompt and you end up with outputs like:

```
First rollout sample: ['<|im_start|>user\n[{'role': 'user', 'content':
```

This kind of change alters the default behavior in a non-backward-compatible way. At the very least, compatibility with existing users should be considered, and there should be warnings or clear notices. In my case, it indirectly caused training quality to degrade. I wouldn't have noticed this logic change if it weren't for a new training job that made the issue obvious.
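One defensive pattern that would have surfaced this regression early, sketched with an illustrative `build_prompt` helper (not slime's actual API): validate the input type explicitly, and only apply the chat template when the input really is a message list.

```python
def build_prompt(prompt_or_messages, apply_chat_template):
    """Accept either a ready prompt string or a chat-message list; never
    silently stringify a list of {'role': ..., 'content': ...} dicts."""
    if isinstance(prompt_or_messages, str):
        # Already a rendered prompt: pass through unchanged.
        return prompt_or_messages
    if isinstance(prompt_or_messages, list):
        # A list of chat-message dicts: render via the chat template.
        return apply_chat_template(prompt_or_messages)
    raise TypeError(f"unsupported prompt type: {type(prompt_or_messages)!r}")
```

Failing loudly on an unexpected type turns a silent training-quality regression into an immediate, debuggable error.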
@adol001 really sorry about this, we will revert this change. |

Goal: Support VLM training on slime with FSDP
TODO
- class: `AutoModelForVision2Seq` / `AutoModelForImageTextToText`
- Qwen2.5-VL / Qwen3-VL (we just use the default hf implementation)
- new-feature TBD @WindowsXp-Beta
- examples