Skip to content

[merger] ci: add FSDP checkpoint merging test into CI#1266

Closed
0x404 wants to merge 3 commits intoverl-project:mainfrom
0x404:test_fsdp
Closed

[merger] ci: add FSDP checkpoint merging test into CI#1266
0x404 wants to merge 3 commits intoverl-project:mainfrom
0x404:test_fsdp

Conversation

@0x404
Copy link
Copy Markdown
Collaborator

@0x404 0x404 commented Apr 26, 2025

Currently --test option is only supported for merging megatron checkpoints, this PR support test compatibility for FSDP checkpoints.

@ETOgaosion
Copy link
Copy Markdown
Collaborator

ETOgaosion commented Apr 26, 2025

Hi @0x404 , thanks a lot for contribution!

The testing option is mainly work for CI tests, could you help to add some testing cases in e2e_ppo_trainer.yml?

You may need to store the checkpoint first for converter to compare, add some option to tests/e2e folder scripts may help.

Your great contribution will be appreciated~

@0x404
Copy link
Copy Markdown
Collaborator Author

0x404 commented Apr 27, 2025

Sure! I will update this latter.

@0x404 0x404 changed the title [merger] feat: support test correctness of hf_model when merging FSDP checkpints [merger] ci: add FSDP checkpoint merging test into CI Apr 28, 2025
@0x404
Copy link
Copy Markdown
Collaborator Author

0x404 commented Apr 29, 2025

This PR depends on #1288. Hi @ETOgaosion, could you help review these two changes?

@0x404 0x404 marked this pull request as ready for review April 29, 2025 02:34
@ETOgaosion
Copy link
Copy Markdown
Collaborator

Appreciate a lot for contribution, please check my comments in #1288 , let's work together to merge that PR first.

@0x404
Copy link
Copy Markdown
Collaborator Author

0x404 commented Apr 29, 2025

please check my comments in #1288

Sure, but I don't see a comment there yet.

@ETOgaosion
Copy link
Copy Markdown
Collaborator

@0x404 Could you see these?
#1288 (comment)

And the reviewing content?

@0x404
Copy link
Copy Markdown
Collaborator Author

0x404 commented Apr 29, 2025

Could you see these?

I can see this comment #1288 (comment)

And the reviewing content?

But I can't see the review comments, maybe you forgot to submit them?

@ETOgaosion
Copy link
Copy Markdown
Collaborator

ETOgaosion commented Apr 29, 2025

Sorry, my fault, @0x404 , actually submit button is needed.

And I think that why can't we just add tests to that PR directly and test its function? So we can debug directly and save our time. Actually it would be better to test every function proposed in PR.

So it would be better to merge this one to #1288 ~

@0x404
Copy link
Copy Markdown
Collaborator Author

0x404 commented Apr 29, 2025

No problem, I will merge this pr to it and close this latter tonight.

@0x404
Copy link
Copy Markdown
Collaborator Author

0x404 commented Apr 29, 2025

Move to #1288 and close this.

@0x404 0x404 closed this Apr 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants