Checkpoint conversion tools #14
Conversation
@stas00 FYI
OK, to complete the conversion and make the model usable with `--finetune`, the missing bits are: the first 2 of course need to be adjusted to the target TP/PP sizes, and the last 2 need to be reset, otherwise Meg tries to resume training from some really high sample count which should be 0 instead. I haven't quite figured out how to solve the padded-vocab mismatch; the workaround is to use the actual padded vocab size when finetuning, i.e. for the case when the padded vocab ends up being 50688.
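For concreteness, here is a minimal sketch of patching those bits in the converted checkpoint. The attribute names (`tensor_model_parallel_size`, `consumed_train_samples`, etc.) and the path are assumptions based on typical Megatron checkpoint args, not taken from this thread:

```python
# Hypothetical sketch: patch a converted Megatron checkpoint so it can be
# loaded with --finetune. Field names are assumptions, not confirmed above.
import torch

ckpt_path = "iter_0000001/mp_rank_00/model_optim_rng.pt"  # placeholder path
sd = torch.load(ckpt_path, map_location="cpu")

args = sd["args"]
args.tensor_model_parallel_size = 1      # adjust to the target TP size
args.pipeline_model_parallel_size = 1    # adjust to the target PP size
args.consumed_train_samples = 0          # reset so training doesn't try to "resume"
args.consumed_valid_samples = 0
args.padded_vocab_size = 50688           # workaround: the actual padded vocab size

torch.save(sd, ckpt_path)
```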
Then there's the file layout: Meg-LM clearly expects one layout, whereas the Meg-DS tree wants a different one. The first segment of the path is defined in the meg code. Additionally, we could probably convert ...
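As an illustration only, a sketch of relocating files from a DeepSpeed-style tree into the layout Meg-LM expects; every directory and file name below is an assumption for the sake of the example:

```python
# Hypothetical sketch: relocate a Meg-DS style checkpoint into the directory
# layout Meg-LM expects. All names here are placeholders for illustration.
import os
import shutil

# assumed Meg-DS layout:  checkpoints/global_step1000/mp_rank_00_model_states.pt
# assumed Meg-LM layout:  checkpoints/iter_0001000/mp_rank_00/model_optim_rng.pt
src = "checkpoints/global_step1000/mp_rank_00_model_states.pt"
dst_dir = "checkpoints/iter_0001000/mp_rank_00"
os.makedirs(dst_dir, exist_ok=True)
shutil.copy(src, os.path.join(dst_dir, "model_optim_rng.pt"))
```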
Thanks for the feedback. I can update deepspeed_to_megatron.py to handle the first 2, which fixes an inconsistency in the checkpoint state. However, I am unsure where to handle the last 2, since resetting them in the conversion would prevent the converted checkpoint from being used for continued training. So perhaps the finetuning script should handle the last 2. What do you think?
but this checkpoint can't be loaded for continued training at the moment, e.g. it lacks some of the required state. I'm not sure how you'd change that. Perhaps we have 2 different modes here:
Found one more culprit. Remember how Jared's script was asking for a megatron clone path when doing the conversion? It proved to be essential, since if we use the default Meg-DS when converting, it then fails to load. Things have changed in Meg-DS and now it can't find some of the pickled classes. Grr, this appears really tricky now that the codebases are starting to diverge. I can't even load the bigscience Meg-DS checkpoint using the Meg-DS codebase itself. OK, solved this by adding both clones to sys.path, but it still doesn't work when I then try to train with the Megatron-LM tree. I'm trying to ask to restore that: bigscience-workshop/Megatron-DeepSpeed#7 (comment). Unless you have some bright ideas for how not to pickle structures that may be lacking in the target?
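A minimal sketch of that sys.path workaround, assuming placeholder clone locations; the point is only that the modules referenced by the pickled objects must be importable when `torch.load` runs:

```python
# Hypothetical sketch: make both code trees importable before unpickling,
# so that classes referenced inside the checkpoint can be resolved.
import sys
import torch

# placeholder paths to the two clones
sys.path.insert(0, "/path/to/Megatron-DeepSpeed")
sys.path.insert(0, "/path/to/Megatron-LM")

sd = torch.load("model_optim_rng.pt", map_location="cpu")
```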
I am a bit confused by this. Are you seeing iter=1 in the converted checkpoint?
I think it perhaps expects the top-level ... ; in your code ...
I see
It would be great to get some clarity on which iteration to use here. Also, should it be ...?
The problem is that it seems Meg-DS doesn't update Meg's native iteration.
The latter. But it seems to be pointless, because once I add the missing key it then fails with an error about the optimizer states. We can't manifest optimizer states for Meg-LM out of nowhere, so it appears that after the conversion only inference or finetuning is possible. In which case it's probably pointless to try to keep the training state. Perhaps let's for now handle just the clear case of inference/finetuning, with the assumption that finetuning will require a different dataset? I think it's only when we reshape the checkpoint (as discussed) a few days later to support changing the degree of TP that we would try to preserve everything, but that's when saving from Meg-DS back to Meg-DS.
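For illustration, a sketch of what the inference/finetune-only case could look like at the checkpoint level. The key names are assumptions based on common Megatron checkpoints, and the idea is that the load side would then rely on Megatron's `--finetune` / `--no-load-optim` style flags rather than expecting optimizer state:

```python
# Hypothetical sketch: a converted checkpoint that carries no optimizer state.
# When loading it, one would pass --finetune / --no-load-optim so Megatron
# does not try to restore an optimizer we never had.
import torch

sd = torch.load("model_optim_rng.pt", map_location="cpu")
sd.pop("optimizer", None)      # assumed key name
sd.pop("lr_scheduler", None)   # assumed key name
torch.save(sd, "model_optim_rng.pt")
```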
So I think the only remaining thing to address (other than embeddings) is: #14 (comment). And let's set ... to whatever ... Sorry, brainstorming here, but then what if the input checkpoint isn't named ...?
In another checkpoint I was given, the discrepancy is 2x; the Meg-DS file is ... This tells me that ...
OK, I have figured this one out. You were getting the wrong value: ... this is the real iteration, but the original code's args.iteration is whatever the iteration was at the start of training. Does it make sense?
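To make the distinction concrete, a sketch under the assumption that the checkpoint stores both a top-level `iteration` value and an `args` namespace (standard for Megatron-style checkpoints; which of these the converter was reading is not spelled out above):

```python
# Hypothetical sketch: the two places an "iteration" can come from.
import torch

sd = torch.load("model_optim_rng.pt", map_location="cpu")

real_iteration = sd.get("iteration")    # iteration when the checkpoint was saved
stale_iteration = sd["args"].iteration  # whatever args held at the start of training

print(real_iteration, stale_iteration)
```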
@tjruwase, 2 more things I discovered are different from the checkpoint generated natively by Meg-LM, so these need to be changed as demonstrated:
- the iteration folder
- the latest checkpoint version file

I think that after this fix the resulting checkpoint will match the native one. Plus the resulting file structure: #14 (comment), and then it's good to be merged. I did multiple tests on the final-stage meg2hf conversion and it appears to be correct.
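For context, a sketch of what those pieces typically look like in a native Meg-LM checkpoint tree. The `iter_*` folder naming, `latest_checkpointed_iteration.txt`, and the `checkpoint_version` field are standard Megatron-LM conventions; assuming they are exactly what this PR writes is my own inference:

```python
# Hypothetical sketch: write the pieces a native Meg-LM checkpoint tree has.
import os
import torch

iteration = 1
ckpt_root = "converted_checkpoint"
iter_dir = os.path.join(ckpt_root, f"iter_{iteration:07d}", "mp_rank_00")
os.makedirs(iter_dir, exist_ok=True)

state_dict = {"iteration": iteration, "checkpoint_version": 3.0, "model": {}}
torch.save(state_dict, os.path.join(iter_dir, "model_optim_rng.pt"))

# Meg-LM reads this file to decide which iteration folder to load
with open(os.path.join(ckpt_root, "latest_checkpointed_iteration.txt"), "w") as f:
    f.write(str(iteration))
```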
This is good to merge, @tjruwase! Thank you!
@tjruwase, I made a small change to your work to separate creating the checkpoint from saving it, so that I could re-use the creation step to build the HF transformers checkpoint on the fly. I also made the PP/TP size 1 by default, since in the HF case that's always the value for now. If it looks acceptable to you, perhaps let's merge this back into your master tree? If so, please cherry-pick these 2 commits. Thank you!
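A sketch of the kind of split described; the function names and keys here are illustrative placeholders rather than the actual ones in the PR:

```python
# Hypothetical sketch: build the Megatron-style state dict in one function and
# save it in another, so the HF converter can reuse the first step in memory.
import os
import torch

def _create_checkpoint(ds_checkpoint, tp_size=1, pp_size=1):
    """Assemble a Megatron-style state dict from a loaded DeepSpeed checkpoint."""
    return {
        "iteration": ds_checkpoint.get("iteration", 1),
        "model": ds_checkpoint["module"],   # assumed key name
        "checkpoint_version": 3.0,
    }

def save_checkpoint(state_dict, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    torch.save(state_dict, os.path.join(out_dir, "model_optim_rng.pt"))

# The meg2hf path would call _create_checkpoint() and convert the result
# in memory instead of saving it to disk first.
```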
Here you go - I added 2 more commits where I made the scripts executable ;)