
checkpoint.pt for OMat24 eqV2 Fine Tuning #1073

Closed
yang-lin430 opened this issue Mar 20, 2025 · 2 comments

Comments

@yang-lin430

What would you like to report?

Hello contributors,

I'm running fine-tuning according to the tutorial with the provided dataset. The fine-tuning seems to have finished successfully, because there are no errors in train.txt, but the checkpoints folder is empty and I can't find the trained model. Would you mind guiding me on how to deal with it?

Referring to issue #990, I started from the fine-tune config YAML file at /fairchem/configs/omat24/finetune. The changes I made:

  • added the dataset section according to the fine-tuning tutorial;
  • set load_balancing_on_error: warn_and_no_balance, gpus: 1, and logger: tensorboard
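For anyone hitting the same setup, the changes listed above might look roughly like this in the config YAML. This is only a sketch: the dataset paths, the ase_db format, and the placement of load_balancing_on_error under optim are my assumptions, not the exact values from the tutorial or the stock eqV2 config.

```yaml
# Illustrative sketch only -- paths, dataset format, and key placement
# are assumptions, not the exact tutorial values.
dataset:
  train:
    format: ase_db
    src: data/oxides/train.aselmdb   # assumed path
  val:
    format: ase_db
    src: data/oxides/val.aselmdb     # assumed path

logger: tensorboard

gpus: 1

optim:
  load_balancing_on_error: warn_and_no_balance
```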

The command I used to fine-tune is:
! python {fairchem_main()} --mode train --config-yml {yml} --checkpoint {checkpoint_path} --run-dir fine-tuning --identifier ft-oxides --num-gpus 1 > train.txt 2>&1

There are no errors in train.txt, but its contents look quite different from the output shown in the tutorial.

Thank you so much for your attention!

@kyonofx
Collaborator

kyonofx commented Mar 20, 2025

Hi,

It sounds like the checkpoint saving interval (https://github.com/FAIR-Chem/fairchem/blob/main/configs/omat24/finetune/eqV2_31M_ft_salexmptrj.yml#L97) may be larger than the total number of training steps, so training finishes before a checkpoint is ever written.
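To make the failure mode concrete: periodic checkpoints are written once per save interval, so if that interval exceeds the total number of training steps, the run ends with an empty checkpoints folder. The helper below is purely illustrative (it is not fairchem code); it just models that arithmetic.

```python
def checkpoints_written(total_steps: int, save_every: int) -> int:
    """Count how many periodic checkpoints a run writes.

    Illustrative helper, not fairchem code: one checkpoint is saved
    every `save_every` steps, so an interval larger than the run
    length yields zero checkpoints.
    """
    return total_steps // save_every


# Interval larger than the run length -> nothing is ever saved.
print(checkpoints_written(total_steps=5_000, save_every=10_000))  # 0
# Lowering the interval below the run length fixes it.
print(checkpoints_written(total_steps=5_000, save_every=1_000))   # 5
```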

@yang-lin430
Author

Hi kyonofx,

Thanks for the reply. I adjusted eval_every and it works now! Thanks very much for the help!
