
checkpoint.pt for OMat24 eqV2 Fine Tuning #1073

Closed
yang-lin430 opened this issue Mar 20, 2025 · 2 comments

Comments

@yang-lin430

What would you like to report?

Hello contributors,

I'm running fine-tuning according to the tutorial with the provided dataset. The fine-tuning seems to have finished successfully, because there are no errors in train.txt, but the checkpoints folder is empty and I can't find the trained model. Would you mind guiding me on how to deal with it?

Referring to issue #990, I started from the fine-tune config YAML file at /fairchem/configs/omat24/finetune. The changes I made:

  • added the dataset section according to the fine-tuning tutorial;
  • set load_balancing_on_error: warn_and_no_balance, gpus: 1, and logger: tensorboard
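For anyone hitting the same setup, the changes listed above might look roughly like this in the config YAML. This is only a sketch: the dataset paths, the ase_db format, and the placement of load_balancing_on_error under optim are my assumptions, not the exact values from the tutorial or the stock eqV2 config.

```yaml
# Illustrative sketch only -- paths, dataset format, and key placement
# are assumptions, not the exact tutorial values.
dataset:
  train:
    format: ase_db
    src: data/oxides/train.aselmdb   # assumed path
  val:
    format: ase_db
    src: data/oxides/val.aselmdb     # assumed path

logger: tensorboard

gpus: 1

optim:
  load_balancing_on_error: warn_and_no_balance
```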

The command I used to fine-tune is:
! python {fairchem_main()} --mode train --config-yml {yml} --checkpoint {checkpoint_path} --run-dir fine-tuning --identifier ft-oxides --num-gpus 1 > train.txt 2>&1

There are no errors in train.txt, but its contents look quite different from the output shown in the tutorial.

Thank you so much for your attention!

@kyonofx
Collaborator

kyonofx commented Mar 20, 2025

Hi,

It sounds like the checkpoint saving interval (https://github.com/FAIR-Chem/fairchem/blob/main/configs/omat24/finetune/eqV2_31M_ft_salexmptrj.yml#L97) may be larger than the total number of training steps, so training finishes before a checkpoint is ever written.
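To make the failure mode concrete: periodic checkpoints are written once per save interval, so if that interval exceeds the total number of training steps, the run ends with an empty checkpoints folder. The helper below is purely illustrative (it is not fairchem code); it just models that arithmetic.

```python
def checkpoints_written(total_steps: int, save_every: int) -> int:
    """Count how many periodic checkpoints a run writes.

    Illustrative helper, not fairchem code: one checkpoint is saved
    every `save_every` steps, so an interval larger than the run
    length yields zero checkpoints.
    """
    return total_steps // save_every


# Interval larger than the run length -> nothing is ever saved.
print(checkpoints_written(total_steps=5_000, save_every=10_000))  # 0
# Lowering the interval below the run length fixes it.
print(checkpoints_written(total_steps=5_000, save_every=1_000))   # 5
```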

@yang-lin430
Author

Hi kyonofx,

Thanks for the reply. I adjusted eval_every and it works now! Thanks very much for the help!
