
Training duration of pre-training model #77

Closed
zhongmengyi opened this issue May 19, 2024 · 1 comment

Comments

@zhongmengyi

Hello, I would like to ask how long it took to train the three pre-trained models provided by GraphCast, and how much memory they occupy. Are there any specific figures? Thanks!

@alvarosg
Collaborator

Thanks for your question.

Training the main 0.25 deg ERA5 GraphCast model took about four weeks on 32 TPU v4 devices (each TPU with 32 GB of RAM): about two weeks for the initial 1-step phase, and another two weeks for the 2-12 step autoregressive annealing.

However, for ease of training (see more details here) I would recommend using GPUs/TPUs with more than 32 GB of memory.

The operational model took about the same time, except that it has an additional phase of 1AR fine-tuning in between those two phases, which takes an extra day.

The 1 deg model takes about 1.5 days to train in total.
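
For readers wondering what the phases above look like in practice, here is a minimal sketch (not code from the GraphCast repo) of a rollout-length schedule implementing a 1-step phase followed by annealing from 2 to 12 autoregressive steps. The step counts are hypothetical placeholders, not the actual training budget.

```python
# Hypothetical sketch of a multi-phase rollout schedule: a 1-step phase,
# then annealing the autoregressive rollout from 2 up to 12 steps.
# `phase1_steps` and `steps_per_increment` are placeholder values.

def rollout_length(train_step: int,
                   phase1_steps: int = 300_000,
                   steps_per_increment: int = 1_000) -> int:
    """Return how many autoregressive steps to unroll at `train_step`."""
    if train_step < phase1_steps:
        return 1  # phase 1: single-step (1AR) training
    # Phase 2: grow the rollout from 2 up to 12 steps as training progresses.
    increments = (train_step - phase1_steps) // steps_per_increment
    return min(2 + increments, 12)

# Example: rollout_length(0) == 1, rollout_length(305_000) == 7,
# and anything past the end of the schedule saturates at 12.
```

Longer rollouts cost proportionally more memory per training step (activations for every unrolled step must be kept for backpropagation), which is part of why the annealing phase benefits from devices with more than 32 GB.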
