
Training duration of pre-training model #77

Closed
zhongmengyi opened this issue May 19, 2024 · 1 comment

Comments

@zhongmengyi

Hello, I would like to ask how long it took to train the three pre-trained models provided by GraphCast, and how much memory they occupy. Are there any specific figures? Thanks!

@alvarosg
Collaborator

Thanks for your question.

Training the main 0.25 deg ERA5 GraphCast model took about four weeks on 32 TPU v4 devices (each TPU with 32 GB of RAM): about two weeks for the initial 1-step phase, and another two weeks for the 2-12 step autoregressive annealing.

However, for ease of training (see more details here) I would recommend using GPUs/TPUs with more than 32 GB of memory.

The operational model took about the same time, except that it has an additional phase of 1AR fine-tuning in between those two phases, which takes an extra day.

The 1 deg model takes about 1.5 days to train in total.
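
For readers wondering what the phases above look like in practice, here is a minimal sketch (not code from the GraphCast repo) of a rollout-length schedule implementing a 1-step phase followed by annealing from 2 to 12 autoregressive steps. The step counts are hypothetical placeholders, not the actual training budget.

```python
# Hypothetical sketch of a multi-phase rollout schedule: a 1-step phase,
# then annealing the autoregressive rollout from 2 up to 12 steps.
# `phase1_steps` and `steps_per_increment` are placeholder values.

def rollout_length(train_step: int,
                   phase1_steps: int = 300_000,
                   steps_per_increment: int = 1_000) -> int:
    """Return how many autoregressive steps to unroll at `train_step`."""
    if train_step < phase1_steps:
        return 1  # phase 1: single-step (1AR) training
    # Phase 2: grow the rollout from 2 up to 12 steps as training progresses.
    increments = (train_step - phase1_steps) // steps_per_increment
    return min(2 + increments, 12)

# Example: rollout_length(0) == 1, rollout_length(305_000) == 7,
# and anything past the end of the schedule saturates at 12.
```

Longer rollouts cost proportionally more memory per training step (activations for every unrolled step must be kept for backpropagation), which is part of why the annealing phase benefits from devices with more than 32 GB.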
