Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor leaderboard code #990

Merged
merged 1 commit into from
Oct 11, 2024
Merged

Refactor leaderboard code #990

merged 1 commit into from
Oct 11, 2024

Conversation

ph-kev
Copy link
Member

@ph-kev ph-kev commented Oct 3, 2024

closes #948 , closes #956

This PR refactors the leaderboard code using the new features from ClimaAnalysis. This commit also deletes the old leaderboard code and the tests for it. Also, there is an off by one month issue when handling the dates and seasons. This issue arises because the first day of each month in the simulation data represents the monthly average of the previous month. This issue only affect the plots involving seasonal data. This is fixed in this commit.

The commit also moves the code to leaderboard.jl and a line is added to the pipeline to run the script.

One significant difference is the leaderboard which now plot the best and worst single model using only annual rather than averaging the error over the annual and seasonal data.

The leaderboard now looks like this:
bias_leaderboard

To compare, the previous leaderboard looks like this:
bias_leaderboard

The errors for the seasonal data change slightly because of the off by one month issue. The new bias plot for MAM looks like this:
bias_pr_MAM

The old bias plot for MAM looks this:
bias_pr_MAM

In addition to the plots above, there is now a plot with all the seasons for a subset of the variables (see compare_vars_biases_groups for which variables are plotted together in data_sources.jl).
bias_pr_all_seasons

@ph-kev ph-kev force-pushed the kp/leaderboard branch 20 times, most recently from 7e866ff to 441d700 Compare October 5, 2024 18:47
Copy link
Member

@Sbozzolo Sbozzolo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to split the leaderboard.jl file in such a way that if one wants to add a new variable they would have to only change one file? (Essentially, moving the loaders to a separate file)

@ph-kev ph-kev force-pushed the kp/leaderboard branch 3 times, most recently from 1b8a197 to 9030e9d Compare October 7, 2024 22:41
@ph-kev
Copy link
Member Author

ph-kev commented Oct 7, 2024

Would it be possible to split the leaderboard.jl file in such a way that if one wants to add a new variable they would have to only change one file? (Essentially, moving the loaders to a separate file)

I moved all the code involving data and preprocessing to data_sources.jl.

seasons = ["ANN", "MAM", "JJA", "SON", "DJF"]

# Print dates for debugging
pr_var = sim_var_dict["pr"]() # it shouldn't matter what short name we use
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of assuming that pr is available, you can pick the first entry in the dict

Comment on lines 91 to 90
if season != "ANN"
CairoMakie.save(
joinpath(leaderboard_base_path, "bias_$(first(compare_vars_biases))_$season.png"),
fig_bias,
)
else
CairoMakie.save(
joinpath(leaderboard_base_path, "bias_$(first(compare_vars_biases))_total.png"),
fig_bias,
)
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if season != "ANN"
CairoMakie.save(
joinpath(leaderboard_base_path, "bias_$(first(compare_vars_biases))_$season.png"),
fig_bias,
)
else
CairoMakie.save(
joinpath(leaderboard_base_path, "bias_$(first(compare_vars_biases))_total.png"),
fig_bias,
)
end
CairoMakie.save(
joinpath(leaderboard_base_path, "bias_$(first(compare_vars_biases))_$season.png"),
fig_bias,
)

It's fine to change the name to _ANN

Copy link
Member

@Sbozzolo Sbozzolo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks fantastic. Can you add an entry to the NEWS file to briefly mention to script? And can you add a section somewhere in the docs about "How do I add a new variable to the leaderboard?"

@ph-kev ph-kev force-pushed the kp/leaderboard branch 2 times, most recently from 57eb779 to 4e03a02 Compare October 9, 2024 04:07
Copy link
Member

@Sbozzolo Sbozzolo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you!

I'd recommend you tag Julia for a second review before merging.

Copy link
Member

@juliasloan25 juliasloan25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, thank you Kevin! I just had a comment about moving some information to the docs, but after that it looks good to merge

experiments/ClimaEarth/run_amip.jl Outdated Show resolved Hide resolved
experiments/ClimaEarth/leaderboard/leaderboard.jl Outdated Show resolved Hide resolved
Copy link
Member

@juliasloan25 juliasloan25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I'm not sure why the GPU slabplanet runs fail - is this rebased on the latest main?

@ph-kev
Copy link
Member Author

ph-kev commented Oct 10, 2024

Looks good! I'm not sure why the GPU slabplanet runs fail - is this rebased on the latest main?

I don't think this is rebased on the latest main. I am going to rebase and push right now.

@ph-kev ph-kev force-pushed the kp/leaderboard branch 2 times, most recently from 0fde103 to 458cb92 Compare October 10, 2024 23:55
This commit refactors the leaderboard code using the new features from
ClimaAnalysis. This commit also deletes the old leaderboard code and the
tests for it. Also, there is an off by one month issue when handling the
dates and seasons. This arises because the simulation data assigns the
next month the monthly average for this month. For instance, the monthly
average for January 2010 is assigned the date 2/1/2010. This is fixed in
this commit.

The commit also moves the code to leaderboard.jl and data_sources.jl.
The file read_amip.jl calls leaderboard.jl to plot now. The conditional
to run leaderboard.jl is changed from t_end > 86400 to t_end
> 86400 * 31 * 3 since it doesn't make sense to run the code unless
monthly averages are computed. This is because the dataset contains
monthly averages.

One significant difference is the leaderboard which now plot the best
and worst single model using only annual rather than averaging the error
over annual and seasonal data.
@ph-kev ph-kev marked this pull request as ready for review October 11, 2024 18:45
@ph-kev ph-kev merged commit ebb58bc into main Oct 11, 2024
8 checks passed
@ph-kev ph-kev deleted the kp/leaderboard branch October 11, 2024 19:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Replace leaderboard with ClimaAnalysis leaderboard Leaderboard test fails due to undownloadable artifact
3 participants