Skip to content

Conversation

@bouweandela
Copy link
Member

@bouweandela bouweandela commented Oct 24, 2022

Description

Make esmvalcore._config public by moving it to esmvalcore.config and remove duplicate configuration code by moving it from the esmvalcore.experimental module to esmvalcore.config.

This also adds proper caching of the CMOR tables based on the modification time of config-developer.yml, leading to a considerable speedup of the tests.

Closes #1568
Closes #1579

Related to (the by now somewhat outdated) #498

Related to ESMValGroup/ESMValTool#2736

Part of #1609.

Deprecations:

  • esmvalcore.cmor.table.read_cmor_tables(cfg_developer): cfg_developer used to be the content of the config-developer file already loaded using yaml.safe_load, from now on it should be just the path to the file. The old behavior will be supported until version v2.10.
  • esmvalcore._config: this is a private module that is not part of the public API. However, because it is used in ESMValTool, it will remain available until v2.9. The esmvalcore.config.CFG.start_session method should be used as a replacement for the read_config_user_file function.
  • esmvalcore.experimental.config.Session.to_config_user and from_config_user are deprecated and will be removed in v2.9.

Link to documentation: https://esmvaltool--1769.org.readthedocs.build/projects/ESMValCore/en/1769/api/esmvalcore.config.html


Before you get started

Checklist

It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.


To help with the number pull requests:

@bouweandela bouweandela added the enhancement New feature or request label Oct 24, 2022
@bouweandela bouweandela added this to the v2.8.0 milestone Oct 24, 2022
@bouweandela bouweandela force-pushed the deduplicate-config-code branch from b33dc26 to ad1ad77 Compare November 1, 2022 20:20
@codecov
Copy link

codecov bot commented Nov 1, 2022

Codecov Report

Merging #1769 (3eed805) into main (3149d07) will increase coverage by 0.31%.
The diff coverage is 99.60%.

@@            Coverage Diff             @@
##             main    #1769      +/-   ##
==========================================
+ Coverage   91.16%   91.48%   +0.31%     
==========================================
  Files         202      201       -1     
  Lines       10882    10878       -4     
==========================================
+ Hits         9921     9952      +31     
+ Misses        961      926      -35     
Impacted Files Coverage Δ
esmvalcore/config/_diagnostics.py 87.83% <ø> (ø)
esmvalcore/config/_logging.py 97.67% <ø> (ø)
esmvalcore/config/_config.py 98.68% <98.68%> (ø)
esmvalcore/_citation.py 80.99% <100.00%> (-6.62%) ⬇️
esmvalcore/_config/__init__.py 100.00% <100.00%> (ø)
esmvalcore/_data_finder.py 96.69% <100.00%> (ø)
esmvalcore/_main.py 90.66% <100.00%> (+13.31%) ⬆️
esmvalcore/_recipe.py 95.81% <100.00%> (-0.10%) ⬇️
esmvalcore/_task.py 71.55% <100.00%> (-0.81%) ⬇️
esmvalcore/cmor/table.py 94.49% <100.00%> (+0.20%) ⬆️
... and 18 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@bouweandela bouweandela force-pushed the deduplicate-config-code branch from 160d2fe to 3b25fe0 Compare November 1, 2022 22:41
@bouweandela bouweandela force-pushed the deduplicate-config-code branch 2 times, most recently from 7093fcd to aaf9a4a Compare November 3, 2022 21:33
@bouweandela bouweandela force-pushed the deduplicate-config-code branch from 69e6918 to 0b15dc6 Compare November 4, 2022 14:38
@bouweandela bouweandela changed the title Remove duplicate configuration code Add esmvalcore.config module Nov 4, 2022
@bouweandela bouweandela changed the title Add esmvalcore.config module Add esmvalcore.config module Nov 4, 2022
@bouweandela bouweandela added the api Notebook API label Nov 4, 2022
@@ -0,0 +1,130 @@
"""Functions dealing with config-user.yml / config-developer.yml."""
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to reviewer: this file was moved here from esmvalcore/_config/_config.py, but because all the old code for reading config-user.yml was removed from it, git doesn't recognize it.

@bouweandela bouweandela force-pushed the deduplicate-config-code branch from d82146e to 004a687 Compare November 4, 2022 15:48
@bouweandela bouweandela marked this pull request as ready for review November 4, 2022 16:18
@bouweandela
Copy link
Member Author

@ESMValGroup/esmvaltool-coreteam It would be great if some of you could have a look at this.

@bouweandela bouweandela added the bug Something isn't working label Nov 4, 2022
@schlunma
Copy link
Contributor

schlunma commented Nov 7, 2022

I try to find some time this week to have a look at this 👍

One question at the very beginning: as far as I understand this is only a refactoring and does not change any functionality, is that correct? So we would not expect any recipe output to change, right?

@bouweandela
Copy link
Member Author

Thanks! Indeed I wouldn't expect any recipe output to change. Maybe some command line option does no longer work, but even that seems unlikely because we have fairly good test coverage and we've already been using the esmvalcore.experimental.config code for validating the config-user.yml file for some time now.

Copy link
Contributor

@schlunma schlunma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Bouwe, great work! This really simplifies the configuration module and making it public is also a good idea.

I tested this with a bunch of recipes, different command line options, and different options in the config file. Everything worked fine!! 🎉

I have a couple of comments on the code itself, and two more general comments:

  • Running recipes with this produces a bunch of ESMValCoreDeprecationWarning: The private module esmvalcore._config has been deprecated in ESMValCore version 2.8 and is scheduled for removal in version 2.9.0. Please use the public module esmvalcore.config instead. warnings. I guess they come from ESMValTool, not ESMValCore; is this correct?
  • Why did you decide to keep the esmvalcore.experimental.config module? It's just a copy of esmvalcore.config. IMHO we could also deprecate this.

@bouweandela
Copy link
Member Author

Thanks for the very thorough review @schlunma! I didn't manage to work my way through all comments, will continue tomorrow..

@bouweandela
Copy link
Member Author

Running recipes with this produces a bunch of ESMValCoreDeprecationWarning: The private module esmvalcore._config has been deprecated in ESMValCore version 2.8 and is scheduled for removal in version 2.9.0. Please use the public module esmvalcore.config instead. warnings. I guess they come from ESMValTool, not ESMValCore; is this correct?

There were also some coming from ESMValCore, but I suppressed those now. Once #1609 completely merged, the suppressions should no longer be needed, but I didn't want to change esmvalcore/_recipe.py in this pull request.

Why did you decide to keep the esmvalcore.experimental.config module? It's just a copy of esmvalcore.config. IMHO we could also deprecate this.

Good idea, I just did that.

@bouweandela
Copy link
Member Author

Thanks a lot for reviewing @schlunma! I think I have addressed everything now, could you have another look please?

Copy link
Contributor

@schlunma schlunma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks for implementing my comments @bouweandela! I re-tested this and everything works as expected 🚀

@ESMValGroup/esmvaltool-coreteam would be great if someone could merge this, thanks 👍

@valeriupredoi
Copy link
Contributor

I shallz! But am not entirely happy with the cov test failing, lemme have a looksee first (before moaning about it)

Copy link
Contributor

@valeriupredoi valeriupredoi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lots of good stuff here, awesome work, guys! Just a couple quick and dirty pointers from me, I will merge since the cov test is complaining about old code 👍

if session['compress_netcdf']:
logger.warning(
"You have enabled NetCDF compression. Accessing .nc files can be "
"much slower than expected if your access pattern does not match "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is cryptic - what does "if your access pattern does not match" mean?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reminds me of "Negative, Ghost Rider, the pattern is full!" 😆 (Top Gun)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can read more about HDF5 compression here: https://support.hdfgroup.org/HDF5/doc/Advanced/Chunking/ (and see also here https://docs.xarray.dev/en/stable/user-guide/io.html#chunk-based-compression). What it comes down to is that enabling compression will create compressed chunks of data inside the file. In order to read a single point from a chunk, you need to decompress the entire chunk. So if you first save with chunks that are e.g. one entire lat/lon slice and then try to read a long timeseries from a single lat/lon point this will be very inefficient.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know all that stuff, man, cheers, but a user will not (in 98% of cases) 😁

"""
from ._config import DIAGNOSTICS, configure_logging
from .config._diagnostics import DIAGNOSTICS
from .config._logging import configure_logging
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sure there is a good reason why these imports are not happening at top level, but why are they not happening at top level?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it gives a very slow user experience, e.g. if you run esmvaltool --help, you expect a prompt response. However, if it needs to import everything and load all the cmor tables, this takes up to a few seconds and gives the command line a very sluggish feel.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah gotcha, very cool point I didn't think of 🎖️

Comment on lines +395 to +396
print(f"ERROR: output directory {session.session_dir} already"
" exists, aborting to prevent data loss")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am with Manu on this one, but OK since mkdir will raise something anyway

@valeriupredoi valeriupredoi merged commit 0b02960 into main Nov 14, 2022
@valeriupredoi valeriupredoi deleted the deduplicate-config-code branch November 14, 2022 12:46
@bouweandela
Copy link
Member Author

I am with Manu on this one, but OK since mkdir will raise something anyway

Maybe it shouldn't even do that, right? Maybe it should just automatically come up with an another directory name? I tried to implement something like that here: 52af816.

@bouweandela
Copy link
Member Author

Thanks for reviewing @schlunma and merging @valeriupredoi! 🥳

@valeriupredoi
Copy link
Contributor

I am with Manu on this one, but OK since mkdir will raise something anyway

Maybe it shouldn't even do that, right? Maybe it should just automatically come up with an another directory name? I tried to implement something like that here: 52af816.

indeed, but that's a magnet for disaster, we have to be careful with our data spawning on HPC's

@schlunma
Copy link
Contributor

Apparently this broke our CI tests (I already re-started them again once). Anyone got an idea what's going on there?

@valeriupredoi
Copy link
Contributor

@zklaus 🦅 =eyed it and raised it on Slack - tis the implicit line appearing twice in setup.cfg

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api Notebook API bug Something isn't working deprecated feature enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Loading a custom configuration file through the Python API does not work Error message is broken when non-existent rootpath is used from Python API

4 participants