Add an interface for adding new data sources and add support for intake-esgf as a first example #2765
I'll work with you on this one @bouweandela 🍺 |
valeriupredoi
left a comment
having a dive in this, bud - let me know how I can help!
|
this one here ties in very well with this PR, bud #2785 - enjoy your time off 🏖️ |
|
hey @bouweandela hope you're enjoying your holiday time! I kept myself busy and we now have Zarr support (in |
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage Diff

| | main | #2765 | +/- |
|---|---|---|---|
| Coverage | 95.50% | 95.66% | +0.15% |
| Files | 259 | 263 | +4 |
| Lines | 15099 | 15505 | +406 |
| Hits | 14421 | 14833 | +412 |
| Misses | 678 | 672 | -6 |
|
am finally able to start looking at this in great detail, bud, sorry, got hijacked by other things until now 🍺 |
|
many thanks @bouweandela - as promised, I have now started to stress-test this baby - please see my very initial query #2765 (comment) The first type of test is a basic run (what they do with any aircraft prototype - they just taxi it at the very beginning):
Am looking through the debug log and am seeing -> that's about 3 minutes of Globus SDK going around over requests and fetching data that takes 30s with the old esgf pyclient - we need to sort this out somehow or we'll be toast! |
|
another thing from the first test: we really shouldn't dump the intake config yamls in the debug log file - that poor thing is now 33k lines 😆 |
Pull Request Overview
This pull request adds a modular interface for configuring and using multiple data sources in ESMValCore. The main change introduces support for intake-esgf as an alternative to the existing esgf-pyclient for finding ESGF data, while refactoring the existing local and ESGF data source implementations to use the new interface.
Key changes:
- New `esmvalcore.io` module with `DataSource` and `DataElement` protocols for extensible data source support
- Configuration via YAML files under `projects` instead of deprecated `rootpath`, `drs`, and `search_esgf` settings
- New CLI commands: `esmvaltool config show`, `esmvaltool config list`, `esmvaltool config copy`
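To make the idea of a protocol-based data source interface concrete, here is a minimal sketch. It is not the actual `esmvalcore.io` API: the attribute names (`name`, `project`, `priority`, `facets`), the `prepare` and `find_data` methods, the in-memory classes, and the `search` helper are all assumptions chosen for illustration. The stop-at-first-hit behaviour is likewise only one possible search strategy.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Iterable, Protocol


class DataElement(Protocol):
    """Hypothetical protocol for one piece of data, e.g. a single file."""

    name: str
    facets: dict[str, Any]

    def prepare(self) -> None:
        """Make the data available, e.g. by downloading it."""
        ...


class DataSource(Protocol):
    """Hypothetical protocol for anything that can find data by facets."""

    name: str
    project: str
    priority: int

    def find_data(self, **facets: Any) -> Iterable[DataElement]:
        """Return the elements matching all given facets."""
        ...


@dataclass
class InMemoryElement:
    """Toy element; satisfies DataElement structurally, no inheritance."""

    name: str
    facets: dict[str, Any] = field(default_factory=dict)
    prepared: bool = False

    def prepare(self) -> None:
        self.prepared = True


@dataclass
class InMemorySource:
    """Toy source that filters a fixed list of elements."""

    name: str
    project: str
    priority: int
    elements: list[InMemoryElement] = field(default_factory=list)

    def find_data(self, **facets: Any) -> list[InMemoryElement]:
        return [
            element
            for element in self.elements
            if all(element.facets.get(k) == v for k, v in facets.items())
        ]


def search(sources: Iterable[DataSource], **facets: Any) -> list[DataElement]:
    """Query sources by ascending priority; return the first non-empty result."""
    for source in sorted(sources, key=lambda s: s.priority):
        found = list(source.find_data(**facets))
        if found:
            return found
    return []
```

Because the protocols are structural, a new backend only has to provide the same attributes and methods; nothing needs to inherit from a shared base class.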
Reviewed Changes
Copilot reviewed 102 out of 104 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `tests/unit/test_provenance.py` | Removed tests (moved to tests/unit/provenance/test_trackedfile.py) |
| `tests/unit/test_dataset.py` | Updated tests to use new data source configuration and mock structure |
| `tests/unit/task/test_print.py` | Updated test expectations for LocalFile and Path representations |
| `tests/unit/task/test_diagnostic_task.py` | Updated TrackedFile initialization to use Path |
| `tests/unit/recipe/test_to_datasets.py` | Updated tests to handle new data source interface and removed _file_globs references |
| `tests/unit/recipe/test_recipe.py` | Removed test for _schedule_for_download function |
| `tests/unit/provenance/test_trackedfile.py` | Added comprehensive tests for TrackedFile with new protocol support |
| `tests/unit/preprocessor/test_shared.py` | Updated PreprocessorFile instantiation to use Path |
| `tests/unit/main/test_esmvaltool.py` | Updated test to use search_data instead of deprecated search_esgf |
| `tests/unit/local/test_to_iris.py` | Added test for _get_attr_from_field_coord function moved to local module |
| `tests/unit/local/test_time.py` | Removed _get_start_end_year tests and updated date extraction tests |
| `tests/unit/local/test_get_data_sources.py` | Added deprecation test for legacy DataSource class |
| `tests/unit/local/test_facets.py` | Updated tests for renamed LocalDataSource and new timerange handling |
| `tests/unit/io/test_load_data_sources.py` | Added tests for new data source loading functionality |
| `tests/unit/io/test_intake_esgf.py` | Added comprehensive tests for intake-esgf data source implementation |
| `tests/unit/esgf/test_search.py` | Added tests for new ESGFDataSource class |
| `tests/unit/esgf/test_download.py` | Updated tests for ESGFFile as DataElement with new methods |
| `tests/unit/config/test_data_sources.py` | Added tests for data source loading with legacy configuration |
| `tests/unit/config/test_config_validator.py` | Added validator tests for search_data configuration option |
| `tests/unit/config/test_config_object.py` | Updated tests to use search_data instead of search_esgf |
| `tests/unit/config/test_config.py` | Added tests for built-in configuration file descriptions |
| `tests/sample_data/experimental/test_run_recipe.py` | Updated to configure data sources explicitly |
| `tests/integration/test_main.py` | Added tests for new config commands and updated existing tests |
| `tests/integration/recipe/test_recipe.py` | Added data source configuration helpers and updated tests |
| `tests/integration/recipe/test_check.py` | Updated data availability check tests |
| `tests/integration/preprocessor/_io/test_zarr.py` | Added `@pytest.mark.online` markers |
| `tests/integration/preprocessor/_io/test_load.py` | Updated load function call to use new interface |
| `tests/integration/esgf/test_search_download.py` | Added `@pytest.mark.online` marker |
| `tests/integration/esgf/search_results/expected.yml` | Added timerange facet to expected results |
| `tests/integration/dataset/test_dataset.py` | Updated to configure data sources explicitly |
| `tests/integration/conftest.py` | Updated test fixtures to work with new data source interface |
| `tests/integration/cmor/_fixes/icon/test_icon_xpp.py` | Added session fixtures and `@pytest.mark.online` markers |
| `tests/integration/cmor/_fixes/icon/test_icon.py` | Added session fixtures and `@pytest.mark.online` markers |
| `tests/integration/cmor/_fixes/icon/conftest.py` | Added fixture for ICON data source configuration |
| `tests/conftest.py` | Removed rootpath configuration from session fixture |
| `pyproject.toml` | Added intake-esgf and globus-sdk dependencies and new pytest marker |
| `esmvalcore/typing.py` | Updated type definitions to be more specific |
| `esmvalcore/preprocessor/_io.py` | Refactored to use DataElement protocol and removed functions moved to local module |
| `esmvalcore/preprocessor/__init__.py` | Updated to use DataElement instead of file paths |
| `esmvalcore/local.py` | Major refactoring with LocalDataSource class, moved functions from _io.py, and deprecation warnings |
| `esmvalcore/io/protocol.py` | New file defining DataSource and DataElement protocols |
| `esmvalcore/io/intake_esgf.py` | New module implementing intake-esgf data source |
| `esmvalcore/io/__init__.py` | New module with load_data_sources function |
| `esmvalcore/exceptions.py` | Added SuppressedError base to RecipeError |
| `esmvalcore/esgf/_search.py` | Added ESGFDataSource class and updated file selection |
| `esmvalcore/esgf/_download.py` | Updated ESGFFile to implement DataElement protocol |
| `esmvalcore/esgf/__init__.py` | Updated docstring and exports |
| `esmvalcore/dataset.py` | Major refactoring to use data sources instead of direct file finding |
| `esmvalcore/config/configurations/defaults/*.yml` | Added comments to default configuration files |
| `esmvalcore/config/configurations/data-*.yml` | New data source configuration files |
| `esmvalcore/config/config-logging.yml` | Updated logging configuration |
| `esmvalcore/config/_validated_config.py` | Added return type annotation |
| `esmvalcore/config/_data_sources.py` | New module for data source configuration |
| `esmvalcore/config/_config_validators.py` | Added validators and deprecators for new/deprecated options |
| `esmvalcore/config/_config_object.py` | Updated to load projects from config-developer file |
| `esmvalcore/config/_config.py` | Added return type to load_config_developer |
| `esmvalcore/cmor/_fixes/icon/_base_fixes.py` | Updated to use new data sources interface |
| `esmvalcore/_task.py` | Updated to use Path for filenames |
| `environment.yml` | Added intake-esgf and globus-sdk dependencies |
| `doc/recipe/overview.rst` | Updated documentation for new configuration option |
| `doc/configurations` | Added symlink to configurations directory |
Thanks a lot for all your advice so far @valeriupredoi! I've finally marked it as 'ready for review'. |
|
Merging this now so it is ready to be presented at the workshop. The design of the new way of configuring the tool has been open for discussion for over a year now in #2371, with added detail in #2584, so I assume that everyone has had time to comment on its usability. If you encounter any issues with the new way of configuring data sources or using intake-esgf, please report them in an issue. |
momentous 😃
Description
Add an interface for adding new data sources. Documentation of the new interface is available here: `esmvalcore.io`.

The existing `esmvalcore.local` and `esmvalcore.esgf` modules have been modified to make use of the new interface and, as an example use case, support for using intake-esgf to find input data has been added.

The plan is to use this interface to add support for xcube and intake-esm next.

Several commands have been added:
- `esmvaltool config show`: print the current configuration
- `esmvaltool config list`: list available example configuration files
- `esmvaltool config copy`: copy an example configuration file to your configuration directory, i.e. `~/.config/esmvaltool` or the path defined by the `ESMVALTOOL_CONFIG_DIR` environment variable.

To try the new intake-esgf data source, configure `esmvaltool` to use it by running the command `esmvaltool config copy intake-esgf-data.yml`.

This pull request also adds the option to skip tests that require an internet connection by marking those as `online`. E.g. `pytest -m 'not online'` will skip those.

Related to #2584
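The configuration directory lookup described for `esmvaltool config copy` can be sketched as follows. This is an illustration only: the helper name `resolve_config_dir` is hypothetical, and the precedence shown (environment variable first, then `~/.config/esmvaltool`) is an assumption based on the description above, not a copy of the tool's implementation.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_config_dir(environ: Optional[Mapping[str, str]] = None) -> Path:
    """Return the directory an example configuration file would be copied to.

    Hypothetical helper: uses ESMVALTOOL_CONFIG_DIR when it is set and
    non-empty, otherwise falls back to ~/.config/esmvaltool.
    """
    env = os.environ if environ is None else environ
    configured = env.get("ESMVALTOOL_CONFIG_DIR")
    if configured:
        return Path(configured).expanduser()
    return Path.home() / ".config" / "esmvaltool"


if __name__ == "__main__":
    print(resolve_config_dir())
```

Passing the environment as an argument keeps the helper easy to exercise in tests without modifying `os.environ`.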
Deprecations
The following configuration settings are no longer needed because they can be configured using the new data sources format.
The following configuration settings are deprecated and will be removed in v2.16:
- `rootpath`
- `drs`
- `download_dir`
- `search_esgf`: use the new data sources to configure the 'never' option. The 'when_missing' and 'always' values are equivalent to configuring both local and ESGF data sources and using the new `search_data` option with value 'quick' and 'complete' respectively.

and in config-developer.yml
- `input_dir`
- `input_file`
- `ignore_warnings`

The following Python functions are deprecated and will be removed in v2.16:
- `esmvalcore.local.DataSource`: use `esmvalcore.local.LocalDataSource` instead
- `esmvalcore.local.find_files`: use `esmvalcore.local.LocalDataSource.find_data` instead

Documentation
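The relationship between the deprecated `search_esgf` values and the new `search_data` option, as stated in the deprecation notes, can be written down as a small lookup. This is a sketch for orientation only: the function name `translate_search_esgf` and the returned dictionary layout are hypothetical and are not part of ESMValCore. For 'never', the notes only say to configure data sources accordingly, so no `search_data` value is assumed there.

```python
# Mapping taken from the deprecation notes; the dictionary layout is an
# illustrative assumption, not an ESMValCore data structure.
LEGACY_SEARCH_ESGF = {
    # 'never': configure only local data sources; no search_data value implied.
    "never": {"data_sources": ("local",), "search_data": None},
    # 'when_missing': local and ESGF data sources with search_data 'quick'.
    "when_missing": {"data_sources": ("local", "esgf"), "search_data": "quick"},
    # 'always': local and ESGF data sources with search_data 'complete'.
    "always": {"data_sources": ("local", "esgf"), "search_data": "complete"},
}


def translate_search_esgf(value: str) -> dict:
    """Return the new-style equivalent of a legacy search_esgf value."""
    try:
        return dict(LEGACY_SEARCH_ESGF[value])
    except KeyError:
        valid = ", ".join(sorted(LEGACY_SEARCH_ESGF))
        raise ValueError(
            f"Unknown search_esgf value {value!r}; expected one of: {valid}"
        ) from None
```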
Follow up ideas
- Move `esmvalcore.esgf` and `esmvalcore.local` into `esmvalcore.io`. To avoid introducing even more changes in the pull request, I will do this in a follow up pull request. Move the `local` and `esgf` modules from `esmvalcore` to `esmvalcore.io` #2882
- Add a `site` configuration setting that selects defaults appropriate to that site, e.g. `site: levante` would select data sources and dask settings appropriate to Levante, `site: jasmin` for Jasmin, to simplify configuration of the tool. Add a `site` option to the `get_config_user` command #1706

Before you get started
Checklist
It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.
To help with the number of pull requests: