This repository has been archived by the owner on Jan 10, 2025. It is now read-only.

Two dates reforecasts dask warnings and takes ages #9

Closed
aaronspring opened this issue May 7, 2021 · 4 comments

Comments

@aaronspring
Collaborator

aaronspring commented May 7, 2021

It takes ages to load two dates from reforecasts. Shouldn't that work better and faster? @floriankrb @b8raoult

>>> ds = cml.load_dataset("s2s-ai-challenge-training-input", origin="ecmwf", date=["20200102","20200109"], parameter='t2m')
By downloading data from this dataset, you agree to the terms and conditions defined at https://apps.ecmwf.int/datasets/data/s2s/licence/. If you do not agree with such terms, do not download the data.
>>> ds.to_xarray()
WARNING: ecmwflibs universal: found eccodes at /work/mh0727/m300524/conda-envs/s2s-ai/lib/libeccodes.so
Warning: ecCodes 2.21.0 or higher is recommended. You are running version 2.12.3
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/dask/array/core.py:4299: PerformanceWarning: Increasing number of chunks by factor of 20
  **blockwise_kwargs,
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/dask/array/core.py:4299: PerformanceWarning: Increasing number of chunks by factor of 11
  **blockwise_kwargs,
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/dask/array/core.py:4299: PerformanceWarning: Increasing number of chunks by factor of 46
  **blockwise_kwargs,
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/dask/array/core.py:4299: PerformanceWarning: Increasing number of chunks by factor of 20
  **blockwise_kwargs,
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/dask/array/core.py:4299: PerformanceWarning: Increasing number of chunks by factor of 11
  **blockwise_kwargs,
/work/mh0727/m300524/conda-envs/s2s-ai/lib/python3.7/site-packages/dask/array/core.py:4299: PerformanceWarning: Increasing number of chunks by factor of 46
  **blockwise_kwargs,
... takes more than 10 minutes

I think the challenge here is that we have 20 forecast_reference_times per file and they cannot be stacked easily. As it is now, this is a huge bottleneck.
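
For context, dask emits this PerformanceWarning when an element-wise operation has to align arrays whose chunk grids differ, so the result ends up with many more chunks than any input. The following is a minimal sketch with made-up array sizes (not the S2S grid). That misaligned chunks across the 20 forecast_reference_times are what triggers the warning inside `to_xarray()` is an assumption, not something confirmed in this thread.

```python
import warnings

import dask.array as da
from dask.array.core import PerformanceWarning

# Two arrays of the same shape, chunked along different axes.
# Sizes are hypothetical and chosen only to make the effect visible.
rows = da.ones((100, 100), chunks=(1, 100))  # 100 chunks, split along axis 0
cols = da.ones((100, 100), chunks=(100, 1))  # 100 chunks, split along axis 1

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Adding them forces dask to refine both inputs to a common chunk grid.
    total = rows + cols

chunk_warnings = [w for w in caught if issubclass(w.category, PerformanceWarning)]

print("input chunks:", rows.npartitions, cols.npartitions)
print("output chunks:", total.npartitions)
for w in chunk_warnings:
    print("warning:", w.message)
```

If this is indeed the mechanism, rechunking the inputs to a shared layout before combining them (for example with `.rechunk(...)` on the dask arrays or `.chunk(...)` on the xarray objects) would be one way to keep the task graph small.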

@aaronspring
Collaborator Author

Even on EWC, where the download happens in a snap, this raises the chunking warning. When no format is specified, GRIB is used.

With netCDF it works, but it still takes about one minute for these two dates. That means it will take much longer for all 53 reforecasts.

ds = cml.load_dataset("s2s-ai-challenge-training-input", origin="ecmwf", date=["20200102","20200109"], parameter='t2m', format='netcdf').to_xarray()

@aaronspring
Collaborator Author

xr.open_mfdataset('202001*.nc', combine='nested') is much faster, hence ecmwf/climetlab#16

@aaronspring
Collaborator Author

For keywords that speed up access, see also pydata/xarray#1823 (comment). Setting chunks might help as well.
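
A minimal sketch of what such a call could look like, using two tiny synthetic netCDF files in place of the real downloads. The keyword choices (`data_vars="minimal"`, `coords="minimal"`, `compat="override"`, explicit `chunks`) are ones commonly suggested for speeding up `xr.open_mfdataset`; that they are the ones meant here, and the dimension name `forecast_reference_time`, are assumptions.

```python
import tempfile
from pathlib import Path

import numpy as np
import xarray as xr

# Write two small synthetic files standing in for the per-date downloads.
# Variable, dimension names and grid are hypothetical.
tmpdir = Path(tempfile.mkdtemp())
paths = []
for i, date in enumerate(["2020-01-02", "2020-01-09"]):
    ds = xr.Dataset(
        {
            "t2m": (
                ("forecast_reference_time", "latitude", "longitude"),
                np.full((1, 3, 4), float(i)),
            )
        },
        coords={
            "forecast_reference_time": [np.datetime64(date, "ns")],
            "latitude": [10.0, 0.0, -10.0],
            "longitude": [0.0, 10.0, 20.0, 30.0],
        },
    )
    path = tmpdir / f"{date.replace('-', '')}.nc"
    ds.to_netcdf(path)
    paths.append(str(path))

# Concatenate in the given file order along one named dimension and skip
# the expensive equality checks on variables that do not contain it.
combined = xr.open_mfdataset(
    paths,
    combine="nested",
    concat_dim="forecast_reference_time",
    data_vars="minimal",
    coords="minimal",
    compat="override",
    chunks={"forecast_reference_time": 1},
)

print(combined)
```

With `compat="override"`, variables without the concatenation dimension are taken from the first file without comparison, so this is only safe when the files really share the same grid.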

@aaronspring
Collaborator Author

Works nicer now. Thanks for fixing, @floriankrb @b8raoult
