
Long import time #6726

Closed
leroyvn opened this issue Jun 25, 2022 · 9 comments · Fixed by #7179
Labels
dependencies Pull requests that update a dependency file topic-internals

Comments

@leroyvn

leroyvn commented Jun 25, 2022

What is your issue?

Importing the xarray package takes a significant amount of time. For instance:

❯ time python -c "import xarray"
python -c "import xarray"  1.44s user 0.52s system 132% cpu 1.476 total

compared to other packages:

❯ time python -c "import pandas"
python -c "import pandas"  0.45s user 0.35s system 177% cpu 0.447 total

❯ time python -c "import scipy"
python -c "import scipy"  0.29s user 0.23s system 297% cpu 0.175 total

❯ time python -c "import numpy"
python -c "import numpy"  0.29s user 0.43s system 313% cpu 0.229 total

❯ time python -c "import datetime"
python -c "import datetime"  0.05s user 0.00s system 99% cpu 0.051 total

I am obviously not surprised that importing xarray takes longer than importing pandas, NumPy or the datetime module, but 1.5 s is clearly noticeable when the import happens in, e.g., a command-line application.

I looked into import performance and found the lazy module loader proposal by the Scientific Python community. AFAIK SciPy uses a similar system to populate its namespaces without an import-time penalty. Would it be possible for xarray to use delayed imports where relevant?
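For reference, the standard library already ships a building block for this kind of deferral: importlib.util.LazyLoader postpones a module's execution until the first attribute access. A minimal sketch of the documented recipe, using json as a stand-in for a heavy dependency (this is not what the lazy loader proposal or xarray does, just an illustration of the mechanism):

```python
import importlib.util
import sys
import types


def lazy_import(name: str) -> types.ModuleType:
    """Return a module object whose code runs only on first attribute access."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # registers the module; execution is deferred
    return module


json = lazy_import("json")       # nothing has been executed yet
result = json.dumps({"a": 1})    # the module is actually loaded here
```

The trade-off is that import errors surface at first use rather than at program start, which is exactly the question raised later in this thread for the backends.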

@leroyvn leroyvn added the needs triage Issue that has not been reviewed by xarray team member label Jun 25, 2022
@mathause
Collaborator

Thanks for the report. I think one reason is that we import all the IO libraries non-lazily (I think since the backend refactor). And many of the dependencies still use pkg_resources instead of importlib.metadata (pkg_resources is considerably slower).

We'd need to take a look at the lazy loader.

@mathause mathause added topic-internals dependencies Pull requests that update a dependency file and removed needs triage Issue that has not been reviewed by xarray team member labels Jun 25, 2022
@headtr1ck
Collaborator

Useful for debugging:
python -X importtime -c "import xarray"

@mathause
Collaborator

mathause commented Jul 30, 2022

I just had another look at this using

python -X importtime -c "import xarray" 2> import.log

and tuna for the visualization.

  • pseudoNETCDF adds quite some overhead, but I think only a few people have it installed (it could be made faster, but I'm not sure it's worth it)
  • llvmlite (required by numba) seems to be the last dependency relying on pkg_resources, but this is fixed in the new version, which should be out soonish
  • dask recently merged a PR that avoids a slow import (Only import IPython if type checking, dask/dask#9230), which we should benefit from

Together these should bring the import time down by another ~0.25 s, but I agree it would be nice to get it even lower.

@eendebakpt
Contributor

Some other projects are considering lazy imports as well: https://scientific-python.org/specs/spec-0001/

@headtr1ck
Collaborator

I think we could rework our backend machinery to do the imports lazily:
to check whether a file might be openable via some backend, we usually do not need to import the backend's dependency module.

@headtr1ck
Collaborator

headtr1ck commented Sep 28, 2022

I just checked: many backends import their external dependencies at module level inside a try-except block.
This could be replaced by importlib.util.find_spec.

However, many backends also catch ImportError (not just ModuleNotFoundError), which occurs when a library is installed but broken. I am not sure whether the backend should simply be disabled in that case, as it is now (at least cfgrib raises a warning instead)?
Would it be a problem if this error only appeared when actually trying to open a file? If not, we could move to lazy external-library loading for the backends.

Not sure how much it actually saves, but it should be ~0.2 s (at least on my machine; it depends on the number of installed backends: the fewer are installed, the faster the import should be).
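As a sketch of what the find_spec approach could look like (the module names below are only illustrative; xarray's real guards live in its backend entry points):

```python
from importlib.util import find_spec


def module_available(name: str) -> bool:
    """Report whether *name* can be located, without executing any of its code."""
    return find_spec(name) is not None


# Module-level try/except imports become cheap probes. A broken install
# would now only raise when the backend actually imports the library
# to open a file, which is the behavior change discussed above.
HAS_NETCDF4 = module_available("netCDF4")  # illustrative dependency name
```

Unlike a real import, find_spec only searches the import machinery's finders, so it sidesteps both the import-time cost and any ImportError raised by the library's own module code.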

@dcherian
Contributor

dcherian commented Oct 3, 2022

This could be replaced by importlib.util.find_spec.

Nice. Does it work on python 3.8?

However, many backends also check for ImportErrors (not ModuleNotFoundError) that occur when a library is not correctly installed. I am not sure if in this case the backend should simply be disabled like it is now (At least cfgrib is raising a warning instead)?

Would it be a problem if this error is only appearing when actually trying to open a file

Sounds OK to error when trying to use the backend.

@headtr1ck
Collaborator

Nice. Does it work on python 3.8?

According to the docs, it has existed since Python 3.4.

@hmaarrfk
Contributor

In developing #7172, there are also some places where class types are used to check for features:
https://github.com/pydata/xarray/blob/main/xarray/core/pycompat.py#L35

Dask and sparse are big contributors here because of the need to resolve the class names in question.

Ultimately, I think it is important to constrain the problem:

Are we ok with 100 ms over numpy + pandas? 20 ms?

On my machines, the ~0.5 s that xarray is close to seems long... but every time I look at it, it seems to "just be a python problem".
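One way to avoid resolving those classes at import time is to test the type's module name instead of the type itself. A hedged sketch of that idea (dask is used only as an example here; this is not xarray's actual pycompat implementation):

```python
def looks_like_dask_array(obj) -> bool:
    """Heuristic type check that never imports dask.

    Inspecting type(obj).__module__ avoids importing dask just to get a
    class to compare against: real dask arrays report a module name such
    as "dask.array.core". If dask was never imported, obj cannot be a
    dask array, so the string check alone is sufficient.
    """
    module = type(obj).__module__
    return module == "dask" or module.startswith("dask.")
```

The cost is that this is a name-based heuristic rather than a real isinstance check, so a genuine class check would still be needed on the slow path once the library is known to be loaded.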
