Very poor html repr performance on large multi-indexes #5529
Comments
I think it's some lazy calculation that kicks in, because I can reproduce it using np.asarray:
import numpy as np
import xarray as xr
ds = xr.tutorial.load_dataset("air_temperature")
da = ds["air"].stack(z=[...])
coord = da.z.variable.to_index_variable()
# This is very slow:
a = np.asarray(coord)
da._repr_html_()
Yes, I think it's materializing the multi-index as an array of tuples, which we definitely shouldn't be doing for reprs. @Illviljan nice profiling view! What is that?
One way of solving it could be to slice the arrays to a smaller size while still showing the same repr. I'm using https://github.com/spyder-ide/spyder for the profiling and general hacking.
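The slicing idea can be sketched in plain Python. This is only an illustration, not xarray's formatting code: `CountingCoord` and `short_repr` are hypothetical names, and the stand-in coordinate just counts how many elements get built.

```python
class CountingCoord:
    """Hypothetical stand-in for a lazily indexed coordinate.

    It counts how many elements are actually built, so the cost of a
    repr can be observed directly.
    """

    def __init__(self, n):
        self._n = n
        self.reads = 0

    def __len__(self):
        return self._n

    def __getitem__(self, key):
        if isinstance(key, slice):
            return [self[i] for i in range(*key.indices(self._n))]
        if key < 0:
            key += self._n
        if not 0 <= key < self._n:
            raise IndexError(key)
        self.reads += 1
        # Pretend each element is a multi-index tuple.
        return (key // 10, key % 10)


def short_repr(seq, edge=3):
    """Format only the first and last `edge` items of `seq`."""
    n = len(seq)
    if n <= 2 * edge:
        return ", ".join(str(item) for item in seq[:])
    head = seq[:edge]
    tail = seq[n - edge:]
    return (
        ", ".join(str(item) for item in head)
        + ", ..., "
        + ", ".join(str(item) for item in tail)
    )


coord = CountingCoord(1_000_000)
text = short_repr(coord)
print(text)
print("elements built:", coord.reads)
```

Only the elements that end up in the text are built, so the cost depends on `edge` rather than on the length of the coordinate.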
Yes, very much so, @Illviljan. But weirdly, the linked PR is attempting to do that, so maybe this code path doesn't hit that change? Spyder's profiler looks good!
I think the linked PR only fixed the summary (inline) repr. The bottleneck here is when formatting the detailed array view for the multi-index coordinates, which triggers the conversion of the whole pandas MultiIndex (tuple elements) and each of its levels to numpy arrays.
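A small pandas-only sketch of the conversion described above, with illustrative sizes that are not taken from the issue. It assumes nothing about xarray internals; it only shows that `np.asarray` on a `MultiIndex` yields an object array with one Python tuple per row, and that slicing first limits the work to the rows being displayed.

```python
import numpy as np
import pandas as pd

# Illustrative sizes only; the index in the issue is much larger.
midx = pd.MultiIndex.from_product([range(200), range(200)], names=["x", "y"])

# Converting the whole index builds one Python tuple per row.
full = np.asarray(midx)
print(full.dtype, full.shape)

# Slicing first converts only the rows that would be displayed.
preview = np.asarray(midx[:3])
print(preview)
```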
What happened:
We have catastrophic performance on the html repr of some long multi-indexed data arrays. Here's a case of it taking 12s.
Minimal Complete Verifiable Example:
Anything else we need to know?:
I thought we'd fixed some issues here: https://github.com/pydata/xarray/pull/4846/files
Environment:
Output of xr.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.8.10 (default, May 9 2021, 13:21:55)
[Clang 12.0.5 (clang-1205.0.22.9)]
python-bits: 64
OS: Darwin
OS-release: 20.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 0.18.2
pandas: 1.2.4
numpy: 1.20.3
scipy: 1.6.3
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.8.3
cftime: 1.4.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.3
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.06.1
distributed: 2021.06.1
matplotlib: 3.4.2
cartopy: None
seaborn: 0.11.1
numbagg: 0.2.1
pint: None
setuptools: 56.0.0
pip: 21.1.2
conda: None
pytest: 6.2.4
IPython: 7.24.0
sphinx: 4.0.1