Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xarray imshow and pcolormesh behave badly when the array does not contain values larger the BoundaryNorm vmax #7014

Closed
4 tasks done
ghiggi opened this issue Sep 9, 2022 · 10 comments
Labels
bug needs triage Issue that has not been reviewed by xarray team member

Comments

@ghiggi
Copy link

ghiggi commented Sep 9, 2022

What happened?

If cmap.set_over is specified, the array color mapping and the colorbar behave badly if the array does not contain values above the norm.vmax.

Let's take an array and apply a colormap and norm (see code below)
image
Now, if in the array I change the array values larger than the norm.vmax (the 2 bottom right pixels) with other values inside the norm:

  • Using matplotlib I get the expected results
    image
  • Using xarray I get this weird behavior.
    image

What did you expect to happen?

The colorbar should not "shift" and the array should be colormapped correctly
This is possibily related also to #4061

Minimal Complete Verifiable Example

import matplotlib.colors
import numpy as np
import xarray as xr 
import matplotlib as mpl 
import matplotlib.pyplot as plt

# Define DataArray 
arr = np.array([[0, 10, 15, 20],
                [ np.nan, 40, 50, 100], 
                [150, 158, 160, 161],
               ])
lon = np.arange(arr.shape[1])
lat = np.arange(arr.shape[0])[::-1]
lons, lats = np.meshgrid(lon, lat)
da = xr.DataArray(arr,
                  dims=["y", "x"],
                  coords={"lon": (("y","x"), lons),
                          "lat": (("y","x"), lats),
                          }
                  )
da

# Define colormap  
color_list = ["#9c7e94", "#640064",   "#009696", "#C8FF00",  "#FF7D00"]
levels =  [0.05, 1, 10, 20, 150, 160]
cmap = mpl.colors.LinearSegmentedColormap.from_list("cmap", color_list, len(levels) - 1)
norm = mpl.colors.BoundaryNorm(levels, cmap.N)
cmap.set_over("darkred")   # color for above 160
cmap.set_under("none")     # color for below 0.05
cmap.set_bad("gray", 0.2)  # color for nan 

# Define colorbar settings 
ticks = levels 
cbar_kwargs = {     
    'extend': "max",  
     
}    

# Correct plot 
p = da.plot.pcolormesh(x="lon", y="lat", cmap=cmap, norm=norm, cbar_kwargs=cbar_kwargs)
plt.show()

# Remove values larger than the norm.vmax level
da1 = da.copy() 
da1.data[da1.data>=norm.vmax] = norm.vmax - 1 # could be replaced with any value inside the norm 

# With matplotlib.pcolormesh [OK]
p = plt.pcolormesh(da1["lon"].data, 
                   da1["lat"],
                   da1.data, 
                   cmap=cmap, norm=norm)
plt.colorbar(p, **cbar_kwargs)
plt.show()

# With matplotlib.imshow [OK]
p = plt.imshow(da1.data, 
               cmap=cmap, norm=norm)
plt.colorbar(p, **cbar_kwargs)
plt.show()

# With xarray.pcolormesh [BUG]
# --> The colorbar shift !!!  
da1.plot.pcolormesh(x="lon", y="lat", cmap=cmap, norm=norm, cbar_kwargs=cbar_kwargs)
plt.show()

# With xarray.imshow [BUG]
# --> The colorbar shift !!!
da1.plot.imshow(cmap=cmap, norm=norm, cbar_kwargs=cbar_kwargs, origin="upper")

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.9.13 | packaged by conda-forge | (main, May 27 2022, 16:56:21)
[GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-124-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 2022.6.0
pandas: 1.4.3
numpy: 1.22.4
scipy: 1.9.0
netCDF4: 1.6.0
pydap: None
h5netcdf: 1.0.2
h5py: 3.7.0
Nio: None
zarr: 2.12.0
cftime: 1.6.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3.0
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: 2022.7.1
distributed: 2022.7.1
matplotlib: 3.5.2
cartopy: 0.20.3
seaborn: 0.11.2
numbagg: None
fsspec: 2022.7.1
cupy: None
pint: 0.19.2
sparse: None
flox: None
numpy_groupies: None
setuptools: 63.3.0
pip: 22.2.2
conda: None
pytest: None
IPython: 7.33.0
sphinx: 5.1.1
/home/ghiggi/anaconda3/envs/gpm_geo/lib/python3.9/site-packages/_distutils_hack/init.py:33: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")

@ghiggi ghiggi added bug needs triage Issue that has not been reviewed by xarray team member labels Sep 9, 2022
@ghiggi ghiggi changed the title xarray imshow and pcolormesh behave badly when the array does not contain "over" values of the BoundaryNorm xarray imshow and pcolormesh behave badly when the array does not contain values larger the BoundaryNorm vmax Sep 9, 2022
@headtr1ck
Copy link
Collaborator

It seems that the cmap gets replaced by xarray in

cmap_params, cbar_kwargs = _process_cmap_cbar_kwargs(

which calls
def _process_cmap_cbar_kwargs(

which then calls
def _determine_cmap_params(

and this method has some special casing of BoundaryNorm that will create a new cmap

Not sure what all this code is supposed to do, though...
But hopefully these findings can help someone with better understanding to debug this :)

@veenstrajelmer
Copy link
Contributor

Is there any update on this issue? I have been running into the same problem recently and am happy to see that this issue was already recognized by others.

@kmuehlbauer
Copy link
Contributor

I'm still struggling to understand that whole workflow and it looks like this is no easy way of getting to the bottom of this.

But adding kwarg extend="max" to the calls to pcolormesh/imshow fixes this.

The docstring gives a hint:

extend : {'neither', 'both', 'min', 'max'}, optional
    How to draw arrows extending the colorbar beyond its limits. If not
    provided, ``extend`` is inferred from ``vmin``, ``vmax`` and the data limits.

So in first case everything works nice since extend is derived from vmin/vmax and data limits. In the second case, extend is obviously not set to the expected value (as in cbar_kwargs) as we've tampered with the maximum data values.

Not sure if aligning extend and cbar_kwargs-extend is already the solution.

@Huite
Copy link
Contributor

Huite commented Feb 1, 2023

Debugging this, @headtr1ck points correctly to _determine_cmap_params:

    if levels is not None or isinstance(norm, mpl.colors.BoundaryNorm):
        cmap, newnorm = _build_discrete_cmap(cmap, levels, extend, filled)
        norm = newnorm if norm is None else norm

The problem lies in the second line. In _build_discrete_cmap, a new cmap is returned with a different number of levels. However, if the original norm is not None, you end up with a mismatch, as the old norm expects the old cmap.

What then happens, is that the norm calls into the cmap. Calling into cmap doesn't do any checks whether the value is larger than N, it just takes the highest available value. The examples in #4061 show this quite clearly, but to illustrate:

import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np

data = np.arange(100).reshape((10, 10))
cmap = mpl.cm.get_cmap("viridis")
print(cmap.N)   # 256
boundaries = [0, 25, 50, 75, 100]
norm = mpl.colors.BoundaryNorm(boundaries, cmap.N)
fig, ax = plt.subplots()
ax.imshow(data, norm=norm, cmap=cmap)

# %%

colors = [cmap(i/255) for i in np.linspace(0, cmap.N, len(boundaries) - 1)]
new_cmap, new_norm = mpl.colors.from_levels_and_colors(boundaries, colors)
print(new_cmap.N)  # 4
fig, ax = plt.subplots()
ax.imshow(data, norm=new_norm, cmap=new_cmap)

# %%
# Mismatched

fig, ax = plt.subplots()
ax.imshow(data, norm=norm, cmap=new_cmap)

# %%

This is avoided here by removing the conditional in the second line, or just making sure both cmap and norm are replaced by their new values:

    if levels is not None or isinstance(norm, mpl.colors.BoundaryNorm):
        cmap, norm = _build_discrete_cmap(cmap, levels, extend, filled)

Then, the cmap and norm remain in sync.

However, when running the tests in test_plot.py, this gives a single questionable(?) failure:

    def test_norm_sets_vmin_vmax(self) -> None:
        vmin = self.data.min()
        vmax = self.data.max()

        for norm, extend, levels in zip(
            [
                mpl.colors.Normalize(),
                mpl.colors.Normalize(),
                mpl.colors.Normalize(vmin + 0.1, vmax - 0.1),
                mpl.colors.Normalize(None, vmax - 0.1),
                mpl.colors.Normalize(vmin + 0.1, None),
            ],
            ["neither", "neither", "both", "max", "min"],
            [7, None, None, None, None],
        ):

            test_min = vmin if norm.vmin is None else norm.vmin
            test_max = vmax if norm.vmax is None else norm.vmax

            cmap_params = _determine_cmap_params(self.data, norm=norm, levels=levels)
            assert cmap_params["vmin"] is None
            assert cmap_params["vmax"] is None
            assert cmap_params["norm"].vmin == test_min
            assert cmap_params["norm"].vmax == test_max
            assert cmap_params["extend"] == extend
>           assert cmap_params["norm"] == norm
E           assert <matplotlib.colors.BoundaryNorm object at 0x000001C7A4CB07C0> == <matplotlib.colors.Normalize object at 0x000001C7A50C4760>

I don't understand why the conditional is there. At first sight, it doesn't make a lot of sense to create a new norm and cmap, but then take the original norm?

@veenstrajelmer
Copy link
Contributor

veenstrajelmer commented Feb 23, 2023

I just combined @Huite's suggestion with splitting the if-statement. This works for both solving the issue and keeping the testcases in test_plot.py green.

    if levels is not None:
        cmap, newnorm = _build_discrete_cmap(cmap, levels, extend, filled)
        norm = newnorm if norm is None else norm
    if isinstance(norm, mpl.colors.BoundaryNorm):
        cmap, norm = _build_discrete_cmap(cmap, levels, extend, filled)

This could replace this code:

if levels is not None or isinstance(norm, mpl.colors.BoundaryNorm):

However, a bit up in the code there is a if isinstance(norm, mpl.colors.BoundaryNorm) statement, which I guess could be combined.

if isinstance(norm, mpl.colors.BoundaryNorm):

    if isinstance(norm, mpl.colors.BoundaryNorm):
        levels = norm.boundaries

I think it is a potential solution nevertheless, but some help is appreciated with the last steps. Also since the case of @ghiggi seems not to be solved with this fix. It does solve #4061 though.

veenstrajelmer added a commit to veenstrajelmer/xarray that referenced this issue Feb 23, 2023
Proposing update to fix pydata#7014
veenstrajelmer added a commit to veenstrajelmer/xarray that referenced this issue Feb 23, 2023
added testcase for pydata#7014
This was referenced Feb 23, 2023
@veenstrajelmer
Copy link
Contributor

veenstrajelmer commented Feb 27, 2023

The related issues #4061 and Deltares/xugrid#49 are fixed by supplying levels=levels instead of norm=norm to ds.plot(), as suggested by #7553 (comment). However, I cannot judge if this also fixes the example from this issue, since the plots still all look different. However, commenting #da1.data[da1.data>=norm.vmax] = norm.vmax - 1 solves most of the differences. @ghiggi: could you check if this suggestion solves your issue indeed?

@ghiggi
Copy link
Author

ghiggi commented Feb 27, 2023

Thanks to all the people above that have started digging into the problem !
@veenstrajelmer : Adding the levels=levels argument (together with norm, ... or dropping norm) does not correct/change the output figure.
Of course commenting #da1.data[da1.data>=norm.vmax] = norm.vmax - 1 "solves" the issue, but this line of code is what enables to show up the bug, which is occurring when the array does not contain any value equal to or higher than norm.vmax

@veenstrajelmer
Copy link
Contributor

@ghiggi: If I understand it correctly, your issue/examplecode covers multiple issues. Since one subissue might be using norm instead levels, I would recommend trimming down your example code do only show the remaining vmax issue.

@ghiggi
Copy link
Author

ghiggi commented Feb 28, 2023

@veenstrajelmer I am not sure I understand what you are saying. In the example I pass only norm to the plotting function ...
As suggested by @kmuehlbauer the solution to this issue is to specify the extend argument in both the plotting call and cbar_kwargs.
In the example, extend was only defined in cbar_kwargs (since is an argument also of mpl.Figure.colorbar). Likely we should align the arguments somewhere in the code with something like this:

if extend is None and cbar_kwargs.get("extend) is not None: 
   extend = cbar_kwargs.get("extend) 
if extend is not None and "extend" not in  cbar_kwargs
   cbar_kwargs["extend"] = extend 

@veenstrajelmer
Copy link
Contributor

veenstrajelmer commented Mar 10, 2023

Thanks to @jklymak, there was an update in PR I created (#7553). @ghiggi with the code from this PR, your code shows identical plots (except for the first one, but that should be the case). Hopefully the PR can be merged somewhere soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug needs triage Issue that has not been reviewed by xarray team member
Projects
None yet
Development

No branches or pull requests

5 participants