Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better attr diff for testing.assert_identical #8400

Merged
merged 5 commits into from
Nov 4, 2023

Conversation

dcherian
Copy link
Contributor

@dcherian dcherian commented Nov 2, 2023

  • User visible changes (including notable bug fixes) are documented in whats-new.rst

This gives us better reprs where only differing attributes are shown in the diff.

On main:

...
Differing coordinates:
L * x        (x) <U1 'a' 'b'
    foo: bar
    same: same
R * x        (x) <U1 'a' 'c'
    source: 0
    foo: baz
    same: same
...

With this PR:

Differing coordinates:
L * x        (x) %cU1 'a' 'b'
    Differing variable attributes:
        foo: bar
R * x        (x) %cU1 'a' 'c'
    Differing variable attributes:
        source: 0
        foo: baz

@dcherian dcherian marked this pull request as ready for review November 2, 2023 04:34
xarray/core/formatting.py Outdated Show resolved Hide resolved
@benbovy
Copy link
Member

benbovy commented Nov 2, 2023

I think anything is welcome to improve the experience of seeing what exactly differ between two variables. However if this PR is treating the attributes only (IIUC) I find the diff could be potentially misleading.

For variables with many attributes this is useful but in general I'm concerned that users who are not aware of this behavior will struggle figuring out why the variable reprs shown here differ from their full repr.

Alternatively, couldn't we "highlight" the differing attributes but still show the full variable reprs? E.g.,

Differing coordinates:
L * x        (x) <U1 'a' 'b'
    foo: bar
    ^^^^^^^^
    same: same
R * x        (x) <U1 'a' 'c'
    source: 0
    ^^^^^^^^^
    foo: baz
    ^^^^^^^^
    same: same

It requires a bit more work to implement, but this could nicely be generalized to other differing parts like the name, dimensions, labels, etc.:

Differing coordinates:
L * x        (x) <U1 'a' 'b'
                      ^^^^^
    foo: bar
    ^^^^^^^^
    same: same
R * x        (x) <U1 'a' 'c'
                      ^^^^^
    source: 0
    ^^^^^^^^^
    foo: baz
    ^^^^^^^^
    same: same

(not sure about the best way to highlight those parts, though)

@dcherian
Copy link
Contributor Author

dcherian commented Nov 2, 2023

Alternatively, couldn't we "highlight" the differing attributes but still show the full variable reprs?

Good idea but it doesn't scale to large datasets. Here's an example I'm looking at, with only one data variable:

E       AssertionError: Left and right Dataset objects are not identical
E       
E       Differing coordinates:
E       L * longitude   (longitude) float64 0.0 0.25 0.5 0.75 ... 359.2 359.5 359.8
E           units: degrees_east
E           standard_name: longitude
E           long_name: longitude
E       R * longitude   (longitude) float64 0.0 0.25 0.5 0.75 ... 359.2 359.5 359.8
E           long_name: longitude
E           standard_name: longitude
E           units: degrees_east
E       L   surface     float64 ...
E       R   surface     float64 ...
E       L   time        datetime64[ns] 2023-06-29
E           long_name: initial time of forecast
E           standard_name: forecast_reference_time
E       R   time        datetime64[ns] ...
E           long_name: initial time of forecast
E           standard_name: forecast_reference_time
E       L   valid_time  datetime64[ns] ...
E           standard_name: time
E           long_name: time
E       R   valid_time  datetime64[ns] ...
E           long_name: time
E           standard_name: time
E       L * latitude    (latitude) float64 90.0 89.75 89.5 89.25 ... -89.5 -89.75 -90.0
E           units: degrees_north
E           standard_name: latitude
E           long_name: latitude
E       R * latitude    (latitude) float64 90.0 89.75 89.5 89.25 ... -89.5 -89.75 -90.0
E           long_name: latitude
E           standard_name: latitude
E           units: degrees_north
E       L   step        timedelta64[ns] ...
E           long_name: time since forecast_reference_time
E           standard_name: forecast_period
E       R   step        timedelta64[ns] ...
E           long_name: time since forecast_reference_time
E           standard_name: forecast_period
E       Differing data variables:
E       L   tp          (latitude, longitude) float32 ...
E           GRIB_paramId: 228228
E           GRIB_dataType: fc
E           GRIB_numberOfPoints: 1038240
E           GRIB_typeOfLevel: surface
E           GRIB_stepUnits: 1
E           GRIB_stepType: accum
E           GRIB_gridType: regular_ll
E           GRIB_NV: 0
E           GRIB_Nx: 1440
E           GRIB_Ny: 721
E           GRIB_cfName: unknown
E           GRIB_cfVarName: unknown
E           GRIB_gridDefinitionDescription: Latitude/longitude. Also called equidista...
E           GRIB_iDirectionIncrementInDegrees: 0.25
E           GRIB_iScansNegatively: 0
E           GRIB_jDirectionIncrementInDegrees: 0.25
E           GRIB_jPointsAreConsecutive: 0
E           GRIB_jScansPositively: 0
E           GRIB_latitudeOfFirstGridPointInDegrees: 90.0
E           GRIB_latitudeOfLastGridPointInDegrees: -90.0
E           GRIB_longitudeOfFirstGridPointInDegrees: 0.0
E           GRIB_longitudeOfLastGridPointInDegrees: 359.75
E           GRIB_missingValue: 3.4028234663852886e+38
E           GRIB_name: Total Precipitation
E           GRIB_shortName: tp
E           GRIB_units: kg m**-2
E           long_name: Total Precipitation
E           units: kg m**-2
E           standard_name: unknown
E       R   tp          (latitude, longitude) float64 dask.array<chunksize=(721, 1440), meta=np.ndarray>
E           GRIB_NV: 0
E           GRIB_Nx: 1440
E           GRIB_Ny: 721
E           GRIB_cfName: unknown
E           GRIB_cfVarName: unknown
E           GRIB_dataType: fc
E           GRIB_gridDefinitionDescription: Latitude/longitude. Also called equidista...
E           GRIB_gridType: regular_ll
E           GRIB_iDirectionIncrementInDegrees: 0.25
E           GRIB_iScansNegatively: 0
E           GRIB_jDirectionIncrementInDegrees: 0.25
E           GRIB_jPointsAreConsecutive: 0
E           GRIB_jScansPositively: 0
E           GRIB_latitudeOfFirstGridPointInDegrees: 90.0
E           GRIB_latitudeOfLastGridPointInDegrees: -90.0
E           GRIB_longitudeOfFirstGridPointInDegrees: 0.0
E           GRIB_longitudeOfLastGridPointInDegrees: 359.75
E           GRIB_missingValue: 3.4028234663852886e+38
E           GRIB_name: Total Precipitation
E           GRIB_numberOfPoints: 1038240
E           GRIB_paramId: 228228
E           GRIB_shortName: tp
E           GRIB_stepType: accum
E           GRIB_stepUnits: 1
E           GRIB_typeOfLevel: surface
E           GRIB_units: kg m**-2
E           long_name: Total Precipitation
E           standard_name: unknown
E           units: kg m**-2

I'll update to add "Differing attributes" to make it clear that we aren't printing all attributes:

Differing coordinates:
L * x        (x) %cU1 'a' 'b'
    Differing attributes: foo: bar
R * x        (x) %cU1 'a' 'c'
    Differing attributes: source: 0
                          foo: baz

Copy link
Collaborator

@max-sixty max-sixty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice improvement!

@dcherian dcherian added plan to merge Final call for comments and removed plan to merge Final call for comments labels Nov 2, 2023
@dcherian dcherian merged commit 477f8cf into pydata:main Nov 4, 2023
34 checks passed
@dcherian dcherian deleted the better-diff-repr branch November 4, 2023 20:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plan to merge Final call for comments
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants