Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix extrema(x; dims) for inputs with NaN/missing #43604

Merged
merged 25 commits into from
Jan 18, 2022
Merged

Conversation

N5N3
Copy link
Member

@N5N3 N5N3 commented Dec 30, 2021

The original goal of this PR is to Fix #43599. But I find some more cases whose result is inconsistent with the definition.
The behaviour changes are summarized as follows:

  • extrema(x; dims) now respects -0,0/0.0 (i.e. extrema([-0.0;0.0]; dims = 1) == [(-0.0, 0.0)])
  • extrema(x; dims) now respects NaN
  • extrema(x; dims) can handle inputs with missing
  • extrema(x; dims) disallows empty reduction like minimun and maximum.
  • minimum(x) and maximum(x) give correct result if the input contains both NaN and missing

The last one is used in test so I fix it in this PR by limiting the related optimizaiton to Float only cases. (see 76637c7)

@N5N3 N5N3 force-pushed the extrama branch 5 times, most recently from 19ce317 to 66d2a81 Compare December 31, 2021 10:11
@N5N3 N5N3 force-pushed the extrama branch 6 times, most recently from e09c63d to e015b3e Compare January 5, 2022 01:14
@N5N3
Copy link
Member Author

N5N3 commented Jan 5, 2022

@tkf is there a strong objection for #36265?
I think it should be good to merge that first, and rebase this one.

@JeffBezanson
Copy link
Member

I like #36265, but it doesn't seem to solve the issue with missing?

@N5N3
Copy link
Member Author

N5N3 commented Jan 6, 2022

I like #36265, but it doesn't seem to solve the issue with missing?

I missed the following:

julia/base/multidimensional.jl

Lines 1754 to 1762 in 5669d63

function _extrema_dims(f::F, A::AbstractArray, dims, ::_InitialValue) where {F}
sz = size(A)
for d in dims
sz = setindex(sz, 1, d)
end
T = promote_op(f, eltype(A))
B = Array{Tuple{T,T}}(undef, sz...)
return extrema!(f, B, A)
end

So it seems that PR also didn't solve NaN.

But the _DupY(f) design looks good to me, and it seems reasonable to use it in this PR:

julia> @btime extrema($a)
  1.222 μs (0 allocations: 0 bytes)
(-3.8158492176948755, 4.389900419602923)

julia> @btime extrema(Float64,$a)
  616.500 μs (12298 allocations: 320.52 KiB)
(-3.8158492176948755, 4.389900419602923)

N5N3 and others added 5 commits January 16, 2022 15:51
This optimization gives wrong result, if `NaN` and `missing` both exist.

add missing test
Add TODO

apply suggestions

Update NEWS.md

Co-Authored-By: Takafumi Arakaki <[email protected]>
Co-Authored-By: Takafumi Arakaki <[email protected]>
Mark `BigInt` as broken
Copy link
Member

@tkf tkf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (other than one nitpick and that we need to update SparseArrays)

base/reducedim.jl Outdated Show resolved Hide resolved
@tkf
Copy link
Member

tkf commented Jan 17, 2022

@N5N3
Copy link
Member Author

N5N3 commented Jan 17, 2022

Local test shows no problem

$ make -C test SparseArrays
make: Entering directory '/cygdrive/c/Users/MYJ/Documents/GitHub/julia/test'
    JULIA test/SparseArrays
Test                    (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
SparseArrays/sparsevector    (4) |        started at 2022-01-17T01:38:57.142
SparseArrays/sparse          (3) |        started at 2022-01-17T01:38:57.262
SparseArrays/higherorderfns  (2) |        started at 2022-01-17T01:38:57.358
SparseArrays/higherorderfns  (2) |    94.64 |   2.66 |  2.8 |   11985.15 |   562.00
SparseArrays/sparsevector    (4) |   164.82 |   4.87 |  3.0 |   24421.82 |   876.65
SparseArrays/sparse          (3) |   263.28 |  23.08 |  8.8 |   27776.47 |   953.35

Test Summary: |  Pass  Broken  Total     Time
  Overall     | 21792       5  21797  4m26.2s
    SUCCESS

The above suggestion will be adopted together with the update of SparseArrays.

Copy link
Member

@tkf tkf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for addressing a lot of fix requests!

@tkf tkf added the merge me PR is reviewed. Merge when all tests are passing label Jan 18, 2022
@tkf tkf merged commit aa1bd28 into JuliaLang:master Jan 18, 2022
@tkf tkf removed the merge me PR is reviewed. Merge when all tests are passing label Jan 18, 2022
@N5N3 N5N3 deleted the extrama branch January 18, 2022 23:26
N5N3 added a commit to N5N3/julia that referenced this pull request Jan 24, 2022
* Define `extrema` using `mapreduce`; support `init`

* Fix tests for SparseArrays

* API clean and export `extrema!`

* Re-implement `reducedim_init` for extrema

* Merge `master` to pull in JuliaSparse/SparseArrays.jl#63

* Mark `BigInt` tests as broken

Co-authored-by: Milan Bouchet-Valat <[email protected]>
Co-authored-by: Simeon Schaub <[email protected]>
Co-authored-by: Takafumi Arakaki <[email protected]>
Co-authored-by: Tim Holy <[email protected]>
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Feb 22, 2022
* Define `extrema` using `mapreduce`; support `init`

* Fix tests for SparseArrays

* API clean and export `extrema!`

* Re-implement `reducedim_init` for extrema

* Merge `master` to pull in JuliaSparse/SparseArrays.jl#63

* Mark `BigInt` tests as broken

Co-authored-by: Milan Bouchet-Valat <[email protected]>
Co-authored-by: Simeon Schaub <[email protected]>
Co-authored-by: Takafumi Arakaki <[email protected]>
Co-authored-by: Tim Holy <[email protected]>
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Mar 8, 2022
* Define `extrema` using `mapreduce`; support `init`

* Fix tests for SparseArrays

* API clean and export `extrema!`

* Re-implement `reducedim_init` for extrema

* Merge `master` to pull in JuliaSparse/SparseArrays.jl#63

* Mark `BigInt` tests as broken

Co-authored-by: Milan Bouchet-Valat <[email protected]>
Co-authored-by: Simeon Schaub <[email protected]>
Co-authored-by: Takafumi Arakaki <[email protected]>
Co-authored-by: Tim Holy <[email protected]>
@greimel
Copy link

greimel commented Apr 1, 2022

Sorry if it's not appropriate to comment on this long-merged PR.

In the docstring it says

The value returned for empty itr can be specified by init. It must be a 2-tuple whose
first and second elements are neutral elements for min and max respectively
(i.e. which are greater/less than or equal to any other element).

However, one very important use-case of this PR could be to compute extrema iteratively (e.g. update bounds as new data come in).

julia> x = 0:10
0:10

julia> y = -1:5
-1:5

julia> extrema(x)
(0, 10)

julia> extrema(y)
(-1, 5)

julia> extrema(y, init=extrema(x)) == extrema(x  y) == (-1, 10)
true

The docstring makes me wonder if this is a valid use of the function, because it says (i.e. which are greater/less than or equal to any other element). That is, can I be sure that this condition will be checked in the future and give an error if it's not satisfied?

Would you be open to a PR that adds my example to the docstring and the tests?

@N5N3
Copy link
Member Author

N5N3 commented Apr 1, 2022

I believe @tkf followed minimum/maximum's docstring, as for maximum

  The value returned for empty itr can be specified by init. It must be a neutral element for max (i.e. which is less than or equal to any other element) as it is       
  unspecified whether init is used for non-empty collections.

And we have the following in sum's docstring

  The value returned for empty itr can be specified by init. It must be the additive identity (i.e. zero) as it is unspecified whether init is used for non-empty        
  collections.

However, your use case is valid (and intuitive) under current implemenation.
If we want to clarify it, then we'd better also modify other reductions' docstring to keep consistancy.
Anyway, PR (with goodwill) is always welcome.

@greimel
Copy link

greimel commented Apr 1, 2022

Great! I've prepared #44819 with updates for minimum, maximum and extrema.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

extrema(x;dims::Int) behavior with NaN
5 participants