Performance regression from `findall` depwarn suggestion #26183

sbromberger · 2018-02-23T21:33:41Z

Prompted by this deprecation warning:

 Warning: In the future `findall(A)` will only work on boolean collections. Use `findall(x->x!=0, A)` instead.

I decided to do a (n admittedly worst-case) test: A = rand(10_000, 10_000);

julia> @time u = findall(x->x != 0, A);            # this assumes 0 = zero(eltype(A))
  8.606016 seconds (65.73 k allocations: 260.716 MiB, 2.96% gc time)

julia> @time v = findall(!iszero, A);
 12.464981 seconds (40 allocations: 1.531 GiB, 4.27% gc time)

julia> @time w = findall((!iszero).(A));
  1.531174 seconds (23 allocations: 1.502 GiB, 12.50% gc time)

julia> u == v == w
true

Had I taken the recommended approach, I would've run into a pretty bad performance regression. As it stands, the best (time) performing approach is among the worst in memory usage; the recommended approach with the anonymous function is bad in time but best in memory, and the negative function is worst in both.

The text was updated successfully, but these errors were encountered:

mbauman · 2018-02-23T21:42:48Z

The difference here is that findall(::BitArray) counts the true values before constructing the output. findall(f, A) is extremely simple and naive — it just grows on demand and reallocates as necessary:

findall(testf::Function, A) = collect(first(p) for p in pairs(A) if testf(last(p)))

We may want to add some heuristics here — perhaps a sizehint! based upon the density of the beginning of the array or some such. Or maybe add a counting pass like the old find did.

nalimilan · 2018-02-23T22:22:02Z

It's very hard to decide in full generality what the density of true values is going to be. It could make sense to add a sizehint argument, but trying to guess sounds doomed to fail: true and false values may well be grouped, in which case guessing based on the first results would be completely off.

Adding a counting pass could be efficient for functions known to be cheap, like equalto, but doing that in general could be counter-productive. Maybe we could change the deprecation to recommend equalto, and add a specialized method for ::EqualTo, or maybe just EqualTo{T} for some bits types T (since even a comparison could be expensive for some types, e.g. collections). But strictly speaking the deprecation should use == since -0.0 was considered as true (which is probably why #23812 used == in the first place).

There are also some chances that the generic implementation could be made faster by improving push! (#24909).

I'm surprised by the timings you posted. On 0.7 the two first approaches take about the same time (though the number of allocations is lower with iszero for a weird reason and the total allocated size is similar), while using broadcasting first is about twice faster. EDIT: on 0.6 the two first approaches are also equally fast, and the broadcast one is 50% slower.

StefanKarpinski · 2018-02-23T22:38:29Z

If you sample a random set of points in the matrix then you do actually get an accurate estimate regardless of matrix structure. Doing that would require evaluating points out of order, however.

laborg · 2022-02-13T13:34:29Z

I don't see the reported performance problem anymore, but there seems to be a regression from 1.7-> master:

# on Julia 1.7.1 
julia> A = rand(10_000, 10_000);

julia> @btime findall(x->x != 0, A);
  476.016 ms (8 allocations: 1.50 GiB)

julia> @btime findall(!iszero, A);
  470.884 ms (8 allocations: 1.50 GiB)

julia> @btime findall((!iszero).(A));
  469.952 ms (9 allocations: 1.50 GiB)

and

#master Version 1.8.0-DEV.1526 (2022-02-13)
julia> A = rand(10_000, 10_000);

julia> @btime findall(x->x != 0, A);
  505.472 ms (8 allocations: 1.50 GiB)

julia> @btime findall(!iszero, A);
  499.844 ms (8 allocations: 1.50 GiB)

julia> @btime findall((!iszero).(A));
  497.787 ms (9 allocations: 1.50 GiB)

vtjnash · 2022-02-13T20:58:16Z

Duplicate of #42187

vtjnash · 2022-02-13T20:59:14Z

I realize this is the older issue, but this function has changed several time since, so I think that is the more accurate one

mbauman added the performance Must go faster label Feb 23, 2018

nalimilan added the search & find The find* family of functions label Feb 23, 2018

JeffBezanson changed the title ~~Performance regression based on depwarn suggestion~~ Performance regression from findall depwarn suggestion Mar 13, 2018

vtjnash marked this as a duplicate of #42187 Feb 13, 2022

vtjnash closed this as completed Feb 13, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance regression from `findall` depwarn suggestion #26183

Performance regression from `findall` depwarn suggestion #26183

sbromberger commented Feb 23, 2018 •

edited by yuyichao

Loading

mbauman commented Feb 23, 2018 •

edited

Loading

nalimilan commented Feb 23, 2018 •

edited

Loading

StefanKarpinski commented Feb 23, 2018

laborg commented Feb 13, 2022

vtjnash commented Feb 13, 2022

vtjnash commented Feb 13, 2022

Performance regression from findall depwarn suggestion #26183

Performance regression from findall depwarn suggestion #26183

Comments

sbromberger commented Feb 23, 2018 • edited by yuyichao Loading

mbauman commented Feb 23, 2018 • edited Loading

nalimilan commented Feb 23, 2018 • edited Loading

StefanKarpinski commented Feb 23, 2018

laborg commented Feb 13, 2022

vtjnash commented Feb 13, 2022

vtjnash commented Feb 13, 2022

Performance regression from `findall` depwarn suggestion #26183

Performance regression from `findall` depwarn suggestion #26183

sbromberger commented Feb 23, 2018 •

edited by yuyichao

Loading

mbauman commented Feb 23, 2018 •

edited

Loading

nalimilan commented Feb 23, 2018 •

edited

Loading