Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

eagerly do boundscheck when indexing into CartesianIndices #42119

Merged
merged 7 commits into from
Sep 8, 2021

Conversation

johnnychen94
Copy link
Sponsor Member

@johnnychen94 johnnychen94 commented Sep 4, 2021

This PR eagerly does boundscheck when doing getindex(::CartesianIndices, ...), this generates simpler inlined codes and furthermore enables SIMD

julia> versioninfo()
Julia Version 1.8.0-DEV.473
Commit fce2daff16* (2021-09-04 09:37 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin20.6.0)
  CPU: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
function sumcart_oneto(X)
    out = zero(eltype(X))
    R = CartesianIndices(X) # OneTo
    @inbounds @simd for i in eachindex(X)
        out += X[R[i]]
    end
    return out
end

function sumcart_iur(X)
    out = zero(eltype(X))
    R = CartesianIndices(map(i->Base.IdentityUnitRange(1:i), size(X))) # IdentityUnitRange
    @inbounds @simd for i in eachindex(X)
        out += X[R[i]]
    end
    return out
end

function sumcart_unitrange(X)
    out = zero(eltype(X))
    R = CartesianIndices(map(i->1:i, size(X))) # UnitRange
    @inbounds @simd for i in eachindex(X)
        out += X[R[i]]
    end
    return out
end

X = rand(64, 64);

@btime sumcart_oneto($X);
# PR: 241.691 ns (0 allocations: 0 bytes)
# master: 10.526 μs (0 allocations: 0 bytes)
@btime sumcart_iur($X);
# PR: 236.788 ns (0 allocations: 0 bytes)
# master: 10.950 μs (0 allocations: 0 bytes)
@btime sumcart_unitrange($X);
# PR: 234.622 ns (0 allocations: 0 bytes)
# master: 10.529 μs (0 allocations: 0 bytes)

The earliest version that I obverse this performance regression is in Julia 1.4.0

A related benchmark:

function sumcart_linear(X)
    out = zero(eltype(X))
    @inbounds @simd for i in 1:length(X)
        out += X[i]
    end
    return out
end

@btime sumcart_linear($X);
# master: 219.778 ns (0 allocations: 0 bytes)

fixes #42115

@johnnychen94 johnnychen94 added performance Must go faster compiler:simd instruction-level vectorization labels Sep 4, 2021
Comment on lines 364 to 366
@inline _get_cartesianindex(::OneTo, i::Int) = i
@inline _get_cartesianindex(::Base.IdentityUnitRange, i::Int) = i
@inline _get_cartesianindex(r, i) = getindex(r, i)
Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A perhaps more appropriate place to make modifications is adding @propagate_inbounds in

julia/base/range.jl

Lines 879 to 914 in 876df79

function getindex(v::UnitRange{T}, i::Integer) where T
@inline
i isa Bool && throw(ArgumentError("invalid index: $i of type Bool"))
val = convert(T, v.start + (i - 1))
@boundscheck _in_unit_range(v, val, i) || throw_boundserror(v, i)
val
end
const OverflowSafe = Union{Bool,Int8,Int16,Int32,Int64,Int128,
UInt8,UInt16,UInt32,UInt64,UInt128}
function getindex(v::UnitRange{T}, i::Integer) where {T<:OverflowSafe}
@inline
i isa Bool && throw(ArgumentError("invalid index: $i of type Bool"))
val = v.start + (i - 1)
@boundscheck _in_unit_range(v, val, i) || throw_boundserror(v, i)
val % T
end
function getindex(v::OneTo{T}, i::Integer) where T
@inline
i isa Bool && throw(ArgumentError("invalid index: $i of type Bool"))
@boundscheck ((i > 0) & (i <= v.stop)) || throw_boundserror(v, i)
convert(T, i)
end
function getindex(v::AbstractRange{T}, i::Integer) where T
@inline
i isa Bool && throw(ArgumentError("invalid index: $i of type Bool"))
ret = convert(T, first(v) + (i - 1)*step_hp(v))
ok = ifelse(step(v) > zero(step(v)),
(ret <= last(v)) & (ret >= first(v)),
(ret <= first(v)) & (ret >= last(v)))
@boundscheck ((i > 0) & ok) || throw_boundserror(v, i)
ret
end

Copy link
Sponsor Member Author

@johnnychen94 johnnychen94 Sep 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The root issue here is that throw_boundserror(v, i) in getindex(v::OneTo{T}, i::Integer) is marked as @noinline and thus disables the propagation of @inbounds

throw_boundserror(A, I) = (@noinline; throw(BoundsError(A, I)))

I now think the current version of this PR is a wrong fix; we can't unconditionally skip the boundscheck here.

@chriselrod Do you have ideas on how to bypass this boundscheck?

Copy link
Sponsor Member

@timholy timholy Sep 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why an error path would do this. But shouldn't the whole expression be in the @boundscheck? That is, instead of

@boundscheck ((i > 0) & ok) || throw_boundserror(v, i) 

you just need an extra set of parentheses:

@boundscheck(((i > 0) & ok) || throw_boundserror(v, i))

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, this @boundscheck does get eliminated if we check individual getindex(::OneTo, i), but it doesn't get eliminated if we check getindex(::CartesianIndices, I...)

getindex_inbounds(r, i...) = @inbounds getindex(r, i...)
R = CartesianIndices((2, 3))
julia> @code_llvm getindex_inbounds(R.indices[1], 1)
;  @ REPL[18]:1 within `getindex_inbounds`
define i64 @julia_getindex_inbounds_2636([1 x i64]* nocapture nonnull readonly align 8 dereferenceable(8) %0, i64 signext %1) #0 {
top:
  ret i64 %1
}

julia> @code_llvm getindex_inbounds(R, 1, 1)
...

L12:                                              ; preds = %top
; │││││└└└
; │││││┌ @ broadcast.jl:642 within `_broadcast_getindex`
; ││││││┌ @ broadcast.jl:666 within `_getindex`
; │││││││┌ @ broadcast.jl:621 within `_broadcast_getindex`
; ││││││││┌ @ tuple.jl:29 within `getindex`
           %9 = getelementptr inbounds [1 x [2 x [1 x i64]]], [1 x [2 x [1 x i64]]]* %1, i64 0, i64 0, i64 0
; │││││└└└└
; │││││┌ @ broadcast.jl:643 within `_broadcast_getindex`
; ││││││┌ @ broadcast.jl:670 within `_broadcast_getindex_evalf`
; │││││││┌ @ range.jl:901 within `getindex`
          %10 = call nonnull {}* @j_throw_boundserror_2682([1 x i64]* nocapture nonnull readonly %9, i64 signext %2) #0
          call void @llvm.trap()
          unreachable
...

L26:                                              ; preds = %L18
          %17 = call nonnull {}* @j_throw_boundserror_2681([1 x i64]* nocapture readonly %11, i64 signext %3) #0
          call void @llvm.trap()
          unreachable
...

Notice the two throw_boundserror call. Thus I did an eager check so that we can pass @inbounds hint through the broadcasting/map. After 20c4be5:

julia> @code_llvm getindex_inbounds(R, 1, 1)
;  @ REPL[18]:1 within `getindex_inbounds`
define void @julia_getindex_inbounds_2685([1 x [2 x i64]]* noalias nocapture sret([1 x [2 x i64]]) %0, [1 x [2 x [1 x i64]]]* nocapture nonnull readonly align 8 dereferenceable(16) %1, i64 signext %2, i64 signext %3) #0 {
top:
  %.sroa.0.sroa.0.0..sroa.0.0..sroa_cast4.sroa_idx = getelementptr inbounds [1 x [2 x i64]], [1 x [2 x i64]]* %0, i64 0, i64 0, i64 0
  store i64 %2, i64* %.sroa.0.sroa.0.0..sroa.0.0..sroa_cast4.sroa_idx, align 8
  %.sroa.0.sroa.2.0..sroa.0.0..sroa_cast4.sroa_idx8 = getelementptr inbounds [1 x [2 x i64]], [1 x [2 x i64]]* %0, i64 0, i64 0, i64 1
  store i64 %3, i64* %.sroa.0.sroa.2.0..sroa.0.0..sroa_cast4.sroa_idx8, align 8
  ret void
}

Copy link
Contributor

@Tokazama Tokazama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this!

base/multidimensional.jl Outdated Show resolved Hide resolved
@johnnychen94 johnnychen94 changed the title specialize getindex for CartesianIndices of type OneTo and IdentityUnitRange eagerly do boundscheck when indexing into CartesianIndices Sep 5, 2021
Copy link
Sponsor Member

@timholy timholy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Can you add a test in test/boundscheck_exec.jl?

@codecov
Copy link

codecov bot commented Sep 5, 2021

Codecov Report

Merging #42119 (8812c5c) into master (876df79) will increase coverage by 0.03%.
The diff coverage is 100.00%.

❗ Current head 8812c5c differs from pull request most recent head d58ce6a. Consider uploading reports for the commit d58ce6a to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master   #42119      +/-   ##
==========================================
+ Coverage   79.62%   79.66%   +0.03%     
==========================================
  Files         351      351              
  Lines       77565    77567       +2     
==========================================
+ Hits        61759    61790      +31     
+ Misses      15806    15777      -29     
Impacted Files Coverage Δ
stdlib/Profile/src/Profile.jl 89.15% <100.00%> (-0.96%) ⬇️
base/errorshow.jl 87.38% <0.00%> (-1.67%) ⬇️
base/show.jl 66.31% <0.00%> (-0.06%) ⬇️
stdlib/LinearAlgebra/src/triangular.jl 97.30% <0.00%> (-0.06%) ⬇️
stdlib/LinearAlgebra/src/bidiag.jl 96.96% <0.00%> (ø)
stdlib/Test/src/Test.jl 64.37% <0.00%> (+0.04%) ⬆️
stdlib/SparseArrays/src/sparsevector.jl 95.19% <0.00%> (+0.08%) ⬆️
stdlib/SparseArrays/src/sparsematrix.jl 95.53% <0.00%> (+0.09%) ⬆️
base/compiler/typeinfer.jl 72.29% <0.00%> (+0.18%) ⬆️
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 876df79...d58ce6a. Read the comment docs.

Comment on lines 373 to 385
@inline function Base.getindex(iter::CartesianIndices{N,R},
I::Vararg{Union{OrdinalRange{<:Integer, <:Integer}, Colon}, N}) where {N,R}
CartesianIndices(getindex.(iter.indices, I))
@boundscheck checkbounds(iter, I...)
indices = map(iter.indices, I) do r, i
@inbounds getindex(r, i)
end
CartesianIndices(indices)
end
@propagate_inbounds function Base.getindex(iter::CartesianIndices{N},
C::CartesianIndices{N}) where {N}
CartesianIndices(getindex.(iter.indices, C.indices))
getindex(iter, C.indices...)
end
@inline Base.getindex(iter::CartesianIndices{0}, ::CartesianIndices{0}) = iter
Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to backport to 1.6 or 1.7, then these changes should be separated to another PR.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the justification here?

julia> ()[CartesianIndex()]
ERROR: MethodError: no method matching getindex(::Tuple{}, ::CartesianIndex{0})
Closest candidates are:
  getindex(::Tuple, ::Int64) at tuple.jl:29
  getindex(::Tuple, ::Real) at tuple.jl:30
  getindex(::Tuple, ::Colon) at tuple.jl:33
  ...
Stacktrace:
 [1] top-level scope
   @ REPL[12]:1

Copy link
Sponsor Member Author

@johnnychen94 johnnychen94 Sep 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This extra line is for the @test c[c] === c test case. Without this line getindex(iter, C.indices...) will call the getindex(::CartesianIndices{0}) method, which returns CartesianIndex() instead of [CartesianIndex()]

julia/test/cartesian.jl

Lines 150 to 156 in 8812c5c

@testset "0D array" begin
a = zeros()
c = CartesianIndices(a)
@test a[c] == a
@test c[c] === c
@test c[] == CartesianIndex()
end

Edit: I've removed these methods in this PR because I want to backport this PR to 1.6 release. I'll submit a new PR for these after this is merged.

This enables us to backport to 1.6 and 1.7 branch
@johnnychen94
Copy link
Sponsor Member Author

johnnychen94 commented Sep 6, 2021

The getindex(::CartesianIndices, ::RangeTypes) methods are newly introduced in #40344 so I've removed them in d58ce6a. Doing this we can backport this PR to 1.6 and 1.7 branches. I'll submit a separate PR for the removed changes to getindex(::CartesianIndices, ::RangeType) after this is merged.

Edit: I added the backport labels because I think this worths living in our LTS release. If there are concerns please feel free to remove these labels.

@johnnychen94 johnnychen94 added backport 1.6 Change should be backported to release-1.6 backport 1.7 labels Sep 6, 2021
@johnnychen94
Copy link
Sponsor Member Author

I think this one is ready.

cc: @timholy @mbauman

@timholy timholy merged commit 15b9851 into JuliaLang:master Sep 8, 2021
@timholy
Copy link
Sponsor Member

timholy commented Sep 8, 2021

Thanks!

@mbauman
Copy link
Sponsor Member

mbauman commented Sep 8, 2021

Aha, this makes sense! Thank you!

@johnnychen94 johnnychen94 deleted the jc/cartesian_simd branch September 8, 2021 13:32
KristofferC pushed a commit that referenced this pull request Sep 11, 2021
@maleadt maleadt mentioned this pull request Sep 16, 2021
KristofferC pushed a commit that referenced this pull request Oct 29, 2021
KristofferC pushed a commit that referenced this pull request Nov 11, 2021
@KristofferC KristofferC removed the backport 1.6 Change should be backported to release-1.6 label Nov 13, 2021
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Feb 22, 2022
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Mar 8, 2022
staticfloat pushed a commit that referenced this pull request Dec 23, 2022
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:simd instruction-level vectorization performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

@inbounds isn't propagated for CartesianIndices
5 participants