Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CheckIndexStyle trait for bounds-checking #42029

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

timholy
Copy link
Member

@timholy timholy commented Aug 27, 2021

The motivation here is to allow wrapper-types to exploit the same short-
circuits of bounds-checking used for AbstractRange.

This also fixes a previously-undiscovered bug in checkindex for
AbstractRange{Bool} indices. It was hidden by the fact that
checkbounds itself short-circuits before calling checkindex in
this case. But since checkindex is an exported function, it should
also give the correct result.

Example usage: Wrapper(1:5) is not necessarily an AbstractRange (Wrapper may also be able to handle other array types that aren't ranges), yet when used as an index vector it would be nice to be able to short-circuit the bounds checking to just the endpoints. You can specialize checkindex, but that can be specialized on axis types rather than index types, so the trait is the way to go (xref JuliaArrays/OffsetArrays.jl#255).

At the JuliaCon BoF on arrays there was discussion about adding more array traits. This might be viewed as a downpayment on that, but it will be interesting to see whether we think we get to a point of having too many. So imagine this is the first of several and see if you think it's worth it. An alternative would be to put this in ArrayInterface, but we'd have to duplicate a lot of stuff (there's already a fair amount of that in its "src/indexing.jl" file), and since checkindex is defined in Base it seems this is the right place.

CC/review request @mbauman, @jishnub, @oxinabox, @Tokazama, @ChrisRackauckas, @chriselrod.

The motivation here is to allow wrapper-types to exploit the same short-
circuits of bounds-checking used for `AbstractRange`.

This also fixes a previously-undiscovered bug in `checkindex` for
`AbstractRange{Bool}` indices. It was hidden by the fact that
`checkbounds` itself short-circuits before calling `checkindex` in
this case. But since `checkindex` is an exported function, it should
also give the correct result.
@Tokazama
Copy link
Contributor

This seems like a good approach, but I'll have to check the specifics of each trait later.

An alternative would be to put this in ArrayInterface, but we'd have to duplicate a lot of stuff (there's already a fair amount of that in its "src/indexing.jl" file), and since checkindex is defined in Base it seems this is the right place.

I'm hoping to do a clean up of that file soon, but I think if we can get a non-breaking change in Base now this is a better place.

it will be interesting to see whether we think we get to a point of having too many.

If we should instead be making these traits more generalizable than just to checkindex then I think this may be problematic, but bounds checking seems unique and important enough that it's worth it. Maybe put this in Experimental and see how people use it in the wild before committing fully to it? Just a thought.

NEWS.md Outdated Show resolved Hide resolved
@mbauman
Copy link
Member

mbauman commented Aug 27, 2021

Gosh, at first glance this feels quite complicated for not much gain, but I see the ambiguity pain point in JuliaArrays/OffsetArrays.jl#255. The way we've typically dealt with this is through a layered approach:

  • checkbounds is the place for the parent array type to specialize
  • checkindices is the place for the multi-dimensional indices to specialize
  • checkindex is currently the place where both parent axes and indices specialize

All the pain is coming from that joint specialization in that final bullet. This not only adds another function name (_checkindex), but it also adds a trait system. Can we not do this more sustainably through a slightly more generic implementation of checkindex?

For example, the crux of the checkindex(axis, index) operation is simply issubset(index, axis). What if that's where this specialization should go?

Co-authored-by: Lyndon White <[email protected]>
@Tokazama
Copy link
Contributor

Tokazama commented Aug 27, 2021

Perhaps I'm missing something, but I don't see how issubset does anything but pass the problem off to another method. If we are only concerned about preserving optimized checks that are masked by a simple wrapper then I'd do something like this:

function checkindex(::Type{Bool}, inds::AbstractUnitRange, i::I) where {I<:AbsractVector}
    if parent_type(I) <: I
        b = true
        for i in I
            b &= checkindex(Bool, inds, i)
        end
        return b
    else
        return checkindex(Bool, inds, parent(i))
    end
end

We still have the default slow method but we don't have require explicit definitions of checkindex for each new array type, unless it changes the parent array somehow.

EDIT: Just looked at the OffsetArrays issue...not a simple wrapper.

@johnnychen94
Copy link
Member

I'd also like to throw another issue here to consider: JuliaArrays/PaddedViews.jl#40 (comment)

For a lazy array whose axes don't equal to its parent, there comes to two senses of boundscheck, one for getindex and the other for setindex!. The current semantic of boundscheck doesn't support differentiating these two.

@timholy
Copy link
Member Author

timholy commented Aug 28, 2021

For example, the crux of the checkindex(axis, index) operation is simply issubset(index, axis). What if that's where this specialization should go?

The problem is this: if your dispatch rules work on both arguments, you have to solve ambiguities by remaining on the Pareto frontier of specificity. In contrast, with traits-based dispatch, you effectively work on one argument at a time, and hence have to only win the specificity battle in either a row or a column of the matrix defined by arg1_supported_types ⊗ arg2_supported_types2 while also providing a way to narrow and standardize the set of types that have to be in arg2_supported_types2.

If there's some natural way to create a dispatch chain on issubset(index, axis) that is based around issubset(index, something_else(axis)) then that would be effectively equivalent.

@Tokazama
Copy link
Contributor

Tokazama commented Aug 30, 2021

Perhaps what we really want are traits that describe the composition of the index instead of the bounds checking. In this case we can tell from the element type whether it is Bool or another integer. We just want to know if we know if it is sorted and in what direction.

@timholy
Copy link
Member Author

timholy commented Sep 5, 2021

For a lazy array whose axes don't equal to its parent, there comes to two senses of boundscheck, one for getindex and the other for setindex!. The current semantic of boundscheck doesn't support differentiating these two.

Now that I look at this, it occurs to me that here the limitation is with axes rather than checkindex; once you grab the right notion of axes for each operation, checkindex is just fine, right?

@timholy
Copy link
Member Author

timholy commented Sep 5, 2021

I'd like to move this forward. @mbauman, since you're the one who had reservations, can you offer a more concrete proposal? I think I've done the best I can to address your reservations in the abstract.

CheckIndexStyle(A::AbstractArray) = CheckIndexStyle(typeof(A))
CheckIndexStyle(::Type{Union{}}) = CheckIndexAll()
CheckIndexStyle(::Type{<:AbstractArray{T}}) where T = T === Bool ? CheckIndexAxes() : CheckIndexAll()
CheckIndexStyle(::Type{<:AbstractRange{T}}) where T = T === Bool ? CheckIndexAxes() : CheckIndexFirstLast()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we include things like FastContiguousSubArray whose parent types have CheckIndexFirstLast

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow. This checks the indexes, not the array you are indexing. So if we checked only the first and last elements of idx, this would be a problem:

julia> a = [1, 2, 37, 4]
4-element Vector{Int64}:
  1
  2
 37
  4

julia> b = 1:4
1:4

julia> sa = view(a, :)
4-element view(::Vector{Int64}, :) with eltype Int64:
  1
  2
 37
  4

julia> isa(sa, Base.FastContiguousSubArray)
true

julia> b[sa]
ERROR: BoundsError: attempt to access 4-element UnitRange{Int64} at index [[1, 2, 37, 4]]
Stacktrace:
 [1] throw_boundserror(A::UnitRange{Int64}, I::Tuple{SubArray{Int64, 1, Vector{Int64}, Tuple{Base.Slice{Base.OneTo{Int64}}}, true}})
   @ Base ./abstractarray.jl:651
 [2] checkbounds
   @ ./abstractarray.jl:616 [inlined]
 [3] _getindex
   @ ./multidimensional.jl:831 [inlined]
 [4] getindex(A::UnitRange{Int64}, I::SubArray{Int64, 1, Vector{Int64}, Tuple{Base.Slice{Base.OneTo{Int64}}}, true})
   @ Base ./abstractarray.jl:1170
 [5] top-level scope
   @ REPL[6]:1

That said, special rules might apply to AbstractSortedVector but I am not sure we have one of those anywhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parent type would have to return CheckIndexFirstLast, like this view(Wrapper(1:5), :).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I think I get it, you mean if the index is a FastContiguousSubArray. Sure, I'll add that.

@mbauman
Copy link
Member

mbauman commented Sep 8, 2021

So the crux of the issue is that both custom indices and custom axes want to implement checkindex(Bool, axis, index), yeah? I guess my question is what customizations do the axes need to provide here? Don't they just want to just treat their (potentially offset-ly indexed) values as a bag of values? Or is there something I'm missing?

My thought is to just have the following definitions:

# checkindex deals specifically with axis-specific mumbo jumbo
checkindex(::Type{Bool}, axis, index) = issubset(index, axis)
checkindex(::Type{Bool}, axis::OffsetRange, index) = issubset(index, axis.value)
# issubset is mathematical, just deals with things as iterable bags-of-values with a handful of *optimizations*
issubset(index::Integer, r::AbstractUnitRange) = index in r
issubset(index::AbstractVector, r::AbstractUnitRange) = all(i in r for i in index)
issubset(index::CustomType, r::AbstractUnitRange) = ...

The issubset method table is currently very sparse, and I don't think there'd be a need to have anything more specific than ::AbstractUnitRange on the right, except cases where there's also a concrete type on the left.

@timholy
Copy link
Member Author

timholy commented Sep 8, 2021

Thanks for filling in detail. Just a few responses:

custom indices and custom axes want to implement checkindex(Bool, axis, index)

That's the risk.

I guess my question is what customizations do the axes need to provide here?

Example: in FFTViews, the axes have periodic boundary conditions. So there is no such thing as a BoundsError, unless the array is empty. So it has several specializations, and you can see it had to go to some effort to solve ambiguity problems already. As soon as you throw packages into the mix...

My general view is that all the specialization that should be done on the indexes has already been done by Base; it's kind of like how we tell people to try to avoid specializing A[:, idx] for different idx, and just let Base machinery take care of everything other than all-Int indexing. But we allow/encourage array types to define themselves through custom axes, so anything package-specific should be specializing on the axes.

My thought is to just have the following definitions
...
checkindex(::Type{Bool}, axis::OffsetRange, index) = issubset(index, axis.value)

In JuliaArrays/OffsetArrays.jl#255 it's actually index that's being specialized on.

base/indices.jl Outdated Show resolved Hide resolved
@mbauman
Copy link
Member

mbauman commented Sep 8, 2021

One more shot: could we just go to a checkindex(::Type{Bool}, index, lower_bound::Integer, upper_bound::Integer) instead of a traitsystem? Could that satisfy the packages? I suppose even that would be on the fence for FFT as it'd require using sentinels like typemin(Int) and typemax(Int) for its bounds and would have to trust constant prop to remove comparisons against them.

@Tokazama
Copy link
Contributor

Tokazama commented Sep 8, 2021

I think there could be a benefit to a more general AccessStyle trait, as the other side of IndexStyle. checkindex is a very specific place where this is beneficial but I've run into several other places where characterizing the indexer would be useful.

Perhaps this would appease concerns that the current proposal is overly specific to checkindex

@timholy
Copy link
Member Author

timholy commented Sep 10, 2021

I'll take another stab at this, but it may be a bit.

@mbauman
Copy link
Member

mbauman commented Sep 10, 2021

I'm sorry I'm that fly-by stick-in-the-mud here. This does address a real problem, and indeed I've not come up with anything better. My reservations really do stem from the complexity size for what I see as a very narrow application, but that's all it is. Don't let me be the enemy of good here.

@timholy
Copy link
Member Author

timholy commented Sep 10, 2021

No worries. Whatever we add, we have to keep supporting. Even though we're making opposite points, the reality is that there is not much daylight between us; it's worth giving this a fresh think in light of the comments here, but I may need to prioritize some other stuff for a couple of weeks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants