Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement indexing by value with atvalue #52

Merged
merged 10 commits into from
Jul 6, 2017

Conversation

ajkeller34
Copy link
Collaborator

Some comments:

  • I think the error message here is unreachable. You can see from the order of the if-elseif statements here that axisindexes with a second argument of type T <: Real can’t be called from within to_index, since Real <: Idx. I have removed the error message. For reals, I think it is more consistent to just let indexing fail than to try and guess if the user was attempting to index by value and throw an error accordingly.
  • I don’t understand the definition of Idx very well. Why allow Real but only AbstractArray{Int}? I’m guessing it’s just complicated to try and get AbstractArray{T<:Real}, or would it be unusual to index that way?
  • More speculatively, since we have a way to do atvalue, one could imagine functions like atvalueapprox, atvaluenearest, etc. I haven't felt a need for them, but it may be worth considering. I guess atvalue as I've defined it will fail if there are small differences in floating point values on the axis from what is requested.

@mbauman
Copy link
Member

mbauman commented Oct 28, 2016

This is great, thanks for diving in!

You're totally right that the error message there is currently unreachable. But I think I'd prefer to keep it, since:

I wouldn't read too much into the current definition of Idx. It dates to the time when those were the constraints on view indexing, but that's no longer the case. It could definitely be adjusted to match the currently supported index types in base. And now that non-integer indexing is an error in 0.5, I say we tighten it up. If we do that, we'll need to keep that method to prevent the other one from taking precedence. We now have an even better suggestion to give in the error with this PR!

About atvaluenearest, I agree. Let's not worry about this until someone finds they need it.

@ajkeller34
Copy link
Collaborator Author

I've tightened up the definition of Idx and added the error back, with a new suggestion. I tweaked the method signature slightly on the axisindexes method that throws the error: I don't think we should require ax and idx to have the same element type. You still want the error if you have an axis of Float32s and you index with a Float64 value, for example. I added a few tests too, and a note in the docs and readme about how to use this feature.

One thing to keep in mind is that atvalue could be finicky if you are using it with an axis made from a range type where floating-point numbers may be off by an ULP or two from what you request. I'm not sure if there is a good way around this, apart from something like atvalueapprox. You can break the example I added in the documentation pretty easily (try indexing with atvalue(50.0µs)). Thoughts?

Copy link
Member

@timholy timholy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, see minor comments.

I do agree that the question of ulp-level accuracy might be important. Perhaps we should consider a ValueTol type as well. We could use searchsorted to find the element; if the returned range is empty then we could check the previous and next one to see if it's within the specified tol.

People might specify the value at quite low precision, e.g., using the values displayed in the compact printing of an array.

src/indexing.jl Outdated
# Maybe extend error message to all <: Numbers if Base allows it?
axisindexes{T<:Real}(::Type{Dimensional}, ax::AbstractVector{T}, idx::T) = error("indexing by axis value is not supported for axes with $(eltype(ax)) elements; use an ClosedInterval instead")
axisindexes{T<:Real}(::Type{Dimensional}, ax::AbstractVector, idx::T) =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be just ids::Real, but not a big deal either way.

src/indexing.jl Outdated
function axisindexes(::Type{Dimensional}, ax::AbstractVector, idx)
idxs = searchsorted(ax, ClosedInterval(idx,idx))
length(idxs) > 1 && error("more than one datapoint lies on axis value $idx; use an interval to return all values")
idxs[1]
end

# Dimensional axes may always be indexed by value if in a Value type wrapper.
axisindexes(::Type{Dimensional}, ax::AbstractVector, idx::Value) =
findfirst(ax.==idx.val)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

findfirst(ax, idx.val) would be more efficient: ax.==idx.val creates a temporary vector the length of ax, whereas findfirst(ax, idx.val) does not allocate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation really could be the same as the method on line 136. I'm not sure why I spelled it that way, but I think it had something to do with some types not implementing the comparisons that I needed.

@mbauman
Copy link
Member

mbauman commented Nov 3, 2016

Perhaps we should consider a ValueTol type as well

Maybe we spell this as atvalue(x; tol=…), defaulting to tol=0?

@ajkeller34
Copy link
Collaborator Author

After a long delay, I have rebased and addressed the comments. I added both atol and rtol options to mirror the corresponding keyword arguments in isapprox. I think the default tolerances are pretty sensible for typical index-by-value applications where you don't want to be sensitive to tiny float differences, although perhaps my use of Base.rtoldefault is questionable. (I'd need to implement that method for Unitful types, for starters.) Thoughts / further comments?

src/indexing.jl Outdated
else # it's zero
last(idxs) > 0 && abs(ax[last(idxs)] - idx.val) < idx.tol && return last(idxs)
first(idxs) <= length(ax) && abs(ax[first(idxs)] - idx.val) < idx.tol && return first(idxs)
return 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can defeat this with the following:

julia> using AxisArrays, OffsetArrays

julia> A = AxisArray(OffsetArray([1 2; 3 4], 0:1, 1:2), Axis{:x}([1.0,4.0]), Axis{:y}([2.0,6.1]))
2-dimensional AxisArray{Int64,2,...} with axes:
    :x, [1.0, 4.0]
    :y, [2.0, 6.1]
And data, a OffsetArrays.OffsetArray{Int64,2,Array{Int64,2}} with indices 0:1×1:2:
 1  2
 3  4

julia> A[atvalue(5.0)]
1-dimensional AxisArray{Int64,1,...} with axes:
    :y, [2.0, 6.1]
And data, a OffsetArrays.OffsetArray{Int64,1,Array{Int64,1}} with indices 1:2:
 1
 2

Perhaps you can kill two birds with one stone (see your comment below about throwing a more informative BoundsError) and throw the error from here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed the error you identified and added the ability to throw a BoundsError with a value index. The issue isn't entirely resolved though, as indicated by a @test_broken in the tests.

I've found a few parts of AxisArrays that I haven't touched that don't work as expected when using OffsetArrays, so I'm not sure how much I should attempt to resolve in this PR. Consider the following:

julia> using AxisArrays,OffsetArrays

julia> B = AxisArray(OffsetArray([1,2,3], 2:4), Axis{:junk}([:a, :b, :c]))
1-dimensional AxisArray{Int64,1,...} with axes:
    :junk, Symbol[:a, :b, :c]
And data, a OffsetArrays.OffsetArray{Int64,1,Array{Int64,1}} with indices 2:4:
 1
 2
 3

julia> B[:a]
ERROR: BoundsError: attempt to access OffsetArrays.OffsetArray{Int64,1,Array{Int64,1}} with indices 2:4 at index [1]
Stacktrace:
 [1] throw_boundserror(::OffsetArrays.OffsetArray{Int64,1,Array{Int64,1}}, ::Tuple{Int64}) at ./abstractarray.jl:433
 [2] checkbounds at ./abstractarray.jl:362 [inlined]
 [3] getindex at /Users/ajkeller/.julia/v0.6/OffsetArrays/src/OffsetArrays.jl:94 [inlined]
 [4] getindex at /Users/ajkeller/.julia/v0.6/AxisArrays/src/indexing.jl:26 [inlined]
 [5] getindex(::AxisArrays.AxisArray{Int64,1,OffsetArrays.OffsetArray{Int64,1,Array{Int64,1}},Tuple{AxisArrays.Axis{:junk,Array{Symbol,1}}}}, ::Symbol) at /Users/ajkeller/.julia/v0.6/AxisArrays/src/indexing.jl:111

Perhaps there should be a separate PR that tries to fix these issues, perhaps via translating from standard indices to offset indices using Base.indices? I'm not terribly familiar with the non-standard indexing schemes in Julia so perhaps there's a better way to go.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, don't worry too much about that case yet, I suspect #90 is relevant.

@timholy
Copy link
Member

timholy commented Jul 4, 2017

LGTM, though is the Travis failure on 0.5 due to these changes or something different?

@ajkeller34
Copy link
Collaborator Author

I think the single test failure on Julia 0.5 is because of the following indexing regression in Base on 0.5.2:

Julia 0.5.2:

julia> A = [1 2; 3 4]
2×2 Array{Int64,2}:
A 1  2
 3  4

julia> A[:,1.0]
ERROR: ArgumentError: invalid index: 1.0
 in macro expansion at ./multidimensional.jl:296 [inlined]
 in _unsafe_getindex(::Base.LinearFast, ::Array{Int64,2}, ::Colon, ::Float64) at ./multidimensional.jl:293
 in getindex(::Array{Int64,2}, ::Colon, ::Float64) at ./abstractarray.jl:760

julia> A[:,2.0]
ERROR: ArgumentError: invalid index: 2.0
 in macro expansion at ./multidimensional.jl:296 [inlined]
 in _unsafe_getindex(::Base.LinearFast, ::Array{Int64,2}, ::Colon, ::Float64) at ./multidimensional.jl:293
 in getindex(::Array{Int64,2}, ::Colon, ::Float64) at ./abstractarray.jl:760

julia> A[:,3.0]
ERROR: BoundsError: attempt to access 2×2 Array{Int64,2} at index [Colon(),3.0]
 in throw_boundserror(::Array{Int64,2}, ::Tuple{Colon,Float64}) at ./abstractarray.jl:363
 in checkbounds at ./abstractarray.jl:292 [inlined]
 in _getindex at ./multidimensional.jl:272 [inlined]
 in getindex(::Array{Int64,2}, ::Colon, ::Float64) at ./abstractarray.jl:760

That last error should also be an ArgumentError, not a BoundsError. Note that this did not happen in Julia 0.5.1, and appears to be fixed in Julia 0.6.0, although I couldn't find anything on the issue tracker related to this. Let me know how you'd like me to proceed (mark as a broken test for VERSION == v"0.5.2"?)

@timholy
Copy link
Member

timholy commented Jul 4, 2017

Very interesting. Yes, not really your problem, but if you can add that broken mark then this seems good to go.

@ajkeller34
Copy link
Collaborator Author

ajkeller34 commented Jul 4, 2017

Please hold off on merging until I implement Base.rtoldefault in Unitful, otherwise the README will have invalid examples. I'll work on that later today.

(edit: specifically, atvalue will fail with a Unitful quantity value. I'll bump REQUIRE in this PR to the latest Unitful version after it is patched and released.)

@ajkeller34
Copy link
Collaborator Author

Okay, provided this passes CI, I feel this is good to go. I noticed that there are a few things which are out of date in the README when I was checking the atvalue example. I've taken the liberty of fixing some of the most minor points. For instance, this is a registered package now, so Pkg.add is preferred to Pkg.clone. Additionally, docs/src/index.md was out of sync with README, so now it is as up to date as the README is.

In a future PR I think it would be good to: 1) use a manually seeded random number generator so the examples are exactly reproducible (permitting doc tests in docs/src/index.md) and 2) remove the Clustering/Gadfly example until it works again.

@iamed2
Copy link
Collaborator

iamed2 commented Jul 6, 2017

Thanks extra for the doc updates!

@timholy timholy merged commit d5cdc14 into JuliaArrays:master Jul 6, 2017
@timholy
Copy link
Member

timholy commented Jul 6, 2017

Thanks!

@ajkeller34 ajkeller34 deleted the index_by_value branch July 6, 2017 18:48
test/indexing.jl Outdated
@test_throws ArgumentError A[:,6.1]
if VERSION == v"0.5.2"
# Cannot compose @test_broken with @test_throws (Julia #21098)
# A[:,6.1] incorrectly throws a BoundsError instead of an ArgumentError on Julia 0.5.2
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did anyone bisect this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't bisect it, but I'm quite certain that it was JuliaLang/julia#19730. We used to check bounds with < before converting indices. Now we convert indices first. This is expected.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh? how, this is a problem only on 0.5.2 apparently?

@mbauman
Copy link
Member

mbauman commented Jul 11, 2017

Thanks so much @ajkeller34 and @timholy! This is great. :)

@test_throws ArgumentError A[1.0f0]
if VERSION == v"0.5.2"
# Cannot compose @test_broken with @test_throws (Julia #21098)
# A[:,6.1] incorrectly throws a BoundsError instead of an ArgumentError on Julia 0.5.2
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would show up as a package regression if I make an 0.5.3, so I'd like to figure out what caused this and whether it was something that should not have been backported

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants