Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

findfirst and findnext for general iterables #15755

Closed
wants to merge 2 commits into from
Closed

Conversation

timholy
Copy link
Member

@timholy timholy commented Apr 3, 2016

This fixes #15723. While on balance I think it's an improvement, there are a few negatives:

  • The simplest call to findnext returns an Int for AbstractArrays, and a Nullable{T} for any other iterable. This is a consequence of our choice to use 0 to indicate failure-to-find for arrays; for a general iterable, there is no type-stable construct that can safely return an Int. It's regrettable whenever the general iterable API disagrees with the one for arrays.
  • In light of recent discussions that would allow us to support arrays that index starting at something besides 1, the use of 0 to indicate failure is unfortunate. A safer choice would be to return start - 1 and then test whether findnext(A, start) < start. But presumably everyone who uses findnext tests whether findnext(A, start) == 0, so changing the sentinel value would break a lot of code. Of course, if we change to returning a Nullable even for arrays, this issue goes away, too.

The only solution I can think of for these problems is to introduce findnext(FromFuture, A, start), deprecate findnext(A, start), and make a julia release. Then deprecate findnext(FromFuture, A, start) in favor of the new version of findnext(A, start) in the following cycle.

@timholy
Copy link
Member Author

timholy commented Apr 4, 2016

Looks like this passes all tests. Since this fixes a bug, any concerns about merging? Despite the bigger-picture concerns raised above (which could be addressed separately), we don't do anything very useful here now.

Master:

julia> p = Base.product(1:3,1:2)
Base.Prod2{UnitRange{Int64},UnitRange{Int64}}(1:3,1:2)

julia> findfirst(p, (2,2))
ERROR: MethodError: no method matching getindex(::Base.Prod2{UnitRange{Int64},UnitRange{Int64}}, ::Int64)
 in findnext(::Base.Prod2{UnitRange{Int64},UnitRange{Int64}}, ::Tuple{Int64,Int64}, ::Int64) at ./array.jl:732
 in findfirst(::Base.Prod2{UnitRange{Int64},UnitRange{Int64}}, ::Tuple{Int64,Int64}) at ./array.jl:738
 in eval(::Module, ::Any) at ./boot.jl:237

julia> using Iterators

julia> findfirst(chain(1:3, ['a', 'b', 'c']), 'b')
ERROR: MethodError: no method matching length(::Iterators.Chain)
 in findnext(::Iterators.Chain, ::Char, ::Int64) at ./array.jl:731
 in findfirst(::Iterators.Chain, ::Char) at ./array.jl:738
 in eval(::Module, ::Any) at ./boot.jl:237

julia> findfirst(Iterators.product(1:3,1:2), (2,2))
ERROR: MethodError: no method matching getindex(::Iterators.Product, ::Int64)
 in findnext(::Iterators.Product, ::Tuple{Int64,Int64}, ::Int64) at ./array.jl:732
 in findfirst(::Iterators.Product, ::Tuple{Int64,Int64}) at ./array.jl:738
 in eval(::Module, ::Any) at ./boot.jl:237

This PR:

julia> p = Base.product(1:3,1:2)
Base.Prod2{UnitRange{Int64},UnitRange{Int64}}(1:3,1:2)

julia> state = findfirst(p, (2,2))
Nullable{Tuple{Int64,Int64,Nullable{Int64},Bool}}((2,3,Nullable{Int64}(2),false))

julia> item, newstate = next(p, get(state)); item
(2,2)

julia> using Iterators

julia> findfirst(chain(1:3, ['a', 'b', 'c']), 'b')
Nullable{Tuple{Int64,Int64}}((2,2))

julia> findfirst(Iterators.product(1:3,1:2), (2,2))
Nullable{Tuple{Array{Any,1},Array{Any,1}}}((Any[3,3],Any[2,2]))

@kmsquire
Copy link
Member

kmsquire commented Apr 4, 2016

My only concern here would be consistency.

How many other functions in Base return a Nullable right now? Are there other functions which use 0 to indicate failure? (On my phone right now, so difficult to check...)

If this is the route we want to go, I'd rather see it done on a larger scale, if useful.

At the least, I'd prefer the different versions of findfirst and findnext to be consistent.

(I'm ok if the decision goes otherwise, of course.)

@Jeffrey-Sarnoff
Copy link

-2 on using zero to indicate failure .. for the same reasons you each mention (at least)

          0 ~~> failure  >==<   false <~ bool(0)
this makes me queasy, because false may be success
        "The data shows that this medicine is dangerous." 

@timholy
Copy link
Member Author

timholy commented Apr 7, 2016

Yeah. Now that we have Nullable (which we certainly didn't when this infrastructure was first developed), it seems like the right way to return values from "find"-type operations that might not find anything.

Conceptually, the same problem is encountered with match, which currently returns nothing if no match is found (and is therefore not type-stable):

julia> match(r"b", "abcd")
RegexMatch("b")                                                                                                                                                   

julia> match(r"e", "abcd")

julia> @code_warntype match(r"e", "abcd")
Variables:
  #self#::Base.#match
  r::Regex
  s::ASCIIString

Body:
  begin  # regex.jl, line 180:
      return (Base.match)(_2::Regex,$(Expr(:new, :((top(getfield))(Core,:UTF8String)::Type{UTF8String}), :((top(convert))(Array{UInt8,1},(top(getfield))(_3::ASCIIString,:data)::Array{UInt8,1})::Array{UInt8,1}))),1,(Base.box)(UInt32,(Base.checked_trunc_uint)(UInt32,0)))::Union{RegexMatch,Void}
  end::Union{RegexMatch,Void}

But with both of these, changing the type of the return value in a non-compatible way seems to require a deprecation strategy like the (complicated) one I outlined above. Unless someone else has a bright idea?

@nalimilan
Copy link
Member

Returning a Nullable is an interesting idea, as it would allow making the API more consistent (while returning 0 is indeed a bit ad-hoc and doesn't work everywhere). It would also force us to make it easier to work with Nullables (see e.g. #15174). Cf. #12033 about changing match to return a Nullable.

But the inconsistency depending on the array type is annoying. I'd be in favor of keeping the API consistent, whatever the solution we choose. An alternative deprecation path would be to add a findfirst(x, Order.Forward, start) method which would return a Nullable, as I suggested at #10593 (comment). Then, after a deprecation of the current short findfirst(x, start) syntax in 0.5, we would change its meaning to be equivalent to findfirst(x, Order.Forward, start) in 0.6. That way at least, both syntaxes would remain valid in the long term (contrary to the FromFuture hack).

Cc: @johnmyleswhite

@johnmyleswhite
Copy link
Member

I would love to see us use Nullables more often, but I think this is a place where API design is being held back by performance: until all Nullables are stack allocated, I doubt we can move over to using them consistently (even though they should be stack allocated in this use case).

Maybe open an issue for standardizing on them for 0.6?

@nalimilan
Copy link
Member

Given that the deprecation path is complex, we could start providing a syntax for it ASAP, so that we're ready when performance allows flipping the switch.

@nalimilan nalimilan added this to the 0.5.0 milestone Apr 29, 2016
@nalimilan
Copy link
Member

Marking it as 0.5 since we need to decide now whether to add a deprecation or not, to allow for a change in 0.6.

@StefanKarpinski StefanKarpinski added speculative Whether the change will be implemented is speculative design Design of APIs or of the language itself labels May 5, 2016
@timholy timholy added the needs decision A decision on this change is needed label May 24, 2016
@timholy
Copy link
Member Author

timholy commented May 24, 2016

Adding a "decision" label here. The coding issues are trivial, the key issue is whether we do it and, if so, agreement on a deprecation strategy.

I propose doing it and introducing the FromFuture option. If there's interest, we should probably also consider how this will work with Compat before deciding to pull the trigger.

@StefanKarpinski
Copy link
Member

I think this needs to be worked out with the rest of the collections API, which also needs work.

@StefanKarpinski
Copy link
Member

Which is to say, not yet, unless there's a real deal breaker in here. I don't think we can fix the collections API in bits and pieces – it needs to be done holistically – and we don't really have the API design bandwidth to figure this out in 0.5.

@StefanKarpinski StefanKarpinski modified the milestones: 0.6.0, 0.5.0 May 26, 2016
@timholy
Copy link
Member Author

timholy commented May 26, 2016

Very reasonable; thanks for making a decision!

@nalimilan
Copy link
Member

nalimilan commented Jun 13, 2016

As a data point, Scala's find (equivalent to Julia's findfirst always returns anOption: http://www.scala-lang.org/api/current/index.html#scala.Option@find%28p:A=%3EBoolean%29:Option[A]

@nalimilan nalimilan added the missing data Base.missing and related functionality label Sep 6, 2016
@StefanKarpinski StefanKarpinski removed the needs decision A decision on this change is needed label Sep 14, 2016
@JeffBezanson
Copy link
Member

I think functions like this that operate on indices should require some kind of indexed collection --- it's not really useful to get the index of something in a general iterable. This question also comes up in #22907.

@StefanKarpinski
Copy link
Member

If we only consider collections where indices are integers, then nothing could be a safe sentinel value. If we consider collections where keys can be any value, then nothing is not a safe sentinel. We could use done(itr, ind) to check if a returned opaque (state) value is the sentinel.

@StefanKarpinski
Copy link
Member

Another thought: returning an index-value pair and use nothing (which is not a pair) as a sentinel.

@nalimilan
Copy link
Member

Another thought: returning an index-value pair and use nothing (which is not a pair) as a sentinel.

Rather than introducing an ad-hoc approach like this, I'd rather use the Union{Void, Value}/ Union{Void, Some} type we've discussed to replace Nullable, since the problem we are trying to solve is exactly the same as when doing a dict lookup: being able to tell the difference between a null key/index and no result. This would be similar to returning Option as Scala does.

@JeffBezanson
Copy link
Member

I think instead we can add searching iterators, so this can be handled by done. (#10593)

@DilumAluthge DilumAluthge deleted the teh/findfirst_next branch March 25, 2021 22:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design Design of APIs or of the language itself missing data Base.missing and related functionality speculative Whether the change will be implemented is speculative
Projects
None yet
Development

Successfully merging this pull request may close these issues.

invalid character index in findfirst
8 participants