Clean up search and find API #24673
7e109ce
to
e4cc7b6
Compare
base/deprecated.jl
Outdated
# FIXME: no replacement to search for a multibyte char in a ByteArray
@deprecate search(a::Union{String,ByteArray}, b::Union{Int8,UInt8}, i::Integer = 1) findnext(equalto(b), a, i)
@deprecate search(a::String, b::Union{Int8,UInt8}, i::Integer = 1) findnext(equalto(Char(b)), a, i)
@deprecate search(a::ByteArray, b::Char, i::Integer = 1) findnext(equalto(Unt8(b)), a, i)
Typo: `Unt8` should be `UInt8`.
base/deprecated.jl
Outdated
@deprecate search(s::String, c::Char) findfirst(equalto(c), s)
# FIXME: no replacement to search for a multibyte char in a ByteArray
@deprecate search(a::Union{String,ByteArray}, b::Union{Int8,UInt8}, i::Integer = 1) findnext(equalto(b), a, i)
@deprecate search(a::String, b::Union{Int8,UInt8}, i::Integer = 1) findnext(equalto(Char(b)), a, i)
I think this is breaking since the old behavior was to search for bytes. However we might want to deprecate this with no replacement since it doesn't really make sense to search for a UInt8 in a String.
Yes, these methods were really a mess. Some were probably supposed to be internal, but were de facto exported, and were too flexible regarding the accepted types. It's easy and efficient to do `Vector{UInt8}(s)` if you want to look for a byte.
Though, maybe better keep a semi-working deprecation than no deprecation at all? It's quite frequent to look for ASCII characters, for which the old method looking for bytes worked.
base/deprecated.jl
Outdated
@deprecate search(a::ByteArray, b::Char, i::Integer = 1) findnext(equalto(Unt8(b)), a, i)

@deprecate search(s::AbstractString, c::Union{Tuple{Vararg{Char}},AbstractVector{Char},Set{Char}}, i::Integer) findnext(x -> x in c, s, i)
@deprecate search(s::AbstractString, c::Union{Tuple{Vararg{Char}},AbstractVector{Char},Set{Char}}) findfirst(x -> x in c, s, i)
How about adding `occursin(c)` for this function?
👍 Good, I like where this is going! I believe |
Right. Though the exception can be justified, since string and regex matches can span over several entries. We could extend this idea to any sequence of elements, e.g. to look for a vector inside another vector. Do we really need |
Has using |
Yes, I think we need to discuss that, but that will have to be a separate PR since it doesn't affect only strings. EDIT: see JuliaLang/Juleps#47 |
I've added commits deprecating |
22d1f89
to
34751ed
Compare
Bump: should |
I haven't had time to follow this, but my 2c is that if using a range in strings opens up the possibility of using a range for |
@timholy That's actually a different issue. BTW, an argument for |
Very reasonable, esp. since union-splitting keeps the performance hit low. My point was simply that returning |
As a point of reference: a long time ago (probably in Julia 0.3), I did some ad hoc benchmarking, and there was some overhead in creating the range compared with returning a single integer. However, the compiler has changed dramatically since then. It’s also unclear how useful it is to always get the range of indices containing a particular value, vs., e.g., just getting the first or last insertion point. |
See #24883 about |
Yes, don't hold up on my accord. I'm not thrilled by I do wonder whether it would be good to have a general verb for pattern matching. |
e4a3a06
to
8610b63
Compare
I would rather merge this in its current form as a massive improvement in consistency and then debate some of the finer points after we've merged. We're not going to get this 100% right and that's ok, we'll learn some lessons during 1.x and do even better in 2.0. |
It would be good to have Compat support for this, e.g. to replace |
Yeah, that's quite a mess to support given all the changes. I've filed JuliaLang/Compat.jl#484 to implement most of the changes made here, but some from other PRs will have to be addressed separately. |
I'm late to the party, but I've been thinking about
PS I do love the direction this PR has taken us in. I'm thinking using and reading stuff like getting all the indices of the As it stands I don't really love |
Do you have a reference? We still use lowercase constructors systematically in I kind of agree the names aren't great, for example we use the third person for |
Rather than having special lowering for |
This is a first step towards deprecating `search` in favor of `findfirst` and `findnext` (#10593). The same changes need to be applied to `rsearch`, but I figured it would make sense to discuss the design first, since the issues are similar.

Remarks/points to discuss:
- Instead of adding `findnext(::Char, ::AbstractString, idx)` to replace the equivalent `search` method, the PR requires doing `findnext(equalto(c), haystack, idx)`. This can easily be changed later, but better be strict in 1.0. In the same vein, to find the first char from a set of possibilities, one needs to do `findnext(c in chars, haystack, idx)`; passing `chars` directly is deprecated.
- `findnext` and `findfirst` are therefore more restrictive than `split` and `replace`, which accept a single char or an array of chars, but I think this difference is justified because `find*` functions are much more general/complex.
- `findnext(::Union{Regex,AbstractString}, ::AbstractString, idx)` is supported even though it has no equivalent for arrays, because it's unambiguous. We could require something like `findnext(seq(needle), haystack)`, which would also work for arrays, but that sounds too strict to me. Note that this method is an exception anyway since it returns a range of indices rather than a single index.
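
For illustration, the call patterns described above might look roughly like the following. This is only a sketch, assuming a Julia version in which `findfirst` and `findnext` accept a predicate or a substring as the first argument when searching a string. The `equalto` helper is defined locally as a hypothetical stand-in for the function named in this PR, so the snippet does not depend on what Base exports. The `haystack` and `chars` values are made up for the example.

```julia
# Hypothetical stand-in for the `equalto` helper named in the PR, defined
# locally so the sketch does not rely on Base exporting that name.
equalto(x) = y -> isequal(y, x)

haystack = "hello world"

# Single character: pass a predicate rather than the bare Char.
first_o = findfirst(equalto('o'), haystack)

# Continue the search from the index following the first match.
next_o = findnext(equalto('o'), haystack, nextind(haystack, first_o))

# Any character from a set: again a predicate, not the collection itself.
chars = ['w', 'r']
first_wr = findfirst(c -> c in chars, haystack)

# Substring needle: the result is a range of indices, not a single index.
span = findfirst("wor", haystack)

println((first_o, next_o, first_wr, span))
```

The first three calls return a single index, while the substring search returns a range, which is the exception noted in the last remark.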