-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow multiple pair arguments in replace on strings. #30457
Conversation
I can't figure out what's causing the whitespace issues. I tried a |
Identified by make check-whitespace
There was trailing whitespace on the line mentioned in the logs. I've added a commit that fixes it. |
See previous (and more ambitious) attempt at #25732. |
Co-Authored-By: MasonProtter <[email protected]>
Co-Authored-By: MasonProtter <[email protected]>
So the MacOS build "exceeded log length and was terminated" causing it to fail whereas the linux and freeBSD builds were fine. How do I fix the MacOS issue? |
CI failures indeed look unrelated. I'd like to raise a more fundamental API question before merging this. Reading the long discussions with @stevengj and @StefanKarpinski at #25396 again, @stevengj was opposed to adding this API because it cannot be made efficient when applying a replacement to multiple strings (e.g. in a loop): compiling a regex once is much faster. The only situation where multiple arguments to |
That's fair, but I think that if you look into code people write when trying to do multiple replacements, they end up getting forced to write things like replace(replace(replace(s, "foo"=>"bar"), "baz"=>"Baz"), "..."=> ".") and it just ends up making a giant code stink that really doesn't need to exist if I think this is a pattern many want to use regardless of performance so what if there was just a note in the docstring warning that this is not ideally performant? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perform a successive series of
replace
operations ons
, evaluating the replacements inreps...
sequentially from left to right.
This text could be misleading. Actually the replace operations are not applied to s, but to the intermediate result of applying the previous operations.
Proposal:
Apply first replacement in `reps` to s, then successively the other replacements to the previous intermediate results. If `reps` is empty, return s.
If that is the meaning you want, I don't see a lot of advantage over just telling people to use "Use this library function if you want to guarantee the slowest possible way to do multiple replacements" is not a compelling case for something to be in the standard library. |
It seems to me we should be able to implement the feature with a note in the docstring about potential performance problems, then chip away at improving it over time. It's not like it's flat out impossible to make this efficient; the Magic of Dispatch™ means we can at some point use a completely different implementation for this method if we want. |
It's extremely difficult to make it faster if you define the semantics as a |
I happen to agree that I'd note that without reading the documentation it wouldn't be clear to someone reading the code if these pairs are applied sequentially, or in some more efficient parallel form, or whatever. In particular, in the OP it is written:
That method isn't sequentially applied; it apparently has different semantics. Consider It seems to me that when/if we eventually have a more efficient, all-at-once algorithm, we would actually want it to use the method signature in this PR. I suppose we could narrow the signature to |
SituationThere are needs of
ProposalLet's forget the One could provide an inductive definition of @inline _replace(s::S, pat::Pair{S,S}) where {S<:AbstractString} = replace(s, pat)
@inline _replace(s::S, pat::Pair{S,S}, pats::(Pair{S,S})...) where {S<:AbstractString} = _replace(replace(s, pat), pats...)
const _REPLACE_STRING_PAIRS_MAX = 9
for k in 2:_REPLACE_STRING_PAIRS_MAX
pats = [Symbol("p",i) for i in 1:k]
typed_pats = [:($p::Pair{S,S} where {S<:AbstractString}) for p in pats]
@eval @inline Base.replace(s::AbstractString, $(typed_pats...)) = _replace(s, $(pats...))
end Lowered codenreplcalls(f) = begin
io = IOBuffer()
code_llvm(io, f, Tuple{String}; optimize=true, debuginfo=:none)
s = String(take!(io));
s |> # llvm code
Base.Fix2(split, "\n") |> # llvm code lines
Base.Fix1(filter, Base.Fix1(occursin,"replace")) |> # llvm code lines with replace
length # count them
end
@test 3 == nreplcalls(s->replace(s,"b"=>"c","a"=>"b","c"=>"a"))
@test 4 == nreplcalls(s->replace(s,"c"=>"d","b"=>"c","a"=>"b","d"=>"a"))
@test 5 == nreplcalls(s->replace(s,"d"=>"e","c"=>"d","b"=>"c","a"=>"b","e"=>"d"))
@test 6 == nreplcalls(s->replace(s,"e"=>"f","d"=>"e","c"=>"d","b"=>"c","a"=>"b","f"=>"a")) Code with n pairs has been converted to n calls to replace in llvm. |
What is the status here? This just came up in https://discourse.julialang.org/t/how-to-replace-the-charactors-in-the-string/30778/24. I was super suprised that we don't have a multi-pair replace for strings (char replacement in this case) while we do have the mentioned I fully agree with @andyferris on the semantics: |
I agree that the Another issue with this API in general is that any implementation for large numbers of replacements will probably involve constructing a |
I'm also now convinced that the Perhaps I should close this to make room for someone else to do a more serious PR? |
There are several points to deal with there :
My previous post addresses
The post i have proposed contains an integer threshold based on the number of allowed replacement pairs we can use as a switch to rely / relay to an upcoming and enhanced implementation of the (many) clever algo the Julia community will not miss the opportunity to propose later, i am sure (: |
This has been attempted before, sometimes fairly similar to this, but the attempts seemed to be either too simple or too complicated. This aims to be simple, and even beats one of the "handwritten" benchmark cases. Past issues (e.g. JuliaLang#25396) have proposed that using Regex may be faster, but in my tests, this handily bests even simplified regexes. There can be slow Regexes patterns that can cause this to exhibit O(n^2) behavior, but only if the one of the earlier patterns is a partial match for a later pattern Regex and that Regex always matches O(n) of the input stream. This is a case that is hopefully usually avoidable in practice. fixes JuliaLang#35327 fixes JuliaLang#39061 fixes JuliaLang#35414 fixes JuliaLang#29849 fixes JuliaLang#30457 fixes JuliaLang#25396
This has been attempted before, sometimes fairly similar to this, but the attempts seemed to be either too simple or too complicated. This aims to be simple, and even beats one of the "handwritten" benchmark cases. Past issues (e.g. #25396) have proposed that using Regex may be faster, but in my tests, this handily bests even simplified regexes. There can be slow Regexes patterns that can cause this to exhibit O(n^2) behavior, but only if the one of the earlier patterns is a partial match for a later pattern Regex and that Regex always matches O(n) of the input stream. This is a case that is hopefully usually avoidable in practice. fixes #35327 fixes #39061 fixes #35414 fixes #29849 fixes #30457 fixes #25396
This has been attempted before, sometimes fairly similar to this, but the attempts seemed to be either too simple or too complicated. This aims to be simple, and even beats one of the "handwritten" benchmark cases. Past issues (e.g. JuliaLang#25396) have proposed that using Regex may be faster, but in my tests, this handily bests even simplified regexes. There can be slow Regexes patterns that can cause this to exhibit O(n^2) behavior, but only if the one of the earlier patterns is a partial match for a later pattern Regex and that Regex always matches O(n) of the input stream. This is a case that is hopefully usually avoidable in practice. fixes JuliaLang#35327 fixes JuliaLang#39061 fixes JuliaLang#35414 fixes JuliaLang#29849 fixes JuliaLang#30457 fixes JuliaLang#25396
This has been attempted before, sometimes fairly similar to this, but the attempts seemed to be either too simple or too complicated. This aims to be simple, and even beats one of the "handwritten" benchmark cases. Past issues (e.g. JuliaLang#25396) have proposed that using Regex may be faster, but in my tests, this handily bests even simplified regexes. There can be slow Regexes patterns that can cause this to exhibit O(n^2) behavior, but only if the one of the earlier patterns is a partial match for a later pattern Regex and that Regex always matches O(n) of the input stream. This is a case that is hopefully usually avoidable in practice. fixes JuliaLang#35327 fixes JuliaLang#39061 fixes JuliaLang#35414 fixes JuliaLang#29849 fixes JuliaLang#30457 fixes JuliaLang#25396
This PR is intended to allow syntax like
in order to be more in line with other methods of
replace
like