Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

replace: allow transform function as first argument? #24598

Open
StefanKarpinski opened this issue Nov 13, 2017 · 17 comments
Open

replace: allow transform function as first argument? #24598

StefanKarpinski opened this issue Nov 13, 2017 · 17 comments
Labels
good first issue Indicates a good issue for first-time contributors to Julia help wanted Indicates that a maintainer wants help on an issue or pull request strings "Strings!"

Comments

@StefanKarpinski
Copy link
Sponsor Member

The replace function allows a string, an s"..." replacement specifier or a function as its third argument. For the sake of do syntax, it might make sense to support the function as the first argument, like so:

julia> replace("Hello, world.", r"\w+") do word
           n = 0
           for c in word
               n += lowercase(c) in "aeiou"
           end
           return n
       end
"2, 1."

If someone decides to do this, while they're at it, an explanation of replacement patterns – i.e. s"..." strings and some more examples in the replace docs wouldn't hurt. E.g.:

julia> replace("Hello, world.", r"(\w+)(\W+)(\w+)", s"\3\2\1")
"world, Hello."
@StefanKarpinski StefanKarpinski added good first issue Indicates a good issue for first-time contributors to Julia help wanted Indicates that a maintainer wants help on an issue or pull request strings "Strings!" labels Nov 13, 2017
@affans
Copy link
Contributor

affans commented Nov 13, 2017

I would like to try to tackle this. But where would I find the source code for handling the do syntax? Doing a search for do isn't helpful.

@StefanKarpinski
Copy link
Sponsor Member Author

See this section: https://docs.julialang.org/en/latest/manual/functions/#Do-Block-Syntax-for-Function-Arguments-1. You don't need to do anything special for do syntax, just add a method to replace where the replacement function is the first argument.

@nalimilan
Copy link
Member

(For another, non-string extension of replace, see #22324.)

@StefanKarpinski
Copy link
Sponsor Member Author

@rfourquet, @nalimilan: does non-string extension collide with this change as far as you can tell?

@nalimilan
Copy link
Member

It wouldn't collide strictly speaking since this one would dispatch on AbstractString, but the semantics would be different, which could be confusing (or not). The implementation at #22324 takes a predicate function which returns true for entries which should be replaced with the element provided as an argument to replace. So the function is the predicate, and the replacement is fixed, which isn't the case here.

It could be made more general so that it can return either nothing, in which case nothing happens, or another value, in which case the entry is replaced with it. That general API would also work here. It would be a bit more complex, but it would reduce the number of different signatures needed.

@rfourquet
Copy link
Member

rfourquet commented Nov 14, 2017

It could be made more general so that it can return either nothing, in which case nothing happens, or another value, in which case the entry is replaced with it.

This is exactly what #22324 does, with less general cases being defined in terms of this one (cf. the diff, where it is documented in terms of Nullable while the actual implementation replaced it with Union{Some{T}, Void} to test performance differences).

@rfourquet
Copy link
Member

And if we go with this more general API, this should obviously wait till @nalimilan's work of replacing Nullable in base gets merged.

@nalimilan
Copy link
Member

Ah, right, the function I described is just a convenience helper based on the Nullable method.

@twistedcubic
Copy link
Contributor

From what I could find I don't think the branch replacing Nullable has been merged yet, but I started thinking about this, and I am going to implement a replace that takes a function as the first argument, much like what open does.

I don't understand the comment

It could be made more general so that it can return either nothing, in which case nothing happens, or another value, in which case the entry is replaced with it.

Shouldn't whether this new replace returns either nothing or another value be determined entirely by the user's function? I looked at the diff in #22324, but didn't find the equivalent there.

@JeffBezanson
Copy link
Sponsor Member

I don't think we need the nothing feature, since if you want no replacement you can return the argument unmodified.

@nalimilan
Copy link
Member

@twistedcubic Yes, it's been merged several months ago, see #23642. But we've decided against using nothing to mean "keep original value" in #25697, as it's not very useful as @JeffBezanson noted, and it's ambiguous when nothing is a possible value (unless you require wrapping all value in Some, which is cumbersome).

@twistedcubic
Copy link
Contributor

twistedcubic commented Feb 28, 2018

Thanks. I have been experimenting with variations of:

function replace(f::Function, s::AbstractString, pair::Pair; count::Integer=typemax(Int))
         res = replace(s, pair, count)
         return f(res)
 end

But have been running into issues with

replace("world.", "."=>"1") do word
       return "hey " * word
   end

such as

ERROR: MethodError: no method matching keys(::Pair{String,String})
Closest candidates are:
  keys(::Cmd) at process.jl:819
  keys(::Tuple) at tuple.jl:42
  keys(::Tuple, ::Tuple...) at tuple.jl:48
  ...
Stacktrace:
 [1] findnext(::Base.EqualTo{String}, ::Pair{String,String}, ::Int64) at ./array.jl:1635
 [2] findnext at ./deprecated.jl:730 [inlined]
 [3] findnext(::Pair{String,String}, ::String, ::Int64) at ./deprecated.jl:57
 [4] #replace#343 at ./strings/util.jl:386 [inlined]
 [5] replace at ./strings/util.jl:380 [inlined]
 [6] replace(::String, ::Pair{String,String}, ::Int64) at ./deprecated.jl:57
 [7] #replace#156 at ./REPL[250]:2 [inlined]
 [8] replace(::Function, ::String, ::Pair{String,String}) at ./REPL[250]:2
 [9] top-level scope

I'm going to chew on it more. In the meanwhile:

  • Wouldn't we want to add the same API to replace!?
  • Do we want to allow function-transform-as-first-arg for replace signatures other than String replacements? Although the problem with that is ambiguity with the replace that takes a predicate as the first arg, since such as function would no longer dispatch on AbstractString.

@nalimilan
Copy link
Member

Honestly I'm not so sure it's a good idea to add yet another method to the already crowded replace. I have a hard time grasping all the different variants that we support. Do we have a strong use case for this method? The example given that @StefanKarpinski above isn't very realistic. Also as @twistedcubic noted this method doesn't generalize to non-string collections due to ambiguities.

BTW, we could probably deprecate replace(pred::Function, A, new) in favor of replace(x -> pred(x) ? new : x), which would at least simplify a bit the situation.

@twistedcubic
Copy link
Contributor

@StefanKarpinski and @JeffBezanson thoughts on whether we should keep pursuing allowing a transform function as the first arg?

@epithet
Copy link
Contributor

epithet commented May 30, 2021

Maybe, this wouldn't add enough to warrant the added complexity.
Consider the minimal difference to what we can already do:

replace("The quick foxes run quickly.", r"quick|slow" => function(s)
    uppercase(s)
end)
replace("The quick foxes run quickly.", r"quick|slow") do s
    uppercase(s)
end

But I think this presents an opportunity to enable something more powerful, in a backwards-compatible way.
We currently have the choice between transforming the match with an arbitrary function, but without access to capture groups, and having access to capture groups, but without the ability to transform those further:

replace("The quick foxes run quickly.", r"quick|slow" => uppercase)
# The QUICK foxes run QUICKly.

replace("The quick foxes run quickly.", r"fox(es)?" => s"bus\1")
# The quick buses run quickly.

I propose to add a method suitable for do-block syntax, where the passed function has access to a RegexMatch object:

replace("The quick foxes run quickly.", r"fox(es)") do match
    "bus" * uppercase(match[1])
end
# The quick busES run quickly.

This would be a non-breaking change that adds more capabilities instead of just a new usage variant. Personally, I use the equivalent functionality in other languages from time to time. Of course, other standard libraries (cf. Python, JavaScript, Rust, .NET) don't bother to generalize their replace functions to other collections and I don't know whether string replacements are so much more useful to justify a special case. But I would say that do-blocks indicate that we want to do something a little more complicated than an one-liner (for which the existing method is still useful), so it would be somewhat fitting that we can do more complicated things.

Here's a slightly less trivial example of how this might be useful:

template = "the {animal} is {activity|upper}"
variables = Dict("animal" => "fox", "activity" => "running")
transformations = Dict("|upper" => uppercase)
replace(template, r"{(.*?)(\|.*?)?}") do match
    value = get(variables, match[1], "")
    t = get(transformations, match[2], x->x)
    t(value)
end
# the fox is RUNNING

I found an example in Franklin.jl that does basically something like that and could be cleaned up and made more performant with this method, i.e. one pass instead of compiling a new regex and executing replace for each variable.

@epithet
Copy link
Contributor

epithet commented May 31, 2021

Issue #36293 proposes an alternative solution.

@epithet
Copy link
Contributor

epithet commented Jun 4, 2021

Never mind, #40484 makes my proposal even more inconsistent and less useful. So it's at best a convenience add-on, but doesn't make sense as an alternative to #36293.

And if #36293 gets accepted, it would make do-block syntax for replace ambiguous.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Indicates a good issue for first-time contributors to Julia help wanted Indicates that a maintainer wants help on an issue or pull request strings "Strings!"
Projects
None yet
Development

No branches or pull requests

7 participants