Add replace & replace! for collections #22324

rfourquet · 2017-06-10T14:39:09Z

This adds methods to replace for AbstractArray, AbstractSet and Associative and a corresponding inplace replace! function.
Did I miss already existing similar functionality in Base?

The order in which to put the 2 function arguments pred and f can be discussed.

Also, replace!(f::Function, A, old) is not included as it would be ambiguous with replace!(pred, A, new) (and a new function name would be necessary, e.g. replaceif!); But it is assumed that in most of the cases, the replacement value f(x) can be computed by f(old) and then use replace!(A, old, f(old)). Otherwise, replace!(pred, f, A) can be used.

tkelman · 2017-06-10T15:27:26Z

test/algorithms.jl

@@ -0,0 +1,28 @@
+
+@testset "replace! & replace" begin


new test files would need to be added to test/choosetests.jl to actually run, but aren't there existing files with related tests this would fit with?

tkelman · 2017-06-10T15:28:11Z

base/algorithms.jl

@@ -0,0 +1,119 @@
+module Algorithms


while reorganizing some of base into modules is a good idea, I don't see why these two functions make sense as the only thing in an Algorithms module

Yes, I forgot to mention that in the OP. It's just that I didn't know to put those functions, but I open to suggestions.

Seems they'd simply belong in the files corresponding to the types they're implemented for? That is, base/abstractarray.jl, base/set.jl, and base/associative.jl. I agree it doesn't make sense to separate these into their own module.

But where to put the general methods which work on Union{AbstractArray,Associative,AbstractSet} ?

find* methods are defined in array.jl even when they apply to any iterator, which isn't ideal but I guess OK.

I just saw that the unique methods are defined in set.jl and multidimensional; the reason seems to be that they are defined right after the needed tools (e.g. Set or multidim tools) are defined. This would make much more sense to me put those functions also in an "algorithm.jl" file. E.g. I would expect to find in "set.jl" only the functions implementing the interface of Set, and possibly methods specialized for Set, but functions merely using a set in their implementation have nothing to do there.

But then what's are the boundaries of this file? Any function uses an "algorithm". Maybe we'd need a collections.jl file with all generic functions which apply to any collection. Anyway, better use an existing file for this PR, and open another one to regroup functions in a subsequent PR.

I agree "algorithm" is a bit vague (I "algorihtms" like in the c++ STL...), but no problem I will update and follow existing practices here.

tkelman · 2017-06-10T15:28:51Z

base/exports.jl

@@ -1351,6 +1351,9 @@ export
    nzrange,
    nnz,

+# Algorithms module re-exports
+    replace!,


needs to be added to stdlib doc index

tkelman · 2017-06-10T15:29:58Z

base/algorithms.jl

+julia> replace!(x->x.first=>3, Dict(1=>2, 3=>4), 1) do (k, v)
+           v < 3
+       end
+


there shouldn't be an empty line between input and output, should there?

tkelman · 2017-06-10T15:32:40Z

base/algorithms.jl

+
+Return a copy of collection `A` where all occurrences `x` for which
+`pred(x)` is true are replaced by `new` or `f(x)` (exactly one among
+`f` and `new` must be specified).


I think it would be better to list the different signatures instead of trying to explain constraints on non-syntactic square brackets here

Shall I put the 2 stacked signatures (one below the other), followed by a common doctstring?

I think that would be clearer, yes

iamed2 · 2017-06-10T18:12:09Z

Given this is new behaviour not needed by Base itself, could it live in https://github.com/JuliaCollections/DataStructures.jl or another package instead?

rfourquet · 2017-06-10T18:55:42Z

Why not, but this is very basic functionality... For example, C++ which is quite barebones in its standard library, has a replace function.

iamed2 · 2017-06-10T19:13:54Z

A lot of what is in the C++ standard library (e.g. deque, sorted sets/maps, etc.) lives in other packages such as DataStructures.jl. Some functionality which exists in the C++ standard library was moved from Base out to other packages (e.g. priority queues, combinatorics, etc.). Code in packages can be updated more frequently and doesn't contribute to Base size or the time it takes to run Julia's tests.

Just thought I'd suggest it, since this has come up for discussion before in cases where the code is not as easily decoupled from Base.

ararslan · 2017-06-10T19:37:20Z

Technically since replace is defined in Base and this is extending it with Base types, having it in a package would be committing type piracy. Also, FWIW, I've found myself wanting functionality like this often.

nalimilan

Sounds useful, thanks!

nalimilan · 2017-06-10T19:22:38Z

base/algorithms.jl

+Replace all occurrences `x` in collection `A` for which `pred(x)` is true
+by `new` or `f(x)` (exactly one among `f` and `new` must be specified).
+If `count` is specified, then replace at most `count` occurrences.
+This is the in-place version of [`replace`](@ref).


I don't think we usually refer to copying versions of the functions.

I thougth it was a good thing to document more related functions in docstrings.

nalimilan · 2017-06-10T19:24:51Z

base/algorithms.jl

+!!! note
+When `A` is an `Associative` or `AbstractSet` collection, if
+there are collisions among old and newly created keys, the result
+can be unexpected:


"can be unexpected" is a bit vague: the last insert key wins, which can be undefined results when the iteration order is undefined. For Set, is the order defined or not?

Also, I believe the text in Documenter !!! note blocks has to be indented four spaces.

The order is not defined for Set, but reproducible, so I chose values which work to illustrate my point. Will try to find a better wording.

Is it better now?

nalimilan · 2017-06-10T19:26:26Z

base/algorithms.jl

+`pred(x)` is true are replaced by `new` or `f(x)` (exactly one among
+`f` and `new` must be specified).
+If `count` is given, then replace at most `count` occurrences.
+See the in-place version [`replace!`](@ref) for examples.


Better repeat examples (that's what we do e.g. for sum).

nalimilan · 2017-06-10T19:27:02Z

base/algorithms.jl

+replace!(pred, A::ReplaceCollection, new, n::Integer=-1) = replace!(pred, y->new, A, n)
+
+"""
+    replace(pred, [f::Function], A, [new], [count])


count is called n for the AbstractString method. Better stay consistent (in one direction or the other).

nalimilan · 2017-06-10T19:31:00Z

base/algorithms.jl

+```jldoctest
+julia> a = [1, 2, 3, 1];
+
+julia> replace!(isodd, a, 0, 2); a


Why ; a? Doesn't the function return it already? BTW, maybe it would be useful to return count instead?

It was to clarify that a is modified in-place, but this may not be necessary. As to return count, I see that it could be useful, but I'm uncomfortable with the inconsistency of having the non-mutating replace returning the collection instead.

The inconsistency with replace doesn't sound a big issue to me, but I would be more concerned about consistency with other in-place functions.

nalimilan · 2017-06-10T19:31:25Z

base/algorithms.jl

+If `count` is given, then replace at most `count` occurrences.
+See the in-place version [`replace!`](@ref) for examples.
+"""
+replace(pred, new::Function, A::ReplaceCollection, n::Integer=-1) = replace!(pred, new, copy(A), n)


Wrap line at 92 chars.

nalimilan · 2017-06-10T19:36:21Z

base/algorithms.jl

+```
+"""
+function replace!(pred, new::Function, A::AbstractArray, n::Integer=-1)
+    n == 0 && return A


The AbstractString method uses n=0 by default, so it means "unlimited". Your choice sounds better since it allows using 0 for a no-op, which can be useful if that's the result of a computation, but better adapt the existing method then. Would also make sense to check that n > -1 to prevent unexpected things.

Yes I try propose to update the existing method in #22325. I will document the behavior for negatibe inputs, but I'm not sure it's useful to prevent n < -1.

I'm converting to a default value of typemax(Int), like for the string method.

nalimilan · 2017-06-10T19:37:51Z

base/algorithms.jl

+    A
+end
+
+const ReplaceCollection = Union{AbstractArray,Associative,AbstractSet}


Maybe you should leave the signature unrestricted until we have a MutableCollection type? Else, it won't be extensible for custom package types.

To avoid ambiguities, then pred has to be constrained to be a Function (or Base.Callable), which seems OK.

nalimilan · 2017-06-10T19:38:48Z

base/algorithms.jl

+
+# Examples
+```jldoctest
+julia> replace!(Set([1, 2, 3]), 1, 0)


Why not illustrate this with a plain Array first?

Because a previous method of the same method (hence in the same docstring) was already having an example with an Array, so I tried to vary.

It's fine to show examples using more complex structures, but I would start with the simplest example. Else it's not immediately obvious whether Set is particularly associated with that method or whether that's an arbitrary variation.

nalimilan · 2017-06-10T22:47:39Z

FWIW, I have found out for CategoricalArrays.recode that pairs offer a simple, flexible and efficient API for this kind of operation. For example, recode(x, 1=>2, 3=>4) can perform two replacement operations and the generated code is really fast (i.e. the loop over the pairs is completely unrolled).

Maybe something to consider to make replace more powerful while keeping the interface relatively simple. Also the difference between Pair and Function could help dispatch when it's not clear what's the meaning of an argument.

iamed2 · 2017-06-10T22:55:08Z

Technically since replace is defined in Base and this is extending it with Base types, having it in a package would be committing type piracy.

The string replace doesn't obey the same conventions (replaces subsets, not elements) and could actually cause confusion or method conflicts if there was an iterator version of replace defined (and I'll probably write one if it doesn't make it into this PR).

rfourquet · 2017-06-11T08:24:23Z

recode(x, 1=>2, 3=>4) can perform two replacement operations and the generated code is really fast

I don't know CategoricalArrays yet, but for the replace!(a, old, new) method can be optimized greatly for Associative and AbstractSet (or sorted arrays).

The string replace doesn't obey the same conventions (replaces subsets, not elements)

It may be possible to replace subsets for collection too... (but than may require to be more strict type for the old value for single element replacement, e.g. typeof(A))

nalimilan · 2017-06-11T08:48:52Z

I don't know CategoricalArrays yet, but for the replace!(a, old, new) method can be optimized greatly for Associative and AbstractSet (or sorted arrays).

Sorry, I guess my comment wasn't clear. What I mean is that using pairs allows recode(x, 1=>2, 3=>4) to do the equivalent of replace(replace(x, 1, 2), 3, 4), but in a single pass, and with quite efficient code. This has nothing categorical-specific.

The element vs. subset debate reminds me of issues discussed in the Search & Find Julep, with findfirst vs. (the proposed) findseq. But I don't really see the problem with strings: there is no ambiguity since replace("qwerty" ,"qw", "") isn't ambiguous. As @rfourquet it could be made to work for general arrays with some complications; might be worth using a separate replaceseq function for that. But it's so simple and natural for strings that it would be too bad not to support that feature.

iamed2 · 2017-06-11T12:10:42Z

Seems like that concern does fit in with the Julep. If there was a replaceseq function, the String version would definitely fit better there, though the naming suffers.

Lately I've been feeling that the string methods don't really jive with their iterator/array counterparts. But that's a general issue for somewhere else.

But I don't really see the problem with strings: there is no ambiguity since replace("qwerty" ,"qw", "") isn't ambiguous.

Suppose there was an iterator version of replace replace{T}(itr{T}, old, new) -> ReplaceIterator{eltype(T)} and replace{T}(pred, itr{T}, new) -> ReplaceIterator{eltype{T}}. If the first format is called on a String, the existing String method would take precedence. But if the user changed their call to call to the second format, the iterator version would be called and they would get a ReplaceIterator{Char} instead.

nalimilan · 2017-06-11T12:42:36Z

Suppose there was an iterator version of replace replace{T}(itr{T}, old, new) -> ReplaceIterator{eltype(T)} and replace{T}(pred, itr{T}, new) -> ReplaceIterator{eltype{T}}. If the first format is called on a String, the existing String method would take precedence. But if the user changed their call to call to the second format, the iterator version would be called and they would get a ReplaceIterator{Char} instead.

Yes, but at that point we would also add a specialized String method for the second form, returning a string. Anyway, replace isn't going to start returning an iterator, that would be Base.Iterators.replace.

iamed2 · 2017-06-11T13:10:39Z

Anyway, replace isn't going to start returning an iterator, that would be Base.Iterators.replace.

Ah, right, I forgot about that deprecation, sorry.

rfourquet · 2017-06-17T16:14:21Z

I think I addressed all the comments. I still have to know how to set the link to replace from the doc of replace! point to the collection version, not to the string version; and to not have a duplicate docstring for the collection version...

On the design level:

I think @nalimilan's example of using replace(A, old=>new) is interesting; but Allowing multiple pairs would not be compatible with a "count" argument (n): replace(A, p::Pair..., n=typemax(Int)) is not valid... As using a Dict of those pairs internally would probably be necessary for efficiency, it could be required to pass a Dict directly instead: replace(A, repl::Dict, n=typemax(Int)).
there are two forms when a predicate is given:

replace(pred, A, new, [n])
replace(pred, f::Callable, A, [n])

It would be cleaner to have instead replace(pred, A, f::Callable, [n]) for the second, to keep the same argument order, but this would collide when A can store callables... Do we care enough for this case? (I guess yes)

nalimilan · 2017-06-18T10:52:19Z

I think I addressed all the comments. I still have to know how to set the link to replace from the doc of replace! point to the collection version, not to the string version; and to not have a duplicate docstring for the collection version...

Can you elaborate? I think generally we point the docstring to the function, which concatenates docstrings for all methods. That's not ideal, but at least it includes the expected docstring.

On the design level:

I think @nalimilan's example of using replace(A, old=>new) is interesting; but Allowing multiple pairs would not be compatible with a "count" argument (n): replace(A, p::Pair..., n=typemax(Int)) is not valid... As using a Dict of those pairs internally would probably be necessary for efficiency, it could be required to pass a Dict directly instead: replace(A, repl::Dict, n=typemax(Int)).

Actually, no, the pairs should be passed as a varargs so that the compiler is able to unroll the checks rather than using a loop. With a small number of pairs, a dict isn't faster than repeated equality tests.

But you're right that the handling of n would be problematic in that case. We could make it a keyword argument. I guess it really depends on whether we want to make replace very general or rather keep it as simple as possible.

there are two forms when a predicate is given:
replace(pred, A, new, [n])
replace(pred, f::Callable, A, [n])

It would be cleaner to have instead replace(pred, A, f::Callable, [n]) for the second, to keep the same argument order, but this would collide when A can store callables... Do we care enough for this case? (I guess yes)

Yes, that's a tough call. The API based on pairs would solve this.

nalimilan · 2017-06-18T10:55:32Z

base/array.jl

+
+## replace/replace! ##
+
+# NOTE: to implement replace! and replace for a new type T, it's enough to define:


I don't think this is the place to document this. Though I don't really know where it could go. Also, I don't like recommending people to override an unexported function whose name starts with an underscore. Really, they can reimplement the three lines in replace that appear before the call to _replace, that's a negligible repetition.

Ok, I will replace _replace by replace in the recommandation. Until a better place is suggested, I will keep it here though.

I'd rather not have this at all, AFAIK we don't generally do this, and people are not supposed to read this file to find out...

This note is intended also for people adding a new method in Base... What do you think about keeping/removing this note @tkelman ?

rfourquet · 2017-06-18T14:32:59Z

Can you elaborate? I think generally we point the docstring to the function, which concatenates docstrings for all methods. That's not ideal, but at least it includes the expected docstring.

By specifying the signatures in .md files, it's possible to show only the docstring for the string version in "strings.md", and the general collections version in "collection.md". But unfortunately, the link from replace! choses to point at the string version. In the docs of Documenter, I didn't see a way to address that, someone has an idea? otherwise I will simply remove the link.

Actually, no, the pairs should be passed as a varargs so that the compiler is able to unroll the checks rather than using a loop. With a small number of pairs, a dict isn't faster than repeated equality tests.

OK, I see. Then indeed, why not set the count argument as a keyword. It may be clearer actually (and I expect it not to be used so often), and I understood that keyword args are meant to be faster eventually.

The API based on pairs would solve this.

I don't see that. Assuming there is replace(A, oldnew::Pair...; [count]), how do you propose the version with a predicate to be? (for which old is not necessary anymore).
One possibility is to remove replace(pred, A, new), and to merge merge f and pred for the last version, which can be more powerful:
replace(pred_f, A), with pred_f returning Nullable for the replacement value. This would allow to implement the pair version in terms of this pred_f (finding whether a pair matches (in the pred part) is work which can be reused for the f part).

nalimilan · 2017-06-18T15:38:29Z

By specifying the signatures in .md files, it's possible to show only the docstring for the string version in "strings.md", and the general collections version in "collection.md". But unfortunately, the link from replace! choses to point at the string version. In the docs of Documenter, I didn't see a way to address that, someone has an idea? otherwise I will simply remove the link.

I see. Maybe something like [`replace`](@ref replace(pred, A, new, [n])) would work (adapting the signature)? Else you'll have to ask @mortenpi.

I don't see that. Assuming there is replace(A, oldnew::Pair...; [count]), how do you propose the version with a predicate to be? (for which old is not necessary anymore).

Mmm, actually I was thinking about the possible ambiguity with the replace(A, old, new) form, but that doesn't help with the case you highlighted.

One possibility is to remove replace(pred, A, new), and to merge merge f and pred for the last version, which can be more powerful:
replace(pred_f, A), with pred_f returning Nullable for the replacement value. This would allow to implement the pair version in terms of this pred_f (finding whether a pair matches (in the pred part) is work which can be reused for the f part).

Interesting! It makes sense to do everything from a single function indeed. (Regarding the implementation of the pair version, I'm not sure what's your suggestion but you'd have to be very careful that everything is optimized.)

rfourquet · 2017-06-18T17:36:53Z

So I implemented your idea, and I like it very much:

to have a keyword arg for n is way clearer (it should be changed to count or something more explicit though)
having old=>new also is more clear
when the keyword is not specified, the speed are on par with the previous version (even faster, I don't understand why yet)
I kept the version replace(pred, A, new) as it doesn't hurt after all

Compare:

julia> replace(A, 2, 3, 8) # old version

julia> replace(A, 2=>3, count=8) # new version

PS: your trick to use [replace](@ref replace(pred, A, new, [n])) for the doc didn't work :(

nalimilan · 2017-06-18T20:53:17Z

Cool. I'd rename n now too since we won't be able to change it later (keyword argument). Maybe "maxcount"? Should check for consistency with other functions, if any. Too bad for the docstrings, maybe somebody else can help.

nalimilan · 2017-12-16T13:58:57Z

@rfourquet I guess this won't be in 0.7, but maybe we should deprecate replace(s, pattern, replacement) in favor of replace(s, pattern => replacement) now to avoid ambiguity issues in the future when adding more powerful functions? I also find it more explicit.

StefanKarpinski · 2017-12-16T17:59:56Z

I like that idea. It would also allow giving multiple replacements, which can potentially be done more efficiently in a single pass.

rfourquet · 2017-12-17T17:12:36Z

That sounds good indeed. I thought we didn't reach a decision yet on this design with pairs, but I still like it. It's true that I didn't plan to get it into 0.7, but mostly because it's not breaking and I didn't want to overload triage, so this could wait for 1.1. But if we go for this design and deprecate the existing replace, then given all the review which has gone into this PR already, I would propose to merge it too. I will rebase to get an updated feel of how it looks with your Union{Some{T}, Void} PR merged.

rfourquet · 2017-12-17T19:26:08Z

Let me recap the API proposed here, if it can help:

replace(A, 1=>2, 2=>3; count=3) to replace in A 1 by 2 and 2 by 3, with at most 3 replacements (count available for all methods).
replace(isodd, A, 0) to replace odd numbers by 0 in A
replace(x -> isodd(x) ? 2x : nothing) to replace odd numbers x by 2x. Here, 2x can be written as Some(2x) (Some will be useful when you want to replace a value by nothing).

Also included the mutating variant, replace!.

nalimilan

Thanks! If @StefanKarpinski is fine with merging this now, I think we should go ahead. Most of my comments could be addressed later as they don't affect the API, but I think the AbstractString-specific method should be deprecated (see below).

nalimilan · 2017-12-18T13:12:53Z

base/set.jl

+
+# we use this wrapper because using directly eltype(A) as the type
+# parameter below for Some degrades performance
+function _replace!(A, ::Type{K}, count::Integer, old_new #=::Pair...=#) where {K}


Mark old_new with ::Tuple{Vararg{Pair}}?

Good catch!

nalimilan · 2017-12-18T13:13:57Z

base/set.jl

+## replace/replace! ##
+
+# NOTE: to implement replace! and replace for a new type T, it's enough to define:
+# function _replace!(pred::Callable, A::T, count::Int)


Again, I don't really like recommending people to overload an internal function. Is there a way to make this replace!(pred::Callable, A::T; count::Int)? BTW, keyword arguments are supposed to be faster now.

Ah right, I didn't re-evaluate this choice in light of the supposedly faster keyword args. I would be willing to benchmark that, but as I'm very short on time now, I will accept your offer to address that later by giving it a lower priority.

Why not go with the simplest solution of defining only the keyword argument version for now, and see later whether performance can be improved?

nalimilan · 2017-12-18T13:17:09Z

base/set.jl

+
+# Examples
+```jldoctest
+julia> replace!(Dict(1=>2, 3=>4)) do kv


Could you add a simpler example which doesn't use the do syntax, and an example using Some?

nalimilan · 2017-12-18T13:20:06Z

base/set.jl

+replace(prednew::Callable, A; count::Integer=typemax(Int)) = _replace!(prednew, copy(A), count)
+
+# Handle ambiguities
+replace!(a::Callable, b::Pair; count::Integer=-1) = throw(MethodError(replace!, a, b))


Should be (a, b). Else you get a MethodError, but it happens when failing to create the MethodError. ;-)

But should these really be errors? It's perfectly valid to have an array of pairs and to replace some pairs with others.

Ah yes right! Those were not tested, just inserted to silence some ambiguities. You are right that what you said is perfectly valid, but this method doesn't have an array as parameter! This is just a silly signature, but Julia needs it!

nalimilan · 2017-12-18T13:22:15Z

base/set.jl

+### _replace! for AbstractDict/AbstractSet
+
+askey(k, ::AbstractDict) = k.first
+askey(k, ::AbstractSet) = k


These two convenience functions could interest @andyferris, who tried to unify iteration over sets and dicts.

Maybe that would justify defining pairs(::AbstractSet).

How so? That seems backwards from the problem. The issue here seems to be that pop!(::AbstractDict, key) / delete! is inconsistent with the way the rest of the way the Associative interface is defined (e.g. it should remove an element, not a key). For consistency with our other indexed containers, these methods should perhaps be renamed splice! or deleteat!. (later in v1.x we can then re-add pop!(dict, element))

FWIW, deleteat! vs delete! has been discussed at #20531.

nalimilan · 2017-12-18T13:23:22Z

base/set.jl

+askey(k, ::AbstractDict) = k.first
+askey(k, ::AbstractSet) = k
+
+function _replace_update_hash!(repl::Vector{<:Pair}, x, y::Some)


I don't understand why it's called "hash". Maybe "dict" instead?

I just had to quicly find a name, and both Dict and Set use a hash, but it's a wrong name indeed as this is a helper function for Union{AbstractDict,AbstractSet} in general

nalimilan · 2017-12-18T13:25:08Z

base/set.jl

+    repl = Pair{eltype(A),eltype(A)}[]
+    c = 0
+    for x in A
+        c +=_replace_update_hash!(repl, x, prednew(x))


Isn't it possible to do the replacement in a single pass?

Missing space after +=.

I think it's possible for particular containers, but in general we don't know if mutating the container while iterating is valid (e.g. for a hash-table, there could be a rehash after one insertion, and the rest of the iteration may not visit all non-visited elements. As a first step I wanted something always safe, but it's likely that this can be improved at least for Set and Dict.

nalimilan · 2017-12-18T13:30:29Z

base/set.jl

+    c = 0
+    for i in eachindex(A)
+        c += _replace_update!(A, i, prednew(A[i]))
+        c == count && break


That's probably an overkill optimization, but I wonder whether using a special loop for count == typemax(Int) could be worth it in some cases where the function is so simple that SIMD might be enabled without the break call. Maybe leave that to future improvements after some benchmarking.

I will add that on my todo list to not forget :)

nalimilan · 2017-12-18T13:32:29Z

doc/src/stdlib/strings.md

@@ -38,7 +38,7 @@ Base.searchindex
 Base.rsearchindex
 Base.contains(::AbstractString, ::AbstractString)
 Base.reverse(::Union{String,SubString{String}})
-Base.replace
+Base.replace(s::AbstractString, pat, f)


This old method is inconsistent with the new ones. Maybe deprecate it in favor of pat => f?

Done in #25165 :)

nalimilan · 2017-12-18T13:33:27Z

test/sets.jl

@@ -344,3 +344,37 @@ end
    @test typeof(cssset) == Set{String}
    @test cssset == Set(["foo", "bar"])
 end
+
+@testset "replace! & replace" begin
+    maybe1(v, p) = if p Some(v) end


The point of using Some is to be able to distinguish nothing. Can you add a test with Some(nothing) to ensure this works?

rfourquet · 2017-12-18T16:49:54Z

I addressed most of comments @nalimilan thanks!
I just kept the deprecation for replace(::String, ...) for another PR.
I ended up doing few benchmarks concerning the _replace! methods which avoids forwarding keywords arguments, and it seems there is basically no overhead now with that (except 1 allocation, which doens't affect the performance), so I followed your suggestion to get rid of this _replace!.

nalimilan

Thanks! Looks good, though it needs a NEWS entry.

The deprecation of the other method should also happen quite quickly since we're theoretically past feature freeze.

vtjnash · 2017-12-18T17:24:51Z

base/set.jl

+        c == count && break
+    end
+    for oldnew in repl
+        pop!(A, askey(first(oldnew), A))


should this do all deletions before doing all insertions?

I did it this way to avoid 2 passes, and as a result had to put a warning in the docstring. But you are definitely right. It's much more predictable, and the basic benchmarks I just did don't show any overhead :) thanks!

splitting the oldnew replacements list into old and new lists might even help performance? (for non-isbits Pair{Key, Value})

I just did a benchmark with strings, and it doesn't seem to make a difference. I think I will not bother now, and will investigate when I have time again, but thanks.

nalimilan · 2017-12-21T16:51:42Z

@rfourquet I've tried rebasing via GitHub but I apparently failed. Would you do it, so that the PR can be merged ASAP?

rfourquet · 2017-12-21T17:05:03Z

Would you do it, so that the PR can be merged ASAP?

Sure, but I thought it would need a more clear decision? I put earlier today the triage label on the replace PR for strings, I guess the outcome of it can apply here too.

nalimilan · 2017-12-21T17:11:57Z

Sure, but there seems to be a broad consensus that it's a positive change, so...

rfourquet · 2017-12-22T10:34:27Z

Triage said yes to this API for strings, so I will take it we go with this one too! CI is remarkably all green and there are not yet conflicts, I would say this is a good oppotunity window for someone to click the green button.

nalimilan · 2018-01-21T10:10:50Z

@rfourquet Can you remind me why we use nothing to mean "no replacement"/"keep existing value"? Couldn't we just require the user function to return the original value in that case? Then nothing could mean "drop value".

rfourquet · 2018-01-21T15:55:18Z

Can you remind me why we use nothing to mean "no replacement"/"keep existing value"?

Because I initially used Nullable, where isnull means "no replacement", I hadn't thought of alternatives.

Couldn't we just require the user function to return the original value in that case? Then nothing could mean "drop value".

Sounds like an excellent idea! then it's a bit more than purely replacing values, but after all, replace for strings already allows deleting some parts (by providing an empty string as the replacement). I started playing with your idea and this seems to work well. Unless you like to work on this yourself, I will look at the performance tomorrow and prepare a PR to discuss further this API.

nalimilan · 2018-01-21T18:07:16Z

Cool. We don't necessarily need to allow dropping some elements, but if we stop using nothing for that purpose we can always allow this later if we want.

rfourquet · 2018-01-22T11:20:01Z

My preliminary benchmarks seem to indicate mainly no performance regression :)

We don't necessarily need to allow dropping some elements, but if we stop using nothing for that purpose we can always allow this later if we want.

indeed, except when the replacement value is nothing.

rfourquet added the collections Data structures holding multiple items, e.g. sets label Jun 10, 2017

rfourquet force-pushed the rf/replace branch 2 times, most recently from 30e97aa to d0c2d57 Compare June 10, 2017 15:16

tkelman reviewed Jun 10, 2017

View reviewed changes

nalimilan reviewed Jun 10, 2017

View reviewed changes

rfourquet mentioned this pull request Jun 11, 2017

make replace with count=0 a no-op #22325

Merged

rfourquet force-pushed the rf/replace branch from d0c2d57 to b6f1398 Compare June 17, 2017 16:14

nalimilan reviewed Jun 18, 2017

View reviewed changes

rfourquet force-pushed the rf/replace branch from 0298b33 to f0c7d51 Compare June 18, 2017 17:30

nalimilan mentioned this pull request Nov 14, 2017

replace: allow transform function as first argument? #24598

Open

rfourquet mentioned this pull request Nov 22, 2017

hook up stdlib to the standard test running system #24701

Merged

nalimilan mentioned this pull request Dec 2, 2017

parse with Missing JuliaData/Missings.jl#61

Open

rfourquet force-pushed the rf/replace branch from 9b22856 to 124ca4c Compare December 17, 2017 19:17

nalimilan reviewed Dec 18, 2017

View reviewed changes

nalimilan changed the title ~~[WIP] add replace & replace! for collections~~ Add replace & replace! for collections Dec 18, 2017

rfourquet force-pushed the rf/replace branch from 124ca4c to 4ff02f0 Compare December 18, 2017 16:45

nalimilan approved these changes Dec 18, 2017

View reviewed changes

vtjnash reviewed Dec 18, 2017

View reviewed changes

rfourquet mentioned this pull request Dec 18, 2017

deprecate replace(s::String, pat, r, n) to replace(s, pat=>r, count=n) #25165

Merged

rfourquet force-pushed the rf/replace branch from 4ff02f0 to babf788 Compare December 18, 2017 18:35

add replace & replace! for collections

158ddfe

rfourquet force-pushed the rf/replace branch from 9a0cf08 to 158ddfe Compare December 21, 2017 18:11

StefanKarpinski merged commit e237f4f into master Dec 22, 2017

StefanKarpinski deleted the rf/replace branch December 22, 2017 11:54

rfourquet mentioned this pull request Jan 6, 2018

fix replace! stack overflow without method overwrite #25422

Closed

rfourquet mentioned this pull request Jan 23, 2018

replace the API of replace #25697

Merged


		## replace/replace! ##

		# NOTE: to implement replace! and replace for a new type T, it's enough to define:

Add replace & replace! for collections #22324

Add replace & replace! for collections #22324

Conversation

rfourquet commented Jun 10, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tkelman Jun 10, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

iamed2 commented Jun 10, 2017

rfourquet commented Jun 10, 2017

iamed2 commented Jun 10, 2017

ararslan commented Jun 10, 2017

nalimilan left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nalimilan commented Jun 10, 2017

iamed2 commented Jun 10, 2017 • edited Loading

rfourquet commented Jun 11, 2017

nalimilan commented Jun 11, 2017

iamed2 commented Jun 11, 2017

nalimilan commented Jun 11, 2017

iamed2 commented Jun 11, 2017

rfourquet commented Jun 17, 2017

nalimilan commented Jun 18, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rfourquet commented Jun 18, 2017

nalimilan commented Jun 18, 2017

rfourquet commented Jun 18, 2017

nalimilan commented Jun 18, 2017

nalimilan commented Dec 16, 2017

StefanKarpinski commented Dec 16, 2017

rfourquet commented Dec 17, 2017

rfourquet commented Dec 17, 2017

nalimilan left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tkelman Jun 10, 2017 •

edited

Loading

iamed2 commented Jun 10, 2017 •

edited

Loading