Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added unique!(f, itr) function #30141

Merged
merged 5 commits into from
Dec 5, 2018
Merged

Added unique!(f, itr) function #30141

merged 5 commits into from
Dec 5, 2018

Conversation

raghav9-97
Copy link
Contributor

@raghav9-97 raghav9-97 commented Nov 24, 2018

Added a naive implemetation of unique!(f, itr).Fixes #28415
Please comment for further amendments.

base/set.jl Outdated
end
resize!(A, count)
end

# If A is not grouped, then we will need to keep track of all of the elements that we have
# seen so far.
function _unique!(A::AbstractVector)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be replaces now with

_unique!(A::AbstractVector) = unique(identity, A)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oxinabox Could you please explain it a little bit as this is my first contribution to Julia Lang?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was being overly terse (unusual for me :-P)

Once you have defined a function unique!(f, A), like you are defining.
The function _unique!(A) here,
can be simply defined as call to unique!(identity, A).

identity is a function in Julia that does x->x

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, thanks for explaining it @oxinabox

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified _unique according to the need

base/set.jl Outdated
```
"""
function unique!(f, A::AbstractVector)
seen = Set{eltype(A)}()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is not going to work our so well.

Consisder:
Current behavour of unique

julia> unique(x-> x % 2 ==0 ? :even : :odd, [1,2,3,4,2,2,1])
2-element Array{Int64,1}:
 1
 2

vs for the this implementation of unique!

julia> unique!(x-> x % 2 ==0 ? :even : :odd, [1,2,3,4,2,2,1])
ERROR: MethodError: Cannot `convert` an object of type Symbol to an object of type Int64
Closest candidates are:
  convert(::Type{T<:Number}, ::T<:Number) where T<:Number at number.jl:6
  convert(::Type{T<:Number}, ::Number) where T<:Number at number.jl:7
  convert(::Type{T<:Integer}, ::Ptr) where T<:Integer at pointer.jl:23
  ...
Stacktrace:
 [1] setindex!(::Dict{Int64,Nothing}, ::Nothing, ::Symbol) at ./dict.jl:373
 [2] push!(::Set{Int64}, ::Symbol) at ./set.jl:48
 [3] new_unique!(::getfield(Main, Symbol("##3#4")), ::Array{Int64,1}) at ./REPL[8]:9
 [4] top-level scope at none:0

Copy link
Contributor Author

@raghav9-97 raghav9-97 Nov 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made the necessary changes.Please comment if further changes required.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine now.
I feel like there is an optimisation that could be done to remove the Set{Any}() (like in map)
but that it is not required in this PR, since this works the same way as unique (and it might not be required at all, benchmarking should be done)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So ultimately this is fine by now isn't it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK it is fine,
but you'll have to wait for someone with merge permissions to do a real review/merge

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why travis-ci build test is failing repeatedly.Any advice would be helpful

@raghav9-97 raghav9-97 changed the title RFC : Added naive unique!(f, itr) function RFC : Added unique!(f, itr) function Nov 26, 2018
Copy link
Member

@andyferris andyferris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice idea, thanks for the contribution. Unfortunately we'll have to fix this up a bit to reach optimal efficiency, but we'll get there :)

  • Avoid creating a Set{Any}()

```
"""
function unique!(f, A::AbstractVector)
seen = Set()
Copy link
Member

@andyferris andyferris Nov 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately this won't be efficient because this means Set{Any}().

Since relying on inference is naughty, what should probably happen here is a widening pattern. Start with a Set{typeof(first(A))}() and then keep pushing to this set until the element type isn't wide enough. Someone else might know better, but I think the most efficient pattern is then to dispatch again to more-or-less the same function. This is, unfortunately, a little bit of work to implement, sorry.

You'll want an inner function like _unique!(f, A, seen, i) where seen is the current Set and i is the current index The outer unique! function is creates the initial Set and dispatches to the inner function.

i.e. something like

function unique!(f, A::AbstractArray)
    if length(A) <= 1
        return A
    end

    i = firstindex(A)
    t = @inbounds A[i]
    seen = Set{typeof(x)}()
    push!(seen, t)
    return _unique!(f, A, seen, 1, i+1)
end

and the other funciton approximately

function _unique!(f, A::AbstractArray, seen::Set, count::Integer, i::Integer)
    while i <= lastindex(A)
        t = @inbounds A[i]
        if t  seen
            count += 1
            @inbounds A[count] = t
            if typeof(t) <: eltype(seen)
                push!(seen, t)
            else
                seen2 = convert(Set{promote_typejoin(eltype(seen), typeof(t))}, seen)
                push!(seen2, t)
                return _unique!(f, A, seen2, count, i+1)
            end
        end
        i += 1
    end
    return resize!(A, count)
end

I haven't tested the above at all, but I hope it demonstrates the pattern at least. The compiler will elide all the complexity in the standard case that f(first(A)) is inferrable. Also note that I've assumed all AbstractVectors have AbstractUnitRange indices in the above to avoid the need to use iterate, but the other pattern is fine too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I agree,
I also think that can be in a seperate PR.
Right now, the existing unique(f, iter) (no bang),
does the same thing.
And can be optimised in the same way.

This PR adds a naive implementation that can be improved later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think this could be implemented in a different PR as both with and without bang version of unique(f, iter) uses naive method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andyferris I can try to implement the inner function _unique within this PR if this is what is needed to merge it but will need some help.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry - I didn’t mean to block this - iterating later is of course fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andyferris you might want to dismiss your review,
However that is done,
(possibly by rereviewing?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andyferris Please re review this PR.

@raghav9-97 raghav9-97 changed the title RFC : Added unique!(f, itr) function Added unique!(f, itr) function Dec 5, 2018
@@ -196,6 +221,10 @@ function _unique!(A::AbstractVector)
resize!(A, count)
end

# If A is not grouped, then we will need to keep track of all of the elements that we have
# seen so far.
_unique!(A::AbstractVector) = unique!(identity, A::AbstractVector)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the call to unique! here will now use a Set{Any} this will decrease regular unique!(f,A)s performance, and I think Andy's objection should be fixed in this PR, since this PR not only adding a new feature that can be optimized later, it makes performance of existing code worse. It would at least be good if you could present some benchmarks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I will try to build the inner _unique function as suggested by @andyferris.In the meantime I will benchmark the existing solution and will present it here.

Copy link
Contributor Author

@raghav9-97 raghav9-97 Dec 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fredrikekre When I run @btime unique!(x->2x, 1:10_000); it says
setindex! not defined for UnitRange{Int64}
I couldn’t understand what the error message says?
Same problem occurs with the @benchmark macro.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's because 1:10_000 is an immutable array, try instead @btime unique!(x->2x, $([1:10_000;]))

Copy link
Contributor Author

@raghav9-97 raghav9-97 Dec 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some benchmarking results for unique!(f, iter) and unique!(iter) are stated:
@benchmark unique!(x->x^2, $([1:10_000;]))

  memory estimate:  786.97 KiB
  allocs estimate:  37981
  --------------
  minimum time:     995.317 μs (0.00% GC)
  median time:      1.050 ms (0.00% GC)
  mean time:        1.159 ms (3.08% GC)
  maximum time:     38.007 ms (96.42% GC)
  --------------
  samples:          4305
  evals/sample:     1

@benchmark unique!($([1:10_000;]))

  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     38.053 μs (0.00% GC)
  median time:      38.081 μs (0.00% GC)
  mean time:        41.537 μs (0.00% GC)
  maximum time:     10.190 ms (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

Copy link
Contributor

@oxinabox oxinabox Dec 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I lead you down the wrong path with my coimment on #30141 (comment)

Replacing _unique!(iter) with unique!(identity, iter) would only have (certainly) been optimised away if the Set{Any} had not been needed.

Can you also report benchmarks for julia master for the old performance of unique!($([1:10_000;])), without this PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a branch, ajf/faster_unique_f. Completely untested, but I'll finish it up.

@fredrikekre I was thinking it would be better to have small PRs with seperate concerns, as Lyndon suggests. Let's simply merge this feature PR from @raghav9-97 (his first Julia PR if I understand) and I'll follow up straight away with the performance PR that addresses both unique and unique!. Are you OK with that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But yeah, this particular line does leave us in an bad intermediate state... oh well.

@andyferris andyferris merged commit 5cbbed3 into JuliaLang:master Dec 5, 2018
@fredrikekre
Copy link
Member

Should have been squashed.

@andyferris
Copy link
Member

Should have been squashed.

Oh... is this the general policy?

@fredrikekre
Copy link
Member

This PR had a couple of unrelated things like a test by me in one commit etc, and generally when there are "fix-up" commits it is nice to squash IMO, since the intermediate commits are useless.

@raghav9-97
Copy link
Contributor Author

Thanks for guiding me through my very first contribution in Julia @fredrikekre @oxinabox @andyferris @rfourquet. I know there were some mistakes in this PR which I would work upon in the future PRs.

@raghav9-97 raghav9-97 deleted the solveissue branch December 6, 2018 10:57
fredrikekre added a commit that referenced this pull request Dec 7, 2018
Compat annotation for rank(A; rtol=..., atol=...), #29926.
ararslan pushed a commit that referenced this pull request Dec 7, 2018
* Compat annotation for unique!(f, A), #30141.
Compat annotation for rank(A; rtol=..., atol=...), #29926.

* Update stdlib/LinearAlgebra/src/generic.jl
KristofferC pushed a commit that referenced this pull request Dec 7, 2018
* Compat annotation for unique!(f, A), #30141.
Compat annotation for rank(A; rtol=..., atol=...), #29926.

* Update stdlib/LinearAlgebra/src/generic.jl

(cherry picked from commit 4fc446f)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants