Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement unique! #20549

Closed
StefanKarpinski opened this issue Feb 9, 2017 · 9 comments
Closed

implement unique! #20549

StefanKarpinski opened this issue Feb 9, 2017 · 9 comments
Labels
good first issue Indicates a good issue for first-time contributors to Julia help wanted Indicates that a maintainer wants help on an issue or pull request

Comments

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Feb 9, 2017

We have a unique function that produces a new collection similar to its argument, but with each item only occurring once (in order of first appearance). There should be a corresponding unique! function that removes recurrences of items and returns this modified collection. Note that for performance, at least on dense arrays, resizing of the array should only occur at end when you know the final size to shrink the array to. Part of #20402.

@StefanKarpinski StefanKarpinski added this to the 1.0 milestone Feb 9, 2017
@StefanKarpinski StefanKarpinski added good first issue Indicates a good issue for first-time contributors to Julia help wanted Indicates that a maintainer wants help on an issue or pull request labels Feb 9, 2017
@kvmanohar22
Copy link
Contributor

kvmanohar22 commented Feb 10, 2017

I don't understand by ...removing recurrences of items... , what is the additional functionality that would be incorporated in unique! function, could you please provide a small example ?

@ararslan
Copy link
Member

@kvmanohar22

julia> x = [2, 2, 3, 1, 2, 3, 1];

julia> unique(x) # Retrieves unique elements without modifying x
3-element Array{Int64,1}:
 2
 3
 1

julia> unique!(x) # Modifies x to only contain its unique elements in order of occurrence
3-element Array{Int64,1}:
 2
 3
 1

julia> x # Proof that x has been modified
3-element Array{Int64,1}:
 2
 3
 1

@kvmanohar22
Copy link
Contributor

I would like to implement this.
Would this be a good approach ?

function unique!(itr)
a= _default_eltype(typeof(itr))[]

# check if `val` is present in `a` if not append it 
push!(a, val)

# finally
return a

@ararslan
Copy link
Member

That's a start, but notice that itr has not been modified in place in your code. I recommend taking a look at how unique is implemented and basing your approach off of that. Do note though that while unique can accept any iterable, unique! can only accept mutable iterables, otherwise the input won't (or rather can't) be modified in place.

One possible approach (probably not the best) would be to locate the indices in the input which contain duplicated elements, then just call deleteat!.

@kvmanohar22
Copy link
Contributor

How about we sort the given iterable in place and remove the duplicates ?
Something like this...

function unique!(itr)
# sort `itr`
sort!(itr)

# remove duplicates
for i=1:length(itr)-1
   if i<length(itr)
      if itr[i]==itr[i+1]
         deleteat!(itr, i)
      end
   end
end

return itr

@ararslan
Copy link
Member

The problem with sorting is that you lose the order of occurrence. Notice in my example above that unique(x) == [2, 3, 1]; with your proposal, unique!(x) would turn x into [1, 2, 3].

A very naive way to do it, posting here solely for the sake of example, would be

function unique!(itr)
    seen = Set{eltype(itr)}()
    for (i, x) in enumerate(itr)
        if x in seen
            # If x has already been encountered in itr, remove it
            deleteat!(itr, i)
        else
            # Otherwise note that we've encountered it
            push!(seen, x)
        end
    end
    return itr
end

Does that approach make sense? This is somewhat similar to how unique is currently implemented. As I said, I recommend taking a look at the code for unique to get a feel for how to efficiently approach the problem.

@fredrikekre
Copy link
Member

The implementation above will give BoundsError since the itr will be shortened inside the loop.

julia> A = [1, 2, 1];

julia> unique!(A)
ERROR: BoundsError: attempt to access 2-element Array{Int64,1} at index [4]
 in next at ./iterator.jl:48 [inlined]
 in unique!(::Array{Int64,1}) at ./REPL[23]:3

I think the indices of the duplicates should be stored somehow and then the itr should be resized only once after the loop. Ref original post:

resizing of the array should only occur at end when you know the final size to shrink the array to

@kvmanohar22
Copy link
Contributor

@fredrikekre to avoid BoundsError, I have put the condition on index, the first if condition makes sure that there is no BoundsError

@fredrikekre
Copy link
Member

Feel free to open a work in progress PR, then its easier for people to comment on your implementation :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Indicates a good issue for first-time contributors to Julia help wanted Indicates that a maintainer wants help on an issue or pull request
Projects
None yet
Development

No branches or pull requests

5 participants