Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add unique(AbstractArray, dim) #5811

Merged
merged 2 commits into from
Feb 15, 2014
Merged

Add unique(AbstractArray, dim) #5811

merged 2 commits into from
Feb 15, 2014

Conversation

simonster
Copy link
Member

Efficiently finds the unique columns, rows, etc. of an array. The algorithm first hashes along the specified dimension, then finds the unique hashes, and finally checks that the hashes don't collide. It is roughly O(n) in the number of elements in the array.

Compared with MATLAB's unique(x, 'rows'), which first sorts the rows, this approach gives much more consistent and almost always better performance. It is ~10% slower for MATLAB's best case (when values are random, and so sorting requires inspecting only a single column) and much faster for other cases (it is 25x faster than MATLAB for 500 repeats of the same 10 random rows with 5000 columns).

This is my first time using Cartesian. Without it, this code is presently about 10% faster for finding unique rows of a matrix, but the overhead is probably worth it for the generality.

Efficiently finds the unique columns, rows, etc. of an array. The
algorithm first hashes each row, then finds the unique hashes, and
finally checks that the hashes don't collide. It is roughly O(n) in
the number of elements in the matrix.

This is my first time using Cartesian. Without it, this code is
presently about 10% faster for finding unique rows of a matrix, but
the overhead is probably worth it for the generality.
@timholy
Copy link
Member

timholy commented Feb 14, 2014

Very nice! You're already a master.

@timholy
Copy link
Member

timholy commented Feb 14, 2014

I think this is good functionality to have in base. If no other objections arise, I say this should be merged.

timholy added a commit that referenced this pull request Feb 15, 2014
@timholy timholy merged commit 65ed7aa into master Feb 15, 2014
@simonster simonster deleted the sjk/uniquedim branch February 15, 2014 20:35
@simonster
Copy link
Member Author

Thanks for reviewing this, Tim!

simonster added a commit to simonster/julia that referenced this pull request Feb 16, 2014
simonster added a commit to simonster/julia that referenced this pull request Feb 16, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants