Add unique(AbstractArray, dim) #5811

simonster · 2014-02-14T01:44:58Z

Efficiently finds the unique columns, rows, etc. of an array. The algorithm first hashes along the specified dimension, then finds the unique hashes, and finally checks that the hashes don't collide. It is roughly O(n) in the number of elements in the array.
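
For readers who want to see the three steps concretely, here is a rough Python sketch restricted to the rows of a 2-D list. It is an illustration of the hash, deduplicate, then verify idea only, not the Julia code in this PR; the name `unique_rows`, the `hash_fn` parameter, and the choice to keep rows in order of first appearance are all assumptions made for the example.

```python
from collections import defaultdict


def unique_rows(matrix, hash_fn=hash):
    """Return the distinct rows of `matrix` in order of first appearance.

    Hypothetical helper for illustration; `hash_fn` is pluggable so that
    collisions can be forced when experimenting.
    """
    # Maps a hash value to the indices of already-kept rows with that hash.
    buckets = defaultdict(list)
    kept = []
    for i, row in enumerate(matrix):
        key = tuple(row)
        h = hash_fn(key)  # steps 1 and 2: hash the row, look up its hash
        # Step 3: an equal hash only marks a candidate duplicate, so compare
        # the actual contents to rule out a collision.
        if any(tuple(matrix[j]) == key for j in buckets[h]):
            continue
        buckets[h].append(i)
        kept.append(i)
    return [list(matrix[i]) for i in kept]
```

Each row is hashed once and compared only against kept rows that share its hash, so with a well-distributed hash the total work stays roughly proportional to the number of elements. Passing a constant `hash_fn` forces every row into one bucket and shows that the equality check still keeps the result correct, just more slowly.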

Compared with MATLAB's unique(x, 'rows'), which first sorts the rows, this approach gives much more consistent and almost always better performance. It is ~10% slower for MATLAB's best case (when values are random, and so sorting requires inspecting only a single column) and much faster for other cases (it is 25x faster than MATLAB for 500 repeats of the same 10 random rows with 5000 columns).

This is my first time using Cartesian. Without it, this code is presently about 10% faster for finding unique rows of a matrix, but the overhead is probably worth it for the generality.

timholy · 2014-02-14T11:05:50Z

Very nice! You're already a master.

timholy · 2014-02-14T18:47:19Z

I think this is good functionality to have in base. If no other objections arise, I say this should be merged.

Add unique(AbstractArray, dim)

simonster · 2014-02-15T20:35:45Z

Thanks for reviewing this, Tim!

simonster mentioned this pull request Feb 14, 2014

Strange errors with ifelse #5813

Closed

unique(AbstractArray, dim): incorporate @timholy's suggestions

c402aea

timholy added a commit that referenced this pull request Feb 15, 2014

Merge pull request #5811 from JuliaLang/sjk/uniquedim

65ed7aa

Add unique(AbstractArray, dim)

timholy merged commit 65ed7aa into master Feb 15, 2014

simonster deleted the sjk/uniquedim branch February 15, 2014 20:35

simonster added a commit to simonster/julia that referenced this pull request Feb 16, 2014

Update NEWS and docs for JuliaLang#5811

2a8bfbf

simonster added a commit to simonster/julia that referenced this pull request Feb 16, 2014

Update NEWS and docs for JuliaLang#5811

bffa6c0
