speed-up randperm, randcycle and shuffle #14772

rfourquet · 2016-01-23T15:06:15Z

Cf. https://groups.google.com/forum/#!topic/julia-users/XxYcgNuHtJs.
This uses a less general but more efficient version of random generation within a range.

Note that amortizing the cost of creating RangeGenerator objects offers substantial benefits, but insufficient compared to this PR, e.g: randperm(10^7) takes here 0.92s on master, 0.47s when amortizing, and 0.23 with this PR (and 0.30s with the proposed R-based solution in the julia-users thread).

Note: it may be worth revisiting range generation, this PR is more of a stop-gap solution.

rfourquet · 2016-01-23T16:17:02Z

I hadn't run the full tests locally, an easy fix for the failure of travis is to change the value of the array at line 951 of test/sparsedir/sparse.jl. After that, I get a failed test at line 1102 of the same file... I'm not sure why, and don't have time to investigate now (help welcome, I don't know this code).

johnmyleswhite · 2016-01-23T17:44:30Z

Since you've read the R translation in that thread, I fear that we can't merge this code because it's tainted by the GPL.

ViralBShah · 2016-01-23T18:30:16Z

This implementation has nothing in common with the posted R solution. Why is it considered tainted?

johnmyleswhite · 2016-01-23T18:37:28Z

As far as I understand the logic of clean room implementations, no one who has read that thread can write code for this function now. By linking to the thread containing GPL code, there's prima facie evidence that the author has read the GPL code.

It's possible that an implementation that is provably non-derivative isn't tainted, but it's not clear to me what a judge would think of that argument.

ViralBShah · 2016-01-23T18:41:16Z

OK. But this is not even a clean room implementation of what was discussed in that thread. It optimizes Julia's rand functions in the existing Julia implementation of randperm.

johnmyleswhite · 2016-01-23T18:59:28Z

It's very possible I'm just being paranoid. I'm just worried by the acknowledgment of reading GPL code for functionality that's being updated.

StefanKarpinski · 2016-01-23T20:29:11Z

This code is completely unrelated to the GPL code that was posted in that thread and is fine legally.

rfourquet · 2016-01-24T09:51:06Z

I opened an issue JuliaLang/LinearAlgebra.jl#301 for the above mentioned test failure, which AFAICS is unrelated to this PR. By commenting out the failing line, all the tests pass locally. Should I comment it out here to have green tests, or rather wait for that other issue to be fixed first (I'm helpless about it) ?

rfourquet · 2016-01-24T11:09:14Z

So I just pushed a new version (with the failing test commented out), as I updated slightly the rest of the code.

ViralBShah · 2016-01-24T11:54:36Z

I will take a look at the failing test. Let's give a couple of days to see if we can sort it out.

speed-up randperm, randcycle and shuffle

ViralBShah · 2016-01-27T10:03:23Z

@rfourquet Do we have an open issue on range generation that should replace this stopgap solution? Would be great - just to serve as a reminder.

rfourquet · 2016-01-27T13:52:20Z

I don't think there is an open issue, but I had this in mind for a while (since some random generation functions became faster). Actually, I intended to try that on top of a 1+ year old branch which accelerates random generation of integers, but which depends on a solution like #9174 (btw, something like #9174 (comment) may also be useful for #14536), as I don't want rand(Int) to allocate in some cases.

StefanKarpinski · 2016-01-27T15:28:27Z

@rfourquet, would you mind opening an issue saying summarizing this state just to keep track of this?

rfourquet · 2016-01-29T05:36:26Z

I could do so yes, but not sure if it makes sense as there is no real issue or bug, besides a slight duplication of the code. To be honest, I'm not certain that we can improve the efficiency of random generation on ranges in a general way, such that we can use it for randperm (i.e. having the general solution as fast as the optimized version for randperm).

So I wait for your confirmation if you think it's worth having this issue, in the lines of "rand(Int) can be sped up by using an array in MersenneTwister, this depends on avoiding pointer_to_array allocations, and then check if this can be used to improve rand on a ranges".

ViralBShah · 2016-01-29T11:22:31Z

Avoiding pointer_to_array is definitely the first thing. Should we rebase that PR and see if we can get it merged? I think it is worth trying.

GunnarFarneback · 2016-06-16T14:34:16Z

While not breaking, this change produces different shuffles and permutations than 0.4, which can be inconvenient when you want to verify that your code gives the same results on 0.5 or compare the speed. I think it would be helpful to mention this somewhere in NEWS.md.

tkelman · 2016-08-03T11:27:15Z

@GunnarFarneback could you propose a wording for such a mention?

GunnarFarneback · 2016-08-08T18:37:46Z

Something like "shuffle, shuffle!, randperm, and randcycle have been switched to faster algorithms. As a side-effect they produce different results compared to 0.4 even if the random seed is the same."

The reference to the random seed is a simplification of the actual interaction with the RNG but should be good enough to convey the message that results will become different.

rfourquet force-pushed the rf/randperm/rand_lt branch from 07581a2 to 4ae6345 Compare January 23, 2016 15:19

kshyatt added the performance Must go faster label Jan 23, 2016

rfourquet mentioned this pull request Jan 24, 2016

cond (and normestinv) return not consistent values JuliaLang/LinearAlgebra.jl#301

Closed

speed-up randperm, randcycle and shuffle

fd93d45

rfourquet force-pushed the rf/randperm/rand_lt branch from 4ae6345 to fd93d45 Compare January 24, 2016 11:08

ViralBShah added a commit that referenced this pull request Jan 27, 2016

Merge pull request #14772 from rfourquet/rf/randperm/rand_lt

5ac7f75

speed-up randperm, randcycle and shuffle

ViralBShah merged commit 5ac7f75 into JuliaLang:master Jan 27, 2016

rfourquet deleted the rf/randperm/rand_lt branch January 29, 2016 05:36

rfourquet added the randomness Random number generation and the Random stdlib label May 14, 2017

rfourquet mentioned this pull request Jun 29, 2017

faster rand(::MersenneTwister, UnitRange{<:BitInteger}) (fix #22600) #22612

Merged

stevengj mentioned this pull request Jun 29, 2023

unify shuffle and randperm #50318

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

speed-up randperm, randcycle and shuffle #14772

speed-up randperm, randcycle and shuffle #14772

rfourquet commented Jan 23, 2016

rfourquet commented Jan 23, 2016

johnmyleswhite commented Jan 23, 2016

ViralBShah commented Jan 23, 2016

johnmyleswhite commented Jan 23, 2016

ViralBShah commented Jan 23, 2016

johnmyleswhite commented Jan 23, 2016

StefanKarpinski commented Jan 23, 2016

rfourquet commented Jan 24, 2016

rfourquet commented Jan 24, 2016

ViralBShah commented Jan 24, 2016

ViralBShah commented Jan 27, 2016

rfourquet commented Jan 27, 2016

StefanKarpinski commented Jan 27, 2016

rfourquet commented Jan 29, 2016

ViralBShah commented Jan 29, 2016

GunnarFarneback commented Jun 16, 2016

tkelman commented Aug 3, 2016

GunnarFarneback commented Aug 8, 2016

speed-up randperm, randcycle and shuffle #14772

speed-up randperm, randcycle and shuffle #14772

Conversation

rfourquet commented Jan 23, 2016

rfourquet commented Jan 23, 2016

johnmyleswhite commented Jan 23, 2016

ViralBShah commented Jan 23, 2016

johnmyleswhite commented Jan 23, 2016

ViralBShah commented Jan 23, 2016

johnmyleswhite commented Jan 23, 2016

StefanKarpinski commented Jan 23, 2016

rfourquet commented Jan 24, 2016

rfourquet commented Jan 24, 2016

ViralBShah commented Jan 24, 2016

ViralBShah commented Jan 27, 2016

rfourquet commented Jan 27, 2016

StefanKarpinski commented Jan 27, 2016

rfourquet commented Jan 29, 2016

ViralBShah commented Jan 29, 2016

GunnarFarneback commented Jun 16, 2016

tkelman commented Aug 3, 2016

GunnarFarneback commented Aug 8, 2016