-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
speed-up randperm, randcycle and shuffle #14772
Conversation
07581a2
to
4ae6345
Compare
I hadn't run the full tests locally, an easy fix for the failure of travis is to change the value of the array at line 951 of test/sparsedir/sparse.jl. After that, I get a failed test at line 1102 of the same file... I'm not sure why, and don't have time to investigate now (help welcome, I don't know this code). |
Since you've read the R translation in that thread, I fear that we can't merge this code because it's tainted by the GPL. |
This implementation has nothing in common with the posted R solution. Why is it considered tainted? |
As far as I understand the logic of clean room implementations, no one who has read that thread can write code for this function now. By linking to the thread containing GPL code, there's prima facie evidence that the author has read the GPL code. It's possible that an implementation that is provably non-derivative isn't tainted, but it's not clear to me what a judge would think of that argument. |
OK. But this is not even a clean room implementation of what was discussed in that thread. It optimizes Julia's rand functions in the existing Julia implementation of randperm. |
It's very possible I'm just being paranoid. I'm just worried by the acknowledgment of reading GPL code for functionality that's being updated. |
This code is completely unrelated to the GPL code that was posted in that thread and is fine legally. |
I opened an issue JuliaLang/LinearAlgebra.jl#301 for the above mentioned test failure, which AFAICS is unrelated to this PR. By commenting out the failing line, all the tests pass locally. Should I comment it out here to have green tests, or rather wait for that other issue to be fixed first (I'm helpless about it) ? |
4ae6345
to
fd93d45
Compare
So I just pushed a new version (with the failing test commented out), as I updated slightly the rest of the code. |
I will take a look at the failing test. Let's give a couple of days to see if we can sort it out. |
speed-up randperm, randcycle and shuffle
@rfourquet Do we have an open issue on range generation that should replace this stopgap solution? Would be great - just to serve as a reminder. |
I don't think there is an open issue, but I had this in mind for a while (since some random generation functions became faster). Actually, I intended to try that on top of a 1+ year old branch which accelerates random generation of integers, but which depends on a solution like #9174 (btw, something like #9174 (comment) may also be useful for #14536), as I don't want |
@rfourquet, would you mind opening an issue saying summarizing this state just to keep track of this? |
I could do so yes, but not sure if it makes sense as there is no real issue or bug, besides a slight duplication of the code. To be honest, I'm not certain that we can improve the efficiency of random generation on ranges in a general way, such that we can use it for So I wait for your confirmation if you think it's worth having this issue, in the lines of " |
Avoiding |
While not breaking, this change produces different shuffles and permutations than 0.4, which can be inconvenient when you want to verify that your code gives the same results on 0.5 or compare the speed. I think it would be helpful to mention this somewhere in NEWS.md. |
@GunnarFarneback could you propose a wording for such a mention? |
Something like "shuffle, shuffle!, randperm, and randcycle have been switched to faster algorithms. As a side-effect they produce different results compared to 0.4 even if the random seed is the same." The reference to the random seed is a simplification of the actual interaction with the RNG but should be good enough to convey the message that results will become different. |
Cf. https://groups.google.com/forum/#!topic/julia-users/XxYcgNuHtJs.
This uses a less general but more efficient version of random generation within a range.
Note that amortizing the cost of creating
RangeGenerator
objects offers substantial benefits, but insufficient compared to this PR, e.g:randperm(10^7)
takes here 0.92s on master, 0.47s when amortizing, and 0.23 with this PR (and 0.30s with the proposed R-based solution in the julia-users thread).Note: it may be worth revisiting range generation, this PR is more of a stop-gap solution.