faster randn by separating out unlikely branch in a function #9132
All credits to @ViralBShah (cf. #8941 and #9126). This change probably allows better inlining.
616e1d7 to b99ea92
The progress here has been really amazing to see.
👍 from another fan looking forward to taking advantage of this.
Awesome! Hope this paves the way for vectorizing randn.
The Travis failure seems unrelated; it is in bitarray.
I can confirm the speedup.
@rfourquet Do you think for the array version of |
That said, |
No: I did this a few days ago (I think this improves locality and makes it possible to avoid the bounds check in |
Sorry about that. The |
I was actually happy that there is no need for a specialized version (however, this could be different on another computer; I'll keep in mind to test this when I have a chance; on mine this makes |
Yes, that seems negligible. If you have a branch, I can try it out on mine too. |
I investigated further and observed (cf. my branch rf/randn-fillarray,
However, I had a hard time getting reproducible timings: I found that the efficiency of a function may depend on what has been called (compiled?) before. To see that, check out the first commit of this branch, then
The difference disappears with the second commit where I duplicate the |
Separate out the unlikely branch in randn() into its own function and provide a fast path for the common case.
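For readers unfamiliar with the pattern, here is a minimal sketch of what "separating out the unlikely branch" can look like. It is not the code from this PR: it uses the Marsaglia polar method as a stand-in sampler, and the names `polar_randn` and `polar_randn_slow` are hypothetical. The idea is only that the common case stays in a small, loop-free function that the compiler can inline, while the rare case lives in a separate function marked `@noinline`.

```julia
using Random

# Rare path (hypothetical helper): the retry loop is kept out of line
# so that the caller below stays small.
@noinline function polar_randn_slow(rng::AbstractRNG)
    while true
        x = 2.0 * rand(rng) - 1.0
        y = 2.0 * rand(rng) - 1.0
        s = x * x + y * y
        if 0.0 < s < 1.0
            return x * sqrt(-2.0 * log(s) / s)
        end
    end
end

# Common path: a single attempt with no loop. Only when the point falls
# outside the unit disk do we pay for a call into the slow path.
@inline function polar_randn(rng::AbstractRNG)
    x = 2.0 * rand(rng) - 1.0
    y = 2.0 * rand(rng) - 1.0
    s = x * x + y * y
    if 0.0 < s < 1.0
        return x * sqrt(-2.0 * log(s) / s)
    end
    return polar_randn_slow(rng)
end

rng = MersenneTwister(1)
sample = polar_randn(rng)
```

In this toy sampler the slow path is taken fairly often (whenever the point misses the unit disk), so the benefit would be smaller than for a sampler whose slow branch is rarely hit; whether the split pays off on a given machine is something to measure rather than assume.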
I have backported this to 0.3. It gives a significant and noticeable speedup and the stream of numbers is exactly the same. Cc: @ivarne
Great!
All credits to @ViralBShah (cf. #8941 and #9126).
This change probably allows better inlining.
On my machine this gets 35% faster. With the 40-50% of yesterday, this sums up to almost twice as fast :)