Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

faster randn by separating out unlikely branch in a function #9132

Merged
merged 1 commit into from
Nov 25, 2014

Conversation

rfourquet
Copy link
Member

All credits to @ViralBShah (cf. #8941 and #9126).
This change probably allows better inlining.

On my machine this gets 35% faster. With the 40-50% of yesterday, this sums up to almost twice as fast :)

All credits to @ViralBShah (cf. #8941 and #9126).
This change probably allows better inlining.
@timholy
Copy link
Member

timholy commented Nov 24, 2014

The progress here has been really amazing to see.

@garborg
Copy link
Contributor

garborg commented Nov 24, 2014

👍 from another fan looking forward to taking advantage of this.

@ViralBShah
Copy link
Member

Awesome! Hope this paves the way for vectorizing randn.

@ViralBShah
Copy link
Member

The travis failure seems unrelated - it is in bitarray.

ViralBShah added a commit that referenced this pull request Nov 25, 2014
faster randn by separating out unlikely branch in a function
@ViralBShah ViralBShah merged commit a02fdc7 into master Nov 25, 2014
@ViralBShah ViralBShah deleted the rf/randn-splitbranch branch November 25, 2014 03:26
@ViralBShah
Copy link
Member

I verify the speedup. randn is now twice as fast as it was 2 days ago!

@ViralBShah ViralBShah added the randomness Random number generation and the Random stdlib label Nov 25, 2014
@ViralBShah
Copy link
Member

@rfourquet Do you think for the array version of randn, we can get a speedup by using an array fill version of rand_ui52, and then doing the rest of the work in a loop, rather than calling randn repeatedly?

@ViralBShah
Copy link
Member

That said, randn(10^7) is only 2.5x slower than rand(10^7) currently, which seems pretty good to me.

@rfourquet
Copy link
Member Author

No: I did this few days ago (I think this improves locality and allows to avoid the bounds-check in rand_ui52) and it was pretty good, but before I could do a PR, you added @inline to randn and this completely obsoleted this optimization!

@ViralBShah
Copy link
Member

Sorry about that. The @inline stuff works really well. Once the unlikely branch was refactored, this seemed like the obvious thing to do.

@rfourquet
Copy link
Member Author

I was actually happy that there is no need of specialized version (however, this could be different on another computer, I'll keep in mind to test this when I have a chance; on mine this makes randn! faster by about 5% which I thought was negligible.)

@ViralBShah
Copy link
Member

Yes, that seems negligible. If you have a branch, I can try it out on mine too.

@rfourquet
Copy link
Member Author

I investigated more, and then I observe (cf. my branch rf/randn-fillarray, include("randn.jl"); warmup0(); [arrN(10^i, 10^(8-i)) for i in 1:8])

  1. a slow-down of about 20% for arrays of length n = 10, 100
  2. a speedup of 20% for n = 10^3 to 10^6
  3. a speedup of 5% for n = 10^7 and 10% for n = 10^8

However I had a hard time getting reproducible timings: I found that the efficiency of a function may depend on what have been called (compiled?) before. To see that, checkout the first commit of this branch, then

include("randn.jl")
warmup1(); # call arrN first 
arrN(10^8) # -> 2.612355766
arrN0(10^8) # -> 0.503510284  
scalN(10^8) # -> 2.593528138
# restart
include("randn.jl")
warmup2(); # call arrN0 first
arrN(10^8) # -> 0.763035464
arrN0(10^8) # -> 0.490026425
scalN(10^8) # -> 0.643273084

The difference disappears with the second commit where I duplicate the randn function... I would love if someone can shed some light on this mysterious behavior.

ViralBShah added a commit that referenced this pull request Dec 23, 2014
Separate out the unlikely branch in randn() into its own function
and provide a fast path for the common case.
@ViralBShah
Copy link
Member

I have backported this to 0.3. It gives a significant and noticeable speedup and the stream of numbers is exactly the same.

Cc: @ivarne

@ivarne
Copy link
Member

ivarne commented Dec 23, 2014

Great!

@rfourquet rfourquet added the performance Must go faster label Oct 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster randomness Random number generation and the Random stdlib
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants