faster randn by separating out unlikely branch in a function #9132

rfourquet · 2014-11-24T09:46:06Z

All credits to @ViralBShah (cf. #8941 and #9126).
This change probably allows better inlining.

On my machine this makes randn about 35% faster. Combined with yesterday's 40-50% improvement, randn is now almost twice as fast :)
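For readers unfamiliar with the pattern, here is a minimal sketch of what "separating out the unlikely branch" means. It is not the code from this PR: `sample`, `sample_slow` and `fill_samples!` are hypothetical names, and the toy distribution (a truncated exponential) is an assumption chosen only to keep the example short. The point is that the hot function shrinks to one comparison plus a call, which probably makes it a better inlining candidate.

```julia
using Random

# Hypothetical slow path: rarely taken, kept out of line so that the
# caller stays small.
@noinline function sample_slow(rng::AbstractRNG, x::Float64)
    # Redraw until the value is above the threshold.
    while x <= 0.001
        x = rand(rng)
    end
    return -log(x)
end

# Fast path: a single comparison in the common case.
@inline function sample(rng::AbstractRNG)
    x = rand(rng)
    x > 0.001 && return -log(x)   # common case, returns immediately
    return sample_slow(rng, x)    # unlikely branch, separate function
end

# A loop over the fast path; with `sample` inlined, only the rare case
# pays for a function call.
function fill_samples!(rng::AbstractRNG, A::Vector{Float64})
    @inbounds for i in eachindex(A)
        A[i] = sample(rng)
    end
    return A
end
```

Whether the compiler actually inlines `sample` depends on the Julia version and its heuristics, so any speedup should be measured rather than assumed.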

@ViralBShah

timholy · 2014-11-24T10:42:05Z

The progress here has been really amazing to see.

garborg · 2014-11-24T15:32:14Z

👍 from another fan looking forward to taking advantage of this.

ViralBShah · 2014-11-24T16:26:11Z

Awesome! Hope this paves the way for vectorizing randn.

ViralBShah · 2014-11-25T03:26:00Z

The Travis failure seems unrelated: it is in bitarray.

ViralBShah · 2014-11-25T04:11:34Z

I can confirm the speedup. randn is now twice as fast as it was two days ago!

ViralBShah · 2014-11-27T04:36:13Z

@rfourquet Do you think the array version of randn could be sped up by using an array-fill version of rand_ui52 and then doing the rest of the work in a loop, rather than calling randn repeatedly?

ViralBShah · 2014-11-27T04:42:43Z

That said, randn(10^7) is only 2.5x slower than rand(10^7) currently, which seems pretty good to me.

rfourquet · 2014-11-27T04:51:02Z

No: I tried this a few days ago (I think it improves locality and avoids the bounds check in rand_ui52) and the results were pretty good, but before I could open a PR, you added @inline to randn, which made this optimization completely obsolete!
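The two strategies being compared can be sketched roughly as follows. This is an illustration only, not the code from either branch: `to_unit` is a hypothetical stand-in for the per-element work that randn does, and `fill_scalar!` / `fill_bulk!` are assumed names. The bulk variant fills a buffer of raw integers in one call and then transforms it in a tight loop.

```julia
using Random

# Hypothetical per-element transform standing in for the real randn
# logic: maps the top 53 bits of a UInt64 to a Float64 in [0, 1).
to_unit(u::UInt64) = (u >>> 11) * 2.0^-53

# Variant 1: draw one raw integer per iteration.
function fill_scalar!(rng::AbstractRNG, A::Vector{Float64})
    for i in eachindex(A)
        A[i] = to_unit(rand(rng, UInt64))
    end
    return A
end

# Variant 2: fill a buffer of raw integers first, then transform it.
function fill_bulk!(rng::AbstractRNG, A::Vector{Float64})
    buf = rand!(rng, Vector{UInt64}(undef, length(A)))
    @inbounds for i in eachindex(A)
        A[i] = to_unit(buf[i])
    end
    return A
end
```

The two variants are not guaranteed to produce the same stream of numbers for a given seed, and the bulk variant allocates a temporary buffer, so which one is faster has to be measured for each array size.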

ViralBShah · 2014-11-27T05:13:17Z

Sorry about that. The @inline stuff works really well. Once the unlikely branch was refactored, this seemed like the obvious thing to do.

rfourquet · 2014-11-27T05:31:39Z

I was actually happy that no specialized version is needed. (This could be different on another computer, though, so I'll keep in mind to test it when I have a chance; on mine it makes randn! about 5% faster, which I considered negligible.)

ViralBShah · 2014-11-27T06:40:52Z

Yes, that seems negligible. If you have a branch, I can try it out on mine too.

rfourquet · 2014-11-27T12:49:12Z

I investigated further, and I now observe the following (cf. my branch rf/randn-fillarray: include("randn.jl"); warmup0(); [arrN(10^i, 10^(8-i)) for i in 1:8]):

a slowdown of about 20% for arrays of length n = 10 and n = 100
a speedup of about 20% for n = 10^3 to 10^6
a speedup of 5% for n = 10^7 and 10% for n = 10^8

However, I had a hard time getting reproducible timings: I found that the efficiency of a function may depend on what has been called (compiled?) before. To see this, check out the first commit of this branch, then run:

include("randn.jl")
warmup1(); # call arrN first 
arrN(10^8) # -> 2.612355766
arrN0(10^8) # -> 0.503510284  
scalN(10^8) # -> 2.593528138
# restart
include("randn.jl")
warmup2(); # call arrN0 first
arrN(10^8) # -> 0.763035464
arrN0(10^8) # -> 0.490026425
scalN(10^8) # -> 0.643273084

The difference disappears with the second commit, where I duplicate the randn function... I would love it if someone could shed some light on this mysterious behavior.

ViralBShah · 2014-12-23T19:12:54Z

I have backported this to 0.3. It gives a significant and noticeable speedup and the stream of numbers is exactly the same.

Cc: @ivarne

ivarne · 2014-12-23T19:29:38Z

Great!

faster randn by separating out unlikely branch in a function

b99ea92

All credits to @ViralBShah (cf. #8941 and #9126). This change probably allows better inlining.

rfourquet force-pushed the rf/randn-splitbranch branch from 616e1d7 to b99ea92 Compare November 24, 2014 10:14

ViralBShah added a commit that referenced this pull request Nov 25, 2014

Merge pull request #9132 from JuliaLang/rf/randn-splitbranch

a02fdc7

faster randn by separating out unlikely branch in a function

ViralBShah merged commit a02fdc7 into master Nov 25, 2014

ViralBShah deleted the rf/randn-splitbranch branch November 25, 2014 03:26

ViralBShah added the randomness label (Random number generation and the Random stdlib) Nov 25, 2014

ViralBShah added a commit that referenced this pull request Dec 23, 2014

Backport #9132 for randn performance.

89e004e

Separate out the unlikely branch in randn() into its own function and provide a fast path for the common case.

rfourquet added the performance label (Must go faster) Oct 6, 2020
