generate benchmark input in device #10109
rapids-bot[bot] merged 119 commits into rapidsai:branch-22.04
Codecov Report
@@              Coverage Diff               @@
##           branch-22.04   #10109    +/-   ##
================================================
+ Coverage        86.13%   86.16%   +0.02%
================================================
  Files              139      139
  Lines            22438    22447       +9
================================================
+ Hits             19328    19341      +13
+ Misses            3110     3106       -4
rerun tests
vyasr left a comment:
Aside from my request for documenting the normal/binomial approximation, I think this is good enough to merge on my end. Thanks again!
@karthikeyann pointed out that the binomial has been removed; I think I was being linked back to old versions of the code when verifying those parts. I think we're set.
vuule left a comment:
Concern about the geometric distribution, plus some nitpicks.
    [lower_bound, upper_bound, dist = make_normal_dist(diffType{0}, upper_bound - lower_bound)](
      thrust::minstd_rand& engine, size_t size) -> rmm::device_uvector<T> {
      rmm::device_uvector<T> result(size, rmm::cuda_stream_default);
      thrust::tabulate(thrust::device,
                       result.begin(),
                       result.end(),
                       abs_value_generator{lower_bound, upper_bound, engine, dist});
I don't think this emulates a geometric distribution. At least, it didn't add up for some sample lower_bound and upper_bound values I tried on paper.
AFAICT we need a normal distribution with mean = 0, so we can use abs to make half of the bell. Then these values need to be moved/inverted so that the tip of the bell is at lower_bound, and probability falls towards upper_bound.
We can maybe leave this as TODO, but it might affect benchmarks in the meantime.
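The fold-and-shift idea described above can be sketched on the host in plain C++. This is a minimal sketch under stated assumptions, not the code that was merged: the helper name folded_normal_sample, the choice sigma = (upper - lower) / 4, and rejecting draws that land past upper_bound (instead of clamping them) are all illustrative. Like the reviewer's proposal, it only approximates a geometric-like decay: a half-normal density falls off as exp(-x^2) rather than exponentially.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper (not the cuDF API): draws n values in [lower, upper] from a
// half-normal ("folded") distribution whose peak is at `lower` and whose density
// decays toward `upper`.
std::vector<std::int64_t> folded_normal_sample(std::int64_t lower, std::int64_t upper,
                                               std::size_t n, unsigned seed) {
  assert(lower <= upper);
  std::vector<std::int64_t> out;
  out.reserve(n);
  if (lower == upper) {  // degenerate range: normal_distribution needs sigma > 0
    out.assign(n, lower);
    return out;
  }
  double const range = static_cast<double>(upper - lower);
  std::minstd_rand engine(seed);
  // Assumption: sigma = range / 4, so about 95% of draws land in the lower half of the range.
  std::normal_distribution<double> dist(0.0, range / 4.0);  // mean 0, as the review suggests
  while (out.size() < n) {
    double const offset = std::abs(dist(engine));  // fold: keep one half of the bell
    if (offset > range) continue;                   // reject draws past upper_bound
    out.push_back(lower + static_cast<std::int64_t>(offset));  // move the tip to lower
  }
  return out;
}
```

Rejection keeps every value inside [lower, upper] without piling extra probability mass onto upper_bound, which clamping would do.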
Updated: added a geometric distribution.
rerun tests
vuule left a comment:
Thank you for addressing all review comments!
Looks 🔥 🔥
@gpucibot merge
Thank you @vuule, @vyasr and @davidwendt for reviewing this big PR! 💯
To speed up benchmark input generation, move all data generation to the device.
Addresses #5773 (comment).
This PR moves the random input generation to the device.
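Moving generation to the device requires that each output element can be computed independently of the others. One common pattern, sketched here in host-only C++ so it runs without a GPU, is to give every index its own copy of the engine and advance it with discard(i) before drawing; a thrust::tabulate call over a functor like this can then fill the output in parallel. The names indexed_generator and host_tabulate are hypothetical, and we assume, without the PR confirming it, that the device generators follow a similar idea. std::minstd_rand::discard may step through all i values one by one, so this host version is for illustration only.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical functor: element i is computed from a private copy of the base
// engine advanced to position i, so no state is shared between calls.
struct indexed_generator {
  std::minstd_rand base;  // copied per call; never mutated

  std::uint32_t operator()(std::size_t i) const {
    std::minstd_rand engine = base;  // private copy for this element
    engine.discard(i);               // skip ahead to the i-th value of the stream
    return static_cast<std::uint32_t>(engine());
  }
};

// Hypothetical host stand-in for thrust::tabulate: out[i] = gen(i).
std::vector<std::uint32_t> host_tabulate(std::size_t n, indexed_generator const& gen) {
  std::vector<std::uint32_t> out(n);
  for (std::size_t i = n; i-- > 0;) {  // deliberately reversed: order does not matter
    out[i] = gen(i);
  }
  return out;
}
```

Because each element depends only on its index, the result matches drawing the same stream sequentially, regardless of the order (or parallelism) in which indices are visited.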
The rest of the original work in this PR was split into multiple PRs, which have been merged:
#10277
#10278
#10279
#10280
#10281
#10300
With all of these changes, a single iteration of all benchmarks runs in under 1000 seconds (down from 3067 s to 964 s).
Running more iterations should benefit even more, because the benchmark is restarted several times during a run, and each restart calls the benchmark input generation code again.
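For scale, the single-iteration timings quoted above correspond to roughly a 3.2x speedup, or about a 69% reduction in wall time:

$$
\frac{3067\,\text{s}}{964\,\text{s}} \approx 3.18,
\qquad
\frac{3067 - 964}{3067} = \frac{2103}{3067} \approx 0.686
$$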
closes #9857