Split out level 3 gemm tests #2610
Open: kshyatt wants to merge 1 commit into master from ksh/gemm
CUDA.jl Benchmarks
Benchmark suite | Current: ddac187 | Previous: a0c2f4b | Ratio (current / previous) |
---|---|---|---|
latency/precompile | 45404021171 ns | 45297385295 ns | 1.00 |
latency/ttfp | 6388606471 ns | 6375596178 ns | 1.00 |
latency/import | 3040221123 ns | 3036561495 ns | 1.00 |
integration/volumerhs | 9565588 ns | 9567419 ns | 1.00 |
integration/byval/slices=1 | 146893 ns | 146746 ns | 1.00 |
integration/byval/slices=3 | 425698 ns | 425517.5 ns | 1.00 |
integration/byval/reference | 144994 ns | 145010 ns | 1.00 |
integration/byval/slices=2 | 286459 ns | 286216 ns | 1.00 |
integration/cudadevrt | 103520 ns | 103513 ns | 1.00 |
kernel/indexing | 14218 ns | 14419 ns | 0.99 |
kernel/indexing_checked | 15449 ns | 15499 ns | 1.00 |
kernel/occupancy | 728.203125 ns | 748.2734375 ns | 0.97 |
kernel/launch | 2109.4 ns | 2194.6666666666665 ns | 0.96 |
kernel/rand | 14991 ns | 17335 ns | 0.86 |
array/reverse/1d | 19743 ns | 19412 ns | 1.02 |
array/reverse/2d | 25140 ns | 24576 ns | 1.02 |
array/reverse/1d_inplace | 11280 ns | 11029 ns | 1.02 |
array/reverse/2d_inplace | 13218 ns | 13223 ns | 1.00 |
array/copy | 20708 ns | 20740 ns | 1.00 |
array/iteration/findall/int | 158377 ns | 158179 ns | 1.00 |
array/iteration/findall/bool | 138671 ns | 138583 ns | 1.00 |
array/iteration/findfirst/int | 153822.5 ns | 153423 ns | 1.00 |
array/iteration/findfirst/bool | 154577.5 ns | 154821 ns | 1.00 |
array/iteration/scalar | 75766.5 ns | 77451 ns | 0.98 |
array/iteration/logical | 213464 ns | 216735 ns | 0.98 |
array/iteration/findmin/1d | 41245 ns | 41556.5 ns | 0.99 |
array/iteration/findmin/2d | 94061 ns | 94128 ns | 1.00 |
array/reductions/reduce/1d | 35422 ns | 42013 ns | 0.84 |
array/reductions/reduce/2d | 41072.5 ns | 51911 ns | 0.79 |
array/reductions/mapreduce/1d | 33410 ns | 39275 ns | 0.85 |
array/reductions/mapreduce/2d | 41182.5 ns | 49505.5 ns | 0.83 |
array/broadcast | 21668 ns | 21668 ns | 1.00 |
array/copyto!/gpu_to_gpu | 13525 ns | 11569 ns | 1.17 |
array/copyto!/cpu_to_gpu | 211822 ns | 211873 ns | 1.00 |
array/copyto!/gpu_to_cpu | 245148.5 ns | 245423 ns | 1.00 |
array/accumulate/1d | 108822 ns | 108388.5 ns | 1.00 |
array/accumulate/2d | 79771 ns | 79823 ns | 1.00 |
array/construct | 1117.05 ns | 1208.35 ns | 0.92 |
array/random/randn/Float32 | 43263 ns | 43873.5 ns | 0.99 |
array/random/randn!/Float32 | 26118 ns | 25937 ns | 1.01 |
array/random/rand!/Int64 | 27149 ns | 27271 ns | 1.00 |
array/random/rand!/Float32 | 8683.333333333334 ns | 8766.666666666666 ns | 0.99 |
array/random/rand/Int64 | 29975 ns | 29637 ns | 1.01 |
array/random/rand/Float32 | 12857 ns | 12723 ns | 1.01 |
array/permutedims/4d | 66803 ns | 66923 ns | 1.00 |
array/permutedims/2d | 56890 ns | 56439 ns | 1.01 |
array/permutedims/3d | 59106 ns | 58867 ns | 1.00 |
array/sorting/1d | 2919614 ns | 2933352 ns | 1.00 |
array/sorting/by | 3499898 ns | 3500830 ns | 1.00 |
array/sorting/2d | 1084142 ns | 1085059 ns | 1.00 |
cuda/synchronization/stream/auto | 1028.3 ns | 1038.4 ns | 0.99 |
cuda/synchronization/stream/nonblocking | 6556.8 ns | 6432 ns | 1.02 |
cuda/synchronization/stream/blocking | 794 ns | 807.5918367346939 ns | 0.98 |
cuda/synchronization/context/auto | 1212.5 ns | 1194.1 ns | 1.02 |
cuda/synchronization/context/nonblocking | 6758.6 ns | 6649.8 ns | 1.02 |
cuda/synchronization/context/blocking | 901.3823529411765 ns | 886.6415094339623 ns | 1.02 |
This comment was automatically generated by workflow using github-action-benchmark.
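The Ratio column can be recomputed from the two timing columns. Checking a few rows shows it is the current time divided by the previous time, rounded to two decimals, so values below 1.00 mean the current commit ran faster. The following is a minimal Python sketch under that assumption. It recomputes the ratio for a handful of rows copied from the table and flags slowdowns against a hypothetical threshold of 1.10. That threshold was chosen only for illustration and is not taken from the CUDA.jl or github-action-benchmark configuration.

```python
# Sketch: recompute the "Ratio" column (current / previous) for a few rows
# copied from the benchmark table, and flag rows that got slower.
# REGRESSION_THRESHOLD is a hypothetical value for illustration only; it is
# not read from any CUDA.jl or github-action-benchmark configuration.
REGRESSION_THRESHOLD = 1.10

# (benchmark name, current time in ns, previous time in ns), from the table
ROWS = [
    ("kernel/rand", 14991, 17335),
    ("array/reductions/reduce/2d", 41072.5, 51911),
    ("array/copyto!/gpu_to_gpu", 13525, 11569),
    ("array/broadcast", 21668, 21668),
]


def ratio(current_ns: float, previous_ns: float) -> float:
    """Return current/previous rounded to two decimals; below 1.0 means faster."""
    return round(current_ns / previous_ns, 2)


def flag_regressions(rows, threshold=REGRESSION_THRESHOLD):
    """Return the names whose ratio exceeds the threshold (i.e. got slower)."""
    return [name for name, cur, prev in rows if ratio(cur, prev) > threshold]


if __name__ == "__main__":
    for name, cur, prev in ROWS:
        print(f"{name}: {ratio(cur, prev):.2f}")
    print("flagged:", flag_regressions(ROWS))
```

With this assumed threshold, only array/copyto!/gpu_to_gpu (1.17) would be flagged. The reduction benchmarks, at 0.79 to 0.85, got faster.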
Failure seems related:
Testing locally, the remaining level 3 tests and the split-out level 3 GEMM-y tests take about the same amount of time, so this split should help with parallelization. Also removed an extraneous comment.