Benchmark Results #1

Open
GiggleLiu opened this issue Nov 4, 2018 · 3 comments

@GiggleLiu

9-qubit QCBM circuit with depth 8

Batched Performance

julia> @benchmark zero_state(n, 1000) |> cu |> $(qcbm.circuit) seconds = 2
BenchmarkTools.Trial:
  memory estimate:  17.13 MiB
  allocs estimate:  15549
  --------------
  minimum time:     9.164 ms (0.00% GC)
  median time:      78.108 ms (2.70% GC)
  mean time:        76.510 ms (7.66% GC)
  maximum time:     105.105 ms (91.49% GC)
  --------------
  samples:          27
  evals/sample:     1

julia> @benchmark zero_state(n, 1000) |> $(cqcbm.circuit) seconds = 2
BenchmarkTools.Trial:
  memory estimate:  8.02 MiB
  allocs estimate:  2478
  --------------
  minimum time:     868.712 ms (0.00% GC)
  median time:      938.671 ms (0.00% GC)
  mean time:        926.054 ms (0.08% GC)
  maximum time:     970.780 ms (0.24% GC)
  --------------
  samples:          3
  evals/sample:     1

Single Run Performance

julia> @benchmark zero_state(n) |> cu |> $(qcbm.circuit) seconds = 2
BenchmarkTools.Trial:
  memory estimate:  1.50 MiB
  allocs estimate:  15293
  --------------
  minimum time:     3.071 ms (0.00% GC)
  median time:      3.295 ms (0.00% GC)
  mean time:        3.750 ms (8.93% GC)
  maximum time:     10.285 ms (54.88% GC)
  --------------
  samples:          531
  evals/sample:     1

julia> @benchmark zero_state(n) |> $(cqcbm.circuit) seconds = 2
BenchmarkTools.Trial:
  memory estimate:  234.52 KiB
  allocs estimate:  2781
  --------------
  minimum time:     217.369 μs (0.00% GC)
  median time:      222.433 μs (0.00% GC)
  mean time:        292.978 μs (18.22% GC)
  maximum time:     8.223 ms (96.29% GC)
  --------------
  samples:          6768
  evals/sample:     1

Platform

CPU:
Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz

GPU:
Nvidia GeForce 940MX

@GiggleLiu

Another benchmark on Nvidia P100

julia> @benchmark zero_state(n, 1000) |> cu |> $(qcbm.circuit) seconds = 2
BenchmarkTools.Trial:
  memory estimate:  16.70 MiB
  allocs estimate:  7278
  --------------
  minimum time:     4.623 ms (0.00% GC)
  median time:      10.226 ms (8.24% GC)
  mean time:        11.168 ms (9.86% GC)
  maximum time:     81.029 ms (89.50% GC)
  --------------
  samples:          180
  evals/sample:     1

julia> @benchmark zero_state(n, 1000) |> $(cqcbm.circuit) seconds = 2
BenchmarkTools.Trial:
  memory estimate:  8.02 MiB
  allocs estimate:  2478
  --------------
  minimum time:     345.571 ms (0.00% GC)
  median time:      360.031 ms (0.00% GC)
  mean time:        358.910 ms (0.70% GC)
  maximum time:     369.374 ms (4.10% GC)
  --------------
  samples:          6
  evals/sample:     1

julia> @benchmark zero_state(n) |> cu |> $(qcbm.circuit) seconds = 2
BenchmarkTools.Trial:
  memory estimate:  1.07 MiB
  allocs estimate:  7121
  --------------
  minimum time:     1.597 ms (0.00% GC)
  median time:      1.743 ms (0.00% GC)
  mean time:        1.957 ms (8.67% GC)
  maximum time:     77.709 ms (96.39% GC)
  --------------
  samples:          1021
  evals/sample:     1

julia> @benchmark zero_state(n) |> $(cqcbm.circuit) seconds = 2
BenchmarkTools.Trial:
  memory estimate:  234.52 KiB
  allocs estimate:  2781
  --------------
  minimum time:     205.896 μs (0.00% GC)
  median time:      212.959 μs (0.00% GC)
  mean time:        247.828 μs (13.21% GC)
  maximum time:     75.570 ms (99.60% GC)
  --------------
  samples:          8002
  evals/sample:     1

Platform

CPU:
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                48
On-line CPU(s) list:   0-47
Thread(s) per core:    2
Core(s) per socket:    12
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Stepping:              1
CPU MHz:               2523.984
BogoMIPS:              4401.45
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              30720K
NUMA node0 CPU(s):     0-11,24-35
NUMA node1 CPU(s):     12-23,36-47


GPU:
Model: 		 Tesla P100-PCIE-12GB
IRQ:   		 74
GPU UUID: 	 GPU-a78e4979-19e4-4d0e-ebc7-66348ddd11b3
Video BIOS: 	 86.00.3a.00.02
Bus Type: 	 PCIe
DMA Size: 	 47 bits
DMA Mask: 	 0x7fffffffffff
Bus Location: 	 0000:04:00.0
Device Minor: 	 0

@GiggleLiu

Another benchmark on Nvidia Tesla M40

julia> @benchmark zero_state(n, 1000) |> cu |> $(qcbm.circuit) seconds = 2
BenchmarkTools.Trial:
  memory estimate:  16.70 MiB
  allocs estimate:  7278
  --------------
  minimum time:     4.713 ms (0.00% GC)
  median time:      12.068 ms (7.94% GC)
  mean time:        12.484 ms (9.53% GC)
  maximum time:     80.318 ms (91.10% GC)
  --------------
  samples:          161
  evals/sample:     1


julia> @benchmark zero_state(n, 1000) |> $(cqcbm.circuit) seconds = 2
BenchmarkTools.Trial:
  memory estimate:  8.02 MiB
  allocs estimate:  2478
  --------------
  minimum time:     382.711 ms (0.00% GC)
  median time:      384.631 ms (0.00% GC)
  mean time:        386.760 ms (0.65% GC)
  maximum time:     396.166 ms (3.78% GC)
  --------------
  samples:          6
  evals/sample:     1


julia> @benchmark zero_state(n) |> cu |> $(qcbm.circuit) seconds = 2
BenchmarkTools.Trial:
  memory estimate:  1.07 MiB
  allocs estimate:  7121
  --------------
  minimum time:     1.620 ms (0.00% GC)
  median time:      1.674 ms (0.00% GC)
  mean time:        1.900 ms (8.67% GC)
  maximum time:     77.474 ms (96.30% GC)
  --------------
  samples:          1051
  evals/sample:     1


julia> @benchmark zero_state(n) |> $(cqcbm.circuit) seconds = 2
BenchmarkTools.Trial:
  memory estimate:  234.52 KiB
  allocs estimate:  2781
  --------------
  minimum time:     209.335 μs (0.00% GC)
  median time:      216.347 μs (0.00% GC)
  mean time:        251.826 μs (13.06% GC)
  maximum time:     75.510 ms (99.59% GC)
  --------------
  samples:          7876
  evals/sample:     1

Platform

CPU:
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                24
On-line CPU(s) list:   0-23
Thread(s) per core:    1
Core(s) per socket:    12
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Stepping:              1
CPU MHz:               1200.031
BogoMIPS:              4401.55
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              30720K
NUMA node0 CPU(s):     0-11
NUMA node1 CPU(s):     12-23

GPU:
Model: 		 Tesla M40
IRQ:   		 74
GPU UUID: 	 GPU-????????-????-????-????-????????????
Video BIOS: 	 ??.??.??.??.??
Bus Type: 	 PCIe
DMA Size: 	 40 bits
DMA Mask: 	 0xffffffffff
Bus Location: 	 0000:04:00.0
Device Minor: 	 0

@GiggleLiu

Another benchmark on Nvidia Titan V

julia> @benchmark zero_state(n, 1000) |> cu |> $(qcbm.circuit) seconds = 2
BenchmarkTools.Trial:
  memory estimate:  16.70 MiB
  allocs estimate:  7278
  --------------
  minimum time:     4.884 ms (0.00% GC)
  median time:      6.476 ms (17.07% GC)
  mean time:        6.986 ms (18.01% GC)
  maximum time:     110.769 ms (94.64% GC)
  --------------
  samples:          286
  evals/sample:     1

julia> @benchmark zero_state(n, 1000) |> $(cqcbm.circuit) seconds = 2
BenchmarkTools.Trial:
  memory estimate:  8.02 MiB
  allocs estimate:  2478
  --------------
  minimum time:     396.184 ms (0.00% GC)
  median time:      397.585 ms (0.00% GC)
  mean time:        401.956 ms (1.01% GC)
  maximum time:     418.597 ms (4.83% GC)
  --------------
  samples:          5
  evals/sample:     1

julia> @benchmark zero_state(n) |> cu |> $(qcbm.circuit) seconds = 2
BenchmarkTools.Trial:
  memory estimate:  1.07 MiB
  allocs estimate:  7121
  --------------
  minimum time:     1.635 ms (0.00% GC)
  median time:      2.631 ms (0.00% GC)
  mean time:        3.069 ms (11.31% GC)
  maximum time:     114.555 ms (96.01% GC)
  --------------
  samples:          651
  evals/sample:     1

julia> @benchmark zero_state(n) |> $(cqcbm.circuit) seconds = 2
BenchmarkTools.Trial:
  memory estimate:  234.52 KiB
  allocs estimate:  2781
  --------------
  minimum time:     236.792 μs (0.00% GC)
  median time:      453.363 μs (0.00% GC)
  mean time:        523.526 μs (13.50% GC)
  maximum time:     109.883 ms (99.48% GC)
  --------------
  samples:          3792
  evals/sample:     1

Platform

CPU:
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                48
On-line CPU(s) list:   0-47
Thread(s) per core:    2
Core(s) per socket:    12
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
Stepping:              1
CPU MHz:               2499.921
CPU max MHz:           2900.0000
CPU min MHz:           1200.0000
BogoMIPS:              4401.27
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              30720K
NUMA node0 CPU(s):     0-11,24-35
NUMA node1 CPU(s):     12-23,36-47


GPU:
Model: 		 TITAN V
IRQ:   		 98
GPU UUID: 	 GPU-f04d8db3-bb77-b4ee-cd2e-b666cd0fd0ea
Video BIOS: 	 88.00.41.00.12
Bus Type: 	 PCIe
DMA Size: 	 47 bits
DMA Mask: 	 0x7fffffffffff
Bus Location: 	 0000:04:00.0
Device Minor: 	 0
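
Summary sketch

To compare the four platforms at a glance, the median times reported above can be turned into CPU/GPU ratios. The snippet below is only a rough sketch: it assumes that the `cu |> $(qcbm.circuit)` runs are the GPU path and the `$(cqcbm.circuit)` runs are the CPU path, and the names `results` and `speedup` are made up for this illustration. The numbers are copied from the `median time` lines in this thread and converted to milliseconds; the median is used because the minimum and median GPU times differ a lot on some cards.

```julia
# Median times copied from the BenchmarkTools output in this thread, in milliseconds.
# "batched" rows come from zero_state(n, 1000), "single" rows from zero_state(n).
# Assumption: runs piped through `cu` are GPU runs, the others are CPU runs.
results = [
    (gpu = "GeForce 940MX", batched_gpu = 78.108, batched_cpu = 938.671,
     single_gpu = 3.295, single_cpu = 0.222433),
    (gpu = "Tesla P100", batched_gpu = 10.226, batched_cpu = 360.031,
     single_gpu = 1.743, single_cpu = 0.212959),
    (gpu = "Tesla M40", batched_gpu = 12.068, batched_cpu = 384.631,
     single_gpu = 1.674, single_cpu = 0.216347),
    (gpu = "TITAN V", batched_gpu = 6.476, batched_cpu = 397.585,
     single_gpu = 2.631, single_cpu = 0.453363),
]

# Hypothetical helper: a ratio above 1 means the GPU run was faster than the CPU run.
speedup(cpu_ms, gpu_ms) = cpu_ms / gpu_ms

for r in results
    batched = round(speedup(r.batched_cpu, r.batched_gpu), digits = 1)
    single = round(speedup(r.single_cpu, r.single_gpu), digits = 3)
    println(rpad(r.gpu, 14), " batched: ", batched, "x   single: ", single, "x")
end
```

With these numbers the batched ratio comes out above 1 on every card and the single-run ratio below 1, so on this 9-qubit circuit the GPU only pays off when the batch dimension is used.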
