Commit 9faa910

Minor README updates.
1 parent 40dd226 commit 9faa910

1 file changed

+15
-5
lines changed

README.md

Lines changed: 15 additions & 5 deletions
@@ -14,18 +14,28 @@ Terry Sun; Arch Linux, Intel i5-4670, GTX 750
1414

1515
![](images/nbody_perf_plot.png)
1616

17+
The graph shows time taken (in ms) to update one frame at block sizes from 16
18+
to 1024 in steps of 8, for various values of N (planets in the system).
19+
1720
I measured performance by disabling visualization and using CUDA events (`cudaEvent_t`) to time
1821
the kernel invocations (measuring the time elapsed for both `kernUpdateVelPos`
19-
and `kernUpdateAcc`). The graph shows time elapsed (in ms) to update one frame
20-
at block sizes from 16 to 1024 in steps of 8.
22+
and `kernUpdateAcc`). The recorded value is an average over 100 frames.
2123

2224
Code for performance measuring can be found on the `performance` branch.
2325

2426
Changing the number of planets, as expected, increases the time elapsed for the
25-
kernels, due to a for-loop in the acceleration calculation (which increases
26-
linearly by the number of total planets in the system. More interestingly, it
27+
kernels, due to a for-loop in the acceleration calculation (which increases the
28+
time with the number of total planets in the system). More interestingly, it
2729
also changes the way that performance reacts to block size (see n=4096 in the
28-
above plot).
30+
above plot). The difference in performance as block size changes is much greater
31+
with greater N, and also exhibits different behaviors.
32+
33+
At certain block sizes, the time per frame sharply decreases, such as at N=4096,
34+
block size=1024, 512, 256, 128. These are points where each block would be
35+
saturated (i.e., no threads are started that are not needed).
36+
37+
I have no idea what's going on with the spikes peaking at N=4096, block size~800
38+
or N=3072, block size~600.
2939

3040
# Part2: An Even More Basic Matrix Library
3141
