Terry Sun; Arch Linux, Intel i5-4670, GTX 750

![](images/nbody_perf_plot.png)

The graph shows time taken (in ms) to update one frame at block sizes from 16
to 1024 in steps of 8, for various values of N (planets in the system).

I measured performance by disabling visualization and using `cudaEvent_t`s to
time the kernel invocations (measuring the time elapsed for both
`kernUpdateVelPos` and `kernUpdateAcc`). The recorded value is an average over
100 frames.

Code for performance measurement can be found on the `performance` branch.

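A minimal sketch of this kind of `cudaEvent_t` timing loop, for illustration:
the kernel names are the ones mentioned above, but their arguments, the launch
configuration, and the surrounding loop are assumptions, not the actual code on
the `performance` branch.

```cuda
// Hypothetical sketch: argument lists and launch configuration are placeholders.
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

const int numFrames = 100;   // average over 100 frames, as described above
float totalMs = 0.0f;

for (int frame = 0; frame < numFrames; ++frame) {
    cudaEventRecord(start);
    kernUpdateAcc<<<fullBlocksPerGrid, blockSize>>>(N, dev_pos, dev_acc);
    kernUpdateVelPos<<<fullBlocksPerGrid, blockSize>>>(N, dt, dev_pos, dev_vel, dev_acc);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // block until both kernels have finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    totalMs += ms;
}

printf("average time per frame: %.3f ms\n", totalMs / numFrames);

cudaEventDestroy(start);
cudaEventDestroy(stop);
```

Timing with events rather than host wall-clock time keeps visualization and
other host-side work out of the measurement, matching the approach described
above.
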
Changing the number of planets, as expected, increases the time elapsed for the
kernels, due to a for-loop in the acceleration calculation whose cost grows
linearly with the total number of planets in the system. More interestingly, it
also changes the way that performance reacts to block size (see N=4096 in the
above plot): the variation in performance across block sizes is much larger at
higher N, and also exhibits different behavior.

At certain block sizes the time per frame drops sharply, e.g. at N=4096 with
block sizes of 1024, 512, 256, and 128. These are points where N is an exact
multiple of the block size, so every block is saturated (i.e. no unneeded
threads are launched).

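For context on why those block sizes line up: a common way to size the grid
(an assumed launch setup, not taken from this project's source) rounds the
block count up, so no threads are wasted only when N is an exact multiple of
the block size.

```cuda
// Assumed (typical) launch sizing: round up so every planet gets a thread.
int fullBlocksPerGrid = (N + blockSize - 1) / blockSize;

// N = 4096, blockSize = 128 -> 32 blocks, 4096 threads launched,   0 idle
// N = 4096, blockSize = 800 ->  6 blocks, 4800 threads launched, 704 idle
```
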
I have no idea what's going on with the spikes peaking at N=4096, block
size ~800 or N=3072, block size ~600.

# Part 2: An Even More Basic Matrix Library
