
CUDA Introduction

University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 1

  • Bradley Crusco
  • Tested on: Windows 10, i7-3770K @ 3.50GHz 16GB, 2 x GTX 980 4096MB (Personal Computer)

N-Body Simulation

YouTube Video

Performance Analysis

N-Body Performance by Block Size ![](images/N-Body Performance by Block Size.png "N-Body Performance by Block Size")

How does changing the tile and block size affect performance?

The first graph shows that, apart from a slight improvement around block sizes of 384 and 512, performance decreases as the block size increases. I suspect the reason is that in that range the blocks and threads are optimally saturated with computation, producing the improvement. Past that point the blocks and threads are under-saturated, so performance drops as we add overhead the simulation cannot take proper advantage of.
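To make the sweep concrete, here is a minimal sketch of how block size feeds into the launch configuration. `kernUpdateAcc` matches the name used in this write-up, but its signature, the stub body, and the helper below are assumptions for illustration, not the project's actual code.

```cuda
#include <cuda_runtime.h>

// Placeholder for the real kernel; the actual kernUpdateAcc body lives in
// the project source. Signature here is assumed for illustration.
__global__ void kernUpdateAcc(int n, const float3 *pos, float3 *acc) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) { /* body omitted */ }
}

// Hypothetical helper: relaunch the same kernel at several block sizes to
// produce the sweep shown in the graph. The grid shrinks as the block size
// grows, so the total thread count stays roughly constant at n.
void sweepBlockSizes(int n, const float3 *dev_pos, float3 *dev_acc) {
    const int sizes[] = {32, 64, 128, 256, 384, 512, 1024};
    for (int blockSize : sizes) {
        int blocksPerGrid = (n + blockSize - 1) / blockSize;  // ceil(n / blockSize)
        kernUpdateAcc<<<blocksPerGrid, blockSize>>>(n, dev_pos, dev_acc);
        cudaDeviceSynchronize();  // finish so each configuration is timed in isolation
    }
}
```

Because the grid dimension is computed as ceil(n / blockSize), changing the block size only changes how the same n threads are grouped into blocks, which is what the sweep is measuring.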

N-Body Performance by Planet Size ![](images/N-Body Performance by Planet Size.png "N-Body Performance by Planet Size")

How does changing the number of planets affect performance?

As can be seen in the second graph, the average duration of the kernUpdateAcc kernel grows roughly quadratically as the number of planets increases. These results were taken with the block size left at the starter code's default of 128 as a baseline. The reason seems fairly obvious: the N-body simulation is an O(N²) problem, and without altering our block size to compensate we should expect to see this growth in execution time.
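For reference, this is a minimal sketch of what an O(N²) acceleration kernel looks like; the real kernUpdateAcc lives in the project source, and everything below (the signature, the single shared mass, the softening term) is an assumption for illustration.

```cuda
#include <cuda_runtime.h>

#define G 6.67408e-11f  // gravitational constant (units assumed)

// Sketch of an O(N^2) acceleration update: each of the N threads loops over
// all N bodies, so total work is N * N. Not the repository's actual kernel.
__global__ void kernUpdateAcc(int n, float mass,
                              const float3 *pos, float3 *acc) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float3 myPos = pos[i];
    float3 a = make_float3(0.0f, 0.0f, 0.0f);
    for (int j = 0; j < n; j++) {  // the O(N) inner loop run by every thread
        if (j == i) continue;
        float3 d = make_float3(pos[j].x - myPos.x,
                               pos[j].y - myPos.y,
                               pos[j].z - myPos.z);
        float distSq = d.x * d.x + d.y * d.y + d.z * d.z + 1e-9f;  // softened
        float invDist = rsqrtf(distSq);
        float s = G * mass * invDist * invDist * invDist;  // G * m / r^3
        a.x += s * d.x;
        a.y += s * d.y;
        a.z += s * d.z;
    }
    acc[i] = a;
}
```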

Matrix Math

![](images/Matrix Math.png "Matrix Math Test Output")

Performance Analysis

Matrix Math Block Size Analysis ![](images/Matrix Math Block Size Analysis.png "Matrix Math Block Size Analysis")

How does changing the tile and block size affect performance?

Because the computational requirements of the small matrices we are working with are so low, block size does not really matter. In fact, we could reduce the block size from 25 (as I have it now) all the way to 1 and see no noticeable difference, as the graph shows. For very large matrices, however, this would not be the case.
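As a sketch of why the graph is flat, assuming the flattened one-thread-per-element layout implied above (names and signatures here are hypothetical): with only 25 trivially cheap elements, launch overhead dominates no matter how the threads are grouped.

```cuda
#include <cuda_runtime.h>

// Element-wise add over a flattened matrix, one thread per element.
__global__ void mat_add(int n, const float *A, const float *B, float *C) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) C[i] = A[i] + B[i];
}

void launchAdd(const float *dA, const float *dB, float *dC) {
    const int n = 25;    // 5x5 matrix
    int blockSize = 25;  // could just as well be 1; the launch is the cost
    int blocks = (n + blockSize - 1) / blockSize;
    mat_add<<<blocks, blockSize>>>(n, dA, dB, dC);
}
```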

Without running comparisons of CPU code vs. GPU code, how would you expect the performance to compare?

First, for the addition and subtraction operations, I don't expect much of a difference. Addition and subtraction are O(N) operations (where N is the number of elements in the matrix), so GPU performance is going to be bottlenecked by memory access, keeping us from taking advantage of the GPU's compute power. Matrix multiplication on the GPU would also have a runtime of O(N), which would be an improvement over the sequential runtime on the CPU, which I believe is O(N^1.5). This is because the multiplication operation carries more computational work per element, allowing the GPU to shine where the CPU gets slowed down.
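A naive multiply kernel makes the complexity argument concrete. This sketch (one thread per output element; names and layout are hypothetical) is an illustration, not the repository's implementation: for an n × n matrix with N = n² elements, the CPU performs all N·n = N^1.5 multiply-adds sequentially, while the GPU spreads the N element jobs across threads, each running an O(n) inner loop.

```cuda
#include <cuda_runtime.h>

// Naive multiply: one thread per output element, inner loop over the shared
// dimension. C = A * B for square n x n matrices in row-major layout.
__global__ void mat_mul(int n, const float *A, const float *B, float *C) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= n || col >= n) return;

    float sum = 0.0f;
    for (int k = 0; k < n; k++) {  // O(n) work per output element
        sum += A[row * n + k] * B[k * n + col];
    }
    C[row * n + col] = sum;
}
```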
