See also the samples overview.
A performance analysis is available for four of the samples. The remaining samples are not meant for performance comparisons; they show how to use Hybrid Fortran.
Name | Performance Results | Speedup HF on 6 Core vs. 1 Core [1] | Speedup HF on GPU vs. 6 Core [1] | Speedup HF on GPU vs. 1 Core [1] |
---|---|---|---|---|
3D Diffusion | Link | 1.06x Compare Performance | 10.94x Compare Performance Compare Speedup | 11.66x |
Particle Push | Link | 9.08x Compare Performance | 21.72x Compare Performance Compare Speedup | 152.79x |
Poisson on FEM Solver with Jacobi Approximation | Link | 1.41x | 5.13x | 7.28x |
MIDACO Ant Colony Solver with MINLP Example | Link | 5.26x | 10.07x | 52.99x |
[1]: Where available, the comparison is against a reference C version; otherwise it is against the Hybrid Fortran CPU implementation. The GPU is a Kepler K20x and the CPU is a Westmere Xeon X5670 (TSUBAME 2.5). All results were measured in double precision. CPU runs were limited to one socket using thread affinity 'compact' with 12 logical threads. CPU code was compiled with the Intel compilers ifort / icc using the '-fast' setting. GPU code was compiled with the PGI compiler using the '-fast' setting and CUDA compute capability 3.x. All GPU results include the memory copy time from host to device.
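
The three speedup columns can be cross-checked against each other. The following is a minimal sketch, not part of Hybrid Fortran: it assumes that, if all three columns shared one baseline, the GPU vs. 1 core speedup would equal the product of the other two. The function names `chained_speedup` and `relative_deviation` are hypothetical helpers introduced here; the numbers are copied from the table above.

```python
# Speedup figures copied from the table above, per sample:
# (6 core vs. 1 core, GPU vs. 6 core, GPU vs. 1 core)
SPEEDUPS = {
    "3D Diffusion": (1.06, 10.94, 11.66),
    "Particle Push": (9.08, 21.72, 152.79),
    "Poisson on FEM Solver": (1.41, 5.13, 7.28),
    "MIDACO Ant Colony Solver": (5.26, 10.07, 52.99),
}


def chained_speedup(cpu_scaling, gpu_vs_multicore):
    """GPU vs. 1 core speedup implied by chaining the two other columns.

    Assumption: both inputs are measured against the same baseline.
    """
    return cpu_scaling * gpu_vs_multicore


def relative_deviation(reported, implied):
    """Relative difference between the reported and the implied speedup."""
    return abs(reported - implied) / reported


if __name__ == "__main__":
    for name, (s_cpu, s_gpu_multi, s_gpu_single) in SPEEDUPS.items():
        implied = chained_speedup(s_cpu, s_gpu_multi)
        deviation = relative_deviation(s_gpu_single, implied)
        print(f"{name}: implied {implied:.2f}x, "
              f"reported {s_gpu_single:.2f}x, deviation {deviation:.1%}")
```

For three of the four samples the chained value lands within about 1% of the reported one. Particle Push deviates noticeably, which may reflect the mixed baselines described in footnote [1]; that reading is an interpretation, not something the source states.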