StencilStream Version 1.1.1
Changes
This release adds a benchmark mode to the `hotspot` and `fdtd` examples. In `fdtd`, it is enabled with the `-b` flag. In `hotspot`, it is enabled by appending `true` to the list of arguments.
Performance
This release contains the isolated StencilStream library as well as synthesized application binaries. All binaries were synthesized with oneAPI version beta-10 for two target boards: the Bittware/Nallatech 520N and the Intel PAC D5005 (Stratix 10). The tables below list performance metrics for some sample applications. The `conway` application is optimized for readability rather than performance and is therefore not listed.
Bittware/Nallatech 520N (Stratix 10 GX 2800)
Application | Cycles per Loop | Pipeline Depth | Cycle Frequency | Generations per Second | Overall Performance | Logic Usage | Register Usage | RAM Usage | DSP Usage |
---|---|---|---|---|---|---|---|---|---|
hotspot | 1.07 cycles | 200 cores | 206.25 MHz | 36933 G/s | 580.91 GFLOPS | 79.38% | 49.02% | 35.35% | 52.13% |
fdtd | 16.54 cycles | 35 cores | 272.50 MHz | 243.56 G/s | 136.10 GFLOPS | 79.93% | 49.61% | 47.64% | 52.66% |
Intel PAC D5005 (Stratix 10 SX 2800)
Application | Cycles per Loop | Pipeline Depth | Cycle Frequency | Generations per Second | Overall Performance | Logic Usage | Register Usage | RAM Usage | DSP Usage |
---|---|---|---|---|---|---|---|---|---|
hotspot | 0.98 cycles | 200 cores | 163.00 MHz | 31644.4 G/s | 497.724 GFLOPS | 83.67% | 50.16% | 35.60% | 52.13% |
fdtd | 6.69 cycles | 20 cores | 221.00 MHz | 157.61 G/s | 78.01 GFLOPS | 63.93% | 35.65% | 33.45% | 30.30% |
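
The "Generations per Second" and "Overall Performance" columns can be cross-checked against each other. The following is a minimal sketch, not part of StencilStream: it divides the reported GFLOPS by the reported generations per second to estimate how many floating-point operations one generation accounts for. The function and variable names are hypothetical, and the numbers are copied from the two tables above.

```python
# Values copied from the two tables above: (generations per second, GFLOPS).
# The board and application labels are only dictionary keys for this sketch.
MEASUREMENTS = {
    ("520N", "hotspot"): (36933.0, 580.91),
    ("520N", "fdtd"): (243.56, 136.10),
    ("D5005", "hotspot"): (31644.4, 497.724),
    ("D5005", "fdtd"): (157.61, 78.01),
}


def flop_per_generation(generations_per_second, gflops):
    """Estimate the floating-point operations executed per generation."""
    return gflops * 1e9 / generations_per_second


def summarize():
    """Map each (board, application) pair to its estimated FLOP per generation."""
    return {
        key: flop_per_generation(*value) for key, value in MEASUREMENTS.items()
    }


if __name__ == "__main__":
    for (board, app), flop in summarize().items():
        print(f"{board:6s} {app:8s} {flop / 1e6:10.2f} MFLOP per generation")
```

For `hotspot`, both boards come out at roughly 15.7 MFLOP per generation, which is consistent with both runs using the same workload. The `fdtd` estimates differ between the boards, so those two rows may not be directly comparable; the release notes do not state the grid sizes used.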