GEMM Kernel Microbenchmark

This repo provides a microbenchmark for GEMM kernels on NVIDIA Ampere-architecture GPUs (sm_80). It includes two benchmarks: one for the CUDA kernel itself and one for the kernel exported as a Python extension.

Requirements

  • NVIDIA GPU with Ampere Architecture (sm_80)
  • CUDA 12.2
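
The architecture requirement appears in two spellings in this README: `sm_80` here and `8.0` in `TORCH_CUDA_ARCH_LIST` further down. The sketch below shows how both derive from a `(major, minor)` compute-capability pair. The helper name `arch_names` is hypothetical and not part of this repo; in a PyTorch environment the pair would typically come from `torch.cuda.get_device_capability()`, which is left out so the sketch runs without a GPU.

```python
from typing import Tuple


def arch_names(capability: Tuple[int, int]) -> Tuple[str, str]:
    """Map a (major, minor) compute capability to its two common spellings.

    Returns the nvcc-style name (e.g. "sm_80") and the dotted form used by
    TORCH_CUDA_ARCH_LIST (e.g. "8.0").
    """
    major, minor = capability
    return f"sm_{major}{minor}", f"{major}.{minor}"


# Ampere sm_80, the target named in the requirements above.
print(arch_names((8, 0)))  # ('sm_80', '8.0')
```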

Getting Started

CUDA Kernel Benchmark

  1. Build the project:
$ make
  2. Run a benchmark with specific parameters:
$ ./csrc/bench/main --groups=16 --m=64 --n=64 --k=768 --iterations=3

Where:

  • --groups: Number of groups
  • --m, --n, --k: GEMM problem dimensions
  • --iterations: Number of benchmark iterations
  3. For more information on available options:
$ ./csrc/bench/main --help
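
To turn a measured time into a throughput figure, the flags above can be combined into a FLOP count. The following is a minimal sketch, not code from this repo. It assumes that `--groups` means independent GEMMs that all share the same `m`, `n`, `k`, and that one multiply-add counts as two floating-point operations. The function names and the 50 microsecond timing are hypothetical.

```python
def gemm_flops(groups: int, m: int, n: int, k: int) -> int:
    """FLOPs for `groups` GEMMs of shape (m x k) @ (k x n).

    Assumption: every group has the same problem size, and each
    multiply-add counts as 2 floating-point operations.
    """
    return 2 * groups * m * n * k


def tflops(flops: int, seconds: float) -> float:
    """Convert a FLOP count and an elapsed time into TFLOP/s."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return flops / seconds / 1e12


# Problem size from the example command above.
flops = gemm_flops(groups=16, m=64, n=64, k=768)
print(flops)  # 100663296

# Hypothetical timing of 50 microseconds per iteration, for illustration only.
print(f"{tflops(flops, 50e-6):.2f} TFLOP/s")
```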

Python Extension Benchmark

  1. Export the CUDA kernel as a Python extension:
$ python ./python/testbed/lib.py
$ cd out && TORCH_CUDA_ARCH_LIST="8.0" python setup.py install --user
  2. Run the benchmark, redirecting its output to perf.txt:
$ python ./python/testbed/multi_gemm.py > perf.txt
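
How `multi_gemm.py` measures time is not described here. As a generic illustration of timing a callable from Python, the sketch below runs untimed warmup calls and then reports the median of several timed runs. The name `time_callable` and its parameters are hypothetical. For asynchronous GPU work a barrier such as `torch.cuda.synchronize` would need to be passed as `sync`, otherwise only the kernel launch is timed.

```python
import statistics
import time
from typing import Callable, Optional


def time_callable(
    fn: Callable[[], object],
    iterations: int = 3,
    warmup: int = 1,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return the median wall-clock time of `fn` in seconds.

    `sync` is an optional barrier called before each clock read, so that
    pending asynchronous work is included in the measurement.
    """
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    for _ in range(warmup):  # untimed runs absorb one-off setup costs
        fn()
    samples = []
    for _ in range(iterations):
        if sync is not None:
            sync()  # drain earlier work before starting the clock
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for this run to finish before stopping the clock
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# CPU stand-in workload so the sketch runs without a GPU.
elapsed = time_callable(lambda: sum(i * i for i in range(10_000)))
print(f"median: {elapsed * 1e6:.1f} us")
```

The median is used rather than the mean so that a single slow run has less influence on the reported figure.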

References