📚 A curated list of awesome matrix-matrix multiplication (A * B = C) frameworks, libraries and software


jssonx/awesome-gemm


Awesome GEMM

🚀 Welcome to Awesome GEMM!
A curated and continually evolving list of frameworks, libraries, tutorials, and tools for optimizing General Matrix Multiply (GEMM) operations. Whether you're a beginner eager to learn the fundamentals, a developer optimizing performance-critical code, or a researcher pushing the limits of hardware, this repository is your launchpad to mastery.


Why GEMM Matters 💡

General Matrix Multiply is at the core of a wide range of computational tasks: from scientific simulations and signal processing to modern AI workloads like neural network training and inference. Efficiently implementing and optimizing GEMM can lead to dramatic performance improvements across entire systems.

This repository is a comprehensive resource for:

  • Students & Beginners: Learn the basics and theory of matrix multiplication.
  • Engineers & Developers: Discover frameworks, libraries, and tools to optimize GEMM on CPUs, GPUs, and specialized hardware.
  • Researchers & Performance Experts: Explore cutting-edge techniques, research papers, and advanced optimization strategies.

Quickstart & Highlights 🌱

If you're new and just want to dive in, start here:

  • For Beginners:

    • NumPy (CPU, Python) - The go-to library for basic matrix operations.
    • How To Optimize GEMM - A step-by-step guide to improving performance from a naive implementation.
  • For GPU Developers:

    • NVIDIA cuBLAS - Highly optimized BLAS for NVIDIA GPUs.
    • NVIDIA CUTLASS - Templates and building blocks to write your own CUDA GEMM kernels.
  • For Low-Precision & AI Workloads:

    • FBGEMM (Meta) - Specialized low-precision GEMM for server inference.
    • gemmlowp (Google) - Low-precision (integer) GEMM for efficient ML inference.
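
As a rough illustration of the "naive implementation" that step-by-step optimization guides usually start from, the sketch below compares a triple-loop GEMM with NumPy's built-in matrix product. The function name `naive_gemm` and the matrix sizes are assumptions chosen for this example; they do not come from any of the listed projects.

```python
import numpy as np


def naive_gemm(a, b):
    """Reference GEMM: C[i, j] = sum over p of A[i, p] * B[p, j]."""
    m, k = a.shape
    k2, n = b.shape
    if k != k2:
        raise ValueError("inner dimensions must match")
    c = np.zeros((m, n), dtype=np.result_type(a, b))
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i, p] * b[p, j]
            c[i, j] = acc
    return c


# Small, arbitrary sizes so the pure-Python loops finish quickly.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))
B = rng.standard_normal((5, 3))

C_ref = naive_gemm(A, B)  # slow reference result
C_np = A @ B              # NumPy delegates to an optimized BLAS when one is available

print(np.allclose(C_ref, C_np))
```

The loop version is useful as a correctness baseline: any optimized kernel can be checked against it on small inputs before its performance is measured.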


Fundamental Theories and Concepts 🧠


General Optimization Techniques 🚀


Frameworks and Development Tools 🛠️

  • BLIS - A modular framework for building high-performance BLAS-like libraries.
  • BLISlab - Educational framework for experimenting with BLIS-like GEMM algorithms.
  • Tensile - AMD ROCm tool for creating benchmark-driven backend libraries for GEMM, GEMM-like problems, and tensor contractions on GPUs.
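
Frameworks of this kind generally build on cache blocking (tiling): the product is computed one tile at a time so that the working set stays small. The sketch below is a simplified illustration of that idea in NumPy, not the algorithm used by BLIS, BLISlab, or Tensile. The function name `blocked_gemm` and the `block` parameter are hypothetical.

```python
import numpy as np


def blocked_gemm(a, b, block=4):
    """Tiled GEMM: accumulate C one (block x block) tile at a time."""
    m, k = a.shape
    k2, n = b.shape
    if k != k2:
        raise ValueError("inner dimensions must match")
    c = np.zeros((m, n), dtype=np.result_type(a, b))
    for i0 in range(0, m, block):          # tile rows of C
        for j0 in range(0, n, block):      # tile columns of C
            for p0 in range(0, k, block):  # tiles along the shared dimension
                # NumPy slices clip at the array edge, so ragged tiles are handled.
                c[i0:i0 + block, j0:j0 + block] += (
                    a[i0:i0 + block, p0:p0 + block]
                    @ b[p0:p0 + block, j0:j0 + block]
                )
    return c


rng = np.random.default_rng(1)
A = rng.standard_normal((10, 7))
B = rng.standard_normal((7, 9))
print(np.allclose(blocked_gemm(A, B, block=4), A @ B))
```

In a real library the tile sizes are tuned to the cache hierarchy and the innermost tile product is a hand-written micro-kernel; here `@` stands in for that kernel.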

Libraries 🗂️

CPU Libraries 💻

GPU Libraries ⚡

Cross-Platform Libraries 🌐

Language-Specific Libraries 🔤

Python:

C++:

Julia:


Debugging and Profiling Tools 🔍

Intel Tools:

NVIDIA Tools:

ROCm Tools:

Others:


Learning Resources 📚

University Courses & Tutorials 🎓

Selected Papers 📝

Blogs 🖋️


Example Implementations 💡


Contributions 🤝

We welcome and encourage contributions! You can help by:

  • Adding new libraries, tools, or tutorials.
  • Submitting performance benchmarks or example implementations.
  • Improving documentation or correcting errors.

Submit a pull request or open an issue to get started!


License 📜

This repository is licensed under the MIT License.


By maintaining this curated list, we hope to empower the community to learn, implement, and optimize GEMM efficiently. Thanks for visiting, and happy computing!
