A fast deep learning kernel written from scratch in C. It is intended for hardware performance benchmarking and for simple machine learning tasks that need deep neural networks that are easy to configure and deploy.
The default program in main.c reports the time taken for model initialization, the forward pass, and the backward pass of a network with over 1 billion parameters. To compile and run it, use the terminal commands below. The same commands apply after main.c has been modified for other purposes.
cd build
make && ./exec
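The sketch below shows one way to count the parameters of a fully connected network and to time a phase such as a forward pass. It is not taken from this kernel. The helper names (mlp_param_count, now_seconds, dense_forward), the row-major weight layout, and the fully connected architecture are hypothetical assumptions. Timing uses POSIX clock_gettime with CLOCK_MONOTONIC.

```c
#define _POSIX_C_SOURCE 199309L /* expose clock_gettime under strict C modes */
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: trainable parameters (weights + biases) in a
   fully connected network whose layer widths are given in order. */
static uint64_t mlp_param_count(const size_t *widths, size_t n_layers)
{
    uint64_t total = 0;
    for (size_t i = 0; i + 1 < n_layers; i++) {
        total += (uint64_t)widths[i] * widths[i + 1] /* weight matrix */
               + widths[i + 1];                       /* bias vector   */
    }
    return total;
}

/* Hypothetical helper: seconds from a monotonic clock, suitable for
   measuring elapsed time between two calls. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical dense-layer forward pass: out = W * in + b,
   where W is n_out x n_in and stored row-major. */
static void dense_forward(const float *W, const float *b, const float *in,
                          float *out, size_t n_in, size_t n_out)
{
    for (size_t o = 0; o < n_out; o++) {
        float acc = b[o];
        for (size_t i = 0; i < n_in; i++)
            acc += W[o * n_in + i] * in[i];
        out[o] = acc;
    }
}
```

For example, five layers of width 16384 (four weight matrices of 16384 x 16384 plus their biases) give 1,073,807,360 parameters. To time a phase, call now_seconds() before and after it and subtract the two values.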
Dell XPS 15 9520 (Intel i7-12700H)