A curated list of awesome deep learning hardware, compute-cycle/memory optimisation, and implementation techniques. Inspired by awesome-deep-learning. Covers literature from 2014 onwards.
- 2016/05 A 2.2 GHz SRAM with High Temperature Variation Immunity for Deep Learning Application under 28nm
- 2016/06 Switched by Input: Power Efficient Structure for RRAM-based Convolutional Neural Network
- 2016/06 Low-power approximate convolution computing unit with domain-wall motion based "spin-memristor" for image processing applications
- 2014/06 A 240 G-ops/s Mobile Coprocessor for Deep Neural Networks
- 2015/02 Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks
- 2015/06 Neuromorphic Architectures for Spiking Deep Neural Networks
- 2015/06 Memory and information processing in neuromorphic systems
- 2015/08 INsight: A Neuromorphic Computing System for Evaluation of Large Neural Networks
- 2016/02 Deep Learning on FPGAs: Past, Present, and Future
- 2016/02 A 1.42TOPS/W deep convolutional neural network recognition processor for intelligent IoE systems
- 2016/02 vDNN: Virtualized Deep Neural Networks for Scalable, Memory-Efficient Neural Network Design
- 2016/04 Demonstrating Hybrid Learning in a Flexible Neuromorphic Hardware System
- 2016/04 Hardware-oriented Approximation of Convolutional Neural Networks
- 2016/04 Acceleration of Deep Neural Network Training with Resistive Cross-Point Devices
- 2016/05 ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars
- 2016/07 Maximizing CNN Accelerator Efficiency Through Resource Partitioning
- 2016/07 Overcoming Resource Underutilization in Spatial CNN Accelerators
- 2016/07 Dynamic energy-accuracy trade-off using stochastic computing in deep neural networks
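The stochastic-computing entry above trades accuracy for energy by representing values as random bitstreams, where multiplication reduces to a bitwise AND. A minimal sketch of that idea (unipolar encoding; the function names and stream length are illustrative, not from any of the papers listed):

```python
import random

def to_stream(p, n, rng):
    """Encode a probability p in [0, 1] as a length-n bitstream with P(bit=1) = p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def stream_multiply(a_bits, b_bits):
    """Multiply two independent unipolar streams: bitwise AND, then
    decode the product by counting the fraction of 1s."""
    return sum(x & y for x, y in zip(a_bits, b_bits)) / len(a_bits)

rng = random.Random(0)
n = 100_000                    # longer streams -> lower variance, more energy
a, b = 0.6, 0.5
est = stream_multiply(to_stream(a, n, rng), to_stream(b, n, rng))
# est approximates a * b = 0.30, with error shrinking as n grows
```

The energy-accuracy trade-off is the stream length `n`: halving it roughly halves switching activity at the cost of higher variance in the decoded result.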
- 2014/12 Training Deep Neural Networks with Low Precision Multiplications
- 2014/12 Implementation of Deep Convolutional Neural Net on a Digital Signal Processor
- 2015/02 Deep Learning with Limited Numerical Precision
- 2015/02 Faster learning of deep stacked autoencoders on multi-core systems using synchronized layer-wise pre-training
- 2015/02 8-Bit Approximations for Parallelism in Deep Learning
- 2016/01 DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients
- 2016/02 Neural Networks with Few Multiplications
- 2016/02 Deep Compression: Compressing Deep Neural Networks with Pruning, Quantization and Huffman Coding
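The low-precision papers above (limited numerical precision, 8-bit approximations, low-bitwidth networks) all build on the same basic operation: mapping float weights onto a small integer grid. A minimal sketch of symmetric linear 8-bit quantization, using plain Python (the function names are illustrative; individual papers differ in rounding mode, range calibration, and gradient handling):

```python
def quantize_8bit(weights):
    """Uniformly map floats in [-m, m] (m = max |w|) onto signed
    8-bit codes in [-127, 127]. Returns the codes and the scale
    needed to dequantize."""
    m = max(abs(w) for w in weights) or 1.0
    scale = m / 127.0
    codes = [round(w / scale) for w in weights]   # int8-range integers
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate floats from the 8-bit codes."""
    return [c * scale for c in codes]

w = [0.5, -1.27, 0.003, 1.0]
codes, s = quantize_8bit(w)
w_hat = dequantize(codes, s)
# rounding error per weight is at most scale / 2
```

With this scheme the worst-case per-weight error is `scale / 2`, i.e. it grows with the dynamic range of the layer, which is why several of the papers above calibrate ranges per layer or per channel.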
- 2015/09 Heterogeneous Computing in HPC and Deep Learning
- 2016/02 Going Deeper with Embedded FPGA Platform for Convolutional Neural Network
- 2016/02 Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks
- 2016/05 DNNWEAVER: From High-Level Deep Network Models to FPGA Acceleration
- 2015/08 FPGA based Multi-core architectures for Deep Learning
- 2016/05 Ristretto: Hardware-Oriented Approximation of Convolutional Neural Networks
- 2015/02 Accelerating Deep Convolutional Neural Networks Using Specialized Hardware
- 2015/07 Efficient Implementation of Neural Network Systems Built on FPGAs, Programmed with OpenCL
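Several entries above (notably Deep Compression) start from magnitude-based pruning: drop the smallest-magnitude weights, then retrain. A minimal sketch of that first stage only, assuming a flat weight list; the full pipeline in the paper additionally retrains, quantizes, and Huffman-codes the survivors, which this omits:

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of weights.
    E.g. sparsity=0.5 removes the half of the weights closest to zero."""
    k = int(len(weights) * sparsity)              # number of weights to drop
    if k == 0:
        return list(weights)
    # k-th smallest magnitude becomes the pruning threshold
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2]
pruned = prune_by_magnitude(w, 0.5)
# → [0.9, 0.0, 0.4, 0.0, -0.7, 0.0]
```

The zeroed weights can then be stored in a sparse format, which is where the memory savings reported in these papers come from.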
- 2015/05 Numerical Optimization for Deep Learning
- 2015/10 Single Node Caffe Scoring and Training on Intel® Xeon E5-Series Processors
- 2016/03 FPGAs Challenge GPUs as a Platform for Deep Learning
- 2016/03 FPGA with OpenCL Solution Released to Deep Learning
- 2016/04 Boosting Deep Learning with the Intel Scalable System Framework
- 2016/04 Movidius puts deep learning chip in a USB drive
- 2016/05 The PCM-Neuron and Neural Computing
- 2016/05 FPGA-accelerated deep convolutional neural networks for high throughput and energy efficiency
- Nvidia Devbox
- Google Tensor Processing Unit
- Facebook Open Rack V2 compatible 8-GPU server
- CEVA DNN Digital Signal Processor
- Movidius Fathom USB Stick
- IBM TrueNorth
- AMAX SenseBox
License