- Cost Model (a back-of-the-envelope worked example follows this group)
  - Estimating GPU Memory Consumption of Deep Learning Models by Yanjie Gao et al., ESEC/FSE 2020
  - Cost Model for NAS/Cloud
    - Daydream: Accurately Estimating the Efficacy of Optimizations for DNN Training by Hongyu Zhu et al., USENIX ATC 2020
    - Habitat: A Runtime-Based Computational Performance Predictor for Deep Neural Network Training by Geoffrey X. Yu et al., USENIX ATC 2021
    - To Bridge Neural Network Design and Real-World Performance: A Behaviour Study for Neural Networks by Xiaohu Tang et al., MLSys 2021
    - perf4sight: A Toolflow to Model CNN Training Performance on Edge GPUs by Aditya Rajagopal et al., arXiv 2021
    - nn-Meter: Towards Accurate Latency Prediction of Deep-Learning Model Inference on Diverse Edge Devices by Li Lyna Zhang et al., MobiSys 2021
    - Empirical Analysis and Modeling of Compute Times of CNN Operations on AWS Cloud by Ubaid Ullah Hafeez et al., IISWC 2020
    - Paleo: A Performance Model for Deep Neural Networks by Hang Qi et al., ICLR 2017
    - Augur: Modeling the Resource Requirements of Convolutional Neural Networks on Mobile Devices by Zongqing Lu et al., ACM Multimedia 2017
    - Performance Modelling of Deep Learning on Intel Many Integrated Core Architectures by Andre Viebke et al., arXiv 2019
  - Cost Model for Kernel Compilation
    - A Learned Performance Model for Tensor Processing Units by Samuel J. Kaufman et al., MLSys 2021
    - Iteration Time Prediction for CNN in Multi-GPU Platform: Modeling and Analysis by Ziqian Pei et al., IEEE Access 2019
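As a flavor of what the analytical models above (e.g., Paleo) compute, here is a back-of-the-envelope roofline-style estimate for a single convolution layer. The layer shape, peak throughput, and efficiency assumptions are illustrative only and do not come from any of the listed papers.

```python
# Roofline-style sketch of a per-layer cost model (illustrative numbers only).

def conv2d_flops(n, c_in, c_out, h_out, w_out, k):
    """Multiply-adds counted as 2 FLOPs each."""
    return 2 * n * c_out * h_out * w_out * c_in * k * k

def conv2d_bytes(n, c_in, c_out, h_in, w_in, h_out, w_out, k, dtype_bytes=4):
    """Input + weight + output traffic, ignoring cache reuse."""
    return dtype_bytes * (n * c_in * h_in * w_in
                          + c_out * c_in * k * k
                          + n * c_out * h_out * w_out)

# Assumed hardware: ~15 TFLOP/s sustained fp32, ~900 GB/s memory bandwidth.
PEAK_FLOPS, PEAK_BW = 15e12, 900e9

flops = conv2d_flops(n=32, c_in=256, c_out=256, h_out=14, w_out=14, k=3)
bytes_moved = conv2d_bytes(n=32, c_in=256, c_out=256,
                           h_in=14, w_in=14, h_out=14, w_out=14, k=3)

# The layer is bound by whichever resource takes longer.
t_compute = flops / PEAK_FLOPS
t_memory = bytes_moved / PEAK_BW
print(f"estimated time: {max(t_compute, t_memory) * 1e3:.3f} ms")
```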
- Distributed Training (a tensor-parallel sketch follows this group)
  - Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning by Lianmin Zheng et al., arXiv 2022
  - Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism by Mohammad Shoeybi et al., arXiv 2019
  - ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning by Samyam Rajbhandari et al., SC 2021
  - Improving the Accuracy, Scalability, and Performance of Graph Neural Networks with Roc by Zhihao Jia et al., MLSys 2020
  - A Distributed Multi-GPU System for Fast Graph Processing by Zhihao Jia et al., VLDB 2017
  - Device Placement Optimization with Reinforcement Learning by Azalia Mirhoseini et al., ICML 2017
  - DUET: A Compiler-Runtime Subgraph Scheduling Approach for Tensor Programs on a Coupled CPU-GPU Architecture by Minjia Zhang et al., IEEE IPDPS 2021
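The parallelism strategies surveyed above are easiest to see on a toy example. The sketch below illustrates Megatron-style column parallelism for one linear layer, simulated on a single process for readability (an assumption; the real systems shard across GPUs and communicate via torch.distributed).

```python
import torch

torch.manual_seed(0)
x = torch.randn(8, 512)        # (batch, hidden)
w = torch.randn(512, 2048)     # full weight of one linear layer

# Column parallelism: shard the output dimension across two "workers".
w0, w1 = w.chunk(2, dim=1)
y0 = x @ w0                    # each worker multiplies against its own shard
y1 = x @ w1
y = torch.cat([y0, y1], dim=1) # all-gather along the sharded dimension

assert torch.allclose(y, x @ w, atol=1e-3)
```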
- Gradient Checkpointing (a minimal PyTorch sketch follows this group)
  - Training Deep Nets with Sublinear Memory Cost by Tianqi Chen et al., arXiv 2016
  - Efficient Rematerialization for Deep Networks by Ravi Kumar et al., NeurIPS 2019
  - Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization by Paras Jain et al., MLSys 2020
  - Dynamic Tensor Rematerialization by Marisa Kirisame et al., ICLR 2021
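A minimal activation-checkpointing sketch with PyTorch's built-in utility, which implements the recompute-in-backward idea from Chen et al. The model and shapes are placeholders, and `use_reentrant=False` assumes a reasonably recent PyTorch release.

```python
import torch
from torch.utils.checkpoint import checkpoint

block = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024), torch.nn.ReLU(),
    torch.nn.Linear(1024, 1024), torch.nn.ReLU(),
)
x = torch.randn(32, 1024, requires_grad=True)

# The block's intermediate activations are not kept; they are recomputed
# during backward, trading extra FLOPs for lower peak memory.
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
```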
- Gradient Checkpointing + Distributed Training
  - Reducing Activation Recomputation in Large Transformer Models by Vijay Korthikanti et al., arXiv 2022
- Kernel Fusion (a fusion sketch follows this group)
  - Data Movement Is All You Need: A Case Study on Optimizing Transformers by Andrei Ivanov et al., MLSys 2021
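Elementwise chains such as bias-add followed by GeLU are memory-bandwidth bound, which is the kind of data movement the MLSys 2021 paper above targets. A small sketch of letting TorchScript fuse such a chain; fusion only actually happens on GPU, and the constants are the usual tanh-GeLU approximation.

```python
import torch

@torch.jit.script
def bias_gelu(x: torch.Tensor, bias: torch.Tensor) -> torch.Tensor:
    # One fused kernel avoids writing the intermediate (x + bias) to DRAM.
    y = x + bias
    return 0.5 * y * (1.0 + torch.tanh(0.7978845608 * (y + 0.044715 * y * y * y)))

out = bias_gelu(torch.randn(1024, 1024), torch.randn(1024))
```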
- Compression/Quantization (a toy activation-compression sketch follows this group)
  - Gist: Efficient Data Encoding for Deep Neural Network Training by Animesh Jain et al., ISCA 2018
  - Gradient Compression Supercharged High-Performance Data Parallel DNN Training by Youhui Bai et al., SOSP 2021
  - GACT: Activation Compressed Training for Generic Network Architectures by Xiaoxuan Liu et al., ICML 2022
  - On the Utility of Gradient Compression in Distributed Training Systems by Saurabh Agarwal et al., MLSys 2022
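A toy illustration of activation-compressed training using PyTorch's saved-tensor hooks: tensors saved for backward are kept in fp16 and decompressed on use. This conveys only the general idea, not the encoding schemes of Gist or GACT, and it introduces a small gradient approximation.

```python
import torch

def pack(t: torch.Tensor):          # called when autograd saves a tensor
    return t.to(torch.float16)

def unpack(t16: torch.Tensor):      # called when backward needs it again
    return t16.to(torch.float32)

model = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.GELU(),
                            torch.nn.Linear(512, 512))
x = torch.randn(64, 512)

with torch.autograd.graph.saved_tensors_hooks(pack, unpack):
    loss = model(x).pow(2).mean()
loss.backward()
```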
- Swapping (an offloading sketch follows this group)
  - Optimal GPU-CPU Offloading Strategies for Deep Neural Network Training by Olivier Beaumont et al., European Conference on Parallel Processing 2020
  - SwapAdvisor: Push Deep Learning Beyond the GPU Memory Limit via Smart Swapping by Chien-Chin Huang et al., ASPLOS 2020
  - Harmony: Overcoming the Hurdles of GPU Memory Capacity to Train Massive DNN Models on Commodity Servers by Youjie Li et al., VLDB 2022
  - STRONGHOLD: Fast and Affordable Billion-Scale Deep Learning Model Training by Xiaoyang Sun et al., SC 2022
  - ZeRO-Offload: Democratizing Billion-Scale Model Training by Jie Ren et al., USENIX ATC 2021
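The systems above decide which tensors to move between GPU and host memory and when. PyTorch exposes a simple built-in form of this for activations; a minimal sketch, where pinned memory (which enables asynchronous copies) is only requested when CUDA is available:

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(1024, 4096), torch.nn.ReLU(),
                            torch.nn.Linear(4096, 1024))
x = torch.randn(16, 1024)

# Tensors saved for backward are swapped to host memory during forward
# and copied back to the compute device when backward needs them.
with torch.autograd.graph.save_on_cpu(pin_memory=torch.cuda.is_available()):
    loss = model(x).sum()
loss.backward()
```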
- Swapping + Pipeline Parallelism
  - Harmony: Overcoming the Hurdles of GPU Memory Capacity to Train Massive DNN Models on Commodity Servers by Youjie Li et al., VLDB 2022
- Swapping + Gradient Checkpointing
  - Capuchin: Tensor-Based GPU Memory Management for Deep Learning by Xuan Peng et al., ASPLOS 2020
  - Efficient Combination of Rematerialization and Offloading for Training DNNs by Olivier Beaumont et al., NeurIPS 2021
  - POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging by Shishir G. Patil et al., ICML 2022
- Memory Allocator (a toy memory-planning sketch follows this group)
  - OLLA: Optimizing the Lifetime and Location of Arrays to Reduce the Memory Usage of Neural Networks by Benoit Steiner et al., arXiv 2022
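To make the memory-planning problem concrete, here is a toy greedy planner that assigns address offsets to tensors from their lifetimes, letting tensors whose lifetimes never overlap reuse the same addresses. OLLA formulates this jointly with scheduling as an optimization problem; the greedy heuristic and the example tensor list below are ours.

```python
def plan(tensors):
    """Greedy static memory planning.

    tensors: list of (name, first_use, last_use, size_bytes)
    returns: ({name: (offset, size)}, peak_bytes)
    """
    placements = {}
    for name, start, end, size in sorted(tensors, key=lambda t: -t[3]):
        # Address ranges already taken by tensors with overlapping lifetimes.
        busy = sorted(
            (placements[n][0], placements[n][0] + placements[n][1])
            for n, s, e, _ in tensors
            if n in placements and not (e < start or end < s)
        )
        offset = 0
        for lo, hi in busy:
            if offset + size <= lo:
                break          # fits into the gap before this busy range
            offset = max(offset, hi)
        placements[name] = (offset, size)
    peak = max(off + sz for off, sz in placements.values())
    return placements, peak

# Three tensors totalling 10 bytes fit into a peak of 8 bytes because
# "act2" and "grad" are never live at the same time and share addresses.
tensors = [("act1", 0, 3, 4), ("act2", 1, 2, 2), ("grad", 3, 5, 4)]
print(plan(tensors))
```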
- Efficient Optimizer (a memory-accounting sketch follows this group)
  - ZeRO: Memory Optimizations Toward Training Trillion Parameter Models by Samyam Rajbhandari et al., SC 2020
  - 1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed by Hanlin Tang et al., ICML 2021
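The motivation for ZeRO is easy to reproduce with the memory accounting from the paper: mixed-precision Adam keeps fp16 parameters and gradients plus fp32 master weights, momentum, and variance, i.e. roughly 16 bytes per parameter before any sharding. The helper below (function name and stage handling are ours) shows how the per-GPU footprint shrinks as each ZeRO stage partitions more of that state.

```python
def model_state_gb(n_params, zero_stage=0, n_gpus=1):
    # Bytes per parameter: fp16 params, fp16 grads, fp32 Adam state (master
    # weights + momentum + variance).
    params, grads, optim = 2.0, 2.0, 12.0
    if zero_stage >= 1:
        optim /= n_gpus                     # stage 1: shard optimizer states
    if zero_stage >= 2:
        grads /= n_gpus                     # stage 2: also shard gradients
    if zero_stage >= 3:
        params /= n_gpus                    # stage 3: also shard parameters
    return n_params * (params + grads + optim) / 1e9

print(model_state_gb(7.5e9))                           # ~120 GB: too big for one GPU
print(model_state_gb(7.5e9, zero_stage=1, n_gpus=64))  # ~31 GB per GPU with sharded Adam state
```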
- Hardware Related (an attention-kernel usage sketch follows this group)
  - FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness by Tri Dao et al., NeurIPS 2022
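Naive attention materializes the full (sequence × sequence) score matrix in GPU memory; FlashAttention computes the same result in tiles that stay in on-chip SRAM. In PyTorch 2.x a fused kernel is reachable through `scaled_dot_product_attention`; whether it actually dispatches to FlashAttention depends on your GPU, dtype, and build (an assumption about the environment), otherwise a fallback implementation is used.

```python
import torch
import torch.nn.functional as F

# (batch, heads, sequence length, head dim)
q = torch.randn(1, 8, 1024, 64)
k = torch.randn(1, 8, 1024, 64)
v = torch.randn(1, 8, 1024, 64)

# Fused, memory-efficient attention when the backend supports it; in that
# case the (1024 x 1024) score matrix is never materialized in HBM.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```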
- PyTorch Internals
- Profiler Trace File (a trace-export sketch follows this group)
  - Characterizing Deep Learning Training Workloads on Alibaba-PAI by Mengdi Wang et al., IISWC 2019
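Workload characterization studies like the Alibaba-PAI paper start from profiler traces. A minimal sketch of producing such a trace with PyTorch's profiler (the toy model and step count are placeholders); the resulting JSON can be opened in chrome://tracing or Perfetto.

```python
import torch
from torch.profiler import ProfilerActivity, profile

model = torch.nn.Linear(512, 512)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(64, 512)

# Add ProfilerActivity.CUDA to the activities list when profiling on GPU.
with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    for _ in range(3):
        loss = model(x).pow(2).mean()
        loss.backward()
        opt.step()
        opt.zero_grad()

prof.export_chrome_trace("trace.json")
```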