Introduction

This repository holds NVIDIA-maintained utilities to streamline mixed precision and distributed training in PyTorch. Some of the code here will be included in upstream PyTorch eventually. The intention of Apex is to make up-to-date utilities available to users as quickly as possible.

Full API Documentation: https://nvidia.github.io/apex

Contents

1. Mixed Precision

amp: Automatic Mixed Precision

apex.amp is a tool designed for ease of use and maximum safety in FP16 training. All potentially unsafe ops are performed in FP32 under the hood, while safe ops are performed using faster, Tensor Core-friendly FP16 math. amp also automatically implements dynamic loss scaling.

The intention of amp is to be the "on-ramp" to easy FP16 training: achieve all the numerical stability of full FP32 training, with most of the performance benefits of full FP16 training.

Python Source and API Documentation
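
For orientation, here is a minimal sketch of how amp is typically wired into an existing training loop using the handle-based API from this era of Apex; exact call names may differ between Apex versions, and the model, optimizer, and data below are placeholders rather than part of the library:

import torch
from apex import amp

# Initialize amp once, before the training loop; it patches PyTorch functions
# so unsafe ops run in FP32 while Tensor Core-friendly ops run in FP16.
amp_handle = amp.init(enabled=True)

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# Placeholder "dataset" standing in for a real DataLoader.
loader = [(torch.randn(8, 1024).cuda(), torch.randn(8, 1024).cuda())]

for data, target in loader:
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(data), target)
    # Loss scaling keeps small FP16 gradients from underflowing to zero.
    with amp_handle.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()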

FP16_Optimizer

apex.FP16_Optimizer wraps an existing Python optimizer and automatically implements master parameters and static or dynamic loss scaling under the hood.

The intention of FP16_Optimizer is to be the "highway" for FP16 training: achieve most of the numerical stability of full FP32 training, and almost all the performance benefits of full FP16 training.

API Documentation

Python Source

Simple examples with FP16_Optimizer

Imagenet with FP16_Optimizer

word_language_model with FP16_Optimizer

The Imagenet and word_language_model directories also contain examples that show manual management of master parameters and static loss scaling.

These manual examples illustrate what sort of operations amp and FP16_Optimizer are performing automatically.
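
As a rough illustration of the FP16_Optimizer wrapping described above, the sketch below shows a typical setup; the loss-scaling arguments and the optimizer.backward(loss) call follow the documented pattern for this utility, while the model and data are placeholders:

import torch
from apex import FP16_Optimizer

model = torch.nn.Linear(1024, 1024).cuda().half()   # FP16 model parameters
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# The wrapper keeps FP32 master copies of the parameters and applies static
# or dynamic loss scaling under the hood.
optimizer = FP16_Optimizer(optimizer, dynamic_loss_scale=True)
# or: optimizer = FP16_Optimizer(optimizer, static_loss_scale=128.0)

# Placeholder FP16 "dataset" standing in for a real DataLoader.
loader = [(torch.randn(8, 1024).cuda().half(), torch.randn(8, 1024).cuda().half())]

for data, target in loader:
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(data), target)
    # Call optimizer.backward(loss) instead of loss.backward() so the loss can
    # be scaled before backprop and the gradients unscaled before the update.
    optimizer.backward(loss)
    optimizer.step()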

2. Distributed Training

apex.parallel.DistributedDataParallel is a module wrapper, similar to torch.nn.parallel.DistributedDataParallel. It enables convenient multiprocess distributed training, optimized for NVIDIA's NCCL communication library.

apex.parallel.multiproc is a launch utility that helps set up arguments for DistributedDataParallel.

API Documentation

Python Source

Example/Walkthrough

The Imagenet with FP16_Optimizer mixed precision examples also demonstrate apex.parallel.DistributedDataParallel.
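
For context, a minimal sketch of the usual multiprocess setup follows; the process-group initialization mirrors standard torch.distributed usage with the NCCL backend, and the --local_rank handling is an assumption about how a launch utility passes per-process arguments:

import argparse
import torch
import apex.parallel

# One process per GPU; --local_rank is a common way for each process to learn
# which device it owns (an assumption here, not part of the wrapper itself).
parser = argparse.ArgumentParser()
parser.add_argument('--local_rank', type=int, default=0)
args = parser.parse_args()

torch.cuda.set_device(args.local_rank)
torch.distributed.init_process_group(backend='nccl', init_method='env://')

model = torch.nn.Linear(1024, 1024).cuda()
# Gradients are all-reduced across processes during backward().
model = apex.parallel.DistributedDataParallel(model)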

Requirements

Python 3

CUDA 9

PyTorch 0.4 or newer. We recommend using the latest stable release, obtainable from https://pytorch.org/. We also test against the latest master branch, obtainable from https://github.com/pytorch/pytorch.
If you have any problems building, please file an issue.

Quick Start

To build the extension, run the following command in the root directory of this project

python setup.py install

To use the extension

import apex

and optionally (if required for your use)

import apex_C as apex_backend
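
As a quick sanity check that the build succeeded, the imports below should run without error in a Python session; the submodule names are taken from the sections above:

import apex
from apex import amp                                # mixed precision utilities
from apex.parallel import DistributedDataParallel   # NCCL-optimized module wrapper

print(apex.__file__)   # shows which installed copy of the extension was found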
