Large refactor and complete testing #38
Merged
This commit makes a number of large changes, including updates to support PyTorch 1.8-1.10 features, a large refactoring of the KFAC codebase, removal of Horovod support, and more.

General Changes:
- Updated to KFAC 0.4.0
- Added a `Makefile` for running code formatting (with black), linting (with flake8), and unit tests (with pytest)
- All code has been formatted to conform to PEP 8 (with some small modifications)
- Removed the Horovod examples
- Updated the distributed launch scripts to work with `torch.distributed.launch` (removing the need for auxiliary launch-on-node scripts)
- Updated `scripts/*.sh` to infer the distributed environment from environment variables
- Cleaned up the README

KFAC Changes:
- PyTorch 1.9/1.10 compatibility
- Updated grad hooks to be compatible with the new `torch.nn.Module` hooks
- Updated the inverse and eigen decomposition methods to no longer use deprecated functions. Additionally, inverse/eigen decomposition results are written straight into the buffer
- Intermediate values in the forward/backward passes are no longer accumulated as separate values but summed into a tensor
- The number of accumulation steps passed to `KFAC()` is used to average the summed intermediate values
- Intermediate values are only saved if the model is in training mode, e.g., with `model.train()`
- Added enum types for `AssignmentStrategy`, `ComputeMethod`, and `DistributedStrategy`
- Updated the `KFAC()` parameters (see the usage sketch below):
  - Added `accumulation_steps`
  - Added `symmetry_aware`
  - Renamed `*_update_freq` to `*_update_steps`
  - Renamed `compute_eigen_outer_product`
  - Renamed `colocate_factors`
- Added `@property` methods for all params in the `KFAC()` state dict
- `KFAC()` now launches communication operations for factors, inverses, and gradients as soon as they are computed, to overlap communication with computation
- `KFAC()` computes inverses and gradients in reverse layer order since the G factors only become available in the backward pass
- New `TorchDistributedCommunicator()` class
- Communication operations now return PyTorch futures, and the `KFACBaseLayer` classes have sync methods to wait on the futures
- Removed Horovod backend support
- Rewrote `WorkerAllocator`
- Merged the load balancer into `WorkerAllocator`
- Updated variable and function names to be consistent with the names used in the KAISA paper
- Removed the `state` attribute of the `KFACBaseLayer` classes so tensors are stored directly as class attributes
- Separated the eigen decomposition and inverse method layer classes into separate classes that inherit from `KFACBaseLayer`
- The KFAC `get_*_factors` for specific Torch modules are now separate `ModuleHelper` classes stored as an attribute of `KFACBaseLayer`
- Removed support for Embedding and LSTM/RNN layers (not that they were ever really supported)
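For orientation, a hedged sketch of how the renamed and added `KFAC()` parameters might appear in a training loop. The constructor signature, defaults, and import path are assumptions for illustration, not the confirmed API:

```python
import torch
import kfac  # the package being refactored; import path assumed

model = torch.nn.Linear(10, 2).train()  # intermediate values are only saved in training mode
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

preconditioner = kfac.KFAC(
    model,
    factor_update_steps=10,  # renamed from factor_update_freq
    inv_update_steps=100,    # renamed from inv_update_freq
    accumulation_steps=4,    # used to average the summed intermediate values
    symmetry_aware=False,
)

for step in range(100):
    x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    if (step + 1) % 4 == 0:    # gradient accumulation boundary
        preconditioner.step()  # precondition the accumulated gradients
        optimizer.step()
        optimizer.zero_grad()
```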
Refactoring:
- Changed the `kfac` package to use absolute imports everywhere.
- Renamed `grad_comm_*` to `grad_receiver_*` in `KFACBaseLayer` to align the terminology with that used in `WorkerAllocator`.
- Use `torch.outer()` for computing outer products instead of unsqueezes plus matrix multiplication (see the sketch below).
- Renamed `kfac/comm.py` to `kfac/preconditioner.py`.

Changes:
- Use multiplication by the inverse of the world size instead of matrix division after allreduces.
- Eigenvalues are clamped to be non-negative (as in KFAC 0.3).
- Removed unnecessary `.view()` calls in `kfac/layers/modules.py`.
- Added early function exits to the communication functions in `KFACBaseLayer`. Note that `KFAC.step()` would already correctly skip these based on the distribution strategy.

Bug Fixes:
- Fixed an instance of using `Future.value()` instead of `Future.wait()` in a sync function.
- Fixed an issue on the first iteration where the A and G factors were correctly initialized to a zero matrix with ones on the diagonal, but the running average of the initialized matrix with the new values of A and G was skipped, causing poor convergence.
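As an illustration of the outer-product refactoring, both forms below compute the same matrix; `torch.outer()` simply states the intent directly:

```python
import torch

x = torch.randn(8)
# Old form: promote the vector to column/row matrices and matrix-multiply.
outer_old = x.unsqueeze(1) @ x.unsqueeze(0)
# New form: torch.outer() produces the same (8, 8) outer product.
outer_new = torch.outer(x, x)
assert torch.allclose(outer_old, outer_new)
```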
When accumulating the inputs for the A factor, PyTorch raised an error that variables needed for the gradient computation had been modified by in-place operations. This happened because, on the first accumulation step, we saved a reference to the intermediate tensor itself. The fix is to save a reference to the tensor's data rather than the tensor, since only the tensor carries an autograd history.
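A minimal sketch of the pattern, with a hypothetical forward pre-hook and buffer name:

```python
import torch

def save_layer_input(module: torch.nn.Module, inputs: tuple) -> None:
    # Hypothetical hook for accumulating inputs for the A factor.
    # Saving `a` itself keeps its autograd history, so a later accumulation
    # raises "a variable needed for gradient computation has been modified
    # by an inplace operation". Saving `a.data` references the same storage
    # with no autograd history attached.
    a = inputs[0]
    if getattr(module, 'a_accum', None) is None:
        module.a_accum = a.data  # reference the data, not the tracked tensor
    else:
        module.a_accum = module.a_accum + a.data
```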
- Refactored the script argument names to be in line with the KFAC parameter names.
- Added prefetching to the dataloader (see the sketch below).
- Removed references to the language modeling scripts in the README.
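One common way to enable prefetching with PyTorch's built-in loader; whether the scripts use this mechanism or a custom prefetcher is an assumption here:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(1024, 10), torch.randint(0, 2, (1024,)))
# With num_workers > 0, each worker keeps prefetch_factor batches staged
# ahead of the training loop, overlapping data loading with computation.
loader = DataLoader(
    dataset,
    batch_size=64,
    num_workers=4,
    prefetch_factor=2,
    pin_memory=True,  # speeds up host-to-GPU copies
)
```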
The reshapes in the computation of the factors for the linear module were removed previously, but they are necessary for linear modules whose inputs have more than two dimensions.
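For example, with illustrative shapes (the package's actual factor math may differ in scaling):

```python
import torch

x = torch.randn(2, 5, 4)  # (batch, sequence, in_features): more than 2 dims
# Factor computation needs a 2D (samples x features) matrix, so all leading
# dimensions are flattened together before forming the A factor.
a = x.reshape(-1, x.shape[-1])  # shape (10, 4)
A = a.t() @ a / a.shape[0]      # (in_features, in_features) factor estimate
```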
The mini-step tracking assumed that forward/backward passes were never overlapped, so a single counter was used to determine when a gradient accumulation boundary occurred. This assumption breaks for more advanced layer pipelining schemes, such as those used by DeepSpeed. The fix is to count mini-steps on a per-module basis.
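A sketch of the per-module counting, with hypothetical class and method names:

```python
from collections import defaultdict
from typing import Dict

import torch

class MiniStepTracker:
    """Track gradient accumulation boundaries per module (illustrative)."""

    def __init__(self, accumulation_steps: int) -> None:
        self.accumulation_steps = accumulation_steps
        # A single global counter assumes all modules finish backward the
        # same number of times between optimizer steps; pipelined schedules
        # interleave micro-batches, so each module gets its own counter.
        self.mini_steps: Dict[torch.nn.Module, int] = defaultdict(int)

    def backward_finished(self, module: torch.nn.Module) -> bool:
        # True when this particular module hits an accumulation boundary.
        self.mini_steps[module] += 1
        return self.mini_steps[module] % self.accumulation_steps == 0
```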
- Added parameters to the KFAC and KFACLayer classes.
- Changed `self.comm` to `self.tdc` for clarity.
- Updated the CNN example to explicitly define the bucket cap (see the sketch below).
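A sketch of defining the bucket cap explicitly; `bucket_cap_mb` is DDP's real knob, but the value shown is only a placeholder:

```python
import os
import torch
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes torch.distributed has already been initialized by the launcher.
local_rank = int(os.environ.get('LOCAL_RANK', 0))
model = torch.nn.Linear(10, 2).cuda(local_rank)
# bucket_cap_mb controls the size of DDP's gradient allreduce buckets;
# 25 MB is the default and is shown here only as a placeholder.
ddp_model = DDP(model, device_ids=[local_rank], bucket_cap_mb=25)
```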
Mass K-FAC refactor and repository changes

DevOps changes
- `kfac` requires `torch>=1.8` and Python>=3.7
- `tox` used for testing environments and automation
- `pre-commit` updated. Major changes include: prefer single quotes, `mypy`, `flake8` plugins
- `setup.cfg` for package metadata and `tox`/`flake8`/`mypy`/`coverage` configuration
- `requirements-dev.txt` that contains all dependencies needed to run the test suite

Code quality and testing
- Static type checking with `mypy`
- Testing utilities and unit tests live in `testing/` and `tests/`, respectively
- An integration test (run with `pytest`) that checks loss decreases when training with K-FAC (see the sketch below)
- An integration test (run with `pytest`) that verifies training with K-FAC achieves higher accuracy
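A sketch of what the loss-decrease check might look like; the test name, model, and preconditioner construction are hypothetical, and the repository's actual tests may differ:

```python
import torch

def test_loss_decreases_with_kfac() -> None:
    from kfac.preconditioner import KFACPreconditioner  # import path assumed

    model = torch.nn.Linear(10, 2).train()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    preconditioner = KFACPreconditioner(model)  # signature assumed

    x, y = torch.randn(64, 10), torch.randint(0, 2, (64,))
    losses = []
    for _ in range(20):
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()
        preconditioner.step()
        optimizer.step()
        losses.append(loss.item())
    assert losses[-1] < losses[0]
```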
`kfac` package improvements
- `KFACBaseLayer` handles general K-FAC computations and communications for an arbitrary layer
- `ModuleHelper` implementations provide a unified interface for interacting with supported PyTorch modules
- Each `KFACBaseLayer` instance is passed a `ModuleHelper` instance corresponding to the module in the model being preconditioned
- Module registration utilities live in the `kfac.layers.register` module
- Replaced the `comm` module with the `distributed` module, which provides a more exhaustive set of distributed communication utilities, including `get_rank` and `get_world_size` methods to enable K-FAC training when `torch.distributed` is not initialized (see the sketch below)
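A plausible shape for those fallbacks (illustrative; the real `distributed` module may differ):

```python
import torch.distributed as dist

def get_rank() -> int:
    # Fall back to single-process defaults so K-FAC still works when
    # torch.distributed has not been initialized.
    return dist.get_rank() if dist.is_initialized() else 0

def get_world_size() -> int:
    return dist.get_world_size() if dist.is_initialized() else 1
```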
- Added an `enums` module for convenience with type annotations
- `KFACBaseLayer` is now agnostic of its placement; it expects some other object to correctly execute its operations according to some placement strategy. This enables using a `KFACBaseLayer` without being beholden to a specific placement strategy.
- Added `BaseKFACPreconditioner`, which provides the minimal set of functionality for preconditioning with K-FAC: the `step()` method, hook registration to `KFACBaseLayer`, and some small bookkeeping functionality
- `BaseKFACPreconditioner` takes as input already registered `KFACBaseLayer`s and an initialized `WorkAssignment` object
- Added `reset_batch()` to clear the staged factors for the batch in the case of a bad batch of data (e.g., if the gradients overflowed)
- `memory_usage()` includes the intermediate factors accumulated for the current batch
- `state_dict` now includes K-FAC hyperparameters and steps in addition to factors
- Added `KFACPreconditioner`, a subclass of `BaseKFACPreconditioner`, that implements the full functionality described in the KAISA paper (see the usage sketch below)
- New `WorkAssignment` interface that provides a schematic for the methods needed by `BaseKFACPreconditioner` to determine where to perform computations and communications
- Added a `KAISAAssignment` implementation that provides the KAISA gradient-worker-fraction-based strategy
- `KFACParamScheduler` replaced with a `LambdaParamScheduler` modeled on PyTorch's `LambdaLRSchedule`; `BaseKFACPreconditioner` can be passed functions that return the current K-FAC hyperparameters rather than static float values
- Verbose output now uses `logging`, and `BaseKFACPreconditioner` takes an optional `loglevel` parameter (closes #33, "KFAC verbose should use logging instead of print")
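Putting the pieces together, a hedged usage sketch; the import path, constructor arguments, and the callable-hyperparameter form are assumptions based on the descriptions above:

```python
import torch
from kfac.preconditioner import KFACPreconditioner  # import path assumed

model = torch.nn.Linear(10, 2).train()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Hyperparameters may be static values or callables returning the current
# value, replacing the old KFACParamScheduler.
preconditioner = KFACPreconditioner(
    model,
    factor_update_steps=lambda step: 10 if step < 1000 else 100,
)

for step in range(100):
    optimizer.zero_grad()
    x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    preconditioner.step()  # precondition gradients in place
    optimizer.step()
```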
Example script changes
- Example dependencies are listed in `examples/requirements.txt`
- Usage instructions are in `examples/README.md`
- Examples updated to use the new `kfac` API

Other changes + future goals