[Redo][Unity] Split DecomposeOpsForTraining into two steps #16465
Conversation
This function should be used instead of `std::regex` within C++ call sites, to avoid ABI incompatibilities with pytorch. Currently, the pytorch wheels available through `pip install` use the pre-C++11 ABI by setting `-DUSE_CXX11_ABI=0` [0]. If TVM were to use the pre-C++11 ABI, this would cause breakages with dynamically-linked LLVM environments. Use of the `<regex>` header in TVM should therefore be avoided, as its implementation is not supported by gcc's dual ABI. This ABI incompatibility results in runtime errors either when `std::regex` is called from TVM, or when `std::regex` is called from pytorch, depending on which library was loaded first. This restriction can be removed once a version of pytorch compiled with `-DUSE_CXX11_ABI=1` is available from PyPI. [0] pytorch/pytorch#51039
This is a reapplication of apache#15954, after resolving the breakages that required reverting in apache#16442. The regex matching is now implemented without the `#include <regex>` from the C++ stdlib, to avoid ABI incompatibility with pytorch. Prior to this commit, the `DecomposeOpsForTraining` transform directly replaced `relax.nn.batch_norm` with more primitive relax operations. This required the decomposed form of `relax.nn.batch_norm` to be duplicated in `DecomposeOpsForInference`. This commit refactors the pass to occur in two steps: first applying training-specific mutations, then decomposing. A dedicated `DecomposeOps` pass also provides a single location for operator decomposition, which may be migrated into the operator definitions in the future, similar to `FLegalize`.
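For context on why the duplication arose, the arithmetic both passes ultimately need is the standard batch-norm decomposition. The pure-Python sketch below is my own illustration, not TVM code: the normalize-scale-shift core is identical in both modes, and only the training path adds anything (computing batch statistics and updating running averages), which is what makes a shared decomposition step attractive.

```python
def normalize(x, mean, var, gamma, beta, eps=1e-5):
    # The decomposed core shared by inference and training:
    # (x - mean) / sqrt(var + eps) * gamma + beta, elementwise.
    return [(xi - mean) / (var + eps) ** 0.5 * gamma + beta for xi in x]

def batch_norm_inference(x, running_mean, running_var, gamma, beta):
    # Inference normalizes with the stored running statistics only.
    return normalize(x, running_mean, running_var, gamma, beta)

def batch_norm_training(x, running_mean, running_var, gamma, beta,
                        momentum=0.1):
    # Training-specific mutation: compute batch statistics and update
    # the running averages, then reuse the same decomposed core.
    n = len(x)
    mean = sum(x) / n
    var = sum((xi - mean) ** 2 for xi in x) / n
    new_mean = (1 - momentum) * running_mean + momentum * mean
    new_var = (1 - momentum) * running_var + momentum * var
    return normalize(x, mean, var, gamma, beta), new_mean, new_var
```

In this framing, a training-specific pass only needs to introduce the statistics computation and running-average updates; the shared `normalize` step can live in a single decomposition pass instead of being copied into both transforms.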
Is having the two separate passes necessary for reducing code duplication for batch norm? It does come at the cost of an extra traversal.
It isn't strictly necessary, but I'd like to move in that direction as a first step in removing the duplication. Currently, there are two distinct transforms, `DecomposeOpsForInference` and `DecomposeOpsForTraining`, which each carry a copy of the `relax.nn.batch_norm` decomposition. Since the decomposition itself is shared, only the training-specific mutation needs to differ.
Ah, that seems like a good reason then. 👍 Some bigger simplifications in the works.
slyubomirsky
left a comment
These changes seem reasonable and, per your comment, set us up for further simplifications down the line.
Sounds good. Re-running CI, as I let the results get more stale than I'd like, then (assuming no new failures arise) merging in.