Unrolling tensor subclasses in fwd/bwd split #1489

crcrpar · 2024-11-28T09:36:52Z

What does this PR do?

In #1415 and #1394, tensor subclasses and their __torch_dispatch__ are unrolled before the forward-backward split.
It turns out that we want to defer the unrolling until the split itself, as unrolling beforehand seems to be harmful to backward generation.
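
For context, a toy sketch of what "unrolling" a tensor subclass means here: the wrapper is replaced by its plain inner values plus metadata, and can be rebuilt from them later. It only mimics the shape of PyTorch's `__tensor_flatten__` / `__tensor_unflatten__` protocol using plain Python values. `ToyScaledTensor`, `unroll`, and `reroll` are hypothetical names for illustration, not Thunder or torchao code.

```python
from dataclasses import dataclass


@dataclass
class ToyScaledTensor:
    """Toy stand-in for a tensor subclass (hypothetical, not torchao's Float8Tensor)."""

    data: list        # stands in for the inner low-precision tensor
    scale: float      # stands in for the inner scale tensor
    orig_dtype: str   # non-tensor metadata carried alongside

    def __tensor_flatten__(self):
        # Return the names of the inner "tensor" attributes and the opaque metadata.
        return ["data", "scale"], {"orig_dtype": self.orig_dtype}

    @staticmethod
    def __tensor_unflatten__(inner_tensors, metadata, outer_size=None, outer_stride=None):
        # Rebuild the wrapper from the inner values and the metadata.
        return ToyScaledTensor(
            inner_tensors["data"], inner_tensors["scale"], metadata["orig_dtype"]
        )


def unroll(t):
    """Replace a subclass instance with its plain inner values and metadata."""
    names, metadata = t.__tensor_flatten__()
    return {name: getattr(t, name) for name in names}, metadata


def reroll(cls, inner, metadata):
    """Inverse of unroll: reconstruct the subclass instance."""
    return cls.__tensor_unflatten__(inner, metadata)


original = ToyScaledTensor([1.0, 2.0], 0.5, "bfloat16")
inner, metadata = unroll(original)
restored = reroll(ToyScaledTensor, inner, metadata)
```

Once a trace has been unrolled this way, it only sees the inner values, which is why the point at which unrolling happens relative to the forward-backward split matters.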

TODO

support no autograd cases

note: pytorch/ao#1339 is used

Signed-off-by: Masaki Kozuki <[email protected]>

crcrpar · 2024-11-28T09:37:51Z

thunder/core/jit_ext.py

@@ -637,7 +637,7 @@ def _convert_pytorchfunc_to_thundertrace(
    trace = TraceCtx()
    trace.bound_symbols.extend(active_jit_ctx.computation_trace.pop_scope())
    func_result = unwrap(wrapped_func_result)
-    if shallow_copy_output:
+    if shallow_copy_output and not trace.bound_symbols:


Copied from #1485.

crcrpar · 2024-11-28T09:38:34Z

thunder/torch/__init__.py

+def _transpose_grad(a: TensorLike, /, dim0: int, dim1: int) -> TensorLike:
+    fwd = transpose(a, dim0, dim1)
+    g = get_grad(fwd)
+    a_grad = transpose(g, dim0, dim1)
+    put_grad(a, a_grad)
+    return fwd
+
+
+register_grad(transpose, _transpose_grad)


Related: #1487.

This is needed to avoid prims.permute.
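
The grad rule in this hunk relies on transpose being its own adjoint: transposing the incoming gradient over the same two dimensions gives the gradient for the input. A minimal pure-Python check of that identity for the 2D case, using hypothetical helpers `transpose2d` and `inner_product` (no Thunder APIs involved):

```python
def transpose2d(m):
    """Swap the two dimensions of a list-of-lists matrix."""
    return [list(row) for row in zip(*m)]


def inner_product(a, b):
    """Sum of elementwise products of two same-shaped matrices."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]        # input, shape 2x3
g = [[0.5, -1.0], [2.0, 0.0], [1.5, 3.0]]     # gradient w.r.t. the output, shape 3x2

# Mirrors the rule: a_grad = transpose(g, dim0, dim1)
a_grad = transpose2d(g)

# Adjoint identity: <transpose(a), g> == <a, transpose(g)>
lhs = inner_product(transpose2d(a), g)
rhs = inner_product(a, a_grad)
```

Both sides evaluate to the same scalar, which is what makes transposing `g` a valid gradient for `a`.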


crcrpar · 2024-11-28T11:30:18Z

thunder/tests/test_tensor_subclass.py

@@ -269,3 +266,5 @@ def test_torchao_float8_linear(executor, device, _):

    jitted = executor.make_callable(fp8_model)
    actual = jitted(x)
+
+    torch.testing.assert_close(actual, expected)


traces: https://gist.github.com/crcrpar/c682d624e2eed3c1805ceceaf7de830b


crcrpar added 6 commits November 27, 2024 17:42

allow enum in tree_flatten

0bd36d2

Signed-off-by: Masaki Kozuki <[email protected]>

tmp

10edd13

Signed-off-by: Masaki Kozuki <[email protected]>

flattening in fwd-bwd split

84de0fc

Signed-off-by: Masaki Kozuki <[email protected]>

register grad rule of ltorch.transpose tentatively

9a1f78d

Signed-off-by: Masaki Kozuki <[email protected]>

remove debug print from scaled_mm transform

84bb4ec

Signed-off-by: Masaki Kozuki <[email protected]>

fix TensorSubclass.__tensor_unflatten__

6fc88e9

Signed-off-by: Masaki Kozuki <[email protected]>


crcrpar and others added 3 commits November 28, 2024 20:21

flattening for non-autograd

a90075a

Signed-off-by: Masaki Kozuki <[email protected]>

check outputs

de76714

Signed-off-by: Masaki Kozuki <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

ec05978

for more information, see https://pre-commit.ci


crcrpar marked this pull request as ready for review November 28, 2024 11:33

crcrpar requested review from mruberry, lantiga and t-vi as code owners November 28, 2024 11:33

crcrpar removed request for lantiga, t-vi and mruberry November 28, 2024 11:35

crcrpar and others added 2 commits November 28, 2024 20:55

use more executors

f582143

Signed-off-by: Masaki Kozuki <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

308b8f4

for more information, see https://pre-commit.ci

crcrpar merged commit 06ee30e into crpa/subclass-torchao_float8tensor Nov 28, 2024
29 of 36 checks passed

crcrpar deleted the crpa/torchao-fp8tensor-flattening-in-fwdbwd-split branch November 28, 2024 12:12

crcrpar added a commit that referenced this pull request Nov 28, 2024

Unrolling tensor subclasses in fwd/bwd split (#1489)

14ccf6b

Signed-off-by: Masaki Kozuki <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
