
Deterministic backward tracing #227

Open
ezyang opened this issue Sep 13, 2017 · 1 comment
ezyang commented Sep 13, 2017

Backward tracing is nondeterministic because autograd functions execute in parallel, so the order in which they are recorded varies from run to run. I hope I don't need to say why nondeterminism is bad.

In #217, I proposed that we topsort our trace in order to squash nondeterminism. But there's a problem when you do this on forwards: if we perform a topsort, we destroy our recording of the topological order operations were originally run in, and we are currently operating under the assumption that this topological order is useful and should be preserved as much as possible.

The situation is more nuanced for backwards. Inside a single (transparent) backward() operation, there will be some sequence of autograd function calls whose topological order we want to preserve. However, if we consider multiple backwards functions executing in parallel, there is some (nondeterministic) interleaving due to thread scheduling, and we don't care about preserving this interleaving. To put it in other words, we care about preserving deterministic topological order, and we want to discard nondeterministic topological order.
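One way to realize "preserve deterministic order, discard nondeterministic interleaving" is a topological sort that breaks ties by the per-thread sequence number each node was recorded with. The sketch below (hypothetical names, not PyTorch's actual tracer) uses Kahn's algorithm with a heap keyed on recording order, so any two nodes whose relative order is forced by data dependencies keep it, and independent nodes fall back to a deterministic key:

```python
import heapq

def stable_topsort(nodes, edges, seq):
    """Topologically sort `nodes`, breaking ties among ready nodes
    by the recording sequence number `seq[node]`.

    `edges` maps a node to its list of successors. This is a sketch
    of the tie-breaking idea, not the tracer's real data structures.
    """
    indegree = {n: 0 for n in nodes}
    for n in nodes:
        for m in edges.get(n, ()):
            indegree[m] += 1
    # Nodes with no pending predecessors, ordered by recording order.
    ready = [(seq[n], n) for n in nodes if indegree[n] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, n = heapq.heappop(ready)
        order.append(n)
        for m in edges.get(n, ()):
            indegree[m] -= 1
            if indegree[m] == 0:
                heapq.heappush(ready, (seq[m], m))
    return order

# Two independent ops recorded by one backward() call: the output
# order depends only on `seq`, not on thread scheduling or on the
# order `nodes` happens to be listed in.
print(stable_topsort(["a", "b", "c"],
                     {"a": ["c"], "b": ["c"]},
                     {"a": 1, "b": 0, "c": 2}))  # ['b', 'a', 'c']
```

The point of keying on `seq` rather than, say, node names is that `seq` can record the deterministic within-call order while ignoring the cross-thread interleaving.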

ezyang commented Nov 8, 2017

test_backward_opaque, as seen in this build log https://travis-ci.org/pytorch/pytorch/jobs/298935108, appears to be affected by this problem:

======================================================================
FAIL: test_backward_opaque (__main__.TestJit)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_jit.py", line 409, in test_backward_opaque
    self.assertExpected(str(trace))
  File "/home/travis/build/pytorch/pytorch/test/common.py", line 330, in assertExpected
    self.assertMultiLineEqual(expected, s)
AssertionError: 'grap[246 chars]-1](%2, %4), uses = [%0.i1];\n  %6 : Double(3,[69 chars]n}\n' != 'grap[246 chars]-1](%4, %1), uses = [%0.i2];\n  %6 : Double(3,[69 chars]n}\n'
  graph(%1 : Double(3, 3)
        %2 : Double(3, 3)
        -------- stage 1 --------
        %4 : Double(3, 3)) {
    %3 : Double(3, 3) = cross[dim=-1](%1, %2), uses = [%0.i0];
    ---------------- stage 1 ----------------
-   %5 : Double(3, 3) = cross[dim=-1](%2, %4), uses = [%0.i1];
-   %6 : Double(3, 3) = cross[dim=-1](%4, %1), uses = [%0.i2];
?    ^
+   %5 : Double(3, 3) = cross[dim=-1](%4, %1), uses = [%0.i2];
?    ^
+   %6 : Double(3, 3) = cross[dim=-1](%2, %4), uses = [%0.i1];
-   return (%3, %5, %6);
?                 ----
+   return (%3, %6, %5);
?               ++++
  }

The backward of cross performs two crosses, but their order of execution is nondeterministic.
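For c = cross(a, b), the two backward crosses in the trace are grad_a = cross(b, g) and grad_b = cross(g, a), which matches the `cross[dim=-1](%2, %4)` and `cross[dim=-1](%4, %1)` nodes above (with %1 = a, %2 = b, %4 = the incoming gradient). A quick NumPy finite-difference check of those formulas, on the scalar L = g · cross(a, b):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, g = rng.standard_normal((3, 3))

# The two crosses emitted by the backward of cross(a, b):
grad_a = np.cross(b, g)
grad_b = np.cross(g, a)

# Finite-difference check against L = g . cross(a, b).
def L(a, b):
    return g @ np.cross(a, b)

eps = 1e-6
fd_a = np.array([(L(a + eps * e, b) - L(a - eps * e, b)) / (2 * eps)
                 for e in np.eye(3)])
fd_b = np.array([(L(a, b + eps * e) - L(a, b - eps * e)) / (2 * eps)
                 for e in np.eye(3)])
assert np.allclose(fd_a, grad_a, atol=1e-4)
assert np.allclose(fd_b, grad_b, atol=1e-4)
```

Since grad_a and grad_b have no data dependency on each other, nothing in the graph forces one to be recorded before the other, which is exactly why the trace comparison above flakes.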
