Deterministic backward tracing #227

ezyang · 2017-09-13T17:56:10Z

Backwards tracing is nondeterministic because autograd functions are executed in parallel, so the order in which they are recorded can change from run to run. I hope I don't need to say why nondeterminism is bad.

In #217, I proposed that we topsort our trace in order to squash nondeterminism. But there's a problem when you do this on forwards: if we perform a topsort, we destroy our recording of the order in which operations were originally run, and we are currently operating under the assumption that this original order is useful and should be preserved as much as possible.

The situation is more nuanced for backwards. Inside a single (transparent) backward() operation, there will be some sequence of autograd function calls whose topological order we want to preserve. However, if we consider multiple backward functions executing in parallel, there is some (nondeterministic) interleaving due to thread scheduling, and we don't care about preserving this interleaving. In other words, we care about preserving deterministic topological order, and we want to discard nondeterministic topological order.
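One way to read "keep the deterministic order, discard the interleaving" is sketched below in plain Python. It is only an illustration, not the tracer's actual code: the record layout (`name`, `inputs`, `func`, `seq`) and the function `canonical_order` are hypothetical, and it assumes every backward function can be given a deterministic identifier `func` (for example, derived from its position in the forward graph), which this issue does not specify. Ops from the same backward function are chained in their recorded order, and ties between different functions are broken by `(func, seq)` instead of by arrival time.

```python
import heapq


def canonical_order(ops):
    """Return op names in an order that does not depend on thread interleaving.

    Each op is a dict with hypothetical keys:
      name   - unique name of the op
      inputs - names of ops whose outputs it consumes
      func   - deterministic id of the backward function that ran it (assumed)
      seq    - index of the op within that backward function
    """
    by_name = {op["name"]: op for op in ops}
    deps = {op["name"]: {i for i in op["inputs"] if i in by_name} for op in ops}

    # Preserve the order inside a single backward function by chaining its ops.
    per_func = {}
    for op in ops:
        per_func.setdefault(op["func"], []).append(op)
    for group in per_func.values():
        group.sort(key=lambda o: o["seq"])
        for prev, nxt in zip(group, group[1:]):
            deps[nxt["name"]].add(prev["name"])

    users = {name: [] for name in by_name}
    for name, ds in deps.items():
        for d in ds:
            users[d].append(name)

    # Kahn's algorithm; the heap breaks ties by (func, seq), not arrival order.
    remaining = {name: len(ds) for name, ds in deps.items()}
    ready = [(by_name[n]["func"], by_name[n]["seq"], n)
             for n, count in remaining.items() if count == 0]
    heapq.heapify(ready)

    order = []
    while ready:
        _, _, name = heapq.heappop(ready)
        order.append(name)
        for user in users[name]:
            remaining[user] -= 1
            if remaining[user] == 0:
                op = by_name[user]
                heapq.heappush(ready, (op["func"], op["seq"], user))

    if len(order) != len(ops):
        raise ValueError("dependency cycle in recorded ops")
    return order


# Two recordings of the same work, interleaved differently by the scheduler.
a0 = {"name": "A0", "inputs": [], "func": 0, "seq": 0}
a1 = {"name": "A1", "inputs": ["A0"], "func": 0, "seq": 1}
b0 = {"name": "B0", "inputs": [], "func": 1, "seq": 0}
b1 = {"name": "B1", "inputs": ["B0"], "func": 1, "seq": 1}
run_one = [a0, b0, a1, b1]
run_two = [b0, a0, b1, a1]
same = canonical_order(run_one) == canonical_order(run_two)
```

Both recordings canonicalize to the same sequence because the result depends only on the dependency edges and the `(func, seq)` keys, never on the position of an op in the recorded list.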

ezyang · 2017-11-08T06:38:40Z

test_backward_opaque as seen in this build log https://travis-ci.org/pytorch/pytorch/jobs/298935108 seems to be affected by this problem:

======================================================================
FAIL: test_backward_opaque (__main__.TestJit)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_jit.py", line 409, in test_backward_opaque
    self.assertExpected(str(trace))
  File "/home/travis/build/pytorch/pytorch/test/common.py", line 330, in assertExpected
    self.assertMultiLineEqual(expected, s)
AssertionError: 'grap[246 chars]-1](%2, %4), uses = [%0.i1];\n  %6 : Double(3,[69 chars]n}\n' != 'grap[246 chars]-1](%4, %1), uses = [%0.i2];\n  %6 : Double(3,[69 chars]n}\n'
  graph(%1 : Double(3, 3)
        %2 : Double(3, 3)
        -------- stage 1 --------
        %4 : Double(3, 3)) {
    %3 : Double(3, 3) = cross[dim=-1](%1, %2), uses = [%0.i0];
    ---------------- stage 1 ----------------
-   %5 : Double(3, 3) = cross[dim=-1](%2, %4), uses = [%0.i1];
-   %6 : Double(3, 3) = cross[dim=-1](%4, %1), uses = [%0.i2];
?    ^
+   %5 : Double(3, 3) = cross[dim=-1](%4, %1), uses = [%0.i2];
?    ^
+   %6 : Double(3, 3) = cross[dim=-1](%2, %4), uses = [%0.i1];
-   return (%3, %5, %6);
?                 ----
+   return (%3, %6, %5);
?               ++++
  }

The backward of cross performs two crosses, one per input gradient. Neither depends on the other, so the order in which they execute (and are recorded in the trace) is nondeterministic.
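The two stage 1 operations in the log, cross(%2, %4) and cross(%4, %1), are consistent with the standard gradients of a cross product: for inputs a, b and incoming gradient g, the gradient with respect to a is b × g and the gradient with respect to b is g × a. The sketch below checks this in plain Python on a single 3-vector (the log uses 3x3 tensors with dim=-1, so each row behaves like this). The helpers `cross`, `dot`, `loss`, and `bump` are illustrative names, not PyTorch API.

```python
def cross(u, v):
    """Cross product of two 3-vectors given as tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    """Dot product of two equal-length tuples."""
    return sum(x * y for x, y in zip(u, v))


a = (1, 2, 3)    # plays the role of %1
b = (4, 5, 6)    # plays the role of %2
g = (7, 8, 10)   # incoming gradient, plays the role of %4

# The two backward ops; neither reads the other's result,
# so either one may legally run (and be traced) first.
grad_a = cross(b, g)   # corresponds to cross(%2, %4)
grad_b = cross(g, a)   # corresponds to cross(%4, %1)


def loss(a_, b_):
    """Scalar whose gradients w.r.t. a_ and b_ are the two crosses above."""
    return dot(g, cross(a_, b_))


def bump(v, i):
    """Copy of v with component i increased by 1."""
    return tuple(x + (1 if j == i else 0) for j, x in enumerate(v))


# loss is linear in each argument, so a unit step gives the exact derivative.
numeric_a = tuple(loss(bump(a, i), b) - loss(a, b) for i in range(3))
numeric_b = tuple(loss(a, bump(b, i)) - loss(a, b) for i in range(3))
```

Because `grad_a` and `grad_b` share no data dependency, any topological order of the backward graph may place either one first, which is exactly the swap of %5 and %6 seen in the failing test.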
