Backwards tracing is nondeterministic because autograd functions are executed in parallel, so the order in which they get recorded is not fixed. I hope I don't need to say why nondeterminism is bad.
In #217, I proposed that we topsort our trace in order to squash nondeterminism. But there's a problem when you do this on forwards: if we perform a topsort, we destroy our record of the topological order in which operations were originally run, and we are currently operating under the assumption that this topological order is useful and should be preserved as much as possible.
The situation is more nuanced for backwards. Inside a single (transparent) backward() operation, there will be some sequence of autograd function calls whose topological order we want to preserve. However, if we consider multiple backwards functions executing in parallel, there is some (nondeterministic) interleaving due to thread scheduling, and we don't care about preserving this interleaving. In other words, we care about preserving deterministic topological order, and we want to discard nondeterministic topological order.
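To make the distinction concrete, here is a minimal sketch of one way to do it. It is not the tracer's actual data structure or algorithm: `Node`, `stream`, `seq`, and `canonical_order` are all hypothetical names. It assumes each traced node remembers which backward call ran it (`stream`) and its position within that call (`seq`). A topsort that breaks ties on `(stream, seq)` keeps the order inside each backward call but gives the same result regardless of how the threads happened to interleave.

```python
import heapq
from typing import NamedTuple


class Node(NamedTuple):
    name: str
    deps: tuple   # names of the nodes this node consumes
    stream: int   # hypothetical: id of the backward call that ran this node
    seq: int      # hypothetical: position of this node within that call


def canonical_order(trace):
    """Topologically sort `trace`, breaking ties on (stream, seq).

    The tie-break depends only on per-stream information, so the result
    does not depend on the order in which `trace` was recorded.
    """
    by_name = {n.name: n for n in trace}
    pending = {n.name: len(n.deps) for n in trace}
    users = {n.name: [] for n in trace}
    for n in trace:
        for d in n.deps:
            users[d].append(n.name)

    # Min-heap of nodes whose dependencies have all been emitted.
    ready = [(n.stream, n.seq, n.name) for n in trace if not n.deps]
    heapq.heapify(ready)

    out = []
    while ready:
        _, _, name = heapq.heappop(ready)
        out.append(name)
        for u in users[name]:
            pending[u] -= 1
            if pending[u] == 0:
                m = by_name[u]
                heapq.heappush(ready, (m.stream, m.seq, m.name))
    return out


a0 = Node("a0", (), 0, 0)
a1 = Node("a1", ("a0",), 0, 1)
b0 = Node("b0", (), 1, 0)
b1 = Node("b1", ("b0",), 1, 1)
c = Node("c", ("a1", "b1"), 0, 2)

# Two recordings of the same computation with different thread interleavings.
run1 = [a0, b0, a1, b1, c]
run2 = [b0, a0, b1, a1, c]

order1 = canonical_order(run1)
order2 = canonical_order(run2)
print(order1 == order2)
```

Both recordings sort to the same sequence, and `a0` still precedes `a1` just as `b0` precedes `b1`. The price is that the cross-stream interleaving is thrown away, which is exactly the part argued above to be not worth preserving.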