
Numerical discrepancy when calculating the Jacobian on TPU vs CPU #11069

Closed

schainmariano opened this issue Jun 11, 2022 · 2 comments

@schainmariano

Calculating a Jacobian returns substantially different answers on CPU vs. TPU.

In the example below, we compute the Jacobian, sum its rows, and compare the result to the gradient of the sum (computed with jax.grad). On CPU the two answers are almost identical; on TPU, however, there is a large discrepancy.

import jax
import jax.numpy as jnp

rng = jax.random.PRNGKey(0)
n = 1000
m = 10000

A = jax.random.normal(rng, (n, m))


def f(x):
  return jnp.dot(A, x)

# Jacobian of f: an (n, m) matrix, computed by reverse-mode autodiff.
g = jax.jit(jax.jacrev(f))
# Gradient of sum(f(x)); mathematically equal to the Jacobian summed over its rows.
g_sum = jax.jit(jax.grad(lambda x: jnp.sum(f(x))))

x0 = jax.random.normal(rng, (m,))

print("max numerical error = ", jnp.max(jnp.abs(jnp.sum(g(x0), axis=0) - g_sum(x0))))

When running this on TPU we get

>>> max numerical error =  0.21506119

and on CPU we get

>>> max numerical error =  0.0001449585
@sharadmv
Collaborator

sharadmv commented Jun 12, 2022

I think this is a matmul precision issue.

TPUs do bf16 matmuls by default. You can use the jax.default_matmul_precision context manager/flag to set the default precision to 'float32', which should bring the results much closer to CPU (at the cost of slower matmuls).

# Matmuls traced inside this context run at float32 precision on TPU
# (x and y here stand for any two arrays being multiplied).
with jax.default_matmul_precision('float32'):
  z = jnp.dot(x, y)
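
The same knob is also available as a config flag; a minimal sketch of the global form, assuming you want the setting process-wide rather than scoped:

# Global equivalent of the context manager: affects every matmul traced
# after this point in the process.
jax.config.update('jax_default_matmul_precision', 'float32')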

Alternatively you can pass a precision directly into jax.lax.dot_general
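
For instance, a sketch of the per-call form, using the same illustrative x and y; jnp.dot forwards its precision argument to lax.dot_general:

# Per-call precision via jnp.dot, which forwards `precision` to lax.dot_general.
z = jnp.dot(x, y, precision=jax.lax.Precision.HIGHEST)

# Or call lax.dot_general directly; these dimension numbers contract the
# last axis of x against the first axis of y, with no batch axes.
z = jax.lax.dot_general(
    x, y,
    dimension_numbers=(((x.ndim - 1,), (0,)), ((), ())),
    precision=jax.lax.Precision.HIGHEST)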

@hawkinsp
Collaborator

Correct. This is almost certainly related to the default matmul precision on TPU.
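
For reference, a sketch (an editorial addition, not from the thread) applying the context manager to the repro above; the first calls, and hence the jit traces, happen inside the context, so the float32 setting takes effect:

# Re-trace the repro under float32 matmul precision; the TPU discrepancy
# should shrink to roughly the CPU level.
with jax.default_matmul_precision('float32'):
  g32 = jax.jit(jax.jacrev(f))
  g32_sum = jax.jit(jax.grad(lambda x: jnp.sum(f(x))))
  err = jnp.max(jnp.abs(jnp.sum(g32(x0), axis=0) - g32_sum(x0)))
print("max numerical error = ", err)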

Closing as a duplicate of #7010
