Replies: 1 comment 1 reply
-
Thanks for the question. However, I think we fundamentally need to compute `f(x)` in the forward pass. For example, say we compose your `f` with some outer function `g`:

```python
def gf(x):
    return g(f(x))
```

Then the value-and-gradient is:

```python
def gf_vjp(x):
    # forward pass
    y, res = f_fwd(x)
    z, g_vjp = jax.vjp(g, y)  # !!!
    # backward pass
    v, = g_vjp(1.0)
    bwd_val, = f_bwd(res, v)
    return z, bwd_val
```

On the line marked `!!!`, we already need `y` (the value of `f(x)`) just to set up `g`'s VJP, before the backward pass can even begin. What do you think? Or did I misunderstand?
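To make this concrete, here is a runnable sketch with stand-in choices `f = sin` and `g = exp` (my assumptions, not from the thread), checking the hand-rolled composition above against `jax.grad`:

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sin(x)   # stand-in for the expensive function

def g(y):
    return jnp.exp(y)   # stand-in outer function

def f_fwd(x):
    y = f(x)
    return y, (x,)      # residuals for the backward pass

def f_bwd(res, v):
    (x,) = res
    return (v * jnp.cos(x),)

def gf_vjp(x):
    # forward pass: y = f(x) must exist before jax.vjp(g, y) can run
    y, res = f_fwd(x)
    z, g_vjp = jax.vjp(g, y)
    # backward pass
    v, = g_vjp(1.0)
    bwd_val, = f_bwd(res, v)
    return z, bwd_val

# sanity check against autodiff of the plain composition
z, dz = gf_vjp(0.3)
assert jnp.allclose(dz, jax.grad(lambda t: g(f(t)))(0.3))
```

The point of the sketch is the ordering: `jax.vjp(g, y)` consumes `y` as a concrete value, so `f(x)` has to be evaluated before any backward work on `g` can start.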
-
I have an application where my function `f(x)` is very expensive and must be computed in both the forward and backward pass of the VJP. I could benefit from computing the return value of `f_fwd` in the `bwd` function instead, since that would let me exploit parallelism when computing the fwd and bwd values. However, I don't see any way to do this within the constraints of `jax.custom_vjp`. Does anyone have any clever ideas for avoiding the computation of `f(x)` in `f_fwd` and instead computing it in `f_bwd`, as in the contrived example below? Thanks!

A few notes:

- The `bwd` return value depends on the VJP input `v` in a non-trivial way and therefore can't be computed in `f_fwd`.

I'd appreciate it if anyone has any thoughts!
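The poster's contrived example didn't survive in this excerpt. As a hedged illustration only (stand-in functions and names, not the original code), the `jax.custom_vjp` pattern being described, where the expensive `f(x)` is forced to run inside `f_fwd` and its residuals are consumed by `f_bwd`, looks roughly like this:

```python
import jax
import jax.numpy as jnp

@jax.custom_vjp
def f(x):
    return jnp.sin(x)   # stand-in for the expensive computation

def f_fwd(x):
    y = jnp.sin(x)      # the expensive f(x), computed in the forward pass
    return y, (x,)      # residuals saved for the backward pass

def f_bwd(res, v):
    (x,) = res
    # the cotangent v enters non-trivially here,
    # so this value can't be precomputed in f_fwd
    return (v * jnp.cos(x),)

f.defvjp(f_fwd, f_bwd)
```

Under this API the forward rule always runs before the backward rule, which is exactly the constraint the question is asking how to work around.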