Confused about JIT-compilation of nested functions #21297

dfdx · 2024-05-18T19:58:32Z

Does JAX use a cache when JIT-compiling nested functions? Consider this example:

import jax
import jax.numpy as jnp


def outer(a: jax.Array):
    def inner(x: jax.Array):
        print("compiling")
        return x * 2
    # print(f"id(inner) = {id(inner)}")
    # print(f"hash(inner) = {hash(inner)}")
    jitted_inner = jax.jit(inner)
    return jitted_inner(a)

a = jax.random.normal(jax.random.key(0), (3, 4))
outer(a)

When calling outer(), I'd expect inner() to be compiled just once and then reused from the cache. But in practice, "compiling" is printed on every call.

Some sources say that jax.jit() uses the function ID as a cache key, others mention the hash. In any case, I printed both (uncomment the print() statements in the code above), and indeed both values change on every call to outer().

If Python re-defines nested functions on each call, how do I make JIT cache them?

Full output after 3 calls:

In [48]: outer(a);
id(inner) = 134661989421488
hash(inner) = 8416374338843
compiling

In [49]: outer(a);
id(inner) = 134662026406512
hash(inner) = 8416376650407
compiling

In [50]: outer(a);
id(inner) = 134661989432864
hash(inner) = 8416374339554
compiling

System info:

Python 3.10.12
JAX 0.4.26

dfm · 2024-05-19T10:22:06Z
Collaborator

As you've identified here, since inner is being redefined in each call to outer, it is treated as a new function by JAX, and you get a cache miss. In this particular example, I would move the definition of inner up to the global scope:

def inner(x: jax.Array):
    print("compiling")
    return x * 2

def outer(a: jax.Array):
    jitted_inner = jax.jit(inner)
    return jitted_inner(a)

which would work as you intend. To be completely explicit I would probably also move the jit decorator to inner:

@jax.jit
def inner(x: jax.Array):
    print("compiling")
    return x * 2

def outer(a: jax.Array):
    return inner(a)

but the former seems to do the trick as well.

In this simple example, I don't see any reason why you wouldn't want to refactor like this, but I'm not sure how easily this generalizes to your case. Either way, I hope it helps!
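One way to check that this refactor really produces a cache hit is to count how often the Python body of inner runs, since that body only executes while JAX traces the function. The following is a minimal sketch, not part of the original answer; trace_count is a hypothetical counter added purely for illustration.

```python
import jax
import jax.numpy as jnp

trace_count = 0  # hypothetical counter, for illustration only


def inner(x: jax.Array):
    global trace_count
    trace_count += 1  # Python side effect: runs only while JAX traces inner
    return x * 2


def outer(a: jax.Array):
    # inner is the same module-level function object on every call, so
    # re-wrapping it with jax.jit can still find the cached result
    return jax.jit(inner)(a)


a = jnp.ones((3, 4))
for _ in range(3):
    result = outer(a)

print(trace_count)
```

If the cache is hit as described above, the counter should stay at 1 after all three calls; with inner defined inside outer it would grow on every call.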

dfdx May 19, 2024
Author

Indeed, this code is just a simplified example of a real use case. In reality, I'm looking at the implementation of greedy search by Huggingface, which uses two inner functions, greedy_search_body_fn and greedy_search_cond_fn, both of which are closures over the model and its parameters. These functions are implicitly compiled in lax.while_loop, but the performance isn't great. After a few experiments I realized the inner functions are actually re-compiled on every call, and so here we are.

More generally, many JAX/Flax tutorials make use of nested functions (example). If inner functions are re-defined on each call, does it mean the only proper way to handle them is to apply JIT to the outermost ones?

dfm May 19, 2024
Collaborator

That's what I expected! Yeah, like you say, the usual advice here would be to move the jit as high up the stack as you can. For example, in the flax examples that you link to, the jit is applied to the training step, e.g.:

@jax.jit
def outer(...):
  def inner(...):
    ...
  inner(...)

in which case inner is only compiled once!

But, there are cases where this won't necessarily work (e.g. long compile times, etc.). In that case, maybe you could try converting the closure into a compiled function (at the global level) which takes the relevant parameters as static arguments, which should also lead to a cache hit.
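The static-argument suggestion could look roughly like the sketch below. It is only an illustration under assumptions: scaled, scale, and trace_count are hypothetical names, and scale stands in for whatever value the closure used to capture. Static arguments must be hashable, so this pattern fits configuration-like values; arrays such as model parameters would normally be passed as ordinary traced arguments instead.

```python
from functools import partial

import jax
import jax.numpy as jnp

trace_count = 0  # hypothetical counter, for illustration only


# Module-level function: the value the closure used to capture (a
# hypothetical `scale`) is now passed explicitly and marked as static.
@partial(jax.jit, static_argnames=("scale",))
def scaled(x: jax.Array, scale: int):
    global trace_count
    trace_count += 1  # Python side effect: runs only while tracing
    return x * scale


def outer(a: jax.Array, scale: int = 2):
    return scaled(a, scale=scale)


a = jnp.ones((3, 4))
outer(a)
outer(a)  # same shape and same static value: cache hit
traces_same = trace_count
outer(a, scale=3)  # new static value: traced again
traces_new = trace_count
```

Here the second call reuses the first trace, while changing the static value triggers a new one, so frequently changing values are a poor fit for static arguments.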

Answer selected by dfdx

dfdx May 19, 2024
Author

It would require quite some refactoring, but it should eventually work. Thank you for the clarification!
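For the lax.while_loop case mentioned earlier in the thread, moving the jit to the enclosing function might look like the sketch below. This is not the Huggingface implementation: double_until, cond_fn, body_fn, and body_traces are hypothetical stand-ins, and the loop body is deliberately trivial.

```python
import jax
import jax.numpy as jnp
from jax import lax

body_traces = 0  # hypothetical counter, for illustration only


@jax.jit
def double_until(x, limit):
    # cond_fn and body_fn are closures over `limit`. They are re-created
    # whenever the Python body of double_until runs, which under jit
    # happens only while tracing.
    def cond_fn(val):
        return jnp.max(val) < limit

    def body_fn(val):
        global body_traces
        body_traces += 1  # Python side effect: runs only while tracing
        return val * 2

    return lax.while_loop(cond_fn, body_fn, x)


x = jnp.ones(3)
first = double_until(x, 10.0)
traces_after_first = body_traces
second = double_until(x, 10.0)
traces_after_second = body_traces
```

With the jit on the outer function, the second call with the same input shapes should not trace body_fn again, so the two counters match. The trade-off is that the whole outer function is compiled as one unit.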

JeffGreen Sep 3, 2024

As a follow-up, I'd like to understand why / how functions are designated "new" functions in JAX.

If I declare two lambdas with identical source code, JAX will consider these to be two different functions even though they produce the same outputs, will have the same JIT representation, etc. The implication here is that JAX is hashing / caching based not on the function's arguments / content (i.e. inspect.getsource() or related), but instead on the declaration / Callable id() itself. In an effort to understand JAX and its design decisions more deeply (and having run into this issue myself!), I'm curious why JAX caches based on function declaration / ID as opposed to content.

Thanks.

jakevdp Sep 3, 2024
Maintainer

Yes, you're correct: JAX caches the JIT-compilation artifact based on the hash/equality of the function object being compiled, not based on the content of the function. What that means, for example, is that this code will lead to 10 compilations:

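x = jax.numpy.ones(3)  # any array input; assumes `import jax`
for i in range(10):
  # a fresh lambda object is created each iteration, so each call misses the cache
  jax.jit(lambda x: x)(x)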
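The identity-based behaviour can be seen in plain Python, and the usual way around it is to wrap the function once and reuse the wrapper. A minimal sketch, where f, g, add_one, and traces are hypothetical names used only for illustration:

```python
import jax
import jax.numpy as jnp

f = lambda x: x + 1
g = lambda x: x + 1  # identical source, but a separate function object

same_object = f is g  # False
equal = f == g        # False: Python functions compare by identity

traces = 0  # hypothetical counter, for illustration only


def add_one(x):
    global traces
    traces += 1  # Python side effect: runs only while tracing
    return x + 1


jitted = jax.jit(add_one)  # wrap once, outside the loop
x = jnp.arange(3.0)
for _ in range(10):
    y = jitted(x)  # the same jitted object is reused on every iteration
```

Because the same function object is reused, add_one should be traced only on the first iteration.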

JeffGreen Sep 3, 2024

Thanks. I'm not familiar with the innards of JAX - would it be prohibitively difficult to hash based on content instead so that we could avoid redundant recompilations as above?
