Implementing padding inside jit function #22458

tchatow · 2024-07-15T18:06:49Z

tchatow
Jul 15, 2024

The use case I'm describing is a function implemented in jax, lowered/exported to HLO, and executed with the XLA C++ runtime. I couldn't get shape polymorphism to load correctly with XLA, though it doesn't solve my use anyway since I need ahead-of-time compilation. I used Jax to export functions for varying powers of 2 of the input dimensions, and then select the smallest one at runtime. This works well, but for some functions I need to pad the inputs. Here are some things I've considered:

Pad the arrays as a preprocessing step before invoking XLA Execute. However, I would prefer the padding logic remain part of the HLO function so the runtime needs no knowledge of the padding scheme.
A middle ground, to keep the padding logic in the HLO is to have the runtime build a mask for each input. This requires additional memory and can't be inlined.
The next attempt adds a parameter to the function, actual_shapes, which the runtime populates with the real dimensions of each parameter. For instance, all within a jitted function:

# args is a list of input arrays (already sized up to powers of 2)
# padding is a static array with the fill value for each input
# actual_shapes is a matrix which contains the actual shapes of each input (this is runtime, not static)
new_args = []
# Loop is statically unrolled
for idx, arg in enumerate(args):
    # Number of dimensions of the argument (static)
    ndim = arg.ndim
    # Actual shape of the argument, as determined at runtime
    actual_shape = actual_shapes[:,idx][0:ndim]

    # Construct mask with the power-of-2 shape, then set values outside the runtime shape to True
    mask = jax.numpy.zeros_like(arg, dtype=jax.numpy.bool).at[tuple(slice(0, i) for i in actual_shape)].set(True)

    new_args.append(jax.numpy.where(mask, arg, padding[idx]))

This, however, doesn't work since slicing needs to be static. I can't find a correct dynamic slice method either since they require the shape to static. We can try to construct the mask a different way by building up each dimension:

mask = 1
for i, (static_len, dyn_len) in enumerate(zip(arg.shape, actual_shape)):
    mask = mask * (jax.numpy.arange(static_len).reshape(-1, *(1,)*(ndim-i-1)) < dyn_len)

The matrix multiplication is slow, but works. We could also try successive where's on the argument to avoid constructing the large temporary mask.

for i, (static_len, dyn_len) in enumerate(zip(arg.shape, actual_shape)):
    mask = (jax.numpy.arange(static_len).reshape(-1, *(1,)*(ndim-i-1)) < dyn_len)
    arg = jax.numpy.where(mask, arg, padding[idx])

This problem doesn't seem like it should be difficult or expensive - after all, a simple comparison of the thread idx in a CUDA function with the runtime shape would perform this operation. I'm not sure exactly what the equivalent in CPU code would be.

How can I implement this padding in pure Jax?

Answered by jakevdp

Jul 22, 2024

Well, this maps pretty directly to the operations the GPU will have to do to construct the output array, so I'm not sure what alternatives could exist.

View full answer

jakevdp · 2024-07-22T16:06:57Z

jakevdp
Jul 22, 2024
Maintainer

I don't know of any way to do this beyond the methods you've already suggested.

One tweak, though: rather than element-wise multiplication for the mask, you might use element-wise and &. Also meshgrid is helpful to construct the indices:

mask = 1
indices = jnp.meshgrid(*(jnp.arange(s) for s in shape), sparse=True, indexing='ij')
for i, dyn_len in zip(indices, actual_shape):
  mask &= (i < dyn_len)

2 replies

tchatow Jul 22, 2024
Author

Thanks, I did arrive at the & mask as well. It feels backward to check whether each element is masked or not, rather than an imperative-style fill of the padded values. However I haven't done much benchmarking yet to prove whether it's a problem.

jakevdp Jul 22, 2024
Maintainer

Well, this maps pretty directly to the operations the GPU will have to do to construct the output array, so I'm not sure what alternatives could exist.

Answer selected by tchatow

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implementing padding inside jit function #22458

{{title}}

Replies: 1 comment 2 replies

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

Select a reply

Implementing padding inside jit function #22458

tchatow Jul 15, 2024

Replies: 1 comment · 2 replies

jakevdp Jul 22, 2024 Maintainer

tchatow Jul 22, 2024 Author

jakevdp Jul 22, 2024 Maintainer

tchatow
Jul 15, 2024

Replies: 1 comment 2 replies

jakevdp
Jul 22, 2024
Maintainer

tchatow Jul 22, 2024
Author

jakevdp Jul 22, 2024
Maintainer