How can I write a Pallas kernel with no pure out-arguments? #23272

axelfeldmann · 2024-08-27T15:07:29Z


Hi,

Suppose I want to write a Pallas matmul kernel that computes C += A @ B. I would like to pass A, B, and C to the kernel as in/out arguments, maybe something like this:

```python
kernel = functools.partial(kernel_func)
pl.pallas_call(kernel,
    grid=grid, in_specs=in_specs,
    out_specs=[ ],
    out_shape=[ ], name="matmul"
)(A, B, C)
```

This means leaving out_specs and out_shape empty, but then the kernel doesn't actually do anything: C is unchanged at the end.

I am able to get around this by creating a dummy "pure out argument" like this:

```python
dummy = jnp.zeros((1, 1), dtype=A.dtype)
kernel = functools.partial(kernel_func)
dummy_out, = pl.pallas_call(kernel,
    grid=grid, in_specs=in_specs,
    # Keyword arguments keep this valid whichever positional order BlockSpec uses.
    out_specs=[ pl.BlockSpec(index_map=lambda r, c: (r, c), block_shape=(1, 1)) ],
    out_shape=[ dummy ], name="matmul"
)(A, B, C)
```

and then everything works totally fine. However, this seems like a very hacky workaround. What is the proper way to do this?

Thanks so much!

Rifur13 · 2024-08-27T19:51:18Z · Collaborator (accepted answer)

One possible solution is to initialize the output to C and then accumulate A @ B into it inside the kernel.
You can initialize the output using input/output aliasing (`input_output_aliases`), which lets an input buffer be reused as an output buffer: `{2: 0}` aliases input 2 (C) to output 0.

It will look something like this:

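```python
import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl

A = jnp.ones((16, 16))
B = jnp.ones((16, 16))
C = jnp.ones((16, 16)) + 20

# Refs arrive as (inputs..., outputs...). Input 2 (C) is aliased to output 0,
# so out_ref starts out holding C and the third input ref can be ignored.
def kernel(a_ref, b_ref, _, out_ref):
  out_ref[...] += a_ref[...] @ b_ref[...]

out = pl.pallas_call(kernel,
    grid=(1, ),
    out_shape=jax.ShapeDtypeStruct(C.shape, C.dtype),
    name="matmul",
    input_output_aliases={2: 0},  # {input index: output index}
    # interpret=True,  # uncomment to run on CPU without a GPU/TPU
)(A, B, C)
```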
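The snippet above uses a single grid step with whole-array blocks. Since the question started from a gridded kernel with `in_specs`, here is a minimal sketch of how the same aliasing idea might extend to a tiled C += A @ B. It is an illustration under assumptions, not an official recipe: `matmul_acc_kernel`, `matmul_acc`, and the block sizes are hypothetical, it runs in interpret mode by default, and real GPU/TPU backends impose their own block-shape constraints. Instead of relying on the aliased output tile already containing C, it copies the C tile from the input ref on the first K step (`pl.when(k == 0)`) and accumulates on later steps. Wrapping the call in `jax.jit(..., donate_argnums=2)` lets XLA actually reuse C's buffer; if the caller keeps C alive instead, XLA has to copy it to honor the alias.

```python
import functools

import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl


def matmul_acc_kernel(a_ref, b_ref, c_ref, out_ref):
    # Hypothetical kernel: refs arrive as (a, b, c, out); out is aliased to c.
    k = pl.program_id(2)  # innermost grid axis walks over the K dimension

    @pl.when(k == 0)
    def _init():
        # Seed the output tile with the matching tile of C on the first K step.
        out_ref[...] = c_ref[...]

    # Accumulate this K-slice's partial product into the output tile.
    out_ref[...] += a_ref[...] @ b_ref[...]


@functools.partial(jax.jit, static_argnames=("bm", "bn", "bk", "interpret"),
                   donate_argnums=2)
def matmul_acc(a, b, c, *, bm=16, bn=16, bk=16, interpret=True):
    """Return c + a @ b, reusing c's buffer when the caller donates it."""
    m, kdim = a.shape
    _, n = b.shape
    grid = (m // bm, n // bn, kdim // bk)  # assumes shapes divide evenly
    return pl.pallas_call(
        matmul_acc_kernel,
        grid=grid,
        in_specs=[
            pl.BlockSpec(block_shape=(bm, bk), index_map=lambda i, j, k: (i, k)),
            pl.BlockSpec(block_shape=(bk, bn), index_map=lambda i, j, k: (k, j)),
            pl.BlockSpec(block_shape=(bm, bn), index_map=lambda i, j, k: (i, j)),
        ],
        out_specs=pl.BlockSpec(block_shape=(bm, bn),
                               index_map=lambda i, j, k: (i, j)),
        out_shape=jax.ShapeDtypeStruct(c.shape, c.dtype),
        input_output_aliases={2: 0},  # input 2 (c) -> output 0
        interpret=interpret,
    )(a, b, c)
```

For example, with `A = B = jnp.ones((32, 32))` and `C = jnp.ones((32, 32)) + 20`, `matmul_acc(A, B, C)` should give a matrix whose entries are all 21 + 32 = 53. Because C is donated, it should not be used again after the call.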

axelfeldmann · Aug 27, 2024 · Author (reply)

Thanks!
