Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linalgtiling #974

Open
wants to merge 21 commits into
base: main
Choose a base branch
from
Open

Linalgtiling #974

wants to merge 21 commits into from

Conversation

arun-thmn
Copy link

@arun-thmn arun-thmn commented Sep 18, 2024

This PR adds lingalg tiling pass to the pipeline.

Linalg tiling pass does tiling of the innermost dimensions of the batch reduce matmul. After tiling it swaps the BRGEMM loop and k-loop. The loop structure looks like M-loop->N-loop->BRGEMM-loop->K-loop(batch reduce matmul). Then lower the tiled code for vectorization using the vector contract pass.

The goal is to perform vectorization in tpp-mlir itself without relying on the libxsmm.

Copy link
Collaborator

@adam-smnk adam-smnk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some fly-by comments

lib/TPP/Transforms/LinalgTiling.cpp Outdated Show resolved Hide resolved
test/Passes/pass-tile-linalg-matmul.mlir Outdated Show resolved Hide resolved
include/TPP/Passes.td Outdated Show resolved Hide resolved
@arun-thmn arun-thmn marked this pull request as ready for review October 7, 2024 10:59
lib/TPP/Transforms/LinalgTiling.cpp Outdated Show resolved Hide resolved
lib/TPP/Transforms/LinalgTiling.cpp Outdated Show resolved Hide resolved
include/TPP/Passes.td Outdated Show resolved Hide resolved
include/TPP/Passes.td Outdated Show resolved Hide resolved
lib/TPP/Transforms/LinalgTiling.cpp Outdated Show resolved Hide resolved
lib/TPP/Transforms/LinalgTiling.cpp Outdated Show resolved Hide resolved
lib/TPP/Transforms/LinalgTiling.cpp Outdated Show resolved Hide resolved
lib/TPP/Transforms/LinalgTiling.cpp Outdated Show resolved Hide resolved
lib/TPP/Transforms/LinalgTiling.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@rengolin rengolin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, this is a temporary pass to expose the behaviour we want and to compare in the future to an upstream implementation of a proper brgemm tiling for vectorization into XSMM-like loops.

Currently on memref because it's easier to reason about pointers, but linalg tiling is done at tensors, and there's already a mechanism for that upstream (TilingInterface and tileLinalgOpImpl) which need to be correctly updated.

But this is a good comparison for the future upstream patches, both in implementation and performance, so it's good to have it in tpp-mlir.

Before we merge, here are a few requests:

  • Add a benchmark for the tling/vectorization, by duplicating one of the GEMM mlir-gen benchmarks and adding tpp-opt with your new options to it. It should be slower than XSMM, but faster than plain vectorization.
  • Rename the pass to brgemm-tiling, since this will not be expanded to all linalg ops and actual linalg tiling already exists upstream, at tensor level.
  • Expand comments for most of the code, that is too dense to read through and follow the core logic.
  • Rename variables and test files to brgemm instead of linalgOp or matmul, to make it clear what the pass is supposed to do.

Thanks!

lib/TPP/Transforms/LinalgTiling.cpp Outdated Show resolved Hide resolved
lib/TPP/Transforms/LinalgTiling.cpp Outdated Show resolved Hide resolved
for (auto itrShapeM = finaltile.begin(); itrShapeM != finaltile.end();
itrShapeM++, i++) {
int index = swap_i[i] / boundariesOne[swap_i[i]];
int offset = swap_i[i] / (finaltile.size() - 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unclear what is index, offset and indexTwo here. Can you elaborate on a comment and perhaps name them better?

lib/TPP/Transforms/LinalgTiling.cpp Outdated Show resolved Hide resolved
lib/TPP/Transforms/LinalgTiling.cpp Outdated Show resolved Hide resolved
lib/TPP/Transforms/LinalgTiling.cpp Outdated Show resolved Hide resolved
lib/TPP/Transforms/LinalgTiling.cpp Outdated Show resolved Hide resolved
std::prev(innermostForLoop.getBody()->end(), 1));
auto clone = rewriter.clone(*linalgOp);
linalgOp.replaceAllUsesWith(clone);
if (linalgOp->use_empty())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After removing all users, why use would be not empty?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are getting error: operand #0 does not dominate this use, if we don't call use_empty().

lib/TPP/Transforms/LinalgTiling.cpp Outdated Show resolved Hide resolved
include/TPP/IR/TilingUtils.h Outdated Show resolved Hide resolved
lib/TPP/Transforms/BrgemmLinalgTiling.cpp Outdated Show resolved Hide resolved
include/TPP/Passes.td Outdated Show resolved Hide resolved
Copy link
Contributor

@rengolin rengolin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code is too dense to review properly. This cannot be upstreamed anyway, so let's just go with it and see what the IR we get for the next stage and iterate.

Remember, when we want to upstream this, it will have to go as implementation of the tiling interface for these operations (batch/reduce/matmul), so we'll have to re-implement from scratch in a completely different mindset.

But if this works out, it will serve as a baseline, so let's get going and reach that baseline.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants