Exclude IndexSelectOp from inlineMost by jjsjann123 · Pull Request #4266 · NVIDIA/Fuser

jjsjann123 · 2025-04-17T18:30:11Z

This fixes a performance regression in pointwise kernels that contain IndexSelectOp.

IndexSelectOp supports vectorized loads. Excluding it from inlineMost changes the generated code from

array<vector_factor> buffer;
for unroll:
    vec_load(buffer)  // IndexSelectOp
    ... // computation consuming value of buffer

to

array<vector_factor * unroll> buffer;
for unroll:
    vec_load(&buffer[i])  // IndexSelectOp
for unroll:
    ... // computation consuming value of buffer
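The two loop shapes above can be sketched in plain C++ for illustration. This is not nvFuser-generated code: `kVec`, `kUnroll`, `vecLoad`, `inlinedLoad`, `hoistedLoad`, and the "multiply by 2" computation are all hypothetical stand-ins. The sketch only shows that both shapes compute the same result, while the second issues every load into a larger buffer before any computation starts.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kVec = 4;     // assumed vector_factor
constexpr std::size_t kUnroll = 2;  // assumed unroll factor

// Hypothetical stand-in for a vectorized load: copies kVec contiguous elements.
inline void vecLoad(float* dst, const float* src) {
  for (std::size_t v = 0; v < kVec; ++v) {
    dst[v] = src[v];
  }
}

// "Before" shape: the load is inlined into the compute loop, so the buffer
// holds a single vector and each load is followed directly by its consumer.
std::vector<float> inlinedLoad(const std::vector<float>& in) {
  assert(in.size() >= kVec * kUnroll);
  std::vector<float> out(kVec * kUnroll);
  std::array<float, kVec> buffer{};
  for (std::size_t i = 0; i < kUnroll; ++i) {
    vecLoad(buffer.data(), &in[i * kVec]);
    for (std::size_t v = 0; v < kVec; ++v) {
      out[i * kVec + v] = buffer[v] * 2.0f;  // placeholder computation
    }
  }
  return out;
}

// "After" shape: the buffer covers vector_factor * unroll elements, all loads
// are issued in a first loop, and the computation runs in a second loop.
std::vector<float> hoistedLoad(const std::vector<float>& in) {
  assert(in.size() >= kVec * kUnroll);
  std::vector<float> out(kVec * kUnroll);
  std::array<float, kVec * kUnroll> buffer{};
  for (std::size_t i = 0; i < kUnroll; ++i) {
    vecLoad(&buffer[i * kVec], &in[i * kVec]);
  }
  for (std::size_t i = 0; i < kUnroll; ++i) {
    for (std::size_t v = 0; v < kVec; ++v) {
      out[i * kVec + v] = buffer[i * kVec + v] * 2.0f;  // placeholder computation
    }
  }
  return out;
}
```

The trade-off in the second shape is a larger buffer in exchange for loads that no longer alternate with their consumers.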

Tasks:

- code comment
- add benchmark numbers in PR

github-actions · 2025-04-17T18:30:56Z

Review updated until commit 222bc69

Description

Exclude IndexSelectOp from inlineMost for performance
Aggregate allocation of manual unroll ID and inner ID
Add comment explaining the change

Changes walkthrough 📝

Relevant files

Enhancement

csrc/scheduler/pointwise.cpp (+8/-0): `Exclude IndexSelectOp from inlineMost`
Add logic to exclude `IndexSelectOp` output from `inner_most_tensors`
Add comment explaining the rationale behind the change

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 No relevant tests

⚡ Recommended focus areas for review

Performance Impact

The change excludes IndexSelectOp from inlineMost, which may have implications on performance. Ensure that the performance metrics provided in the PR description accurately reflect the intended performance gains and that there are no unintended regressions.

// IndexSelectOp reads lookup tv without cache. Because pointwise scheduler
// doesn't use ParallelType::Unroll, we need to exclude consumer of fusion
// inputs to be inlineMost. This allows us to aggregate the allocation of
// manual unroll ID and its inner ID.
for (auto idx_sel : ir_utils::getOpsOfType<IndexSelectOp>(fusion)) {
  inner_most_tensors.erase(idx_sel->output(0)->as<TensorView>());
}
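The exclusion pattern in the snippet above (collect candidate tensors, then erase the outputs of one op type) can be illustrated with a self-contained sketch. The types below (`Tensor`, `OpKind`, `Op`) and the function `tensorsToInline` are hypothetical and are not nvFuser's IR classes or API; they only mimic the set-erase step.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical miniature IR, for illustration only.
struct Tensor {
  std::string name;
};

enum class OpKind { IndexSelect, Add };

struct Op {
  OpKind kind;
  Tensor* output;
};

// Start with every op output as an inlining candidate, then drop the outputs
// of index-select ops so they keep their own (larger) allocation.
std::unordered_set<Tensor*> tensorsToInline(const std::vector<Op>& ops) {
  std::unordered_set<Tensor*> candidates;
  for (const Op& op : ops) {
    assert(op.output != nullptr);
    candidates.insert(op.output);
  }
  for (const Op& op : ops) {
    if (op.kind == OpKind::IndexSelect) {
      candidates.erase(op.output);
    }
  }
  return candidates;
}
```

Erasing from the candidate set, rather than special-casing the inlining pass itself, keeps the change local to the scheduler, which matches the +8/-0 footprint of the PR.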

jjsjann123 · 2025-04-17T20:20:19Z

Comparing the performance across the embedding fwd python benchmark. This is the speedup we get from this PR.

jjsjann123 · 2025-04-18T17:29:57Z

!test

jjsjann123 · 2025-04-29T18:20:18Z

!test

jjsjann123 · 2025-04-30T09:48:53Z

!test

jjsjann123 · 2025-04-30T09:49:11Z

@naoyam updated the comment, this one should be good to go.

naoyam

LGTM

jjsjann123 added 4 commits April 17, 2025 09:28

Hack to disable transpose scheduler

53da9a1

WIP

8dfba34

WIP

eda1c13

removing unwanted changes

612d905

adding comment

4e4d2b9

jjsjann123 changed the title from "Jjsjann123/fixing vectorized load for index select" to "Exclude IndexSelectOp from inlineMost" Apr 18, 2025

Merge branch 'main' into jjsjann123/fixing_vectorized_load_for_index_select

fd003bf

jjsjann123 marked this pull request as ready for review April 18, 2025 17:29

jjsjann123 requested review from kevinstephano, naoyam and protonu April 18, 2025 17:30

naoyam reviewed Apr 18, 2025

csrc/scheduler/pointwise.cpp (resolved)

naoyam reviewed Apr 18, 2025

csrc/scheduler/pointwise.cpp (outdated, resolved)

jjsjann123 requested a review from naoyam April 22, 2025 18:24

jjsjann123 added 2 commits April 29, 2025 08:54

Merge branch 'main' into jjsjann123/fixing_vectorized_load_for_index_select

073e62f

Merge branch 'main' into jjsjann123/fixing_vectorized_load_for_index_select

b2b4e76

jjsjann123 added 2 commits April 30, 2025 02:47

adjusting comment

d8b270f

Merge remote-tracking branch 'origin/main' into HEAD

222bc69

jjsjann123 requested a review from liqiangxl April 30, 2025 09:48

naoyam approved these changes Apr 30, 2025

jjsjann123 merged commit a8ad0bc into main Apr 30, 2025
52 of 53 checks passed

jjsjann123 deleted the jjsjann123/fixing_vectorized_load_for_index_select branch April 30, 2025 20:57

jjsjann123 merged 10 commits into main from jjsjann123/fixing_vectorized_load_for_index_select

3 participants
