[BACKEND] Optimize code generation for load with other arg #4582

Merged
ThomasRaoux merged 1 commit into triton-lang:main from ThomasRaoux:opt_load_other
Aug 27, 2024

@ThomasRaoux
Collaborator

When `other` is provided, we should use it to initialize the register before doing the load, instead of initializing the register with 0.

Note that this does add a scoreboard dependency between the definition of `other` and the load, but the user can remove it by using a select if `other` comes from a high-latency op.

@ThomasRaoux ThomasRaoux requested a review from ptillet as a code owner August 26, 2024 21:59
@Jokeren
Contributor

Jokeren commented Aug 26, 2024

Why is it a "scoreboard" dependency?

> Note that this does add a scoreboard dependency between the other def and the load

I think, in NVIDIA's terminology, a scoreboard dependency refers mostly to a dependency caused by memory instructions. Do you mean `other` is created by another memory op?

@ThomasRaoux
Collaborator Author

> Why is it a "scoreboard" dependency?
>
> > Note that this does add a scoreboard dependency between the other def and the load
>
> I think, in NVIDIA's terminology, a scoreboard dependency refers mostly to a dependency caused by memory instructions. Do you mean `other` is created by another memory op?

I meant the register scoreboard: the HW will stall while waiting for a register to be ready.

Before we had:

mov r, 0
(p) load r
(!p) mov r, other <- this move can potentially be scheduled later

and now we have:

mov r, other <- there will be a wait for other reg here
(p) load r
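
The two instruction sequences above can be modeled in plain Python to check that they produce the same value for either predicate. This is only a sketch: `old_sequence`, `new_sequence`, and the list-backed `mem` are hypothetical names used for illustration, not part of Triton or of the code the backend actually emits.

```python
def old_sequence(mem, addr, pred, other):
    """Model of the previous lowering; returns (value, instruction count)."""
    r = 0                  # mov r, 0
    if pred:
        r = mem[addr]      # (p) load r
    if not pred:
        r = other          # (!p) mov r, other
    return r, 3


def new_sequence(mem, addr, pred, other):
    """Model of the lowering after this PR; returns (value, instruction count)."""
    r = other              # mov r, other
    if pred:
        r = mem[addr]      # (p) load r
    return r, 2


# Hypothetical memory contents, purely for the check below.
mem = [10, 20, 30, 40]
for pred in (True, False):
    old_val, _ = old_sequence(mem, 2, pred, other=-1)
    new_val, _ = new_sequence(mem, 2, pred, other=-1)
    assert old_val == new_val
```

The model only captures the result and the instruction count; it says nothing about when each instruction can issue, which is where the scoreboard dependency matters.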

@ThomasRaoux ThomasRaoux requested a review from Jokeren August 26, 2024 22:40
Contributor

@Jokeren Jokeren left a comment

> mov r, other <- there will be a wait for other reg here
> (p) load r

So this pattern has the benefit of releasing the `other` register earlier, before the following load has finished?

@Jokeren
Contributor

Jokeren commented Aug 26, 2024

> user can remove it by using a select

What does "select" mean here? Do you mean tl.where?

@ThomasRaoux
Collaborator Author

> mov r, other <- there will be a wait for other reg here
> (p) load r
>
> So this pattern has the benefit of releasing the `other` register earlier, before the following load has finished?

Yes, it makes the live range smaller; the flip side is that it removes scheduling opportunities.

> user can remove it by using a select
>
> What does "select" mean here? Do you mean tl.where?

Yes, I mean tl.where. (I guess select would be the LLVM IR instruction generated.)
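
The scheduling trade-off can be illustrated with a toy timing model. It is a rough sketch under loud assumptions: a single in-order issue slot, one cycle per non-memory instruction, and made-up latencies. `fused_finish` and `select_finish` are hypothetical helpers, not Triton APIs. In Triton source, the two forms being compared would be spelled `tl.load(ptr, mask=mask, other=other)` and `tl.where(mask, tl.load(ptr, mask=mask), other)`.

```python
def fused_finish(other_ready, load_latency):
    """Cycle at which the result is ready when `other` initializes the register."""
    mov_issue = other_ready          # mov r, other stalls until `other` is ready
    load_issue = mov_issue + 1       # (p) load r can only issue after the mov
    return load_issue + load_latency


def select_finish(other_ready, load_latency):
    """Cycle at which the result is ready when a separate select is used."""
    load_done = 0 + load_latency     # the load issues at cycle 0, no wait on `other`
    select_issue = max(load_done, other_ready)  # select needs both operands
    return select_issue + 1


# Assumed numbers for illustration only; they are not measured latencies.
LOAD_LATENCY = 500
for other_ready in (0, 400):
    print(other_ready,
          fused_finish(other_ready, LOAD_LATENCY),
          select_finish(other_ready, LOAD_LATENCY))
```

In this model, when `other` is ready immediately the two forms finish at the same cycle, so the fused form costs nothing. When `other` arrives late, the fused form serializes the wait for `other` with the load latency, whereas the select form overlaps them.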

@ThomasRaoux ThomasRaoux merged commit 78af5c9 into triton-lang:main Aug 27, 2024
davidberard98 added a commit to davidberard98/triton that referenced this pull request Nov 21, 2024
bertmaher pushed a commit that referenced this pull request Nov 22, 2024
bertmaher pushed a commit to bertmaher/triton that referenced this pull request Dec 10, 2024
jbdalido pushed a commit to jbdalido/triton that referenced this pull request Apr 2, 2025