[BACKEND] Optimize code generation for load with other arg #4582
ThomasRaoux merged 1 commit into triton-lang:main
When `other` is provided, we should use it to initialize the register before doing the load, instead of initializing the register with 0. Note that this adds a scoreboard dependency between the definition of `other` and the load, but the user can remove it by using a select if `other` comes from a high-latency op.
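To illustrate why the two lowerings are interchangeable in terms of results, here is a small pure-Python emulation. It is only a sketch: the function names are hypothetical, the per-lane loop stands in for predicated instructions, and the trailing select in the zero-init variant is an assumption about what the previous code generation did, not something taken from the actual backend.

```python
def lower_with_zero_init(memory, addrs, mask, other):
    """Assumed old pattern: zero the register, predicated load, then select."""
    out = []
    for addr, p, o in zip(addrs, mask, other):
        r = 0                  # mov r, 0
        if p:
            r = memory[addr]   # (p) load r
        r = r if p else o      # select between the loaded value and `other`
        out.append(r)
    return out


def lower_with_other_init(memory, addrs, mask, other):
    """New pattern: seed the register with `other`, then predicated load."""
    out = []
    for addr, p, o in zip(addrs, mask, other):
        r = o                  # mov r, other
        if p:
            r = memory[addr]   # (p) load r, masked-off lanes keep `other`
        out.append(r)
    return out


memory = [10, 20, 30, 40]
addrs = [0, 1, 2, 7]           # the last address is out of bounds but masked off
mask = [True, False, True, False]
other = [-1, -2, -3, -4]

old = lower_with_zero_init(memory, addrs, mask, other)
new = lower_with_other_init(memory, addrs, mask, other)
print(old)
print(new)
```

Both variants produce the same values for every lane; the difference is only in how many instructions are needed and in when `other` has to be ready.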
Why is it a "scoreboard" dependency?
In NVIDIA's terminology, I think a scoreboard dependency mostly refers to a dependency caused by memory instructions. Do you mean
I meant the register scoreboard, i.e. the HW will stall waiting for a register to be ready. Before we had: and now we have
Jokeren left a comment
mov r, other <- there will be a wait for other reg here
(p) load r
So this pattern has the benefit of releasing the `other` register earlier, before the following load has finished?
What does "select" mean here? Do you mean
Yes, it makes the live range smaller; the flip side is that it removes scheduling opportunities.
Yes, I mean `tl.where`. (I guess select would be the LLVM IR instruction generated.)
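The trade-off discussed above can be sketched with a toy timing model. Everything here is an assumption made for illustration: a single in-order issue slot, an instruction that stalls until all of its source registers are ready, and made-up latencies (1 cycle for ALU ops, 400 cycles for the load). It is not a model of any real GPU scoreboard, and the names are hypothetical.

```python
ALU_LATENCY = 1     # made-up latency for mov/select
LOAD_LATENCY = 400  # made-up latency for the predicated load


def ready_cycle(program, initial_ready, result):
    """Return the cycle at which register `result` becomes ready.

    program:       list of (dest, sources, latency) tuples, issued in order.
    initial_ready: dict mapping a register name to the cycle its value is ready.
    """
    ready = dict(initial_ready)
    cycle = 0
    for dest, sources, latency in program:
        # Stall until every source register has been produced.
        issue = max([cycle] + [ready[s] for s in sources])
        ready[dest] = issue + latency
        cycle = issue + 1
    return ready[result]


# mov r, other ; (p) load r   (the load also reads r, since masked lanes keep it)
INIT_WITH_OTHER = [
    ("r", ["other"], ALU_LATENCY),
    ("r", ["r"], LOAD_LATENCY),
]

# mov r, 0 ; (p) load r ; select out, r, other   (what a tl.where would map to)
SELECT_AFTER_LOAD = [
    ("r", [], ALU_LATENCY),
    ("r", ["r"], LOAD_LATENCY),
    ("out", ["r", "other"], ALU_LATENCY),
]

results = {}
for other_ready in (0, 300):
    results[other_ready] = (
        ready_cycle(INIT_WITH_OTHER, {"other": other_ready}, "r"),
        ready_cycle(SELECT_AFTER_LOAD, {"other": other_ready}, "out"),
    )
    print(other_ready, results[other_ready])
```

In this model, when `other` is ready immediately, seeding the register with `other` finishes at cycle 401 versus 402 for the select variant, and `other` is dead as soon as the `mov` issues. When `other` only becomes ready at cycle 300, the seeded variant cannot issue the load until then and finishes at cycle 701, while the select variant overlaps the load with the producer of `other` and still finishes at cycle 402.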
…riton-lang#4582)" This reverts commit 78af5c9.
…ng#4582) When `other` is there we should use it to initialize the reg before doing the load instead of initializing the reg with 0. Note that this does add a scoreboard dependency between the `other` def and the load but user can remove it by using a select if other comes from a high latency op.