[Dlight] Enhance Decode-GEMV Schedule #15195
Merged
This PR enhances the Decode-GEMV rule with the following changes:
This helps with further analysis and scheduling, for example in cases
where the original reduction block has no spatial loop.
into a TVM packed func tir.schedule.GetLoopIterType using tvm._ffi.get_global_func.
innermost dimension is spatial or reduction.
suggest_threads_per_block to guess the threads to be allocated to each threadblock. This helps avoid the previous case where
dlight allocates 256 threads for a workload whose degree of parallelism
is only 128.
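To illustrate the idea behind capping the thread count, here is a minimal sketch in plain Python. It is not the dlight implementation of suggest_threads_per_block: the function name guess_threads_per_block, its parameters, and the power-of-two policy are all assumptions made for illustration. It only shows how bounding the thread count by the workload's degree of parallelism avoids launching 256 threads for a workload that can use only 128.

```python
def guess_threads_per_block(parallelism: int, max_threads: int = 256) -> int:
    """Hypothetical helper (not the dlight API).

    Returns the largest power of two that does not exceed either the
    workload's degree of parallelism or the per-block thread limit.
    """
    if parallelism < 1 or max_threads < 1:
        raise ValueError("parallelism and max_threads must be positive")
    limit = min(parallelism, max_threads)
    threads = 1
    # Double the thread count while it still fits under the limit.
    while threads * 2 <= limit:
        threads *= 2
    return threads


# A workload with only 128-way parallelism gets 128 threads, not 256.
print(guess_threads_per_block(128))   # 128
# A large workload is capped at the per-block limit.
print(guess_threads_per_block(4096))  # 256
```

The power-of-two rounding is one possible design choice; it keeps thread counts aligned with common warp sizes, at the cost of leaving some parallelism unused when the extent is not itself a power of two.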
The rest of the changes are split out into separate PRs that are already
merged to main.
be positive ones ([TIR][Schedule] Derive Nonnegative Bounds from Shape Var #15210)
provable via affine analysis ([ARITH] Allow Analyzer to MarkGlobalNonNegValue #15193)
X[threadIdx.x] is used ([TIR][Transform] Add LiftThreadBinding Pass #15207)