Change make_ttgir and make_llir to make it closer to OpenAI version #4010
Signed-off-by: Tiotto, Ettore <ettore.tiotto@intel.com>
Pull Request Overview
This PR refactors the TTGIR/LLIR generation pipeline in the Intel backend to align the implementation closer to the OpenAI version. Key changes include the extraction of helper methods for options validation and module annotation, removal of inline warp-specialization in make_llir, and the addition of new passes (global scratch memory allocation, canonicalization, and common subexpression elimination) in the LLIR pipeline.
Comments suppressed due to low confidence (2)
third_party/intel/backend/compiler.py:341
- The previous warp-specialization block that adjusted num_warps by multiplying with num_warp_groups has been removed. Confirm that this change aligns with the new design and does not introduce unintended behavior.
def make_llir(src, metadata, options):
third_party/intel/backend/compiler.py:303
- Verify that the updated order—annotating the module, retrieving threads_per_warp, and then validating options—preserves the intended constraints without introducing discrepancies compared to the previous inline validation.
XPUBackend.validate_options(opt, properties)
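The ordering discussed in this comment (annotate the module, read back threads_per_warp, then validate) can be sketched roughly as follows. This is a self-contained toy, not the code in compiler.py: the Options fields, the dict standing in for the module, the sub_group_sizes property, and the constraint checked in validate_options are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Options:
    # Hypothetical subset of backend options, for illustration only.
    num_warps: int = 4
    threads_per_warp: int = 32


def annotate_module(module_attrs: dict, opt: Options) -> None:
    # Stand-in for module annotation: record the option on the module.
    module_attrs["threads_per_warp"] = opt.threads_per_warp


def validate_options(opt: Options, properties: dict) -> None:
    # Assumed constraint: the warp size must be one the device supports.
    if opt.threads_per_warp not in properties["sub_group_sizes"]:
        raise ValueError(f"unsupported threads_per_warp: {opt.threads_per_warp}")


def prepare(module_attrs: dict, opt: Options, properties: dict) -> int:
    # 1. annotate the module, 2. read threads_per_warp back, 3. validate.
    annotate_module(module_attrs, opt)
    opt.threads_per_warp = module_attrs["threads_per_warp"]
    validate_options(opt, properties)
    return opt.threads_per_warp
```

In this toy, validation sees the value that was actually written to the module, which is the property the comment asks to confirm for the real helpers.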
whitneywhtsang left a comment
Please ensure no performance regressions before merging.
No performance regressions in any of the key benchmarks.
This patch modifies make_ttgir and make_llir to make them more similar to the OpenAI versions. Specifically:
- adds add_allocate_global_scratch_memory(pm), canonicalization, and CSE (common subexpression elimination) to the make_llir pass pipeline
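For readers unfamiliar with the two cleanup passes, the toy below shows what canonicalization followed by common subexpression elimination typically does to a small instruction list. It does not use the Triton or MLIR pass manager; the tuple-based IR, the pass functions, and run_pipeline are hypothetical stand-ins, and global scratch memory allocation is not modeled.

```python
COMMUTATIVE = {"add", "mul"}


def canonicalize(ops):
    # Toy canonicalization: sort operands of commutative ops so that
    # equivalent expressions take the same textual form.
    out = []
    for res, op, a, b in ops:
        if op in COMMUTATIVE and b < a:
            a, b = b, a
        out.append((res, op, a, b))
    return out


def cse(ops):
    # Toy CSE: drop an op whose (op, lhs, rhs) was already computed and
    # redirect later uses of its result to the first occurrence.
    seen, rename, out = {}, {}, []
    for res, op, a, b in ops:
        a, b = rename.get(a, a), rename.get(b, b)
        key = (op, a, b)
        if key in seen:
            rename[res] = seen[key]
        else:
            seen[key] = res
            out.append((res, op, a, b))
    return out


def run_pipeline(ops, passes):
    # Apply each pass in order, feeding the output of one into the next.
    for p in passes:
        ops = p(ops)
    return ops


toy_ir = [
    ("t0", "add", "x", "y"),
    ("t1", "add", "y", "x"),  # same value as t0 once operands are sorted
    ("t2", "mul", "t0", "t1"),
]
result = run_pipeline(toy_ir, [canonicalize, cse])
```

Running canonicalization first matters in this sketch: only after the operands of t1 are reordered can CSE recognize it as a duplicate of t0.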