[TIR] Enhance and fix tensorize schedule for some case #16560
cc @Hzfengsy and @Lunderberg: it looks like PR #13299 declares stmt_simplify but does not provide an implementation.
cc @vinx13
class TensorIntrinSimplifier : public arith::IRMutatorWithAnalyzer {
 public:
  static PrimFunc Apply(PrimFunc func, arith::Analyzer* analyzer) {
Instead of simplifying the body of the PrimFunc, can we instead simplify the entire PrimFunc? That way, dynamic expressions that are used in shapes are exposed to the analyzer as non-negative. (e.g. Using buffer of shape [n,m] implies that n >= 0 && m >= 0.)
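To illustrate why the shape context matters, here is a minimal sketch in plain Python (not TVM code). The rewrite `max(n, 0) * m -> n * m` and the sampled integer range are hypothetical examples chosen for illustration; the point is only that a rewrite can be invalid in general yet valid once `n >= 0 && m >= 0` is known.

```python
from itertools import product


def original(n, m):
    # Expression before the candidate rewrite.
    return max(n, 0) * m


def simplified(n, m):
    # Expression after the candidate rewrite max(n, 0) * m -> n * m.
    return n * m


def rewrite_holds(domain):
    """Return True if both expressions agree at every (n, m) in `domain`."""
    return all(original(n, m) == simplified(n, m) for n, m in domain)


# Without context, n and m may be any integers (sampled on a small range).
unconstrained = list(product(range(-4, 5), repeat=2))

# With a buffer of shape [n, m] in scope, n >= 0 and m >= 0 are implied.
shape_constrained = [(n, m) for n, m in unconstrained if n >= 0 and m >= 0]

valid_without_context = rewrite_holds(unconstrained)
valid_with_shape_context = rewrite_holds(shape_constrained)
```

In this sketch the rewrite fails on the unconstrained domain (for example at `n = -1, m = 1`) and holds on the shape-constrained one, which is the kind of extra fact a whole-PrimFunc simplification would make available.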
As you mentioned in #13299 (comment), it may be better to simplify at the prim_func level. I chose to implement a stmt simplifier because it may be more useful: a stmt is more fine-grained.
Moreover, in the context of a tensor intrin desc in the tensorize schedule, the prim_func typically contains a single block with no dynamic symbolic variables, so I think a stmt simplifier is enough for this issue.
That said, we could implement a prim_func simplifier as well. Should we keep both the stmt and prim_func simplifiers, or maintain only one of them?
> As you mentioned in #13299 (comment), it may be better to simplify at the prim_func level. I chose to implement a stmt simplifier because it may be more useful: a stmt is more fine-grained.
Good point. Thinking on it again in the morning, I think we should avoid having the simplify function for tir::Stmt altogether, because it is more fine-grained. That is, its existence would encourage simplifications to be performed for specific statements, even though those statements might not be the outermost.
> That said, we could implement a prim_func simplifier as well. Should we keep both the stmt and prim_func simplifiers, or maintain only one of them?
I think having a simplifier for a PrimFunc would be better, because it encourages developers to simplify with the full context of a statement. The functionality already exists here, and would just need a wrapper function to expose StmtSimplifier::Apply.
@Lunderberg hi, I think this pr is ready for review.
@vinx13 @spectrometerHBH would you mind taking a look at this PR?
* support tensorize with simplified and call expr
* replace stmt simplifier with primfunc simplifier
* lint fix
* lint:remove white space
* lint: remove white space
* cpp lint fix
* lint: resolve include
* clang format lint fix
To optimize i4_to_f16 decoding, we can use advanced hardware instructions for fast type conversion, which reduces the cost of decoding. In TVM this can be done with tensorize.
To tensorize the decoding, this PR extends the call handling of ir_comparator, which is necessary because the decode block contains call expressions.
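As a rough illustration of what comparing call expressions involves, here is a minimal structural matcher in plain Python. It is not TVM's ir_comparator: the `Var` and `Call` classes, the `match` function, and the op names are all hypothetical stand-ins. Two calls match when they have the same op, the same number of arguments, and pairwise matching arguments, with desc variables bound consistently to workload variables.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Call:
    op: str  # hypothetical op name, e.g. "shift_right"
    args: Tuple["Expr", ...]


Expr = Union[Var, Call, int]


def match(lhs: Expr, rhs: Expr, var_map: Dict[Var, Var]) -> bool:
    """Compare a workload expr (lhs) against a desc expr (rhs) structurally."""
    if isinstance(lhs, int) and isinstance(rhs, int):
        return lhs == rhs
    if isinstance(lhs, Var) and isinstance(rhs, Var):
        # Bind the desc var on first use; later uses must map to the same var.
        return var_map.setdefault(rhs, lhs) == lhs
    if isinstance(lhs, Call) and isinstance(rhs, Call):
        return (
            lhs.op == rhs.op
            and len(lhs.args) == len(rhs.args)
            and all(match(a, b, var_map) for a, b in zip(lhs.args, rhs.args))
        )
    return False


workload = Call("shift_right", (Var("w"), Call("mul", (Var("k"), 4))))
desc = Call("shift_right", (Var("a"), Call("mul", (Var("b"), 4))))
other = Call("shift_left", (Var("a"), Call("mul", (Var("b"), 4))))

bindings: Dict[Var, Var] = {}
same_structure = match(workload, desc, bindings)
different_op = match(workload, other, {})
```

Here `same_structure` is true and `bindings` maps `a` to `w` and `b` to `k`, while `different_op` is false because the outer ops differ. A real comparator would also need to handle dtypes, buffers, and other node kinds.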
Moreover, the comparator currently simplifies the lhs expr, but the tensor intrin descs are not simplified. The two sides are therefore inconsistent and the comparison fails; see PR #14108.
For example, we provide a test case for this situation:
The desc should be simplified from [v1 // 8] and [v1 % 8] to [0] and [v1] to match the simplified lhs expr.
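The index rewrite above can be sanity-checked with a few lines of plain Python. This is not the PR's test case; it assumes the block iterator v1 ranges over [0, 8), which the description implies but does not state, and the helper names are hypothetical.

```python
EXTENT = 8  # assumed extent of the block iterator v1


def desc_indices(v1):
    # Indices as written in the unsimplified desc.
    return (v1 // 8, v1 % 8)


def simplified_indices(v1):
    # Indices after simplification, as they appear on the lhs.
    return (0, v1)


# Within the assumed bounds the two forms are interchangeable.
agree_in_bounds = all(
    desc_indices(v) == simplified_indices(v) for v in range(EXTENT)
)

# Outside the bounds they differ, so the rewrite relies on the iterator range.
agree_at_extent = desc_indices(EXTENT) == simplified_indices(EXTENT)
```

The two forms agree for every v1 in the assumed range and diverge at v1 = 8, so a purely structural comparison sees different expressions unless both sides are simplified with the same range information.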
To simplify the tensor intrin's desc, we wrap and reuse tir::transform::simplify to support simplification of a single stmt.