[BYOC] Two helper passes for external codegen using RelayToTIR custom pass machinery
See https://discuss.tvm.apache.org/t/byoc-supporting-cutlass-byoc-with-collage/12796/6 for
context. This work is part of Collage (https://github.com/apache/tvm-rfcs/blob/main/rfcs/0062-collage.md).
For reasons explained in the above thread, I'm moving CUTLASS to be an IRModule-at-a-time external codegen
using a custom RelayToTIR pass, instead of the traditional function-at-a-time external codegen using
a relay.ext.cutlass registered function. This means some of the rewriting done on-the-fly by LowerTEPass now
needs to be done by the custom pass directly. This PR supplies two passes which ease that burden:
- Before starting the CUTLASS-specific processing, make sure all "Compiler" attributed functions have
unique global definitions (i.e. are outlined). Though functions start in this form after BYOC partitioning,
under the Graph and AOT compilation flows those functions are then inlined to pass through the 'codegen' keyhole,
which assumes the whole model is just one self-contained main function. This pass undoes that inlining. (I gave up
trying to just remove the inlining in the first place.)
- After the CUTLASS-specific processing, the now-compiled "Compiler" attributed functions need to be marked as
'extern'. The te_compiler.cc uses the "ExternalSymbol" attribute for that, but since a) the symbol name
is never needed, only the presence of the attribute is significant downstream, and b) "ExternalSymbol" is
easy to confuse with "global_symbol", I just replaced "ExternalSymbol" with "Extern" with an Integer(1)
(cf "Primitive").
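The second pass can be pictured with a toy model in plain Python. This is only a sketch under stated assumptions: `ToyFunction`, `mark_compiler_functions_as_extern` and `is_extern` are hypothetical names invented for illustration and are not TVM APIs. Only the attribute keys "Compiler" and "Extern" and the value 1 come from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyFunction:
    """Hypothetical stand-in for a Relay function: a body plus an attribute dict."""
    body: str
    attrs: Dict[str, object] = field(default_factory=dict)


def mark_compiler_functions_as_extern(
    module: Dict[str, ToyFunction], compiler_name: str
) -> Dict[str, ToyFunction]:
    """Return a new toy module in which every global function whose "Compiler"
    attribute equals compiler_name also carries "Extern" = 1."""
    result = {}
    for name, func in module.items():
        if func.attrs.get("Compiler") == compiler_name:
            new_attrs = dict(func.attrs)  # copy so the input module is untouched
            new_attrs["Extern"] = 1       # the value is irrelevant, see is_extern
            result[name] = ToyFunction(func.body, new_attrs)
        else:
            result[name] = func
    return result


def is_extern(func: ToyFunction) -> bool:
    """Downstream check: only the presence of the attribute matters."""
    return "Extern" in func.attrs
```

Testing for presence rather than reading a value mirrors point a) above: no symbol name has to be carried in the attribute.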
The outlining pass is a little more general than necessary because it will also be used by Collage to
rewrite the IRModule into optimally partitioned form while making maximal reuse of partition functions.
Hence the abstract GlobalSymbolCache.
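To illustrate the outlining idea and why a symbol cache enables reuse, here is a toy model in plain Python. It is a sketch under assumptions, not the pass in this PR: `ToyFunction`, `ToyGlobalRef`, `ToySymbolCache` and `outline_compiler_functions` are hypothetical names. Keying the cache on structural equality of the function is also an assumption made for illustration; the real GlobalSymbolCache is abstract and may choose names differently.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union


@dataclass(frozen=True)
class ToyFunction:
    """Hypothetical inline function; compiler is None for ordinary functions."""
    body: str
    compiler: Optional[str] = None


@dataclass(frozen=True)
class ToyGlobalRef:
    """Hypothetical reference to a global definition by name."""
    name: str


class ToySymbolCache:
    """Maps a function to a global name, reusing the name for equal functions."""

    def __init__(self, prefix: str = "outlined") -> None:
        self._prefix = prefix
        self._names: Dict[ToyFunction, str] = {}

    def get_or_create(self, func: ToyFunction) -> str:
        if func not in self._names:
            self._names[func] = f"{self._prefix}_{len(self._names)}"
        return self._names[func]


Item = Union[str, ToyFunction, ToyGlobalRef]


def outline_compiler_functions(
    main_body: List[Item], cache: ToySymbolCache
) -> Tuple[List[Item], Dict[str, ToyFunction]]:
    """Replace each inline function that has a compiler attribute with a
    reference to a global definition, and return those definitions."""
    definitions: Dict[str, ToyFunction] = {}
    new_body: List[Item] = []
    for item in main_body:
        if isinstance(item, ToyFunction) and item.compiler is not None:
            name = cache.get_or_create(item)
            definitions[name] = item
            new_body.append(ToyGlobalRef(name))
        else:
            new_body.append(item)  # leave everything else inline
    return new_body, definitions
```

In this toy model, two identical inlined partitions end up sharing a single global definition, which is the kind of reuse the cache is meant to allow.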