[Pass] Attach memory-planning attributes for dynamic func output

MasterJH5574 · MasterJH5574 · commit 72784b2a49ab · 2024-01-14T23:50:27.000-05:00
This PR adds a pass into the model compilation pipeline, which attach an attribute `"relax.memory_plan_dynamic_func_output"` for each Relax function in the IRModule. This attribute suggests that the Relax functions' output tensors, though having dynamic shapes, are statically plannable. This enhancement makes sure that in serving scenarios, our memory allcoation is completely static after stablized. So we will not be worried about continuing memory usage growth, and can allocate more memory for KV cache. This PR can be early merged, but it will not take effects until apache/tvm#16111 is merged.
diff --git a/python/mlc_chat/compiler_pass/pipeline.py b/python/mlc_chat/compiler_pass/pipeline.py
@@ -127,6 +127,7 @@ def _pipeline(mod: tvm.ir.IRModule, _ctx: tvm.transform.PassContext) -> tvm.ir.I
                 tvm.relax.transform.RemovePurityChecking(),
                 tvm.relax.transform.CallTIRRewrite(),
                 tvm.relax.transform.StaticPlanBlockMemory(),
+                _DebugDump("memory-planning.py", debug_dump, show_meta=False),
                 AttachMetadataWithMemoryUsage(metadata),
                 tvm.relax.transform.RewriteCUDAGraph(),
                 tvm.relax.transform.LowerAllocTensor(),