Commit 72784b2

[Pass] Attach memory-planning attributes for dynamic func output
This PR adds a pass to the model compilation pipeline that attaches the attribute `"relax.memory_plan_dynamic_func_output"` to each Relax function in the IRModule. The attribute indicates that the function's output tensors, although dynamically shaped, can be planned statically. This ensures that in serving scenarios memory allocation is completely static once it has stabilized, so memory usage does not keep growing and more memory can be allocated to the KV cache. This PR can be merged early, but it will not take effect until apache/tvm#16111 is merged.
1 parent b58d32d commit 72784b2

1 file changed: +1 -0 lines changed

python/mlc_chat/compiler_pass/pipeline.py

Lines changed: 1 addition & 0 deletions
@@ -127,6 +127,7 @@ def _pipeline(mod: tvm.ir.IRModule, _ctx: tvm.transform.PassContext) -> tvm.ir.I
     tvm.relax.transform.RemovePurityChecking(),
     tvm.relax.transform.CallTIRRewrite(),
     tvm.relax.transform.StaticPlanBlockMemory(),
+    _DebugDump("memory-planning.py", debug_dump, show_meta=False),
     AttachMetadataWithMemoryUsage(metadata),
     tvm.relax.transform.RewriteCUDAGraph(),
     tvm.relax.transform.LowerAllocTensor(),
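
The attribute-attaching step described in the commit message is not visible in the hunk above, so here is a minimal, TVM-free sketch of the idea. `ToyFunction`, `attach_memory_plan_attr`, and `example_attr` are hypothetical names invented for illustration, and the attribute value `True` is an assumption. Only the attribute key comes from the commit message, and this is not the actual pass implementation.

```python
from dataclasses import dataclass, field, replace
from typing import Dict

# Attribute key quoted in the commit message.
ATTR_KEY = "relax.memory_plan_dynamic_func_output"


@dataclass(frozen=True)
class ToyFunction:
    """Hypothetical stand-in for a Relax function: a name plus attributes."""

    name: str
    attrs: Dict[str, object] = field(default_factory=dict)


def attach_memory_plan_attr(mod: Dict[str, ToyFunction]) -> Dict[str, ToyFunction]:
    """Return a new module in which every function carries ATTR_KEY.

    The value True is an assumption; the commit message does not state it.
    Existing attributes are preserved and the input module is not mutated.
    """
    return {
        name: replace(func, attrs={**func.attrs, ATTR_KEY: True})
        for name, func in mod.items()
    }


# A toy "module" with two functions, one of which already has an attribute.
toy_mod = {
    "prefill": ToyFunction("prefill"),
    "decode": ToyFunction("decode", {"example_attr": 3}),
}
planned = attach_memory_plan_attr(toy_mod)
```

Returning a new mapping instead of mutating the input mirrors the functional style of compiler passes, where each pass takes a module and produces a transformed one.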
