Move the allocates of AoT codegen to be TVMBAWs #9065
Conversation
cc: @areusch @mbs-octoml @tqchen
@@ -589,5 +590,41 @@ def test_memory_planning(workspace_byte_alignment, main_workspace_size, sum_work
    )


def test_aot_codegen_backend_alloc_workspace_calls():
    dtype = "float32"
Could you add a brief docstring explaining what this test asserts?
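A hedged sketch of what such a docstring might look like (the test name comes from this PR; the wording below is illustrative, not the author's actual docstring):

```python
def test_aot_codegen_backend_alloc_workspace_calls():
    """Check that AoT codegen services its allocates via TVMBAW calls.

    The intermediary tensors of the main function should be serviced
    through TVMBackendAllocWorkspace (TVMBAW) calls. The relay model is
    kept in primitive form, so main is expected to contain exactly three
    allocates, each lowered to a TVMBAW call.
    """
```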
    inputs = OrderedDict([("data", i_data), ("weight", w1_data)])
    output_list = generate_ref_data(mod, inputs)

    compiled_runtime_modules = compile_models(
want to assert that all the tir.allocate nodes are correctly tagged somewhere?
Ah, it is a bit cumbersome to do that :). Instead I used relay in primitive form, so it's clear that the main function should only have three allocates.
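The check the test ends up doing can be sketched by counting TVMBAW call sites in the generated source. `get_source()` and the expected count of three come from this PR; the sample source string below is a stand-in for the real generated C, and the call arguments in it are illustrative:

```python
# Stand-in for compiled_runtime_modules[0].lib.imported_modules[0].get_source();
# a real run would return the generated C source for the host module.
sample_source = """
void* sid_1 = TVMBackendAllocWorkspace(1, 0, 256, 2, 32);
void* sid_2 = TVMBackendAllocWorkspace(1, 0, 256, 2, 32);
void* sid_3 = TVMBackendAllocWorkspace(1, 0, 512, 2, 32);
"""

# With the relay model in primitive form, main should hold exactly three
# allocates, each serviced by a TVMBackendAllocWorkspace (TVMBAW) call.
allocate_count = sample_source.count("TVMBackendAllocWorkspace(")
assert allocate_count == 3
```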
@mbs-octoml @denise-k can we add a tracking/cleanup task to make this kind of assert easier to write, and flag to clean up this test?
@areusch roadmap item and task tracking have been created.
LGTM (FWIW). Thanks, I learned from this one.
@@ -623,8 +623,13 @@ class AOTExecutorCodegen : public MixedModeVisitor {
  // Define the storage allocator ids
  for (auto kv : storage_device_map_) {
    for (auto sid : kv.second->storage_ids) {
      // The buffer_var is created with storage_scope to be global.workspace to be serviced by
      // TVMBAWs, explicitly. The reasoning being the executor allocates should be serviced by
For poor folks like me: please spell out TVMBackendAllocWorkspace at least once.
@csullivan @mbrookhart @junrushao1994 can you guys sign off on this one as well to make sure this won't break anyone?
LGTM, will just ping other core compiler folks to make sure this works for them.
        True,
    )

    source = compiled_runtime_modules[0].lib.imported_modules[0].get_source()
The AoT codegen for main ends up as an imported module? Naively I would expect the TVMBackendAllocWorkspace calls in the imported_modules list to be intra-op only, e.g. for the conv2d, not AoT main.
I can understand the reasoning, but the current flow just creates per-target IRModules just before runtime.Modules are created. Therefore, all host_target (i.e. CPU) PrimFuncs end up in a single runtime.Module.
Force-pushed from 1f2a365 to db37329.
I've addressed the comments now. PTAL.
Force-pushed from db37329 to 321ba2c.
This is failing CI on an unrelated error (I think due to #9013). I think #9076 should fix it (thanks @mbs-octoml). Therefore, it would be great if we can get an approval if this looks good, as it is blocking our work -- so that once #9076 lands we could merge this.
@manupa-arm @junrushao1994 and I landed the other patch. If you guys are still working today, can you rebase before you log off, and I will try and merge before end of day Pacific time?
Force-pushed from 321ba2c to 14f3cf4.
Thanks! Done.
This commit introduces changes to aot_executor_codegen.cc to place tir.allocate to use storage_scope = 'global.workspace'. The lower_tvm_builtin pass is modified slightly to generate TVMBAW calls.

Change-Id: Iba4ba437c1431c5197bf11abb826e03807bbcf66

* Adding more comments and descriptions
* Modified the test case to use primitive relay

Change-Id: Ia18a169d94bded3f81af7b3081c7d1ac29c669bc
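The lowering the commits describe can be sketched as a toy rewrite: allocates tagged with storage_scope 'global.workspace' become TVMBAW calls, while other allocates are left alone. The dict-based "allocate nodes", the helper name, and the call arguments below are illustrative stand-ins, not TVM's actual TIR objects or lower_tvm_builtin implementation:

```python
# Toy stand-ins for tir.allocate nodes; TVM's real IR objects differ.
allocates = [
    {"buffer_var": "sid_1", "storage_scope": "global.workspace", "nbytes": 256},
    {"buffer_var": "sid_2", "storage_scope": "global", "nbytes": 128},
]

def lower_to_tvmbaw(alloc):
    """Mimic the idea behind the modified lower_tvm_builtin pass: a
    global.workspace allocate is serviced by a TVMBackendAllocWorkspace
    (TVMBAW) call; other scopes keep a plain static allocation.
    The device_type/device_id/dtype-hint arguments are illustrative."""
    if alloc["storage_scope"] == "global.workspace":
        return (f'void* {alloc["buffer_var"]} = '
                f'TVMBackendAllocWorkspace(1, 0, {alloc["nbytes"]}, 2, 32);')
    return f'uint8_t {alloc["buffer_var"]}[{alloc["nbytes"]}];'

lowered = [lower_to_tvmbaw(a) for a in allocates]
```

In the real pass the generated TVMBAW calls are paired with TVMBackendFreeWorkspace on the way out; this sketch shows only the allocation side.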
Force-pushed from 14f3cf4 to bfa7944.