Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move the allocates of AoT codegen to be TVMBAWs #9065

Merged
merged 2 commits into from
Sep 24, 2021

Conversation

manupak
Copy link
Contributor

@manupak manupak commented Sep 21, 2021

This commit introduces changes to aot_executor_codegen.cc
to place tir.allocate to use storage_scope = 'global.workspace'.
The lower_tvm_builtin pass is modified slightly to generate
TVMBAW calls.

@manupak
Copy link
Contributor Author

manupak commented Sep 21, 2021

cc : @areusch @mbs-octoml @tqchen

@@ -589,5 +590,41 @@ def test_memory_planning(workspace_byte_alignment, main_workspace_size, sum_work
)


def test_aot_codegen_backend_alloc_workspace_calls():
dtype = "float32"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add a brief docstring explaining what this test asserts?

inputs = OrderedDict([("data", i_data), ("weight", w1_data)])
output_list = generate_ref_data(mod, inputs)

compiled_runtime_modules = compile_models(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

want to assert that all the tir.allocate nodes are correctly tagged somewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah it is a bit cumbersome to do that :), Instead I used relay in primitive form so its clear that main function should only have three allocates.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mbs-octoml @denise-k can we add a tracking/cleanup task to make this kind of assert easier to write? And flag to cleanup this test?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@areusch roadmap item and task tracking have been created.

Copy link
Contributor

@mbs-octoml mbs-octoml left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (FWIW). Thanks I learned from this one.

src/tir/transforms/lower_tvm_builtin.cc Show resolved Hide resolved
@@ -623,8 +623,13 @@ class AOTExecutorCodegen : public MixedModeVisitor {
// Define the storage allocator ids
for (auto kv : storage_device_map_) {
for (auto sid : kv.second->storage_ids) {
// The buffer_var is created with storage_scope to be global.workspace to be serviced by
// TVMBAWs, explicitly. The reasoning being the executor allocates should be serviced by
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For poor folks like me: use TVMBackendAllocWorkspace at least once.

src/tir/transforms/storage_rewrite.cc Show resolved Hide resolved
@jroesch
Copy link
Member

jroesch commented Sep 22, 2021

@csullivan @mbrookhart @junrushao1994 can you guys sign off on this one as well to make sure this won't break anyone?

@jroesch
Copy link
Member

jroesch commented Sep 22, 2021

LGTM, will just ping other core compiler folks to make sure this works for them.

True,
)

source = compiled_runtime_modules[0].lib.imported_modules[0].get_source()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AoT codegen for main ends up as an imported module? Naively I would expect the TVMBackendAllocateWorkspace calls in the imported_modules list to be intra-op only, e.g. for the conv2d, not AoT main.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can understand the reasoning but the current flow just creates per target IRModules just before runtime.Modules are created. Therefore all host_target (i.e. CPU) PrimFuncs end up in a single runtime.Module.

@manupak manupak force-pushed the move_aot_allocates_global_workspace branch from 1f2a365 to db37329 Compare September 22, 2021 17:24
Copy link
Contributor Author

@manupak manupak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've addressed the comments now. PTAL.

src/tir/transforms/lower_tvm_builtin.cc Show resolved Hide resolved
src/tir/transforms/storage_rewrite.cc Show resolved Hide resolved
inputs = OrderedDict([("data", i_data), ("weight", w1_data)])
output_list = generate_ref_data(mod, inputs)

compiled_runtime_modules = compile_models(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah it is a bit cumbersome to do that :), Instead I used relay in primitive form so its clear that main function should only have three allocates.

True,
)

source = compiled_runtime_modules[0].lib.imported_modules[0].get_source()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can understand the reasoning but the current flow just creates per target IRModules just before runtime.Modules are created. Therefore all host_target (i.e. CPU) PrimFuncs end up in a single runtime.Module.

@manupak manupak force-pushed the move_aot_allocates_global_workspace branch from db37329 to 321ba2c Compare September 22, 2021 20:33
@manupak
Copy link
Contributor Author

manupak commented Sep 23, 2021

This is failing CI on an unrelated error (I think due to : #9013). I think #9076 should fix it (Thanks @mbs-octoml ). Therefore it would be great if we can get an approval if this looks good as it is blocking our work -- so that once #9076 lands we could merge this.

@jroesch
Copy link
Member

jroesch commented Sep 23, 2021

@manupa-arm @junrushao1994 and I landed the other patch, if you guys are still working today can you rebase before you log off and I will try and merge before end of day pacific time?

@manupak manupak force-pushed the move_aot_allocates_global_workspace branch from 321ba2c to 14f3cf4 Compare September 23, 2021 17:04
@manupak
Copy link
Contributor Author

manupak commented Sep 23, 2021

Thanks! Done.

This commit introduces changes to aot_executor_codegen.cc
to place tir.allocate to use storage_scope = 'global.workspace'.
The lower_tvm_builtin pass is modified slightly to generate
TVMBAW calls.

Change-Id: Iba4ba437c1431c5197bf11abb826e03807bbcf66
*Adding more comments and descriptions
*Modified the test case to use primitive relay

Change-Id: Ia18a169d94bded3f81af7b3081c7d1ac29c669bc
@manupak manupak force-pushed the move_aot_allocates_global_workspace branch from 14f3cf4 to bfa7944 Compare September 24, 2021 05:46
@leandron leandron merged commit 7c6a334 into apache:main Sep 24, 2021
ylc pushed a commit to ylc/tvm that referenced this pull request Sep 29, 2021
* Move the allocates of AoT codegen to be TVMBAWs

This commit introduces changes to aot_executor_codegen.cc
to place tir.allocate to use storage_scope = 'global.workspace'.
The lower_tvm_builtin pass is modified slightly to generate
TVMBAW calls.

Change-Id: Iba4ba437c1431c5197bf11abb826e03807bbcf66

* Move the allocates of AoT codegen to be TVMBAWs

*Adding more comments and descriptions
*Modified the test case to use primitive relay

Change-Id: Ia18a169d94bded3f81af7b3081c7d1ac29c669bc
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022
* Move the allocates of AoT codegen to be TVMBAWs

This commit introduces changes to aot_executor_codegen.cc
to place tir.allocate to use storage_scope = 'global.workspace'.
The lower_tvm_builtin pass is modified slightly to generate
TVMBAW calls.

Change-Id: Iba4ba437c1431c5197bf11abb826e03807bbcf66

* Move the allocates of AoT codegen to be TVMBAWs

*Adding more comments and descriptions
*Modified the test case to use primitive relay

Change-Id: Ia18a169d94bded3f81af7b3081c7d1ac29c669bc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants