Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix transpose optimizer on GPU EP #15988

Merged
merged 4 commits into from
May 19, 2023
Merged

Conversation

fs-eire
Copy link
Contributor

@fs-eire fs-eire commented May 17, 2023

Description

because of #15618 , the default allocator changed to device allocator, which will be GPU instead of CPU. in transpose optimizer we expect to read data from initializers so a CPU allocator is required here.

this change fixes transpose optimizer on GPU EP

Fixes the issue referred to in #15869, #15796

guschmue
guschmue previously approved these changes May 17, 2023
@askhade
Copy link
Contributor

askhade commented May 17, 2023

Looks like this PR need to be cherry picked as well.

@fs-eire
Copy link
Contributor Author

fs-eire commented May 17, 2023

Looks like this change fails tests on CUDA EP. I don't know much about how CUDA works with this transpose optimizer. Maybe I should only apply this fix to JSEP?

OK, I fixed the errors. So an EP may not register an allocator with type OrtMemTypeCPU (eg. InternalTestingExecutionProvider). So the latest behavior is, try to get allocator for OrtMemTypeDefault with that EP, if it is not on CPU device, try get allocator for OrtMemTypeCPU. If it still failed, return a failure status code.

@@ -913,7 +913,15 @@ PostLayoutTransformCostCheck(const api::GraphRef& graph, const api::NodeRef& nod
Status TransformLayoutForEP(Graph& graph, bool& modified, const IExecutionProvider& execution_provider,
const DebugGraphFn& debug_graph_fn) {
// sub graph recurse will be added later
auto api_graph = MakeApiGraph(graph, execution_provider.GetAllocator(OrtMemTypeDefault), nullptr);
auto cpu_allocator = execution_provider.GetAllocator(OrtMemTypeDefault);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be simpler for the caller to pass in the value returned by ExecutionProviders.GetDefaultCpuAllocator()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's used in this way -

Status PartitionOrtFormatModel(onnxruntime::Graph& graph,
                               const ExecutionProviders& providers,
                               KernelRegistryManager& kernel_registry_manager,
                               SessionState& session_state) {

  layout_transformer::TransformLayoutFunction transform_layout_fn = layout_transformer::IsSupportedOpset(graph)
                                                                        ? layout_transformer::TransformLayoutForEP
                                                                        : nullptr;

  GraphPartitioner partitioner(kernel_registry_manager, providers);
  ORT_RETURN_IF_ERROR(partitioner.Partition(graph,
                                            session_state.GetMutableFuncMgr(),
                                            transform_layout_fn,
                                            GraphPartitioner::Mode::kOrtFormatLoad));

  return Status::OK();
}

So this is going to modify the function signature of GraphPartitioner::Partition() and some callback type definition, which is not a trivial one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In InferenceSessions.cs there is a lambda used so technically you'd only need to update TransformLayoutForEP and not the entire path through GraphPartitioner.

layout_transformer::TransformLayoutForEP(graph_to_transform, modified, execution_provider,

Copy link
Contributor Author

@fs-eire fs-eire May 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 2 places where TransformLayoutForEP() is called, and neither of them has a context with an existing allocator. If I make this change, I still need code to get allocator from the EP, and I need do this twice.

Edit: I see you mentioned ExecutionProviders.GetDefaultCpuAllocator(). Let me check if I can figure it out how to do this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an easier way to create lambdas in session_state_test.cc? is there a way to do currying in a simpler way

skottmckay
skottmckay previously approved these changes May 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants