-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix transpose optimizer on GPU EP #15988
Conversation
Looks like this PR need to be cherry picked as well. |
OK, I fixed the errors. So an EP may not register an allocator with type |
@@ -913,7 +913,15 @@ PostLayoutTransformCostCheck(const api::GraphRef& graph, const api::NodeRef& nod | |||
Status TransformLayoutForEP(Graph& graph, bool& modified, const IExecutionProvider& execution_provider, | |||
const DebugGraphFn& debug_graph_fn) { | |||
// sub graph recurse will be added later | |||
auto api_graph = MakeApiGraph(graph, execution_provider.GetAllocator(OrtMemTypeDefault), nullptr); | |||
auto cpu_allocator = execution_provider.GetAllocator(OrtMemTypeDefault); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be simpler for the caller to pass in the value returned by ExecutionProviders.GetDefaultCpuAllocator()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's used in this way -
Status PartitionOrtFormatModel(onnxruntime::Graph& graph,
const ExecutionProviders& providers,
KernelRegistryManager& kernel_registry_manager,
SessionState& session_state) {
layout_transformer::TransformLayoutFunction transform_layout_fn = layout_transformer::IsSupportedOpset(graph)
? layout_transformer::TransformLayoutForEP
: nullptr;
GraphPartitioner partitioner(kernel_registry_manager, providers);
ORT_RETURN_IF_ERROR(partitioner.Partition(graph,
session_state.GetMutableFuncMgr(),
transform_layout_fn,
GraphPartitioner::Mode::kOrtFormatLoad));
return Status::OK();
}
So this is going to modify the function signature of GraphPartitioner::Partition()
and some callback type definition, which is not a trivial one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In InferenceSessions.cs there is a lambda used so technically you'd only need to update TransformLayoutForEP and not the entire path through GraphPartitioner.
layout_transformer::TransformLayoutForEP(graph_to_transform, modified, execution_provider, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are 2 places where TransformLayoutForEP()
is called, and neither of them has a context with an existing allocator. If I make this change, I still need code to get allocator from the EP, and I need do this twice.
Edit: I see you mentioned ExecutionProviders.GetDefaultCpuAllocator()
. Let me check if I can figure it out how to do this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an easier way to create lambdas in session_state_test.cc? is there a way to do currying in a simpler way
Description
because of #15618 , the default allocator changed to device allocator, which will be GPU instead of CPU. in transpose optimizer we expect to read data from initializers so a CPU allocator is required here.
this change fixes transpose optimizer on GPU EP
Fixes the issue referred to in #15869, #15796