fix transpose optimizer on GPU EP #15988

fs-eire · 2023-05-17T21:22:54Z

Description

Because of #15618, the default allocator changed to the device allocator, which for a GPU EP is on the GPU instead of the CPU. The transpose optimizer expects to read data from initializers, so a CPU allocator is required here.

This change fixes the transpose optimizer on GPU EPs.

Fixes the issues reported in #15869 and #15796.

askhade · 2023-05-17T21:41:45Z

Looks like this PR needs to be cherry-picked as well.

fs-eire · 2023-05-17T21:56:22Z

~~Looks like this change fails tests on CUDA EP. I don't know much about how CUDA works with this transpose optimizer. Maybe I should only apply this fix to JSEP?~~

OK, I fixed the errors. An EP may not register an allocator of type OrtMemTypeCPU (e.g. InternalTestingExecutionProvider). So the latest behavior is: try to get the allocator for OrtMemTypeDefault from that EP; if it is not on the CPU device, try to get the allocator for OrtMemTypeCPU. If that still fails, return a failure status code.
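The fallback order described in this comment can be sketched in isolation as follows. This is a minimal illustration, not the code from the PR: `MemType`, `Device`, `Allocator`, `MockProvider` and `SelectCpuAllocator` are hypothetical stand-ins for the ONNX Runtime types, and a null pointer stands in for the failure status.

```cpp
#include <cassert>
#include <map>
#include <memory>

// Hypothetical stand-ins for the ONNX Runtime types discussed above.
// They are NOT the real definitions.
enum class MemType { kDefault, kCpu };
enum class Device { kCpu, kGpu };

struct Allocator {
  Device device;
};
using AllocatorPtr = std::shared_ptr<Allocator>;

// Mock execution provider that may or may not register each allocator type.
class MockProvider {
 public:
  void Register(MemType type, Device device) {
    // In this mock, an allocator registered as kCpu must live on the CPU.
    assert(!(type == MemType::kCpu && device != Device::kCpu));
    allocators_[type] = std::make_shared<Allocator>(Allocator{device});
  }

  // Returns nullptr when nothing is registered for `type`.
  AllocatorPtr GetAllocator(MemType type) const {
    auto it = allocators_.find(type);
    return it == allocators_.end() ? nullptr : it->second;
  }

 private:
  std::map<MemType, AllocatorPtr> allocators_;
};

// Fallback order:
//   1. use the default allocator if it is on the CPU device;
//   2. otherwise use the explicitly registered CPU allocator;
//   3. otherwise return nullptr, which the caller treats as a failure.
AllocatorPtr SelectCpuAllocator(const MockProvider& ep) {
  AllocatorPtr alloc = ep.GetAllocator(MemType::kDefault);
  if (alloc && alloc->device == Device::kCpu) {
    return alloc;
  }
  return ep.GetAllocator(MemType::kCpu);
}
```

With this shape, a provider that registers only a GPU default allocator yields a null result, which corresponds to the "return a failure status code" branch above.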

skottmckay · 2023-05-17T22:24:33Z

onnxruntime/core/optimizer/transpose_optimizer/optimizer_api_impl.cc

@@ -913,7 +913,15 @@ PostLayoutTransformCostCheck(const api::GraphRef& graph, const api::NodeRef& nod
 Status TransformLayoutForEP(Graph& graph, bool& modified, const IExecutionProvider& execution_provider,
                            const DebugGraphFn& debug_graph_fn) {
  // sub graph recurse will be added later
-  auto api_graph = MakeApiGraph(graph, execution_provider.GetAllocator(OrtMemTypeDefault), nullptr);
+  auto cpu_allocator = execution_provider.GetAllocator(OrtMemTypeDefault);


Would it be simpler for the caller to pass in the value returned by ExecutionProviders.GetDefaultCpuAllocator()?

It's used in this way:

Status PartitionOrtFormatModel(onnxruntime::Graph& graph,
                               const ExecutionProviders& providers,
                               KernelRegistryManager& kernel_registry_manager,
                               SessionState& session_state) {
  layout_transformer::TransformLayoutFunction transform_layout_fn =
      layout_transformer::IsSupportedOpset(graph) ? layout_transformer::TransformLayoutForEP
                                                  : nullptr;
  GraphPartitioner partitioner(kernel_registry_manager, providers);
  ORT_RETURN_IF_ERROR(partitioner.Partition(graph,
                                            session_state.GetMutableFuncMgr(),
                                            transform_layout_fn,
                                            GraphPartitioner::Mode::kOrtFormatLoad));
  return Status::OK();
}

So this would mean changing the function signature of GraphPartitioner::Partition() and a callback type definition, which is not a trivial change.

In inference_session.cc a lambda is used, so technically you'd only need to update TransformLayoutForEP and not the entire path through GraphPartitioner.

onnxruntime/onnxruntime/core/session/inference_session.cc

Line 944 in 648bedf

layout_transformer::TransformLayoutForEP(graph_to_transform, modified, execution_provider,

There are 2 places where TransformLayoutForEP() is called, and neither of them has a context with an existing allocator. If I make this change, I still need code to get the allocator from the EP, and I need to do this twice.

Edit: I see you mentioned ExecutionProviders.GetDefaultCpuAllocator(). Let me check if I can figure out how to do this.

Updated the code.

Is there an easier way to create lambdas in session_state_test.cc? Is there a simpler way to do currying?
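One possible answer to the currying question, sketched under assumptions: a small helper can bind the allocator once and return a callable with the shorter signature the partitioner expects, so each call site does not have to spell out the lambda. All names here (`Graph`, `Allocator`, `TransformLayoutFn`, `TransformLayoutWithAllocator`, `BindAllocator`, `Partition`) are hypothetical simplifications and do not match the real ONNX Runtime signatures.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified stand-ins; NOT the real ONNX Runtime types.
struct Graph {
  std::string layout = "NCHW";
};
struct Allocator {
  std::string name;
};
using AllocatorPtr = std::shared_ptr<Allocator>;

// Callback shape the partitioner expects: no allocator parameter.
using TransformLayoutFn = std::function<bool(Graph& graph, bool& modified)>;

// The transform itself takes the CPU allocator as an extra parameter.
bool TransformLayoutWithAllocator(Graph& graph, bool& modified,
                                  const AllocatorPtr& cpu_allocator) {
  (void)cpu_allocator;  // a real transform would read initializers through it
  graph.layout = "NHWC";
  modified = true;
  return true;
}

// "Curries" the allocator: binds it once and returns a callable that
// matches TransformLayoutFn.
TransformLayoutFn BindAllocator(AllocatorPtr cpu_allocator) {
  assert(cpu_allocator != nullptr);
  return [cpu_allocator = std::move(cpu_allocator)](Graph& graph, bool& modified) {
    return TransformLayoutWithAllocator(graph, modified, cpu_allocator);
  };
}

// Stand-in for the partitioner: invokes the callback only if one is set.
bool Partition(Graph& graph, const TransformLayoutFn& transform_layout_fn) {
  bool modified = false;
  if (!transform_layout_fn) {
    return true;  // nothing to transform
  }
  return transform_layout_fn(graph, modified);
}
```

A test could then pass `BindAllocator(cpu_allocator)` wherever a `TransformLayoutFn` is expected, and an empty callback still means "no layout transform".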

fix transpose optimizer on GPU EP

c1ab5ea

guschmue previously approved these changes May 17, 2023

try get CPU allocator only when default allocator is not CPU

993c8ee

fs-eire dismissed guschmue’s stale review via 993c8ee May 17, 2023 22:13

skottmckay reviewed May 17, 2023

fix build break

72028a1

skottmckay previously approved these changes May 18, 2023

refactor TransformLayoutForEP() to receive allocator as parameter

37f6c90

fs-eire dismissed skottmckay’s stale review via 37f6c90 May 18, 2023 09:20

skottmckay approved these changes May 18, 2023

guschmue approved these changes May 18, 2023

fs-eire merged commit dc06c25 into main May 19, 2023

fs-eire deleted the fs-eire/fix-transpose-optimizer branch May 19, 2023 21:33

gabrielgrant mentioned this pull request May 24, 2024

[Web] WebGPU backend fails to load some model due to exception during initialization inside transpose optimizer #15869

Closed
