change the EP device to default OrtDevice() for memoryType equals CPUInput #15903

jslhcl · 2023-05-10T23:31:19Z

Description

Change the EP device to the default OrtDevice() when the memory type is CPUInput, for the CUDA, ROCm, MIGraphX, and TensorRT EPs.

Motivation and Context

My previous PR (#15618) caused intermittent failures in the CUDA training test GradientCheckerTest.TileGrad (see build https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=986784&view=logs&j=5076e696-f193-5f12-2d8a-703dda41a79b&t=a3824a7c-2162-5e3d-3fdd-8cf808834fbb) and in this ROCm test:

root@a59558217e53:/workspace# pytest orttraining/orttraining/test/python/orttraining_test_ortmodule_api.py::test_gradient_correctness_minmax
...
E RuntimeError: Error in backward pass execution: Non-zero status code returned while running ATen node. Name:'/_original_module/ATen_Grad/ATen_1' Status Message: Storage size calculation overflowed with sizes=[72340172838076673, 72340172838076673, 128]
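The reported sizes are not plausible tensor dimensions. One observation, added here and not part of the original analysis: 72340172838076673 is exactly 0x0101010101010101, the byte 0x01 repeated in all eight bytes of an int64. The sizes therefore look like a fill pattern rather than a real shape, which fits (but does not prove) the hypothesis that the ATen node read its shape from memory that did not hold the intended values. The compile-time checks below verify only that arithmetic; the constant and helper names are illustrative, not ORT code.

```cpp
#include <cstdint>
#include <limits>

// The dimension value reported in the ATen error message above.
constexpr std::int64_t kReportedDim = 72340172838076673;

// True if every one of the eight bytes of v equals `byte` (a memset-style fill pattern).
constexpr bool IsRepeatedBytePattern(std::uint64_t v, std::uint8_t byte) {
  for (int i = 0; i < 8; ++i) {
    if (((v >> (8 * i)) & 0xFF) != byte) return false;
  }
  return true;
}

// True if a * b would overflow int64_t; assumes a and b are both positive.
constexpr bool MulOverflowsInt64(std::int64_t a, std::int64_t b) {
  return a > std::numeric_limits<std::int64_t>::max() / b;
}

// 72340172838076673 == 0x0101010101010101: byte 0x01 in all eight positions.
static_assert(kReportedDim == 0x0101010101010101, "decimal value equals the 0x01 byte pattern");
static_assert(IsRepeatedBytePattern(static_cast<std::uint64_t>(kReportedDim), 0x01),
              "every byte is 0x01");
// Multiplying just two of these "dimensions" already overflows int64_t,
// which is why the storage size calculation fails.
static_assert(MulOverflowsInt64(kReportedDim, kReportedDim), "product overflows int64_t");
```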

A likely cause: for the CPUInput memory type of the CUDA/TensorRT/ROCm/MIGraphX EP, the device in the corresponding IAllocator's memory info was previously the default OrtDevice(); after my change, it became OrtDevice(CPU, xx_PINNED, 0).

Changing it back fixes GradientCheckerTest.TileGrad in the Windows GPU training build.
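To make the device mapping concrete, here is a simplified C++ model of the per-EP lookup from memory type to device, before and after this change. It is a sketch under stated assumptions, not ORT code: `Device`, `MemType`, `PinnedCpu`, and the two mapping functions are hypothetical stand-ins for OrtDevice, OrtMemType, and the EP's mapping, and the pinned mapping kept for CPUOutput in the second function is an assumption, since the reviewed diff does not show that line.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-ins for OrtDevice and OrtMemType (not the real ORT types).
enum class DeviceType : std::int8_t { kCpu, kGpu };
enum class DeviceMemType : std::int8_t { kDefault, kPinned };
enum class MemType { kDefault, kCpuInput, kCpuOutput };

struct Device {
  DeviceType type;
  DeviceMemType mem_type;
  int id;

  // Default construction models the default OrtDevice(): plain (non-pinned) CPU memory, id 0.
  constexpr Device() : type(DeviceType::kCpu), mem_type(DeviceMemType::kDefault), id(0) {}
  constexpr Device(DeviceType t, DeviceMemType m, int i) : type(t), mem_type(m), id(i) {}

  constexpr bool operator==(const Device& other) const {
    return type == other.type && mem_type == other.mem_type && id == other.id;
  }
};

// Pinned host memory on device id 0 (the CPU device id is always 0).
constexpr Device PinnedCpu() { return Device(DeviceType::kCpu, DeviceMemType::kPinned, 0); }

// Mapping as of #15618: both CPUInput and CPUOutput map to pinned host memory.
constexpr Device DeviceForMemTypeAfter15618(MemType m, Device ep_default) {
  return (m == MemType::kCpuInput || m == MemType::kCpuOutput) ? PinnedCpu() : ep_default;
}

// Mapping with this PR: CPUInput falls back to the default device.
// Assumption: CPUOutput still maps to pinned memory (not visible in the reviewed diff).
constexpr Device DeviceForMemTypeThisPr(MemType m, Device ep_default) {
  return m == MemType::kCpuInput    ? Device()
         : m == MemType::kCpuOutput ? PinnedCpu()
                                    : ep_default;
}
```

The two functions differ only in the CPUInput case, which is the device that, per the description, ends up in the allocator's memory info.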


jslhcl · 2023-05-11T01:06:35Z

/azp run orttraining-linux-gpu-ci-pipeline

azure-pipelines · 2023-05-11T01:06:45Z

Azure Pipelines successfully started running 1 pipeline(s).

wangyems · 2023-05-12T17:45:02Z

Verified that this PR also fixes a perf regression in the rel1.5 RC introduced by #15618.

jslhcl · 2023-05-12T21:48:22Z

/azp run ONNX Runtime React Native CI Pipeline

azure-pipelines · 2023-05-12T21:48:33Z

Azure Pipelines successfully started running 1 pipeline(s).

jslhcl · 2023-05-12T22:06:24Z

/azp run orttraining-amd-gpu-ci-pipeline

azure-pipelines · 2023-05-12T22:06:32Z

Azure Pipelines successfully started running 1 pipeline(s).

update ROCm/MIGraphX CI to ROCm5.5. TODO: two PRs to fix failures in orttraining/orttraining/test/python/orttraining_test_ortmodule_api.py:
- test_gradient_correctness_minmax/test_gradient_correctness_argmax_unfold/test_gradient_correctness_argmax_diagonal (#15903)
- test_ortmodule_attribute_name_collision_warning (#15884)

souptc · 2023-05-15T03:11:58Z

onnxruntime/core/providers/cuda/cuda_execution_provider.cc

-  if (mem_type == OrtMemTypeCPUInput || mem_type == OrtMemTypeCPUOutput) {
-    return OrtDevice(OrtDevice::CPU, OrtDevice::MemType::CUDA_PINNED, 0 /*CPU device id always be 0*/);
-  }
+  if (mem_type == OrtMemTypeCPUInput) return OrtDevice();



Could you add a comment here explaining why we use non-pinned memory for CPU input?

Will do in the next PR



Description

Cherry-picks 26 commits to the release branch. Most cherry-picks are clean merges, except:
1. When I got conflicts in cgmanifest.json and download-deps.yml, I chose to ignore the conflicts and regenerate the two files.
2. There were some conflicts in cmake/deps.txt and onnxruntime_c_api.cc.

PR list:
- [js/webgpu] fix Transpose with non-float tensor (#15819)
- [js/web] fix terser reserved symbols for worker (#15864)
- [JSEP] fix constructor for OrtDevice (#15805)
- Bump engine.io from 6.4.1 to 6.4.2 in /js/web (#15799)
- Bump engine.io from 6.4.0 to 6.4.2 in /onnxruntime/test/wasm (#15798)
- [wasm] revert emsdk to v3.1.19 (#15793)
- [wasm/JSEP] add threaded build to artifacts (#15777)
- [js/web] add target ort.webgpu.min.js (#15780)
- update ort extensions to 94142d8391c9791ec71c38336436319a2d4ac7a0 (#15688)
- fix: setting builder optimization level to TRT 8.6 default (#15897)
- Adust GetVersionString() GetBuildInfoString() signatures and move them to OrtApi (#15921)
- Fix segfault for multiple GPU run (regression) (#15823)
- android package fix (#15999)
- [CoreML EP] Minor changes to allow CoreML EP to handle more nodes and models. (#15993)
- Adding support for conv fp16 fusion on Resnet50v1 (#15474)
- update onnx release 1.14 for docker files (#15680)
- Avoid generating training documentation during packaging (#15795)
- Update Conv-Add-Relu Fusion Transformation (#15834)
- Fix symbolic shape infer empty value_info (#15842)
- NhwcFusedConv: Add before Activation (#15837)
- use __hmul2 instead of __hmul2_rn (#15852)
- change the EP device to default OrtDevice() for memoryType equals CPUInput (#15903)
- Fixing NhwcFusedConv fp16 (#15950)
- fix topo sort in quantization tool (#16003)
- [doc] add LeakyRelu to coreml supported ops (#15944)
- [DML EP] Add frequent upload heap flushing (#15960)

Co-authored-by: Yulong Wang
Co-authored-by: dependabot[bot]
Co-authored-by: Guenther Schmuelling
Co-authored-by: Shalva Mist
Co-authored-by: Maximilian Müller
Co-authored-by: Dmitri Smirnov
Co-authored-by: pengwa
Co-authored-by: Ashwini Khade
Co-authored-by: Edward Chen
Co-authored-by: Jian Chen
Co-authored-by: liqun Fu
Co-authored-by: Baiju Meswani
Co-authored-by: Tianlei Wu
Co-authored-by: Chen Fu
Co-authored-by: Ye Wang
Co-authored-by: cao lei
Co-authored-by: Yufeng Li
Co-authored-by: Rachel Guo
Co-authored-by: Patrice Vignola


change the EP device to default OrtDevice() for memoryType equals CPUInput for cuda, rocm, migraphx and tensorRT EP

ae1a937

jslhcl requested review from pengwa, souptc, RandySheriffH and PeixuanZuo May 10, 2023 23:31

PeixuanZuo mentioned this pull request May 11, 2023

[ROCm] update ROCm/MIGraphX CI to ROCm5.5 #15905

Merged

compare EP type before reuse check

b0760e0

clear freeList before checking reuse case in next stream

eceee11

souptc reviewed May 15, 2023


souptc approved these changes May 15, 2023


jslhcl marked this pull request as ready for review May 15, 2023 14:42

jslhcl merged commit 3b8f3a0 into main May 15, 2023

jslhcl deleted the leca/changeEpDevice branch May 15, 2023 14:42

wangyems added the release:1.15 label May 15, 2023

mszhanyi added a commit that referenced this pull request May 18, 2023

Revert "change the EP device to default OrtDevice() for memoryType equals CPUInput (#15903)"

90a1327

This reverts commit 3b8f3a0.

snnn added the triage:approved Approved for cherrypicks for release label May 18, 2023

snnn removed the triage:approved Approved for cherrypicks for release label May 19, 2023

snnn removed the release:1.15 label May 19, 2023
