
change the EP device to default OrtDevice() for memoryType equals CPUInput #15903

Merged
3 commits merged May 15, 2023

Conversation

jslhcl
Contributor

@jslhcl jslhcl commented May 10, 2023

Description

Change the EP device to the default OrtDevice() when memoryType equals CPUInput, for the CUDA, ROCm, MIGraphX and TensorRT EPs.

Motivation and Context

My previous PR (#15618) caused random failures in the CUDA training test GradientCheckerTest.TileGrad (see build https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=986784&view=logs&j=5076e696-f193-5f12-2d8a-703dda41a79b&t=a3824a7c-2162-5e3d-3fdd-8cf808834fbb) and in the ROCm test:

```
root@a59558217e53:/workspace# pytest orttraining/orttraining/test/python/orttraining_test_ortmodule_api.py::test_gradient_correctness_minmax
...
E RuntimeError: Error in backward pass execution: Non-zero status code returned while running ATen node. Name:'/_original_module/ATen_Grad/ATen_1' Status Message: Storage size calculation overflowed with sizes=[72340172838076673, 72340172838076673, 128]
```

The likely reason: when the memType of the CUDA/TensorRT/ROCm/MIGraphX EP is CPUInput, the device in the IAllocator's memoryInfo was previously the default OrtDevice(), while after my change it became OrtDevice(CPU, xx_PINNED, 0).

Changing it back fixed GradientCheckerTest.TileGrad in the Win GPU training build.

…Input for cuda, rocm, migraphx and tensorRT EP
@jslhcl
Contributor Author

jslhcl commented May 11, 2023

/azp run orttraining-linux-gpu-ci-pipeline

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@wangyems
Contributor

Verified this PR also fixes a perf regression in the 1.15 release candidate introduced by #15618.

@jslhcl
Contributor Author

jslhcl commented May 12, 2023

/azp run ONNX Runtime React Native CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@jslhcl
Contributor Author

jslhcl commented May 12, 2023

/azp run orttraining-amd-gpu-ci-pipeline

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

PeixuanZuo added a commit that referenced this pull request May 15, 2023
update ROCm/MIGraphX CI to ROCm 5.5.

TODO:
Two PRs to fix failures in
orttraining/orttraining/test/python/orttraining_test_ortmodule_api.py:
- test_gradient_correctness_minmax/test_gradient_correctness_argmax_unfold/test_gradient_correctness_argmax_diagonal (#15903)
- test_ortmodule_attribute_name_collision_warning (#15884)
```cpp
if (mem_type == OrtMemTypeCPUInput || mem_type == OrtMemTypeCPUOutput) {
  return OrtDevice(OrtDevice::CPU, OrtDevice::MemType::CUDA_PINNED, 0 /*CPU device id always be 0*/);
}
if (mem_type == OrtMemTypeCPUInput) return OrtDevice();
```
Member


```cpp
if (mem_type == OrtMemTypeCPUInput) return OrtDevice();
```

Could you add a comment here explaining why we use non-pinned memory for CPU input?

Contributor Author


Will do in the next PR

Member

@souptc souptc left a comment


:shipit:

@jslhcl jslhcl marked this pull request as ready for review May 15, 2023 14:42
@jslhcl jslhcl merged commit 3b8f3a0 into main May 15, 2023
@jslhcl jslhcl deleted the leca/changeEpDevice branch May 15, 2023 14:42
prathikr pushed a commit that referenced this pull request May 16, 2023
update ROCm/MIGraphX CI to ROCm 5.5. (same commit message as above)
prathikr pushed a commit that referenced this pull request May 16, 2023
…Input (#15903)

(same commit message as the PR description above)
mszhanyi added a commit that referenced this pull request May 18, 2023
@snnn snnn added the triage:approved Approved for cherrypicks for release label May 18, 2023
snnn pushed a commit that referenced this pull request May 19, 2023
…Input (#15903)

(same commit message as the PR description above)
snnn pushed a commit that referenced this pull request May 19, 2023
…Input (#15903)

(same commit message as the PR description above)
snnn added a commit that referenced this pull request May 19, 2023
### Description
Cherry-picks 26 commits to the release branch.
Most cherry-picks are clean merges, except:

1. When I got conflicts in cgmanifest.json and download-deps.yml, I chose to ignore the conflicts and regenerate the two files.
2. There were some conflicts in cmake/deps.txt and onnxruntime_c_api.cc.


PR list:

[js/webgpu] fix Transpose with non-float tensor (#15819)
[js/web] fix terser reserved symbols for worker (#15864)
[JSEP] fix constructor for OrtDevice (#15805)
Bump engine.io from 6.4.1 to 6.4.2 in /js/web (#15799)
Bump engine.io from 6.4.0 to 6.4.2 in /onnxruntime/test/wasm (#15798)
[wasm] revert emsdk to v3.1.19 (#15793)
[wasm/JSEP] add threaded build to artifacts (#15777)
[js/web] add target ort.webgpu.min.js (#15780)
update ort extensions to 94142d8391c9791ec71c38336436319a2d4ac7a0 (#15688)
fix: setting builder optimization level to TRT 8.6 default (#15897)
Adust GetVersionString() GetBuildInfoString() signatures and move them to OrtApi (#15921)
Fix segfault for multiple GPU run (regression) (#15823)
android package fix (#15999)
[CoreML EP] Minor changes to allow CoreML EP to handle more nodes and models. (#15993)
Adding support for conv fp16 fusion on Resnet50v1 (#15474)
update onnx release 1.14 for docker files (#15680)
Avoid generating training documentation during packaging (#15795)
Update Conv-Add-Relu Fusion Transformation (#15834)
Fix symbolic shape infer empty value_info (#15842)
NhwcFusedConv: Add before Activation (#15837)
use __hmul2 instead of __hmul2_rn (#15852)
change the EP device to default OrtDevice() for memoryType equals CPU Input (#15903)
Fixing NhwcFusedConv fp16 (#15950)
fix topo sort in quantization tool (#16003)
[doc] add LeakyRelu to coreml supported ops (#15944)
[DML EP] Add frequent upload heap flushing (#15960)

Co-authored-by: Yulong Wang 
Co-authored-by: dependabot[bot] 
Co-authored-by: Guenther Schmuelling 
Co-authored-by: Shalva Mist 
Co-authored-by: Maximilian Müller 
Co-authored-by: Dmitri Smirnov 
Co-authored-by: pengwa 
Co-authored-by: Ashwini Khade 
Co-authored-by: Edward Chen 
Co-authored-by: Jian Chen 
Co-authored-by: liqun Fu 
Co-authored-by: Baiju Meswani 
Co-authored-by: Tianlei Wu 
Co-authored-by: Chen Fu 
Co-authored-by: Ye Wang 
Co-authored-by: cao lei 
Co-authored-by: Yufeng Li 
Co-authored-by: Rachel Guo 
Co-authored-by: Patrice Vignola
@snnn snnn removed the triage:approved Approved for cherrypicks for release label May 19, 2023
@snnn snnn removed the release:1.15 label May 19, 2023
preetha-intel pushed a commit to intel/onnxruntime that referenced this pull request Jun 7, 2023
(same cherry-pick commit message as above, with PR references prefixed microsoft#)