
Cherry-pick for 1.17.3 #20013

Merged

YUNQIUGUO merged 22 commits into rel-1.17.3 from yguo/cherry-pick-for-1.17.3 on Mar 29, 2024


YUNQIUGUO
Contributor

Description

Web PRs are not included yet.

Motivation and Context

yufenglee and others added 7 commits March 21, 2024 11:12
### Description
The Windows memory map casts mapped_offset to DWORD directly, so the offset
is truncated when it is larger than 2^32-1. In that case we need to set
dwFileOffsetHigh as well.
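
A minimal sketch of the fix's idea, assuming the 64-bit offset has to be passed as two 32-bit halves, the way `MapViewOfFile` takes `dwFileOffsetHigh` and `dwFileOffsetLow`. To keep it portable, `uint32_t` stands in for `DWORD`, and `SplitFileOffset` / `TruncatingCast` are hypothetical helpers, not the actual ONNX Runtime code.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical helper: split a 64-bit file offset into the (high, low)
// 32-bit halves that a Windows-style API such as MapViewOfFile expects.
// uint32_t stands in for DWORD so the sketch compiles on any platform.
std::pair<std::uint32_t, std::uint32_t> SplitFileOffset(std::uint64_t offset) {
  const auto high = static_cast<std::uint32_t>(offset >> 32);           // bits 32..63
  const auto low = static_cast<std::uint32_t>(offset & 0xFFFFFFFFull);  // bits 0..31
  return {high, low};
}

// The buggy pattern: casting straight to a 32-bit type silently drops the
// high bits once the offset exceeds 2^32 - 1.
std::uint32_t TruncatingCast(std::uint64_t offset) {
  return static_cast<std::uint32_t>(offset);
}
```

For an offset of `0x100000010`, the truncating cast keeps only `0x10`, while the split yields a high part of `1` and a low part of `0x10`.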


### Motivation and Context

The bug was found in #19450.
Answers issue #19640.
More details are in the issue. Basically, I am changing all include-directory
and link-directory usage to CMake's `CUDA::*` targets.
### Description
Make the Linux logic consistent with Windows.


### Motivation and Context
onnxruntime_lite_custom_op.h is included in the Windows zip package but not in
the Linux zip package.

https://github.com/microsoft/onnxruntime/blob/acbfc29f272b5578145e7600bc42342e116ffbc2/tools/ci_build/github/azure-pipelines/templates/c-api-artifacts-package-and-publish-steps-windows.yml#L67

Co-authored-by: Your Name <[email protected]>
Fix some linker errors that come up when integrating the onnxruntime-training-c pod into another Xcode project. The problematic configuration is a minimal build with training APIs enabled.
- training_op_defs.o had some unresolved references to ONNX functions. It should not be included at all in a minimal build.
- tree_ensemble_helper.o also had unresolved references to ONNX ParseData. The containing function is unused in a minimal build.

Added a test to cover this configuration.
#19845)

fix: "UserWarning: Unsupported Windows version (11). ONNX Runtime
supports Windows 10 and above, only."

### Description

Include Windows 11 in the version check. Now, you will not see the
warning “Unsupported Windows version (11). ONNX Runtime supports Windows
10 and above, only.”

### Motivation and Context

On Windows 11, the warning claims that only Windows 10 and above are
supported, which is misleading because Windows 11 is above Windows 10.
Fix a build break caused by an image update. TensorRT isn't expected to pass
all ONNX node tests.
Fixed all CUDA NHWC pooling operations that were broken and enabled the
NHWC CUDA pooling tests. Disabled all pooling tests that are not
supported by the CUDA EP.

Ensure parity between CUDA NHWC / NCHW and work towards 100% tests
enabled for the CUDA EP / CUDA NHWC EP.

---------

Co-authored-by: Tianlei Wu <[email protected]>
YUNQIUGUO requested a review from a team as a code owner March 21, 2024 18:43
@YUNQIUGUO
Contributor Author

I had to resolve a couple of merge conflicts, so please review the current changes and help make sure they are correct.

@Craigacp
Contributor

Is 1.17.3 a release for all APIs or just the web one? If it's for all APIs, could we get #19942 merged in? It fixes a regression from 1.16 to 1.17 that we've hit in practice.

@YUNQIUGUO
Contributor Author

Sure. Could you clarify which packages this change applies to?

@Craigacp
Contributor

Craigacp commented Mar 21, 2024

It's a fix to the CPU provider implementation of the SplitToSequence operator. So it's still useful for the web API, but our use is via Python or Java.

@YUNQIUGUO
Contributor Author

@tianleiwu @mtavenrath it looks like PR #19889 introduces a number of build failures on the release branch. Are there any dependency PRs that I need to take as well? Failure error message: https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1331111&view=logs&j=6df8fe70-7b8f-505a-8ef0-8bf93da2bac7&t=38e4b5a4-9041-5949-6670-6ace666c420b&l=5163

Craigacp and others added 5 commits March 21, 2024 13:43
### Description
Previously, GQA incorrectly required the rotary cos and sin caches to have a
sequence length equal to the present sequence length. Now it requires the
cache length to be greater than or equal to the present sequence length,
because, to match the Rotary Embedding op, it should be max_sequence_length.
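
A minimal sketch of the relaxed check, contrasting the old equality rule with the new lower bound. `ValidateRotaryCacheLength`, `OldRuleAccepts`, and `CacheCheck` are hypothetical names for illustration, not ONNX Runtime's actual validation code.

```cpp
#include <cstdint>
#include <string>

// Result of a hypothetical validation step.
struct CacheCheck {
  bool ok;
  std::string message;
};

// New rule (as described above): the cos/sin cache only needs to be at least
// as long as the present sequence, so a cache sized to max_sequence_length,
// as the Rotary Embedding op uses, is accepted.
CacheCheck ValidateRotaryCacheLength(std::int64_t cache_length,
                                     std::int64_t present_sequence_length) {
  if (cache_length < present_sequence_length) {
    return {false, "cos/sin cache length must be >= present sequence length"};
  }
  return {true, ""};
}

// Old, overly strict rule, kept for comparison: it rejects a longer cache.
bool OldRuleAccepts(std::int64_t cache_length, std::int64_t present_sequence_length) {
  return cache_length == present_sequence_length;
}
```

With a cache of length 4096 and a present sequence length of 128, the old rule rejects the input while the new one accepts it; a cache shorter than the present sequence is still rejected.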



### Motivation and Context
Fixes an issue with fusing Rotary Embedding and GQA for certain models
that prefer this optimization.
### Description
GQA incorrectly assumed that rotary dimension 1 is based on the head size.



### Motivation and Context
This change should enable us to run phi-2 with GQA and Rotary Embedding
fused.
### Description
This PR updates the replacement of MultiHeadAttention (MHA) with
GroupQueryAttention (GQA). It is related to the changes in [this
PR](#18906).

### Motivation and Context
The updated replacement of MHA with GQA includes the following fusion
changes.
- Apply sliding window within GQA
- Fuse the rotary embeddings within GQA
- Fuse the 3 MatMuls into 1 packed MatMul if possible
- Fuse the 3 Adds into 1 packed Add if possible
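
The MatMul fusion relies on a simple identity: multiplying the input by the column-wise concatenation [Wq | Wk | Wv] gives the same result as concatenating the three separate products (the packed Add works the same way with concatenated biases). A small self-contained check of that identity, using an illustrative row-major `Matrix` type and helpers that are not part of the fusion code:

```cpp
#include <cstddef>
#include <vector>

// Row-major dense matrix; illustrative only.
struct Matrix {
  std::size_t rows, cols;
  std::vector<float> data;
  float at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Naive matrix product: out = a * b.
Matrix MatMul(const Matrix& a, const Matrix& b) {
  Matrix out{a.rows, b.cols, std::vector<float>(a.rows * b.cols, 0.0f)};
  for (std::size_t i = 0; i < a.rows; ++i)
    for (std::size_t k = 0; k < a.cols; ++k)
      for (std::size_t j = 0; j < b.cols; ++j)
        out.data[i * b.cols + j] += a.at(i, k) * b.at(k, j);
  return out;
}

// Concatenate matrices with equal row counts along the column axis,
// e.g. [Wq | Wk | Wv], which plays the role of the packed weight.
Matrix ConcatColumns(const std::vector<Matrix>& parts) {
  const std::size_t rows = parts.front().rows;
  std::size_t cols = 0;
  for (const auto& p : parts) cols += p.cols;
  Matrix out{rows, cols, std::vector<float>(rows * cols)};
  std::size_t offset = 0;
  for (const auto& p : parts) {
    for (std::size_t r = 0; r < rows; ++r)
      for (std::size_t c = 0; c < p.cols; ++c)
        out.data[r * cols + offset + c] = p.at(r, c);
    offset += p.cols;
  }
  return out;
}
```

Computing `MatMul(x, ConcatColumns({wq, wk, wv}))` once replaces three separate MatMuls; the output columns can then be sliced back into Q, K, and V.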
@tianleiwu
Contributor

There is no need to take dependency PRs, which might bring more changes. I added a commit to manually fix the build errors.

@YUNQIUGUO
Contributor Author

cool, thanks!

rachguo and others added 7 commits March 21, 2024 18:28
…oad TRT binaries in every build (#19919)

### Description
Change the NuGet pipeline's "Final_Jar_Testing_Windows_GPU" job to download
TRT binaries in every build. All the other build jobs already do this;
this is the only one left.

Similar to #19909

### Motivation and Context

As a follow up of #19118
…ad TRT binaries in every build (#19909)

### Description
Change the NuGet pipeline's "Final_Jar_Testing_Windows_GPU" job to download
TRT binaries in every build. All the other build jobs already do this;
this is the only one left.


### Motivation and Context

As a follow up of #19118
### Description
Add support for packed QKV input and rotary embedding on GPUs with sm<80,
using the memory-efficient attention kernel.



### Motivation and Context
Allows lower-end GPUs to run GQA with packed QKV input and rotary
embedding.
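
With packed QKV input, Q, K, and V arrive in a single tensor, and each token's row has to be sliced into three parts. A sketch of that offset arithmetic, assuming a per-token layout of `num_heads * head_size` query values followed by `kv_num_heads * head_size` key values and then the same number of value values (this layout is an assumption here, chosen because GQA allows fewer KV heads than query heads). `PackedQkvOffsets`, `ComputePackedQkvOffsets`, and `PartForToken` are hypothetical helpers, and batch and sequence are treated as one flattened token axis.

```cpp
#include <cstddef>
#include <vector>

// Where Q, K and V start inside one token's row of a packed QKV buffer.
struct PackedQkvOffsets {
  std::size_t q_offset, k_offset, v_offset;
  std::size_t q_size, kv_size;
  std::size_t row_size;  // total elements per token
};

enum class Part { kQuery, kKey, kValue };

// Assumed per-token layout:
// [Q: num_heads*head_size | K: kv_num_heads*head_size | V: kv_num_heads*head_size]
PackedQkvOffsets ComputePackedQkvOffsets(std::size_t num_heads,
                                         std::size_t kv_num_heads,
                                         std::size_t head_size) {
  PackedQkvOffsets o{};
  o.q_size = num_heads * head_size;
  o.kv_size = kv_num_heads * head_size;
  o.q_offset = 0;
  o.k_offset = o.q_size;
  o.v_offset = o.q_size + o.kv_size;
  o.row_size = o.q_size + 2 * o.kv_size;
  return o;
}

// Pointer to the start of Q, K or V for a given token in a flat
// [tokens, row_size] buffer.
const float* PartForToken(const std::vector<float>& packed, std::size_t token,
                          const PackedQkvOffsets& o, Part part) {
  const std::size_t offset = part == Part::kQuery ? o.q_offset
                             : part == Part::kKey ? o.k_offset
                                                  : o.v_offset;
  return packed.data() + token * o.row_size + offset;
}
```

For 32 query heads, 8 KV heads, and a head size of 128, each token row holds 4096 query values followed by 1024 key and 1024 value values, 6144 in total.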
### Description

This PR adds a benchmarking script to measure end-to-end performance and
saves the results in a CSV file.

### Motivation and Context

With this PR, end-to-end performance can be easily measured for many
large-language models such as LLaMA-2. The performance numbers for
LLaMA-2 are located
[here](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/python/models/llama).
### Description
This PR removes early stopping from the end-to-end LLaMA-2 benchmark
script.

### Motivation and Context
This allows models to always generate the requested number of new
tokens.
edgchen1 previously approved these changes Mar 25, 2024
Contributor

edgchen1 left a comment


Inclusion of #19858 looks good.

tianleiwu previously approved these changes Mar 27, 2024
`CreateTensorRTCustomOpDomainList()` is not thread-safe because of its
static variables, `created_custom_op_list` and `custom_op_domain`.
This PR adds synchronization using a mutex.

see issue: #20089
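
A minimal sketch of the pattern, assuming the fix holds a `std::mutex` across the check-and-populate step for the function-level statics. `CustomOpDomain`, `GetCustomOpDomainList`, and the `"example.domain"` name are hypothetical stand-ins, not the TensorRT EP's actual code.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for a custom op domain object.
struct CustomOpDomain {
  std::string name;
};

// Returns a lazily built, shared list of domains. The static list is only
// read and written while `mutex` is held, so two threads making the first
// call at the same time cannot both see an empty list and populate it twice.
std::vector<std::shared_ptr<CustomOpDomain>> GetCustomOpDomainList() {
  static std::mutex mutex;  // function-local static init is thread-safe since C++11
  static std::vector<std::shared_ptr<CustomOpDomain>> created_domains;

  std::lock_guard<std::mutex> lock(mutex);
  if (created_domains.empty()) {
    created_domains.push_back(
        std::make_shared<CustomOpDomain>(CustomOpDomain{"example.domain"}));
  }
  return created_domains;  // the copy is made while the lock is still held
}
```

Returning a copy rather than a reference keeps callers from reading the static vector after the lock is released.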
YUNQIUGUO dismissed stale reviews from tianleiwu and edgchen1 via ed4edfe March 27, 2024 17:00
@YUNQIUGUO
Contributor Author

YUNQIUGUO commented Mar 27, 2024

The cherry-pick process should now be complete. @microsoft/onnxruntime-es, please take a look and help approve. Thanks.

snnn previously approved these changes Mar 28, 2024
tianleiwu previously approved these changes Mar 28, 2024
@YUNQIUGUO
Contributor Author

YUNQIUGUO commented Mar 28, 2024

Looking at the required CI checks, a DML test appears to be failing on the Windows GPU CI pipeline: https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1335887&view=logs&j=4b364901-e881-5aa9-a2fc-f7beb70b2ce1&s=e4766ded-b3c3-5183-8adb-9072bf8f96b3&t=b32ab5b7-b815-50a4-8a25-e0aac0264208&l=44468

Does anyone know whether this is a known issue or an intermittent error?

I checked, and it has failed a couple of times in the history of the CI runs.

@snnn
Member

snnn commented Mar 28, 2024

### Description
1. The change in build.py fixes a DML exception
(https://dev.azure.com/onnxruntime/onnxruntime/_build?definitionId=10&_a=summary).
2. The change in requirements.txt fixes an exception in the Python packaging
pipeline:
https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=430433&view=results



### Motivation and Context

---------

Co-authored-by: Yi Zhang <[email protected]>
YUNQIUGUO dismissed stale reviews from kunal-vaishnavi, tianleiwu, and snnn via 5ac3b6f March 28, 2024 23:34
@YUNQIUGUO
Contributor Author

YUNQIUGUO commented Mar 28, 2024

> You need this: https://github.com/microsoft/onnxruntime/pull/20073/files

Thanks, I added the fix commit.

YUNQIUGUO merged commit 046d06f into rel-1.17.3 Mar 29, 2024
102 of 108 checks passed
YUNQIUGUO deleted the yguo/cherry-pick-for-1.17.3 branch March 29, 2024 20:10