Cherry-pick for 1.17.3 #20013
Conversation
### Description
The Windows memory map casts `mapped_offset` directly to a `DWORD`, so the offset is truncated if it is larger than 2^32-1. We need to set `dwFileOffsetHigh` for this case.

### Motivation and Context
The bug was found from #19450.
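The high/low split the fix relies on can be sketched as follows. This is a minimal Python illustration (the hypothetical helper `split_offset` is not onnxruntime code) of packing a 64-bit offset into the two 32-bit `DWORD` arguments that `MapViewOfFile` expects:

```python
def split_offset(mapped_offset: int) -> tuple[int, int]:
    """Split a 64-bit file offset into (dwFileOffsetHigh, dwFileOffsetLow).

    Casting the offset straight to a 32-bit DWORD silently drops the
    upper 32 bits for offsets above 2**32 - 1; the upper half must be
    passed separately as dwFileOffsetHigh.
    """
    dw_file_offset_high = (mapped_offset >> 32) & 0xFFFFFFFF
    dw_file_offset_low = mapped_offset & 0xFFFFFFFF
    return dw_file_offset_high, dw_file_offset_low
```

For any offset below 4 GiB the high word is zero, which is why the truncation bug only shows up when mapping very large files.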
Answers issue #19640. More details are in the issue; basically, I am changing all include-directory and link-directory usage to CMake's `CUDA::*` targets.
### Description
Make the Linux logic consistent with Windows.

### Motivation and Context
onnxruntime_lite_custom_op.h is in the Windows zip package but not in the Linux zip package: https://github.com/microsoft/onnxruntime/blob/acbfc29f272b5578145e7600bc42342e116ffbc2/tools/ci_build/github/azure-pipelines/templates/c-api-artifacts-package-and-publish-steps-windows.yml#L67

Co-authored-by: Your Name <[email protected]>
Fix some linker errors that come up when integrating the onnxruntime-training-c pod into another Xcode project. The problematic configuration is a minimal build with training APIs enabled.
- training_op_defs.o had some unresolved references to ONNX functions. It should not be included at all in a minimal build.
- tree_ensemble_helper.o also had unresolved references to ONNX ParseData. The containing function is unused in a minimal build.

Added a test to cover this configuration.
#19845) fix: "UserWarning: Unsupported Windows version (11). ONNX Runtime supports Windows 10 and above, only."

### Description
Include Windows 11 in the version check. Now you will not see the warning "Unsupported Windows version (11). ONNX Runtime supports Windows 10 and above, only."

### Motivation and Context
Warning on Windows 11 that only Windows 10 and above are supported, which is somewhat strange.
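The gist of the fix is that the check should accept any major version at or above 10 rather than matching a fixed list of versions. A minimal sketch with a hypothetical helper (not onnxruntime's actual check):

```python
def is_supported_windows(major: int, build: int) -> bool:
    """Windows 11 still reports major version 10 (build >= 22000),
    so a check that only recognizes explicitly listed versions
    misclassifies it. Accepting major >= 10 covers Windows 10,
    Windows 11, and later releases.
    """
    return major >= 10
```

Checking "at least Windows 10" instead of "exactly a known version" also keeps the warning from reappearing for future Windows releases.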
Fix build break caused by image update. TensorRT isn't expected to pass all ONNX node tests.
Fixed all CUDA NHWC pooling operations, which were broken, and enabled the NHWC CUDA pooling tests. Disabled all pooling tests which are not supported by the CUDA EP. This ensures parity between CUDA NHWC / NCHW and works towards 100% of tests enabled for the CUDA EP / CUDA NHWC EP.

Co-authored-by: Tianlei Wu <[email protected]>
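The NHWC/NCHW parity being tested comes down to the two layouts holding the same values under a permutation of dimensions. A small stdlib-only sketch of the layout conversion (nested lists standing in for tensors; not the actual test code):

```python
def nchw_to_nhwc(x):
    """Convert a [N][C][H][W] nested list to [N][H][W][C] layout."""
    return [[[[x[n][c][h][w] for c in range(len(x[n]))]
              for w in range(len(x[n][0][0]))]
             for h in range(len(x[n][0]))]
            for n in range(len(x))]

def nhwc_to_nchw(y):
    """Convert a [N][H][W][C] nested list back to [N][C][H][W]."""
    return [[[[y[n][h][w][c] for w in range(len(y[n][0]))]
              for h in range(len(y[n]))]
             for c in range(len(y[n][0][0]))]
            for n in range(len(y))]

# N=1, C=2, H=2, W=2: channel 0 is [[1,2],[3,4]], channel 1 is [[5,6],[7,8]].
x = [[[[1, 2], [3, 4]], [[5, 6], [7, 8]]]]
```

Parity testing an op then amounts to running it in both layouts and checking that converting one result into the other layout reproduces it exactly.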
Had to address a couple of merge conflicts, so please have a look at the current changes and help make sure they're correct.
Is 1.17.3 a release for all APIs or just the web one? If it's for all APIs, could we get #19942 merged in? It fixes a regression in 1.17 from 1.16 which we've hit in practice.
Sure. Mind clarifying which packages this change applies to?
It's a fix to the CPU provider implementation of the
@tianleiwu @mtavenrath It looks like PR #19889 introduces a bunch of build failures on the release branch. Mind sharing whether there are any dependency PRs that I need to take? Failure error message: https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1331111&view=logs&j=6df8fe70-7b8f-505a-8ef0-8bf93da2bac7&t=38e4b5a4-9041-5949-6670-6ace666c420b&l=5163
### Description
Previously, GQA incorrectly enforced that the rotary cos and sin caches have sequence length equal to the present sequence length. Now it enforces that the cache length be greater than or equal to the present sequence length, since to match the Rotary Embedding op the cache should be of max_sequence_length.

### Motivation and Context
Fixes an issue with fusing Rotary Embedding and GQA for certain models which prefer this optimization.
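The relaxed shape check can be sketched as a small validation helper (a hypothetical stand-in for the actual GQA shape-inference code, not the real implementation):

```python
def check_rotary_cache(cache_seq_len: int, present_seq_len: int) -> None:
    """Before the fix: cache_seq_len == present_seq_len was required.
    After: the cos/sin cache may cover up to max_sequence_length, so only
    cache_seq_len >= present_seq_len is enforced.
    """
    if cache_seq_len < present_seq_len:
        raise ValueError(
            "rotary cos/sin cache is shorter than the present sequence length"
        )
```

A cache sized for max_sequence_length (e.g. 4096) now validates against any shorter present sequence, matching the standalone Rotary Embedding op's convention.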
### Description
GQA Rotary Dimension 1 was incorrectly assumed to be based on head size.

### Motivation and Context
This change should enable us to run phi-2 with GQA and Rotary Embedding fused.
### Description
This PR updates the replacement of MultiHeadAttention (MHA) with GroupQueryAttention (GQA). It is related to the changes in [this PR](#18906).

### Motivation and Context
The updated replacement of MHA with GQA includes the following fusion changes:
- Apply the sliding window within GQA
- Fuse the rotary embeddings within GQA
- Fuse the 3 MatMuls into 1 packed MatMul if possible
- Fuse the 3 Adds into 1 packed Add if possible
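The packed-MatMul fusion above works because concatenating the Q/K/V weight matrices along the output dimension makes one matrix product compute all three projections at once. A toy stdlib-only sketch of the idea (not the actual fusion code):

```python
def matmul(a, b):
    """Plain row-by-column matrix product over nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def hconcat(*ms):
    """Concatenate matrices column-wise (the packed weight)."""
    return [sum((m[i] for m in ms), []) for i in range(len(ms[0]))]

wq = [[1, 0], [0, 1]]   # toy Q weight (2 output cols)
wk = [[2], [0]]         # toy K weight (1 output col, fewer KV heads as in GQA)
wv = [[0], [3]]         # toy V weight
x = [[1, 2], [3, 4]]    # toy input activations

packed = hconcat(wq, wk, wv)   # single weight for one packed MatMul
qkv = matmul(x, packed)        # one product instead of three
q = [row[:2] for row in qkv]
k = [row[2:3] for row in qkv]
v = [row[3:] for row in qkv]
```

Splitting the fused output by column ranges recovers exactly what the three separate MatMuls would have produced; the same trick applies to fusing the three bias Adds.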
There is no need to take dependency PRs, which might bring in more changes. I added a commit to manually fix the build errors.
Cool, thanks!
…osoft/onnxruntime into yguo/cherry-pick-for-1.17.3
…oad TRT binaries in every build (#19919)

### Description
Change the nuget pipeline's "Final_Jar_Testing_Windows_GPU" job to download TRT binaries in every build. All the other build jobs are already doing this; this is the only one left. Similar to #19909.

### Motivation and Context
A follow-up to #19118.
### Description
Add support for packed QKV input and rotary embedding with sm<80 using the memory efficient attention kernel.

### Motivation and Context
Allows lower-end GPUs to run GQA with packed QKV input and rotary embedding.
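With packed QKV input, the query, key, and value heads arrive in one tensor and must be split apart before attention. A minimal sketch of the head-dimension split (hypothetical helper, not the kernel code; a token's heads are modeled as a flat list laid out Q-first):

```python
def unpack_qkv(packed_heads, num_heads, kv_heads):
    """Split a packed per-token head list of length num_heads + 2*kv_heads,
    laid out as [Q heads..., K heads..., V heads...], into (q, k, v).
    GQA typically has kv_heads < num_heads.
    """
    q = packed_heads[:num_heads]
    k = packed_heads[num_heads:num_heads + kv_heads]
    v = packed_heads[num_heads + kv_heads:]
    return q, k, v
```

In the real kernel this split is done by indexing into the packed tensor's head dimension rather than materializing three copies.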
### Description
This PR adds a benchmarking script to measure end-to-end performance and saves the results in a CSV file.

### Motivation and Context
With this PR, end-to-end performance can be easily measured for many large language models such as LLaMA-2. The performance numbers for LLaMA-2 are located [here](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/python/models/llama).
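The core of such a script is a timing loop plus a CSV writer. A stdlib-only sketch of the pattern (the function and column names here are illustrative, not the actual script's):

```python
import csv
import time

def benchmark(fn, runs: int = 10) -> float:
    """Run fn repeatedly and return the mean wall-clock latency in seconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)
    return sum(latencies) / len(latencies)

def save_csv(rows, path: str = "benchmark_results.csv") -> None:
    """Persist (model, latency) rows with a header for later comparison."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["model", "avg_latency_s"])
        writer.writerows(rows)

# Stand-in workload; a real run would time model generation end to end.
avg = benchmark(lambda: sum(range(1000)))
save_csv([("toy-model", avg)])
```

A real end-to-end run would time the full prompt-to-last-token generation and typically also record tokens/second and batch/prompt sizes as extra columns.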
### Description
This PR removes early stopping from the end-to-end LLaMA-2 benchmark script.

### Motivation and Context
This allows models to always generate the requested number of new tokens.
Inclusion of #19858 looks good.
The `CreateTensorRTCustomOpDomainList()` function is not thread-safe due to its static variables, `created_custom_op_list` and `custom_op_domain`. This PR ensures synchronization using a mutex. See issue #20089.
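The fix follows the standard pattern of guarding lazily initialized shared state with a mutex. A Python analogue of the idea (names and the placeholder initialization are illustrative, not the C++ code):

```python
import threading

_lock = threading.Lock()
_domain = None  # stands in for the static custom_op_domain

def get_custom_op_domain():
    """Hold the lock across the check-and-create so concurrent callers
    cannot race on the lazily created state (cf. created_custom_op_list
    and custom_op_domain in CreateTensorRTCustomOpDomainList()).
    """
    global _domain
    with _lock:
        if _domain is None:
            _domain = ["trt.custom_op_domain"]  # placeholder for the real init
        return _domain

results = []

def worker():
    results.append(get_custom_op_domain())

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The important detail is that the check and the creation happen under the same lock; checking outside the lock reintroduces the race.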
The cherry-pick process should now be complete. Please @microsoft/onnxruntime-es take a look and help approve, thanks.
Looking at the required CI testing, there has been a DML test failing on the Windows GPU CI pipeline: https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1335887&view=logs&j=4b364901-e881-5aa9-a2fc-f7beb70b2ce1&s=e4766ded-b3c3-5183-8adb-9072bf8f96b3&t=b32ab5b7-b815-50a4-8a25-e0aac0264208&l=44468 Does anyone know whether this is a known issue or an intermittent error? I checked and it has failed a couple of times in the history of the CI runs.
### Description
1. The change in build.py fixes a DML exception (https://dev.azure.com/onnxruntime/onnxruntime/_build?definitionId=10&_a=summary).
2. The change in requirements.txt fixes an exception in the Python packaging pipeline: https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=430433&view=results

Co-authored-by: Yi Zhang <[email protected]>
5ac3b6f
Thanks, added the fix commit.
### Description
Web PRs are not included yet.

### Motivation and Context