Filter data (Weights) reorder optimization #134
Closed
sreekanth-yalachigere wants to merge 58 commits into microsoft:master from sreekanth-yalachigere:master
Conversation
snnn reviewed Dec 10, 2018
jywu-msft reviewed Dec 10, 2018
jywu-msft reviewed Dec 10, 2018
jywu-msft reviewed Dec 10, 2018
jywu-msft reviewed Dec 10, 2018
jywu-msft requested changes Dec 10, 2018
jywu-msft (Member) left a comment
please address review feedback, including thread safety, non-unique key across models...
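To illustrate the non-unique key concern, here is a minimal sketch (hypothetical names, not the PR's actual code): if the reordered-weight cache is keyed only by the tensor name, two models that both contain an initializer with the same name collide once they are loaded through the same provider, so the key has to identify the specific initializer instance.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical cache keys. Keying by name alone is ambiguous once a second model
// containing an initializer with the same name (e.g. "conv1_W") is loaded through
// the same execution provider.
using WeightName = std::string;
std::unordered_map<WeightName, std::vector<float>> reordered_by_name;      // collides across models

// Keying by the initializer's identity (its data pointer stays stable for the
// lifetime of the loaded model) keeps entries from different models separate.
std::unordered_map<const void*, std::vector<float>> reordered_by_identity;
```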
…pdating the type and shape info. (#195)
* add check before fusing sub-graph in greedy partitioning
* update the partitioning logic to 1) not fuse sub-graph if inner nodes were assigned 2) avoid resolving graph after each provider capability checking and assignment.
* resolve conflicts

* define gather_nd op
* add test cases
* add test file
* refactor the code and doc
* add test cases
* fix win compile err
* fix win compile err
* adjust indent
* make constructor explicit
* add comment
* remove templates
* remove wrong def
* migrate macros
* fix an issue in shape inference
Allow using MKLML header/libs when use_mklml is specified
* Fixed out of bounds access in ArrayFeatureExtractor.
* some cleanup
* Updated tensor_shape.h comments.
* Updated macro name.
* Added copy assignment, move assignment/ctor to TensorShape.
* Removed i64 literal suffix.
* Fixed test.
* Fixed type of x_num_dims.

* Minor updates to exception message
* update models folder to new location
* update copy to preservenewest

* More Ort prefix changes for consistency
* Fix C# methods
* More C# fixes
* update onnx
…t NodeArg usage. Allows using an initializer from multiple levels up to not fail. We would need to accumulate a list of initializers from all levels up otherwise, and doing so doesn't add any value. (#200) Improve a comment to clarify when the parent graph NodeArg lookup kicks in.
* Adding the include folder for the C Windows pkg.
* Add import lib to the pkg
* Disable csharp pretrained tests temporarily
… free a re-used output that is used for a dead output (output with zero users). (#214)
- apply any transforms to the main graph and any subgraphs first
- call Graph::Resolve() once on the main graph, which will recurse into the subgraphs
- previously it was called after the transform on each subgraph, which traversed up to the main graph to call Resolve, with that Resolve recursing into all subgraphs every time.
This avoids lots of unnecessary Graph::Resolve calls, and prevents subgraphs from being broken by SessionStateInitializer::InitializeAndSave calling graph_.CleanAllInitializedTensors() prior to the final Graph::Resolve call. If a subgraph had optional inputs, the backing initializers were removed by CleanAllInitializedTensors, causing the next Resolve to incorrectly turn them into required inputs.
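The ordering described above can be summarized with a small self-contained sketch (the types below are stand-ins, not onnxruntime's actual Graph API): transforms are applied to the main graph and every nested subgraph first, and Resolve is then called once on the main graph, which recurses into the subgraphs by itself.

```cpp
#include <memory>
#include <vector>

// Minimal stand-in type for illustration only; not the onnxruntime Graph class.
struct Graph {
  std::vector<std::unique_ptr<Graph>> subgraphs;
  void ApplyTransform() { /* rewrite nodes in this graph */ }
  void Resolve() {
    for (auto& sg : subgraphs) sg->Resolve();
    /* validate inputs/outputs, type and shape info */
  }
};

// Step 1: apply the transform to a graph and all of its nested subgraphs.
void TransformRecursive(Graph& g) {
  g.ApplyTransform();
  for (auto& sg : g.subgraphs) TransformRecursive(*sg);
}

void TransformAndResolve(Graph& main_graph) {
  TransformRecursive(main_graph);  // all transforms first, top-down
  main_graph.Resolve();            // Step 2: a single Resolve; it recurses into subgraphs itself
}
```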
* placeholder for internal contrib ops
* remove useless internal file
* fix build break
This helps identify floating-point accuracy issues
* Initial commit Maxunpool operator
* fix gpu build failure
* remove op test from excluded list
* Change to ORT

* Minor updates to exception message
* update models folder to new location
* update copy to preservenewest
* reenable pretrained test
* added some debugging info for build
* update pretrained test, and tensor proto definition

* refactor the kernel memory type interface
* remove useless change
* fix comments in PR

* More intuitive ordering to the API functions
* Rename TCHAR_T
Contributor (Author)
creating new PR.
TedThemistokleous pushed a commit to TedThemistokleous/onnxruntime that referenced this pull request Jul 9, 2025
The onnxruntime_add_shared_library_module() command places generated DLLs in the lib directory instead of bin on Windows. This PR changes the target to use onnxruntime_add_shared_library() instead. I checked most of the essential EPs, and they all use onnxruntime_add_shared_library() to create execution provider targets. The difference between the module and non-module versions of the command is that the module version does not install .lib files for the targets it creates.
TedThemistokleous added a commit to TedThemistokleous/onnxruntime that referenced this pull request Jul 9, 2025
Filter data is constant across all iterations, but we currently reorder the filter memory layout in every iteration. By saving the reordered filter memory layout in the execution provider, we can avoid 53 data reorders and reduce latency by up to 10 ms.
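A minimal sketch of the idea (hypothetical class and member names, not the PR's actual implementation): the kernel keeps the reordered copy of the constant filter after the first Compute call and reuses it on subsequent runs, with a mutex covering the reviewer's thread-safety concern about concurrent inference calls.

```cpp
#include <mutex>
#include <vector>

// Hypothetical convolution kernel state; not onnxruntime's actual classes.
class ConvKernel {
 public:
  // Called once per inference. The filter tensor is a constant initializer,
  // so its provider-specific layout only needs to be computed once.
  void Compute(const float* filter, size_t filter_size,
               const float* input, float* output) {
    {
      std::lock_guard<std::mutex> lock(reorder_mutex_);  // concurrent Run() calls
      if (reordered_filter_.empty()) {
        reordered_filter_.assign(filter, filter + filter_size);
        ReorderToProviderLayout(reordered_filter_);  // done once, not per iteration
      }
    }
    RunConvolution(reordered_filter_.data(), input, output);
  }

 private:
  static void ReorderToProviderLayout(std::vector<float>&) { /* layout swizzle */ }
  static void RunConvolution(const float*, const float*, float*) { /* convolution */ }

  std::mutex reorder_mutex_;
  std::vector<float> reordered_filter_;  // cached reordered copy of the constant filter
};
```

The cached buffer lives with the kernel (or execution provider) instance, so it is released when the session is destroyed and never shared across models.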