
Conversation

@sreekanth-yalachigere
Contributor

Filter data is constant across all iterations, but we currently reorder the filter memory layout in every iteration. By caching the reordered filter memory layout in the execution provider, we can avoid 53 data reorders and reduce latency by up to 10 ms.

@sreekanth-yalachigere requested a review from a team as a code owner on December 10, 2018 06:03
Member

@jywu-msft left a comment


Please address the review feedback, including thread safety and the non-unique key across models...

sreekanth-yalachigere and others added 23 commits December 17, 2018 12:57
* add check before fusing sub-graph in greedy partitioning

* update the partitioning logic to 1) not fuse a sub-graph if its inner nodes were already assigned, and 2) avoid resolving the graph after each provider capability check and assignment.

* resolve conflicts
* define gather_nd op

* add test cases

* add test file

* refactor the code and doc

* add test cases

* fix win compile err

* fix win compile err

* adjust indent

* make constructor explicit

* add comment

* remove templates

* remove wrong def

* migrate macros

* fix an issue in shape inference
Allow using MKLML header/libs when use_mklml is specified
* Fixed out of bounds access in ArrayFeatureExtractor.

* some cleanup

* Updated tensor_shape.h comments.

* Updated macro name.

* Added copy assignment, move assignment/ctor to TensorShape.

* Removed i64 literal suffix.

* Fixed test.

* Fixed type of x_num_dims.
* Minor updates to exception message

* update models folder to new location

* update copy to preservenewest
* More Ort prefix changes for consistency

* Fix C# methods

* More C# fixes
* update onnx
…t NodeArg usage. Allows an initializer from multiple levels up to be used without failing; otherwise we would need to accumulate a list of initializers from all parent levels, and doing so doesn't add any value. (#200)

Improve a comment to clarify when the parent graph NodeArg lookup kicks in.
* Adding the include folder for the C Windows pkg.

* Add import lib to the pkg

* Disable csharp pretrained tests temporarily
… free a re-used output that is used for a dead output (output with zero users). (#214)
- Apply any transforms to the main graph and any subgraphs first.
- Call Graph::Resolve() once on the main graph, which will recurse into the subgraphs.
  - Previously it was called after transforming each subgraph, which traversed up to the main graph to call Resolve, and that Resolve call recursed into all subgraphs every time.

This avoids many unnecessary Graph::Resolve calls and prevents subgraphs from being broken by SessionStateInitializer::InitializeAndSave calling graph_.CleanAllInitializedTensors() prior to the final Graph::Resolve call. If a subgraph had optional inputs, the backing initializers were removed by CleanAllInitializedTensors, causing the next Resolve to incorrectly turn them into required inputs. A minimal sketch of this ordering follows.
souptc and others added 25 commits December 19, 2018 18:17
* placeholder for internal contrib ops

* remove useless internal file

* fix build break
This helps identify floating-point accuracy issues
* Initial commit of the MaxUnpool operator

* fix gpu build failure

* remove op test from excluded list

* Change to ORT
* Minor updates to exception message

* update models folder to new location

* update copy to preservenewest

* reenable pretrained test

* added some debugging info for build

* update pretrained test, and tensor proto definition
* refactor the kernel memory type interface

* remove useless change

* fix comments in PR
* More intuitive ordering of the API functions

* Rename TCHAR_T
@sreekanth-yalachigere
Contributor Author

Creating a new PR.

TedThemistokleous pushed a commit to TedThemistokleous/onnxruntime that referenced this pull request Jul 9, 2025
The onnxruntime_add_shared_library_module() command places generated DLLs in the lib directory instead of the bin directory on Windows. This PR changes it to use onnxruntime_add_shared_library(). I checked most of the essential EPs, and they all use onnxruntime_add_shared_library() to create execution provider targets. The difference between the module and non-module versions of the command is that the module version does not install .lib files for the targets it creates.
TedThemistokleous added a commit to TedThemistokleous/onnxruntime that referenced this pull request Jul 9, 2025