diff --git a/.gitignore b/.gitignore index a3e9f552dd1c3..27706f2f83844 100644 --- a/.gitignore +++ b/.gitignore @@ -11,6 +11,7 @@ distribute/* *.bin cmake_build .cmake_build +cmake-build-debug gen *~ .vs diff --git a/BUILD.md b/BUILD.md index 08ffc1a4c3421..06d878f922487 100644 --- a/BUILD.md +++ b/BUILD.md @@ -36,14 +36,17 @@ ONNX Runtime python binding only supports Python 3.5, 3.6 and 3.7. cd onnxruntime ``` 2. Install cmake-3.13 or better from https://cmake.org/download/. -3. (optional) Install protobuf 3.6.1 from source code (cmake/external/protobuf). CMake flag protobuf\_BUILD\_SHARED\_LIBS must be turned off. After the installation, you should have the 'protoc' executable in your PATH. -4. (optional) Install onnx from source code (cmake/external/onnx) +3. (optional) Install protobuf 3.6.1 from source code (cmake/external/protobuf). The CMake flag protobuf\_BUILD\_SHARED\_LIBS must be turned OFF on Windows and turned ON on Linux. After the installation, you should have the 'protoc' executable in your PATH. On Linux it is recommended to run `ldconfig` to make sure the protobuf libraries are found. +4. If you installed protobuf in a non-standard location, it is helpful on Linux to set the following environment variable so the ONNX build can find it: +`export CMAKE_ARGS="-DONNX_CUSTOM_PROTOC_EXECUTABLE=full path to protoc"` +On Linux, also run `ldconfig` so the linker can find the protobuf libraries. +5. (optional) Install onnx from source code (cmake/external/onnx) ``` export ONNX_ML=1 python3 setup.py bdist_wheel pip3 install --upgrade dist/*.whl ``` -5. Run `./build.sh --config RelWithDebInfo --build_wheel` for Linux (or `build.bat --config RelWithDebInfo --build_wheel` for Windows) +6. Run `./build.sh --config RelWithDebInfo --build_wheel` for Linux (or `build.bat --config RelWithDebInfo --build_wheel` for Windows). Upon a successful build, you should be able to find the wheel under the `dist` folder. The build script runs all unit tests by default (for native builds) and skips tests by default for cross-compiled builds. @@ -53,6 +56,10 @@ The complete list of build options can be found by running `./build.sh (or ./bui 1. For Windows, just add --x86 argument when launching build.bat 2. For Linux, it must be built out of a x86 os, --x86 argument also needs be specified to build.sh +## Build ONNX Runtime Server on Linux + +1. In the ONNX Runtime root folder, run `./build.sh --config RelWithDebInfo --build_server --use_openmp --parallel` + ## Build/Test Flavors for CI ### CI Build Environments @@ -110,6 +117,9 @@ If you want to build with an earlier version, you must temporarily remove the 'C To build ONNX Runtime with MKL-DNN support, build it with `./build.sh --use_mkldnn` To build ONNX Runtime using MKL-DNN built with dependency on MKL small libraries, build it with `./build.sh --use_mkldnn --use_mklml` +### nGraph +ONNX Runtime with nGraph as an execution provider (released as preview) can be built on Linux as follows: `./build.sh --use_ngraph`. Similarly, on Windows use `.\build.bat --use_ngraph`. + ### TensorRT ONNX Runtime supports the TensorRT execution provider (released as preview). You will need to download and install [CUDA](https://developer.nvidia.com/cuda-toolkit), [CUDNN](https://developer.nvidia.com/cudnn) and [TensorRT](https://developer.nvidia.com/nvidia-tensorrt-download).
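Taken together, the Linux steps added to BUILD.md above amount to roughly the following shell session. This is a minimal sketch under stated assumptions: the protobuf prefix (`/opt/protobuf`) and the wheel location under `build/Linux/RelWithDebInfo/dist` are placeholders chosen for illustration, while the build flags shown are the ones documented above.

```
# Only needed when protobuf was installed to a non-standard prefix (placeholder path).
export CMAKE_ARGS="-DONNX_CUSTOM_PROTOC_EXECUTABLE=/opt/protobuf/bin/protoc"
sudo ldconfig   # refresh the linker cache so the protobuf shared libraries are found

# Build the Python wheel; append --use_ngraph, --use_mkldnn, etc. for other execution providers.
./build.sh --config RelWithDebInfo --build_wheel --parallel

# The wheel is written to a dist folder under the build output directory (assumed layout).
pip3 install --upgrade build/Linux/RelWithDebInfo/dist/*.whl

# Optional: build ONNX Runtime Server with the command documented above.
./build.sh --config RelWithDebInfo --build_server --use_openmp --parallel
```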
diff --git a/README.md b/README.md index f0ca7bbb9a71a..876cab44f49a8 100644 --- a/README.md +++ b/README.md @@ -11,7 +11,7 @@ ONNX is an open format for machine learning (ML) models that is supported by various ML and DNN frameworks and tools. This format makes it easier to interoperate between frameworks and to maximize the reach of your hardware optimization investments. Learn more about ONNX on [https://onnx.ai](https://onnx.ai) or view the [Github Repo](https://github.com/onnx/onnx). # Why use ONNX Runtime -ONNX Runtime has an open architecture that is continually evolving to address the newest developments and challenges in AI and Deep Learning. ONNX Runtime stays up to date with the ONNX standard, supporting all ONNX releases with future compatibliity and maintaining backwards compatibility with prior releases. +ONNX Runtime has an open architecture that is continually evolving to address the newest developments and challenges in AI and Deep Learning. ONNX Runtime stays up to date with the ONNX standard, supporting all ONNX releases with future compatibility and maintaining backwards compatibility with prior releases. ONNX Runtime continuously strives to provide top performance for a broad and growing number of usage scenarios in Machine Learning. Our investments focus on: 1. Run any ONNX model @@ -74,8 +74,8 @@ system. | API Documentation | CPU package | GPU package | |-----|-------------|-------------| | [Python](https://aka.ms/onnxruntime-python) | [Available on Pypi](https://pypi.org/project/onnxruntime)

<br>| [Available on Pypi](https://pypi.org/project/onnxruntime-gpu)<br><br>|
-| [C#](docs/CSharp_API.md) | Available on Nuget : [MLAS+Eigen](https://www.nuget.org/packages/Microsoft.ML.OnnxRuntime/), [MKL-ML](https://www.nuget.org/packages/Microsoft.ML.OnnxRuntime.MKLML/)<br>| [Available on Nuget](https://www.nuget.org/packages/Microsoft.ML.OnnxRuntime.Gpu/)<br>|
-| [C](docs/C_API.md) | Available on Nuget : [MLAS+Eigen](https://www.nuget.org/packages/Microsoft.ML.OnnxRuntime/), [MKL-ML](https://www.nuget.org/packages/Microsoft.ML.OnnxRuntime.MKLML/)<br>[Files (.zip, .tgz)](https://aka.ms/onnxruntime-release)<br>| [Available on Nuget](https://www.nuget.org/packages/Microsoft.ML.OnnxRuntime.Gpu/)<br><br>[Files (.zip, .tgz)](https://aka.ms/onnxruntime-release)<br>|
+| [C#](docs/CSharp_API.md) | **Available on Nuget :**<br>[MLAS+Eigen](https://www.nuget.org/packages/Microsoft.ML.OnnxRuntime/)<br>[MKL-ML](https://www.nuget.org/packages/Microsoft.ML.OnnxRuntime.MKLML/)| [Available on Nuget](https://www.nuget.org/packages/Microsoft.ML.OnnxRuntime.Gpu/)<br>|
+| [C](docs/C_API.md) | **Available on Nuget :**<br>[MLAS+Eigen](https://www.nuget.org/packages/Microsoft.ML.OnnxRuntime/)<br>[MKL-ML](https://www.nuget.org/packages/Microsoft.ML.OnnxRuntime.MKLML/)<br>[Binaries (.zip, .tgz)](https://aka.ms/onnxruntime-release)<br>| [Available on Nuget](https://www.nuget.org/packages/Microsoft.ML.OnnxRuntime.Gpu/)<br><br>[Binaries (.zip, .tgz)](https://aka.ms/onnxruntime-release)<br>
| | [C++](onnxruntime/core/session/inference_session.h) | [Build from source](https://github.com/Microsoft/onnxruntime/blob/master/BUILD.md) | [Build from source](https://github.com/Microsoft/onnxruntime/blob/master/BUILD.md) | For builds using other execution providers, see Build Details below. diff --git a/TensorRT-ExecutionProvider.md b/TensorRT-ExecutionProvider.md deleted file mode 100644 index 37c4c75ff58fa..0000000000000 --- a/TensorRT-ExecutionProvider.md +++ /dev/null @@ -1,24 +0,0 @@ -## TensortRT Execution Provider (preview) - -The TensorRT execution provider in the ONNX Runtime will make use of NVIDIA's [TensortRT](https://developer.nvidia.com/tensorrt) Deep Learning inferencing engine to accelerate ONNX model in their family of GPUs. Microsoft and NVIDIA worked closely to integrate the TensorRT execution provider with ONNX Runtime. - -This execution provider release is currently in preview but, we have validated support for all the ONNX Models in the model zoo. With the TensorRT execution provider, the ONNX Runtime delivers better inferencing performance on the same hardware compared to generic GPU acceleration. - -### Build TensorRT execution provider -Developers can now tap into the power of TensorRT through ONNX Runtime to accelerate inferencing of ONNX models. Instructions to build the TensorRT execution provider from source is available [here](https://github.com/Microsoft/onnxruntime/blob/master/BUILD.md#build). - -### Using the TensorRT execution provider -#### C/C++ -The TensortRT execution provider needs to be registered with ONNX Runtime to enable in the inference session. -``` -InferenceSession session_object{so}; -session_object.RegisterExecutionProvider(std::make_unique<::onnxruntime::TensorrtExecutionProvider>()); -status = session_object.Load(model_file_name); -``` -The C API details are [here](https://github.com/Microsoft/onnxruntime/blob/master/docs/C_API.md#c-api). - -### Python -When using the python wheel from the ONNX Runtime build with TensorRT execution provider, it will be automatically prioritized over the default GPU or CPU execution providers. There is no need to separately register the execution provider. Python APIs details are [here](https://github.com/Microsoft/onnxruntime/blob/master/docs/python/api_summary.rst#api-summary). - -### Using onnxruntime_perf_test -You can test the performance for your ONNX Model with the TensorRT execution provider. Use the flag `-e tensorrt` in [onnxruntime_perf_test](https://github.com/Microsoft/onnxruntime/tree/master/onnxruntime/test/perftest#onnxruntime-performance-test). 
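The removed page above still describes how the TensorRT execution provider is exercised from the command line; a rough sketch follows. The `--use_tensorrt` and `--tensorrt_home` flag names, the TensorRT install path, the build output directory, and the perf-test positional arguments (model path, result file) are assumptions for illustration and may differ by version; check `onnxruntime_perf_test -h` for your build.

```
# Build with the TensorRT execution provider enabled (paths are placeholders).
./build.sh --config RelWithDebInfo --use_tensorrt --tensorrt_home /usr/local/TensorRT --parallel

# Benchmark an ONNX model with the TensorRT provider selected via -e.
cd build/Linux/RelWithDebInfo
./onnxruntime_perf_test -e tensorrt /path/to/squeezenet.onnx perf_result.txt
```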
diff --git a/cgmanifest.json b/cgmanifest.json index f6b5066a82775..3e397eadd5c00 100644 --- a/cgmanifest.json +++ b/cgmanifest.json @@ -49,7 +49,7 @@ "component":{ "type":"git", "git":{ - "commitHash":"c1c04af4e9fa0c96fbc1fda7b330bb994118f3c5", + "commitHash":"7d7bc83d29a328233d3e8affa4c4ea8b3e3599ef", "repositoryUrl":"https://github.com/onnx/onnx.git" } } diff --git a/cmake/CMakeLists.txt b/cmake/CMakeLists.txt index 96c89fefe70e6..ad2211e30f0fe 100644 --- a/cmake/CMakeLists.txt +++ b/cmake/CMakeLists.txt @@ -68,8 +68,10 @@ option(onnxruntime_ENABLE_MICROSOFT_INTERNAL "Use this option to enable/disable option(onnxruntime_USE_NUPHAR "Build with Nupha" OFF) option(onnxruntime_USE_BRAINSLICE "Build with BrainSlice" OFF) option(onnxruntime_USE_TENSORRT "Build with TensorRT support" OFF) -option(onnxruntime_ENABLE_LTO "Enable link time optimization, which is not stable on older GCCs" OFF) +option(onnxruntime_ENABLE_LTO "Enable link time optimization" ON) + option(onnxruntime_CROSS_COMPILING "Cross compiling onnx runtime" OFF) +option(onnxruntime_BUILD_SERVER "Build ONNX Runtime Server" OFF) option(onnxruntime_USE_FULL_PROTOBUF "Use full protobuf" OFF) option(onnxruntime_DISABLE_CONTRIB_OPS "Disable contrib ops" OFF) option(onnxruntime_USE_EIGEN_THREADPOOL "Use eigen threadpool. Otherwise OpenMP or a homemade one will be used" OFF) @@ -81,18 +83,16 @@ set(NSYNC_ENABLE_TESTS OFF CACHE BOOL "Build protobuf tests" FORCE) set(ONNX_ML 1) if(onnxruntime_ENABLE_LTO) - #LTO(or LTCG) is great, in our case it can reduce binary size by 1/3. - #cmake can only help us check if the compiler support LTO or not, it can't tell us if the feature works well - #Don't enable this option in Ubuntu 16.04, protoc will crash - include(CheckIPOSupported) - check_ipo_supported(RESULT ipo_enabled OUTPUT ipo_output) #TODO: figure out why nsync doesn't work - if(NOT onnxruntime_USE_NSYNC) - if(ipo_enabled) - set(CMAKE_INTERPROCEDURAL_OPTIMIZATION_RELEASE ON) - set(CMAKE_INTERPROCEDURAL_OPTIMIZATION_RELWITHDEBINFO ON) - else() - message(WARNING "IPO is not supported: ${ipo_output}") + if(onnxruntime_USE_NSYNC) + message(WARNING "IPO is not supported when nsync is in use") + set(onnxruntime_ENABLE_LTO OFF) + else() + include(CheckIPOSupported) + check_ipo_supported(RESULT ipo_enabled OUTPUT ipo_output) + if(NOT ipo_enabled) + message(WARNING "IPO is not supported by this compiler") + set(onnxruntime_ENABLE_LTO OFF) endif() endif() endif() @@ -243,11 +243,6 @@ endif() add_executable(protobuf::protoc ALIAS protoc) include(protobuf_function.cmake) - -if (onnxruntime_USE_FULL_PROTOBUF) - add_definitions(-DUSE_FULL_PROTOBUF) -endif() - if (onnxruntime_DISABLE_CONTRIB_OPS) add_definitions(-DDISABLE_CONTRIB_OPS) endif() @@ -436,7 +431,6 @@ else() string(APPEND CMAKE_CXX_FLAGS_DEBUG " -Wno-nonnull-compare") string(APPEND CMAKE_C_FLAGS_DEBUG " -Wno-nonnull-compare") endif() - string(APPEND CMAKE_CXX_FLAGS " -Wno-error=sign-compare") if(HAS_PARENTHESES) string(APPEND CMAKE_CXX_FLAGS " -Wno-parentheses") endif() @@ -477,9 +471,6 @@ if (onnxruntime_USE_MKLDNN) endif() if (onnxruntime_USE_NGRAPH) - if (Win32) - message(FATAL_ERROR "nGraph is not currently supported on Windows.") - endif() #if (onnxruntime_USE_OPENMP) # message(FATAL_ERROR "Please set onnxruntime_USE_OPENMP=OFF for nGraph execution provider.") #endif() @@ -607,6 +598,10 @@ if (onnxruntime_BUILD_SHARED_LIB) include(onnxruntime.cmake) endif() +if (onnxruntime_BUILD_SERVER) + include(onnxruntime_server.cmake) +endif() + # some of the tests rely on the shared libs to be # 
built; hence the ordering if (onnxruntime_BUILD_UNIT_TESTS) @@ -633,3 +628,4 @@ if (onnxruntime_BUILD_CSHARP) # set_property(GLOBAL PROPERTY VS_DOTNET_TARGET_FRAMEWORK_VERSION "netstandard2.0") include(onnxruntime_csharp.cmake) endif() + diff --git a/cmake/external/ngraph.cmake b/cmake/external/ngraph.cmake index 50ab13d5cc056..65b7159e34bee 100644 --- a/cmake/external/ngraph.cmake +++ b/cmake/external/ngraph.cmake @@ -6,7 +6,7 @@ include (ExternalProject) set(ngraph_ROOT_DIR ${CMAKE_CURRENT_BINARY_DIR}/external/ngraph) set(ngraph_INSTALL_DIR ${ngraph_ROOT_DIR}) set(ngraph_INCLUDE_DIRS ${ngraph_INSTALL_DIR}/include) -set(ngraph_LIBRARIES ${ngraph_INSTALL_DIR}/lib) +set(ngraph_LIBRARIES ${ngraph_INSTALL_DIR}/${CMAKE_INSTALL_LIBDIR}) set(ngraph_SRC ${CMAKE_CURRENT_BINARY_DIR}/ngraph/src/project_ngraph) set(prebuilt_ONNX_SOURCE_DIR "${PROJECT_SOURCE_DIR}/external/onnx") set(prebuilt_ONNX_BINARY_DIR "${CMAKE_CURRENT_BINARY_DIR}/onnx") @@ -14,44 +14,95 @@ set(ngraph_URL "https://github.com/NervanaSystems/ngraph.git") set(ngraph_TAG "v0.18.1") # Libraries for python package. -set(NGRAPH_SHARED_LIB libngraph.so) -set(NGRAPH_CODEGEN_SHARED_LIB libcodegen.so) -set(NGRAPH_CPU_BACKEND_SHARED_LIB libcpu_backend.so) -set(NGRAPH_IOMP5MD_SHARED_LIB libiomp5.so) -set(NGRAPH_MKLDNN_SHARED_LIB libmkldnn.so) -set(NGRAPH_MKLML_SHARED_LIB libmklml_intel.so) -if("${CMAKE_BUILD_TYPE}" STREQUAL "Debug") - set(NGRAPH_TBB_SHARED_LIB libtbb_debug.so) - set(NGRAPH_TBB_SHARED_LIB_2 libtbb_debug.so.2) +if (WIN32) + set(NGRAPH_SHARED_LIB ngraph.dll) + set(NGRAPH_CPU_BACKEND_SHARED_LIB cpu_backend.dll) + set(NGRAPH_IOMP5MD_SHARED_LIB libiomp5md.dll) + set(NGRAPH_MKLDNN_SHARED_LIB mkldnn.dll) + set(NGRAPH_MKLML_SHARED_LIB mklml.dll) + if("${CMAKE_BUILD_TYPE}" STREQUAL "Debug") + set(NGRAPH_TBB_SHARED_LIB tbb_debug.dll) + else() + set(NGRAPH_TBB_SHARED_LIB tbb.dll) + endif() else() - set(NGRAPH_TBB_SHARED_LIB libtbb.so) - set(NGRAPH_TBB_SHARED_LIB_2 libtbb.so.2) + set(NGRAPH_SHARED_LIB libngraph.so) + set(NGRAPH_CODEGEN_SHARED_LIB libcodegen.so) + set(NGRAPH_CPU_BACKEND_SHARED_LIB libcpu_backend.so) + set(NGRAPH_IOMP5MD_SHARED_LIB libiomp5.so) + set(NGRAPH_MKLDNN_SHARED_LIB libmkldnn.so) + set(NGRAPH_MKLML_SHARED_LIB libmklml_intel.so) + if("${CMAKE_BUILD_TYPE}" STREQUAL "Debug") + set(NGRAPH_TBB_SHARED_LIB libtbb_debug.so) + set(NGRAPH_TBB_SHARED_LIB_2 libtbb_debug.so.2) + else() + set(NGRAPH_TBB_SHARED_LIB libtbb.so) + set(NGRAPH_TBB_SHARED_LIB_2 libtbb.so.2) + endif() endif() -ExternalProject_Add(project_ngraph - PREFIX ngraph - GIT_REPOSITORY ${ngraph_URL} - GIT_TAG ${ngraph_TAG} - # Here we use onnx and protobuf built by onnxruntime to avoid linking with incompatible libraries. This might change in future. - PATCH_COMMAND ${CMAKE_COMMAND} -E copy ${PROJECT_SOURCE_DIR}/patches/ngraph/ngraph_onnx.cmake ${ngraph_SRC}/cmake/external_onnx.cmake - # TODO: Use cmake.file+copy as above. 
- COMMAND patch -p1 < ${PROJECT_SOURCE_DIR}/patches/ngraph/ngraph_protobuf.patch - CMAKE_ARGS - -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE} - -DNGRAPH_USE_PREBUILT_LLVM=TRUE - -DNGRAPH_USE_SYSTEM_PROTOBUF=FALSE - -DNGRAPH_ONNX_IMPORT_ENABLE=TRUE - -DNGRAPH_INTERPRETER_ENABLE=FALSE - -DNGRAPH_ONNXIFI_ENABLE=FALSE - -DNGRAPH_UNIT_TEST_ENABLE=FALSE - -DNGRAPH_TOOLS_ENABLE=FALSE - -DCMAKE_INSTALL_PREFIX=${ngraph_INSTALL_DIR} - -Dprebuilt_ONNX_BINARY_DIR=${prebuilt_ONNX_BINARY_DIR} - -Dprebuilt_ONNX_SOURCE_DIR=${prebuilt_ONNX_SOURCE_DIR} - DEPENDS onnx - ) - -add_library(ngraph SHARED IMPORTED) -set_property(TARGET ngraph PROPERTY IMPORTED_LOCATION ${ngraph_LIBRARIES}/${NGRAPH_SHARED_LIB}) +# discard prior changes due to unblock incremental builds. +set(NGRAPH_PATCH_DISCARD_COMMAND cd ${ngraph_SRC} && git checkout -- .) + +if (MSVC) + set(prebuilt_ONNX_BINARY_DIR "${CMAKE_CURRENT_BINARY_DIR}/onnx/${CMAKE_BUILD_TYPE}") + set(prebuilt_ONNX_SOURCE_DIR "${CMAKE_CURRENT_BINARY_DIR}") + + + # For the moment, Windows does not support codegen, it works on DEX-only mode + ExternalProject_Add(project_ngraph + PREFIX ngraph + GIT_REPOSITORY ${ngraph_URL} + GIT_TAG ${ngraph_TAG} + PATCH_COMMAND ${NGRAPH_PATCH_DISCARD_COMMAND} + COMMAND ${CMAKE_COMMAND} -E copy ${PROJECT_SOURCE_DIR}/patches/ngraph/ngraph_onnx.cmake ${ngraph_SRC}/cmake/external_onnx.cmake + COMMAND git apply --ignore-space-change --ignore-whitespace ${PROJECT_SOURCE_DIR}/patches/ngraph/ngraph_protobuf.patch + COMMAND git apply --ignore-space-change --ignore-whitespace ${PROJECT_SOURCE_DIR}/patches/ngraph/ngraph_fix_install_error.patch + COMMAND git apply --ignore-space-change --ignore-whitespace ${PROJECT_SOURCE_DIR}/patches/ngraph/ngraph_fix_library_path.patch + CMAKE_ARGS + -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE} + -DNGRAPH_DEX_ONLY=ON + -DNGRAPH_USE_SYSTEM_PROTOBUF=FALSE + -DNGRAPH_ONNX_IMPORT_ENABLE=TRUE + -DNGRAPH_INTERPRETER_ENABLE=FALSE + -DNGRAPH_ONNXIFI_ENABLE=FALSE + -DNGRAPH_UNIT_TEST_ENABLE=FALSE + -DNGRAPH_TOOLS_ENABLE=FALSE + -DCMAKE_INSTALL_PREFIX=${ngraph_INSTALL_DIR} + -Dprebuilt_ONNX_BINARY_DIR=${prebuilt_ONNX_BINARY_DIR} + -Dprebuilt_ONNX_SOURCE_DIR=${prebuilt_ONNX_SOURCE_DIR} + DEPENDS onnx + ) + add_library(ngraph STATIC IMPORTED) + set_property(TARGET ngraph PROPERTY IMPORTED_LOCATION ${ngraph_LIBRARIES}/ngraph.lib) +else() + ExternalProject_Add(project_ngraph + PREFIX ngraph + GIT_REPOSITORY ${ngraph_URL} + GIT_TAG ${ngraph_TAG} + GIT_SHALLOW TRUE + PATCH_COMMAND ${NGRAPH_PATCH_DISCARD_COMMAND} + # Here we use onnx and protobuf built by onnxruntime to avoid linking with incompatible libraries. This might change in future. + COMMAND ${CMAKE_COMMAND} -E copy ${PROJECT_SOURCE_DIR}/patches/ngraph/ngraph_onnx.cmake ${ngraph_SRC}/cmake/external_onnx.cmake + # TODO: Use cmake.file+copy as above. 
+ COMMAND git apply --ignore-space-change --ignore-whitespace ${PROJECT_SOURCE_DIR}/patches/ngraph/ngraph_protobuf.patch + CMAKE_ARGS + -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE} + -DNGRAPH_USE_PREBUILT_LLVM=TRUE + -DNGRAPH_USE_SYSTEM_PROTOBUF=FALSE + -DNGRAPH_ONNX_IMPORT_ENABLE=TRUE + -DNGRAPH_INTERPRETER_ENABLE=FALSE + -DNGRAPH_ONNXIFI_ENABLE=FALSE + -DNGRAPH_UNIT_TEST_ENABLE=FALSE + -DNGRAPH_TOOLS_ENABLE=FALSE + -DCMAKE_INSTALL_PREFIX=${ngraph_INSTALL_DIR} + -Dprebuilt_ONNX_BINARY_DIR=${prebuilt_ONNX_BINARY_DIR} + -Dprebuilt_ONNX_SOURCE_DIR=${prebuilt_ONNX_SOURCE_DIR} + DEPENDS onnx + ) + + add_library(ngraph SHARED IMPORTED) + set_property(TARGET ngraph PROPERTY IMPORTED_LOCATION ${ngraph_LIBRARIES}/${NGRAPH_SHARED_LIB}) +endif() add_dependencies(ngraph project_ngraph) include_directories(${ngraph_INCLUDE_DIRS}) diff --git a/cmake/external/onnx b/cmake/external/onnx index c1c04af4e9fa0..7d7bc83d29a32 160000 --- a/cmake/external/onnx +++ b/cmake/external/onnx @@ -1 +1 @@ -Subproject commit c1c04af4e9fa0c96fbc1fda7b330bb994118f3c5 +Subproject commit 7d7bc83d29a328233d3e8affa4c4ea8b3e3599ef diff --git a/cmake/onnxruntime.cmake b/cmake/onnxruntime.cmake index 952732b99d900..ecf833a546073 100644 --- a/cmake/onnxruntime.cmake +++ b/cmake/onnxruntime.cmake @@ -74,7 +74,10 @@ target_link_libraries(onnxruntime PRIVATE set_property(TARGET onnxruntime APPEND_STRING PROPERTY LINK_FLAGS ${ONNXRUNTIME_SO_LINK_FLAG}) set_target_properties(onnxruntime PROPERTIES LINK_DEPENDS ${SYMBOL_FILE}) - +if(onnxruntime_ENABLE_LTO) + set_target_properties(onnxruntime PROPERTIES INTERPROCEDURAL_OPTIMIZATION_RELEASE TRUE) + set_target_properties(onnxruntime PROPERTIES INTERPROCEDURAL_OPTIMIZATION_RELWITHDEBINFO TRUE) +endif() install(TARGETS onnxruntime ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR} LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR} diff --git a/cmake/onnxruntime_mlas.cmake b/cmake/onnxruntime_mlas.cmake index 1e19c11b30264..a9c5ffec37ef7 100644 --- a/cmake/onnxruntime_mlas.cmake +++ b/cmake/onnxruntime_mlas.cmake @@ -10,6 +10,7 @@ set(mlas_common_srcs ${ONNXRUNTIME_ROOT}/core/mlas/lib/activate.cpp ${ONNXRUNTIME_ROOT}/core/mlas/lib/logistic.cpp ${ONNXRUNTIME_ROOT}/core/mlas/lib/tanh.cpp + ${ONNXRUNTIME_ROOT}/core/mlas/lib/erf.cpp ) if(MSVC) @@ -65,6 +66,7 @@ if(MSVC) ${ONNXRUNTIME_ROOT}/core/mlas/lib/amd64/cvtfp16a.asm ${ONNXRUNTIME_ROOT}/core/mlas/lib/amd64/LogisticKernelFma3.asm ${ONNXRUNTIME_ROOT}/core/mlas/lib/amd64/TanhKernelFma3.asm + ${ONNXRUNTIME_ROOT}/core/mlas/lib/amd64/ErfKernelFma3.asm ) endif() @@ -157,6 +159,7 @@ else() ${ONNXRUNTIME_ROOT}/core/mlas/lib/x86_64/SgemmKernelFma3.S ${ONNXRUNTIME_ROOT}/core/mlas/lib/x86_64/LogisticKernelFma3.S ${ONNXRUNTIME_ROOT}/core/mlas/lib/x86_64/TanhKernelFma3.S + ${ONNXRUNTIME_ROOT}/core/mlas/lib/x86_64/ErfKernelFma3.S ) set_source_files_properties(${mlas_platform_srcs_avx2} PROPERTIES COMPILE_FLAGS "-mavx2 -mfma") diff --git a/cmake/onnxruntime_providers.cmake b/cmake/onnxruntime_providers.cmake index 15ad19948ee4f..6c17c6b12eb4f 100644 --- a/cmake/onnxruntime_providers.cmake +++ b/cmake/onnxruntime_providers.cmake @@ -68,6 +68,8 @@ if (onnxruntime_USE_CUDA) if (UNIX) target_compile_options(onnxruntime_providers_cuda PRIVATE "$<$:SHELL:-Xcompiler -Wno-reorder>" "$<$>:-Wno-reorder>") + target_compile_options(onnxruntime_providers_cuda PRIVATE "$<$:SHELL:-Xcompiler -Wno-error=sign-compare>" + "$<$>:-Wno-error=sign-compare>") endif() onnxruntime_add_include_to_target(onnxruntime_providers_cuda onnxruntime_common onnxruntime_framework gsl onnx onnx_proto 
protobuf::libprotobuf) add_dependencies(onnxruntime_providers_cuda ${onnxruntime_EXTERNAL_DEPENDENCIES} ${onnxruntime_tvm_dependencies}) @@ -195,9 +197,10 @@ if (onnxruntime_USE_NGRAPH) target_include_directories(onnxruntime_providers_ngraph PRIVATE ${ONNXRUNTIME_ROOT} ${ngraph_INCLUDE_DIRS}) set_target_properties(onnxruntime_providers_ngraph PROPERTIES LINKER_LANGUAGE CXX) - target_compile_options(onnxruntime_providers_ngraph PRIVATE "SHELL:-Wformat" "SHELL:-Wformat-security" "SHELL:-fstack-protector-strong" "SHELL:-D_FORTIFY_SOURCE=2") - target_link_options(onnxruntime_providers_ngraph PRIVATE "LINKER:-z, noexecstack " "LINKER:-z relro" "LINKER:-z now" "LINKER:-pie") - + if (NOT MSVC) + target_compile_options(onnxruntime_providers_ngraph PRIVATE "SHELL:-Wformat" "SHELL:-Wformat-security" "SHELL:-fstack-protector-strong" "SHELL:-D_FORTIFY_SOURCE=2") + target_link_options(onnxruntime_providers_ngraph PRIVATE "LINKER:-z, noexecstack " "LINKER:-z relro" "LINKER:-z now" "LINKER:-pie") + endif() endif() if (onnxruntime_ENABLE_MICROSOFT_INTERNAL) diff --git a/cmake/onnxruntime_python.cmake b/cmake/onnxruntime_python.cmake index 6f2d325090a9a..24a2e973bcd5b 100644 --- a/cmake/onnxruntime_python.cmake +++ b/cmake/onnxruntime_python.cmake @@ -109,7 +109,10 @@ endif() set_target_properties(onnxruntime_pybind11_state PROPERTIES PREFIX "") set_target_properties(onnxruntime_pybind11_state PROPERTIES FOLDER "ONNXRuntime") - +if(onnxruntime_ENABLE_LTO) + set_target_properties(onnxruntime_pybind11_state PROPERTIES INTERPROCEDURAL_OPTIMIZATION_RELEASE TRUE) + set_target_properties(onnxruntime_pybind11_state PROPERTIES INTERPROCEDURAL_OPTIMIZATION_RELWITHDEBINFO TRUE) +endif() if (MSVC) set_target_properties(onnxruntime_pybind11_state PROPERTIES SUFFIX ".pyd") else() diff --git a/cmake/onnxruntime_unittests.cmake b/cmake/onnxruntime_unittests.cmake index c3e8083d006e2..53f9524b23d48 100644 --- a/cmake/onnxruntime_unittests.cmake +++ b/cmake/onnxruntime_unittests.cmake @@ -49,6 +49,8 @@ function(AddTest) target_compile_options(${_UT_TARGET} PRIVATE ${disabled_warnings}) else() target_compile_options(${_UT_TARGET} PRIVATE ${DISABLED_WARNINGS_FOR_TVM}) + target_compile_options(${_UT_TARGET} PRIVATE "$<$:SHELL:-Xcompiler -Wno-error=sign-compare>" + "$<$>:-Wno-error=sign-compare>") endif() set(TEST_ARGS) @@ -163,13 +165,15 @@ set(onnxruntime_test_framework_libs onnxruntime_mlas ) +set(onnxruntime_test_server_libs + onnxruntime_test_utils_for_framework + onnxruntime_test_utils_for_server +) if(WIN32) list(APPEND onnxruntime_test_framework_libs Advapi32) endif() - - set (onnxruntime_test_providers_dependencies ${onnxruntime_EXTERNAL_DEPENDENCIES}) if(onnxruntime_USE_CUDA) @@ -236,6 +240,9 @@ file(GLOB onnxruntime_test_framework_src CONFIGURE_DEPENDS #with auto initialize onnxruntime add_library(onnxruntime_test_utils_for_framework ${onnxruntime_test_utils_src}) onnxruntime_add_include_to_target(onnxruntime_test_utils_for_framework onnxruntime_framework gtest gsl onnx onnx_proto) +if (onnxruntime_USE_FULL_PROTOBUF) + target_compile_definitions(onnxruntime_test_utils_for_framework PRIVATE USE_FULL_PROTOBUF=1) +endif() if (onnxruntime_USE_MKLDNN) target_compile_definitions(onnxruntime_test_utils_for_framework PUBLIC USE_MKLDNN=1) endif() @@ -248,6 +255,9 @@ set_target_properties(onnxruntime_test_utils_for_framework PROPERTIES FOLDER "ON #without auto initialize onnxruntime add_library(onnxruntime_test_utils ${onnxruntime_test_utils_src}) onnxruntime_add_include_to_target(onnxruntime_test_utils onnxruntime_framework 
gtest gsl onnx onnx_proto) +if (onnxruntime_USE_FULL_PROTOBUF) + target_compile_definitions(onnxruntime_test_utils PRIVATE USE_FULL_PROTOBUF=1) +endif() if (onnxruntime_USE_MKLDNN) target_compile_definitions(onnxruntime_test_utils PUBLIC USE_MKLDNN=1) endif() @@ -368,7 +378,15 @@ if(WIN32) ${MKLML_LIB_DIR}/${MKLML_SHARED_LIB} ${MKLML_LIB_DIR}/${IOMP5MD_SHARED_LIB} $ ) - endif() + endif() + if (onnxruntime_USE_NGRAPH) + add_custom_command( + TARGET ${test_data_target} POST_BUILD + COMMAND ${CMAKE_COMMAND} -E copy_directory + ${ngraph_LIBRARIES}/ + $ + ) + endif() endif() add_library(onnx_test_data_proto ${TEST_SRC_DIR}/proto/tml.proto) @@ -403,6 +421,7 @@ set(onnx_test_runner_common_srcs ${onnx_test_runner_src_dir}/runner.cc ${onnx_test_runner_src_dir}/TestCase.cc ${onnx_test_runner_src_dir}/TestCase.h + ${onnx_test_runner_src_dir}/onnxruntime_event.h ${onnx_test_runner_src_dir}/sync_api.h ${onnx_test_runner_src_dir}/sync_api.cc) @@ -413,8 +432,6 @@ if(WIN32) set_target_properties(win_getopt_wide PROPERTIES FOLDER "ONNXRuntimeTest") set(onnx_test_runner_common_srcs ${onnx_test_runner_common_srcs}) set(GETOPT_LIB_WIDE win_getopt_wide) -else() - set(onnx_test_runner_common_srcs ${onnx_test_runner_common_srcs} ${onnx_test_runner_src_dir}/onnxruntime_event.h ${onnx_test_runner_src_dir}/simple_thread_pool.h) endif() add_library(onnx_test_runner_common ${onnx_test_runner_common_srcs}) @@ -557,6 +574,64 @@ if (onnxruntime_BUILD_SHARED_LIB) endif() endif() +if (onnxruntime_BUILD_SERVER) + file(GLOB onnxruntime_test_server_src + "${TEST_SRC_DIR}/server/unit_tests/*.cc" + "${TEST_SRC_DIR}/server/unit_tests/*.h" + ) + + file(GLOB onnxruntime_integration_test_server_src + "${TEST_SRC_DIR}/server/integration_tests/*.py" + ) + if(NOT WIN32) + if(HAS_UNUSED_PARAMETER) + set_source_files_properties("${TEST_SRC_DIR}/server/unit_tests/json_handling_tests.cc" PROPERTIES COMPILE_FLAGS -Wno-unused-parameter) + set_source_files_properties("${TEST_SRC_DIR}/server/unit_tests/converter_tests.cc" PROPERTIES COMPILE_FLAGS -Wno-unused-parameter) + set_source_files_properties("${TEST_SRC_DIR}/server/unit_tests/util_tests.cc" PROPERTIES COMPILE_FLAGS -Wno-unused-parameter) + set_source_files_properties("${TEST_SRC_DIR}/server/unit_tests/executor_test.cc" PROPERTIES COMPILE_FLAGS -Wno-unused-parameter) + endif() + endif() + + add_library(onnxruntime_test_utils_for_server ${onnxruntime_test_server_src}) + onnxruntime_add_include_to_target(onnxruntime_test_utils_for_server onnxruntime_test_utils_for_framework gtest gmock gsl onnx onnx_proto server_proto) + add_dependencies(onnxruntime_test_utils_for_server onnxruntime_server_lib onnxruntime_server_http_core_lib Boost ${onnxruntime_EXTERNAL_DEPENDENCIES}) + target_include_directories(onnxruntime_test_utils_for_server PUBLIC ${Boost_INCLUDE_DIR} ${REPO_ROOT}/cmake/external/re2 ${CMAKE_CURRENT_BINARY_DIR}/onnx ${ONNXRUNTIME_ROOT}/server/http ${ONNXRUNTIME_ROOT}/server/http/core PRIVATE ${ONNXRUNTIME_ROOT} ) + if(UNIX) + target_compile_options(onnxruntime_test_utils_for_server PRIVATE "$<$:SHELL:-Xcompiler -Wno-error=sign-compare>" + "$<$>:-Wno-error=sign-compare>") + endif() + target_link_libraries(onnxruntime_test_utils_for_server ${Boost_LIBRARIES}) + + + AddTest( + TARGET onnxruntime_server_tests + SOURCES ${onnxruntime_test_server_src} + LIBS ${onnxruntime_test_server_libs} server_proto onnxruntime_server_lib ${onnxruntime_test_providers_libs} + DEPENDS ${onnxruntime_EXTERNAL_DEPENDENCIES} + ) + + onnxruntime_protobuf_generate( + APPEND_PATH IMPORT_DIRS 
${REPO_ROOT}/cmake/external/protobuf/src ${ONNXRUNTIME_ROOT}/server/protobuf ${ONNXRUNTIME_ROOT}/core/protobuf + PROTOS ${ONNXRUNTIME_ROOT}/server/protobuf/predict.proto ${ONNXRUNTIME_ROOT}/server/protobuf/onnx-ml.proto + LANGUAGE python + TARGET onnxruntime_server_tests + OUT_VAR server_test_py) + + add_custom_command( + TARGET onnxruntime_server_tests POST_BUILD + COMMAND ${CMAKE_COMMAND} -E make_directory ${CMAKE_CURRENT_BINARY_DIR}/server_test + COMMAND ${CMAKE_COMMAND} -E copy + ${onnxruntime_integration_test_server_src} + ${CMAKE_CURRENT_BINARY_DIR}/server_test/ + COMMAND ${CMAKE_COMMAND} -E copy + ${CMAKE_CURRENT_BINARY_DIR}/onnx_ml_pb2.py + ${CMAKE_CURRENT_BINARY_DIR}/server_test/ + COMMAND ${CMAKE_COMMAND} -E copy + ${CMAKE_CURRENT_BINARY_DIR}/predict_pb2.py + ${CMAKE_CURRENT_BINARY_DIR}/server_test/ + ) + +endif() add_executable(onnxruntime_mlas_test ${TEST_SRC_DIR}/mlas/unittest.cpp) target_include_directories(onnxruntime_mlas_test PRIVATE ${ONNXRUNTIME_ROOT}/core/mlas/inc) diff --git a/cmake/patches/ngraph/ngraph_onnx.cmake b/cmake/patches/ngraph/ngraph_onnx.cmake index b27cc16019879..6a1eba8b3bd98 100644 --- a/cmake/patches/ngraph/ngraph_onnx.cmake +++ b/cmake/patches/ngraph/ngraph_onnx.cmake @@ -3,8 +3,13 @@ set(ONNX_INCLUDE_DIR ${BINARY_DIR}) set(ONNX_SOURCE_INCLUDE_DIR "${prebuilt_ONNX_SOURCE_DIR}/onnx") include_directories("${ONNX_SOURCE_INCLUDE_DIR}") set(ONNX_PROTO_INCLUDE_DIR ${ONNX_INCLUDE_DIR}) -set(ONNX_LIBRARY ${BINARY_DIR}/libonnx.a) -set(ONNX_PROTO_LIBRARY ${BINARY_DIR}/libonnx_proto.a) +if (WIN32) + set(ONNX_LIBRARY ${BINARY_DIR}/onnx.lib) + set(ONNX_PROTO_LIBRARY ${BINARY_DIR}/onnx_proto.lib) +else() + set(ONNX_LIBRARY ${BINARY_DIR}/libonnx.a) + set(ONNX_PROTO_LIBRARY ${BINARY_DIR}/libonnx_proto.a) +endif() set(ONNX_LIBRARIES ${ONNX_LIBRARY} ${ONNX_PROTO_LIBRARY}) if (NOT TARGET onnx::libonnx) diff --git a/cmake/patches/ngraph/ngraph_protobuf.patch b/cmake/patches/ngraph/ngraph_protobuf.patch index 0736ef5aea518..ac824a4203ae2 100644 --- a/cmake/patches/ngraph/ngraph_protobuf.patch +++ b/cmake/patches/ngraph/ngraph_protobuf.patch @@ -1,13 +1,14 @@ diff --git a/cmake/external_protobuf.cmake b/cmake/external_protobuf.cmake -index 47977b3..1a66e1c 100644 +index 32217f5d..f6de5e76 100644 --- a/cmake/external_protobuf.cmake +++ b/cmake/external_protobuf.cmake @@ -23,7 +23,7 @@ include(ExternalProject) - + # This version of PROTOBUF is required by Microsoft ONNX Runtime. set(NGRAPH_PROTOBUF_GIT_REPO_URL "https://github.com/protocolbuffers/protobuf") -set(NGRAPH_PROTOBUF_GIT_TAG "v3.5.2") +set(NGRAPH_PROTOBUF_GIT_TAG "v3.6.1") + + if (WIN32) + ExternalProject_Add( - ExternalProject_Add( - ext_protobuf diff --git a/csharp/sample/Microsoft.ML.OnnxRuntime.InferenceSample/Microsoft.ML.OnnxRuntime.InferenceSample.csproj b/csharp/sample/Microsoft.ML.OnnxRuntime.InferenceSample/Microsoft.ML.OnnxRuntime.InferenceSample.csproj index 8a43842d729c2..009f9b520035b 100644 --- a/csharp/sample/Microsoft.ML.OnnxRuntime.InferenceSample/Microsoft.ML.OnnxRuntime.InferenceSample.csproj +++ b/csharp/sample/Microsoft.ML.OnnxRuntime.InferenceSample/Microsoft.ML.OnnxRuntime.InferenceSample.csproj @@ -3,7 +3,7 @@ Exe netcoreapp2.0 - AnyCPU + AnyCPU;x86 ..\.. 
$(OnnxRuntimeCsharpRoot)\..\build\Windows $(OnnxRuntimeBuildDirectory)\$(Configuration)\$(Configuration) diff --git a/csharp/src/Microsoft.ML.OnnxRuntime/DisposableNamedOnnxValue.cs b/csharp/src/Microsoft.ML.OnnxRuntime/DisposableNamedOnnxValue.cs index 187a44efcd187..ad69a4da746fb 100644 --- a/csharp/src/Microsoft.ML.OnnxRuntime/DisposableNamedOnnxValue.cs +++ b/csharp/src/Microsoft.ML.OnnxRuntime/DisposableNamedOnnxValue.cs @@ -76,7 +76,7 @@ internal static DisposableNamedOnnxValue CreateTensorFromOnnxValue(string name, TensorElementType elemType = TensorElementType.DataTypeMax; try { - NativeApiStatus.VerifySuccess(NativeMethods.OrtGetTensorShapeAndType(nativeOnnxValue, out typeAndShape)); + NativeApiStatus.VerifySuccess(NativeMethods.OrtGetTensorTypeAndShape(nativeOnnxValue, out typeAndShape)); elemType = NativeMethods.OrtGetTensorElementType(typeAndShape); } finally @@ -164,7 +164,7 @@ internal static DisposableNamedOnnxValue CreateFromOnnxValue(string name, IntPtr TensorElementType elemType = TensorElementType.DataTypeMax; NativeApiStatus.VerifySuccess(NativeMethods.OrtGetValue(nativeOnnxValue, 0, allocator, out nativeOnnxValueMapKeys)); NativeApiStatus.VerifySuccess(NativeMethods.OrtGetValue(nativeOnnxValue, 1, allocator, out nativeOnnxValueMapValues)); - NativeApiStatus.VerifySuccess(NativeMethods.OrtGetTensorShapeAndType(nativeOnnxValueMapKeys, out typeAndShape)); + NativeApiStatus.VerifySuccess(NativeMethods.OrtGetTensorTypeAndShape(nativeOnnxValueMapKeys, out typeAndShape)); elemType = NativeMethods.OrtGetTensorElementType(typeAndShape); if (typeAndShape != IntPtr.Zero) diff --git a/csharp/src/Microsoft.ML.OnnxRuntime/InferenceSession.cs b/csharp/src/Microsoft.ML.OnnxRuntime/InferenceSession.cs index fd5133534be78..847e23f399e75 100644 --- a/csharp/src/Microsoft.ML.OnnxRuntime/InferenceSession.cs +++ b/csharp/src/Microsoft.ML.OnnxRuntime/InferenceSession.cs @@ -15,7 +15,7 @@ namespace Microsoft.ML.OnnxRuntime /// /// Represents an Inference Session on an ONNX Model /// - public class InferenceSession: IDisposable + public class InferenceSession : IDisposable { protected IntPtr _nativeHandle; protected Dictionary _inputMetadata, _outputMetadata; @@ -54,24 +54,25 @@ public InferenceSession(string modelPath, SessionOptions options) _outputMetadata = new Dictionary(); // get input count - ulong inputCount = 0; + UIntPtr inputCount = UIntPtr.Zero; NativeApiStatus.VerifySuccess(NativeMethods.OrtSessionGetInputCount(_nativeHandle, out inputCount)); // get all the output names - for (ulong i = 0; i < inputCount; i++) + for (ulong i = 0; i < (ulong)inputCount; i++) { - _inputMetadata[GetInputName(i)] = GetInputMetadata(i); + var iname = GetInputName(i); + _inputMetadata[iname] = GetInputMetadata(i); } - // get output count - ulong outputCount = 0; + UIntPtr outputCount = UIntPtr.Zero; NativeApiStatus.VerifySuccess(NativeMethods.OrtSessionGetOutputCount(_nativeHandle, out outputCount)); // get all the output names - for (ulong i = 0; i < outputCount; i++) + for (ulong i = 0; i < (ulong)outputCount; i++) { _outputMetadata[GetOutputName(i)] = GetOutputMetadata(i); } + } catch (OnnxRuntimeException e) { @@ -88,7 +89,7 @@ public IReadOnlyDictionary InputMetadata { get { - return _inputMetadata; + return _inputMetadata; } } @@ -96,7 +97,7 @@ public IReadOnlyDictionary OutputMetadata { get { - return _outputMetadata; + return _outputMetadata; } } @@ -157,9 +158,9 @@ internal IDisposableReadOnlyCollection Run(IReadOnlyCo // Passing null uses the default run options in the C-api inputNames, 
inputTensors, - (ulong)(inputTensors.Length), /* TODO: size_t, make it portable for x86 arm */ + (UIntPtr)(inputTensors.Length), outputNamesArray, - (ulong)outputNames.Count, /* TODO: size_t, make it portable for x86 and arm */ + (UIntPtr)outputNames.Count, outputValueArray /* An array of output value pointers. Array must be allocated by the caller */ ); @@ -195,7 +196,7 @@ internal IDisposableReadOnlyCollection Run(IReadOnlyCo pinnedBufferHandles[i].Dispose(); } } - + } //TODO: kept internal until implemented @@ -215,9 +216,9 @@ private string GetOutputName(ulong index) IntPtr nameHandle = IntPtr.Zero; string str = null; - IntPtr status = NativeMethods.OrtSessionGetOutputName( + IntPtr status = NativeMethods.OrtSessionGetOutputName( _nativeHandle, - index, + (UIntPtr)index, NativeMemoryAllocator.DefaultInstance.Handle, out nameHandle); try @@ -225,7 +226,7 @@ private string GetOutputName(ulong index) NativeApiStatus.VerifySuccess(status); str = Marshal.PtrToStringAnsi(nameHandle); //assumes charset = ANSI } - finally + finally { if (nameHandle != IntPtr.Zero) { @@ -243,11 +244,12 @@ private string GetInputName(ulong index) IntPtr status = NativeMethods.OrtSessionGetInputName( _nativeHandle, - index, + (UIntPtr)index, NativeMemoryAllocator.DefaultInstance.Handle, out nameHandle); try { + NativeApiStatus.VerifySuccess(status); str = Marshal.PtrToStringAnsi(nameHandle); //assumes charset = ANSI } @@ -258,7 +260,6 @@ private string GetInputName(ulong index) NativeMemoryAllocator.DefaultInstance.FreeMemory(nameHandle); } } - return str; } @@ -268,7 +269,7 @@ private NodeMetadata GetInputMetadata(ulong index) IntPtr typeInfo = IntPtr.Zero; try { - NativeApiStatus.VerifySuccess(NativeMethods.OrtSessionGetInputTypeInfo(_nativeHandle, index, out typeInfo)); + NativeApiStatus.VerifySuccess(NativeMethods.OrtSessionGetInputTypeInfo(_nativeHandle, (UIntPtr)index, out typeInfo)); return GetMetadataFromTypeInfo(typeInfo); } finally @@ -285,7 +286,7 @@ private NodeMetadata GetOutputMetadata(ulong index) IntPtr typeInfo = IntPtr.Zero; try { - NativeApiStatus.VerifySuccess(NativeMethods.OrtSessionGetOutputTypeInfo(_nativeHandle, index, out typeInfo)); + NativeApiStatus.VerifySuccess(NativeMethods.OrtSessionGetOutputTypeInfo(_nativeHandle, (UIntPtr)index, out typeInfo)); return GetMetadataFromTypeInfo(typeInfo); } finally @@ -305,7 +306,7 @@ internal static NodeMetadata GetMetadataFromTypeInfo(IntPtr typeInfo) return new NodeMetadata(valueType, new int[] { }, typeof(NamedOnnxValue)); } - IntPtr tensorInfo = NativeMethods.OrtCastTypeInfoToTensorInfo(typeInfo); + IntPtr tensorInfo = NativeMethods.OrtCastTypeInfoToTensorInfo(typeInfo); //(IntPtr)(int)(uint) // Convert the newly introduced OrtTypeInfo* to the older OrtTypeAndShapeInfo* if (tensorInfo == IntPtr.Zero) @@ -314,12 +315,12 @@ internal static NodeMetadata GetMetadataFromTypeInfo(IntPtr typeInfo) TensorElementType type = NativeMethods.OrtGetTensorElementType(tensorInfo); Type dotnetType = null; int width = 0; - TensorElementTypeConverter.GetTypeAndWidth(type, out dotnetType, out width); - ulong numDimensions = NativeMethods.OrtGetNumOfDimensions(tensorInfo); + TensorElementTypeConverter.GetTypeAndWidth(type, out dotnetType, out width); + var numDimensions = NativeMethods.OrtGetDimensionsCount(tensorInfo); long[] dimensions = new long[(int)numDimensions]; NativeMethods.OrtGetDimensions(tensorInfo, dimensions, numDimensions); int[] intDimensions = new int[(int)numDimensions]; - for (ulong i = 0; i < numDimensions; i++) + for (var i = 0; i < 
(long)numDimensions; i++) { intDimensions[i] = (int)dimensions[i]; } @@ -410,7 +411,7 @@ public bool IsTensor } - internal class ModelMetadata + internal class ModelMetadata { //TODO: placeholder for Model metadata. Currently C-API does not expose this. } diff --git a/csharp/src/Microsoft.ML.OnnxRuntime/Microsoft.ML.OnnxRuntime.csproj b/csharp/src/Microsoft.ML.OnnxRuntime/Microsoft.ML.OnnxRuntime.csproj index 16464d29fd123..f8c87adb996fc 100644 --- a/csharp/src/Microsoft.ML.OnnxRuntime/Microsoft.ML.OnnxRuntime.csproj +++ b/csharp/src/Microsoft.ML.OnnxRuntime/Microsoft.ML.OnnxRuntime.csproj @@ -2,7 +2,7 @@ netstandard1.1 - AnyCPU + AnyCPU;x86 true true false diff --git a/csharp/src/Microsoft.ML.OnnxRuntime/NamedOnnxValue.cs b/csharp/src/Microsoft.ML.OnnxRuntime/NamedOnnxValue.cs index 07c00943fde74..dfbd5e4899577 100644 --- a/csharp/src/Microsoft.ML.OnnxRuntime/NamedOnnxValue.cs +++ b/csharp/src/Microsoft.ML.OnnxRuntime/NamedOnnxValue.cs @@ -183,18 +183,18 @@ out nativeElementType Debug.Assert(dataBufferPointer != IntPtr.Zero, "dataBufferPointer must be non-null after obtaining the pinned buffer"); // copy to an ulong[] shape to match size_t[] - ulong[] longShape = new ulong[rank]; + long[] longShape = new long[rank]; for (int i = 0; i < rank; i++) { - longShape[i] = (ulong)shape[i]; + longShape[i] = shape[i]; } IntPtr status = NativeMethods.OrtCreateTensorWithDataAsOrtValue( NativeMemoryAllocatorInfo.DefaultInstance.Handle, dataBufferPointer, - (ulong)(dataBufferLength), + (UIntPtr)(dataBufferLength), longShape, - (ulong)rank, + (UIntPtr)rank, nativeElementType, out onnxValue ); diff --git a/csharp/src/Microsoft.ML.OnnxRuntime/NativeMethods.cs b/csharp/src/Microsoft.ML.OnnxRuntime/NativeMethods.cs index 68da5be40677d..a7ee28c96c808 100644 --- a/csharp/src/Microsoft.ML.OnnxRuntime/NativeMethods.cs +++ b/csharp/src/Microsoft.ML.OnnxRuntime/NativeMethods.cs @@ -57,9 +57,9 @@ internal static class NativeMethods IntPtr /*(OrtSessionRunOptions*)*/ runOptions, // can be null to use the default options string[] inputNames, IntPtr[] /* (OrtValue*[])*/ inputValues, - ulong inputCount, /* TODO: size_t, make it portable for x86 arm */ + UIntPtr inputCount, string[] outputNames, - ulong outputCount, /* TODO: size_t, make it portable for x86 and arm */ + UIntPtr outputCount, [MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 5 /*index of outputCount*/)][In, Out] IntPtr[] outputValues /* An array of output value pointers. Array must be allocated by the caller */ @@ -69,25 +69,25 @@ IntPtr[] outputValues /* An array of output value pointers. 
Array must be alloca [DllImport(nativeLib, CharSet = charSet)] public static extern IntPtr /*(OrtStatus*)*/ OrtSessionGetInputCount( IntPtr /*(OrtSession*)*/ session, - out ulong /* TODO: size_t */ count); + out UIntPtr count); [DllImport(nativeLib, CharSet = charSet)] public static extern IntPtr /*(OrtStatus*)*/ OrtSessionGetOutputCount( IntPtr /*(OrtSession*)*/ session, - out ulong /*TODO: size_t port*/ count); + out UIntPtr count); [DllImport(nativeLib, CharSet = charSet)] public static extern IntPtr /*(OrtStatus*)*/OrtSessionGetInputName( IntPtr /*(OrtSession*)*/ session, - ulong index, //TODO: port size_t + UIntPtr index, IntPtr /*(OrtAllocator*)*/ allocator, out IntPtr /*(char**)*/name); [DllImport(nativeLib, CharSet = charSet)] public static extern IntPtr /*(OrtStatus*)*/OrtSessionGetOutputName( IntPtr /*(OrtSession*)*/ session, - ulong index, //TODO: port size_t + UIntPtr index, IntPtr /*(OrtAllocator*)*/ allocator, out IntPtr /*(char**)*/name); @@ -95,14 +95,14 @@ IntPtr[] outputValues /* An array of output value pointers. Array must be alloca [DllImport(nativeLib, CharSet = charSet)] public static extern IntPtr /*(OrtStatus*)*/OrtSessionGetInputTypeInfo( IntPtr /*(const OrtSession*)*/ session, - ulong index, //TODO: port for size_t + UIntPtr index, out IntPtr /*(struct OrtTypeInfo**)*/ typeInfo); // release the typeinfo using OrtReleaseTypeInfo [DllImport(nativeLib, CharSet = charSet)] public static extern IntPtr /*(OrtStatus*)*/OrtSessionGetOutputTypeInfo( IntPtr /*(const OrtSession*)*/ session, - ulong index, //TODO: port for size_t + UIntPtr index, out IntPtr /* (struct OrtTypeInfo**)*/ typeInfo); [DllImport(nativeLib, CharSet = charSet)] @@ -265,9 +265,9 @@ public enum MemoryType public static extern IntPtr /* OrtStatus */ OrtCreateTensorWithDataAsOrtValue( IntPtr /* (const OrtAllocatorInfo*) */ allocatorInfo, IntPtr /* (void*) */dataBufferHandle, - ulong dataLength, //size_t, TODO: make it portable for x86, arm - ulong[] shape, //size_t* or size_t[], TODO: make it portable for x86, arm - ulong shapeLength, //size_t, TODO: make it portable for x86, arm + UIntPtr dataLength, + long[] shape, + UIntPtr shapeLength, TensorElementType type, out IntPtr /* OrtValue** */ outputValue); @@ -280,19 +280,20 @@ public enum MemoryType public static extern IntPtr /*(OrtStatus*)*/ OrtGetStringTensorContent( IntPtr /*(OrtValue*)*/ value, IntPtr /*(void*)*/ dst_buffer, - ulong dst_buffer_len, //size_t, TODO: make it portable for x86, arm + UIntPtr dst_buffer_len, IntPtr offsets, - ulong offsets_len); //size_t, TODO: make it portable for x86, arm + UIntPtr offsets_len); [DllImport(nativeLib, CharSet = charSet)] - public static extern IntPtr /*(OrtStatus*)*/ OrtGetStringTensorDataLength(IntPtr /*(OrtValue*)*/ value, out ulong /*(size_t*)*/ len); + public static extern IntPtr /*(OrtStatus*)*/ OrtGetStringTensorDataLength(IntPtr /*(OrtValue*)*/ value, + out UIntPtr /*(size_t*)*/ len); [DllImport(nativeLib, CharSet = charSet)] public static extern IntPtr /*(const struct OrtTensorTypeAndShapeInfo*)*/ OrtCastTypeInfoToTensorInfo(IntPtr /*(struct OrtTypeInfo*)*/ typeInfo); [DllImport(nativeLib, CharSet = charSet)] - public static extern IntPtr /*(OrtStatus*)*/ OrtGetTensorShapeAndType(IntPtr /*(OrtValue*)*/ value, out IntPtr /*(struct OrtTensorTypeAndShapeInfo*)*/ typeAndShapeInfo); + public static extern IntPtr /*(OrtStatus*)*/ OrtGetTensorTypeAndShape(IntPtr /*(OrtValue*)*/ value, out IntPtr /*(struct OrtTensorTypeAndShapeInfo*)*/ typeAndShapeInfo); [DllImport(nativeLib, CharSet = charSet)] @@ 
-302,13 +303,13 @@ public enum MemoryType public static extern TensorElementType OrtGetTensorElementType(IntPtr /*(const struct OrtTensorTypeAndShapeInfo*)*/ typeAndShapeInfo); [DllImport(nativeLib, CharSet = charSet)] - public static extern ulong /*TODO: port for size_t */OrtGetNumOfDimensions(IntPtr /*(const struct OrtTensorTypeAndShapeInfo*)*/ typeAndShapeInfo); + public static extern UIntPtr OrtGetDimensionsCount(IntPtr /*(const struct OrtTensorTypeAndShapeInfo*)*/ typeAndShapeInfo); [DllImport(nativeLib, CharSet = charSet)] public static extern void OrtGetDimensions( IntPtr /*(const struct OrtTensorTypeAndShapeInfo*)*/ typeAndShapeInfo, long[] dim_values, - ulong dim_values_length); + UIntPtr dim_values_length); /** * How many elements does this tensor have. diff --git a/csharp/src/Microsoft.ML.OnnxRuntime/NativeOnnxTensorMemory.cs b/csharp/src/Microsoft.ML.OnnxRuntime/NativeOnnxTensorMemory.cs index 10dc1b4701740..96b54ded178db 100644 --- a/csharp/src/Microsoft.ML.OnnxRuntime/NativeOnnxTensorMemory.cs +++ b/csharp/src/Microsoft.ML.OnnxRuntime/NativeOnnxTensorMemory.cs @@ -32,7 +32,7 @@ public NativeOnnxTensorMemory(IntPtr onnxValueHandle, bool isStringTensor = fals int width = 0; _onnxValueHandle = onnxValueHandle; - NativeApiStatus.VerifySuccess(NativeMethods.OrtGetTensorShapeAndType(onnxValueHandle, out typeAndShape)); + NativeApiStatus.VerifySuccess(NativeMethods.OrtGetTensorTypeAndShape(onnxValueHandle, out typeAndShape)); TensorElementType elemType = NativeMethods.OrtGetTensorElementType(typeAndShape); TensorElementTypeConverter.GetTypeAndWidth(elemType, out type, out width); @@ -41,7 +41,7 @@ public NativeOnnxTensorMemory(IntPtr onnxValueHandle, bool isStringTensor = fals _elementWidth = width; - ulong dimension = NativeMethods.OrtGetNumOfDimensions(typeAndShape); + var dimension = NativeMethods.OrtGetDimensionsCount(typeAndShape).ToUInt64(); long count = NativeMethods.OrtGetTensorShapeElementCount(typeAndShape); // count can be negative. if (count < 0) { @@ -49,7 +49,7 @@ public NativeOnnxTensorMemory(IntPtr onnxValueHandle, bool isStringTensor = fals } long[] shape = new long[dimension]; - NativeMethods.OrtGetDimensions(typeAndShape, shape, dimension); //Note: shape must be alive during the call + NativeMethods.OrtGetDimensions(typeAndShape, shape, (UIntPtr) dimension); //Note: shape must be alive during the call _elementCount = (int)count; _dimensions = new int[dimension]; @@ -66,15 +66,15 @@ public NativeOnnxTensorMemory(IntPtr onnxValueHandle, bool isStringTensor = fals { if (typeof(T) != typeof(byte)) throw new NotSupportedException(nameof(NativeOnnxTensorMemory) + " T = " + nameof(T) + ". 
Should = byte, when isStringTensor is true"); - ulong strLen; - var offsets = new ulong[_elementCount]; + UIntPtr strLen; + var offsets = new UIntPtr[_elementCount]; NativeApiStatus.VerifySuccess(NativeMethods.OrtGetStringTensorDataLength(_onnxValueHandle, out strLen)); - var dataBuffer = new byte[strLen]; + var dataBuffer = new byte[strLen.ToUInt64()]; var dataBufferMemory = new Memory(dataBuffer); var dataBufferHandle = dataBufferMemory.Pin(); IntPtr dataBufferPointer = IntPtr.Zero; - var offsetMemory = new Memory(offsets); + var offsetMemory = new Memory(offsets); var offsetMemoryHandle = offsetMemory.Pin(); IntPtr offsetBufferPointer = IntPtr.Zero; unsafe @@ -82,15 +82,15 @@ public NativeOnnxTensorMemory(IntPtr onnxValueHandle, bool isStringTensor = fals dataBufferPointer = (IntPtr)dataBufferHandle.Pointer; offsetBufferPointer = (IntPtr)offsetMemoryHandle.Pointer; } - NativeApiStatus.VerifySuccess(NativeMethods.OrtGetStringTensorContent(_onnxValueHandle, dataBufferPointer, strLen, offsetBufferPointer, Convert.ToUInt64(_elementCount))); + NativeApiStatus.VerifySuccess(NativeMethods.OrtGetStringTensorContent(_onnxValueHandle, dataBufferPointer, strLen, offsetBufferPointer, (UIntPtr)_elementCount)); _dataBufferPointer = dataBufferPointer; _dataBufferAsString = new string[_elementCount]; for (var i = 0; i < offsets.Length; i++) { var length = (i == offsets.Length - 1) - ? strLen - offsets[i] - : offsets[i + 1] - offsets[i]; + ? strLen.ToUInt64() - offsets[i].ToUInt64() + : offsets[i + 1].ToUInt64() - offsets[i].ToUInt64(); // Onnx specifies strings always in UTF-8, no trailing null, no leading BOM _dataBufferAsString[i] = Encoding.UTF8.GetString(dataBuffer, (int)offsets[i], (int)length); } diff --git a/csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests.Capi/C_Api_Sample.cpp b/csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests.Capi/C_Api_Sample.cpp index 92295326b7351..4f69cecf49350 100644 --- a/csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests.Capi/C_Api_Sample.cpp +++ b/csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests.Capi/C_Api_Sample.cpp @@ -44,7 +44,12 @@ int main(int argc, char* argv[]) { // using squeezenet version 1.3 // URL = https://github.com/onnx/models/tree/master/squeezenet OrtSession* session; +#ifdef _WIN32 const wchar_t* model_path = L"squeezenet.onnx"; +#else + const char* model_path = "squeezenet.onnx"; +#endif + CHECK_STATUS(OrtCreateSession(env, model_path, session_options, &session)); //************************************************************************* @@ -52,7 +57,7 @@ int main(int argc, char* argv[]) { size_t num_input_nodes; OrtStatus* status; OrtAllocator* allocator; - OrtCreateDefaultAllocator(&allocator); + CHECK_STATUS(OrtCreateDefaultAllocator(&allocator)); // print number of model input nodes status = OrtSessionGetInputCount(session, &num_input_nodes); @@ -78,7 +83,7 @@ int main(int argc, char* argv[]) { printf("Input %zu : type=%d\n", i, type); // print input shapes/dims - size_t num_dims = OrtGetNumOfDimensions(tensor_info); + size_t num_dims = OrtGetDimensionsCount(tensor_info); printf("Input %zu : num_dims=%zu\n", i, num_dims); input_node_dims.resize(num_dims); OrtGetDimensions(tensor_info, (int64_t*)input_node_dims.data(), num_dims); @@ -132,7 +137,7 @@ int main(int argc, char* argv[]) { // Get pointer to output tensor float values float* floatarr; - OrtGetTensorMutableData(output_tensor, (void**)&floatarr); + CHECK_STATUS(OrtGetTensorMutableData(output_tensor, (void**)&floatarr)); assert(abs(floatarr[0] - 0.000045) < 1e-6); // score the model, 
and print scores for first 5 classes diff --git a/csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests.Capi/Microsoft.ML.OnnxRuntime.EndToEndTests.RunCapi.vcxproj b/csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests.Capi/Microsoft.ML.OnnxRuntime.EndToEndTests.RunCapi.vcxproj index 6beff3fbdffa0..a250d3c3cb5d5 100644 --- a/csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests.Capi/Microsoft.ML.OnnxRuntime.EndToEndTests.RunCapi.vcxproj +++ b/csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests.Capi/Microsoft.ML.OnnxRuntime.EndToEndTests.RunCapi.vcxproj @@ -15,6 +15,14 @@ Release x64 + + + Debug + x86 + + + Release + x86 @@ -24,13 +32,13 @@ MicrosoftMLOnnxRuntimeEndToEndTestsRunCapi - + Application true v141 Unicode - + Application false v141 @@ -48,6 +56,12 @@ + + + + + + true @@ -55,7 +69,13 @@ false - + + true + + + false + + NotUsing Level3 @@ -69,7 +89,7 @@ true - + NotUsing Level3 diff --git a/csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests.Capi/runtest.bat b/csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests.Capi/runtest.bat index a3f933b13dd96..a570cabe686c7 100644 --- a/csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests.Capi/runtest.bat +++ b/csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests.Capi/runtest.bat @@ -1,52 +1,56 @@ REM Copyright (c) Microsoft Corporation. All rights reserved. REM Licensed under the MIT License. -echo on +ECHO on -set LocalNuGetRepo=%1 -setlocal enableextensions disabledelayedexpansion +SET LocalNuGetRepo=%1 +SET TargetArch=x64 +IF NOT "%2"=="" (SET TargetArch=%2) + +SETLOCAL enableextensions disabledelayedexpansion REM WorkingDirectory is Build.SourcesDirectory\csharp -set /p MajorVersionNumber=<..\VERSION_NUMBER -set VersionSuffix= +SET /p MajorVersionNumber=<..\VERSION_NUMBER +SET VersionSuffix= IF NOT DEFINED IsReleaseBuild ( FOR /F "tokens=* USEBACKQ" %%F IN (`git rev-parse --short HEAD`) DO ( - set VersionSuffix=-dev-%%F + SET VersionSuffix=-dev-%%F ) ) -set CurrentOnnxRuntimeVersion=%MajorVersionNumber%%VersionSuffix% -@echo %CurrentOnnxRuntimeVersion% +SET CurrentOnnxRuntimeVersion=%MajorVersionNumber%%VersionSuffix% +SET CurrentOnnxRuntimeVersion=0.4.0-dev-f39a8d1f +@ECHO %CurrentOnnxRuntimeVersion% -pushd test\Microsoft.ML.OnnxRuntime.EndToEndTests.Capi +PUSHD test\Microsoft.ML.OnnxRuntime.EndToEndTests.Capi -REM Set up VS envvars +REM SET up VS envvars REM call "C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Auxiliary\Build\vcvars64.bat" REM Generate packages.config with version -echo off -set "token=CurrentOnnxRuntimeVersion" -set "replace=%CurrentOnnxRuntimeVersion%" -set "templateFile=packages.conf" +ECHO off +SET "token=CurrentOnnxRuntimeVersion" +SET "replace=%CurrentOnnxRuntimeVersion%" +SET "templateFile=packages.conf" for /f "delims=" %%i in ('type "%templateFile%" ^& break ^> "packages.config" ') do ( - set "line=%%i" - setlocal enabledelayedexpansion - >>"packages.config" echo(!line:%token%=%replace%! - endlocal + SET "line=%%i" + SETLOCAL enabledelayedexpansion + >>"packages.config" ECHO(!line:%token%=%replace%! + ENDLOCAL ) -echo on +ECHO on REM Update project file with package name (e.g. Microsoft.ML.OnnxRuntime.Gpu) IF "%PackageName%"=="" goto :skip -@echo off +@ECHO off SETLOCAL EnableExtensions DisableDelayedExpansion SET "search="Microsoft.ML.OnnxRuntime"" SET "replace="%PackageName%"" SET "projfile="packages.config"" FOR /f "delims=" %%i in ('type "packages.config" ^& break ^> "packages.config" ') do ( - set "line=%%i" - setlocal enabledelayedexpansion - >>"packages.config" echo(!line:%search%=%replace%! 
- endlocal + SET "line=%%i" + SETLOCAL enabledelayedexpansion + >>"packages.config" ECHO(!line:%search%=%replace%! + ENDLOCAL ) :skip @@ -54,32 +58,38 @@ FOR /f "delims=" %%i in ('type "packages.config" ^& break ^> "packages.config" ' REM Restore NuGet Packages nuget restore -PackagesDirectory ..\packages -Source %LocalNuGetRepo% Microsoft.ML.OnnxRuntime.EndToEndTests.RunCapi.vcxproj if NOT %ERRORLEVEL% EQU 0 ( - echo "Error:Nuget restore failed" - popd + ECHO "Error:Nuget restore failed" + POPD EXIT /B 1 ) + +IF "%TargetArch%"=="x86" ( + SET OutputDir="Debug" +) ELSE ( + SET OutputDir="x64\Debug" +) + REM Build Native project -msbuild Microsoft.ML.OnnxRuntime.EndToEndTests.RunCapi.vcxproj +msbuild /p:Platform=%TargetArch% Microsoft.ML.OnnxRuntime.EndToEndTests.RunCapi.vcxproj if NOT %ERRORLEVEL% EQU 0 ( - echo "Error:MSBuild failed to compile project" - popd + ECHO "Error:MSBuild failed to compile project" + POPD EXIT /B 1 ) - REM Run Unit Tests -pushd x64\Debug +PUSHD %OutputDir% REM vstest.console.exe /platform:x64 Microsoft.ML.OnnxRuntime.EndToEndTests.Capi.dll .\Microsoft.ML.OnnxRuntime.EndToEndTests.RunCapi.exe if NOT %ERRORLEVEL% EQU 0 ( - echo "Unit test failure: %ERRORLEVEL%" - popd - popd + ECHO "Unit test failure: %ERRORLEVEL%" + POPD + POPD EXIT /B 1 ) -popd -popd +POPD +POPD EXIT /B 0 diff --git a/csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests/runtest-docker.sh b/csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests/runtest-docker.sh index 9454ffbeaf103..f7bb480674a6b 100755 --- a/csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests/runtest-docker.sh +++ b/csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests/runtest-docker.sh @@ -7,17 +7,24 @@ #TODO: Get this working, not tested yet set -x - - SOURCE_ROOT=$1 BUILD_DIR=$2 NUGET_REPO_DIRNAME=$3 # path relative to BUILD_DIR -IMAGE="ubuntu16.04" +Arch=${4:-x64} # x32, x64 +PackageName=${PackageName:-Microsoft.ML.OnnxRuntime} +RunTestCsharp=${RunTestCsharp:-true} +RunTestNative=${RunTestNative:-true} PYTHON_VER=3.5 +IMAGE="ubuntu16.04_$Arch" + OldDir=$(pwd) -cd $SOURCE_ROOT/tools/ci_build/github/linux/docker -docker build -t "onnxruntime-$IMAGE" --build-arg OS_VERSION=16.04 --build-arg PYTHON_VERSION=${PYTHON_VER} -f Dockerfile.ubuntu . +cd $SOURCE_ROOT/tools/ci_build/github/linux/docker +if [ $Arch = "x86" ]; then + docker build -t "onnxruntime-$IMAGE" --build-arg OS_VERSION=16.04 --build-arg PYTHON_VERSION=${PYTHON_VER} -f Dockerfile.ubuntu_x86 . +else + docker build -t "onnxruntime-$IMAGE" --build-arg OS_VERSION=16.04 --build-arg PYTHON_VERSION=${PYTHON_VER} -f Dockerfile.ubuntu . +fi docker rm -f "onnxruntime-cpu" || true @@ -33,6 +40,8 @@ docker run -h $HOSTNAME \ -e "IsReleaseBuild=$IsReleaseBuild" \ -e "PackageName=$PackageName" \ -e "DisableContribOps=$DisableContribOps" \ + -e "RunTestCsharp=$RunTestCsharp" \ + -e "RunTestNative=$RunTestNative" \ "onnxruntime-$IMAGE" \ /bin/bash /onnxruntime_src/csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests/runtest.sh \ /home/onnxruntimedev/$NUGET_REPO_DIRNAME /onnxruntime_src /home/onnxruntimedev $TestDataUrl $TestDataChecksum & @@ -43,6 +52,4 @@ EXIT_CODE=$? set -e exit $EXIT_CODE - - cd $OldDir diff --git a/csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests/runtest.bat b/csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests/runtest.bat index 72b4e78fdcff0..154d44f32bd2b 100755 --- a/csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests/runtest.bat +++ b/csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests/runtest.bat @@ -1,31 +1,44 @@ REM Copyright (c) Microsoft Corporation. 
All rights reserved. REM Licensed under the MIT License. -@echo off +@ECHO ON SETLOCAL EnableDelayedExpansion -set LocalNuGetRepo=%1 -IF "%2"=="" (SET TargetFramework=netcoreapp2.1) ELSE (SET TargetFramework=%2) +SET TargetFramework=netcoreapp2.1 +SET TargetArch=x64 +SET dn="C:\Program Files\dotnet\dotnet" + +SET LocalNuGetRepo=%1 +IF NOT "%2"=="" (SET TargetFramework=%2) +IF NOT "%3"=="" (SET TargetArch=%3) + +IF "%TargetArch%"=="x86" ( + SET dn="C:\Program Files (x86)\dotnet\dotnet" + SET RuntimeIdentifier=win-x86 + SET PlatformTarget=x86 +) + ECHO Target Framework is %TargetFramework% REM WorkingDirectory is Build.SourcesDirectory\csharp -set /p MajorVersionNumber=<..\VERSION_NUMBER -set VersionSuffix= +SET /p MajorVersionNumber=<..\VERSION_NUMBER +SET VersionSuffix= IF NOT DEFINED IsReleaseBuild ( FOR /F "tokens=* USEBACKQ" %%F IN (`git rev-parse --short HEAD`) DO ( set VersionSuffix=-dev-%%F ) ) -set CurrentOnnxRuntimeVersion=%MajorVersionNumber%%VersionSuffix% +SET CurrentOnnxRuntimeVersion=%MajorVersionNumber%%VersionSuffix% + @echo %CurrentOnnxRuntimeVersion% -dotnet restore test\Microsoft.ML.OnnxRuntime.EndToEndTests\Microsoft.ML.OnnxRuntime.EndToEndTests.csproj -s %LocalNuGetRepo% --configfile .\Nuget.CSharp.config -if NOT errorlevel 0 ( +%dn% restore test\Microsoft.ML.OnnxRuntime.EndToEndTests\Microsoft.ML.OnnxRuntime.EndToEndTests.csproj -s %LocalNuGetRepo% --configfile .\Nuget.CSharp.config +IF NOT errorlevel 0 ( @echo "Failed to restore nuget packages for the test project" - Exit 1 + EXIT 1 ) -dotnet test test\Microsoft.ML.OnnxRuntime.EndToEndTests\Microsoft.ML.OnnxRuntime.EndToEndTests.csproj --no-restore -if NOT errorlevel 0 ( +%dn% test test\Microsoft.ML.OnnxRuntime.EndToEndTests\Microsoft.ML.OnnxRuntime.EndToEndTests.csproj --no-restore +IF NOT errorlevel 0 ( @echo "Failed to build or execute the end-to-end test" - Exit 1 + EXIT 1 ) diff --git a/csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests/runtest.sh b/csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests/runtest.sh index a05a9871bdc78..62001ae297f5a 100755 --- a/csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests/runtest.sh +++ b/csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests/runtest.sh @@ -28,16 +28,44 @@ fi export CurrentOnnxRuntimeVersion=$MajorVersion$VersionSuffix echo "Current NuGet package version is $CurrentOnnxRuntimeVersion" -dotnet restore $SourceRoot/csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests/Microsoft.ML.OnnxRuntime.EndToEndTests.csproj -s $LocalNuGetRepo -s https://api.nuget.org/v3/index.json -if [ $? -ne 0 ]; then +if [ $RunTestCsharp = "true" ]; then + # Run C# tests + dotnet restore $SourceRoot/csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests/Microsoft.ML.OnnxRuntime.EndToEndTests.csproj -s $LocalNuGetRepo -s https://api.nuget.org/v3/index.json + if [ $? -ne 0 ]; then echo "Failed to restore nuget packages for the test project" exit 1 -fi + fi -dotnet test $SourceRoot/csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests/Microsoft.ML.OnnxRuntime.EndToEndTests.csproj --no-restore --verbosity detailed -if [ $? -ne 0 ]; then + dotnet test $SourceRoot/csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests/Microsoft.ML.OnnxRuntime.EndToEndTests.csproj --no-restore --verbosity detailed + if [ $? -ne 0 ]; then echo "Failed to build or execute the end-to-end test" exit 1 + fi +fi + +if [ $RunTestNative = "true" ]; then + # Run Native shared object test + # PackageName is passed in environment (e.g. 
Microsoft.ML.OnnxRuntime) + PackageName="$PackageName.$CurrentOnnxRuntimeVersion.nupkg" + cd $LocalNuGetRepo + TempDir=_tmp + mkdir -p $TempDir && pushd $TempDir + unzip ../$PackageName + libs="-L runtimes/linux-x86/native -L runtimes/linux-x64/native -l onnxruntime" + inc="-I build/native/include" + g++ -std=c++14 $SourceRoot/csharp/test/Microsoft.ML.OnnxRuntime.EndToEndTests.Capi/C_Api_Sample.cpp $inc $libs -Wunused-result -o sampletest + + # Create link to versioned shared object required at runtime + libname=`ldd sampletest | grep onnxruntime | xargs | cut -d" " -f1` + ln -sf runtimes/linux-x64/native/libonnxruntime.so $libname + + # Copy Sample Model + cp $SourceRoot/csharp/testdata/squeezenet.onnx . + + # Run the sample model + ./sampletest + popd + rm -rf $TempDir fi +cd $OldDir -cd $OldDir \ No newline at end of file diff --git a/csharp/test/Microsoft.ML.OnnxRuntime.Tests/InferenceTest.cs b/csharp/test/Microsoft.ML.OnnxRuntime.Tests/InferenceTest.cs index c57a72deaccb7..f0d4362b0a7d7 100644 --- a/csharp/test/Microsoft.ML.OnnxRuntime.Tests/InferenceTest.cs +++ b/csharp/test/Microsoft.ML.OnnxRuntime.Tests/InferenceTest.cs @@ -586,11 +586,20 @@ private void TestModelSequenceOfMapIntFloat() var seq = outNode2.AsEnumerable(); // try-cast first element in sequence to map/dictionary type - var map = seq.First().AsDictionary(); - //verify values are valid - Assert.Equal(0.25938290, map[0], 6); - Assert.Equal(0.40904793, map[1], 6); - Assert.Equal(0.33156919, map[2], 6); + if (System.Environment.Is64BitProcess) + { + var map = seq.First().AsDictionary(); + Assert.Equal(0.25938290, map[0], 6); + Assert.Equal(0.40904793, map[1], 6); + Assert.Equal(0.33156919, map[2], 6); + } + else // 32-bit + { + var map = seq.First().AsDictionary(); + Assert.Equal(0.25938290, map[0], 6); + Assert.Equal(0.40904793, map[1], 6); + Assert.Equal(0.33156919, map[2], 6); + } } } } @@ -687,7 +696,7 @@ private void VerifyNativeMethodsExist() "OrtSessionOptionsAppendExecutionProvider_CPU","OrtCreateAllocatorInfo","OrtCreateCpuAllocatorInfo", "OrtCreateDefaultAllocator","OrtAllocatorFree","OrtAllocatorGetInfo", "OrtCreateTensorWithDataAsOrtValue","OrtGetTensorMutableData", "OrtReleaseAllocatorInfo", - "OrtCastTypeInfoToTensorInfo","OrtGetTensorShapeAndType","OrtGetTensorElementType","OrtGetNumOfDimensions", + "OrtCastTypeInfoToTensorInfo","OrtGetTensorTypeAndShape","OrtGetTensorElementType","OrtGetDimensionsCount", "OrtGetDimensions","OrtGetTensorShapeElementCount","OrtReleaseValue"}; var hModule = LoadLibrary(module); diff --git a/csharp/test/Microsoft.ML.OnnxRuntime.Tests/Microsoft.ML.OnnxRuntime.Tests.csproj b/csharp/test/Microsoft.ML.OnnxRuntime.Tests/Microsoft.ML.OnnxRuntime.Tests.csproj index ec8404b499773..0f307ae7680f2 100644 --- a/csharp/test/Microsoft.ML.OnnxRuntime.Tests/Microsoft.ML.OnnxRuntime.Tests.csproj +++ b/csharp/test/Microsoft.ML.OnnxRuntime.Tests/Microsoft.ML.OnnxRuntime.Tests.csproj @@ -4,7 +4,7 @@ netcoreapp2.0 false ..\.. 
- AnyCPU + AnyCPU;x86 bin\$(Configuration)\ $(OnnxRuntimeCsharpRoot)\..\build\Windows $(OnnxRuntimeBuildDirectory)\$(Configuration)\external\protobuf\cmake\$(Configuration) diff --git a/csharp/tools/Microsoft.ML.OnnxRuntime.PerfTool/Microsoft.ML.OnnxRuntime.PerfTool.csproj b/csharp/tools/Microsoft.ML.OnnxRuntime.PerfTool/Microsoft.ML.OnnxRuntime.PerfTool.csproj index 07ebb0e2a5b08..6f15a4d49c59e 100644 --- a/csharp/tools/Microsoft.ML.OnnxRuntime.PerfTool/Microsoft.ML.OnnxRuntime.PerfTool.csproj +++ b/csharp/tools/Microsoft.ML.OnnxRuntime.PerfTool/Microsoft.ML.OnnxRuntime.PerfTool.csproj @@ -2,7 +2,7 @@ Exe - AnyCPU + AnyCPU;x86 netcoreapp2.0 ..\.. $(OnnxRuntimeCsharpRoot)\..\build\Windows diff --git a/dockerfiles/Dockerfile.arm32v7 b/dockerfiles/Dockerfile.arm32v7 index c272261285d9b..f1f45dfce5301 100644 --- a/dockerfiles/Dockerfile.arm32v7 +++ b/dockerfiles/Dockerfile.arm32v7 @@ -23,10 +23,10 @@ RUN pip3 install numpy # Build the latest cmake WORKDIR /code -RUN wget https://cmake.org/files/v3.12/cmake-3.12.3.tar.gz; -RUN tar zxf cmake-3.12.3.tar.gz +RUN wget https://github.com/Kitware/CMake/releases/download/v3.14.3/cmake-3.14.3.tar.gz +RUN tar zxf cmake-3.14.3.tar.gz -WORKDIR /code/cmake-3.12.3 +WORKDIR /code/cmake-3.14.3 RUN ./configure --system-curl RUN make RUN sudo make install @@ -53,4 +53,4 @@ RUN ./build.sh ${BUILDARGS} --enable_pybind --build_wheel RUN ls -l /code/onnxruntime/build/Linux/${BUILDTYPE}/*.so RUN ls -l /code/onnxruntime/build/Linux/${BUILDTYPE}/dist/*.whl -RUN [ "cross-build-end" ] \ No newline at end of file +RUN [ "cross-build-end" ] diff --git a/dockerfiles/Dockerfile.cpu b/dockerfiles/Dockerfile.cpu deleted file mode 100644 index 3691a77fcbc5f..0000000000000 --- a/dockerfiles/Dockerfile.cpu +++ /dev/null @@ -1,23 +0,0 @@ -#------------------------------------------------------------------------- -# Copyright (c) Microsoft Corporation. All rights reserved. -# Licensed under the MIT License. -#-------------------------------------------------------------------------- -# Official user quickstart docker container for ONNX Runtime -# Ubuntu 16.04, CPU version, Python 3. -#-------------------------------------------------------------------------- - -FROM ubuntu:16.04 -MAINTAINER Vinitra Swamy "viswamy@microsoft.com" - -RUN apt-get update && \ - apt-get install -y sudo \ - build-essential curl \ - libcurl4-openssl-dev \ - libssl-dev wget \ - python 3.6 python3-pip \ - python3-dev git -RUN pip3 install --upgrade pip -RUN pip3 install numpy onnx - -RUN pip3 install onnxruntime -WORKDIR /code diff --git a/dockerfiles/Dockerfile.gpu b/dockerfiles/Dockerfile.gpu deleted file mode 100644 index 1f1fc63bf9302..0000000000000 --- a/dockerfiles/Dockerfile.gpu +++ /dev/null @@ -1,23 +0,0 @@ -#------------------------------------------------------------------------- -# Copyright (c) Microsoft Corporation. All rights reserved. -# Licensed under the MIT License. -#-------------------------------------------------------------------------- -# Official user quickstart nvidia-docker container for ONNX Runtime GPU -# Ubuntu 16.04, GPU version, CuDNN 7, CUDA 10, Python 3. 
-#-------------------------------------------------------------------------- - -FROM nvidia/cuda:10.0-cudnn7-devel-ubuntu16.04 -MAINTAINER Vinitra Swamy "viswamy@microsoft.com" - -RUN apt-get update && \ - apt-get install -y sudo \ - build-essential curl \ - libcurl4-openssl-dev \ - libssl-dev wget \ - python 3.6 python3-pip \ - python3-dev git -RUN pip3 install --upgrade pip -RUN pip3 install numpy onnx - -RUN pip3 install onnxruntime-gpu -WORKDIR /code diff --git a/dockerfiles/README.md b/dockerfiles/README.md index 6d599a8e6c4bf..c79140a423fc8 100644 --- a/dockerfiles/README.md +++ b/dockerfiles/README.md @@ -1,69 +1,38 @@ # Quick-start Docker containers for ONNX Runtime -## CPU Version (Preview) -#### Linux 16.04, Python Bindings, Compatible with Docker for Windows - -1. Retrieve your docker image in one of the following ways. - -- Build the docker image from the DockerFile in this repository. - ``` - # If you have a Linux machine, preface this command with "sudo" - docker build -t onnxruntime-cpu -f Dockerfile.cpu . - ``` - - Pull the official image from DockerHub. - - ``` - # Will be available with ONNX Runtime 0.2.0 - ``` -2. Run the docker image +## nGraph Version (Preview) +#### Linux 16.04, Python Bindings +1. Build the docker image from the Dockerfile in this repository. ``` # If you have a Linux machine, preface this command with "sudo" - # If you have a Windows machine, preface this command with "winpty" - docker run -it onnxruntime-cpu + docker build -t onnxruntime-ngraph -f Dockerfile.ngraph . ``` -## GPU Version (Preview) -#### Linux 16.04, Python Bindings, CUDA 10, CuDNN7, Requires Nvidia-Docker version 2.0 - -0. Prerequisites: [Install Nvidia-Docker 2.0](https://github.com/nvidia/nvidia-docker/wiki/Installation-(version-2.0)) +2. Run the Docker image -1. Retrieve your docker image in one of the following ways. - - Build the docker image from the DockerFile in this repository. - ``` - # If you have a Linux machine, preface this command with "sudo" - - docker build -t onnxruntime-gpu -f Dockerfile.gpu . - ``` - Note that you can change the base CUDA distribution to 9.1 and use nvidia-docker v1 - by replacing the first line of the dockerfile with the base image below. - ``` - FROM nvidia/cuda:9.1-cudnn7-devel-ubuntu16.04 - ``` - - Pull the official image from DockerHub. - - ``` - # Will be available with ONNX Runtime 0.2.0 - ``` - -2. Run the docker image ``` # If you have a Linux machine, preface this command with "sudo" - # If you have a Windows machine, preface this command with "winpty" - docker run -it --runtime=nvidia --rm nvidia/cuda onnxruntime-gpu + docker run -it onnxruntime-ngraph ``` -### Other options to get started with ONNX Runtime -- Deploy [inference for pretrained ONNX models](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/deployment/onnx) for handwritten digit recognition (MNIST) -or facial expression recognition (FER+) using Azure Machine Learning +## ONNX Runtime Server (Preview) +#### Linux 16.04 -- Work with ONNX runtime in your local environment using the PyPi release ([CPU](https://pypi.org/project/onnxruntime/), [GPU](https://pypi.org/project/onnxruntime-gpu/)) - - ``pip install onnxruntime`` - - ``pip install onnxruntime-gpu`` +1. Build the docker image from the Dockerfile in this repository + ``` + docker build -t {docker_image_name} -f Dockerfile.server . + ``` + +2. 
Run the ONNXRuntime server with the image created in step 1 -- Build ONNX Runtime from the source code by following [these instructions for developers](../BUILD.md). + ``` + docker run -v {localModelAbsoluteFolder}:{dockerModelAbsoluteFolder} -e MODEL_ABSOLUTE_PATH={dockerModelAbsolutePath} -p {your_local_port}:8001 {imageName} + ``` +3. Send HTTP requests to the container running ONNX Runtime Server -### License -[MIT License](../LICENSE) + Send HTTP requests to the docker container through the binding local port. Here is the full [usage document](https://github.com/Microsoft/onnxruntime/blob/master/docs/ONNX_Runtime_Server_Usage.md). + ``` + curl -X POST -d "@request.json" -H "Content-Type: application/json" http://0.0.0.0:{your_local_port}/v1/models/mymodel/versions/3:predict \ No newline at end of file diff --git a/docs/ContribOperators.md b/docs/ContribOperators.md index 4f6aac66e55e8..ae0be30674fdd 100644 --- a/docs/ContribOperators.md +++ b/docs/ContribOperators.md @@ -1,28 +1,22 @@ ## Contrib Operator Schemas *This file is automatically generated from the - [def files](/onnxruntime/core/graph/contrib_ops/contrib_defs.cc) via [this script](/onnxruntime/python/tools/gen_doc.py). + [def files](/onnxruntime/core/graph/contrib_ops/contrib_defs.cc) via [this script](/tools/python/gen_doc.py). Do not modify directly and instead edit operator definitions.* * com.microsoft * com.microsoft.AttnLSTM - * com.microsoft.ConvInteger - * com.microsoft.DequantizeLinear * com.microsoft.ExpandDims * com.microsoft.FusedConv * com.microsoft.FusedGemm * com.microsoft.GatherND - * com.microsoft.MatMulInteger * com.microsoft.MaxpoolWithMask * com.microsoft.MurmurHash3 - * com.microsoft.NonMaxSuppression - * com.microsoft.QLinearConv - * com.microsoft.QLinearMatMul - * com.microsoft.QuantizeLinear - * com.microsoft.ROIAlign + * com.microsoft.Pad * com.microsoft.Range * com.microsoft.ReduceSumInteger * com.microsoft.SampleOp * com.microsoft.Tokenizer + * com.microsoft.Unique * com.microsoft.WordConvEmbedding ## com.microsoft @@ -184,158 +178,54 @@ This version of the operator has been available since version 1 of the 'com.micr #### Inputs (3 - 14)
+
X : T
+
The input sequences packed (and potentially padded) into one 3-D tensor with the shape of `[seq_length, batch_size, input_size]`
+
W : T
+
The weight tensor for the gates. Concatenation of `W[iofc]` and `WB[iofc]` (if bidirectional) along dimension 0. The tensor has shape `[num_directions, 4*hidden_size, input_size]`.
+
R : T
+
The recurrence weight tensor. Concatenation of `R[iofc]` and `RB[iofc]` (if bidirectional) along dimension 0. This tensor has shape `[num_directions, 4*hidden_size, hidden_size]`.
+
B (optional) : T
+
The bias tensor for input gate. Concatenation of `[Wb[iofc], Rb[iofc]]`, and `[WBb[iofc], RBb[iofc]]` (if bidirectional) along dimension 0. This tensor has shape `[num_directions, 8*hidden_size]`. Optional: If not specified - assumed to be 0.
+
sequence_lens (optional) : T1
+
Optional tensor specifying lengths of the sequences in a batch. If not specified - assumed all sequences in the batch to have length `seq_length`. It has shape `[batch_size]`
+
initial_h (optional) : T
+
Optional initial value of the hidden. If not specified - assumed to be 0. It has shape `[num_directions, batch_size, hidden_size]`.
+
initial_c (optional) : T
+
Optional initial value of the cell. If not specified - assumed to be 0. It has shape `[num_directions, batch_size, hidden_size]`.
+
P (optional) : T
+
The weight tensor for peepholes. Concatenation of `P[iof]` and `PB[iof]` (if bidirectional) along dimension 0. It has shape `[num_directions, 3*hidden_size]`. Optional: If not specified - assumed to be 0.
+
QW (optional) : T
+
The weight tensor of the query layer in the attention mechanism. Should be of shape `[num_directions, am_query_depth(hidden_size of lstm), am_attn_size]`
+
MW (optional) : T
+
The weight tensor of the memory layer in the attention mechanism. Should be of shape `[num_directions, memory_depth, am_attn_size]`
+
V (optional) : T
+
The attention_v tensor in the attention mechanism. Should be of shape `[num_directions, am_attn_size]`
+
M (optional) : T
+
The sequence of the memory (input) for the attention mechanism. Should be of shape `[batch_size, max_memory_step, memory_depth]`
+
memory_seq_lens (optional) : T1
+
The sequence length of the input memory for the attention mechanism. Should be of shape `[batch_size]`
+
AW (optional) : T
+
The weights of the attention layer in the attention wrapper. If present, it should be of shape `[num_directions, memory_depth+hidden_size, aw_attn_size]`. Note that the attention mechanism context depth is also memory_depth in the attention mechanism.
#### Outputs (0 - 3)
-#### Type Constraints
- - -### **com.microsoft.ConvInteger** - - The integer convolution operator consumes an input tensor, a filter, and a padding value, - and computes the output. The production MUST never overflow. The accumulation may overflow - if and only if in 32 bits. - -#### Version - -This version of the operator has been available since version 1 of the 'com.microsoft' operator set. - -#### Attributes - -
-
auto_pad : string
-
auto_pad must be either NOTSET, SAME_UPPER, SAME_LOWER or VALID. Where default value is NOTSET, which means explicit padding is used. SAME_UPPER or SAME_LOWER mean pad the input so that the output size match the input.In case of odd number add the extra padding at the end for SAME_UPPER and at the beginning for SAME_LOWER. VALID mean no padding.
-
dilations : list of ints
-
dilation value along each axis of the filter. If not present, the dilation defaults to 1 along each axis.
-
group : int
-
number of groups input channels and output channels are divided into. default is 1.
-
kernel_shape : list of ints
-
The shape of the convolution kernel. If not present, should be inferred from input 'w'.
-
pads : list of ints
-
Padding for the beginning and ending along each axis, it can take any value greater than or equal to 0.The value represent the number of pixels added to the beginning and end part of the corresponding axis.`pads` format should be as follow [x1_begin, x2_begin...x1_end, x2_end,...], where xi_begin the number ofpixels added at the beginning of axis `i` and xi_end, the number of pixels added at the end of axis `i`.This attribute cannot be used simultaneously with auto_pad attribute. If not present, the padding defaultsto 0 along start and end of each axis.
-
strides : list of ints
-
Stride along each axis. If not present, the stride defaults to 1 along each axis.
-
-#### Inputs (2 - 4)

-#### Outputs
+
Y (optional) : T
+
A tensor that concats all the intermediate output values of the hidden. It has shape `[seq_length, num_directions, batch_size, hidden_size]`
+
Y_h (optional) : T
+
The last output value of the hidden. It has shape `[num_directions, batch_size, hidden_size]`.
+
Y_c (optional) : T
+
The last output value of the cell. It has shape `[num_directions, batch_size, hidden_size]`.
#### Type Constraints
- - -### **com.microsoft.DequantizeLinear** - - The linear de-quantization operator. It consumes a quantized data, a scale, a zero point and computes the full precision data. - The dequantization formula is y = (x - x_zero_point) * x_scale. - Scale and zero point must have same shape. They must be either scalar (per tensor) or 1-D tensor (per 'axis'). - -#### Version - -This version of the operator has been available since version 1 of the 'com.microsoft' operator set. - -#### Attributes - -
-
axis : int
-
the axis along which same quantization parameters are applied. It's optional. If it's not specified, it means per-tensor quantization and input 'x_scale' and 'x_zero_point' must be scalars. If it's specified, it means per 'axis' quantization and input 'x_scale' and 'x_zero_point' must be 1-D tensors.
-#### Inputs

-#### Outputs

-#### Type Constraints
+
T : tensor(float), tensor(double)
+
Constrain input and output types to float tensors.
+
T1 : tensor(int32)
+
Constrain seq_lens to integral tensors.
@@ -350,24 +240,24 @@ This version of the operator has been available since version 1 of the 'com.micr #### Inputs
+
X : T
+
input
+
axis : tensor(int32)
+
Specified axis to insert a dimension
#### Outputs
+
Y : T
+
output
#### Type Constraints
+
T : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(float16), tensor(float), tensor(double), tensor(string), tensor(bool), tensor(complex64), tensor(complex128)
+
Constrain to any tensor type. If the dtype attribute is not provided this must be a valid output type.
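For illustration, the ExpandDims behaviour described above maps directly onto NumPy; this is a minimal sketch with made-up values, not the registered kernel:

```
import numpy as np

# ExpandDims inserts a dimension of size 1 at the position given by the 'axis' input.
x = np.array([[1, 2], [3, 4]], dtype=np.float32)  # shape (2, 2)
axis = np.int32(0)                                # 'axis' is a tensor(int32) input

y = np.expand_dims(x, axis=int(axis))
print(y.shape)  # (1, 2, 2)
```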
@@ -404,26 +294,26 @@ This version of the operator has been available since version 1 of the 'com.micr #### Inputs (2 - 3)
+
X : T
+
W : T
+
B (optional) : T
#### Outputs
+
Y : T
#### Type Constraints
+
T : tensor(float16), tensor(float), tensor(double)
+
Constrain input and output types to float tensors
@@ -456,26 +346,26 @@ This version of the operator has been available since version 1 of the 'com.micr #### Inputs
+
A : T
+
Input tensor A. The shape of A should be (M, K) if transA is 0, or (K, M) if transA is non-zero.
+
B : T
+
Input tensor B. The shape of B should be (K, N) if transB is 0, or (N, K) if transB is non-zero.
+
C : T
+
Input tensor C. The shape of C should be unidirectional broadcastable to (M, N).
#### Outputs
+
Y : T
+
Output tensor of shape (M, N).
#### Type Constraints
+
T : tensor(float16), tensor(float), tensor(double), tensor(uint32), tensor(uint64), tensor(int32), tensor(int64)
+
Constrain input and output types to float/int tensors.
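As a rough reference for the shape contract above, a NumPy sketch of the plain Gemm part might look as follows; the fused activation is omitted, and the alpha/beta scaling factors are assumptions borrowed from standard Gemm rather than taken from this excerpt:

```
import numpy as np

def gemm_reference(A, B, C, trans_a=0, trans_b=0, alpha=1.0, beta=1.0):
    # A is (M, K) or (K, M) if trans_a; B is (K, N) or (N, K) if trans_b;
    # C must be unidirectionally broadcastable to (M, N).
    a = A.T if trans_a else A
    b = B.T if trans_b else B
    return alpha * (a @ b) + beta * C

A = np.random.rand(3, 4).astype(np.float32)
B = np.random.rand(4, 2).astype(np.float32)
C = np.zeros((3, 2), dtype=np.float32)
print(gemm_reference(A, B, C).shape)  # (3, 2)
```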
@@ -507,67 +397,26 @@ This version of the operator has been available since version 1 of the 'com.micr #### Inputs
+
data : T
+
Tensor of rank r >= 1.
+
indices : Tind
+
Tensor of rank q >= 1.
#### Outputs
+
output : T
+
Tensor of rank q-1+r-indices[-1].
#### Type Constraints
- - -### **com.microsoft.MatMulInteger** - - Matrix product that behaves like numpy.matmul: https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.matmul.html. - The production MUST never overflow. The accumulation may overflow if and only if in 32 bits. - -#### Version - -This version of the operator has been available since version 1 of the 'com.microsoft' operator set. - -#### Inputs (2 - 4) - -
-#### Outputs

-#### Type Constraints
+
T : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(float16), tensor(float), tensor(double), tensor(string), tensor(bool), tensor(complex64), tensor(complex128)
+
Constrain input and output types to any tensor type.
+
Tind : tensor(int32), tensor(int64)
+
Constrain indices type to int32 or int64
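The rank arithmetic above (output rank q-1+r-indices[-1]) is easier to see in a small NumPy sketch; this is a simplified reference, not the ONNX Runtime implementation:

```
import numpy as np

def gather_nd_reference(data, indices):
    # Each vector along the last axis of 'indices' addresses a slice of 'data',
    # so the output rank is q - 1 + r - indices.shape[-1].
    k = indices.shape[-1]
    out_shape = indices.shape[:-1] + data.shape[k:]
    flat_idx = indices.reshape(-1, k)
    gathered = np.stack([data[tuple(i)] for i in flat_idx])
    return gathered.reshape(out_shape)

data = np.array([[0, 1], [2, 3]])          # r = 2
indices = np.array([[0, 0], [1, 1]])       # q = 2, indices.shape[-1] = 2
print(gather_nd_reference(data, indices))  # [0 3], rank 1 = q - 1 + r - 2
```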
@@ -597,24 +446,24 @@ This version of the operator has been available since version 1 of the 'com.micr #### Inputs
+
X : T
+
M : tensor(int32)
+
mask
#### Outputs
+
Y : T
#### Type Constraints
+
T : tensor(float)
+
Constrain input0 and output types to float tensors
@@ -638,272 +487,46 @@ This version of the operator has been available since version 1 of the 'com.micr #### Inputs
-#### Outputs

-#### Type Constraints
- - -### **com.microsoft.NonMaxSuppression** - - Pruning away boxes that have high intersection-over-union (IOU) overlap with previously selected boxes. - Bounding boxes with score less than score_threshold are removed. Bounding boxes are supplied as [y1, x1, y2, x2], - where (y1, x1) and (y2, x2) are the coordinates of any diagonal pair of box corners and the coordinates can be provided - as normalized (i.e., lying in the interval [0, 1]) or absolute. - Note that this algorithm is agnostic to where the origin is in the coordinate system and more generally is invariant to - orthogonal transformations and translations of the coordinate system; - thus translating or reflections of the coordinate system result in the same boxes being selected by the algorithm. - The output of this operation is a set of integers indexing into the input collection of bounding boxes representing the selected boxes. - The bounding box coordinates corresponding to the selected indices can then be obtained using the gather operation. - -#### Version - -This version of the operator has been available since version 1 of the 'com.microsoft' operator set. - -#### Attributes - -
-
iou_threshold : float
-
Float representing the threshold for deciding whether boxes overlap too much with respect to IOU. Value range [0, 1]. The default is 0.0
-
max_output_size : int (required)
-
Integer representing the maximum number of boxes to be selected by non max suppression.
-
pad_to_max_output_size : int
-
Optional. 1(true) - the output selected_indices is padded to be of length max_output_size. Defaults to 0(false).
-
score_threshold : float (required)
-
Float tensor representing the threshold for deciding when to remove boxes based on score.
-
-#### Inputs

-#### Outputs (1 - 2)

-#### Type Constraints
- - -### **com.microsoft.QLinearConv** - - The convolution operator consumes a quantized input tensor, its scale and zero point, - a quantized filter, its scale and zero point, and output's scale and zero point, - and computes the quantized output. Each scale and zero point pair must have same shape. - It means they must be either scalars (per tensor) or 1-D tensors (per channel). - The production MUST never overflow. The accumulation may overflow in 32 bits - if the input is 8 bits or in 64 bits if the input is 16 bits. - -#### Version - -This version of the operator has been available since version 1 of the 'com.microsoft' operator set. - -#### Attributes - -
-
auto_pad : string
-
auto_pad must be either NOTSET, SAME_UPPER, SAME_LOWER or VALID. Where default value is NOTSET, which means explicit padding is used. SAME_UPPER or SAME_LOWER mean pad the input so that the output size match the input.In case of odd number add the extra padding at the end for SAME_UPPER and at the beginning for SAME_LOWER. VALID mean no padding.
-
dilations : list of ints
-
dilation value along each axis of the filter. If not present, the dilation defaults to 1 along each axis.
-
group : int
-
number of groups input channels and output channels are divided into. default is 1.
-
kernel_shape : list of ints
-
The shape of the convolution kernel. If not present, should be inferred from input 'w'.
-
pads : list of ints
-
Padding for the beginning and ending along each axis, it can take any value greater than or equal to 0.The value represent the number of pixels added to the beginning and end part of the corresponding axis.`pads` format should be as follow [x1_begin, x2_begin...x1_end, x2_end,...], where xi_begin the number ofpixels added at the beginning of axis `i` and xi_end, the number of pixels added at the end of axis `i`.This attribute cannot be used simultaneously with auto_pad attribute. If not present, the padding defaultsto 0 along start and end of each axis.
-
strides : list of ints
-
Stride along each axis. If not present, the stride defaults to 1 along each axis.
-
-#### Inputs (8 - 9)

-#### Outputs

-#### Type Constraints
- - -### **com.microsoft.QLinearMatMul** - - Matrix product that behaves like numpy.matmul: https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.matmul.html. - It consumes two quantized input tensors, their scales and zero points, and output's scale and zero point, and computes - the quantized output. The quantization formula is x_quantized = (x_fp32 / x_scale) + x_zero_point. For (x_fp32 / x_scale), - it computes the nearest integer value to arg (in floating-point format), rounding halfway cases away from zero. - Scale and zero point must have same shape. They must be either scalar (per tensor) or 1-D tensor (per row for a and per column for b). - If scale and zero point are 1D tensor, the number of elements of scale and zero point tensor of input 'a' and output 'y' - should be equal to the number of rows of input 'a', and the number of elements of scale and zero point tensor of input 'b' - should be equal to the number of columns of input 'b'. The production MUST never overflow. The accumulation may overflow in 32 bits - if the input is 8 bits or in 64 bits if the input is 16 bits. - -#### Version - -This version of the operator has been available since version 1 of the 'com.microsoft' operator set. - -#### Inputs - -
+
X : T1
+
An input tensor to hash.
#### Outputs
+
Y : T2
+
32-bit hash value.
#### Type Constraints
+
T1 : tensor(uint32), tensor(int32), tensor(string)
+
Constrain input type to unsigned or signed 32-bit integer tensor, or string tensor. It should be utf-8 encoded if using unicode.
+
T2 : tensor(uint32), tensor(int32)
+
Constrain output type to unsigned and signed 32-bit integer tensor.
-### **com.microsoft.QuantizeLinear** +### **com.microsoft.Pad** - The linear quantization operator. It consumes a full precision data, a scale, a zero point and computes the quantized data. - The quantization formula is y = (x / y_scale) + y_zero_point. For (x / y_scale), it computes the nearest integer value to arg (in floating-point format), - rounding halfway cases away from zero. Scale and zero point must have same shape. They must be either scalar (per tensor) or 1-D tensor (per 'axis'). - -#### Version - -This version of the operator has been available since version 1 of the 'com.microsoft' operator set. - -#### Attributes - -
-
axis : int
-
The axis along which same quantization parameters are applied. It's optional. If it's not specified, it means per-tensor quantization and input 'x_scale' and 'x_zero_point' must be scalars. If it's specified, it means per 'axis' quantization and input 'x_scale' and 'x_zero_point' must be 1-D tensors.
-#### Inputs

-#### Outputs

-#### Type Constraints
- - -### **com.microsoft.ROIAlign** - - Region of Interest (RoI) align operation described in the - [Mask R-CNN paper](https://arxiv.org/abs/1703.06870). - RoIAlign consumes an input tensor X and region of interests (rois) - to apply pooling across each RoI; it produces a 4-D tensor of shape - (num_rois, C, pooled_h, pooled_w). - - RoIAlign is proposed to avoid the misalignment by removing - quantizations while converting from original image into feature - map and from feature map into RoI feature; in each ROI bin, - the value of the sampled locations are computed directly - through bilinear interpolation. + Given `data` tensor, pads, mode, and value. + Example: + Insert 0 pads to the beginning of the second dimension. + data = [ + [1.0, 1.2], + [2.3, 3.4], + [4.5, 5.7], + ] + pads = [0, 2, 0, 0] + output = [ + [ + [0.0, 0.0, 1.0, 1.2], + [0.0, 0.0, 2.3, 3.4], + [0.0, 0.0, 4.5, 5.7], + ], + ] + #### Version @@ -913,38 +536,32 @@ This version of the operator has been available since version 1 of the 'com.micr
mode : string
-
The pooling method. Two modes are supported: 'avg' and 'max'. Default is 'avg'.
-
pooled_h : int
-
default 1; Pooled output Y's height.
-
pooled_w : int
-
default 1; Pooled output Y's width.
-
sampling_ratio : int
-
Number of sampling points in the interpolation grid used to compute the output value of each pooled output bin. If > 0, then exactly sampling_ratio x sampling_ratio grid points are used. If == 0, then an adaptive number of grid points are used (computed as ceil(roi_width / pooled_w), and likewise for height). Default is 0.
-
spatial_scale : float
-
Multiplicative spatial scale factor to translate ROI coordinates from their input spatial scale to the scale used when pooling, i.e., spatial scale of the input feature map X relative to the input image. E.g.; default is 1.0f.
+
Three modes: `constant` (default) - pads with a given constant value, `reflect` - pads with the reflection of the vector mirrored on the first and last values of the vector along each axis, `edge` - pads with the edge values of the array
-#### Inputs +#### Inputs (2 - 3)
+
data : T
+
Input tensor.
+
pads : tensor(int64)
+
Tensor of integers indicating the number of padding elements to add or remove (if negative) at the beginning and end of each axis. For 2D input tensor, it is the number of pixels. `pads` should be a 1D tensor of shape [2 * input_rank] or a 2D tensor of shape [1, 2 * input_rank]. `pads` format (1D example) should be as follow [x1_begin, x2_begin,...,x1_end, x2_end,...], where xi_begin is the number of pixels added at the beginning of axis `i` and xi_end, the number of pixels added at the end of axis `i`.
+
value (optional) : T
+
(Optional) A scalar or rank 1 tensor containing a single value to be filled if the mode chosen is `constant` (by default it is 0.0).
#### Outputs
+
output : T
+
Tensor after padding.
#### Type Constraints
+
T : tensor(float16), tensor(float), tensor(double)
+
Constrain input and output types to float tensors.
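To make the example in the description concrete, here is a minimal NumPy sketch of the three modes; negative (trimming) pads and the 2-D pads layout are not handled, and this is not the kernel implementation:

```
import numpy as np

def pad_reference(data, pads, mode="constant", value=0.0):
    # 'pads' is [x1_begin, x2_begin, ..., x1_end, x2_end, ...]; regroup per axis for np.pad.
    rank = data.ndim
    pad_width = [(pads[i], pads[i + rank]) for i in range(rank)]
    if mode == "constant":
        return np.pad(data, pad_width, mode="constant", constant_values=value)
    return np.pad(data, pad_width, mode=mode)  # 'reflect' and 'edge' map directly

data = np.array([[1.0, 1.2], [2.3, 3.4], [4.5, 5.7]], dtype=np.float32)
print(pad_reference(data, [0, 2, 0, 0]))
# [[0.  0.  1.  1.2]
#  [0.  0.  2.3 3.4]
#  [0.  0.  4.5 5.7]]
```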
@@ -960,26 +577,26 @@ This version of the operator has been available since version 1 of the 'com.micr #### Inputs (2 - 3)
+
start : T
+
Tensor(scalar, or dims=[1]). First entry in the range.
+
limit : T
+
Tensor(scalar, or dims=[1]). Upper limit of sequence, exclusive.
+
delta (optional) : T
+
Tensor(scalar, or dims=[1]). Number that increments start. Defaults to 1.
#### Outputs
+
Y : T
+
1-D Tensor of the range.
#### Type Constraints
+
T : tensor(float), tensor(double), tensor(int16), tensor(int32), tensor(int64)
+
Constrain input and output types.
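For reference, Range behaves like numpy.arange over [start, limit) with step delta; a tiny sketch with made-up values:

```
import numpy as np

start, limit, delta = np.float32(1.0), np.float32(7.0), np.float32(2.0)
y = np.arange(start, limit, delta)  # 1-D tensor of the range, limit excluded
print(y)  # [1. 3. 5.]
```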
@@ -1006,24 +623,24 @@ This version of the operator has been available since version 1 of the 'com.micr #### Inputs
+
data : T1
+
An input tensor.
#### Outputs
+
reduced : T2
+
Reduced output tensor.
#### Type Constraints
+
T1 : tensor(int8), tensor(uint8)
+
Constrain input type to 8-bit integer tensor.
+
T2 : tensor(int32), tensor(uint32)
+
Constrain output data type to 32-bit integer tensor. T2 must be tensor(uint32) when T1 is tensor(uint8), or tensor(int32) when T1 is tensor(int8).
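The point of the T1/T2 split above is that 8-bit inputs are accumulated in 32 bits; a small NumPy sketch (the axis choice is illustrative, since the operator's attributes are not shown in this excerpt):

```
import numpy as np

x = np.array([[100, 100], [27, 1]], dtype=np.int8)
reduced = x.sum(axis=1, dtype=np.int32)  # accumulate in int32 so the sum cannot overflow
print(reduced)  # [200  28] -- 200 would not fit in int8
```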
@@ -1038,22 +655,22 @@ This version of the operator has been available since version 1 of the 'com.micr #### Inputs
+
X : T
+
input
#### Outputs
+
Y : T
+
output
#### Type Constraints
+
T : tensor(uint32), tensor(uint64), tensor(int32), tensor(int64), tensor(float16), tensor(float), tensor(double)
+
Constrain to any tensor type. If the dtype attribute is not provided this must be a valid output type.
@@ -1063,7 +680,8 @@ This version of the operator has been available since version 1 of the 'com.micr If the maximum number of tokens found per input string is D, the output shape would be [N, C, D] when input shape is [N, C]. Similarly, if input shape is [C] then the output should be [C, D]. Tokenizer has two different operation modes. The first mode is selected when "tokenexp" is not set and "separators" is set. If "tokenexp" is set and "separators" is not set, - the second mode will be used. The first mode breaks each input string into tokens by removing separators. + the second mode will be used. The first mode breaks each input string into tokens by matching and removing separators. + "separators" is a list of strings which are regular expressions. "tokenexp" is a single regular expression. Let's assume "separators" is [" "] and consider an example. If input is @@ -1078,6 +696,9 @@ This version of the operator has been available since version 1 of the 'com.micr whose shape is [2, 5] because you can find at most 5 tokens per input string. Note that the input at most can have two axes, so 3-D and higher dimension are not supported. + If "separators" contains a single empty string, the Tokenizer will enter into character tokenezation mode. This means all strings + will be broken part into individual characters. + For each input string, the second mode searches matches of "tokenexp" and each match will be a token in Y. The matching of "tokenexp" is conducted greedily (i.e., a match should be as long as possible). This operator searches for the first match starting from the beginning of the considered string, @@ -1089,6 +710,13 @@ This version of the operator has been available since version 1 of the 'com.micr If input is ["Hello", "World"], then the corresponding output would be [0x02, "Hello", "World", 0x03]. This implies that if mark is true, [C]/[N, C] - input's output shape becomes [C, D+2]/[N, C, D+2]. + + If tokenizer removes the entire content of [C]-input, it will produce [[]]. + I.e. the output shape should be [C][0] or [N][C][0] if input shape was [N][C]. + + If the tokenizer receives empty input of [0] then the output is [0] if empty input + of [N, 0] then [N, 0]. + #### Version @@ -1104,7 +732,7 @@ This version of the operator has been available since version 1 of the 'com.micr
pad_value : string (required)
The string used to pad output tensors when the tokens extracted doesn't match the maximum number of tokens found. If start/end markers are needed, padding will appear outside the markers.
separators : list of strings
-
an optional list of strings (type: AttributeProto::STRINGS), each single string in this attribute is a separator. Two consecutive segments in X connected by a separator would be divided into two tokens. For example, if the input is "Hello World!" and this attribute contains only one space character, the corresponding output would be ["Hello", "World!"]. To achieve character-level tokenization, one should set the separators to [""], which contains only one empty string. If 'separators' is a L-element array, there will be L rounds of tokenization using one stop word. More specifically, in the first round, the first element in 'separators' is used to tokenize each string in the input. Then, the second element in 'separators' will be used to tokenize the resulted strings produced at the first round.
+
an optional list of strings attribute that contains the separators - regular expressions used to match separators. Two consecutive segments in X connected by a separator would be divided into two tokens. For example, if the input is "Hello World!" and this attribute contains only one space character, the corresponding output would be ["Hello", "World!"]. To achieve character-level tokenization, set 'separators' to [""], which contains a single empty string.
tokenexp : string
An optional string. Token's regular expression in basic POSIX format (http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03). If set, the tokenizer may produce tokens matching the specified pattern. Note that one and only one of 'tokenexp' and 'separators' should be set.
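A loose sketch of the separator mode only (padding, start/end markers, the multi-round behaviour, and the 'tokenexp' mode are omitted); the helper name is made up for illustration:

```
import re

def tokenize_with_separators(text, separators):
    # A single empty-string separator switches to character-level tokenization.
    if separators == [""]:
        return list(text)
    # Otherwise treat each separator as a regular expression and split on any of them.
    pattern = "|".join(separators)
    return [tok for tok in re.split(pattern, text) if tok]

print(tokenize_with_separators("Hello World!", [" "]))  # ['Hello', 'World!']
print(tokenize_with_separators("abc", [""]))            # ['a', 'b', 'c']
```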
@@ -1112,22 +740,68 @@ This version of the operator has been available since version 1 of the 'com.micr #### Inputs
+
X : T
+
Strings to tokenize
#### Outputs
+
Y : T
+
Tokenized strings
#### Type Constraints
+
T : tensor(string)
+
Input/Output is a string tensor
+
+ + +### **com.microsoft.Unique** + + Finds all the unique values (deduped list) present in the given input tensor. + This operator returns 3 outputs. + The first output tensor 'uniques' contains all of the unique elements of the input, + sorted in the same order that they occur in the input. + The second output tensor 'idx' is the same size as the input and it contains the index + of each value of the input in 'uniques'. + The third output tensor 'counts' contains the count of each element of 'uniques' in the input. + Example: + input_x = [2, 1, 1, 3, 4, 3] + output_uniques = [2, 1, 3, 4] + output_idx = [0, 1, 1, 2, 3, 2] + output_counts = [1, 2, 2, 1] + + +#### Version + +This version of the operator has been available since version 1 of the 'com.microsoft' operator set. + +#### Inputs + +
+
x : T
+
A 1-D input tensor that is to be processed.
+
+ +#### Outputs + +
+
y : T
+
A 1-D tensor of the same type as 'x' containing all the unique values in 'x' sorted in the same order that they occur in the input 'x'
+
idx : tensor(int64)
+
A 1-D INT64 tensor of the same size as 'x' containing the indices for each value in 'x' in the output 'uniques'
+
counts : tensor(int64)
+
A 1-D INT64 tensor containing the count of each element of 'uniques' in the input 'x'
+
+ +#### Type Constraints + +
+
T : tensor(uint8), tensor(uint16), tensor(uint32), tensor(uint64), tensor(int8), tensor(int16), tensor(int32), tensor(int64), tensor(float16), tensor(float), tensor(double), tensor(string), tensor(bool), tensor(complex64), tensor(complex128)
+
Input can be of any tensor type.
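The documented example can be reproduced with NumPy, keeping first-occurrence order (np.unique alone would sort the values); this is an illustrative reference, not the kernel:

```
import numpy as np

def unique_reference(x):
    sorted_uniques, first_index, inverse, counts = np.unique(
        x, return_index=True, return_inverse=True, return_counts=True)
    order = np.argsort(first_index)       # sorted order -> order of first occurrence
    uniques = sorted_uniques[order]
    remap = np.empty_like(order)
    remap[order] = np.arange(len(order))  # reindex the inverse indices accordingly
    return uniques, remap[inverse], counts[order]

y, idx, counts = unique_reference(np.array([2, 1, 1, 3, 4, 3]))
print(y, idx, counts)  # [2 1 3 4] [0 1 1 2 3 2] [1 2 2 1]
```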
@@ -1153,30 +827,30 @@ This version of the operator has been available since version 1 of the 'com.micr #### Inputs
+
Sequence : T
+
Specify batches of sequence words to embed
+
W : T1
+
Specify weights of conv
+
B : T1
+
Specify bias of conv
+
C : T1
+
Specify embedding vector of char
#### Outputs
+
Y : T1
+
output
#### Type Constraints
+
T : tensor(int32)
+
Constrain to tensor(int32).
+
T1 : tensor(float)
+
Constrain to tensor(float).
diff --git a/docs/How_To_Update_ONNX_Dev_Notes.md b/docs/How_To_Update_ONNX_Dev_Notes.md index 9ff9019ccfec8..2f312aacfeaa7 100644 --- a/docs/How_To_Update_ONNX_Dev_Notes.md +++ b/docs/How_To_Update_ONNX_Dev_Notes.md @@ -15,7 +15,8 @@ git add onnx 2. Update [cgmanifest.json](/cgmanifest.json) Search 'https://github.com/onnx/onnx.git', update the commitHash with it. -3. Update [tools/ci_build/github/linux/docker/scripts/install_deps.sh](/tools/ci_build/github/linux/docker/scripts/install_deps.sh) +3. Update [tools/ci_build/github/linux/docker/scripts/install_deps.sh](/tools/ci_build/github/linux/docker/scripts/install_deps.sh) +and [tools/ci_build/github/linux/docker/scripts/install_deps_x86.sh](/tools/ci_build/github/linux/docker/scripts/install_deps_x86.sh) Search 'for onnx_version in', update the commit hashes. The list should contain every release version from ONNX 1.2, and the latest one in our cmake/external/onnx folder. 4. Update onnxruntime/core/protobuf diff --git a/include/onnxruntime/core/common/const_pointer_container.h b/include/onnxruntime/core/common/const_pointer_container.h index 49f042114692c..1d821ba609205 100644 --- a/include/onnxruntime/core/common/const_pointer_container.h +++ b/include/onnxruntime/core/common/const_pointer_container.h @@ -64,6 +64,10 @@ class ConstPointerContainer { explicit ConstPointerContainer(const Container& data) noexcept : data_(data) {} size_t size() const noexcept { return data_.size(); } + bool empty() const noexcept { return data_.empty(); } + + ConstIterator cbegin() const noexcept { return ConstIterator(data_.cbegin()); } + ConstIterator cend() const noexcept { return ConstIterator(data_.cend()); } ConstIterator begin() const noexcept { return ConstIterator(data_.cbegin()); } ConstIterator end() const noexcept { return ConstIterator(data_.cend()); } diff --git a/include/onnxruntime/core/framework/allocator.h b/include/onnxruntime/core/framework/allocator.h index 082ecf4fa3c6c..462aed63f1d68 100644 --- a/include/onnxruntime/core/framework/allocator.h +++ b/include/onnxruntime/core/framework/allocator.h @@ -32,10 +32,6 @@ struct OrtAllocatorInfo { type(type_) { } - inline bool operator==(const OrtAllocatorInfo& other) const { - return mem_type == other.mem_type && type == other.type && id == other.id && strcmp(name, other.name) == 0; - } - // To make OrtAllocatorInfo become a valid key in std map inline bool operator<(const OrtAllocatorInfo& other) const { if (type != other.type) @@ -60,6 +56,13 @@ struct OrtAllocatorInfo { } }; +inline bool operator==(const OrtAllocatorInfo& left, const OrtAllocatorInfo& other) { + return left.mem_type == other.mem_type && left.type == other.type && left.id == other.id && + strcmp(left.name, other.name) == 0; +} + +inline bool operator!=(const OrtAllocatorInfo& lhs, const OrtAllocatorInfo& rhs) { return !(lhs == rhs); } + std::ostream& operator<<(std::ostream& out, const OrtAllocatorInfo& info); namespace onnxruntime { diff --git a/include/onnxruntime/core/framework/data_types.h b/include/onnxruntime/core/framework/data_types.h index b55b8ed6d1ecd..5898f4b81a5a3 100644 --- a/include/onnxruntime/core/framework/data_types.h +++ b/include/onnxruntime/core/framework/data_types.h @@ -71,7 +71,7 @@ struct ort_endian { //BFloat16 struct BFloat16 { uint16_t val{0}; - explicit BFloat16() {} + explicit BFloat16() = default; explicit BFloat16(uint16_t v) : val(v) {} explicit BFloat16(float v) { uint16_t* dst = reinterpret_cast(&v); @@ -174,7 +174,7 @@ class DataTypeImpl { /** * Convert an ONNX TypeProto to 
onnxruntime DataTypeImpl. * However, this conversion is lossy. Don't try to use 'this->GetTypeProto()' converting it back - * Don't pass the returned value to MLValue::MLValue(...) function + * Don't pass the returned value to OrtValue::OrtValue(...) function * \param proto */ static MLDataType TypeFromProto(const ONNX_NAMESPACE::TypeProto& proto); diff --git a/include/onnxruntime/core/framework/execution_provider.h b/include/onnxruntime/core/framework/execution_provider.h index 4df5767f2f81e..8965033b4042b 100644 --- a/include/onnxruntime/core/framework/execution_provider.h +++ b/include/onnxruntime/core/framework/execution_provider.h @@ -28,7 +28,7 @@ typedef std::map AllocatorMap; // if we are export the fused function to dll, the function will still in the same binary as lotus // use std function to give execution provider some chance to capture some state. using CreateFunctionStateFunc = std::function; -using ComputeFunc = std::function; +using ComputeFunc = std::function; using DestroyFunctionStateFunc = std::function; struct NodeComputeInfo { @@ -52,8 +52,8 @@ class IExecutionProvider { } /** - Get allocator with specified MemType - */ + * Get an allocator with specified device id and MemType. Return nullptr if it doesn't exist + */ virtual AllocatorPtr GetAllocator(int id, OrtMemType mem_type) const; /** diff --git a/include/onnxruntime/core/framework/fence.h b/include/onnxruntime/core/framework/fence.h index 879fee00b5e87..fd2cdb5f68dc3 100644 --- a/include/onnxruntime/core/framework/fence.h +++ b/include/onnxruntime/core/framework/fence.h @@ -23,31 +23,31 @@ class IFence { virtual ~IFence() = default; /** - Called by executor before MLValue is used as input in a compute kernel in provider_type and exec queue_id - This should wait in the specified provider's exec queue for previous write to MLValue to finish + Called by executor before OrtValue is used as input in a compute kernel in provider_type and exec queue_id + This should wait in the specified provider's exec queue for previous write to OrtValue to finish */ virtual void BeforeUsingAsInput(onnxruntime::ProviderType provider_type, int queue_id) = 0; /** - Called by executor before MLValue is used as output in a compute kernel in provider_type and exec queue_id - This should wait in the specified provider's exec queue for previous read to MLValue to finish + Called by executor before OrtValue is used as output in a compute kernel in provider_type and exec queue_id + This should wait in the specified provider's exec queue for previous read to OrtValue to finish */ virtual void BeforeUsingAsOutput(onnxruntime::ProviderType provider_type, int queue_id) = 0; /** - Called by executor after MLValue is used as input in a compute kernel in provider_type and exec queue_id + Called by executor after OrtValue is used as input in a compute kernel in provider_type and exec queue_id This should update the read fence of the MLValue */ virtual void AfterUsedAsInput(int queue_id) = 0; /** - Called by executor after MLValue is used as output in a compute kernel in provider_type and exec queue_id + Called by executor after OrtValue is used as output in a compute kernel in provider_type and exec queue_id This should update the write fence of the MLValue */ virtual void AfterUsedAsOutput(int queue_id) = 0; /** - Called by executor before release MLValue to see whether async data read is finished or not. This is non-blocking. + Called by executor before release OrtValue to see whether async data read is finished or not. This is non-blocking. 
*/ virtual bool CanRelease() = 0; }; diff --git a/include/onnxruntime/core/framework/framework_common.h b/include/onnxruntime/core/framework/framework_common.h index c3336f2e32c1c..dd0dea9856b0c 100644 --- a/include/onnxruntime/core/framework/framework_common.h +++ b/include/onnxruntime/core/framework/framework_common.h @@ -15,9 +15,8 @@ class NodeArg; } // namespace onnxruntime namespace onnxruntime { -class MLValue; using InputDefList = std::vector; using OutputDefList = std::vector; -using NameMLValMap = std::unordered_map; +using NameMLValMap = std::unordered_map; } // namespace onnxruntime diff --git a/include/onnxruntime/core/framework/func_api.h b/include/onnxruntime/core/framework/func_api.h index d105139ffe521..cf1ba25d0b143 100644 --- a/include/onnxruntime/core/framework/func_api.h +++ b/include/onnxruntime/core/framework/func_api.h @@ -1,31 +1,7 @@ #pragma once #include "core/common/common.h" -namespace onnxruntime { -//TODO: Should use the lotus cpi element type definition. -enum DType { - TFloat32 = 0, - TInt32 = 1, - TDouble = 2, - TInt64 = 3, - TBool = 4, - TUint8 = 5, - TInt8 = 6, - TUint16 = 7, - TInt16 = 8, - TUint32 = 9, - TUint64 = 10 - //TODO: more types -}; -typedef struct { - void* data; - /*! \brief Number of dimensions */ - size_t ndim; - /*! \brief The data type of the pointer*/ - DType dtype; - /*! \brief The shape of the tensor */ - int64_t* shape; -} ONNXRunTimeTensor; +namespace onnxruntime { // AllocateFunc(void* handle, size_t alignment, size_t size) using AllocateFunc = void* (*)(void*, size_t, size_t); @@ -44,7 +20,7 @@ using FunctionState = void*; // take the ComputeContext, and create a function state. using CreateFunctionStateC = int (*)(ComputeContext*, FunctionState*); // pass in the function state and input/output tensors, perform compute and return status code, 0 - succeed. -using ComputeFuncC = int (*)(FunctionState, ONNXRunTimeTensor*, size_t, ONNXRunTimeTensor*, size_t); +using ComputeFuncC = int (*)(FunctionState, const OrtCustomOpApi*, OrtKernelContext*); // release the function state. 
using DestroyFunctionStateC = void (*)(FunctionState); } // namespace onnxruntime diff --git a/include/onnxruntime/core/framework/kernel_def_builder.h b/include/onnxruntime/core/framework/kernel_def_builder.h index 90edd3b382738..3c093f45401fb 100644 --- a/include/onnxruntime/core/framework/kernel_def_builder.h +++ b/include/onnxruntime/core/framework/kernel_def_builder.h @@ -19,16 +19,16 @@ class KernelDefBuilder; typedef std::map MemTypeMap; -// note that input/output might be on CPU implicitly when the node is from CPU execution provider -inline bool MemTypeOnCpuExplicitly(OrtMemType mem_type) { - return mem_type == OrtMemTypeCPUInput || mem_type == OrtMemTypeCPUOutput; -} - class KernelDef { - public: - explicit KernelDef() { + private: + // note that input/output might be on CPU implicitly when the node is from CPU execution provider + static inline bool MemTypeOnCpuExplicitly(OrtMemType mem_type) { + return mem_type == OrtMemTypeCPUInput || mem_type == OrtMemTypeCPUOutput; } + public: + explicit KernelDef() = default; + const std::string& OpName() const { return op_name_; } @@ -65,6 +65,10 @@ class KernelDef { return it->second; } + bool IsInputOnCpu(size_t input_index) const { return MemTypeOnCpuExplicitly(InputMemoryType(input_index)); } + + bool IsOutputOnCpu(size_t output_index) const { return MemTypeOnCpuExplicitly(OutputMemoryType(output_index)); } + OrtMemType OutputMemoryType(size_t output_index) const { auto it = output_memory_type_args_.find(output_index); if (it == output_memory_type_args_.end()) diff --git a/include/onnxruntime/core/framework/kernel_registry.h b/include/onnxruntime/core/framework/kernel_registry.h index e27be83c19dc8..017c81978cbe9 100644 --- a/include/onnxruntime/core/framework/kernel_registry.h +++ b/include/onnxruntime/core/framework/kernel_registry.h @@ -25,7 +25,7 @@ class KernelRegistry { // itself. // TODO(Task:132) Make usage of unique_ptr/shared_ptr as out param consistent Status TryCreateKernel(const onnxruntime::Node& node, const IExecutionProvider& execution_provider, - const std::unordered_map& initialized_tensors, + const std::unordered_map& initialized_tensors, const MLValueNameIdxMap& mlvalue_name_idx_map, const FuncManager& funcs_mgr, std::unique_ptr& op_kernel) const; diff --git a/include/onnxruntime/core/framework/ml_value.h b/include/onnxruntime/core/framework/ml_value.h index b5456863462bd..1d9835a7ff574 100644 --- a/include/onnxruntime/core/framework/ml_value.h +++ b/include/onnxruntime/core/framework/ml_value.h @@ -10,20 +10,19 @@ #include "core/framework/data_types.h" #include "core/framework/tensor.h" -namespace onnxruntime { /** Represents both tensors and non-tensors. 
*/ -class MLValue { +struct OrtValue { public: - MLValue() : data_(nullptr) {} - virtual ~MLValue() = default; + OrtValue() : data_(nullptr) {} + virtual ~OrtValue() = default; - MLValue(void* pData, MLDataType type, DeleteFunc deleter) { + OrtValue(void* pData, onnxruntime::MLDataType type, onnxruntime::DeleteFunc deleter) { Init(pData, type, deleter); } - void Init(void* pData, MLDataType type, DeleteFunc deleter) { + void Init(void* pData, onnxruntime::MLDataType type, onnxruntime::DeleteFunc deleter) { data_.reset(pData, deleter); type_ = type; } @@ -34,39 +33,41 @@ class MLValue { template const T& Get() const { - ORT_ENFORCE(DataTypeImpl::GetType() == type_, DataTypeImpl::GetType(), " != ", type_); + ORT_ENFORCE(onnxruntime::DataTypeImpl::GetType() == type_, onnxruntime::DataTypeImpl::GetType(), " != ", type_); return *static_cast(data_.get()); } template T* GetMutable() { - ORT_ENFORCE(DataTypeImpl::GetType() == type_, DataTypeImpl::GetType(), " != ", type_); + ORT_ENFORCE(onnxruntime::DataTypeImpl::GetType() == type_, onnxruntime::DataTypeImpl::GetType(), " != ", type_); return static_cast(data_.get()); } bool IsTensor() const noexcept { - return DataTypeImpl::GetType() == type_; + return onnxruntime::DataTypeImpl::GetType() == type_; } - MLDataType Type() const { + onnxruntime::MLDataType Type() const { return type_; } - Fence_t Fence() const { + onnxruntime::Fence_t Fence() const { return fence_.get(); } - void SetFence(FencePtr fence) { + void SetFence(onnxruntime::FencePtr fence) { fence_ = fence; } - void ShareFenceWith(MLValue& v) { + void ShareFenceWith(OrtValue& v) { fence_ = v.fence_; } private: std::shared_ptr data_; - MLDataType type_{nullptr}; - FencePtr fence_; + onnxruntime::MLDataType type_{nullptr}; + onnxruntime::FencePtr fence_; }; -} // namespace onnxruntime + +//TODO: remove the following line +#define MLValue OrtValue diff --git a/include/onnxruntime/core/framework/op_kernel.h b/include/onnxruntime/core/framework/op_kernel.h index 175f2a00bca80..ccfe13d79c89c 100644 --- a/include/onnxruntime/core/framework/op_kernel.h +++ b/include/onnxruntime/core/framework/op_kernel.h @@ -79,7 +79,7 @@ class OpKernelContext { template const T* Input(int index) const { - const MLValue* p_ml_value = GetInputMLValue(index); + const OrtValue* p_ml_value = GetInputMLValue(index); try { return p_ml_value ? &(p_ml_value->Get()) : nullptr; } catch (const std::exception& /*e*/) { @@ -93,7 +93,7 @@ class OpKernelContext { if (index < 0 || index >= OutputCount()) return nullptr; - MLValue* p_ml_value = nullptr; + OrtValue* p_ml_value = nullptr; ORT_ENFORCE(GetOrCreateOutputMLValue(index, p_ml_value).IsOK()); return p_ml_value ? p_ml_value->GetMutable() : nullptr; } @@ -107,14 +107,17 @@ class OpKernelContext { return *logger_; } + // always >= 0 int InputCount() const { return static_cast(kernel_->Node().InputDefs().size()); } + // always >= 0 int ImplicitInputCount() const { return static_cast(kernel_->Node().ImplicitInputDefs().size()); } + // always >= 0 int OutputCount() const { return static_cast(kernel_->Node().OutputDefs().size()); } @@ -128,39 +131,39 @@ class OpKernelContext { /** Return the fence of current node's input. @param index The index of the input. - @returns Point to the Fence of the input MLValue. - It is null if the input MLValue doesn't have fence or the input is optional. + @returns Point to the Fence of the input OrtValue. + It is null if the input OrtValue doesn't have fence or the input is optional. 
*/ Fence_t InputFence(int index) const; /** Return the fence of current node's implicit input. @param index The index of the implicit input. - @returns Point to the Fence of the implicit input MLValue. - It is null if the input MLValue doesn't have fence or the input is optional. + @returns Point to the Fence of the implicit input OrtValue. + It is null if the input OrtValue doesn't have fence or the input is optional. */ Fence_t ImplicitInputFence(int index) const; /** Return the fence of current node's output identifed by index. @param index The index of the output. - @returns Point to the Fence of the output MLValue. - It is null if the output MLValue doesn't have fence or the output is optional. + @returns Point to the Fence of the output OrtValue. + It is null if the output OrtValue doesn't have fence or the output is optional. */ Fence_t OutputFence(int index) const; protected: onnxruntime::NodeIndex GetNodeIndex() const; - const MLValue* GetInputMLValue(int index) const; - const MLValue* GetImplicitInputMLValue(int index) const; - MLValue* GetOutputMLValue(int index); - MLValue* OutputMLValue(int index, const TensorShape& shape); // Creates the MLValue* based on the shape, if it does not exist + const OrtValue* GetInputMLValue(int index) const; + const OrtValue* GetImplicitInputMLValue(int index) const; + OrtValue* GetOutputMLValue(int index); + OrtValue* OutputMLValue(int index, const TensorShape& shape); // Creates the OrtValue* based on the shape, if it does not exist private: ORT_DISALLOW_COPY_AND_ASSIGNMENT(OpKernelContext); - Status GetOrCreateOutputMLValue(int index, MLValue*& value); + Status GetOrCreateOutputMLValue(int index, OrtValue*& value); int GetInputArgIndex(int index) const; int GetImplicitInputArgIndex(int index) const; @@ -179,7 +182,7 @@ class OpKernelContext { // Fetching output tensor without shape is not allowed except when it already exists template <> inline Tensor* OpKernelContext::Output(int index) { - MLValue* p_ml_value = GetOutputMLValue(index); + OrtValue* p_ml_value = GetOutputMLValue(index); ORT_ENFORCE(p_ml_value, "Please fetch output tensor with specified shape."); return p_ml_value->GetMutable(); } diff --git a/include/onnxruntime/core/framework/op_kernel_info.h b/include/onnxruntime/core/framework/op_kernel_info.h index 7b3d3a0d3f340..521ca03630967 100644 --- a/include/onnxruntime/core/framework/op_kernel_info.h +++ b/include/onnxruntime/core/framework/op_kernel_info.h @@ -21,18 +21,16 @@ class FuncManager; // NOTE: it does not own/hold any objects. 
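// It only aggregates const references to the Node, the KernelDef, the execution provider, the initialized
// tensors, the MLValueNameIdxMap and the FuncManager passed to its constructor.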
class OpKernelInfo : public OpNodeProtoHelper { public: - explicit OpKernelInfo(const onnxruntime::Node& node, - const KernelDef& kernel_def, + explicit OpKernelInfo(const onnxruntime::Node& node, const KernelDef& kernel_def, const IExecutionProvider& execution_provider, - const std::unordered_map& initialized_tensors, - const MLValueNameIdxMap& mlvalue_name_idx_map, - const FuncManager& funcs_mgr); + const std::unordered_map& initialized_tensors, + const MLValueNameIdxMap& mlvalue_name_idx_map, const FuncManager& funcs_mgr); OpKernelInfo(const OpKernelInfo& other); const OrtAllocatorInfo& GetAllocatorInfo(int device_id, OrtMemType mem_type) const; - const AllocatorPtr GetAllocator(int device_id, OrtMemType mem_type) const; + AllocatorPtr GetAllocator(int device_id, OrtMemType mem_type) const; const KernelDef& GetKernelDef() const; @@ -55,8 +53,8 @@ class OpKernelInfo : public OpNodeProtoHelper { // For non cpu/cuda case, this pointer should be set so that function kernel // will delegate kernel compute call to compute call. gsl::not_null execution_provider_; - const std::unordered_map& initialized_tensors_; - const MLValueNameIdxMap& mlvalue_name_idx_map_; + const std::unordered_map& initialized_tensors_; + const MLValueNameIdxMap& ort_value_name_idx_map_; const FuncManager& funcs_mgr_; ProtoHelperNodeContext proto_helper_context_; }; diff --git a/include/onnxruntime/core/framework/tensor.h b/include/onnxruntime/core/framework/tensor.h index 249c7e634036c..260d1731bc6c0 100644 --- a/include/onnxruntime/core/framework/tensor.h +++ b/include/onnxruntime/core/framework/tensor.h @@ -40,7 +40,7 @@ class BufferDeleter { AllocatorPtr alloc_; }; -typedef std::unique_ptr BufferUniquePtr; +using BufferUniquePtr = std::unique_ptr; using BufferNakedPtr = void*; //TODO:ensure dtype_!=nullptr #ifdef __GNUC__ diff --git a/include/onnxruntime/core/framework/tensor_shape.h b/include/onnxruntime/core/framework/tensor_shape.h index 1007601058640..5cf9cf08e0868 100644 --- a/include/onnxruntime/core/framework/tensor_shape.h +++ b/include/onnxruntime/core/framework/tensor_shape.h @@ -127,7 +127,8 @@ class TensorShape : private std::vector { empty shape or 1D shape (1) is regarded as scalar tensor */ bool IsScalar() const { - return size() == 0 || (size() == 1 && at(0) == 1); + size_t len = size(); + return len == 0 || (len == 1 && operator[](0) == 1); } static const TensorShape& ReinterpretBaseType(const std::vector& dimensions) { diff --git a/include/onnxruntime/core/graph/graph.h b/include/onnxruntime/core/graph/graph.h index b0cc98f47f290..66b5954cf5177 100644 --- a/include/onnxruntime/core/graph/graph.h +++ b/include/onnxruntime/core/graph/graph.h @@ -125,9 +125,17 @@ class Node { return common::Status::OK(); } + /** Gets the count of arguments for each of the Node's explicit inputs. */ + const std::vector& InputArgCount() const noexcept { return definitions_.input_arg_count; } + + /** Gets a modifiable count of arguments for each of the Node's explicit inputs. + @todo This should be removed in favor of a method that updates the input args and the count. + Currently these operations are separate which is not a good setup. */ + std::vector& MutableInputArgsCount() { return definitions_.input_arg_count; } + /** Gets the Node's input definitions. @remarks requires ConstPointerContainer wrapper to apply const to the NodeArg pointers so access is read-only. 
*/ - const ConstPointerContainer> InputDefs() const noexcept { + ConstPointerContainer> InputDefs() const noexcept { return ConstPointerContainer>(definitions_.input_defs); } @@ -136,24 +144,11 @@ class Node { return definitions_.input_defs; } - /** Gets a modifiable collection of the Node's output definitions. */ - std::vector& MutableOutputDefs() noexcept { - return definitions_.output_defs; - } - - /** Gets the count of arguments for each of the Node's explicit inputs. */ - const std::vector& InputArgCount() const noexcept { return definitions_.input_arg_count; } - - /** Gets a modifiable count of arguments for each of the Node's explicit inputs. - @todo This should be removed in favor of a method that updates the input args and the count. - Currently these operations are separate which is not a good setup. */ - std::vector& MutableInputArgsCount() { return definitions_.input_arg_count; } - /** Gets the implicit inputs to this Node. If this Node contains a subgraph, these are the NodeArg's that are implicitly consumed by Nodes within that subgraph. e.g. If and Loop operators.*/ - const std::vector& ImplicitInputDefs() const noexcept { - return definitions_.implicit_input_defs; + ConstPointerContainer> ImplicitInputDefs() const noexcept { + return ConstPointerContainer>(definitions_.implicit_input_defs); } /** Gets a modifiable collection of the Node's implicit input definitions. */ @@ -163,10 +158,15 @@ class Node { /** Gets the Node's output definitions. @remarks requires ConstPointerContainer wrapper to apply const to the NodeArg pointers so access is read-only. */ - const ConstPointerContainer> OutputDefs() const noexcept { + ConstPointerContainer> OutputDefs() const noexcept { return ConstPointerContainer>(definitions_.output_defs); } + /** Gets a modifiable collection of the Node's output definitions. */ + std::vector& MutableOutputDefs() noexcept { + return definitions_.output_defs; + } + /** Struct to provide sorting between EdgeEnd instances based on NodeIndex first, and NodeArg::Name second. */ struct EdgeEndCompare { bool operator()(const EdgeEnd& lhs, const EdgeEnd& rhs) const { @@ -211,7 +211,9 @@ class Node { NodeConstIterator InputNodesEnd() const noexcept { return NodeConstIterator(relationships_.input_edges.cend()); } /** Gets an iterator to the beginning of the output nodes from this Node. */ - NodeConstIterator OutputNodesBegin() const noexcept { return NodeConstIterator(relationships_.output_edges.cbegin()); } + NodeConstIterator OutputNodesBegin() const noexcept { + return NodeConstIterator(relationships_.output_edges.cbegin()); + } /** Gets an iterator to the end of the output nodes from this Node. */ NodeConstIterator OutputNodesEnd() const noexcept { return NodeConstIterator(relationships_.output_edges.cend()); } @@ -270,7 +272,7 @@ class Node { */ Graph* GetMutableGraphAttribute(const std::string& attr_name); - /** Checks if the Node contains at least one subgraph (this is the case for control flow operators, such as If, Scan, Loop). + /** Checks if the Node contains at least one subgraph (this is the case for control flow operators, such as If, Scan, Loop). @returns true if the Node contains a subgraph. */ bool ContainsSubgraph() const { @@ -705,6 +707,7 @@ class Graph { /** Gets the GraphProto representation of this Graph. */ const ONNX_NAMESPACE::GraphProto& ToGraphProto(); + ONNX_NAMESPACE::GraphProto ToGraphProto() const; /** Gets the ISchemaRegistry instances being used with this Graph. 
*/ IOnnxRuntimeOpSchemaCollectionPtr GetSchemaRegistry() const; @@ -737,12 +740,12 @@ class Graph { /** When programmatically constructing a Graph, explicitly set graph inputs. @param inputs NodeArgs that represent complete graph inputs which need to be explicitly ordered. @remarks If the Graph was loaded from a GraphProto this has no effect.*/ - void SetInputs(const std::vector inputs); + void SetInputs(const std::vector& inputs); /** When programmatically constructing a Graph, explicitly set graph outputs. @param outputs NodeArgs that represent complete graph outputs which need to be explicitly ordered. @remarks If the Graph was loaded from a GraphProto this has no effect.*/ - void SetOutputs(const std::vector outputs); + void SetOutputs(const std::vector& outputs); /** Returns true if this is a subgraph or fase if it is a high-level graph. */ bool IsSubgraph() const { return parent_graph_ != nullptr; } @@ -877,9 +880,6 @@ class Graph { // Set graph inputs/outputs when resolving a graph.. common::Status SetGraphInputsOutputs(); - // Sync graph inputs/outputs when serializing to proto. - void SyncGraphInputsOutputs(); - // Clear all unused initializers void CleanUnusedInitializers(); @@ -903,6 +903,8 @@ class Graph { void AddFunction(const ONNX_NAMESPACE::FunctionProto* func_proto); + void ToGraphProtoInternal(ONNX_NAMESPACE::GraphProto& graph_proto) const; + // GraphProto to store name, version, initializer. // When serializing <*this> Graph to a GraphProto, the nodes and // functions in will also be fed into so that diff --git a/include/onnxruntime/core/graph/schema_registry.h b/include/onnxruntime/core/graph/schema_registry.h index d78eebdb6f648..c7ee2d3b74fca 100644 --- a/include/onnxruntime/core/graph/schema_registry.h +++ b/include/onnxruntime/core/graph/schema_registry.h @@ -44,10 +44,8 @@ class IOnnxRuntimeOpSchemaCollection : public ONNX_NAMESPACE::ISchemaRegistry { using ISchemaRegistry::GetSchema; - virtual const ONNX_NAMESPACE::OpSchema* GetSchema( - const std::string& key, - const int maxInclusiveVersion, - const std::string& domain) const final { + const ONNX_NAMESPACE::OpSchema* GetSchema(const std::string& key, const int maxInclusiveVersion, + const std::string& domain) const final { const ONNX_NAMESPACE::OpSchema* latest_schema = nullptr; int earliest_opset_where_unchanged = std::numeric_limits::max(); GetSchemaAndHistory(key, maxInclusiveVersion, domain, &latest_schema, &earliest_opset_where_unchanged); @@ -97,12 +95,9 @@ class OnnxRuntimeOpSchemaRegistry : public IOnnxRuntimeOpSchemaCollection { using IOnnxRuntimeOpSchemaCollection::GetSchema; - void GetSchemaAndHistory( - const std::string& key, - const int maxInclusiveVersion, - const std::string& domain, - const ONNX_NAMESPACE::OpSchema** latest_schema, - int* earliest_opset_where_unchanged) const override; + void GetSchemaAndHistory(const std::string& key, int maxInclusiveVersion, const std::string& domain, + const ONNX_NAMESPACE::OpSchema** latest_schema, + int* earliest_opset_where_unchanged) const override; bool empty() const { return map_.empty(); @@ -155,12 +150,9 @@ class SchemaRegistryManager : public onnxruntime::IOnnxRuntimeOpSchemaCollection @param[out] earliest_opset_where_unchanged The earliest opset version preceding max_inclusive_version where the operator is known to be unchanged. 
*/ - void GetSchemaAndHistory( - const std::string& key, - const int max_inclusive_version, - const std::string& domain, - const ONNX_NAMESPACE::OpSchema** latest_schema, - int* earliest_opset_where_unchanged) const override; + void GetSchemaAndHistory(const std::string& key, int max_inclusive_version, const std::string& domain, + const ONNX_NAMESPACE::OpSchema** latest_schema, + int* earliest_opset_where_unchanged) const override; private: std::deque> registries; diff --git a/include/onnxruntime/core/optimizer/graph_transformer.h b/include/onnxruntime/core/optimizer/graph_transformer.h index bbe65aadafa0d..77753458140b9 100644 --- a/include/onnxruntime/core/optimizer/graph_transformer.h +++ b/include/onnxruntime/core/optimizer/graph_transformer.h @@ -2,6 +2,8 @@ // Licensed under the MIT License. #pragma once +#include +#include #include "core/common/common.h" #include "core/graph/graph_viewer.h" @@ -59,7 +61,7 @@ class GraphTransformer { // You should avoid calling Graph::Resolve in ApplyImpl unless you are 100% sure it's required. In most cases // the call to Graph::Resolve in Apply prior to ApplyImpl being called, and after ApplyImpl fore the main graph // completes (if 'modified' is true) should suffice. - virtual common::Status ApplyImpl(Graph& graph, bool& modified, int graph_level = 0) const = 0; + virtual common::Status ApplyImpl(Graph& graph, bool& modified, int graph_level) const = 0; const std::string name_; const std::unordered_set compatible_provider_types_; diff --git a/include/onnxruntime/core/optimizer/rewrite_rule.h b/include/onnxruntime/core/optimizer/rewrite_rule.h index 4401fb0f77307..f481439e5ff00 100644 --- a/include/onnxruntime/core/optimizer/rewrite_rule.h +++ b/include/onnxruntime/core/optimizer/rewrite_rule.h @@ -35,6 +35,18 @@ If the list of op types is left empty, that rule will be triggered for every op */ class RewriteRule { public: + /** + @class RewriteRuleEffect + + Class used to indicate the effect of rule application on a graph's node. + */ + enum class RewriteRuleEffect : uint8_t { + kNone, // The rewrite rule has not modified the graph. + kUpdatedCurrentNode, // The rewrite rule updated (but did not remove) the node on which it was triggered. + kRemovedCurrentNode, // The rewrite rule removed the node on which it was triggered. + kModifiedRestOfGraph // The rewrite rule modified nodes other than the one it was triggered on. + }; + RewriteRule(const std::string& name) : name_(name) {} virtual ~RewriteRule() = default; @@ -52,11 +64,10 @@ class RewriteRule { /** Checks if the condition of the rule is satisfied, and if so applies the body of the rule. @param[in] graph The Graph. @param[in] node The Node to apply the rewrite to. - @param[out] modified Set to indicate whether the node was modified or not. - @param[out] deleted Set to indicate if the node was deleted. + @param[out] rule_effect Enum to indicate if and how the graph was modified as a result of the rule application. @returns Status indicating success or providing error information */ - common::Status CheckConditionAndApply(Graph& graph, Node& node, bool& modified, bool& deleted) { - return SatisfyCondition(graph, node) ? Apply(graph, node, modified, deleted) : Status::OK(); + common::Status CheckConditionAndApply(Graph& graph, Node& node, RewriteRuleEffect& rule_effect) { + return SatisfyCondition(graph, node) ? Apply(graph, node, rule_effect) : Status::OK(); } private: @@ -72,8 +83,7 @@ class RewriteRule { /** This is the actual body of the rule that performs the graph transformation. 
The transformation happens in-place. The return-value of node may be different from the input-value due to rewriting. - The value of "modified" indicates if the graph was modified or not. - The value of "deleted" indicates if the node was deleted or not. */ - virtual common::Status Apply(Graph& graph, Node& node, bool& modified, bool& deleted) = 0; + The value of "rule_effect" indicates whether and how the graph was modified by the rule. */ + virtual common::Status Apply(Graph& graph, Node& node, RewriteRuleEffect& rule_effect) = 0; }; } // namespace onnxruntime diff --git a/include/onnxruntime/core/optimizer/rule_based_graph_transformer.h b/include/onnxruntime/core/optimizer/rule_based_graph_transformer.h index b20abadec3c9e..4a3fe8159a4bb 100644 --- a/include/onnxruntime/core/optimizer/rule_based_graph_transformer.h +++ b/include/onnxruntime/core/optimizer/rule_based_graph_transformer.h @@ -58,14 +58,16 @@ class RuleBasedGraphTransformer : public GraphTransformer { @param[in] graph The Graph. @param[in] node The Node to apply the rules to. @param[in] rules The vector of RewriteRules that will be applied to the Node. - @param[out] modified Set to indicate whether the node was modified or not. - @param[out] deleted Set to indicate if the node was deleted. + @param[out] rule_effect Enum that indicates whether and how the graph was modified as a result of + applying rules on this node. @returns Status indicating success or providing error information. */ common::Status ApplyRulesOnNode(Graph& graph, Node& node, const std::vector>& rules, - bool& modified, bool& deleted) const; + RewriteRule::RewriteRuleEffect& rule_effect) const; private: + using RuleEffect = RewriteRule::RewriteRuleEffect; + // Map that associates a node's op type with the vector of rules that are registered to be triggered for that node. std::unordered_map>> op_type_to_rules_; // Rules that will be evaluated regardless of the op type of the node. diff --git a/include/onnxruntime/core/providers/providers.h b/include/onnxruntime/core/providers/providers.h index fc16812417674..234a0cc5c6d37 100644 --- a/include/onnxruntime/core/providers/providers.h +++ b/include/onnxruntime/core/providers/providers.h @@ -5,7 +5,7 @@ namespace onnxruntime { class IExecutionProvider; struct IExecutionProviderFactory { - virtual ~IExecutionProviderFactory() {} + virtual ~IExecutionProviderFactory() = default; virtual std::unique_ptr CreateProvider() = 0; }; } // namespace onnxruntime diff --git a/include/onnxruntime/core/session/onnxruntime_c_api.h b/include/onnxruntime/core/session/onnxruntime_c_api.h index 599e17cbc7976..de35625185e39 100644 --- a/include/onnxruntime/core/session/onnxruntime_c_api.h +++ b/include/onnxruntime/core/session/onnxruntime_c_api.h @@ -185,10 +185,12 @@ ORT_API_STATUS(OrtCreateEnvWithCustomLogger, OrtLoggingFunction logging_function // execution of OrtCreateSession, or does the OrtSession retain a handle to the file/directory // and continue to access throughout the OrtSession lifetime? // What sort of access is needed to model_path : read or read/write? 
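The rewrite_rule.h and rule_based_graph_transformer.h hunks above replace the separate modified/deleted output flags with a single RewriteRule::RewriteRuleEffect value. A minimal sketch of how a derived rule's Apply might report that it deleted its target node under the new signature; MyRule and the removal step are illustrative and not part of this change:

```
// Sketch only: a rule that removes the node it was triggered on.
Status MyRule::Apply(Graph& graph, Node& node, RewriteRuleEffect& rule_effect) {
  // ... rewire the node's consumers to its input here (details omitted) ...
  graph.RemoveNode(node.Index());
  rule_effect = RewriteRuleEffect::kRemovedCurrentNode;  // previously: deleted = true
  return Status::OK();
}
```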
-// TODO: allow loading from an in-memory byte-array ORT_API_STATUS(OrtCreateSession, _In_ OrtEnv* env, _In_ const ORTCHAR_T* model_path, _In_ const OrtSessionOptions* options, _Out_ OrtSession** out); +ORT_API_STATUS(OrtCreateSessionFromArray, _In_ OrtEnv* env, _In_ const void* model_data, int model_data_len, + _In_ const OrtSessionOptions* options, _Out_ OrtSession** out); + ORT_API_STATUS(OrtRun, _Inout_ OrtSession* sess, _In_ OrtRunOptions* run_options, _In_ const char* const* input_names, _In_ const OrtValue* const* input, size_t input_len, @@ -237,6 +239,7 @@ ORT_API(void, OrtSetSessionLogVerbosityLevel, _In_ OrtSessionOptions* options, u ORT_API(int, OrtSetSessionGraphOptimizationLevel, _In_ OrtSessionOptions* options, uint32_t graph_optimization_level); // How many threads in the session thread pool. +// Returns 0 on success, and -1 otherwise ORT_API(int, OrtSetSessionThreadPoolSize, _In_ OrtSessionOptions* options, int session_thread_pool_size); /** @@ -384,10 +387,10 @@ ORT_API_STATUS(OrtSetTensorElementType, _In_ OrtTensorTypeAndShapeInfo*, enum ON * \param dim_values An array with length of `dim_count`. Its elements can contain negative values. * \param dim_count length of dim_values */ -ORT_API_STATUS(OrtSetDims, OrtTensorTypeAndShapeInfo* info, _In_ const int64_t* dim_values, size_t dim_count); +ORT_API_STATUS(OrtSetDimensions, OrtTensorTypeAndShapeInfo* info, _In_ const int64_t* dim_values, size_t dim_count); ORT_API(enum ONNXTensorElementDataType, OrtGetTensorElementType, _In_ const OrtTensorTypeAndShapeInfo*); -ORT_API(size_t, OrtGetNumOfDimensions, _In_ const OrtTensorTypeAndShapeInfo* info); +ORT_API(size_t, OrtGetDimensionsCount, _In_ const OrtTensorTypeAndShapeInfo* info); ORT_API(void, OrtGetDimensions, _In_ const OrtTensorTypeAndShapeInfo* info, _Out_ int64_t* dim_values, size_t dim_values_length); /** @@ -404,7 +407,7 @@ ORT_API(int64_t, OrtGetTensorShapeElementCount, _In_ const OrtTensorTypeAndShape /** * \param out Should be freed by OrtReleaseTensorTypeAndShapeInfo after use */ -ORT_API_STATUS(OrtGetTensorShapeAndType, _In_ const OrtValue* value, _Out_ OrtTensorTypeAndShapeInfo** out); +ORT_API_STATUS(OrtGetTensorTypeAndShape, _In_ const OrtValue* value, _Out_ OrtTensorTypeAndShapeInfo** out); /** * Get the type information of an OrtValue @@ -522,7 +525,7 @@ ORT_API_STATUS(OrtGetValueCount, const OrtValue* value, size_t* out); * sequence. 'in' should be an arrary of N OrtValues. * \value_type should be either map or sequence. 
*/ -ORT_API_STATUS(OrtCreateValue, OrtValue** const in, int num_values, enum ONNXType value_type, +ORT_API_STATUS(OrtCreateValue, OrtValue** in, size_t num_values, enum ONNXType value_type, OrtValue** out); /* @@ -547,17 +550,21 @@ struct OrtCustomOpApi { OrtStatus*(ORT_API_CALL* KernelInfoGetAttribute_float)(_In_ const OrtKernelInfo* info, _In_ const char* name, _Out_ float* out); OrtStatus*(ORT_API_CALL* KernelInfoGetAttribute_int64)(_In_ const OrtKernelInfo* info, _In_ const char* name, _Out_ int64_t* out); - OrtStatus*(ORT_API_CALL* GetTensorShapeAndType)(_In_ const OrtValue* value, _Out_ OrtTensorTypeAndShapeInfo** out); + OrtStatus*(ORT_API_CALL* GetTensorTypeAndShape)(_In_ const OrtValue* value, _Out_ OrtTensorTypeAndShapeInfo** out); int64_t(ORT_API_CALL* GetTensorShapeElementCount)(_In_ const OrtTensorTypeAndShapeInfo* info); + enum ONNXTensorElementDataType(ORT_API_CALL* GetTensorElementType)(_In_ const OrtTensorTypeAndShapeInfo*); + size_t(ORT_API_CALL* GetDimensionCount)(_In_ const OrtTensorTypeAndShapeInfo* info); void(ORT_API_CALL* GetDimensions)(_In_ const OrtTensorTypeAndShapeInfo* info, _Out_ int64_t* dim_values, size_t dim_values_length); OrtStatus*(ORT_API_CALL* SetDimensions)(OrtTensorTypeAndShapeInfo* info, _In_ const int64_t* dim_values, size_t dim_count); - OrtStatus*(ORT_API_CALL* GetTensorMutableData)(_Inout_ OrtValue* value, void** data); + OrtStatus*(ORT_API_CALL* GetTensorMutableData)(_Inout_ OrtValue* value, _Out_ void** data); void(ORT_API_CALL* ReleaseTensorTypeAndShapeInfo)(OrtTensorTypeAndShapeInfo* input); - OrtValue*(ORT_API_CALL* KernelContext_GetInput)(OrtKernelContext* context, _In_ size_t index); + size_t(ORT_API_CALL* KernelContext_GetInputCount)(const OrtKernelContext* context); + const OrtValue*(ORT_API_CALL* KernelContext_GetInput)(const OrtKernelContext* context, _In_ size_t index); + size_t(ORT_API_CALL* KernelContext_GetOutputCount)(const OrtKernelContext* context); OrtValue*(ORT_API_CALL* KernelContext_GetOutput)(OrtKernelContext* context, _In_ size_t index, _In_ const int64_t* dim_values, size_t dim_count); }; typedef struct OrtCustomOpApi OrtCustomOpApi; @@ -582,7 +589,6 @@ struct OrtCustomOp { size_t(ORT_API_CALL* GetOutputTypeCount)(_In_ struct OrtCustomOp* op); // Op kernel callbacks - void(ORT_API_CALL* KernelGetOutputShape)(_In_ void* op_kernel, _In_ OrtKernelContext* context, _In_ size_t output_index, _In_ OrtTensorTypeAndShapeInfo* output); void(ORT_API_CALL* KernelCompute)(_In_ void* op_kernel, _In_ OrtKernelContext* context); void(ORT_API_CALL* KernelDestroy)(_In_ void* op_kernel); }; diff --git a/include/onnxruntime/core/session/onnxruntime_cxx_api.h b/include/onnxruntime/core/session/onnxruntime_cxx_api.h index c0be19e149a71..7e75c6b9caab4 100644 --- a/include/onnxruntime/core/session/onnxruntime_cxx_api.h +++ b/include/onnxruntime/core/session/onnxruntime_cxx_api.h @@ -3,79 +3,506 @@ #pragma once #include "onnxruntime_c_api.h" -#include -#include -#include +#include +#include +#include #include -#include "core/common/exceptions.h" - -//TODO: encode error code in the message? 
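The custom-op API table above now exposes input/output counts, hands inputs out as const OrtValue*, and drops the KernelGetOutputShape callback in favor of passing the output shape directly to KernelContext_GetOutput. A hedged sketch of a kernel Compute written against the Ort::CustomOpApi helper defined later in this diff; MyKernel, its api_ member and the float element-wise body are placeholders:

```
void MyKernel::Compute(OrtKernelContext* context) {
  const OrtValue* input = api_.KernelContext_GetInput(context, 0);
  OrtTensorTypeAndShapeInfo* info = api_.GetTensorTypeAndShape(input);
  std::vector<int64_t> shape = api_.GetTensorShape(info);
  int64_t count = api_.GetTensorShapeElementCount(info);
  api_.ReleaseTensorTypeAndShapeInfo(info);

  // The output shape is supplied here; there is no separate shape callback any more.
  OrtValue* output = api_.KernelContext_GetOutput(context, 0, shape.data(), shape.size());

  const float* in = api_.GetTensorData<float>(input);
  float* out = api_.GetTensorMutableData<float>(output);
  for (int64_t i = 0; i < count; ++i) out[i] = in[i];  // placeholder element-wise body
}
```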
-#define ORT_THROW_ON_ERROR(expr) \ - do { \ - OrtStatus* onnx_status = (expr); \ - if (onnx_status != nullptr) { \ - std::string ort_error_message = OrtGetErrorMessage(onnx_status); \ - OrtErrorCode error_code = OrtGetErrorCode(onnx_status); \ - OrtReleaseStatus(onnx_status); \ - switch (error_code) { \ - case ORT_NOT_IMPLEMENTED: \ - throw onnxruntime::NotImplementedException(ort_error_message); \ - default: \ - throw onnxruntime::OnnxRuntimeException(ORT_WHERE, ort_error_message); \ - } \ - } \ - } while (0); +#include +#include +#include #define ORT_REDIRECT_SIMPLE_FUNCTION_CALL(NAME) \ decltype(Ort##NAME(value.get())) NAME() { \ return Ort##NAME(value.get()); \ } +#define ORT_DEFINE_DELETER(NAME) \ + template <> \ + struct default_delete { \ + void operator()(Ort##NAME* ptr) { \ + OrtRelease##NAME(ptr); \ + } \ + }; + namespace std { -template <> -struct default_delete { - void operator()(OrtAllocator* ptr) { - OrtReleaseAllocator(ptr); - } +ORT_DEFINE_DELETER(Allocator); +ORT_DEFINE_DELETER(TypeInfo); +ORT_DEFINE_DELETER(RunOptions); +ORT_DEFINE_DELETER(SessionOptions); +ORT_DEFINE_DELETER(TensorTypeAndShapeInfo); +} // namespace std + +namespace Ort { + +using std::nullptr_t; + +struct Exception : std::exception { + Exception(std::string&& string, OrtErrorCode code) : message_{std::move(string)}, code_{code} {} + + OrtErrorCode GetOrtErrorCode() const { return code_; } + const char* what() const noexcept override { return message_.c_str(); } + + private: + std::string message_; + OrtErrorCode code_; }; -template <> -struct default_delete { - void operator()(OrtEnv* ptr) { - OrtReleaseEnv(ptr); +#define ORT_THROW_ON_ERROR(expr) \ + if (OrtStatus* onnx_status = (expr)) { \ + std::string ort_error_message = OrtGetErrorMessage(onnx_status); \ + OrtErrorCode ort_error_code = OrtGetErrorCode(onnx_status); \ + OrtReleaseStatus(onnx_status); \ + throw Ort::Exception(std::move(ort_error_message), ort_error_code); \ } -}; -template <> -struct default_delete { - void operator()(OrtRunOptions* ptr) { - OrtReleaseRunOptions(ptr); +#define ORT_DEFINE_RELEASE(NAME) \ + inline void Release(Ort##NAME* ptr) { OrtRelease##NAME(ptr); } + +ORT_DEFINE_RELEASE(Allocator); +ORT_DEFINE_RELEASE(AllocatorInfo); +ORT_DEFINE_RELEASE(CustomOpDomain); +ORT_DEFINE_RELEASE(Env); +ORT_DEFINE_RELEASE(RunOptions); +ORT_DEFINE_RELEASE(Session); +ORT_DEFINE_RELEASE(SessionOptions); +ORT_DEFINE_RELEASE(TensorTypeAndShapeInfo); +ORT_DEFINE_RELEASE(TypeInfo); +ORT_DEFINE_RELEASE(Value); + +template +struct Base { + Base() = default; + Base(T* p) : p_{p} {} + ~Base() { Release(p_); } + + operator T*() { return p_; } + operator const T*() const { return p_; } + + T* release() { + T* p = p_; + p_ = nullptr; + return p; } -}; -template <> -struct default_delete { - void operator()(OrtTypeInfo* ptr) { - OrtReleaseTypeInfo(ptr); + protected: + Base(const Base&) = delete; + Base(Base&& v) : p_{v.p_} { v.p_ = nullptr; } + void operator=(Base&& v) { + Release(p_); + p_ = v.p_; + v.p_ = nullptr; } + + T* p_{}; + + template + friend struct Unowned; }; -template <> -struct default_delete { - void operator()(OrtTensorTypeAndShapeInfo* ptr) { - OrtReleaseTensorTypeAndShapeInfo(ptr); - } +template +struct Unowned : T { + Unowned(decltype(T::p_) p) : T{p} {} + Unowned(Unowned&& v) : T{v.p_} {} + ~Unowned() { this->p_ = nullptr; } }; -template <> -struct default_delete { - void operator()(OrtSessionOptions* ptr) { - OrtReleaseSessionOptions(ptr); - } +struct Allocator; +struct AllocatorInfo; +struct Env; +struct TypeInfo; +struct Value; + 
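Every wrapper in the new header derives from Base<T>, which stores the raw C handle, frees it in its destructor through the matching OrtRelease* overload generated by ORT_DEFINE_RELEASE, and is movable but not copyable, while Unowned<T> reuses the same interface for handles the caller does not own. A small sketch of the resulting ownership rules, with MakeOptions and Use as made-up names:

```
Ort::RunOptions MakeOptions() {
  Ort::RunOptions opts;                 // owns an OrtRunOptions*
  opts.SetRunTag("example-run");
  return opts;                          // moved out; no double release
}

void Use() {
  Ort::RunOptions opts = MakeOptions();
  // Ort::RunOptions copy(opts);        // would not compile: Base<T> is move-only
  OrtRunOptions* raw = opts.release();  // caller takes over the handle
  OrtReleaseRunOptions(raw);
}
```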
+struct Env : Base { + Env(nullptr_t) {} + Env(OrtLoggingLevel default_warning_level, _In_ const char* logid); }; -} // namespace std +struct CustomOpDomain : Base { + explicit CustomOpDomain(nullptr_t) {} + explicit CustomOpDomain(const char* domain); + + void Add(OrtCustomOp* op); +}; + +struct RunOptions : Base { + RunOptions(nullptr_t) {} + RunOptions(); + + RunOptions& SetRunLogVerbosityLevel(unsigned int); + unsigned int GetRunLogVerbosityLevel() const; + + RunOptions& SetRunTag(const char* run_tag); + const char* GetRunTag() const; + + RunOptions& SetTerminate(bool flag); +}; + +struct SessionOptions : Base { + explicit SessionOptions(nullptr_t) {} + SessionOptions(); + explicit SessionOptions(OrtSessionOptions* p) : Base{p} {} + + SessionOptions Clone() const; + + SessionOptions& SetThreadPoolSize(int session_thread_pool_size); + SessionOptions& SetGraphOptimizationLevel(uint32_t graph_optimization_level); + + SessionOptions& EnableCpuMemArena(); + SessionOptions& DisableCpuMemArena(); + + SessionOptions& EnableMemPattern(); + SessionOptions& DisableMemPattern(); + + SessionOptions& EnableSequentialExecution(); + SessionOptions& DisableSequentialExecution(); + + SessionOptions& SetLogId(const char* logid); + + SessionOptions& Add(OrtCustomOpDomain* custom_op_domain); +}; + +struct Session : Base { + explicit Session(nullptr_t) {} + Session(Env& env, const ORTCHAR_T* model_path, const SessionOptions& options); + + std::vector Run(RunOptions& run_options, const char* const* input_names, Value* input_values, size_t input_count, + const char* const* output_names, size_t output_names_count); + + size_t GetInputCount() const; + size_t GetOutputCount() const; + + char* GetInputName(size_t index, OrtAllocator* allocator) const; + char* GetOutputName(size_t index, OrtAllocator* allocator) const; + + TypeInfo GetInputTypeInfo(size_t index) const; + TypeInfo GetOutputTypeInfo(size_t index) const; +}; + +struct TensorTypeAndShapeInfo : Base { + explicit TensorTypeAndShapeInfo(nullptr_t) {} + explicit TensorTypeAndShapeInfo(OrtTensorTypeAndShapeInfo* p) : Base{p} {} + + ONNXTensorElementDataType GetElementType() const; + + size_t GetDimensionsCount() const; + void GetDimensions(int64_t* values, size_t values_count) const; + std::vector GetShape() const; +}; + +struct TypeInfo : Base { + explicit TypeInfo(nullptr_t) {} + explicit TypeInfo(OrtTypeInfo* p) : Base{p} {} + + Unowned GetTensorTypeAndShapeInfo() const; +}; + +struct Value : Base { + static Value CreateTensor(const AllocatorInfo& info, void* p_data, size_t p_data_len, const int64_t* shape, size_t shape_len, + ONNXTensorElementDataType type); + static Value CreateMap(Value& keys, Value& values); + static Value CreateSequence(std::vector& values); + + explicit Value(nullptr_t) {} + explicit Value(OrtValue* p) : Base{p} {} + + bool IsTensor() const; + size_t GetCount() const; // If a non tensor, returns 2 for map and N for sequence, where N is the number of elements + Value GetValue(int index, OrtAllocator* allocator) const; + + size_t GetStringTensorDataLength() const; + void GetStringTensorContent(void* buffer, size_t buffer_length, size_t* offsets, size_t offsets_count) const; + + template + T* GetTensorMutableData(); + + TensorTypeAndShapeInfo GetTensorTypeAndShapeInfo() const; +}; + +struct Allocator : Base { + static Allocator CreateDefault(); + + explicit Allocator(nullptr_t) {} + explicit Allocator(OrtAllocator* p) : Base{p} {} + + void* Alloc(size_t size); + void Free(void* p); + + const OrtAllocatorInfo* GetInfo() const; +}; + 
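Taken together, the wrappers above let the load-and-run path be written without manual status checks or handle releases, with errors surfacing as Ort::Exception. A hedged end-to-end sketch; the model path, tensor shape and the input/output names are invented for illustration, and the narrow string path assumes a build where ORTCHAR_T is char:

```
#include <cstdint>
#include <vector>
#include "onnxruntime_cxx_api.h"

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "demo");
  Ort::SessionOptions options;
  options.SetThreadPoolSize(1).SetGraphOptimizationLevel(1);
  Ort::Session session(env, "model.onnx", options);

  std::vector<float> data(3 * 224 * 224, 0.f);
  std::vector<int64_t> shape{1, 3, 224, 224};
  Ort::AllocatorInfo info = Ort::AllocatorInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
  Ort::Value input = Ort::Value::CreateTensor(info, data.data(), data.size() * sizeof(float),
                                              shape.data(), shape.size(),
                                              ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT);

  const char* input_names[] = {"input"};
  const char* output_names[] = {"output"};
  Ort::RunOptions run_options;
  std::vector<Ort::Value> outputs =
      session.Run(run_options, input_names, &input, 1, output_names, 1);
  float* result = outputs[0].GetTensorMutableData<float>();
  (void)result;
  return 0;
}
```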
+struct AllocatorInfo : Base { + static AllocatorInfo CreateCpu(OrtAllocatorType type, OrtMemType mem_type1); + + explicit AllocatorInfo(nullptr_t) {} + AllocatorInfo(const char* name, OrtAllocatorType type, int id, OrtMemType mem_type); + + explicit AllocatorInfo(OrtAllocatorInfo* p) : Base{p} {} +}; + +} // namespace Ort + +namespace Ort { + +inline Allocator Allocator::CreateDefault() { + OrtAllocator* p; + ORT_THROW_ON_ERROR(OrtCreateDefaultAllocator(&p)); + return Allocator(p); +} + +inline void* Allocator::Alloc(size_t size) { + return OrtAllocatorAlloc(p_, size); +} + +inline void Allocator::Free(void* p) { + OrtAllocatorFree(p_, p); +} + +inline const OrtAllocatorInfo* Allocator::GetInfo() const { + return OrtAllocatorGetInfo(p_); +} + +inline AllocatorInfo AllocatorInfo::CreateCpu(OrtAllocatorType type, OrtMemType mem_type) { + OrtAllocatorInfo* p; + ORT_THROW_ON_ERROR(OrtCreateCpuAllocatorInfo(type, mem_type, &p)); + return AllocatorInfo(p); +} + +inline AllocatorInfo::AllocatorInfo(const char* name, OrtAllocatorType type, int id, OrtMemType mem_type) { + ORT_THROW_ON_ERROR(OrtCreateAllocatorInfo(name, type, id, mem_type, &p_)); +} + +inline Env::Env(OrtLoggingLevel default_warning_level, _In_ const char* logid) { + ORT_THROW_ON_ERROR(OrtCreateEnv(default_warning_level, logid, &p_)); +} + +inline CustomOpDomain::CustomOpDomain(const char* domain) + : Base{OrtCreateCustomOpDomain(domain)} { +} + +inline void CustomOpDomain::Add(OrtCustomOp* op) { + ORT_THROW_ON_ERROR(OrtCustomOpDomain_Add(p_, op)); +} + +inline RunOptions::RunOptions() : Base{OrtCreateRunOptions()} {} + +inline RunOptions& RunOptions::SetRunLogVerbosityLevel(unsigned int level) { + ORT_THROW_ON_ERROR(OrtRunOptionsSetRunLogVerbosityLevel(p_, level)); + return *this; +} + +inline unsigned int RunOptions::GetRunLogVerbosityLevel() const { + return OrtRunOptionsGetRunLogVerbosityLevel(p_); +} + +inline RunOptions& RunOptions::SetRunTag(const char* run_tag) { + ORT_THROW_ON_ERROR(OrtRunOptionsSetRunTag(p_, run_tag)); + return *this; +} + +inline const char* RunOptions::GetRunTag() const { + return OrtRunOptionsGetRunTag(p_); +} + +inline RunOptions& RunOptions::SetTerminate(bool flag) { + OrtRunOptionsSetTerminate(p_, flag ? 
1 : 0); + return *this; +} + +inline SessionOptions::SessionOptions() : Base{OrtCreateSessionOptions()} { +} + +inline SessionOptions SessionOptions::Clone() const { + return SessionOptions{OrtCloneSessionOptions(p_)}; +} + +inline SessionOptions& SessionOptions::SetThreadPoolSize(int session_thread_pool_size) { + if (OrtSetSessionThreadPoolSize(p_, session_thread_pool_size) == -1) + throw Exception("Error calling SessionOptions::SetThreadPoolSize", ORT_FAIL); + return *this; +} + +inline SessionOptions& SessionOptions::SetGraphOptimizationLevel(uint32_t graph_optimization_level) { + if (OrtSetSessionGraphOptimizationLevel(p_, graph_optimization_level) == -1) + throw Exception("Error calling SessionOptions::SetGraphOptimizationLevel", ORT_FAIL); + return *this; +} + +inline SessionOptions& SessionOptions::EnableMemPattern() { + OrtEnableMemPattern(p_); + return *this; +} + +inline SessionOptions& SessionOptions::DisableMemPattern() { + OrtDisableMemPattern(p_); + return *this; +} + +inline SessionOptions& SessionOptions::EnableCpuMemArena() { + OrtEnableCpuMemArena(p_); + return *this; +} + +inline SessionOptions& SessionOptions::DisableCpuMemArena() { + OrtDisableCpuMemArena(p_); + return *this; +} + +inline SessionOptions& SessionOptions::EnableSequentialExecution() { + OrtEnableSequentialExecution(p_); + return *this; +} + +inline SessionOptions& SessionOptions::DisableSequentialExecution() { + OrtDisableSequentialExecution(p_); + return *this; +} + +inline SessionOptions& SessionOptions::SetLogId(const char* logid) { + OrtSetSessionLogId(p_, logid); + return *this; +} +inline SessionOptions& SessionOptions::Add(OrtCustomOpDomain* custom_op_domain) { + ORT_THROW_ON_ERROR(OrtAddCustomOpDomain(p_, custom_op_domain)); + return *this; +} + +inline Session::Session(Env& env, const ORTCHAR_T* model_path, const SessionOptions& options) { + ORT_THROW_ON_ERROR(OrtCreateSession(env, model_path, options, &p_)); +} + +inline std::vector Session::Run(RunOptions& run_options, const char* const* input_names, Value* input_values, size_t input_count, + const char* const* output_names, size_t output_names_count) { + std::vector ort_input_values(input_values, input_values + input_count); + std::vector ort_out(output_names_count); + ORT_THROW_ON_ERROR(OrtRun(p_, run_options, input_names, ort_input_values.data(), ort_input_values.size(), output_names, output_names_count, ort_out.data())); + std::vector out(ort_out.begin(), ort_out.end()); + return out; +} + +inline size_t Session::GetInputCount() const { + size_t out; + ORT_THROW_ON_ERROR(OrtSessionGetInputCount(p_, &out)); + return out; +} + +inline size_t Session::GetOutputCount() const { + size_t out; + ORT_THROW_ON_ERROR(OrtSessionGetOutputCount(p_, &out)); + return out; +} + +inline char* Session::GetInputName(size_t index, OrtAllocator* allocator) const { + char* out; + ORT_THROW_ON_ERROR(OrtSessionGetInputName(p_, index, allocator, &out)); + return out; +} + +inline char* Session::GetOutputName(size_t index, OrtAllocator* allocator) const { + char* out; + ORT_THROW_ON_ERROR(OrtSessionGetOutputName(p_, index, allocator, &out)); + return out; +} + +inline TypeInfo Session::GetInputTypeInfo(size_t index) const { + OrtTypeInfo* out; + ORT_THROW_ON_ERROR(OrtSessionGetInputTypeInfo(p_, index, &out)); + return TypeInfo{out}; +} + +inline TypeInfo Session::GetOutputTypeInfo(size_t index) const { + OrtTypeInfo* out; + ORT_THROW_ON_ERROR(OrtSessionGetOutputTypeInfo(p_, index, &out)); + return TypeInfo{out}; +} + +inline ONNXTensorElementDataType 
TensorTypeAndShapeInfo::GetElementType() const { + return OrtGetTensorElementType(p_); +} + +inline size_t TensorTypeAndShapeInfo::GetDimensionsCount() const { + return OrtGetDimensionsCount(p_); +} + +inline void TensorTypeAndShapeInfo::GetDimensions(int64_t* values, size_t values_count) const { + OrtGetDimensions(p_, values, values_count); +} + +inline std::vector TensorTypeAndShapeInfo::GetShape() const { + std::vector out(GetDimensionsCount(), 0); + GetDimensions(out.data(), out.size()); + return out; +} + +inline Unowned TypeInfo::GetTensorTypeAndShapeInfo() const { + return Unowned{const_cast(OrtCastTypeInfoToTensorInfo(p_))}; +} + +inline Value Value::CreateTensor(const AllocatorInfo& info, void* p_data, size_t p_data_len, const int64_t* shape, size_t shape_len, + ONNXTensorElementDataType type) { + OrtValue* out; + ORT_THROW_ON_ERROR(OrtCreateTensorWithDataAsOrtValue(info, p_data, p_data_len, shape, shape_len, type, &out)); + return Value{out}; +} + +inline Value Value::CreateMap(Value& keys, Value& values) { + OrtValue* out; + OrtValue* inputs[2] = {keys, values}; + ORT_THROW_ON_ERROR(OrtCreateValue(inputs, 2, ONNX_TYPE_MAP, &out)); + return Value{out}; +} + +inline Value Value::CreateSequence(std::vector& values) { + OrtValue* out; + std::vector values_ort{values.data(), values.data() + values.size()}; + ORT_THROW_ON_ERROR(OrtCreateValue(values_ort.data(), values_ort.size(), ONNX_TYPE_SEQUENCE, &out)); + return Value{out}; +} // namespace Ort + +inline bool Value::IsTensor() const { + return OrtIsTensor(p_) != 0; +} + +inline size_t Value::GetCount() const { + size_t out; + ORT_THROW_ON_ERROR(OrtGetValueCount(p_, &out)); + return out; +} + +inline Value Value::GetValue(int index, OrtAllocator* allocator) const { + OrtValue* out; + ORT_THROW_ON_ERROR(OrtGetValue(p_, index, allocator, &out)); + return Value{out}; +} + +inline size_t Value::GetStringTensorDataLength() const { + size_t out; + ORT_THROW_ON_ERROR(OrtGetStringTensorDataLength(p_, &out)); + return out; +} + +inline void Value::GetStringTensorContent(void* buffer, size_t buffer_length, size_t* offsets, size_t offsets_count) const { + ORT_THROW_ON_ERROR(OrtGetStringTensorContent(p_, buffer, buffer_length, offsets, offsets_count)); +} + +template +T* Value::GetTensorMutableData() { + T* out; + ORT_THROW_ON_ERROR(OrtGetTensorMutableData(p_, (void**)&out)); + return out; +} + +inline TensorTypeAndShapeInfo Value::GetTensorTypeAndShapeInfo() const { + OrtTensorTypeAndShapeInfo* output; + ORT_THROW_ON_ERROR(OrtGetTensorTypeAndShape(p_, &output)); + return TensorTypeAndShapeInfo{output}; +} + +} // namespace Ort + +// Deprecated: Will be removed once all dependencies of it are removed +#if 1 namespace onnxruntime { + class SessionOptionsWrapper { private: std::unique_ptr value; @@ -115,20 +542,14 @@ class SessionOptionsWrapper { OrtSessionOptions* p = OrtCloneSessionOptions(value.get()); return SessionOptionsWrapper(env_, p); } -#ifdef _WIN32 - OrtSession* OrtCreateSession(_In_ const wchar_t* model_path) { - OrtSession* ret = nullptr; - ORT_THROW_ON_ERROR(::OrtCreateSession(env_, model_path, value.get(), &ret)); - return ret; - } -#else - OrtSession* OrtCreateSession(_In_ const char* model_path) { + + OrtSession* OrtCreateSession(_In_ const ORTCHAR_T* model_path) { OrtSession* ret = nullptr; ORT_THROW_ON_ERROR(::OrtCreateSession(env_, model_path, value.get(), &ret)); return ret; } -#endif }; + inline OrtValue* OrtCreateTensorAsOrtValue(_Inout_ OrtAllocator* env, const std::vector& shape, ONNXTensorElementDataType type) { 
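  // NOTE: this helper sits in the block marked deprecated above; the Ort::Value wrappers
  // (e.g. Ort::Value::CreateTensor) are the intended replacement.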
OrtValue* ret; ORT_THROW_ON_ERROR(::OrtCreateTensorAsOrtValue(env, shape.data(), shape.size(), type, &ret)); @@ -142,21 +563,25 @@ inline OrtValue* OrtCreateTensorWithDataAsOrtValue(_In_ const OrtAllocatorInfo* } inline std::vector GetTensorShape(const OrtTensorTypeAndShapeInfo* info) { - size_t dims = OrtGetNumOfDimensions(info); + size_t dims = OrtGetDimensionsCount(info); std::vector ret(dims); OrtGetDimensions(info, ret.data(), ret.size()); return ret; } +} // namespace onnxruntime +#endif + +namespace Ort { struct CustomOpApi { CustomOpApi(const OrtCustomOpApi& api) : api_(api) {} template T KernelInfoGetAttribute(_In_ const OrtKernelInfo* info, _In_ const char* name); - OrtTensorTypeAndShapeInfo* GetTensorShapeAndType(_In_ const OrtValue* value) { + OrtTensorTypeAndShapeInfo* GetTensorTypeAndShape(_In_ const OrtValue* value) { OrtTensorTypeAndShapeInfo* out; - ORT_THROW_ON_ERROR(api_.GetTensorShapeAndType(value, &out)); + ORT_THROW_ON_ERROR(api_.GetTensorTypeAndShape(value, &out)); return out; } @@ -164,6 +589,10 @@ struct CustomOpApi { return api_.GetTensorShapeElementCount(info); } + ONNXTensorElementDataType GetTensorElementType(const OrtTensorTypeAndShapeInfo* info) { + return api_.GetTensorElementType(info); + } + size_t GetDimensionCount(_In_ const OrtTensorTypeAndShapeInfo* info) { return api_.GetDimensionCount(info); } @@ -183,13 +612,33 @@ struct CustomOpApi { return data; } + template + const T* GetTensorData(_Inout_ const OrtValue* value) { + return GetTensorMutableData(const_cast(value)); + } + + std::vector GetTensorShape(const OrtTensorTypeAndShapeInfo* info) { + std::vector output(GetDimensionCount(info)); + GetDimensions(info, output.data(), output.size()); + return output; + } + void ReleaseTensorTypeAndShapeInfo(OrtTensorTypeAndShapeInfo* input) { api_.ReleaseTensorTypeAndShapeInfo(input); } - OrtValue* KernelContext_GetInput(OrtKernelContext* context, _In_ size_t index) { + size_t KernelContext_GetInputCount(const OrtKernelContext* context) { + return api_.KernelContext_GetInputCount(context); + } + + const OrtValue* KernelContext_GetInput(const OrtKernelContext* context, _In_ size_t index) { return api_.KernelContext_GetInput(context, index); } + + size_t KernelContext_GetOutputCount(const OrtKernelContext* context) { + return api_.KernelContext_GetOutputCount(context); + } + OrtValue* KernelContext_GetOutput(OrtKernelContext* context, _In_ size_t index, _In_ const int64_t* dim_values, size_t dim_count) { return api_.KernelContext_GetOutput(context, index, dim_values, dim_count); } @@ -225,12 +674,11 @@ struct CustomOpBase : OrtCustomOp { OrtCustomOp::GetOutputTypeCount = [](OrtCustomOp* this_) { return static_cast(this_)->GetOutputTypeCount(); }; OrtCustomOp::GetOutputType = [](OrtCustomOp* this_, size_t index) { return static_cast(this_)->GetOutputType(index); }; - OrtCustomOp::KernelGetOutputShape = [](void* op_kernel, OrtKernelContext* context, size_t output_index, OrtTensorTypeAndShapeInfo* output) { static_cast(op_kernel)->GetOutputShape(context, output_index, output); }; OrtCustomOp::KernelCompute = [](void* op_kernel, OrtKernelContext* context) { static_cast(op_kernel)->Compute(context); }; OrtCustomOp::KernelDestroy = [](void* op_kernel) { delete static_cast(op_kernel); }; } }; -} // namespace onnxruntime +} // namespace Ort #undef ORT_REDIRECT_SIMPLE_FUNCTION_CALL diff --git a/onnxruntime/contrib_ops/contrib_kernels.cc b/onnxruntime/contrib_ops/contrib_kernels.cc index c38f9d72f0d52..f190c3344c002 100644 --- a/onnxruntime/contrib_ops/contrib_kernels.cc 
+++ b/onnxruntime/contrib_ops/contrib_kernels.cc @@ -18,6 +18,8 @@ class ONNX_OPERATOR_KERNEL_CLASS_NAME(kCpuExecutionProvider, kMSDomain, 1, WordC class ONNX_OPERATOR_KERNEL_CLASS_NAME(kCpuExecutionProvider, kMSDomain, 1, GatherND); class ONNX_OPERATOR_KERNEL_CLASS_NAME(kCpuExecutionProvider, kMSDomain, 1, MurmurHash3); class ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kMSDomain, 1, float, MaxpoolWithMask); +class ONNX_OPERATOR_KERNEL_CLASS_NAME(kCpuExecutionProvider, kMSDomain, 1, Pad); +class ONNX_OPERATOR_KERNEL_CLASS_NAME(kCpuExecutionProvider, kMSDomain, 1, Unique); // This section includes all opkernel declarations for former experimental ops which have now been removed from onnx. // To maintain backward compatibility these are added as contrib ops. @@ -47,44 +49,44 @@ class ONNX_OPERATOR_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 1, Sca void RegisterContribKernels(KernelRegistry& kernel_registry) { static const BuildKernelCreateInfoFn function_table[] = { - BuildKernelCreateInfo, + BuildKernelCreateInfo, - // add more kernels here - BuildKernelCreateInfo, - BuildKernelCreateInfo, - BuildKernelCreateInfo, - BuildKernelCreateInfo, - BuildKernelCreateInfo, - BuildKernelCreateInfo, - BuildKernelCreateInfo, - BuildKernelCreateInfo, - BuildKernelCreateInfo, - BuildKernelCreateInfo, - - // These ops were experimental ops in onnx domain which have been removed now. We add them here as - // contrib ops to main backward compatibility - BuildKernelCreateInfo, - BuildKernelCreateInfo, - BuildKernelCreateInfo, - BuildKernelCreateInfo, - BuildKernelCreateInfo, - BuildKernelCreateInfo, - BuildKernelCreateInfo, - BuildKernelCreateInfo, - BuildKernelCreateInfo, - BuildKernelCreateInfo, - BuildKernelCreateInfo, - BuildKernelCreateInfo, - BuildKernelCreateInfo, - BuildKernelCreateInfo, - BuildKernelCreateInfo, - BuildKernelCreateInfo, - BuildKernelCreateInfo, - BuildKernelCreateInfo, - BuildKernelCreateInfo, - BuildKernelCreateInfo, - BuildKernelCreateInfo, - }; + // add more kernels here + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + // These ops were experimental ops in onnx domain which have been removed now. 
We add them here as + // contrib ops to main backward compatibility + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo}; for (auto& function_table_entry : function_table) { kernel_registry.Register(function_table_entry()); diff --git a/onnxruntime/core/codegen/tvm/tvm_compiler.cc b/onnxruntime/core/codegen/tvm/tvm_compiler.cc index fba99dfeda0a8..a3ae548f70363 100644 --- a/onnxruntime/core/codegen/tvm/tvm_compiler.cc +++ b/onnxruntime/core/codegen/tvm/tvm_compiler.cc @@ -25,13 +25,13 @@ TVMGraph::TensorDescriptor::TensorDescriptor(MLDataType type, onnxruntime::Provi class IdGenerator { public: - IdGenerator() : cur_(0) {} + IdGenerator() {} int GetNext() { return cur_++; } private: - int cur_; + int cur_{0}; }; // This is a special compiler step for the test case that sum two 1-D tensors @@ -40,7 +40,8 @@ static void Compile1DAddToTVM(const onnxruntime::Node& node, std::unordered_map< tvm::Array shape; shape.push_back(tvm::var("n1")); - tvm::Tensor t1, t2; + tvm::Tensor t1; + tvm::Tensor t2; auto it = tvm_tensors.find(node.InputDefs()[0]->Name()); if (it == tvm_tensors.end()) { tvm_tensors[node.InputDefs()[0]->Name()] = TVMGraph::TensorDescriptor( diff --git a/onnxruntime/core/codegen/tvm/tvm_compiler.h b/onnxruntime/core/codegen/tvm/tvm_compiler.h index 624d5d260aa96..e4eed0dc80d94 100644 --- a/onnxruntime/core/codegen/tvm/tvm_compiler.h +++ b/onnxruntime/core/codegen/tvm/tvm_compiler.h @@ -22,7 +22,7 @@ struct TVMGraph { public: TensorDescriptor(MLDataType type, onnxruntime::ProviderType execution_provider_type, tvm::Tensor tvm_tensor); - TensorDescriptor() {} + TensorDescriptor() = default; }; std::vector inputs_; std::vector outputs_; diff --git a/onnxruntime/core/common/threadpool.cc b/onnxruntime/core/common/threadpool.cc index 875b90c4ff94f..07305a41d0645 100644 --- a/onnxruntime/core/common/threadpool.cc +++ b/onnxruntime/core/common/threadpool.cc @@ -78,32 +78,34 @@ class ThreadPool::Impl : public Eigen::ThreadPool { void ParallelFor(int32_t total, std::function fn) { // TODO: Eigen supports a more efficient ThreadPoolDevice mechanism // We will simply rely on the work queue and stealing in the short term. - Barrier barrier(static_cast(total)); + Barrier barrier(static_cast(total - 1)); std::function handle_iteration = [&barrier, &fn](int iteration) { fn(iteration); barrier.Notify(); }; - for (int32_t id = 0; id < total; ++id) { + for (int32_t id = 1; id < total; ++id) { Schedule([=, &handle_iteration]() { handle_iteration(id); }); } + fn(0); barrier.Wait(); } void ParallelForRange(int64_t first, int64_t last, std::function fn) { // TODO: Eigen supports a more efficient ThreadPoolDevice mechanism // We will simply rely on the work queue and stealing in the short term. 
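      // Note on the change below (the same pattern is applied in ParallelFor above): the calling
      // thread now runs the first iteration directly via fn() instead of scheduling it, so only the
      // iterations handed to the pool notify the barrier, and the barrier is sized accordingly
      // (total - 1 there, last - first here).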
- Barrier barrier(static_cast(last - first + 1)); + Barrier barrier(static_cast(last - first)); std::function handle_range = [&barrier, &fn](int64_t first, int64_t last) { fn(first, last); barrier.Notify(); }; - for (int64_t id = first; id <= last; ++id) { + for (int64_t id = first + 1; id <= last; ++id) { Schedule([=, &handle_range]() { handle_range(id, id + 1); }); } + fn(first, first + 1); barrier.Wait(); } }; @@ -127,15 +129,16 @@ class ThreadPool::Impl : public TaskThreadPool { fn(id); } #else - Barrier barrier(static_cast(total)); + Barrier barrier(static_cast(total - 1)); std::function handle_iteration = [&barrier, &fn](int iteration) { fn(iteration); barrier.Notify(); }; - for (int32_t id = 0; id < total; ++id) { + for (int32_t id = 1; id < total; ++id) { std::packaged_task task(std::bind(handle_iteration, id)); RunTask(std::move(task)); } + fn(0); barrier.Wait(); #endif } @@ -147,15 +150,16 @@ class ThreadPool::Impl : public TaskThreadPool { fn(id, id + 1); } #else - Barrier barrier(static_cast(last - first + 1)); + Barrier barrier(static_cast(last - first)); std::function handle_iteration = [&barrier, &fn](int64_t first, int64_t last) { fn(first, last); barrier.Notify(); }; - for (int64_t id = first; id < last; ++id) { + for (int64_t id = first + 1; id < last; ++id) { std::packaged_task task(std::bind(handle_iteration, id, id + 1)); RunTask(std::move(task)); } + fn(first, first + 1); barrier.Wait(); #endif } diff --git a/onnxruntime/core/framework/allocation_planner.cc b/onnxruntime/core/framework/allocation_planner.cc index 80f3813d363af..8528ce20f1527 100644 --- a/onnxruntime/core/framework/allocation_planner.cc +++ b/onnxruntime/core/framework/allocation_planner.cc @@ -100,14 +100,10 @@ std::ostream& operator<<(std::ostream& out, std::pair& outer_scope_node_args, - const ExecutionProviders& providers, - const KernelRegistryManager& kernel_registry, - const MLValueNameIdxMap& mlvalue_name_idx_map, - const ISequentialPlannerContext& context, - SequentialExecutionPlan& plan) + PlannerImpl(const Node* parent_node, const onnxruntime::GraphViewer& graph_viewer, + const std::vector& outer_scope_node_args, const ExecutionProviders& providers, + const KernelRegistryManager& kernel_registry, const MLValueNameIdxMap& ort_value_name_idx_map, + const ISequentialPlannerContext& context, SequentialExecutionPlan& plan) : context_{context}, plan_{plan}, parent_node_{parent_node}, @@ -115,8 +111,7 @@ class PlannerImpl { outer_scope_node_args_{outer_scope_node_args}, execution_providers_{providers}, kernel_registry_{kernel_registry}, - mlvalue_name_idx_map_{mlvalue_name_idx_map} { - } + ort_value_name_idx_map_{ort_value_name_idx_map} {} Status CreatePlan(); @@ -130,9 +125,9 @@ class PlannerImpl { const ExecutionProviders& execution_providers_; const KernelRegistryManager& kernel_registry_; - const MLValueNameIdxMap& mlvalue_name_idx_map_; + const MLValueNameIdxMap& ort_value_name_idx_map_; - // MLValueInfo: Auxiliary information about an MLValue used only during plan-generation: + // MLValueInfo: Auxiliary information about an OrtValue used only during plan-generation: struct MLValueInfo { const onnxruntime::NodeArg* p_def_site; // the (unique) NodeArg corresponding to the MLValue int usecount = 0; // static reference-count @@ -149,7 +144,8 @@ class PlannerImpl { // deallocate_point is an index into the execution-plan; thus, ml_value becomes free after // this step in the execution-plan is completed. 
size_t deallocate_point; - FreeBufferInfo(MLValueIndex mlvalue, size_t dealloc_point) : ml_value(mlvalue), deallocate_point(dealloc_point) {} + FreeBufferInfo(MLValueIndex ort_value, size_t dealloc_point) + : ml_value(ort_value), deallocate_point(dealloc_point) {} }; // freelist_ : a list of ml-values whose buffers are free to be reused, sorted by when // they became free (more recently freed earlier in the list). @@ -157,18 +153,25 @@ class PlannerImpl { MLValueIndex Index(const MLValueName& name) { MLValueIndex result; - auto status = mlvalue_name_idx_map_.GetIdx(name, result); + auto status = ort_value_name_idx_map_.GetIdx(name, result); ORT_ENFORCE(status.IsOK(), status.ErrorMessage()); return result; } - int& UseCount(MLValueIndex n) { return ml_value_info_.at(n).usecount; } + int& UseCount(MLValueIndex n) { + ORT_ENFORCE(n >= 0 && static_cast(n) < ml_value_info_.size()); + return ml_value_info_[n].usecount; + } int& UseCount(const MLValueName& name) { return UseCount(Index(name)); } - MLValueIndex& Buffer(MLValueIndex n) { return ml_value_info_.at(n).reused_buffer_index; } + MLValueIndex& Buffer(MLValueIndex n) { + ORT_ENFORCE(n >= 0 && static_cast(n) < ml_value_info_.size()); + return ml_value_info_[n].reused_buffer_index; + } AllocPlanPerValue& AllocPlan(MLValueIndex n) { - return plan_.allocation_plan.at(n); + ORT_ENFORCE(n >= 0 && static_cast(n) < plan_.allocation_plan.size()); + return plan_.allocation_plan[static_cast(n)]; } AllocPlanPerValue& AllocPlan(const MLValueName& name) { @@ -177,13 +180,14 @@ class PlannerImpl { // Initialize state for a given ml-value at its definition site: void ProcessDef(MLValueIndex id, const onnxruntime::NodeArg* p_def_site) { - MLValueInfo& info = ml_value_info_.at(id); + ORT_ENFORCE(id >= 0 && static_cast(id) < ml_value_info_.size()); + MLValueInfo& info = ml_value_info_[id]; info.usecount = 0; info.reused_buffer_index = id; // initially, no reuse; the ml-value uses its own buffer info.p_def_site = p_def_site; } - // Reuse/Alias/Share between two MLValue indexes + // Reuse/Alias/Share between two OrtValue indexes void Reuse(MLValueIndex reused, MLValueIndex reused_for, AllocKind alloc_kind) { ORT_ENFORCE(reused != reused_for); // find original buffer underlying ml-value we want to reuse: @@ -209,7 +213,7 @@ class PlannerImpl { } const std::vector>& alias_map = ci->kernel_def->Alias(); - auto& input_args = node.InputDefs(); + auto input_args = node.InputDefs(); for (auto pair : alias_map) { if (pair.second == output_arg_num) { // we _must_ reuse this input to satisfy aliasing requirement: (e.g., for reshape) @@ -245,7 +249,7 @@ class PlannerImpl { return false; } - bool SameShape(const TensorShapeProto& shape1, const TensorShapeProto& shape2) { + static bool SameShape(const TensorShapeProto& shape1, const TensorShapeProto& shape2) { // TODO: This should probably be defined to be the equality operator on TensorShapeProto. int rank1 = shape1.dim_size(); if (shape2.dim_size() != rank1) return false; @@ -263,7 +267,7 @@ class PlannerImpl { /*! \brief Given a tensor-type, return the size of an element of the tensor. 
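      The DataType is resolved to a TypeProto, mapped to its MLDataType, and the primitive element type's Size() is returned.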
*/ - size_t GetElementSize(const DataType& tensor_type) { + static size_t GetElementSize(const DataType& tensor_type) { const TypeProto& type_proto = ONNX_NAMESPACE::Utils::DataTypeUtils::ToTypeProto(tensor_type); MLDataType ml_data_type = DataTypeImpl::TypeFromProto(type_proto); const TensorTypeBase* tensor_type_base = ml_data_type->AsTensorType(); @@ -272,8 +276,8 @@ class PlannerImpl { return elt_type->Size(); } - bool SameSize(const TensorShapeProto& shape1, const DataType& ptype1, - const TensorShapeProto& shape2, const DataType& ptype2) { + static bool SameSize(const TensorShapeProto& shape1, const DataType& ptype1, const TensorShapeProto& shape2, + const DataType& ptype2) { return (GetElementSize(ptype1) == GetElementSize(ptype2)) && SameShape(shape1, shape2); /* TODO: we can improve this if the concrete shapes are known for both as below. @@ -307,8 +311,8 @@ class PlannerImpl { auto& required_allocator_info = AllocPlan(output_arg.Name()).location; for (auto it = freelist_.begin(); it != freelist_.end(); ++it) { - auto reusable = it->ml_value; - auto p_node_arg = ml_value_info_.at(reusable).p_def_site; + size_t reusable = static_cast(it->ml_value); + const onnxruntime::NodeArg* p_node_arg = ml_value_info_.at(reusable).p_def_site; auto& available_allocator_info = AllocPlan(p_node_arg->Name()).location; if (!(available_allocator_info == required_allocator_info)) continue; auto p_available_buffer_shape = context_.GetShape(*p_node_arg); @@ -316,7 +320,7 @@ class PlannerImpl { auto available_buffer_type = p_node_arg->Type(); if (SameSize(*p_available_buffer_shape, available_buffer_type, *p_required_buffer_shape, required_buffer_type)) { - *reusable_tensor = reusable; + *reusable_tensor = it->ml_value; freelist_.erase(it); return true; } @@ -393,25 +397,22 @@ class PlannerImpl { } auto& default_allocator_info = exec_provider->GetAllocator(0, OrtMemTypeDefault)->Info(); - auto& outputs = pnode->OutputDefs(); + auto outputs = pnode->OutputDefs(); auto num_outputs = outputs.size(); for (size_t i = 0; i < num_outputs; ++i) { auto* node_output = outputs[i]; - if (node_output->Exists()) { - MLValueIndex index = Index(node_output->Name()); - ProcessDef(index, node_output); - ++UseCount(index); - if (strcmp(default_allocator_info.name, CPU) != 0) { - // By default, outputs of this node are allocated on the default device allocator, - // except for outputs marked for allocation in MemoryType: - auto memory_type = p_kernelDef->OutputMemoryType(i); - if (memory_type == OrtMemTypeDefault) { - AllocPlan(index).location = default_allocator_info; - } else { - AllocPlan(index).location = exec_provider->GetAllocator(0, memory_type)->Info(); - } - } + if (!node_output->Exists()) continue; + MLValueIndex index = Index(node_output->Name()); + ProcessDef(index, node_output); + ++UseCount(index); + if (strcmp(default_allocator_info.name, CPU) != 0) { + // By default, outputs of this node are allocated on the default device allocator, + // except for outputs marked for allocation in MemoryType: + auto memory_type = p_kernelDef->OutputMemoryType(i); + plan_.SetLocation(static_cast(index), memory_type == OrtMemTypeDefault + ? 
default_allocator_info + : exec_provider->GetAllocator(0, memory_type)->Info()); } } // if sync is needed, mark allocation plan as create_fence_if_async=true @@ -432,40 +433,54 @@ class PlannerImpl { return Status::OK(); } - // TODO: Don't generate plan for CPU tensors, which may get its memory from 'mmap(2)' + OrtAllocatorInfo GetLocationForNodeInput(size_t input_index, const Node& node) { + auto* p_provider = execution_providers_.Get(node); + ORT_ENFORCE(p_provider); + + const KernelCreateInfo* kernel_create_info; + auto st = kernel_registry_.SearchKernelRegistry(node, &kernel_create_info); + ORT_ENFORCE(st.IsOK(), st.ErrorMessage()); + ORT_ENFORCE(kernel_create_info != nullptr && kernel_create_info->kernel_def != nullptr); + if (kernel_create_info->kernel_def->IsInputOnCpu(input_index)) + // weights are not output from any node, so it's OK to put its location on CPU provider + return execution_providers_.GetDefaultCpuAllocatorInfo(); + return p_provider->GetAllocator(0, OrtMemTypeDefault)->Info(); + } + Status GeneratePlanForWeights() { auto& weights = graph_viewer_.GetAllInitializedTensors(); - + std::vector> locations(plan_.allocation_plan.size()); for (auto& node : graph_viewer_.Nodes()) { - ORT_RETURN_IF_ERROR(onnxruntime::Node::ForEachWithIndex(node.InputDefs(), [this, &node, &weights]( - const onnxruntime::NodeArg& def, - size_t index) { - auto& def_name = def.Name(); - if (!weights.count(def_name)) return Status::OK(); - - auto wt_index = Index(def_name); - AllocPlanPerValue& thisplan = AllocPlan(wt_index); - auto* p_provider = execution_providers_.Get(node); - ORT_ENFORCE(p_provider); - - thisplan.alloc_kind = AllocKind::kAllocateStatically; - const KernelCreateInfo* kernel_create_info; - ORT_RETURN_IF_ERROR(kernel_registry_.SearchKernelRegistry(node, &kernel_create_info)); - if (kernel_create_info == nullptr || kernel_create_info->kernel_def == nullptr) - return Status(ONNXRUNTIME, FAIL, "search kernel failed"); // shouldn't reach here - if (MemTypeOnCpuExplicitly(kernel_create_info->kernel_def->InputMemoryType(index))) - // weights are not output from any node, so it's OK to put its location on CPU provider - thisplan.location = - execution_providers_.Get(onnxruntime::kCpuExecutionProvider)->GetAllocator(0, OrtMemTypeDefault)->Info(); - else - thisplan.location = p_provider->GetAllocator(0, OrtMemTypeDefault)->Info(); - - return Status::OK(); - })); + ORT_RETURN_IF_ERROR(onnxruntime::Node::ForEachWithIndex( + node.InputDefs(), [this, &locations, &node, &weights](const onnxruntime::NodeArg& def, size_t index) { + try { + auto& def_name = def.Name(); + if (!weights.count(def_name)) return Status::OK(); + auto wt_index = Index(def_name); + locations[wt_index].emplace_back(GetLocationForNodeInput(index, node)); + } catch (std::exception& ex) { + return ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, ex.what()); + } + return Status::OK(); + })); + } + for (size_t i = 0; i != locations.size(); ++i) { + const std::vector& loc = locations[i]; + if (loc.empty()) continue; + plan_.allocation_plan[i].alloc_kind = AllocKind::kAllocateStatically; + plan_.allocation_plan[i].location = loc[0]; + for (size_t j = 0; j != loc.size(); ++j) { + if (loc[j] != loc[0]) { + // set the location to CPU + plan_.allocation_plan[i].location = execution_providers_.GetDefaultCpuAllocatorInfo(); + break; + } + } } return Status::OK(); } + // Should only be used after ProcessDef() Status ComputeReusePlan() { std::vector& execution_plan{plan_.execution_plan}; @@ -490,6 +505,7 @@ class PlannerImpl { 
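// --- Illustrative aside (not part of the patch): the rewritten GeneratePlanForWeights above first
// collects, per initializer, the location every consuming kernel wants it in, then places the weight
// in that location only when all consumers agree, falling back to the default CPU allocator on a
// conflict. A minimal sketch of that decision, with a string standing in for OrtAllocatorInfo.
#include <string>
#include <vector>

namespace weight_placement_sketch {

using Location = std::string;  // stand-in for OrtAllocatorInfo

// Pick the allocation location for one weight from the locations requested by its consumers:
// the common location if they all agree, otherwise CPU so no consumer is forced onto the wrong device.
inline Location ChooseWeightLocation(const std::vector<Location>& requested,
                                     const Location& cpu_location) {
  if (requested.empty()) return cpu_location;
  for (const Location& loc : requested) {
    if (loc != requested.front()) return cpu_location;  // consumers disagree
  }
  return requested.front();
}

}  // namespace weight_placement_sketch
// --- end of aside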
setup_preexisting(outer_scope_node_arg); } + // set AllocationInfo for each weight ORT_RETURN_IF_ERROR(GeneratePlanForWeights()); for (size_t program_counter = 0; program_counter < execution_plan.size(); ++program_counter) { @@ -526,7 +542,7 @@ class PlannerImpl { } else if (FindReusableInput(*pnode, output_arg_num, &reused)) { // Reuse one of this node's input buffers as the output buffer (for in-place update) Reuse(reused, current, AllocKind::kReuse); - } else if (!context_.EnableParallelExecution() && FindReusableTensor(*node_output, &reused)) { + } else if (!context_.IsParallelExecutionEnabled() && FindReusableTensor(*node_output, &reused)) { // Reuse an available (dead) buffer for this output, this is only for sequential execution. Reuse(reused, current, AllocKind::kReuse); } else { @@ -597,7 +613,7 @@ class PlannerImpl { plan_.execution_plan[prev_dealloc_point].free_to_index = current - 1; } - bool IsNonTensor(const onnxruntime::NodeArg& nodearg) { + static bool IsNonTensor(const onnxruntime::NodeArg& nodearg) { // TODO: unclear why we should go through a string-representation of type auto ptype = nodearg.Type(); auto& type_proto = ONNX_NAMESPACE::Utils::DataTypeUtils::ToTypeProto(ptype); @@ -608,9 +624,9 @@ class PlannerImpl { Status PlannerImpl::CreatePlan() { auto& p_graph_nodes = graph_viewer_.GetNodesInTopologicalOrder(); - auto num_ml_values = mlvalue_name_idx_map_.MaxIdx() + 1; + int num_ml_values = ort_value_name_idx_map_.MaxIdx() + 1; - Initialize(p_graph_nodes.size(), num_ml_values); + Initialize(p_graph_nodes.size(), static_cast(num_ml_values)); // Determine execution order: we use the default topological sort order for now. We can later // explore more efficient orderings (from a memory usage perspective). @@ -630,19 +646,17 @@ Status PlannerImpl::CreatePlan() { return Status::OK(); } -Status SequentialPlanner::CreatePlan(const Node* parent_node, - const onnxruntime::GraphViewer& graph_viewer, +Status SequentialPlanner::CreatePlan(const Node* parent_node, const onnxruntime::GraphViewer& graph_viewer, const std::vector& outer_scope_node_args, - const ExecutionProviders& providers, - const KernelRegistryManager& kernel_registry, - const MLValueNameIdxMap& mlvalue_name_idx_map, + const ExecutionProviders& providers, const KernelRegistryManager& kernel_registry, + const MLValueNameIdxMap& ort_value_name_idx_map, const ISequentialPlannerContext& context, std::unique_ptr& plan) { // allocate/reset here so we know it's clean plan = std::make_unique(); - PlannerImpl planner(parent_node, graph_viewer, outer_scope_node_args, - providers, kernel_registry, mlvalue_name_idx_map, context, *plan); + PlannerImpl planner(parent_node, graph_viewer, outer_scope_node_args, providers, kernel_registry, + ort_value_name_idx_map, context, *plan); return planner.CreatePlan(); } diff --git a/onnxruntime/core/framework/allocation_planner.h b/onnxruntime/core/framework/allocation_planner.h index 9115c2f0bc13f..58d9561587b47 100644 --- a/onnxruntime/core/framework/allocation_planner.h +++ b/onnxruntime/core/framework/allocation_planner.h @@ -22,15 +22,13 @@ class MLValueNameIdxMap; class ISequentialPlannerContext { public: virtual const ONNX_NAMESPACE::TensorShapeProto* GetShape(const onnxruntime::NodeArg& arg) const = 0; - virtual bool EnableParallelExecution() const { return false; } + // If it returns true, planner won't reuse output tensors + // see PlannerImpl::ComputeReusePlan + virtual bool IsParallelExecutionEnabled() const { return false; } }; class SequentialPlannerContext : public 
ISequentialPlannerContext { public: - SequentialPlannerContext() - : m_enable_parallel_execution(false) { - } - SequentialPlannerContext(bool p_enable_parallel_execution) : m_enable_parallel_execution(p_enable_parallel_execution) { } @@ -39,39 +37,20 @@ class SequentialPlannerContext : public ISequentialPlannerContext { return arg.Shape(); } - bool EnableParallelExecution() const override { - return m_enable_parallel_execution; - } + bool IsParallelExecutionEnabled() const override { return m_enable_parallel_execution; } private: - bool m_enable_parallel_execution; + bool m_enable_parallel_execution{false}; }; class SequentialPlanner { public: // This API allows user to provide a custom planner context. - static Status CreatePlan(const Node* parent_node, - const onnxruntime::GraphViewer& graph, + static Status CreatePlan(const Node* parent_node, const onnxruntime::GraphViewer& graph, const std::vector& outer_scope_node_args, - const ExecutionProviders& providers, - const KernelRegistryManager& kernel_registry, - const MLValueNameIdxMap& mlvalue_name_idx_map, - const ISequentialPlannerContext& context, + const ExecutionProviders& providers, const KernelRegistryManager& kernel_registry, + const MLValueNameIdxMap& ort_value_name_idx_map, const ISequentialPlannerContext& context, std::unique_ptr& plan); - - // This uses a standard planner context and is meant to be the primary API for creating a plan - // as the context is primarily used in test scenarios. - static Status CreatePlan(const Node* parent_node, - const onnxruntime::GraphViewer& graph, - const std::vector& outer_scope_node_args, - const ExecutionProviders& providers, - const KernelRegistryManager& kernel_registry, - const MLValueNameIdxMap& mlvalue_name_idx_map, - std::unique_ptr& plan) { - SequentialPlannerContext context; - return CreatePlan(parent_node, graph, outer_scope_node_args, providers, kernel_registry, mlvalue_name_idx_map, - context, plan); - } }; } // namespace onnxruntime diff --git a/onnxruntime/core/framework/execution_frame.cc b/onnxruntime/core/framework/execution_frame.cc index b4bedcd57a75c..3dc87b1e02fd6 100644 --- a/onnxruntime/core/framework/execution_frame.cc +++ b/onnxruntime/core/framework/execution_frame.cc @@ -6,6 +6,8 @@ #include #include "core/framework/mem_pattern_planner.h" +#include "core/framework/execution_plan_base.h" +#include "core/framework/sequential_execution_plan.h" #include "core/framework/ml_value_patterns_planner.h" #include "core/framework/node_index_info.h" #include "core/framework/op_kernel.h" @@ -16,55 +18,52 @@ using namespace onnxruntime::common; namespace onnxruntime { -IExecutionFrame::IExecutionFrame(const std::vector& feed_mlvalue_idxs, - const std::vector& feeds, - const std::unordered_map& initializers, - const std::vector& fetch_mlvalue_idxs, - const std::vector& fetches, - const MLValueNameIdxMap& mlvalue_idx_map, - const NodeIndexInfo& node_index_info) +IExecutionFrame::IExecutionFrame(const std::vector& feed_mlvalue_idxs, const std::vector& feeds, + const std::unordered_map& initializers, + const std::vector& fetch_mlvalue_idxs, const std::vector& fetches, + const MLValueNameIdxMap& ort_value_idx_map, const NodeIndexInfo& node_index_info) : node_index_info_{node_index_info}, fetch_mlvalue_idxs_{fetch_mlvalue_idxs} { ORT_ENFORCE(feeds.size() == feed_mlvalue_idxs.size()); ORT_ENFORCE(fetches.empty() || fetches.size() == fetch_mlvalue_idxs.size()); - Init(feed_mlvalue_idxs, feeds, initializers, fetch_mlvalue_idxs, fetches, mlvalue_idx_map); + Init(feed_mlvalue_idxs, feeds, 
initializers, fetch_mlvalue_idxs, fetches, ort_value_idx_map); } IExecutionFrame::~IExecutionFrame() = default; // Return nullptr if index map to an value that is an unused optional input/output -const MLValue* IExecutionFrame::GetNodeInputOrOutputMLValue(int index) const { - int mlvalue_idx = GetNodeIdxToMLValueIdx(index); - return mlvalue_idx != NodeIndexInfo::kInvalidEntry ? &all_values_[mlvalue_idx] : nullptr; +const OrtValue* IExecutionFrame::GetNodeInputOrOutputMLValue(int index) const { + int ort_value_idx = GetNodeIdxToMLValueIdx(index); + return ort_value_idx != NodeIndexInfo::kInvalidEntry ? &all_values_[ort_value_idx] : nullptr; } -MLValue* IExecutionFrame::GetMutableNodeInputOrOutputMLValue(int index) { - return const_cast(GetNodeInputOrOutputMLValue(index)); +OrtValue* IExecutionFrame::GetMutableNodeInputOrOutputMLValue(int index) { + return const_cast(GetNodeInputOrOutputMLValue(index)); } // TO DO: make it thread safe // This method is not thread safe! // Return S_OK and nullptr if index map to an value that is an unused optional input/output -Status IExecutionFrame::GetOrCreateNodeOutputMLValue(int index, const TensorShape* shape, MLValue*& p_mlvalue) { +Status IExecutionFrame::GetOrCreateNodeOutputMLValue(int index, const TensorShape* shape, OrtValue*& p_ort_value) { auto status = Status::OK(); - int mlvalue_idx = GetNodeIdxToMLValueIdx(index); + int ort_value_idx = GetNodeIdxToMLValueIdx(index); // return nullptr if it is optional - if (mlvalue_idx == NodeIndexInfo::kInvalidEntry) { - p_mlvalue = nullptr; + if (ort_value_idx == NodeIndexInfo::kInvalidEntry) { + p_ort_value = nullptr; } else { - p_mlvalue = &all_values_[mlvalue_idx]; + p_ort_value = &all_values_[ort_value_idx]; - if (p_mlvalue->IsAllocated()) { + if (p_ort_value->IsAllocated()) { // already allocated. verify shape matches if tensor. - if (p_mlvalue->IsTensor()) { - const Tensor& tensor = p_mlvalue->Get(); + if (p_ort_value->IsTensor()) { + const Tensor& tensor = p_ort_value->Get(); ORT_ENFORCE(shape && tensor.Shape() == *shape, - "MLValue shape verification failed. Current shape:", tensor.Shape(), + "OrtValue shape verification failed. Current shape:", tensor.Shape(), " Requested shape:", shape ? shape->ToString() : "null"); } } else { - status = CreateNodeOutputMLValueImpl(*p_mlvalue, mlvalue_idx, shape); + status = CreateNodeOutputMLValueImpl(*p_ort_value, ort_value_idx, shape); } } @@ -75,51 +74,47 @@ AllocatorPtr IExecutionFrame::GetAllocator(const OrtAllocatorInfo& info) const { return GetAllocatorImpl(info); } -Status IExecutionFrame::ReleaseMLValue(int mlvalue_idx) { - return ReleaseMLValueImpl(mlvalue_idx); -} +Status IExecutionFrame::ReleaseMLValue(int ort_value_idx) { return ReleaseMLValueImpl(ort_value_idx); } -Status IExecutionFrame::ReleaseMLValueImpl(int mlvalue_idx) { - if (mlvalue_idx == NodeIndexInfo::kInvalidEntry || static_cast(mlvalue_idx) >= all_values_.size()) { - return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT, "invalid index ", mlvalue_idx); +Status IExecutionFrame::ReleaseMLValueImpl(int ort_value_idx) { + if (ort_value_idx == NodeIndexInfo::kInvalidEntry || static_cast(ort_value_idx) >= all_values_.size()) { + return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT, "invalid index ", ort_value_idx); } // If fence is available, check whether async read has completed or not. 
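// --- Illustrative aside (not part of the patch): the release path that follows only clears a value's
// buffer when it is safe, i.e. when there is no fence or the fence reports the async read has
// completed; otherwise the release is deferred to a later sweep. A minimal, self-contained sketch of
// that pattern; Fence and Value here are simplified stand-ins for the runtime's types.
#include <memory>
#include <vector>

namespace fence_release_sketch {

// Reports whether asynchronous work that still reads the buffer has finished.
struct Fence {
  virtual ~Fence() = default;
  virtual bool CanRelease() const = 0;
};

struct Value {
  std::shared_ptr<Fence> fence;  // may be null when no async consumer exists
  std::vector<char> buffer;
};

// Release the buffer only when safe; if an async read is still in flight, keep it and let a later
// cleanup (e.g. at the end of the run) reclaim the memory.
inline void ReleaseIfSafe(Value& v) {
  if (v.fence && !v.fence->CanRelease()) return;  // defer the release
  v.buffer.clear();
  v.buffer.shrink_to_fit();
}

}  // namespace fence_release_sketch
// --- end of aside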
- Fence_t fence = GetMLValue(mlvalue_idx).Fence(); + Fence_t fence = GetMLValue(ort_value_idx).Fence(); if (fence && !fence->CanRelease()) { // Async data reading is not done yet, defer mem release until Session.run() end. return Status::OK(); } - - all_values_[mlvalue_idx] = MLValue(); + + all_values_[ort_value_idx] = OrtValue(); return Status::OK(); } int IExecutionFrame::GetNodeIdxToMLValueIdx(int index) const { - int mlvalue_idx = node_index_info_.GetMLValueIndex(index); - ORT_ENFORCE(mlvalue_idx == NodeIndexInfo::kInvalidEntry || - (mlvalue_idx >= 0 && static_cast(mlvalue_idx) < all_values_.size())); + int ort_value_idx = node_index_info_.GetMLValueIndex(index); + ORT_ENFORCE(ort_value_idx == NodeIndexInfo::kInvalidEntry || + (ort_value_idx >= 0 && static_cast(ort_value_idx) < all_values_.size())); - return mlvalue_idx; + return ort_value_idx; } -void IExecutionFrame::Init(const std::vector& feed_mlvalue_idxs, - const std::vector& feeds, - const std::unordered_map& initializers, - const std::vector& fetch_mlvalue_idxs, - const std::vector& fetches, - const MLValueNameIdxMap& mlvalue_idx_map) { +void IExecutionFrame::Init(const std::vector& feed_mlvalue_idxs, const std::vector& feeds, + const std::unordered_map& initializers, + const std::vector& fetch_mlvalue_idxs, const std::vector& fetches, + const MLValueNameIdxMap& ort_value_idx_map) { // 1. resize the all_value_ vector - all_values_.resize(mlvalue_idx_map.MaxIdx() + 1); + all_values_.resize(ort_value_idx_map.MaxIdx() + 1); // 2. Handle non-empty output vector if (!fetches.empty()) { auto num_fetches = fetch_mlvalue_idxs.size(); for (size_t idx = 0; idx < num_fetches; ++idx) { - int mlvalue_idx = fetch_mlvalue_idxs[idx]; - all_values_[mlvalue_idx] = fetches[idx]; + int ort_value_idx = fetch_mlvalue_idxs[idx]; + all_values_[ort_value_idx] = fetches[idx]; } } @@ -127,23 +122,23 @@ void IExecutionFrame::Init(const std::vector& feed_mlvalue_idxs, // We do this after the fetches to handle an edge case (possibly dubious) where a Constant is an output. // The Constant gets lifted to an initializer so there's no Node producing the value as an output during Graph // execution (i.e. Graph execution won't write the value to all_values_). - // A non-empty fetches vector will overwrite the actual weight in all_values_[mlvalue_idx] if we did this earlier. + // A non-empty fetches vector will overwrite the actual weight in all_values_[ort_value_idx] if we did this earlier. // This makes the ONNX Constant test (onnx\backend\test\data\node\test_constant) happy as that // involves a graph with a single Constant node. for (const auto& entry : initializers) { - int mlvalue_index = entry.first; - all_values_[mlvalue_index] = entry.second; + int ort_value_index = entry.first; + all_values_[ort_value_index] = entry.second; } // 4. handle feed in values. 
these can override initializer values so must be last for (size_t idx = 0, end = feed_mlvalue_idxs.size(); idx < end; ++idx) { - int mlvalue_idx = feed_mlvalue_idxs[idx]; + int ort_value_idx = feed_mlvalue_idxs[idx]; // we are sharing the underline tensor/object for MLValue - all_values_[mlvalue_idx] = feeds[idx]; + all_values_[ort_value_idx] = feeds[idx]; } } -Status IExecutionFrame::GetOutputs(std::vector& fetches) { +Status IExecutionFrame::GetOutputs(std::vector& fetches) { auto num_fetches = fetch_mlvalue_idxs_.size(); if (fetches.empty()) { @@ -164,14 +159,12 @@ Status IExecutionFrame::GetOutputs(std::vector& fetches) { return Status::OK(); } -bool IExecutionFrame::IsOutput(int mlvalue_idx) const { - return std::find(fetch_mlvalue_idxs_.begin(), fetch_mlvalue_idxs_.end(), mlvalue_idx) != fetch_mlvalue_idxs_.end(); +bool IExecutionFrame::IsOutput(int ort_value_idx) const { + return std::find(fetch_mlvalue_idxs_.begin(), fetch_mlvalue_idxs_.end(), ort_value_idx) != fetch_mlvalue_idxs_.end(); } -ExecutionFrame::ExecutionFrame(const std::vector& feed_mlvalue_idxs, - const std::vector& feeds, - const std::vector& fetch_mlvalue_idxs, - const std::vector& fetches, +ExecutionFrame::ExecutionFrame(const std::vector& feed_mlvalue_idxs, const std::vector& feeds, + const std::vector& fetch_mlvalue_idxs, const std::vector& fetches, const std::unordered_map& fetch_allocators, const SessionState& session_state) : IExecutionFrame(feed_mlvalue_idxs, feeds, session_state.GetInitializedTensors(), fetch_mlvalue_idxs, fetches, @@ -179,14 +172,14 @@ ExecutionFrame::ExecutionFrame(const std::vector& feed_mlvalue_idxs, session_state_{session_state}, mem_patterns_{nullptr}, planner_{nullptr} { - // map the custom allocators to mlvalue_idx entries + // map the custom allocators to ort_value_idx entries if (!fetch_allocators.empty()) { for (size_t idx = 0, end = fetch_mlvalue_idxs.size(); idx < end; ++idx) { - int mlvalue_idx = fetch_mlvalue_idxs[idx]; + int ort_value_idx = fetch_mlvalue_idxs[idx]; auto custom_alloc_entry = fetch_allocators.find(idx); if (custom_alloc_entry != fetch_allocators.cend()) { - custom_allocators_[mlvalue_idx] = custom_alloc_entry->second; + custom_allocators_[ort_value_idx] = custom_alloc_entry->second; } } } @@ -230,22 +223,18 @@ ExecutionFrame::ExecutionFrame(const std::vector& feed_mlvalue_idxs, ExecutionFrame::~ExecutionFrame() = default; -Status ExecutionFrame::AllocateMLValueTensorSelfOwnBuffer(MLValue& mlvalue, - int mlvalue_index, - MLDataType element_type, - const OrtAllocatorInfo& location, - const TensorShape& shape, - bool create_fence) { - return AllocateMLValueTensorSelfOwnBufferHelper(mlvalue, mlvalue_index, element_type, location, shape, create_fence); +Status ExecutionFrame::AllocateMLValueTensorSelfOwnBuffer(OrtValue& ort_value, int ort_value_index, + MLDataType element_type, const OrtAllocatorInfo& location, + const TensorShape& shape, bool create_fence) { + return AllocateMLValueTensorSelfOwnBufferHelper(ort_value, ort_value_index, element_type, location, shape, + create_fence); } -Status ExecutionFrame::AllocateMLValueTensorSelfOwnBufferHelper(MLValue& mlvalue, - int mlvalue_index, +Status ExecutionFrame::AllocateMLValueTensorSelfOwnBufferHelper(OrtValue& ort_value, int ort_value_index, MLDataType element_type, const OrtAllocatorInfo& location, - const TensorShape& shape, - bool create_fence) { - if (mlvalue_index == NodeIndexInfo::kInvalidEntry) { + const TensorShape& shape, bool create_fence) { + if (ort_value_index == NodeIndexInfo::kInvalidEntry) { return 
Status(ONNXRUNTIME, FAIL, "Trying to allocate memory for unused optional inputs/outputs"); } @@ -262,20 +251,20 @@ Status ExecutionFrame::AllocateMLValueTensorSelfOwnBufferHelper(MLValue& mlvalue // create fence if needed if (create_fence) { - ORT_ENFORCE(mlvalue.Fence() == nullptr); + ORT_ENFORCE(ort_value.Fence() == nullptr); FencePtr f = alloc->CreateFence(&session_state_); // it is OK to have fence been nullptr if the execution provider has no async execution, // and allocator::CreateFence returns nullptr - mlvalue.SetFence(f); + ort_value.SetFence(f); } - // if we have pre-calculated memory pattern, and the mlvalue is not output mlvalue + // if we have pre-calculated memory pattern, and the ort_value is not output mlvalue // try to allocated on pre-allocated big chunk. - const auto& per_alloc_plan = GetAllocationPlan(mlvalue_index); + const auto& per_alloc_plan = GetAllocationPlan(ort_value_index); if (mem_patterns_ && per_alloc_plan.alloc_kind != AllocKind::kAllocateOutput) { auto pattern = mem_patterns_->GetPatterns(location); if (pattern) { - auto block = pattern->GetBlock(mlvalue_index); + auto block = pattern->GetBlock(ort_value_index); // if block not found, fall back to default behavior if (block) { auto it = buffers_.find(location); @@ -283,16 +272,17 @@ Status ExecutionFrame::AllocateMLValueTensorSelfOwnBufferHelper(MLValue& mlvalue if (it != buffers_.end() && block->size_ == size) { void* buffer = it->second.get(); auto status = AllocateTensorWithPreAllocateBufferHelper( - mlvalue, static_cast(static_cast(buffer) + block->offset_), - element_type, location, shape); + ort_value, static_cast(static_cast(buffer) + block->offset_), element_type, location, + shape); return status; } if (block->size_ != size) { - LOGS_DEFAULT(WARNING) << "For mlvalue with index: " << mlvalue_index << ", block in memory pattern size is: " - << block->size_ << " but the actually size is: " << size + LOGS_DEFAULT(WARNING) << "For ort_value with index: " << ort_value_index + << ", block in memory pattern size is: " << block->size_ + << " but the actually size is: " << size << ", fall back to default allocation behavior"; } else if (it == buffers_.end()) { - LOGS_DEFAULT(WARNING) << "For mlvalue with index: " << mlvalue_index + LOGS_DEFAULT(WARNING) << "For ort_value with index: " << ort_value_index << ", block not found in target location. fall back to default allocation behavior"; } } @@ -301,84 +291,77 @@ Status ExecutionFrame::AllocateMLValueTensorSelfOwnBufferHelper(MLValue& mlvalue //no memory pattern, or the pattern is not correct. std::unique_ptr p_tensor = std::make_unique(element_type, shape, alloc); - mlvalue.Init(p_tensor.release(), - DataTypeImpl::GetType(), - DataTypeImpl::GetType()->GetDeleteFunc()); + ort_value.Init(p_tensor.release(), DataTypeImpl::GetType(), DataTypeImpl::GetType()->GetDeleteFunc()); // trace the memory allocation. // don't trace the memory allocation on string tensors, as it need // placement new, we don't support it in memory pattern optimization. 
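// --- Illustrative aside (not part of the patch): the helper above tries to place a tensor inside a
// pre-allocated arena using a precomputed (offset, size) block for its value index, and falls back to
// a regular allocation when the block is missing, the size no longer matches, or the arena for that
// location was never created. A minimal sketch of that lookup, assuming a flat byte arena as a
// stand-in for the per-location buffers.
#include <cstddef>
#include <unordered_map>
#include <vector>

namespace mem_pattern_sketch {

struct Block {
  size_t offset;
  size_t size;
};

using Pattern = std::unordered_map<int, Block>;  // value index -> precomputed block

// Return a pointer into the arena when the pattern holds a block of exactly the requested size for
// this value; return nullptr so the caller falls back to a fresh allocation otherwise.
inline void* CarveFromArena(std::vector<char>& arena, const Pattern& pattern,
                            int value_index, size_t required_size) {
  auto it = pattern.find(value_index);
  if (it == pattern.end()) return nullptr;                 // value not covered by the pattern
  if (it->second.size != required_size) return nullptr;    // actual size changed, fall back
  if (it->second.offset + required_size > arena.size()) return nullptr;
  return arena.data() + it->second.offset;
}

}  // namespace mem_pattern_sketch
// --- end of aside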
if (element_type != DataTypeImpl::GetType()) { - TraceAllocate(mlvalue_index, size); + TraceAllocate(ort_value_index, size); } return Status::OK(); } -Status ExecutionFrame::AllocateMLValueTensorPreAllocateBuffer(MLValue& mlvalue, - int mlvalue_index_reuse, - MLDataType element_type, - const OrtAllocatorInfo& location, - const TensorShape& shape, - bool create_fence) { - MLValue& mlvalue_reuse = GetMutableMLValue(mlvalue_index_reuse); +Status ExecutionFrame::AllocateMLValueTensorPreAllocateBuffer(OrtValue& ort_value, int ort_value_index_reuse, + MLDataType element_type, const OrtAllocatorInfo& location, + const TensorShape& shape, bool create_fence) { + OrtValue& ort_value_reuse = GetMutableMLValue(ort_value_index_reuse); - auto* reuse_tensor = mlvalue_reuse.GetMutable(); + auto* reuse_tensor = ort_value_reuse.GetMutable(); void* reuse_buffer = reuse_tensor->MutableDataRaw(); - // create fence on reused mlvalue if needed + // create fence on reused ort_value if needed // TODO: differentiate reuse and alias, by add AllocKind::kAlias? - if (create_fence && mlvalue_reuse.Fence() == nullptr) { + if (create_fence && ort_value_reuse.Fence() == nullptr) { FencePtr f = GetAllocator(location)->CreateFence(&session_state_); - mlvalue_reuse.SetFence(f); + ort_value_reuse.SetFence(f); } - // reused MLValue share the same fence - mlvalue.ShareFenceWith(mlvalue_reuse); - return AllocateTensorWithPreAllocateBufferHelper(mlvalue, reuse_buffer, element_type, location, shape); + // reused OrtValue share the same fence + ort_value.ShareFenceWith(ort_value_reuse); + return AllocateTensorWithPreAllocateBufferHelper(ort_value, reuse_buffer, element_type, location, shape); } -Status ExecutionFrame::AllocateTensorWithPreAllocateBufferHelper(MLValue& mlvalue, - void* pBuffer, +Status ExecutionFrame::AllocateTensorWithPreAllocateBufferHelper(OrtValue& ort_value, void* pBuffer, MLDataType element_type, const OrtAllocatorInfo& location, const TensorShape& shape) { auto p_tensor = std::make_unique(element_type, shape, pBuffer, location); - mlvalue.Init(p_tensor.release(), - DataTypeImpl::GetType(), - DataTypeImpl::GetType()->GetDeleteFunc()); + ort_value.Init(p_tensor.release(), DataTypeImpl::GetType(), DataTypeImpl::GetType()->GetDeleteFunc()); return Status::OK(); } -static Status AllocateTraditionalMLValue(MLValue& mlvalue, const NonTensorTypeBase& type) { +static Status AllocateTraditionalMLValue(OrtValue& ort_value, const NonTensorTypeBase& type) { auto creator = type.GetCreateFunc(); - mlvalue.Init(creator(), &type, type.GetDeleteFunc()); + ort_value.Init(creator(), &type, type.GetDeleteFunc()); return Status::OK(); } // This method is not thread safe! 
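// --- Illustrative aside (not part of the patch): the AllocateAsPerAllocationPlan function that
// follows dispatches on the planned AllocKind: allocate a fresh buffer, reuse another value's
// buffer, or share the whole value object. A minimal sketch of that dispatch; the callbacks are
// placeholders for the real allocation paths.
namespace alloc_kind_sketch {

enum class AllocKind { kAllocate, kAllocateOutput, kReuse, kShare };

struct PlanEntry {
  AllocKind kind;
  int reused_buffer;  // only meaningful for kReuse / kShare
};

// Run the allocation path selected by the plan for one value.
template <typename AllocFn, typename ReuseFn, typename ShareFn>
bool DispatchAllocation(const PlanEntry& entry, AllocFn allocate, ReuseFn reuse, ShareFn share) {
  switch (entry.kind) {
    case AllocKind::kAllocate:
    case AllocKind::kAllocateOutput:
      allocate();                    // value owns a new buffer
      return true;
    case AllocKind::kReuse:
      reuse(entry.reused_buffer);    // point into a previously allocated buffer
      return true;
    case AllocKind::kShare:
      share(entry.reused_buffer);    // share the whole value (and its ref-counted data)
      return true;
  }
  return false;  // unknown kind
}

}  // namespace alloc_kind_sketch
// --- end of aside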
-Status ExecutionFrame::AllocateAsPerAllocationPlan(MLValue& mlvalue, int mlvalue_index, const TensorShape* shape) { - // if there is a custom allocator for this mlvalue_index, call it to do the allocation - auto custom_alloc_entry = custom_allocators_.find(mlvalue_index); +Status ExecutionFrame::AllocateAsPerAllocationPlan(OrtValue& ort_value, int ort_value_index, const TensorShape* shape) { + // if there is a custom allocator for this ort_value_index, call it to do the allocation + auto custom_alloc_entry = custom_allocators_.find(ort_value_index); if (custom_alloc_entry != custom_allocators_.cend()) { ORT_ENFORCE(shape, "We don't expect custom allocators for non-tensor types, so a shape is mandatory here."); - return (custom_alloc_entry->second)(*shape, mlvalue); + return (custom_alloc_entry->second)(*shape, ort_value); } const SequentialExecutionPlan* p_seq_exec_plan = session_state_.GetExecutionPlan(); const auto& alloc_plan = p_seq_exec_plan->allocation_plan; - ORT_ENFORCE(mlvalue_index >= 0 && mlvalue_index < alloc_plan.size()); - const auto& per_alloc_plan = alloc_plan[mlvalue_index]; + ORT_ENFORCE(ort_value_index >= 0 && static_cast(ort_value_index) < alloc_plan.size()); + const auto& per_alloc_plan = alloc_plan[ort_value_index]; auto alloc_info = per_alloc_plan.location; auto ml_type = per_alloc_plan.value_type; if (ml_type == nullptr) - return Status(ONNXRUNTIME, INVALID_ARGUMENT, - "Tried to allocate without valid type information, mlvalue index=" + std::to_string(mlvalue_index)); + return Status( + ONNXRUNTIME, INVALID_ARGUMENT, + "Tried to allocate without valid type information, ort_value index=" + std::to_string(ort_value_index)); if (!ml_type->IsTensorType()) { - return AllocateTraditionalMLValue(mlvalue, *static_cast(ml_type)); + return AllocateTraditionalMLValue(ort_value, *static_cast(ml_type)); } ORT_ENFORCE(shape, "Allocation of tensor types requires a shape."); @@ -392,21 +375,20 @@ Status ExecutionFrame::AllocateAsPerAllocationPlan(MLValue& mlvalue, int mlvalue // In the future we may want to have different way to handle it. case AllocKind::kAllocateOutput: case AllocKind::kAllocate: { - ORT_RETURN_IF_ERROR(AllocateMLValueTensorSelfOwnBuffer(mlvalue, mlvalue_index, ml_data_type, alloc_info, *shape, - per_alloc_plan.create_fence_if_async)); + ORT_RETURN_IF_ERROR(AllocateMLValueTensorSelfOwnBuffer(ort_value, ort_value_index, ml_data_type, alloc_info, + *shape, per_alloc_plan.create_fence_if_async)); break; } case AllocKind::kReuse: { int reuse_mlvalue_index = per_alloc_plan.reused_buffer; - ORT_RETURN_IF_ERROR(AllocateMLValueTensorPreAllocateBuffer(mlvalue, reuse_mlvalue_index, - ml_data_type, alloc_info, *shape, - per_alloc_plan.create_fence_if_async)); + ORT_RETURN_IF_ERROR(AllocateMLValueTensorPreAllocateBuffer( + ort_value, reuse_mlvalue_index, ml_data_type, alloc_info, *shape, per_alloc_plan.create_fence_if_async)); break; } case AllocKind::kShare: { int reuse_mlvalue_index = per_alloc_plan.reused_buffer; - // copy at the MLValue level so the shared_ptr for the data is shared between the two MLValue instances - mlvalue = GetMutableMLValue(reuse_mlvalue_index); + // copy at the OrtValue level so the shared_ptr for the data is shared between the two OrtValue instances + ort_value = GetMutableMLValue(reuse_mlvalue_index); break; } default: { @@ -425,41 +407,42 @@ AllocatorPtr ExecutionFrame::GetAllocatorImpl(const OrtAllocatorInfo& info) cons // This method is not thread safe! 
// Return S_OK and nullptr if index map to an value that is an unused optional input/output -Status ExecutionFrame::CreateNodeOutputMLValueImpl(MLValue& mlvalue, int mlvalue_idx, const TensorShape* shape) { - return AllocateAsPerAllocationPlan(mlvalue, mlvalue_idx, shape); +Status ExecutionFrame::CreateNodeOutputMLValueImpl(OrtValue& ort_value, int ort_value_idx, const TensorShape* shape) { + return AllocateAsPerAllocationPlan(ort_value, ort_value_idx, shape); } -Status ExecutionFrame::ReleaseMLValueImpl(int mlvalue_idx) { - ORT_RETURN_IF_ERROR(IExecutionFrame::ReleaseMLValueImpl(mlvalue_idx)); - TraceFree(mlvalue_idx); +Status ExecutionFrame::ReleaseMLValueImpl(int ort_value_idx) { + ORT_RETURN_IF_ERROR(IExecutionFrame::ReleaseMLValueImpl(ort_value_idx)); + TraceFree(ort_value_idx); return Status::OK(); } -const AllocPlanPerValue& ExecutionFrame::GetAllocationPlan(int mlvalue_idx) { +const AllocPlanPerValue& ExecutionFrame::GetAllocationPlan(int ort_value_idx) { const SequentialExecutionPlan* p_seq_exec_plan = session_state_.GetExecutionPlan(); const auto& alloc_plan = p_seq_exec_plan->allocation_plan; - ORT_ENFORCE(mlvalue_idx != NodeIndexInfo::kInvalidEntry && mlvalue_idx < alloc_plan.size()); - return alloc_plan[mlvalue_idx]; + ORT_ENFORCE(ort_value_idx >= 0 && static_cast(ort_value_idx) < alloc_plan.size()); + return alloc_plan[ort_value_idx]; } -void ExecutionFrame::TraceAllocate(int mlvalue_idx, size_t size) { +void ExecutionFrame::TraceAllocate(int ort_value_idx, size_t size) { if (planner_) { // don't trace the output tensors. - auto& allocation_plan = GetAllocationPlan(mlvalue_idx); + auto& allocation_plan = GetAllocationPlan(ort_value_idx); if (allocation_plan.alloc_kind == AllocKind::kAllocateOutput) return; - auto status = planner_->TraceAllocation(mlvalue_idx, size); + auto status = planner_->TraceAllocation(ort_value_idx, size); if (!status.IsOK()) - LOGS(session_state_.Logger(), WARNING) << "TraceAllocation for mlvalue_idx=" << mlvalue_idx << " size=" << size - << " failed: " << status.ErrorMessage(); + LOGS(session_state_.Logger(), WARNING) << "TraceAllocation for ort_value_idx=" << ort_value_idx + << " size=" << size << " failed: " << status.ErrorMessage(); } } -void ExecutionFrame::TraceFree(int mlvalue_idx) { +void ExecutionFrame::TraceFree(int ort_value_idx) { // don't trace free on output tensors. 
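// --- Illustrative aside (not part of the patch): TraceAllocate above and TraceFree below treat
// memory-pattern tracing as best-effort: graph outputs (and string tensors) are never traced, and a
// tracing failure only produces a warning rather than failing the run. A minimal sketch of that
// guard; PatternPlanner is a trivial stand-in for the real planner.
#include <cstddef>
#include <iostream>

namespace trace_sketch {

struct PatternPlanner {
  bool TraceAllocation(int value_index, size_t size) { (void)value_index; (void)size; return true; }
};

// Trace only when a planner exists and the value is not a graph output; log and continue on failure.
inline void TraceAllocateIfWanted(PatternPlanner* planner, bool is_graph_output,
                                  int value_index, size_t size) {
  if (planner == nullptr || is_graph_output) return;
  if (!planner->TraceAllocation(value_index, size)) {
    std::cerr << "warning: TraceAllocation failed for value " << value_index
              << " size " << size << "\n";
  }
}

}  // namespace trace_sketch
// --- end of aside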
- if (planner_ && !IsOutput(mlvalue_idx)) { + if (planner_ && !IsOutput(ort_value_idx)) { const SequentialExecutionPlan* p_seq_exec_plan = session_state_.GetExecutionPlan(); const auto& alloc_plan = p_seq_exec_plan->allocation_plan; - const auto& per_alloc_plan = alloc_plan.at(mlvalue_idx); + ORT_ENFORCE(ort_value_idx >= 0 && static_cast(ort_value_idx) < alloc_plan.size()); + const auto& per_alloc_plan = alloc_plan[ort_value_idx]; // only trace tensors auto ml_type = per_alloc_plan.value_type; @@ -468,10 +451,10 @@ void ExecutionFrame::TraceFree(int mlvalue_idx) { auto ml_data_type = static_cast(ml_type)->GetElementType(); // don't trace string tensors if (ml_data_type != DataTypeImpl::GetType()) { - auto status = planner_->TraceFree(mlvalue_idx); + auto status = planner_->TraceFree(ort_value_idx); if (!status.IsOK()) { - LOGS(session_state_.Logger(), WARNING) << "TraceFree for mlvalue_idx=" << mlvalue_idx - << " failed: " << status.ErrorMessage(); + LOGS(session_state_.Logger(), WARNING) + << "TraceFree for ort_value_idx=" << ort_value_idx << " failed: " << status.ErrorMessage(); } } } diff --git a/onnxruntime/core/framework/execution_frame.h b/onnxruntime/core/framework/execution_frame.h index 508d0f4715fa0..600c51d0cc06c 100644 --- a/onnxruntime/core/framework/execution_frame.h +++ b/onnxruntime/core/framework/execution_frame.h @@ -25,12 +25,9 @@ class NodeIndexInfo; class IExecutionFrame { protected: - IExecutionFrame(const std::vector& feed_mlvalue_idxs, - const std::vector& feeds, - const std::unordered_map& initializers, - const std::vector& fetch_mlvalue_idxs, - const std::vector& fetches, - const MLValueNameIdxMap& mlvalue_idx_map, + IExecutionFrame(const std::vector& feed_mlvalue_idxs, const std::vector& feeds, + const std::unordered_map& initializers, const std::vector& fetch_mlvalue_idxs, + const std::vector& fetches, const MLValueNameIdxMap& ort_value_idx_map, const NodeIndexInfo& node_index_info); public: @@ -42,94 +39,82 @@ class IExecutionFrame { } // Return nullptr if index map to an value that is an unused optional input/output - const MLValue* GetNodeInputOrOutputMLValue(int index) const; - MLValue* GetMutableNodeInputOrOutputMLValue(int index); + const OrtValue* GetNodeInputOrOutputMLValue(int index) const; + OrtValue* GetMutableNodeInputOrOutputMLValue(int index); // TO DO: make it thread safe // This method is not thread safe! // Return S_OK and nullptr if index map to an value that is an unused optional input/output // Shape is required for tensors but not traditional ML values. 
- Status GetOrCreateNodeOutputMLValue(int index, const TensorShape* shape, MLValue*& p_mlvalue); + Status GetOrCreateNodeOutputMLValue(int index, const TensorShape* shape, OrtValue*& p_ort_value); /** * write the output values to the 'fetches' vector * Don't access the values after SessionState is destroyed */ - Status GetOutputs(std::vector& fetches); + Status GetOutputs(std::vector& fetches); AllocatorPtr GetAllocator(const OrtAllocatorInfo& info) const; - Status ReleaseMLValue(int mlvalue_idx); + Status ReleaseMLValue(int ort_value_idx); protected: - // get the mlvalue_idx from NodeIndexInfo + // get the ort_value_idx from NodeIndexInfo int GetNodeIdxToMLValueIdx(int index) const; - MLValue& GetMutableMLValue(int mlvalue_index) { - return const_cast(GetMLValue(mlvalue_index)); - } + OrtValue& GetMutableMLValue(int ort_value_index) { return const_cast(GetMLValue(ort_value_index)); } - virtual Status ReleaseMLValueImpl(int mlvalue_idx); + virtual Status ReleaseMLValueImpl(int ort_value_idx); - // returns true if the mlvalue_idx is an output from the graph - bool IsOutput(int mlvalue_idx) const; + // returns true if the ort_value_idx is an output from the graph + bool IsOutput(int ort_value_idx) const; private: ORT_DISALLOW_COPY_ASSIGNMENT_AND_MOVE(IExecutionFrame); - void Init(const std::vector& feed_mlvalue_idxs, - const std::vector& feeds, - const std::unordered_map& initializers, - const std::vector& fetch_mlvalue_idxs, - const std::vector& fetches, - const MLValueNameIdxMap& mlvalue_idx_map); + void Init(const std::vector& feed_mlvalue_idxs, const std::vector& feeds, + const std::unordered_map& initializers, const std::vector& fetch_mlvalue_idxs, + const std::vector& fetches, const MLValueNameIdxMap& ort_value_idx_map); - const MLValue& GetMLValue(int mlvalue_index) const { - ORT_ENFORCE(mlvalue_index >= 0 && static_cast(mlvalue_index) < all_values_.size()); - return all_values_[mlvalue_index]; + const OrtValue& GetMLValue(int ort_value_index) const { + ORT_ENFORCE(ort_value_index >= 0 && static_cast(ort_value_index) < all_values_.size()); + return all_values_[ort_value_index]; } virtual AllocatorPtr GetAllocatorImpl(const OrtAllocatorInfo& info) const = 0; - virtual Status CreateNodeOutputMLValueImpl(MLValue& mlvalue, int mlvalue_idx, const TensorShape* shape) = 0; + virtual Status CreateNodeOutputMLValueImpl(OrtValue& ort_value, int ort_value_idx, const TensorShape* shape) = 0; const NodeIndexInfo& node_index_info_; // All the intermediate values for the entire graph. // Input and Output values are passed in by executors - std::vector all_values_; + std::vector all_values_; const std::vector fetch_mlvalue_idxs_; }; class ExecutionFrame final : public IExecutionFrame { public: - ExecutionFrame(const std::vector& feed_mlvalue_idxs, - const std::vector& feeds, - const std::vector& fetch_mlvalue_idxs, - const std::vector& fetches, + ExecutionFrame(const std::vector& feed_mlvalue_idxs, const std::vector& feeds, + const std::vector& fetch_mlvalue_idxs, const std::vector& fetches, // optional custom allocators. key is index in fetches const std::unordered_map& fetch_allocators, const SessionState& session_state); - ~ExecutionFrame(); + ~ExecutionFrame() override; // TODO: These two AllocateMLValue... methods are in the API purely for unit test usage. 
// Fix the unit tests so they set an execution plan that results in these methods being called by // GetOrCreateNodeOutputMLValue instead - Status AllocateMLValueTensorSelfOwnBuffer(MLValue& mlvalue, - int mlvalue_index, - MLDataType element_type, - const OrtAllocatorInfo& location, - const TensorShape& shape, + Status AllocateMLValueTensorSelfOwnBuffer(OrtValue& ort_value, int ort_value_index, MLDataType element_type, + const OrtAllocatorInfo& location, const TensorShape& shape, bool create_fence = false); - Status AllocateMLValueTensorPreAllocateBuffer(MLValue& mlvalue, - int mlvalue_index_reuse, - MLDataType element_type, - const OrtAllocatorInfo& location, - const TensorShape& shape, + Status AllocateMLValueTensorPreAllocateBuffer(OrtValue& ort_value, int ort_value_index_reuse, MLDataType element_type, + const OrtAllocatorInfo& location, const TensorShape& shape, bool create_fence = false); + // thread-safe Status GeneratePatterns(MemoryPatternGroup* out) const; bool HasMemoryPatternPlanner() const { @@ -140,30 +125,22 @@ class ExecutionFrame final : public IExecutionFrame { ORT_DISALLOW_COPY_ASSIGNMENT_AND_MOVE(ExecutionFrame); AllocatorPtr GetAllocatorImpl(const OrtAllocatorInfo& info) const override; - Status ReleaseMLValueImpl(int mlvalue_idx) override; - Status CreateNodeOutputMLValueImpl(MLValue& mlvalue, int mlvalue_idx, const TensorShape* shape) override; - - common::Status AllocateAsPerAllocationPlan(MLValue& mlvalue, - int mlvalue_index, - const TensorShape* shape); - - Status AllocateMLValueTensorSelfOwnBufferHelper(MLValue& mlvalue, - int mlvalue_index, - MLDataType element_type, - const OrtAllocatorInfo& location, - const TensorShape& shape, + Status ReleaseMLValueImpl(int ort_value_idx) override; + Status CreateNodeOutputMLValueImpl(OrtValue& ort_value, int ort_value_idx, const TensorShape* shape) override; + + common::Status AllocateAsPerAllocationPlan(OrtValue& ort_value, int ort_value_index, const TensorShape* shape); + + Status AllocateMLValueTensorSelfOwnBufferHelper(OrtValue& ort_value, int ort_value_index, MLDataType element_type, + const OrtAllocatorInfo& location, const TensorShape& shape, bool create_fence); - Status AllocateTensorWithPreAllocateBufferHelper(MLValue& mlvalue, - void* pBuffer, - MLDataType element_type, - const OrtAllocatorInfo& location, - const TensorShape& shape); + Status AllocateTensorWithPreAllocateBufferHelper(OrtValue& ort_value, void* pBuffer, MLDataType element_type, + const OrtAllocatorInfo& location, const TensorShape& shape); - void TraceAllocate(int mlvalue_idx, size_t size); - void TraceFree(int mlvalue_idx); + void TraceAllocate(int ort_value_idx, size_t size); + void TraceFree(int ort_value_idx); - const AllocPlanPerValue& GetAllocationPlan(int mlvalue_idx); + const AllocPlanPerValue& GetAllocationPlan(int ort_value_idx); const SessionState& session_state_; diff --git a/onnxruntime/core/framework/execution_provider.cc b/onnxruntime/core/framework/execution_provider.cc index d773bc89c17c9..84d4e060501bb 100644 --- a/onnxruntime/core/framework/execution_provider.cc +++ b/onnxruntime/core/framework/execution_provider.cc @@ -65,7 +65,7 @@ void IExecutionProvider::InsertAllocator(AllocatorPtr allocator) { ORT_THROW("duplicated allocator"); } allocators_.insert(iter, {key, allocator}); - allocator_list_.push_back(gsl::not_null(allocator.get())); + allocator_list_.emplace_back(gsl::not_null(allocator.get())); } common::Status IExecutionProvider::Compile(const std::vector& /*fused_node*/, diff --git 
a/onnxruntime/core/framework/execution_providers.h b/onnxruntime/core/framework/execution_providers.h index b1406e568493f..aee11a275577b 100644 --- a/onnxruntime/core/framework/execution_providers.h +++ b/onnxruntime/core/framework/execution_providers.h @@ -91,6 +91,10 @@ class ExecutionProviders { const_iterator begin() const noexcept { return exec_providers_.cbegin(); } const_iterator end() const noexcept { return exec_providers_.cend(); } + OrtAllocatorInfo GetDefaultCpuAllocatorInfo() const { + return Get(onnxruntime::kCpuExecutionProvider)->GetAllocator(0, OrtMemTypeDefault)->Info(); + } + private: std::vector> exec_providers_; diff --git a/onnxruntime/core/framework/feeds_fetches_manager.cc b/onnxruntime/core/framework/feeds_fetches_manager.cc index bd190b6ac7b59..6434b159ee53d 100644 --- a/onnxruntime/core/framework/feeds_fetches_manager.cc +++ b/onnxruntime/core/framework/feeds_fetches_manager.cc @@ -4,34 +4,34 @@ #include "core/framework/feeds_fetches_manager.h" #include "core/framework/execution_providers.h" -#include "core/framework/mlvalue_name_idx_map.h" +#include "core/framework/ort_value_name_idx_map.h" namespace onnxruntime { common::Status FeedsFetchesInfo::MapNamesToMLValueIdxs(const std::vector& names, - const MLValueNameIdxMap& mlvalue_name_idx_map, - std::vector& mlvalue_idxs) { + const MLValueNameIdxMap& ort_value_name_idx_map, + std::vector& ort_value_idxs) { auto status = Status::OK(); - mlvalue_idxs.reserve(names.size()); + ort_value_idxs.reserve(names.size()); for (const auto& name : names) { int idx; - status = mlvalue_name_idx_map.GetIdx(name, idx); + status = ort_value_name_idx_map.GetIdx(name, idx); ORT_RETURN_IF_ERROR(status); - mlvalue_idxs.push_back(idx); + ort_value_idxs.push_back(idx); } return status; } -Status FeedsFetchesInfo::SetMLValueIdxs(const MLValueNameIdxMap& mlvalue_name_idx_map) { - auto status = MapNamesToMLValueIdxs(feed_names, mlvalue_name_idx_map, feeds_mlvalue_idxs); +Status FeedsFetchesInfo::SetMLValueIdxs(const MLValueNameIdxMap& ort_value_name_idx_map) { + auto status = MapNamesToMLValueIdxs(feed_names, ort_value_name_idx_map, feeds_mlvalue_idxs); if (!status.IsOK()) { return ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "Error mapping feeds: " + status.ErrorMessage()); } - status = MapNamesToMLValueIdxs(output_names, mlvalue_name_idx_map, fetches_mlvalue_idxs); + status = MapNamesToMLValueIdxs(output_names, ort_value_name_idx_map, fetches_mlvalue_idxs); if (!status.IsOK()) { return ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "Error mapping output names: " + status.ErrorMessage()); } @@ -41,13 +41,13 @@ Status FeedsFetchesInfo::SetMLValueIdxs(const MLValueNameIdxMap& mlvalue_name_id Status FeedsFetchesManager::Create(const std::vector& feed_names, const std::vector& output_names, - const MLValueNameIdxMap& mlvalue_name_idx_map, + const MLValueNameIdxMap& ort_value_name_idx_map, std::unique_ptr& feed_fetch_manager) { FeedsFetchesInfo info; info.feed_names = feed_names; info.output_names = output_names; - ORT_RETURN_IF_ERROR(info.SetMLValueIdxs(mlvalue_name_idx_map)); + ORT_RETURN_IF_ERROR(info.SetMLValueIdxs(ort_value_name_idx_map)); feed_fetch_manager = std::make_unique(std::move(info)); diff --git a/onnxruntime/core/framework/feeds_fetches_manager.h b/onnxruntime/core/framework/feeds_fetches_manager.h index 9554534232f16..3b485c605f017 100644 --- a/onnxruntime/core/framework/feeds_fetches_manager.h +++ b/onnxruntime/core/framework/feeds_fetches_manager.h @@ -32,11 +32,11 @@ struct FeedsFetchesInfo { : feed_names{feed_names_in}, 
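// --- Illustrative aside (not part of the patch): MapNamesToMLValueIdxs above resolves each feed or
// fetch name to its OrtValue index and fails on the first unknown name. A minimal sketch of that
// loop, with a plain unordered_map standing in for MLValueNameIdxMap and a bool/string pair standing
// in for Status.
#include <string>
#include <unordered_map>
#include <vector>

namespace name_to_idx_sketch {

inline bool MapNamesToIndices(const std::vector<std::string>& names,
                              const std::unordered_map<std::string, int>& name_to_idx,
                              std::vector<int>& indices, std::string& error) {
  indices.clear();
  indices.reserve(names.size());
  for (const std::string& name : names) {
    auto it = name_to_idx.find(name);
    if (it == name_to_idx.end()) {
      error = "unknown value name: " + name;  // first unknown name aborts the mapping
      return false;
    }
    indices.push_back(it->second);
  }
  return true;
}

}  // namespace name_to_idx_sketch
// --- end of aside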
output_names{output_names_in} {} static Status MapNamesToMLValueIdxs(const std::vector& names, - const MLValueNameIdxMap& mlvalue_name_idx_map, - std::vector& mlvalue_idxs); + const MLValueNameIdxMap& ort_value_name_idx_map, + std::vector& ort_value_idxs); - // set the mlvalue_idxs for the current values in feed_names and output_names - Status SetMLValueIdxs(const MLValueNameIdxMap& mlvalue_name_idx_map); + // set the ort_value_idxs for the current values in feed_names and output_names + Status SetMLValueIdxs(const MLValueNameIdxMap& ort_value_name_idx_map); std::vector feed_names; std::vector output_names; @@ -53,9 +53,8 @@ class FeedsFetchesManager { const IExecutionProvider* copy_provider = nullptr; }; - static Status Create(const std::vector& feed_names, - const std::vector& output_names, - const MLValueNameIdxMap& mlvalue_name_idx_map, + static Status Create(const std::vector& feed_names, const std::vector& output_names, + const MLValueNameIdxMap& ort_value_name_idx_map, std::unique_ptr& feeds_fetches_manager); FeedsFetchesManager(FeedsFetchesInfo&& info) : feeds_fetches_info_{info} {} diff --git a/onnxruntime/core/framework/func_kernel.cc b/onnxruntime/core/framework/func_kernel.cc index 70c55e45add76..f02e3be2d3a40 100644 --- a/onnxruntime/core/framework/func_kernel.cc +++ b/onnxruntime/core/framework/func_kernel.cc @@ -15,31 +15,4 @@ void release_helper_func(void* allocator, void* p) { return alloc->Free(p); } -DType ORT_type_to_c_type(MLDataType type) { - if (type == DataTypeImpl::GetType()) - return DType::TFloat32; - else if (type == DataTypeImpl::GetType()) - return DType::TDouble; - else if (type == DataTypeImpl::GetType()) - return DType::TInt32; - else if (type == DataTypeImpl::GetType()) - return DType::TBool; - else if (type == DataTypeImpl::GetType()) - return DType::TUint8; - else if (type == DataTypeImpl::GetType()) - return DType::TInt8; - else if (type == DataTypeImpl::GetType()) - return DType::TUint16; - else if (type == DataTypeImpl::GetType()) - return DType::TInt16; - else if (type == DataTypeImpl::GetType()) - return DType::TUint32; - else if (type == DataTypeImpl::GetType()) - return DType::TUint64; - else if (type == DataTypeImpl::GetType()) - return DType::TInt64; - else - ORT_NOT_IMPLEMENTED("Unsupport MLType to c type."); -} - } // namespace onnxruntime diff --git a/onnxruntime/core/framework/func_kernel.h b/onnxruntime/core/framework/func_kernel.h index 26347f3216cb2..c2401fcd33d52 100644 --- a/onnxruntime/core/framework/func_kernel.h +++ b/onnxruntime/core/framework/func_kernel.h @@ -1,15 +1,17 @@ #pragma once #include "core/framework/op_kernel.h" #include "core/framework/func_api.h" +#include "core/framework/op_kernel_context_internal.h" #include "core/graph/function.h" + +const OrtCustomOpApi& GetCustomOpApi(); + namespace onnxruntime { void* allocate_helper_func(void* allocator, size_t alignment, size_t size); void release_helper_func(void* allocator, void* p); -DType ORT_type_to_c_type(MLDataType type); - //A kernel that wrapper the ComputeFunction call generated by execution provider when fuse the sub-graph class FunctionKernel : public OpKernel { public: @@ -30,46 +32,17 @@ class FunctionKernel : public OpKernel { } } - virtual ~FunctionKernel() { + ~FunctionKernel() override { if (release_func_ && func_state_) { release_func_(func_state_); } } virtual Status Compute(OpKernelContext* context) const override { - std::vector input_tensors; - for (int i = 0; i < num_inputs_; i++) { - const Tensor* input = context->Input(i); - auto& shape = 
input->Shape(); - auto& dims = shape.GetDims(); - ONNXRunTimeTensor input_tensor = { - const_cast(input->DataRaw()), - shape.NumDimensions(), - //hard code to double now - ORT_type_to_c_type(input->DataType()), - dims.empty() ? nullptr : const_cast(&dims[0])}; - input_tensors.push_back(input_tensor); - } - - std::vector output_tensors(num_outputs_); - int ret = func_(func_state_, input_tensors.empty() ? nullptr : &input_tensors[0], input_tensors.size(), &output_tensors[0], output_tensors.size()); + auto* context_internal = static_cast(context); + int ret = func_(func_state_, &GetCustomOpApi(), reinterpret_cast(context_internal)); if (ret != 0) return Status(common::ONNXRUNTIME, common::FAIL, "FuncKernel call failed with error code: " + std::to_string(ret)); - - for (int i = 0; i < num_outputs_; i++) { - TensorShape output_shape(std::vector(output_tensors[i].shape, output_tensors[i].shape + output_tensors[i].ndim)); - Tensor* output = context->Output(i, output_shape); - auto data = output->MutableDataRaw(); - //TODO: for string tensors, this copy is not correct. - ORT_ENFORCE(output->DataType() != DataTypeImpl::GetType()); - memcpy(data, output_tensors[i].data, output->DataType()->Size() * output_shape.Size()); - //Release output tensors (buffer, shape). - host_allocator_->Free(output_tensors[i].data); - // for shape, becauset the TempSpaceAllocator we use could be a device allocator, if the kernel is assigned to a device like gpu. - // so we prefer to directly allocate shape on heap. otherwise we need pass in multile allocator function for host and device. - delete[] output_tensors[i].shape; - } - return Status::OK(); } diff --git a/onnxruntime/core/framework/fuse_nodes_funcs.cc b/onnxruntime/core/framework/fuse_nodes_funcs.cc index c4fbdc9ab14b0..7dfbe71545cda 100644 --- a/onnxruntime/core/framework/fuse_nodes_funcs.cc +++ b/onnxruntime/core/framework/fuse_nodes_funcs.cc @@ -40,8 +40,8 @@ Status FuncManager::GetFuncs(const std::string& name, ComputeFunc* compute, Crea ORT_RETURN_IF_ERROR(Env::Default().GetSymbolFromLibrary(handle, kReleaseStateFuncSymbol + name, &release_func_symbol_handle)); - it->second.compute_func = [=](FunctionState state, ONNXRunTimeTensor* input, size_t n_input, ONNXRunTimeTensor* output, size_t n_output) { - return reinterpret_cast(compute_func_symbol_handle)(state, input, n_input, output, n_output); + it->second.compute_func = [=](FunctionState state, const OrtCustomOpApi* api, OrtKernelContext* context) { + return reinterpret_cast(compute_func_symbol_handle)(state, api, context); }; it->second.create_state_func = [=](ComputeContext* context, FunctionState* state) { diff --git a/onnxruntime/core/framework/graph_partitioner.cc b/onnxruntime/core/framework/graph_partitioner.cc index 426f9fbf1d29e..d4d250027cb1c 100644 --- a/onnxruntime/core/framework/graph_partitioner.cc +++ b/onnxruntime/core/framework/graph_partitioner.cc @@ -176,7 +176,7 @@ Status GraphPartitioner::Partition(Graph& graph, bool export_dll, FuncManager& f //prepare the func kernel KernelDefBuilder builder; BuildFusedKernelDef(builder, *node); - if (node->GetExecutionProviderType() == onnxruntime::kTensorrtExecutionProvider) { + if (node->GetExecutionProviderType() == onnxruntime::kTensorrtExecutionProvider || node->GetExecutionProviderType() == onnxruntime::kNGraphExecutionProvider) { builder.SetDefaultInputsMemoryType(OrtMemTypeCPUInput); builder.SetDefaultOutputMemoryType(OrtMemTypeCPUOutput); } diff --git a/onnxruntime/core/framework/iexecutor.h b/onnxruntime/core/framework/iexecutor.h index 
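// --- Illustrative aside (not part of the patch): the FuncManager/FunctionKernel changes above move
// the fused-kernel compute callback to a signature that receives an API table and a kernel context,
// and the symbol looked up from the shared library is wrapped into a std::function via a
// reinterpret_cast, as dlsym-style lookups require. A minimal sketch of that wrapping with opaque
// stand-in types instead of the real FunctionState/OrtCustomOpApi/OrtKernelContext.
#include <functional>

namespace compute_func_sketch {

struct OpaqueState;
struct OpaqueApi;
struct OpaqueContext;

using ComputeFn = int (*)(OpaqueState*, const OpaqueApi*, OpaqueContext*);

// Wrap a symbol obtained from a shared library (and therefore typed as void*) into a callable with
// the typed compute signature.
inline std::function<int(OpaqueState*, const OpaqueApi*, OpaqueContext*)>
WrapComputeSymbol(void* symbol) {
  ComputeFn fn = reinterpret_cast<ComputeFn>(symbol);
  return [fn](OpaqueState* state, const OpaqueApi* api, OpaqueContext* ctx) {
    return fn(state, api, ctx);
  };
}

}  // namespace compute_func_sketch
// --- end of aside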
86398edd2e603..c06b878c8bcbb 100644 --- a/onnxruntime/core/framework/iexecutor.h +++ b/onnxruntime/core/framework/iexecutor.h @@ -10,9 +10,8 @@ #include "core/framework/framework_common.h" #include "core/framework/ml_value.h" +struct OrtValue; namespace onnxruntime { - -class MLValue; class SessionState; class TensorShape; namespace logging { @@ -21,7 +20,7 @@ class Logger; class IExecutor { public: - using CustomAllocator = std::function; + using CustomAllocator = std::function; virtual ~IExecutor() = default; @@ -30,20 +29,20 @@ class IExecutor { */ common::Status Execute(const SessionState& session_state, const std::vector& feed_mlvalue_idxs, - const std::vector& feeds, + const std::vector& feeds, const std::vector& fetch_mlvalue_idxs, - std::vector& fetches, + std::vector& fetches, const logging::Logger& logger) { - return Execute(session_state, feed_mlvalue_idxs, feeds, fetch_mlvalue_idxs, fetches, {}, logger); + std::unordered_map fetch_allocators; + return Execute(session_state, feed_mlvalue_idxs, feeds, fetch_mlvalue_idxs, fetches, fetch_allocators, logger); } - virtual common::Status Execute(const SessionState& session_state, - const std::vector& feed_mlvalue_idxs, - const std::vector& feeds, - const std::vector& fetch_mlvalue_idxs, - std::vector& fetches, + // TODO: as fetch_allocators is optional, it should be a pointer instead of reference + virtual common::Status Execute(const SessionState& session_state, const std::vector& feed_mlvalue_idxs, + const std::vector& feeds, const std::vector& fetch_mlvalue_idxs, + std::vector& fetches, // optional custom allocators. key is index in fetches - const std::unordered_map fetch_allocators, + const std::unordered_map& fetch_allocators, const logging::Logger& logger) = 0; }; } // namespace onnxruntime diff --git a/onnxruntime/core/framework/kernel_def_builder.cc b/onnxruntime/core/framework/kernel_def_builder.cc index 12d0fd936897c..724d43c3a0f5f 100644 --- a/onnxruntime/core/framework/kernel_def_builder.cc +++ b/onnxruntime/core/framework/kernel_def_builder.cc @@ -27,7 +27,8 @@ inline bool AreVectorsOverlap(const std::vector& v1, const std::vector& v2 bool KernelDef::IsConflict(const KernelDef& other) const { if (op_name_ != other.OpName() || provider_type_ != other.Provider()) return false; - int start = 0, end = 0; + int start = 0; + int end = 0; other.SinceVersion(&start, &end); if (!AreIntervalsOverlap(op_since_version_start_, op_since_version_end_, start, end)) return false; diff --git a/onnxruntime/core/framework/kernel_registry.cc b/onnxruntime/core/framework/kernel_registry.cc index 943fde7fd69f3..717d41389f136 100644 --- a/onnxruntime/core/framework/kernel_registry.cc +++ b/onnxruntime/core/framework/kernel_registry.cc @@ -47,7 +47,7 @@ void TraverseFormalParametersWithTypeProto(const Node& node, } // process outputs: - auto& actual_outputs = node.OutputDefs(); + auto actual_outputs = node.OutputDefs(); const auto num_actual_outputs = actual_outputs.size(); const auto last_formal = op_schema.outputs().size() - 1; for (size_t i = 0; i != num_actual_outputs; ++i) { @@ -142,7 +142,8 @@ bool KernelRegistry::VerifyKernelDef(const onnxruntime::Node& node, } // check if version matches - int kernel_start_version, kernel_end_version; + int kernel_start_version; + int kernel_end_version; kernel_def.SinceVersion(&kernel_start_version, &kernel_end_version); int node_since_version = node.Op()->since_version(); @@ -240,8 +241,8 @@ Status KernelRegistry::Register(KernelCreateInfo&& create_info) { } Status KernelRegistry::TryCreateKernel(const 
onnxruntime::Node& node, const IExecutionProvider& execution_provider, - const std::unordered_map& initialized_tensors, - const MLValueNameIdxMap& mlvalue_name_idx_map, const FuncManager& funcs_mgr, + const std::unordered_map& initialized_tensors, + const MLValueNameIdxMap& ort_value_name_idx_map, const FuncManager& funcs_mgr, /*out*/ std::unique_ptr& op_kernel) const { const KernelCreateInfo* kernel_create_info = TryFindKernel(node, execution_provider.Type()); @@ -249,12 +250,8 @@ Status KernelRegistry::TryCreateKernel(const onnxruntime::Node& node, const IExe return Status(ONNXRUNTIME, FAIL, "Failed to find kernel for " + node.OpType()); } - OpKernelInfo kernel_info(node, - *kernel_create_info->kernel_def, - execution_provider, - initialized_tensors, - mlvalue_name_idx_map, - funcs_mgr); + OpKernelInfo kernel_info(node, *kernel_create_info->kernel_def, execution_provider, initialized_tensors, + ort_value_name_idx_map, funcs_mgr); op_kernel.reset(kernel_create_info->kernel_create_func(kernel_info)); return Status::OK(); } diff --git a/onnxruntime/core/framework/mem_pattern_planner.h b/onnxruntime/core/framework/mem_pattern_planner.h index a71b82bf40e98..4288bd553e6d5 100644 --- a/onnxruntime/core/framework/mem_pattern_planner.h +++ b/onnxruntime/core/framework/mem_pattern_planner.h @@ -16,19 +16,24 @@ limitations under the License. // Licensed under the MIT License. #pragma once +#include + #include "core/framework/mem_pattern.h" #include "core/framework/allocation_planner.h" -#include +#include "core/platform/ort_mutex.h" namespace onnxruntime { // MemPatternPlanner is used to trace allocation/free steps // in a single iteration, record the pattern and cached for // future request if they have the same input shape. +// Thread-safe. class MemPatternPlanner { public: MemPatternPlanner() = default; void TraceAllocation(int ml_value_idx, size_t size) { + std::lock_guard lock(lock_); + if (size == 0) { allocs_.emplace_back(ml_value_idx, MemoryBlock(0, 0)); return; @@ -61,6 +66,8 @@ class MemPatternPlanner { } void TraceFree(int ml_value_index) { + std::lock_guard lock(lock_); + for (auto it = blocks_.begin(); it != blocks_.end(); it++) { if (allocs_[*it].index_ == ml_value_index) { blocks_.erase(it); @@ -70,6 +77,8 @@ class MemPatternPlanner { } MemoryPattern GenerateMemPattern() const { + std::lock_guard lock(lock_); + MemoryPattern pattern; pattern.peak_size_ = buffer_size; for (auto& alloc : allocs_) { @@ -92,6 +101,7 @@ class MemPatternPlanner { // blocks_ the list of currently allocated memory blocks, sorted in order of their offset std::list blocks_; size_t buffer_size{0}; + mutable OrtMutex lock_; }; } // namespace onnxruntime diff --git a/onnxruntime/core/framework/ml_value_pattern_planner.cc b/onnxruntime/core/framework/ml_value_pattern_planner.cc index f5da571f5d6f4..f171e60058e8b 100644 --- a/onnxruntime/core/framework/ml_value_pattern_planner.cc +++ b/onnxruntime/core/framework/ml_value_pattern_planner.cc @@ -3,19 +3,47 @@ #include #include "core/framework/ml_value_patterns_planner.h" -#include "core/framework/sequential_execution_plan.h" +#include "core/framework/execution_plan_base.h" namespace onnxruntime { -MLValuePatternPlanner::MLValuePatternPlanner(const SequentialExecutionPlan& execution_plan) +MLValuePatternPlanner::MLValuePatternPlanner(const ExecutionPlanBase& execution_plan) : execution_planner_{execution_plan} { - std::set locations; - for (auto& alloc_plan : execution_planner_.allocation_plan) { - if (locations.find(alloc_plan.location) == locations.end()) - 
locations.insert(alloc_plan.location); + for (auto& location : execution_plan.GetAllLocations()) { + planner_map_.emplace(location, std::make_unique()); } - for (auto& location : locations) { - pattern_planners_.push_back(std::make_unique()); - planner_map_[location] = pattern_planners_.back().get(); +} + +common::Status MLValuePatternPlanner::TraceAllocation(int ml_value_idx, size_t size) { + auto location = execution_planner_.GetLocation(ml_value_idx); + auto it = planner_map_.find(location); + if (it == planner_map_.end()) { + return common::Status(common::ONNXRUNTIME, common::INVALID_ARGUMENT); } + + it->second->TraceAllocation(ml_value_idx, size); + return common::Status::OK(); +} + +common::Status MLValuePatternPlanner::TraceFree(int ml_value_index) { + auto location = execution_planner_.GetLocation(ml_value_index); + auto it = planner_map_.find(location); + if (it == planner_map_.end()) { + return common::Status(common::ONNXRUNTIME, common::INVALID_ARGUMENT); + } + + it->second->TraceFree(ml_value_index); + return common::Status::OK(); } + +common::Status MLValuePatternPlanner::GeneratePatterns(MemoryPatternGroup* out) { + if (!out) return common::Status(common::ONNXRUNTIME, common::INVALID_ARGUMENT); + + for (auto& it : planner_map_) { + out->locations.push_back(it.first); + out->patterns.push_back(it.second->GenerateMemPattern()); + } + + return common::Status::OK(); +} + } // namespace onnxruntime diff --git a/onnxruntime/core/framework/ml_value_patterns_planner.h b/onnxruntime/core/framework/ml_value_patterns_planner.h index 4d650b46f9812..271ad5d76e69f 100644 --- a/onnxruntime/core/framework/ml_value_patterns_planner.h +++ b/onnxruntime/core/framework/ml_value_patterns_planner.h @@ -6,60 +6,27 @@ #include #include "core/common/common.h" -#include "core/platform/ort_mutex.h" #include "core/framework/mem_pattern_planner.h" +#include "core/framework/execution_plan_base.h" #include "core/framework/allocation_planner.h" namespace onnxruntime { -struct SequentialExecutionPlan; +class ExecutionPlanBase; +// Thread-safe +// As it doesn't always work, the usage of it must be guarded by +// SessionOptions.enable_mem_pattern class MLValuePatternPlanner { public: - explicit MLValuePatternPlanner(const SequentialExecutionPlan& execution_plan); - - common::Status TraceAllocation(int ml_value_idx, size_t size) { - auto location = execution_planner_.allocation_plan[ml_value_idx].location; - auto it = planner_map_.find(location); - if (it == planner_map_.end()) { - return common::Status(common::ONNXRUNTIME, common::INVALID_ARGUMENT); - } - - std::lock_guard lock(lock_); - it->second->TraceAllocation(ml_value_idx, size); - return common::Status::OK(); - } - - common::Status TraceFree(int ml_value_index) { - auto location = execution_planner_.allocation_plan[ml_value_index].location; - auto it = planner_map_.find(location); - if (it == planner_map_.end()) { - return common::Status(common::ONNXRUNTIME, common::INVALID_ARGUMENT); - } - - std::lock_guard lock(lock_); - it->second->TraceFree(ml_value_index); - return common::Status::OK(); - } - - common::Status GeneratePatterns(MemoryPatternGroup* out) { - if (!out) - return common::Status(common::ONNXRUNTIME, common::INVALID_ARGUMENT); - - std::lock_guard lock(lock_); - for (auto& it : planner_map_) { - out->locations.push_back(it.first); - out->patterns.push_back(it.second->GenerateMemPattern()); - } - - return common::Status::OK(); - } - - private: + explicit MLValuePatternPlanner(const ExecutionPlanBase& execution_plan); + common::Status 
TraceAllocation(int ml_value_idx, size_t size); + common::Status TraceFree(int ml_value_index); + common::Status GeneratePatterns(MemoryPatternGroup* out); ORT_DISALLOW_COPY_ASSIGNMENT_AND_MOVE(MLValuePatternPlanner); - mutable OrtMutex lock_; - std::map planner_map_; - std::vector > pattern_planners_; - const SequentialExecutionPlan& execution_planner_; + private: + // This map itself is const after the construction + std::map> planner_map_; + const ExecutionPlanBase& execution_planner_; }; } // namespace onnxruntime diff --git a/onnxruntime/core/framework/mlvalue_name_idx_map.h b/onnxruntime/core/framework/mlvalue_name_idx_map.h deleted file mode 100644 index 1da32369484bc..0000000000000 --- a/onnxruntime/core/framework/mlvalue_name_idx_map.h +++ /dev/null @@ -1,58 +0,0 @@ -// Copyright (c) Microsoft Corporation. All rights reserved. -// Licensed under the MIT License. - -#pragma once - -#include -#include - -#include "core/common/common.h" - -//This class is not thread-safe -//TODO: this is a static hash lookup, it's easy to do it better -namespace onnxruntime { -class MLValueNameIdxMap { - public: - using const_iterator = typename std::unordered_map::const_iterator; - - MLValueNameIdxMap() = default; - - // Add MLValue name to map and return index associated with it. - // If entry already existed the existing index value is returned. - int Add(const std::string& name) { - auto it = map_.find(name); - if (it == map_.end()) { - int idx; - idx = mlvalue_max_idx_++; - map_.insert(it, {name, idx}); - return idx; - } - return it->second; - } - - common::Status GetIdx(const std::string& name, int& idx) const { - idx = -1; - - auto it = map_.find(name); - if (it == map_.end()) { - return ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "Could not find MLValue with name '", name, "'"); - } - - idx = it->second; - return common::Status::OK(); - } - - size_t Size() const { return map_.size(); }; - int MaxIdx() const { return mlvalue_max_idx_; } - - const_iterator begin() const noexcept { return map_.cbegin(); } - const_iterator end() const noexcept { return map_.cend(); } - - private: - ORT_DISALLOW_COPY_ASSIGNMENT_AND_MOVE(MLValueNameIdxMap); - - int mlvalue_max_idx_ = 0; - std::unordered_map map_; -}; - -} // namespace onnxruntime diff --git a/onnxruntime/core/framework/mlvalue_tensor_slicer.cc b/onnxruntime/core/framework/mlvalue_tensor_slicer.cc index dd5e1460cf2e5..e60d67df03060 100644 --- a/onnxruntime/core/framework/mlvalue_tensor_slicer.cc +++ b/onnxruntime/core/framework/mlvalue_tensor_slicer.cc @@ -1,37 +1,37 @@ // Copyright (c) Microsoft Corporation. All rights reserved. // Licensed under the MIT License. -#include "core/framework/mlvalue_tensor_slicer.h" +#include "core/framework/ort_value_tensor_slicer.h" #include namespace onnxruntime { template -MLValueTensorSlicer MLValueTensorSlicer::Create(T& mlvalue, int64_t slice_dimension, int64_t dim0_offset) { - static_assert(std::is_same, MLValue>::value, - "MLValueTensorSlicer can only be used with 'MLValue' or 'const MLValue'"); +MLValueTensorSlicer MLValueTensorSlicer::Create(T& ort_value, int64_t slice_dimension, int64_t dim0_offset) { + static_assert(std::is_same, OrtValue>::value, + "MLValueTensorSlicer can only be used with 'OrtValue' or 'const OrtValue'"); - ORT_ENFORCE(mlvalue.IsTensor(), "Can't slice a non-tensor MLValue. Type was ", mlvalue.Type()); - ORT_ENFORCE(mlvalue.IsAllocated(), "MLValue has not been allocated so can't be sliced."); + ORT_ENFORCE(ort_value.IsTensor(), "Can't slice a non-tensor OrtValue. 
Type was ", ort_value.Type()); + ORT_ENFORCE(ort_value.IsAllocated(), "OrtValue has not been allocated so can't be sliced."); - auto& tensor_shape{mlvalue.template Get().Shape()}; + auto& tensor_shape{ort_value.template Get().Shape()}; ORT_ENFORCE(gsl::narrow_cast(tensor_shape.NumDimensions()) >= slice_dimension, "Insufficient dimensions to slice on ", slice_dimension, ". Shape:", tensor_shape); auto dim0_size = tensor_shape[0]; ORT_ENFORCE(dim0_offset < dim0_size, "Invalid dim0_offset of ", dim0_offset, ". Dimension 0 is ", dim0_size); - return MLValueTensorSlicer{mlvalue, slice_dimension, dim0_offset}; + return MLValueTensorSlicer{ort_value, slice_dimension, dim0_offset}; }; template -MLValueTensorSlicer::Iterator::Iterator(T& mlvalue, size_t slice_dimension, size_t dim0_offset, - int64_t position, Direction direction) - : mlvalue_{&mlvalue}, +MLValueTensorSlicer::Iterator::Iterator(T& ort_value, size_t slice_dimension, size_t dim0_offset, int64_t position, + Direction direction) + : ort_value_{&ort_value}, position_{position}, increment_by_{direction == Direction::kForward ? 1 : -1}, position_materialized_{-1} { - const auto& tensor = mlvalue.template Get(); + const auto& tensor = ort_value.template Get(); tensor_data_type_ = tensor.DataType(); tensor_location_ = &tensor.Location(); @@ -71,28 +71,27 @@ void MLValueTensorSlicer::Iterator::MaterializeMLValue() const { position_materialized_ = position_; const void* tensor_slice_data_raw = static_cast(tensor_data_raw_) + (position_ * per_iteration_offset_); - // create a sub Tensor for the current position, and put it in an MLValue. + // create a sub Tensor for the current position, and put it in an OrtValue. // // We need the non-const data pointer from the Tensor in order to create the sub-Tensors as we iterate, // so a const_cast is required. - // However we will only return a non-const MLValue from operator* if MLValueTensorSlicer was created with - // a non-const MLValue, so externally we maintain constness as expected. + // However we will only return a non-const OrtValue from operator* if MLValueTensorSlicer was created with + // a non-const OrtValue, so externally we maintain constness as expected. // // TODO: Ideally we could avoid the overhead of creating a new Tensor but that would require // a lot more complexity (re-consider how ExecutionFrame and OpKernelContext work and whether - // they need to be MLValue based, or whether they could be Tensor based). + // they need to be OrtValue based, or whether they could be Tensor based). // Potential future performance enhancement. auto sub_tensor = std::make_unique(tensor_data_type_, per_iteration_shape_, const_cast(tensor_slice_data_raw), *tensor_location_); - current_ = MLValue{sub_tensor.release(), - DataTypeImpl::GetType(), - DataTypeImpl::GetType()->GetDeleteFunc()}; + current_ = + OrtValue{sub_tensor.release(), DataTypeImpl::GetType(), DataTypeImpl::GetType()->GetDeleteFunc()}; } -template class MLValueTensorSlicer; -template class MLValueTensorSlicer; +template class MLValueTensorSlicer; +template class MLValueTensorSlicer; } // namespace onnxruntime diff --git a/onnxruntime/core/framework/mlvalue_tensor_slicer.h b/onnxruntime/core/framework/mlvalue_tensor_slicer.h deleted file mode 100644 index 92e1ba6fb1a4b..0000000000000 --- a/onnxruntime/core/framework/mlvalue_tensor_slicer.h +++ /dev/null @@ -1,148 +0,0 @@ -// Copyright (c) Microsoft Corporation. All rights reserved. -// Licensed under the MIT License. 
- -#pragma once - -#include -#include -#include - -#include "core/common/common.h" -#include "core/framework/ml_value.h" -#include "core/framework/tensor.h" - -namespace onnxruntime { - -/** -Class to provide a slicing service over a Tensor stored within an MLValue with shape -{batch size, sequence length, }. Access to the slices is via an iterator interface. - -For each iteration an MLValue will be returned containing a sub-Tensor of the original Tensor. -The sub-Tensor applies the relevant offset to the data address from the original Tensor in order -to avoid any memory allocations/copies for the tensor data. -*/ -template -class MLValueTensorSlicer { - public: - /** - Create a new instance to slice the Tensor contained in an MLValue - into sub-Tensors contained within new MLValue instances that are accessed via the Iterator. - T must be 'MLValue' or 'const MLValue' - @param slice_dimension Dimension to slice on. - @param dim0_offset Offset to start at. Only meaningful if slice_dimension != 0. - e.g. if input is [batch, seq_len, data] and you want to slice the seq_len dimension, you need to - create an Iterator instance for each batch item, incrementing dim0_offset for each one. - */ - static MLValueTensorSlicer Create(T& mlvalue, int64_t slice_dimension = 0, int64_t dim0_offset = 0); - - class Iterator { - public: - using iterator_category = std::input_iterator_tag; - using value_type = T; - using difference_type = ptrdiff_t; - using pointer = T*; - using reference = T&; - using const_reference = std::add_const_t; - - enum class Direction { kForward, - kReverse }; - - explicit Iterator(T& mlvalue, size_t slice_dimension, size_t dim0_offset, - int64_t position, Direction direction = Direction::kForward); - - bool operator==(const Iterator& other) const noexcept { - return mlvalue_ == other.mlvalue_ && position_ == other.position_; - } - - bool operator!=(const Iterator& other) const noexcept { - return !(*this == other); - } - - Iterator& operator++() { - position_ += increment_by_; - return *this; - } - - Iterator operator++(int) { - Iterator tmp{*this}; - ++(*this); - - return tmp; - } - - Iterator& operator+=(difference_type n) { - position_ += increment_by_ * n; - return *this; - } - - // const accessor is always enabled - const_reference operator*() const { - ORT_ENFORCE(position_ >= 0 && position_ < sequence_length_); - if (position_ != position_materialized_) { - MaterializeMLValue(); - } - - return current_; - } - - // non-const is only enabled if T is not const (i.e. is 'MLValue' not 'const MLValue') - std::enable_if_t::value, reference> operator*() { - ORT_ENFORCE(position_ >= 0 && position_ < sequence_length_); - if (position_ != position_materialized_) { - MaterializeMLValue(); - } - - return current_; - } - - private: - void MaterializeMLValue() const; - - T* mlvalue_; - int64_t position_; - - // 1 for forward, -1 for reverse - // Alternatively we could apply a std::reverse_iterator adapter to Iterator, however the primary use case - // for this class involves passing a mix of forward/reverse iterator instances in a single collection so - // we need to handle the direction internally so only one type is involved in that collection. 
- const int64_t increment_by_; - - const void* tensor_data_raw_; - MLDataType tensor_data_type_; - const OrtAllocatorInfo* tensor_location_; - - int64_t sequence_length_; - TensorShape per_iteration_shape_; - size_t per_iteration_offset_; - - mutable int64_t position_materialized_; // position_ when current_ was created - mutable MLValue current_; - }; - - Iterator begin() const noexcept { return Iterator(*mlvalue_, slice_dimension_, dim0_offset_, 0); } - Iterator end() const noexcept { return Iterator(*mlvalue_, slice_dimension_, dim0_offset_, - std::numeric_limits::max()); } - - Iterator rbegin() const noexcept { - return Iterator(*mlvalue_, slice_dimension_, dim0_offset_, std::numeric_limits::max(), - Iterator::Direction::kReverse); - } - - Iterator rend() const noexcept { - return Iterator(*mlvalue_, slice_dimension_, dim0_offset_, -1, - Iterator::Direction::kReverse); - } - - private: - MLValueTensorSlicer(T& mlvalue, int64_t slice_dimension, int64_t dim0_offset) noexcept - : mlvalue_{&mlvalue}, - slice_dimension_{slice_dimension}, - dim0_offset_{dim0_offset} { - } - - T* mlvalue_; - int64_t slice_dimension_; - int64_t dim0_offset_; -}; - -} // namespace onnxruntime diff --git a/onnxruntime/core/framework/node_index_info.cc b/onnxruntime/core/framework/node_index_info.cc index c7d29058cc003..050bbde5f1024 100644 --- a/onnxruntime/core/framework/node_index_info.cc +++ b/onnxruntime/core/framework/node_index_info.cc @@ -3,26 +3,26 @@ #include "core/framework/node_index_info.h" -#include "core/framework/mlvalue_name_idx_map.h" +#include "core/framework/ort_value_name_idx_map.h" #include "core/graph/graph_viewer.h" #include "core/graph/node_arg.h" namespace onnxruntime { // if we have a full GraphViewer, assume the min node index is 0 -NodeIndexInfo::NodeIndexInfo(const GraphViewer& graph_viewer, const MLValueNameIdxMap& mlvalue_idx_map) - : min_node_index_{0}, max_mlvalue_idx_{mlvalue_idx_map.MaxIdx()} { - Init(graph_viewer.Nodes(), graph_viewer.MaxNodeIndex(), mlvalue_idx_map); +NodeIndexInfo::NodeIndexInfo(const GraphViewer& graph_viewer, const MLValueNameIdxMap& ort_value_idx_map) + : min_node_index_{0}, max_mlvalue_idx_{ort_value_idx_map.MaxIdx()} { + Init(graph_viewer.Nodes(), graph_viewer.MaxNodeIndex(), ort_value_idx_map); } -NodeIndexInfo::NodeIndexInfo(const GraphNodes& nodes, const MLValueNameIdxMap& mlvalue_idx_map) - : max_mlvalue_idx_{mlvalue_idx_map.MaxIdx()} { - Init(nodes, 0, mlvalue_idx_map); +NodeIndexInfo::NodeIndexInfo(const GraphNodes& nodes, const MLValueNameIdxMap& ort_value_idx_map) + : max_mlvalue_idx_{ort_value_idx_map.MaxIdx()} { + Init(nodes, 0, ort_value_idx_map); } -NodeIndexInfo::NodeIndexInfo(const std::vector& nodes, const MLValueNameIdxMap& mlvalue_idx_map) - : max_mlvalue_idx_{mlvalue_idx_map.MaxIdx()} { - Init(ValidNodes>(nodes), 0, mlvalue_idx_map); +NodeIndexInfo::NodeIndexInfo(const std::vector& nodes, const MLValueNameIdxMap& ort_value_idx_map) + : max_mlvalue_idx_{ort_value_idx_map.MaxIdx()} { + Init(ValidNodes>(nodes), 0, ort_value_idx_map); } template @@ -43,7 +43,8 @@ static void FindMinAndMaxNodeIndex(const TValidNodes& nodes, NodeIndex& min, Nod } template -void NodeIndexInfo::Init(const TValidNodes& nodes, NodeIndex max_node_index, const MLValueNameIdxMap& mlvalue_idx_map) { +void NodeIndexInfo::Init(const TValidNodes& nodes, NodeIndex max_node_index, + const MLValueNameIdxMap& ort_value_idx_map) { if (nodes.empty()) { // fairly stupid edge case to handle unit test for Constant. 
the Constant node becomes an initializer, leaving // the graph with no nodes. @@ -78,7 +79,7 @@ void NodeIndexInfo::Init(const TValidNodes& nodes, NodeIndex max_node_index, con auto& name = node_arg.Name(); if (node_arg.Exists()) { int index; - Status status = mlvalue_idx_map.GetIdx(name, index); + Status status = ort_value_idx_map.GetIdx(name, index); ORT_ENFORCE(status.IsOK(), status.ErrorMessage()); node_values_[cur_idx] = index; } diff --git a/onnxruntime/core/framework/node_index_info.h b/onnxruntime/core/framework/node_index_info.h index fbd0d9100cd69..2b954b13d0b70 100644 --- a/onnxruntime/core/framework/node_index_info.h +++ b/onnxruntime/core/framework/node_index_info.h @@ -17,11 +17,11 @@ class Node; class NodeIndexInfo final { public: // construct from a GraphViewer. - NodeIndexInfo(const GraphViewer& graph_viewer, const MLValueNameIdxMap& mlvalue_idx_map); + NodeIndexInfo(const GraphViewer& graph_viewer, const MLValueNameIdxMap& ort_value_idx_map); // construct from a subset of nodes. The min and max NodeIndex values will be calculated by iterating 'nodes'. - NodeIndexInfo(const GraphNodes& nodes, const MLValueNameIdxMap& mlvalue_idx_map); - NodeIndexInfo(const std::vector& nodes, const MLValueNameIdxMap& mlvalue_idx_map); + NodeIndexInfo(const GraphNodes& nodes, const MLValueNameIdxMap& ort_value_idx_map); + NodeIndexInfo(const std::vector& nodes, const MLValueNameIdxMap& ort_value_idx_map); enum { kInvalidEntry = -1 }; @@ -35,7 +35,7 @@ class NodeIndexInfo final { return node_offsets_[node_offsets_index]; } - // Get the mlvalue index value. + // Get the ort_value index value. // Returns kInvalidEntry for optional inputs/outputs that do not exist in this graph. int GetMLValueIndex(int offset) const { ORT_ENFORCE(offset >= 0 && static_cast(offset) < node_values_.size()); @@ -48,7 +48,7 @@ class NodeIndexInfo final { ORT_DISALLOW_COPY_ASSIGNMENT_AND_MOVE(NodeIndexInfo); template - void Init(const TValidNodes& nodes, NodeIndex max_node_index, const MLValueNameIdxMap& mlvalue_idx_map); + void Init(const TValidNodes& nodes, NodeIndex max_node_index, const MLValueNameIdxMap& ort_value_idx_map); // This vector contains the indices from the MLValueNameIdxMap in the SessionState for each Node's input/outputs. // Order is node inputs, implicit inputs, outputs. diff --git a/onnxruntime/core/framework/op_kernel.cc b/onnxruntime/core/framework/op_kernel.cc index dfd7489a2d294..eea0f6f53f577 100644 --- a/onnxruntime/core/framework/op_kernel.cc +++ b/onnxruntime/core/framework/op_kernel.cc @@ -28,7 +28,7 @@ Tensor* OpKernelContext::Output(int index, const TensorShape& shape) { return p_ml_value ? p_ml_value->GetMutable() : nullptr; } -MLValue* OpKernelContext::OutputMLValue(int index, const TensorShape& shape) { +OrtValue* OpKernelContext::OutputMLValue(int index, const TensorShape& shape) { if (index < 0 || index >= OutputCount()) return nullptr; @@ -36,7 +36,7 @@ MLValue* OpKernelContext::OutputMLValue(int index, const TensorShape& shape) { //"error: 'ret' may be used uninitialized in this function" //This warning only exists in Release build. //I believe it's a false alarm. 
- MLValue* p_ml_value = nullptr; + OrtValue* p_ml_value = nullptr; Status status = execution_frame_->GetOrCreateNodeOutputMLValue(GetOutputArgIndex(index), &shape, p_ml_value); ORT_ENFORCE(status.IsOK(), status.ErrorMessage()); return p_ml_value; @@ -59,13 +59,13 @@ Status OpKernelContext::GetTempSpaceAllocator(AllocatorPtr* output) const { MLDataType OpKernelContext::InputType(int index) const { int input_arg_index = GetInputArgIndex(index); - const MLValue* p_ml_value = execution_frame_->GetNodeInputOrOutputMLValue(input_arg_index); + const OrtValue* p_ml_value = execution_frame_->GetNodeInputOrOutputMLValue(input_arg_index); return p_ml_value ? p_ml_value->Type() : nullptr; } MLDataType OpKernelContext::OutputType(int index) const { auto output_arg_index = GetOutputArgIndex(index); - const MLValue* p_ml_value = execution_frame_->GetNodeInputOrOutputMLValue(output_arg_index); + const OrtValue* p_ml_value = execution_frame_->GetNodeInputOrOutputMLValue(output_arg_index); return p_ml_value ? p_ml_value->Type() : nullptr; } @@ -74,7 +74,7 @@ Fence_t OpKernelContext::InputFence(int index) const { return nullptr; int input_index = GetInputArgIndex(index); - const MLValue* p_ml_value = execution_frame_->GetNodeInputOrOutputMLValue(input_index); + const OrtValue* p_ml_value = execution_frame_->GetNodeInputOrOutputMLValue(input_index); return p_ml_value ? p_ml_value->Fence() : nullptr; } @@ -83,7 +83,7 @@ Fence_t OpKernelContext::ImplicitInputFence(int index) const { return nullptr; int input_index = GetImplicitInputArgIndex(index); - const MLValue* p_ml_value = execution_frame_->GetNodeInputOrOutputMLValue(input_index); + const OrtValue* p_ml_value = execution_frame_->GetNodeInputOrOutputMLValue(input_index); return p_ml_value ? p_ml_value->Fence() : nullptr; } @@ -92,11 +92,11 @@ Fence_t OpKernelContext::OutputFence(int index) const { return nullptr; auto output_arg_index = GetOutputArgIndex(index); - const MLValue* p_ml_value = execution_frame_->GetNodeInputOrOutputMLValue(output_arg_index); + const OrtValue* p_ml_value = execution_frame_->GetNodeInputOrOutputMLValue(output_arg_index); return p_ml_value ? 
p_ml_value->Fence() : nullptr; } -Status OpKernelContext::GetOrCreateOutputMLValue(int index, MLValue*& p_value) { +Status OpKernelContext::GetOrCreateOutputMLValue(int index, OrtValue*& p_value) { auto output_arg_index = GetOutputArgIndex(index); ORT_ENFORCE(execution_frame_->GetOrCreateNodeOutputMLValue(output_arg_index, nullptr, p_value).IsOK()); return Status::OK(); @@ -118,7 +118,7 @@ onnxruntime::NodeIndex OpKernelContext::GetNodeIndex() const { return kernel_->Node().Index(); } -const MLValue* OpKernelContext::GetInputMLValue(int index) const { +const OrtValue* OpKernelContext::GetInputMLValue(int index) const { if (index < 0 || index >= InputCount()) return nullptr; @@ -126,7 +126,7 @@ const MLValue* OpKernelContext::GetInputMLValue(int index) const { return execution_frame_->GetNodeInputOrOutputMLValue(input_arg_index); } -const MLValue* OpKernelContext::GetImplicitInputMLValue(int index) const { +const OrtValue* OpKernelContext::GetImplicitInputMLValue(int index) const { if (index < 0 || index >= ImplicitInputCount()) return nullptr; @@ -134,7 +134,7 @@ const MLValue* OpKernelContext::GetImplicitInputMLValue(int index) const { return execution_frame_->GetNodeInputOrOutputMLValue(input_arg_index); } -MLValue* OpKernelContext::GetOutputMLValue(int index) { +OrtValue* OpKernelContext::GetOutputMLValue(int index) { if (index < 0 || index >= OutputCount()) return nullptr; diff --git a/onnxruntime/core/framework/op_kernel_context_internal.h b/onnxruntime/core/framework/op_kernel_context_internal.h index c65fbd0e18701..02515ba39a160 100644 --- a/onnxruntime/core/framework/op_kernel_context_internal.h +++ b/onnxruntime/core/framework/op_kernel_context_internal.h @@ -19,7 +19,7 @@ class OpKernelContextInternal : public OpKernelContext { IExecutionFrame& frame, const OpKernel& kernel, const logging::Logger& logger, - const std::vector& implicit_inputs, + const ConstPointerContainer> implicit_inputs, const bool& terminate_flag) : OpKernelContext(&frame, &kernel, logger), session_state_{session_state}, @@ -31,22 +31,22 @@ class OpKernelContextInternal : public OpKernelContext { return session_state_.GetSubgraphSessionState(GetNodeIndex(), attribute_name); } - const MLValue* GetInputMLValue(int index) const { + const OrtValue* GetInputMLValue(int index) const { return OpKernelContext::GetInputMLValue(index); } - MLValue* GetOutputMLValue(int index) { + OrtValue* GetOutputMLValue(int index) { return OpKernelContext::GetOutputMLValue(index); } - MLValue* OutputMLValue(int index, const TensorShape& shape) { + OrtValue* OutputMLValue(int index, const TensorShape& shape) { return OpKernelContext::OutputMLValue(index, shape); } - std::unordered_map GetImplicitInputs() const { - // we need to convert implicit_inputs_ to a name to MLValue map so it can be used in the ExecutionFrame + std::unordered_map GetImplicitInputs() const { + // we need to convert implicit_inputs_ to a name to OrtValue map so it can be used in the ExecutionFrame // for a subgraph (the index numbers will be different there). 
- std::unordered_map implicit_inputs_map; + std::unordered_map implicit_inputs_map; for (int i = 0, end = gsl::narrow_cast(implicit_inputs_.size()); i < end; ++i) { implicit_inputs_map[implicit_inputs_[i]->Name()] = GetImplicitInputMLValue(i); @@ -61,7 +61,7 @@ class OpKernelContextInternal : public OpKernelContext { private: const SessionState& session_state_; - const std::vector& implicit_inputs_; + const ConstPointerContainer> implicit_inputs_; const bool& terminate_flag_; }; diff --git a/onnxruntime/core/framework/op_kernel_info.cc b/onnxruntime/core/framework/op_kernel_info.cc index b1ad1b72debff..358059cf0ed63 100644 --- a/onnxruntime/core/framework/op_kernel_info.cc +++ b/onnxruntime/core/framework/op_kernel_info.cc @@ -1,35 +1,29 @@ // Copyright (c) Microsoft Corporation. All rights reserved. // Licensed under the MIT License. -#include "core/framework/mlvalue_name_idx_map.h" +#include "core/framework/ort_value_name_idx_map.h" #include "core/framework/fuse_nodes_funcs.h" #include "core/framework/op_kernel.h" #include "core/framework/op_kernel_info.h" namespace onnxruntime { -OpKernelInfo::OpKernelInfo(const onnxruntime::Node& node, - const KernelDef& kernel_def, +OpKernelInfo::OpKernelInfo(const onnxruntime::Node& node, const KernelDef& kernel_def, const IExecutionProvider& execution_provider, - const std::unordered_map& initialized_tensors, - const MLValueNameIdxMap& mlvalue_name_idx_map, - const FuncManager& funcs_mgr) + const std::unordered_map& initialized_tensors, + const MLValueNameIdxMap& ort_value_name_idx_map, const FuncManager& funcs_mgr) : OpNodeProtoHelper(&proto_helper_context_), node_(node), kernel_def_(kernel_def), execution_provider_(&execution_provider), initialized_tensors_(initialized_tensors), - mlvalue_name_idx_map_(mlvalue_name_idx_map), + ort_value_name_idx_map_(ort_value_name_idx_map), funcs_mgr_(funcs_mgr), proto_helper_context_(node) {} OpKernelInfo::OpKernelInfo(const OpKernelInfo& other) - : OpKernelInfo(other.node_, - other.kernel_def_, - *other.execution_provider_, - other.initialized_tensors_, - other.mlvalue_name_idx_map_, - other.funcs_mgr_) {} + : OpKernelInfo(other.node_, other.kernel_def_, *other.execution_provider_, other.initialized_tensors_, + other.ort_value_name_idx_map_, other.funcs_mgr_) {} const OrtAllocatorInfo& OpKernelInfo::GetAllocatorInfo(int device_id, OrtMemType mem_type) const { AllocatorPtr alloc = GetAllocator(device_id, mem_type); @@ -37,7 +31,7 @@ const OrtAllocatorInfo& OpKernelInfo::GetAllocatorInfo(int device_id, OrtMemType return alloc->Info(); } -const AllocatorPtr OpKernelInfo::GetAllocator(int device_id, OrtMemType mem_type) const { +AllocatorPtr OpKernelInfo::GetAllocator(int device_id, OrtMemType mem_type) const { return execution_provider_->GetAllocator(device_id, mem_type); } @@ -59,7 +53,7 @@ bool OpKernelInfo::TryGetConstantInput(int input_index, const Tensor** constant_ } auto& input_arg_name = node_.InputDefs()[input_index]->Name(); int input_arg_index = -1; - if (!mlvalue_name_idx_map_.GetIdx(input_arg_name, input_arg_index).IsOK()) { + if (!ort_value_name_idx_map_.GetIdx(input_arg_name, input_arg_index).IsOK()) { return false; } diff --git a/onnxruntime/core/framework/parallel_executor.cc b/onnxruntime/core/framework/parallel_executor.cc index e7a1982e29cc2..053cf653d582b 100644 --- a/onnxruntime/core/framework/parallel_executor.cc +++ b/onnxruntime/core/framework/parallel_executor.cc @@ -29,12 +29,10 @@ ParallelExecutor::ParallelExecutor(const SessionState& session_state, const bool executor_pool_ = 
std::make_unique("EXECUTOR", 32); } -Status ParallelExecutor::Execute(const SessionState& session_state, - const std::vector& feed_mlvalue_idxs, - const std::vector& feeds, - const std::vector& fetch_mlvalue_idxs, - std::vector& fetches, - const std::unordered_map fetch_allocators, +Status ParallelExecutor::Execute(const SessionState& session_state, const std::vector& feed_mlvalue_idxs, + const std::vector& feeds, const std::vector& fetch_mlvalue_idxs, + std::vector& fetches, + const std::unordered_map& fetch_allocators, const logging::Logger& logger) { TimePoint tp; bool f_profiler_enabled = session_state.Profiler().FEnabled(); diff --git a/onnxruntime/core/framework/parallel_executor.h b/onnxruntime/core/framework/parallel_executor.h index 938354892c2df..c489f5e31ca3c 100644 --- a/onnxruntime/core/framework/parallel_executor.h +++ b/onnxruntime/core/framework/parallel_executor.h @@ -24,12 +24,10 @@ class ParallelExecutor : public IExecutor { ParallelExecutor(const bool& terminate_flag = false) : terminate_flag_{terminate_flag} {} ParallelExecutor(const SessionState& session_state, const bool& terminate_flag = false); - common::Status Execute(const SessionState& session_state, - const std::vector& feed_mlvalue_idxs, - const std::vector& feeds, - const std::vector& fetch_mlvalue_idxs, - std::vector& fetches, - const std::unordered_map fetch_allocators, + common::Status Execute(const SessionState& session_state, const std::vector& feed_mlvalue_idxs, + const std::vector& feeds, const std::vector& fetch_mlvalue_idxs, + std::vector& fetches, + const std::unordered_map& fetch_allocators, const logging::Logger& logger) override; private: diff --git a/onnxruntime/core/framework/path_lib.h b/onnxruntime/core/framework/path_lib.h index 7845bbf535f66..3832fd5522f80 100644 --- a/onnxruntime/core/framework/path_lib.h +++ b/onnxruntime/core/framework/path_lib.h @@ -236,10 +236,9 @@ void LoopDir(const std::string& dir_name, T func) { auto e = errno; char buf[1024]; char* msg; -#if defined(_GNU_SOURCE) && !defined(__APPLE__) +#if defined(__GLIBC__) && defined(_GNU_SOURCE) msg = strerror_r(e, buf, sizeof(buf)); #else - // for Mac OS X if (strerror_r(e, buf, sizeof(buf)) != 0) { buf[0] = '\0'; } diff --git a/onnxruntime/core/framework/sequential_execution_plan.h b/onnxruntime/core/framework/sequential_execution_plan.h index f7d20acf443b0..c1d3523fa75e9 100644 --- a/onnxruntime/core/framework/sequential_execution_plan.h +++ b/onnxruntime/core/framework/sequential_execution_plan.h @@ -6,6 +6,7 @@ #include "core/graph/basic_types.h" #include "core/framework/alloc_kind.h" #include "core/framework/data_types.h" +#include "core/framework/execution_plan_base.h" namespace onnxruntime { // Every ml-value has a unique name and is assigned a unique integral number. @@ -25,7 +26,7 @@ struct AllocPlanPerValue { MLDataType value_type{nullptr}; OrtAllocatorInfo location; // reused_buffer is valid only if alloc_kind == kReuse. It indicates - // which MLValue's buffer must be reused for this MLValue. + // which OrtValue's buffer must be reused for this OrtValue. MLValueIndex reused_buffer{0}; // if the value is used in async kernel, a fence object would be created // note the fence object would be shared between MLValues reusing the same buffer @@ -37,7 +38,7 @@ struct AllocPlanPerValue { // SequentialExecutionPlan: This is the data that is produced by a static // planner for a sequential execution, to be used by a SequentialExecutor. 
-struct SequentialExecutionPlan { +struct SequentialExecutionPlan : public ExecutionPlanBase { // Allocation plan: // ExecutionFrame::GetOrCreateTensor() should use the following information // to decide whether to allocate a new buffer or reuse an existing buffer @@ -67,6 +68,22 @@ struct SequentialExecutionPlan { // to_be_freed: vector elements represent indices of ml-values to be freed (as described above) std::vector to_be_freed; + + const OrtAllocatorInfo& GetLocation(size_t ort_value_index) const override { + return allocation_plan[ort_value_index].location; + } + + void SetLocation(size_t ort_value_index, const struct OrtAllocatorInfo& info) override { + allocation_plan[ort_value_index].location = info; + } + + std::set GetAllLocations() const override { + std::set locations; + for (auto& alloc_plan : allocation_plan) { + if (locations.find(alloc_plan.location) == locations.end()) locations.insert(alloc_plan.location); + } + return locations; + } }; // Output details of an execution plan: diff --git a/onnxruntime/core/framework/sequential_executor.cc b/onnxruntime/core/framework/sequential_executor.cc index e5bd2c55cf182..94100bc020105 100644 --- a/onnxruntime/core/framework/sequential_executor.cc +++ b/onnxruntime/core/framework/sequential_executor.cc @@ -22,12 +22,10 @@ static Status ReleaseNodeMLValues(ExecutionFrame& frame, const SequentialExecutionPlan::NodeExecutionPlan& node_exec_plan, const logging::Logger& logger); -Status SequentialExecutor::Execute(const SessionState& session_state, - const std::vector& feed_mlvalue_idxs, - const std::vector& feeds, - const std::vector& fetch_mlvalue_idxs, - std::vector& fetches, - const std::unordered_map fetch_allocators, +Status SequentialExecutor::Execute(const SessionState& session_state, const std::vector& feed_mlvalue_idxs, + const std::vector& feeds, const std::vector& fetch_mlvalue_idxs, + std::vector& fetches, + const std::unordered_map& fetch_allocators, const logging::Logger& logger) { bool f_profiler_enabled = session_state.Profiler().FEnabled(); TimePoint tp; @@ -205,9 +203,9 @@ static Status ReleaseNodeMLValues(ExecutionFrame& frame, const SequentialExecutionPlan::NodeExecutionPlan& node_exec_plan, const logging::Logger& logger) { for (auto i = node_exec_plan.free_from_index; i <= node_exec_plan.free_to_index; ++i) { - auto mlvalue_idx = seq_exec_plan.to_be_freed[i]; - VLOGS(logger, 1) << "Releasing mlvalue with index: " << mlvalue_idx; - ORT_RETURN_IF_ERROR(frame.ReleaseMLValue(mlvalue_idx)); + auto ort_value_idx = seq_exec_plan.to_be_freed[i]; + VLOGS(logger, 1) << "Releasing ort_value with index: " << ort_value_idx; + ORT_RETURN_IF_ERROR(frame.ReleaseMLValue(ort_value_idx)); } return Status::OK(); diff --git a/onnxruntime/core/framework/sequential_executor.h b/onnxruntime/core/framework/sequential_executor.h index c241c27f6726d..82c1aeccc3bb8 100644 --- a/onnxruntime/core/framework/sequential_executor.h +++ b/onnxruntime/core/framework/sequential_executor.h @@ -4,6 +4,7 @@ #pragma once #include +#include #include "core/common/common.h" #include "core/common/status.h" #include "core/common/logging/logging.h" @@ -18,12 +19,10 @@ class SequentialExecutor : public IExecutor { public: SequentialExecutor(const bool& terminate_flag = false) : terminate_flag_{terminate_flag} {} - common::Status Execute(const SessionState& session_state, - const std::vector& feed_mlvalue_idxs, - const std::vector& feeds, - const std::vector& fetch_mlvalue_idxs, - std::vector& fetches, - const std::unordered_map fetch_allocators, + common::Status 
Execute(const SessionState& session_state, const std::vector& feed_mlvalue_idxs, + const std::vector& feeds, const std::vector& fetch_mlvalue_idxs, + std::vector& fetches, + const std::unordered_map& fetch_allocators, const logging::Logger& logger) override; private: diff --git a/onnxruntime/core/framework/session_state.cc b/onnxruntime/core/framework/session_state.cc index 62d7932f8eff7..6669cdc449afd 100644 --- a/onnxruntime/core/framework/session_state.cc +++ b/onnxruntime/core/framework/session_state.cc @@ -21,11 +21,8 @@ void SessionState::SetGraphViewer(std::unique_ptr grap const GraphViewer* SessionState::GetGraphViewer() const { return graph_viewer_.get(); } const OpKernel* SessionState::GetKernel(NodeIndex node_id) const { - if (session_kernels_.count(node_id) == 0) { - return nullptr; - } - - return session_kernels_.find(node_id)->second.get(); + auto kernel = session_kernels_.find(node_id); + return (kernel != session_kernels_.cend()) ? kernel->second.get() : nullptr; } void SessionState::AddKernel(onnxruntime::NodeIndex node_id, std::unique_ptr p_kernel) { @@ -39,17 +36,17 @@ void SessionState::SetExecutionPlan(std::unique_ptr p_s const SequentialExecutionPlan* SessionState::GetExecutionPlan() const { return p_seq_exec_plan_.get(); } -Status SessionState::AddInitializedTensor(int mlvalue_index, const MLValue& mlvalue, const OrtCallback* d) { - ORT_ENFORCE(mlvalue_index >= 0 && mlvalue_index <= mlvalue_name_idx_map_.MaxIdx()); - auto p = initialized_tensors_.insert({mlvalue_index, mlvalue}); +Status SessionState::AddInitializedTensor(int ort_value_index, const OrtValue& ort_value, const OrtCallback* d) { + ORT_ENFORCE(ort_value_index >= 0 && ort_value_index <= ort_value_name_idx_map_.MaxIdx()); + auto p = initialized_tensors_.insert({ort_value_index, ort_value}); if (!p.second) - return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT, "duplicated mlvalue index:", mlvalue_index, + return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT, "duplicated ort_value index:", ort_value_index, ". 
Do you have duplicated calls to SessionState::AddInitializedTensor function?"); - if (d != nullptr && d->f != nullptr) deleter_for_initialized_tensors_[mlvalue_index] = *d; + if (d != nullptr && d->f != nullptr) deleter_for_initialized_tensors_[ort_value_index] = *d; return Status::OK(); } -const std::unordered_map& SessionState::GetInitializedTensors() const { return initialized_tensors_; } +const std::unordered_map& SessionState::GetInitializedTensors() const { return initialized_tensors_; } SessionState& SessionState::SetLogger(const logging::Logger& logger) { logger_ = &logger; @@ -96,10 +93,6 @@ Status SessionState::UpdateMemoryPatternGroupCache(const std::vector& node_info_vec) const { - if (!input_names_to_nodeinfo_mapping_.count(input_name)) { + auto entry = input_names_to_nodeinfo_mapping_.find(input_name); + if (entry == input_names_to_nodeinfo_mapping_.cend()) { return Status(ONNXRUNTIME, FAIL, "Failed to find input name in the mapping: " + input_name); } - node_info_vec = input_names_to_nodeinfo_mapping_.at(input_name); + + node_info_vec = entry->second; return Status::OK(); } @@ -206,7 +201,7 @@ const SessionState* SessionState::GetSubgraphSessionState(onnxruntime::NodeIndex void SessionState::CalculateNodeIndexInfo() { ORT_ENFORCE(graph_viewer_); - node_index_info_ = std::make_unique(*graph_viewer_, mlvalue_name_idx_map_); + node_index_info_ = std::make_unique(*graph_viewer_, ort_value_name_idx_map_); for (auto& node_to_map_pair : subgraph_session_states_) { for (auto& attr_name_to_subgraph : node_to_map_pair.second) { diff --git a/onnxruntime/core/framework/session_state.h b/onnxruntime/core/framework/session_state.h index 08a53afb9bc8f..9e8daf934e9f3 100644 --- a/onnxruntime/core/framework/session_state.h +++ b/onnxruntime/core/framework/session_state.h @@ -20,7 +20,7 @@ #include "core/framework/mem_pattern.h" #include "core/framework/ml_value.h" #include "core/common/callback.h" -#include "core/framework/mlvalue_name_idx_map.h" +#include "core/framework/ort_value_name_idx_map.h" #include "core/framework/node_index_info.h" #include "core/graph/graph_viewer.h" #include "core/framework/fuse_nodes_funcs.h" @@ -42,9 +42,8 @@ struct MemoryPatternGroup; */ class SessionState { public: - SessionState(const ExecutionProviders& execution_providers) - : execution_providers_{execution_providers} { - } + SessionState(const ExecutionProviders& execution_providers, bool enable_mem_pattern) + : execution_providers_{execution_providers}, enable_mem_pattern_(enable_mem_pattern) {} ~SessionState() { for (auto& kvp : deleter_for_initialized_tensors_) { @@ -65,23 +64,23 @@ class SessionState { const ExecutionProviders& GetExecutionProviders() const noexcept { return execution_providers_; } - const MLValueNameIdxMap& GetMLValueNameIdxMap() const noexcept { return mlvalue_name_idx_map_; } - MLValueNameIdxMap& GetMLValueNameIdxMap() noexcept { return mlvalue_name_idx_map_; } + const MLValueNameIdxMap& GetMLValueNameIdxMap() const noexcept { return ort_value_name_idx_map_; } + MLValueNameIdxMap& GetMLValueNameIdxMap() noexcept { return ort_value_name_idx_map_; } // initialized tensors /** - * Adds an initialized tensor (weight) so that it can be used by the - * execution frame to setup the appropriate MLValue vectors. 
- * This function will take a shallow copy of d if d is not NULL - */ - Status AddInitializedTensor(int mlvalue_index, const MLValue& mlvalue, const OrtCallback* d); + * Adds an initialized tensor (weight) so that it can be used by the + * execution frame to setup the appropriate OrtValue vectors. + * This function will take a shallow copy of d if d is not NULL + */ + Status AddInitializedTensor(int ort_value_index, const OrtValue& ort_value, const OrtCallback* d); /** - * Gets the list of all initialized tensors (weights) so that it can be used by the - * execution frame to setup the appropriate MLValue vectors. - * The lifetime of returned MLValues are limited by this SessionState object. - */ - const std::unordered_map& GetInitializedTensors() const; + * Gets the list of all initialized tensors (weights) so that it can be used by the + * execution frame to setup the appropriate OrtValue vectors. + * The lifetime of returned MLValues are limited by this SessionState object. + */ + const std::unordered_map& GetInitializedTensors() const; // execution plan void SetExecutionPlan(std::unique_ptr p_seq_exec_plan); @@ -121,11 +120,6 @@ class SessionState { Status UpdateMemoryPatternGroupCache(const std::vector& input_shape, std::unique_ptr mem_patterns) const; - /** - Set enable memory pattern flag - */ - void SetEnableMemoryPattern(bool flag); - /** Get enable memory pattern flag */ @@ -180,7 +174,7 @@ class SessionState { const FuncManager& GetFuncMgr() const { return fused_funcs_mgr_; } FuncManager& GetMutableFuncMgr() { return fused_funcs_mgr_; } - std::map& GetMutableWeightsBuffers() { return weights_buffers_; } + std::vector& GetMutableWeightsBuffers() { return weights_buffers_; } void CalculateNodeIndexInfo(); const NodeIndexInfo& GetNodeIndexInfo() const; @@ -194,21 +188,21 @@ class SessionState { std::unique_ptr graph_viewer_; const ExecutionProviders& execution_providers_; // owned by InferenceSession - MLValueNameIdxMap mlvalue_name_idx_map_; + MLValueNameIdxMap ort_value_name_idx_map_; // initialized tensorset - std::unordered_map initialized_tensors_; // key is mlvalue_index + std::unordered_map initialized_tensors_; // key is ort_value_index // This data structure is for unintializing string tensors and // munmap memory region and close file descriptor std::unordered_map deleter_for_initialized_tensors_; - std::map weights_buffers_; + std::vector weights_buffers_; std::unique_ptr p_seq_exec_plan_ = nullptr; const logging::Logger* logger_ = nullptr; profiling::Profiler* profiler_; // switch for enable memory pattern optimization or not. - bool enable_mem_pattern_ = true; + const bool enable_mem_pattern_; // lock for the mem_patterns_ mutable OrtMutex mem_patterns_lock_; // cache for the generated mem_patterns. key is calculated based on input shapes. 
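The mem_pattern_planner.h hunk above adds an OrtMutex member and takes a std::lock_guard in TraceAllocation, TraceFree and GenerateMemPattern so that MemPatternPlanner can be documented as thread-safe (the lock is declared mutable because GenerateMemPattern is const). Below is a minimal, self-contained sketch of that locking pattern only; it uses std::mutex in place of OrtMutex, and ToyPatternPlanner and its members are simplified stand-ins rather than the real onnxruntime class.

#include <cstddef>
#include <mutex>
#include <thread>
#include <unordered_map>

class ToyPatternPlanner {
 public:
  void TraceAllocation(int value_idx, size_t size) {
    std::lock_guard<std::mutex> lock(lock_);  // serialize concurrent tracing
    live_[value_idx] = size;
    current_ += size;
    if (current_ > peak_) peak_ = current_;
  }

  void TraceFree(int value_idx) {
    std::lock_guard<std::mutex> lock(lock_);
    auto it = live_.find(value_idx);
    if (it != live_.end()) {
      current_ -= it->second;
      live_.erase(it);
    }
  }

  size_t PeakSize() const {
    // 'lock_' is mutable so it can also be taken in const accessors,
    // mirroring the lock added to 'GenerateMemPattern() const' in the diff.
    std::lock_guard<std::mutex> lock(lock_);
    return peak_;
  }

 private:
  std::unordered_map<int, size_t> live_;  // currently live allocations: index -> size
  size_t current_{0};
  size_t peak_{0};
  mutable std::mutex lock_;  // mutable: lockable from const methods
};

int main() {
  // Two threads tracing allocations and frees concurrently; the lock keeps
  // the planner's bookkeeping consistent.
  ToyPatternPlanner planner;
  std::thread t1([&] { for (int i = 0; i < 1000; ++i) planner.TraceAllocation(i, 8); });
  std::thread t2([&] { for (int i = 0; i < 1000; ++i) planner.TraceFree(i); });
  t1.join();
  t2.join();
  return planner.PeakSize() <= 8000 ? 0 : 1;
}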
diff --git a/onnxruntime/core/framework/session_state_initializer.cc b/onnxruntime/core/framework/session_state_initializer.cc index ab72323c497dd..d8e23321118c3 100644 --- a/onnxruntime/core/framework/session_state_initializer.cc +++ b/onnxruntime/core/framework/session_state_initializer.cc @@ -15,27 +15,25 @@ #include "core/framework/graph_partitioner.h" #include "core/framework/ml_value.h" #include "core/framework/ml_value_patterns_planner.h" -#include "core/framework/mlvalue_name_idx_map.h" +#include "core/framework/ort_value_name_idx_map.h" #include "core/framework/sequential_execution_plan.h" #include "core/framework/session_state.h" #include "core/framework/tensorprotoutils.h" #include "core/framework/utils.h" #include "core/framework/mem_buffer.h" +#include "core/framework/tensor_allocator.h" namespace onnxruntime { static common::Status SaveMLValueNameIndexMapping(const GraphViewer& graph_viewer, - MLValueNameIdxMap& mlvalue_name_idx_map, + MLValueNameIdxMap& ort_value_name_idx_map, const logging::Logger& logger); -// T should have signature of '(int idx, const onnxruntime::MLValue& value, const OrtCallback& d) -> Status' +// T should have signature of '(int idx, const OrtValue& value, const OrtCallback& d) -> Status' template static common::Status SaveInitializedTensors(const Env& env, const std::basic_string& graph_loc, - const onnxruntime::Graph& graph, - const SequentialExecutionPlan& execution_plan, - const ExecutionProviders& exec_providers, - const MLValueNameIdxMap& mlvalue_name_idx_map, - std::map& weights_buffers, + const onnxruntime::Graph& graph, const ExecutionProviders& exec_providers, + const MLValueNameIdxMap& ort_value_name_idx_map, ITensorAllocator* planner, const T& save_tensor_func, const logging::Logger& logger); static common::Status SaveKernels(const ExecutionProviders& execution_providers, @@ -43,12 +41,14 @@ static common::Status SaveKernels(const ExecutionProviders& execution_providers, const KernelRegistryManager& custom_registry_manager, const logging::Logger& logger); -static common::Status SaveInputOutputNamesToNodeMapping(const onnxruntime::Graph& graph, - const KernelRegistryManager& custom_registry_manager, - SessionState& session_state, - const std::vector* implicit_inputs); +static common::Status SaveInputOutputNamesToNodeMapping( + const onnxruntime::Graph& graph, + const KernelRegistryManager& custom_registry_manager, + SessionState& session_state, + const ConstPointerContainer>* implicit_inputs); -SessionStateInitializer::SessionStateInitializer(const std::basic_string& graph_loc, +SessionStateInitializer::SessionStateInitializer(bool enable_mem_pattern, + const std::basic_string& graph_loc, onnxruntime::Graph& graph, SessionState& session_state, const ExecutionProviders& providers, KernelRegistryManager& kernel_registry_manager) @@ -57,69 +57,59 @@ SessionStateInitializer::SessionStateInitializer(const std::basic_string& outer_scope_node_args, - bool enable_sequential_execution) { +common::Status SessionStateInitializer::CreatePlan( + const Node* parent_node, + const ConstPointerContainer>* outer_scope_node_args, + bool enable_sequential_execution) { auto graph_viewer = std::make_unique(graph_); // populate the SessionState MLValueNameIdxMap - auto& mlvalue_name_idx_map = session_state_.GetMLValueNameIdxMap(); - ORT_RETURN_IF_ERROR(SaveMLValueNameIndexMapping(*graph_viewer, mlvalue_name_idx_map, logger_)); + auto& ort_value_name_idx_map = session_state_.GetMLValueNameIdxMap(); + ORT_RETURN_IF_ERROR(SaveMLValueNameIndexMapping(*graph_viewer, 
ort_value_name_idx_map, logger_)); // ignore any outer scope args we don't know about. this can happen if a node contains multiple subgraphs. std::vector valid_outer_scope_node_args; - std::for_each(outer_scope_node_args.cbegin(), outer_scope_node_args.cend(), - [&mlvalue_name_idx_map, &valid_outer_scope_node_args](const NodeArg* node_arg) { - int idx; - if (mlvalue_name_idx_map.GetIdx(node_arg->Name(), idx).IsOK()) { - valid_outer_scope_node_args.push_back(node_arg); - }; - }); - - std::unique_ptr exec_plan; - - if (enable_sequential_execution) { - // CreatePlan will create a new SequentialExecutionPlan instance that we will - // save into the session state. - ORT_RETURN_IF_ERROR( - SequentialPlanner::CreatePlan(parent_node, *graph_viewer, valid_outer_scope_node_args, execution_providers_, - kernel_registry_manager_, mlvalue_name_idx_map, exec_plan)); - - session_state_.SetExecutionPlan(std::move(exec_plan)); - } else { - // Parallel execution still uses same allocation plan, but has limitation of memory buffer reuse. - SequentialPlannerContext context(true /* enable parallel execution */); - ORT_RETURN_IF_ERROR( - SequentialPlanner::CreatePlan(parent_node, *graph_viewer, valid_outer_scope_node_args, execution_providers_, - kernel_registry_manager_, mlvalue_name_idx_map, context, exec_plan)); - - session_state_.SetExecutionPlan(std::move(exec_plan)); + if (outer_scope_node_args) { + std::for_each(outer_scope_node_args->cbegin(), outer_scope_node_args->cend(), + [&ort_value_name_idx_map, &valid_outer_scope_node_args](const NodeArg* node_arg) { + int idx; + if (ort_value_name_idx_map.GetIdx(node_arg->Name(), idx).IsOK()) { + valid_outer_scope_node_args.push_back(node_arg); + }; + }); } + std::unique_ptr exec_plan; + SequentialPlannerContext context(!enable_sequential_execution); + ORT_RETURN_IF_ERROR(SequentialPlanner::CreatePlan(parent_node, *graph_viewer, valid_outer_scope_node_args, + execution_providers_, kernel_registry_manager_, + ort_value_name_idx_map, context, exec_plan)); + session_state_.SetExecutionPlan(std::move(exec_plan)); session_state_.SetGraphViewer(std::move(graph_viewer)); return Status::OK(); } -common::Status SessionStateInitializer::InitializeAndSave(const std::vector* implicit_inputs) { +common::Status SessionStateInitializer::InitializeAndSave( + const ConstPointerContainer>* implicit_inputs) { const auto* exec_plan_ptr = session_state_.GetExecutionPlan(); ORT_ENFORCE(exec_plan_ptr, "Execution plan was not found in SessionState. 
CreatePlan must be called first."); - const auto& exec_plan{*exec_plan_ptr}; - const auto& mlvalue_name_idx_map{session_state_.GetMLValueNameIdxMap()}; + const auto& ort_value_name_idx_map{session_state_.GetMLValueNameIdxMap()}; + std::unique_ptr tensor_allocator_(ITensorAllocator::Create( + enable_mem_pattern_, *exec_plan_ptr, execution_providers_, session_state_.GetMutableWeightsBuffers())); // lambda to save initialized tensors into SessionState directly const Env& env = Env::Default(); - ORT_RETURN_IF_ERROR( - SaveInitializedTensors( - env, graph_loc_, graph_, exec_plan, execution_providers_, mlvalue_name_idx_map, - session_state_.GetMutableWeightsBuffers(), - [this](int idx, const onnxruntime::MLValue& value, const OrtCallback& d) -> Status { - return session_state_.AddInitializedTensor(idx, value, &d); - }, - logger_)); + ORT_RETURN_IF_ERROR(SaveInitializedTensors( + env, graph_loc_, graph_, execution_providers_, ort_value_name_idx_map, tensor_allocator_.get(), + [this](int idx, const OrtValue& value, const OrtCallback& d) -> Status { + return session_state_.AddInitializedTensor(idx, value, &d); + }, + logger_)); // remove weights from the graph now to save memory but in many cases it won't save memory, if the tensor was // preallocated with the some other tensors in a single 'allocate' call, which is very common. // TODO: make it better @@ -132,25 +122,24 @@ common::Status SessionStateInitializer::InitializeAndSave(const std::vectoridx mapping -common::Status SaveMLValueNameIndexMapping(const GraphViewer& graph_viewer, - MLValueNameIdxMap& mlvalue_name_idx_map, +// Build the OrtValue name->idx mapping +common::Status SaveMLValueNameIndexMapping(const GraphViewer& graph_viewer, MLValueNameIdxMap& ort_value_name_idx_map, const logging::Logger& logger) { LOGS(logger, INFO) << "SaveMLValueNameIndexMapping"; int idx = 0; // we keep all graph inputs (including initializers), even if they are unused, so make sure they all have an entry for (const auto* input_def : graph_viewer.GetInputsIncludingInitializers()) { - idx = mlvalue_name_idx_map.Add(input_def->Name()); + idx = ort_value_name_idx_map.Add(input_def->Name()); VLOGS(logger, 1) << "Added graph_viewer input with name: " << input_def->Name() << " to MLValueIndex with index: " << idx; } for (auto& node : graph_viewer.Nodes()) { - // build the MLValue->index map + // build the OrtValue->index map for (const auto* input_def : node.InputDefs()) { if (input_def->Exists()) { - idx = mlvalue_name_idx_map.Add(input_def->Name()); + idx = ort_value_name_idx_map.Add(input_def->Name()); VLOGS(logger, 1) << "Added input argument with name: " << input_def->Name() << " to MLValueIndex with index: " << idx; } @@ -158,7 +147,7 @@ common::Status SaveMLValueNameIndexMapping(const GraphViewer& graph_viewer, for (const auto* input_def : node.ImplicitInputDefs()) { if (input_def->Exists()) { - idx = mlvalue_name_idx_map.Add(input_def->Name()); + idx = ort_value_name_idx_map.Add(input_def->Name()); VLOGS(logger, 1) << "Added implicit input argument with name: " << input_def->Name() << " to MLValueIndex with index: " << idx; } @@ -166,33 +155,34 @@ common::Status SaveMLValueNameIndexMapping(const GraphViewer& graph_viewer, for (const auto* output_def : node.OutputDefs()) { if (output_def->Exists()) { - mlvalue_name_idx_map.Add(output_def->Name()); + ort_value_name_idx_map.Add(output_def->Name()); VLOGS(logger, 1) << "Added output argument with name: " << output_def->Name() << " to MLValueIndex with index: " << idx; } } } - // allocate MLValue for graph 
outputs when coming from initializers + // allocate OrtValue for graph outputs when coming from initializers for (const auto& output : graph_viewer.GetOutputs()) { if (output->Exists()) { - idx = mlvalue_name_idx_map.Add(output->Name()); + idx = ort_value_name_idx_map.Add(output->Name()); VLOGS(logger, 1) << "Added graph output with name: " << output->Name() << " to MLValueIndex with index: " << idx; } } - LOGS(logger, INFO) << "Done saving MLValue mappings."; + LOGS(logger, INFO) << "Done saving OrtValue mappings."; return Status::OK(); } static common::Status DeserializeTensorProto(const Env& env, const std::basic_string& proto_path, const ONNX_NAMESPACE::TensorProto& tensor_proto, const MemBuffer& m, - const ExecutionProviders& exec_providers, MLValue& mlvalue, OrtCallback& deleter) { + const ExecutionProviders& exec_providers, OrtValue& ort_value, + OrtCallback& deleter) { const OrtAllocatorInfo& alloc_info = m.GetAllocInfo(); if (strcmp(alloc_info.name, CPU) == 0 || alloc_info.mem_type == OrtMemTypeCPUOutput) { // deserialize directly to CPU tensor - return utils::TensorProtoToMLValue(env, proto_path.c_str(), tensor_proto, m, mlvalue, deleter); + return utils::TensorProtoToMLValue(env, proto_path.c_str(), tensor_proto, m, ort_value, deleter); } //alloc_info.name is not 'CPU' const IExecutionProvider* provider = exec_providers.Get(alloc_info); @@ -212,17 +202,17 @@ static common::Status DeserializeTensorProto(const Env& env, const std::basic_st size_t cpu_tensor_length; ORT_RETURN_IF_ERROR(utils::GetSizeInBytesFromTensorProto<0>(tensor_proto, &cpu_tensor_length)); if (m.GetLen() < cpu_tensor_length) { - return ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "Internal error. The preallocated buffer is too small. Requires ", cpu_tensor_length, - ", Got ", m.GetLen()); + return ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "Internal error. The preallocated buffer is too small. 
Requires ", + cpu_tensor_length, ", Got ", m.GetLen()); } - OrtAllocatorInfo info(CPU, OrtDeviceAllocator, 0, OrtMemTypeDefault); + OrtAllocatorInfo info = exec_providers.GetDefaultCpuAllocatorInfo(); std::unique_ptr data(new char[cpu_tensor_length]); std::unique_ptr p_tensor; - MLValue tmp_mlvalue; + OrtValue tmp_ort_value; OrtCallback d; - ORT_RETURN_IF_ERROR(utils::TensorProtoToMLValue( - env, proto_path.c_str(), tensor_proto, MemBuffer(data.get(), cpu_tensor_length, info), tmp_mlvalue, d)); - const Tensor& p_deserialize_tensor = tmp_mlvalue.Get(); + ORT_RETURN_IF_ERROR(utils::TensorProtoToMLValue(env, proto_path.c_str(), tensor_proto, + MemBuffer(data.get(), cpu_tensor_length, info), tmp_ort_value, d)); + const Tensor& p_deserialize_tensor = tmp_ort_value.Get(); p_tensor = std::make_unique(p_deserialize_tensor.DataType(), p_deserialize_tensor.Shape(), m.GetBuffer(), m.GetAllocInfo()); @@ -239,132 +229,57 @@ static common::Status DeserializeTensorProto(const Env& env, const std::basic_st } return copy_status; } - mlvalue.Init(p_tensor.release(), - DataTypeImpl::GetType(), - DataTypeImpl::GetType()->GetDeleteFunc()); + ort_value.Init(p_tensor.release(), DataTypeImpl::GetType(), DataTypeImpl::GetType()->GetDeleteFunc()); return common::Status::OK(); } -static common::Status AllocatePlannedBuffers(const MemoryPatternGroup& mem_patterns, - const ExecutionProviders& exec_providers, - std::map& weights_buffers) { - const size_t location_len = mem_patterns.locations.size(); - for (size_t i = 0; i < location_len; ++i) { - auto& location = mem_patterns.locations[i]; - ORT_ENFORCE(weights_buffers.find(location) == weights_buffers.end(), "Existing entry in weights buffer for ", - location.name); - - auto alloc = utils::GetAllocator(exec_providers, location); - if (!alloc) - return Status(common::ONNXRUNTIME, common::FAIL, "Failed to get allocator for location: " + location.ToString()); - - if (mem_patterns.patterns[i].PeakSize() > 0) { - void* buffer = alloc->Alloc(mem_patterns.patterns[i].PeakSize()); - auto kvp = weights_buffers.insert(std::make_pair(location, BufferUniquePtr(buffer, alloc))); - if (!kvp.second) { - alloc->Free(buffer); - return Status(common::ONNXRUNTIME, common::FAIL, "duplicated location"); - } - } - } - return Status::OK(); -} - -/** - * When it succeeded, p could be NULL if the tensor with 'mlvalue_index' will not have any element - */ -static common::Status GetPreallocatedBuffer(const MemoryPatternGroup& mem_patterns, const OrtAllocatorInfo& location, - int mlvalue_index, - const std::map& weights_buffers, - const char* name, void*& p, size_t& len) { - auto pattern = mem_patterns.GetPatterns(location); - if (pattern == nullptr) { - return ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "Mem pattern for initializer ", name, " is not found"); - } - // if block is not found, means this mlvalue is not traced - // fall back to allocate separate buffer. - // if it->second.get() is null, then fall back to the block not found case - auto block = pattern->GetBlock(mlvalue_index); - auto it = weights_buffers.find(location); - if (it == weights_buffers.end()) { - if (block != nullptr && block->size_ == 0) { - // Because the size is 0, this miss find is expected. we won't allocate a buffer with size of zero. 
- p = nullptr; - len = 0; - return Status::OK(); - } - return ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "Weight buffer for initializer '", name, "' is not found"); - } - - if (block == nullptr || it->second == nullptr) { - return ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "Get preallocated buffer for initializer '", name, "' failed"); - } - - p = reinterpret_cast(it->second.get()) + block->offset_; - len = block->size_; - return Status::OK(); -} - template common::Status SaveInitializedTensors(const Env& env, const std::basic_string& graph_loc, - const Graph& graph, const SequentialExecutionPlan& execution_plan, - const ExecutionProviders& exec_providers, - const MLValueNameIdxMap& mlvalue_name_idx_map, - std::map& weights_buffers, + const Graph& graph, const ExecutionProviders& exec_providers, + const MLValueNameIdxMap& ort_value_name_idx_map, ITensorAllocator* planner, const T& save_tensor_func, const logging::Logger& logger) { LOGS(logger, INFO) << "Saving initialized tensors."; - static constexpr int alignment = 256; - ORT_ENFORCE(mlvalue_name_idx_map.MaxIdx() > 0, "MLValue indexes should have been populated."); - - MLValuePatternPlanner planner(execution_plan); + ORT_ENFORCE(ort_value_name_idx_map.MaxIdx() > 0, "OrtValue indexes should have been populated."); //1. first plan the memory const onnxruntime::InitializedTensorSet& initialized_tensor_set = graph.GetAllInitializedTensors(); std::unordered_map id_to_initialized_tensor; for (const auto& entry : initialized_tensor_set) { - int mlvalue_index; - ORT_RETURN_IF_ERROR(mlvalue_name_idx_map.GetIdx(entry.first, mlvalue_index)); - id_to_initialized_tensor[mlvalue_index] = entry.second; + int ort_value_index; + ORT_RETURN_IF_ERROR(ort_value_name_idx_map.GetIdx(entry.first, ort_value_index)); + id_to_initialized_tensor[ort_value_index] = entry.second; } for (const auto& entry : id_to_initialized_tensor) { - size_t len = 0; - ORT_RETURN_IF_ERROR(utils::GetSizeInBytesFromTensorProto(*entry.second, &len)); - ORT_RETURN_IF_ERROR(planner.TraceAllocation(entry.first, len)); + ORT_RETURN_IF_ERROR(planner->Trace(entry.first, entry.second)); } //2. allocate weight buffer on different locations - MemoryPatternGroup mem_patterns; - ORT_RETURN_IF_ERROR(planner.GeneratePatterns(&mem_patterns)); - ORT_RETURN_IF_ERROR(AllocatePlannedBuffers(mem_patterns, exec_providers, weights_buffers)); + ORT_RETURN_IF_ERROR(planner->FinalizePlan()); OrtCallback deleter; //3. create weight tensors based on weights buffer for (const auto& entry : id_to_initialized_tensor) { - int mlvalue_index = entry.first; + int ort_value_index = entry.first; const char* name = entry.second->has_name() ? entry.second->name().c_str() : ""; const ONNX_NAMESPACE::TensorProto& tensor_proto = *(entry.second); - auto& location = execution_plan.allocation_plan[mlvalue_index].location; - void* buffer = nullptr; - size_t len = 0; + std::unique_ptr m; // TODO: if the tensor need be copied, does it have enough room? 
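The `SaveInitializedTensors` refactor in this hunk replaces the old pattern-planner plus weights-buffer lookup with an `ITensorAllocator` that is driven in two phases: trace every initializer's size, finalize the plan into real allocations, then hand out preallocated slices. The sketch below illustrates that two-phase idea with a simple standalone class; the interface is only suggestive of, not identical to, the real onnxruntime `ITensorAllocator`.

```
#include <cstddef>
#include <map>
#include <memory>
#include <utility>

// Illustrative two-phase allocator: trace all sizes, allocate once, hand out slices.
class TwoPhaseAllocator {
 public:
  // Phase 1: record how many bytes each value index needs (256-byte aligned).
  void Trace(int value_index, size_t nbytes) {
    constexpr size_t kAlign = 256;
    offsets_[value_index] = total_;
    sizes_[value_index] = nbytes;
    total_ += (nbytes + kAlign - 1) / kAlign * kAlign;
  }
  // Phase 2: one allocation large enough for everything traced.
  void FinalizePlan() { buffer_ = std::make_unique<char[]>(total_); }
  // Phase 3: return the preallocated slice; a zero-sized value maps to {nullptr, 0}.
  std::pair<void*, size_t> GetPreallocatedBuffer(int value_index) const {
    auto it = offsets_.find(value_index);
    if (it == offsets_.end() || sizes_.at(value_index) == 0) return {nullptr, 0};
    return {buffer_.get() + it->second, sizes_.at(value_index)};
  }

 private:
  std::map<int, size_t> offsets_, sizes_;
  size_t total_ = 0;
  std::unique_ptr<char[]> buffer_;
};
```

Note that, as in the removed `GetPreallocatedBuffer` helper, a traced value with zero bytes is not an error; it simply yields a null pointer and zero length.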
- ORT_RETURN_IF_ERROR( - GetPreallocatedBuffer(mem_patterns, location, mlvalue_index, weights_buffers, name, buffer, len)); + ORT_RETURN_IF_ERROR(planner->GetPreallocatedBuffer(ort_value_index, name, m)); #ifndef NDEBUG - ORT_ENFORCE(buffer != nullptr || len == 0); + ORT_ENFORCE(m != nullptr); + ORT_ENFORCE(m->GetBuffer() != nullptr || m->GetLen() == 0); #endif - - MemBuffer m(buffer, len, location); - MLValue mlvalue; - Status st = DeserializeTensorProto(env, graph_loc, tensor_proto, m, exec_providers, mlvalue, deleter); + OrtValue ort_value; + Status st = DeserializeTensorProto(env, graph_loc, tensor_proto, *m, exec_providers, ort_value, deleter); if (!st.IsOK()) { std::ostringstream oss; oss << "Deserialize tensor " << name << " failed." << st.ErrorMessage(); return Status(st.Category(), st.Code(), oss.str()); } - ORT_RETURN_IF_ERROR(save_tensor_func(mlvalue_index, mlvalue, deleter)); + ORT_RETURN_IF_ERROR(save_tensor_func(ort_value_index, ort_value, deleter)); - VLOGS(logger, 1) << "Added weight with name : " << name << " with index: " << mlvalue_index; + VLOGS(logger, 1) << "Added weight with name : " << name << " with index: " << ort_value_index; } LOGS(logger, INFO) << "Done saving initialized tensors"; @@ -411,19 +326,19 @@ common::Status SaveKernels(const ExecutionProviders& execution_providers, return Status::OK(); } -template // T is const NodeArg or NodeArg +template // T is container of const NodeArg* or NodeArg* static bool IsArgNameInInputsOutputs(const std::string& name, - const std::vector& graph_args) { - auto it = std::find_if(std::begin(graph_args), std::end(graph_args), [&name](const onnxruntime::NodeArg* arg) { + const T& graph_args) { + auto it = std::find_if(graph_args.cbegin(), graph_args.cend(), [&name](const onnxruntime::NodeArg* arg) { return arg->Name() == name; }); - return it != graph_args.end(); + return it != graph_args.cend(); } common::Status SaveInputOutputNamesToNodeMapping(const onnxruntime::Graph& graph, const KernelRegistryManager& custom_registry_manager, SessionState& session_state, - const std::vector* implicit_inputs) { + const ConstPointerContainer>* implicit_inputs) { auto& graph_inputs = graph.GetInputsIncludingInitializers(); auto& graph_outputs = graph.GetOutputs(); @@ -498,7 +413,7 @@ common::Status SaveInputOutputNamesToNodeMapping(const onnxruntime::Graph& graph const auto& name = graph_input->Name(); if (input_map.find(name) == end_map) { // dummy entry for an input that we didn't find a use of in the graph. warn about it in case that's a bug. - // utils::CopyOneInputAcrossDevices will use the input MLValue as is given we don't believe it's used anywhere. + // utils::CopyOneInputAcrossDevices will use the input OrtValue as is given we don't believe it's used anywhere. LOGS(session_state.Logger(), WARNING) << "Graph input with name " << name << " is not associated with a node. 
"; ORT_RETURN_IF_ERROR(session_state.AddInputNameToNodeInfoMapping(name, empty_node_info)); } diff --git a/onnxruntime/core/framework/session_state_initializer.h b/onnxruntime/core/framework/session_state_initializer.h index 47516c3b7513f..3634704de5e2a 100644 --- a/onnxruntime/core/framework/session_state_initializer.h +++ b/onnxruntime/core/framework/session_state_initializer.h @@ -4,9 +4,11 @@ #pragma once #include +#include "core/common/const_pointer_container.h" #include "core/framework/allocator.h" #include "core/framework/tensor.h" #include "core/framework/path_lib.h" +#include "core/framework/tensor_allocator.h" namespace onnxruntime { class ExecutionProviders; @@ -29,18 +31,18 @@ class SessionStateInitializer { * * \param graph_loc The file path of where the graph was loaded. e.g. /tmp/test_squeezenet/model.onnx */ - SessionStateInitializer(const std::basic_string& graph_loc, onnxruntime::Graph& graph, - SessionState& session_state, const ExecutionProviders& providers, + SessionStateInitializer(bool enable_mem_pattern, const std::basic_string& graph_loc, + onnxruntime::Graph& graph, SessionState& session_state, const ExecutionProviders& providers, KernelRegistryManager& kernel_registry_manager); // First perform any transformations and create the execution plan common::Status CreatePlan(const Node* parent_node, - const std::vector& outer_scope_node_args, + const ConstPointerContainer>* outer_scope_node_args, bool enable_sequential_execution); // initialize tensors, and save. save kernels and input/output node mappings // \param implicit_inputs could be NULL - common::Status InitializeAndSave(const std::vector* implicit_inputs); + common::Status InitializeAndSave(const ConstPointerContainer>* implicit_inputs); private: const std::basic_string& graph_loc_; @@ -50,5 +52,6 @@ class SessionStateInitializer { const ExecutionProviders& execution_providers_; KernelRegistryManager& kernel_registry_manager_; const logging::Logger& logger_; + const bool enable_mem_pattern_; }; } // namespace onnxruntime diff --git a/onnxruntime/core/framework/tensor.cc b/onnxruntime/core/framework/tensor.cc index 4463ec7145bc9..d0085c0fe6c1a 100644 --- a/onnxruntime/core/framework/tensor.cc +++ b/onnxruntime/core/framework/tensor.cc @@ -21,7 +21,9 @@ Tensor::Tensor(MLDataType p_type, const TensorShape& shape, std::shared_ptr(shape_size) >= std::numeric_limits::max()) ORT_THROW("shape.Size() must >=0"); - void* p_data = allocator->AllocArray(static_cast(shape_size), p_type->Size()); + void* p_data = nullptr; + if(shape_size > 0) + p_data = allocator->AllocArray(static_cast(shape_size), p_type->Size()); Init(p_type, shape, p_data, allocator, offset); } diff --git a/onnxruntime/core/framework/tensor_type_and_shape.cc b/onnxruntime/core/framework/tensor_type_and_shape.cc index ea17763bacfd5..eeac468376677 100644 --- a/onnxruntime/core/framework/tensor_type_and_shape.cc +++ b/onnxruntime/core/framework/tensor_type_and_shape.cc @@ -38,7 +38,7 @@ ORT_API_STATUS_IMPL(OrtSetTensorElementType, _In_ OrtTensorTypeAndShapeInfo* thi API_IMPL_END } -ORT_API_STATUS_IMPL(OrtSetDims, OrtTensorTypeAndShapeInfo* this_ptr, _In_ const int64_t* dim_values, size_t dim_count) { +ORT_API_STATUS_IMPL(OrtSetDimensions, OrtTensorTypeAndShapeInfo* this_ptr, _In_ const int64_t* dim_values, size_t dim_count) { API_IMPL_BEGIN this_ptr->shape = onnxruntime::TensorShape(dim_values, dim_count); return nullptr; @@ -49,7 +49,7 @@ ORT_API(enum ONNXTensorElementDataType, OrtGetTensorElementType, _In_ const stru return info->type; } 
-ORT_API(size_t, OrtGetNumOfDimensions, _In_ const struct OrtTensorTypeAndShapeInfo* info) { +ORT_API(size_t, OrtGetDimensionsCount, _In_ const struct OrtTensorTypeAndShapeInfo* info) { return info->shape.NumDimensions(); } @@ -113,7 +113,7 @@ OrtStatus* GetTensorShapeAndType(const onnxruntime::TensorShape* shape, return status; } if (shape != nullptr) { - status = OrtSetDims(ret, shape->GetDims().data(), shape->GetDims().size()); + status = OrtSetDimensions(ret, shape->GetDims().data(), shape->GetDims().size()); if (status != nullptr) { OrtReleaseTensorTypeAndShapeInfo(ret); return status; @@ -123,18 +123,16 @@ OrtStatus* GetTensorShapeAndType(const onnxruntime::TensorShape* shape, return nullptr; } -ORT_API_STATUS_IMPL(OrtGetTensorShapeAndType, _In_ const OrtValue* value, +ORT_API_STATUS_IMPL(OrtGetTensorTypeAndShape, _In_ const OrtValue* v, _Out_ OrtTensorTypeAndShapeInfo** out) { API_IMPL_BEGIN - auto v = reinterpret_cast(value); const onnxruntime::Tensor& tensor = v->Get(); return GetTensorShapeAndType(&tensor.Shape(), tensor.DataType(), out); API_IMPL_END } -ORT_API(enum ONNXType, OrtGetValueType, _In_ const OrtValue* value) { +ORT_API(enum ONNXType, OrtGetValueType, _In_ const OrtValue* v) { try { - auto v = reinterpret_cast(value); onnxruntime::MLDataType type = v->Type(); OrtTypeInfo* out = nullptr; OrtStatus* ptr = OrtTypeInfo::FromDataTypeImpl(type, nullptr, nullptr, &out); @@ -155,8 +153,7 @@ ORT_API(enum ONNXType, OrtGetValueType, _In_ const OrtValue* value) { * \param value * \return The returned value should be freed by OrtReleaseTypeInfo after use */ -ORT_API_STATUS_IMPL(OrtGetTypeInfo, _In_ const OrtValue* value, struct OrtTypeInfo** out) { - auto v = reinterpret_cast(value); +ORT_API_STATUS_IMPL(OrtGetTypeInfo, _In_ const OrtValue* v, struct OrtTypeInfo** out) { onnxruntime::MLDataType type = v->Type(); if (type == nullptr) { *out = nullptr; diff --git a/onnxruntime/core/framework/tensorprotoutils.cc b/onnxruntime/core/framework/tensorprotoutils.cc index 3e52be92c607b..34cc6b329cd4c 100644 --- a/onnxruntime/core/framework/tensorprotoutils.cc +++ b/onnxruntime/core/framework/tensorprotoutils.cc @@ -138,8 +138,8 @@ Status UnpackTensor(const ONNX_NAMESPACE::TensorProto& tensor, const void* /*raw "UnpackTensor: the pre-allocate size does not match the size in proto"); auto& string_data = tensor.string_data(); - for (auto iter = string_data.cbegin(); iter != string_data.cend(); ++iter) { - *p_data++ = *iter; + for (const auto& iter : string_data) { + *p_data++ = iter; } return Status::OK(); @@ -163,8 +163,8 @@ Status UnpackTensor(const ONNX_NAMESPACE::TensorProto& tensor, const void* raw_d if (tensor.int32_data_size() != expected_size) return Status(common::ONNXRUNTIME, common::FAIL, "UnpackTensor: the pre-allocate size does not match the size in proto"); - for (auto iter = tensor.int32_data().cbegin(); iter != tensor.int32_data().cend(); ++iter) { - *p_data++ = static_cast(*iter); + for (int iter : tensor.int32_data()) { + *p_data++ = static_cast(iter); } return Status::OK(); @@ -208,8 +208,8 @@ Status UnpackTensor(const ONNX_NAMESPACE::TensorProto& tensor, const void* raw_d const size_t size = raw_data != nullptr ? 
raw_data_len : tensor.int32_data_size(); if (size == 0) return Status::OK(); - else - return Status(common::ONNXRUNTIME, common::INVALID_ARGUMENT); + + return Status(common::ONNXRUNTIME, common::INVALID_ARGUMENT); } if (ONNX_NAMESPACE::TensorProto_DataType_BFLOAT16 != tensor.data_type()) { return Status(common::ONNXRUNTIME, common::INVALID_ARGUMENT); @@ -246,11 +246,11 @@ template common::Status GetSizeInBytesFromTensorProto(const ONNX_NAMESPACE::TensorProto& tensor_proto, size_t* out) { const auto& dims = tensor_proto.dims(); size_t size = 1; - for (int i = 0; i < dims.size(); ++i) { - if (dims[i] < 0) { + for (google::protobuf::int64 dim : dims) { + if (dim < 0 || static_cast(dim) >= std::numeric_limits::max()) { return common::Status(common::ONNXRUNTIME, common::FAIL, "Invalid TensorProto"); } - if (!IAllocator::CalcMemSizeForArray(size, static_cast(dims[i]), &size)) { + if (!IAllocator::CalcMemSizeForArray(size, static_cast(dim), &size)) { return common::Status(common::ONNXRUNTIME, common::FAIL, "Invalid TensorProto"); } } @@ -363,7 +363,7 @@ static void MoveOrtCallback(OrtCallback& from, OrtCallback& to) { } Status TensorProtoToMLValue(const Env& env, const ORTCHAR_T* tensor_proto_path, - const ONNX_NAMESPACE::TensorProto& tensor_proto, const MemBuffer& m, MLValue& value, + const ONNX_NAMESPACE::TensorProto& tensor_proto, const MemBuffer& m, OrtValue& value, OrtCallback& deleter) { const OrtAllocatorInfo& allocator = m.GetAllocInfo(); ONNXTensorElementDataType ele_type = utils::GetTensorElementType(tensor_proto); diff --git a/onnxruntime/core/framework/tensorprotoutils.h b/onnxruntime/core/framework/tensorprotoutils.h index b5fd466b0f97f..2e7a3b12141c7 100644 --- a/onnxruntime/core/framework/tensorprotoutils.h +++ b/onnxruntime/core/framework/tensorprotoutils.h @@ -32,7 +32,7 @@ std::vector GetTensorShapeFromTensorShapeProto(const ONNX_NAMESPACE::Te * relative path or an absolute path. 
*/ common::Status TensorProtoToMLValue(const Env& env, const ORTCHAR_T* tensor_proto_path, - const ONNX_NAMESPACE::TensorProto& input, const MemBuffer& m, MLValue& value, + const ONNX_NAMESPACE::TensorProto& input, const MemBuffer& m, OrtValue& value, OrtCallback& deleter); // This function doesn't support string tensors ONNX_NAMESPACE::TensorProto::DataType GetTensorProtoType(const Tensor& tensor); diff --git a/onnxruntime/core/framework/utils.cc b/onnxruntime/core/framework/utils.cc index 2e613fe5f2bdb..95a177418fe6e 100644 --- a/onnxruntime/core/framework/utils.cc +++ b/onnxruntime/core/framework/utils.cc @@ -17,18 +17,12 @@ namespace onnxruntime { namespace utils { -AllocatorPtr GetAllocator(const ExecutionProviders& exec_providers, const OrtAllocatorInfo& allocator_info) { - return exec_providers.GetAllocator(allocator_info); -} - AllocatorPtr GetAllocator(const SessionState& session_state, const OrtAllocatorInfo& allocator_info) { return session_state.GetExecutionProviders().GetAllocator(allocator_info); } -common::Status AllocateHelper(const IExecutionProvider& execution_provider, - int device_id, - const Tensor& fetched_tensor, - MLValue& output_mlvalue) { +common::Status AllocateHelper(const IExecutionProvider& execution_provider, int device_id, const Tensor& fetched_tensor, + OrtValue& output_mlvalue) { auto allocator = execution_provider.GetAllocator(device_id, OrtMemTypeDefault); if (!allocator) { return Status(common::ONNXRUNTIME, common::FAIL, "invalid allocator"); @@ -51,8 +45,7 @@ const std::string& GetNodeInputProviderType(const SessionState::NodeInfo& info) // node may declare input_mem_type to be on CPU explicitly // skip implicit inputs as they don't have a valid 'index' value - bool node_input_on_cpu = !implicit_input && - info.kci && MemTypeOnCpuExplicitly(info.kci->kernel_def->InputMemoryType(info.index)); + bool node_input_on_cpu = !implicit_input && info.kci && info.kci->kernel_def->IsInputOnCpu(info.index); // need a std::string that doesn't go away for kCpuExecutionProvider so we can return a reference. static const std::string cpu_execution_provider{onnxruntime::kCpuExecutionProvider}; @@ -63,8 +56,8 @@ const std::string& GetNodeInputProviderType(const SessionState::NodeInfo& info) return required_provider_type; } -static Status CopyMLValue(const FeedsFetchesManager::MLValueCopyInfo& copy_info, - const MLValue& source_mlvalue, MLValue& target_mlvalue) { +static Status CopyMLValue(const FeedsFetchesManager::MLValueCopyInfo& copy_info, const OrtValue& source_mlvalue, + OrtValue& target_mlvalue) { if (copy_info.copy_provider == nullptr) { target_mlvalue = source_mlvalue; } else { @@ -84,11 +77,8 @@ static Status CopyMLValue(const FeedsFetchesManager::MLValueCopyInfo& copy_info, } // TODO should we handle the case of one input name feeding 2 nodes placed on different devices? 
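The `GetSizeInBytesFromTensorProto` change above switches to a range-based loop over the dims and rejects both negative dimensions and products that would overflow. The following standalone sketch captures the same idea; it is not the onnxruntime `IAllocator::CalcMemSizeForArray` implementation, just an overflow-checked element count written with the standard library.

```
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Overflow-checked product of tensor dims: reject negative dims and any
// multiplication that would overflow size_t, returning false in both cases.
bool CheckedNumElements(const std::vector<int64_t>& dims, size_t* out) {
  size_t size = 1;
  for (int64_t dim : dims) {
    if (dim < 0) return false;
    const auto d = static_cast<size_t>(dim);
    if (d != 0 && size > std::numeric_limits<size_t>::max() / d) return false;  // would overflow
    size *= d;
  }
  *out = size;
  return true;
}
```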
-common::Status CopyOneInputAcrossDevices(const SessionState& session_state, - const std::string& input_name, - const MLValue& orig_mlvalue, - MLValue& new_mlvalue, - bool& needed_copy, +common::Status CopyOneInputAcrossDevices(const SessionState& session_state, const std::string& input_name, + const OrtValue& orig_mlvalue, OrtValue& new_mlvalue, bool& needed_copy, FeedsFetchesManager::MLValueCopyInfo& copy_info) { needed_copy = false; @@ -128,8 +118,8 @@ common::Status CopyOneInputAcrossDevices(const SessionState& session_state, ORT_ENFORCE(p_input_provider); } - //no copy for TRT - if (required_provider_type == onnxruntime::kTensorrtExecutionProvider) { + //no copy for TRT and nGraph + if (required_provider_type == onnxruntime::kTensorrtExecutionProvider || required_provider_type == onnxruntime::kNGraphExecutionProvider) { new_mlvalue = orig_mlvalue; break; } @@ -168,10 +158,8 @@ common::Status CopyOneInputAcrossDevices(const SessionState& session_state, return Status::OK(); } -common::Status CopyOneInputAcrossDevices(const SessionState& session_state, - const std::string& input_name, - const MLValue& orig_mlvalue, - MLValue& new_mlvalue) { +common::Status CopyOneInputAcrossDevices(const SessionState& session_state, const std::string& input_name, + const OrtValue& orig_mlvalue, OrtValue& new_mlvalue) { bool needed_copy; FeedsFetchesManager::MLValueCopyInfo ignored; return CopyOneInputAcrossDevices(session_state, input_name, orig_mlvalue, new_mlvalue, needed_copy, ignored); @@ -180,8 +168,7 @@ common::Status CopyOneInputAcrossDevices(const SessionState& session_state, // copies inputs across devices only if required and save copy_info static common::Status CopyInputsAcrossDevices(const SessionState& session_state, const std::vector& feed_names, - const std::vector& orig_feeds, - std::vector& new_feeds, + const std::vector& orig_feeds, std::vector& new_feeds, bool& needed_copy, std::vector* copy_info) { bool copied = false; @@ -203,7 +190,7 @@ static common::Status CopyInputsAcrossDevices(const SessionState& session_state, copied = true; if (copy_info) { - (*copy_info)[idx] = std::move(current_copy_info); + (*copy_info)[idx] = current_copy_info; } } } @@ -214,9 +201,9 @@ static common::Status CopyInputsAcrossDevices(const SessionState& session_state, } // copies inputs across devices only if required using cached copy_info -static common::Status CachedCopyInputsAcrossDevices(const std::vector& orig_feeds, - std::vector& new_feeds, - const std::vector& copy_info) { +static common::Status CachedCopyInputsAcrossDevices( + const std::vector& orig_feeds, std::vector& new_feeds, + const std::vector& copy_info) { size_t num_feeds = orig_feeds.size(); ORT_ENFORCE(copy_info.size() == num_feeds); @@ -235,8 +222,7 @@ static common::Status CachedCopyInputsAcrossDevices(const std::vector& // TODO: We should be able to use the allocation plan to know which device an output will be on. 
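The `CopyOneInputAcrossDevices` hunk above only copies a feed when the provider that will consume it differs from where the value currently lives, and it now skips the copy entirely for TensorRT and nGraph. The sketch below shows just that decision; the provider name strings are illustrative constants rather than the real `kTensorrtExecutionProvider`/`kNGraphExecutionProvider` symbols, and the real code compares `OrtAllocatorInfo` locations, not strings.

```
#include <string>

// Decision sketch for "copy across devices only when needed".
bool NeedsCopy(const std::string& input_location_provider,
               const std::string& required_provider) {
  // Providers such as TensorRT and nGraph take the OrtValue as-is
  // (the "no copy for TRT and nGraph" branch in the diff above).
  if (required_provider == "TensorrtExecutionProvider" ||
      required_provider == "NGraphExecutionProvider")
    return false;
  return input_location_provider != required_provider;
}
```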
static common::Status SetupFetchesForExecute(const SessionState& session_state, const std::vector& output_names, - std::vector& fetches, - std::vector& new_fetches, + std::vector& fetches, std::vector& new_fetches, std::vector* copy_to_new_fetches_cached_values) { ORT_ENFORCE(new_fetches.empty()); @@ -259,7 +245,7 @@ static common::Status SetupFetchesForExecute(const SessionState& session_state, return std::make_pair(false, size_t(0)); } - return std::make_pair(true, it - output_names.begin()); + return std::pair(true, it - output_names.begin()); }; std::pair found; @@ -276,7 +262,7 @@ static common::Status SetupFetchesForExecute(const SessionState& session_state, seen_outputs.insert(arg->Name()); size_t idx = found.second; - const MLValue& provided_mlvalue = fetches[idx]; + const OrtValue& provided_mlvalue = fetches[idx]; if (provided_mlvalue.IsAllocated()) { if (!provided_mlvalue.IsTensor()) { @@ -312,8 +298,7 @@ static common::Status SetupFetchesForExecute(const SessionState& session_state, return Status::OK(); } -static common::Status CachedSetupFetchesForExecute(std::vector& fetches, - std::vector& new_fetches, +static common::Status CachedSetupFetchesForExecute(std::vector& fetches, std::vector& new_fetches, const std::vector& copy_to_new_fetches_cached_values) { auto num_outputs = fetches.size(); ORT_ENFORCE(new_fetches.empty()); @@ -332,10 +317,8 @@ static common::Status CachedSetupFetchesForExecute(std::vector& fetches } // copies outputs across devices only if required -static common::Status CopyOutputsAcrossDevices(const SessionState& session_state, - const std::vector& fetches, - std::vector& user_fetches, - bool& needed_copy, +static common::Status CopyOutputsAcrossDevices(const SessionState& session_state, const std::vector& fetches, + std::vector& user_fetches, bool& needed_copy, std::vector* copiers) { needed_copy = false; auto num_outputs = fetches.size(); @@ -397,16 +380,16 @@ static common::Status CopyOutputsAcrossDevices(const SessionState& session_state ORT_RETURN_IF_ERROR(CopyMLValue(copy_info, fetched_mlvalue, output_mlvalue)); if (copiers) { - (*copiers)[idx] = std::move(copy_info); + (*copiers)[idx] = copy_info; } } return Status::OK(); } -static common::Status CachedCopyOutputsAcrossDevices(const std::vector& fetches, - std::vector& user_fetches, - const std::vector& copy_info) { +static common::Status CachedCopyOutputsAcrossDevices( + const std::vector& fetches, std::vector& user_fetches, + const std::vector& copy_info) { auto num_outputs = fetches.size(); // internal logic error if these are mismatched @@ -442,14 +425,11 @@ static DeviceCopyCheck CheckExecutionProviders(const ExecutionProviders& executi } // execute graph with cached info from FeedsFetchesManager. 
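The `CopyInputsAcrossDevices`/`CachedCopyInputsAcrossDevices` pair above follows a cache-and-replay pattern: the first run derives each feed's copy decision and stores it in the `FeedsFetchesManager`, and later runs with the same feed order simply replay those decisions. Below is a minimal, self-contained sketch of that pattern under the assumption that the feed order does not change between runs; `Feed` and `CopyInfo` are hypothetical types, not the onnxruntime ones.

```
#include <cstddef>
#include <functional>
#include <vector>

struct Feed { int device = 0; };
struct CopyInfo { bool copy_needed = false; int target_device = 0; };

// Derive per-feed copy decisions once, then reuse them on subsequent runs.
std::vector<Feed> PlaceFeeds(const std::vector<Feed>& feeds,
                             std::vector<CopyInfo>& cached,
                             const std::function<CopyInfo(const Feed&)>& derive) {
  if (cached.empty()) {                       // first run: derive and cache
    cached.reserve(feeds.size());
    for (const Feed& f : feeds) cached.push_back(derive(f));
  }
  std::vector<Feed> placed = feeds;           // later runs: replay the cache
  for (size_t i = 0; i < placed.size(); ++i)
    if (cached[i].copy_needed) placed[i].device = cached[i].target_device;
  return placed;
}
```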
-common::Status ExecuteGraphWithCachedInfo(const SessionState& session_state, - const FeedsFetchesManager& feeds_fetches_manager, - const std::vector& feeds, - std::vector& fetches, - const std::unordered_map& fetch_allocators, - bool sequential_execution, - const bool& terminate_flag, - const logging::Logger& logger) { +common::Status ExecuteGraphWithCachedInfo( + const SessionState& session_state, const FeedsFetchesManager& feeds_fetches_manager, + const std::vector& feeds, std::vector& fetches, + const std::unordered_map& fetch_allocators, bool sequential_execution, + const bool& terminate_flag, const logging::Logger& logger) { const auto& feeds_fetches_info = feeds_fetches_manager.GetFeedsFetchesInfo(); auto device_copy_checks = feeds_fetches_manager.GetDeviceCopyChecks(); @@ -466,10 +446,10 @@ common::Status ExecuteGraphWithCachedInfo(const SessionState& session_state, feeds_fetches_info.feeds_mlvalue_idxs, feeds, feeds_fetches_info.fetches_mlvalue_idxs, fetches, fetch_allocators, logger)); } else { - const std::vector* p_feeds = &feeds; - std::vector* p_fetches = &fetches; - std::vector device_feeds; - std::vector device_fetches; + const std::vector* p_feeds = &feeds; + std::vector* p_fetches = &fetches; + std::vector device_feeds; + std::vector device_fetches; // Copy inputs if (device_copy_checks.input_copy_needed == DeviceCopyCheck::Copy) { @@ -506,14 +486,10 @@ common::Status ExecuteGraphWithCachedInfo(const SessionState& session_state, } // execute graph and update feeds_fetches_manager with cached copy info if cache_copy_info is true -common::Status ExecuteGraph(const SessionState& session_state, - FeedsFetchesManager& feeds_fetches_manager, - const std::vector& feeds, - std::vector& fetches, +common::Status ExecuteGraph(const SessionState& session_state, FeedsFetchesManager& feeds_fetches_manager, + const std::vector& feeds, std::vector& fetches, const std::unordered_map& fetch_allocators, - bool sequential_execution, - const bool& terminate_flag, - const logging::Logger& logger, + bool sequential_execution, const bool& terminate_flag, const logging::Logger& logger, bool cache_copy_info) { const auto& feeds_fetches_info = feeds_fetches_manager.GetFeedsFetchesInfo(); auto device_copy_checks = feeds_fetches_manager.GetDeviceCopyChecks(); @@ -539,10 +515,10 @@ common::Status ExecuteGraph(const SessionState& session_state, } else { bool copy_needed = false; - const std::vector* p_feeds = &feeds; - std::vector* p_fetches = &fetches; - std::vector device_feeds; - std::vector device_fetches; + const std::vector* p_feeds = &feeds; + std::vector* p_fetches = &fetches; + std::vector device_feeds; + std::vector device_fetches; // Copy inputs auto* copiers = cache_copy_info ? 
&feeds_fetches_manager.GetMutableFeedsDeviceCopiers() : nullptr; diff --git a/onnxruntime/core/framework/utils.h b/onnxruntime/core/framework/utils.h index db17a22045e18..23e5c17466ae3 100644 --- a/onnxruntime/core/framework/utils.h +++ b/onnxruntime/core/framework/utils.h @@ -17,7 +17,6 @@ class Graph; class KernelDef; class KernelRegistryManager; class IExecutionProvider; -class MLValue; class Node; class Tensor; @@ -27,43 +26,31 @@ class Logger; namespace utils { -AllocatorPtr GetAllocator(const ExecutionProviders& exec_providers, const OrtAllocatorInfo& allocator_info); AllocatorPtr GetAllocator(const SessionState& session_state, const OrtAllocatorInfo& allocator_info); -common::Status AllocateHelper(const IExecutionProvider& execution_provider, - int device_id, - const Tensor& fetched_tensor, - MLValue& output_mlvalue); +common::Status AllocateHelper(const IExecutionProvider& execution_provider, int device_id, const Tensor& fetched_tensor, + OrtValue& output_mlvalue); const std::string& GetNodeInputProviderType(const SessionState::NodeInfo& info); -common::Status CopyOneInputAcrossDevices(const SessionState& session_state, - const std::string& input_name, - const MLValue& orig_mlvalue, - MLValue& new_mlvalue); +common::Status CopyOneInputAcrossDevices(const SessionState& session_state, const std::string& input_name, + const OrtValue& orig_mlvalue, OrtValue& new_mlvalue); // ExecuteGraph, writing cache info to FeedsFetchesManager to optimize feed and fetch usage across invocations when the // order and location of the feeds and fetches is unchanged. -common::Status ExecuteGraph(const SessionState& session_state, - FeedsFetchesManager& feeds_fetches_manager, - const std::vector& feeds, - std::vector& fetches, +common::Status ExecuteGraph(const SessionState& session_state, FeedsFetchesManager& feeds_fetches_manager, + const std::vector& feeds, std::vector& fetches, const std::unordered_map& fetch_allocators, - bool sequential_execution, - const bool& terminate_flag, - const logging::Logger& logger, + bool sequential_execution, const bool& terminate_flag, const logging::Logger& logger, bool cache_copy_info = true); // ExecuteGraph used the cached information in feeds_fetches_manager. -common::Status ExecuteGraphWithCachedInfo(const SessionState& session_state, - const FeedsFetchesManager& feeds_fetches_manager, - const std::vector& feeds, - std::vector& fetches, - const std::unordered_map& fetch_allocators, - bool sequential_execution, - const bool& terminate_flag, - const logging::Logger& logger); +common::Status ExecuteGraphWithCachedInfo( + const SessionState& session_state, const FeedsFetchesManager& feeds_fetches_manager, + const std::vector& feeds, std::vector& fetches, + const std::unordered_map& fetch_allocators, bool sequential_execution, + const bool& terminate_flag, const logging::Logger& logger); #define DispatchOnTensorType(tensor_type, function, ...) 
\ if (tensor_type == DataTypeImpl::GetType()) \ diff --git a/onnxruntime/core/graph/contrib_ops/attn_lstm_schema_defs.cc b/onnxruntime/core/graph/contrib_ops/attn_lstm_schema_defs.cc index 246b4da822b1e..b17c40c87f743 100644 --- a/onnxruntime/core/graph/contrib_ops/attn_lstm_schema_defs.cc +++ b/onnxruntime/core/graph/contrib_ops/attn_lstm_schema_defs.cc @@ -331,5 +331,5 @@ OpSchema& RegisterAttnLSTMContribOpSchema(OpSchema&& op_schema){ .SetDoc(AttnLSTM_ver1_doc); } -} -} +} // namespace contrib +} // namespace onnxruntime diff --git a/onnxruntime/core/graph/contrib_ops/contrib_defs.cc b/onnxruntime/core/graph/contrib_ops/contrib_defs.cc index 8ffecc8abb37e..2d092dd322d2c 100644 --- a/onnxruntime/core/graph/contrib_ops/contrib_defs.cc +++ b/onnxruntime/core/graph/contrib_ops/contrib_defs.cc @@ -1013,6 +1013,181 @@ Example 4: "Constrain to tensor(float).") .SetDoc(R"DOC(The WordConvEmbedding takes in a batch of sequence words and embed each word to a vector.)DOC"); + ONNX_CONTRIB_OPERATOR_SCHEMA(Pad) + .SetDomain(kMSDomain) + .SinceVersion(1) + .Attr( + "mode", + "Three modes: `constant`(default) - pads with a given constant value, " + "`reflect` - pads with the reflection of the vector mirrored on the first and last values of the vector along each axis, " + "`edge` - pads with the edge values of array", + AttributeProto::STRING, + std::string("constant")) + .Input(0, "data", "Input tensor.", "T") + .Input( + 1, + "pads", + "Tensor of integers indicating the number of padding elements to add or remove (if negative) " + "at the beginning and end of each axis. For 2D input tensor, it is the number of pixels. " + "`pads` should be a 1D tensor of shape [2 * input_rank] or a 2D tensor of shape [1, 2 * input_rank]. " + "`pads` format (1D example) should be as follow [x1_begin, x2_begin,...,x1_end, x2_end,...], " + "where xi_begin is the number of pixels added at the beginning of axis `i` and " + "xi_end, the number of pixels added at the end of axis `i`.", + "tensor(int64)") + .Input( + 2, + "value", + "(Optional) A scalar or rank 1 tensor containing a single value to be filled if the mode chosen is `constant` (by default it is 0.0).", + "T", + OpSchema::Optional) + .Output(0, "output", "Tensor after padding.", "T") + .TypeConstraint( + "T", + {"tensor(float16)", "tensor(float)", "tensor(double)"}, + "Constrain input and output types to float tensors.") + .TypeAndShapeInferenceFunction([](ONNX_NAMESPACE::InferenceContext& ctx) { + // Type inference + propagateElemTypeFromInputToOutput(ctx, 0, 0); + // Shape inference needs the input data shape + if (!hasNInputShapes(ctx, 1)) { + return; + } + const auto& input_shape = ctx.getInputType(0)->tensor_type().shape(); + const auto input_rank = input_shape.dim_size(); + + // Infer output shape if 'pads' tensor is available + const auto* pads_initializer = ctx.getInputData(1); + if (nullptr != pads_initializer) { + const auto& pads_shape = ctx.getInputType(1)->tensor_type().shape(); + if ((pads_initializer->dims_size() != 1 && + pads_initializer->dims_size() != 2) || + (pads_initializer->dims_size() == 2 && + pads_shape.dim((int)0).dim_value() != 1) || + pads_initializer->data_type() != ONNX_NAMESPACE::TensorProto::INT64) + fail_shape_inference( + "'pads' input must be a 1D (shape: [input_rank]) " + "or 2D tensor (shape: [1, input_rank]) of type int64"); + + // make a copy of the returned const vector - may have to resize + // this in next step + std::vector pads_data; + if (pads_initializer->has_raw_data()) + return; + else + pads_data.insert( + 
pads_data.end(), + pads_initializer->int64_data().begin(), + pads_initializer->int64_data().end()); + + // fill with zeros if needed to reach appropriate size + if (pads_data.size() != static_cast(2 * input_rank)) + pads_data.resize(2 * input_rank, 0); + + const auto& output_shape = + ctx.getOutputType(0)->mutable_tensor_type()->mutable_shape(); + for (size_t i = 0; (int64_t)i < input_rank; ++i) { + const auto& input_dim = input_shape.dim((int)i); + auto* output_dim = output_shape->add_dim(); + if (input_dim.has_dim_value()) { + output_dim->set_dim_value( + input_dim.dim_value() + pads_data[i] + pads_data[i + input_rank]); + } else if (pads_data[i] + pads_data[i + input_rank] == 0) { + *output_dim = input_dim; + } + } + } else { + // Infer ouput shapes' rank in any case + auto* output_shape_0 = getOutputShape(ctx, 0); + for (size_t i = 0; (int64_t)i < input_rank; ++i) { + output_shape_0->add_dim(); + } + } + return; + }) + .SetDoc(R"DOC( + Given `data` tensor, pads, mode, and value. + Example: + Insert 0 pads to the beginning of the second dimension. + data = [ + [1.0, 1.2], + [2.3, 3.4], + [4.5, 5.7], + ] + pads = [0, 2, 0, 0] + output = [ + [ + [0.0, 0.0, 1.0, 1.2], + [0.0, 0.0, 2.3, 3.4], + [0.0, 0.0, 4.5, 5.7], + ], + ] + )DOC"); + + ONNX_CONTRIB_OPERATOR_SCHEMA(Unique) + .SetDomain(kMSDomain) + .SinceVersion(1) + .Input(0, "x", "A 1-D input tensor that is to be processed.", "T") + .Output(0, "y", + "A 1-D tensor of the same type as 'x' " + "containing all the unique values in 'x' sorted " + "in the same order that they occur in the input 'x'", + "T") + .Output(1, "idx", + "A 1-D INT64 tensor of the same size as 'x' " + "containing the indices for each value in 'x' " + "in the output 'uniques'", + "tensor(int64)") + .Output(2, "counts", + "A 1-D INT64 tensor containing the " + "the count of each element " + "of 'uniques' in the input 'x'", + "tensor(int64)") + .TypeConstraint("T", OpSchema::all_tensor_types(), "Input can be of any tensor type.") + .TypeAndShapeInferenceFunction([](ONNX_NAMESPACE::InferenceContext& ctx) { + // Type inference + ONNX_NAMESPACE::propagateElemTypeFromInputToOutput(ctx, 0, 0); + ONNX_NAMESPACE::updateOutputElemType(ctx, 1, ONNX_NAMESPACE::TensorProto::INT64); + ONNX_NAMESPACE::updateOutputElemType(ctx, 2, ONNX_NAMESPACE::TensorProto::INT64); + + // Shape inference + + // shape of output 'uniques' and 'counts' + // depends on actual input data, but the rank is always 1 + ctx.getOutputType(0) + ->mutable_tensor_type() + ->mutable_shape() + ->add_dim(); + + ctx.getOutputType(2) + ->mutable_tensor_type() + ->mutable_shape() + ->add_dim(); + + // if the input shape doesn't exist, further shape inference is not possible + if (!hasNInputShapes(ctx, 1)) { + return; + } + + // 'idx' output has same shape as input + ONNX_NAMESPACE::propagateShapeFromInputToOutput(ctx, 0, 1); + + return; + }) + .SetDoc(R"DOC( + Finds all the unique values (deduped list) present in the given input tensor. + This operator returns 3 outputs. + The first output tensor 'uniques' contains all of the unique elements of the input, + sorted in the same order that they occur in the input. + The second output tensor 'idx' is the same size as the input and it contains the index + of each value of the input in 'uniques'. + The third output tensor 'counts' contains the count of each element of 'uniques' in the input. 
+ Example: + input_x = [2, 1, 1, 3, 4, 3] + output_uniques = [2, 1, 3, 4] + output_idx = [0, 1, 1, 2, 3, 2] + output_counts = [1, 2, 2, 1] + )DOC"); + #ifdef MICROSOFT_INTERNAL // register internal ops RegisterInternalSchemas(); diff --git a/onnxruntime/core/graph/contrib_ops/range_schema_defs.cc b/onnxruntime/core/graph/contrib_ops/range_schema_defs.cc index e5524281b223e..06360cacc211e 100644 --- a/onnxruntime/core/graph/contrib_ops/range_schema_defs.cc +++ b/onnxruntime/core/graph/contrib_ops/range_schema_defs.cc @@ -60,9 +60,8 @@ static T GetFirstElement(const TensorProto* shapeInitializer) { if (shapeInitializer->has_raw_data()) { const std::string& bytes = shapeInitializer->raw_data(); return *reinterpret_cast(bytes.c_str()); - } else { - return get_data(shapeInitializer); } + return get_data(shapeInitializer); } template diff --git a/onnxruntime/core/graph/function.cc b/onnxruntime/core/graph/function.cc index 2ff3b940a3b35..32e6c77a50637 100644 --- a/onnxruntime/core/graph/function.cc +++ b/onnxruntime/core/graph/function.cc @@ -10,16 +10,17 @@ namespace onnxruntime { // Auto inferred and generate an opschema for stand-alone functions // TODO: revisit to see if we can eliminate typeconstraint step void IOTypeConstraintHelper(const ONNX_NAMESPACE::FunctionProto* onnx_func_proto_, - std::unique_ptr& op_schema_, - const std::unordered_map& input_name_idx_map, - const std::unordered_map& output_name_idx_map) { + std::unique_ptr& op_schema_, + const std::unordered_map& input_name_idx_map, + const std::unordered_map& output_name_idx_map) { std::vector> input_types_list(onnx_func_proto_->input_size()); std::vector> output_types_list(onnx_func_proto_->output_size()); std::unordered_map> type_constraint_map; std::unordered_map attribute_type_map; auto schema_registry = ONNX_NAMESPACE::OpSchemaRegistry::Instance(); for (auto& node : onnx_func_proto_->node()) { - const auto node_op_schema = schema_registry->GetSchema(node.op_type(), (int)onnx_func_proto_->since_version(), node.domain()); + const auto node_op_schema = + schema_registry->GetSchema(node.op_type(), static_cast(onnx_func_proto_->since_version()), node.domain()); for (int i = 0; i < node.input_size(); ++i) { auto& in_name = node.input().Get(i); auto iter = input_name_idx_map.find(in_name); @@ -70,7 +71,7 @@ void IOTypeConstraintHelper(const ONNX_NAMESPACE::FunctionProto* onnx_func_proto op_schema_->Output(i, output.first, "", output.second); ++i; } - + for (auto& tc : type_constraint_map) { op_schema_->TypeConstraint(tc.first, tc.second, ""); } @@ -85,6 +86,13 @@ FunctionImpl::FunctionImpl(const onnxruntime::Graph& graph, std::unique_ptr customized_func) : parent_graph_(&graph), onnx_func_proto_{nullptr} { customized_func_body_ = std::move(customized_func); + + // Construct body. 
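The contrib `Unique` schema added above defines three outputs: first-occurrence-ordered unique values, per-element indices into that list, and per-unique counts. The following standalone sketch is a reference implementation of those semantics for `int64` data (the op itself accepts any tensor type); it reproduces the example given in the schema doc.

```
#include <cstdint>
#include <unordered_map>
#include <vector>

// Reference semantics for the contrib Unique op: 'uniques' keeps
// first-occurrence order, 'idx' maps each input element to its slot in
// 'uniques', and 'counts' tallies occurrences per unique value.
void UniqueRef(const std::vector<int64_t>& x, std::vector<int64_t>& uniques,
               std::vector<int64_t>& idx, std::vector<int64_t>& counts) {
  std::unordered_map<int64_t, int64_t> slot;
  uniques.clear(); idx.clear(); counts.clear();
  for (int64_t v : x) {
    auto it = slot.find(v);
    if (it == slot.end()) {
      it = slot.emplace(v, static_cast<int64_t>(uniques.size())).first;
      uniques.push_back(v);
      counts.push_back(0);
    }
    idx.push_back(it->second);
    ++counts[it->second];
  }
}
// input_x = [2, 1, 1, 3, 4, 3]  ->  uniques [2, 1, 3, 4], idx [0, 1, 1, 2, 3, 2],
// counts [1, 2, 2, 1], matching the example in the schema doc above.
```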
+ body_ = std::make_unique("fused_function_subgraph", false, onnxruntime::ModelMetaData(), + IOnnxRuntimeOpSchemaRegistryList({graph.GetSchemaRegistry()}), + graph.DomainToVersionMap()); + auto& sub_graph = body_->MainGraph(); + auto meta_def = customized_func_body_->GetMetaDef(); op_schema_ = std::make_unique(); op_schema_->SetName(meta_def->name); @@ -92,30 +100,36 @@ FunctionImpl::FunctionImpl(const onnxruntime::Graph& graph, op_schema_->SetDoc(meta_def->doc_string); op_schema_->SinceVersion(meta_def->since_version); int i = 0; + std::vector sub_graph_inputs; + sub_graph_inputs.resize(meta_def->inputs.size()); for (auto& input : meta_def->inputs) { - auto input_type = parent_graph_->GetNodeArg(input)->Type(); - op_schema_->Input(i, input, "", *input_type); + auto input_arg = parent_graph_->GetNodeArg(input); + auto& sub_graph_input_arg = sub_graph.GetOrCreateNodeArg(input_arg->Name(), input_arg->TypeAsProto()); + sub_graph_inputs[i] = &sub_graph_input_arg; + op_schema_->Input(i, input, "", *input_arg->Type()); ++i; } i = 0; + std::vector sub_graph_outputs; + sub_graph_outputs.resize(meta_def->outputs.size()); for (auto& output : meta_def->outputs) { - auto output_type = parent_graph_->GetNodeArg(output)->Type(); - op_schema_->Output(i, output, "", *output_type); + auto output_arg = parent_graph_->GetNodeArg(output); + auto& sub_graph_output_arg = sub_graph.GetOrCreateNodeArg(output_arg->Name(), output_arg->TypeAsProto()); + sub_graph_outputs[i] = &sub_graph_output_arg; + op_schema_->Output(i, output, "", *output_arg->Type()); ++i; } op_schema_->Finalize(); - //construct body - body_ = std::make_unique("fused_function_subgraph", false, onnxruntime::ModelMetaData(), - IOnnxRuntimeOpSchemaRegistryList({graph.GetSchemaRegistry()}), - graph.DomainToVersionMap()); - auto& sub_graph = body_->MainGraph(); + sub_graph.SetInputs(sub_graph_inputs); + sub_graph.SetOutputs(sub_graph_outputs); //Add node and node args //TODO: for better performance, we could try to transfer the nodes in parent graph to sub-graph directly, //instead of create new nodes. 
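The `FunctionImpl` constructor changes around here build the fused-function body by mirroring the parent graph's boundary NodeArgs into the subgraph and then declaring them explicitly via `SetInputs`/`SetOutputs` before copying the nodes across. The sketch below illustrates only that mirroring pattern with a hypothetical miniature graph type; it does not use the real onnxruntime `Graph` API.

```
#include <map>
#include <string>
#include <vector>

struct MiniArg { std::string name; std::string type; };

// Hypothetical miniature graph: named args plus explicit input/output lists.
struct MiniGraph {
  std::map<std::string, MiniArg> args;
  std::vector<const MiniArg*> inputs, outputs;

  // Same idea as Graph::GetOrCreateNodeArg: reuse an arg if the name exists.
  MiniArg& GetOrCreateArg(const std::string& name, const std::string& type) {
    return args.emplace(name, MiniArg{name, type}).first->second;
  }
};

// Mirror the fused node's boundary args into the subgraph and register them
// as the subgraph's inputs and outputs.
void MirrorFusedBoundary(const std::vector<MiniArg>& parent_inputs,
                         const std::vector<MiniArg>& parent_outputs,
                         MiniGraph& sub_graph) {
  for (const MiniArg& in : parent_inputs)
    sub_graph.inputs.push_back(&sub_graph.GetOrCreateArg(in.name, in.type));
  for (const MiniArg& out : parent_outputs)
    sub_graph.outputs.push_back(&sub_graph.GetOrCreateArg(out.name, out.type));
}
```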
for (auto& node_index : customized_func_body_->nodes) { auto node = parent_graph_->GetNode(node_index); - std::vector inputs, outputs; + std::vector inputs; + std::vector outputs; for (auto input : node->InputDefs()) { auto& n_input = sub_graph.GetOrCreateNodeArg(input->Name(), input->TypeAsProto()); inputs.push_back(&n_input); @@ -127,7 +141,7 @@ FunctionImpl::FunctionImpl(const onnxruntime::Graph& graph, sub_graph.AddNode(node->Name(), node->OpType(), node->Description(), inputs, outputs, &node->GetAttributes(), node->Domain()); } - for (auto input : meta_def->inputs) { + for (const auto& input : meta_def->inputs) { const ONNX_NAMESPACE::TensorProto* initializer = nullptr; if (graph.GetInitializedTensor(input, initializer)) { sub_graph.AddInitializedTensor(*initializer); @@ -148,7 +162,7 @@ FunctionImpl::FunctionImpl(const onnxruntime::Graph& graph, op_schema_->SetName(onnx_func_proto_->name()); op_schema_->SetDomain(onnx_func_proto_->node().Get(0).domain()); op_schema_->SetDoc(onnx_func_proto_->doc_string()); - op_schema_->SinceVersion((ONNX_NAMESPACE::OperatorSetVersion)onnx_func_proto_->since_version()); + op_schema_->SinceVersion(static_cast(onnx_func_proto_->since_version())); std::unordered_map input_name_idx_map; std::unordered_map output_name_idx_map; for (int i = 0; i < onnx_func_proto_->input_size(); ++i) { @@ -166,9 +180,9 @@ FunctionImpl::FunctionImpl(const onnxruntime::Graph& graph, auto type_constraint_params = cached_op_schema->typeConstraintParams(); for (auto& type_constraint_param : type_constraint_params) { op_schema_->TypeConstraint( - type_constraint_param.type_param_str, - type_constraint_param.allowed_type_strs, - type_constraint_param.description); + type_constraint_param.type_param_str, + type_constraint_param.allowed_type_strs, + type_constraint_param.description); } int i = 0; for (auto& input : cached_op_schema->inputs()) { @@ -187,13 +201,13 @@ FunctionImpl::FunctionImpl(const onnxruntime::Graph& graph, if (!cached_op_schema || !cached_op_schema->has_type_and_shape_inference_function()) { op_schema_->TypeAndShapeInferenceFunction( - [this](ONNX_NAMESPACE::InferenceContext& ctx) { - auto schema_registry = ONNX_NAMESPACE::OpSchemaRegistry::Instance(); - const ONNX_NAMESPACE::FunctionProto* func_ptr = this->GetFuncProto(); - if (nullptr != func_ptr) { - ONNX_NAMESPACE::shape_inference::InferShapeForFunctionNode(func_ptr, schema_registry, ctx); - } - }); + [this](ONNX_NAMESPACE::InferenceContext& ctx) { + auto schema_registry = ONNX_NAMESPACE::OpSchemaRegistry::Instance(); + const ONNX_NAMESPACE::FunctionProto* func_ptr = this->GetFuncProto(); + if (nullptr != func_ptr) { + ONNX_NAMESPACE::shape_inference::InferShapeForFunctionNode(func_ptr, schema_registry, ctx); + } + }); } else { op_schema_->TypeAndShapeInferenceFunction(cached_op_schema->GetTypeAndShapeInferenceFunction()); } @@ -202,7 +216,7 @@ FunctionImpl::FunctionImpl(const onnxruntime::Graph& graph, //construct body std::unordered_map domain_to_version; //TODO: set correct domain and version - domain_to_version[onnxruntime::kOnnxDomain] = (int)onnx_func_proto_->since_version(); + domain_to_version[onnxruntime::kOnnxDomain] = static_cast(onnx_func_proto_->since_version()); body_ = std::make_unique(onnx_func_proto_->name(), false, onnxruntime::ModelMetaData(), IOnnxRuntimeOpSchemaRegistryList(), domain_to_version); auto& sub_graph = body_->MainGraph(); @@ -211,7 +225,8 @@ FunctionImpl::FunctionImpl(const onnxruntime::Graph& graph, // in the parent graph for later inlining purpose auto attr_map = 
node_in_parent_graph->GetAttributes(); for (auto& node : onnx_func_proto_->node()) { - std::vector inputs, outputs; + std::vector inputs; + std::vector outputs; std::string uniq_identifier = node.name(); if (!node.has_name()) { std::stringstream ss; diff --git a/onnxruntime/core/graph/graph.cc b/onnxruntime/core/graph/graph.cc index 7f5437e1fbd20..8bca0204fc826 100644 --- a/onnxruntime/core/graph/graph.cc +++ b/onnxruntime/core/graph/graph.cc @@ -75,8 +75,8 @@ DataType NodeArg::Type() const noexcept { const TypeProto* NodeArg::TypeAsProto() const noexcept { if (node_arg_info_.has_type()) return &node_arg_info_.type(); - else - return nullptr; + + return nullptr; } const TensorShapeProto* NodeArg::Shape() const { @@ -87,16 +87,14 @@ const TensorShapeProto* NodeArg::Shape() const { case TypeProto::kTensorType: { if (type->tensor_type().has_shape()) { return &(type->tensor_type().shape()); - } else { - return nullptr; } + return nullptr; } case TypeProto::kSparseTensorType: { if (type->sparse_tensor_type().has_shape()) { return &(type->sparse_tensor_type().shape()); - } else { - return nullptr; } + return nullptr; } case TypeProto::kSequenceType: case TypeProto::kMapType: @@ -330,7 +328,7 @@ void Node::ToProto(NodeProto& proto) const { // Set attributes. proto.clear_attribute(); - for (auto attribute : attributes_) { + for (const auto& attribute : attributes_) { const gsl::not_null attr{proto.add_attribute()}; *attr = attribute.second; } @@ -484,6 +482,7 @@ Status Node::UpdateInputArgCount() { if (total_arg_count < 0 || static_cast(total_arg_count) != definitions_.input_defs.size()) { return ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, + "This is an invalid model. " "The sum of input arg count is not equal to size of input defs in node (", name_, ")"); } @@ -601,16 +600,13 @@ Graph::Graph(GraphProto* graph_proto, IOnnxRuntimeOpSchemaCollectionPtr schema_registry, const std::unordered_map& model_functions) : Graph(graph_proto, domain_to_version, ir_version, schema_registry, nullptr, model_functions) {} -Graph::Graph(GraphProto* graph_proto, - const std::unordered_map& domain_to_version, - Version ir_version, - IOnnxRuntimeOpSchemaCollectionPtr schema_registry, - Graph* parent_graph, +Graph::Graph(GraphProto* graph_proto, const std::unordered_map& domain_to_version, Version ir_version, + IOnnxRuntimeOpSchemaCollectionPtr schema_registry, Graph* parent_graph, const std::unordered_map& model_functions) : graph_proto_{graph_proto}, schema_registry_(schema_registry), graph_resolve_needed_(true), - graph_proto_sync_needed_(false), + domain_to_version_(domain_to_version), model_functions_(model_functions), ir_version_(ir_version), @@ -682,7 +678,7 @@ Graph::Graph(GraphProto* graph_proto, } } - for (auto node_proto : graph_proto_->node()) { + for (const auto& node_proto : graph_proto_->node()) { AddNode(node_proto, name_to_type_map); } } @@ -710,7 +706,7 @@ Status Graph::VerifyNoDuplicateName() { if (!node_name.empty() && node_name_to_index.end() != node_name_to_index.find(node_name)) { // The node has name and its name was used by another node. Status status(ONNXRUNTIME, FAIL, - "Error: two nodes with same node name (" + node_name + ")."); + "This is an invalid model. 
Error: two nodes with same node name (" + node_name + ")."); return status; } @@ -724,14 +720,14 @@ Status Graph::VerifyNoDuplicateName() { auto& output_arg_name = output_def->Name(); if (inputs_and_initializers.count(output_arg_name)) { Status status(ONNXRUNTIME, FAIL, - "Error: Duplicate definition of name (" + output_arg_name + ")."); + "This is an invalid model. Error: Duplicate definition of name (" + output_arg_name + ")."); return status; } auto result = output_args.insert({output_arg_name, {&node, output_index}}); if (!result.second) { // Two outputs with same name, so that insertion fails. Status status(ONNXRUNTIME, FAIL, - "Error: Duplicate definition of name (" + output_arg_name + ")."); + "This is an invalid model. Error: Duplicate definition of name (" + output_arg_name + ")."); return status; } } @@ -788,16 +784,15 @@ NodeArg* Graph::GetNodeArgIncludingParentGraphs(const std::string& node_arg_name } void Graph::AddEdge(NodeIndex src_node_index, NodeIndex dst_node_index, int src_arg_slot, int dst_arg_slot) { - if (nodes_.size() <= src_node_index || - nodes_.size() <= dst_node_index || - nullptr == nodes_[src_node_index] || - nullptr == nodes_[dst_node_index]) { + if (nodes_.size() <= src_node_index || src_arg_slot < 0 || nodes_.size() <= dst_node_index || dst_arg_slot < 0 || + nullptr == nodes_[src_node_index] || nullptr == nodes_[dst_node_index]) { // Invalid node indexes specified. ORT_THROW("Invalid node indexes specified when adding edge."); } - NodeArg *src_arg = nullptr, *dst_arg = nullptr; - if (nodes_[src_node_index]->MutableDefinitions().output_defs.size() > src_arg_slot) { + NodeArg* src_arg = nullptr; + NodeArg* dst_arg = nullptr; + if (nodes_[src_node_index]->MutableDefinitions().output_defs.size() > static_cast(src_arg_slot)) { src_arg = nodes_[src_node_index]->MutableDefinitions().output_defs[src_arg_slot]; } @@ -807,12 +802,12 @@ void Graph::AddEdge(NodeIndex src_node_index, NodeIndex dst_node_index, int src_ auto& dst_node_defs = nodes_[dst_node_index]->MutableDefinitions(); NodeArg** dst_arg_pointer = nullptr; - if (dst_node_defs.input_defs.size() > dst_arg_slot) { + if (dst_node_defs.input_defs.size() > static_cast(dst_arg_slot)) { dst_arg_pointer = &dst_node_defs.input_defs[dst_arg_slot]; dst_arg = *dst_arg_pointer; } else { auto num_of_explicit_inputs = dst_node_defs.input_defs.size(); - if (num_of_explicit_inputs + dst_node_defs.implicit_input_defs.size() > dst_arg_slot) { + if (num_of_explicit_inputs + dst_node_defs.implicit_input_defs.size() > static_cast(dst_arg_slot)) { dst_arg_pointer = &dst_node_defs.implicit_input_defs[dst_arg_slot - num_of_explicit_inputs]; dst_arg = *dst_arg_pointer; } @@ -825,9 +820,8 @@ void Graph::AddEdge(NodeIndex src_node_index, NodeIndex dst_node_index, int src_ if (src_arg->Type() != dst_arg->Type()) { // The output type of source node arg does not match the input type of destination node arg. 
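The `AddEdge`/`RemoveEdge` changes above reject negative arg slots up front and only then compare against the container sizes with an explicit `size_t` cast, so a negative index can never slip through a signed/unsigned comparison. A small self-contained sketch of that check order:

```
#include <cstddef>
#include <vector>

// Validate a signed slot index against a container: reject negatives first,
// then the cast to size_t for the size comparison is safe.
template <typename T>
bool SlotInRange(int slot, const std::vector<T>& defs) {
  if (slot < 0) return false;                      // checked before any cast
  return static_cast<size_t>(slot) < defs.size();  // now the comparison is well-defined
}
```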
ORT_THROW("Argument type mismatch when adding edge."); - } else { - *dst_arg_pointer = src_arg; } + *dst_arg_pointer = src_arg; } nodes_[src_node_index]->MutableRelationships().output_edges.insert(Node::EdgeEnd(*nodes_[dst_node_index], src_arg_slot, dst_arg_slot)); @@ -835,16 +829,15 @@ void Graph::AddEdge(NodeIndex src_node_index, NodeIndex dst_node_index, int src_ } void Graph::RemoveEdge(NodeIndex src_node_index, NodeIndex dst_node_index, int src_arg_slot, int dst_arg_slot) { - if (nodes_.size() <= src_node_index || - nodes_.size() <= dst_node_index || - nullptr == nodes_[src_node_index] || - nullptr == nodes_[dst_node_index]) { + if (nodes_.size() <= src_node_index || src_arg_slot < 0 || nodes_.size() <= dst_node_index || dst_arg_slot < 0 || + nullptr == nodes_[src_node_index] || nullptr == nodes_[dst_node_index]) { // Invalid node indexes specified. ORT_THROW("Invalid node indexes specified when removing edge."); } - const NodeArg *src_arg = nullptr, *dst_arg = nullptr; - if (nodes_[src_node_index]->GetDefinitions().output_defs.size() > src_arg_slot) { + const NodeArg* src_arg = nullptr; + const NodeArg* dst_arg = nullptr; + if (nodes_[src_node_index]->GetDefinitions().output_defs.size() > static_cast(src_arg_slot)) { src_arg = nodes_[src_node_index]->GetDefinitions().output_defs[src_arg_slot]; } @@ -853,11 +846,11 @@ void Graph::RemoveEdge(NodeIndex src_node_index, NodeIndex dst_node_index, int s } auto& dst_node_defs = nodes_[dst_node_index]->GetDefinitions(); - if (dst_node_defs.input_defs.size() > dst_arg_slot) { + if (dst_node_defs.input_defs.size() > static_cast(dst_arg_slot)) { dst_arg = dst_node_defs.input_defs[dst_arg_slot]; } else { auto num_of_explicit_inputs = dst_node_defs.input_defs.size(); - if (num_of_explicit_inputs + dst_node_defs.implicit_input_defs.size() > dst_arg_slot) { + if (num_of_explicit_inputs + dst_node_defs.implicit_input_defs.size() > static_cast(dst_arg_slot)) { dst_arg = dst_node_defs.implicit_input_defs[dst_arg_slot - num_of_explicit_inputs]; } } @@ -887,7 +880,7 @@ Status Graph::BuildConnections(std::vector& outer_scope_node_args_c for (auto* node : resolve_context_.nodes_with_subgraphs) { for (auto& subgraph : node->MutableSubgraphs()) { std::vector node_args_consumed; - subgraph->BuildConnections(node_args_consumed); + ORT_RETURN_IF_ERROR(subgraph->BuildConnections(node_args_consumed)); for (auto& node_arg_name : node_args_consumed) { auto node_arg = GetNodeArg(node_arg_name); @@ -901,7 +894,7 @@ Status Graph::BuildConnections(std::vector& outer_scope_node_args_c if (!parent_graph_) { return ORT_MAKE_STATUS( ONNXRUNTIME, INVALID_GRAPH, - "At top level graph without matching NodeArg that subgraph consumes. Name=", + "This is an invalid model. At top level graph without matching NodeArg that subgraph consumes. Name=", node_arg_name, " Graph may not conform to the ONNX spec and contain initializers that are not graph inputs."); } @@ -912,7 +905,7 @@ Status Graph::BuildConnections(std::vector& outer_scope_node_args_c if (!node_arg) { return ORT_MAKE_STATUS( ONNXRUNTIME, INVALID_GRAPH, - "Failed to find NodeArg in all parent graphs. Name=", node_arg_name, + "This is an invalid model. Failed to find NodeArg in all parent graphs. 
Name=", node_arg_name, " Graph may not conform to the ONNX spec and contain initializers that are not graph inputs."); } } @@ -954,7 +947,7 @@ Status Graph::BuildConnections(std::vector& outer_scope_node_args_c // Need mutable input defs to be able to set any outer scope NodeArg implicit inputs auto& input_args = node.MutableInputDefs(); - if (input_args.size() > 0) { + if (!input_args.empty()) { // This node needs inputs. int input_slot_index = -1; @@ -970,7 +963,7 @@ Status Graph::BuildConnections(std::vector& outer_scope_node_args_c // No such output_arg matching this input_arg. // This input arg should be fed when running evaluation. // See if it's present in the outer scope. If so it will be 'fed' by the execution frame - // providing access to the MLValue from the outer scope. Pass the name back up so nodes can + // providing access to the OrtValue from the outer scope. Pass the name back up so nodes can // be linked correctly at that level. if (outer_scope_node_args.find(input_arg->Name()) != outer_scope_node_args.cend()) { outer_scope_node_args_consumed.push_back(input_arg->Name()); @@ -985,7 +978,7 @@ Status Graph::BuildConnections(std::vector& outer_scope_node_args_c inner_nodes.insert(&output_node); } - } else if (node.OutputDefs().size() <= 0) { + } else if (node.OutputDefs().empty()) { // This is a useless node. // It has no input/output. RemoveNode(node.Index()); @@ -1000,6 +993,7 @@ void Graph::ReverseDFSFrom(const std::vector& from, const std::function& leave, const std::function& comp) const { std::vector node_vec; + node_vec.reserve(from.size()); for (auto i : from) { node_vec.push_back(GetNode(i)); } @@ -1095,7 +1089,7 @@ Status Graph::PerformTopologicalSortAndCheckIsAcyclic() { // start at the bottom and work our way up the graph for (auto iter = Nodes().begin(); iter != Nodes().end(); ++iter) { - if (0 == iter->relationships_.output_edges.size()) { + if (iter->relationships_.output_edges.empty()) { // This is a leaf node. stack.push(iter->Index()); } @@ -1129,7 +1123,7 @@ Status Graph::PerformTopologicalSortAndCheckIsAcyclic() { for (auto iter = node->InputNodesBegin(); iter != node->InputNodesEnd(); ++iter) { const NodeIndex idx = (*iter).Index(); if (output_nodes.find(idx) != output_nodes.end()) { - Status status(ONNXRUNTIME, FAIL, "Error: the graph is not acyclic."); + Status status(ONNXRUNTIME, FAIL, "This is an invalid model. Error: the graph is not acyclic."); return status; } @@ -1144,9 +1138,8 @@ Status Graph::PerformTopologicalSortAndCheckIsAcyclic() { if (num_of_nodes_ >= 0 && static_cast(num_of_nodes_) == nodes_in_topological_order_.size()) { return Status::OK(); - } else { - return Status(ONNXRUNTIME, FAIL, "Error: the graph is not acyclic."); } + return Status(ONNXRUNTIME, FAIL, "This is an invalid model. 
Error: the graph is not acyclic."); } bool FullyDefinedType(const TypeProto& type_proto) { @@ -1234,16 +1227,15 @@ class InferenceContextImpl : public ONNX_NAMESPACE::InferenceContext { } } - const std::vector InferredOutputTypes() const { return node_output_types_; } + std::vector InferredOutputTypes() const { return node_output_types_; } const AttributeProto* getAttribute(const std::string& name) const override { auto& attribute_value_map = node_.GetAttributes(); auto iter = attribute_value_map.find(name); if (iter == attribute_value_map.end()) { return nullptr; - } else { - return &iter->second; } + return &iter->second; } size_t getNumInputs() const noexcept override { @@ -1328,10 +1320,9 @@ Status Graph::InferAndVerifySubgraphTypes(const Node& node, Graph& subgraph, " inputs but subgraph has ", num_subgraph_inputs, " inputs and requires ", num_required_subgraph_inputs, " inputs. Either provide all subgraph inputs, or just the required inputs."); - } else { + } subgraph_inputs = &required_subgraph_inputs; num_subgraph_inputs = num_required_subgraph_inputs; - } } // apply type/shape info to the subgraph's inputs @@ -1421,7 +1412,9 @@ Status Graph::InferAndVerifyTypeMatch(Node& node, const OpSchema& op) { // Logic error: This should not happen if we properly checked that every use has // a corresponding def, for which type-inference already produced a valid type Status status(ONNXRUNTIME, FAIL, - "Node (" + node_name + ") input arg (" + + "This is an invalid model. " + "Node (" + + node_name + ") input arg (" + input_def->Name() + ") does not have type information set by parent node."); return status; } @@ -1435,7 +1428,9 @@ Status Graph::InferAndVerifyTypeMatch(Node& node, const OpSchema& op) { // Type error in input model/graph. Status status(ONNXRUNTIME, INVALID_GRAPH, - "Type Error: Type '" + *input_type + "' of input parameter (" + input_def->Name() + + "This is an invalid model. " + "Type Error: Type '" + + *input_type + "' of input parameter (" + input_def->Name() + ") of operator (" + op.Name() + ") in node (" + node_name + ") is invalid."); return status; } @@ -1565,7 +1560,10 @@ common::Status Graph::TypeCheckInputsAndInitializers() { // Check that the type of every input is specified: for (auto* graph_input : GetInputs()) { if (nullptr == graph_input->Type()) { - Status status(ONNXRUNTIME, FAIL, "Model input (" + graph_input->Name() + ") does not have type information."); + Status status(ONNXRUNTIME, FAIL, + "This is an invalid model. " + "Model input (" + + graph_input->Name() + ") does not have type information."); return status; } } @@ -1656,7 +1654,7 @@ Status Graph::VerifyNodeAndOpMatch() { try { checker::check_node(node_proto, ctx, lsc); } catch (const std::exception& ex) { - return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_GRAPH, "Node:", node_name, " ", ex.what()); + return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_GRAPH, "This is an invalid model. Error in Node:", node_name, " : ", ex.what()); } auto maxInclusiveVersion = DomainToVersionMap().find(domain)->second; @@ -1700,7 +1698,9 @@ Status Graph::VerifyNodeAndOpMatch() { // TODO: Handle optional attribute but no default value specified in op definition. } else { Status status(ONNXRUNTIME, FAIL, - "Node (" + node_name + ") attribute (" + attr_def.first + + "This is an invalid model. 
" + "Node (" + + node_name + ") attribute (" + attr_def.first + ") is required but not specified."); return status; } @@ -2044,7 +2044,8 @@ Node& Graph::AddNode(const std::string& name, const std::vector& output_args, const NodeAttributes* attributes, const std::string& domain) { - std::vector inputs, outputs; + std::vector inputs; + std::vector outputs; inputs.resize(input_args.size()); outputs.resize(output_args.size()); int i = 0; @@ -2100,26 +2101,20 @@ bool Graph::AddControlEdge(NodeIndex src_node_index, NodeIndex dst_node_index) { return true; } -const GraphProto& Graph::ToGraphProto() { +const ONNX_NAMESPACE::GraphProto& Graph::ToGraphProto() { if (!GraphProtoSyncNeeded()) { return *graph_proto_; } // Nodes. - graph_proto_->clear_node(); - GraphViewer graph_viewer(*this); - // Nodes must be sorted in Topological Order in the GraphProto per ONNX spec. - for (auto& node_idx : graph_viewer.GetNodesInTopologicalOrder()) { - const gsl::not_null node_proto{graph_proto_->add_node()}; - const gsl::not_null p_node{GetNode(node_idx)}; - p_node->ToProto(*node_proto); - } + ToGraphProtoInternal(*graph_proto_); if (!removed_initializer_indexes_.empty()) { // Move initializers. std::sort(removed_initializer_indexes_.begin(), removed_initializer_indexes_.end()); int lastInUseInitializerIndex = graph_proto_->initializer_size() - 1; - int start = 0, end = gsl::narrow_cast(removed_initializer_indexes_.size()) - 1; + int start = 0; + int end = gsl::narrow_cast(removed_initializer_indexes_.size()) - 1; int lastRemovedInitializerIndex = removed_initializer_indexes_[end]; for (; start <= end; start++) { @@ -2143,36 +2138,58 @@ const GraphProto& Graph::ToGraphProto() { removed_initializer_indexes_.clear(); } - // Sync graph inputs/outputs/valueInfo. - SyncGraphInputsOutputs(); - GraphProtoSyncNeeded(false); return *graph_proto_; } -void Graph::SyncGraphInputsOutputs() { +ONNX_NAMESPACE::GraphProto Graph::ToGraphProto() const { + if (!GraphProtoSyncNeeded()) { + return *graph_proto_; + } + GraphProto result; + ToGraphProtoInternal(result); + + for (auto initializer : GetAllInitializedTensors()) { + *result.add_initializer() = *initializer.second; + } + + return result; +} + +void Graph::ToGraphProtoInternal(ONNX_NAMESPACE::GraphProto& graph_proto) const { + graph_proto_->clear_node(); graph_proto_->clear_input(); graph_proto_->clear_output(); graph_proto_->clear_value_info(); + graph_proto.set_name(Name()); + graph_proto.set_doc_string(Description()); for (const auto* input_arg : GetInputsIncludingInitializers()) { - *(graph_proto_->mutable_input()->Add()) = input_arg->ToProto(); + *(graph_proto.mutable_input()->Add()) = input_arg->ToProto(); } for (const auto* output_arg : GetOutputs()) { - *(graph_proto_->mutable_output()->Add()) = output_arg->ToProto(); + *(graph_proto.mutable_output()->Add()) = output_arg->ToProto(); } for (const auto* value_info : value_info_) { - *(graph_proto_->mutable_value_info()->Add()) = value_info->ToProto(); + *(graph_proto.mutable_value_info()->Add()) = value_info->ToProto(); } // add the NodeArg info for outer scope NodeArgs so we capture the type information for (const auto& name : outer_scope_node_arg_names_) { auto* node_arg = GetNodeArg(name); ORT_ENFORCE(node_arg, "Outer scope node arg name '" + name + "'was added but does not exist. 
"); - *(graph_proto_->mutable_value_info()->Add()) = node_arg->ToProto(); + *(graph_proto.mutable_value_info()->Add()) = node_arg->ToProto(); + } + + GraphViewer graph_viewer(*this); + // Nodes must be sorted in Topological Order in the GraphProto per ONNX spec. + for (auto& node_idx : graph_viewer.GetNodesInTopologicalOrder()) { + const gsl::not_null node_proto{graph_proto.add_node()}; + const gsl::not_null p_node{GetNode(node_idx)}; + p_node->ToProto(*node_proto); } } @@ -2279,7 +2296,10 @@ Status Graph::SetGraphInputsOutputs() { auto iter3 = graph_inputs.find(graph_output_name); if (graph_inputs.end() == iter3) { // Graph output is not found as any graph input. - return Status(ONNXRUNTIME, FAIL, "Graph output (" + graph_output_name + ") does not exist in the graph."); + return Status(ONNXRUNTIME, FAIL, + "This is an invalid model. " + "Graph output (" + + graph_output_name + ") does not exist in the graph."); } graph_outputs_.push_back(iter3->second); continue; @@ -2371,7 +2391,8 @@ Status Graph::SetGraphInputsOutputs() { if (!graph_outputs_manually_set_) { // Set graph outputs in order. std::vector graph_output_args_index; - for (auto output_arg : graph_output_args) { + graph_output_args_index.reserve(graph_output_args.size()); + for (const auto& output_arg : graph_output_args) { graph_output_args_index.push_back(output_arg.second); } std::sort(graph_output_args_index.begin(), graph_output_args_index.end()); @@ -2425,7 +2446,8 @@ Node& Graph::FuseSubGraph(std::unique_ptr<::onnxruntime::IndexedSubGraph> sub_gr auto func_meta_def = sub_graph->GetMetaDef(); ORT_ENFORCE(nullptr != func_meta_def); - std::vector input_args, output_args; + std::vector input_args; + std::vector output_args; for (auto& arg_name : func_meta_def->inputs) { input_args.push_back(GetNodeArg(arg_name)); } @@ -2443,7 +2465,7 @@ Node& Graph::FuseSubGraph(std::unique_ptr<::onnxruntime::IndexedSubGraph> sub_gr fused_node.SetNodeType(Node::Type::Fused); function_container_.emplace_back(MakeFunction(*this, std::move(sub_graph))); - fused_node.SetFunctionBody(*(function_container_.back().get())); + fused_node.SetFunctionBody(*function_container_.back()); // Remove nodes fused above. auto& sub_graph_ref = function_container_.back()->GetIndexedSubGraph(); @@ -2477,7 +2499,7 @@ Status Graph::InlineFunction(Node& node) { return Status::OK(); } -void Graph::SetInputs(const std::vector inputs) { +void Graph::SetInputs(const std::vector& inputs) { if (GraphLoadedFromModelFile(graph_proto_)) { // TODO: add this support. ORT_THROW("This API is not supported when model is loaded from proto file right now."); @@ -2487,7 +2509,7 @@ void Graph::SetInputs(const std::vector inputs) { graph_inputs_manually_set_ = true; } -void Graph::SetOutputs(const std::vector outputs) { +void Graph::SetOutputs(const std::vector& outputs) { if (GraphLoadedFromModelFile(graph_proto_)) { // TODO: add this support. 
ORT_THROW("This API is not supported when model is loaded from proto file right now."); diff --git a/onnxruntime/core/graph/graph_utils.cc b/onnxruntime/core/graph/graph_utils.cc index 269e981dd97c3..d3a21293a59a6 100644 --- a/onnxruntime/core/graph/graph_utils.cc +++ b/onnxruntime/core/graph/graph_utils.cc @@ -48,9 +48,8 @@ struct GraphEdge { static bool OutputEdgeProvidesImplicitInput(const Graph& graph, const GraphEdge& output_edge) { // we treat the explicit and implicit inputs as sequential, so if the destination arg index of an output edge // is past the valid range for the node's explicit inputs, it is for an implicit input - const auto num_explicit_inputs = (*graph.GetNode(output_edge.dst_node)).InputDefs().size(); - bool is_implicit_input = output_edge.dst_arg_index >= num_explicit_inputs; - return is_implicit_input; + const size_t num_explicit_inputs = (*graph.GetNode(output_edge.dst_node)).InputDefs().size(); + return static_cast(output_edge.dst_arg_index) >= num_explicit_inputs; } /** Checks if new_output_name can be used to replace removed_output_name in the subgraph input. @@ -69,7 +68,7 @@ static bool CanUpdateImplicitInputNameInSubgraph(Node& node, for (auto& subgraph_node : attr_subgraph_pair.second->Nodes()) { // recurse if this node also consumes removed_output_name as an implicit input (i.e. there are multiple levels of nested // subgraphs, and at least one level lower uses removed_output_name as an implicit input - const auto& subgraph_node_implicit_inputs = subgraph_node.ImplicitInputDefs(); + const auto subgraph_node_implicit_inputs = subgraph_node.ImplicitInputDefs(); if (!subgraph_node_implicit_inputs.empty()) { auto subgraph_node_also_consumes_nodearg_as_implicit_input = std::find_if(subgraph_node_implicit_inputs.cbegin(), subgraph_node_implicit_inputs.cend(), @@ -99,7 +98,7 @@ static void UpdateImplicitInputNameInSubgraph(Node& node, // recurse if this node also consumes removed_output_name as an implicit input // (i.e. 
there are multiple levels of nested subgraphs, and at least one level lower uses // removed_output_name as an implicit input - const auto& subgraph_node_implicit_inputs = subgraph_node.ImplicitInputDefs(); + const auto subgraph_node_implicit_inputs = subgraph_node.ImplicitInputDefs(); if (!subgraph_node_implicit_inputs.empty()) { auto subgraph_node_also_consumes_nodearg_as_implicit_input = std::find_if(subgraph_node_implicit_inputs.cbegin(), subgraph_node_implicit_inputs.cend(), @@ -234,7 +233,7 @@ static bool RemoveNodeWithSingleInitializerIn(Graph& graph, Node& node) { auto output_node = graph.GetNode(output_edge.dst_node); ORT_ENFORCE(output_node, "Outgoing node could not be found."); - auto dst_arg_idx = output_edge.dst_arg_index; + size_t dst_arg_idx = static_cast(output_edge.dst_arg_index); if (dst_arg_idx < output_node->InputDefs().size()) { output_node->MutableInputDefs()[output_edge.dst_arg_index] = input_def; } else if (dst_arg_idx < output_node->InputDefs().size() + output_node->ImplicitInputDefs().size()) { @@ -255,26 +254,28 @@ static bool RemoveNodeWithSingleInitializerIn(Graph& graph, Node& node) { const std::string& GetNodeInputName(const Node& node, int index) { const auto& inputs = node.InputDefs(); - ORT_ENFORCE(index < inputs.size(), "Attempting to get an input that does not exist."); + ORT_ENFORCE(index >= 0 && static_cast(index) < inputs.size(), + "Attempting to get an input that does not exist."); return inputs[index]->Name(); } const std::string& GetNodeOutputName(const Node& node, int index) { const auto& outputs = node.OutputDefs(); - assert(index < outputs.size()); + ORT_ENFORCE(index >= 0 && static_cast(index) < outputs.size(), + "Attempting to get an output that does not exist."); return outputs[index]->Name(); } bool IsSupportedOptypeVersionAndDomain(const Node& node, const std::string& op_type, - ONNX_NAMESPACE::OperatorSetVersion version, + const std::initializer_list& versions, const std::string& domain) { return (node.OpType() == op_type && !node.Op()->Deprecated() && - MatchesOpSinceVersion(node, version) && MatchesOpSetDomain(node, domain)); + MatchesOpSinceVersion(node, versions) && MatchesOpSetDomain(node, domain)); } -bool MatchesOpSinceVersion(const Node& node, ONNX_NAMESPACE::OperatorSetVersion version) { - return node.Op()->SinceVersion() == version; +bool MatchesOpSinceVersion(const Node& node, const std::initializer_list& versions) { + return std::find(versions.begin(), versions.end(), node.Op()->SinceVersion()) != versions.end(); } bool MatchesOpSetDomain(const Node& node, const std::string& domain) { @@ -287,12 +288,8 @@ bool MatchesOpSetDomain(const Node& node, const std::string& domain) { bool IsSupportedProvider(const Node& node, const std::unordered_set& compatible_providers) { - if (!compatible_providers.empty() && - compatible_providers.find(node.GetExecutionProviderType()) == compatible_providers.end()) { - return false; - } - - return true; + return !(!compatible_providers.empty() && + compatible_providers.find(node.GetExecutionProviderType()) == compatible_providers.end()); } Status ForAllMutableSubgraphs(Graph& graph, std::function func) { @@ -342,9 +339,7 @@ Status ForAllSubgraphs(const Graph& graph, std::function f } bool IsSingleInSingleOutNode(const Node& node) { - return node.InputDefs().size() == 1 && - node.ImplicitInputDefs().size() == 0 && - node.OutputDefs().size() == 1; + return node.InputDefs().size() == 1 && node.ImplicitInputDefs().empty() && node.OutputDefs().size() == 1; } const ONNX_NAMESPACE::AttributeProto* 
GetNodeAttribute(const Node& node, const std::string& attr_name) { @@ -353,19 +348,49 @@ const ONNX_NAMESPACE::AttributeProto* GetNodeAttribute(const Node& node, const s return iter == attrs.end() ? nullptr : &iter->second; } -bool RemoveSingleInputNode(Graph& graph, Node& node) { - // Cannot remove a node with multiple output NodeArgs (multiple output edges is fine), neither - // a node whose output is also a graph output. - if (!IsSingleInSingleOutNode(node) || - graph.IsNodeOutputsInGraphOutputs(node)) { +/** Checks for nodes with >= 1 outputs, if only one of the outputs is input to downstream Operators. */ +static bool IsOnlyOneOutputUsed(const Node& node) { + if (node.GetOutputEdgesCount() > 1) { + const int unassigned = -1; + int first_output = unassigned; + for (auto it = node.OutputEdgesBegin(), end = node.OutputEdgesEnd(); it != end; ++it) { + if (first_output == unassigned) { + first_output = it->GetSrcArgIndex(); + } else if (first_output != it->GetSrcArgIndex()) { + return false; + } + } + } + return true; +} + +bool IsOutputUsed(const Node& node, int index) { + for (auto it = node.OutputEdgesBegin(), end = node.OutputEdgesEnd(); it != end; ++it) { + if (it->GetSrcArgIndex() == index) { + return true; + } + } + return false; +} + +bool RemoveNode(Graph& graph, Node& node) { + // Cannot remove a node with implicit inputs, whose output is also a graph output, + // or with more than one of its outputs as input to downstream Operators. + if (!node.ImplicitInputDefs().empty() || + graph.IsNodeOutputsInGraphOutputs(node) || !IsOnlyOneOutputUsed(node)) { return false; } - // If the single input comes from another node (initializers are not connected with edges to nodes). if (node.GetInputEdgesCount() == 1) { + // If there is a single input edge from another node (initializers are not connected with edges to nodes). return RemoveNodeWithSingleNodeIn(graph, node); - } else { + } + if (node.InputDefs().size() == 1) { + // If a single initializer is the only input. return RemoveNodeWithSingleInitializerIn(graph, node); + } else { + // No other node removal is supported, because there will be no way to connect its inputs to its outputs. + return false; } } diff --git a/onnxruntime/core/graph/graph_utils.h b/onnxruntime/core/graph/graph_utils.h index 1f3593b013916..0f84a7575c36f 100644 --- a/onnxruntime/core/graph/graph_utils.h +++ b/onnxruntime/core/graph/graph_utils.h @@ -13,11 +13,11 @@ namespace graph_utils { /** Checks if the operator's type, version, and domain of the given node match the given values. */ bool IsSupportedOptypeVersionAndDomain(const Node& node, const std::string& op_type, - ONNX_NAMESPACE::OperatorSetVersion version, + const std::initializer_list& versions, const std::string& domain = kOnnxDomainAlias); /** Checks if the node has the same operator since version as the given one. */ -bool MatchesOpSinceVersion(const Node& node, ONNX_NAMESPACE::OperatorSetVersion version); +bool MatchesOpSinceVersion(const Node& node, const std::initializer_list& versions); /** Checks if the node has the same op set domain as the given one. */ bool MatchesOpSetDomain(const Node& node, const std::string& domain); @@ -32,6 +32,9 @@ bool IsSupportedProvider(const Node& node, fed to multiple downstream operators, i.e., it can have multiple output edges. */ bool IsSingleInSingleOutNode(const Node& node); +/** Checks if the output at the specified index is input to downstream Nodes. 
*/ +bool IsOutputUsed(const Node& node, int index); + /** Returns true if the graph has the given input.*/ bool IsGraphInput(const Graph& graph, const NodeArg* input); @@ -56,18 +59,22 @@ bool GetRepeatedNodeAttributeValues(const Node& node, if (attr) { values = ONNX_NAMESPACE::RetrieveValues(*attr); return true; - } else { - return false; } + return false; } Status ForAllMutableSubgraphs(Graph& main_graph, std::function func); Status ForAllSubgraphs(const Graph& main_graph, std::function func); -/** Removes the given single-input Node from the Graph. The single input might be either - another node or an initializer, but not an implicit input. The node should have a single - output but can have multiple output edges. */ -bool RemoveSingleInputNode(Graph& graph, Node& node); +/** Removes the given Node from the Graph and keeps Graph consistent by rebuilding needed connections. + We support the removal of the Node as long as the following conditions hold: + - There should be no implicit inputs. + - Only one of the outputs is used by downstream operators (but it can have multiple output edges). + - If the Node has a single incoming node (and possibly multiple initializers), we can remove the Node and + connect its incoming node to its outgoing nodes. + - If the Node has a single initializer as input, we remove the Node and feed the initializer as input to its + output nodes. */ +bool RemoveNode(Graph& graph, Node& node); /** Removes all output edges from the given Node of the Graph. This should probably be elevated to the Graph API eventually. */ diff --git a/onnxruntime/core/graph/model.cc b/onnxruntime/core/graph/model.cc index c3673dd7980a1..46cb19f531a93 100644 --- a/onnxruntime/core/graph/model.cc +++ b/onnxruntime/core/graph/model.cc @@ -42,7 +42,7 @@ Model::Model(const std::string& graph_name, } auto schema_registry = std::make_shared(); - for (auto schema_collection : local_registries) { + for (const auto& schema_collection : local_registries) { schema_registry->RegisterRegistry(schema_collection); } @@ -53,7 +53,7 @@ Model::Model(const std::string& graph_name, p_domain_to_version = &domain_to_version_static; } - for (auto domain : *p_domain_to_version) { + for (const auto& domain : *p_domain_to_version) { const gsl::not_null opset_id_proto{model_proto_->add_opset_import()}; opset_id_proto->set_domain(domain.first); opset_id_proto->set_version(domain.second); @@ -97,7 +97,7 @@ Model::Model(std::unique_ptr model_proto, const IOnnxRuntimeOpSchema auto schema_registry = std::make_shared(); if (local_registries != nullptr) { - for (auto schema_collection : *local_registries) { + for (const auto& schema_collection : *local_registries) { schema_registry->RegisterRegistry(schema_collection); } } @@ -108,7 +108,7 @@ Model::Model(std::unique_ptr model_proto, const IOnnxRuntimeOpSchema } auto domain_map = schema_registry->GetLatestOpsetVersions(false); - for (auto domain : domain_map) { + for (const auto& domain : domain_map) { if (domain_to_version.find(domain.first) == domain_to_version.end()) { domain_to_version[domain.first] = domain.second; const gsl::not_null opset_id_proto{model_proto_->add_opset_import()}; @@ -391,8 +391,7 @@ Status Model::Save(Model& model, int p_fd) { const bool result = model_proto.SerializeToZeroCopyStream(&output) && output.Flush(); if (result) { return Status::OK(); - } else { - return Status(ONNXRUNTIME, INVALID_PROTOBUF, "Protobuf serialization failed."); } + return Status(ONNXRUNTIME, INVALID_PROTOBUF, "Protobuf serialization failed."); } } // namespace 
onnxruntime diff --git a/onnxruntime/core/graph/schema_registry.cc b/onnxruntime/core/graph/schema_registry.cc index 3df920eb74639..f0d4005c1503d 100644 --- a/onnxruntime/core/graph/schema_registry.cc +++ b/onnxruntime/core/graph/schema_registry.cc @@ -179,7 +179,7 @@ DomainToVersionMap SchemaRegistryManager::GetLatestOpsetVersions(bool is_onnx_on auto& onnx_domain_version_map = ONNX_NAMESPACE::OpSchemaRegistry::DomainToVersionRange::Instance().Map(); - for (auto domain : onnx_domain_version_map) { + for (const auto& domain : onnx_domain_version_map) { if (is_onnx_only && domain.first.compare(kOnnxDomain) != 0) continue; auto it = domain_version_map.find(domain.first); diff --git a/onnxruntime/core/mlas/inc/mlas.h b/onnxruntime/core/mlas/inc/mlas.h index 849c258c6bfa8..dfa452b4449fb 100644 --- a/onnxruntime/core/mlas/inc/mlas.h +++ b/onnxruntime/core/mlas/inc/mlas.h @@ -226,6 +226,14 @@ MlasComputeTanh( size_t N ); +void +MLASCALL +MlasComputeErf( + const float* Input, + float* Output, + size_t N + ); + // // Half-precision floating-point routines. // diff --git a/onnxruntime/core/mlas/lib/mlasi.h b/onnxruntime/core/mlas/lib/mlasi.h index 084a27b0c3439..65ce61bd38b3c 100644 --- a/onnxruntime/core/mlas/lib/mlasi.h +++ b/onnxruntime/core/mlas/lib/mlasi.h @@ -174,6 +174,16 @@ void typedef MLAS_TANH_KERNEL_ROUTINE* PMLAS_TANH_KERNEL_ROUTINE; +typedef +void +(MLASCALL MLAS_ERF_KERNEL_ROUTINE)( + const float* Input, + float* Output, + size_t N + ); + +typedef MLAS_ERF_KERNEL_ROUTINE* PMLAS_ERF_KERNEL_ROUTINE; + extern "C" { MLAS_SGEMM_KERNEL_ROUTINE MlasSgemmKernelZero; @@ -203,9 +213,11 @@ extern "C" { MLAS_TANH_KERNEL_ROUTINE MlasLogisticKernel; MLAS_TANH_KERNEL_ROUTINE MlasTanhKernel; + MLAS_ERF_KERNEL_ROUTINE MlasErfKernel; #if defined(MLAS_TARGET_AMD64) MLAS_TANH_KERNEL_ROUTINE MlasLogisticKernelFma3; MLAS_TANH_KERNEL_ROUTINE MlasTanhKernelFma3; + MLAS_ERF_KERNEL_ROUTINE MlasErfKernelFma3; #endif } @@ -269,6 +281,7 @@ struct MLAS_PLATFORM { PMLAS_SGEMM_TRANSPOSE_PACKB_BLOCK_ROUTINE TransposePackB16x4Routine; PMLAS_LOGISTIC_KERNEL_ROUTINE LogisticKernelRoutine; PMLAS_TANH_KERNEL_ROUTINE TanhKernelRoutine; + PMLAS_ERF_KERNEL_ROUTINE ErfKernelRoutine; #endif #if defined(MLAS_USE_WIN32_THREADPOOL) @@ -574,6 +587,75 @@ MlasMinimumFloat32x4(MLAS_FLOAT32X4 Vector1, MLAS_FLOAT32X4 Vector2) #endif } +inline +MLAS_FLOAT32X4 +MlasGreaterThanFloat32x4(MLAS_FLOAT32X4 Vector1, MLAS_FLOAT32X4 Vector2) +{ +#if defined(MLAS_NEON_INTRINSICS) + return vreinterpretq_f32_u32(vcgtq_f32(Vector1, Vector2)); +#elif defined(MLAS_SSE2_INTRINSICS) + return _mm_cmpgt_ps(Vector1, Vector2); +#endif +} + +inline +MLAS_FLOAT32X4 +MlasAndFloat32x4(MLAS_FLOAT32X4 Vector1, MLAS_FLOAT32X4 Vector2) +{ +#if defined(MLAS_NEON_INTRINSICS) + return vreinterpretq_f32_u32(vandq_u32(vreinterpretq_u32_f32(Vector1), vreinterpretq_u32_f32(Vector2))); +#elif defined(MLAS_SSE2_INTRINSICS) + return _mm_and_ps(Vector1, Vector2); +#endif +} + +inline +MLAS_FLOAT32X4 +MlasOrFloat32x4(MLAS_FLOAT32X4 Vector1, MLAS_FLOAT32X4 Vector2) +{ +#if defined(MLAS_NEON_INTRINSICS) + return vreinterpretq_f32_u32(vorrq_u32(vreinterpretq_u32_f32(Vector1), vreinterpretq_u32_f32(Vector2))); +#elif defined(MLAS_SSE2_INTRINSICS) + return _mm_or_ps(Vector1, Vector2); +#endif +} + +inline +MLAS_FLOAT32X4 +MlasAndNotFloat32x4(MLAS_FLOAT32X4 VectorNot, MLAS_FLOAT32X4 Vector) +{ +#if defined(MLAS_NEON_INTRINSICS) + return vreinterpretq_f32_u32(vandq_u32(vmvnq_u32(vreinterpretq_u32_f32(VectorNot)), vreinterpretq_u32_f32(Vector))); +#elif defined(MLAS_SSE2_INTRINSICS) + 
return _mm_andnot_ps(VectorNot, Vector); +#endif +} + +inline +MLAS_FLOAT32X4 +MlasXorFloat32x4(MLAS_FLOAT32X4 Vector1, MLAS_FLOAT32X4 Vector2) +{ +#if defined(MLAS_NEON_INTRINSICS) + return vreinterpretq_f32_u32(veorq_u32(vreinterpretq_u32_f32(Vector1), vreinterpretq_u32_f32(Vector2))); +#elif defined(MLAS_SSE2_INTRINSICS) + return _mm_xor_ps(Vector1, Vector2); +#endif +} + +// calc 2^int(N) +inline +MLAS_FLOAT32X4 +MlasPowerOf2Float32x4(MLAS_FLOAT32X4 Vector) +{ +#if defined(MLAS_NEON_INTRINSICS) + int32x4_t emm0 = vaddq_s32(vcvtq_s32_f32(Vector), vdupq_n_s32(0x7f)); + return vreinterpretq_f32_s32(vshlq_n_s32(emm0, 23)); +#elif defined(MLAS_SSE2_INTRINSICS) + __m128i emm0 = _mm_add_epi32(_mm_cvttps_epi32(Vector), _mm_set1_epi32(0x7f)); + return _mm_castsi128_ps(_mm_slli_epi32(emm0, 23)); +#endif +} + // // Reads a platform specific time stamp counter. // diff --git a/onnxruntime/core/mlas/lib/platform.cpp b/onnxruntime/core/mlas/lib/platform.cpp index 88c3dd45798a9..904d16275d412 100644 --- a/onnxruntime/core/mlas/lib/platform.cpp +++ b/onnxruntime/core/mlas/lib/platform.cpp @@ -90,6 +90,7 @@ Return Value: this->TransposePackB16x4Routine = MlasSgemmTransposePackB16x4Sse; this->LogisticKernelRoutine = MlasLogisticKernel; this->TanhKernelRoutine = MlasTanhKernel; + this->ErfKernelRoutine = MlasErfKernel; #endif // @@ -144,6 +145,7 @@ Return Value: this->LogisticKernelRoutine = MlasLogisticKernelFma3; this->TanhKernelRoutine = MlasTanhKernelFma3; + this->ErfKernelRoutine = MlasErfKernelFma3; } else { diff --git a/onnxruntime/core/optimizer/constant_folding.cc b/onnxruntime/core/optimizer/constant_folding.cc index ac277dfff999e..8e5f90d63ca57 100644 --- a/onnxruntime/core/optimizer/constant_folding.cc +++ b/onnxruntime/core/optimizer/constant_folding.cc @@ -51,19 +51,19 @@ Status ConstantFolding::ApplyImpl(Graph& graph, bool& modified, int graph_level) kernel->Compute(&op_kernel_context); - std::vector fetches; + std::vector fetches; frame.GetOutputs(fetches); // Go over all output node args and substitute them with the newly computed tensors, which will be // added to the graph as initializers. ORT_ENFORCE(fetches.size() == node->OutputDefs().size()); for (size_t fetch_idx = 0; fetch_idx < fetches.size(); ++fetch_idx) { - MLValue& mlvalue = fetches[fetch_idx]; + OrtValue& ort_value = fetches[fetch_idx]; - // Build the TensorProto that corresponds to the computed MLValue and add it as initializer to the graph. + // Build the TensorProto that corresponds to the computed OrtValue and add it as initializer to the graph. ONNX_NAMESPACE::TensorProto out_tensorproto; const auto* constant_arg_out = node->OutputDefs()[fetch_idx]; - BuildTensorProtoForInitializer(mlvalue, *constant_arg_out, out_tensorproto); + BuildTensorProtoForInitializer(ort_value, *constant_arg_out, out_tensorproto); graph.AddInitializedTensor(out_tensorproto); } @@ -81,11 +81,10 @@ Status ConstantFolding::ApplyImpl(Graph& graph, bool& modified, int graph_level) return Status::OK(); } // namespace onnxruntime -void ConstantFolding::BuildTensorProtoForInitializer(const MLValue& mlvalue, - const NodeArg& constant_node_arg, +void ConstantFolding::BuildTensorProtoForInitializer(const OrtValue& ort_value, const NodeArg& constant_node_arg, ONNX_NAMESPACE::TensorProto& tensorproto) const { - ORT_ENFORCE(mlvalue.IsTensor()); - const Tensor& out_tensor = mlvalue.Get(); + ORT_ENFORCE(ort_value.IsTensor()); + const Tensor& out_tensor = ort_value.Get(); // Set name, dimensions, type, and data of the TensorProto. 
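// --- Illustrative sketch, not part of this patch: the scalar equivalent of the
// MlasPowerOf2Float32x4 helper added in the mlasi.h hunk above. For an integer n whose result
// stays in the normal float range, 2^n can be formed by writing (n + 127) into the 8 exponent
// bits of an IEEE-754 single; the SSE2/NEON versions do the same per lane with an integer add
// and shift. `ScalarPowerOf2` is a hypothetical name used only for this example.
#include <cstdint>
#include <cstring>

static float ScalarPowerOf2(int32_t n) {
  // Valid for roughly -126 <= n <= 127; the bias of 127 places n in the exponent field.
  const uint32_t bits = static_cast<uint32_t>(n + 127) << 23;
  float result;
  std::memcpy(&result, &bits, sizeof(result));  // bit-cast without violating aliasing rules
  return result;
}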
tensorproto.set_name(constant_node_arg.Name()); diff --git a/onnxruntime/core/optimizer/constant_folding.h b/onnxruntime/core/optimizer/constant_folding.h index 29371e4c2c6ec..0bf935e9a4cd6 100644 --- a/onnxruntime/core/optimizer/constant_folding.h +++ b/onnxruntime/core/optimizer/constant_folding.h @@ -27,10 +27,9 @@ class ConstantFolding : public GraphTransformer { Status ApplyImpl(Graph& graph, bool& modified, int graph_level) const override; - /** Create a TensorProto that has the same value as the given MLValue + /** Create a TensorProto that has the same value as the given OrtValue and the same type and dimensions as the given NodeArg. */ - void BuildTensorProtoForInitializer(const MLValue& mlvalue, - const NodeArg& constant_node_arg, + void BuildTensorProtoForInitializer(const OrtValue& ort_value, const NodeArg& constant_node_arg, ONNX_NAMESPACE::TensorProto& tensorproto) const; }; diff --git a/onnxruntime/core/optimizer/conv_activation_fusion.cc b/onnxruntime/core/optimizer/conv_activation_fusion.cc index 90a14d5971067..c0c8e1ea45541 100644 --- a/onnxruntime/core/optimizer/conv_activation_fusion.cc +++ b/onnxruntime/core/optimizer/conv_activation_fusion.cc @@ -12,10 +12,10 @@ namespace onnxruntime { namespace { bool IsFusableActivation(const Node& node) { - return graph_utils::IsSupportedOptypeVersionAndDomain(node, "LeakyRelu", 6) || - graph_utils::IsSupportedOptypeVersionAndDomain(node, "Relu", 6) || - graph_utils::IsSupportedOptypeVersionAndDomain(node, "Sigmoid", 6) || - graph_utils::IsSupportedOptypeVersionAndDomain(node, "Tanh", 6); + return graph_utils::IsSupportedOptypeVersionAndDomain(node, "LeakyRelu", {6}) || + graph_utils::IsSupportedOptypeVersionAndDomain(node, "Relu", {6}) || + graph_utils::IsSupportedOptypeVersionAndDomain(node, "Sigmoid", {6}) || + graph_utils::IsSupportedOptypeVersionAndDomain(node, "Tanh", {6}); } void HandleActivationNodeEdges(Graph& g, const Node& act, Node& fused_conv) { @@ -46,7 +46,7 @@ Status ConvActivationFusion::ApplyImpl(Graph& graph, bool& modified, int graph_l auto node = graph.GetNode(index); ORT_RETURN_IF_ERROR(Recurse(*node, modified, graph_level)); - if (!graph_utils::IsSupportedOptypeVersionAndDomain(*node, "Conv", 1) || + if (!graph_utils::IsSupportedOptypeVersionAndDomain(*node, "Conv", {1}) || !graph_utils::IsSupportedProvider(*node, GetCompatibleExecutionProviders()) || node->GetOutputEdgesCount() != 1) { continue; diff --git a/onnxruntime/core/optimizer/conv_add_fusion.cc b/onnxruntime/core/optimizer/conv_add_fusion.cc index 6bfa4af65f1e9..59a8c010138a2 100644 --- a/onnxruntime/core/optimizer/conv_add_fusion.cc +++ b/onnxruntime/core/optimizer/conv_add_fusion.cc @@ -9,136 +9,114 @@ using namespace ONNX_NAMESPACE; using namespace ::onnxruntime::common; namespace onnxruntime { -Status ConvAddFusion::ApplyImpl(onnxruntime::Graph& graph, bool& modified, int graph_level) const { - std::vector removed_nodes; - for (auto& node : graph.Nodes()) { - ORT_RETURN_IF_ERROR(Recurse(node, modified, graph_level)); - - if (!graph_utils::IsSupportedOptypeVersionAndDomain(node, "Conv", 1) || - !graph_utils::IsSupportedProvider(node, GetCompatibleExecutionProviders()) || - node.GetOutputEdgesCount() != 1) { - continue; - } +Status ConvAddFusion::Apply(Graph& graph, Node& node, RewriteRuleEffect& modified) { + auto& conv_node = node; + const auto& add_node = *conv_node.OutputNodesBegin(); + const auto& conv_inputs = conv_node.InputDefs(); + const auto& add_inputs = add_node.InputDefs(); + + const ONNX_NAMESPACE::TensorProto* conv_W_tensor_proto = 
nullptr; + graph.GetInitializedTensor(conv_inputs[1]->Name(), conv_W_tensor_proto); + + const ONNX_NAMESPACE::TensorProto* add_B_tensor_proto = nullptr; + graph.GetInitializedTensor(add_inputs[1]->Name(), add_B_tensor_proto); + + // Currently, fusion is only supported for float or double data type. + if (!Initializer::IsSupportedDataType(add_B_tensor_proto) || + conv_W_tensor_proto->dims_size() < 4) { + return Status::OK(); + } - const Node& next_node = *node.OutputNodesBegin(); - if (!graph_utils::IsSupportedOptypeVersionAndDomain(next_node, "Add", 7) || - next_node.GetExecutionProviderType() != node.GetExecutionProviderType() || - next_node.GetInputEdgesCount() != 1 || - graph.IsNodeOutputsInGraphOutputs(next_node)) { - continue; + int axis; + if (add_B_tensor_proto->dims_size() == conv_W_tensor_proto->dims_size()) { + // Test for broadcast add such as 1xCx1x1 for a 2D convolution. + axis = 1; + } else if (add_B_tensor_proto->dims_size() == conv_W_tensor_proto->dims_size() - 1) { + // Test for broadcast add such as Cx1x1 for a 2D convolution. + axis = 0; + } else { + return Status::OK(); + } + if (add_B_tensor_proto->dims(axis) != conv_W_tensor_proto->dims(0)) { + return Status::OK(); + } + // The dimensions of add_B should be equal to 1 except axis dimension. + for (int i = 0; i < add_B_tensor_proto->dims_size(); i++) { + if (i != axis && add_B_tensor_proto->dims(i) != 1) { + return Status::OK(); } + } - auto& conv_node = node; - const Node& add_node = next_node; - - const auto& conv_inputs = conv_node.InputDefs(); - const auto& add_inputs = add_node.InputDefs(); + const ONNX_NAMESPACE::TensorProto* conv_B_tensor_proto = nullptr; + if (conv_inputs.size() == 3) { + graph.GetInitializedTensor(conv_inputs[2]->Name(), conv_B_tensor_proto); - const ONNX_NAMESPACE::TensorProto* conv_W_tensor_proto = nullptr; - graph.GetInitializedTensor(conv_inputs[1]->Name(), conv_W_tensor_proto); + if (!Initializer::IsSupportedDataType(conv_B_tensor_proto) || + conv_B_tensor_proto->data_type() != add_B_tensor_proto->data_type() || + conv_B_tensor_proto->dims_size() != 1 || + conv_B_tensor_proto->dims(0) != conv_W_tensor_proto->dims(0)) { + return Status::OK(); + } - const ONNX_NAMESPACE::TensorProto* add_B_tensor_proto = nullptr; - graph.GetInitializedTensor(add_inputs[1]->Name(), add_B_tensor_proto); + auto conv_B = std::make_unique(conv_B_tensor_proto); + auto add_B = std::make_unique(add_B_tensor_proto); - // Currently, fusion is only supported for float or double data type. - if (!Initializer::IsSupportedDataType(add_B_tensor_proto) || - conv_W_tensor_proto->dims_size() < 4 || - add_B_tensor_proto->dims_size() != conv_W_tensor_proto->dims_size() - 1 || - conv_W_tensor_proto->dims(0) != add_B_tensor_proto->dims(0)) { - continue; + if (conv_B->size() != add_B->size()) { + return Status::OK(); } - - // The dimensions of add_B should be equal to 1 except first dimension. 
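// --- Illustrative sketch, not part of this patch: the broadcast test used in the axis logic
// above, pulled out as a standalone helper. The Add bias can be folded into the Conv bias only
// if it is shaped like {C,1,1} or {1,C,1,1}, where C equals the number of output channels
// W.dims(0); every dimension other than the channel axis must be 1. `CanFoldBiasIntoConv` is a
// hypothetical name used only for this example.
#include <cstddef>
#include <cstdint>
#include <vector>

static bool CanFoldBiasIntoConv(const std::vector<int64_t>& bias_dims,
                                const std::vector<int64_t>& weight_dims) {
  if (weight_dims.size() < 4) return false;               // 2D (or higher) convolution weights
  std::size_t axis = 0;
  if (bias_dims.size() == weight_dims.size()) {
    axis = 1;                                             // e.g. {1, C, 1, 1}
  } else if (bias_dims.size() + 1 == weight_dims.size()) {
    axis = 0;                                             // e.g. {C, 1, 1}
  } else {
    return false;
  }
  if (bias_dims[axis] != weight_dims[0]) return false;    // channel count must match
  for (std::size_t i = 0; i < bias_dims.size(); ++i) {
    if (i != axis && bias_dims[i] != 1) return false;     // all other dimensions must be 1
  }
  return true;
}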
- bool flag = false; - for (int i = 1; i < add_B_tensor_proto->dims_size(); i++) { - if (add_B_tensor_proto->dims(i) != 1) { - flag = true; - break; - } + // Calculate new value of initializers of conv node + conv_B->add(*add_B); + + // Create new initializers of conv + ONNX_NAMESPACE::TensorProto new_conv_B_tensor_proto; + conv_B->ToProto(&new_conv_B_tensor_proto); + + // Replace initializers of conv node + graph.RemoveInitializedTensor(conv_inputs[2]->Name()); + graph.AddInitializedTensor(new_conv_B_tensor_proto); + } else { + NodeArg* add_B_node_arg = graph.GetNodeArg(add_B_tensor_proto->name()); + if (add_B_node_arg == nullptr) { + return Status::OK(); } - if (flag) { - continue; - } + // Update shape of tensor proto + ONNX_NAMESPACE::TensorProto new_conv_B_tensor_proto(*add_B_tensor_proto); + int64_t dim = conv_W_tensor_proto->dims(0); + new_conv_B_tensor_proto.clear_dims(); + new_conv_B_tensor_proto.add_dims(dim); - const ONNX_NAMESPACE::TensorProto* conv_B_tensor_proto = nullptr; - if (conv_inputs.size() == 3) { - graph.GetInitializedTensor(conv_inputs[2]->Name(), conv_B_tensor_proto); - - if (!Initializer::IsSupportedDataType(conv_B_tensor_proto) || - conv_B_tensor_proto->data_type() != add_B_tensor_proto->data_type() || - conv_B_tensor_proto->dims_size() != 1 || - conv_B_tensor_proto->dims(0) != add_B_tensor_proto->dims(0)) { - continue; - } - - auto conv_B = std::make_unique(conv_B_tensor_proto); - auto add_B = std::make_unique(add_B_tensor_proto); - - if (conv_B->size() != add_B->size()) { - continue; - } - // Calculate new value of initializers of conv node - conv_B->add(*add_B); - - // Create new initializers of conv - ONNX_NAMESPACE::TensorProto new_conv_B_tensor_proto; - conv_B->ToProto(&new_conv_B_tensor_proto); - - // Replace initializers of conv node - graph.RemoveInitializedTensor(conv_inputs[2]->Name()); - graph.AddInitializedTensor(new_conv_B_tensor_proto); - } else { - NodeArg* add_B_node_arg = graph.GetNodeArg(add_B_tensor_proto->name()); - if (add_B_node_arg == nullptr) { - continue; - } - - // Update shape of tensor proto - ONNX_NAMESPACE::TensorProto new_conv_B_tensor_proto(*add_B_tensor_proto); - int64_t dim = conv_W_tensor_proto->dims(0); - new_conv_B_tensor_proto.clear_dims(); - new_conv_B_tensor_proto.add_dims(dim); - - graph.RemoveInitializedTensor(add_B_tensor_proto->name()); - graph.AddInitializedTensor(new_conv_B_tensor_proto); - - // Update shape of NodeArg - TensorShapeProto shape; - shape.add_dim()->set_dim_value(dim); - add_B_node_arg->SetShape(shape); - - conv_node.MutableInputDefs().push_back(add_B_node_arg); - conv_node.MutableInputArgsCount()[2] = 1; - } + graph.RemoveInitializedTensor(add_B_tensor_proto->name()); + graph.AddInitializedTensor(new_conv_B_tensor_proto); - // Replace the input of the node following add node - const NodeArg* add_output_def = add_node.OutputDefs()[0]; - NodeArg* conv_output_def = conv_node.MutableOutputDefs()[0]; - for (auto it = add_node.OutputNodesBegin(); it != add_node.OutputNodesEnd(); ++it) { - auto output_node = graph.GetNode((*it).Index()); - if (!output_node) { - return Status(ONNXRUNTIME, INVALID_ARGUMENT); - } - auto& input_defs = output_node->MutableInputDefs(); - for (auto& def : input_defs) { - if (def == add_output_def) { - def = conv_output_def; - } - } - } + // Update shape of NodeArg + TensorShapeProto shape; + shape.add_dim()->set_dim_value(dim); + add_B_node_arg->SetShape(shape); - removed_nodes.push_back(add_node.Index()); + conv_node.MutableInputDefs().push_back(add_B_node_arg); + 
conv_node.MutableInputArgsCount()[2] = 1; } - for (auto i : removed_nodes) { - graph.RemoveNode(i); + // Remove Add node. + auto* add_node_to_remove = graph.GetNode(add_node.Index()); + if (graph_utils::RemoveNode(graph, *add_node_to_remove)) { + modified = RewriteRuleEffect::kModifiedRestOfGraph; } - if (!removed_nodes.empty()) { - modified = true; + return Status::OK(); +} + +bool ConvAddFusion::SatisfyCondition(const Graph& graph, const Node& node) { + if (!graph_utils::IsSupportedOptypeVersionAndDomain(node, "Conv", {1}) || + node.GetOutputEdgesCount() != 1) { + return false; } - return Status::OK(); + const auto& next_node = *node.OutputNodesBegin(); + return !(!graph_utils::IsSupportedOptypeVersionAndDomain(next_node, "Add", {7}) || + next_node.GetExecutionProviderType() != node.GetExecutionProviderType() || + next_node.GetInputEdgesCount() != 1 || graph.IsNodeOutputsInGraphOutputs(next_node)); } + } // namespace onnxruntime diff --git a/onnxruntime/core/optimizer/conv_add_fusion.h b/onnxruntime/core/optimizer/conv_add_fusion.h index 717e013cac20e..3fe4e92b5abcf 100644 --- a/onnxruntime/core/optimizer/conv_add_fusion.h +++ b/onnxruntime/core/optimizer/conv_add_fusion.h @@ -3,16 +3,29 @@ #pragma once -#include "core/optimizer/graph_transformer.h" +#include "core/optimizer/rewrite_rule.h" namespace onnxruntime { -class ConvAddFusion : public onnxruntime::GraphTransformer { +/** +@Class ConvAddFusion + +Rewrite rule that fuses two Conv+Add nodes to a single Conv node. + +It is attempted to be triggered only on nodes with op type "Conv". +*/ +class ConvAddFusion : public RewriteRule { public: - ConvAddFusion() noexcept : onnxruntime::GraphTransformer("ConvAddFusion") {} + ConvAddFusion() noexcept : RewriteRule("ConvAddFusion") {} + + std::vector TargetOpTypes() const noexcept override { + return {"Conv"}; + } private: - Status ApplyImpl(Graph& graph, bool& modified, int graph_level) const override; + bool SatisfyCondition(const Graph& graph, const Node& node) override; + + Status Apply(Graph& graph, Node& node, RewriteRuleEffect& rule_effect) override; }; } // namespace onnxruntime diff --git a/onnxruntime/core/optimizer/conv_bn_fusion.cc b/onnxruntime/core/optimizer/conv_bn_fusion.cc index a64706ec9f56c..f13bf64eafa6e 100644 --- a/onnxruntime/core/optimizer/conv_bn_fusion.cc +++ b/onnxruntime/core/optimizer/conv_bn_fusion.cc @@ -9,184 +9,149 @@ using namespace ONNX_NAMESPACE; using namespace ::onnxruntime::common; namespace onnxruntime { -Status ConvBNFusion::ApplyImpl(Graph& graph, bool& modified, int graph_level) const { - std::vector removed_nodes; - for (auto& node : graph.Nodes()) { - ORT_RETURN_IF_ERROR(Recurse(node, modified, graph_level)); - - if (!graph_utils::IsSupportedOptypeVersionAndDomain(node, "Conv", 1) || - !graph_utils::IsSupportedProvider(node, GetCompatibleExecutionProviders()) || - node.GetOutputEdgesCount() != 1) { - continue; - } - - const Node& next_node = *node.OutputNodesBegin(); - if (!graph_utils::IsSupportedOptypeVersionAndDomain(next_node, "BatchNormalization", 7) || - next_node.GetInputEdgesCount() != 1 || - graph.IsNodeOutputsInGraphOutputs(next_node) || - next_node.GetExecutionProviderType() != node.GetExecutionProviderType()) { - continue; - } - - auto& conv_node = node; - const Node& bn_node = next_node; - - // Get value of attribute group - const onnxruntime::NodeAttributes& conv_attributes = conv_node.GetAttributes(); - const ONNX_NAMESPACE::AttributeProto* group_attr = &(conv_attributes.find("group")->second); - if (group_attr != nullptr && - 
group_attr->type() == AttributeProto_AttributeType_INT && - group_attr->has_i() && group_attr->i() != 1) { - continue; - } +Status ConvBNFusion::Apply(Graph& graph, Node& node, RewriteRuleEffect& rule_effect) { + auto& conv_node = node; + const Node& bn_node = *conv_node.OutputNodesBegin(); + + // Get value of attribute epsilon + const onnxruntime::NodeAttributes& attributes = bn_node.GetAttributes(); + const ONNX_NAMESPACE::AttributeProto* attr = &(attributes.find("epsilon")->second); + if (attr == nullptr || attr->type() != AttributeProto_AttributeType_FLOAT) { + return Status::OK(); + } + float epsilon = static_cast(attr->f()); + + // Get initializers of BatchNormalization + const auto& bn_inputs = bn_node.InputDefs(); + const ONNX_NAMESPACE::TensorProto* bn_scale_tensor_proto = nullptr; + graph.GetInitializedTensor(bn_inputs[1]->Name(), bn_scale_tensor_proto); + + const ONNX_NAMESPACE::TensorProto* bn_B_tensor_proto = nullptr; + graph.GetInitializedTensor(bn_inputs[2]->Name(), bn_B_tensor_proto); + + const ONNX_NAMESPACE::TensorProto* bn_mean_tensor_proto = nullptr; + graph.GetInitializedTensor(bn_inputs[3]->Name(), bn_mean_tensor_proto); + + const ONNX_NAMESPACE::TensorProto* bn_var_tensor_proto = nullptr; + graph.GetInitializedTensor(bn_inputs[4]->Name(), bn_var_tensor_proto); + + const auto& conv_inputs = conv_node.InputDefs(); + const ONNX_NAMESPACE::TensorProto* conv_W_tensor_proto = nullptr; + graph.GetInitializedTensor(conv_inputs[1]->Name(), conv_W_tensor_proto); + + // Currently, fusion is only supported for float or double data type. + if (!Initializer::IsSupportedDataType(bn_scale_tensor_proto) || + !Initializer::IsSupportedDataType(bn_B_tensor_proto) || + !Initializer::IsSupportedDataType(bn_mean_tensor_proto) || + !Initializer::IsSupportedDataType(bn_var_tensor_proto) || + !Initializer::IsSupportedDataType(conv_W_tensor_proto) || + bn_scale_tensor_proto->dims_size() != 1 || + bn_B_tensor_proto->dims_size() != 1 || + bn_mean_tensor_proto->dims_size() != 1 || + bn_var_tensor_proto->dims_size() != 1 || + bn_scale_tensor_proto->dims(0) != bn_B_tensor_proto->dims(0) || + bn_B_tensor_proto->dims(0) != bn_mean_tensor_proto->dims(0) || + bn_mean_tensor_proto->dims(0) != bn_var_tensor_proto->dims(0) || + bn_scale_tensor_proto->data_type() != bn_B_tensor_proto->data_type() || + bn_B_tensor_proto->data_type() != bn_mean_tensor_proto->data_type() || + bn_mean_tensor_proto->data_type() != bn_var_tensor_proto->data_type() || + conv_W_tensor_proto->data_type() != bn_scale_tensor_proto->data_type() || + !(conv_W_tensor_proto->dims_size() > 2 && conv_W_tensor_proto->dims(0) == bn_scale_tensor_proto->dims(0))) { + return Status::OK(); + } - // Get value of attribute epsilon - const onnxruntime::NodeAttributes& attributes = bn_node.GetAttributes(); - const ONNX_NAMESPACE::AttributeProto* attr = &(attributes.find("epsilon")->second); - if (attr == nullptr || attr->type() != AttributeProto_AttributeType_FLOAT) { - continue; - } - float epsilon = static_cast(attr->f()); - - // Get initializers of BatchNormalization - const auto& bn_inputs = bn_node.InputDefs(); - const ONNX_NAMESPACE::TensorProto* bn_scale_tensor_proto = nullptr; - graph.GetInitializedTensor(bn_inputs[1]->Name(), bn_scale_tensor_proto); - - const ONNX_NAMESPACE::TensorProto* bn_B_tensor_proto = nullptr; - graph.GetInitializedTensor(bn_inputs[2]->Name(), bn_B_tensor_proto); - - const ONNX_NAMESPACE::TensorProto* bn_mean_tensor_proto = nullptr; - graph.GetInitializedTensor(bn_inputs[3]->Name(), bn_mean_tensor_proto); - - const 
ONNX_NAMESPACE::TensorProto* bn_var_tensor_proto = nullptr; - graph.GetInitializedTensor(bn_inputs[4]->Name(), bn_var_tensor_proto); - - const auto& conv_inputs = conv_node.InputDefs(); - const ONNX_NAMESPACE::TensorProto* conv_W_tensor_proto = nullptr; - graph.GetInitializedTensor(conv_inputs[1]->Name(), conv_W_tensor_proto); - - // Currently, fusion is only supported for float or double data type. - if (!Initializer::IsSupportedDataType(bn_scale_tensor_proto) || - !Initializer::IsSupportedDataType(bn_B_tensor_proto) || - !Initializer::IsSupportedDataType(bn_mean_tensor_proto) || - !Initializer::IsSupportedDataType(bn_var_tensor_proto) || - !Initializer::IsSupportedDataType(conv_W_tensor_proto) || - bn_scale_tensor_proto->dims_size() != 1 || - bn_B_tensor_proto->dims_size() != 1 || - bn_mean_tensor_proto->dims_size() != 1 || - bn_var_tensor_proto->dims_size() != 1 || - bn_scale_tensor_proto->dims(0) != bn_B_tensor_proto->dims(0) || - bn_B_tensor_proto->dims(0) != bn_mean_tensor_proto->dims(0) || - bn_mean_tensor_proto->dims(0) != bn_var_tensor_proto->dims(0) || - bn_scale_tensor_proto->data_type() != bn_B_tensor_proto->data_type() || - bn_B_tensor_proto->data_type() != bn_mean_tensor_proto->data_type() || - bn_mean_tensor_proto->data_type() != bn_var_tensor_proto->data_type() || - conv_W_tensor_proto->data_type() != bn_scale_tensor_proto->data_type() || - !(conv_W_tensor_proto->dims_size() > 2 && conv_W_tensor_proto->dims(0) == bn_scale_tensor_proto->dims(0))) { - continue; + auto bn_scale = std::make_unique(bn_scale_tensor_proto); + auto bn_B = std::make_unique(bn_B_tensor_proto); + auto bn_mean = std::make_unique(bn_mean_tensor_proto); + auto bn_var = std::make_unique(bn_var_tensor_proto); + auto conv_W = std::make_unique(conv_W_tensor_proto); + + const ONNX_NAMESPACE::TensorProto* conv_B_tensor_proto = nullptr; + std::unique_ptr conv_B = nullptr; + if (conv_inputs.size() == 3) { + if (!graph.GetInitializedTensor(conv_inputs[2]->Name(), conv_B_tensor_proto)) { + return Status::OK(); } - auto bn_scale = std::make_unique(bn_scale_tensor_proto); - auto bn_B = std::make_unique(bn_B_tensor_proto); - auto bn_mean = std::make_unique(bn_mean_tensor_proto); - auto bn_var = std::make_unique(bn_var_tensor_proto); - auto conv_W = std::make_unique(conv_W_tensor_proto); - - const ONNX_NAMESPACE::TensorProto* conv_B_tensor_proto = nullptr; - std::unique_ptr conv_B = nullptr; - if (conv_inputs.size() == 3) { - if (!graph.GetInitializedTensor(conv_inputs[2]->Name(), conv_B_tensor_proto)) - continue; - - if (!Initializer::IsSupportedDataType(conv_B_tensor_proto) || - conv_B_tensor_proto->dims_size() != 1 || - conv_B_tensor_proto->dims(0) != bn_B_tensor_proto->dims(0) || - conv_B_tensor_proto->data_type() != bn_B_tensor_proto->data_type()) { - continue; - } - conv_B = std::make_unique(conv_B_tensor_proto); + if (!Initializer::IsSupportedDataType(conv_B_tensor_proto) || + conv_B_tensor_proto->dims_size() != 1 || + conv_B_tensor_proto->dims(0) != bn_B_tensor_proto->dims(0) || + conv_B_tensor_proto->data_type() != bn_B_tensor_proto->data_type()) { + return Status::OK(); } + conv_B = std::make_unique(conv_B_tensor_proto); + } - // Calculate new value of initializers of conv node - bn_var->add(epsilon); - bn_var->sqrt(); - bn_scale->div(*bn_var); - conv_W->scale_by_axis(*bn_scale, 1); - - if (conv_inputs.size() == 3) { - conv_B->sub(*bn_mean); - conv_B->mul(*bn_scale); - conv_B->add(*bn_B); - } else { - bn_mean->mul(*bn_scale); - bn_B->sub(*bn_mean); - } + // Calculate new value of initializers of conv node + 
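// --- Illustrative sketch, not part of this patch: the per-channel arithmetic that
// ConvBNFusion::Apply performs right below, written out over plain arrays. With
// s[c] = scale[c] / sqrt(var[c] + eps), the folded Conv uses W'[c] = W[c] * s[c] and
// B'[c] = (B[c] - mean[c]) * s[c] + beta[c] (B treated as zero when the Conv had no bias).
// The function name and the flat per-channel weight layout are assumptions for this example.
#include <cmath>
#include <cstddef>
#include <vector>

static void FoldBatchNormIntoConv(std::vector<float>& W,      // C blocks of per-channel weights
                                  std::vector<float>& B,      // size C (zeros if Conv had no bias)
                                  const std::vector<float>& scale,
                                  const std::vector<float>& beta,
                                  const std::vector<float>& mean,
                                  const std::vector<float>& var,
                                  float epsilon) {
  const std::size_t C = B.size();                             // assumes C > 0 and W.size() % C == 0
  const std::size_t per_channel = W.size() / C;
  for (std::size_t c = 0; c < C; ++c) {
    const float s = scale[c] / std::sqrt(var[c] + epsilon);
    for (std::size_t k = 0; k < per_channel; ++k) {
      W[c * per_channel + k] *= s;                            // scale every weight of channel c
    }
    B[c] = (B[c] - mean[c]) * s + beta[c];                    // fold mean/shift into the bias
  }
}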
bn_var->add(epsilon); + bn_var->sqrt(); + bn_scale->div(*bn_var); + conv_W->scale_by_axis(*bn_scale, 1); + + if (conv_inputs.size() == 3) { + conv_B->sub(*bn_mean); + conv_B->mul(*bn_scale); + conv_B->add(*bn_B); + } else { + bn_mean->mul(*bn_scale); + bn_B->sub(*bn_mean); + } - // Create new initializers of conv - ONNX_NAMESPACE::TensorProto new_conv_W_tensor_proto(*conv_W_tensor_proto); - conv_W->ToProto(&new_conv_W_tensor_proto); - - ONNX_NAMESPACE::TensorProto new_conv_B_tensor_proto; - NodeArg* bn_B_node_arg = nullptr; - if (conv_inputs.size() == 3) { - conv_B->ToProto(&new_conv_B_tensor_proto); - } else { - bn_B->ToProto(&new_conv_B_tensor_proto); - bn_B_node_arg = graph.GetNodeArg(bn_B_tensor_proto->name()); - if (bn_B_node_arg == nullptr) { - continue; - } + // Create new initializers of conv + ONNX_NAMESPACE::TensorProto new_conv_W_tensor_proto(*conv_W_tensor_proto); + conv_W->ToProto(&new_conv_W_tensor_proto); + + ONNX_NAMESPACE::TensorProto new_conv_B_tensor_proto; + NodeArg* bn_B_node_arg = nullptr; + if (conv_inputs.size() == 3) { + conv_B->ToProto(&new_conv_B_tensor_proto); + } else { + bn_B->ToProto(&new_conv_B_tensor_proto); + bn_B_node_arg = graph.GetNodeArg(bn_B_tensor_proto->name()); + if (bn_B_node_arg == nullptr) { + return Status::OK(); } + } - // Replace initializers of conv node - graph.RemoveInitializedTensor(conv_W_tensor_proto->name()); - if (conv_inputs.size() == 3) { + // Replace initializers of conv node + graph.RemoveInitializedTensor(conv_W_tensor_proto->name()); + if (conv_inputs.size() == 3) { #ifdef _MSC_VER #pragma warning(push) #pragma warning(disable : 6011) // Not deferencing null pointer. conv_B_tensor_proto is set on line 93 #endif - graph.RemoveInitializedTensor(conv_B_tensor_proto->name()); + graph.RemoveInitializedTensor(conv_B_tensor_proto->name()); #ifdef _MSC_VER #pragma warning(pop) #endif - } else { - graph.RemoveInitializedTensor(bn_B_tensor_proto->name()); - conv_node.MutableInputDefs().push_back(bn_B_node_arg); - conv_node.MutableInputArgsCount()[2] = 1; - } - graph.AddInitializedTensor(new_conv_W_tensor_proto); - graph.AddInitializedTensor(new_conv_B_tensor_proto); - - // Replace the input of the nodes following batch normalization node - const NodeArg* bn_output_def = bn_node.OutputDefs()[0]; - NodeArg* conv_output_def = conv_node.MutableOutputDefs()[0]; - for (auto it = bn_node.OutputNodesBegin(); it != bn_node.OutputNodesEnd(); ++it) { - auto output_node = graph.GetNode((*it).Index()); - if (!output_node) { - return Status(ONNXRUNTIME, INVALID_ARGUMENT); - } - - auto& input_defs = output_node->MutableInputDefs(); - for (auto& def : input_defs) { - if (def == bn_output_def) { - def = conv_output_def; - } - } - } - removed_nodes.push_back(bn_node.Index()); + } else { + graph.RemoveInitializedTensor(bn_B_tensor_proto->name()); + conv_node.MutableInputDefs().push_back(bn_B_node_arg); + conv_node.MutableInputArgsCount()[2] = 1; } + graph.AddInitializedTensor(new_conv_W_tensor_proto); + graph.AddInitializedTensor(new_conv_B_tensor_proto); - for (auto i : removed_nodes) { - graph.RemoveNode(i); + // Remove BN node. 
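// --- Illustrative sketch, not part of this patch: the conditions that graph_utils::RemoveNode
// (called just below to drop the BatchNormalization node) is documented to require before it
// removes a node and rewires its consumers. The helper only restates the conditions described
// in the graph_utils.h hunk above; its name and parameters are hypothetical.
static bool NodeIsRemovable(bool has_implicit_inputs,
                            bool feeds_graph_output,
                            int distinct_output_slots_used,
                            int incoming_node_edges,
                            int explicit_input_defs) {
  if (has_implicit_inputs || feeds_graph_output) return false;  // cannot be rewired safely
  if (distinct_output_slots_used > 1) return false;             // only one output may feed consumers
  if (incoming_node_edges == 1) return true;                    // a single upstream node takes over
  return explicit_input_defs == 1;                              // or a single initializer is fed through
}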
+ auto* bn_node_to_remove = graph.GetNode(bn_node.Index()); + if (graph_utils::RemoveNode(graph, *bn_node_to_remove)) { + rule_effect = RewriteRuleEffect::kModifiedRestOfGraph; } - if (!removed_nodes.empty()) { - modified = true; + return Status::OK(); +} + +bool ConvBNFusion::SatisfyCondition(const Graph& graph, const Node& node) { + if (!graph_utils::IsSupportedOptypeVersionAndDomain(node, "Conv", {1}) || + node.GetOutputEdgesCount() != 1) { + return false; } - return Status::OK(); + const auto& next_node = *node.OutputNodesBegin(); + return !(!graph_utils::IsSupportedOptypeVersionAndDomain(next_node, "BatchNormalization", {7, 9}) || + next_node.GetInputEdgesCount() != 1 || graph.IsNodeOutputsInGraphOutputs(next_node) || + next_node.GetExecutionProviderType() != node.GetExecutionProviderType()); } } // namespace onnxruntime diff --git a/onnxruntime/core/optimizer/conv_bn_fusion.h b/onnxruntime/core/optimizer/conv_bn_fusion.h index b605956f1cbde..e23095bfdf49c 100644 --- a/onnxruntime/core/optimizer/conv_bn_fusion.h +++ b/onnxruntime/core/optimizer/conv_bn_fusion.h @@ -3,15 +3,29 @@ #pragma once -#include "core/optimizer/graph_transformer.h" +#include "core/optimizer/rewrite_rule.h" namespace onnxruntime { -class ConvBNFusion : public onnxruntime::GraphTransformer { +/** +@Class ConvBNFusion + +Rewrite rule that fuses two Conv+BN nodes to a single Conv node. + +It is attempted to be triggered only on nodes with op type "Conv". +*/ +class ConvBNFusion : public RewriteRule { public: - ConvBNFusion() noexcept : onnxruntime::GraphTransformer("ConvBNFusion") {} + ConvBNFusion() noexcept : RewriteRule("ConvBNFusion") {} + + std::vector TargetOpTypes() const noexcept override { + return {"Conv"}; + } private: - Status ApplyImpl(onnxruntime::Graph& graph, bool& modified, int graph_level) const override; + bool SatisfyCondition(const Graph& graph, const Node& node) override; + + Status Apply(Graph& graph, Node& node, RewriteRuleEffect& rule_effect) override; }; + } // namespace onnxruntime diff --git a/onnxruntime/core/optimizer/conv_mul_fusion.cc b/onnxruntime/core/optimizer/conv_mul_fusion.cc index dc67fd90e8fe3..dd27f0357ff39 100644 --- a/onnxruntime/core/optimizer/conv_mul_fusion.cc +++ b/onnxruntime/core/optimizer/conv_mul_fusion.cc @@ -9,135 +9,112 @@ using namespace ONNX_NAMESPACE; using namespace ::onnxruntime::common; namespace onnxruntime { -Status ConvMulFusion::ApplyImpl(Graph& graph, bool& modified, int graph_level) const { - std::vector removed_nodes; - for (auto& node : graph.Nodes()) { - ORT_RETURN_IF_ERROR(Recurse(node, modified, graph_level)); - - if (!graph_utils::IsSupportedOptypeVersionAndDomain(node, "Conv", 1) || - !graph_utils::IsSupportedProvider(node, GetCompatibleExecutionProviders()) || - node.GetOutputEdgesCount() != 1) { - continue; - } - - const Node& next_node = *node.OutputNodesBegin(); - if (!graph_utils::IsSupportedOptypeVersionAndDomain(next_node, "Mul", 7) || - next_node.GetInputEdgesCount() != 1 || - graph.IsNodeOutputsInGraphOutputs(next_node) || - next_node.GetExecutionProviderType() != node.GetExecutionProviderType()) { - continue; - } - - auto& conv_node = node; - const Node& mul_node = next_node; - - const auto& conv_inputs = conv_node.InputDefs(); - const auto& mul_inputs = mul_node.InputDefs(); - - const ONNX_NAMESPACE::TensorProto* conv_W_tensor_proto = nullptr; - graph.GetInitializedTensor(conv_inputs[1]->Name(), conv_W_tensor_proto); - - const ONNX_NAMESPACE::TensorProto* mul_B_tensor_proto = nullptr; - 
graph.GetInitializedTensor(mul_inputs[1]->Name(), mul_B_tensor_proto); +Status ConvMulFusion::Apply(Graph& graph, Node& node, RewriteRuleEffect& rule_effect) { + auto& conv_node = node; + const auto& mul_node = *conv_node.OutputNodesBegin(); + const auto& conv_inputs = conv_node.InputDefs(); + const auto& mul_inputs = mul_node.InputDefs(); + + const ONNX_NAMESPACE::TensorProto* conv_W_tensor_proto = nullptr; + graph.GetInitializedTensor(conv_inputs[1]->Name(), conv_W_tensor_proto); + + const ONNX_NAMESPACE::TensorProto* mul_B_tensor_proto = nullptr; + graph.GetInitializedTensor(mul_inputs[1]->Name(), mul_B_tensor_proto); + + if (!Initializer::IsSupportedDataType(conv_W_tensor_proto) || + !Initializer::IsSupportedDataType(mul_B_tensor_proto) || + conv_W_tensor_proto->data_type() != mul_B_tensor_proto->data_type() || + conv_W_tensor_proto->dims_size() < 4) { + return Status::OK(); + } - if (!Initializer::IsSupportedDataType(conv_W_tensor_proto) || - !Initializer::IsSupportedDataType(mul_B_tensor_proto) || - conv_W_tensor_proto->data_type() != mul_B_tensor_proto->data_type() || - conv_W_tensor_proto->dims_size() < 4 || - !(mul_B_tensor_proto->dims_size() == 0 || - (mul_B_tensor_proto->dims_size() == conv_W_tensor_proto->dims_size() - 1 && - conv_W_tensor_proto->dims(0) == mul_B_tensor_proto->dims(0)))) { - continue; + if (mul_B_tensor_proto->dims_size() != 0) { + int axis; + if (mul_B_tensor_proto->dims_size() == conv_W_tensor_proto->dims_size()) { + // Test for broadcast multiply such as 1xCx1x1 for a 2D convolution. + axis = 1; + } else if (mul_B_tensor_proto->dims_size() == conv_W_tensor_proto->dims_size() - 1) { + // Test for broadcast multiply such as Cx1x1 for a 2D convolution. + axis = 0; + } else { + return Status::OK(); } - - // The dimensions of mul_B should be equal to 1 except first dimension. - if (mul_B_tensor_proto->dims_size() != 0) { - bool flag = false; - for (int i = 1; i < mul_B_tensor_proto->dims_size(); i++) { - if (mul_B_tensor_proto->dims(i) != 1) { - flag = true; - break; - } - } - - if (flag) { - continue; - } + if (mul_B_tensor_proto->dims(axis) != conv_W_tensor_proto->dims(0)) { + return Status::OK(); } - auto conv_W = std::make_unique(conv_W_tensor_proto); - auto mul_B = std::make_unique(mul_B_tensor_proto); - - const ONNX_NAMESPACE::TensorProto* conv_B_tensor_proto = nullptr; - std::unique_ptr conv_B = nullptr; - const bool is_3d = conv_inputs.size() == 3; - if (is_3d) { - if (!graph.GetInitializedTensor(conv_inputs[2]->Name(), conv_B_tensor_proto)) - continue; - if (conv_B_tensor_proto == nullptr) - return Status(ONNXRUNTIME, FAIL, "Internal error in ConvMulFusion. conv_B_tensor_proto is NULL"); - if (!Initializer::IsSupportedDataType(conv_B_tensor_proto) || - conv_B_tensor_proto->data_type() != mul_B_tensor_proto->data_type() || - conv_B_tensor_proto->dims_size() != 1 || (mul_B_tensor_proto->dims_size() != 0 && conv_B_tensor_proto->dims(0) != mul_B_tensor_proto->dims(0))) { - continue; + // The dimensions of mul_B should be equal to 1 except axis dimension. 
+ for (int i = 0; i < mul_B_tensor_proto->dims_size(); i++) { + if (i != axis && mul_B_tensor_proto->dims(i) != 1) { + return Status::OK(); } - conv_B = std::make_unique(conv_B_tensor_proto); } + } - // Calculate new value of initializers of conv node - conv_W->scale_by_axis(*mul_B, 1); - - if (conv_inputs.size() == 3) { - if (mul_B_tensor_proto->dims_size() != 0) { - conv_B->mul(*mul_B); - } else { - conv_B->scale_by_axis(*mul_B, 0); - } + auto conv_W = std::make_unique(conv_W_tensor_proto); + auto mul_B = std::make_unique(mul_B_tensor_proto); + + const ONNX_NAMESPACE::TensorProto* conv_B_tensor_proto = nullptr; + std::unique_ptr conv_B = nullptr; + const bool is_3d = conv_inputs.size() == 3; + if (is_3d) { + if (!graph.GetInitializedTensor(conv_inputs[2]->Name(), conv_B_tensor_proto)) + return Status::OK(); + if (conv_B_tensor_proto == nullptr) + return Status(ONNXRUNTIME, FAIL, "Internal error in ConvMulFusion. conv_B_tensor_proto is NULL"); + if (!Initializer::IsSupportedDataType(conv_B_tensor_proto) || + conv_B_tensor_proto->data_type() != mul_B_tensor_proto->data_type() || + conv_B_tensor_proto->dims_size() != 1 || + conv_B_tensor_proto->dims(0) != conv_W_tensor_proto->dims(0)) { + return Status::OK(); } + conv_B = std::make_unique(conv_B_tensor_proto); + } - // Create new initializers of conv - ONNX_NAMESPACE::TensorProto new_conv_W_tensor_proto(*conv_W_tensor_proto); - conv_W->ToProto(&new_conv_W_tensor_proto); - - // Replace initializers of conv node - graph.RemoveInitializedTensor(conv_inputs[1]->Name()); - graph.AddInitializedTensor(new_conv_W_tensor_proto); + // Calculate new value of initializers of conv node + conv_W->scale_by_axis(*mul_B, 1); - if (is_3d) { - ONNX_NAMESPACE::TensorProto new_conv_B_tensor_proto(*conv_B_tensor_proto); - conv_B->ToProto(&new_conv_B_tensor_proto); - graph.RemoveInitializedTensor(conv_inputs[2]->Name()); - graph.AddInitializedTensor(new_conv_B_tensor_proto); + if (conv_inputs.size() == 3) { + if (mul_B_tensor_proto->dims_size() != 0) { + conv_B->mul(*mul_B); + } else { + conv_B->scale_by_axis(*mul_B, 0); } + } - // Replace the input of the node following mul node - const NodeArg* mul_output_def = mul_node.OutputDefs()[0]; - NodeArg* conv_output_def = conv_node.MutableOutputDefs()[0]; - for (auto it = mul_node.OutputNodesBegin(); it != mul_node.OutputNodesEnd(); ++it) { - auto output_node = graph.GetNode((*it).Index()); - if (!output_node) { - return Status(ONNXRUNTIME, INVALID_ARGUMENT); - } + // Create new initializers of conv + ONNX_NAMESPACE::TensorProto new_conv_W_tensor_proto(*conv_W_tensor_proto); + conv_W->ToProto(&new_conv_W_tensor_proto); - auto& input_defs = output_node->MutableInputDefs(); - for (auto& def : input_defs) { - if (def == mul_output_def) { - def = conv_output_def; - } - } - } + // Replace initializers of conv node + graph.RemoveInitializedTensor(conv_inputs[1]->Name()); + graph.AddInitializedTensor(new_conv_W_tensor_proto); - removed_nodes.push_back(mul_node.Index()); + if (is_3d) { + ONNX_NAMESPACE::TensorProto new_conv_B_tensor_proto(*conv_B_tensor_proto); + conv_B->ToProto(&new_conv_B_tensor_proto); + graph.RemoveInitializedTensor(conv_inputs[2]->Name()); + graph.AddInitializedTensor(new_conv_B_tensor_proto); } - for (auto i : removed_nodes) { - graph.RemoveNode(i); + // Remove Mul node. 
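// ---------------------------------------------------------------------------
// Illustrative aside (not part of the patch): for a 2D Conv with weights of shape
// {M, C/group, kH, kW}, the shape checks above accept a Mul initializer shaped
//   {}           -> scalar, scales every weight (and bias) element
//   {M, 1, 1}    -> per-output-channel scale, channel dimension at axis 0
//   {1, M, 1, 1} -> per-output-channel scale, channel dimension at axis 1
// A standalone predicate with the same intent (the helper name is hypothetical):
#include <cstddef>
#include <cstdint>
#include <vector>

bool IsFusableMulShape(const std::vector<int64_t>& mul_dims,
                       const std::vector<int64_t>& conv_w_dims) {
  if (mul_dims.empty()) return true;                   // scalar broadcast is always fine
  std::size_t axis;
  if (mul_dims.size() == conv_w_dims.size()) {
    axis = 1;                                          // e.g. 1xMx1x1
  } else if (mul_dims.size() == conv_w_dims.size() - 1) {
    axis = 0;                                          // e.g. Mx1x1
  } else {
    return false;
  }
  if (mul_dims[axis] != conv_w_dims[0]) return false;  // must match output channel count M
  for (std::size_t i = 0; i < mul_dims.size(); ++i) {
    if (i != axis && mul_dims[i] != 1) return false;   // every other dimension must broadcast
  }
  return true;
}
// ---------------------------------------------------------------------------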
+ auto* mul_node_to_remove = graph.GetNode(mul_node.Index()); + if (graph_utils::RemoveNode(graph, *mul_node_to_remove)) { + rule_effect = RewriteRuleEffect::kModifiedRestOfGraph; } - if (!removed_nodes.empty()) { - modified = true; + return Status::OK(); +} + +bool ConvMulFusion::SatisfyCondition(const Graph& graph, const Node& node) { + if (!graph_utils::IsSupportedOptypeVersionAndDomain(node, "Conv", {1}) || + node.GetOutputEdgesCount() != 1) { + return false; } - return Status::OK(); + const auto& next_node = *node.OutputNodesBegin(); + return !(!graph_utils::IsSupportedOptypeVersionAndDomain(next_node, "Mul", {7}) || + next_node.GetInputEdgesCount() != 1 || graph.IsNodeOutputsInGraphOutputs(next_node) || + next_node.GetExecutionProviderType() != node.GetExecutionProviderType()); } } // namespace onnxruntime diff --git a/onnxruntime/core/optimizer/conv_mul_fusion.h b/onnxruntime/core/optimizer/conv_mul_fusion.h index 6452360bd186f..62a39b624570a 100644 --- a/onnxruntime/core/optimizer/conv_mul_fusion.h +++ b/onnxruntime/core/optimizer/conv_mul_fusion.h @@ -2,16 +2,29 @@ // Licensed under the MIT License. #pragma once -#include "core/optimizer/graph_transformer.h" +#include "core/optimizer/rewrite_rule.h" namespace onnxruntime { -class ConvMulFusion : public onnxruntime::GraphTransformer { +/** +@Class ConvMulFusion + +Rewrite rule that fuses two Conv+Mul nodes to a single Conv node. + +It is attempted to be triggered only on nodes with op type "Conv". +*/ +class ConvMulFusion : public RewriteRule { public: - ConvMulFusion() noexcept : onnxruntime::GraphTransformer("ConvMulFusion") {} + ConvMulFusion() noexcept : RewriteRule("ConvMulFusion") {} + + std::vector TargetOpTypes() const noexcept override { + return {"Conv"}; + } private: - Status ApplyImpl(Graph& graph, bool& modified, int graph_level) const override; + bool SatisfyCondition(const Graph& graph, const Node& node) override; + + Status Apply(Graph& graph, Node& node, RewriteRuleEffect& rule_effect) override; }; } // namespace onnxruntime diff --git a/onnxruntime/core/optimizer/gemm_activation_fusion.cc b/onnxruntime/core/optimizer/gemm_activation_fusion.cc index 0c37d8527aa01..7af176366e96a 100644 --- a/onnxruntime/core/optimizer/gemm_activation_fusion.cc +++ b/onnxruntime/core/optimizer/gemm_activation_fusion.cc @@ -12,10 +12,10 @@ namespace onnxruntime { namespace { bool IsFusableActivation(const Node& node) { - return graph_utils::IsSupportedOptypeVersionAndDomain(node, "LeakyRelu", 6) || - graph_utils::IsSupportedOptypeVersionAndDomain(node, "Relu", 6) || - graph_utils::IsSupportedOptypeVersionAndDomain(node, "Sigmoid", 6) || - graph_utils::IsSupportedOptypeVersionAndDomain(node, "Tanh", 6); + return graph_utils::IsSupportedOptypeVersionAndDomain(node, "LeakyRelu", {6}) || + graph_utils::IsSupportedOptypeVersionAndDomain(node, "Relu", {6}) || + graph_utils::IsSupportedOptypeVersionAndDomain(node, "Sigmoid", {6}) || + graph_utils::IsSupportedOptypeVersionAndDomain(node, "Tanh", {6}); } void HandleActivationNodeEdges(Graph& g, const Node& act, Node& fused_gemm) { @@ -46,8 +46,7 @@ Status GemmActivationFusion::ApplyImpl(Graph& graph, bool& modified, int graph_l auto& node = *graph.GetNode(index); ORT_RETURN_IF_ERROR(Recurse(node, modified, graph_level)); - if (!(graph_utils::IsSupportedOptypeVersionAndDomain(node, "Gemm", 7) || - graph_utils::IsSupportedOptypeVersionAndDomain(node, "Gemm", 9)) || + if (!graph_utils::IsSupportedOptypeVersionAndDomain(node, "Gemm", {7, 9}) || !graph_utils::IsSupportedProvider(node, 
GetCompatibleExecutionProviders()) || node.GetOutputEdgesCount() != 1) { continue; @@ -78,9 +77,9 @@ Status GemmActivationFusion::ApplyImpl(Graph& graph, bool& modified, int graph_l //Add optional attributes for activations if (act_node.OpType() == "LeakyRelu") { - const NodeAttributes attrs = act_node.GetAttributes(); - for (auto it = attrs.begin(); it != attrs.end(); ++it) { - fused_gemm.AddAttribute("leaky_relu_" + it->first, it->second); + const NodeAttributes& attrs = act_node.GetAttributes(); + for (const auto& attr : attrs) { + fused_gemm.AddAttribute("leaky_relu_" + attr.first, attr.second); } } diff --git a/onnxruntime/core/optimizer/graph_transformer_mgr.h b/onnxruntime/core/optimizer/graph_transformer_mgr.h index aaf3667d9ba1d..676904271615d 100644 --- a/onnxruntime/core/optimizer/graph_transformer_mgr.h +++ b/onnxruntime/core/optimizer/graph_transformer_mgr.h @@ -6,7 +6,6 @@ #include "core/optimizer/graph_transformer.h" #include "core/optimizer/constant_folding.h" #include "core/optimizer/rewrite_rule.h" -using namespace ::onnxruntime::common; namespace onnxruntime { @@ -17,7 +16,7 @@ class GraphTransformerManager { explicit GraphTransformerManager(unsigned steps) : steps_(steps) { } - // Register a transformer with a level and compatible providers list + // Register a transformer with a level. common::Status Register(std::unique_ptr transformer, TransformerLevel level); // Apply all transformers registered for the given level on the given graph diff --git a/onnxruntime/core/optimizer/graph_transformer_utils.cc b/onnxruntime/core/optimizer/graph_transformer_utils.cc index 23f971800429a..c8b37a2a2b71f 100644 --- a/onnxruntime/core/optimizer/graph_transformer_utils.cc +++ b/onnxruntime/core/optimizer/graph_transformer_utils.cc @@ -30,6 +30,9 @@ std::vector> GenerateRewriteRules(TransformerLevel break; case TransformerLevel::Level2: + rules.push_back(std::make_unique()); + rules.push_back(std::make_unique()); + rules.push_back(std::make_unique()); break; default: ORT_ENFORCE(false, "Unsupported level" + std::to_string(static_cast(level))); @@ -93,9 +96,6 @@ std::vector> GenerateTransformers(TransformerL transformers.emplace_back(std::make_unique(l2_execution_providers)); transformers.emplace_back(std::make_unique(l2_execution_providers)); #endif - transformers.emplace_back(std::make_unique()); - transformers.emplace_back(std::make_unique()); - transformers.emplace_back(std::make_unique()); } break; default: @@ -110,8 +110,7 @@ std::vector> GenerateTransformers(TransformerL transformers.emplace_back(std::move(rule_transformer)); } return transformers; - - } else { + } std::vector> filtered_list; // If the rule-based transformer is not empty, it should be included in the custom transformer list below. 
if (rule_transformer != nullptr) { @@ -127,7 +126,6 @@ std::vector> GenerateTransformers(TransformerL }); } return filtered_list; - } } } // namespace transformer_utils diff --git a/onnxruntime/core/optimizer/identity_elimination.cc b/onnxruntime/core/optimizer/identity_elimination.cc index 1ba4eb64c3aa6..236b98f5887d5 100644 --- a/onnxruntime/core/optimizer/identity_elimination.cc +++ b/onnxruntime/core/optimizer/identity_elimination.cc @@ -10,9 +10,9 @@ namespace onnxruntime { -Status EliminateIdentity::Apply(Graph& graph, Node& node, bool& modified, bool& deleted) { - if (graph_utils::RemoveSingleInputNode(graph, node)) { - modified = deleted = true; +Status EliminateIdentity::Apply(Graph& graph, Node& node, RewriteRuleEffect& rule_effect) { + if (graph_utils::RemoveNode(graph, node)) { + rule_effect = RewriteRuleEffect::kRemovedCurrentNode; } return Status::OK(); diff --git a/onnxruntime/core/optimizer/identity_elimination.h b/onnxruntime/core/optimizer/identity_elimination.h index 5f9d56dbe2db6..b90d2164e01d8 100644 --- a/onnxruntime/core/optimizer/identity_elimination.h +++ b/onnxruntime/core/optimizer/identity_elimination.h @@ -25,7 +25,7 @@ class EliminateIdentity : public RewriteRule { private: bool SatisfyCondition(const Graph& graph, const Node& node) override; - Status Apply(Graph& graph, Node& node, bool& modified, bool& deleted) override; + Status Apply(Graph& graph, Node& node, RewriteRuleEffect& rule_effect) override; }; // namespace onnxruntime } // namespace onnxruntime diff --git a/onnxruntime/core/optimizer/insert_cast_transformer.cc b/onnxruntime/core/optimizer/insert_cast_transformer.cc index 12147acaf4125..2f9e6fbfd6901 100644 --- a/onnxruntime/core/optimizer/insert_cast_transformer.cc +++ b/onnxruntime/core/optimizer/insert_cast_transformer.cc @@ -164,7 +164,8 @@ Status InsertCastTransformer::ApplyImpl(onnxruntime::Graph& graph, bool& modifie GraphViewer graph_viewer(graph); auto& order = graph_viewer.GetNodesInTopologicalOrder(); - TypeProto float_16_tensor_proto, float_tensor_proto; + TypeProto float_16_tensor_proto; + TypeProto float_tensor_proto; float_16_tensor_proto.mutable_tensor_type()->set_elem_type(TensorProto_DataType_FLOAT16); float_tensor_proto.mutable_tensor_type()->set_elem_type(TensorProto_DataType_FLOAT); IdGenerator id_generator; diff --git a/onnxruntime/core/optimizer/matmul_add_fusion.cc b/onnxruntime/core/optimizer/matmul_add_fusion.cc index 4cfcb11d6053e..a0c392ef75354 100644 --- a/onnxruntime/core/optimizer/matmul_add_fusion.cc +++ b/onnxruntime/core/optimizer/matmul_add_fusion.cc @@ -19,8 +19,7 @@ Status MatMulAddFusion::ApplyImpl(Graph& graph, bool& modified, int graph_level) auto& node = *graph.GetNode(node_index); ORT_RETURN_IF_ERROR(Recurse(node, modified, graph_level)); - if (!(graph_utils::IsSupportedOptypeVersionAndDomain(node, "MatMul", 1) || - graph_utils::IsSupportedOptypeVersionAndDomain(node, "MatMul", 9)) || + if (!graph_utils::IsSupportedOptypeVersionAndDomain(node, "MatMul", {1, 9}) || !graph_utils::IsSupportedProvider(node, GetCompatibleExecutionProviders()) || node.GetOutputEdgesCount() != 1) { continue; @@ -32,14 +31,15 @@ Status MatMulAddFusion::ApplyImpl(Graph& graph, bool& modified, int graph_level) } const Node& next_node = (*next_node_itr); - if (!graph_utils::IsSupportedOptypeVersionAndDomain(next_node, "Add", 7) || + if (!graph_utils::IsSupportedOptypeVersionAndDomain(next_node, "Add", {7}) || next_node.GetExecutionProviderType() != node.GetExecutionProviderType()) { continue; } Node& matmul_node = node; Node& 
add_node = const_cast(next_node); - std::vector input_args, output_args; + std::vector input_args; + std::vector output_args; auto matmul_input_defs = matmul_node.MutableInputDefs(); auto add_input_defs = add_node.MutableInputDefs(); @@ -55,7 +55,8 @@ Status MatMulAddFusion::ApplyImpl(Graph& graph, bool& modified, int graph_level) auto matmul_b_shape = matmul_input_defs[1]->Shape(); if (nullptr == matmul_a_shape || nullptr == matmul_b_shape) { continue; - } else if (1 == matmul_a_shape->dim_size() && 2 == matmul_b_shape->dim_size()) { + } + if (1 == matmul_a_shape->dim_size() && 2 == matmul_b_shape->dim_size()) { // MatMul has shape [K] * [K, N], reset it to [1, K] * [K, N], so that it can work for Gemm auto mutable_matmul_a_shape = const_cast(matmul_a_shape); auto dim_0 = mutable_matmul_a_shape->mutable_dim(0); @@ -100,8 +101,8 @@ Status MatMulAddFusion::ApplyImpl(Graph& graph, bool& modified, int graph_level) } // Have to remove node in reversed order for now to walk around the issue in RemoveNode - for (auto it = removed_nodes.begin(); it != removed_nodes.end(); ++it) { - graph.RemoveNode(*it); + for (onnxruntime::NodeIndex removed_node : removed_nodes) { + graph.RemoveNode(removed_node); } if (!removed_nodes.empty()) { diff --git a/onnxruntime/core/optimizer/optimizer_execution_frame.cc b/onnxruntime/core/optimizer/optimizer_execution_frame.cc index 4350cb7073ca8..d89782e0b62e0 100644 --- a/onnxruntime/core/optimizer/optimizer_execution_frame.cc +++ b/onnxruntime/core/optimizer/optimizer_execution_frame.cc @@ -24,24 +24,24 @@ OptimizerExecutionFrame::Info::Info(const std::vector& nodes, // Create MLValues related maps auto initialize_maps = [this, &initialized_tensor_set](const NodeArg& arg, size_t /*index*/) -> Status { - int idx = mlvalue_name_idx_map_.Add(arg.Name()); - mlvalue_idx_nodearg_map_[idx] = &arg; + int idx = ort_value_name_idx_map_.Add(arg.Name()); + ort_value_idx_nodearg_map_[idx] = &arg; - // Only create MLValue instances for initializers used by an array of nodes. + // Only create OrtValue instances for initializers used by an array of nodes. 
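// ---------------------------------------------------------------------------
// Illustrative aside (not part of the patch): for each initializer used by the nodes being
// optimized, the code below
//   1. asks GetSizeInBytesFromTensorProto for the required CPU buffer size,
//   2. allocates a raw buffer and wraps it in a MemBuffer,
//   3. deserializes the TensorProto into an OrtValue backed by that buffer.
// The raw buffer stays alive in buffer_for_initialized_tensors_, and any OrtCallback
// returned by the deserializer (e.g. for cleaning up string tensors or unmapping external
// data) is kept in deleter_for_initialized_tensors_ so the cleanup can be run later.
// ---------------------------------------------------------------------------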
InitializedTensorSet::const_iterator it = initialized_tensor_set.find(arg.Name()); if (it != initialized_tensor_set.cend()) { const auto& tensor_proto = *(it->second); size_t cpu_tensor_length; ORT_RETURN_IF_ERROR(utils::GetSizeInBytesFromTensorProto<0>(tensor_proto, &cpu_tensor_length)); - MLValue mlvalue; - OrtAllocatorInfo info("Cpu", OrtDeviceAllocator, 0, OrtMemTypeDefault); + OrtValue ort_value; + const OrtAllocatorInfo& info = cpu_execution_provider_->GetAllocator(0, OrtMemTypeDefault)->Info(); std::unique_ptr data(new char[cpu_tensor_length]); std::unique_ptr p_tensor; OrtCallback d; - ORT_RETURN_IF_ERROR(utils::TensorProtoToMLValue( - Env::Default(), nullptr, tensor_proto, MemBuffer(data.get(), cpu_tensor_length, info), mlvalue, d)); + ORT_RETURN_IF_ERROR(utils::TensorProtoToMLValue(Env::Default(), nullptr, tensor_proto, + MemBuffer(data.get(), cpu_tensor_length, info), ort_value, d)); - initializers_[idx] = mlvalue; + initializers_[idx] = ort_value; buffer_for_initialized_tensors_[idx] = std::move(data); if (d.f != nullptr) deleter_for_initialized_tensors_[idx] = d; @@ -56,18 +56,14 @@ OptimizerExecutionFrame::Info::Info(const std::vector& nodes, onnxruntime::Node::ForEachWithIndex(node->OutputDefs(), initialize_maps); } - node_index_info_ = std::make_unique(nodes, mlvalue_name_idx_map_); + node_index_info_ = std::make_unique(nodes, ort_value_name_idx_map_); // create kernels for these nodes for (auto* node : nodes) { std::unique_ptr op_kernel; std::shared_ptr kernel_registry = cpu_execution_provider_->GetKernelRegistry(); - auto status = kernel_registry->TryCreateKernel(*node, - *cpu_execution_provider_, - initializers_, - mlvalue_name_idx_map_, - FuncManager(), - op_kernel); + auto status = kernel_registry->TryCreateKernel(*node, *cpu_execution_provider_, initializers_, + ort_value_name_idx_map_, FuncManager(), op_kernel); kernels_[node->Index()] = std::move(op_kernel); } } @@ -82,17 +78,10 @@ const OpKernel* OptimizerExecutionFrame::Info::GetKernel(NodeIndex node_id) cons // For optimizer, probably no need to pass feed_mlvalue_idxs, feeds to initialize IExecutionFrame. // If needed, the parameters of OptimizerExecutionFrame ctor can be changed later. -OptimizerExecutionFrame::OptimizerExecutionFrame(const Info& info, - const std::vector& fetch_mlvalue_idxs) - : IExecutionFrame(std::vector(), - std::vector(), - info.GetInitializers(), - fetch_mlvalue_idxs, - std::vector(), - info.GetMLValueNameIdxMap(), - info.GetNodeIndexInfo()), - info_(info) { -} +OptimizerExecutionFrame::OptimizerExecutionFrame(const Info& info, const std::vector& fetch_mlvalue_idxs) + : IExecutionFrame(std::vector(), std::vector(), info.GetInitializers(), fetch_mlvalue_idxs, + std::vector(), info.GetMLValueNameIdxMap(), info.GetNodeIndexInfo()), + info_(info) {} AllocatorPtr OptimizerExecutionFrame::GetAllocatorImpl(const OrtAllocatorInfo& info) const { return info_.GetAllocator(info); @@ -100,17 +89,16 @@ AllocatorPtr OptimizerExecutionFrame::GetAllocatorImpl(const OrtAllocatorInfo& i // This method is not thread safe! 
// Return S_OK and nullptr if index map to an value that is an unused optional input/output -Status OptimizerExecutionFrame::CreateNodeOutputMLValueImpl(MLValue& mlvalue, int mlvalue_idx, const TensorShape* shape) { - const DataTypeImpl* ml_type = utils::GetMLDataType(*(info_.GetMLValueIdxNodeArgMap().at(mlvalue_idx))); +Status OptimizerExecutionFrame::CreateNodeOutputMLValueImpl(OrtValue& ort_value, int ort_value_idx, + const TensorShape* shape) { + const DataTypeImpl* ml_type = utils::GetMLDataType(*(info_.GetMLValueIdxNodeArgMap().at(ort_value_idx))); if (ml_type == nullptr) return Status(common::ONNXRUNTIME, common::INVALID_ARGUMENT, - "Tried to allocate without valid type information, mlvalue index=" + std::to_string(mlvalue_idx)); + "Tried to allocate without valid type information, ort_value index=" + std::to_string(ort_value_idx)); if (!ml_type->IsTensorType()) { const NonTensorTypeBase* non_tensor_type = static_cast(ml_type); auto creator = non_tensor_type->GetCreateFunc(); - mlvalue.Init(creator(), - non_tensor_type, - non_tensor_type->GetDeleteFunc()); + ort_value.Init(creator(), non_tensor_type, non_tensor_type->GetDeleteFunc()); return Status::OK(); } @@ -121,11 +109,9 @@ Status OptimizerExecutionFrame::CreateNodeOutputMLValueImpl(MLValue& mlvalue, in *shape, allocator_ptr); - mlvalue.Init(p_tensor.release(), - DataTypeImpl::GetType(), - DataTypeImpl::GetType()->GetDeleteFunc()); + ort_value.Init(p_tensor.release(), DataTypeImpl::GetType(), DataTypeImpl::GetType()->GetDeleteFunc()); return Status::OK(); } -} // namespace onnxruntime \ No newline at end of file +} // namespace onnxruntime diff --git a/onnxruntime/core/optimizer/optimizer_execution_frame.h b/onnxruntime/core/optimizer/optimizer_execution_frame.h index 6a65bc36b1ae2..6ac9860ff0ddf 100644 --- a/onnxruntime/core/optimizer/optimizer_execution_frame.h +++ b/onnxruntime/core/optimizer/optimizer_execution_frame.h @@ -8,7 +8,7 @@ #include "core/graph/graph.h" #include "core/providers/cpu/cpu_execution_provider.h" #include "core/framework/execution_frame.h" -#include "core/framework/mlvalue_name_idx_map.h" +#include "core/framework/ort_value_name_idx_map.h" #include "core/framework/ml_value.h" #include "core/common/callback.h" @@ -33,13 +33,15 @@ class OptimizerExecutionFrame final : public IExecutionFrame { return allocator_ptr_; } - const MLValueNameIdxMap& GetMLValueNameIdxMap() const noexcept { return mlvalue_name_idx_map_; } - const std::unordered_map& GetMLValueIdxNodeArgMap() const noexcept { return mlvalue_idx_nodearg_map_; } - const std::unordered_map& GetInitializers() const noexcept { return initializers_; } + const MLValueNameIdxMap& GetMLValueNameIdxMap() const noexcept { return ort_value_name_idx_map_; } + const std::unordered_map& GetMLValueIdxNodeArgMap() const noexcept { + return ort_value_idx_nodearg_map_; + } + const std::unordered_map& GetInitializers() const noexcept { return initializers_; } const NodeIndexInfo& GetNodeIndexInfo() const { return *node_index_info_; } int GetMLValueIndex(const std::string& name) const { int index = -1; - if (mlvalue_name_idx_map_.GetIdx(name, index) == Status::OK()) { + if (ort_value_name_idx_map_.GetIdx(name, index) == Status::OK()) { return index; } return -1; @@ -55,9 +57,9 @@ class OptimizerExecutionFrame final : public IExecutionFrame { AllocatorPtr allocator_ptr_; // MLValues for optimizer - MLValueNameIdxMap mlvalue_name_idx_map_; - std::unordered_map mlvalue_idx_nodearg_map_; - std::unordered_map initializers_; + MLValueNameIdxMap ort_value_name_idx_map_; + 
std::unordered_map ort_value_idx_nodearg_map_; + std::unordered_map initializers_; std::unordered_map> buffer_for_initialized_tensors_; // This data structure is for unintializing string tensors and // munmap memory region and close file descriptor @@ -71,13 +73,13 @@ class OptimizerExecutionFrame final : public IExecutionFrame { OptimizerExecutionFrame(const Info& info, const std::vector& fetch_mlvalue_idxs); - ~OptimizerExecutionFrame() = default; + ~OptimizerExecutionFrame() override = default; private: ORT_DISALLOW_COPY_ASSIGNMENT_AND_MOVE(OptimizerExecutionFrame); AllocatorPtr GetAllocatorImpl(const OrtAllocatorInfo& info) const override; - Status CreateNodeOutputMLValueImpl(MLValue& mlvalue, int mlvalue_idx, const TensorShape* shape) override; + Status CreateNodeOutputMLValueImpl(OrtValue& ort_value, int ort_value_idx, const TensorShape* shape) override; const Info& info_; }; diff --git a/onnxruntime/core/optimizer/rule_based_graph_transformer.cc b/onnxruntime/core/optimizer/rule_based_graph_transformer.cc index e8b65a0a21417..7bafbed87b119 100644 --- a/onnxruntime/core/optimizer/rule_based_graph_transformer.cc +++ b/onnxruntime/core/optimizer/rule_based_graph_transformer.cc @@ -3,6 +3,7 @@ #include "core/optimizer/rule_based_graph_transformer.h" #include "core/graph/graph_utils.h" +#include "core/optimizer/rewrite_rule.h" using namespace ::onnxruntime::common; @@ -22,11 +23,11 @@ Status RuleBasedGraphTransformer::Register(std::unique_ptr rule) { Status RuleBasedGraphTransformer::ApplyRulesOnNode(Graph& graph, Node& node, const std::vector>& rules, - bool& modified, bool& deleted) const { + RuleEffect& rule_effect) const { for (const auto& rule : rules) { - ORT_RETURN_IF_ERROR(rule->CheckConditionAndApply(graph, node, modified, deleted)); - if (deleted) { - modified = true; // should be set by rewriter but in case it wasn't... + ORT_RETURN_IF_ERROR(rule->CheckConditionAndApply(graph, node, rule_effect)); + // If the current node was removed as a result of a rule, stop rule application for that node. + if (rule_effect == RuleEffect::kRemovedCurrentNode) { break; } } @@ -39,10 +40,15 @@ Status RuleBasedGraphTransformer::ApplyImpl(Graph& graph, bool& modified, int gr for (NodeIndex i : order) { auto* node = graph.GetNode(i); + // A node might not be found as it might have already been deleted from one of the rules. if (!node) { - return Status(ONNXRUNTIME, INVALID_ARGUMENT); + continue; } + // Initialize the effect of rules on this node to denote that the graph has not yet been modified + // by the rule application on the current node. + auto rule_effect = RuleEffect::kNone; + if (!graph_utils::IsSupportedProvider(*node, GetCompatibleExecutionProviders())) { continue; } @@ -50,22 +56,26 @@ Status RuleBasedGraphTransformer::ApplyImpl(Graph& graph, bool& modified, int gr // First apply rewrite rules that are registered for the op type of the current node; then apply rules that are // registered to be applied regardless of the op type; then recursively apply rules to subgraphs (if any). // Stop further rule application for the current node, if the node gets removed by a rule. 
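// ---------------------------------------------------------------------------
// Illustrative aside (not part of the patch): a rule driven by this transformer only needs
// to implement the interface used above. A hypothetical example that mirrors the rules in
// this patch (EliminateDropout is a sketch only and is not part of this change):
#include <string>
#include <vector>

#include "core/graph/graph_utils.h"
#include "core/optimizer/rewrite_rule.h"

namespace onnxruntime {

// Drops an inference-time Dropout node that only produces its data output.
class EliminateDropout : public RewriteRule {
 public:
  EliminateDropout() noexcept : RewriteRule("EliminateDropout") {}

  std::vector<std::string> TargetOpTypes() const noexcept override {
    return {"Dropout"};  // the transformer only offers Dropout nodes to this rule
  }

 private:
  bool SatisfyCondition(const Graph& graph, const Node& node) override {
    return graph_utils::IsSupportedOptypeVersionAndDomain(node, "Dropout", {7}) &&
           node.OutputDefs().size() == 1;  // no mask output present
  }

  Status Apply(Graph& graph, Node& node, RewriteRuleEffect& rule_effect) override {
    if (graph_utils::RemoveNode(graph, node)) {
      rule_effect = RewriteRuleEffect::kRemovedCurrentNode;
    }
    return Status::OK();
  }
};

}  // namespace onnxruntime
// ---------------------------------------------------------------------------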
- bool deleted = false; const std::vector>* rules = nullptr; rules = GetRewriteRulesForOpType(node->OpType()); if (rules) { - ORT_RETURN_IF_ERROR(ApplyRulesOnNode(graph, *node, *rules, modified, deleted)); + ORT_RETURN_IF_ERROR(ApplyRulesOnNode(graph, *node, *rules, rule_effect)); } - if (!deleted) { + if (rule_effect != RuleEffect::kRemovedCurrentNode) { rules = GetAnyOpRewriteRules(); if (rules) { - ORT_RETURN_IF_ERROR(ApplyRulesOnNode(graph, *node, *rules, modified, deleted)); + ORT_RETURN_IF_ERROR(ApplyRulesOnNode(graph, *node, *rules, rule_effect)); } } - if (!deleted) { + // Update the modified field of the rule-based transformer. + if (rule_effect != RuleEffect::kNone) { + modified = true; + } + + if (rule_effect != RuleEffect::kRemovedCurrentNode) { ORT_RETURN_IF_ERROR(Recurse(*node, modified, graph_level)); } } diff --git a/onnxruntime/core/optimizer/slice_elimination.cc b/onnxruntime/core/optimizer/slice_elimination.cc index f3ab6186183cf..3a42b6170d75a 100644 --- a/onnxruntime/core/optimizer/slice_elimination.cc +++ b/onnxruntime/core/optimizer/slice_elimination.cc @@ -8,9 +8,9 @@ namespace onnxruntime { -Status EliminateSlice::Apply(Graph& graph, Node& node, bool& modified, bool& removed) { - if (graph_utils::RemoveSingleInputNode(graph, node)) { - removed = modified = true; +Status EliminateSlice::Apply(Graph& graph, Node& node, RewriteRuleEffect& rule_effect) { + if (graph_utils::RemoveNode(graph, node)) { + rule_effect = RewriteRuleEffect::kRemovedCurrentNode; } return Status::OK(); @@ -19,7 +19,7 @@ Status EliminateSlice::Apply(Graph& graph, Node& node, bool& modified, bool& rem bool EliminateSlice::SatisfyCondition(const Graph& graph, const Node& node) { // We currently support elimination for Slice operator v1. // TODO Extend to support Slice operator v10, which includes "steps" and all attributes are now given as inputs. 
- if (!graph_utils::IsSupportedOptypeVersionAndDomain(node, "Slice", 1)) { + if (!graph_utils::IsSupportedOptypeVersionAndDomain(node, "Slice", {1})) { return false; } @@ -37,7 +37,7 @@ bool EliminateSlice::SatisfyCondition(const Graph& graph, const Node& node) { } std::vector axes; if (!graph_utils::GetRepeatedNodeAttributeValues(node, "axes", axes)) { - for (int i = 0; (size_t)i < starts.size(); ++i) { + for (int i = 0; static_cast(i) < starts.size(); ++i) { axes.push_back(i); } } else if (axes.size() != starts.size()) { diff --git a/onnxruntime/core/optimizer/slice_elimination.h b/onnxruntime/core/optimizer/slice_elimination.h index b43af73209c55..28d689c558097 100644 --- a/onnxruntime/core/optimizer/slice_elimination.h +++ b/onnxruntime/core/optimizer/slice_elimination.h @@ -25,7 +25,7 @@ class EliminateSlice : public RewriteRule { private: bool SatisfyCondition(const Graph& graph, const Node& node) override; - Status Apply(Graph& graph, Node& node, bool& modified, bool& removed) override; + Status Apply(Graph& graph, Node& node, RewriteRuleEffect& rule_effect) override; }; } // namespace onnxruntime diff --git a/onnxruntime/core/optimizer/transformer_memcpy.cc b/onnxruntime/core/optimizer/transformer_memcpy.cc index 3175ee6ec8bee..8aab4f3e6e7a8 100644 --- a/onnxruntime/core/optimizer/transformer_memcpy.cc +++ b/onnxruntime/core/optimizer/transformer_memcpy.cc @@ -156,16 +156,14 @@ void TransformerMemcpyImpl::ProcessDefs(onnxruntime::Node& node, const KernelReg const KernelCreateInfo* kci = nullptr; kernel_registries.SearchKernelRegistry(node, &kci); - ORT_ENFORCE(onnxruntime::Node::ForEachWithIndex( - node.InputDefs(), - [this, &kci](const onnxruntime::NodeArg& arg, size_t index) { - if (kci && MemTypeOnCpuExplicitly(kci->kernel_def->InputMemoryType(index))) - non_provider_input_defs_.insert(&arg); - else - provider_input_defs_.insert(&arg); - return Status::OK(); - }) - .IsOK()); + ORT_ENFORCE(onnxruntime::Node::ForEachWithIndex(node.InputDefs(), [this, &kci](const onnxruntime::NodeArg& arg, + size_t index) { + if (kci && kci->kernel_def->IsInputOnCpu(index)) + non_provider_input_defs_.insert(&arg); + else + provider_input_defs_.insert(&arg); + return Status::OK(); + }).IsOK()); // we don't need to handle implicit input here as provider_ is never kCpuExecutionProvider, all control flow // nodes are CPU based, and only control flow nodes have implicit inputs. @@ -176,7 +174,7 @@ void TransformerMemcpyImpl::ProcessDefs(onnxruntime::Node& node, const KernelReg if (!arg->Exists()) continue; - if (kci && MemTypeOnCpuExplicitly(kci->kernel_def->OutputMemoryType(i))) + if (kci && kci->kernel_def->IsOutputOnCpu(i)) non_provider_output_defs_.insert(arg); else provider_output_defs_.insert(arg); @@ -184,7 +182,7 @@ void TransformerMemcpyImpl::ProcessDefs(onnxruntime::Node& node, const KernelReg } else { // TODO: copy between devices? i.e. multiple GPUs if (node.GetExecutionProviderType() != onnxruntime::kCpuExecutionProvider && node.GetExecutionProviderType() != onnxruntime::kTensorrtExecutionProvider && - !node.GetExecutionProviderType().empty()) { + node.GetExecutionProviderType() != onnxruntime::kNGraphExecutionProvider && !node.GetExecutionProviderType().empty()) { ORT_THROW("Execution type '", node.GetExecutionProviderType(), "' doesn't support memcpy "); } @@ -207,25 +205,26 @@ void TransformerMemcpyImpl::ProcessDefs(onnxruntime::Node& node, const KernelReg //for non_provider defs, collect the nodes that expect it is provider tensor as input/output. 
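// ---------------------------------------------------------------------------
// Illustrative aside (not part of the patch): ProcessDefs above now asks the kernel
// definition directly (IsInputOnCpu / IsOutputOnCpu) instead of inspecting memory types,
// but the intent is unchanged. Conceptually, for a node assigned to a provider:
//
//   for each input/output def of the node:
//     if the kernel consumes/produces it on CPU  -> non_provider_{input,output}_defs_
//     else                                        -> provider_{input,output}_defs_
//
// A def that ends up used on both sides later gets a MemcpyFromHost / MemcpyToHost node
// inserted so the value is copied across the device boundary exactly where needed.
// ---------------------------------------------------------------------------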
void TransformerMemcpyImpl::BuildDefsMapping(const onnxruntime::NodeArg* arg, const KernelRegistryManager& kernel_registries) { - for (auto it = graph_.Nodes().begin(); it != graph_.Nodes().end(); it++) { - if (it->OpType() == "MemcpyFromHost" || it->OpType() == "MemcpyToHost") - continue; - auto input_it = std::find(it->MutableInputDefs().begin(), it->MutableInputDefs().end(), const_cast(arg)); - auto output_it = std::find(it->MutableOutputDefs().begin(), it->MutableOutputDefs().end(), const_cast(arg)); - int arg_input_index = input_it != it->MutableInputDefs().end() ? static_cast(input_it - it->MutableInputDefs().begin()) : -1; - int arg_output_index = output_it != it->MutableOutputDefs().end() ? static_cast(output_it - it->MutableOutputDefs().begin()) : -1; + for (auto& it : graph_.Nodes()) { + if (it.OpType() == "MemcpyFromHost" || it.OpType() == "MemcpyToHost") continue; + auto input_it = + std::find(it.MutableInputDefs().begin(), it.MutableInputDefs().end(), const_cast(arg)); + auto output_it = + std::find(it.MutableOutputDefs().begin(), it.MutableOutputDefs().end(), const_cast(arg)); + int arg_input_index = + input_it != it.MutableInputDefs().end() ? static_cast(input_it - it.MutableInputDefs().begin()) : -1; + int arg_output_index = + output_it != it.MutableOutputDefs().end() ? static_cast(output_it - it.MutableOutputDefs().begin()) : -1; if (arg_input_index == -1 && arg_output_index == -1) continue; - if (it->GetExecutionProviderType() == provider_) { + if (it.GetExecutionProviderType() == provider_) { const KernelCreateInfo* kci = nullptr; - kernel_registries.SearchKernelRegistry(*it, &kci); + kernel_registries.SearchKernelRegistry(it, &kci); if (arg_input_index != -1) { - if (!kci || !MemTypeOnCpuExplicitly(kci->kernel_def->InputMemoryType(arg_input_index))) - provider_input_nodes_[arg].insert(&*it); + if (!kci || !kci->kernel_def->IsInputOnCpu(arg_input_index)) provider_input_nodes_[arg].insert(&it); } if (arg_output_index != -1) { - if (!kci || !MemTypeOnCpuExplicitly(kci->kernel_def->OutputMemoryType(arg_output_index))) - provider_output_nodes_[arg].insert(&*it); + if (!kci || !kci->kernel_def->IsOutputOnCpu(arg_output_index)) provider_output_nodes_[arg].insert(&it); } } } @@ -299,24 +298,24 @@ void TransformerMemcpyImpl::ProcessInitializers(const KernelRegistryManager& ker auto dup_replacements = replacements; const KernelCreateInfo* kci = nullptr; - kernel_registries.SearchKernelRegistry(*p_node, &kci); - p_node->ForEachWithIndex( - p_node->InputDefs(), - [kci, &dup_replacements](const onnxruntime::NodeArg& arg, size_t index) { - if (kci && MemTypeOnCpuExplicitly(kci->kernel_def->InputMemoryType(index))) - dup_replacements.erase(&arg); - return Status::OK(); - }); + auto status = kernel_registries.SearchKernelRegistry(*p_node, &kci); + ORT_ENFORCE(status.IsOK(), status.ErrorMessage()); + if (kci == nullptr) continue; + if (kci->kernel_def == nullptr) continue; + onnxruntime::Node::ForEachWithIndex(p_node->InputDefs(), + [kci, &dup_replacements](const onnxruntime::NodeArg& arg, size_t index) { + if (kci->kernel_def->IsInputOnCpu(index)) dup_replacements.erase(&arg); + return Status::OK(); + }); // normally initializers are only inputs, but things may change with ops like assign - p_node->ForEachWithIndex( - p_node->OutputDefs(), - [kci, &dup_replacements](const onnxruntime::NodeArg& arg, size_t index) { - if (kci && MemTypeOnCpuExplicitly(kci->kernel_def->OutputMemoryType(index))) { - ORT_ENFORCE(dup_replacements.find(&arg) == dup_replacements.end()); - } - return 
Status::OK(); - }); + onnxruntime::Node::ForEachWithIndex(p_node->OutputDefs(), + [kci, &dup_replacements](const onnxruntime::NodeArg& arg, size_t index) { + if (kci->kernel_def->IsOutputOnCpu(index)) { + ORT_ENFORCE(dup_replacements.find(&arg) == dup_replacements.end()); + } + return Status::OK(); + }); p_node->ReplaceDefs(dup_replacements); } diff --git a/onnxruntime/core/optimizer/unsqueeze_elimination.cc b/onnxruntime/core/optimizer/unsqueeze_elimination.cc index a53e52d650cf9..549d415bf244e 100644 --- a/onnxruntime/core/optimizer/unsqueeze_elimination.cc +++ b/onnxruntime/core/optimizer/unsqueeze_elimination.cc @@ -10,7 +10,7 @@ using namespace ::onnxruntime::common; namespace onnxruntime { -Status UnsqueezeElimination::Apply(Graph& graph, Node& node, bool& modified, bool& removed) { +Status UnsqueezeElimination::Apply(Graph& graph, Node& node, RewriteRuleEffect& rule_effect) { // Get "axes" attribute. const ONNX_NAMESPACE::AttributeProto* attr = graph_utils::GetNodeAttribute(node, "axes"); if (attr == nullptr || attr->type() != AttributeProto_AttributeType_INTS) { @@ -18,6 +18,7 @@ Status UnsqueezeElimination::Apply(Graph& graph, Node& node, bool& modified, boo } std::vector axes; + axes.reserve(attr->ints_size()); for (int i = 0; i < attr->ints_size(); i++) { axes.push_back(static_cast(attr->ints(i))); } @@ -66,8 +67,8 @@ Status UnsqueezeElimination::Apply(Graph& graph, Node& node, bool& modified, boo input_def->SetShape(shape); // Remove Unsqueeze node. - if (graph_utils::RemoveSingleInputNode(graph, node)) { - removed = modified = true; + if (graph_utils::RemoveNode(graph, node)) { + rule_effect = RewriteRuleEffect::kRemovedCurrentNode; } return Status::OK(); diff --git a/onnxruntime/core/optimizer/unsqueeze_elimination.h b/onnxruntime/core/optimizer/unsqueeze_elimination.h index f4107fba36978..e8e4dad40057f 100644 --- a/onnxruntime/core/optimizer/unsqueeze_elimination.h +++ b/onnxruntime/core/optimizer/unsqueeze_elimination.h @@ -25,7 +25,7 @@ class UnsqueezeElimination : public RewriteRule { private: bool SatisfyCondition(const Graph& graph, const Node& node) override; - Status Apply(Graph& graph, Node& node, bool& modified, bool& deleted) override; + Status Apply(Graph& graph, Node& node, RewriteRuleEffect& rule_effect) override; }; } // namespace onnxruntime diff --git a/onnxruntime/core/platform/posix/env.cc b/onnxruntime/core/platform/posix/env.cc index 7cabf7175301c..92375f6f31c46 100644 --- a/onnxruntime/core/platform/posix/env.cc +++ b/onnxruntime/core/platform/posix/env.cc @@ -180,10 +180,9 @@ class PosixEnv : public Env { char buf[1024]; const char* msg = ""; if (e > 0) { -#if defined(_GNU_SOURCE) && !defined(__APPLE__) +#if defined(__GLIBC__) && defined(_GNU_SOURCE) msg = strerror_r(e, buf, sizeof(buf)); #else - // for Mac OS X if (strerror_r(e, buf, sizeof(buf)) != 0) { buf[0] = '\0'; } diff --git a/onnxruntime/core/providers/common.h b/onnxruntime/core/providers/common.h index d3eefa382924d..16a065cb8cc1a 100644 --- a/onnxruntime/core/providers/common.h +++ b/onnxruntime/core/providers/common.h @@ -11,7 +11,7 @@ namespace onnxruntime { Handle a potentially negative axis. Enforces negative axis is valid. @param axis Axis to convert from negative to positive if needed. @param tensor_rank Rank of tensor axis applies to. Tensor::Shape()::NumDimensions(). -@returns Positive axis. +@returns non-negative axis. 
*/ inline int64_t HandleNegativeAxis(int64_t axis, int64_t tensor_rank) { ORT_ENFORCE(axis >= -tensor_rank && axis <= tensor_rank - 1, "axis ", axis, diff --git a/onnxruntime/core/providers/cpu/controlflow/if.cc b/onnxruntime/core/providers/cpu/controlflow/if.cc index 334ba6145c7fc..97d59f9744a42 100644 --- a/onnxruntime/core/providers/cpu/controlflow/if.cc +++ b/onnxruntime/core/providers/cpu/controlflow/if.cc @@ -79,7 +79,7 @@ class IfImpl { int num_outputs_; std::vector subgraph_output_names_; - std::unordered_map implicit_inputs_; + std::unordered_map implicit_inputs_; enum class AllocationType { Delayed, // allocation of If output will be done by subgraph execution @@ -87,7 +87,7 @@ class IfImpl { }; // track where the fetches provided to subgraph execution were allocated. - std::vector> outputs_; + std::vector> outputs_; }; Status If::Compute(OpKernelContext* ctx) const { @@ -124,9 +124,9 @@ IfImpl::IfImpl(OpKernelContextInternal& context, Status IfImpl::Initialize() { auto& graph_outputs = subgraph_.GetOutputs(); - auto num_subgraph_outputs = graph_outputs.size(); + size_t num_subgraph_outputs = graph_outputs.size(); - if (num_subgraph_outputs != num_outputs_) { + if (num_subgraph_outputs != static_cast(num_outputs_)) { return ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "'If' node has ", num_outputs_, " outputs which doesn't match the subgraph's ", num_subgraph_outputs, " outputs."); } @@ -158,7 +158,7 @@ Status IfImpl::AllocateOutputTensors() { TensorShape output_shape{onnxruntime::utils::GetTensorShapeFromTensorShapeProto(*graph_output_shape)}; - // if size < 0 we have a symbolic dimension and need to use a temporary MLValue in the subgraph execution + // if size < 0 we have a symbolic dimension and need to use a temporary OrtValue in the subgraph execution if (output_shape.Size() < 0) { // we still need a value to put in the feeds we give to the execution frame, so just use an empty MLValue outputs_.push_back({AllocationType::Delayed, {}}); @@ -168,7 +168,7 @@ Status IfImpl::AllocateOutputTensors() { if (!tensor) return ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "Failed to create output tensor for ", graph_output->Name()); - outputs_.push_back({AllocationType::IfOutput, *context_.GetOutputMLValue(index)}); + outputs_.emplace_back(AllocationType::IfOutput, *context_.GetOutputMLValue(index)); } ++index; @@ -185,7 +185,7 @@ Status IfImpl::CreateFeedsFetchesManager(std::unique_ptr& f ffi.feed_names.reserve(num_inputs); ffi.feeds_mlvalue_idxs.reserve(num_inputs); - auto& mlvalue_name_idx_map = session_state_.GetMLValueNameIdxMap(); + auto& ort_value_name_idx_map = session_state_.GetMLValueNameIdxMap(); // pass in implicit inputs as feeds. for (auto& entry : implicit_inputs_) { @@ -193,15 +193,15 @@ Status IfImpl::CreateFeedsFetchesManager(std::unique_ptr& f // alternatively we could track implicit inputs on a per-attribute basis in the node, but that // would make that tracking a bit more complicated. 
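// ---------------------------------------------------------------------------
// Illustrative aside (not part of the patch): "implicit inputs" of a control-flow node are
// outer-scope values that its subgraph reads without declaring them as formal subgraph
// inputs. A sketch in pseudo-ONNX terms:
//
//   outer graph:  W (initializer), cond
//                 Y = If(cond)           // then/else branches are subgraph attributes
//   then branch:  out = Identity(W)      // W is captured implicitly from the outer scope
//
// Those captured values are forwarded to the subgraph run as extra feeds, which is what
// the name -> OrtValue-index lookup below prepares.
// ---------------------------------------------------------------------------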
int idx; - if (mlvalue_name_idx_map.GetIdx(entry.first, idx).IsOK()) { + if (ort_value_name_idx_map.GetIdx(entry.first, idx).IsOK()) { ffi.feed_names.push_back(entry.first); ffi.feeds_mlvalue_idxs.push_back(idx); } } ffi.output_names = subgraph_output_names_; - ORT_RETURN_IF_ERROR(FeedsFetchesInfo::MapNamesToMLValueIdxs(ffi.output_names, mlvalue_name_idx_map, - ffi.fetches_mlvalue_idxs)); + ORT_RETURN_IF_ERROR( + FeedsFetchesInfo::MapNamesToMLValueIdxs(ffi.output_names, ort_value_name_idx_map, ffi.fetches_mlvalue_idxs)); ffm = std::make_unique(std::move(ffi)); @@ -212,7 +212,7 @@ Status IfImpl::Execute(FeedsFetchesManager* ffm, const FeedsFetchesManager* cach Status status = Status::OK(); auto num_inputs = implicit_inputs_.size(); - std::vector feeds; + std::vector feeds; feeds.reserve(num_inputs); // pass in implicit inputs as feeds. @@ -220,13 +220,12 @@ Status IfImpl::Execute(FeedsFetchesManager* ffm, const FeedsFetchesManager* cach auto& feed_names = cached_ffm ? cached_ffm->GetFeedsFetchesInfo().feed_names : ffm->GetFeedsFetchesInfo().feed_names; for (auto& feed_name : feed_names) { const auto* feed_mlvalue = implicit_inputs_[feed_name]; - ORT_ENFORCE(feed_mlvalue, "All implicit inputs should have MLValue instances by now. ", - feed_name, " did not."); + ORT_ENFORCE(feed_mlvalue, "All implicit inputs should have OrtValue instances by now. ", feed_name, " did not."); feeds.push_back(*feed_mlvalue); } - std::vector fetches; + std::vector fetches; std::unordered_map fetch_allocators; fetches.reserve(num_outputs_); @@ -236,18 +235,16 @@ Status IfImpl::Execute(FeedsFetchesManager* ffm, const FeedsFetchesManager* cach if (outputs_[i].first == AllocationType::Delayed) { // functor to forward the allocation request from the subgraph to the If node's context so that the // allocation plan for the If node's output is used. 
- fetch_allocators[i] = - [this, i](const TensorShape& shape, MLValue& mlvalue) { - // allocate - auto* tensor = context_.Output(i, shape); - - if (!tensor) - return ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "Failed to create output tensor for If output ", i); - - // return MLValue for allocated tensor - mlvalue = *context_.GetOutputMLValue(i); - return Status::OK(); - }; + fetch_allocators[i] = [this, i](const TensorShape& shape, OrtValue& ort_value) { + // allocate + auto* tensor = context_.Output(i, shape); + + if (!tensor) return ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "Failed to create output tensor for If output ", i); + + // return OrtValue for allocated tensor + ort_value = *context_.GetOutputMLValue(i); + return Status::OK(); + }; } } diff --git a/onnxruntime/core/providers/cpu/controlflow/loop.cc b/onnxruntime/core/providers/cpu/controlflow/loop.cc index 1361ab4d21847..7ce6e8c1528a1 100644 --- a/onnxruntime/core/providers/cpu/controlflow/loop.cc +++ b/onnxruntime/core/providers/cpu/controlflow/loop.cc @@ -102,11 +102,11 @@ class LoopImpl { Status Execute(FeedsFetchesManager* ffm, const FeedsFetchesManager* cached_ffm); private: - void CreateInitialFeeds(std::vector& feeds); - void SaveOutputsAndUpdateFeeds(const std::vector& last_outputs, std::vector& next_inputs); + void CreateInitialFeeds(std::vector& feeds); + void SaveOutputsAndUpdateFeeds(const std::vector& last_outputs, std::vector& next_inputs); // create the single Loop output from a collection of per-iteration outputs - Status ConcatenateLoopOutput(std::vector& per_iteration_output, int output_index); + Status ConcatenateLoopOutput(std::vector& per_iteration_output, int output_index); OpKernelContextInternal& context_; const SessionState& session_state_; @@ -119,17 +119,17 @@ class LoopImpl { int num_subgraph_inputs_; int num_outputs_; - std::unordered_map implicit_inputs_; + std::unordered_map implicit_inputs_; - MLValue iter_num_mlvalue_; - MLValue condition_mlvalue_; + OrtValue iter_num_mlvalue_; + OrtValue condition_mlvalue_; std::vector subgraph_input_names_; std::vector subgraph_output_names_; - // collection of MLValue outputs from each loop iteration for the loop outputs. + // collection of OrtValue outputs from each loop iteration for the loop outputs. 
// the order from the subgraph matches the order from the loop output - std::vector> loop_output_tensors_; + std::vector> loop_output_tensors_; }; Status Loop::Compute(OpKernelContext* ctx) const { @@ -166,7 +166,7 @@ LoopImpl::LoopImpl(OpKernelContextInternal& context, } template -static MLValue MakeScalarMLValue(AllocatorPtr& allocator, T value) { +static OrtValue MakeScalarMLValue(AllocatorPtr& allocator, T value) { auto* data_type = DataTypeImpl::GetType(); std::unique_ptr p_tensor = std::make_unique(data_type, TensorShape({1}), @@ -174,9 +174,8 @@ static MLValue MakeScalarMLValue(AllocatorPtr& allocator, T value) { *p_tensor->MutableData() = value; - return MLValue{p_tensor.release(), - DataTypeImpl::GetType(), - DataTypeImpl::GetType()->GetDeleteFunc()}; + return OrtValue{p_tensor.release(), DataTypeImpl::GetType(), + DataTypeImpl::GetType()->GetDeleteFunc()}; } Status LoopImpl::Initialize() { @@ -245,7 +244,7 @@ Status LoopImpl::CreateFeedsFetchesManager(std::unique_ptr& return status; } -void LoopImpl::CreateInitialFeeds(std::vector& feeds) { +void LoopImpl::CreateInitialFeeds(std::vector& feeds) { auto num_implicit_inputs = implicit_inputs_.size(); feeds.reserve(num_subgraph_inputs_ + num_implicit_inputs); @@ -260,13 +259,13 @@ void LoopImpl::CreateInitialFeeds(std::vector& feeds) { // pass in implicit inputs as feeds. for (auto& entry : implicit_inputs_) { - ORT_ENFORCE(entry.second, "All implicit inputs should have MLValue instances by now. ", - entry.first, " did not."); + ORT_ENFORCE(entry.second, "All implicit inputs should have OrtValue instances by now. ", entry.first, " did not."); feeds.push_back(*entry.second); } } -void LoopImpl::SaveOutputsAndUpdateFeeds(const std::vector& last_outputs, std::vector& next_inputs) { +void LoopImpl::SaveOutputsAndUpdateFeeds(const std::vector& last_outputs, + std::vector& next_inputs) { // last_output: cond, loop vars..., loop output... // next_input: iter_num, cond, loop_vars. iter_num is re-used @@ -281,7 +280,7 @@ void LoopImpl::SaveOutputsAndUpdateFeeds(const std::vector& last_output } } -Status LoopImpl::ConcatenateLoopOutput(std::vector& per_iteration_output, int output_index) { +Status LoopImpl::ConcatenateLoopOutput(std::vector& per_iteration_output, int output_index) { const auto& first_output = per_iteration_output.front().Get(); size_t bytes_per_iteration = first_output.Size(); const auto& per_iteration_shape = first_output.Shape(); @@ -301,8 +300,8 @@ Status LoopImpl::ConcatenateLoopOutput(std::vector& per_iteration_outpu output->Size()); for (int64_t i = 0; i < num_iterations; ++i) { - auto& mlvalue = per_iteration_output[i]; - auto& iteration_data = mlvalue.Get(); + auto& ort_value = per_iteration_output[i]; + auto& iteration_data = ort_value.Get(); // sanity check if (bytes_per_iteration != iteration_data.Size()) { @@ -322,8 +321,8 @@ Status LoopImpl::ConcatenateLoopOutput(std::vector& per_iteration_outpu Status LoopImpl::Execute(FeedsFetchesManager* ffm, const FeedsFetchesManager* cached_ffm) { auto status = Status::OK(); - std::vector feeds; - std::vector fetches; + std::vector feeds; + std::vector fetches; CreateInitialFeeds(feeds); @@ -360,7 +359,7 @@ Status LoopImpl::Execute(FeedsFetchesManager* ffm, const FeedsFetchesManager* ca // As the loop carried variables may change shape across iterations there's no way to avoid a copy // as we need the final shape. 
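// ---------------------------------------------------------------------------
// Illustrative aside (not part of the patch): ConcatenateLoopOutput above stacks the
// per-iteration OrtValues into one output of shape {num_iterations, ...per_iteration_shape}
// by copying each iteration's bytes at its offset. The same idea on raw byte buffers
// (the function name and std::vector buffers are illustrative only):
#include <cstddef>
#include <cstring>
#include <vector>

std::vector<unsigned char> ConcatIterations(const std::vector<std::vector<unsigned char>>& per_iteration) {
  const std::size_t bytes_per_iteration = per_iteration.empty() ? 0 : per_iteration.front().size();
  std::vector<unsigned char> output(bytes_per_iteration * per_iteration.size());
  for (std::size_t i = 0; i < per_iteration.size(); ++i) {
    // every iteration must produce the same shape, hence the same byte count
    std::memcpy(output.data() + i * bytes_per_iteration, per_iteration[i].data(), bytes_per_iteration);
  }
  return output;
}
// ---------------------------------------------------------------------------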
- auto copy_tensor_from_mlvalue_to_output = [this](const MLValue& input, int output_idx) { + auto copy_tensor_from_mlvalue_to_output = [this](const OrtValue& input, int output_idx) { auto& data = input.Get(); Tensor* output = context_.Output(output_idx, data.Shape()); auto src = gsl::make_span(static_cast(data.DataRaw()), data.Size()); @@ -371,7 +370,7 @@ Status LoopImpl::Execute(FeedsFetchesManager* ffm, const FeedsFetchesManager* ca // copy to Loop output if (iter_num_value != 0) { for (int i = 0; i < num_loop_carried_vars_; ++i) { - // need to allocate Loop output and copy MLValue from fetches + // need to allocate Loop output and copy OrtValue from fetches copy_tensor_from_mlvalue_to_output(fetches[i + 1], i); // skip cond } diff --git a/onnxruntime/core/providers/cpu/controlflow/scan_8.cc b/onnxruntime/core/providers/cpu/controlflow/scan_8.cc index 4cdb97055327b..cae618e04eaa1 100644 --- a/onnxruntime/core/providers/cpu/controlflow/scan_8.cc +++ b/onnxruntime/core/providers/cpu/controlflow/scan_8.cc @@ -108,8 +108,8 @@ class Scan8Impl { Status AllocateOutputTensors(); Status CreateLoopStateVariables(std::vector>& loop_state_variables); - using ConstTensorSlicerIterators = std::vector::Iterator>; - using MutableTensorSlicerIterators = std::vector::Iterator>; + using ConstTensorSlicerIterators = std::vector::Iterator>; + using MutableTensorSlicerIterators = std::vector::Iterator>; OpKernelContextInternal& context_; const SessionState& session_state_; @@ -129,7 +129,7 @@ class Scan8Impl { std::vector subgraph_output_names_; std::vector> output_iterators_; - std::unordered_map implicit_inputs_; + std::unordered_map implicit_inputs_; }; template <> @@ -212,9 +212,9 @@ static const Tensor& GetSubgraphInputTensor(const OpKernelContext& context, int return *context.Input(index + 1); } -// get the Scan input that is used in a call to the subgraph as an MLValue, +// get the Scan input that is used in a call to the subgraph as an OrtValue, // skipping over the optional arg to the Scan operator -static const MLValue& GetSubgraphInputMLValue(const OpKernelContextInternal& context, int index) { +static const OrtValue& GetSubgraphInputMLValue(const OpKernelContextInternal& context, int index) { // skip the optional sequence_lens input return *context.GetInputMLValue(index + 1); } @@ -294,8 +294,8 @@ Status Scan8Impl::ValidateInput() { auto d = sequence_lens_tensor_->DataAsSpan(); sequence_lens_.assign(d.cbegin(), d.cend()); - if (std::all_of(sequence_lens_.cbegin(), sequence_lens_.cend(), - [this](int64_t value) { return value > 0 && value <= max_sequence_len_; }) == false) { + if (!std::all_of(sequence_lens_.cbegin(), sequence_lens_.cend(), + [this](int64_t value) { return value > 0 && value <= max_sequence_len_; })) { return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT, "Invalid entries in sequence_lens. Max sequence length was ", max_sequence_len_); } @@ -341,17 +341,17 @@ Status Scan8Impl::CreateLoopStateVariables(std::vector::Iterator> loop_state_input_iterators; + std::vector::Iterator> loop_state_input_iterators; loop_state_input_iterators.reserve(num_loop_state_variables_); // create the input and output slice iterator for each loop state variable. 
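// ---------------------------------------------------------------------------
// Illustrative aside (not part of the patch): the MLValueTensorSlicer iterators used below
// walk a tensor one step at a time along the sequence axis without copying, optionally in
// reverse. A simplified standalone sketch, assuming a flat buffer with logical shape
// {seq_len, elems_per_step} (names are illustrative only):
#include <cstddef>
#include <vector>

// Returns a pointer to the start of step `t` (or of step `seq_len - 1 - t` when reversed).
const float* SliceAtStep(const std::vector<float>& flat, std::size_t seq_len,
                         std::size_t elems_per_step, std::size_t t, bool reverse) {
  const std::size_t step = reverse ? (seq_len - 1 - t) : t;
  return flat.data() + step * elems_per_step;
}
// ---------------------------------------------------------------------------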
for (int i = 0; i < num_loop_state_variables_; ++i) { - const MLValue& mlvalue = GetSubgraphInputMLValue(context_, i); - MLValue* p_mlvalue = context_.GetOutputMLValue(i); + const OrtValue& ort_value = GetSubgraphInputMLValue(context_, i); + OrtValue* p_mlvalue = context_.GetOutputMLValue(i); - ORT_ENFORCE(p_mlvalue, "Output MLValue has not been created for loop state variable output ", i); + ORT_ENFORCE(p_mlvalue, "Output OrtValue has not been created for loop state variable output ", i); - loop_state_input_iterators.push_back(MLValueTensorSlicer::Create(mlvalue).begin()); + loop_state_input_iterators.push_back(MLValueTensorSlicer::Create(ort_value).begin()); } batch_loop_state_variables.clear(); @@ -370,7 +370,7 @@ Status Scan8Impl::CreateLoopStateVariables(std::vector::Iterator> scan_input_stream_iterators; + // Setup input OrtValue streams + std::vector::Iterator> scan_input_stream_iterators; scan_input_stream_iterators.reserve(num_variadic_inputs_ - num_loop_state_variables_); for (int i = num_loop_state_variables_, end = num_variadic_inputs_; i < end; ++i) { - const auto& mlvalue = GetSubgraphInputMLValue(context_, i); + const auto& ort_value = GetSubgraphInputMLValue(context_, i); // forward if (directions_[i - num_loop_state_variables_] == static_cast(ScanDirection::kForward)) { // the iterator is self contained, so we don't need to keep the MLValueTensorSlicer instance around - scan_input_stream_iterators.push_back(MLValueTensorSlicer::Create(mlvalue, 1, b).begin()); + scan_input_stream_iterators.push_back(MLValueTensorSlicer::Create(ort_value, 1, b).begin()); } else { // reverse - scan_input_stream_iterators.push_back(MLValueTensorSlicer::Create(mlvalue, 1, b).rbegin()); + scan_input_stream_iterators.push_back(MLValueTensorSlicer::Create(ort_value, 1, b).rbegin()); // need to skip past the empty entries at the end of the input if sequence length is short auto offset = max_sequence_len_ - sequence_len; if (offset > 0) { diff --git a/onnxruntime/core/providers/cpu/controlflow/scan_9.cc b/onnxruntime/core/providers/cpu/controlflow/scan_9.cc index 52893afae4a2e..d8cb4c96d3f30 100644 --- a/onnxruntime/core/providers/cpu/controlflow/scan_9.cc +++ b/onnxruntime/core/providers/cpu/controlflow/scan_9.cc @@ -130,8 +130,8 @@ class ScanImpl { Status CreateLoopStateVariables(std::vector& loop_state_variables); Status TransposeOutput(); - using ConstTensorSlicerIterators = std::vector::Iterator>; - using MutableTensorSlicerIterators = std::vector::Iterator>; + using ConstTensorSlicerIterators = std::vector::Iterator>; + using MutableTensorSlicerIterators = std::vector::Iterator>; OpKernelContextInternal& context_; const SessionState& session_state_; @@ -152,12 +152,12 @@ class ScanImpl { std::vector input_axes_; // inputs for graph. 
either original input value or transposed input if an axis other than 0 was specified - std::vector inputs_; + std::vector inputs_; std::vector subgraph_output_names_; std::vector> output_iterators_; - std::unordered_map implicit_inputs_; + std::unordered_map implicit_inputs_; }; template <> @@ -345,7 +345,7 @@ Status ScanImpl::SetupInputs() { auto& input_tensor = *context_.Input(i + num_loop_state_variables_); const auto& input_shape = input_tensor.Shape(); - std::vector permutations; + std::vector permutations; std::vector new_shape; CalculateTransposedShapeForInput(input_shape, sequence_dim, permutations, new_shape); @@ -354,7 +354,7 @@ Status ScanImpl::SetupInputs() { ORT_RETURN_IF_ERROR(status); } - MLValue transpose_output = scan::detail::AllocateTensorInMLValue(input_tensor.DataType(), new_shape, alloc); + OrtValue transpose_output = scan::detail::AllocateTensorInMLValue(input_tensor.DataType(), new_shape, alloc); status = TransposeBase::DoTranspose(permutations, input_tensor, *transpose_output.GetMutable()); ORT_RETURN_IF_ERROR(status); @@ -410,11 +410,11 @@ Status ScanImpl::CreateLoopStateVariables(std::vector& loop_s loop_state_variables.reserve(num_loop_state_variables_); for (int i = 0; i < num_loop_state_variables_; ++i) { - const MLValue& input_mlvalue = *context_.GetInputMLValue(i); - MLValue* output_mlvalue = context_.GetOutputMLValue(i); - ORT_ENFORCE(output_mlvalue, "Output MLValue has not been created for loop state variable output ", i); + const OrtValue& input_mlvalue = *context_.GetInputMLValue(i); + OrtValue* output_mlvalue = context_.GetOutputMLValue(i); + ORT_ENFORCE(output_mlvalue, "Output OrtValue has not been created for loop state variable output ", i); - loop_state_variables.push_back(LoopStateVariable(input_mlvalue, *output_mlvalue, sequence_len_, alloc)); + loop_state_variables.emplace_back(input_mlvalue, *output_mlvalue, sequence_len_, alloc); } return status; @@ -433,19 +433,19 @@ Status ScanImpl::Execute(FeedsFetchesManager* ffm, const FeedsFetchesManager* ca status = CreateLoopStateVariables(loop_state_variables); ORT_RETURN_IF_ERROR(status); - // Setup input MLValue streams - std::vector::Iterator> scan_input_stream_iterators; + // Setup input OrtValue streams + std::vector::Iterator> scan_input_stream_iterators; scan_input_stream_iterators.reserve(num_variadic_inputs_ - num_loop_state_variables_); for (int i = 0, end = num_scan_inputs_; i < end; ++i) { - const auto& mlvalue = inputs_[i]; + const auto& ort_value = inputs_[i]; // forward if (input_directions_[i] == static_cast(ScanDirection::kForward)) { // the iterator is self contained, so we don't need to keep the MLValueTensorSlicer instance around - scan_input_stream_iterators.push_back(MLValueTensorSlicer::Create(mlvalue).begin()); + scan_input_stream_iterators.push_back(MLValueTensorSlicer::Create(ort_value).begin()); } else { // reverse - scan_input_stream_iterators.push_back(MLValueTensorSlicer::Create(mlvalue).rbegin()); + scan_input_stream_iterators.push_back(MLValueTensorSlicer::Create(ort_value).rbegin()); } } @@ -469,7 +469,7 @@ Status ScanImpl::TransposeOutput() { if (axis != 0) { auto output_index = i + num_loop_state_variables_; - const MLValue& temporary_output_mlvalue = output_iterators_[output_index]->GetOutput(); + const OrtValue& temporary_output_mlvalue = output_iterators_[output_index]->GetOutput(); const Tensor& temporary_output_tensor = temporary_output_mlvalue.Get(); int64_t output_rank = temporary_output_tensor.Shape().NumDimensions(); @@ -481,7 +481,7 @@ Status 
ScanImpl::TransposeOutput() { return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT, "Invalid value in scan_output_axes for output ", i, " of ", axis, ". Output tensor rank was ", output_rank); - std::vector permutations; + std::vector permutations; std::vector new_shape; CalculateTransposedShapeForOutput(temporary_output_tensor.Shape(), axis, permutations, new_shape); diff --git a/onnxruntime/core/providers/cpu/controlflow/scan_utils.cc b/onnxruntime/core/providers/cpu/controlflow/scan_utils.cc index 1f1d51fa2dbe6..92c97a6b3dbfa 100644 --- a/onnxruntime/core/providers/cpu/controlflow/scan_utils.cc +++ b/onnxruntime/core/providers/cpu/controlflow/scan_utils.cc @@ -96,11 +96,10 @@ Status AllocateOutput(OpKernelContextInternal& context, const GraphViewer& subgr return Status::OK(); } -Status CreateFeedsFetchesManager(const GraphViewer& subgraph, - int num_variadic_inputs, - std::unordered_map& implicit_inputs, +Status CreateFeedsFetchesManager(const GraphViewer& subgraph, int num_variadic_inputs, + std::unordered_map& implicit_inputs, std::vector& subgraph_output_names, - const MLValueNameIdxMap& mlvalue_name_idx_map, + const MLValueNameIdxMap& ort_value_name_idx_map, std::unique_ptr& ffm) { auto* graph_inputs = &subgraph.GetInputsIncludingInitializers(); if (static_cast(num_variadic_inputs) < graph_inputs->size()) { @@ -127,30 +126,25 @@ Status CreateFeedsFetchesManager(const GraphViewer& subgraph, } FeedsFetchesInfo ffi(feed_names, subgraph_output_names); - auto status = FeedsFetchesManager::Create(feed_names, subgraph_output_names, mlvalue_name_idx_map, ffm); + auto status = FeedsFetchesManager::Create(feed_names, subgraph_output_names, ort_value_name_idx_map, ffm); return status; } -Status IterateSequence(OpKernelContextInternal& context, - const SessionState& session_state, +Status IterateSequence(OpKernelContextInternal& context, const SessionState& session_state, std::vector& loop_state_variables, - std::vector::Iterator>& scan_input_stream_iterators, - int64_t seq_length, - int num_loop_state_variables, - int num_variadic_inputs, - int num_variadic_outputs, - std::unordered_map& implicit_inputs, - std::vector>& output_iterators, - FeedsFetchesManager* ffm, + std::vector::Iterator>& scan_input_stream_iterators, + int64_t seq_length, int num_loop_state_variables, int num_variadic_inputs, + int num_variadic_outputs, std::unordered_map& implicit_inputs, + std::vector>& output_iterators, FeedsFetchesManager* ffm, const FeedsFetchesManager* cached_ffm) { Status status = Status::OK(); auto num_implicit_inputs = implicit_inputs.size(); auto num_inputs = num_variadic_inputs + num_implicit_inputs; - std::vector feeds; - std::vector fetches; + std::vector feeds; + std::vector fetches; std::unordered_map fetch_allocators; feeds.resize(num_inputs); @@ -160,7 +154,7 @@ Status IterateSequence(OpKernelContextInternal& context, // first in each iteration though so offset by num_variadic_inputs int i = 0; for (auto& entry : implicit_inputs) { - ORT_ENFORCE(entry.second, "All implicit inputs should have MLValue instances by now. ", entry.first, " did not."); + ORT_ENFORCE(entry.second, "All implicit inputs should have OrtValue instances by now. 
", entry.first, " did not."); feeds[num_variadic_inputs + i] = *entry.second; ++i; } @@ -190,19 +184,18 @@ Status IterateSequence(OpKernelContextInternal& context, auto& iterator = *output_iterators[output]; if (iterator.FinalOutputAllocated()) { - // add MLValue from sliced output - auto& mlvalue = *iterator; - fetches.push_back(mlvalue); + // add OrtValue from sliced output + auto& ort_value = *iterator; + fetches.push_back(ort_value); } else { // use a custom allocator that will forward the allocation request to the Scan context // and add the sequence length dimension. this avoids using a temporary value for the first output - fetch_allocators[output] = - [&iterator](const TensorShape& shape, MLValue& mlvalue) { - return iterator.AllocateSubgraphOutput(shape, mlvalue); - }; + fetch_allocators[output] = [&iterator](const TensorShape& shape, OrtValue& ort_value) { + return iterator.AllocateSubgraphOutput(shape, ort_value); + }; // also need a dummy empty entry in fetches so the order matches the output names - fetches.push_back({}); + fetches.emplace_back(); } } } @@ -239,18 +232,17 @@ Status IterateSequence(OpKernelContextInternal& context, return status; } -MLValue AllocateTensorInMLValue(const MLDataType data_type, const TensorShape& shape, AllocatorPtr& allocator) { +OrtValue AllocateTensorInMLValue(const MLDataType data_type, const TensorShape& shape, AllocatorPtr& allocator) { auto new_tensor = std::make_unique(data_type, shape, allocator); - return MLValue{new_tensor.release(), - DataTypeImpl::GetType(), - DataTypeImpl::GetType()->GetDeleteFunc()}; + return OrtValue{new_tensor.release(), DataTypeImpl::GetType(), + DataTypeImpl::GetType()->GetDeleteFunc()}; }; void CalculateTransposedShapeForInput(const TensorShape& original_shape, int64_t axis, - std::vector& permutations, std::vector& transposed_shape) { + std::vector& permutations, std::vector& transposed_shape) { int64_t rank = original_shape.NumDimensions(); const auto& dims = original_shape.GetDims(); @@ -269,7 +261,7 @@ void CalculateTransposedShapeForInput(const TensorShape& original_shape, int64_t } void CalculateTransposedShapeForOutput(const TensorShape& original_shape, int64_t axis, - std::vector& permutations, std::vector& transposed_shape) { + std::vector& permutations, std::vector& transposed_shape) { int64_t rank = original_shape.NumDimensions(); const auto& dims = original_shape.GetDims(); @@ -290,19 +282,15 @@ void CalculateTransposedShapeForOutput(const TensorShape& original_shape, int64_ } } -LoopStateVariable::LoopStateVariable(const MLValue& original_value, - MLValue& final_value, - const int64_t sequence_len, +LoopStateVariable::LoopStateVariable(const OrtValue& original_value, OrtValue& final_value, const int64_t sequence_len, AllocatorPtr& allocator) - : sequence_len_{sequence_len}, - original_value_{original_value}, - final_value_{final_value} { + : sequence_len_{sequence_len}, original_value_{original_value}, final_value_{final_value} { auto& tensor = original_value.Get(); auto& shape = tensor.Shape(); - // allocate a new Tensor in an MLValue with the same shape and type as the tensor in original_value. - // the Tensor will own the buffer, and the MLValue will own the Tensor. - // the MLValue returned by Input()/Output() gets copied into the execution frame feeds/fetches + // allocate a new Tensor in an OrtValue with the same shape and type as the tensor in original_value. + // the Tensor will own the buffer, and the OrtValue will own the Tensor. 
+ // the OrtValue returned by Input()/Output() gets copied into the execution frame feeds/fetches // with the Tensor being used via a shared_ptr (so remains valid during execution and is cleaned up // automatically at the end). // TODO: Could allocate one large chunk for all the loop state variable buffers in ScanImpl, although that @@ -319,14 +307,14 @@ LoopStateVariable::LoopStateVariable(const MLValue& original_value, } } -const MLValue& LoopStateVariable::Input() const { +const OrtValue& LoopStateVariable::Input() const { if (iteration_num_ == 0) return original_value_; return iteration_num_ % 2 == 1 ? a_ : b_; } -MLValue& LoopStateVariable::Output() { +OrtValue& LoopStateVariable::Output() { if (iteration_num_ + 1 == sequence_len_) { return final_value_; } @@ -443,15 +431,15 @@ Status OutputIterator::AllocateFinalBuffer() { if (is_loop_state_var_) { // only one entry is required as we slice on a single dimension slicer_iterators_.push_back((direction_ == ScanDirection::kForward) - ? MLValueTensorSlicer::Create(*final_output_mlvalue_).begin() - : MLValueTensorSlicer::Create(*final_output_mlvalue_).rbegin()); + ? MLValueTensorSlicer::Create(*final_output_mlvalue_).begin() + : MLValueTensorSlicer::Create(*final_output_mlvalue_).rbegin()); } else { auto batch_size = final_shape_[0]; for (int i = 0; i < batch_size; ++i) { // the slicer handles the sequence dimension (dim 1) so create an entry for each batch slicer_iterators_.push_back((direction_ == ScanDirection::kForward) - ? MLValueTensorSlicer::Create(*final_output_mlvalue_, 1, i).begin() - : MLValueTensorSlicer::Create(*final_output_mlvalue_, 1, i).rbegin()); + ? MLValueTensorSlicer::Create(*final_output_mlvalue_, 1, i).begin() + : MLValueTensorSlicer::Create(*final_output_mlvalue_, 1, i).rbegin()); } } @@ -460,8 +448,8 @@ Status OutputIterator::AllocateFinalBuffer() { // nothing to slice for a loop state var. slice on dimension 0 (sequence) for the scan outputs. if (!is_loop_state_var_) { slicer_iterators_.push_back((direction_ == ScanDirection::kForward) - ? MLValueTensorSlicer::Create(*final_output_mlvalue_).begin() - : MLValueTensorSlicer::Create(*final_output_mlvalue_).rbegin()); + ? 
MLValueTensorSlicer::Create(*final_output_mlvalue_).begin() + : MLValueTensorSlicer::Create(*final_output_mlvalue_).rbegin()); cur_slicer_iterator_ = slicer_iterators_.begin(); } } @@ -469,7 +457,7 @@ Status OutputIterator::AllocateFinalBuffer() { return Status::OK(); } -Status OutputIterator::AllocateSubgraphOutput(const TensorShape& shape, MLValue& mlvalue) { +Status OutputIterator::AllocateSubgraphOutput(const TensorShape& shape, OrtValue& ort_value) { ORT_ENFORCE(!is_concrete_shape_, "If shape was concrete we shouldn't be using a custom allocator"); // update the final shape now that we can fill in the symbolic dimension with an actual value @@ -480,22 +468,22 @@ Status OutputIterator::AllocateSubgraphOutput(const TensorShape& shape, MLValue& status = AllocateFinalBuffer(); ORT_RETURN_IF_ERROR(status); - // get MLValue from operator*() - mlvalue = **this; + // get OrtValue from operator*() + ort_value = **this; return Status::OK(); } -MLValue& OutputIterator::operator*() { +OrtValue& OutputIterator::operator*() { ORT_ENFORCE(cur_iteration_ < num_iterations_); ORT_ENFORCE(is_concrete_shape_, - "Expected AllocateSubgraphOutput to have been called to before we read the MLValue from the iterator."); + "Expected AllocateSubgraphOutput to have been called to before we read the OrtValue from the iterator."); // for v8 both outputs and loop state vars use slicers. for v9 only outputs do if (is_v8_ || !is_loop_state_var_) return **cur_slicer_iterator_; - else - return *final_output_mlvalue_; + + return *final_output_mlvalue_; } OutputIterator& OutputIterator::operator++() { diff --git a/onnxruntime/core/providers/cpu/controlflow/scan_utils.h b/onnxruntime/core/providers/cpu/controlflow/scan_utils.h index ace216ac2b883..1f7ef75b4a7f8 100644 --- a/onnxruntime/core/providers/cpu/controlflow/scan_utils.h +++ b/onnxruntime/core/providers/cpu/controlflow/scan_utils.h @@ -10,7 +10,7 @@ #include "core/framework/allocator.h" #include "core/framework/feeds_fetches_manager.h" #include "core/framework/ml_value.h" -#include "core/framework/mlvalue_tensor_slicer.h" +#include "core/framework/ort_value_tensor_slicer.h" #include "core/graph/onnx_protobuf.h" namespace onnxruntime { @@ -25,19 +25,19 @@ enum class ScanDirection { kForward = 0, kReverse = 1 }; /** -Class to provide input/output MLValue instances for a loop state variable. -The MLValue flips between two internal temporary buffers to minimize copies. +Class to provide input/output OrtValue instances for a loop state variable. +The OrtValue flips between two internal temporary buffers to minimize copies. */ class LoopStateVariable { public: - LoopStateVariable(const MLValue& original_value, MLValue& final_value, const int64_t sequence_len, + LoopStateVariable(const OrtValue& original_value, OrtValue& final_value, int64_t sequence_len, AllocatorPtr& allocator); // get current Input MLValue - const MLValue& Input() const; + const OrtValue& Input() const; // get current Output MLValue - MLValue& Output(); + OrtValue& Output(); // move to next usage of the loop state variable. call after each iteration of the subgraph. 
void Next(); @@ -46,9 +46,9 @@ class LoopStateVariable { int64_t iteration_num_{0}; const int64_t sequence_len_; - // copy original and final value from temporary MLValue provided by iterator - const MLValue original_value_; - MLValue final_value_; + // copy original and final value from temporary OrtValue provided by iterator + const OrtValue original_value_; + OrtValue final_value_; /* we use original_value and final_value once, and alternate between a_ and b_ as input/output for each iteration to avoid copies @@ -60,17 +60,17 @@ class LoopStateVariable { ... seq len - 1 final_value */ - MLValue a_; - MLValue b_; + OrtValue a_; + OrtValue b_; }; /* -Class that co-ordinates writing to slices of the overall Scan output buffer returned by OpKernelContext.Output(i). -If the subgraph has a symbolic dimension in an output it will use a temporary MLValue for the first execution -in order to discover the output shape. Once the shape is known, it will switch to using the overall output buffer +Class that co-ordinates writing to slices of the overall Scan output buffer returned by OpKernelContext.Output(i). +If the subgraph has a symbolic dimension in an output it will use a temporary OrtValue for the first execution +in order to discover the output shape. Once the shape is known, it will switch to using the overall output buffer to avoid copies. -If 'temporary' is true it will use a temporary MLValue for the overall output as well. Set this to true if the output -needs to be transposed before being returned by the Scan operator. The data_type also needs to be provided if +If 'temporary' is true it will use a temporary OrtValue for the overall output as well. Set this to true if the output +needs to be transposed before being returned by the Scan operator. The data_type also needs to be provided if 'temporary' is true to do the allocation. */ class OutputIterator { @@ -89,7 +89,7 @@ class OutputIterator { return iterator->Initialize(); } - MLValue& operator*(); + OrtValue& operator*(); OutputIterator& operator++(); bool FinalOutputAllocated() const { return is_concrete_shape_; } @@ -98,7 +98,7 @@ class OutputIterator { // when the subgraph requests the allocation of the subgraph output, we forward the request to this instance, // allocate the overall output (taking into account the sequence length dimension), // and use a slicer to return the chunk for the subgraph output for this iteration. - Status AllocateSubgraphOutput(const TensorShape& shape, MLValue& mlvalue); + Status AllocateSubgraphOutput(const TensorShape& shape, OrtValue& ort_value); // set the output for the current iteration to zeros. used for short sequence lengths void ZeroOutCurrent() { @@ -106,7 +106,7 @@ class OutputIterator { memset(tensor->MutableDataRaw(), 0, tensor->Size()); } - const MLValue& GetOutput() const { + const OrtValue& GetOutput() const { ORT_ENFORCE(final_output_mlvalue_, "Attempt to retrieve final output before it was set."); return *final_output_mlvalue_; } @@ -138,17 +138,17 @@ class OutputIterator { bool is_concrete_shape_; // one or more slicers for writing to the output - std::vector::Iterator> slicer_iterators_; - std::vector::Iterator>::iterator cur_slicer_iterator_; + std::vector::Iterator> slicer_iterators_; + std::vector::Iterator>::iterator cur_slicer_iterator_; // if true allocate temporary_final_output_mlvalue_ with data_type_ using the temporary allocator // and point final_output_value_ at that. // if false, final_output_value_ is an output from the Scan operator and allocated using the context_. 
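The double-buffering described in the comment block above (original_value → a_ → b_ → a_ → … → final_value) is easy to miss in the diff, so here is a minimal standalone sketch of the buffer-selection rule. Plain `int` members stand in for `OrtValue`, and the even/odd choice in `Output()` is inferred from the alternation table in the class comment, so treat it as illustrative rather than as code from the patch.

```
// Illustrative sketch (not part of the patch): how LoopStateVariable picks its
// input/output buffer each iteration so no per-iteration copies are needed.
#include <cstdint>

struct LoopStateSketch {
  int64_t iteration_num{0};
  int64_t sequence_len{1};
  int original_value{};  // stands in for the OrtValue members of the real class
  int final_value{};
  int a{};
  int b{};

  const int& Input() const {
    if (iteration_num == 0) return original_value;  // first iteration reads the original input
    return iteration_num % 2 == 1 ? a : b;          // afterwards, read whichever scratch buffer was last written
  }

  int& Output() {
    if (iteration_num + 1 == sequence_len) return final_value;  // last iteration writes the real output
    return iteration_num % 2 == 1 ? b : a;          // otherwise alternate between the two scratch buffers
  }

  void Next() { ++iteration_num; }  // advance after each subgraph iteration
};
```

With this rule the output buffer of iteration N is exactly the input buffer of iteration N+1, which is what the a_/b_ table in the class comment spells out.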
bool temporary_; MLDataType data_type_; - MLValue temporary_final_output_mlvalue_; + OrtValue temporary_final_output_mlvalue_; - MLValue* final_output_mlvalue_; + OrtValue* final_output_mlvalue_; }; void ReadDirections(const OpKernelInfo& info, const std::string& attr_name, @@ -160,27 +160,21 @@ Status AllocateOutput(OpKernelContextInternal& context, const GraphViewer& subgr ScanDirection direction = ScanDirection::kForward, bool temporary = false); -Status CreateFeedsFetchesManager(const GraphViewer& subgraph, - int num_variadic_inputs, - std::unordered_map& implicit_inputs, +Status CreateFeedsFetchesManager(const GraphViewer& subgraph, int num_variadic_inputs, + std::unordered_map& implicit_inputs, std::vector& subgraph_output_names, - const MLValueNameIdxMap& mlvalue_name_idx_map, + const MLValueNameIdxMap& ort_value_name_idx_map, std::unique_ptr& ffm); -Status IterateSequence(OpKernelContextInternal& context, - const SessionState& session_state, +Status IterateSequence(OpKernelContextInternal& context, const SessionState& session_state, std::vector& loop_state_variables, - std::vector::Iterator>& scan_input_stream_iterators, - int64_t seq_length, - int num_loop_state_variables, - int num_variadic_inputs, - int num_variadic_outputs, - std::unordered_map& implicit_inputs, - std::vector>& output_iterators, - FeedsFetchesManager* ffm, + std::vector::Iterator>& scan_input_stream_iterators, + int64_t seq_length, int num_loop_state_variables, int num_variadic_inputs, + int num_variadic_outputs, std::unordered_map& implicit_inputs, + std::vector>& output_iterators, FeedsFetchesManager* ffm, const FeedsFetchesManager* cached_ffm); -MLValue AllocateTensorInMLValue(const MLDataType data_type, const TensorShape& shape, AllocatorPtr& allocator); +OrtValue AllocateTensorInMLValue(MLDataType data_type, const TensorShape& shape, AllocatorPtr& allocator); /** Calculate the transpose permutations and shape by shifting the chosen axis TO the first dimension. @@ -190,7 +184,7 @@ e.g. if shape is {2, 3, 4} and axis 1 is chosen the permutations will be {1, 0, if axis 2 is chosen the permutations will be {2, 0, 1} and the output shape will be {4, 2, 3} */ void CalculateTransposedShapeForInput(const TensorShape& original_shape, int64_t axis, - std::vector& permutations, std::vector& transposed_shape); + std::vector& permutations, std::vector& transposed_shape); /** Calculate the transpose permutations and shape by shifting the chosen axis FROM the first dimension. @@ -199,7 +193,7 @@ e.g. 
if shape is {4, 2, 3} and axis 2 is chosen, dimension 0 will move to dimens the permutations will be {1, 2, 0} and output shape will be {2, 3, 4} */ void CalculateTransposedShapeForOutput(const TensorShape& original_shape, int64_t axis, - std::vector& permutations, std::vector& transposed_shape); + std::vector& permutations, std::vector& transposed_shape); } // namespace detail } // namespace scan diff --git a/onnxruntime/core/providers/cpu/cpu_execution_provider.cc b/onnxruntime/core/providers/cpu/cpu_execution_provider.cc index 11e251de43e75..06f9c395dd627 100644 --- a/onnxruntime/core/providers/cpu/cpu_execution_provider.cc +++ b/onnxruntime/core/providers/cpu/cpu_execution_provider.cc @@ -235,6 +235,7 @@ class ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, class ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 9, int64_t, NonZero); class ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 9, string, Where); class ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 9, float, Where); +class ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 9, int32_t, Where); // Opset 10 class ONNX_OPERATOR_KERNEL_CLASS_NAME(kCpuExecutionProvider, kOnnxDomain, 10, StringNormalizer); @@ -496,6 +497,7 @@ void RegisterOnnxOperatorKernels(KernelRegistry& kernel_registry) { BuildKernelCreateInfo, BuildKernelCreateInfo, BuildKernelCreateInfo, + BuildKernelCreateInfo, // Opset 10 BuildKernelCreateInfo, diff --git a/onnxruntime/core/providers/cpu/cpu_provider_factory.cc b/onnxruntime/core/providers/cpu/cpu_provider_factory.cc index be905a8531a24..297f188ae92f8 100644 --- a/onnxruntime/core/providers/cpu/cpu_provider_factory.cc +++ b/onnxruntime/core/providers/cpu/cpu_provider_factory.cc @@ -10,7 +10,7 @@ namespace onnxruntime { struct CpuProviderFactory : IExecutionProviderFactory { CpuProviderFactory(bool create_arena) : create_arena_(create_arena) {} - ~CpuProviderFactory() override {} + ~CpuProviderFactory() override = default; std::unique_ptr CreateProvider() override; private: diff --git a/onnxruntime/core/providers/cpu/math/element_wise_ops.cc b/onnxruntime/core/providers/cpu/math/element_wise_ops.cc index a19038ec7b667..b56ff8f31ab02 100644 --- a/onnxruntime/core/providers/cpu/math/element_wise_ops.cc +++ b/onnxruntime/core/providers/cpu/math/element_wise_ops.cc @@ -4,6 +4,7 @@ #include "core/providers/cpu/math/element_wise_ops.h" #include #include "core/util/math.h" +#include "core/mlas/inc/mlas.h" #include @@ -1032,7 +1033,8 @@ Status Erf::Compute(OpKernelContext* context) const { ORT_ENFORCE(X_ptr != nullptr); auto& X = *X_ptr; auto& Y = *context->Output(0, X.Shape()); - EigenMap(Y) = EigenMap(X).array().erf(); + + MlasComputeErf(X.template Data(), Y.template MutableData(), X.Shape().Size()); return Status::OK(); } diff --git a/onnxruntime/core/providers/cpu/math/element_wise_ops.h b/onnxruntime/core/providers/cpu/math/element_wise_ops.h index 011392541f4ec..7083a6d03036d 100644 --- a/onnxruntime/core/providers/cpu/math/element_wise_ops.h +++ b/onnxruntime/core/providers/cpu/math/element_wise_ops.h @@ -393,9 +393,9 @@ struct Broadcaster { // Scalars are a special case, as it's always a broadcast size_t index = 0; if (dimension_count_min == 0) { - if (shape1.size() == 0) // Shape1 is a scalar + if (shape1.empty()) // Shape1 is a scalar { - if (shape2.size() == 0) // Two scalars? + if (shape2.empty()) // Two scalars? 
{ iterator1_.Init(1, 1); iterator2_.Init(1, 1); diff --git a/onnxruntime/core/providers/cpu/math/logsoftmax.cc b/onnxruntime/core/providers/cpu/math/logsoftmax.cc index 6b3c108e32620..0a0744d0455ee 100644 --- a/onnxruntime/core/providers/cpu/math/logsoftmax.cc +++ b/onnxruntime/core/providers/cpu/math/logsoftmax.cc @@ -15,7 +15,7 @@ Status LogSoftmax::Compute(OpKernelContext* ctx) const { const Tensor* tensor_pointer = ctx->Input(0); if (tensor_pointer == nullptr) return Status(common::ONNXRUNTIME, common::FAIL, "input count mismatch"); const Tensor& X = *tensor_pointer; - const TensorShape input_shape{X.Shape()}; + const TensorShape& input_shape{X.Shape()}; Tensor* Y = ctx->Output(0, input_shape); diff --git a/onnxruntime/core/providers/cpu/math/sign.cc b/onnxruntime/core/providers/cpu/math/sign.cc index fbfa4ce117375..d5f1db1248053 100644 --- a/onnxruntime/core/providers/cpu/math/sign.cc +++ b/onnxruntime/core/providers/cpu/math/sign.cc @@ -46,7 +46,8 @@ template inline T FloatingImpl(T val) { if (std::isnan(val) || val == T(0)) { return T(0); - } else if (val > T(0)) { + } + if (val > T(0)) { return T(1); } else { return T(-1); diff --git a/onnxruntime/core/providers/cpu/math/softmax.cc b/onnxruntime/core/providers/cpu/math/softmax.cc index 61ad9bc71f06e..c85f238305039 100644 --- a/onnxruntime/core/providers/cpu/math/softmax.cc +++ b/onnxruntime/core/providers/cpu/math/softmax.cc @@ -15,7 +15,7 @@ Status Softmax::Compute(OpKernelContext* ctx) const { const Tensor* tensor_pointer = ctx->Input(0); if (tensor_pointer == nullptr) return Status(common::ONNXRUNTIME, common::FAIL, "input count mismatch"); const Tensor& X = *tensor_pointer; - const TensorShape input_shape{X.Shape()}; + const TensorShape& input_shape{X.Shape()}; VLOGS(ctx->Logger(), 2) << "Input tensor shape: " << input_shape; diff --git a/onnxruntime/core/providers/cpu/math/softmax_shared.cc b/onnxruntime/core/providers/cpu/math/softmax_shared.cc index 8ee6ec79c2a02..7dd3a10cfc598 100644 --- a/onnxruntime/core/providers/cpu/math/softmax_shared.cc +++ b/onnxruntime/core/providers/cpu/math/softmax_shared.cc @@ -25,6 +25,7 @@ #pragma warning(disable : 4996) #endif #include +#include #ifdef _MSC_VER #pragma warning(pop) #endif @@ -79,7 +80,7 @@ common::Status SoftmaxCPU(const int64_t N, } } else { for (int i = 0; i < N; ++i) { - auto log_fmaxf_scale_i = log(fmaxf(scale[i], 1e-20f)); + auto log_fmaxf_scale_i = std::log(fmaxf(scale[i], 1e-20f)); for (int j = 0; j < D; ++j) { Ydata[i * D + j] = Xdata[i * D + j] - rowmax[i] - log_fmaxf_scale_i; } diff --git a/onnxruntime/core/providers/cpu/math/softmax_shared.h b/onnxruntime/core/providers/cpu/math/softmax_shared.h index ab107c8370a38..3439b9717f051 100644 --- a/onnxruntime/core/providers/cpu/math/softmax_shared.h +++ b/onnxruntime/core/providers/cpu/math/softmax_shared.h @@ -17,12 +17,6 @@ Calculate Softmax using CPU memory. @param logarithmic If true, compute LogSoftmax. If false compute Softmax. @param rowmax Storage for calculation of maximum in each row. Size must be >= N. 
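The `std::log` change above sits inside SoftmaxCPU's `logarithmic` branch, which computes a max-subtracted LogSoftmax per row: `Y[i][j] = X[i][j] - rowmax[i] - log(max(scale[i], 1e-20f))`, where `scale[i]` is the row's sum of `exp(X[i][j] - rowmax[i])`. A minimal sketch of that computation, assuming N x D row-major float buffers as in the SoftmaxCPU signature and ignoring the non-logarithmic branch:

```
// Illustrative sketch (not the actual SoftmaxCPU helper): numerically stable
// row-wise LogSoftmax matching the formula in the logarithmic branch.
#include <algorithm>
#include <cmath>
#include <cstdint>

void LogSoftmaxRows(int64_t N, int64_t D, const float* Xdata, float* Ydata) {
  for (int64_t i = 0; i < N; ++i) {
    const float* x = Xdata + i * D;
    float* y = Ydata + i * D;
    const float rowmax = *std::max_element(x, x + D);  // subtract the row max for numerical stability
    float scale = 0.0f;                                 // sum of exp(x - rowmax) for this row
    for (int64_t j = 0; j < D; ++j) scale += std::exp(x[j] - rowmax);
    const float log_scale = std::log(std::max(scale, 1e-20f));  // clamp before the log, as the patch does
    for (int64_t j = 0; j < D; ++j) y[j] = x[j] - rowmax - log_scale;
  }
}
```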
*/ -common::Status SoftmaxCPU(const int64_t N, - const int64_t D, - const float* Xdata, - float* Ydata, - float* scale, - const float* sum_multiplier, - bool logarithmic, - float* rowmax); +common::Status SoftmaxCPU(int64_t N, int64_t D, const float* Xdata, float* Ydata, float* scale, + const float* sum_multiplier, bool logarithmic, float* rowmax); } // namespace onnxruntime diff --git a/onnxruntime/core/providers/cpu/ml/cast_map.cc b/onnxruntime/core/providers/cpu/ml/cast_map.cc index 489fd8171379c..d181205953ff5 100644 --- a/onnxruntime/core/providers/cpu/ml/cast_map.cc +++ b/onnxruntime/core/providers/cpu/ml/cast_map.cc @@ -118,7 +118,8 @@ Status CastMap::ComputeImpl(OpKernelContext& context, TTo pad_value) const { }); } else { // sparse map puts pad_value in all entries that aren't present in the input, up to map_max_ - auto cur_input = X.cbegin(), end_input = X.cend(); + auto cur_input = X.cbegin(); + auto end_input = X.cend(); auto out_end = out.end(); int64_t cur_idx = 0; diff --git a/onnxruntime/core/providers/cpu/ml/feature_vectorizer.cc b/onnxruntime/core/providers/cpu/ml/feature_vectorizer.cc index ae2e5aa5630b0..b7d2435c652dd 100644 --- a/onnxruntime/core/providers/cpu/ml/feature_vectorizer.cc +++ b/onnxruntime/core/providers/cpu/ml/feature_vectorizer.cc @@ -28,10 +28,9 @@ static void CopyWithCast(typename gsl::span::const_iterator begin, gsl::span::iterator out_iter); Status FeatureVectorizer::Compute(OpKernelContext* context) const { - auto input_count = context->NumVariadicInputs(0); - ORT_ENFORCE(input_count == input_dimensions_.size(), - "Number of inputs (", input_count, ") does not match number of inputdimensions values (", - input_dimensions_.size(), ")."); + int input_count = context->NumVariadicInputs(0); + ORT_ENFORCE(input_count >= 0 && static_cast(input_count) == input_dimensions_.size(), "Number of inputs (", + input_count, ") does not match number of inputdimensions values (", input_dimensions_.size(), ")."); const Tensor* tensor_pointer = context->Input(0); if (tensor_pointer == nullptr) return Status(common::ONNXRUNTIME, common::FAIL, "input count mismatch"); diff --git a/onnxruntime/core/providers/cpu/ml/imputer.cc b/onnxruntime/core/providers/cpu/ml/imputer.cc index 9b64f3d6d599c..4511a6fca5b80 100644 --- a/onnxruntime/core/providers/cpu/ml/imputer.cc +++ b/onnxruntime/core/providers/cpu/ml/imputer.cc @@ -56,9 +56,9 @@ ONNX_CPU_OPERATOR_ML_KERNEL( ImputerOp::ImputerOp(const OpKernelInfo& info) : OpKernel(info), imputed_values_float_(info.GetAttrsOrDefault("imputed_value_floats")), imputed_values_int64_(info.GetAttrsOrDefault("imputed_value_int64s")) { - if (imputed_values_float_.size() && !info.GetAttr("replaced_value_float", &replaced_value_float_).IsOK()) + if (!imputed_values_float_.empty() && !info.GetAttr("replaced_value_float", &replaced_value_float_).IsOK()) ORT_THROW("Expected 'replaced_value_float' attribute since 'imputed_value_floats' is specified"); - if (imputed_values_int64_.size() && !info.GetAttr("replaced_value_int64", &replaced_value_int64_).IsOK()) + if (!imputed_values_int64_.empty() && !info.GetAttr("replaced_value_int64", &replaced_value_int64_).IsOK()) ORT_THROW("Expected 'replace_value_int64' attribute since 'imputed_values_int64' is specified"); ORT_ENFORCE(imputed_values_float_.empty() ^ imputed_values_int64_.empty(), "Must provide imputed_values_float_ or imputed_values_int64_ but not both."); @@ -118,7 +118,8 @@ common::Status ImputerOp::Compute(OpKernelContext* context) const { auto input_type = input_tensor_ptr->DataType(); if 
(input_type == DataTypeImpl::GetType()) { return ComputeByType(context, replaced_value_float_, imputed_values_float_); - } else if (input_type == DataTypeImpl::GetType()) { + } + if (input_type == DataTypeImpl::GetType()) { return ComputeByType(context, replaced_value_int64_, imputed_values_int64_); } else { return Status(ONNXRUNTIME, INVALID_ARGUMENT, "Invalid type"); diff --git a/onnxruntime/core/providers/cpu/ml/linearclassifier.cc b/onnxruntime/core/providers/cpu/ml/linearclassifier.cc index 70486c66d4f89..e600a9af438a0 100644 --- a/onnxruntime/core/providers/cpu/ml/linearclassifier.cc +++ b/onnxruntime/core/providers/cpu/ml/linearclassifier.cc @@ -85,7 +85,7 @@ Status LinearClassifier::Compute(OpKernelContext* ctx) const { size_t current_weight_0 = i * stride; int maxclass = -1; float maxweight = 0.f; - for (int j = 0; j < class_count; j++) //for each class + for (int j = 0; j < class_count_; j++) // for each class { size_t current_coeff_0 = j * stride; float weight = 0.f; diff --git a/onnxruntime/core/providers/cpu/ml/linearregressor.cc b/onnxruntime/core/providers/cpu/ml/linearregressor.cc index 37aa5deef11a0..6b88d5561186d 100644 --- a/onnxruntime/core/providers/cpu/ml/linearregressor.cc +++ b/onnxruntime/core/providers/cpu/ml/linearregressor.cc @@ -34,7 +34,7 @@ Status LinearRegressor::Compute(OpKernelContext* ctx) const { const auto* Xdata = X->template Data(); int64_t yindex = 0; - bool useIntercepts = intercepts_.size() == static_cast(targets_) ? true : false; + bool useIntercepts = intercepts_.size() == static_cast(targets_); for (int64_t i = 0; i < N; i++) //for each point { std::vector scores; diff --git a/onnxruntime/core/providers/cpu/ml/ml_common.h b/onnxruntime/core/providers/cpu/ml/ml_common.h index 3e8c32f979931..773fd2b662350 100644 --- a/onnxruntime/core/providers/cpu/ml/ml_common.h +++ b/onnxruntime/core/providers/cpu/ml/ml_common.h @@ -28,19 +28,23 @@ enum class NODE_MODE { static inline NODE_MODE MakeTreeNodeMode(const std::string& input) { if (input == "BRANCH_LEQ") { return NODE_MODE::BRANCH_LEQ; - } else if (input == "LEAF") { + } + if (input == "LEAF") { return NODE_MODE::LEAF; - } else if (input == "BRANCH_LT") { + } + if (input == "BRANCH_LT") { return NODE_MODE::BRANCH_LT; - } else if (input == "BRANCH_GTE") { + } + if (input == "BRANCH_GTE") { return NODE_MODE::BRANCH_GTE; - } else if (input == "BRANCH_GT") { + } + if (input == "BRANCH_GT") { return NODE_MODE::BRANCH_GT; - } else if (input == "BRANCH_EQ") { + } + if (input == "BRANCH_EQ") { return NODE_MODE::BRANCH_EQ; - } else { - return NODE_MODE::BRANCH_NEQ; } + return NODE_MODE::BRANCH_NEQ; } enum class POST_EVAL_TRANSFORM { @@ -54,15 +58,17 @@ enum class POST_EVAL_TRANSFORM { static inline POST_EVAL_TRANSFORM MakeTransform(const std::string& input) { if (input == "NONE") { return POST_EVAL_TRANSFORM::NONE; - } else if (input == "LOGISTIC") { + } + if (input == "LOGISTIC") { return POST_EVAL_TRANSFORM::LOGISTIC; - } else if (input == "SOFTMAX") { + } + if (input == "SOFTMAX") { return POST_EVAL_TRANSFORM::SOFTMAX; - } else if (input == "SOFTMAX_ZERO") { + } + if (input == "SOFTMAX_ZERO") { return POST_EVAL_TRANSFORM::SOFTMAX_ZERO; - } else { - return POST_EVAL_TRANSFORM::PROBIT; } + return POST_EVAL_TRANSFORM::PROBIT; } enum class AGGREGATE_FUNCTION { @@ -75,13 +81,14 @@ enum class AGGREGATE_FUNCTION { static inline AGGREGATE_FUNCTION MakeAggregateFunction(const std::string& input) { if (input == "AVERAGE") { return AGGREGATE_FUNCTION::AVERAGE; - } else if (input == "SUM") { + } + if (input == "SUM") { 
return AGGREGATE_FUNCTION::SUM; - } else if (input == "MIN") { + } + if (input == "MIN") { return AGGREGATE_FUNCTION::MIN; - } else { - return AGGREGATE_FUNCTION::MAX; } + return AGGREGATE_FUNCTION::MAX; } enum class CAST_TO { @@ -93,13 +100,14 @@ enum class CAST_TO { static inline CAST_TO MakeCast(const std::string& input) { if (input == "TO_FLOAT") { return CAST_TO::TO_FLOAT; - } else if (input == "TO_STRING") { + } + if (input == "TO_STRING") { return CAST_TO::TO_STRING; - } else if (input == "TO_INT64") { + } + if (input == "TO_INT64") { return CAST_TO::TO_INT64; - } else { - ORT_THROW("Invalid CAST_TO value of ", input, " Expected TO_FLOAT, TO_STRING or TO_INT64"); } + ORT_THROW("Invalid CAST_TO value of ", input, " Expected TO_FLOAT, TO_STRING or TO_INT64"); } enum PACK_MAP { @@ -110,11 +118,11 @@ enum PACK_MAP { static inline PACK_MAP MakePack(const std::string& input) { if (input == "DENSE") { return PACK_MAP::DENSE; - } else if (input == "SPARSE") { + } + if (input == "SPARSE") { return PACK_MAP::SPARSE; - } else { - ORT_THROW("Invalid PACK_MAP value of ", input, " Expected DENSE or SPARSE"); } + ORT_THROW("Invalid PACK_MAP value of ", input, " Expected DENSE or SPARSE"); } enum KERNEL { @@ -127,13 +135,14 @@ enum KERNEL { static inline KERNEL MakeKernel(const std::string& input) { if (input == "LINEAR") { return KERNEL::LINEAR; - } else if (input == "POLY") { + } + if (input == "POLY") { return KERNEL::POLY; - } else if (input == "RBF") { + } + if (input == "RBF") { return KERNEL::RBF; - } else { - return KERNEL::SIGMOID; } + return KERNEL::SIGMOID; } enum NORMALIZE { @@ -145,13 +154,14 @@ enum NORMALIZE { static inline NORMALIZE MakeNormalize(const std::string& input) { if (input == "MAX") { return NORMALIZE::NMAX; - } else if (input == "L1") { + } + if (input == "L1") { return NORMALIZE::L1; - } else if (input == "L2") { + } + if (input == "L2") { return NORMALIZE::L2; - } else { - ORT_THROW("Invalid normalize value of ", input); } + ORT_THROW("Invalid normalize value of ", input); } enum class SVM_TYPE { @@ -287,39 +297,56 @@ static inline void ComputeSoftmaxZero(std::vector& values) { template void write_scores(std::vector& scores, POST_EVAL_TRANSFORM post_transform, int64_t write_index, Tensor* Z, int add_second_class) { - if (post_transform == POST_EVAL_TRANSFORM::PROBIT && scores.size() == 1) { - scores[0] = ComputeProbit(scores[0]); - } else if (scores.size() >= 2) { //multiclass - if (post_transform == POST_EVAL_TRANSFORM::LOGISTIC) { - for (float& score : scores) { - score = ComputeLogistic(score); - } - } else if (post_transform == POST_EVAL_TRANSFORM::SOFTMAX) { - ComputeSoftmax(scores); - } else if (post_transform == POST_EVAL_TRANSFORM::SOFTMAX_ZERO) { - ComputeSoftmaxZero(scores); + if (scores.size() >= 2) { + switch (post_transform) { + case POST_EVAL_TRANSFORM::PROBIT: + for (float& score : scores) + score = ComputeProbit(score); + break; + case POST_EVAL_TRANSFORM::LOGISTIC: + for (float& score : scores) + score = ComputeLogistic(score); + break; + case POST_EVAL_TRANSFORM::SOFTMAX: + ComputeSoftmax(scores); + break; + case POST_EVAL_TRANSFORM::SOFTMAX_ZERO: + ComputeSoftmaxZero(scores); + break; + default: + case POST_EVAL_TRANSFORM::NONE: + break; } - } else { //binary case - if (add_second_class == 0 && scores.size() == 1) { //0=all positive weights, winning class is positive - scores.push_back(scores[0]); - scores[0] = 1.f - scores[0]; //put opposite score in positive slot - } else if (add_second_class == 1 && scores.size() == 1) { //1 = all positive 
weights, winning class is negative - scores.push_back(scores[0]); - scores[0] = 1.f - scores[0]; //put opposite score in positive slot - } else if (add_second_class == 2 && scores.size() == 1) { //2 = mixed weights, winning class is positive - if (post_transform == POST_EVAL_TRANSFORM::LOGISTIC) { - scores.push_back(ComputeLogistic(scores[0])); - scores[0] = ComputeLogistic(-scores[0]); - } else { - scores.push_back(scores[0]); - scores[0] = -scores[0]; - } - } else if (add_second_class == 3 && scores.size() == 1) { //3 = mixed weights, winning class is negative - if (post_transform == POST_EVAL_TRANSFORM::LOGISTIC) { - scores.push_back(ComputeLogistic(scores[0])); - scores[0] = ComputeLogistic(-scores[0]); - } else { - scores.push_back(-scores[0]); + } else if (scores.size() == 1) { //binary case + if (post_transform == POST_EVAL_TRANSFORM::PROBIT) { + scores[0] = ComputeProbit(scores[0]); + } else { + switch (add_second_class) { + case 0: //0=all positive weights, winning class is positive + scores.push_back(scores[0]); + scores[0] = 1.f - scores[0]; //put opposite score in positive slot + break; + case 1: //1 = all positive weights, winning class is negative + scores.push_back(scores[0]); + scores[0] = 1.f - scores[0]; //put opposite score in positive slot + break; + case 2: //2 = mixed weights, winning class is positive + if (post_transform == POST_EVAL_TRANSFORM::LOGISTIC) { + scores.push_back(ComputeLogistic(scores[0])); //ml_logit(scores[k]); + scores[0] = ComputeLogistic(-scores[0]); + } else { + scores.push_back(scores[0]); + scores[0] = -scores[0]; + } + break; + case 3: //3 = mixed weights, winning class is negative + if (post_transform == POST_EVAL_TRANSFORM::LOGISTIC) { + scores.push_back(ComputeLogistic(scores[0])); //ml_logit(scores[k]); + scores[0] = ComputeLogistic(-scores[0]); + } else { + scores.push_back(-scores[0]); + } + break; } } } diff --git a/onnxruntime/core/providers/cpu/ml/svmclassifier.cc b/onnxruntime/core/providers/cpu/ml/svmclassifier.cc index 7624d5c52cede..3bc845fa2044e 100644 --- a/onnxruntime/core/providers/cpu/ml/svmclassifier.cc +++ b/onnxruntime/core/providers/cpu/ml/svmclassifier.cc @@ -75,6 +75,32 @@ SVMClassifier::SVMClassifier(const OpKernelInfo& info) } } +template +int _set_score_svm(Tensor* Y, float max_weight, const int64_t maxclass, const int64_t n, + POST_EVAL_TRANSFORM post_transform_, const std::vector& proba_, bool weights_are_all_positive_, + const std::vector& classlabels, LabelType posclass, LabelType negclass) { + int write_additional_scores = -1; + auto output_data = Y->template MutableData(); + if (classlabels.size() == 2) { + write_additional_scores = post_transform_ == POST_EVAL_TRANSFORM::NONE ? 2 : 0; + if (proba_.size() == 0) { + if (weights_are_all_positive_ && max_weight >= 0.5) + output_data[n] = classlabels[1]; + else if (max_weight > 0 && !weights_are_all_positive_) + output_data[n] = classlabels[1]; + else + output_data[n] = classlabels[maxclass]; + } else { + output_data[n] = classlabels[maxclass]; + } + } else if (max_weight > 0) { + output_data[n] = posclass; + } else { + output_data[n] = negclass; + } + return write_additional_scores; +} + template Status SVMClassifier::Compute(OpKernelContext* ctx) const { const Tensor* X = ctx->Input(0); @@ -83,40 +109,51 @@ Status SVMClassifier::Compute(OpKernelContext* ctx) const { int64_t N = X->Shape().NumDimensions() == 1 ? 
1 : X->Shape()[0]; Tensor* Y = ctx->Output(0, TensorShape({N})); - Tensor* Z; - std::vector dims; - if (mode_ == SVM_TYPE::SVM_SVC && proba_.size() == 0) - dims = {static_cast(N), static_cast(class_count_ * (class_count_ - 1) / 2)}; - else - dims = {static_cast(N), static_cast(class_count_)}; - Z = ctx->Output(1, TensorShape(dims)); + int64_t nb_columns = class_count_; + if (proba_.size() == 0 && vector_count_ > 0) { + if (class_count_ > 2) + nb_columns = class_count_ * (class_count_ - 1) / 2; + else + nb_columns = 2; + } + + std::vector dims{N, nb_columns}; + Tensor* Z = ctx->Output(1, TensorShape(dims)); - const auto* x_data = X->template Data(); + const T* x_data = X->template Data(); int64_t zindex = 0; for (int64_t n = 0; n < N; n++) //for each example { int64_t current_weight_0 = n * stride; int64_t maxclass = -1; - double maxweight = 0.f; std::vector decisions; std::vector scores; std::vector kernels; std::vector votes; - if (mode_ == SVM_TYPE::SVM_SVC) { + if (vector_count_ == 0 && mode_ == SVM_TYPE::SVM_LINEAR) { + for (int64_t j = 0; j < class_count_; j++) { //for each class + auto val = kernel_dot(x_data, current_weight_0, coefficients_, feature_count_ * j, + feature_count_, get_kernel_type()); + val += rho_[0]; + scores.push_back(val); + } + } else { + if (vector_count_ == 0) + return Status(common::ONNXRUNTIME, common::FAIL, "No support vectors."); + int evals = 0; + for (int64_t j = 0; j < vector_count_; j++) { - float val = kernel_dot(x_data, current_weight_0, support_vectors_, feature_count_ * j, feature_count_, get_kernel_type()); + auto val = kernel_dot(x_data, current_weight_0, support_vectors_, feature_count_ * j, + feature_count_, get_kernel_type()); kernels.push_back(val); } - for (int64_t j = 0; j < class_count_; j++) { - votes.push_back(0); - } - int evals = 0; - for (int64_t i = 0; i < class_count_; i++) { //for each class - for (int64_t j = i + 1; j < class_count_; j++) { //for each class - float sum = 0; + votes.resize(class_count_, 0); + for (int64_t i = 0; i < class_count_; i++) { // for each class + for (int64_t j = i + 1; j < class_count_; j++) { // for each class + double sum = 0; int64_t start_index_i = starting_vector_[i]; // *feature_count_; int64_t start_index_j = starting_vector_[j]; // *feature_count_; @@ -125,120 +162,71 @@ Status SVMClassifier::Compute(OpKernelContext* ctx) const { int64_t pos1 = (vector_count_) * (j - 1); int64_t pos2 = (vector_count_) * (i); - for (int64_t m = 0; m < class_i_support_count; m++) { - float val1 = coefficients_[pos1 + start_index_i + m]; - float val2 = kernels[start_index_i + m]; - sum += val1 * val2; - } - for (int64_t m = 0; m < class_j_support_count; m++) { - float val1 = coefficients_[pos2 + start_index_j + m]; - float val2 = kernels[start_index_j + m]; - sum += val1 * val2; - } + const float* val1 = &(coefficients_[pos1 + start_index_i]); + const float* val2 = &(kernels[start_index_i]); + for (int64_t m = 0; m < class_i_support_count; ++m, ++val1, ++val2) + sum += *val1 * *val2; + + val1 = &(coefficients_[pos2 + start_index_j]); + val2 = &(kernels[start_index_j]); + for (int64_t m = 0; m < class_j_support_count; ++m, ++val1, ++val2) + sum += *val1 * *val2; sum += rho_[evals]; - scores.push_back(sum); - if (sum > 0) { - votes[i]++; - } else { - votes[j]++; - } - evals++; //index into rho + scores.push_back((float)sum); + ++(votes[sum > 0 ? 
i : j]); + ++evals; //index into rho } } - } else if (mode_ == SVM_TYPE::SVM_LINEAR) { //liblinear - for (int64_t j = 0; j < class_count_; j++) { //for each class - float val = kernel_dot(x_data, current_weight_0, coefficients_, feature_count_ * j, feature_count_, get_kernel_type()); - val += rho_[0]; - scores.push_back(val); - } } + if (proba_.size() > 0 && mode_ == SVM_TYPE::SVM_SVC) { //compute probabilities from the scores - std::vector estimates; - std::vector probsp2; int64_t num = class_count_ * class_count_; - for (int64_t m = 0; m < num; m++) { - probsp2.push_back(0.f); //min prob - } - for (int64_t m = 0; m < class_count_; m++) { - estimates.push_back(0.f); //min prob - } + std::vector probsp2(num, 0.f); + std::vector estimates(class_count_, 0.f); int64_t index = 0; - for (int64_t i = 0; i < class_count_; i++) { - for (int64_t j = i + 1; j < class_count_; j++) { + for (int64_t i = 0; i < class_count_; ++i) { + int64_t p1 = i * class_count_ + i + 1; + int64_t p2 = (i + 1) * class_count_ + i; + for (int64_t j = i + 1; j < class_count_; ++j, ++index) { float val1 = sigmoid_probability(scores[index], proba_[index], probb_[index]); float val2 = std::max(val1, 1.0e-7f); - probsp2[i * class_count_ + j] = std::min(val2, 1 - 1.0e-7f); - probsp2[j * class_count_ + i] = 1 - probsp2[i * class_count_ + j]; - index++; + val2 = std::min(val2, 1 - 1.0e-7f); + probsp2[p1] = val2; + probsp2[p2] = 1 - val2; + ++p1; + p2 += class_count_; } } multiclass_probability(class_count_, probsp2, estimates); - //copy probabilities back into scores + // copy probabilities back into scores scores.resize(estimates.size()); - for (int64_t k = 0; k < static_cast(estimates.size()); k++) { - scores[k] = estimates[k]; - } + std::copy(estimates.begin(), estimates.end(), scores.begin()); } - int64_t maxvotes = 0; + + float max_weight = 0; if (votes.size() > 0) { - for (int64_t k = 0; k < static_cast(votes.size()); k++) { - if (votes[k] > maxvotes) { - maxvotes = votes[k]; - maxclass = k; - } - } + auto it_maxvotes = std::max_element(votes.begin(), votes.end()); + maxclass = std::distance(votes.begin(), it_maxvotes); } else { - for (int64_t k = 0; k < static_cast(scores.size()); k++) { - if (scores[k] > maxweight) { - maxclass = k; - maxweight = scores[k]; - } - } + auto it_max_weight = std::max_element(scores.begin(), scores.end()); + maxclass = std::distance(scores.begin(), it_max_weight); + max_weight = *it_max_weight; } - //write top class + + // write top class + // onnx specs expects one column per class. 
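The rewritten selection logic above replaces the hand-written max loops with `std::max_element` plus `std::distance`: the winning class is the index of the largest vote (or, when there are no votes, of the largest score, whose value also becomes `max_weight`). Purely as an illustration of that pattern, not code from the patch, it reduces to:

```
// Illustrative argmax: index of the largest entry; ties resolve to the first maximum.
// Assumes a non-empty vector, as in SVMClassifier::Compute.
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <vector>

int64_t ArgMax(const std::vector<float>& scores) {
  auto it_max = std::max_element(scores.begin(), scores.end());
  return std::distance(scores.begin(), it_max);
}
```

The same call works for the `votes` vector; only the element type differs.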
int write_additional_scores = -1; - if (rho_.size() == 1) //binary - { + if (rho_.size() == 1) { if (using_strings_) { - if (classlabels_strings_.size() == 2 && weights_are_all_positive_ && maxweight >= 0.5 && proba_.size() == 0) { - Y->template MutableData()[n] = classlabels_strings_[1]; //positive label - write_additional_scores = 0; - } else if (classlabels_strings_.size() == 2 && maxweight > 0 && !weights_are_all_positive_ && proba_.size() == 0) { - Y->template MutableData()[n] = classlabels_strings_[1]; //positive label - write_additional_scores = 0; - } else if (classlabels_strings_.size() == 2 && proba_.size() > 0) { //this case all classes are in their rightful spot - Y->template MutableData()[n] = classlabels_strings_[maxclass]; //whichever label - write_additional_scores = -1; - } else if (classlabels_strings_.size() == 2) { - Y->template MutableData()[n] = classlabels_strings_[0]; //negative label - write_additional_scores = 1; - } else if (maxweight > 0) { - Y->template MutableData()[n] = "1"; //positive label - } else { - Y->template MutableData()[n] = "0"; //negative label - } - } else //no strings - { - if (classlabels_ints_.size() == 2 && weights_are_all_positive_ && maxweight >= 0.5 && proba_.size() == 0) { - Y->template MutableData()[n] = classlabels_ints_[1]; //positive label - write_additional_scores = 0; - } else if (classlabels_ints_.size() == 2 && maxweight > 0 && !weights_are_all_positive_ && proba_.size() == 0) { - Y->template MutableData()[n] = classlabels_ints_[0]; //pos label - write_additional_scores = 0; - } else if (classlabels_ints_.size() == 2 && proba_.size() > 0) //this case all classes are in their rightful spot - { - Y->template MutableData()[n] = classlabels_ints_[maxclass]; //whichever label - write_additional_scores = -1; - } else if (classlabels_ints_.size() == 2) { - Y->template MutableData()[n] = classlabels_ints_[0]; //negative label - write_additional_scores = 1; - } else if (maxweight > 0) { - Y->template MutableData()[n] = 1; //positive label - } else { - Y->template MutableData()[n] = 0; //negative label - } + write_additional_scores = _set_score_svm( + Y, max_weight, maxclass, n, post_transform_, proba_, + weights_are_all_positive_, classlabels_strings_, "1", "0"); + } else { + write_additional_scores = _set_score_svm( + Y, max_weight, maxclass, n, post_transform_, proba_, + weights_are_all_positive_, classlabels_ints_, 1, 0); } } else { //multiclass if (using_strings_) { diff --git a/onnxruntime/core/providers/cpu/ml/svmclassifier.h b/onnxruntime/core/providers/cpu/ml/svmclassifier.h index 5490a44fbb333..bc0cf83627b41 100644 --- a/onnxruntime/core/providers/cpu/ml/svmclassifier.h +++ b/onnxruntime/core/providers/cpu/ml/svmclassifier.h @@ -19,7 +19,7 @@ class SVMCommon { std::vector kernel_params; ORT_ENFORCE(info.GetAttrs("kernel_params", kernel_params).IsOK()); - if (kernel_params.size() > 0) { + if (!kernel_params.empty()) { gamma_ = kernel_params[0]; coef0_ = kernel_params[1]; degree_ = kernel_params[2]; @@ -30,31 +30,30 @@ class SVMCommon { KERNEL get_kernel_type() const { return kernel_type_; } float kernel_dot(const T* A, int64_t a, const std::vector& B, int64_t b, int64_t len, KERNEL k) const { - float sum = 0.f; + double sum = 0; + const T* pA = A + a; + const float* pB = B.data() + b; if (k == KERNEL::POLY) { - for (int64_t i = 0; i < len; i++) { - sum += B[b + i] * static_cast(A[a + i]); - } + for (int64_t i = len; i > 0; --i, ++pA, ++pB) + sum += *pA * *pB; sum = gamma_ * sum + coef0_; sum = std::pow(sum, degree_); } else if (k 
== KERNEL::SIGMOID) { - for (int64_t i = 0; i < len; i++) { - sum += B[b + i] * static_cast(A[a + i]); - } + for (int64_t i = len; i > 0; --i, ++pA, ++pB) + sum += *pA * *pB; sum = gamma_ * sum + coef0_; sum = std::tanh(sum); } else if (k == KERNEL::RBF) { - for (int64_t i = 0; i < len; i++) { - float val = static_cast(A[a + i]) - B[b + i]; - sum += (val * val); + for (int64_t i = len; i > 0; --i, ++pA, ++pB) { + double val = *pA - *pB; + sum += val * val; } sum = std::exp(-gamma_ * sum); } else if (k == KERNEL::LINEAR) { - for (int64_t i = 0; i < len; i++) { - sum += B[b + i] * static_cast(A[a + i]); - } + for (int64_t i = len; i > 0; --i, ++pA, ++pB) + sum += *pA * *pB; } - return sum; + return (float)sum; } private: diff --git a/onnxruntime/core/providers/cpu/ml/svmregressor.cc b/onnxruntime/core/providers/cpu/ml/svmregressor.cc index ba1ace0b09b2e..d7252b68a9b73 100644 --- a/onnxruntime/core/providers/cpu/ml/svmregressor.cc +++ b/onnxruntime/core/providers/cpu/ml/svmregressor.cc @@ -21,7 +21,7 @@ SVMRegressor::SVMRegressor(const OpKernelInfo& info) post_transform_(MakeTransform(info.GetAttrOrDefault("post_transform", "NONE"))) { ORT_ENFORCE(info.GetAttrs("rho", rho_).IsOK()); ORT_ENFORCE(info.GetAttrs("coefficients", coefficients_).IsOK()); - ORT_ENFORCE(coefficients_.size() > 0); + ORT_ENFORCE(!coefficients_.empty()); int64_t onec = info.GetAttrOrDefault("one_class", 0); one_class_ = (onec != 0); diff --git a/onnxruntime/core/providers/cpu/ml/tree_ensemble_classifier.cc b/onnxruntime/core/providers/cpu/ml/tree_ensemble_classifier.cc index e756b172b6129..5f5d18efb8b0f 100644 --- a/onnxruntime/core/providers/cpu/ml/tree_ensemble_classifier.cc +++ b/onnxruntime/core/providers/cpu/ml/tree_ensemble_classifier.cc @@ -222,8 +222,8 @@ void TreeEnsembleClassifier::Initialize() { std::sort(std::begin(leafnodedata_), std::end(leafnodedata_), [](auto const& t1, auto const& t2) { if (std::get<0>(t1) != std::get<0>(t2)) return std::get<0>(t1) < std::get<0>(t2); - else - return std::get<1>(t1) < std::get<1>(t2); + + return std::get<1>(t1) < std::get<1>(t2); }); // make an index so we can find the leafnode data quickly when evaluating int64_t field0 = -1; @@ -344,7 +344,7 @@ common::Status TreeEnsembleClassifier::Compute(OpKernelContext* context) cons } } else // binary case { - maxweight = classes.size() > 0 ? classes[0] : 0.f; // only 1 class + maxweight = !classes.empty() ? 
classes[0] : 0.f; // only 1 class if (using_strings_) { auto* y_data = Y->template MutableData(); if (classlabels_strings_.size() == 2 && diff --git a/onnxruntime/core/providers/cpu/ml/treeregressor.cc b/onnxruntime/core/providers/cpu/ml/treeregressor.cc index 2bb0a3db3cbe5..a0e9ba91c25fb 100644 --- a/onnxruntime/core/providers/cpu/ml/treeregressor.cc +++ b/onnxruntime/core/providers/cpu/ml/treeregressor.cc @@ -70,7 +70,7 @@ TreeEnsembleRegressor::TreeEnsembleRegressor(const OpKernelInfo& info) ORT_ENFORCE(nodes_id_size == nodes_modes_.size()); ORT_ENFORCE(nodes_id_size == nodes_truenodeids_.size()); ORT_ENFORCE(nodes_id_size == nodes_falsenodeids_.size()); - ORT_ENFORCE((nodes_id_size == nodes_hitrates_.size()) || (0 == nodes_hitrates_.size())); + ORT_ENFORCE((nodes_id_size == nodes_hitrates_.size()) || (nodes_hitrates_.empty())); max_tree_depth_ = 1000; offset_ = four_billion_; @@ -81,8 +81,8 @@ TreeEnsembleRegressor::TreeEnsembleRegressor(const OpKernelInfo& info) std::sort(begin(leafnode_data_), end(leafnode_data_), [](auto const& t1, auto const& t2) { if (std::get<0>(t1) != std::get<0>(t2)) return std::get<0>(t1) < std::get<0>(t2); - else - return std::get<1>(t1) < std::get<1>(t2); + + return std::get<1>(t1) < std::get<1>(t2); }); //make an index so we can find the leafnode data quickly when evaluating int64_t field0 = -1; @@ -147,7 +147,7 @@ TreeEnsembleRegressor::TreeEnsembleRegressor(const OpKernelInfo& info) } template -common::Status TreeEnsembleRegressor::ProcessTreeNode(std::unordered_map& classes, int64_t treeindex, const T* Xdata, int64_t feature_base) const { +common::Status TreeEnsembleRegressor::ProcessTreeNode(std::unordered_map < int64_t, std::tuple>& classes, int64_t treeindex, const T* Xdata, int64_t feature_base) const { //walk down tree to the leaf ::onnxruntime::ml::NODE_MODE mode = static_cast<::onnxruntime::ml::NODE_MODE>(nodes_modes_[treeindex]); int64_t loopcount = 0; @@ -197,9 +197,13 @@ common::Status TreeEnsembleRegressor::ProcessTreeNode(std::unordered_map(leafnode_data_[index]); auto it_classes = classes.find(classid); if (it_classes != classes.end()) { - it_classes->second += weight; + auto& tuple = it_classes->second; + std::get<0>(tuple) += weight; + if (weight < std::get<1>(tuple)) std::get<1>(tuple) = weight; + if (weight > std::get<2>(tuple)) std::get<2>(tuple) = weight; } else { - auto p1 = std::make_pair(classid, weight); + std::tuple tuple = std::make_tuple(weight, weight, weight); + auto p1 = std::make_pair(classid, tuple); classes.insert(p1); } index++; @@ -232,7 +236,7 @@ common::Status TreeEnsembleRegressor::Compute(OpKernelContext* context) const for (int64_t i = 0; i < N; i++) //for each class { int64_t current_weight_0 = i * stride; - std::unordered_map scores; + std::unordered_map> scores; // sum, min, max //for each tree for (size_t j = 0; j < roots_.size(); j++) { //walk each tree from its root @@ -246,13 +250,13 @@ common::Status TreeEnsembleRegressor::Compute(OpKernelContext* context) const float val = base_values_.size() == (size_t)n_targets_ ? 
base_values_[j] : 0.f; if (it_scores != scores.end()) { if (aggregate_function_ == ::onnxruntime::ml::AGGREGATE_FUNCTION::AVERAGE) { - val += scores[j] / roots_.size(); + val += std::get<0>(scores[j]) / roots_.size(); //first element of tuple is already a sum } else if (aggregate_function_ == ::onnxruntime::ml::AGGREGATE_FUNCTION::SUM) { - val += scores[j]; + val += std::get<0>(scores[j]); } else if (aggregate_function_ == ::onnxruntime::ml::AGGREGATE_FUNCTION::MIN) { - if (scores[j] < val) val = scores[j]; + val += std::get<1>(scores[j]); // second element of tuple is min } else if (aggregate_function_ == ::onnxruntime::ml::AGGREGATE_FUNCTION::MAX) { - if (scores[j] > val) val = scores[j]; + val += std::get<2>(scores[j]); // third element of tuple is max } } outputs.push_back(val); diff --git a/onnxruntime/core/providers/cpu/ml/treeregressor.h b/onnxruntime/core/providers/cpu/ml/treeregressor.h index 2216e5c74b689..2af61fe393541 100644 --- a/onnxruntime/core/providers/cpu/ml/treeregressor.h +++ b/onnxruntime/core/providers/cpu/ml/treeregressor.h @@ -15,7 +15,7 @@ class TreeEnsembleRegressor final : public OpKernel { common::Status Compute(OpKernelContext* context) const override; private: - common::Status ProcessTreeNode(std::unordered_map& classes, int64_t treeindex, const T* Xdata, int64_t feature_base) const; + common::Status ProcessTreeNode(std::unordered_map < int64_t, std::tuple>& classes, int64_t treeindex, const T* Xdata, int64_t feature_base) const; std::vector nodes_treeids_; std::vector nodes_nodeids_; diff --git a/onnxruntime/core/providers/cpu/ml/zipmap.cc b/onnxruntime/core/providers/cpu/ml/zipmap.cc index 4a1080d347ffc..37414efc7810b 100644 --- a/onnxruntime/core/providers/cpu/ml/zipmap.cc +++ b/onnxruntime/core/providers/cpu/ml/zipmap.cc @@ -49,7 +49,7 @@ common::Status ZipMapOp::Compute(OpKernelContext* context) const { if (tensor_pointer == nullptr) return Status(common::ONNXRUNTIME, common::FAIL, "input count mismatch"); const Tensor& X = *tensor_pointer; const TensorShape& x_shape = X.Shape(); - const vector x_dims = x_shape.GetDims(); + const vector& x_dims = x_shape.GetDims(); if (x_dims.empty()) { return Status(ONNXRUNTIME, @@ -81,9 +81,9 @@ common::Status ZipMapOp::Compute(OpKernelContext* context) const { //auto* y_data = Y->template MutableData>>(); y_data->resize(batch_size); int64_t current_weight_0 = 0; - for (int n = 0; n < batch_size; n++) { + for (int64_t n = 0; n < batch_size; n++) { std::map map1; - for (int j = 0; j < features_per_batch; j++) { + for (int64_t j = 0; j < features_per_batch; j++) { map1[classlabels_strings_[j]] = x_data[current_weight_0 + j]; } current_weight_0 += features_per_batch; diff --git a/onnxruntime/core/providers/cpu/nn/autopad_type.h b/onnxruntime/core/providers/cpu/nn/autopad_type.h index d6681e369c70a..c9a0e4822f5f1 100644 --- a/onnxruntime/core/providers/cpu/nn/autopad_type.h +++ b/onnxruntime/core/providers/cpu/nn/autopad_type.h @@ -21,14 +21,16 @@ inline AutoPadType StringToAutoPadType(const std::string& str) { } if (str == "NOTSET") { // in onnx spec, default value is "NOTSET" return AutoPadType::NOTSET; - } else if (str == "VALID") { + } + if (str == "VALID") { return AutoPadType::VALID; - } else if (str == "SAME_UPPER") { + } + if (str == "SAME_UPPER") { return AutoPadType::SAME_UPPER; - } else if (str == "SAME_LOWER") { + } + if (str == "SAME_LOWER") { return AutoPadType::SAME_LOWER; - } else { - ORT_ENFORCE(false, "Unknown AutoPadType String"); } + ORT_ENFORCE(false, "Unknown AutoPadType String"); } } // namespace 
onnxruntime diff --git a/onnxruntime/core/providers/cpu/nn/conv_base.h b/onnxruntime/core/providers/cpu/nn/conv_base.h index 7ff31ae6032f3..9d6a4316d5730 100644 --- a/onnxruntime/core/providers/cpu/nn/conv_base.h +++ b/onnxruntime/core/providers/cpu/nn/conv_base.h @@ -159,8 +159,8 @@ class ConvBase { const std::vector& dilations, std::vector* pads, std::vector* output_shape) const { - int rank = gsl::narrow_cast(input_shape.NumDimensions()); - for (int dim = 0; dim < rank; ++dim) { + size_t rank = input_shape.NumDimensions(); + for (size_t dim = 0; dim < rank; ++dim) { if (dim >= strides.size() || dim >= kernel_shape.size() || dim >= dilations.size() || dim >= pads->size() || rank + dim >= pads->size()) { diff --git a/onnxruntime/core/providers/cpu/nn/conv_integer.cc b/onnxruntime/core/providers/cpu/nn/conv_integer.cc index ecccd20abddc2..5c78d5b134ccd 100644 --- a/onnxruntime/core/providers/cpu/nn/conv_integer.cc +++ b/onnxruntime/core/providers/cpu/nn/conv_integer.cc @@ -28,7 +28,8 @@ Status ConvInteger::Compute(OpKernelContext* context) const { size_t num_inputs = OpKernel::Node().InputDefs().size(); const Tensor* X = context->Input(0); const Tensor* W = context->Input(1); - int32_t input_offset = 0, filter_offset = 0; + int32_t input_offset = 0; + int32_t filter_offset = 0; if (num_inputs >= 3) { const Tensor* X_Zero_Point = context->Input(2); if (X_Zero_Point->Shape().NumDimensions() == 0 || diff --git a/onnxruntime/core/providers/cpu/nn/conv_transpose.cc b/onnxruntime/core/providers/cpu/nn/conv_transpose.cc index 9c1512854d404..d2d22fbb4f7f5 100644 --- a/onnxruntime/core/providers/cpu/nn/conv_transpose.cc +++ b/onnxruntime/core/providers/cpu/nn/conv_transpose.cc @@ -52,7 +52,7 @@ inline void ComputeTransposePadAndOutputShape( *pad_tail = paddings - paddings / 2; } return; - } else { + } if (pad_type != AutoPadType::NOTSET) { switch (pad_type) { // We handle cases of AutoPadType::VALID and AutoPadType::SAME_UPPER/LOWER, @@ -71,7 +71,6 @@ inline void ComputeTransposePadAndOutputShape( *out_size = (in_size - 1) * stride + kernel + dilation - 1 + adj - *pad_head - *pad_tail; } - } } Status ConvTransposeBase::PrepareForCompute(OpKernelContext* context, bool has_bias, ConvTransposeBase::Prepare& p) const { @@ -177,7 +176,8 @@ void ConvTransposeBase::ComputePadsAndOutputShape( const int64_t N = input_shape[0]; const int64_t H = input_shape[2]; const int64_t W = input_shape[3]; - int64_t output_height = -1, output_width = -1; + int64_t output_height = -1; + int64_t output_width = -1; size_t output_shape_size = output_shape_.size(); if (output_shape_size != 0) { diff --git a/onnxruntime/core/providers/cpu/nn/conv_transpose.h b/onnxruntime/core/providers/cpu/nn/conv_transpose.h index 2c0958a638537..35dcdc08bc4f7 100644 --- a/onnxruntime/core/providers/cpu/nn/conv_transpose.h +++ b/onnxruntime/core/providers/cpu/nn/conv_transpose.h @@ -47,15 +47,10 @@ class ConvTransposeBase : public ConvBase { Status PrepareForCompute(OpKernelContext* context, bool has_bias, Prepare& p) const; - void ComputePadsAndOutputShape( - const TensorShape input_shape, - const int64_t output_channel, - const std::vector& kernel_shape, - const std::vector& strides, - const std::vector& dilations, - const std::vector& output_padding, - std::vector* pads, - std::vector* output_shape) const; + void ComputePadsAndOutputShape(TensorShape input_shape, int64_t output_channel, + const std::vector& kernel_shape, const std::vector& strides, + const std::vector& dilations, const std::vector& output_padding, + std::vector* pads, 
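Several of the hunks above (autopad_type.h, and conv_transpose.cc further on) apply the same clang-tidy style rule: drop `else` after a branch that unconditionally returns, leaving a flat chain of early returns. A standalone illustration of the resulting shape, using a plain `throw` in place of `ORT_ENFORCE` and omitting any surrounding checks the real header may have:

```cpp
#include <stdexcept>
#include <string>

enum class AutoPadType { NOTSET, VALID, SAME_UPPER, SAME_LOWER };

// Early returns instead of an if/else-if chain; the fall-through case throws.
AutoPadType StringToAutoPadType(const std::string& str) {
  if (str == "NOTSET") return AutoPadType::NOTSET;  // ONNX spec default
  if (str == "VALID") return AutoPadType::VALID;
  if (str == "SAME_UPPER") return AutoPadType::SAME_UPPER;
  if (str == "SAME_LOWER") return AutoPadType::SAME_LOWER;
  throw std::invalid_argument("Unknown AutoPadType string: " + str);
}
```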
std::vector* output_shape) const; const std::vector output_padding_; const std::vector output_shape_; diff --git a/onnxruntime/core/providers/cpu/nn/non_max_suppression.cc b/onnxruntime/core/providers/cpu/nn/non_max_suppression.cc deleted file mode 100644 index 3c538cfae9989..0000000000000 --- a/onnxruntime/core/providers/cpu/nn/non_max_suppression.cc +++ /dev/null @@ -1,209 +0,0 @@ -/* Copyright 2015 The TensorFlow Authors. All Rights Reserved. -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - http://www.apache.org/licenses/LICENSE-2.0 -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. -==============================================================================*/ -/* Modifications Copyright (c) Microsoft. */ - -#include "non_max_suppression.h" -#include - -namespace onnxruntime { - -ONNX_OPERATOR_KERNEL_EX( - NonMaxSuppression, - kOnnxDomain, - 10, - kCpuExecutionProvider, - KernelDefBuilder(), - NonMaxSuppression); - -void NonMaxSuppression::MaxMin(const float& lhs, const float& rhs, float& min, float& max) const { - if (lhs >= rhs) { - min = rhs; - max = lhs; - } else { - min = lhs; - max = rhs; - } -} - -bool NonMaxSuppression::SuppressByIOU(const float* boxes_data, int64_t box_index1, int64_t box_index2, float iou_threshold) const { - float x1_min, y1_min, x1_max, y1_max, x2_min, y2_min, x2_max, y2_max; - // center_point_box_ only support 0 or 1 - if (0 == center_point_box_) { - // boxes data format [y1, x1, y2, x2], - MaxMin(boxes_data[4 * box_index1 + 1], boxes_data[4 * box_index1 + 3], x1_min, x1_max); - MaxMin(boxes_data[4 * box_index1 + 0], boxes_data[4 * box_index1 + 2], y1_min, y1_max); - MaxMin(boxes_data[4 * box_index2 + 1], boxes_data[4 * box_index2 + 3], x2_min, x2_max); - MaxMin(boxes_data[4 * box_index2 + 0], boxes_data[4 * box_index2 + 2], y2_min, y2_max); - } else { - // 1 == center_point_box_ => boxes data format [x_center, y_center, width, height] - float box1_width_half = boxes_data[4 * box_index1 + 2] / 2; - float box1_height_half = boxes_data[4 * box_index1 + 3] / 2; - float box2_width_half = boxes_data[4 * box_index2 + 2] / 2; - float box2_height_half = boxes_data[4 * box_index2 + 3] / 2; - - x1_min = boxes_data[4 * box_index1 + 0] - box1_width_half; - x1_max = boxes_data[4 * box_index1 + 0] + box1_width_half; - y1_min = boxes_data[4 * box_index1 + 1] - box1_height_half; - y1_max = boxes_data[4 * box_index1 + 1] + box1_height_half; - - x2_min = boxes_data[4 * box_index2 + 0] - box2_width_half; - x2_max = boxes_data[4 * box_index2 + 0] + box2_width_half; - y2_min = boxes_data[4 * box_index2 + 1] - box2_height_half; - y2_max = boxes_data[4 * box_index2 + 1] + box2_height_half; - } - - const float intersection_x_min = std::max(x1_min, x2_min); - const float intersection_y_min = std::max(y1_min, y2_min); - const float intersection_x_max = std::min(x1_max, x2_max); - const float intersection_y_max = std::min(y1_max, y2_max); - - const float intersection_area = std::max(intersection_x_max - intersection_x_min, static_cast(0.0)) * - std::max(intersection_y_max - intersection_y_min, static_cast(0.0)); - - if (intersection_area <= static_cast(0.0)) { - return false; - } - - const float area1 = 
(x1_max - x1_min) * (y1_max - y1_min); - const float area2 = (x2_max - x2_min) * (y2_max - y2_min); - const float union_area = area1 + area2 - intersection_area; - - if (area1 <= static_cast(0.0) || area2 <= static_cast(0.0) || union_area <= static_cast(0.0)) { - return false; - } - - const float intersection_over_union = intersection_area / union_area; - - return intersection_over_union > iou_threshold; -} - -Status NonMaxSuppression::ParepareCompute(OpKernelContext* ctx, const TensorShape& boxes_shape, const TensorShape& scores_shape, - int64_t& max_output_boxes_per_class, float& iou_threshold, float& score_threshold, bool& has_score_threshold) const { - ORT_RETURN_IF_NOT(boxes_shape.NumDimensions() == 3, "boxes must be a 3D tensor."); - ORT_RETURN_IF_NOT(scores_shape.NumDimensions() == 3, "scores must be a 3D tensor."); - - auto boxes_dims = boxes_shape.GetDims(); - auto scores_dims = scores_shape.GetDims(); - ORT_RETURN_IF_NOT(boxes_dims[0] == scores_dims[0], "boxes and scores should have same num_batches."); - ORT_RETURN_IF_NOT(boxes_dims[1] == scores_dims[2], "boxes and scores should have same spatial_dimention."); - ORT_RETURN_IF_NOT(boxes_dims[2] == 4, "The most inner dimension in boxes must have 4 data."); - - const_cast(num_batches_) = boxes_dims[0]; - const_cast(num_classes_) = scores_dims[1]; - const_cast(num_boxes_) = boxes_dims[1]; - - const Tensor* max_output_boxes_per_class_tensor = ctx->Input(2); - if (max_output_boxes_per_class_tensor != nullptr) { - max_output_boxes_per_class = *(max_output_boxes_per_class_tensor->Data()); - max_output_boxes_per_class = max_output_boxes_per_class > 0 ? max_output_boxes_per_class : 0; - } - - const Tensor* iou_threshold_tensor = ctx->Input(3); - if (iou_threshold_tensor != nullptr) { - iou_threshold = *(iou_threshold_tensor->Data()); - ORT_RETURN_IF_NOT((iou_threshold >= 0 && iou_threshold <= 1), "iou_threshold must be in range [0, 1]."); - } - - const Tensor* score_threshold_tensor = ctx->Input(4); - if (score_threshold_tensor != nullptr) { - has_score_threshold = true; - score_threshold = *(score_threshold_tensor->Data()); - } - - return Status::OK(); -} - -Status NonMaxSuppression::Compute(OpKernelContext* ctx) const { - const Tensor* boxes = ctx->Input(0); - ORT_ENFORCE(boxes); - const Tensor* scores = ctx->Input(1); - ORT_ENFORCE(scores); - - auto& boxes_shape = boxes->Shape(); - auto& scores_shape = scores->Shape(); - - int64_t max_output_boxes_per_class = 0; - float iou_threshold = 0; - // Not so sure for the value range of score_threshold, so set a bool to indicate whether it has this input - bool has_score_threshold = false; - float score_threshold = 0; - - auto ret = ParepareCompute(ctx, boxes_shape, scores_shape, max_output_boxes_per_class, - iou_threshold, score_threshold, has_score_threshold); - ORT_RETURN_IF_NOT(ret.IsOK(), ret.ErrorMessage()); - - if (0 == max_output_boxes_per_class) { - ctx->Output(0, {0, 3}); - return Status::OK(); - } - - const float* boxes_data = boxes->Data(); - const float* scores_data = scores->Data(); - - struct ScoreIndexPair { - float score; - int64_t index; - }; - - auto LessCompare = [](const ScoreIndexPair& lhs, const ScoreIndexPair& rhs) { - return lhs.score < rhs.score; - }; - - std::vector tmp_selected_indices; - for (int64_t batch_index = 0; batch_index < num_batches_; ++batch_index) { - for (int64_t class_index = 0; class_index < num_classes_; ++class_index) { - int64_t box_score_offset = (batch_index * num_classes_ + class_index) * num_boxes_; - int64_t box_offset = batch_index * 
num_classes_ * num_boxes_ * 4; - // Filter by score_threshold_ - std::priority_queue, decltype(LessCompare)> sorted_scores_with_index(LessCompare); - for (int64_t box_index = 0; box_index < num_boxes_; ++box_index) { - if (!has_score_threshold || (has_score_threshold && scores_data[box_score_offset + box_index] > score_threshold)) { - sorted_scores_with_index.emplace(ScoreIndexPair({scores_data[box_score_offset + box_index], box_index})); - } - } - - ScoreIndexPair next_top_score; - std::vector selected_indicies_inside_class; - // Get the next box with top score, filter by iou_threshold_ - while (!sorted_scores_with_index.empty()) { - next_top_score = sorted_scores_with_index.top(); - sorted_scores_with_index.pop(); - - bool selected = true; - // Check with existing selected boxes for this class, suppress if exceed the IOU (Intersection Over Union) threshold - for (int i = 0; i < selected_indicies_inside_class.size(); ++i) { - if (SuppressByIOU(boxes_data + box_offset, selected_indicies_inside_class[i], next_top_score.index, iou_threshold)) { - selected = false; - break; - } - } - - if (selected) { - if (max_output_boxes_per_class > 0 && static_cast(selected_indicies_inside_class.size()) >= max_output_boxes_per_class) { - break; - } - selected_indicies_inside_class.push_back(next_top_score.index); - tmp_selected_indices.push_back(selected_index(batch_index, class_index, next_top_score.index)); - } - } //while - } //for class_index - } //for batch_index - - int32_t num_selected = static_cast(tmp_selected_indices.size()); - Tensor* selected_indices = ctx->Output(0, {num_selected, 3}); - ORT_ENFORCE(selected_indices); - memcpy(selected_indices->MutableData(), tmp_selected_indices.data(), num_selected * sizeof(selected_index)); - - return Status::OK(); -} - -} // namespace onnxruntime diff --git a/onnxruntime/core/providers/cpu/nn/non_max_suppression.h b/onnxruntime/core/providers/cpu/nn/non_max_suppression.h deleted file mode 100644 index d3846280133e8..0000000000000 --- a/onnxruntime/core/providers/cpu/nn/non_max_suppression.h +++ /dev/null @@ -1,44 +0,0 @@ -// Copyright (c) Microsoft Corporation. All rights reserved. -// Licensed under the MIT License. 
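The two deletions here remove the CPU NonMaxSuppression kernel (the .cc/.h pair). For readers who want the gist of what was dropped, below is a compact, self-contained sketch of the IoU suppression test it performed for the corner-format (`[y1, x1, y2, x2]`) box layout; this is an illustrative reimplementation of the deleted logic, not the file's exact code:

```cpp
#include <algorithm>
#include <cstdint>

// Returns true if the IoU of boxes i and j exceeds the threshold,
// i.e. the candidate box should be suppressed.
bool SuppressByIoU(const float* boxes, int64_t i, int64_t j, float iou_threshold) {
  auto minmax = [](float a, float b, float& lo, float& hi) { lo = std::min(a, b); hi = std::max(a, b); };

  float x1_min, y1_min, x1_max, y1_max, x2_min, y2_min, x2_max, y2_max;
  minmax(boxes[4 * i + 1], boxes[4 * i + 3], x1_min, x1_max);
  minmax(boxes[4 * i + 0], boxes[4 * i + 2], y1_min, y1_max);
  minmax(boxes[4 * j + 1], boxes[4 * j + 3], x2_min, x2_max);
  minmax(boxes[4 * j + 0], boxes[4 * j + 2], y2_min, y2_max);

  const float ix = std::max(0.f, std::min(x1_max, x2_max) - std::max(x1_min, x2_min));
  const float iy = std::max(0.f, std::min(y1_max, y2_max) - std::max(y1_min, y2_min));
  const float intersection = ix * iy;
  if (intersection <= 0.f) return false;

  const float area1 = (x1_max - x1_min) * (y1_max - y1_min);
  const float area2 = (x2_max - x2_min) * (y2_max - y2_min);
  const float union_area = area1 + area2 - intersection;
  return union_area > 0.f && intersection / union_area > iou_threshold;
}
```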
- -#pragma once - -#include "core/common/common.h" -#include "core/framework/op_kernel.h" - -namespace onnxruntime { - -class NonMaxSuppression final : public OpKernel { - public: - NonMaxSuppression(const OpKernelInfo& info) : OpKernel(info) { - center_point_box_ = info.GetAttrOrDefault("center_point_box", 0); - ORT_ENFORCE(0 == center_point_box_ || 1 == center_point_box_, "center_point_box only support 0 or 1"); - num_batches_ = 0; - num_classes_ = 0; - num_boxes_ = 0; - } - - Status Compute(OpKernelContext* context) const override; - - private: - bool SuppressByIOU(const float* boxes_data, int64_t box_index1, int64_t box_index2, float iou_threshold) const; - void MaxMin(const float& lhs, const float& rhs, float& min, float& max) const; - Status ParepareCompute(OpKernelContext* ctx, const TensorShape& boxes_shape, const TensorShape& scores_shape, - int64_t& max_output_boxes_per_batch, float& iou_threshold, float& score_threshold, bool& has_score_threshold) const; - - private: - int64_t center_point_box_; - - int64_t num_batches_; - int64_t num_classes_; - int64_t num_boxes_; - - struct selected_index { - selected_index(int64_t batch_index, int64_t class_index, int64_t box_index) - : batch_index_(batch_index), class_index_(class_index), box_index_(box_index) {} - int64_t batch_index_ = 0; - int64_t class_index_ = 0; - int64_t box_index_ = 0; - }; -}; -} // namespace onnxruntime diff --git a/onnxruntime/core/providers/cpu/nn/pool.h b/onnxruntime/core/providers/cpu/nn/pool.h index 40284ea7d5b8b..c4b7022289866 100644 --- a/onnxruntime/core/providers/cpu/nn/pool.h +++ b/onnxruntime/core/providers/cpu/nn/pool.h @@ -17,7 +17,8 @@ class Pool : public OpKernel, public PoolBase { } } - ~Pool() override{}; + ~Pool() override = default; + ; Status Compute(OpKernelContext* context) const override; diff --git a/onnxruntime/core/providers/cpu/nn/pool_base.h b/onnxruntime/core/providers/cpu/nn/pool_base.h index ad90dd70cd6fe..b6fa0843e376b 100644 --- a/onnxruntime/core/providers/cpu/nn/pool_base.h +++ b/onnxruntime/core/providers/cpu/nn/pool_base.h @@ -25,7 +25,7 @@ class PoolProcessContext { public: friend class LpPool; - PoolProcessContext() {} + PoolProcessContext() = default; void init(const OpKernelInfo& info) { ORT_ENFORCE(info.GetAttr("p", &p_).IsOK()); } @@ -134,7 +134,8 @@ class PoolBase { } if (op_name_ == "MaxPool") { - int start, end; + int start; + int end; info.GetKernelDef().SinceVersion(&start, &end); if (start == 8) { storage_order_ = info.GetAttrOrDefault("storage_order", 0 /*default_value*/); @@ -153,7 +154,8 @@ class PoolBase { } } - ~PoolBase(){}; + ~PoolBase() = default; + ; std::vector SetOutputSize(const TensorShape& input_shape, int64_t output_channel, @@ -242,9 +244,9 @@ class PoolBase { int64_t ceil_mode) const { if (ceil_mode == 0) { return static_cast(static_cast(in_size + pad_needed - dilation * (kernel - 1) - 1) / stride + 1); - } else { - return static_cast(ceil(static_cast(in_size + pad_needed - dilation * (kernel - 1) - 1) / stride + 1)); } + return static_cast( + std::ceil(static_cast(in_size + pad_needed - dilation * (kernel - 1) - 1) / stride + 1)); } Status Compute(OpKernelContext* context, MLAS_POOLING_KIND kind) const; diff --git a/onnxruntime/core/providers/cpu/nn/roi_pool.cc b/onnxruntime/core/providers/cpu/nn/roi_pool.cc index 70130a0a173f2..a67be9b476934 100644 --- a/onnxruntime/core/providers/cpu/nn/roi_pool.cc +++ b/onnxruntime/core/providers/cpu/nn/roi_pool.cc @@ -37,10 +37,10 @@ Status RoiPool::Compute(OpKernelContext* context) const { for (int n = 0; n < 
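The pool_base.h hunk above qualifies `ceil` as `std::ceil` and removes the redundant `else`, but the formula it evaluates is unchanged: the standard pooled output extent along one axis. A hedged standalone version of that computation (the double-precision intermediate is an assumption of this sketch, not necessarily what the kernel uses):

```cpp
#include <cmath>
#include <cstdint>

// Output extent of a pooling window along one spatial axis.
// ceil_mode == 0 truncates the fractional part; otherwise the partial window is kept via ceil.
int64_t PooledOutputSize(int64_t in_size, int64_t pad_needed, int64_t kernel,
                         int64_t dilation, int64_t stride, int64_t ceil_mode) {
  const double numerator =
      static_cast<double>(in_size + pad_needed - dilation * (kernel - 1) - 1);
  if (ceil_mode == 0) {
    return static_cast<int64_t>(numerator / stride + 1);
  }
  return static_cast<int64_t>(std::ceil(numerator / stride + 1));
}
```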
num_rois; n++) { int roi_batch_id = static_cast(rois[0]); - int roi_start_w = static_cast(round(rois[1] * spatial_scale_)); - int roi_start_h = static_cast(round(rois[2] * spatial_scale_)); - int roi_end_w = static_cast(round(rois[3] * spatial_scale_)); - int roi_end_h = static_cast(round(rois[4] * spatial_scale_)); + int roi_start_w = static_cast(std::round(rois[1] * spatial_scale_)); + int roi_start_h = static_cast(std::round(rois[2] * spatial_scale_)); + int roi_end_w = static_cast(std::round(rois[3] * spatial_scale_)); + int roi_end_h = static_cast(std::round(rois[4] * spatial_scale_)); ORT_ENFORCE(roi_batch_id >= 0); ORT_ENFORCE(roi_batch_id < batch_size); @@ -61,14 +61,10 @@ Status RoiPool::Compute(OpKernelContext* context) const { // Compute pooling region for this output unit: // start (included) = floor(ph * roi_height / pooled_height_) // end (excluded) = ceil((ph + 1) * roi_height / pooled_height_) - int hstart = - static_cast(floor(static_cast(ph) * bin_size_h)); - int wstart = - static_cast(floor(static_cast(pw) * bin_size_w)); - int hend = - static_cast(ceil(static_cast(ph + 1) * bin_size_h)); - int wend = - static_cast(ceil(static_cast(pw + 1) * bin_size_w)); + int hstart = static_cast(std::floor(static_cast(ph) * bin_size_h)); + int wstart = static_cast(std::floor(static_cast(pw) * bin_size_w)); + int hend = static_cast(std::ceil(static_cast(ph + 1) * bin_size_h)); + int wend = static_cast(std::ceil(static_cast(pw + 1) * bin_size_w)); // Add roi offsets and clip to input boundaries hstart = std::min(std::max(hstart + roi_start_h, 0), height); diff --git a/onnxruntime/core/providers/cpu/nn/shrink.cc b/onnxruntime/core/providers/cpu/nn/shrink.cc index 2248cd22ec895..e89b437a2f7ff 100644 --- a/onnxruntime/core/providers/cpu/nn/shrink.cc +++ b/onnxruntime/core/providers/cpu/nn/shrink.cc @@ -21,7 +21,8 @@ inline T ShrinkCore(const T& val, float bias, float lambd) { // Implementing the spec as is for now if (val < -lambd) { return T(val + bias); - } else if (val > lambd) { + } + if (val > lambd) { return T(val - bias); } else { return T(0); diff --git a/onnxruntime/core/providers/cpu/nn/string_normalizer.h b/onnxruntime/core/providers/cpu/nn/string_normalizer.h index 1f15926060f86..ff329a9dab84a 100644 --- a/onnxruntime/core/providers/cpu/nn/string_normalizer.h +++ b/onnxruntime/core/providers/cpu/nn/string_normalizer.h @@ -20,7 +20,7 @@ class StringNormalizer : public OpKernel { }; explicit StringNormalizer(const OpKernelInfo& info); - ~StringNormalizer() = default; + ~StringNormalizer() override = default; Status Compute(OpKernelContext* ctx) const override; diff --git a/onnxruntime/core/providers/cpu/nn/tfidfvectorizer.cc b/onnxruntime/core/providers/cpu/nn/tfidfvectorizer.cc index 34f6cfd8ac70c..a69c94959a562 100644 --- a/onnxruntime/core/providers/cpu/nn/tfidfvectorizer.cc +++ b/onnxruntime/core/providers/cpu/nn/tfidfvectorizer.cc @@ -378,8 +378,7 @@ TfIdfVectorizer::TfIdfVectorizer(const OpKernelInfo& info) : OpKernel(info), imp } } -TfIdfVectorizer::~TfIdfVectorizer() { -} +TfIdfVectorizer::~TfIdfVectorizer() = default; void TfIdfVectorizer::OutputResult(OpKernelContext* ctx, size_t B, const std::vector& frequences) const { const Impl& impl = *impl_; diff --git a/onnxruntime/core/providers/cpu/nn/tfidfvectorizer.h b/onnxruntime/core/providers/cpu/nn/tfidfvectorizer.h index 025933e13f139..e041536ade558 100644 --- a/onnxruntime/core/providers/cpu/nn/tfidfvectorizer.h +++ b/onnxruntime/core/providers/cpu/nn/tfidfvectorizer.h @@ -13,7 +13,7 @@ namespace onnxruntime { class 
TfIdfVectorizer final : public OpKernel { public: explicit TfIdfVectorizer(const OpKernelInfo& info); - ~TfIdfVectorizer(); + ~TfIdfVectorizer() override; ORT_DISALLOW_COPY_ASSIGNMENT_AND_MOVE(TfIdfVectorizer); Status Compute(OpKernelContext* ctx) const override; diff --git a/onnxruntime/core/providers/cpu/nn/unpool.h b/onnxruntime/core/providers/cpu/nn/unpool.h index 012c1da30d140..66c132a03de64 100644 --- a/onnxruntime/core/providers/cpu/nn/unpool.h +++ b/onnxruntime/core/providers/cpu/nn/unpool.h @@ -53,7 +53,8 @@ class MaxUnpool : public OpKernel { } } - ~MaxUnpool() override{}; + ~MaxUnpool() override = default; + ; Status Compute(OpKernelContext* context) const override; diff --git a/onnxruntime/core/providers/cpu/object_detection/roialign.cc b/onnxruntime/core/providers/cpu/object_detection/roialign.cc index 13d04ffe1199d..1c07e7f50a6db 100644 --- a/onnxruntime/core/providers/cpu/object_detection/roialign.cc +++ b/onnxruntime/core/providers/cpu/object_detection/roialign.cc @@ -16,6 +16,8 @@ /* Modifications Copyright (c) Microsoft. */ #include "roialign.h" + +#include #include "core/util/math_cpuonly.h" #include "core/common/common.h" #include "core/framework/tensor.h" @@ -128,8 +130,12 @@ void pre_calc_for_bilinear_interpolate( T ly = y - y_low; T lx = x - x_low; - T hy = static_cast(1.) - ly, hx = static_cast(1.) - lx; - T w1 = hy * hx, w2 = hy * lx, w3 = ly * hx, w4 = ly * lx; + T hy = static_cast(1.) - ly; + T hx = static_cast(1.) - lx; + T w1 = hy * hx; + T w2 = hy * lx; + T w3 = ly * hx; + T w4 = ly * lx; // save weights and indeces PreCalc pc; @@ -190,9 +196,9 @@ void RoiAlignForward( // We use roi_bin_grid to sample the grid and mimic integral int64_t roi_bin_grid_h = (sampling_ratio > 0) ? sampling_ratio - : static_cast(ceil(roi_height / pooled_height)); // e.g., = 2 + : static_cast(std::ceil(roi_height / pooled_height)); // e.g., = 2 int64_t roi_bin_grid_w = - (sampling_ratio > 0) ? sampling_ratio : static_cast(ceil(roi_width / pooled_width)); + (sampling_ratio > 0) ? sampling_ratio : static_cast(std::ceil(roi_width / pooled_width)); // We do average (integral) pooling inside a bin const int64_t count = roi_bin_grid_h * roi_bin_grid_w; // e.g. = 4 diff --git a/onnxruntime/core/providers/cpu/reduction/reduction_ops.cc b/onnxruntime/core/providers/cpu/reduction/reduction_ops.cc index 6559c447bcc7f..891490b21a75f 100644 --- a/onnxruntime/core/providers/cpu/reduction/reduction_ops.cc +++ b/onnxruntime/core/providers/cpu/reduction/reduction_ops.cc @@ -55,6 +55,7 @@ bool PrepareForReduce(OpKernelContext* ctx, size_t ndim = input.Shape().GetDims().size(); std::vector axes; + axes.reserve(axes_.size()); for (int64_t axis : axes_) { axes.push_back(HandleNegativeAxis(axis, static_cast(ndim))); } @@ -80,14 +81,14 @@ bool PrepareForReduce(OpKernelContext* ctx, //transpose the input so that all to-be-reduced axes are at the head vector transposed_axes(axes.begin(), axes.end()); - for (int i = 0; i < ndim; ++i) { + for (size_t i = 0; i < ndim; ++i) { if (keep_axis[i]) { transposed_axes.push_back(i); } } vector new_dims_(transposed_axes.size()); - for (int i = 0; i < transposed_axes.size(); ++i) { + for (size_t i = 0; i < transposed_axes.size(); ++i) { new_dims_[i] = input.Shape().GetDims().at(transposed_axes[i]); } @@ -112,7 +113,7 @@ bool PrepareForReduce(OpKernelContext* ctx, //set to-be-reduced axes to one. 
squeeze is keepdims_ is false int64_t first_dim = 1; std::vector reduced_dims; - for (int i = 0; i < in_dims.size(); i++) { + for (size_t i = 0; i < in_dims.size(); i++) { if (keep_axis[i]) { reduced_dims.push_back(in_dims[i]); } else { @@ -142,9 +143,9 @@ bool PrepareForReduce(OpKernelContext* ctx, // Calculate strides std::vector stride_x(itr_axes, 0); - for (size_t i = 0; i < itr_axes; i++) { + for (size_t i = 0; static_cast(i) < itr_axes; i++) { stride_x[i] = 1; - for (size_t j = transposed_axes[i] + 1; j < itr_axes; j++) { + for (size_t j = transposed_axes[i] + 1; static_cast(j) < itr_axes; j++) { stride_x[i] *= in_dims[j]; } } @@ -200,7 +201,8 @@ bool PrepareForReduce(OpKernelContext* ctx, template Status ReduceL1::Compute(OpKernelContext* ctx) const { std::vector transposedInputData; - int64_t block_size, blocks; + int64_t block_size; + int64_t blocks; Tensor* reduced; PrepareForReduce(ctx, transposedInputData, &reduced, block_size, blocks, axes_, keepdims_); @@ -215,7 +217,8 @@ Status ReduceL1::Compute(OpKernelContext* ctx) const { template Status ReduceL2::Compute(OpKernelContext* ctx) const { std::vector transposedInputData; - int64_t block_size, blocks; + int64_t block_size; + int64_t blocks; Tensor* reduced; PrepareForReduce(ctx, transposedInputData, &reduced, block_size, blocks, axes_, keepdims_); @@ -230,7 +233,8 @@ Status ReduceL2::Compute(OpKernelContext* ctx) const { template Status ReduceLogSum::Compute(OpKernelContext* ctx) const { std::vector transposedInputData; - int64_t block_size, blocks; + int64_t block_size; + int64_t blocks; Tensor* reduced; PrepareForReduce(ctx, transposedInputData, &reduced, block_size, blocks, axes_, keepdims_); @@ -249,7 +253,8 @@ Status ReduceLogSum::Compute(OpKernelContext* ctx) const { template Status ReduceLogSumExp::Compute(OpKernelContext* ctx) const { std::vector transposedInputData; - int64_t block_size, blocks; + int64_t block_size; + int64_t blocks; Tensor* reduced; PrepareForReduce(ctx, transposedInputData, &reduced, block_size, blocks, axes_, keepdims_); @@ -272,7 +277,8 @@ Status ReduceLogSumExp::Compute(OpKernelContext* ctx) const { template Status ReduceMax::Compute(OpKernelContext* ctx) const { std::vector transposedInputData; - int64_t block_size, blocks; + int64_t block_size; + int64_t blocks; Tensor* reduced; PrepareForReduce(ctx, transposedInputData, &reduced, block_size, blocks, axes_, keepdims_); @@ -287,7 +293,8 @@ Status ReduceMax::Compute(OpKernelContext* ctx) const { template Status ReduceMean::Compute(OpKernelContext* ctx) const { std::vector transposedInputData; - int64_t block_size, blocks; + int64_t block_size; + int64_t blocks; Tensor* reduced; bool no_transpose = PrepareForReduce(ctx, transposedInputData, &reduced, block_size, blocks, axes_, keepdims_, true); @@ -313,7 +320,8 @@ Status ReduceMean::Compute(OpKernelContext* ctx) const { template Status ReduceMin::Compute(OpKernelContext* ctx) const { std::vector transposedInputData; - int64_t block_size, blocks; + int64_t block_size; + int64_t blocks; Tensor* reduced; PrepareForReduce(ctx, transposedInputData, &reduced, block_size, blocks, axes_, keepdims_); @@ -328,7 +336,8 @@ Status ReduceMin::Compute(OpKernelContext* ctx) const { template Status ReduceProd::Compute(OpKernelContext* ctx) const { std::vector transposedInputData; - int64_t block_size, blocks; + int64_t block_size; + int64_t blocks; Tensor* reduced; PrepareForReduce(ctx, transposedInputData, &reduced, block_size, blocks, axes_, keepdims_); @@ -343,7 +352,8 @@ Status 
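The reduction_ops.cc changes above are mostly type hygiene (size_t loop counters, one declaration per statement), but the added `axes.reserve` plus the `HandleNegativeAxis` loop is the interesting step: negative reduction axes are wrapped into `[0, rank)` before any transpose planning. A minimal free-standing sketch of that normalization (this is not the ORT helper itself, just the same idea):

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Map possibly-negative axes (ONNX convention: -1 means the last axis) into [0, rank).
std::vector<int64_t> NormalizeAxes(const std::vector<int64_t>& axes, std::size_t rank) {
  std::vector<int64_t> out;
  out.reserve(axes.size());  // mirrors the added axes.reserve(axes_.size())
  for (int64_t axis : axes) {
    const int64_t a = axis < 0 ? axis + static_cast<int64_t>(rank) : axis;
    if (a < 0 || a >= static_cast<int64_t>(rank))
      throw std::out_of_range("reduction axis out of range");
    out.push_back(a);
  }
  return out;
}
```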
ReduceProd::Compute(OpKernelContext* ctx) const { template Status ReduceSum::Compute(OpKernelContext* ctx) const { std::vector transposedInputData; - int64_t block_size, blocks; + int64_t block_size; + int64_t blocks; Tensor* reduced; bool no_transpose = PrepareForReduce(ctx, transposedInputData, &reduced, block_size, blocks, axes_, keepdims_, true); @@ -369,7 +379,8 @@ Status ReduceSum::Compute(OpKernelContext* ctx) const { template Status ReduceSumSquare::Compute(OpKernelContext* ctx) const { std::vector transposedInputData; - int64_t block_size, blocks; + int64_t block_size; + int64_t blocks; Tensor* reduced; PrepareForReduce(ctx, transposedInputData, &reduced, block_size, blocks, axes_, keepdims_); @@ -384,7 +395,8 @@ Status ReduceSumSquare::Compute(OpKernelContext* ctx) const { template Status ArgMax::Compute(OpKernelContext* ctx) const { std::vector transposedInputData; - int64_t block_size, blocks; + int64_t block_size; + int64_t blocks; Tensor* reduced; PrepareForReduce(ctx, transposedInputData, &reduced, block_size, blocks, axes_, keepdims_); @@ -403,7 +415,8 @@ Status ArgMax::Compute(OpKernelContext* ctx) const { template Status ArgMin::Compute(OpKernelContext* ctx) const { std::vector transposedInputData; - int64_t block_size, blocks; + int64_t block_size; + int64_t blocks; Tensor* reduced; PrepareForReduce(ctx, transposedInputData, &reduced, block_size, blocks, axes_, keepdims_); diff --git a/onnxruntime/core/providers/cpu/rnn/deep_cpu_gru.cc b/onnxruntime/core/providers/cpu/rnn/deep_cpu_gru.cc index 4f6878d2763ca..cad650c0177a9 100644 --- a/onnxruntime/core/providers/cpu/rnn/deep_cpu_gru.cc +++ b/onnxruntime/core/providers/cpu/rnn/deep_cpu_gru.cc @@ -164,33 +164,19 @@ namespace detail { template class UniDirectionalGru { public: - UniDirectionalGru(AllocatorPtr allocator, - const logging::Logger& logger, - const int seq_length, - const int batch_size, - const int input_size, - const int hidden_size, - const bool linear_before_reset, - Direction direction, - const gsl::span& bias, - const gsl::span& initial_hidden_state, - const ActivationFuncs::Entry& activation_func_f, - const ActivationFuncs::Entry& activation_func_g, - const float clip); - - void Compute(const gsl::span& inputs, - const gsl::span& sequence_lengths, - const int num_directions, - const gsl::span& input_weights, - const gsl::span& recurrent_weights, - gsl::span& outputs, - gsl::span& final_hidden_state); + UniDirectionalGru(AllocatorPtr allocator, int seq_length, int batch_size, int input_size, int hidden_size, + bool linear_before_reset, Direction direction, const gsl::span& bias, + const gsl::span& initial_hidden_state, const ActivationFuncs::Entry& activation_func_f, + const ActivationFuncs::Entry& activation_func_g, float clip); + + void Compute(const gsl::span& inputs, const gsl::span& sequence_lengths, int num_directions, + const gsl::span& input_weights, const gsl::span& recurrent_weights, + gsl::span& outputs, gsl::span& final_hidden_state); ~UniDirectionalGru() = default; private: AllocatorPtr allocator_; - const logging::Logger& logger_; int seq_length_; int batch_size_; @@ -277,8 +263,6 @@ Status DeepCpuGruOp::Compute(OpKernelContext* context) const { template Status DeepCpuGruOp::ComputeImpl(OpKernelContext& context) const { - auto& logger = context.Logger(); - const Tensor& X = *context.Input(0); // inputs. [seq_length, batch_size, input_size] const Tensor& W = *context.Input(1); // weights. [num_directions, 3*hidden_size, input_size] const Tensor& R = *context.Input(2); // recurrence weights. 
[num_directions, 3*hidden_size, hidden_size] @@ -380,8 +364,13 @@ Status DeepCpuGruOp::ComputeImpl(OpKernelContext& context) const { hidden_output_size_per_direction); std::unique_ptr> fw = std::make_unique>( - alloc, logger, - seq_length, batch_size, input_size, hidden_size_, linear_before_reset_, Direction::kForward, + alloc, + seq_length, + batch_size, + input_size, + hidden_size_, + linear_before_reset_, + Direction::kForward, bias_1, initial_hidden_1, activation_funcs_.Entries()[0], activation_funcs_.Entries()[1], @@ -389,8 +378,13 @@ Status DeepCpuGruOp::ComputeImpl(OpKernelContext& context) const { fw->Compute(input, sequence_lens_span, num_directions_, input_weights_1, recurrent_weights_1, output_1, hidden_output_1); std::unique_ptr> bw = std::make_unique>( - alloc, logger, - seq_length, batch_size, input_size, hidden_size_, linear_before_reset_, Direction::kReverse, + alloc, + seq_length, + batch_size, + input_size, + hidden_size_, + linear_before_reset_, + Direction::kReverse, bias_2, initial_hidden_2, activation_funcs_.Entries()[2], activation_funcs_.Entries()[3], @@ -398,8 +392,13 @@ Status DeepCpuGruOp::ComputeImpl(OpKernelContext& context) const { bw->Compute(input, sequence_lens_span, num_directions_, input_weights_2, recurrent_weights_2, output_2, hidden_output_2); } else { std::unique_ptr> gru_p = std::make_unique>( - alloc, logger, - seq_length, batch_size, input_size, hidden_size_, linear_before_reset_, direction_, + alloc, + seq_length, + batch_size, + input_size, + hidden_size_, + linear_before_reset_, + direction_, bias_1, initial_hidden_1, activation_funcs_.Entries()[0], activation_funcs_.Entries()[1], @@ -422,7 +421,6 @@ namespace detail { template UniDirectionalGru::UniDirectionalGru(AllocatorPtr allocator, - const logging::Logger& logger, const int seq_length, const int batch_size, const int input_size, @@ -435,7 +433,6 @@ UniDirectionalGru::UniDirectionalGru(AllocatorPtr allocator, const ActivationFuncs::Entry& activation_func_g, const float clip) : allocator_(allocator), - logger_(logger), seq_length_(seq_length), batch_size_(batch_size), input_size_(input_size), @@ -798,7 +795,8 @@ void UniDirectionalGru::Compute(const gsl::span& inputs_arg, auto final_hidden_state_dst = final_hidden_state.begin() + i * hidden_size_; std::fill_n(final_hidden_state_dst, hidden_size_, T{}); continue; - } else if (output_sequence) { + } + if (output_sequence) { auto src = outputs.subspan((seq_len - 1) * output_step_length + i * hidden_size_, hidden_size_); auto dest = final_hidden_state.subspan(i * hidden_size_, hidden_size_); gsl::copy(src, dest); diff --git a/onnxruntime/core/providers/cpu/rnn/deep_cpu_gru.h b/onnxruntime/core/providers/cpu/rnn/deep_cpu_gru.h index f4967c9d85677..3f6294eb842d6 100644 --- a/onnxruntime/core/providers/cpu/rnn/deep_cpu_gru.h +++ b/onnxruntime/core/providers/cpu/rnn/deep_cpu_gru.h @@ -45,7 +45,7 @@ class DeepCpuGruOp final : public OpKernel { } } - ORT_ENFORCE(activation_func_names.size() == num_directions_ * 2); + ORT_ENFORCE(activation_func_names.size() == static_cast(num_directions_) * 2); activation_funcs_ = rnn::detail::ActivationFuncs(activation_func_names, activation_func_alphas, diff --git a/onnxruntime/core/providers/cpu/rnn/deep_cpu_lstm.cc b/onnxruntime/core/providers/cpu/rnn/deep_cpu_lstm.cc index 82dacbd48bb00..8bddcdcb08444 100644 --- a/onnxruntime/core/providers/cpu/rnn/deep_cpu_lstm.cc +++ b/onnxruntime/core/providers/cpu/rnn/deep_cpu_lstm.cc @@ -187,32 +187,17 @@ struct ActivationInfo { template class UniDirectionalLstm { public: - 
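Alongside dropping the unused `logging::Logger&` from the GRU constructor, the deep_cpu_gru.h one-liner above (and the matching LSTM change further down) silences a signed/unsigned comparison warning by casting the `int64_t` direction count before multiplying. The same pattern in isolation, with an `assert` standing in for `ORT_ENFORCE`:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

void CheckActivationCount(const std::vector<std::string>& names, int64_t num_directions) {
  // names.size() is std::size_t; cast the signed count so both operands share one type.
  assert(names.size() == static_cast<std::size_t>(num_directions) * 2);
}
```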
UniDirectionalLstm(AllocatorPtr allocator, - const logging::Logger& logger, - const int seq_length, - const int batch_size, - const int input_size, - const int hidden_size, - Direction direction, - const bool input_forget, - const gsl::span& bias, - const gsl::span& peephole_weights, - const gsl::span& initial_hidden_state, - const gsl::span& initial_cell_state, - const ActivationFuncs::Entry& activation_func_f, - const ActivationFuncs::Entry& activation_func_g, - const ActivationFuncs::Entry& activation_func_h, - const float clip, + UniDirectionalLstm(AllocatorPtr allocator, const logging::Logger& logger, int seq_length, int batch_size, + int input_size, int hidden_size, Direction direction, bool input_forget, + const gsl::span& bias, const gsl::span& peephole_weights, + const gsl::span& initial_hidden_state, const gsl::span& initial_cell_state, + const ActivationFuncs::Entry& activation_func_f, const ActivationFuncs::Entry& activation_func_g, + const ActivationFuncs::Entry& activation_func_h, float clip, onnxruntime::concurrency::ThreadPool& ttp); - void Compute(const gsl::span& inputs, - const gsl::span& sequence_lengths, - const int num_directions, - const gsl::span& input_weights, - const gsl::span& recurrent_weights, - gsl::span& outputs, - gsl::span& final_hidden_state, - gsl::span& final_cell_state); + void Compute(const gsl::span& inputs, const gsl::span& sequence_lengths, int num_directions, + const gsl::span& input_weights, const gsl::span& recurrent_weights, + gsl::span& outputs, gsl::span& final_hidden_state, gsl::span& final_cell_state); ~UniDirectionalLstm() = default; @@ -222,16 +207,11 @@ class UniDirectionalLstm { void SetNumThreads(); - void GateComputations(span_T_iter& out, span_T_iter& out_end, - span_T_iter& C_prev, span_T_iter& C_prev_end, // Ct-1 value not 'ct'. using 'C' for clarity - span_T_iter& C_prev_clipped, span_T_iter& C_prev_clipped_end, - span_T_iter& batched_output, span_T_iter& batched_output_end, - const gsl::span& seq_lengths, - const int min_sequence_length, - const int step, - const int row, - const int local_fused_hidden_rows, - bool output_sequence); + void GateComputations(span_T_iter& out, span_T_iter& out_end, span_T_iter& C_prev, + span_T_iter& C_prev_end, // Ct-1 value not 'ct'. 
using 'C' for clarity + span_T_iter& C_prev_clipped, span_T_iter& C_prev_clipped_end, span_T_iter& batched_output, + span_T_iter& batched_output_end, const gsl::span& seq_lengths, + int min_sequence_length, int step, int row, int local_fused_hidden_rows, bool output_sequence); void AllocateBuffers(); @@ -851,7 +831,8 @@ void UniDirectionalLstm::Compute(const gsl::span& inputs_arg, DumpMatrix("Xt*(W[iofc]^T) + Ht-t*R[iofc]" + row_str, &*step_out_IOFC, local_fused_hidden_rows, hidden_size_x4); - span_T_iter batched_output, batched_output_end; + span_T_iter batched_output; + span_T_iter batched_output_end; if (output_sequence) { batched_output = outputs.begin() + step * output_step_length; batched_output_end = outputs.end(); @@ -925,7 +906,8 @@ void UniDirectionalLstm::Compute(const gsl::span& inputs_arg, step_out_IOFC, output_iofc_.end(), // input contains Xt*(W[iofc]^T) hidden_size_x4); - span_T_iter batched_output, batched_output_end; + span_T_iter batched_output; + span_T_iter batched_output_end; if (output_sequence) { batched_output = outputs.begin() + step * output_step_length; batched_output_end = outputs.end(); @@ -975,7 +957,8 @@ void UniDirectionalLstm::Compute(const gsl::span& inputs_arg, auto final_hidden_state_dst = final_hidden_state.begin() + i * hidden_size_; std::fill_n(final_hidden_state_dst, hidden_size_, T{}); continue; - } else if (output_sequence) { // copy last output to final_hidden_state + } + if (output_sequence) { // copy last output to final_hidden_state auto src = outputs.subspan((seq_len - 1) * output_step_length + i * hidden_size_, hidden_size_); auto dest = final_hidden_state.subspan(i * hidden_size_, hidden_size_); gsl::copy(src, dest); diff --git a/onnxruntime/core/providers/cpu/rnn/deep_cpu_lstm.h b/onnxruntime/core/providers/cpu/rnn/deep_cpu_lstm.h index f16c8d9ee29dd..606dfbf5b190c 100644 --- a/onnxruntime/core/providers/cpu/rnn/deep_cpu_lstm.h +++ b/onnxruntime/core/providers/cpu/rnn/deep_cpu_lstm.h @@ -44,7 +44,7 @@ class DeepCpuLstmOp final : public OpKernel { } } - ORT_ENFORCE(activation_func_names.size() == num_directions_ * 3); + ORT_ENFORCE(activation_func_names.size() == static_cast(num_directions_) * 3); activation_funcs_ = rnn::detail::ActivationFuncs(activation_func_names, activation_func_alphas, @@ -82,7 +82,8 @@ class DeepCpuLstmOp final : public OpKernel { // across them. mutable due to this. // The alternative would be to create a threadpool in each call to Compute but that would incur thread creation // cost on every call. 
- mutable onnxruntime::concurrency::ThreadPool ttp_{"DEEPCPU_LSTM", (int)std::thread::hardware_concurrency()}; + mutable onnxruntime::concurrency::ThreadPool ttp_{"DEEPCPU_LSTM", + static_cast(std::thread::hardware_concurrency())}; }; } // namespace onnxruntime diff --git a/onnxruntime/core/providers/cpu/rnn/rnn.cc b/onnxruntime/core/providers/cpu/rnn/rnn.cc index c64ebb44047b1..8bc3a262643cc 100644 --- a/onnxruntime/core/providers/cpu/rnn/rnn.cc +++ b/onnxruntime/core/providers/cpu/rnn/rnn.cc @@ -27,8 +27,8 @@ template T Clip(const T& x, T clip) { if (clip < 0) return x; - else - return std::max(std::min(x, clip), -clip); + + return std::max(std::min(x, clip), -clip); } template diff --git a/onnxruntime/core/providers/cpu/rnn/rnn.h b/onnxruntime/core/providers/cpu/rnn/rnn.h index 3e292c75a389c..2c3b91c272094 100644 --- a/onnxruntime/core/providers/cpu/rnn/rnn.h +++ b/onnxruntime/core/providers/cpu/rnn/rnn.h @@ -34,7 +34,7 @@ class RNN : public OpKernel { activations_.resize(1); } - ORT_ENFORCE(activations_.size() == num_directions); + ORT_ENFORCE(activations_.size() == static_cast(num_directions)); for (int direction = 1; direction < num_directions; direction++) { ORT_ENFORCE(allowed_activations.find(activations_[direction]) != allowed_activations.end()); } diff --git a/onnxruntime/core/providers/cpu/rnn/rnn_activation_functors.h b/onnxruntime/core/providers/cpu/rnn/rnn_activation_functors.h index 8f2be9d1cf97e..f7724746dda31 100644 --- a/onnxruntime/core/providers/cpu/rnn/rnn_activation_functors.h +++ b/onnxruntime/core/providers/cpu/rnn/rnn_activation_functors.h @@ -39,9 +39,8 @@ template inline T Sigmoid(T x, T alpha RNN_UNUSED_PARAMETER, T beta RNN_UNUSED_PARAMETER) { if (x >= 0) { return 1 / (1 + exp(-x)); - } else { - return exp(x) / (1 + exp(x)); } + return exp(x) / (1 + exp(x)); } template @@ -69,7 +68,7 @@ inline T Softsign(T x, T alpha, T beta); template <> inline float Softsign(float x, float alpha ORT_ATTRIBUTE_UNUSED, float beta ORT_ATTRIBUTE_UNUSED) { - return x / (1 + fabs(x)); + return x / (1 + std::fabs(x)); } template <> diff --git a/onnxruntime/core/providers/cpu/rnn/rnn_helpers.cc b/onnxruntime/core/providers/cpu/rnn/rnn_helpers.cc index 42481d8b5f3fe..682c7e0b70202 100644 --- a/onnxruntime/core/providers/cpu/rnn/rnn_helpers.cc +++ b/onnxruntime/core/providers/cpu/rnn/rnn_helpers.cc @@ -178,7 +178,7 @@ ActivationFuncs::ActivationFuncs(const std::vector& funcs, auto cur_beta = betas.cbegin(); auto end_beta = betas.cend(); - for (auto input_func : funcs) { + for (const auto& input_func : funcs) { float alpha = 0.f; float beta = 0.f; std::string func = detail::NormalizeActivationArgumentAndGetAlphaBetaCount( @@ -261,19 +261,19 @@ inline void clip_for_tanh(const float* ps, float* pd, int c) { } } -void add_bias_into_ignore(const float* ps, float* pd, const int c) { +void add_bias_into_ignore(const float* ps, const float* pd, int c) { ORT_UNUSED_PARAMETER(ps); ORT_UNUSED_PARAMETER(pd); ORT_UNUSED_PARAMETER(c); } -void add_bias_into(const float* ps, float* pd, const int c) { +void add_bias_into(const float* ps, float* pd, int c) { for (int i = 0; i < c; i++) { pd[i] += ps[i]; } } -void clip(const float b, float* pd, const int c) { +void clip(const float b, float* pd, int c) { for (int i = 0; i < c; i++) { float x = pd[i]; if (x > b) @@ -283,7 +283,7 @@ void clip(const float b, float* pd, const int c) { } } -void clip_ignore_bias(const float b, const float* pb, float* pd, const int c) { +void clip_ignore_bias(const float b, const float* pb, float* pd, int c) { 
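The rnn_activation_functors.h hunk above keeps the branch that makes the sigmoid numerically stable (the exponential is only ever taken of a non-positive argument, so it cannot overflow) while removing the redundant `else`. As a standalone function:

```cpp
#include <cmath>

// Numerically stable logistic sigmoid: exp() is never called with a large positive argument.
float StableSigmoid(float x) {
  if (x >= 0.f) return 1.f / (1.f + std::exp(-x));
  const float e = std::exp(x);  // x < 0, so e lies in (0, 1)
  return e / (1.f + e);
}
```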
ORT_UNUSED_PARAMETER(pb); for (int i = 0; i < c; i++) { @@ -297,7 +297,7 @@ void clip_ignore_bias(const float b, const float* pb, float* pd, const int c) { } } -void clip_add_bias(const float b, const float* pb, float* pd, const int c) { +void clip_add_bias(const float b, const float* pb, float* pd, int c) { for (int i = 0; i < c; i++) { float x = pd[i] + pb[i]; if (x > b) @@ -357,8 +357,7 @@ void tanh_m(const float* ps1, float* ps1_c, const float* ps2, float* pd, int c, } } -void relu_m(const float* ps1, float* ps1_c, const float* ps2, float* pd, int c, - const float alpha, const float beta) { +void relu_m(const float* ps1, float* ps1_c, const float* ps2, float* pd, int c, float alpha, float beta) { ORT_UNUSED_PARAMETER(ps1_c); ORT_UNUSED_PARAMETER(alpha); ORT_UNUSED_PARAMETER(beta); @@ -369,17 +368,16 @@ void relu_m(const float* ps1, float* ps1_c, const float* ps2, float* pd, int c, } } -void composed_m(const float* ps1, float* ps1_c, const float* ps2, float* pd, int c, - std::function func, - const float alpha, const float beta) { +void composed_m(const float* ps1, const float* ps1_c, const float* ps2, float* pd, int c, + std::function func, float alpha, float beta) { ORT_UNUSED_PARAMETER(ps1_c); for (int i = 0; i < c; i++) { pd[i] = ps2[i] * func(ps1[i], alpha, beta); } } -void sigmoid_exact_m(const float* ps1, float* ps1_c, const float* ps2, float* pd, int c, - const float alpha, const float beta) { +void sigmoid_exact_m(const float* ps1, const float* ps1_c, const float* ps2, float* pd, int c, float alpha, + float beta) { ORT_UNUSED_PARAMETER(ps1_c); ORT_UNUSED_PARAMETER(alpha); ORT_UNUSED_PARAMETER(beta); @@ -390,8 +388,7 @@ void sigmoid_exact_m(const float* ps1, float* ps1_c, const float* ps2, float* pd } } -void tanh_exact_m(const float* ps1, float* ps1_c, const float* ps2, float* pd, int c, - const float alpha, const float beta) { +void tanh_exact_m(const float* ps1, const float* ps1_c, const float* ps2, float* pd, int c, float alpha, float beta) { ORT_UNUSED_PARAMETER(ps1_c); ORT_UNUSED_PARAMETER(alpha); ORT_UNUSED_PARAMETER(beta); @@ -401,7 +398,7 @@ void tanh_exact_m(const float* ps1, float* ps1_c, const float* ps2, float* pd, i } } -void sigmoid(float* pd, int c, const float alpha, const float beta) { +void sigmoid(float* pd, int c, float alpha, float beta) { ORT_UNUSED_PARAMETER(alpha); ORT_UNUSED_PARAMETER(beta); @@ -424,7 +421,7 @@ void sigmoid(float* pd, int c, const float alpha, const float beta) { } } -void tanh(float* pd, int c, const float alpha, const float beta) { +void tanh(float* pd, int c, float alpha, float beta) { ORT_UNUSED_PARAMETER(alpha); ORT_UNUSED_PARAMETER(beta); @@ -447,7 +444,7 @@ void tanh(float* pd, int c, const float alpha, const float beta) { } } -void relu(float* pd, int c, const float alpha, const float beta) { +void relu(float* pd, int c, float alpha, float beta) { ORT_UNUSED_PARAMETER(alpha); ORT_UNUSED_PARAMETER(beta); @@ -457,7 +454,7 @@ void relu(float* pd, int c, const float alpha, const float beta) { } } -void sigmoid_exact(float* pd, int c, const float alpha, const float beta) { +void sigmoid_exact(float* pd, int c, float alpha, float beta) { ORT_UNUSED_PARAMETER(alpha); ORT_UNUSED_PARAMETER(beta); @@ -467,7 +464,7 @@ void sigmoid_exact(float* pd, int c, const float alpha, const float beta) { } } -void tanh_exact(float* pd, int c, const float alpha, const float beta) { +void tanh_exact(float* pd, int c, float alpha, float beta) { ORT_UNUSED_PARAMETER(alpha); ORT_UNUSED_PARAMETER(beta); @@ -477,15 +474,14 @@ void tanh_exact(float* pd, int 
c, const float alpha, const float beta) { } } -void merge_lstm_gates_to_memory(const float* pprev, const float* pi, const float* pf, const float* pg, - float* pcurr, const int c) { +void merge_lstm_gates_to_memory(const float* pprev, const float* pi, const float* pf, const float* pg, float* pcurr, + int c) { for (int i = 0; i < c; i++) { pcurr[i] = pprev[i] * pf[i] + pi[i] * pg[i]; } } -void gru_reset_gate_tanh(const float* ps1, float* ps2, float* pd, const int c, - const float alpha, const float beta) { +void gru_reset_gate_tanh(const float* ps1, float* ps2, float* pd, int c, float alpha, float beta) { ORT_UNUSED_PARAMETER(alpha); ORT_UNUSED_PARAMETER(beta); @@ -508,8 +504,7 @@ void gru_reset_gate_tanh(const float* ps1, float* ps2, float* pd, const int c, } } -void gru_reset_gate_sigmoid(const float* ps1, float* ps2, float* pd, const int c, - const float alpha, const float beta) { +void gru_reset_gate_sigmoid(const float* ps1, float* ps2, float* pd, int c, float alpha, float beta) { ORT_UNUSED_PARAMETER(alpha); ORT_UNUSED_PARAMETER(beta); @@ -532,8 +527,7 @@ void gru_reset_gate_sigmoid(const float* ps1, float* ps2, float* pd, const int c } } -void gru_reset_gate_relu(const float* ps1, float* ps2, float* pd, const int c, - const float alpha, const float beta) { +void gru_reset_gate_relu(const float* ps1, float* ps2, float* pd, int c, float alpha, float beta) { ORT_UNUSED_PARAMETER(alpha); ORT_UNUSED_PARAMETER(beta); @@ -543,16 +537,14 @@ void gru_reset_gate_relu(const float* ps1, float* ps2, float* pd, const int c, } } -void gru_reset_gate_composed(const float* ps1, float* ps2, float* pd, const int c, - std::function func, - const float alpha, const float beta) { +void gru_reset_gate_composed(const float* ps1, float* ps2, float* pd, int c, + std::function func, float alpha, float beta) { for (int i = 0; i < c; i++) { pd[i] = ps1[i] * func(ps2[i], alpha, beta); } } -void gru_output_gate_tanh(float* ph, const float* pz, const float* ps, float* po, const int c, - const float alpha, const float beta) { +void gru_output_gate_tanh(float* ph, const float* pz, const float* ps, float* po, int c, float alpha, float beta) { ORT_UNUSED_PARAMETER(alpha); ORT_UNUSED_PARAMETER(beta); @@ -575,8 +567,7 @@ void gru_output_gate_tanh(float* ph, const float* pz, const float* ps, float* po } } -void gru_output_gate_relu(float* ph, const float* pz, const float* ps, float* po, const int c, - const float alpha, const float beta) { +void gru_output_gate_relu(float* ph, const float* pz, const float* ps, float* po, int c, float alpha, float beta) { ORT_UNUSED_PARAMETER(alpha); ORT_UNUSED_PARAMETER(beta); @@ -586,16 +577,14 @@ void gru_output_gate_relu(float* ph, const float* pz, const float* ps, float* po } } -void gru_output_gate_composed(float* ph, const float* pz, const float* ps, float* po, const int c, - std::function func, - const float alpha, const float beta) { +void gru_output_gate_composed(float* ph, const float* pz, const float* ps, float* po, int c, + std::function func, float alpha, float beta) { for (int i = 0; i < c; i++) { po[i] = (1 - pz[i]) * func(ph[i], alpha, beta) + pz[i] * ps[i]; } } -void gru_output_gate_sigmoid(float* ph, const float* pz, const float* ps, float* po, const int c, - const float alpha, const float beta) { +void gru_output_gate_sigmoid(float* ph, const float* pz, const float* ps, float* po, int c, float alpha, float beta) { ORT_UNUSED_PARAMETER(alpha); ORT_UNUSED_PARAMETER(beta); @@ -618,33 +607,29 @@ void gru_output_gate_sigmoid(float* ph, const float* pz, const float* ps, float* 
} } -void composed_activation_func(float* ps, const int c, - std::function func, - const float alpha, const float beta) { +void composed_activation_func(float* ps, int c, std::function func, float alpha, + float beta) { for (int i = 0; i < c; i++) { ps[i] = func(ps[i], alpha, beta); } } -void composed_lstm_merge_gates_func(float* ps, const int c, - std::function func, - const float alpha, const float beta) { +void composed_lstm_merge_gates_func(float* ps, int c, std::function func, float alpha, + float beta) { for (int i = 0; i < c; i++) { ps[i] = func(ps[i], alpha, beta); } } -void composed_gru_reset_gate_func(float* ps, const int c, - std::function func, - const float alpha, const float beta) { +void composed_gru_reset_gate_func(float* ps, int c, std::function func, float alpha, + float beta) { for (int i = 0; i < c; i++) { ps[i] = func(ps[i], alpha, beta); } } -void composed_gru_output_gate_func(float* ps, const int c, - std::function func, - const float alpha, const float beta) { +void composed_gru_output_gate_func(float* ps, int c, std::function func, float alpha, + float beta) { for (int i = 0; i < c; i++) { ps[i] = func(ps[i], alpha, beta); } @@ -661,42 +646,39 @@ ActivationFuncPtr ActivationFuncByName(const std::string& func) { return relu; if (func == "affine") - return [](float* ps, const int c, const float alpha, const float beta) { - composed_activation_func(ps, c, Affine, alpha, beta); - }; + return + [](float* ps, int c, float alpha, float beta) { composed_activation_func(ps, c, Affine, alpha, beta); }; if (func == "leakyrelu") - return [](float* ps, const int c, const float alpha, const float beta) { + return [](float* ps, int c, float alpha, float beta) { composed_activation_func(ps, c, LeakyRelu, alpha, beta); }; if (func == "thresholdedrelu") - return [](float* ps, const int c, const float alpha, const float beta) { + return [](float* ps, int c, float alpha, float beta) { composed_activation_func(ps, c, ThresholdedRelu, alpha, beta); }; if (func == "scaledtanh") - return [](float* ps, const int c, const float alpha, const float beta) { + return [](float* ps, int c, float alpha, float beta) { composed_activation_func(ps, c, ScaledTanh, alpha, beta); }; if (func == "hardsigmoid") - return [](float* ps, const int c, const float alpha, const float beta) { + return [](float* ps, int c, float alpha, float beta) { composed_activation_func(ps, c, HardSigmoid, alpha, beta); }; if (func == "elu") - return [](float* ps, const int c, const float alpha, const float beta) { - composed_activation_func(ps, c, Elu, alpha, beta); - }; + return [](float* ps, int c, float alpha, float beta) { composed_activation_func(ps, c, Elu, alpha, beta); }; if (func == "softsign") - return [](float* ps, const int c, const float alpha, const float beta) { + return [](float* ps, int c, float alpha, float beta) { composed_activation_func(ps, c, Softsign, alpha, beta); }; if (func == "softplus") - return [](float* ps, const int c, const float alpha, const float beta) { + return [](float* ps, int c, float alpha, float beta) { composed_activation_func(ps, c, Softplus, alpha, beta); }; @@ -714,50 +696,42 @@ LstmMergeGatesFuncPtr LstmMergeGatesFuncByName(const std::string& func) { return relu_m; if (func == "affine") - return [](const float* ps1, float* ps1_c, const float* ps2, float* ps3, const int c, - const float alpha, const float beta) { + return [](const float* ps1, float* ps1_c, const float* ps2, float* ps3, int c, float alpha, float beta) { composed_m(ps1, ps1_c, ps2, ps3, c, Affine, alpha, beta); }; if 
(func == "leakyrelu") - return [](const float* ps1, float* ps1_c, const float* ps2, float* ps3, const int c, - const float alpha, const float beta) { + return [](const float* ps1, float* ps1_c, const float* ps2, float* ps3, int c, float alpha, float beta) { composed_m(ps1, ps1_c, ps2, ps3, c, LeakyRelu, alpha, beta); }; if (func == "thresholdedrelu") - return [](const float* ps1, float* ps1_c, const float* ps2, float* ps3, const int c, - const float alpha, const float beta) { + return [](const float* ps1, float* ps1_c, const float* ps2, float* ps3, int c, float alpha, float beta) { composed_m(ps1, ps1_c, ps2, ps3, c, ThresholdedRelu, alpha, beta); }; if (func == "scaledtanh") - return [](const float* ps1, float* ps1_c, const float* ps2, float* ps3, const int c, - const float alpha, const float beta) { + return [](const float* ps1, float* ps1_c, const float* ps2, float* ps3, int c, float alpha, float beta) { composed_m(ps1, ps1_c, ps2, ps3, c, ScaledTanh, alpha, beta); }; if (func == "hardsigmoid") - return [](const float* ps1, float* ps1_c, const float* ps2, float* ps3, const int c, - const float alpha, const float beta) { + return [](const float* ps1, float* ps1_c, const float* ps2, float* ps3, int c, float alpha, float beta) { composed_m(ps1, ps1_c, ps2, ps3, c, HardSigmoid, alpha, beta); }; if (func == "elu") - return [](const float* ps1, float* ps1_c, const float* ps2, float* ps3, const int c, - const float alpha, const float beta) { + return [](const float* ps1, float* ps1_c, const float* ps2, float* ps3, int c, float alpha, float beta) { composed_m(ps1, ps1_c, ps2, ps3, c, Elu, alpha, beta); }; if (func == "softsign") - return [](const float* ps1, float* ps1_c, const float* ps2, float* ps3, const int c, - const float alpha, const float beta) { + return [](const float* ps1, float* ps1_c, const float* ps2, float* ps3, int c, float alpha, float beta) { composed_m(ps1, ps1_c, ps2, ps3, c, Softsign, alpha, beta); }; if (func == "softplus") - return [](const float* ps1, float* ps1_c, const float* ps2, float* ps3, const int c, - const float alpha, const float beta) { + return [](const float* ps1, float* ps1_c, const float* ps2, float* ps3, int c, float alpha, float beta) { composed_m(ps1, ps1_c, ps2, ps3, c, Softplus, alpha, beta); }; @@ -775,42 +749,42 @@ GruResetGateFuncPtr GruResetGateFuncByName(const std::string& func) { return gru_reset_gate_relu; if (func == "affine") - return [](const float* ps1, float* ps2, float* ps3, const int c, const float alpha, const float beta) { + return [](const float* ps1, float* ps2, float* ps3, int c, float alpha, float beta) { gru_reset_gate_composed(ps1, ps2, ps3, c, Affine, alpha, beta); }; if (func == "leakyrelu") - return [](const float* ps1, float* ps2, float* ps3, const int c, const float alpha, const float beta) { + return [](const float* ps1, float* ps2, float* ps3, int c, float alpha, float beta) { gru_reset_gate_composed(ps1, ps2, ps3, c, LeakyRelu, alpha, beta); }; if (func == "thresholdedrelu") - return [](const float* ps1, float* ps2, float* ps3, const int c, const float alpha, const float beta) { + return [](const float* ps1, float* ps2, float* ps3, int c, float alpha, float beta) { gru_reset_gate_composed(ps1, ps2, ps3, c, ThresholdedRelu, alpha, beta); }; if (func == "scaledtanh") - return [](const float* ps1, float* ps2, float* ps3, const int c, const float alpha, const float beta) { + return [](const float* ps1, float* ps2, float* ps3, int c, float alpha, float beta) { gru_reset_gate_composed(ps1, ps2, ps3, c, ScaledTanh, alpha, beta); 
}; if (func == "hardsigmoid") - return [](const float* ps1, float* ps2, float* ps3, const int c, const float alpha, const float beta) { + return [](const float* ps1, float* ps2, float* ps3, int c, float alpha, float beta) { gru_reset_gate_composed(ps1, ps2, ps3, c, HardSigmoid, alpha, beta); }; if (func == "elu") - return [](const float* ps1, float* ps2, float* ps3, const int c, const float alpha, const float beta) { + return [](const float* ps1, float* ps2, float* ps3, int c, float alpha, float beta) { gru_reset_gate_composed(ps1, ps2, ps3, c, Elu, alpha, beta); }; if (func == "softsign") - return [](const float* ps1, float* ps2, float* ps3, const int c, const float alpha, const float beta) { + return [](const float* ps1, float* ps2, float* ps3, int c, float alpha, float beta) { gru_reset_gate_composed(ps1, ps2, ps3, c, Softsign, alpha, beta); }; if (func == "softplus") - return [](const float* ps1, float* ps2, float* ps3, const int c, const float alpha, const float beta) { + return [](const float* ps1, float* ps2, float* ps3, int c, float alpha, float beta) { gru_reset_gate_composed(ps1, ps2, ps3, c, Softplus, alpha, beta); }; @@ -828,50 +802,42 @@ GruOutputGateFuncPtr GruOutputGateFuncByName(const std::string& func) { return gru_output_gate_relu; if (func == "affine") - return [](float* ps1, const float* ps2, const float* ph, float* ps3, const int c, - const float alpha, const float beta) { + return [](float* ps1, const float* ps2, const float* ph, float* ps3, int c, float alpha, float beta) { gru_output_gate_composed(ps1, ps2, ph, ps3, c, Affine, alpha, beta); }; if (func == "leakyrelu") - return [](float* ps1, const float* ps2, const float* ph, float* ps3, const int c, - const float alpha, const float beta) { + return [](float* ps1, const float* ps2, const float* ph, float* ps3, int c, float alpha, float beta) { gru_output_gate_composed(ps1, ps2, ph, ps3, c, LeakyRelu, alpha, beta); }; if (func == "thresholdedrelu") - return [](float* ps1, const float* ps2, const float* ph, float* ps3, const int c, - const float alpha, const float beta) { + return [](float* ps1, const float* ps2, const float* ph, float* ps3, int c, float alpha, float beta) { gru_output_gate_composed(ps1, ps2, ph, ps3, c, ThresholdedRelu, alpha, beta); }; if (func == "scaledtanh") - return [](float* ps1, const float* ps2, const float* ph, float* ps3, const int c, - const float alpha, const float beta) { + return [](float* ps1, const float* ps2, const float* ph, float* ps3, int c, float alpha, float beta) { gru_output_gate_composed(ps1, ps2, ph, ps3, c, ScaledTanh, alpha, beta); }; if (func == "hardsigmoid") - return [](float* ps1, const float* ps2, const float* ph, float* ps3, const int c, - const float alpha, const float beta) { + return [](float* ps1, const float* ps2, const float* ph, float* ps3, int c, float alpha, float beta) { gru_output_gate_composed(ps1, ps2, ph, ps3, c, HardSigmoid, alpha, beta); }; if (func == "elu") - return [](float* ps1, const float* ps2, const float* ph, float* ps3, const int c, - const float alpha, const float beta) { + return [](float* ps1, const float* ps2, const float* ph, float* ps3, int c, float alpha, float beta) { gru_output_gate_composed(ps1, ps2, ph, ps3, c, Elu, alpha, beta); }; if (func == "softsign") - return [](float* ps1, const float* ps2, const float* ph, float* ps3, const int c, - const float alpha, const float beta) { + return [](float* ps1, const float* ps2, const float* ph, float* ps3, int c, float alpha, float beta) { gru_output_gate_composed(ps1, ps2, ph, ps3, c, 
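The long rnn_helpers.cc block above reflows the `*FuncByName` dispatchers; the pattern they rely on is that a capture-less lambda converts implicitly to the plain function-pointer typedefs declared in rnn_helpers.h. A reduced illustration of that dispatch style (the typedef and activation bodies here are simplified stand-ins, not the ORT definitions):

```cpp
#include <algorithm>
#include <string>

// Simplified stand-in for an activation function pointer:
// apply an element-wise function with (alpha, beta) parameters.
using ActivationFuncPtr = void (*)(float* data, int n, float alpha, float beta);

ActivationFuncPtr ActivationByName(const std::string& name) {
  if (name == "relu")
    return [](float* data, int n, float, float) {
      for (int i = 0; i < n; ++i) data[i] = std::max(0.f, data[i]);
    };
  if (name == "leakyrelu")
    return [](float* data, int n, float alpha, float) {
      for (int i = 0; i < n; ++i) data[i] = data[i] >= 0.f ? data[i] : alpha * data[i];
    };
  // Capture-less lambdas convert to ActivationFuncPtr; a capturing lambda would not.
  return nullptr;
}
```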
Softsign, alpha, beta); }; if (func == "softplus") - return [](float* ps1, const float* ps2, const float* ph, float* ps3, const int c, - const float alpha, const float beta) { + return [](float* ps1, const float* ps2, const float* ph, float* ps3, int c, float alpha, float beta) { gru_output_gate_composed(ps1, ps2, ph, ps3, c, Softplus, alpha, beta); }; diff --git a/onnxruntime/core/providers/cpu/rnn/rnn_helpers.h b/onnxruntime/core/providers/cpu/rnn/rnn_helpers.h index 2fb787b476ca5..2e3e5f88d72ec 100644 --- a/onnxruntime/core/providers/cpu/rnn/rnn_helpers.h +++ b/onnxruntime/core/providers/cpu/rnn/rnn_helpers.h @@ -40,14 +40,15 @@ enum Direction { inline Direction MakeDirection(const std::string& direction) { if (direction == "forward") { return kForward; - } else if (direction == "reverse") { + } + if (direction == "reverse") { return kReverse; - } else if (direction == "bidirectional") { + } + if (direction == "bidirectional") { return kBidirectional; - } else { + } ORT_THROW("Invalid 'direction' argument of '", direction, "'. Must be one of 'forward', 'reverse', or 'bidirectional'."); - } } /** Allocate a unique_ptr using allocator_, and return a span to the allocated memory so usage is safe @@ -237,7 +238,7 @@ void ExecuteLambdaInParallel(const std::string& name, TLambda lambda, int max, i }); } - int totalTasks = (int)max / (step > 0 ? step : 1) + (max % step > 0 ? 1 : 0); + int totalTasks = max / (step > 0 ? step : 1) + (max % step > 0 ? 1 : 0); while (done != totalTasks) ; #endif @@ -275,52 +276,53 @@ class ActivationFuncs { namespace deepcpu { using AddBiasIntoFuncPtr = void (*)(const float*, float*, const int); -using ClipWithBiasFuncPtr = void (*)(const float, const float*, float*, const int); -using ActivationFuncPtr = void (*)(float*, const int, const float, const float); -using ActivationFuncBPtr = void (*)(const float*, float*, const int, const float, const float); -using LstmMergeGatesFuncPtr = void (*)(const float*, float*, const float*, float*, const int, const float, const float); -using GruResetGateFuncPtr = void (*)(const float*, float*, float*, const int, const float, const float); -using GruOutputGateFuncPtr = void (*)(float*, const float*, const float*, float*, const int, const float, const float); +using ClipWithBiasFuncPtr = void (*)(float, const float*, float*, const int); +using ActivationFuncPtr = void (*)(float*, int, float, float); +using ActivationFuncBPtr = void (*)(const float*, float*, int, float, float); +using LstmMergeGatesFuncPtr = void (*)(const float*, float*, const float*, float*, int, float, float); +using GruResetGateFuncPtr = void (*)(const float*, float*, float*, int, float, float); +using GruOutputGateFuncPtr = void (*)(float*, const float*, const float*, float*, int, float, float); ActivationFuncPtr ActivationFuncByName(const std::string& func); LstmMergeGatesFuncPtr LstmMergeGatesFuncByName(const std::string& func); GruResetGateFuncPtr GruResetGateFuncByName(const std::string& func); GruOutputGateFuncPtr GruOutputGateFuncByName(const std::string& func); -void add_bias_into_ignore(const float* ignored, float* pd, const int c); -void add_bias_into(const float* ps, float* pd, const int c); -void clip(const float b, float* pd, const int c); -void clip_add_bias(const float b, const float* pb, float* pd, const int c); -void clip_ignore_bias(const float b, const float* pb, float* pd, const int c); -void sigmoid_m(const float* ps1, float* ps1_c, const float* ps2, float* pd, int c, const float alpha, const float beta); -void tanh_m(const float* ps1, 
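Dropping const from by-value parameters in the typedefs and declarations above is a declaration-only cleanup: top-level cv-qualifiers on parameters do not participate in a function's type, so callers and definitions are unaffected. A small sketch of that standard C++ rule (not project code):
```
#include <type_traits>

// Two declarations that differ only in top-level const on by-value parameters...
void scale(float* data, const int count, const float factor);
void scale(float* data, int count, float factor);  // ...redeclare the same function.

// A pointer typedef can drop the const freely: the function types are identical.
using ScaleFn = void (*)(float*, int, float);
static_assert(std::is_same_v<void(float*, const int, const float), void(float*, int, float)>,
              "top-level const on by-value parameters does not affect the function type");

// One definition satisfies both declarations.
void scale(float* data, int count, float factor) {
  for (int i = 0; i < count; ++i) data[i] *= factor;
}

int main() {
  float v[3] = {1.f, 2.f, 3.f};
  ScaleFn fn = &scale;
  fn(v, 3, 2.f);
  return 0;
}
```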
float* ps1_c, const float* ps2, float* pd, int c, const float alpha, const float beta); -void relu_m(const float* ps1, float* ps1_c, const float* ps2, float* pd, int c, const float alpha, const float beta); -void sigmoid_exact_m(const float* ps1, float* ps1_c, const float* ps2, float* pd, int c, const float alpha, const float beta); -void tanh_exact_m(const float* ps1, float* ps1_c, const float* ps2, float* pd, int c, const float alpha, const float beta); -void sigmoid(float* pd, int c, const float alpha, const float beta); -void tanh(float* pd, int c, const float alpha, const float beta); -void relu(float* pd, int c, const float alpha, const float beta); -void sigmoid_exact(float* pd, int c, const float alpha, const float beta); -void tanh_exact(float* pd, int c, const float alpha, const float beta); -void merge_lstm_gates_to_memory(const float* pprev, const float* pi, const float* pf, const float* pg, float* pcurr, const int c); -void gru_reset_gate_tanh(const float* ps1, float* ps2, float* pd, const int c, const float alpha, const float beta); -void gru_reset_gate_sigmoid(const float* ps1, float* ps2, float* pd, const int c, const float alpha, const float beta); -void gru_reset_gate_relu(const float* ps1, float* ps2, float* pd, const int c, const float alpha, const float beta); -void gru_output_gate_tanh(float* ph, const float* pz, const float* ps, float* po, const int c, const float alpha, const float beta); -void gru_output_gate_sigmoid(float* ph, const float* pz, const float* ps, float* po, const int c, const float alpha, const float beta); -void gru_output_gate_relu(float* ph, const float* pz, const float* ps, float* po, const int c, const float alpha, const float beta); - -inline void elementwise_product(const float* op1, const float* op2, float* dest, const int size) { +void add_bias_into_ignore(const float* ignored, const float* pd, int c); +void add_bias_into(const float* ps, float* pd, int c); +void clip(float b, float* pd, int c); +void clip_add_bias(float b, const float* pb, float* pd, int c); +void clip_ignore_bias(float b, const float* pb, float* pd, int c); +void sigmoid_m(const float* ps1, float* ps1_c, const float* ps2, float* pd, int c, float alpha, float beta); +void tanh_m(const float* ps1, float* ps1_c, const float* ps2, float* pd, int c, float alpha, float beta); +void relu_m(const float* ps1, const float* ps1_c, const float* ps2, float* pd, int c, float alpha, float beta); +void sigmoid_exact_m(const float* ps1, const float* ps1_c, const float* ps2, float* pd, int c, float alpha, float beta); +void tanh_exact_m(const float* ps1, const float* ps1_c, const float* ps2, float* pd, int c, float alpha, float beta); +void sigmoid(float* pd, int c, float alpha, float beta); +void tanh(float* pd, int c, float alpha, float beta); +void relu(float* pd, int c, float alpha, float beta); +void sigmoid_exact(float* pd, int c, float alpha, float beta); +void tanh_exact(float* pd, int c, float alpha, float beta); +void merge_lstm_gates_to_memory(const float* pprev, const float* pi, const float* pf, const float* pg, float* pcurr, + int c); +void gru_reset_gate_tanh(const float* ps1, float* ps2, float* pd, int c, float alpha, float beta); +void gru_reset_gate_sigmoid(const float* ps1, float* ps2, float* pd, int c, float alpha, float beta); +void gru_reset_gate_relu(const float* ps1, const float* ps2, float* pd, int c, float alpha, float beta); +void gru_output_gate_tanh(float* ph, const float* pz, const float* ps, float* po, int c, float alpha, float beta); +void 
gru_output_gate_sigmoid(float* ph, const float* pz, const float* ps, float* po, int c, float alpha, float beta); +void gru_output_gate_relu(const float* ph, const float* pz, const float* ps, float* po, int c, float alpha, float beta); + +inline void elementwise_product(const float* op1, const float* op2, float* dest, int size) { for (int i = 0; i < size; i++) dest[i] += op1[i] * op2[i]; } -inline void elementwise_sum1(const float* src, float* dest, const int size) { +inline void elementwise_sum1(const float* src, float* dest, int size) { for (int i = 0; i < size; i++) dest[i] += src[i]; } -inline void elementwise_sum2(const float* src1, const float* src2, float* dest, const int size) { +inline void elementwise_sum2(const float* src1, const float* src2, float* dest, int size) { for (int i = 0; i < size; i++) dest[i] += src1[i] + src2[i]; } diff --git a/onnxruntime/core/providers/cpu/symbols.txt b/onnxruntime/core/providers/cpu/symbols.txt index 6e9577cfedef9..6369f28d80853 100644 --- a/onnxruntime/core/providers/cpu/symbols.txt +++ b/onnxruntime/core/providers/cpu/symbols.txt @@ -17,6 +17,7 @@ OrtCreateEnv OrtCreateEnvWithCustomLogger OrtCreateRunOptions OrtCreateSession +OrtCreateSessionFromArray OrtCreateSessionOptions OrtCreateTensorAsOrtValue OrtCreateTensorTypeAndShapeInfo @@ -33,16 +34,16 @@ OrtEnableProfiling OrtEnableSequentialExecution OrtFillStringTensor OrtGetDimensions +OrtGetDimensionsCount OrtGetErrorCode OrtGetErrorMessage -OrtGetNumOfDimensions OrtGetStringTensorContent OrtGetStringTensorDataLength OrtGetTensorElementType OrtGetTensorMemSizeInBytesFromTensorProto OrtGetTensorMutableData -OrtGetTensorShapeAndType OrtGetTensorShapeElementCount +OrtGetTensorTypeAndShape OrtGetTypeInfo OrtGetValue OrtGetValueCount @@ -74,7 +75,7 @@ OrtSessionGetOutputCount OrtSessionGetOutputName OrtSessionGetOutputTypeInfo OrtSessionOptionsAppendExecutionProvider_CPU -OrtSetDims +OrtSetDimensions OrtSetSessionLogId OrtSetSessionLogVerbosityLevel OrtSetSessionGraphOptimizationLevel diff --git a/onnxruntime/core/providers/cpu/tensor/compress.cc b/onnxruntime/core/providers/cpu/tensor/compress.cc index 9926126a4146b..7231e04807593 100644 --- a/onnxruntime/core/providers/cpu/tensor/compress.cc +++ b/onnxruntime/core/providers/cpu/tensor/compress.cc @@ -63,13 +63,17 @@ Status Compress::Compute(OpKernelContext* ctx) const { axes_left_stride *= input_dimensions[i]; } - for (int i = static_cast(axis_ + 1); i < rank; ++i) { + for (size_t i = static_cast(axis_ + 1); i < rank; ++i) { axes_right_stride *= input_dimensions[i]; } int64_t axes_included_right_stride = axes_right_stride * input_dimensions[axis_]; int64_t axes_included_right_stride_bytes = axes_included_right_stride * element_bytes; - int64_t axes_right_stride_bytes = axes_right_stride * element_bytes; - + ORT_ENFORCE(axes_right_stride >= 0 && + static_cast(axes_right_stride) < std::numeric_limits::max()); + size_t axes_right_stride_bytes = 0; + if (!IAllocator::CalcMemSizeForArray(static_cast(axes_right_stride), element_bytes, + &axes_right_stride_bytes)) + return Status(ONNXRUNTIME, FAIL, "size overflow"); for (int i = 0; i < axes_left_stride; ++i) { for (int j = 0; j < valid_condition_length; ++j) { if (!condition_data[j]) { diff --git a/onnxruntime/core/providers/cpu/tensor/concat.cc b/onnxruntime/core/providers/cpu/tensor/concat.cc index 91721bc0ac87f..06b7bd9a47ad2 100644 --- a/onnxruntime/core/providers/cpu/tensor/concat.cc +++ b/onnxruntime/core/providers/cpu/tensor/concat.cc @@ -21,11 +21,11 @@ Status 
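The compress.cc hunk above replaces a raw stride-times-element-size multiplication with a checked computation that fails the kernel on overflow rather than proceeding with a wrapped byte count. A hedged sketch of the idea using a hypothetical helper (the real code goes through IAllocator::CalcMemSizeForArray, whose exact signature is not reproduced here):
```
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper: multiply two sizes, reporting failure instead of wrapping.
// (Illustrative only; the actual kernel uses IAllocator::CalcMemSizeForArray.)
bool CheckedMulSize(size_t count, size_t element_bytes, size_t* out) {
  if (element_bytes != 0 && count > std::numeric_limits<size_t>::max() / element_bytes)
    return false;  // count * element_bytes would overflow size_t
  *out = count * element_bytes;
  return true;
}

// Usage mirroring the hunk: validate the signed stride, then compute bytes safely.
int ComputeStrideBytes(int64_t axes_right_stride, size_t element_bytes, size_t* stride_bytes) {
  if (axes_right_stride < 0) return -1;                        // negative stride is invalid
  if (!CheckedMulSize(static_cast<size_t>(axes_right_stride),  // checked multiply
                      element_bytes, stride_bytes))
    return -1;                                                 // size overflow
  return 0;
}
```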
ConcatBase::PrepareForCompute(OpKernelContext* ctx, int input_count, Prep const size_t inputs_0_rank = inputs_0_dims.size(); ORT_RETURN_IF_NOT(inputs_0_rank > 0, "Cannot concatenate scalars"); - auto axis = HandleNegativeAxis(axis_, inputs_0.Shape().NumDimensions()); + uint64_t axis = static_cast(HandleNegativeAxis(axis_, inputs_0.Shape().NumDimensions())); // cache num of elements in tensor for later use // as it's expensive to call Size() on TensorShape over and over - std::vector tensor_num_elements(input_count); + std::vector tensor_num_elements(static_cast(input_count)); // Ensure all of the non concatenated axes match each other for (int index = 1; index < input_count; index++) { size_t num_elements = 1; @@ -37,7 +37,7 @@ Status ConcatBase::PrepareForCompute(OpKernelContext* ctx, int input_count, Prep ORT_ENFORCE(inputs_n_rank == inputs_0_rank, "Ranks of input data are different, cannot concatenate them, " "expected rank: ", std::to_string(inputs_0_rank), " got: ", std::to_string(inputs_n_rank)); // Ensure all the other (non-concat) axes match - for (int axis_index = 0; axis_index < inputs_0_rank; ++axis_index) { + for (size_t axis_index = 0; axis_index < inputs_0_rank; ++axis_index) { num_elements *= inputs_n_dims[axis_index]; if (axis_index == axis) continue; @@ -59,7 +59,7 @@ Status ConcatBase::PrepareForCompute(OpKernelContext* ctx, int input_count, Prep // Calculate the shape of the output tensor std::vector dims(inputs_0_rank); size_t num_elements = 1; // cache size of the first input along the way - for (int dimension_index = 0; dimension_index < inputs_0_rank; dimension_index++) { + for (size_t dimension_index = 0; dimension_index < inputs_0_rank; dimension_index++) { dims[dimension_index] = inputs_0_dims[dimension_index]; num_elements *= inputs_0_dims[dimension_index]; } @@ -78,8 +78,7 @@ Status ConcatBase::PrepareForCompute(OpKernelContext* ctx, int input_count, Prep // The output_axis_pitch is the number of elements to add to move to the next split axis in the output p.output_axis_pitch = 1; - for (auto i = int64_t(inputs_0_rank); i-- > axis;) - p.output_axis_pitch *= dims[i]; + for (size_t i = inputs_0_rank; i-- > axis;) p.output_axis_pitch *= dims[i]; for (int input_index = 0; input_index < input_count; input_index++) { const Tensor* data_n_ptr = ctx->Input(input_index); @@ -90,8 +89,7 @@ Status ConcatBase::PrepareForCompute(OpKernelContext* ctx, int input_count, Prep // The input_axis_pitch is the number of elements to add to move to the next split axis in the input int64_t input_axis_pitch = 1; const auto& data_dims = data_n.Shape().GetDims(); - for (int i = static_cast(inputs_0_rank); i-- > axis;) - input_axis_pitch *= data_dims[i]; + for (size_t i = inputs_0_rank; i-- > axis;) input_axis_pitch *= data_dims[i]; p.inputs.push_back({&data_n, tensor_num_elements[input_index], input_axis_pitch}); } @@ -125,7 +123,7 @@ Status Concat::Compute(OpKernelContext* ctx) const { // Copy the data across. 
For every 'input_axis_pitch' values copied, we move over by the 'output_axis_pitch' uint8_t* output = static_cast(p.output_tensor->MutableDataRaw()); - for (int idxCopy = 0; idxCopy < input_size / input_axis_pitch; ++idxCopy) { + for (size_t idxCopy = 0; idxCopy < input_size / input_axis_pitch; ++idxCopy) { if (is_string_type) { for (int idxItem = 0; idxItem < input_axis_pitch; ++idxItem) reinterpret_cast(output)[output_offset + idxCopy * p.output_axis_pitch + idxItem] = diff --git a/onnxruntime/core/providers/cpu/tensor/concat.h b/onnxruntime/core/providers/cpu/tensor/concat.h index def1e68cc0509..f7df2e3d169de 100644 --- a/onnxruntime/core/providers/cpu/tensor/concat.h +++ b/onnxruntime/core/providers/cpu/tensor/concat.h @@ -25,7 +25,7 @@ class ConcatBase { int64_t axis_pitch; }; std::vector inputs; - size_t output_num_elements; + int64_t output_num_elements; int64_t output_axis_pitch; Tensor* output_tensor; }; diff --git a/onnxruntime/core/providers/cpu/tensor/gather.cc b/onnxruntime/core/providers/cpu/tensor/gather.cc index ed8b52524dd8e..2a71d357fe633 100644 --- a/onnxruntime/core/providers/cpu/tensor/gather.cc +++ b/onnxruntime/core/providers/cpu/tensor/gather.cc @@ -51,7 +51,8 @@ Status GatherCopyData(const Tensor* indices_tensor, const uint8_t* src_base, uin #pragma omp parallel for #endif for (int64_t index = 0; index < M * N; ++index) { - int64_t batch = index / N, i = index % N; + int64_t batch = index / N; + int64_t i = index % N; const int64_t src_offset_batch = batch * data_batch_bytes; const int64_t dst_offset_batch = batch * gathered_batch_bytes; @@ -93,7 +94,8 @@ Status Gather::Compute(OpKernelContext* context) const { if (Tind_type == DataTypeImpl::GetType()) { return GatherCopyData(p.indices_tensor, src_base, dst_base, is_string_type, element_bytes, block_size, M, N, data_batch_bytes, gathered_batch_bytes, input_data_shape, p.axis); - } else if (Tind_type == DataTypeImpl::GetType()) { + } + if (Tind_type == DataTypeImpl::GetType()) { return GatherCopyData(p.indices_tensor, src_base, dst_base, is_string_type, element_bytes, block_size, M, N, data_batch_bytes, gathered_batch_bytes, input_data_shape, p.axis); } diff --git a/onnxruntime/core/providers/cpu/tensor/pad.cc b/onnxruntime/core/providers/cpu/tensor/pad.cc index 896c62b830394..e470031337d03 100644 --- a/onnxruntime/core/providers/cpu/tensor/pad.cc +++ b/onnxruntime/core/providers/cpu/tensor/pad.cc @@ -88,25 +88,30 @@ static void ReshapePads(const std::vector& src_pad, size_t src_dim_coun } template <> -Status Pad::Compute(OpKernelContext* ctx) const { +Status PadCpuImpl(OpKernelContext* ctx, + const std::vector& pads, + const std::vector& slices, + const Mode& mode, + float value) { auto& input_tensor = *ctx->Input(0); std::vector output_dims(input_tensor.Shape().GetDims()); size_t dimension_count = output_dims.size(); + // make copy of raw_pads as it may be mutated below ORT_ENFORCE(dimension_count > 0, "Input tensor has no dimensions"); - ORT_ENFORCE(dimension_count * 2 == pads_.size(), "'pads' attribute has wrong number of values"); + ORT_ENFORCE(dimension_count * 2 == pads.size(), "'pads' has wrong number of values"); // Reshape input dims std::vector reshaped_input_dims; - FlattenInnerShape(output_dims, pads_, slices_, reshaped_input_dims); + FlattenInnerShape(output_dims, pads, slices, reshaped_input_dims); // Reshape padding size_t new_dims_count = reshaped_input_dims.size(); size_t inner_axis = new_dims_count - 1; size_t inner_no_pad_size = reshaped_input_dims[inner_axis] / output_dims[inner_axis]; 
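The concat hunks above keep the same pitch arithmetic while moving loop indices to size_t: output_axis_pitch is the number of output elements spanned by one step along the concat axis, each input has its own input_axis_pitch, and for every input_axis_pitch elements copied the write position advances by output_axis_pitch. A simplified sketch with a worked example, assuming contiguous row-major float tensors:
```
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Number of elements covered by dims[axis..rank): the "pitch" of the concat axis.
static int64_t AxisPitch(const std::vector<int64_t>& dims, size_t axis) {
  int64_t pitch = 1;
  for (size_t i = dims.size(); i-- > axis;) pitch *= dims[i];
  return pitch;
}

// Copy one row-major input into the concatenated output. For every `input_pitch`
// elements copied, the write position advances by `output_pitch`, which interleaves
// the inputs correctly along the concat axis.
static void CopyOneInput(const float* input, int64_t input_size, int64_t input_pitch,
                         float* output, int64_t output_pitch, int64_t output_offset) {
  for (int64_t copy = 0; copy < input_size / input_pitch; ++copy) {
    std::memcpy(output + output_offset + copy * output_pitch,
                input + copy * input_pitch,
                static_cast<size_t>(input_pitch) * sizeof(float));
  }
}

int main() {
  // Concatenate shape {2,3} and shape {2,5} along axis 1 into shape {2,8}.
  const std::vector<int64_t> dims_a{2, 3}, dims_b{2, 5}, dims_out{2, 8};
  const float a[6] = {0, 1, 2, 10, 11, 12};
  const float b[10] = {3, 4, 5, 6, 7, 13, 14, 15, 16, 17};
  float out[16] = {};

  const int64_t out_pitch = AxisPitch(dims_out, 1);              // 8
  CopyOneInput(a, 6, AxisPitch(dims_a, 1), out, out_pitch, 0);   // A's rows land at column 0
  CopyOneInput(b, 10, AxisPitch(dims_b, 1), out, out_pitch, 3);  // B's rows land after A's 3 columns
  // out is now {0..7, 10..17}: the two inputs concatenated along axis 1.
  return 0;
}
```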
std::vector reshaped_pad(2 * new_dims_count), reshaped_slice(2 * new_dims_count); - ReshapePads(pads_, dimension_count, new_dims_count, inner_no_pad_size, reshaped_pad); - ReshapePads(slices_, dimension_count, new_dims_count, inner_no_pad_size, reshaped_slice); + ReshapePads(pads, dimension_count, new_dims_count, inner_no_pad_size, reshaped_pad); + ReshapePads(slices, dimension_count, new_dims_count, inner_no_pad_size, reshaped_slice); std::vector reshaped_output_dims = reshaped_input_dims; std::vector input_starts; @@ -122,7 +127,7 @@ Status Pad::Compute(OpKernelContext* ctx) const { } for (size_t i = 0; i < dimension_count; i++) { - output_dims[i] += pads_[i] + pads_[i + dimension_count] + slices_[i] + slices_[i + dimension_count]; + output_dims[i] += pads[i] + pads[i + dimension_count] + slices[i] + slices[i + dimension_count]; } TensorShape output_shape(output_dims); @@ -142,7 +147,7 @@ Status Pad::Compute(OpKernelContext* ctx) const { ExtentAxisCounters input_counters(input_extents); - switch (mode_) { + switch (mode) { case Mode::Constant: // Loop over the output tensor, writing out padding between the blocks of copied data // On loop entry, 'pad' is already set to the first continuous block of padding, and @@ -155,8 +160,8 @@ Status Pad::Compute(OpKernelContext* ctx) const { int64_t prePad = reshaped_pad[inner_axis]; int64_t postPad = reshaped_pad[inner_axis + new_dims_count]; - PadAxisConstant(axisStart - prePad, value_, prePad); - PadAxisConstant(output, value_, postPad); + PadAxisConstant(axisStart - prePad, value, prePad); + PadAxisConstant(output, value, postPad); output += postPad; alignSkip = prePad; } @@ -166,8 +171,8 @@ Status Pad::Compute(OpKernelContext* ctx) const { float* axisStart = output - inner_pitch * input_extents[input_counters.Axis()]; int64_t prePad = reshaped_pad[input_counters.Axis()]; int64_t postPad = reshaped_pad[input_counters.Axis() + new_dims_count]; - PadAxisConstant(axisStart - prePad * inner_pitch, value_, prePad * inner_pitch); - PadAxisConstant(output, value_, postPad * inner_pitch); + PadAxisConstant(axisStart - prePad * inner_pitch, value, prePad * inner_pitch); + PadAxisConstant(output, value, postPad * inner_pitch); output += inner_pitch * postPad; alignSkip += inner_pitch * prePad; } @@ -239,4 +244,9 @@ Status Pad::Compute(OpKernelContext* ctx) const { return Status::OK(); } + +template <> +Status Pad::Compute(OpKernelContext* ctx) const { + return PadCpuImpl(ctx, pads_, slices_, mode_, value_); +} }; // namespace onnxruntime diff --git a/onnxruntime/core/providers/cpu/tensor/pad.h b/onnxruntime/core/providers/cpu/tensor/pad.h index ebeca39f48813..a511a2f7176af 100644 --- a/onnxruntime/core/providers/cpu/tensor/pad.h +++ b/onnxruntime/core/providers/cpu/tensor/pad.h @@ -6,9 +6,15 @@ namespace onnxruntime { +enum class Mode : int { + Constant = 0, + Reflect, + Edge +}; + class PadBase { protected: - PadBase(const OpKernelInfo& info) : value_(info.GetAttrOrDefault("value", 0.f)) { + PadBase(const OpKernelInfo& info, bool dynamic = false) : value_(info.GetAttrOrDefault("value", 0.f)) { std::string mode; if (info.GetAttr("mode", &mode).IsOK()) { if (mode == "constant") @@ -20,32 +26,30 @@ class PadBase { else ORT_THROW("Invalid 'mode' attribute value"); } - if (!info.GetAttrs("pads", pads_).IsOK()) - ORT_THROW("Invalid 'pads' attribute value"); - - // Separate out any negative pads_ into the slices_ array - slices_.resize(pads_.size(), 0); - for (size_t index = 0; index < pads_.size(); index++) { - if (pads_[index] < 0) { - slices_[index] = 
pads_[index]; - pads_[index] = 0; + + if (!dynamic) { + if (!info.GetAttrs("pads", pads_).IsOK()) + ORT_THROW("Invalid 'pads' attribute value"); + + // Separate out any negative pads_ into the slices_ array + slices_.resize(pads_.size(), 0); + for (size_t index = 0; index < pads_.size(); index++) { + if (pads_[index] < 0) { + slices_[index] = pads_[index]; + pads_[index] = 0; + } } - } - ; // Value is optional and initialized to 0 by default + ; // Value is optional and initialized to 0 by default + } } - ~PadBase() {} + ~PadBase() = default; - enum class Mode : int { - Constant = 0, - Reflect, - Edge - }; Mode mode_{Mode::Constant}; std::vector pads_; // After construction, only >=0 values are in here std::vector slices_; // All of the negative padding values are separated out into slices_ - const float value_; + const float value_; // will always be float (when 'value' parsed from attribute - opset 10 and below) }; template @@ -55,4 +59,11 @@ struct Pad final : public OpKernel, public PadBase { Status Compute(OpKernelContext* context) const override; }; +template +Status PadCpuImpl(OpKernelContext* ctx, + const std::vector& pads, + const std::vector& slices, + const Mode& mode, + T value); + } // namespace onnxruntime diff --git a/onnxruntime/core/providers/cpu/tensor/reverse_sequence.cc b/onnxruntime/core/providers/cpu/tensor/reverse_sequence.cc index 73ec28a04dc2a..ab92e6eb0238a 100644 --- a/onnxruntime/core/providers/cpu/tensor/reverse_sequence.cc +++ b/onnxruntime/core/providers/cpu/tensor/reverse_sequence.cc @@ -32,12 +32,8 @@ ONNX_OPERATOR_KERNEL_EX(ReverseSequence, ReverseSequenceOp); template -static void ReverseSequenceImpl(const Tensor& X, Tensor& Y, - gsl::span sequence_lengths, - const int64_t max_seq_len, - const int64_t batch_size, - const int64_t input_size, - bool time_major); +static void ReverseSequenceImpl(const Tensor& X, Tensor& Y, gsl::span sequence_lengths, + int64_t max_seq_len, int64_t batch_size, int64_t input_size, bool time_major); Status ReverseSequenceOp::Compute(OpKernelContext* context) const { Status status = Status::OK(); diff --git a/onnxruntime/core/providers/cpu/tensor/reverse_sequence.h b/onnxruntime/core/providers/cpu/tensor/reverse_sequence.h index fb9cc0c8284ee..c44c1d46d3ffe 100644 --- a/onnxruntime/core/providers/cpu/tensor/reverse_sequence.h +++ b/onnxruntime/core/providers/cpu/tensor/reverse_sequence.h @@ -12,7 +12,8 @@ namespace onnxruntime { class ReverseSequenceOp : public OpKernel { public: explicit ReverseSequenceOp(const OpKernelInfo& info) : OpKernel(info) { - int64_t batch_axis, time_axis; + int64_t batch_axis; + int64_t time_axis; ORT_ENFORCE(info.GetAttr("batch_axis", &batch_axis).IsOK()); ORT_ENFORCE(info.GetAttr("time_axis", &time_axis).IsOK()); diff --git a/onnxruntime/core/providers/cpu/tensor/scatter.cc b/onnxruntime/core/providers/cpu/tensor/scatter.cc index be6e19b80e9ba..9fd18e8e49f76 100644 --- a/onnxruntime/core/providers/cpu/tensor/scatter.cc +++ b/onnxruntime/core/providers/cpu/tensor/scatter.cc @@ -30,7 +30,7 @@ ONNX_CPU_OPERATOR_KERNEL( .TypeConstraint("Tind", std::vector{DataTypeImpl::GetTensorType(), DataTypeImpl::GetTensorType()}), Scatter); -template +template Status CopyScatterData(const Tensor* data_input, const Tensor* indices_input, const Tensor* updates_input, const int64_t axis, Tensor* data_output) { const TensorShape& input_data_shape = data_input->Shape(); @@ -45,12 +45,11 @@ Status CopyScatterData(const Tensor* data_input, const Tensor* indices_input, co } const auto input_elements = 
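The Pad hunks above hoist Mode to namespace scope and pull the body of Pad<float>::Compute into a free PadCpuImpl that receives pads, slices, mode and value as parameters, so a variant that reads pads at runtime can reuse the same implementation while the kernel becomes a thin forwarder. A simplified sketch of that shape of refactor (abbreviated, 1-D constant mode only, not the real signatures):
```
#include <cstddef>
#include <cstdint>
#include <vector>

// Hoisted to namespace scope so both the kernel and the free implementation can name it.
enum class Mode : int { Constant = 0, Reflect, Edge };

// Free implementation: all state arrives through parameters, so it can be reused by
// kernels that obtain pads from an attribute *or* from a runtime input.
template <typename T>
int PadCpuImpl(const std::vector<int64_t>& pads, const std::vector<int64_t>& slices,
               Mode mode, T value, std::vector<T>& data /* stand-in for the kernel context */) {
  if (mode == Mode::Constant) {
    // prepend/append `value` according to pads (1-D case only, for brevity)
    data.insert(data.begin(), static_cast<size_t>(pads[0]), value);
    data.insert(data.end(), static_cast<size_t>(pads[1]), value);
  }
  (void)slices;  // negative pads (slicing) omitted in this sketch
  return 0;
}

// Thin kernel wrapper: keeps the attribute-derived members and just forwards them.
class PadKernel {
 public:
  int Compute(std::vector<float>& data) const {
    return PadCpuImpl<float>(pads_, slices_, mode_, value_, data);
  }

 private:
  std::vector<int64_t> pads_{1, 2};    // parsed from the 'pads' attribute in the real kernel
  std::vector<int64_t> slices_{0, 0};  // negative pads separated out at construction
  Mode mode_{Mode::Constant};
  float value_{0.f};
};
```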
input_data_shape.Size(); - const auto element_bytes = data_input->DataType()->Size(); const auto total_input_bytes = data_input->Size(); - const uint8_t* src_base = reinterpret_cast(data_input->DataRaw()); - uint8_t* dst_base = reinterpret_cast(data_output->MutableDataRaw()); - const bool is_string_type = data_input->DataType() == DataTypeImpl::GetType(); + const Tdata* src_base = static_cast(data_input->DataRaw()); + Tdata* dst_base = static_cast(data_output->MutableDataRaw()); + bool is_string_type = data_input->DataType() == DataTypeImpl::GetType(); // We allow runtime to re-use input for output. If input/output Tensor* are the same // we do not copy @@ -61,7 +60,7 @@ Status CopyScatterData(const Tensor* data_input, const Tensor* indices_input, co std::string* dst = data_output->template MutableData(); std::copy(str_begin, str_end, dst); } else { - memcpy(dst_base, src_base, total_input_bytes); + memcpy(static_cast(dst_base), static_cast(src_base), total_input_bytes); } } @@ -110,7 +109,7 @@ Status CopyScatterData(const Tensor* data_input, const Tensor* indices_input, co } } - const uint8_t* update_data = reinterpret_cast(updates_input->DataRaw()); + const Tdata* update_data = static_cast(updates_input->DataRaw()); // For every update we compute the destination offset and copy it there for (int64_t index = 0; index < num_indices;) { const Tin axis_idx = indices_data[index]; @@ -127,16 +126,7 @@ Status CopyScatterData(const Tensor* data_input, const Tensor* indices_input, co } } - const size_t dst_offset_bytes = dst_offset * element_bytes; - assert(dst_offset_bytes < total_input_bytes); - if (is_string_type) { - reinterpret_cast(dst_base)[dst_offset] = - reinterpret_cast(update_data)[index]; - } else { - // Copy an element - auto src_offset_bytes = index * element_bytes; - memcpy(dst_base + dst_offset_bytes, update_data + src_offset_bytes, element_bytes); - } + dst_base[dst_offset] = update_data[index]; if (++index == num_indices) { break; @@ -158,6 +148,38 @@ Status CopyScatterData(const Tensor* data_input, const Tensor* indices_input, co return Status::OK(); } +#define DispatchOnIndexTypeAndTensorType(index_type, tensor_type, retval, function, ...) 
\ + if (tensor_type == DataTypeImpl::GetType()) \ + retval = function(__VA_ARGS__); \ + else if (tensor_type == DataTypeImpl::GetType()) \ + retval = function(__VA_ARGS__); \ + else if (tensor_type == DataTypeImpl::GetType()) \ + retval = function(__VA_ARGS__); \ + else if (tensor_type == DataTypeImpl::GetType()) \ + retval = function(__VA_ARGS__); \ + else if (tensor_type == DataTypeImpl::GetType()) \ + retval = function(__VA_ARGS__); \ + else if (tensor_type == DataTypeImpl::GetType()) \ + retval = function(__VA_ARGS__); \ + else if (tensor_type == DataTypeImpl::GetType()) \ + retval = function(__VA_ARGS__); \ + else if (tensor_type == DataTypeImpl::GetType()) \ + retval = function(__VA_ARGS__); \ + else if (tensor_type == DataTypeImpl::GetType()) \ + retval = function(__VA_ARGS__); \ + else if (tensor_type == DataTypeImpl::GetType()) \ + retval = function(__VA_ARGS__); \ + else if (tensor_type == DataTypeImpl::GetType()) \ + retval = function(__VA_ARGS__); \ + else if (tensor_type == DataTypeImpl::GetType()) \ + retval = function(__VA_ARGS__); \ + else if (tensor_type == DataTypeImpl::GetType()) \ + retval = function(__VA_ARGS__); \ + else if (tensor_type == DataTypeImpl::GetType()) \ + retval = function(__VA_ARGS__); \ + else \ + ORT_ENFORCE(false, "Unknown tensor type of ", tensor_type) + Status Scatter::Compute(OpKernelContext* context) const { const auto* data_input = context->Input(0); const auto& input_data_shape = data_input->Shape(); @@ -203,12 +225,16 @@ Status Scatter::Compute(OpKernelContext* context) const { auto* data_output = context->Output(0, input_data_shape); MLDataType Tind_type = indices_input->DataType(); + MLDataType Tdata_type = data_input->DataType(); + Status status; if (Tind_type == DataTypeImpl::GetType()) { - return CopyScatterData(data_input, indices_input, updates_input, axis, data_output); + DispatchOnIndexTypeAndTensorType(int32_t, Tdata_type, status, CopyScatterData, data_input, indices_input, updates_input, axis, data_output); } else if (Tind_type == DataTypeImpl::GetType()) { - return CopyScatterData(data_input, indices_input, updates_input, axis, data_output); + DispatchOnIndexTypeAndTensorType(int64_t, Tdata_type, status, CopyScatterData, data_input, indices_input, updates_input, axis, data_output); + } else { + return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT, "Expecting indices to be either int32_t or int64_t"); } - return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT, "Expecting indices to be either int32_t or int64_t"); + return status; } } // namespace onnxruntime diff --git a/onnxruntime/core/providers/cpu/tensor/slice.cc b/onnxruntime/core/providers/cpu/tensor/slice.cc index a96c5402e6791..a54267c97c135 100644 --- a/onnxruntime/core/providers/cpu/tensor/slice.cc +++ b/onnxruntime/core/providers/cpu/tensor/slice.cc @@ -221,13 +221,11 @@ void SliceBase::FillVectorsFromInput(const OpKernelContext* context, std::vector& input_ends, std::vector& input_axes, std::vector& input_steps) const { - auto start_tensor = context->Input(1); - auto ends_tensor = context->Input(2); - const Tensor* axes_tensor = nullptr; - if (context->InputCount() >= 4) - axes_tensor = context->Input(3); - // Slice V10 (optional input) + const Tensor* start_tensor = context->Input(1); + const Tensor* ends_tensor = context->Input(2); + const Tensor* axes_tensor = context->Input(3); const Tensor* steps_tensor = nullptr; + // check if this is Slice V10 - only Slice V10 has this optional input if (context->InputCount() == 5) steps_tensor = context->Input(4); @@ -235,7 +233,7 
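The DispatchOnIndexTypeAndTensorType macro above fans a runtime element type out to the matching CopyScatterData<Tin, Tdata> instantiation through an if/else chain. A stand-alone sketch of the same dispatch style, using a hypothetical type tag in place of MLDataType and fewer types:
```
#include <cstdint>
#include <iostream>
#include <typeindex>
#include <typeinfo>

// Stand-in for MLDataType: a runtime tag identifying the element type.
using TypeTag = std::type_index;
template <typename T>
TypeTag GetType() { return std::type_index(typeid(T)); }

// The templated worker we want to reach (analogous to CopyScatterData<Tin, Tdata>).
template <typename Tin, typename Tdata>
int Work(const void* indices, const void* data) {
  (void)indices; (void)data;
  std::cout << "dispatched to " << typeid(Tin).name() << "/" << typeid(Tdata).name() << "\n";
  return 0;
}

// If/else dispatch from the runtime tag to a concrete instantiation, macro-style.
#define DISPATCH_ON_TENSOR_TYPE(index_type, tag, retval, fn, ...) \
  if ((tag) == GetType<float>())                                  \
    retval = fn<index_type, float>(__VA_ARGS__);                  \
  else if ((tag) == GetType<double>())                            \
    retval = fn<index_type, double>(__VA_ARGS__);                 \
  else if ((tag) == GetType<int32_t>())                           \
    retval = fn<index_type, int32_t>(__VA_ARGS__);                \
  else                                                            \
    retval = -1  /* unknown element type */

int main() {
  TypeTag data_type = GetType<double>();  // discovered at runtime in the real kernel
  int status = 0;
  DISPATCH_ON_TENSOR_TYPE(int64_t, data_type, status, Work, nullptr, nullptr);
  return status;
}
```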
@@ void SliceBase::FillVectorsFromInput(const OpKernelContext* context, ORT_ENFORCE(nullptr != ends_tensor && ends_tensor->Shape().NumDimensions() == 1, "Ends must be a 1-D array"); ORT_ENFORCE(start_tensor->Shape() == ends_tensor->Shape(), "Starts and ends shape mismatch"); ORT_ENFORCE(nullptr == axes_tensor || start_tensor->Shape() == axes_tensor->Shape(), "Starts and axes shape mismatch"); - ORT_ENFORCE(nullptr == steps_tensor || steps_tensor->Shape() == axes_tensor->Shape(), "Steps and axes shape mismatch"); + ORT_ENFORCE(nullptr == steps_tensor || start_tensor->Shape() == steps_tensor->Shape(), "Starts and steps shape mismatch"); const auto& dtype = start_tensor->DataType(); const auto& size = start_tensor->Shape().Size(); @@ -305,8 +303,7 @@ Status Slice::Compute(OpKernelContext* ctx) const { ORT_ENFORCE(input_tensor_ptr != nullptr, "Missing input tensor to be processed"); const auto& input_tensor = *input_tensor_ptr; const auto& input_dimensions = input_tensor.Shape().GetDims(); - if (input_dimensions.size() < 1) - return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT, "Cannot slice scalars"); + if (input_dimensions.empty()) return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT, "Cannot slice scalars"); // Initialize the starts & ends to the actual tensor shape std::vector starts(input_dimensions.size(), 0); @@ -315,7 +312,10 @@ Status Slice::Compute(OpKernelContext* ctx) const { // Slice V10 & DynamicSlice if (dynamic) { - std::vector input_starts, input_ends, input_axes, input_steps; + std::vector input_starts; + std::vector input_ends; + std::vector input_axes; + std::vector input_steps; FillVectorsFromInput(ctx, input_starts, input_ends, input_axes, input_steps); ORT_RETURN_IF_ERROR(PrepareForCompute(input_starts, input_ends, input_axes, input_steps, input_dimensions, starts, steps, output_dims)); diff --git a/onnxruntime/core/providers/cpu/tensor/split.cc b/onnxruntime/core/providers/cpu/tensor/split.cc index d4e655a7f20e4..efd85a483771a 100644 --- a/onnxruntime/core/providers/cpu/tensor/split.cc +++ b/onnxruntime/core/providers/cpu/tensor/split.cc @@ -16,62 +16,35 @@ ONNX_CPU_OPERATOR_KERNEL( KernelDefBuilder().TypeConstraint("T", std::vector{ DataTypeImpl::GetTensorType(), - DataTypeImpl::GetTensorType(), DataTypeImpl::GetTensorType(), - }), + DataTypeImpl::GetTensorType()}), Split); -Status Split::Compute(OpKernelContext* context) const { - const Tensor& input = *context->Input(0); - - Status status; - auto data_type = input.DataType(); - - if (data_type == DataTypeImpl::GetType()) - status = ComputeImpl(*context, input); - else if (data_type == DataTypeImpl::GetType()) - status = ComputeImpl(*context, input); - else if (data_type == DataTypeImpl::GetType()) { - /* Need to update CopyMatrix to support double... 
- status = ComputeImpl(*context, input); */ - ORT_NOT_IMPLEMENTED("Split operator does not support double yet"); - } else - ORT_THROW("Invalid data type for Split operator of ", data_type); - - return status; -} - -template -Status Split::ComputeImpl(OpKernelContext& context, const Tensor& input) const { - auto& input_shape = input.Shape(); +Status SplitBase::PrepareForCompute(const TensorShape& input_shape, int num_outputs, int64_t& axis, int& before_dims, + int& after_dims_including_split_axis, int& after_dims_excluding_split, + std::vector& split_sizes) const { auto& input_dims = input_shape.GetDims(); const int64_t num_dimensions = gsl::narrow_cast(input_shape.NumDimensions()); - const int64_t axis = HandleNegativeAxis(axis_, num_dimensions); // handle negative and enforce axis is valid + axis = HandleNegativeAxis(axis_, num_dimensions); // handle negative and enforce axis is valid const int64_t split_dim_size = input_dims[axis]; - auto num_outputs = context.OutputCount(); - std::vector outputs; - outputs.reserve(num_outputs); - - int before_dims = gsl::narrow(input_shape.SizeToDimension(axis)); - int after_dims_including_split_axis = gsl::narrow(input_shape.SizeFromDimension(axis)); - int after_dims_excluding_split = (axis + 1 == num_dimensions) - ? 1 // we multiply by this value so must be 1 not 0 - : gsl::narrow(input_shape.SizeFromDimension(axis + 1)); - - std::vector split_sizes; + before_dims = gsl::narrow(input_shape.SizeToDimension(axis)); + after_dims_including_split_axis = gsl::narrow(input_shape.SizeFromDimension(axis)); + after_dims_excluding_split = (axis + 1 == num_dimensions) + ? 1 // we multiply by this value so must be 1 not 0 + : gsl::narrow(input_shape.SizeFromDimension(axis + 1)); if (split_sizes_.empty()) { // equal split based on number of outputs - if (split_dim_size % num_outputs != 0) { + if (split_dim_size % static_cast(num_outputs) != 0) { return ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "Input cannot be split evenly on selected axis. Input shape=", input_shape, " Axis=", axis_, " NumOutputs=", num_outputs); } // populate split_sizes with the same size for each output - split_sizes = std::vector(num_outputs, split_dim_size / num_outputs); + split_sizes = std::vector(static_cast(num_outputs), split_dim_size / num_outputs); } else { - if (split_sizes_.size() != num_outputs || split_size_sum_ != split_dim_size) + if (split_sizes_.size() != static_cast(num_outputs) || split_size_sum_ != split_dim_size) return ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "Cannot split using values in 'split' attribute. 
Axis=", axis_, " Input shape=", input_shape, @@ -82,7 +55,58 @@ Status Split::ComputeImpl(OpKernelContext& context, const Tensor& input) const { split_sizes = split_sizes_; } + return Status::OK(); +} + +Status Split::Compute(OpKernelContext* context) const { + const Tensor& input = *context->Input(0); + + Status status; + auto data_type = input.DataType(); + + if (data_type == DataTypeImpl::GetType()) + status = ComputeImpl(*context, input); + else if (data_type == DataTypeImpl::GetType()) + status = ComputeImpl(*context, input); + else if (data_type == DataTypeImpl::GetType()) + status = ComputeImpl(*context, input); + else + ORT_THROW("Split operator does not support ", data_type, " yet"); + + return status; +} + +template +inline void copy_data(const T* src, T* dst, size_t count) { + memcpy(dst, src, count * sizeof(T)); +} + +template<> +inline void copy_data(const std::string* src, std::string* dst, size_t count) { + const std::string* end = src + count; + std::copy(src, end, dst); +} + +template +Status Split::ComputeImpl(OpKernelContext& context, const Tensor& input) const { + auto& input_shape = input.Shape(); + auto num_outputs = context.OutputCount(); + int64_t axis = axis_; + int before_dims = 0; + int after_dims_including_split_axis = 0; + int after_dims_excluding_split = 0; + std::vector split_sizes; + + ORT_RETURN_IF_ERROR(PrepareForCompute(input_shape, + num_outputs, + axis, + before_dims, + after_dims_including_split_axis, + after_dims_excluding_split, + split_sizes)); + // copy dimensions so we can update the selected axis in place + auto& input_dims = input_shape.GetDims(); std::vector output_dimensions{input_dims}; int64_t input_offset = 0; @@ -104,7 +128,7 @@ Status Split::ComputeImpl(OpKernelContext& context, const Tensor& input) const { static_cast(output_data), // B split_size * after_dims_excluding_split, // ldb [](const T* src, T* dst, size_t count) { - memcpy(dst, src, count * sizeof(T)); + copy_data(src, dst, count); }); input_offset += split_size * after_dims_excluding_split; // offset by the N data we used in this iteration diff --git a/onnxruntime/core/providers/cpu/tensor/split.h b/onnxruntime/core/providers/cpu/tensor/split.h index 2fdcf1bc250ef..51b13b51be13d 100644 --- a/onnxruntime/core/providers/cpu/tensor/split.h +++ b/onnxruntime/core/providers/cpu/tensor/split.h @@ -10,12 +10,10 @@ namespace onnxruntime { -class Split final : public OpKernel { - public: - Split(const OpKernelInfo& info) : OpKernel(info) { - // required with default of 0 - if (!info.GetAttr("axis", &axis_).IsOK()) - ORT_THROW("Missing 'axis' attribute value"); +class SplitBase { + protected: + SplitBase(const OpKernelInfo& info) { + axis_ = info.GetAttrOrDefault("axis", 0); // optional if (info.GetAttrs("split", split_sizes_).IsOK()) { @@ -25,15 +23,27 @@ class Split final : public OpKernel { } } + /* + * \param num_outputs must >=0 + */ + Status PrepareForCompute(const TensorShape& input_shape, int num_outputs, int64_t& axis, int& before_dims, + int& after_dims_including_split_axis, int& after_dims_excluding_split, + std::vector& split_sizes) const; + + int64_t axis_; + std::vector split_sizes_; + int64_t split_size_sum_ = 0; +}; + +class Split final : public OpKernel, public SplitBase { + public: + Split(const OpKernelInfo& info) : OpKernel(info), SplitBase(info) {} + Status Compute(OpKernelContext* context) const override; private: template Status ComputeImpl(OpKernelContext& context, const Tensor& input) const; - - int64_t axis_; - std::vector split_sizes_; - int64_t 
split_size_sum_ = 0; }; } // namespace onnxruntime diff --git a/onnxruntime/core/providers/cpu/tensor/tile.cc b/onnxruntime/core/providers/cpu/tensor/tile.cc index 833a2654d0726..e10f0a6cd9a70 100644 --- a/onnxruntime/core/providers/cpu/tensor/tile.cc +++ b/onnxruntime/core/providers/cpu/tensor/tile.cc @@ -99,7 +99,7 @@ Status Tile::Compute(OpKernelContext* ctx) const { // Calculate the shape of the output tensor auto* repeats = repeats_tensor.template Data(); std::vector output_dims = input_shape.GetDims(); - for (auto axis = 0; axis < input_rank; axis++) { + for (size_t axis = 0; axis < input_rank; axis++) { output_dims[axis] *= repeats[axis]; } @@ -125,9 +125,8 @@ Status Tile::Compute(OpKernelContext* ctx) const { dtype == DataTypeImpl::GetType()) return TileCoreForFixedSizeTypes(input_tensor, output_tensor, repeats, input_counters, output_pitches, sizeof(float)); - else if (dtype == DataTypeImpl::GetType() || - dtype == DataTypeImpl::GetType() || - dtype == DataTypeImpl::GetType()) + if (dtype == DataTypeImpl::GetType() || dtype == DataTypeImpl::GetType() || + dtype == DataTypeImpl::GetType()) return TileCoreForFixedSizeTypes(input_tensor, output_tensor, repeats, input_counters, output_pitches, sizeof(double)); else if (dtype == DataTypeImpl::GetType() || diff --git a/onnxruntime/core/providers/cpu/tensor/transpose.cc b/onnxruntime/core/providers/cpu/tensor/transpose.cc index ff3380e4e79fb..54a1825df18e8 100644 --- a/onnxruntime/core/providers/cpu/tensor/transpose.cc +++ b/onnxruntime/core/providers/cpu/tensor/transpose.cc @@ -175,7 +175,7 @@ static void DoTransposeEltWise(int64_t num_axes, const std::vector& tar } } -static Status DoUntypedTranspose(const std::vector& permutations, const Tensor& input, Tensor& output) { +static Status DoUntypedTranspose(const std::vector& permutations, const Tensor& input, Tensor& output) { const auto& input_shape = input.Shape(); const auto& input_dims = input_shape.GetDims(); auto rank = input_shape.NumDimensions(); @@ -184,7 +184,7 @@ static Status DoUntypedTranspose(const std::vector& permutations, const const bool is_string_type = input.DataType() == DataTypeImpl::GetType(); std::vector stride(rank); - for (int i = 0; i < rank; i++) { + for (size_t i = 0; i < rank; i++) { size_t inpdim = permutations[i]; if (inpdim + 1 < rank) stride[i] = input_shape.SizeFromDimension(inpdim + 1); @@ -239,7 +239,7 @@ static Status DoUntypedTranspose(const std::vector& permutations, const return Status::OK(); } -Status TransposeBase::DoTranspose(const std::vector& permutations, const Tensor& input, Tensor& output) { +Status TransposeBase::DoTranspose(const std::vector& permutations, const Tensor& input, Tensor& output) { Status status = Status::OK(); auto input_type = input.DataType(); @@ -265,8 +265,8 @@ Status Transpose::Compute(OpKernelContext* ctx) const { size_t rank = input_dims.size(); std::vector output_dims(rank); - const std::vector* p_perm; - std::vector default_perm(rank); + const std::vector* p_perm; + std::vector default_perm(rank); const auto& status = ComputeOutputShape(X, output_dims, default_perm, p_perm); if (!status.IsOK()) return status; diff --git a/onnxruntime/core/providers/cpu/tensor/transpose.h b/onnxruntime/core/providers/cpu/tensor/transpose.h index 50432c7875906..6600a083d839d 100644 --- a/onnxruntime/core/providers/cpu/tensor/transpose.h +++ b/onnxruntime/core/providers/cpu/tensor/transpose.h @@ -16,20 +16,26 @@ class TransposeBase { Transpose the input Tensor into the output Tensor using the provided permutations. 
Both Tensors must have the same data type. */ - static Status DoTranspose(const std::vector& permutations, const Tensor& input, Tensor& output); + static Status DoTranspose(const std::vector& permutations, const Tensor& input, Tensor& output); protected: TransposeBase(const OpKernelInfo& info) { - Status status = info.GetAttrs("perm", perm_); - + std::vector temp_perm; + Status status = info.GetAttrs("perm", temp_perm); if (status.IsOK()) { + size_t rank = temp_perm.size(); + perm_.resize(temp_perm.size()); + // Check that perm_ is a valid permutation of [0,rank-1] + for (size_t i = 0; i != temp_perm.size(); ++i) { + int64_t v = temp_perm[i]; + ORT_ENFORCE(v >= 0 && static_cast(v) <= std::numeric_limits::max()); + if (static_cast(v) >= rank) + ORT_THROW("Attribute perm of Transpose has an invalid value. Value ", i, " is outside range."); + perm_[i] = static_cast(v); + } perm_specified_ = true; - size_t rank = perm_.size(); std::vector seen(rank, false); - // Check that perm_ is a valid permutation of [0,rank-1] for (auto i : perm_) { - if ((i < 0) || (i >= gsl::narrow(rank))) - ORT_THROW("Attribute perm of Transpose has an invalid value. Value ", i, " is outside range."); if (seen[i]) ORT_THROW("Attribute perm of Transpose has an invalid value. Value ", i, " is repeated."); seen[i] = true; @@ -37,8 +43,8 @@ class TransposeBase { } } - Status ComputeOutputShape(const Tensor& X, std::vector& output_dims, - std::vector& default_perm, const std::vector*& p_perm) const { + Status ComputeOutputShape(const Tensor& X, std::vector& output_dims, std::vector& default_perm, + const std::vector*& p_perm) const { size_t rank = X.Shape().NumDimensions(); const auto& input_dims = X.Shape().GetDims(); @@ -49,14 +55,13 @@ class TransposeBase { if (perm_specified_) p_perm = &perm_; else { - for (int i = 0; i < rank; ++i) - default_perm[i] = rank - i - 1; + for (size_t i = 0; i < rank; ++i) default_perm[i] = rank - i - 1; p_perm = &default_perm; } // Determine shape of output output_dims.resize(rank); - for (int i = 0; i < rank; i++) { + for (size_t i = 0; i < rank; i++) { size_t inpdim = (*p_perm)[i]; if (inpdim >= rank) { std::ostringstream ss; @@ -73,7 +78,7 @@ class TransposeBase { } bool perm_specified_ = false; - std::vector perm_; + std::vector perm_; }; class Transpose final : public OpKernel, public TransposeBase { diff --git a/onnxruntime/core/providers/cpu/tensor/upsample.cc b/onnxruntime/core/providers/cpu/tensor/upsample.cc index 385f0f886433a..d5db7473a1e70 100644 --- a/onnxruntime/core/providers/cpu/tensor/upsample.cc +++ b/onnxruntime/core/providers/cpu/tensor/upsample.cc @@ -108,8 +108,10 @@ Status upsampleLiner(const T* input, return Status(ONNXRUNTIME, FAIL, "Upsample: input/output value's dimension mismatch"); auto n_dim = input_shape.NumDimensions(); for (size_t i = 0, size = output_shape.Size(); i < size; i++) { - std::vector val1, val2; - std::vector d1, d2; + std::vector val1; + std::vector val2; + std::vector d1; + std::vector d2; size_t cur_idx = i; //val1, vla2, d1, d2 are in reverse order for (int64_t j = static_cast(n_dim - 1); j >= 0; j--) { @@ -158,47 +160,68 @@ void upsampleBilinear( float height_scale, float width_scale, const T* Xdata, - T* Ydata) { + T* Ydata, + AllocatorPtr& alloc) { int64_t output_width = static_cast(input_width * width_scale); int64_t output_height = static_cast(input_height * height_scale); - for (int64_t n = 0; n < batch_size; ++n) { - for (int64_t c = 0; c < num_channels; ++c) { - for (int64_t y = 0; y < output_height; ++y) { - float in_y = std::min(y 
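The Transpose constructor above now converts the int64_t 'perm' attribute to size_t while checking both range and duplicates up front. A compact sketch of that validation as a hypothetical free function:
```
#include <cstdint>
#include <stdexcept>
#include <vector>

// Validate that `attr` is a permutation of [0, rank) and return it as size_t indices.
std::vector<size_t> ValidatePerm(const std::vector<int64_t>& attr) {
  const size_t rank = attr.size();
  std::vector<size_t> perm(rank);
  std::vector<bool> seen(rank, false);
  for (size_t i = 0; i < rank; ++i) {
    const int64_t v = attr[i];
    if (v < 0 || static_cast<uint64_t>(v) >= rank)
      throw std::invalid_argument("perm value out of range");  // outside [0, rank)
    if (seen[static_cast<size_t>(v)])
      throw std::invalid_argument("perm value repeated");      // not a permutation
    seen[static_cast<size_t>(v)] = true;
    perm[i] = static_cast<size_t>(v);
  }
  return perm;
}

int main() {
  auto p = ValidatePerm({2, 0, 1});   // OK: a valid rank-3 permutation
  return static_cast<int>(p[0]) - 2;  // returns 0
}
```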
/ height_scale, static_cast(input_height - 1)); - const int64_t in_y1 = std::min(static_cast(in_y), input_height - 1); - const int64_t in_y2 = std::min(in_y1 + 1, input_height - 1); - float dy1 = fabs(in_y - in_y1); - float dy2 = fabs(in_y - in_y2); - if (in_y1 == in_y2) { - dy1 = 0.5f; - dy2 = 0.5f; - } + size_t inx_buffer_size = 2 * sizeof(int64_t) * (output_height + output_width); + size_t scale_buffer_size = 2 * sizeof(float_t) * (output_height + output_width); + auto inx_scale_data_buffer = alloc->Alloc(inx_buffer_size + scale_buffer_size); + BufferUniquePtr inx_scale_data_buffer_holder(inx_scale_data_buffer, BufferDeleter(alloc)); + int64_t* inx_data = static_cast(inx_scale_data_buffer_holder.get()); + int64_t* input_width_mul_y1 = inx_data; + int64_t* input_width_mul_y2 = inx_data + output_height; + int64_t* in_x1 = inx_data + 2 * output_height; + int64_t* in_x2 = inx_data + 2 * output_height + output_width; - const int64_t input_width_mul_y1 = input_width * in_y1; - const int64_t input_width_mul_y2 = input_width * in_y2; + float* scale_data = reinterpret_cast( in_x2 + output_width ); + float* dy1 = scale_data; + float* dy2 = scale_data + output_height; + float* dx1 = scale_data + 2 * output_height; + float* dx2 = scale_data + 2 * output_height + output_width; - for (int64_t x = 0; x < output_width; ++x) { - float in_x = std::min(x / width_scale, static_cast(input_width - 1)); - const int64_t in_x1 = std::min(static_cast(in_x), input_width - 1); - const int64_t in_x2 = std::min(in_x1 + 1, input_width - 1); + for (int64_t y = 0; y < output_height; ++y) { + float in_y = std::min(y / height_scale, static_cast(input_height - 1)); + const int64_t in_y1 = std::min(static_cast(in_y), input_height - 1); + const int64_t in_y2 = std::min(in_y1 + 1, input_height - 1); + dy1[y] = fabs(in_y - in_y1); + dy2[y] = fabs(in_y - in_y2); + if (in_y1 == in_y2) { + dy1[y] = 0.5f; + dy2[y] = 0.5f; + } + + input_width_mul_y1[y] = input_width * in_y1; + input_width_mul_y2[y] = input_width * in_y2; + } - float dx1 = std::abs(in_x - in_x1); - float dx2 = std::abs(in_x - in_x2); - if (in_x1 == in_x2) { - dx1 = 0.5f; - dx2 = 0.5f; - } + for (int64_t x = 0; x < output_width; ++x) { + float in_x = std::min(x / width_scale, static_cast(input_width - 1)); + in_x1[x] = std::min(static_cast(in_x), input_width - 1); + in_x2[x] = std::min(in_x1[x] + 1, input_width - 1); - T X11 = Xdata[input_width_mul_y1 + in_x1]; - T X21 = Xdata[input_width_mul_y1 + in_x2]; - T X12 = Xdata[input_width_mul_y2 + in_x1]; - T X22 = Xdata[input_width_mul_y2 + in_x2]; + dx1[x] = std::abs(in_x - in_x1[x]); + dx2[x] = std::abs(in_x - in_x2[x]); + if (in_x1[x] == in_x2[x]) { + dx1[x] = 0.5f; + dx2[x] = 0.5f; + } + } + + for (int64_t n = 0; n < batch_size; ++n) { + for (int64_t c = 0; c < num_channels; ++c) { + for (int64_t y = 0; y < output_height; ++y) { + for (int64_t x = 0; x < output_width; ++x) { + T X11 = Xdata[input_width_mul_y1[y] + in_x1[x]]; + T X21 = Xdata[input_width_mul_y1[y] + in_x2[x]]; + T X12 = Xdata[input_width_mul_y2[y] + in_x1[x]]; + T X22 = Xdata[input_width_mul_y2[y] + in_x2[x]]; - Ydata[output_width * y + x] = static_cast(dx2 * dy2 * X11 + - dx1 * dy2 * X21 + - dx2 * dy1 * X12 + - dx1 * dy1 * X22); + Ydata[output_width * y + x] = static_cast(dx2[x] * dy2[y] * X11 + + dx1[x] * dy2[y] * X21 + + dx2[x] * dy1[y] * X12 + + dx1[x] * dy1[y] * X22); } } Xdata += input_height * input_width; @@ -232,11 +255,15 @@ Status Upsample::BaseCompute(OpKernelContext* context, const std::vectorGetTempSpaceAllocator(&alloc)); 
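The upsampleBilinear rewrite above hoists the per-row and per-column source indices and interpolation weights out of the batch/channel loops: they depend only on the output coordinate and the scale, so they are computed once into a scratch buffer from the allocator and reused for every (n, c) plane. A simplified single-plane sketch of that precompute-then-reuse structure, using std::vector for the scratch space instead of the packed allocator buffer:
```
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// 2-D bilinear upsample of a single (H, W) plane, with per-axis index/weight tables
// precomputed once so the inner loops only do lookups and arithmetic.
void UpsampleBilinearPlane(const float* x, int64_t in_h, int64_t in_w,
                           float scale_h, float scale_w, float* y) {
  const int64_t out_h = static_cast<int64_t>(in_h * scale_h);
  const int64_t out_w = static_cast<int64_t>(in_w * scale_w);

  std::vector<int64_t> y1(out_h), y2(out_h), x1(out_w), x2(out_w);
  std::vector<float> dy1(out_h), dy2(out_h), dx1(out_w), dx2(out_w);

  for (int64_t oy = 0; oy < out_h; ++oy) {  // per-row table
    float in_y = std::min(oy / scale_h, static_cast<float>(in_h - 1));
    y1[oy] = std::min(static_cast<int64_t>(in_y), in_h - 1);
    y2[oy] = std::min(y1[oy] + 1, in_h - 1);
    dy1[oy] = std::fabs(in_y - y1[oy]);
    dy2[oy] = std::fabs(in_y - y2[oy]);
    if (y1[oy] == y2[oy]) { dy1[oy] = 0.5f; dy2[oy] = 0.5f; }
  }
  for (int64_t ox = 0; ox < out_w; ++ox) {  // per-column table
    float in_x = std::min(ox / scale_w, static_cast<float>(in_w - 1));
    x1[ox] = std::min(static_cast<int64_t>(in_x), in_w - 1);
    x2[ox] = std::min(x1[ox] + 1, in_w - 1);
    dx1[ox] = std::fabs(in_x - x1[ox]);
    dx2[ox] = std::fabs(in_x - x2[ox]);
    if (x1[ox] == x2[ox]) { dx1[ox] = 0.5f; dx2[ox] = 0.5f; }
  }

  for (int64_t oy = 0; oy < out_h; ++oy) {  // reuse the tables for every output pixel
    for (int64_t ox = 0; ox < out_w; ++ox) {
      const float v11 = x[y1[oy] * in_w + x1[ox]];
      const float v21 = x[y1[oy] * in_w + x2[ox]];
      const float v12 = x[y2[oy] * in_w + x1[ox]];
      const float v22 = x[y2[oy] * in_w + x2[ox]];
      y[oy * out_w + ox] = dx2[ox] * dy2[oy] * v11 + dx1[ox] * dy2[oy] * v21 +
                           dx2[ox] * dy1[oy] * v12 + dx1[ox] * dy1[oy] * v22;
    }
  }
}
```
In the real kernel these tables are shared across all batch and channel planes, which is where the saving comes from.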
upsampleBilinear(batch_size, num_channels, input_height, input_width, - scales[2], scales[3], X->template Data(), Y->template MutableData()); + scales[2], scales[3], X->template Data(), Y->template MutableData(), alloc); return Status::OK(); } default: diff --git a/onnxruntime/core/providers/cpu/tensor/upsample.h b/onnxruntime/core/providers/cpu/tensor/upsample.h index 303760d57a902..cae9762e598b6 100644 --- a/onnxruntime/core/providers/cpu/tensor/upsample.h +++ b/onnxruntime/core/providers/cpu/tensor/upsample.h @@ -18,7 +18,8 @@ enum UpsampleMode { class UpsampleBase { protected: UpsampleBase(OpKernelInfo info) : scales_cached_(false) { - int start, end; + int start; + int end; info.GetKernelDef().SinceVersion(&start, &end); is_resize = (start == 10); @@ -51,12 +52,12 @@ class UpsampleBase { UpsampleMode StringToUpsampleMode(const std::string& mode) { if (strcmp(mode.c_str(), UpsampleModeNN) == 0) { return UpsampleMode::NN; - } else if (strcmp(mode.c_str(), UpsampleModeLinear) == 0) { + } + if (strcmp(mode.c_str(), UpsampleModeLinear) == 0) { return UpsampleMode::LINEAR; - } else { + } ORT_THROW("mode attribute is " + mode + ". It can only be " + UpsampleModeNN + "(default) or " + UpsampleModeLinear + "."); - } } void ScalesValidation(const std::vector& scales, const UpsampleMode mode) const { @@ -81,7 +82,7 @@ class UpsampleBase { const float* scale_data = scale->template Data(); int64_t scales_size = scale->Shape().Size(); ORT_ENFORCE(scales_size > 0, "scales size should be greater than 0."); - if (scales.size() == 0) { + if (scales.empty()) { scales.resize(scales_size); } memcpy(scales.data(), scale_data, scales_size * sizeof(float)); diff --git a/onnxruntime/core/providers/cpu/tensor/where_op.cc b/onnxruntime/core/providers/cpu/tensor/where_op.cc index 62176f0e05761..d70b768bc13dd 100644 --- a/onnxruntime/core/providers/cpu/tensor/where_op.cc +++ b/onnxruntime/core/providers/cpu/tensor/where_op.cc @@ -28,7 +28,7 @@ namespace onnxruntime { //WHERE_TYPED_KERNEL(uint64_t) //WHERE_TYPED_KERNEL(int8_t) //WHERE_TYPED_KERNEL(int16_t) -//WHERE_TYPED_KERNEL(int32_t) +WHERE_TYPED_KERNEL(int32_t) //WHERE_TYPED_KERNEL(int64_t) //WHERE_TYPED_KERNEL(MLFloat16) //WHERE_TYPED_KERNEL(BFloat16) @@ -140,7 +140,7 @@ template std::enable_if_t, void> MergeBroadcastLoop(TBroadcaster* merge_broadcaster, TBroadcastOutput* merge_broadcast_output) { const auto merge_scalar_and_vector = [](gsl::span output, const T& scalar_value, gsl::span vector_value) { - if (scalar_value != T{}) { + if (!scalar_value.empty()) { std::fill(output.begin(), output.end(), scalar_value); } else { std::copy(vector_value.cbegin(), vector_value.cend(), output.begin()); @@ -156,11 +156,8 @@ MergeBroadcastLoop(TBroadcaster* merge_broadcaster, TBroadcastOutput* m merge_scalar_and_vector(output, Y_selection, X_selection); }, [](gsl::span output, gsl::span X_selection, gsl::span Y_selection) { - std::transform( - X_selection.cbegin(), X_selection.cend(), Y_selection.cbegin(), output.begin(), - [](const T& x, const T& y) { - return x != T{} ? x : y; - }); + std::transform(X_selection.cbegin(), X_selection.cend(), Y_selection.cbegin(), output.begin(), + [](const T& x, const T& y) { return !x.empty() ? 
x : y; }); }); } } // namespace diff --git a/onnxruntime/core/providers/cuda/activation/activations.cc b/onnxruntime/core/providers/cuda/activation/activations.cc index 7a085156c7a60..9801c98dcae37 100644 --- a/onnxruntime/core/providers/cuda/activation/activations.cc +++ b/onnxruntime/core/providers/cuda/activation/activations.cc @@ -23,14 +23,12 @@ namespace cuda { Status x::ComputeInternal(OpKernelContext* context) const { \ UnaryElementwisePreparation p; \ UnaryElementwise::Prepare(context, &p); \ - CudaAsyncBuffer func_ctx(this, 0, MakeFuncCtx()); \ - if (!std::is_same::value) \ - ORT_RETURN_IF_ERROR(func_ctx.CopyToGpu()); \ + CudaAsyncBuffer func_ctx(this, 0, MakeFuncCtx(), 1); \ + if (!std::is_same::value) ORT_RETURN_IF_ERROR(func_ctx.CopyToGpu()); \ Impl_##x::MappedType>( \ reinterpret_cast::MappedType*>(p.input_tensor->template Data()), \ reinterpret_cast::MappedType*>(p.output_tensor->template MutableData()), \ - func_ctx.GpuPtr(), \ - p.output_tensor->Shape().Size()); \ + func_ctx.GpuPtr(), p.output_tensor->Shape().Size()); \ \ return Status::OK(); \ } diff --git a/onnxruntime/core/providers/cuda/cuda_common.h b/onnxruntime/core/providers/cuda/cuda_common.h index 20c3fc1afae2f..d28c81ea9af20 100644 --- a/onnxruntime/core/providers/cuda/cuda_common.h +++ b/onnxruntime/core/providers/cuda/cuda_common.h @@ -71,8 +71,12 @@ class CudaKernel : public OpKernel { AllocCpuPtr(device_id, count); } - CudaAsyncBuffer(const CudaKernel* op_kernel, int device_id, const T& value) : CudaAsyncBuffer(op_kernel, device_id, 1) { - *CpuPtr() = value; + CudaAsyncBuffer(const CudaKernel* op_kernel, int device_id, const T& value, size_t count) + : CudaAsyncBuffer(op_kernel, device_id, count) { + T* p = CpuPtr(); + for (size_t i = 0; i != count; ++i) { + *p++ = value; + } } CudaAsyncBuffer(const CudaKernel* op_kernel, int device_id, const std::vector& vec) : CudaAsyncBuffer(op_kernel, device_id, vec.size()) { diff --git a/onnxruntime/core/providers/cuda/cuda_execution_provider.cc b/onnxruntime/core/providers/cuda/cuda_execution_provider.cc index 21c4c2cb42efc..03a39efe182d5 100644 --- a/onnxruntime/core/providers/cuda/cuda_execution_provider.cc +++ b/onnxruntime/core/providers/cuda/cuda_execution_provider.cc @@ -323,6 +323,18 @@ class ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCudaExecutionProvider, kOnnxDomain, class ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCudaExecutionProvider, kOnnxDomain, 8, float, Sum); class ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCudaExecutionProvider, kOnnxDomain, 8, double, Sum); class ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCudaExecutionProvider, kOnnxDomain, 8, MLFloat16, Sum); +class ONNX_OPERATOR_VERSIONED_TYPED_KERNEL_CLASS_NAME(kCudaExecutionProvider, kOnnxDomain, 6, 7, float, Max); +class ONNX_OPERATOR_VERSIONED_TYPED_KERNEL_CLASS_NAME(kCudaExecutionProvider, kOnnxDomain, 6, 7, double, Max); +class ONNX_OPERATOR_VERSIONED_TYPED_KERNEL_CLASS_NAME(kCudaExecutionProvider, kOnnxDomain, 6, 7, MLFloat16, Max); +class ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCudaExecutionProvider, kOnnxDomain, 8, float, Max); +class ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCudaExecutionProvider, kOnnxDomain, 8, double, Max); +class ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCudaExecutionProvider, kOnnxDomain, 8, MLFloat16, Max); +class ONNX_OPERATOR_VERSIONED_TYPED_KERNEL_CLASS_NAME(kCudaExecutionProvider, kOnnxDomain, 6, 7, float, Min); +class ONNX_OPERATOR_VERSIONED_TYPED_KERNEL_CLASS_NAME(kCudaExecutionProvider, kOnnxDomain, 6, 7, double, Min); +class 
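The cuda_common.h change above generalizes the single-value CudaAsyncBuffer constructor to take a count and fill the CPU staging area with that many copies of the value before the upload to the GPU. A hedged host-only sketch of that staging pattern (hypothetical class, not the real CudaAsyncBuffer API, with the device copy mocked out):
```
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an async host-to-device staging buffer:
// data is prepared on the host first, then copied to the device in one shot.
template <typename T>
class AsyncBuffer {
 public:
  explicit AsyncBuffer(size_t count) : host_(count) {}

  // New-style constructor: replicate one value `count` times in the staging area,
  // mirroring CudaAsyncBuffer(op_kernel, device_id, value, count).
  AsyncBuffer(const T& value, size_t count) : AsyncBuffer(count) {
    T* p = host_.data();
    for (size_t i = 0; i != count; ++i) *p++ = value;
  }

  T* CpuPtr() { return host_.data(); }
  size_t size() const { return host_.size(); }

  // In the real class this would issue cudaMemcpyAsync on the kernel's stream.
  void CopyToDevice(std::vector<T>& device_mock) const { device_mock = host_; }

 private:
  std::vector<T> host_;
};

int main() {
  AsyncBuffer<float> strides(1.0f, 4);  // four copies of the same value staged on the host
  std::vector<float> device;
  strides.CopyToDevice(device);         // stand-in for the async GPU upload
  return device.size() == 4 ? 0 : 1;
}
```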
ONNX_OPERATOR_VERSIONED_TYPED_KERNEL_CLASS_NAME(kCudaExecutionProvider, kOnnxDomain, 6, 7, MLFloat16, Min); +class ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCudaExecutionProvider, kOnnxDomain, 8, float, Min); +class ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCudaExecutionProvider, kOnnxDomain, 8, double, Min); +class ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCudaExecutionProvider, kOnnxDomain, 8, MLFloat16, Min); class ONNX_OPERATOR_VERSIONED_TYPED_KERNEL_CLASS_NAME(kCudaExecutionProvider, kOnnxDomain, 7, 8, float, Greater); class ONNX_OPERATOR_VERSIONED_TYPED_KERNEL_CLASS_NAME(kCudaExecutionProvider, kOnnxDomain, 7, 8, double, Greater); class ONNX_OPERATOR_VERSIONED_TYPED_KERNEL_CLASS_NAME(kCudaExecutionProvider, kOnnxDomain, 7, 8, MLFloat16, Greater); @@ -517,6 +529,7 @@ class ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCudaExecutionProvider, kOnnxDomain, class ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCudaExecutionProvider, kOnnxDomain, 10, MLFloat16, Resize); class ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCudaExecutionProvider, kOnnxDomain, 10, int32_t, Resize); class ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kCudaExecutionProvider, kOnnxDomain, 10, uint8_t, Resize); +class ONNX_OPERATOR_KERNEL_CLASS_NAME(kCudaExecutionProvider, kOnnxDomain, 2, Split); static void RegisterCudaKernels(KernelRegistry& kernel_registry) { static const BuildKernelCreateInfoFn function_table[] = { @@ -606,6 +619,18 @@ static void RegisterCudaKernels(KernelRegistry& kernel_registry) { BuildKernelCreateInfo, BuildKernelCreateInfo, BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, BuildKernelCreateInfo, BuildKernelCreateInfo, BuildKernelCreateInfo, @@ -797,6 +822,7 @@ static void RegisterCudaKernels(KernelRegistry& kernel_registry) { BuildKernelCreateInfo, BuildKernelCreateInfo, BuildKernelCreateInfo, + BuildKernelCreateInfo, }; for (auto& function_table_entry : function_table) { diff --git a/onnxruntime/core/providers/cuda/math/binary_elementwise_ops.cc b/onnxruntime/core/providers/cuda/math/binary_elementwise_ops.cc index c58c3c782b4a8..c62a2b7a63074 100644 --- a/onnxruntime/core/providers/cuda/math/binary_elementwise_ops.cc +++ b/onnxruntime/core/providers/cuda/math/binary_elementwise_ops.cc @@ -3,6 +3,8 @@ #include "binary_elementwise_ops.h" #include "binary_elementwise_ops_impl.h" +#include "unary_elementwise_ops_impl.h" + using namespace onnxruntime::common; namespace onnxruntime { namespace cuda { @@ -245,6 +247,118 @@ Status Sum::ComputeInternal(OpKernelContext* context) const { return Status::OK(); } +template +Status Max::ComputeInternal(OpKernelContext* context) const { + typedef typename ToCudaType::MappedType CudaT; + const auto& node = Node(); + const auto& node_name = node.Name(); + auto input_count = node.InputArgCount().front(); + ORT_RETURN_IF_NOT(input_count >= 1, "Must have 1 or more inputs"); + + if (input_count == 1) { + auto input_tensor = context->Input(0); + const auto& input_shape = input_tensor->Shape(); + auto output_tensor = context->Output(0, input_shape); + CUDA_RETURN_IF_ERROR(cudaMemcpyAsync(output_tensor->MutableDataRaw(), input_tensor->DataRaw(), sizeof(CudaT) * input_shape.Size(), cudaMemcpyDeviceToDevice)); + } else { + // compute output shape first, using broadcast rule + TensorShape output_shape; + 
ORT_RETURN_IF_ERROR(ComputeOutputShape(node_name, context->Input(0)->Shape(), context->Input(1)->Shape(), output_shape)); + for (int index = 2; index < input_count; index++) { + TensorShape previous_output_shape = output_shape; + ORT_RETURN_IF_ERROR(ComputeOutputShape(node_name, previous_output_shape, context->Input(index)->Shape(), output_shape)); + } + Tensor* output_tensor = context->Output(0, output_shape); + BinaryElementwisePreparation prepare(this); + + // More than 2 inputs, set output to 0, add input0 to output, so that input0 can be broadcast with output shape correctly + CUDA_RETURN_IF_ERROR(cudaMemset(output_tensor->MutableDataRaw(), 0, output_shape.Size() * sizeof(CudaT))); + ORT_RETURN_IF_ERROR(BinaryElementwiseBroadcastPrepare(0, output_tensor, context->Input(0), output_tensor, &prepare)); + Impl_Add( + prepare.output_rank_or_simple_broadcast, + prepare.lhs_padded_strides.GpuPtr(), + reinterpret_cast(prepare.lhs_tensor->template Data()), + prepare.rhs_padded_strides.GpuPtr(), + reinterpret_cast(prepare.rhs_tensor->template Data()), + prepare.fdm_output_strides.GpuPtr(), + prepare.fdm_H, + prepare.fdm_C, + reinterpret_cast(prepare.output_tensor->template MutableData()), + prepare.output_tensor->Shape().Size()); + for (int index = 1; index < input_count; index++) { + ORT_RETURN_IF_ERROR(BinaryElementwiseBroadcastPrepare(0, output_tensor, context->Input(index), output_tensor, &prepare)); + Impl_Max( + prepare.output_rank_or_simple_broadcast, + prepare.lhs_padded_strides.GpuPtr(), + reinterpret_cast(prepare.lhs_tensor->template Data()), + prepare.rhs_padded_strides.GpuPtr(), + reinterpret_cast(prepare.rhs_tensor->template Data()), + prepare.fdm_output_strides.GpuPtr(), + prepare.fdm_H, + prepare.fdm_C, + reinterpret_cast(prepare.output_tensor->template MutableData()), + prepare.output_tensor->Shape().Size()); + } + } + return Status::OK(); +} + +template +Status Min::ComputeInternal(OpKernelContext* context) const { + typedef typename ToCudaType::MappedType CudaT; + const auto& node = Node(); + const auto& node_name = node.Name(); + auto input_count = node.InputArgCount().front(); + ORT_RETURN_IF_NOT(input_count >= 1, "Must have 1 or more inputs"); + + if (input_count == 1) { + auto input_tensor = context->Input(0); + const auto& input_shape = input_tensor->Shape(); + auto output_tensor = context->Output(0, input_shape); + CUDA_RETURN_IF_ERROR(cudaMemcpyAsync(output_tensor->MutableDataRaw(), input_tensor->DataRaw(), sizeof(CudaT) * input_shape.Size(), cudaMemcpyDeviceToDevice)); + } else { + // compute output shape first, using broadcast rule + TensorShape output_shape; + ORT_RETURN_IF_ERROR(ComputeOutputShape(node_name, context->Input(0)->Shape(), context->Input(1)->Shape(), output_shape)); + for (int index = 2; index < input_count; index++) { + TensorShape previous_output_shape = output_shape; + ORT_RETURN_IF_ERROR(ComputeOutputShape(node_name, previous_output_shape, context->Input(index)->Shape(), output_shape)); + } + Tensor* output_tensor = context->Output(0, output_shape); + BinaryElementwisePreparation prepare(this); + + // More than 2 inputs, set output to 0, add input0 to output, so that input0 can be broadcast with output shape correctly + CUDA_RETURN_IF_ERROR(cudaMemset(output_tensor->MutableDataRaw(), 0, output_shape.Size() * sizeof(CudaT))); + ORT_RETURN_IF_ERROR(BinaryElementwiseBroadcastPrepare(0, output_tensor, context->Input(0), output_tensor, &prepare)); + Impl_Add( + prepare.output_rank_or_simple_broadcast, + prepare.lhs_padded_strides.GpuPtr(), + 
reinterpret_cast(prepare.lhs_tensor->template Data()), + prepare.rhs_padded_strides.GpuPtr(), + reinterpret_cast(prepare.rhs_tensor->template Data()), + prepare.fdm_output_strides.GpuPtr(), + prepare.fdm_H, + prepare.fdm_C, + reinterpret_cast(prepare.output_tensor->template MutableData()), + prepare.output_tensor->Shape().Size()); + for (int index = 1; index < input_count; index++) { + ORT_RETURN_IF_ERROR(BinaryElementwiseBroadcastPrepare(0, output_tensor, context->Input(index), output_tensor, &prepare)); + Impl_Min( + prepare.output_rank_or_simple_broadcast, + prepare.lhs_padded_strides.GpuPtr(), + reinterpret_cast(prepare.lhs_tensor->template Data()), + prepare.rhs_padded_strides.GpuPtr(), + reinterpret_cast(prepare.rhs_tensor->template Data()), + prepare.fdm_output_strides.GpuPtr(), + prepare.fdm_H, + prepare.fdm_C, + reinterpret_cast(prepare.output_tensor->template MutableData()), + prepare.output_tensor->Shape().Size()); + } + } + return Status::OK(); +} + //Greater op output tensor type is bool, so it cannot directly fit in the macros //for other elementwise ops template @@ -257,11 +371,14 @@ Status Greater::ComputeInternal(OpKernelContext* context) const { const Tensor* input1 = context->Input(1); TensorShape output_shape; ORT_RETURN_IF_ERROR(ComputeOutputShape(name, input0->Shape(), input1->Shape(), output_shape)); + size_t output_size = output_shape.Size(); Tensor* output_tensor = context->Output(0, output_shape); BinaryElementwisePreparation prepare(this); ORT_RETURN_IF_ERROR(BinaryElementwiseBroadcastPrepare(0, input0, input1, output_tensor, &prepare)); - Impl_Compare( + + IAllocatorUniquePtr output_buffer = GetScratchBuffer(output_size); + Impl_Greater( prepare.output_rank_or_simple_broadcast, prepare.lhs_padded_strides.GpuPtr(), reinterpret_cast(prepare.lhs_tensor->template Data()), @@ -270,9 +387,13 @@ Status Greater::ComputeInternal(OpKernelContext* context) const { prepare.fdm_output_strides.GpuPtr(), prepare.fdm_H, prepare.fdm_C, - reinterpret_cast(prepare.output_tensor->template MutableData()), - prepare.output_tensor->Shape().Size()); + reinterpret_cast(output_buffer.get()), + output_size); + Impl_Cast::MappedType>( + reinterpret_cast(output_buffer.get()), + reinterpret_cast::MappedType*>(output_tensor->template MutableData()), + output_size); return Status::OK(); } @@ -280,6 +401,10 @@ BINARY_OP_REGISTER_UZILHFD(Sum, 8) BINARY_OP_REGISTER_VERSIONED_UZILHFD(Sum, 6, 7) BINARY_OP_REGISTER_UZILHFD(Greater, 9) BINARY_OP_REGISTER_VERSIONED_HFD(Greater, 7, 8) +BINARY_OP_REGISTER_HFD(Max, 8) +BINARY_OP_REGISTER_VERSIONED_HFD(Max, 6, 7) +BINARY_OP_REGISTER_HFD(Min, 8) +BINARY_OP_REGISTER_VERSIONED_HFD(Min, 6, 7) } // namespace cuda } // namespace onnxruntime diff --git a/onnxruntime/core/providers/cuda/math/binary_elementwise_ops.h b/onnxruntime/core/providers/cuda/math/binary_elementwise_ops.h index 9149548d1feb4..746640b1d9f64 100644 --- a/onnxruntime/core/providers/cuda/math/binary_elementwise_ops.h +++ b/onnxruntime/core/providers/cuda/math/binary_elementwise_ops.h @@ -203,5 +203,23 @@ class Greater final : public CudaKernel { Status ComputeInternal(OpKernelContext* context) const override; }; + +template +class Max final : public CudaKernel { + public: + Max(const OpKernelInfo& info) : CudaKernel(info) { + } + + Status ComputeInternal(OpKernelContext* context) const override; +}; + +template +class Min final : public CudaKernel { + public: + Min(const OpKernelInfo& info) : CudaKernel(info) { + } + + Status ComputeInternal(OpKernelContext* context) const override; +}; } // 
namespace cuda } // namespace onnxruntime diff --git a/onnxruntime/core/providers/cuda/math/binary_elementwise_ops_impl.cu b/onnxruntime/core/providers/cuda/math/binary_elementwise_ops_impl.cu index c276dc476174c..d025fb034060e 100644 --- a/onnxruntime/core/providers/cuda/math/binary_elementwise_ops_impl.cu +++ b/onnxruntime/core/providers/cuda/math/binary_elementwise_ops_impl.cu @@ -81,7 +81,9 @@ SPECIALIZED_BINARY_ELEMENTWISE_IMPL(And, bool) SPECIALIZED_BINARY_ELEMENTWISE_IMPL(Or, bool) SPECIALIZED_BINARY_ELEMENTWISE_IMPL(Xor, bool) SPECIALIZED_BINARY_ELEMENTWISE_IMPL_HFD(PRelu) -SPECIALIZED_BINARY_ELEMENTWISE_IMPL_UZILHFD(Compare) +SPECIALIZED_BINARY_ELEMENTWISE_IMPL_UZILHFD(Greater) +SPECIALIZED_BINARY_ELEMENTWISE_IMPL_HFD(Max) +SPECIALIZED_BINARY_ELEMENTWISE_IMPL_HFD(Min) } // namespace cuda } // namespace onnxruntime diff --git a/onnxruntime/core/providers/cuda/math/binary_elementwise_ops_impl.h b/onnxruntime/core/providers/cuda/math/binary_elementwise_ops_impl.h index f032a6a6ee652..2220ec7a2f68e 100644 --- a/onnxruntime/core/providers/cuda/math/binary_elementwise_ops_impl.h +++ b/onnxruntime/core/providers/cuda/math/binary_elementwise_ops_impl.h @@ -25,7 +25,9 @@ namespace cuda { BINARY_OP_NAME_EXPR(Or, (a | b)) \ BINARY_OP_NAME_EXPR(Xor, (a ^ b)) \ BINARY_OP_NAME_EXPR(PRelu, (a > (T)0 ? a : a * b)) \ - BINARY_OP_NAME_EXPR(Compare, (a > b) ? 1 : 0) + BINARY_OP_NAME_EXPR(Greater, (a > b) ? 1 : 0) \ + BINARY_OP_NAME_EXPR(Max, _Max(a, b)) \ + BINARY_OP_NAME_EXPR(Min, _Min(a, b)) // NOTE that cu files are compiled with nvcc and should not refer to any onnxruntime headers // so struct BinaryElementwisePreparation cannot be used here diff --git a/onnxruntime/core/providers/cuda/math/matmul.cc b/onnxruntime/core/providers/cuda/math/matmul.cc index 1b91d5564e8d9..c0c189625b31d 100644 --- a/onnxruntime/core/providers/cuda/math/matmul.cc +++ b/onnxruntime/core/providers/cuda/math/matmul.cc @@ -69,7 +69,7 @@ Status MatMul::ComputeInternal(OpKernelContext* ctx) const { ORT_RETURN_IF_ERROR(right_arrays.CopyToGpu()); ORT_RETURN_IF_ERROR(output_arrays.CopyToGpu()); - // note that onnxruntime MLValue is row major, while cublas is column major, + // note that onnxruntime OrtValue is row major, while cublas is column major, // so swap left/right operands CUBLAS_RETURN_IF_ERROR(cublasGemmBatchedHelper( Base::CublasHandle(), diff --git a/onnxruntime/core/providers/cuda/rnn/cudnn_rnn_base.cc b/onnxruntime/core/providers/cuda/rnn/cudnn_rnn_base.cc index 5dca49d8accef..e56f59df16843 100644 --- a/onnxruntime/core/providers/cuda/rnn/cudnn_rnn_base.cc +++ b/onnxruntime/core/providers/cuda/rnn/cudnn_rnn_base.cc @@ -51,13 +51,13 @@ Status CudnnRnnBase::SetCudnnRnnWeightBias(const cudnnHandle_t cudnn_handle, int r_offset = 0; int bias_offset = 0; for (int layer = 0; layer < num_layers_ * num_directions_; ++layer) { - for (int idx = 0; idx < W_lin_layer_id_.size(); ++idx) { + for (size_t idx = 0; idx < W_lin_layer_id_.size(); ++idx) { SetWeightBias(cudnn_handle, rnn_desc, layer, x_desc, w_desc, filter_desc_, w_data, W_lin_layer_id_[idx], W_data, w_offset, true); if (B_data != nullptr) { SetWeightBias(cudnn_handle, rnn_desc, layer, x_desc, w_desc, filter_desc_, w_data, W_lin_layer_id_[idx], B_data, bias_offset, false); } } - for (int idx = 0; idx < R_lin_layer_id_.size(); ++idx) { + for (size_t idx = 0; idx < R_lin_layer_id_.size(); ++idx) { SetWeightBias(cudnn_handle, rnn_desc, layer, x_desc, w_desc, filter_desc_, w_data, R_lin_layer_id_[idx], R_data, r_offset, true); if (B_data != nullptr) { 
SetWeightBias(cudnn_handle, rnn_desc, layer, x_desc, w_desc, filter_desc_, w_data, R_lin_layer_id_[idx], B_data, bias_offset, false); diff --git a/onnxruntime/core/providers/cuda/tensor/transpose.cc b/onnxruntime/core/providers/cuda/tensor/transpose.cc index 53bdc71c6167b..b87d155d7e298 100644 --- a/onnxruntime/core/providers/cuda/tensor/transpose.cc +++ b/onnxruntime/core/providers/cuda/tensor/transpose.cc @@ -29,8 +29,8 @@ Status Transpose::ComputeInternal(OpKernelContext* ctx) const { size_t rank = input_dims.size(); std::vector output_dims(rank); - std::vector default_perm(rank); - const std::vector* p_perm = nullptr; + std::vector default_perm(rank); + const std::vector* p_perm = nullptr; const auto& status = ComputeOutputShape(X, output_dims, default_perm, p_perm); if (!status.IsOK()) return status; @@ -39,7 +39,7 @@ Status Transpose::ComputeInternal(OpKernelContext* ctx) const { Tensor* Y = ctx->Output(0, output_shape); int device_id = 0; CudaAsyncBuffer input_strides(this, device_id, rank); - CudaAsyncBuffer perm(this, device_id, *p_perm); + CudaAsyncBuffer perm(this, device_id, *p_perm); CudaAsyncBuffer fdm_output_strides(this, device_id, rank); ORT_ENFORCE(TensorPitches::Calculate(input_strides.CpuSpan(), input_dims)); ORT_ENFORCE(CalculateFdmStrides(fdm_output_strides.CpuSpan(), output_dims)); diff --git a/onnxruntime/core/providers/cuda/tensor/transpose_impl.cu b/onnxruntime/core/providers/cuda/tensor/transpose_impl.cu index 32580f928f6f0..482aa916d4c9f 100644 --- a/onnxruntime/core/providers/cuda/tensor/transpose_impl.cu +++ b/onnxruntime/core/providers/cuda/tensor/transpose_impl.cu @@ -8,14 +8,8 @@ namespace onnxruntime { namespace cuda { template -__global__ void _TransposeKernel( - const size_t shape_rank, - const int64_t* input_strides, - const int64_t* perm, - const T* input_data, - const fast_divmod* fdm_output_strides, - T* output_data, - const size_t N) { +__global__ void _TransposeKernel(size_t shape_rank, const int64_t* input_strides, const size_t* perm, + const T* input_data, const fast_divmod* fdm_output_strides, T* output_data, size_t N) { CALCULATE_ELEMENTWISE_INDEX_OR_EXIT(id, N); CUDA_LONG input_index = 0; CUDA_LONG output_index = id; @@ -30,29 +24,18 @@ __global__ void _TransposeKernel( } template -void TransposeImpl( - const size_t shape_rank, - const int64_t* input_strides, - const int64_t* perm, - const T* input_data, - const fast_divmod* fdm_output_strides, - T* output_data, - const size_t N) { +void TransposeImpl(size_t shape_rank, const int64_t* input_strides, const size_t* perm, const T* input_data, + const fast_divmod* fdm_output_strides, T* output_data, size_t N) { int blocksPerGrid = (int)(ceil(static_cast(N) / GridDim::maxThreadsPerBlock)); _TransposeKernel<<>>( shape_rank, input_strides, perm, input_data, fdm_output_strides, output_data, N); } -#define SPECIALIZED_IMPL(T) \ - template void TransposeImpl( \ - const size_t shape_rank, \ - const int64_t* input_strides, \ - const int64_t* perm, \ - const T* input_data, \ - const fast_divmod* fdm_output_strides, \ - T* output_data, \ - const size_t N); +#define SPECIALIZED_IMPL(T) \ + template void TransposeImpl(size_t shape_rank, const int64_t* input_strides, const size_t* perm, \ + const T* input_data, const fast_divmod* fdm_output_strides, T* output_data, \ + size_t N); SPECIALIZED_IMPL(float) SPECIALIZED_IMPL(double) diff --git a/onnxruntime/core/providers/cuda/tensor/transpose_impl.h b/onnxruntime/core/providers/cuda/tensor/transpose_impl.h index 5adc22f63d5ef..0d53abcf49b5a 100644 --- 
a/onnxruntime/core/providers/cuda/tensor/transpose_impl.h +++ b/onnxruntime/core/providers/cuda/tensor/transpose_impl.h @@ -9,14 +9,8 @@ namespace onnxruntime { namespace cuda { template -void TransposeImpl( - const size_t shape_rank, - const int64_t* input_strides, - const int64_t* perm, - const T* input_data, - const fast_divmod* fdm_output_strides, - T* output_data, - const size_t N); +void TransposeImpl(size_t shape_rank, const int64_t* input_strides, const size_t* perm, const T* input_data, + const fast_divmod* fdm_output_strides, T* output_data, size_t N); } // namespace cuda } // namespace onnxruntime diff --git a/onnxruntime/core/providers/mkldnn/mkldnn_execution_provider.cc b/onnxruntime/core/providers/mkldnn/mkldnn_execution_provider.cc index cbbfc646d4ca8..752d0ccabd57b 100644 --- a/onnxruntime/core/providers/mkldnn/mkldnn_execution_provider.cc +++ b/onnxruntime/core/providers/mkldnn/mkldnn_execution_provider.cc @@ -1,10 +1,17 @@ // Copyright (c) Microsoft Corporation. All rights reserved. // Licensed under the MIT License. +#ifdef _MSC_VER +#pragma warning(disable : 4996) +#endif + #include "mkldnn_execution_provider.h" #include "core/framework/allocator.h" #include "core/framework/memcpy.h" #include "core/framework/kernel_registry.h" +#include "core/framework/compute_capability.h" +#include "core/providers/mkldnn/subgraph/mkldnn_func_kernel.h" + #include "mkldnn_fwd.h" namespace onnxruntime { @@ -32,16 +39,26 @@ ONNX_OPERATOR_KERNEL_EX( } // namespace mkl_dnn -MKLDNNExecutionProvider::MKLDNNExecutionProvider(const MKLDNNExecutionProviderInfo& /*info*/) - : IExecutionProvider{onnxruntime::kMklDnnExecutionProvider} { +MKLDNNExecutionProvider::MKLDNNExecutionProvider(const MKLDNNExecutionProviderInfo& info) + : IExecutionProvider{onnxruntime::kMklDnnExecutionProvider} { DeviceAllocatorRegistrationInfo default_allocator_info({OrtMemTypeDefault, [](int) { return std::make_unique(std::make_unique(MKLDNN, OrtAllocatorType::OrtDeviceAllocator, 0, OrtMemTypeDefault)); }, std::numeric_limits::max()}); - InsertAllocator(CreateAllocator(default_allocator_info)); DeviceAllocatorRegistrationInfo cpu_allocator_info({OrtMemTypeCPUOutput, [](int) { return std::make_unique(std::make_unique(MKLDNN_CPU, OrtAllocatorType::OrtDeviceAllocator, 0, OrtMemTypeCPUOutput)); }, std::numeric_limits::max()}); - InsertAllocator(CreateAllocator(cpu_allocator_info)); -} + + if (info.create_arena) { + InsertAllocator(CreateAllocator(default_allocator_info)); + + InsertAllocator(CreateAllocator(cpu_allocator_info)); + } else { + InsertAllocator(std::shared_ptr( + std::make_unique(default_allocator_info.factory(0)))); + + InsertAllocator(std::shared_ptr( + std::make_unique(cpu_allocator_info.factory(0)))); + } +} // namespace onnxruntime MKLDNNExecutionProvider::~MKLDNNExecutionProvider() { } @@ -80,19 +97,19 @@ class ONNX_OPERATOR_TYPED_KERNEL_CLASS_NAME(kMklDnnExecutionProvider, kOnnxDomai void RegisterMKLDNNKernels(KernelRegistry& kernel_registry) { static const BuildKernelCreateInfoFn function_table[] = { - BuildKernelCreateInfo, - BuildKernelCreateInfo, - BuildKernelCreateInfo, - BuildKernelCreateInfo, - BuildKernelCreateInfo, - BuildKernelCreateInfo, - BuildKernelCreateInfo, - BuildKernelCreateInfo, - BuildKernelCreateInfo, - BuildKernelCreateInfo, - BuildKernelCreateInfo, - BuildKernelCreateInfo, - BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + 
BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, + BuildKernelCreateInfo, }; for (auto& function_table_entry : function_table) { @@ -107,6 +124,292 @@ std::shared_ptr GetMklDnnKernelRegistry() { } } // namespace mkl_dnn +bool MKLDNNExecutionProvider::UseSubgraph(const onnxruntime::GraphViewer& graph_viewer, + const std::vector& kernel_registries, + std::vector>& result) const { + // switch between mkldnn-vanilla and mkldnn-subgraph implementation using + // MKLDNN_SUBGRAPH environment variable + bool use_subgraph = true; + + const char* env = getenv("ORT_MKLDNN_SUBGRAPH"); + int use_subgraph_env = 0; + if (env != nullptr) + use_subgraph_env = atoi(env); + + if (use_subgraph_env == 0) { + use_subgraph = false; + result = IExecutionProvider::GetCapability(graph_viewer, kernel_registries); + } + return use_subgraph; +} + +void MKLDNNExecutionProvider::CreateOrUpdateMklDnnNode(const Node* node, + mkl_dnn::Subgraph::SubgraphVariables& sub_var, + bool fused, + std::map& output_to_source_node_map, + NodeAttributes& subgraph_attributes) const { + const auto& node_inputs = node->InputDefs(); + sub_var.outputs.push_back(node->OutputDefs()[0]->Name()); + + if (!fused) { + mkl_dnn::MklDnnNode mklnode; + mklnode.name = node->OpType(); + mklnode.num_inputs = static_cast(node->InputDefs().size()); + mklnode.input_start_index = static_cast(sub_var.inputs.size()) - 1; + mklnode.node_index = static_cast(sub_var.subgraph_ptr->mklnodes.size()) + 1; + const auto& node_outputs = node->OutputDefs(); + mklnode.output_name = node_outputs[0]->Name(); + if (node->OpType() == "Conv") { + mklnode.weight_name = node->InputDefs()[1]->Name(); + } + for (int i = 0; i < node_inputs.size(); i++) { + auto iter = output_to_source_node_map.find(node_inputs[i]->Name()); + if (iter != output_to_source_node_map.end()) + mklnode.parent_nodes.push_back(iter->second); + } + sub_var.subgraph_ptr->mklnodes.push_back(mklnode); + output_to_source_node_map.insert(std::make_pair(node_outputs[0]->Name(), static_cast(sub_var.subgraph_ptr->mklnodes.size() - 1))); + } else { + const auto& node_outputs = node->OutputDefs(); + output_to_source_node_map.erase(sub_var.subgraph_ptr->mklnodes.back().output_name); + sub_var.subgraph_ptr->mklnodes.back().output_name = node_outputs[0]->Name(); + output_to_source_node_map.insert(std::make_pair(node_outputs[0]->Name(), static_cast(sub_var.subgraph_ptr->mklnodes.size() - 1))); + } + + // Add inputs which are not in the outputs vector. 
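Editor's aside on the bookkeeping performed by the loop that follows: an input that was produced by an earlier node of the same sub-graph is only an internal edge (tracked so end nodes can be identified later), while everything else becomes an external input of the fused MkldnnCustomOp. A standalone sketch of the same classification, with `SubgraphIo` and `ClassifyInputs` as hypothetical stand-ins for the fields of `mkl_dnn::Subgraph::SubgraphVariables`:

```cpp
// Illustrative sketch only; not part of this diff.
#include <algorithm>
#include <string>
#include <vector>

struct SubgraphIo {
  std::vector<std::string> outputs;                      // values produced so far inside the sub-graph
  std::vector<std::string> inputs;                       // external inputs of the fused node
  std::vector<std::string> outputs_as_input_other_node;  // internal edges, excluded from final outputs
};

void ClassifyInputs(const std::vector<std::string>& node_inputs, SubgraphIo& io) {
  for (const auto& name : node_inputs) {
    const bool produced_inside =
        std::find(io.outputs.begin(), io.outputs.end(), name) != io.outputs.end();
    if (produced_inside)
      io.outputs_as_input_other_node.push_back(name);  // consumed internally, not a graph output
    else
      io.inputs.push_back(name);                        // must be fed from outside the sub-graph
  }
}
```

Later, CreateMetaDef uses the same two lists in reverse: any node output that never appears in `outputs_as_input_other_node` is treated as an end-node output of the fused sub-graph.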
+ for (int i = 0; i < node_inputs.size(); i++) { + auto itr = std::find(sub_var.outputs.begin(), sub_var.outputs.end(), node_inputs[i]->Name()); + if (itr == sub_var.outputs.end()) { + sub_var.inputs.push_back(node_inputs[i]->Name()); + } else { + // Vector of node outputs, which is input to other node + // if node output is not input to any other node, then it's the end node + // which we will find later + sub_var.outputs_as_input_other_node.push_back(node_inputs[i]->Name()); + } + } + + NodeAttributes attributes = node->GetAttributes(); + if (attributes.size() > 0) { + int index = static_cast(sub_var.subgraph_ptr->mklnodes.size()); + + for (auto att_it = attributes.begin(); att_it != attributes.end(); ++att_it) { + std::string key = node->OpType() + "-" + std::to_string(index) + "-" + att_it->first; + std::pair att(key, att_it->second); + subgraph_attributes[key] = att_it->second; + } + } +} + +std::vector> MKLDNNExecutionProvider::GetCapability( + const onnxruntime::GraphViewer& graph_viewer, + const std::vector& kernel_registries) const { + ORT_UNUSED_PARAMETER(kernel_registries); + std::vector> result; + + // temporary switch to toggle between mkldnn-vanilla and mkldnn-subgraph implementation using + // ORT_MKLDNN_SUBGRAPH environment variable + if (UseSubgraph(graph_viewer, kernel_registries, result) == false) { + return result; + } + + // use sub-graph implementation + mkl_dnn::Subgraph::SubgraphVariables sub_var; + // output name to node index map. Using it to find sub-graph end nodes + // if output of a node is not an input to any node in a sub-graph is end node + std::map output_to_source_node_map; + NodeAttributes subgraph_attributes; + int node_index = 0; + + while (node_index < graph_viewer.MaxNodeIndex()) { + auto node = graph_viewer.GetNode(node_index); + auto op_it = mkldnn_ops_.find(node->OpType()); + + if (op_it != mkldnn_ops_.end()) { + sub_var.subgraph_node_indexes.push_back(node->Index()); + + // can we fuse (at mkldnn level) nodes? 
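Editor's aside on the fusion check that follows this comment: a Relu node is folded into the immediately preceding MKL-DNN node when that node is a Conv or a BatchNormalization, by renaming it (for example to "Conv-Relu"); the corresponding kernels then run the Relu as an eltwise post-op rather than as a separate primitive. A compact sketch of that decision, with `MklDnnNodeLite` and `TryFuseRelu` as hypothetical simplifications of the real `MklDnnNode` record:

```cpp
// Illustrative sketch only; not part of this diff.
#include <string>
#include <vector>

struct MklDnnNodeLite {
  std::string name;  // e.g. "Conv", "BatchNormalization"; becomes "Conv-Relu" when fused
};

bool TryFuseRelu(std::vector<MklDnnNodeLite>& subgraph_nodes) {
  if (subgraph_nodes.empty()) return false;
  MklDnnNodeLite& last = subgraph_nodes.back();
  if (last.name == "Conv" || last.name == "BatchNormalization") {
    last.name += "-Relu";  // fused kernel appends an eltwise_relu post-op
    return true;           // the Relu node emits no primitive of its own
  }
  return false;
}
```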
+ bool fused = false; + if (sub_var.subgraph_node_indexes.size() > 1 && node->OpType() == "Relu") { + if (sub_var.subgraph_ptr->mklnodes.back().name == "BatchNormalization" || sub_var.subgraph_ptr->mklnodes.back().name == "Conv") { + sub_var.subgraph_ptr->mklnodes.back().name += "-Relu"; + fused = true; + } + } + + // Create MklDnn node: + // Update inputs, outputs and parent nodes + // Collect attributes and modify the key to make it unique + CreateOrUpdateMklDnnNode(node, sub_var, fused, output_to_source_node_map, subgraph_attributes); + + auto temp_index = node_index + 1; + if (temp_index < graph_viewer.MaxNodeIndex()) { + if (!sub_var.subgraph_node_indexes.empty()) { + // if next node is mkldnn node and if it's input is not output of current node + // if next node input is output of any of the nodes in sub-graph continue + // else + // break and create sub-graph + auto next_node = graph_viewer.GetNode(temp_index); + auto sub_it = mkldnn_ops_.find(next_node->OpType()); + if (sub_it != mkldnn_ops_.end()) { + const auto& next_node_inputs = next_node->InputDefs(); + bool input_from_subgraph = true; + int inputs_count = 1; + if (next_node->OpType() == "Sum") + inputs_count = static_cast(next_node_inputs.size()); + for (int i = 0; i < inputs_count; i++) { + auto in = next_node_inputs[i]; + auto itr = std::find(sub_var.outputs.begin(), sub_var.outputs.end(), in->Name()); + if (itr == sub_var.outputs.end()) { + input_from_subgraph = false; + } + } + if (input_from_subgraph == false) { + CreateMetaDef(graph_viewer, subgraph_attributes, sub_var, result); + subgraph_attributes.clear(); + output_to_source_node_map.clear(); + } + } + } + if (!sub_var.subgraph_node_indexes.empty()) { + if (static_cast(node->GetOutputEdgesCount()) > 1) { + // If current node has branches + // iterate and see if all nodes are mkldnn ops OR + // it ends in node with same number of input edges (mkldnn node or cpu node) + // create sub-graph + bool create_subgraph = false; + bool break_loop = false; + while (!break_loop) { + if (temp_index > graph_viewer.MaxNodeIndex()) + break_loop = true; + + auto next_node = graph_viewer.GetNode(temp_index); + if (next_node->GetInputEdgesCount() == node->GetOutputEdgesCount()) { + // if all nodes in the branch loop are mkldnn nodes + // then continue with adding nodes to sub-graph + break_loop = true; + } + // inner nodes. 
if inner nodes are not mkldnn nodes + // create subgraph (inception v2) + auto sub_it = mkldnn_ops_.find(next_node->OpType()); + if (sub_it == mkldnn_ops_.end()) { + // break and create a sub-graph + break_loop = true; + create_subgraph = true; + } + temp_index++; + } + if (create_subgraph) { + CreateMetaDef(graph_viewer, subgraph_attributes, sub_var, result); + subgraph_attributes.clear(); + output_to_source_node_map.clear(); + } + } + } + } + } else { + if (!sub_var.subgraph_node_indexes.empty()) { + CreateMetaDef(graph_viewer, subgraph_attributes, sub_var, result); + subgraph_attributes.clear(); + } + } + node_index++; + } // graph_viewer node iterator ends + if (!sub_var.subgraph_node_indexes.empty()) { + CreateMetaDef(graph_viewer, subgraph_attributes, sub_var, result); + subgraph_attributes.clear(); + } + return result; +} + +void MKLDNNExecutionProvider::CreateMetaDef(const onnxruntime::GraphViewer& graph_viewer, + const NodeAttributes& subgraph_attributes, + mkl_dnn::Subgraph::SubgraphVariables& sub_var, + std::vector>& result) const { + std::string graph_fused_nodes; + std::string node_list; + std::string subgraph_id = std::to_string(sub_var.subgraph_index); + sub_var.subgraph_index++; + + // This is a list of initializers that subgraph considers as constants. + // Example weights, reshape shape etc. + std::unordered_set input_initializers; + + // Create ng_required_initializers attribute of NGraphCustomOp + ONNX_NAMESPACE::AttributeProto initializers; + initializers.set_name("initializers"); + initializers.set_type(ONNX_NAMESPACE::AttributeProto_AttributeType::AttributeProto_AttributeType_TENSORS); + //auto tensor = initializers.add_tensors(); + //*tensor = *(graph_viewer.GetAllInitializedTensors().at(sub_var.inputs[1])); + + for (const auto& init : sub_var.inputs) { + if (graph_viewer.GetAllInitializedTensors().count(init)) { + auto tensor = initializers.add_tensors(); + *tensor = *(graph_viewer.GetAllInitializedTensors().at(init)); + } + } + + auto meta_def = std::make_unique<::onnxruntime::IndexedSubGraph::MetaDef>(); + meta_def->attributes["initializers"] = initializers; + meta_def->name = "MkldnnCustomOp" + std::to_string(sub_var.subgraph_index); + meta_def->domain = kMSDomain; + meta_def->since_version = 1; + meta_def->status = ONNX_NAMESPACE::EXPERIMENTAL; + meta_def->inputs = sub_var.inputs; + meta_def->attributes.insert(subgraph_attributes.begin(), subgraph_attributes.end()); + + // Find the end nodes + for (auto& mklnode : sub_var.subgraph_ptr->mklnodes) { + auto itr = std::find(sub_var.outputs_as_input_other_node.begin(), + sub_var.outputs_as_input_other_node.end(), mklnode.output_name); + if (itr == sub_var.outputs_as_input_other_node.end()) { + meta_def->outputs.push_back(mklnode.output_name); + mklnode.output_index = static_cast(meta_def->outputs.size()) - 1; + } + } + + ONNX_NAMESPACE::AttributeProto ap; + ap.set_s(subgraph_id); + ap.set_type(ONNX_NAMESPACE::AttributeProto_AttributeType::AttributeProto_AttributeType_STRING); + meta_def->attributes["subgraph_id"] = ap; + std::unique_ptr sub_graph = std::make_unique(); + sub_graph->nodes = sub_var.subgraph_node_indexes; + sub_graph->SetMetaDef(meta_def); + result.push_back(std::make_unique(std::move(sub_graph))); + mkl_subgraphs_.insert(std::make_pair(subgraph_id, sub_var.subgraph_ptr)); + + // Reset subgraph and meta_Def + sub_var.Reset(); +} + +Status MKLDNNExecutionProvider::Compile(const std::vector& fused_nodes, + std::vector& node_compute_funcs) { + for (const auto* fused_node : fused_nodes) { + auto attributes 
= fused_node->GetAttributes(); + NodeComputeInfo compute_info; + + compute_info.create_state_func = [=](ComputeContext* context, FunctionState* state) { + auto* p = new onnxruntime::mkl_dnn::MkldnnFuncKernel(context, attributes, this); + *state = p; + return 0; + }; + + compute_info.release_state_func = [](FunctionState state) { + if (state) + delete static_cast*>(state); + }; + + compute_info.compute_func = [](FunctionState state, const OrtCustomOpApi* api, OrtKernelContext* context) { + onnxruntime::mkl_dnn::MkldnnFuncKernel* custom_op = reinterpret_cast*>(state); + const Status compute_status = custom_op->Compute(api, context); + return compute_status == Status::OK() ? 0 : 1; + }; + + node_compute_funcs.push_back(compute_info); + } + return Status::OK(); +} + std::shared_ptr MKLDNNExecutionProvider::GetKernelRegistry() const { static std::shared_ptr kernel_registry = onnxruntime::mkl_dnn::GetMklDnnKernelRegistry(); return kernel_registry; diff --git a/onnxruntime/core/providers/mkldnn/mkldnn_execution_provider.h b/onnxruntime/core/providers/mkldnn/mkldnn_execution_provider.h index 359f9b96be17a..666eb9b4d8df9 100644 --- a/onnxruntime/core/providers/mkldnn/mkldnn_execution_provider.h +++ b/onnxruntime/core/providers/mkldnn/mkldnn_execution_provider.h @@ -12,6 +12,7 @@ #include "core/graph/constants.h" #include "core/framework/allocatormgr.h" #include "core/framework/execution_provider.h" +#include "core/providers/mkldnn/subgraph/subgraph.h" namespace mkldnn { struct memory; @@ -28,6 +29,17 @@ struct MKLDNNExecutionProviderInfo { MKLDNNExecutionProviderInfo() = default; }; +struct MKLContext { + AllocateFunc allocate_func = nullptr; + DestroyFunc release_func = nullptr; + AllocatorHandle allocator = nullptr; + + MKLContext(AllocateFunc allocate_func, DestroyFunc release_func, AllocatorHandle alloc) + : allocate_func(allocate_func), + release_func(release_func), + allocator(alloc) {} +}; + // Logical device representation. class MKLDNNExecutionProvider : public IExecutionProvider { public: @@ -42,6 +54,13 @@ class MKLDNNExecutionProvider : public IExecutionProvider { virtual std::shared_ptr GetKernelRegistry() const override; + std::vector> + GetCapability(const onnxruntime::GraphViewer& graph, + const std::vector& /*kernel_registries*/) const override; + + common::Status Compile(const std::vector& fused_nodes, + std::vector& node_compute_funcs) override; + std::shared_ptr GetWeightsMemoryBuffer(const std::string& weight_key) { auto iter = weights_mem_map_.find(weight_key); if (iter != weights_mem_map_.end()) @@ -50,7 +69,7 @@ class MKLDNNExecutionProvider : public IExecutionProvider { } void SetWeightsMemoryBuffer(const std::string& weight_key, - const std::shared_ptr& filter_dst_mem) { + const std::shared_ptr& filter_dst_mem) { weights_mem_map_.insert(std::make_pair(weight_key, filter_dst_mem)); } @@ -70,6 +89,36 @@ class MKLDNNExecutionProvider : public IExecutionProvider { // Save reordered memory buffers in list so that memory is not freed. 
std::vector> reordered_buffers_; OrtMutex mutex_; + + // SUBGRAPH + private: + bool UseSubgraph(const onnxruntime::GraphViewer& graph_viewer, + const std::vector& kernel_registries, + std::vector>& result) const; + void CreateOrUpdateMklDnnNode(const Node* node, + mkl_dnn::Subgraph::SubgraphVariables& sub_var, + bool fused, + std::map& output_to_source_node_map, + NodeAttributes& subgraph_attributes) const; + + // Create MklDnn node, update inputs, outputs and parent nodes + // collect attribtes + void CreateMetaDef(const onnxruntime::GraphViewer& graph_viewer, + const NodeAttributes& subgraph_attributes, + mkl_dnn::Subgraph::SubgraphVariables& sub_var, + std::vector>& result) const; + + public: + const std::shared_ptr GetMklDnnSubgraph(const std::string& subgraph_id) { + return mkl_subgraphs_[subgraph_id]; + } + + private: + // supported MklDnn Operators + std::set mkldnn_ops_ = {"Conv", "BatchNormalization", "Relu", "Sum", + "AveragePool", "GlobalMaxPool", "GlobalAveragePool", "MaxPool", "LRN"}; + + mutable std::unordered_map> mkl_subgraphs_; }; } // namespace onnxruntime diff --git a/onnxruntime/core/providers/mkldnn/subgraph/mkldnn_activations.h b/onnxruntime/core/providers/mkldnn/subgraph/mkldnn_activations.h new file mode 100644 index 0000000000000..ffe7e1474e8a5 --- /dev/null +++ b/onnxruntime/core/providers/mkldnn/subgraph/mkldnn_activations.h @@ -0,0 +1,150 @@ +// Copyright(C) 2019 Intel Corporation +// Licensed under the MIT License + +#pragma once +#include "core/util/math.h" +#include "core/util/math_cpuonly.h" +#include "core/framework/op_kernel.h" +#include "core/providers/mkldnn/mkldnn_fwd.h" +#include "core/providers/mkldnn/mkldnn_execution_provider.h" +#include "core/providers/mkldnn/subgraph/mkldnn_kernel.h" + +namespace onnxruntime { +namespace mkl_dnn { + +template +class MklDnnRelu : public MklDnnKernel { + public: + MklDnnRelu(MklDnnNode& node, + MKLDNNExecutionProvider* provider, + std ::shared_ptr mkl_context, + const NodeAttributes& attributes, + const std::string attributes_prefix = "") : MklDnnKernel(node, provider, mkl_context) { + ORT_UNUSED_PARAMETER(attributes); + ORT_UNUSED_PARAMETER(attributes_prefix); + } + + Status CreatePrimitives(Ort::CustomOpApi ort, + OrtKernelContext* context, + mkldnn::engine& cpu_engine, + std::vector& net, + mkldnn::memory::format& source_format) { + int input_index = mklnode_ptr_->input_start_index < 0 ? 
0 : mklnode_ptr_->input_start_index; + + TensorShape x_shape; + if (mklnode_ptr_->parent_nodes.size() == 0) { + const OrtValue* input_tensor = ort.KernelContext_GetInput(context, input_index); + auto tensor_info = ort.GetTensorTypeAndShape(input_tensor); + auto tensor_shape = ort.GetTensorShape(tensor_info); + ort.ReleaseTensorTypeAndShapeInfo(tensor_info); + + auto xshape = tensor_shape.data(); + auto xdim = tensor_shape.size(); + + mkldnn::memory::dims dims(xdim); + + ort_source_format_ = GetSourceFormat(static_cast(xdim)); + source_format = ort_source_format_; + src_format_ = ort_source_format_; + + x_shape = TensorShape(xshape, xdim); + + mkldnn::memory::dims src_dims_mkl( + x_shape.GetDims().begin(), x_shape.GetDims().end()); + + src_md_.reset(new mkldnn::memory::desc( + {src_dims_mkl}, MklDnnType(), src_format_)); + src_mem_.reset( + new mkldnn::memory({*src_md_, cpu_engine}, nullptr)); + } else { + src_md_.reset( + new mkldnn::memory::desc(parents_[0].get()->primitive_dst_mem_.get()->get_primitive_desc().desc())); + src_mem_ = parents_[0].get()->primitive_dst_mem_; + x_shape = parents_[0].get()->primitive_dst_shape_; + ort_source_format_ = source_format; + src_format_ = parents_[0].get()->primitive_dst_format_; + } + + primitive_dst_shape_ = TensorShape(x_shape); + + mkldnn::memory::dims dst_dims_mkl(primitive_dst_shape_.GetDims().begin(), primitive_dst_shape_.GetDims().end()); + mkldnn::algorithm algo = mkldnn::algorithm::eltwise_relu; + fwd_desc_.reset(new mkldnn::eltwise_forward::desc( + mkldnn::prop_kind::forward_inference, algo, *src_md_, 0)); + + relu_fwd_pd_.reset(new mkldnn::eltwise_forward::primitive_desc( + *fwd_desc_, cpu_engine)); + + primitive_src_format_ = static_cast( + relu_fwd_pd_.get()->src_primitive_desc().desc().data.format); + primitive_dst_format_ = static_cast( + relu_fwd_pd_.get()->dst_primitive_desc().desc().data.format); + + if (mklnode_ptr_->output_index >= 0) { + // last node of sub-graph. need to allocate memory for output_tensor + if (primitive_dst_format_ != ort_source_format_) { + // reorder neded. Use primitive output as input to reorder and + // allocate buffer for reorder output, final output of this subgraph + primitive_dst_mem_.reset(new mkldnn::memory(relu_fwd_pd_.get()->dst_primitive_desc())); + } else { + // Last node but re-order not needed. Allocate buffer to output of this node + primitive_dst_mem_.reset(new mkldnn::memory(relu_fwd_pd_.get()->dst_primitive_desc(), nullptr)); + } + } else { + // Intermediate node. Use mkldnn kernel internal memory for output and + // use this as input to next node. + primitive_dst_mem_.reset(new mkldnn::memory(relu_fwd_pd_.get()->dst_primitive_desc())); + } + + relu_fwd_.reset( + new mkldnn::eltwise_forward(*relu_fwd_pd_, *src_mem_, *primitive_dst_mem_)); + + net.push_back(*relu_fwd_); + + if (mklnode_ptr_->output_index >= 0) { + // one of the end nodes. Allocate output buffer memory and + // reorder is necessary + mkldnn::memory::data_type t = MklDnnType(); + InitDstReorderOutput(cpu_engine, t, net); + } + + return Status::OK(); + } + + Status Bind(Ort::CustomOpApi ort, OrtKernelContext* context) override { + int input_index = mklnode_ptr_->input_start_index < 0 ? 0 : mklnode_ptr_->input_start_index; + + if (mklnode_ptr_->parent_nodes.size() == 0) { + // Sub-graph's first node. 
Read input from input buffer + const OrtValue* input_tensor = ort.KernelContext_GetInput(context, input_index); + const T* src_data = const_cast(ort.GetTensorData(input_tensor)); + src_mem_->set_data_handle(static_cast(const_cast(src_data))); + } + + if (mklnode_ptr_->output_index >= 0) { + auto& y_dims = primitive_dst_shape_.GetDims(); + // Allocate memory for output bufffer + OrtValue* output = ort.KernelContext_GetOutput(context, mklnode_ptr_->output_index, &y_dims[0], static_cast(primitive_dst_shape_.GetDims().size())); + T* dst_data = ort.GetTensorMutableData(output); + + if (primitive_dst_format_ != ort_source_format_) { + reorder_dst_mem_to_->set_data_handle(dst_data); + } else { + primitive_dst_mem_->set_data_handle(dst_data); + } + } + + return Status::OK(); + } + + private: + std::shared_ptr src_mem_; + + std::unique_ptr fwd_desc_; + std::unique_ptr relu_fwd_pd_; + std::unique_ptr relu_fwd_; + + std::unique_ptr src_md_; +}; +} // namespace mkl_dnn +} // namespace onnxruntime diff --git a/onnxruntime/core/providers/mkldnn/subgraph/mkldnn_batchnorm.h b/onnxruntime/core/providers/mkldnn/subgraph/mkldnn_batchnorm.h new file mode 100644 index 0000000000000..8270371eb4284 --- /dev/null +++ b/onnxruntime/core/providers/mkldnn/subgraph/mkldnn_batchnorm.h @@ -0,0 +1,361 @@ +// Copyright(C) 2019 Intel Corporation +// Licensed under the MIT License + +#pragma once +#include "core/framework/op_kernel.h" +#include "core/providers/mkldnn/mkldnn_fwd.h" +#include "core/providers/mkldnn/mkldnn_execution_provider.h" +#include "core/providers/mkldnn/subgraph/mkldnn_kernel.h" +#include "core/providers/mkldnn/memcpy_s.h" +#include "core/util/math.h" + +namespace onnxruntime { +namespace mkl_dnn { + +class BatchNormHelper { + public: + static common::Status ValidateInputs(const TensorShape& xshape, + const TensorShape& scale_shape, + const TensorShape& b_shape, + const TensorShape& mean_shape, + const TensorShape& var_shape) { + // defined as per spec and used for validation + constexpr int kNumInputScaleDimensions = 1; + constexpr int kNumInputBiasDimensions = 1; + constexpr int kNumInputMeanDimensions = 1; + constexpr int kNumInputVarianceDimensions = 1; + + if (xshape.GetDims().empty()) { + return common::Status(common::ONNXRUNTIME, common::INVALID_ARGUMENT, "Invalid input X: Empty dimensions"); + } + + int64_t num_channels = xshape.GetDims()[1]; + + if (scale_shape.NumDimensions() != kNumInputScaleDimensions) { + return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT, "Invalid input scale: NumDimensions() != ", kNumInputScaleDimensions); + } + if (scale_shape.GetDims()[0] != num_channels) { + return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT, "Invalid input scale: 0th dimension != ", num_channels); + } + + if (b_shape.NumDimensions() != kNumInputBiasDimensions) { + return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT, "Invalid input B: NumDimensions() != ", kNumInputBiasDimensions); + } + if (b_shape.GetDims()[0] != num_channels) { + return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT, "Invalid input B: 0th dimension != ", num_channels); + } + + if (mean_shape.NumDimensions() != kNumInputMeanDimensions) { + return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT, "Invalid input mean: NumDimensions() != ", kNumInputMeanDimensions); + } + if (mean_shape.GetDims()[0] != num_channels) { + return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT, "Invalid input mean: 0th dimension != ", num_channels); + } + + if (var_shape.NumDimensions() != kNumInputVarianceDimensions) { + return 
ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT, "Invalid input var: NumDimensions() != ", kNumInputVarianceDimensions); + } + if (var_shape.GetDims()[0] != num_channels) { + return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT, "Invalid input var: 0th dimension != ", num_channels); + } + return common::Status::OK(); + } + + static void NormalizeDims(const TensorShape& x_shape, std::vector& new_dims) { + new_dims.clear(); + auto& orig_dims = x_shape.GetDims(); + if (orig_dims.size() == 4 /*supported size by CUDA*/ || + orig_dims.size() == 5 /*supported size by CUDA*/) { + new_dims = orig_dims; + return; + } + + auto rank = x_shape.NumDimensions(); + auto num_samples = rank > 0 ? orig_dims[0] : 1; // NCHW + auto num_channels = rank > 1 ? orig_dims[1] : 1; + auto width = rank > 3 ? orig_dims[3] : 1; + auto height = rank > 2 ? orig_dims[2] : 1; + new_dims = {num_samples, num_channels, height, width}; + } +}; + +template +class MklDnnBatchNorm : public MklDnnKernel { + public: + explicit MklDnnBatchNorm(MklDnnNode& node, + MKLDNNExecutionProvider* provider, + std ::shared_ptr mkl_context, + const NodeAttributes& attributes, + const std::string attributes_prefix = "") : MklDnnKernel(node, provider, mkl_context) { + ReadAttributes(attributes, attributes_prefix); + } + void ReadAttributes(const NodeAttributes& attributes, + const std::string attributes_prefix = "") override { + auto attr = attributes.find(attributes_prefix + "epsilon"); + if (attr != attributes.end() && + attr->second.type() == ::ONNX_NAMESPACE::AttributeProto_AttributeType::AttributeProto_AttributeType_FLOAT) { + epsilon_ = attr->second.f(); + } + } + + Status CreatePrimitives(Ort::CustomOpApi ort, + OrtKernelContext* context, + mkldnn::engine& cpu_engine, + std::vector& net, + mkldnn::memory::format& source_format) override { + int input_index = mklnode_ptr_->input_start_index < 0 ? 
0 : mklnode_ptr_->input_start_index; + + TensorShape x_shape; + if (mklnode_ptr_->parent_nodes.size() == 0) { + const OrtValue* input_tensor = ort.KernelContext_GetInput(context, input_index); + auto tensor_info = ort.GetTensorTypeAndShape(input_tensor); + auto tensor_shape = ort.GetTensorShape(tensor_info); + ort.ReleaseTensorTypeAndShapeInfo(tensor_info); + auto xshape = tensor_shape.data(); + auto xdim = tensor_shape.size(); + mkldnn::memory::dims dims(xdim); + + ort_source_format_ = GetSourceFormat(static_cast(xdim)); + source_format = ort_source_format_; + src_format_ = ort_source_format_; + x_shape = TensorShape(xshape, xdim); + + mkldnn::memory::dims src_dims_mkl( + x_shape.GetDims().begin(), x_shape.GetDims().end()); + src_md_.reset(new mkldnn::memory::desc( + {src_dims_mkl}, MklDnnType(), src_format_)); + } else { + src_md_.reset(new mkldnn::memory::desc(parents_[0].get()->primitive_dst_mem_.get()->get_primitive_desc().desc())); + x_shape = parents_[0].get()->primitive_dst_shape_; + ort_source_format_ = source_format; + src_format_ = parents_[0].get()->primitive_dst_format_; + } + + int num_dimensions = static_cast(x_shape.NumDimensions()); + if (num_dimensions == 3) { + primitive_created_ = Status(common::ONNXRUNTIME, + common::NOT_IMPLEMENTED, "BatchNorm: Please call default CPU kernel."); + return primitive_created_; + } + + const OrtValue* scale_input_tensor = ort.KernelContext_GetInput(context, input_index + 1); + const OrtValue* b_input_tensor = ort.KernelContext_GetInput(context, input_index + 2); + const OrtValue* mean_input_tensor = ort.KernelContext_GetInput(context, input_index + 3); + const OrtValue* var_input_tensor = ort.KernelContext_GetInput(context, input_index + 4); + + auto scale_tensor_info = ort.GetTensorTypeAndShape(scale_input_tensor); + auto scale_tensor_shape = ort.GetTensorShape(scale_tensor_info); + ort.ReleaseTensorTypeAndShapeInfo(scale_tensor_info); + auto sshape = scale_tensor_shape.data(); + auto sdim = scale_tensor_shape.size(); + TensorShape scale_shape(sshape, sdim); + + auto b_tensor_info = ort.GetTensorTypeAndShape(b_input_tensor); + auto b_tensor_shape = ort.GetTensorShape(b_tensor_info); + ort.ReleaseTensorTypeAndShapeInfo(b_tensor_info); + auto bshape = b_tensor_shape.data(); + auto bdim = b_tensor_shape.size(); + TensorShape b_shape(bshape, bdim); + + auto mean_tensor_info = ort.GetTensorTypeAndShape(mean_input_tensor); + auto mean_tensor_shape = ort.GetTensorShape(mean_tensor_info); + ort.ReleaseTensorTypeAndShapeInfo(mean_tensor_info); + auto mshape = mean_tensor_shape.data(); + auto mdim = mean_tensor_shape.size(); + TensorShape mean_shape(mshape, mdim); + + auto var_tensor_info = ort.GetTensorTypeAndShape(var_input_tensor); + auto var_tensor_shape = ort.GetTensorShape(var_tensor_info); + ort.ReleaseTensorTypeAndShapeInfo(var_tensor_info); + auto vshape = var_tensor_shape.data(); + auto vdim = var_tensor_shape.size(); + TensorShape var_shape(vshape, vdim); + + primitive_dst_shape_ = TensorShape(x_shape); + + primitive_created_ = BatchNormHelper::ValidateInputs(x_shape, scale_shape, b_shape, mean_shape, var_shape); + if (!primitive_created_.IsOK()) + return primitive_created_; + + mkldnn::memory::dims src_dims_mkl( + x_shape.GetDims().begin(), x_shape.GetDims().end()); + mkldnn::memory::dims scale_dims_mkl( + scale_shape.GetDims().begin(), scale_shape.GetDims().end()); + mkldnn::memory::dims b_dims_mkl( + b_shape.GetDims().begin(), b_shape.GetDims().end()); + mkldnn::memory::dims mean_dims_mkl( + mean_shape.GetDims().begin(), 
mean_shape.GetDims().end()); + mkldnn::memory::dims var_dims_mkl( + var_shape.GetDims().begin(), var_shape.GetDims().end()); + + mkldnn::memory::dims dst_dims_mkl( + primitive_dst_shape_.GetDims().begin(), primitive_dst_shape_.GetDims().end()); + + scale_shift_md_.reset(new mkldnn::memory::desc( + {2, scale_dims_mkl[0]}, MklDnnType(), mkldnn::memory::format::nc)); + mean_md_.reset(new mkldnn::memory::desc( + {mean_dims_mkl}, MklDnnType(), mkldnn::memory::format::x)); + var_md_.reset(new mkldnn::memory::desc( + {var_dims_mkl}, MklDnnType(), mkldnn::memory::format::x)); + primitive_dst_md_.reset(new mkldnn::memory::desc( + {dst_dims_mkl}, MklDnnType(), mkldnn::memory::format::any)); + + // scale_shift_mem will allocate 2*C*sizeof(float) buffer + // + scale_shift_mem_.reset( + new mkldnn::memory({*scale_shift_md_, cpu_engine})); + + mean_mem_.reset( + new mkldnn::memory({*mean_md_, cpu_engine}, nullptr)); + var_mem_.reset( + new mkldnn::memory({*var_md_, cpu_engine}, nullptr)); + + batchnorm_fwd_.reset(new mkldnn::batch_normalization_forward::desc( + mkldnn::prop_kind::forward_inference, *src_md_, epsilon_, + mkldnn::batch_normalization_flag::use_scale_shift | + mkldnn::batch_normalization_flag::use_global_stats)); + + if (fuse_relu_) { + mkldnn::primitive_attr attr; + attr.set_int_output_round_mode(mkldnn::round_mode::round_nearest); + // Execute RELU as Fuse PostOps + const float ops_scale = 1.f; + const float ops_alpha = 0.f; // relu negative slope + const float ops_beta = 0.f; + mkldnn::post_ops ops; + ops.append_eltwise(ops_scale, mkldnn::algorithm::eltwise_relu, ops_alpha, ops_beta); + attr.set_post_ops(ops); + + batchnorm_fwd_pd_.reset(new mkldnn::batch_normalization_forward::primitive_desc( + *batchnorm_fwd_, attr, cpu_engine)); + } else { + batchnorm_fwd_pd_.reset( + new mkldnn::batch_normalization_forward::primitive_desc( + *batchnorm_fwd_, cpu_engine)); + } + + // out format of this kernel + primitive_dst_format_ = static_cast( + batchnorm_fwd_pd_.get()->dst_primitive_desc().desc().data.format); + primitive_src_format_ = static_cast( + batchnorm_fwd_pd_.get()->dst_primitive_desc().desc().data.format); + + if (mklnode_ptr_->parent_nodes.size() == 0) { + src_mem_.reset( + new mkldnn::memory(batchnorm_fwd_pd_.get()->src_primitive_desc(), nullptr)); + } else { + src_mem_ = parents_[0].get()->primitive_dst_mem_; + } + + if (mklnode_ptr_->output_index >= 0) { + // Use mkldnn's internal output buffer + if (primitive_dst_format_ != ort_source_format_) { + primitive_dst_mem_.reset(new mkldnn::memory(batchnorm_fwd_pd_->dst_primitive_desc())); + } else { + primitive_dst_mem_.reset(new mkldnn::memory(batchnorm_fwd_pd_->dst_primitive_desc(), nullptr)); + } + } else { + // last node of sub-graph. need to allocate memory for output_tensor + primitive_dst_mem_.reset(new mkldnn::memory(batchnorm_fwd_pd_->dst_primitive_desc())); + } + auto bn = mkldnn::batch_normalization_forward( + *batchnorm_fwd_pd_, + (const mkldnn::primitive::at)*src_mem_, + (const mkldnn::primitive::at)*mean_mem_, + (const mkldnn::primitive::at)*var_mem_, + (const mkldnn::memory)*scale_shift_mem_, + (const mkldnn::memory)*primitive_dst_mem_); + net.push_back(bn); + + // Allocate dst buffer if reorder is necessary + if (mklnode_ptr_->output_index >= 0) { + // one of the end nodes. 
Allocate output buffer memory and + // reorder is necessary + mkldnn::memory::data_type t = MklDnnType(); + InitDstReorderOutput(cpu_engine, t, net); + } + return Status::OK(); + } + + Status Bind(Ort::CustomOpApi ort, OrtKernelContext* context) override { + int input_index = mklnode_ptr_->input_start_index < 0 ? 0 : mklnode_ptr_->input_start_index; + + if (!primitive_created_.IsOK()) { + // abort as MKLDNN cannot execute this. but + // ORT try to delete output_tensor buffer data. allocate memory so that it can delete + // fix for test_averagepool_1d_default node test + return primitive_created_; + } + + if (mklnode_ptr_->parent_nodes.size() == 0) { + const OrtValue* input_tensor = ort.KernelContext_GetInput(context, input_index); + const T* src_data = const_cast(ort.GetTensorData(input_tensor)); + src_mem_->set_data_handle(static_cast(const_cast(src_data))); + } + + const OrtValue* scale_input_tensor = ort.KernelContext_GetInput(context, input_index + 1); + const T* scale_data = reinterpret_cast(ort.GetTensorData(scale_input_tensor)); + const OrtValue* b_input_tensor = ort.KernelContext_GetInput(context, input_index + 2); + const T* b_data = reinterpret_cast(ort.GetTensorData(b_input_tensor)); + const OrtValue* mean_input_tensor = ort.KernelContext_GetInput(context, input_index + 3); + const T* mean_data = reinterpret_cast(ort.GetTensorData(mean_input_tensor)); + const OrtValue* var_input_tensor = ort.KernelContext_GetInput(context, input_index + 4); + const T* var_data = reinterpret_cast(ort.GetTensorData(var_input_tensor)); + + auto tensor_info = ort.GetTensorTypeAndShape(scale_input_tensor); + auto tensor_shape = ort.GetTensorShape(tensor_info); + ort.ReleaseTensorTypeAndShapeInfo(tensor_info); + auto sshape = tensor_shape.data(); + auto sdim = tensor_shape.size(); + + TensorShape scale_shape(sshape, sdim); + mkldnn::memory::dims scale_dims_mkl( + scale_shape.GetDims().begin(), scale_shape.GetDims().end()); + + mean_mem_->set_data_handle(static_cast(const_cast(mean_data))); + var_mem_->set_data_handle(static_cast(const_cast(var_data))); + + T* scale_shift_buf = static_cast(scale_shift_mem_->get_data_handle()); + + size_t src_bytes = sizeof(T) * scale_dims_mkl[0]; + size_t dst_bytes = sizeof(T) * scale_dims_mkl[0]; + + MEMCPY_S(scale_shift_buf, scale_data, src_bytes, dst_bytes); + MEMCPY_S(&scale_shift_buf[scale_dims_mkl[0]], b_data, src_bytes, dst_bytes); + + if (mklnode_ptr_->output_index >= 0) { + auto& y_dims = primitive_dst_shape_.GetDims(); + // Allocate memory for output bufffer + OrtValue* output = ort.KernelContext_GetOutput(context, mklnode_ptr_->output_index, &y_dims[0], static_cast(primitive_dst_shape_.GetDims().size())); + T* dst_data = ort.GetTensorMutableData(output); + + if (primitive_dst_format_ != ort_source_format_) { + reorder_dst_mem_to_->set_data_handle(dst_data); + } else { + primitive_dst_mem_->set_data_handle(dst_data); + } + } + return Status::OK(); + } + + private: + std::shared_ptr src_mem_; + std::unique_ptr scale_shift_mem_; + std::unique_ptr mean_mem_; + std::unique_ptr var_mem_; + std::unique_ptr dst_mem_; + + std::unique_ptr src_md_; + std::unique_ptr scale_shift_md_; + std::unique_ptr mean_md_; + std::unique_ptr var_md_; + std::unique_ptr dst_md_; + + std::unique_ptr batchnorm_fwd_; + std::unique_ptr batchnorm_fwd_pd_; + + protected: + float epsilon_ = 1e-5f; +}; +} // namespace mkl_dnn +} // namespace onnxruntime \ No newline at end of file diff --git a/onnxruntime/core/providers/mkldnn/subgraph/mkldnn_conv.h 
b/onnxruntime/core/providers/mkldnn/subgraph/mkldnn_conv.h new file mode 100644 index 0000000000000..1c24e79c08c9f --- /dev/null +++ b/onnxruntime/core/providers/mkldnn/subgraph/mkldnn_conv.h @@ -0,0 +1,634 @@ +// Copyright (c) Microsoft Corporation. All rights reserved. +// Licensed under the MIT License. + +#pragma once +#include "mkldnn_types.h" +#include "core/framework/op_kernel.h" +#include "core/providers/mkldnn/mkldnn_fwd.h" +#include "core/providers/cpu/nn/autopad_type.h" +#include "core/providers/mkldnn/mkldnn_execution_provider.h" +#include "core/providers/mkldnn/subgraph/mkldnn_kernel.h" +#include "core/util/math.h" + +namespace onnxruntime { +namespace mkl_dnn { + +// helper function +template +Status ComputePadAndOutputShape( + const int64_t in_dim, + const int64_t stride, + const int64_t kernel, + const int64_t dilation, + AutoPadType pad_type, + int64_t* pad_head, + int64_t* pad_tail, + int64_t* out_dim) { + const int64_t dkernel = dilation * (kernel - 1) + 1; + + if (pad_type == AutoPadType::NOTSET) { + *out_dim = static_cast(static_cast(in_dim + *pad_head + *pad_tail - dkernel) / stride + 1); + } else { + switch (pad_type) { + case AutoPadType::VALID: + *pad_head = 0; + *pad_tail = 0; + *out_dim = (in_dim - dkernel) / stride + 1; + break; + case AutoPadType::SAME_UPPER: + case AutoPadType::SAME_LOWER: { + ORT_ENFORCE(dilation == 1, "Dilation not supported for AutoPadType::SAME_UPPER or AutoPadType::SAME_LOWER."); + int64_t legacy_target_size = (in_dim + stride - 1) / stride; + int64_t pad_needed = (legacy_target_size - 1) * stride + kernel - in_dim; + *out_dim = (in_dim + pad_needed - dkernel) / stride + 1; + + // make sure padding is symmetric + if (ForceSymmetricAutoPadding) + pad_needed = math::roundUpPow2(pad_needed); + + if (pad_type == AutoPadType::SAME_LOWER) { + *pad_head = (pad_needed + 1) / 2; + } else { + *pad_head = pad_needed / 2; + } + *pad_tail = pad_needed - *pad_head; + } break; + default: + return Status(common::ONNXRUNTIME, common::INVALID_ARGUMENT, "pad type not supported."); + } + } + return Status::OK(); +} + +template +class MklDnnConv : public MklDnnKernel { + public: + MklDnnConv(MklDnnNode& node, + MKLDNNExecutionProvider* provider, + std ::shared_ptr mkl_context, + const NodeAttributes& attributes, + const std::string attributes_prefix = "") : MklDnnKernel(node, provider, mkl_context) { + ReadAttributes(attributes, attributes_prefix); + } + + Status CreatePrimitives(Ort::CustomOpApi ort, + OrtKernelContext* context, + mkldnn::engine& cpu_engine, + std::vector& net, + mkldnn::memory::format& source_format) override { + int input_index = mklnode_ptr_->input_start_index < 0 ? 
0 : mklnode_ptr_->input_start_index; + + const OrtValue* winput_tensor = ort.KernelContext_GetInput(context, input_index + 1); + auto wtensor_info = ort.GetTensorTypeAndShape(winput_tensor); + auto wtensor_shape = ort.GetTensorShape(wtensor_info); + ort.ReleaseTensorTypeAndShapeInfo(wtensor_info); + auto wshape = wtensor_shape.data(); + auto wdim = wtensor_shape.size(); + + TensorShape w_shape(wshape, wdim); + const int group_mkl = static_cast(group_); + + TensorShape x_shape; + // std::unique_ptr x_shape; + if (mklnode_ptr_->parent_nodes.size() == 0) { + const OrtValue* input_tensor = ort.KernelContext_GetInput(context, input_index); + auto tensor_info = ort.GetTensorTypeAndShape(input_tensor); + auto tensor_shape = ort.GetTensorShape(tensor_info); + ort.ReleaseTensorTypeAndShapeInfo(tensor_info); + auto xshape = tensor_shape.data(); + auto xdim = tensor_shape.size(); + + x_shape = TensorShape(xshape, xdim); + + mkldnn::memory::dims src_dims_mkl(x_shape.GetDims().begin(), x_shape.GetDims().end()); + src_md_.reset(new mkldnn::memory::desc( + {src_dims_mkl}, MklDnnType(), mkldnn::memory::format::any)); + } else { + // get the output of previous node (mkldnn block propagation). + // TODO Sourcenode will set src of this node. + x_shape = parents_[0].get()->primitive_dst_shape_; + ort_source_format_ = source_format; + src_format_ = parents_[0].get()->primitive_dst_format_; + mkldnn::memory::dims src_dims_mkl(x_shape.GetDims().begin(), x_shape.GetDims().end()); + src_md_.reset(new mkldnn::memory::desc( + {src_dims_mkl}, MklDnnType(), mkldnn::memory::format::any)); + } + + primitive_created_ = ValidateInputShape(x_shape, w_shape); + if (!primitive_created_.IsOK()) + return primitive_created_; + + std::vector kernel_shape; + primitive_created_ = ComputeKernelShape(w_shape, kernel_shape); + if (!primitive_created_.IsOK()) + return primitive_created_; + + const size_t kernel_rank = kernel_shape.size(); + + if (kernel_rank + 2 != wdim) { + primitive_created_ = ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "kernel_shape num_dims is not compatible with W num_dims.", + " kernel_shape: ", TensorShape(kernel_shape).ToString().c_str(), + " W: ", w_shape.ToString().c_str()); + return primitive_created_; + } + + for (size_t i = 0; i < kernel_rank; ++i) { + if (kernel_shape[i] != w_shape[i + 2]) { + primitive_created_ = ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "kernel_shape is not compatible with W shape.", + " kernel_shape: ", TensorShape(kernel_shape).ToString().c_str(), + " W: ", w_shape.ToString().c_str()); + return primitive_created_; + } + } + + std::vector pads(pads_); + if (pads.empty()) { + pads.resize(kernel_rank * 2, 0); + } + std::vector dilations(dilations_); + if (dilations.empty()) { + dilations.resize(kernel_rank, 1); + } + std::vector strides(strides_); + if (strides.empty()) { + strides.resize(kernel_rank, 1); + } + + const int64_t N = x_shape[0]; + const int64_t M = w_shape[0]; + std::vector y_dims; + y_dims.insert(y_dims.begin(), {N, M}); + TensorShape input_shape = x_shape.Slice(2); + primitive_created_ = InferOutputShape(input_shape, kernel_shape, strides, dilations, &pads, &y_dims); + if (!primitive_created_.IsOK()) + return primitive_created_; + + TensorShape y_shape(y_dims); + primitive_dst_shape_ = TensorShape(y_dims); + TensorShape output_shape = y_shape.Slice(2); + mkldnn::memory::dims dst_dims_mkl(y_dims.begin(), y_dims.end()); + primitive_dst_md_.reset(new mkldnn::memory::desc( + {dst_dims_mkl}, MklDnnType(), mkldnn::memory::format::any)); + + mkldnn::memory::dims filter_dims_mkl; + if 
(group_mkl == 1) { + filter_dims_mkl.assign(w_shape.GetDims().begin(), w_shape.GetDims().end()); + } else { + filter_dims_mkl.assign({group_mkl, + static_cast(w_shape[0] / group_mkl)}); + filter_dims_mkl.insert(filter_dims_mkl.end(), w_shape.GetDims().begin() + 1, w_shape.GetDims().end()); + } + mkldnn::memory::dims strides_mkl(strides.begin(), strides.end()); + mkldnn::memory::dims dilations_mkl(dilations.begin(), dilations.end()); + // mkldnn dilations start from 0 so we need to subtract 1 from each dim. + for (size_t dim = 0; dim < kernel_rank; dim++) { + dilations_mkl[dim] -= 1; + } + + mkldnn::memory::dims padding_left_mkl(pads.begin(), pads.begin() + kernel_rank); + mkldnn::memory::dims padding_right_mkl(pads.begin() + kernel_rank, pads.end()); + mkldnn::memory::dims bias_dims_mkl; + if (mklnode_ptr_->num_inputs == 3) { + const OrtValue* binput_tensor = ort.KernelContext_GetInput(context, input_index + 2); + auto btensor_info = ort.GetTensorTypeAndShape(binput_tensor); + auto btensor_shape = ort.GetTensorShape(btensor_info); + ort.ReleaseTensorTypeAndShapeInfo(btensor_info); + auto bshape = btensor_shape.data(); + auto bdim = btensor_shape.size(); + TensorShape b_shape(bshape, bdim); + bias_dims_mkl.assign(b_shape.GetDims().begin(), b_shape.GetDims().end()); + } + + auto fmt = mkldnn::memory::format::any; + if (kernel_rank == 1) { + fmt = mkldnn::memory::format::ncw; + if (group_mkl == 1) { + filter_format_ = mkldnn::memory::format::oiw; + } else { + filter_format_ = mkldnn::memory::format::goiw; + } + } else if (kernel_rank == 2) { + fmt = mkldnn::memory::format::nchw; + if (group_mkl == 1) { + filter_format_ = mkldnn::memory::format::oihw; + } else { + filter_format_ = mkldnn::memory::format::goihw; + } + } else { + fmt = mkldnn::memory::format::ncdhw; + if (group_mkl == 1) { + filter_format_ = mkldnn::memory::format::oidhw; + } else { + filter_format_ = mkldnn::memory::format::goidhw; + } + } + if (src_format_ == mkldnn::memory::format::any) { + src_format_ = fmt; + ort_source_format_ = fmt; + source_format = fmt; + } + + // Set the memory descriptors to format::any to allow MKLDNN to decide what the optimal memory layout should be + // for the computation given the input + filter_md_.reset(new mkldnn::memory::desc( + {filter_dims_mkl}, MklDnnType(), mkldnn::memory::format::any)); + if (!bias_dims_mkl.empty()) + bias_md_.reset(new mkldnn::memory::desc( + {bias_dims_mkl}, MklDnnType(), mkldnn::memory::format::any)); + + if (!bias_dims_mkl.empty()) { + fwd_desc_.reset(new mkldnn::convolution_forward::desc( + mkldnn::prop_kind::forward, mkldnn::convolution_direct, *src_md_, + *filter_md_, *bias_md_, *primitive_dst_md_, + strides_mkl, dilations_mkl, padding_left_mkl, + padding_right_mkl, mkldnn::padding_kind::zero)); + } else { + fwd_desc_.reset(new mkldnn::convolution_forward::desc( + mkldnn::prop_kind::forward, mkldnn::convolution_direct, *src_md_, + *filter_md_, *primitive_dst_md_, strides_mkl, + dilations_mkl, padding_left_mkl, + padding_right_mkl, mkldnn::padding_kind::zero)); + } + + if (fuse_relu_) { + mkldnn::primitive_attr attr; + attr.set_int_output_round_mode(mkldnn::round_mode::round_nearest); + // Execute RELU as Fuse PostOps + const float ops_scale = 1.f; + const float ops_alpha = 0.f; // relu negative slope + const float ops_beta = 0.f; + mkldnn::post_ops ops; + ops.append_eltwise(ops_scale, mkldnn::algorithm::eltwise_relu, ops_alpha, ops_beta); + attr.set_post_ops(ops); + + conv_fwd_pd_.reset(new mkldnn::convolution_forward::primitive_desc( + *fwd_desc_, attr, 
cpu_engine)); + } else { + conv_fwd_pd_.reset(new mkldnn::convolution_forward::primitive_desc( + *fwd_desc_, cpu_engine)); + } + + primitive_src_format_ = static_cast( + conv_fwd_pd_.get()->src_primitive_desc().desc().data.format); + + mkldnn_filter_format_ = static_cast( + conv_fwd_pd_.get()->weights_primitive_desc().desc().data.format); + + primitive_dst_format_ = static_cast( + conv_fwd_pd_.get()->dst_primitive_desc().desc().data.format); + + src_size_ = conv_fwd_pd_.get()->src_primitive_desc().get_size(); + filter_size_ = conv_fwd_pd_.get()->weights_primitive_desc().get_size(); + dst_size_ = conv_fwd_pd_.get()->dst_primitive_desc().get_size(); + + filter_mem_.reset( + new mkldnn::memory(conv_fwd_pd_.get()->weights_primitive_desc(), nullptr)); + + if (primitive_src_format_ != src_format_) { + mkldnn::memory::dims src_dims_mkl(x_shape.GetDims().begin(), x_shape.GetDims().end()); + auto src_md = mkldnn::memory::desc(src_dims_mkl, MklDnnType(), src_format_); + auto pd = mkldnn::memory::primitive_desc(src_md, cpu_engine); + + if (mklnode_ptr_->parent_nodes.size() == 0) + src_mem_from_.reset(new mkldnn::memory(pd, nullptr)); + else + src_mem_from_ = parents_[0].get()->primitive_dst_mem_; + + src_mem_.reset(new mkldnn::memory(conv_fwd_pd_->src_primitive_desc(), nullptr)); + net.push_back(mkldnn::reorder(*src_mem_from_, *src_mem_)); + } else { + if (mklnode_ptr_->parent_nodes.size() == 0) { + src_mem_.reset(new mkldnn::memory(conv_fwd_pd_->src_primitive_desc(), nullptr)); + } else { + src_mem_ = parents_[0].get()->primitive_dst_mem_; + } + } + + if (mklnode_ptr_->output_index >= 0) { + // Use mkldnn's internal output buffer + if (primitive_dst_format_ != ort_source_format_) { + primitive_dst_mem_.reset(new mkldnn::memory(conv_fwd_pd_.get()->dst_primitive_desc())); + } else { + primitive_dst_mem_.reset(new mkldnn::memory(conv_fwd_pd_.get()->dst_primitive_desc(), nullptr)); + } + } else { + // last node of sub-graph. need to allocate memory for output_tensor + primitive_dst_mem_.reset(new mkldnn::memory(conv_fwd_pd_.get()->dst_primitive_desc())); + } + + if (!bias_dims_mkl.empty()) { + bias_mem_.reset(new mkldnn::memory(conv_fwd_pd_.get()->bias_primitive_desc(), nullptr)); + conv_fwd_.reset(new mkldnn::convolution_forward(*conv_fwd_pd_, *src_mem_, *filter_mem_, + *bias_mem_, *primitive_dst_mem_)); + } else { + conv_fwd_.reset(new mkldnn::convolution_forward(*conv_fwd_pd_, *src_mem_, + *filter_mem_, *primitive_dst_mem_)); + } + net.push_back(*conv_fwd_); + + if (mklnode_ptr_->output_index >= 0) { + // one of the end nodes. Allocate output buffer memory and + // reorder is necessary + mkldnn::memory::data_type t = MklDnnType(); + InitDstReorderOutput(cpu_engine, t, net); + } + primitive_created_ = Status::OK(); + return primitive_created_; + } + + virtual void ReorderWeights(Ort::CustomOpApi ort, OrtKernelContext* context, mkldnn::engine& cpu_engine) override { + int input_index = mklnode_ptr_->input_start_index < 0 ? 
0 : mklnode_ptr_->input_start_index; + + const OrtValue* input_tensor = ort.KernelContext_GetInput(context, input_index+1); + auto tensor_info = ort.GetTensorTypeAndShape(input_tensor); + auto tensor_shape = ort.GetTensorShape(tensor_info); + ort.ReleaseTensorTypeAndShapeInfo(tensor_info); + auto xshape = tensor_shape.data(); + auto xdim = tensor_shape.size(); + + TensorShape W(xshape, xdim); + const T* filter_data = const_cast(ort.GetTensorData(input_tensor)); + + const int group_mkl = static_cast(group_); + + mkldnn::memory::dims filter_dims_mkl; + if (group_mkl == 1) { + filter_dims_mkl.assign(W.GetDims().begin(), W.GetDims().end()); + } else { + filter_dims_mkl.assign({group_mkl, + static_cast(W[0] / group_mkl)}); + filter_dims_mkl.insert(filter_dims_mkl.end(), W.GetDims().begin() + 1, W.GetDims().end()); + } + + { + // lock to make sure reordering is done only once + std::lock_guard lock(provider_->GetMutex()); + std::shared_ptr filter_dst_mem = provider_->GetWeightsMemoryBuffer(mklnode_ptr_->weight_name); + + if (filter_dst_mem == nullptr) { + auto pd = mkldnn::memory::primitive_desc( + mkldnn::memory::desc(filter_dims_mkl, MklDnnType(), filter_format_), cpu_engine); + mkldnn::memory src = mkldnn::memory(pd, (void*)filter_data); + IAllocatorUniquePtr filter_reorder_buffer = + IAllocator::MakeUniquePtr(alloc_, filter_size_); + filter_dst_mem.reset( + new mkldnn::memory(conv_fwd_pd_->weights_primitive_desc(), filter_reorder_buffer.get())); + + MemoryReorderParams params(src, *filter_dst_mem); + DoReorder(params); + provider_->SaveAllocatedMemory(std::move(filter_reorder_buffer)); + filter_data = static_cast(filter_dst_mem->get_data_handle()); + provider_->SetWeightsMemoryBuffer(mklnode_ptr_->weight_name, filter_dst_mem); + } + } + } + + Status Bind(Ort::CustomOpApi ort, OrtKernelContext* context) override { + int input_index = mklnode_ptr_->input_start_index < 0 ? 0 : mklnode_ptr_->input_start_index; + if (!primitive_created_.IsOK()) { + // abort as MKLDNN cannot execute this. but + // ORT try to delete output_tensor buffer data. 
allocate memory so that it can delete + // fix for test_averagepool_1d_default node test + //auto xshape = input_tensors[input_index].shape; + //auto xdim = input_tensors[input_index].ndim; + //AllocateOutputTensor(output_tensors, mklnode_ptr_->output_index, xshape, xdim, input_tensors[0].dtype); + return primitive_created_; + } + const OrtValue* winput_tensor = ort.KernelContext_GetInput(context, input_index+1); + const T* filter_data = const_cast(ort.GetTensorData(winput_tensor)); + + const T* bias_data = nullptr; + if (mklnode_ptr_->num_inputs == 3) { + const OrtValue* binput_tensor = ort.KernelContext_GetInput(context, input_index + 2); + bias_data = const_cast(ort.GetTensorData(binput_tensor)); + } + std::shared_ptr filter_dst_mem = provider_->GetWeightsMemoryBuffer(mklnode_ptr_->weight_name); + if (filter_dst_mem == nullptr) { + ReorderWeights(ort, context, GetEngine()); + filter_dst_mem = provider_->GetWeightsMemoryBuffer(mklnode_ptr_->weight_name); + } + filter_data = static_cast(filter_dst_mem->get_data_handle()); + + filter_mem_->set_data_handle(static_cast(const_cast(filter_data))); + if (bias_data != nullptr) { + bias_mem_->set_data_handle(static_cast(const_cast(bias_data))); + } + + if (primitive_src_format_ != src_format_) { + if (mklnode_ptr_->parent_nodes.size() == 0) { + const OrtValue* input_tensor = ort.KernelContext_GetInput(context, input_index); + const T* src_data = const_cast(ort.GetTensorData(input_tensor)); + src_mem_from_->set_data_handle(static_cast(const_cast(src_data))); + } else { + src_mem_from_ = parents_[0].get()->primitive_dst_mem_; + } + + auto src_size = conv_fwd_pd_.get()->src_primitive_desc().get_size(); + src_reorder_buffer_ = IAllocator::MakeUniquePtr(alloc_, src_size); + src_mem_->set_data_handle(src_reorder_buffer_.get()); + } else { + if (mklnode_ptr_->parent_nodes.size() == 0) { + const OrtValue* input_tensor = ort.KernelContext_GetInput(context, input_index); + const T* src_data = const_cast(ort.GetTensorData(input_tensor)); + src_mem_->set_data_handle(static_cast(const_cast(src_data))); + } else { + src_mem_ = parents_[0].get()->primitive_dst_mem_; + } + } + + if (mklnode_ptr_->output_index >= 0) { + auto& y_dims = primitive_dst_shape_.GetDims(); + // Allocate memory for output bufffer + OrtValue* output = ort.KernelContext_GetOutput(context, mklnode_ptr_->output_index, &y_dims[0], static_cast(primitive_dst_shape_.GetDims().size())); + T* dst_data = ort.GetTensorMutableData(output); + + if (primitive_dst_format_ != ort_source_format_) { + reorder_dst_mem_to_->set_data_handle(dst_data); + } else { + primitive_dst_mem_->set_data_handle(dst_data); + } + } + return Status::OK(); + } + + private: + void ReadAttributes(const NodeAttributes& attributes, + const std::string attributes_prefix = "") override { + std::string auto_pad; + auto attr = attributes.find(attributes_prefix + "auto_pad"); + if (attr != attributes.end() && + attr->second.type() == ::ONNX_NAMESPACE::AttributeProto_AttributeType::AttributeProto_AttributeType_STRING) { + auto_pad = attr->second.s(); + } + auto_pad_ = (auto_pad != "") ? 
StringToAutoPadType(auto_pad) : AutoPadType::NOTSET; + + kernel_shape_specified_ = false; + attr = attributes.find(attributes_prefix + "kernel_shape"); + if (attr != attributes.end()) { + ONNX_NAMESPACE::AttributeProto proto = attr->second; + Status status = GetIntsAttr(proto, kernel_shape_); + kernel_shape_specified_ = true; + } + + attr = attributes.find(attributes_prefix + "strides"); + if (attr != attributes.end()) { + ONNX_NAMESPACE::AttributeProto proto = attr->second; + Status status = GetIntsAttr(proto, strides_); + } + + bool attr_read = false; + attr = attributes.find(attributes_prefix + "pads"); + if (attr != attributes.end()) { + ONNX_NAMESPACE::AttributeProto proto = attr->second; + if (GetIntsAttr(proto, pads_) == Status::OK()) + attr_read = true; + } + if (!attr_read) { + pads_.resize(kernel_shape_.size() * 2, 0); + } + + attr_read = false; + attr = attributes.find(attributes_prefix + "dilations"); + if (attr != attributes.end()) { + ONNX_NAMESPACE::AttributeProto proto = attr->second; + if (GetIntsAttr(proto, dilations_) == Status::OK()) + attr_read = true; + } + if (!attr_read) { + dilations_.resize(kernel_shape_.size(), 1); + } + + attr_read = false; + attr = attributes.find(attributes_prefix + "group"); + if (attr != attributes.end()) { + ONNX_NAMESPACE::AttributeProto proto = attr->second; + if (GetIntAttr(proto, group_) == Status::OK()) + attr_read = true; + } + if (!attr_read) { + group_ = 1; + } + } + + private: + mkldnn::memory::format mkldnn_filter_format_; + mkldnn::memory::format filter_format_; + + std::shared_ptr src_mem_from_; + std::unique_ptr src_mem_to_; + + size_t src_size_; + size_t filter_size_; + size_t dst_size_; + + std::shared_ptr src_mem_; + std::unique_ptr filter_mem_; + std::unique_ptr bias_mem_; + + std::unique_ptr fwd_desc_; + + std::unique_ptr src_md_; + std::unique_ptr filter_md_; + std::unique_ptr bias_md_; + + std::unique_ptr conv_fwd_pd_; + std::unique_ptr conv_fwd_; + + private: + IAllocatorUniquePtr src_reorder_buffer_; + IAllocatorUniquePtr dst_reorder_buffer_; + + private: + Status ComputeKernelShape(const TensorShape& weight_shape, std::vector& kernel_shape) const { + if (kernel_shape_specified_) { + kernel_shape = kernel_shape_; + if (kernel_shape.size() + 2 != weight_shape.NumDimensions()) { + return ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "kernel_shape num_dims is not compatible with W num_dims.", + " kernel_shape: ", TensorShape(kernel_shape).ToString().c_str(), + " W: ", weight_shape.ToString().c_str()); + } + for (size_t i = 0; i < kernel_shape.size(); ++i) { + if (kernel_shape[i] != weight_shape[i + 2]) { + return ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "kernel_shape is not compatible with W shape.", + " kernel_shape: ", TensorShape(kernel_shape).ToString().c_str(), + " W: ", weight_shape.ToString().c_str()); + } + } + } else { + auto& weight_dims = weight_shape.GetDims(); + kernel_shape = std::vector(weight_dims.begin() + 2, weight_dims.end()); + } + + return Status::OK(); + } + + Status ValidateInputShape(const TensorShape& X, const TensorShape& W) const { + const int64_t C = X[1]; + const int64_t M = W[0]; + + if (X.NumDimensions() != W.NumDimensions()) { + return ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "X num_dims does not match W num_dims.", + " X: ", X.ToString().c_str(), + " W: ", W.ToString().c_str()); + } + + if (C != W[1] * group_) { + return ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "Input channels C is not equal to kernel channels * group.", + " C: ", C, + " kernel channels: ", W[1], + " group: ", group_); + } + + if (M % group_ != 0) { + 
return ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "Output channels M is not divisible by group.", + " M: ", M, + " group: ", group_); + } + return Status::OK(); + } + + template + Status InferOutputShape(const TensorShape& input_shape, + const std::vector& kernel_shape, + const std::vector& strides, + const std::vector& dilations, + std::vector* pads, + std::vector* output_shape) const { + int rank = gsl::narrow_cast(input_shape.NumDimensions()); + for (int dim = 0; dim < rank; ++dim) { + if (dim >= strides.size() || dim >= kernel_shape.size() || + dim >= dilations.size() || dim >= pads->size() || + rank + dim >= pads->size()) { + return ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "Out of bound access to array"); + } + int64_t dim_size = 0; + ORT_RETURN_IF_ERROR(ComputePadAndOutputShape( + input_shape[dim], + strides[dim], + kernel_shape[dim], + dilations[dim], + auto_pad_, + &pads->at(dim), + &pads->at(input_shape.NumDimensions() + dim), + &dim_size)); + if (dim_size <= 0) { + return Status(common::ONNXRUNTIME, common::INVALID_ARGUMENT, "Invalid input shape: " + input_shape.ToString()); + } + output_shape->push_back(dim_size); + } + return Status::OK(); + } + + private: + std::vector kernel_shape_; // must use ComputeKernelShape(...), instead of kernel_shape_ + AutoPadType auto_pad_; + int64_t group_; + bool kernel_shape_specified_; + std::vector strides_; + std::vector pads_; + std::vector dilations_; + std::string activation_; + float alpha_; +}; +} // namespace mkl_dnn +} // namespace onnxruntime diff --git a/onnxruntime/core/providers/mkldnn/subgraph/mkldnn_func_kernel.cc b/onnxruntime/core/providers/mkldnn/subgraph/mkldnn_func_kernel.cc new file mode 100644 index 0000000000000..6cf10e12a65e0 --- /dev/null +++ b/onnxruntime/core/providers/mkldnn/subgraph/mkldnn_func_kernel.cc @@ -0,0 +1,252 @@ +// Copyright(C) 2018 Intel Corporation +// Licensed under the MIT License +#ifdef _MSC_VER +#pragma warning(disable : 4505) //Unreferenced local function has been removed +#endif + +#include "mkldnn_func_kernel.h" +#include "core/common/exceptions.h" + #include "core/session/onnxruntime_cxx_api.h" +#include "core/providers/mkldnn/mkldnn_common.h" +#include "core/providers/mkldnn/subgraph/mkldnn_conv.h" +#include "core/providers/mkldnn/subgraph/mkldnn_batchnorm.h" +#include "core/providers/mkldnn/subgraph/mkldnn_activations.h" +#include "core/providers/mkldnn/subgraph/mkldnn_pool.h" +#include "core/providers/mkldnn/subgraph/mkldnn_sum.h" +#include "core/providers/mkldnn/subgraph/mkldnn_lrn.h" +#include "core/session/onnxruntime_cxx_api.h" + +namespace onnxruntime { +namespace mkl_dnn { + +namespace { +template +class SubgraphPrimitive : public PrimitiveBase { + public: + SubgraphPrimitive(Ort::CustomOpApi ort, + OrtKernelContext* context, + const SubgraphParams& params) + : cpu_engine_(GetEngine()), ort_(ort) { + context_.stream.reset(new mkldnn::stream(mkldnn::stream::kind::eager)); + + if (context_.net.size() == 0) { + CreateKernels(params); + Initialize(context); + } + } + + Status Compute(OrtKernelContext* context) { + Status status; + for (auto& kernel : context_.kernels) { + status = kernel->Bind(ort_, context); + if (!status.IsOK()) + break; + } + if (status.IsOK()) + context_.stream->submit(context_.net); + return status; + } + + ~SubgraphPrimitive() = default; + + private: + void CreateKernels(const SubgraphParams& params) { + for (auto& mklnode : params.subgraph->mklnodes) { + if (mklnode.name == "Conv") { + std::ostringstream os; + os << "Conv-" << mklnode.node_index << "-"; + std::shared_ptr> kernel; 
+ kernel.reset(new MklDnnConv(mklnode, params.provider, params.mkl_context, params.attributes, os.str())); + for (auto& index : mklnode.parent_nodes) { + kernel->parents_.push_back(context_.kernels[index]); + } + context_.kernels.push_back(kernel); + } else if (mklnode.name == "Conv-Relu") { + std::ostringstream os; + os << "Conv-" << mklnode.node_index << "-"; + std::shared_ptr> kernel; + kernel.reset(new MklDnnConv(mklnode, params.provider, params.mkl_context, params.attributes, os.str())); + kernel->fuse_relu_ = true; + for (auto& index : mklnode.parent_nodes) { + kernel->parents_.push_back(context_.kernels[index]); + } + context_.kernels.push_back(kernel); + } else if (mklnode.name == "Relu") { + std::ostringstream os; + os << "Relu-" << mklnode.node_index << "-"; + std::shared_ptr> kernel; + kernel.reset(new MklDnnRelu(mklnode, params.provider, params.mkl_context, params.attributes, os.str())); + for (auto& index : mklnode.parent_nodes) { + kernel->parents_.push_back(context_.kernels[index]); + } + context_.kernels.push_back(kernel); + } else if (mklnode.name == "BatchNormalization") { + std::ostringstream os; + os << "BatchNormalization-" << mklnode.node_index << "-"; + std::shared_ptr> kernel; + kernel.reset(new MklDnnBatchNorm(mklnode, params.provider, params.mkl_context, params.attributes, os.str())); + for (auto& index : mklnode.parent_nodes) { + kernel->parents_.push_back(context_.kernels[index]); + } + context_.kernels.push_back(kernel); + } else if (mklnode.name == "BatchNormalization-Relu") { + std::ostringstream os; + os << "BatchNormalization-" << mklnode.node_index << "-"; + std::shared_ptr> kernel; + kernel.reset(new MklDnnBatchNorm(mklnode, params.provider, params.mkl_context, params.attributes, os.str())); + kernel->fuse_relu_ = true; + for (auto& index : mklnode.parent_nodes) { + kernel->parents_.push_back(context_.kernels[index]); + } + context_.kernels.push_back(kernel); + } else if (mklnode.name == "MaxPool") { + std::ostringstream os; + os << "MaxPool-" << mklnode.node_index << "-"; + std::shared_ptr> kernel; + kernel.reset(new MklDnnPool(mklnode, params.provider, params.mkl_context, params.attributes, os.str())); + for (auto& index : mklnode.parent_nodes) { + kernel->parents_.push_back(context_.kernels[index]); + } + context_.kernels.push_back(kernel); + } else if (mklnode.name == "GlobalMaxPool") { + std::ostringstream os; + os << "GlobalMaxPool-" << mklnode.node_index << "-"; + std::shared_ptr> kernel; + kernel.reset(new MklDnnPool(mklnode, params.provider, params.mkl_context, params.attributes, os.str())); + for (auto& index : mklnode.parent_nodes) { + kernel->parents_.push_back(context_.kernels[index]); + } + context_.kernels.push_back(kernel); + } else if (mklnode.name == "AveragePool") { + std::ostringstream os; + os << "AveragePool-" << mklnode.node_index << "-"; + std::shared_ptr> kernel; + kernel.reset(new MklDnnPool(mklnode, params.provider, params.mkl_context, params.attributes, os.str())); + for (auto& index : mklnode.parent_nodes) { + kernel->parents_.push_back(context_.kernels[index]); + } + context_.kernels.push_back(kernel); + } else if (mklnode.name == "GlobalAveragePool") { + std::ostringstream os; + os << "GlobalAveragePool-" << mklnode.node_index << "-"; + std::shared_ptr> kernel; + kernel.reset(new MklDnnPool(mklnode, params.provider, params.mkl_context, params.attributes, os.str())); + for (auto& index : mklnode.parent_nodes) { + kernel->parents_.push_back(context_.kernels[index]); + } + context_.kernels.push_back(kernel); + } else if 
(mklnode.name == "Sum") { + std::ostringstream os; + os << "Sum-" << mklnode.node_index << "-"; + std::shared_ptr> kernel; + kernel.reset(new MklDnnSum(mklnode, params.provider, params.mkl_context, params.attributes, os.str())); + for (auto& index : mklnode.parent_nodes) { + kernel->parents_.push_back(context_.kernels[index]); + } + context_.kernels.push_back(kernel); + } else if (mklnode.name == "LRN") { + std::ostringstream os; + os << "LRN-" << mklnode.node_index << "-"; + std::shared_ptr> kernel; + kernel.reset(new MklDnnLrn(mklnode, params.provider, params.mkl_context, params.attributes, os.str())); + for (auto& index : mklnode.parent_nodes) { + kernel->parents_.push_back(context_.kernels[index]); + } + context_.kernels.push_back(kernel); + } + } + } + + private: + struct SubgraphContext { + std::unique_ptr stream; + std::vector net; + std::vector> kernels; + + SubgraphContext() : stream(nullptr) {} + }; + + Status Initialize(OrtKernelContext* context) { + // Propagate mkldnn block format + // dst format of current node to src format of next node + mkldnn::memory::format source_format = mkldnn::memory::format::any; // ONNXRuntime format + for (auto& kernel : context_.kernels) { + Status status = kernel->CreatePrimitives(ort_, context, cpu_engine_, context_.net, source_format); + if (status.IsOK()) + kernel->ReorderWeights(ort_, context, cpu_engine_); + else + return status; + } + return Status::OK(); + } + + SubgraphContext context_; + mkldnn::engine& cpu_engine_; + Ort::CustomOpApi ort_; +}; + +// Pool which allows for reuse of MKLDNN Conv primitives which are expensive to instantiate. +// To address thread safety, the primitives are stored in a map on thread local storage. +template +class SubgraphPrimitivePool : public PrimitivePool { + public: + static SubgraphPrimitive* Get(Ort::CustomOpApi ort, + OrtKernelContext* context, + const SubgraphParams& params) { + const OrtValue* input_tensor = ort.KernelContext_GetInput(context, 0); + auto tensor_info = ort.GetTensorTypeAndShape(input_tensor); + auto tensor_shape = ort.GetTensorShape(tensor_info); + // auto tensor_type = ort.GetTensorElementType(tensor_info); + ort.ReleaseTensorTypeAndShapeInfo(tensor_info); + + std::vector> input_shapes; + + auto xshape = tensor_shape.data(); + auto xdim = tensor_shape.size(); + + TensorShape x_shape(xshape, xdim); + mkldnn::memory::dims src_dims(x_shape.GetDims().begin(), x_shape.GetDims().end()); + std::string dims_str; + AddDimsToKey(dims_str, src_dims); + + SubgraphPrimitive* primitive = dynamic_cast*>( + SubgraphPrimitivePool::GetInstance().GetPrimitive(params.subgraph_key + dims_str)); + + if (primitive == nullptr) { + auto subgraph_primitive = std::make_unique>(ort, context, params); + primitive = subgraph_primitive.get(); + SubgraphPrimitivePool::GetInstance().SetPrimitive(params.subgraph_key + dims_str, std::move(subgraph_primitive)); + } + return primitive; + } + + private: + SubgraphPrimitivePool() = default; + ~SubgraphPrimitivePool() = default; + + static SubgraphPrimitivePool& GetInstance() { + static SubgraphPrimitivePool pool; + return pool; + } +}; +} // namespace + +template +Status MkldnnFuncKernel::Compute(const OrtCustomOpApi* api, OrtKernelContext* context) const { + Status status; + Ort::CustomOpApi ort{*api}; + + try { + SubgraphPrimitive* primitive = SubgraphPrimitivePool::Get(ort, context, params_); + status = primitive->Compute(context); + } catch (const mkldnn::error& e) { + return ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "Status: ", e.status, + ", message: ", 
e.message.c_str()); + } + return status; +} + +template class MkldnnFuncKernel; + +} // namespace mkl_dnn +} // namespace onnxruntime \ No newline at end of file diff --git a/onnxruntime/core/providers/mkldnn/subgraph/mkldnn_func_kernel.h b/onnxruntime/core/providers/mkldnn/subgraph/mkldnn_func_kernel.h new file mode 100644 index 0000000000000..de2b342ffea7c --- /dev/null +++ b/onnxruntime/core/providers/mkldnn/subgraph/mkldnn_func_kernel.h @@ -0,0 +1,52 @@ +// Copyright(C) 2018 Intel Corporation +// Licensed under the MIT License +#pragma once + +#include "core/graph/onnx_protobuf.h" +#include "core/providers/mkldnn/mkldnn_execution_provider.h" +#include "core/session/onnxruntime_c_api.h" +#include "core/framework/func_api.h" + +namespace onnxruntime { +namespace mkl_dnn { + +namespace { +struct SubgraphParams { + NodeAttributes attributes; + MKLDNNExecutionProvider* provider; + std::shared_ptr subgraph; + std::shared_ptr mkl_context; + std::string subgraph_id; + std::string subgraph_key; + + SubgraphParams() {} +}; +} // namespace + +template +class MkldnnFuncKernel { + public: + explicit MkldnnFuncKernel(const ComputeContext* context, + const NodeAttributes& attributes, + MKLDNNExecutionProvider* provider) { + params_.provider = provider; + params_.attributes = attributes; + params_.mkl_context.reset(new MKLContext(context->allocate_func, context->release_func, context->allocator_handle)); + + auto sub_it = attributes.find("subgraph_id"); + if (sub_it->second.type() == ONNX_NAMESPACE::AttributeProto_AttributeType::AttributeProto_AttributeType_STRING) { + params_.subgraph_id = sub_it->second.s(); + params_.subgraph = provider->GetMklDnnSubgraph(params_.subgraph_id); + std::ostringstream key_os; + key_os << params_.subgraph_id << "-" << params_.subgraph->mklnodes.back().name << "-" << params_.subgraph->mklnodes.back().output_name; + params_.subgraph_key = key_os.str(); + } + } + + Status Compute(const OrtCustomOpApi* api, OrtKernelContext* context) const; + + private: + SubgraphParams params_; +}; +} // namespace mkl_dnn +} // namespace onnxruntime \ No newline at end of file diff --git a/onnxruntime/core/providers/mkldnn/subgraph/mkldnn_kernel.cc b/onnxruntime/core/providers/mkldnn/subgraph/mkldnn_kernel.cc new file mode 100644 index 0000000000000..62b680b6f6b75 --- /dev/null +++ b/onnxruntime/core/providers/mkldnn/subgraph/mkldnn_kernel.cc @@ -0,0 +1,83 @@ +// Copyright(C) 2019 Intel Corporation +// Licensed under the MIT License + +#include "mkldnn_kernel.h" + +namespace onnxruntime { +namespace mkl_dnn { + +void MklDnnKernel::InitDstReorderOutput(mkldnn::engine& cpu_engine, + mkldnn::memory::data_type& data_type, + std::vector& net) { + // Allocate dst buffer if reorder is necessary + if (primitive_dst_format_ != ort_source_format_) { + // reorder to ONNXRuntime format + mkldnn::memory::dims dst_dims_mkl( + primitive_dst_shape_.GetDims().begin(), primitive_dst_shape_.GetDims().end()); + mkldnn::memory::desc dst_des = mkldnn::memory::desc(dst_dims_mkl, + data_type, ort_source_format_); + reorder_dst_mem_to_.reset(new mkldnn::memory({dst_des, cpu_engine}, nullptr)); + net.push_back(mkldnn::reorder(*primitive_dst_mem_, *reorder_dst_mem_to_)); + } +} +/* +void MklDnnKernel::AllocateMemoryAndReorderIfNeeded(ONNXRunTimeTensor* const output_tensors, const DType& dtype) { + // End of sub-graph. 
Allocate memory and get the output + auto& y_dims = primitive_dst_shape_.GetDims(); + AllocateOutputTensor(output_tensors, mklnode_ptr_->output_index, + &y_dims[0], static_cast(primitive_dst_shape_.GetDims().size()), + dtype); + if (primitive_dst_format_ != ort_source_format_) { + reorder_dst_mem_to_->set_data_handle(output_tensors[mklnode_ptr_->output_index].data); + } else { + primitive_dst_mem_->set_data_handle(output_tensors[mklnode_ptr_->output_index].data); + } +} + +void MklDnnKernel::AllocateOutputTensor(ONNXRunTimeTensor* const output_tensors, + int index, const int64_t* shape, size_t dim) { + output_tensors[index].dtype = dtype; + output_tensors[index].ndim = dim; + output_tensors[index].shape = new int64_t[dim]; + memcpy(output_tensors[index].shape, shape, sizeof(int64_t) * dim); + int64_t data_size = 1; + for (auto j = 0; j < output_tensors[index].ndim; j++) + data_size *= output_tensors[index].shape[j]; + output_tensors[index].data = (*(mkl_context_->allocate_func))(mkl_context_->allocator, sizeof(double) * data_size, 64); +} +*/ + +mkldnn::memory::format MklDnnKernel::GetSourceFormat(int dim_size) { + mkldnn::memory::format source_format = mkldnn::memory::format::any; + switch (dim_size) { + case 1: { + source_format = mkldnn::memory::format::x; + break; + } + case 2: { + source_format = mkldnn::memory::format::nc; + break; + } + case 3: { + source_format = mkldnn::memory::format::ntc; + break; + } + case 4: { + source_format = mkldnn::memory::format::nchw; + break; + } + case 5: { + source_format = mkldnn::memory::format::ncdhw; + break; + } + default: { + source_format = mkldnn::memory::format::any; + break; + } + } + + return source_format; +} + +} // namespace mkl_dnn +} // namespace onnxruntime \ No newline at end of file diff --git a/onnxruntime/core/providers/mkldnn/subgraph/mkldnn_kernel.h b/onnxruntime/core/providers/mkldnn/subgraph/mkldnn_kernel.h new file mode 100644 index 0000000000000..9752352e83fd2 --- /dev/null +++ b/onnxruntime/core/providers/mkldnn/subgraph/mkldnn_kernel.h @@ -0,0 +1,118 @@ +// Copyright(C) 2019 Intel Corporation +// Licensed under the MIT License + +#pragma once +#ifdef _WIN32 +#pragma warning(disable : 4244) +#endif + +#include "mkldnn.hpp" +#include "core/common/cpuid_info.h" +#include "core/session/onnxruntime_cxx_api.h" +#include "core/providers/mkldnn/subgraph/subgraph.h" +#include "core/providers/mkldnn/mkldnn_execution_provider.h" + +namespace onnxruntime { +namespace mkl_dnn { + +class MklDnnKernel { + public: + explicit MklDnnKernel(MklDnnNode& node, + MKLDNNExecutionProvider* provider, + std::shared_ptr mkl_context) { + mkl_context_ = mkl_context; + mklnode_ptr_ = std::make_shared(node); + provider_ = provider; + alloc_ = provider_->GetAllocator(0, OrtMemTypeDefault); + } + virtual ~MklDnnKernel(){}; + + virtual Status CreatePrimitives(Ort::CustomOpApi ort, + OrtKernelContext* context, + mkldnn::engine& cpu_engine, + std::vector& net, + mkldnn::memory::format& src_fmt) = 0; + + virtual void ReorderWeights(Ort::CustomOpApi ort, OrtKernelContext* context, mkldnn::engine& cpu_engine) { + ORT_UNUSED_PARAMETER(ort); + ORT_UNUSED_PARAMETER(cpu_engine); + } + + virtual Status Bind(Ort::CustomOpApi ort, OrtKernelContext* context) = 0; + + protected: + virtual void ReadAttributes(const NodeAttributes& attributes, + const std::string attributes_prefix = "") { + ORT_UNUSED_PARAMETER(attributes); + ORT_UNUSED_PARAMETER(attributes_prefix); + } + + Status GetIntsAttr(ONNX_NAMESPACE::AttributeProto& proto, std::vector& values) { + 
ORT_RETURN_IF_NOT(proto.type() == ::ONNX_NAMESPACE::AttributeProto_AttributeType::AttributeProto_AttributeType_INTS); + values.reserve(proto.ints_size()); + for (int i = 0; i < proto.ints_size(); i++) { + values.push_back(proto.ints(i)); + } + return Status::OK(); + } + + Status GetIntAttr(ONNX_NAMESPACE::AttributeProto& proto, int64_t& value) { + ORT_RETURN_IF_NOT(proto.type() == ::ONNX_NAMESPACE::AttributeProto_AttributeType::AttributeProto_AttributeType_INT); + value = proto.i(); + return Status::OK(); + } + + Status GetFloatAttr(ONNX_NAMESPACE::AttributeProto& proto, float& value) { + ORT_RETURN_IF_NOT(proto.type() == ::ONNX_NAMESPACE::AttributeProto_AttributeType::AttributeProto_AttributeType_FLOAT); + value = proto.f(); + return Status::OK(); + } + Status GetStringAttr(ONNX_NAMESPACE::AttributeProto& proto, std::string& value) { + ORT_RETURN_IF_NOT(proto.type() == ::ONNX_NAMESPACE::AttributeProto_AttributeType::AttributeProto_AttributeType_STRING); + value = proto.s(); + return Status::OK(); + } + + void InitDstReorderOutput(mkldnn::engine& cpu_engine, + mkldnn::memory::data_type& data_type, + std::vector& net); + + //void AllocateMemoryAndReorderIfNeeded(const OrtCustomOpApi* api); + + //void AllocateOutputTensor(const OrtCustomOpApi* api, int index, const int64_t* shape, size_t dim); + + mkldnn::memory::format GetSourceFormat(int dim_size); + + public: + std::vector> parents_; + bool fuse_relu_ = false; + bool fuse_sum_ = false; + std::shared_ptr primitive_dst_mem_; + std::unique_ptr primitive_dst_md_; + TensorShape primitive_dst_shape_; + mkldnn::memory::format primitive_dst_format_ = mkldnn::memory::format::any; + + protected: + // ONNX Runtime format + mkldnn::memory::format ort_source_format_ = mkldnn::memory::format::any; + // input format. + // It can be ORT format (nchw) or blocked memory format from parent nce + mkldnn::memory::format src_format_ = mkldnn::memory::format::any; + // Pointer to MklNode of subgraph IR + std::shared_ptr mklnode_ptr_; + // input format expected by primitive object + mkldnn::memory::format primitive_src_format_ = mkldnn::memory::format::any; + + // memory used for reorders + std::unique_ptr reorder_dst_mem_to_; + + protected: + Status primitive_created_; + std::shared_ptr mkl_context_; + + AllocatorPtr alloc_; + MKLDNNExecutionProvider* provider_; +}; + +} // namespace mkl_dnn +} // namespace onnxruntime diff --git a/onnxruntime/core/providers/mkldnn/subgraph/mkldnn_lrn.h b/onnxruntime/core/providers/mkldnn/subgraph/mkldnn_lrn.h new file mode 100644 index 0000000000000..bd54f51ddaabf --- /dev/null +++ b/onnxruntime/core/providers/mkldnn/subgraph/mkldnn_lrn.h @@ -0,0 +1,182 @@ +// Copyright (c) Microsoft Corporation. All rights reserved. +// Licensed under the MIT License. 
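+// MklDnnLrn (below) wraps mkldnn::lrn_forward using lrn_across_channels with prop_kind::forward_scoring.
+// It reads the size/alpha/beta/bias attributes, takes its input either from the ORT input tensor or from
+// the parent kernel's blocked output, and reorders the result back to the ORT source format when needed.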
+ +#pragma once +#include "core/util/math.h" +#include "core/util/math_cpuonly.h" +#include "core/framework/op_kernel.h" +#include "core/providers/mkldnn/mkldnn_fwd.h" +#include "core/providers/mkldnn/mkldnn_execution_provider.h" +#include "core/providers/mkldnn/subgraph/mkldnn_kernel.h" + +namespace onnxruntime { +namespace mkl_dnn { + +template +class MklDnnLrn : public MklDnnKernel { + public: + MklDnnLrn(MklDnnNode& node, + MKLDNNExecutionProvider* provider, + std ::shared_ptr mkl_context, + const NodeAttributes& attributes, + const std::string attributes_prefix = "") : MklDnnKernel(node, provider, mkl_context) { + ReadAttributes(attributes, attributes_prefix); + } + + Status CreatePrimitives(Ort::CustomOpApi ort, + OrtKernelContext* context, + mkldnn::engine& cpu_engine, + std::vector& net, + mkldnn::memory::format& source_format) { + int input_index = mklnode_ptr_->input_start_index < 0 ? 0 : mklnode_ptr_->input_start_index; + + TensorShape x_shape; + if (mklnode_ptr_->parent_nodes.size() == 0) { + const OrtValue* input_tensor = ort.KernelContext_GetInput(context, input_index); + auto tensor_info = ort.GetTensorTypeAndShape(input_tensor); + auto tensor_shape = ort.GetTensorShape(tensor_info); + ort.ReleaseTensorTypeAndShapeInfo(tensor_info); + auto xshape = tensor_shape.data(); + auto xdim = tensor_shape.size(); + + ort_source_format_ = GetSourceFormat(static_cast(xdim)); + source_format = ort_source_format_; + x_shape = TensorShape(xshape, xdim); + + mkldnn::memory::dims src_dims_mkl( + x_shape.GetDims().begin(), x_shape.GetDims().end()); + + src_md_.reset(new mkldnn::memory::desc( + {src_dims_mkl}, MklDnnType(), source_format)); + src_mem_.reset( + new mkldnn::memory({*src_md_, cpu_engine}, nullptr)); + } else { + src_md_.reset( + new mkldnn::memory::desc(parents_[0].get()->primitive_dst_mem_.get()->get_primitive_desc().desc())); + src_mem_ = parents_[0].get()->primitive_dst_mem_; + x_shape = parents_[0].get()->primitive_dst_shape_; + ort_source_format_ = source_format; + } + + primitive_dst_shape_ = TensorShape(x_shape); + + mkldnn::algorithm algo = mkldnn::algorithm::lrn_across_channels; + fwd_desc_.reset(new mkldnn::lrn_forward::desc( + mkldnn::prop_kind::forward_scoring, algo, *src_md_, + size_, alpha_, beta_, bias_)); + + fwd_primitive_desc_.reset(new mkldnn::lrn_forward::primitive_desc( + *fwd_desc_, cpu_engine)); + + primitive_src_format_ = static_cast( + fwd_primitive_desc_.get()->src_primitive_desc().desc().data.format); + primitive_dst_format_ = static_cast( + fwd_primitive_desc_.get()->dst_primitive_desc().desc().data.format); + + if (mklnode_ptr_->output_index >= 0) { + // last node of sub-graph. need to allocate memory for output_tensor + if (primitive_dst_format_ != ort_source_format_) { + // reorder neded. Use primitive output as input to reorder and + // allocate buffer for reorder output, final output of this subgraph + primitive_dst_mem_.reset( + new mkldnn::memory(fwd_primitive_desc_.get()->dst_primitive_desc())); + } else { + // Last node but re-order not needed. Allocate buffer to output of this node + primitive_dst_mem_.reset( + new mkldnn::memory(fwd_primitive_desc_.get()->dst_primitive_desc(), nullptr)); + } + } else { + // Intermediate node. Use mkldnn kernel internal memory for output and + // use this as input to next node. 
+ primitive_dst_mem_.reset( + new mkldnn::memory(fwd_primitive_desc_.get()->dst_primitive_desc())); + } + + lrn_fwd_.reset( + new mkldnn::lrn_forward(*fwd_primitive_desc_, *src_mem_, *primitive_dst_mem_)); + net.push_back(*lrn_fwd_); + + if (mklnode_ptr_->output_index >= 0) { + // one of the end nodes. Allocate output buffer memory and + // reorder is necessary + mkldnn::memory::data_type t = MklDnnType(); + InitDstReorderOutput(cpu_engine, t, net); + } + + return Status::OK(); + } + + Status Bind(Ort::CustomOpApi ort, OrtKernelContext* context) override { + int input_index = mklnode_ptr_->input_start_index < 0 ? 0 : mklnode_ptr_->input_start_index; + + if (mklnode_ptr_->parent_nodes.size() == 0) { + // Sub-graph's first node. Read input from input buffer + const OrtValue* input_tensor = ort.KernelContext_GetInput(context, input_index); + const T* src_data = const_cast(ort.GetTensorData(input_tensor)); + src_mem_->set_data_handle(static_cast(const_cast(src_data))); + } + + if (mklnode_ptr_->output_index >= 0) { + auto& y_dims = primitive_dst_shape_.GetDims(); + // Allocate memory for output bufffer + OrtValue* output = ort.KernelContext_GetOutput(context, mklnode_ptr_->output_index, &y_dims[0], static_cast(primitive_dst_shape_.GetDims().size())); + T* dst_data = ort.GetTensorMutableData(output); + + if (primitive_dst_format_ != ort_source_format_) { + reorder_dst_mem_to_->set_data_handle(dst_data); + } else { + primitive_dst_mem_->set_data_handle(dst_data); + } + } + + return Status::OK(); + } + + private: + void ReadAttributes(const NodeAttributes& attributes, + const std::string attributes_prefix = "") override { + auto attr = attributes.find(attributes_prefix + "size"); + if (attr != attributes.end() && + attr->second.type() == ::ONNX_NAMESPACE::AttributeProto_AttributeType::AttributeProto_AttributeType_INT) { + size_ = attr->second.i(); + } + ORT_ENFORCE(size_ > 0); + ORT_ENFORCE(size_ % 2 == 1); + + attr = attributes.find(attributes_prefix + "alpha"); + if (attr != attributes.end() && + attr->second.type() == ::ONNX_NAMESPACE::AttributeProto_AttributeType::AttributeProto_AttributeType_FLOAT) { + alpha_ = attr->second.f(); + } + + attr = attributes.find(attributes_prefix + "beta"); + if (attr != attributes.end() && + attr->second.type() == ::ONNX_NAMESPACE::AttributeProto_AttributeType::AttributeProto_AttributeType_FLOAT) { + beta_ = attr->second.f(); + } + + bias_ = 1.0f; + attr = attributes.find(attributes_prefix + "bias"); + if (attr != attributes.end() && + attr->second.type() == ::ONNX_NAMESPACE::AttributeProto_AttributeType::AttributeProto_AttributeType_FLOAT) { + bias_ = attr->second.f(); + } + } + + private: + float alpha_ = 0; + float beta_ = 0; + float bias_ = 0; + int size_ = 0; + + private: + std::shared_ptr src_mem_; + + std::unique_ptr fwd_desc_; + std::unique_ptr fwd_primitive_desc_; + std::unique_ptr lrn_fwd_; + + std::unique_ptr src_md_; +}; +} // namespace mkl_dnn +} // namespace onnxruntime diff --git a/onnxruntime/core/providers/mkldnn/subgraph/mkldnn_pool.h b/onnxruntime/core/providers/mkldnn/subgraph/mkldnn_pool.h new file mode 100644 index 0000000000000..0671188d02365 --- /dev/null +++ b/onnxruntime/core/providers/mkldnn/subgraph/mkldnn_pool.h @@ -0,0 +1,426 @@ +// Copyright (c) Microsoft Corporation. All rights reserved. +// Licensed under the MIT License. 
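+// MklDnnPool (below) covers MaxPool/AveragePool and their Global variants via mkldnn::pooling_forward.
+// Inputs with rank <= 3 are rejected so ORT falls back to the default CPU Pool kernel, and when the CPU
+// supports AVX512 (or AVX2 with a suitable channel count) the source may be reordered into a blocked
+// nChw16c/nChw8c layout for better performance.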
+ +#pragma once +#include "core/providers/mkldnn/mkldnn_fwd.h" +#include "core/providers/cpu/nn/autopad_type.h" +#include "core/providers/mkldnn/mkldnn_execution_provider.h" +#include "core/providers/mkldnn/subgraph/mkldnn_kernel.h" +#include "core/util/math.h" + +namespace onnxruntime { +namespace mkl_dnn { +template +class MklDnnPool : public MklDnnKernel { + public: + MklDnnPool(MklDnnNode& node, + MKLDNNExecutionProvider* provider, + std ::shared_ptr mkl_context, + const NodeAttributes& attributes, + const std::string attributes_prefix = "") : MklDnnKernel(node, provider, mkl_context) { + op_name_ = node.name; + ReadAttributes(attributes, attributes_prefix); + } + + Status CreatePrimitives(Ort::CustomOpApi ort, + OrtKernelContext* context, + mkldnn::engine& cpu_engine, std::vector& net, + mkldnn::memory::format& source_format) override { + int input_index = mklnode_ptr_->input_start_index < 0 ? 0 : mklnode_ptr_->input_start_index; + + if (mklnode_ptr_->parent_nodes.size() == 0) { + const OrtValue* input_tensor = ort.KernelContext_GetInput(context, input_index); + auto tensor_info = ort.GetTensorTypeAndShape(input_tensor); + auto tensor_shape = ort.GetTensorShape(tensor_info); + ort.ReleaseTensorTypeAndShapeInfo(tensor_info); + auto xshape = tensor_shape.data(); + auto xdim = tensor_shape.size(); + + mkldnn::memory::dims dims(xdim); + ort_source_format_ = GetSourceFormat(static_cast(xdim)); + source_format = ort_source_format_; + src_format_ = ort_source_format_; + x_shape_ = TensorShape(xshape, xdim); + + mkldnn::memory::dims src_dims_mkl(x_shape_.GetDims().begin(), x_shape_.GetDims().end()); + + // reorder for better performance + mkldnn::memory::format fmt = GetAVXFormat(src_dims_mkl); + src_md_.reset(new mkldnn::memory::desc( + {src_dims_mkl}, MklDnnType(), fmt)); + } else { + // get the output of previous node (mkldnn block propagation). + // TODO Sourcenode will set src of this node. 
+ x_shape_ = parents_[0].get()->primitive_dst_shape_; + ort_source_format_ = source_format; + src_format_ = parents_[0].get()->primitive_dst_format_; + mkldnn::memory::dims src_dims_mkl(x_shape_.GetDims().begin(), x_shape_.GetDims().end()); + + if (src_format_ == ort_source_format_) { + // reorder for better performance + mkldnn::memory::format fmt = GetAVXFormat(src_dims_mkl); + src_md_.reset(new mkldnn::memory::desc( + {src_dims_mkl}, MklDnnType(), fmt)); + } else { + src_md_.reset(new mkldnn::memory::desc( + parents_[0].get()->primitive_dst_mem_.get()->get_primitive_desc().desc())); + } + } + + const auto& x_dims = x_shape_.GetDims(); + std::vector y_dims = SetOutputSize(x_shape_, x_shape_[1], &pads_); + primitive_dst_shape_ = TensorShape(y_dims); + + if (x_shape_.NumDimensions() <= 3) { + return ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "Please call default CPU kernel."); + } + + if (global_pooling_) { + kernel_shape_.assign(x_dims.begin() + 2, x_dims.end()); + pads_.assign(kernel_shape_.size() * 2, 0); + strides_.assign(kernel_shape_.size(), 1); + } + + size_t num_outputs = 1; //OpKernel::Node().OutputDefs().size(); TODO + if (num_outputs == 2) { + ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "can not call cpu default op"); + } + + mkldnn::memory::dims dst_dims_mkl(y_dims.begin(), y_dims.end()); + mkldnn::memory::dims kernel_mkl(kernel_shape_.begin(), kernel_shape_.end()); + mkldnn::memory::dims strides_mkl(strides_.begin(), strides_.end()); + mkldnn::memory::dims padding_left_mkl(pads_.begin(), pads_.begin() + (pads_.size() / 2)); + mkldnn::memory::dims padding_right_mkl(pads_.begin() + (pads_.size() / 2), pads_.end()); + + primitive_dst_md_.reset(new mkldnn::memory::desc( + {dst_dims_mkl}, MklDnnType(), mkldnn::memory::format::any)); + + mkldnn::algorithm algo = mkldnn::algorithm::pooling_max; + if (op_name_ == "AveragePool" || op_name_ == "GlobalAveragePool") { + algo = mkldnn::algorithm::pooling_avg_exclude_padding; + if (count_include_pad_) { + algo = mkldnn::algorithm::pooling_avg_include_padding; + } + } + fwd_desc_.reset(new mkldnn::pooling_forward::desc( + mkldnn::prop_kind::forward_inference, algo, + *src_md_, *primitive_dst_md_, + strides_mkl, kernel_mkl, + padding_left_mkl, padding_right_mkl, + mkldnn::padding_kind::zero)); + + fwd_primitive_desc_.reset(new mkldnn::pooling_forward::primitive_desc( + *fwd_desc_, cpu_engine)); + + if (mklnode_ptr_->parent_nodes.size() == 0) { + // Sub-graph's first node. Read input from input buffer + src_mem_.reset(new mkldnn::memory( + fwd_primitive_desc_.get()->src_primitive_desc(), nullptr)); + } else { + // Sub-graph's inner node. 
set input to parent's output + src_mem_ = parents_[0].get()->primitive_dst_mem_; + } + + primitive_src_format_ = static_cast( + fwd_primitive_desc_.get()->src_primitive_desc().desc().data.format); + + primitive_dst_format_ = static_cast( + fwd_primitive_desc_.get()->dst_primitive_desc().desc().data.format); + + src_size_ = fwd_primitive_desc_.get()->src_primitive_desc().get_size(); + dst_size_ = fwd_primitive_desc_.get()->dst_primitive_desc().get_size(); + + // reorder source memory for best performance (AVX512); + if (primitive_src_format_ != src_format_) { + mkldnn::memory::dims src_dims_mkl(x_shape_.GetDims().begin(), x_shape_.GetDims().end()); + auto src_md = mkldnn::memory::desc(src_dims_mkl, MklDnnType(), src_format_); + auto pd = mkldnn::memory::primitive_desc(src_md, cpu_engine); + + if (mklnode_ptr_->parent_nodes.size() == 0) + src_mem_from_.reset(new mkldnn::memory(pd, nullptr)); + else + src_mem_from_ = parents_[0].get()->primitive_dst_mem_; + + src_mem_.reset(new mkldnn::memory(fwd_primitive_desc_->src_primitive_desc(), nullptr)); + net.push_back(mkldnn::reorder(*src_mem_from_, *src_mem_)); + } else { + if (mklnode_ptr_->parent_nodes.size() == 0) { + src_mem_.reset(new mkldnn::memory(fwd_primitive_desc_->src_primitive_desc(), nullptr)); + } else { + src_mem_ = parents_[0].get()->primitive_dst_mem_; + } + } + + if (mklnode_ptr_->output_index >= 0) { + // last node of sub-graph. need to allocate memory for output_tensor + if (primitive_dst_format_ != ort_source_format_) { + // reorder neded. Use primitive output as input to reorder and + // allocate buffer for reorder output, final output of this subgraph + primitive_dst_mem_.reset( + new mkldnn::memory(fwd_primitive_desc_.get()->dst_primitive_desc())); + } else { + // Last node but re-order not needed. Allocate buffer to output of this node + primitive_dst_mem_.reset( + new mkldnn::memory(fwd_primitive_desc_.get()->dst_primitive_desc(), nullptr)); + } + } else { + // Intermediate node. Use mkldnn kernel internal memory for output and + // use this as input to next node. + primitive_dst_mem_.reset( + new mkldnn::memory(fwd_primitive_desc_.get()->dst_primitive_desc())); + } + pool_fwd_.reset( + new mkldnn::pooling_forward(*fwd_primitive_desc_, *src_mem_, *primitive_dst_mem_)); + + net.push_back(*pool_fwd_); + + if (mklnode_ptr_->output_index >= 0) { + // one of the end nodes. Allocate output buffer memory and + // reorder is necessary + mkldnn::memory::data_type t = MklDnnType(); + InitDstReorderOutput(cpu_engine, t, net); + } + return Status::OK(); + } + + Status Bind(Ort::CustomOpApi ort, OrtKernelContext* context) override { + int input_index = mklnode_ptr_->input_start_index < 0 ? 0 : mklnode_ptr_->input_start_index; + + if (x_shape_.NumDimensions() <= 3) { + if (mklnode_ptr_->parent_nodes.size() == 0) { + // abort as MKLDNN cannot execute this. but + // ORT try to delete output_tensor buffer data. allocate memory so that it can delete + // fix for test_averagepool_1d_default node test + //auto xshape = input_tensors[input_index].shape; + //auto xdim = input_tensors[input_index].ndim; + //AllocateOutputTensor(output_tensors, mklnode_ptr_->output_index, xshape, xdim, input_tensors[0].dtype); + } + std::cout << "MKLDNN cannot compute shape with dim less than three." 
<< std::endl; + return ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "Please call default CPU kernel."); + } + + if (primitive_src_format_ != src_format_) { + if (mklnode_ptr_->parent_nodes.size() == 0) { + const OrtValue* input_tensor = ort.KernelContext_GetInput(context, input_index); + const T* src_data = const_cast(ort.GetTensorData(input_tensor)); + src_mem_from_->set_data_handle(static_cast(const_cast(src_data))); + } else { + src_mem_from_ = parents_[0].get()->primitive_dst_mem_; + } + + auto src_size = fwd_primitive_desc_.get()->src_primitive_desc().get_size(); + src_reorder_buffer_ = IAllocator::MakeUniquePtr(alloc_, src_size); + src_mem_->set_data_handle(src_reorder_buffer_.get()); + } else { + if (mklnode_ptr_->parent_nodes.size() == 0) { + const OrtValue* input_tensor = ort.KernelContext_GetInput(context, input_index); + const T* src_data = const_cast(ort.GetTensorData(input_tensor)); + src_mem_->set_data_handle(static_cast(const_cast(src_data))); + } else { + src_mem_ = parents_[0].get()->primitive_dst_mem_; + } + } + + if (mklnode_ptr_->output_index >= 0) { + // Last node of sub-graph. Allocate memory for output_buffer data + // Reorder if needed + auto& y_dims = primitive_dst_shape_.GetDims(); + // Allocate memory for output bufffer + OrtValue* output = ort.KernelContext_GetOutput(context, mklnode_ptr_->output_index, &y_dims[0], static_cast(primitive_dst_shape_.GetDims().size())); + T* dst_data = ort.GetTensorMutableData(output); + + if (primitive_dst_format_ != ort_source_format_) { + reorder_dst_mem_to_->set_data_handle(dst_data); + } else { + primitive_dst_mem_->set_data_handle(dst_data); + } + } + return Status::OK(); + } + +private: + void ReadAttributes(const NodeAttributes& attributes, + const std::string attributes_prefix = "") override { + global_pooling_ = (op_name_ == "GlobalAveragePool" || op_name_ == "GlobalMaxPool" || op_name_ == "GlobalLpPool"); + global_pooling_ = (op_name_ == "GlobalAveragePool" || op_name_ == "GlobalMaxPool" || op_name_ == "GlobalLpPool"); + + if (!global_pooling_) { + bool attr_read = false; + auto attr = attributes.find(attributes_prefix + "kernel_shape"); + if (attr != attributes.end()) { + ONNX_NAMESPACE::AttributeProto proto = attr->second; + GetIntsAttr(proto, kernel_shape_); + attr_read = true; + } + ORT_ENFORCE(attr_read, "No kernel shape is set."); + + std::string auto_padding; + attr = attributes.find(attributes_prefix + "auto_pad"); + if (attr != attributes.end() && + attr->second.type() == ::ONNX_NAMESPACE::AttributeProto_AttributeType::AttributeProto_AttributeType_STRING) { + auto_padding = attr->second.s(); + } + auto_pad_ = StringToAutoPadType(auto_padding); + + attr_read = false; + attr = attributes.find(attributes_prefix + "pads"); + if (attr != attributes.end()) { + ONNX_NAMESPACE::AttributeProto proto = attr->second; + if (GetIntsAttr(proto, pads_) == Status::OK()) + attr_read = true; + } + if (!attr_read) { + pads_.resize(kernel_shape_.size() * 2, 0); + } + + attr_read = false; + attr = attributes.find(attributes_prefix + "strides"); + if (attr != attributes.end()) { + ONNX_NAMESPACE::AttributeProto proto = attr->second; + if (GetIntsAttr(proto, strides_) == Status::OK()) + attr_read = true; + } + if (!attr_read || strides_.empty()) { + strides_.resize(kernel_shape_.size(), 1); + } + + attr = attributes.find(attributes_prefix + "count_include_pad"); + int64_t temp = 0; + if (attr != attributes.end()) { + ONNX_NAMESPACE::AttributeProto proto = attr->second; + GetIntAttr(proto, temp); + } + count_include_pad_ = (temp != 0); + + 
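// storage_order (MaxPool-8) is not read from the node attributes here; row-major (0) is assumed. +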
storage_order_ = 0; + for (size_t dim = 0; dim < kernel_shape_.size(); ++dim) { + ORT_ENFORCE(kernel_shape_[dim] > 0); + ORT_ENFORCE(pads_[dim] < kernel_shape_[dim] && pads_[dim + kernel_shape_.size()] < kernel_shape_[dim], + "Pad should be smaller than kernel."); + } + + ORT_ENFORCE(strides_.size() == kernel_shape_.size()); + } + } + + private: + size_t src_size_; + size_t dst_size_; + + std::shared_ptr src_mem_; + + std::unique_ptr fwd_desc_; + std::unique_ptr src_md_; + std::unique_ptr fwd_primitive_desc_; + std::unique_ptr pool_fwd_; + + std::shared_ptr src_mem_from_; + std::unique_ptr src_mem_to_; + + std::unique_ptr dst_mem_from_; + std::unique_ptr dst_mem_to_; + + private: + mkldnn::memory::format GetAVXFormat(const mkldnn::memory::dims& src_dims_mkl) { + bool is_2D = src_dims_mkl.size() == 4 ? true : false; + mkldnn::memory::format fmt = mkldnn::memory::format::any; + if (CPUIDInfo::GetCPUIDInfo().HasAVX512f()) { + fmt = is_2D ? mkldnn::memory::format::nChw16c : mkldnn::memory::format::nCdhw16c; + } else if (CPUIDInfo::GetCPUIDInfo().HasAVX2() && (src_dims_mkl[1] % 8 == 0)) { + fmt = is_2D ? mkldnn::memory::format::nChw8c : mkldnn::memory::format::ncdhw; + } else { + fmt = is_2D ? mkldnn::memory::format::nchw : mkldnn::memory::format::ncdhw; + } + return fmt; + } + + std::vector SetOutputSize(const TensorShape& input_shape, + int64_t output_channel, + std::vector* pads) const { + ORT_ENFORCE(input_shape.Size() > 0); + std::vector output_dims; + int64_t N = input_shape[0]; + InferOutputSize(input_shape.GetDims(), &output_dims, pads); + + output_dims.insert(output_dims.begin(), {N, output_channel}); + + return output_dims; + } + + inline void InferOutputSize(const std::vector& input_dims, + std::vector* output_dims, + std::vector* pads) const { + ORT_ENFORCE(input_dims.size() >= 2); + if (global_pooling_) { + output_dims->assign(input_dims.size() - 2, 1); + } else { + for (size_t dim = 0; dim < input_dims.size() - 2; ++dim) { + int64_t dim_size = 0; + ComputeSizeAndPad(static_cast(input_dims[dim + 2]), + strides_[dim], + kernel_shape_[dim], + &pads->at(dim), + &pads->at(input_dims.size() + dim - 2), + &dim_size); + output_dims->push_back(dim_size); + } + } + } + + inline void ComputeSizeAndPad(const int64_t in_size, + const int64_t stride, + const int64_t kernel, + int64_t* pad_head, + int64_t* pad_tail, + int64_t* out_size) const { + if (auto_pad_ != AutoPadType::NOTSET) { + switch (auto_pad_) { + case AutoPadType::VALID: + *pad_head = 0; + *pad_tail = 0; + *out_size = (in_size - kernel) / stride + 1; + break; + case AutoPadType::SAME_LOWER: { + int64_t legacy_target_size = (in_size + stride - 1) / stride; + int64_t pad_needed = (legacy_target_size - 1) * stride + kernel - in_size; + *pad_head = (pad_needed + 1) / 2; + *pad_tail = pad_needed - *pad_head; + *out_size = (in_size + pad_needed - kernel) / stride + 1; + break; + } + case AutoPadType::SAME_UPPER: { + int64_t legacy_target_size = (in_size + stride - 1) / stride; + int64_t pad_needed = (legacy_target_size - 1) * stride + kernel - in_size; + *pad_head = pad_needed / 2; + *pad_tail = pad_needed - *pad_head; + *out_size = (in_size + pad_needed - kernel) / stride + 1; + break; + } + default: { + ORT_THROW("Unsupported AutoPad Type."); + } + } + } else { + *out_size = static_cast( + static_cast(in_size + *pad_head + *pad_tail - kernel) / stride + 1); + } + } + + private: + IAllocatorUniquePtr src_reorder_buffer_; + IAllocatorUniquePtr dst_reorder_buffer_; + + private: + std::string op_name_; + bool global_pooling_{}; + bool 
count_include_pad_{}; + int64_t storage_order_{0}; // MaxPool_8 only. 0 is row major, and 1 is column major. Default is 0. + std::vector kernel_shape_; + std::vector pads_; + std::vector strides_; + AutoPadType auto_pad_; + + TensorShape x_shape_; +}; +} // namespace mkl_dnn +} // namespace onnxruntime diff --git a/onnxruntime/core/providers/mkldnn/subgraph/mkldnn_sum.h b/onnxruntime/core/providers/mkldnn/subgraph/mkldnn_sum.h new file mode 100644 index 0000000000000..6f43d50d9896f --- /dev/null +++ b/onnxruntime/core/providers/mkldnn/subgraph/mkldnn_sum.h @@ -0,0 +1,170 @@ +// Copyright(C) 2019 Intel Corporation +// Licensed under the MIT License + +#pragma once +#include "core/framework/op_kernel.h" +#include "core/providers/mkldnn/mkldnn_fwd.h" +#include "core/providers/mkldnn/mkldnn_common.h" +#include "core/providers/mkldnn/subgraph/mkldnn_kernel.h" +#include "core/util/math.h" + +namespace onnxruntime { +namespace mkl_dnn { + +template +class MklDnnSum : public MklDnnKernel { + public: + explicit MklDnnSum(MklDnnNode& node, + MKLDNNExecutionProvider* provider, + std ::shared_ptr mkl_context, + const NodeAttributes& attributes, + const std::string attributes_prefix = "") : MklDnnKernel(node, provider, mkl_context) { + ReadAttributes(attributes, attributes_prefix); + } + + Status CreatePrimitives(Ort::CustomOpApi ort, + OrtKernelContext* context, + mkldnn::engine& cpu_engine, + std::vector& net, + mkldnn::memory::format& source_format) override { + int num_inputs = mklnode_ptr_->num_inputs; + int input_index = mklnode_ptr_->input_start_index < 0 ? 0 : mklnode_ptr_->input_start_index; + + std::vector coeff; + TensorShape x_shape; + if (mklnode_ptr_->parent_nodes.size() == 0) { + const OrtValue* input_tensor = ort.KernelContext_GetInput(context, input_index); + auto tensor_info = ort.GetTensorTypeAndShape(input_tensor); + auto tensor_shape = ort.GetTensorShape(tensor_info); + ort.ReleaseTensorTypeAndShapeInfo(tensor_info); + auto xshape = tensor_shape.data(); + auto xdim = tensor_shape.size(); + + ort_source_format_ = GetSourceFormat(static_cast(xdim)); + source_format = ort_source_format_; + src_format_ = ort_source_format_; + x_shape = TensorShape(xshape, xdim); + } else { + x_shape = parents_[0].get()->primitive_dst_shape_; + src_format_ = parents_[0].get()->primitive_dst_format_; + } + primitive_dst_shape_ = TensorShape(x_shape); + + mkldnn::memory::dims dst_dims_mkl( + primitive_dst_shape_.GetDims().begin(), primitive_dst_shape_.GetDims().end()); + + for (int i = 0; i < num_inputs; i++) { + TensorShape x_shape1; + + if (mklnode_ptr_->parent_nodes.size() == 0) { + const OrtValue* input_tensor = ort.KernelContext_GetInput(context, input_index); + auto tensor_info = ort.GetTensorTypeAndShape(input_tensor); + auto tensor_shape = ort.GetTensorShape(tensor_info); + ort.ReleaseTensorTypeAndShapeInfo(tensor_info); + auto xshape = tensor_shape.data(); + auto xdim = tensor_shape.size(); + mkldnn::memory::dims dims(xdim); + + ort_source_format_ = GetSourceFormat(static_cast(xdim)); + x_shape1 = TensorShape(xshape, xdim); + mkldnn::memory::dims src_dims_mkl( + x_shape1.GetDims().begin(), x_shape1.GetDims().end()); + + src_md_.reset(new mkldnn::memory::desc( + {src_dims_mkl}, MklDnnType(), src_format_)); + + auto mpd = mkldnn::memory::primitive_desc(*src_md_, cpu_engine); + auto src_memory = mkldnn::memory(mpd, nullptr); + srcs_pd_.push_back(mpd); + srcs_memory_.push_back(src_memory); + coeff.push_back(1.0); + } else { + src_md_.reset( + new 
mkldnn::memory::desc(parents_[i].get()->primitive_dst_mem_.get()->get_primitive_desc().desc()));
+        auto mpd = mkldnn::memory::primitive_desc(*src_md_, cpu_engine);
+        auto src_memory = *parents_[i].get()->primitive_dst_mem_;  //mkldnn::memory(mpd);
+        srcs_pd_.push_back(mpd);
+        srcs_memory_.push_back(src_memory);
+        coeff.push_back(1.0);
+        ort_source_format_ = source_format;
+      }
+    }
+
+    primitive_dst_md_.reset(new mkldnn::memory::desc(
+        {dst_dims_mkl}, MklDnnType<T>(), mkldnn::memory::format::any));
+    sum_pd_.reset(new mkldnn::sum::primitive_desc(
+        *primitive_dst_md_, coeff, srcs_pd_));
+    primitive_dst_format_ = static_cast<mkldnn::memory::format>(sum_pd_->dst_primitive_desc().desc().data.format);
+
+    if (mklnode_ptr_->output_index >= 0) {
+      // Last node of sub-graph. Need to allocate memory for output_tensor.
+      if (primitive_dst_format_ != ort_source_format_) {
+        // Reorder needed. Use primitive output as input to reorder and
+        // allocate buffer for reorder output, final output of this subgraph
+        primitive_dst_mem_.reset(new mkldnn::memory(sum_pd_->dst_primitive_desc()));
+      } else {
+        // Last node but re-order not needed. Allocate buffer for the output of this node
+        primitive_dst_mem_.reset(new mkldnn::memory(sum_pd_->dst_primitive_desc(), nullptr));
+      }
+    } else {
+      // Intermediate node. Use mkldnn kernel internal memory for output and
+      // use this as input to next node.
+      primitive_dst_mem_.reset(new mkldnn::memory(sum_pd_->dst_primitive_desc()));
+    }
+    primitive_dst_format_ = static_cast<mkldnn::memory::format>(sum_pd_->dst_primitive_desc().desc().data.format);
+
+    std::vector<mkldnn::primitive::at> inputs;
+    for (int i = 0; i < num_inputs; i++) {
+      inputs.push_back(srcs_memory_[i]);
+    }
+    auto c = mkldnn::sum(*sum_pd_, inputs, *primitive_dst_mem_);
+    net.push_back(c);
+
+    if (mklnode_ptr_->output_index >= 0) {
+      // One of the end nodes. Allocate output buffer memory and
+      // reorder if necessary.
+      mkldnn::memory::data_type t = MklDnnType<T>();
+      InitDstReorderOutput(cpu_engine, t, net);
+    }
+    return Status::OK();
+  }
+
+  Status Bind(Ort::CustomOpApi ort, OrtKernelContext* context) override {
+    int num_inputs = mklnode_ptr_->num_inputs;
+    int input_index = mklnode_ptr_->input_start_index < 0 ? 0 : mklnode_ptr_->input_start_index;
+
+    if (mklnode_ptr_->parent_nodes.size() == 0) {
+      for (int i = 0; i < num_inputs; i++) {
+        const OrtValue* input_tensor = ort.KernelContext_GetInput(context, input_index + i);
+        const T* src_data = const_cast<T*>(ort.GetTensorData<T>(input_tensor));
+        srcs_memory_[i].set_data_handle(static_cast<void*>(const_cast<T*>(src_data)));
+      }
+    }
+
+    if (mklnode_ptr_->output_index >= 0) {
+      // Last node.
Allocate output buffer memory and reorder if needed + auto& y_dims = primitive_dst_shape_.GetDims(); + // Allocate memory for output bufffer + OrtValue* output = ort.KernelContext_GetOutput(context, mklnode_ptr_->output_index, &y_dims[0], static_cast(primitive_dst_shape_.GetDims().size())); + T* dst_data = ort.GetTensorMutableData(output); + + if (primitive_dst_format_ != ort_source_format_) { + reorder_dst_mem_to_->set_data_handle(dst_data); + } else { + primitive_dst_mem_->set_data_handle(dst_data); + } + } + return Status::OK(); + } + + private: + std::unique_ptr src_md_; + std::vector srcs_memory_; + + std::vector srcs_pd_; + std::unique_ptr src_mpd_; + std::unique_ptr dst_pd_; + std::unique_ptr sum_pd_; +}; +} // namespace mkl_dnn +} // namespace onnxruntime diff --git a/onnxruntime/core/providers/mkldnn/subgraph/subgraph.h b/onnxruntime/core/providers/mkldnn/subgraph/subgraph.h new file mode 100644 index 0000000000000..1e067e69345cd --- /dev/null +++ b/onnxruntime/core/providers/mkldnn/subgraph/subgraph.h @@ -0,0 +1,71 @@ +// Copyright(C) 2019 Intel Corporation +// Licensed under the MIT License + +#pragma once + +#include +#include +#include +#include "core/framework/op_node_proto_helper.h" +#include "core/graph/graph.h" + +namespace onnxruntime { +namespace mkl_dnn { + +struct MklDnnNode { + std::string name; + int node_index = -1; + int input_start_index = -1; // start index in inputs() + int num_inputs = 0; // and how many inputs + int output_index = -1; // index in output() + std::string weight_name; + std::string output_name; + std::vector parent_nodes; // index to parents in vector mklnodes + + std::string ToString() const { // For Debug purpose only + std::string key; + key.reserve(128); + key.append(name); + key.append(", input_start_index: "); + key.append(std::to_string(input_start_index)); + key.append(",num_inputs: "); + key.append(std::to_string(num_inputs)); + key.append(",output_index: "); + key.append(std::to_string(output_index)); + key.append(",output_name: "); + key.append(output_name); + key.append(", Parent nodes"); + for (auto& out : parent_nodes) + key.append(std::to_string(out) + ","); + key.append(";"); + return key; + } +}; + +struct Subgraph { + struct SubgraphVariables { + std::vector inputs; + std::vector outputs; + std::vector outputs_as_input_other_node; + std::vector subgraph_node_indexes; + std::shared_ptr subgraph_ptr; + int subgraph_index = 0; + + SubgraphVariables() { + subgraph_index = 0; + subgraph_ptr.reset(new Subgraph()); + } + void Reset() { + subgraph_node_indexes.clear(); + inputs.clear(); + outputs.clear(); + outputs_as_input_other_node.clear(); + subgraph_ptr.reset(new Subgraph()); + } + }; + + std::string subgraph_id; + std::vector mklnodes; +}; +} // namespace mkl_dnn +} // namespace onnxruntime \ No newline at end of file diff --git a/onnxruntime/core/providers/ngraph/ngraph_custom_op.cc b/onnxruntime/core/providers/ngraph/ngraph_custom_op.cc index e7c681d510637..5ac6354bbb11b 100644 --- a/onnxruntime/core/providers/ngraph/ngraph_custom_op.cc +++ b/onnxruntime/core/providers/ngraph/ngraph_custom_op.cc @@ -5,44 +5,34 @@ #include #include +#if defined(_MSC_VER) +#pragma warning(disable : 4244 4245) +#elif __GNUC__ #pragma GCC diagnostic push #pragma GCC diagnostic ignored "-Wunused-parameter" +#endif #include +#if defined(_MSC_VER) +#pragma warning(default : 4244 4245) +#elif __GNUC__ #pragma GCC diagnostic pop +#endif #include "ngraph_custom_op.h" #include "core/common/logging/logging.h" +#include "core/session/onnxruntime_cxx_api.h" 
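// Side note on the warning guards added above (a minimal standalone sketch, not a verbatim
// excerpt of the patch): only MSVC understands `#pragma warning`, and only GCC/Clang
// understand `#pragma GCC diagnostic`, so each toolchain gets its own block around noisy
// external headers. The include shown below is a hypothetical placeholder.
#if defined(_MSC_VER)
#pragma warning(disable : 4244 4245)  // conversion / signed-unsigned warnings raised inside the header
#elif defined(__GNUC__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunused-parameter"
#endif
// #include "third_party/noisy_header.hpp"  // hypothetical noisy third-party header
#if defined(_MSC_VER)
#pragma warning(default : 4244 4245)  // reset these warnings to their default level
#elif defined(__GNUC__)
#pragma GCC diagnostic pop
#endif
// `#pragma warning(default : n)` restores the compiler default rather than the previously
// active state; `#pragma warning(push)` / `#pragma warning(pop)` is the variant to use when
// the prior state must be preserved exactly.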
namespace onnxruntime { namespace ngraph_ep { -static DType GetDataType(const ngraph::element::Type& ng_type) { - switch (ng_type.get_type_enum()) { - case ngraph::element::Type_t::f32: - return DType::TFloat32; - case ngraph::element::Type_t::f64: - return DType::TDouble; - case ngraph::element::Type_t::boolean: - return DType::TBool; - case ngraph::element::Type_t::u8: - return DType::TUint8; - case ngraph::element::Type_t::i8: - return DType::TInt8; - case ngraph::element::Type_t::u16: - return DType::TUint16; - case ngraph::element::Type_t::i16: - return DType::TInt16; - case ngraph::element::Type_t::u32: - return DType::TUint32; - case ngraph::element::Type_t::i32: - return DType::TInt32; - case ngraph::element::Type_t::u64: - return DType::TUint64; - case ngraph::element::Type_t::i64: - return DType::TInt64; - default: - throw "Unsupported DataType"; - } +static bool check_ngraph_dump_ops() { +#ifdef _WIN32 + size_t env_name_len = 0; + char* env_name = nullptr; + return (_dupenv_s(&env_name, &env_name_len, "ONNXRUNTIME_NGRAPH_DUMP_OPS") == 0); +#else + return (std::getenv("ONNXRUNTIME_NGRAPH_DUMP_OPS") != nullptr); +#endif } NGRAPHCustomOp::NGRAPHCustomOp(const ComputeContext* context, const ONNX_NAMESPACE::ModelProto& model_proto, @@ -54,7 +44,7 @@ NGRAPHCustomOp::NGRAPHCustomOp(const ComputeContext* context, const ONNX_NAMESPA allocator_ = context->allocator_handle; name_ = context->node_name; - if (std::getenv("ONNXRUNTIME_NGRAPH_DUMP_OPS") != nullptr) { + if (check_ngraph_dump_ops()) { std::fstream dump(name_ + ".onnx", std::ios::out | std::ios::trunc | std::ios::binary); model_proto_.SerializeToOstream(&dump); } @@ -67,9 +57,12 @@ NGRAPHCustomOp::~NGRAPHCustomOp() { } //This method gets called in critical path of execution: Optimize -void NGRAPHCustomOp::Initialize(const ONNXRunTimeTensor* input_tensors, const size_t& num_inputs) const { +void NGRAPHCustomOp::Initialize(const OrtCustomOpApi* api, OrtKernelContext* context) const { + Ort::CustomOpApi ort{*api}; LOGS_DEFAULT(INFO) << "nGraph compiling customOp: " << name_; + size_t num_inputs = ort.KernelContext_GetInputCount(context); + //Key for ng_exe_map std::string uniq_input_shape; @@ -77,9 +70,14 @@ void NGRAPHCustomOp::Initialize(const ONNXRunTimeTensor* input_tensors, const si uniq_input_shape.reserve(4 * sizeof(int64_t) * num_inputs + num_inputs); for (size_t i = 0; i < num_inputs; i++) { - const auto& ndim = input_tensors[i].ndim; + const OrtValue* input_tensor = ort.KernelContext_GetInput(context, i); + auto tensor_info = ort.GetTensorTypeAndShape(input_tensor); + auto tensor_shape = ort.GetTensorShape(tensor_info); + ort.ReleaseTensorTypeAndShapeInfo(tensor_info); + + const auto ndim = tensor_shape.size(); uniq_input_shape.append(reinterpret_cast(&ndim), sizeof(ndim)); - uniq_input_shape.append(reinterpret_cast(input_tensors[i].shape), ndim * sizeof(int64_t)); + uniq_input_shape.append(reinterpret_cast(tensor_shape.data()), ndim * sizeof(int64_t)); } auto it = ng_exe_map_.insert({uniq_input_shape, nullptr}); //TODO: Limit the size of map with configurable size. 
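// Sketch of the shape-keyed executable cache that Initialize() builds above, shown in
// isolation. `ShapeKeyedCache` and `CompiledHandle` are hypothetical names; the real code
// keys ng_exe_map_ with the same byte string (each input's rank followed by its raw dims).
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

struct CompiledHandle {};  // stand-in for a compiled backend executable

class ShapeKeyedCache {
 public:
  // Serialize every input's rank and dims into one byte string, as Initialize() does.
  static std::string MakeKey(const std::vector<std::vector<int64_t>>& input_shapes) {
    std::string key;
    for (const auto& shape : input_shapes) {
      const size_t ndim = shape.size();
      key.append(reinterpret_cast<const char*>(&ndim), sizeof(ndim));
      if (ndim > 0) {
        key.append(reinterpret_cast<const char*>(shape.data()), ndim * sizeof(int64_t));
      }
    }
    return key;
  }

  // Compile only on a cache miss; later calls with identical input shapes reuse the result.
  // `compile` must return a std::shared_ptr<CompiledHandle>.
  template <typename CompileFn>
  std::shared_ptr<CompiledHandle> GetOrCompile(const std::vector<std::vector<int64_t>>& input_shapes,
                                               CompileFn compile) {
    auto inserted = cache_.insert({MakeKey(input_shapes), nullptr});
    if (inserted.second) {
      inserted.first->second = compile();  // first time these shapes are seen
    }
    return inserted.first->second;
  }

 private:
  std::unordered_map<std::string, std::shared_ptr<CompiledHandle>> cache_;
};
// Usage (hypothetical): auto exe = cache.GetOrCompile(shapes, [] { return std::make_shared<CompiledHandle>(); });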
@@ -92,11 +90,16 @@ void NGRAPHCustomOp::Initialize(const ONNXRunTimeTensor* input_tensors, const si auto graph_proto = model_proto_.mutable_graph(); // Clear previous shapes if any and set new input shapes for (size_t i = 0; i < num_inputs; i++) { - auto g_in_shape = graph_proto->mutable_input(i)->mutable_type()->mutable_tensor_type()->mutable_shape(); + auto g_in_shape = graph_proto->mutable_input((int)i)->mutable_type()->mutable_tensor_type()->mutable_shape(); g_in_shape->clear_dim(); - for (size_t dim = 0; dim < input_tensors[i].ndim; dim++) { - g_in_shape->add_dim()->set_dim_value(input_tensors[i].shape[dim]); + const OrtValue* input_tensor = ort.KernelContext_GetInput(context, i); + auto tensor_info = ort.GetTensorTypeAndShape(input_tensor); + auto tensor_shape = ort.GetTensorShape(tensor_info); + ort.ReleaseTensorTypeAndShapeInfo(tensor_info); + + for (size_t dim = 0; dim < tensor_shape.size(); dim++) { + g_in_shape->add_dim()->set_dim_value(tensor_shape[dim]); } } @@ -131,14 +134,14 @@ void NGRAPHCustomOp::Initialize(const ONNXRunTimeTensor* input_tensors, const si } // namespace ngraph_ep //This method gets called in critical path of execution: Optimize -Status NGRAPHCustomOp::Compute(const ONNXRunTimeTensor* input_tensors, const size_t num_inputs, ONNXRunTimeTensor* const output_tensors, const size_t num_outputs) const { - ORT_UNUSED_PARAMETER(num_outputs); +Status NGRAPHCustomOp::Compute(const OrtCustomOpApi* api, OrtKernelContext* context) const { + Ort::CustomOpApi ort{*api}; //TODO: Minimize locked region std::lock_guard lock(compute_lock_); // Initialize nGraph function if it is not already initialized. - Initialize(input_tensors, num_inputs); + Initialize(api, context); ORT_ENFORCE(ng_curr_exe_ != nullptr); @@ -147,9 +150,11 @@ Status NGRAPHCustomOp::Compute(const ONNXRunTimeTensor* input_tensors, const siz // Write ONNXR input data to nGraph input tensors. 
try { - auto& in_tensor = input_tensors; + unsigned input_index = 0; for (const auto& ng_param : ng_curr_exe_->get_parameters()) { - ng_inputs.emplace_back(ng_backend_->create_tensor(ng_param->get_output_element_type(0), ng_param->get_output_shape(0), (in_tensor++)->data)); + const OrtValue* input_tensor = ort.KernelContext_GetInput(context, input_index++); + void* input_data = const_cast(ort.GetTensorData(input_tensor)); + ng_inputs.emplace_back(ng_backend_->create_tensor(ng_param->get_output_element_type(0), ng_param->get_output_shape(0), input_data)); } } catch (const std::exception& exp) { return ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "Exception while copying input data to nGraph: " + std::string(exp.what())); @@ -160,25 +165,15 @@ Status NGRAPHCustomOp::Compute(const ONNXRunTimeTensor* input_tensors, const siz // Initialize output tensors try { //TODO: Optimize - auto onxr_output = output_tensors; + unsigned output_index = 0; for (auto& ng_result : ng_curr_exe_->get_results()) { const auto& dtype = ng_result->get_element_type(); const auto& shape = ng_result->get_shape(); - onxr_output->dtype = GetDataType(dtype); - onxr_output->ndim = shape.size(); - onxr_output->shape = new int64_t[onxr_output->ndim]; - - size_t num_elements = 1; - for (size_t dim = 0; dim < shape.size(); dim++) { - num_elements *= shape[dim]; - onxr_output->shape[dim] = shape[dim]; - } - - onxr_output->data = (*(allocate_func_))(allocator_, 64, num_elements * sizeof(onxr_output->dtype)); - - ng_outputs.emplace_back(ng_backend_->create_tensor(dtype, shape, onxr_output->data)); - ++onxr_output; + std::vector ort_shape{shape.begin(), shape.end()}; + OrtValue* output_tensor = ort.KernelContext_GetOutput(context, output_index++, ort_shape.data(), ort_shape.size()); + void* output_data = ort.GetTensorMutableData(output_tensor); + ng_outputs.emplace_back(ng_backend_->create_tensor(dtype, shape, output_data)); } } catch (const std::exception& exp) { return ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "Exception while creating nGraph output Tensor: " + std::string(exp.what())); diff --git a/onnxruntime/core/providers/ngraph/ngraph_custom_op.h b/onnxruntime/core/providers/ngraph/ngraph_custom_op.h index 630a070cdca11..6661fdb378e56 100644 --- a/onnxruntime/core/providers/ngraph/ngraph_custom_op.h +++ b/onnxruntime/core/providers/ngraph/ngraph_custom_op.h @@ -3,10 +3,20 @@ #pragma once +#if defined(_MSC_VER) +#pragma warning(disable : 4244 4245) +#elif __GNUC__ #pragma GCC diagnostic push #pragma GCC diagnostic ignored "-Wunused-parameter" +#endif #include +#if defined(_MSC_VER) +#pragma warning(default : 4244 4245) +#elif __GNUC__ #pragma GCC diagnostic pop +#endif + +#include "core/session/onnxruntime_c_api.h" #include "core/framework/func_api.h" #include "core/graph/onnx_protobuf.h" @@ -17,13 +27,12 @@ class NGRAPHCustomOp { public: NGRAPHCustomOp(const ComputeContext* context, const ONNX_NAMESPACE::ModelProto& model_proto, const std::shared_ptr& ng_backend); - Status Compute(const ONNXRunTimeTensor* input_tensors, const size_t num_inputs, ONNXRunTimeTensor* const output_tensors, const size_t num_outputs) const; + Status Compute(const OrtCustomOpApi* api, OrtKernelContext* context) const; ~NGRAPHCustomOp(); private: - - void Initialize(const ONNXRunTimeTensor* input_tensors, const size_t& num_inputs) const; + void Initialize(const OrtCustomOpApi* api, OrtKernelContext* context) const; std::shared_ptr ng_backend_; diff --git a/onnxruntime/core/providers/ngraph/ngraph_execution_provider.cc 
b/onnxruntime/core/providers/ngraph/ngraph_execution_provider.cc index a9a13875a6744..94532c81c7e4a 100644 --- a/onnxruntime/core/providers/ngraph/ngraph_execution_provider.cc +++ b/onnxruntime/core/providers/ngraph/ngraph_execution_provider.cc @@ -11,13 +11,21 @@ #include "ngraph_execution_provider.h" #include "ngraph_custom_op.h" +#if defined(_MSC_VER) +#pragma warning(disable : 4244 4245) +#elif __GNUC__ #pragma GCC diagnostic push #pragma GCC diagnostic ignored "-Wunused-parameter" +#endif #include #include +#if defined(_MSC_VER) +#pragma warning(default : 4244 4245) +#elif __GNUC__ #pragma GCC diagnostic pop +#endif -#define MEMCPY_S(dest, src, destsz, srcsz) memcpy(dest, src, MIN(destsz, srcsz)) +#define MEMCPY_S(dest, src, destsz, srcsz) memcpy(dest, src, std::min(destsz, srcsz)) namespace onnxruntime { @@ -318,7 +326,7 @@ static std::vector GetUnsupportedNodeIndices(const GraphViewer& graph } /* Returns a vector clusters(or node_idx). For each unsupported node, the graph is split into 3 parts. - supported_cluster + (UNsupported_node + rest_of_the_graph). This functions returns vector of all supported_clusters by nGraph + supported_cluster + (UNsupported_node + rest_of_the_graph). This functions returns vector of all supported_clusters by nGraph */ static std::vector> GetPartitionedClusters(const std::vector& topological_order, const std::vector& unsupported_nodes) { std::vector> ng_clusters; @@ -450,22 +458,20 @@ NGRAPHExecutionProvider::GetCapability(const onnxruntime::GraphViewer& graph_vie std::for_each(graph_viewer.GetInputs().begin(), graph_viewer.GetInputs().end(), [&inputs](const NodeArg* node_arg) { inputs.push_back(node_arg->Name()); }); - /* In scenarios, when there are no inputs or all inputs being initializers, + /* In scenarios, when there are no inputs or all inputs being initializers, ConstantFolding optimization in onnxruntime pre-computes the value.*/ if (inputs.empty()) { return result; } + //Initializers need to be part of meta_def->inputs + std::for_each(ng_required_initializers.begin(), ng_required_initializers.end(), + [&inputs](const std::string& initializer) { inputs.push_back(initializer); }); + //Fill outputs with names std::for_each(graph_viewer.GetOutputs().begin(), graph_viewer.GetOutputs().end(), [&outputs](const NodeArg* node_arg) { outputs.push_back(node_arg->Name()); }); - // Remove initializers from inputs if they are in ng_required_initializers - inputs.erase(std::remove_if(inputs.begin(), inputs.end(), [&ng_required_initializers](const std::string& name) -> bool { - return ng_required_initializers.count(name); - }), - inputs.end()); - // Create and add this graph to result. AppendClusterToSubGraph(graph_viewer.GetNodesInTopologicalOrder(), graph_viewer, inputs, outputs, ng_required_initializers, result); @@ -539,10 +545,10 @@ Status NGRAPHExecutionProvider::Compile(const std::vector& f delete reinterpret_cast(state); }; - compute_info.compute_func = [](FunctionState state, ONNXRunTimeTensor* input_tensors, size_t num_inputs, ONNXRunTimeTensor* output_tensors, size_t num_outputs) { + compute_info.compute_func = [](FunctionState state, const OrtCustomOpApi* api, OrtKernelContext* context) { onnxruntime::ngraph_ep::NGRAPHCustomOp* ng_custom_op = reinterpret_cast(state); - const Status compute_status = ng_custom_op->Compute(input_tensors, num_inputs, output_tensors, num_outputs); + const Status compute_status = ng_custom_op->Compute(api, context); return compute_status == Status::OK() ? 
0 : 1; }; diff --git a/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc b/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc index 8abf11969087a..69d64871d00fd 100755 --- a/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc +++ b/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc @@ -3,6 +3,7 @@ #include "tensorrt_execution_provider.h" #include "tensorrt_allocator.h" +#include "core/session/onnxruntime_cxx_api.h" #include "core/framework/execution_provider.h" #include "core/framework/op_kernel.h" #include "core/framework/kernel_registry.h" @@ -21,13 +22,13 @@ using namespace ::onnxruntime::logging; namespace onnxruntime { -#define CHECK_CUDA(call) \ - do { \ - cudaError_t status = call; \ - if(status != cudaSuccess) { \ - return -1; \ - } \ - } while(0) +#define CHECK_CUDA(call) \ + do { \ + cudaError_t status = call; \ + if (status != cudaSuccess) { \ + return -1; \ + } \ + } while (0) TensorrtExecutionProvider::TensorrtExecutionProvider() : IExecutionProvider{onnxruntime::kTensorrtExecutionProvider} { @@ -83,7 +84,7 @@ TensorrtExecutionProvider::GetCapability(const onnxruntime::GraphViewer& graph, } std::unique_ptr sub_graph = std::make_unique(); // Find inputs and outputs of the subgraph - std::unordered_map fused_inputs, fused_outputs, fused_outputs_to_add; + std::unordered_map fused_inputs, fused_outputs, fused_outputs_to_add; std::unordered_set erased; int input_order = 0; int output_order = 0; @@ -146,7 +147,7 @@ TensorrtExecutionProvider::GetCapability(const onnxruntime::GraphViewer& graph, fused_outputs.insert(fused_outputs_to_add.begin(), fused_outputs_to_add.end()); // Sort inputs and outputs by the order they were added - std::multimap inputs, outputs; + std::multimap inputs, outputs; for (auto it = fused_inputs.begin(), end = fused_inputs.end(); it != end; ++it) { inputs.insert(std::pair(it->second, it->first)); @@ -215,65 +216,15 @@ common::Status TensorrtExecutionProvider::Compile(const std::vectorName()] = i; } - // Reconstruct graph from fused node's function body + // Reconstruct graph proto from fused node's function body const auto* func_body = fused_node->GetFunctionBody(); if (!func_body) { return common::Status(common::ONNXRUNTIME, common::INVALID_ARGUMENT, "Function body is empty"); } const Graph& graph_body = func_body->Body(); onnxruntime::Model model(graph_body.Name(), true, ModelMetaData(), IOnnxRuntimeOpSchemaRegistryList(), graph_body.DomainToVersionMap()); - onnxruntime::Graph& graph = model.MainGraph(); - - for (const auto& graph_body_node : graph_body.Nodes()) { - graph.AddNode(graph_body_node); - } - - ORT_ENFORCE(graph.Resolve().IsOK()); - - // Add initializer to graph - const auto& init_tensors = graph_body.GetAllInitializedTensors(); - for (const auto& tensor : init_tensors) { - graph.AddInitializedTensor(*(tensor.second)); - } - - // Add fused node's outputs to graph's outputs if the outputs are not included yet - // for the case that node's output is connected to more than one EdgeEnd nodes and some of them don't belong to the graph ONNX_NAMESPACE::ModelProto model_proto = model.ToProto(); - const auto& graph_output = model_proto.graph().output(); - std::unordered_set graph_outputs_set; - graph_outputs_set.reserve(graph_output.size()); - for (int i = 0, end = graph_output.size(); i < end; ++i) { - graph_outputs_set.insert(graph_output[i].name()); - } - - const auto& graph_value_info = model_proto.graph().value_info(); - std::vector output_to_add; - std::vector location; - int num_defs = 
output_defs.size(); - for (int i = num_defs - 1; i >= 0; --i) { - const std::string& output_name = output_defs[i]->Name(); - if (graph_outputs_set.find(output_name) == graph_outputs_set.end()) { - for (int j = 0, end = graph_value_info.size(); j < end; ++j) { - if (output_name == graph_value_info[j].name()) { - output_to_add.push_back(j); - location.push_back(num_defs - 1 - i); - } - } - } - } - - // Add outputs and move them to the right places - auto* mutable_output = model_proto.mutable_graph()->mutable_output(); - for (int i = 0, end = output_to_add.size(); i < end; ++i) { - *(mutable_output->Add()) = graph_value_info[output_to_add[i]]; - int start_index = (*mutable_output).size() - 1; - int end_index = start_index - location[i]; - for (int j = start_index; j > end_index; --j) { - mutable_output->SwapElements(j, j - 1); - } - } - - // Set version + *(model_proto.mutable_graph()) = graph_body.ToGraphProto(); model_proto.set_ir_version(ONNX_NAMESPACE::Version::IR_VERSION); // Create TensorRT engine @@ -290,14 +241,14 @@ common::Status TensorrtExecutionProvider::Compile(const std::vectorsetMaxBatchSize(max_batch_size_); trt_builder->setMaxWorkspaceSize(max_workspace_size_); auto trt_engine = unique_pointer(trt_builder->buildCudaEngine(*trt_network.get())); @@ -347,6 +298,7 @@ common::Status TensorrtExecutionProvider::Compile(const std::vector(state); const std::vector& input_indexes = (trt_state->input_info)[0]; const std::vector& input_dim_sizes = (trt_state->input_info)[1]; const std::vector& output_indexes = (trt_state->output_info)[0]; const std::vector& output_dim_sizes = (trt_state->output_info)[1]; std::vector> output_shapes = trt_state->output_shapes; + int num_binding_inputs = input_indexes.size(); int num_binding_outputs = output_indexes.size(); int total_bindings = num_binding_inputs + num_binding_outputs; @@ -402,15 +354,18 @@ common::Status TensorrtExecutionProvider::Compile(const std::vector(input_tensor); + const int input_batch_size = tensor_shape[0]; if (i > 0 && batch_size != input_batch_size) { ORT_THROW("Input batch size is inconsistent"); } batch_size = input_batch_size; - const float* input = static_cast(tensor_input.data); CHECK_CUDA(cudaMalloc(&buffers[i], input_batch_size * input_dim_sizes[i] * sizeof(float))); CHECK_CUDA(cudaMemcpy(buffers[i], input, input_batch_size * input_dim_sizes[i] * sizeof(float), cudaMemcpyHostToDevice)); } @@ -426,18 +381,10 @@ common::Status TensorrtExecutionProvider::Compile(const std::vectortest_allocate_func))(trt_state->allocator, 64, sizeof(double) * batch_size * output_dim_sizes[i]); - - CHECK_CUDA(cudaMemcpy(output_tensors[output_index].data, buffers[i + num_binding_inputs], batch_size * output_dim_sizes[i] * sizeof(float), cudaMemcpyDeviceToHost)); + OrtValue* output_tensor = ort.KernelContext_GetOutput(context, output_index, output_shapes[i].data(), output_shapes[i].size()); + CHECK_CUDA(cudaMemcpy(ort.GetTensorMutableData(output_tensor), buffers[i + num_binding_inputs], batch_size * output_dim_sizes[i] * sizeof(float), cudaMemcpyDeviceToHost)); } // Sync stream @@ -459,4 +406,3 @@ common::Status TensorrtExecutionProvider::Compile(const std::vector Contains(const std::vector& output_n return {true, it - std::begin(output_names)}; } -common::Status IOBinding::BindOutput(const std::string& name, const MLValue& ml_value) { +common::Status IOBinding::BindOutput(const std::string& name, const OrtValue& ml_value) { auto rc = Contains(output_names_, name); if (rc.first) { outputs_[rc.second] = ml_value; @@ -81,17 +81,13 @@ const 
std::vector& IOBinding::GetOutputNames() const { return output_names_; } -std::vector& IOBinding::GetOutputs() { - return outputs_; -} +std::vector& IOBinding::GetOutputs() { return outputs_; } const std::vector& IOBinding::GetInputNames() const { return feed_names_; } -const std::vector& IOBinding::GetInputs() const { - return feeds_; -} +const std::vector& IOBinding::GetInputs() const { return feeds_; } AllocatorPtr IOBinding::GetCPUAllocator(int id, onnxruntime::ProviderType provider_type) const { auto& exec_providers = session_state_.GetExecutionProviders(); diff --git a/onnxruntime/core/session/IOBinding.h b/onnxruntime/core/session/IOBinding.h index 521d7d147bb5b..1896bd778b69f 100644 --- a/onnxruntime/core/session/IOBinding.h +++ b/onnxruntime/core/session/IOBinding.h @@ -16,37 +16,37 @@ namespace onnxruntime { class SessionState; /** - * Input/Output binding. - * Usage is as follows: - * - * InferenceSession session; - * session.Load(); - * session.Initialize(); - * ... - * shared_ptr io_binding; - * session.NewIOBinding("DML", &io_binding); - * io_binding->BindInput(...); - * io_binding->BindInput(...); - * io_binding->SynchronizeInputs(); - * - * io_binding->BindOutput(...); - * io_binding->BindOutput(...); - * - * session.Run(io_binding); - * - * vector& outputs = io_binding->GetOutputs(); - */ + * Input/Output binding. + * Usage is as follows: + * + * InferenceSession session; + * session.Load(); + * session.Initialize(); + * ... + * shared_ptr io_binding; + * session.NewIOBinding("DML", &io_binding); + * io_binding->BindInput(...); + * io_binding->BindInput(...); + * io_binding->SynchronizeInputs(); + * + * io_binding->BindOutput(...); + * io_binding->BindOutput(...); + * + * session.Run(io_binding); + * + * vector& outputs = io_binding->GetOutputs(); + */ class IOBinding { public: /** - * Call repeatedly to bind as many inputs as required. - * If the input mlvalue is not at the desired location (specified by the execution provider), this will - * copy it to the desired location. This copy may or may not be async. It depends on the exec provider. - * If the input mlvalue is not at the desired location, it should be preallocated - * If the input mlvalue isn't preallocated, it should have memtype of OrtMemTypeDefault - * For copying it leverages IExecutionProvider::CopyTensor(). - */ - common::Status BindInput(const std::string& name, const MLValue& ml_value); + * Call repeatedly to bind as many inputs as required. + * If the input ort_value is not at the desired location (specified by the execution provider), this will + * copy it to the desired location. This copy may or may not be async. It depends on the exec provider. + * If the input ort_value is not at the desired location, it should be preallocated + * If the input ort_value isn't preallocated, it should have memtype of OrtMemTypeDefault + * For copying it leverages IExecutionProvider::CopyTensor(). + */ + common::Status BindInput(const std::string& name, const OrtValue& ml_value); /** * If the BindInput calls are async this function acts as a barrier to ensure all inputs are fully copied @@ -60,16 +60,16 @@ class IOBinding { /** * This simply provides the names and optionally allocated output containers. */ - common::Status BindOutput(const std::string& name, const MLValue& ml_value); + common::Status BindOutput(const std::string& name, const OrtValue& ml_value); /** * This simply collects the outputs obtained after calling Run() inside the @param outputs. 
*/ const std::vector& GetOutputNames() const; - std::vector& GetOutputs(); + std::vector& GetOutputs(); const std::vector& GetInputNames() const; - const std::vector& GetInputs() const; + const std::vector& GetInputs() const; /** * Get a CPU allocator from provider for async copy later if the provider supports that @@ -84,9 +84,9 @@ class IOBinding { IOBinding(const SessionState& session_state); const SessionState& session_state_; std::vector feed_names_; - std::vector feeds_; + std::vector feeds_; std::vector output_names_; - std::vector outputs_; + std::vector outputs_; ORT_DISALLOW_COPY_ASSIGNMENT_AND_MOVE(IOBinding); }; diff --git a/onnxruntime/core/session/abi_session_options.cc b/onnxruntime/core/session/abi_session_options.cc index a0f82d15ec952..a283fad921714 100644 --- a/onnxruntime/core/session/abi_session_options.cc +++ b/onnxruntime/core/session/abi_session_options.cc @@ -7,8 +7,7 @@ #include "core/session/inference_session.h" #include "abi_session_options_impl.h" -OrtSessionOptions::~OrtSessionOptions() { -} +OrtSessionOptions::~OrtSessionOptions() = default; OrtSessionOptions& OrtSessionOptions::operator=(const OrtSessionOptions&) { throw std::runtime_error("not implemented"); diff --git a/onnxruntime/core/session/custom_ops.cc b/onnxruntime/core/session/custom_ops.cc index a821a718de419..cc0990b13dc88 100644 --- a/onnxruntime/core/session/custom_ops.cc +++ b/onnxruntime/core/session/custom_ops.cc @@ -29,8 +29,16 @@ ORT_API_STATUS_IMPL(OrtKernelInfoGetAttribute_int64, _In_ const OrtKernelInfo* i return onnxruntime::ToOrtStatus(status); } -ORT_API(OrtValue*, OrtKernelContext_GetInput, OrtKernelContext* context, _In_ size_t index) { - return reinterpret_cast(const_cast(reinterpret_cast(context)->GetInputMLValue(index))); +ORT_API(size_t, OrtKernelContext_GetInputCount, const OrtKernelContext* context) { + return reinterpret_cast(context)->InputCount(); +}; + +ORT_API(size_t, OrtKernelContext_GetOutputCount, const OrtKernelContext* context) { + return reinterpret_cast(context)->OutputCount(); +}; + +ORT_API(const OrtValue*, OrtKernelContext_GetInput, const OrtKernelContext* context, _In_ size_t index) { + return reinterpret_cast(reinterpret_cast(context)->GetInputMLValue(index)); }; ORT_API(OrtValue*, OrtKernelContext_GetOutput, OrtKernelContext* context, _In_ size_t index, _In_ const int64_t* dim_values, size_t dim_count) { @@ -42,21 +50,26 @@ constexpr OrtCustomOpApi g_custom_op_api = { &OrtKernelInfoGetAttribute_float, &OrtKernelInfoGetAttribute_int64, - &OrtGetTensorShapeAndType, + &OrtGetTensorTypeAndShape, &OrtGetTensorShapeElementCount, - &OrtGetNumOfDimensions, - &OrtGetDimensions, - &OrtSetDims, + &OrtGetTensorElementType, + &OrtGetDimensionsCount, + &OrtGetDimensions, + &OrtSetDimensions, &OrtGetTensorMutableData, &OrtReleaseTensorTypeAndShapeInfo, + &OrtKernelContext_GetInputCount, &OrtKernelContext_GetInput, + &OrtKernelContext_GetOutputCount, &OrtKernelContext_GetOutput, }; +const OrtCustomOpApi& GetCustomOpApi() { return g_custom_op_api; } + namespace onnxruntime { struct CustomOpKernel : OpKernel { @@ -66,9 +79,7 @@ struct CustomOpKernel : OpKernel { op_kernel_ = op_.CreateKernel(&op_, &g_custom_op_api, reinterpret_cast(const_cast(&info))); } - ~CustomOpKernel() { - op_.KernelDestroy(op_kernel_); - } + ~CustomOpKernel() override { op_.KernelDestroy(op_kernel_); } Status Compute(OpKernelContext* ctx) const override { auto* ictx = static_cast(ctx); diff --git a/onnxruntime/core/session/default_cpu_allocator_c_api.cc 
b/onnxruntime/core/session/default_cpu_allocator_c_api.cc index b87c95e52ba07..e699f605c5931 100644 --- a/onnxruntime/core/session/default_cpu_allocator_c_api.cc +++ b/onnxruntime/core/session/default_cpu_allocator_c_api.cc @@ -8,7 +8,7 @@ // In the future we'll have more than one allocator type. Since all allocators are of type 'OrtAllocator' and there is a single // OrtReleaseAllocator function, we need to have a common base type that lets us delete them. struct OrtAllocatorImpl : OrtAllocator { - virtual ~OrtAllocatorImpl() {} + virtual ~OrtAllocatorImpl() = default; }; struct OrtDefaultAllocator : OrtAllocatorImpl { @@ -20,9 +20,7 @@ struct OrtDefaultAllocator : OrtAllocatorImpl { ORT_THROW_ON_ERROR(OrtCreateAllocatorInfo("Cpu", OrtDeviceAllocator, 0, OrtMemTypeDefault, &cpuAllocatorInfo)); } - ~OrtDefaultAllocator() { - OrtReleaseAllocatorInfo(cpuAllocatorInfo); - } + ~OrtDefaultAllocator() override { OrtReleaseAllocatorInfo(cpuAllocatorInfo); } void* Alloc(size_t size) { return ::malloc(size); diff --git a/onnxruntime/core/session/inference_session.cc b/onnxruntime/core/session/inference_session.cc index cb8069143655b..4544abf7c3f72 100644 --- a/onnxruntime/core/session/inference_session.cc +++ b/onnxruntime/core/session/inference_session.cc @@ -31,7 +31,7 @@ #include "core/framework/kernel_registry.h" #include "core/framework/ml_value_patterns_planner.h" #include "core/framework/mldata_type_utils.h" -#include "core/framework/mlvalue_name_idx_map.h" +#include "core/framework/ort_value_name_idx_map.h" #include "core/framework/sequential_executor.h" #include "core/framework/op_kernel_context_internal.h" #include "core/framework/parallel_executor.h" @@ -89,7 +89,8 @@ inline std::basic_string GetCurrentTimeString() { } // namespace InferenceSession::InferenceSession(const SessionOptions& session_options, logging::LoggingManager* logging_manager) - : session_state_{execution_providers_}, + : session_state_(execution_providers_, + session_options.enable_mem_pattern && session_options.enable_sequential_execution), session_options_{session_options}, graph_transformation_mgr_{session_options_.max_num_graph_transformation_steps}, logging_manager_{logging_manager}, @@ -110,7 +111,6 @@ InferenceSession::InferenceSession(const SessionOptions& session_options, loggin } session_state_.SetThreadPool(thread_pool_.get()); - session_state_.SetEnableMemoryPattern(session_options.enable_mem_pattern && session_options.enable_sequential_execution); session_profiler_.Initialize(session_logger_); session_state_.SetProfiler(session_profiler_); if (session_options.enable_profiling) { @@ -280,6 +280,22 @@ common::Status InferenceSession::Load(std::istream& model_istream) { return Load(loader, "model_loading_istream"); } +common::Status InferenceSession::Load(const void* model_data, int model_data_len) { + auto loader = [this, model_data, model_data_len](std::shared_ptr& model) { + ModelProto model_proto; + + const bool result = model_proto.ParseFromArray(model_data, model_data_len); + if (!result) { + return Status(common::ONNXRUNTIME, common::INVALID_PROTOBUF, + "Failed to load model because protobuf parsing failed."); + } + + return onnxruntime::Model::Load(model_proto, model, HasLocalSchema() ? 
&custom_schema_registries_ : nullptr); + }; + + return Load(loader, "model_loading_array"); +} + common::Status InferenceSession::TransformGraph(onnxruntime::Graph& graph, const onnxruntime::GraphTransformerManager& graph_transformer_mgr, const ExecutionProviders& providers, @@ -345,7 +361,8 @@ common::Status InferenceSession::CreateSubgraphSessionState(Graph& graph, Sessio Graph* subgraph = entry.second; ORT_ENFORCE(subgraph, "Main Graph instance should have populated all subgraphs when being resolved."); - auto subgraph_session_state = std::make_unique(execution_providers_); + auto subgraph_session_state = + std::make_unique(execution_providers_, session_state.GetEnableMemoryPattern()); subgraph_session_state->SetProfiler(session_profiler_); subgraph_session_state->SetLogger(*session_logger_); // Pass threadpool to subgraph @@ -377,13 +394,14 @@ common::Status InferenceSession::InitializeSubgraphSessions(Graph& graph, Sessio ORT_ENFORCE(subgraph_session_state, "CreateSubgraphSessionState should have created an entry earlier."); // setup everything required to execute the subgraph and save it in subgraph_session_state - SessionStateInitializer initializer{model_location_, subgraph, *subgraph_session_state, execution_providers_, - kernel_registry_manager_}; + SessionStateInitializer initializer(session_options_.enable_mem_pattern, model_location_, subgraph, + *subgraph_session_state, execution_providers_, kernel_registry_manager_); - ORT_RETURN_IF_ERROR(initializer.CreatePlan(&node, node.ImplicitInputDefs(), + const auto implicit_inputs = node.ImplicitInputDefs(); + ORT_RETURN_IF_ERROR(initializer.CreatePlan(&node, &implicit_inputs, session_options_.enable_sequential_execution)); - ORT_RETURN_IF_ERROR(initializer.InitializeAndSave(&node.ImplicitInputDefs())); + ORT_RETURN_IF_ERROR(initializer.InitializeAndSave(&implicit_inputs)); // LOGS(*session_logger_, VERBOSE) << std::make_pair(subgraph_info.session_state->GetExecutionPlan(), // &*subgraph_info.session_state); @@ -421,6 +439,15 @@ common::Status InferenceSession::Initialize() { std::make_unique(epi))); } + if (!session_options_.enable_sequential_execution && + execution_providers_.Get(onnxruntime::kCudaExecutionProvider)) { + LOGS(*session_logger_, ERROR) << "Parallel execution is currently not supported " + "for the registered CUDA Execution Provider."; + return common::Status(common::ONNXRUNTIME, common::INVALID_ARGUMENT, + "Parallel execution is currently not supported " + "for the registered CUDA Execution Provider."); + } + // add predefined transformers AddPredefinedTransformers(graph_transformation_mgr_, session_options_.graph_optimization_level, transformers_to_enable_); @@ -436,8 +463,8 @@ common::Status InferenceSession::Initialize() { // Register 2nd registries into KernelRegistryManager. ORT_RETURN_IF_ERROR(kernel_registry_manager_.RegisterKernels(execution_providers_)); - SessionStateInitializer session_initializer{model_location_, graph, session_state_, execution_providers_, - kernel_registry_manager_}; + SessionStateInitializer session_initializer(session_options_.enable_mem_pattern, model_location_, graph, + session_state_, execution_providers_, kernel_registry_manager_); // create SessionState for subgraphs as it's needed by the transformers ORT_RETURN_IF_ERROR(CreateSubgraphSessionState(graph, session_state_)); @@ -451,7 +478,7 @@ common::Status InferenceSession::Initialize() { // now that all the transforms are done, call Resolve on the main graph. this will recurse into the subgraphs. 
ORT_RETURN_IF_ERROR(graph.Resolve()); - ORT_RETURN_IF_ERROR(session_initializer.CreatePlan(nullptr, {}, session_options_.enable_sequential_execution)); + ORT_RETURN_IF_ERROR(session_initializer.CreatePlan(nullptr, nullptr, session_options_.enable_sequential_execution)); ORT_RETURN_IF_ERROR(session_initializer.InitializeAndSave(nullptr)); // handle any subgraphs @@ -466,7 +493,7 @@ common::Status InferenceSession::Initialize() { status = ORT_MAKE_STATUS(ONNXRUNTIME, NOT_IMPLEMENTED, "Exception during initialization: ", ex.what()); LOGS(*session_logger_, ERROR) << status.ErrorMessage(); } catch (const std::exception& ex) { - status = ORT_MAKE_STATUS(ONNXRUNTIME, FAIL, "Exception during initialization: ", ex.what()); + status = ORT_MAKE_STATUS(ONNXRUNTIME, RUNTIME_EXCEPTION, "Exception during initialization: ", ex.what()); LOGS(*session_logger_, ERROR) << status.ErrorMessage(); } catch (...) { status = ORT_MAKE_STATUS(ONNXRUNTIME, RUNTIME_EXCEPTION, "Encountered unknown exception in Initialize()"); @@ -494,7 +521,7 @@ common::Status InferenceSession::CheckTypes(MLDataType actual, MLDataType expect } common::Status InferenceSession::ValidateInputs(const std::vector& feed_names, - const std::vector& feeds) { + const std::vector& feeds) { if (feed_names.size() != feeds.size()) { return ORT_MAKE_STATUS(ONNXRUNTIME, INVALID_ARGUMENT, "Size mismatch: feed_names has ", @@ -527,7 +554,7 @@ common::Status InferenceSession::ValidateInputs(const std::vector& } common::Status InferenceSession::ValidateOutputs(const std::vector& output_names, - const std::vector* p_fetches) { + const std::vector* p_fetches) { if (!p_fetches) { return common::Status(common::ONNXRUNTIME, common::INVALID_ARGUMENT, "Output vector pointer is NULL"); @@ -558,11 +585,9 @@ common::Status InferenceSession::ValidateOutputs(const std::vector& return common::Status::OK(); } -Status InferenceSession::Run(const RunOptions& run_options, - const std::vector& feed_names, - const std::vector& feeds, - const std::vector& output_names, - std::vector* p_fetches) { +Status InferenceSession::Run(const RunOptions& run_options, const std::vector& feed_names, + const std::vector& feeds, const std::vector& output_names, + std::vector* p_fetches) { auto tp = session_profiler_.StartTime(); Status retval = Status::OK(); @@ -625,18 +650,15 @@ Status InferenceSession::Run(const RunOptions& run_options, return retval; } -common::Status InferenceSession::Run(const NameMLValMap& feeds, - const std::vector& output_names, - std::vector* p_fetches) { +common::Status InferenceSession::Run(const NameMLValMap& feeds, const std::vector& output_names, + std::vector* p_fetches) { return Run(RunOptions(), feeds, output_names, p_fetches); } -common::Status InferenceSession::Run(const RunOptions& run_options, - const NameMLValMap& feeds_map, - const std::vector& output_names, - std::vector* p_fetches) { +common::Status InferenceSession::Run(const RunOptions& run_options, const NameMLValMap& feeds_map, + const std::vector& output_names, std::vector* p_fetches) { std::vector feed_names; - std::vector feeds; + std::vector feeds; auto num_feeds = feeds_map.size(); feed_names.reserve(num_feeds); @@ -765,18 +787,12 @@ common::Status InferenceSession::SaveModelMetadata(const onnxruntime::Model& mod // save required inputs const auto& required_inputs = graph.GetInputs(); // inputs excluding initializers required_input_def_list_ = required_inputs; // A direct copy of required inputs - required_model_input_names_.reserve(required_inputs.size()); - for (const auto& elem : 
required_inputs) { - required_model_input_names_.insert(elem->Name()); - } // save all valid inputs auto& all_inputs = graph.GetInputsIncludingInitializers(); input_def_map_.reserve(all_inputs.size()); - model_input_names_.reserve(all_inputs.size()); for (auto elem : all_inputs) { input_def_map_.insert({elem->Name(), elem}); - model_input_names_.insert(elem->Name()); } // save outputs diff --git a/onnxruntime/core/session/inference_session.h b/onnxruntime/core/session/inference_session.h index 88e627a8ea55f..af3c59f4d9756 100644 --- a/onnxruntime/core/session/inference_session.h +++ b/onnxruntime/core/session/inference_session.h @@ -57,6 +57,7 @@ struct SessionOptions { // The idea is if the input shapes are the same, we could trace the internal memory allocation // and generate a memory pattern for future request. So next time we could just do one allocation // with a big chunk for all the internal memory allocation. + // See class 'MLValuePatternPlanner'. bool enable_mem_pattern = true; // enable the memory arena on CPU @@ -92,25 +93,25 @@ struct ModelMetadata { }; /** - * @brief This is the main class used to Run a model. - * Sample simple usage: - * CPUExecutionProviderInfo epi; - * ProviderOption po{"CPUExecutionProvider", epi}; - * SessionOptions so(vector{po}); - * InferenceSession session_object{so}; - * common::Status status = session_object.Load(MODEL_URI); - * common::Status status = session_object.Initialize(); - * - * NameMLValMap feeds; - * feeds.insert({}); - * ... - * std::vector output_names; - * output_names.insert(...); - * ... - * std::vector fetches; - * common::Status status = session_object.Run(run_options, feeds, output_names, &fetches); - * process the output here... - */ + * @brief This is the main class used to Run a model. + * Sample simple usage: + * CPUExecutionProviderInfo epi; + * ProviderOption po{"CPUExecutionProvider", epi}; + * SessionOptions so(vector{po}); + * InferenceSession session_object{so}; + * common::Status status = session_object.Load(MODEL_URI); + * common::Status status = session_object.Initialize(); + * + * NameMLValMap feeds; + * feeds.insert({}); + * ... + * std::vector output_names; + * output_names.insert(...); + * ... + * std::vector fetches; + * common::Status status = session_object.Run(run_options, feeds, output_names, &fetches); + * process the output here... + */ class InferenceSession { public: @@ -131,7 +132,7 @@ class InferenceSession { /** * Register an execution provider. If you've one to register, call this before invoking Initialize(). - * The order of invocation indicates the preference order as well. In other words call this method + * The order of invocation indicates the preference order as well. In other words call this method * on your most preferred execution provider first followed by the less preferred ones. * Calling this API is optional in which case onnxruntime will use its internal CPU execution provider. * @return OK if success. @@ -141,7 +142,7 @@ class InferenceSession { /** * Register a graph transformer. If you've one to register, call this before invoking Initialize(). * Calling this API is optional. - * @param[in] - providers Optional. If providers is non-empty this transformer will only to + * @param[in] - providers Optional. If providers is non-empty this transformer will only to applied to nodes which are assigned to given providers. * @param[in] - level Optional. Level to which this transformer should be registered. Default is set to 2. * @return OK if success. 
@@ -160,9 +161,9 @@ class InferenceSession { common::Status AddCustomOpDomains(const std::vector& ops); /** - * Register a custom registry for operator schema and kernels. If you've one to register, + * Register a custom registry for operator schema and kernels. If you've one to register, * call this before invoking Initialize(). - * The order of invocation indicates the reversed preference order: Register your most + * The order of invocation indicates the reversed preference order: Register your most * preferred registry at the end. * Calling this API is optional. * @return OK if success. @@ -185,6 +186,14 @@ class InferenceSession { */ common::Status Load(std::istream& model_istream); + /** + * Load an ONNX model. + * @param model_data Model data buffer + * @param model_data_len Model data buffer size + * @return OK if success. + */ + common::Status Load(const void* model_data, int model_data_len); + /** * Initializes a previously loaded model. Initialization includes but is not * limited to graph transformations, construction of kernels, etc. @@ -193,11 +202,9 @@ class InferenceSession { */ common::Status Initialize(); - common::Status Run(const RunOptions& run_options, - const std::vector& feed_names, - const std::vector& feeds, - const std::vector& output_names, - std::vector* p_fetches); + common::Status Run(const RunOptions& run_options, const std::vector& feed_names, + const std::vector& feeds, const std::vector& output_names, + std::vector* p_fetches); /** * Run a pre-loaded and pre-intialized model. @@ -209,23 +216,20 @@ class InferenceSession { * This should not be changed during execution of this function. * @return OK if success. */ - common::Status Run(const NameMLValMap& feeds, - const std::vector& output_names, - std::vector* p_fetches); + common::Status Run(const NameMLValMap& feeds, const std::vector& output_names, + std::vector* p_fetches); /** - * See Run(const NameMLValMap& feeds, const std::vector& output_names, std::vector* p_fetches) - * for details. - * @param run_options use this to tune the Run call to your needs. - */ - common::Status Run(const RunOptions& run_options, - const NameMLValMap& feeds, - const std::vector& output_names, - std::vector* p_fetches); + * See Run(const NameMLValMap& feeds, const std::vector& output_names, std::vector* p_fetches) + * for details. + * @param run_options use this to tune the Run call to your needs. + */ + common::Status Run(const RunOptions& run_options, const NameMLValMap& feeds, + const std::vector& output_names, std::vector* p_fetches); /** * Creates a new binding object for binding inputs and outputs. - * @param provider_type specifies the location where the inputs need to be potentially copied. + * @param provider_type specifies the location where the inputs need to be potentially copied. * See IOBinding class for more info. */ common::Status NewIOBinding(std::unique_ptr* io_binding); @@ -260,9 +264,9 @@ class InferenceSession { int GetCurrentNumRuns() const; /** - * Start profiling on this inference session. This simply turns on profiling events to be + * Start profiling on this inference session. This simply turns on profiling events to be * recorded. A corresponding EndProfiling has to follow to write profiling data to a file. - *@param file_prefix is the prefix of the profile file. It can include a directory path. + *@param file_prefix is the prefix of the profile file. It can include a directory path. 
*/ void StartProfiling(const std::string& file_prefix); #ifdef _WIN32 @@ -311,9 +315,7 @@ class InferenceSession { // Immutable state for each op in the model. Shared by all executors. SessionState session_state_; - // names of model inputs and outputs used for quick validation. - std::unordered_set required_model_input_names_; - std::unordered_set model_input_names_; + // names of model outputs used for quick validation. std::unordered_set model_output_names_; // The file path of where the model was loaded. e.g. /tmp/test_squeezenet/model.onnx @@ -357,11 +359,9 @@ class InferenceSession { static common::Status CheckTypes(MLDataType actual, MLDataType expected); - common::Status ValidateInputs(const std::vector& feed_names, - const std::vector& feeds); + common::Status ValidateInputs(const std::vector& feed_names, const std::vector& feeds); - common::Status ValidateOutputs(const std::vector& output_names, - const std::vector* p_fetches); + common::Status ValidateOutputs(const std::vector& output_names, const std::vector* p_fetches); common::Status WaitForNotification(Notification* p_executor_done, int64_t timeout_in_ms); @@ -415,7 +415,7 @@ class InferenceSession { InsertCastTransformer insert_cast_transformer_; //CustomRegistry objects own the corresponding KernelRegistry and OnnxRuntimeOpSchemaRegistry objects. - //So its lifetime should be same as its constituents. This vector is to extend the lifetime of the owner. + //So its lifetime should be same as its constituents. This vector is to extend the lifetime of the owner. std::vector> custom_registries_; }; diff --git a/onnxruntime/core/session/onnxruntime_c_api.cc b/onnxruntime/core/session/onnxruntime_c_api.cc index 5e3b8d7ce6d41..048d82ad85067 100644 --- a/onnxruntime/core/session/onnxruntime_c_api.cc +++ b/onnxruntime/core/session/onnxruntime_c_api.cc @@ -32,7 +32,6 @@ using onnxruntime::IAllocator; using onnxruntime::InputDefList; using onnxruntime::MLFloat16; using onnxruntime::MLStatus; -using onnxruntime::MLValue; using onnxruntime::OutputDefList; using onnxruntime::Tensor; using onnxruntime::ToOrtStatus; @@ -70,14 +69,14 @@ struct OrtEnv { return OrtCreateStatus(ORT_RUNTIME_EXCEPTION, ex.what()); \ } -#define TENSOR_READ_API_BEGIN \ - API_IMPL_BEGIN \ - auto v = reinterpret_cast(value); \ +#define TENSOR_READ_API_BEGIN \ + API_IMPL_BEGIN \ + auto v = reinterpret_cast(value); \ auto& tensor = v->Get(); -#define TENSOR_READWRITE_API_BEGIN \ - API_IMPL_BEGIN \ - auto v = reinterpret_cast<::onnxruntime::MLValue*>(value); \ +#define TENSOR_READWRITE_API_BEGIN \ + API_IMPL_BEGIN \ + auto v = reinterpret_cast<::OrtValue*>(value); \ auto tensor = v->GetMutable(); class LoggingWrapper : public ISink { @@ -265,11 +264,11 @@ ORT_API_STATUS_IMPL(OrtCreateTensorWithDataAsOrtValue, _In_ const OrtAllocatorIn return OrtCreateStatus(ORT_NOT_IMPLEMENTED, errmsg.c_str()); } } - std::unique_ptr value = std::make_unique(); + std::unique_ptr value = std::make_unique(); value->Init(tensor.release(), DataTypeImpl::GetType(), DataTypeImpl::GetType()->GetDeleteFunc()); - *out = reinterpret_cast(value.release()); + *out = value.release(); return nullptr; API_IMPL_END } @@ -331,11 +330,11 @@ ORT_API_STATUS_IMPL(OrtCreateTensorAsOrtValue, _Inout_ OrtAllocator* allocator, return OrtCreateStatus(ORT_NOT_IMPLEMENTED, errmsg.c_str()); } } - std::unique_ptr value = std::make_unique(); + std::unique_ptr value = std::make_unique(); value->Init(tensor.release(), DataTypeImpl::GetType(), DataTypeImpl::GetType()->GetDeleteFunc()); - *out = 
reinterpret_cast(value.release()); + *out = value.release(); return nullptr; API_IMPL_END } @@ -364,34 +363,55 @@ ORT_API_STATUS_IMPL(OrtAddCustomOpDomain, _In_ OrtSessionOptions* options, OrtCu API_IMPL_END } +namespace { + template + OrtStatus* CreateSessionImpl(_In_ OrtEnv* env, _In_ const OrtSessionOptions* options, + Loader loader, _Out_ OrtSession** out) { + auto sess = std::make_unique<::onnxruntime::InferenceSession>( + options == nullptr ? onnxruntime::SessionOptions() : options->value, env->loggingManager); + Status status; + if (options != nullptr) { + if (!options->custom_op_domains_.empty()) { + status = sess->AddCustomOpDomains(options->custom_op_domains_); + if (!status.IsOK()) + return ToOrtStatus(status); + } + } + + if (options != nullptr) + for (auto& factory : options->provider_factories) { + auto provider = factory->CreateProvider(); + if (provider) + sess->RegisterExecutionProvider(std::move(provider)); + } + status = loader(*sess); + if (!status.IsOK()) + return ToOrtStatus(status); + status = sess->Initialize(); + if (!status.IsOK()) + return ToOrtStatus(status); + *out = reinterpret_cast(sess.release()); + return nullptr; + } +} + ORT_API_STATUS_IMPL(OrtCreateSession, _In_ OrtEnv* env, _In_ const ORTCHAR_T* model_path, _In_ const OrtSessionOptions* options, _Out_ OrtSession** out) { API_IMPL_BEGIN - auto sess = std::make_unique<::onnxruntime::InferenceSession>( - options == nullptr ? onnxruntime::SessionOptions() : options->value, env->loggingManager); - Status status; - if (options != nullptr) { - if (!options->custom_op_domains_.empty()) { - status = sess->AddCustomOpDomains(options->custom_op_domains_); - if (!status.IsOK()) - return ToOrtStatus(status); - } - } + const auto loader = [model_path](InferenceSession& sess) { + return sess.Load(model_path); + }; + return CreateSessionImpl(env, options, loader, out); + API_IMPL_END +} - if (options != nullptr) - for (auto& factory : options->provider_factories) { - auto provider = factory->CreateProvider(); - if (provider) - sess->RegisterExecutionProvider(std::move(provider)); - } - status = sess->Load(model_path); - if (!status.IsOK()) - return ToOrtStatus(status); - status = sess->Initialize(); - if (!status.IsOK()) - return ToOrtStatus(status); - *out = reinterpret_cast(sess.release()); - return nullptr; +ORT_API_STATUS_IMPL(OrtCreateSessionFromArray, _In_ OrtEnv* env, _In_ const void* model_data, int model_data_len, + _In_ const OrtSessionOptions* options, _Out_ OrtSession** out) { + API_IMPL_BEGIN + const auto loader = [model_data, model_data_len](InferenceSession& sess) { + return sess.Load(model_data, model_data_len); + }; + return CreateSessionImpl(env, options, loader, out); API_IMPL_END } @@ -404,7 +424,7 @@ ORT_API_STATUS_IMPL(OrtRun, _In_ OrtSession* sess, const int queue_id = 0; std::vector feed_names(input_len); - std::vector feeds(input_len); + std::vector feeds(input_len); for (size_t i = 0; i != input_len; ++i) { if (input_names[i] == nullptr || input_names[i][0] == '\0') { @@ -412,10 +432,9 @@ ORT_API_STATUS_IMPL(OrtRun, _In_ OrtSession* sess, } feed_names[i] = input_names[i]; - auto& mlvalue = feeds[i] = *reinterpret_cast(input[i]); + auto& ort_value = feeds[i] = *reinterpret_cast(input[i]); - if (mlvalue.Fence()) - mlvalue.Fence()->BeforeUsingAsInput(onnxruntime::kCpuExecutionProvider, queue_id); + if (ort_value.Fence()) ort_value.Fence()->BeforeUsingAsInput(onnxruntime::kCpuExecutionProvider, queue_id); } // Create output feed @@ -427,10 +446,10 @@ ORT_API_STATUS_IMPL(OrtRun, _In_ OrtSession* 
sess, output_names[i] = output_names1[i]; } - std::vector fetches(output_names_len); + std::vector fetches(output_names_len); for (size_t i = 0; i != output_names_len; ++i) { if (output[i] != nullptr) { - ::onnxruntime::MLValue& value = *reinterpret_cast<::onnxruntime::MLValue*>(output[i]); + ::OrtValue& value = *reinterpret_cast<::OrtValue*>(output[i]); if (value.Fence()) value.Fence()->BeforeUsingAsOutput(onnxruntime::kCpuExecutionProvider, queue_id); fetches[i] = value; @@ -447,11 +466,11 @@ ORT_API_STATUS_IMPL(OrtRun, _In_ OrtSession* sess, if (!status.IsOK()) return ToOrtStatus(status); for (size_t i = 0; i != output_names_len; ++i) { - ::onnxruntime::MLValue& value = fetches[i]; + ::OrtValue& value = fetches[i]; if (value.Fence()) value.Fence()->BeforeUsingAsInput(onnxruntime::kCpuExecutionProvider, queue_id); if (output[i] == nullptr) { - output[i] = reinterpret_cast(new MLValue(value)); + output[i] = new OrtValue(value); } } return nullptr; @@ -512,7 +531,7 @@ ORT_API_STATUS_IMPL(OrtTensorProtoToOrtValue, _In_ const void* input, int input_ if (!proto.ParseFromArray(input, input_len)) { return OrtCreateStatus(ORT_FAIL, "parse input tensor proto failed"); } - std::unique_ptr value = std::make_unique(); + std::unique_ptr value = std::make_unique(); std::unique_ptr del = std::make_unique(); auto status = utils::TensorProtoToMLValue(Env::Default(), input_file_path, proto, @@ -521,7 +540,7 @@ ORT_API_STATUS_IMPL(OrtTensorProtoToOrtValue, _In_ const void* input, int input_ if (!status.IsOK()) { return ToOrtStatus(status); } - *out = reinterpret_cast(value.release()); + *out = value.release(); if (del->f != nullptr) { *deleter = del.release(); } else @@ -626,7 +645,7 @@ static OrtStatus* GetInputOutputNameImpl(_In_ const OrtSession* sess, size_t ind } ORT_API(int, OrtIsTensor, _In_ const OrtValue* value) { - auto v = reinterpret_cast(value); + auto v = reinterpret_cast(value); return v->IsTensor() ? 
1 : 0; } @@ -678,7 +697,7 @@ const int NUM_MAP_INDICES = 2; //////////////////// // OrtGetValueCount template -OrtStatus* OrtGetNumSequenceElements(const MLValue* p_ml_value, size_t* out) { +OrtStatus* OrtGetNumSequenceElements(const OrtValue* p_ml_value, size_t* out) { auto& data = p_ml_value->Get(); *out = data.size(); return nullptr; @@ -689,13 +708,15 @@ static OrtStatus* OrtGetValueCountImpl(const OrtValue* value, size_t* out) { if (value_type == ONNX_TYPE_MAP) { *out = NUM_MAP_INDICES; return nullptr; - } else if (value_type == ONNX_TYPE_SEQUENCE) { - auto v = reinterpret_cast(value); + } + if (value_type == ONNX_TYPE_SEQUENCE) { + auto v = reinterpret_cast(value); auto type = v->Type(); // Note: keep these in sync with the registered types in data_types.h if (type == DataTypeImpl::GetType()) { return OrtGetNumSequenceElements(v, out); - } else if (type == DataTypeImpl::GetType()) { + } + if (type == DataTypeImpl::GetType()) { return OrtGetNumSequenceElements(v, out); } else if (type == DataTypeImpl::GetType()) { return OrtGetNumSequenceElements(v, out); @@ -722,19 +743,18 @@ ORT_API_STATUS_IMPL(OrtGetValueCount, const OrtValue* value, size_t* out) { /////////////////// // OrtGetValue template -static OrtStatus* OrtGetValueImplSeqOfMap(const MLValue* p_ml_value, int index, - OrtValue** out) { +static OrtStatus* OrtGetValueImplSeqOfMap(const OrtValue* p_ml_value, int index, OrtValue** out) { using TKey = typename T::value_type::key_type; using TVal = typename T::value_type::mapped_type; using MapType = std::map; auto& data_vec = p_ml_value->Get(); auto& data_elem = data_vec.at(index); auto copy_data_elem = std::make_unique(data_elem); - std::unique_ptr value = std::make_unique(); + std::unique_ptr value = std::make_unique(); value->Init(copy_data_elem.release(), DataTypeImpl::GetType(), DataTypeImpl::GetType()->GetDeleteFunc()); - *out = reinterpret_cast(value.release()); + *out = value.release(); return nullptr; } @@ -777,7 +797,7 @@ OrtStatus* PopulateTensorWithData(OrtValue* oval, const T* data_elem, size_t num template <> OrtStatus* PopulateTensorWithData(OrtValue* oval, const std::string* data_elem, size_t num_elems) { - auto v = reinterpret_cast(oval); + auto v = reinterpret_cast(oval); auto tensor = v->GetMutable(); auto* dst = tensor->MutableData(); auto len = static_cast(tensor->Shape().Size()); @@ -791,7 +811,7 @@ OrtStatus* PopulateTensorWithData(OrtValue* oval, const std::string } template -OrtStatus* OrtGetValueImplSeqOfPrimitives(const MLValue* p_ml_value, int index, OrtAllocator* allocator, +OrtStatus* OrtGetValueImplSeqOfPrimitives(const OrtValue* p_ml_value, int index, OrtAllocator* allocator, OrtValue** out) { using ElemType = typename T::value_type; auto& data = p_ml_value->Get(); @@ -804,12 +824,13 @@ OrtStatus* OrtGetValueImplSeqOfPrimitives(const MLValue* p_ml_value, int index, static OrtStatus* OrtGetValueImplSeq(const OrtValue* value, int index, OrtAllocator* allocator, OrtValue** out) { - auto p_ml_value = reinterpret_cast(value); + auto p_ml_value = reinterpret_cast(value); auto type = p_ml_value->Type(); // Note: keep these in sync with the registered types in data_types.h if (type == DataTypeImpl::GetType()) { return OrtGetValueImplSeqOfPrimitives(p_ml_value, index, allocator, out); - } else if (type == DataTypeImpl::GetType()) { + } + if (type == DataTypeImpl::GetType()) { return OrtGetValueImplSeqOfPrimitives(p_ml_value, index, allocator, out); } else if (type == DataTypeImpl::GetType()) { return OrtGetValueImplSeqOfPrimitives(p_ml_value, index, 
allocator, out); @@ -825,7 +846,7 @@ static OrtStatus* OrtGetValueImplSeq(const OrtValue* value, int index, OrtAlloca } template -static OrtStatus* OrtGetValueImplMapHelper(const MLValue* p_ml_value, int index, OrtAllocator* allocator, +static OrtStatus* OrtGetValueImplMapHelper(const OrtValue* p_ml_value, int index, OrtAllocator* allocator, OrtValue** out) { using TKey = typename T::key_type; using TVal = typename T::mapped_type; @@ -861,12 +882,13 @@ static OrtStatus* OrtGetValueImplMapHelper(const MLValue* p_ml_value, int index, static OrtStatus* OrtGetValueImplMap(const OrtValue* value, int index, OrtAllocator* allocator, OrtValue** out) { - auto p_ml_value = reinterpret_cast(value); + auto p_ml_value = reinterpret_cast(value); auto type = p_ml_value->Type(); // Note: keep these in sync with the registered types in data_types.h if (type == DataTypeImpl::GetType()) { return OrtGetValueImplMapHelper(p_ml_value, index, allocator, out); - } else if (type == DataTypeImpl::GetType()) { + } + if (type == DataTypeImpl::GetType()) { return OrtGetValueImplMapHelper(p_ml_value, index, allocator, out); } else if (type == DataTypeImpl::GetType()) { return OrtGetValueImplMapHelper(p_ml_value, index, allocator, out); @@ -890,7 +912,8 @@ static OrtStatus* OrtGetValueImpl(const OrtValue* value, int index, OrtAllocator auto value_type = OrtGetValueType(value); if (value_type == ONNX_TYPE_MAP) { return OrtGetValueImplMap(value, index, allocator, out); - } else if (value_type == ONNX_TYPE_SEQUENCE) { + } + if (value_type == ONNX_TYPE_SEQUENCE) { return OrtGetValueImplSeq(value, index, allocator, out); } else { return OrtCreateStatus(ORT_FAIL, "Input is not of type sequence or map."); @@ -907,46 +930,46 @@ ORT_API_STATUS_IMPL(OrtGetValue, const OrtValue* value, int index, OrtAllocator* /////////////////// // OrtCreateValue template -static OrtStatus* OrtCreateValueImplSeqHelperMap(OrtValue** const in, int num_values, OrtValue** out) { +static OrtStatus* OrtCreateValueImplSeqHelperMap(OrtValue** const in, size_t num_values, OrtValue** out) { using SeqType = std::vector; auto vec_ptr = std::make_unique(); vec_ptr->reserve(num_values); - for (int idx = 0; idx < num_values; ++idx) { - auto& m = reinterpret_cast(in[idx])->Get(); + for (size_t idx = 0; idx < num_values; ++idx) { + auto& m = reinterpret_cast(in[idx])->Get(); vec_ptr->push_back(m); } - // create MLValue with this vector - std::unique_ptr value = std::make_unique(); + // create OrtValue with this vector + std::unique_ptr value = std::make_unique(); value->Init(vec_ptr.release(), DataTypeImpl::GetType(), DataTypeImpl::GetType()->GetDeleteFunc()); - *out = reinterpret_cast(value.release()); + *out = value.release(); return nullptr; } template -static OrtStatus* OrtCreateValueImplSeqHelper(OrtValue** const in, int num_values, OrtValue** out) { +static OrtStatus* OrtCreateValueImplSeqHelper(OrtValue** in, size_t num_values, OrtValue** out) { using SeqType = std::vector; auto vec_ptr = std::make_unique(); vec_ptr->reserve(num_values); - for (int idx = 0; idx < num_values; ++idx) { - auto& tensor = reinterpret_cast(in[idx])->Get(); + for (size_t idx = 0; idx < num_values; ++idx) { + auto& tensor = reinterpret_cast(in[idx])->Get(); auto data = tensor.Data(); if (!data) { return OrtCreateStatus(ORT_FAIL, "Encountered nullptr."); } vec_ptr->push_back(*data); } - // create MLValue with this vector - std::unique_ptr value = std::make_unique(); + // create OrtValue with this vector + std::unique_ptr value = std::make_unique(); value->Init(vec_ptr.release(), 
DataTypeImpl::GetType(), DataTypeImpl::GetType()->GetDeleteFunc()); - *out = reinterpret_cast(value.release()); + *out = value.release(); return nullptr; } -static OrtStatus* OrtCreateValueImplSeq(OrtValue** const in, int num_values, OrtValue** out) { +static OrtStatus* OrtCreateValueImplSeq(OrtValue** in, size_t num_values, OrtValue** out) { // We only support limited sequence types. For the sake of simplicity the type of the first // OrtValue* in OrtValue** will determine the type of the vector used to create the output OrtValue // this type should be either a tensor of limited types or map of limited types @@ -960,7 +983,7 @@ static OrtStatus* OrtCreateValueImplSeq(OrtValue** const in, int num_values, Ort // check if all OrtValues in the input array are of the same type // this is because even though the ONNX spec and this API spec supports heterogenous sequences, // only a fixed types are registered in onnxruntime - for (int i = 0; i < num_values; ++i) { + for (size_t i = 0; i < num_values; ++i) { const OrtValue* ov = in[i]; auto ov_type = OrtGetValueType(ov); if (ov_type != first_value_type) { @@ -970,12 +993,13 @@ static OrtStatus* OrtCreateValueImplSeq(OrtValue** const in, int num_values, Ort } // finally create the output vector/MLValue - auto first_mlvalue = reinterpret_cast(ovfirst); + auto first_mlvalue = reinterpret_cast(ovfirst); if (first_value_type == ONNX_TYPE_TENSOR) { auto vec_type = first_mlvalue->Get().DataType(); if (vec_type == DataTypeImpl::GetType()) { return OrtCreateValueImplSeqHelper(in, num_values, out); - } else if (vec_type == DataTypeImpl::GetType()) { + } + if (vec_type == DataTypeImpl::GetType()) { return OrtCreateValueImplSeqHelper(in, num_values, out); } else if (vec_type == DataTypeImpl::GetType()) { return OrtCreateValueImplSeqHelper(in, num_values, out); @@ -988,7 +1012,8 @@ static OrtStatus* OrtCreateValueImplSeq(OrtValue** const in, int num_values, Ort auto map_type = first_mlvalue->Type(); if (map_type == DataTypeImpl::GetType()) { return OrtCreateValueImplSeqHelperMap(in, num_values, out); - } else if (map_type == DataTypeImpl::GetType()) { + } + if (map_type == DataTypeImpl::GetType()) { return OrtCreateValueImplSeqHelperMap(in, num_values, out); } else { return OrtCreateStatus(ORT_FAIL, "Input is not of one of the supported map types."); @@ -1010,12 +1035,12 @@ static OrtStatus* OrtCreateMapMLValue(const Tensor& key_tensor, const Tensor& va for (size_t n = 0; n < num_kv_pairs; ++n, ++key_data, ++value_data) { map_ptr->insert({*key_data, *value_data}); } - // create mlvalue with this map - auto value = std::make_unique(); + // create ort_value with this map + auto value = std::make_unique(); value->Init(map_ptr.release(), DataTypeImpl::GetType(), DataTypeImpl::GetType()->GetDeleteFunc()); - *out = reinterpret_cast(value.release()); + *out = value.release(); return nullptr; } @@ -1025,7 +1050,8 @@ static OrtStatus* OrtCreateValueImplMapHelper(const Tensor& key_tensor, const Te auto value_type = value_tensor.DataType(); if (value_type == DataTypeImpl::GetType()) { return OrtCreateMapMLValue(key_tensor, value_tensor, out); - } else if (value_type == DataTypeImpl::GetType()) { + } + if (value_type == DataTypeImpl::GetType()) { return OrtCreateMapMLValue(key_tensor, value_tensor, out); } else if (value_type == DataTypeImpl::GetType()) { return OrtCreateMapMLValue(key_tensor, value_tensor, out); @@ -1036,18 +1062,18 @@ static OrtStatus* OrtCreateValueImplMapHelper(const Tensor& key_tensor, const Te } } -static OrtStatus* OrtCreateValueImplMap(OrtValue** 
const in, int num_values, OrtValue** out) { +static OrtStatus* OrtCreateValueImplMap(OrtValue** in, size_t num_values, OrtValue** out) { if (num_values != NUM_MAP_INDICES) { return OrtCreateStatus(ORT_FAIL, "For map type num_values MUST be 2"); } const OrtValue* ort_keys = in[0]; - auto p_key_ml_value = reinterpret_cast(ort_keys); + auto p_key_ml_value = reinterpret_cast(ort_keys); auto& key_tensor = p_key_ml_value->Get(); auto key_type = key_tensor.DataType(); const OrtValue* ort_values = in[1]; - auto p_value_ml_value = reinterpret_cast(ort_values); + auto p_value_ml_value = reinterpret_cast(ort_values); auto& value_tensor = p_value_ml_value->Get(); // as per data_types.h, we only support maps of primitive data types. @@ -1069,8 +1095,7 @@ static OrtStatus* OrtCreateValueImplMap(OrtValue** const in, int num_values, Ort return OrtCreateStatus(ORT_FAIL, "Key type is not supported yet."); } -static OrtStatus* OrtCreateValueImpl(OrtValue** const in, int num_values, enum ONNXType value_type, - OrtValue** out) { +static OrtStatus* OrtCreateValueImpl(OrtValue** in, size_t num_values, enum ONNXType value_type, OrtValue** out) { if (num_values <= 0) { return OrtCreateStatus(ORT_FAIL, "Number of values should be at least 1."); } @@ -1083,8 +1108,7 @@ static OrtStatus* OrtCreateValueImpl(OrtValue** const in, int num_values, enum O return OrtCreateStatus(ORT_FAIL, "Input is not of type sequence or map."); } -ORT_API_STATUS_IMPL(OrtCreateValue, OrtValue** const in, int num_values, enum ONNXType value_type, - OrtValue** out) { +ORT_API_STATUS_IMPL(OrtCreateValue, OrtValue** in, size_t num_values, enum ONNXType value_type, OrtValue** out) { API_IMPL_BEGIN return OrtCreateValueImpl(in, num_values, value_type, out); API_IMPL_END @@ -1093,6 +1117,6 @@ ORT_API_STATUS_IMPL(OrtCreateValue, OrtValue** const in, int num_values, enum ON // End support for non-tensor types DEFINE_RELEASE_ORT_OBJECT_FUNCTION(Env, OrtEnv) -DEFINE_RELEASE_ORT_OBJECT_FUNCTION(Value, MLValue) +DEFINE_RELEASE_ORT_OBJECT_FUNCTION(Value, OrtValue) DEFINE_RELEASE_ORT_OBJECT_FUNCTION(RunOptions, OrtRunOptions) DEFINE_RELEASE_ORT_OBJECT_FUNCTION(Session, ::onnxruntime::InferenceSession) diff --git a/onnxruntime/core/util/math.h b/onnxruntime/core/util/math.h index 86edc4884aad5..70d9cb3630dd8 100644 --- a/onnxruntime/core/util/math.h +++ b/onnxruntime/core/util/math.h @@ -76,17 +76,11 @@ void Not(int N, const T* x, T* y, Provider* provider); template void Powx(int N, const T* a, T b, T* y, Provider* provider); -#define DECLARE_BINARY_OP_BINARY_RESULT(name) \ - template \ - void name(const int N, const T* a, const T* b, bool* y, Provider* provider); \ - template \ - void name##ToRow( \ - const int M, \ - const int N, \ - const T* a, \ - const T* b, \ - bool* y, \ - Provider* provider); +#define DECLARE_BINARY_OP_BINARY_RESULT(name) \ + template \ + void name(int N, const T* a, const T* b, bool* y, Provider* provider); \ + template \ + void name##ToRow(int M, int N, const T* a, const T* b, bool* y, Provider* provider); DECLARE_BINARY_OP_BINARY_RESULT(LT); DECLARE_BINARY_OP_BINARY_RESULT(LE); @@ -99,23 +93,15 @@ DECLARE_BINARY_OP_BINARY_RESULT(Xor); #undef DECLARE_BINARY_OP_BINARY_RESULT -#define DECLARE_BINARY_OP(name) \ - template \ - void name(const int N, const T* a, const T* b, T* y, Provider* provider); \ - template \ - void name##ToRow( \ - const int M, \ - const int N, \ - const T* a, \ - const T* b, \ - T* y, \ - Provider* provider); \ - template \ - void name##ToRow( \ - const int M, const int N, const T* x, T* y, Provider* 
provider); \ - template \ - void name##ToCol( \ - const int M, const int N, const T* x, T* y, Provider* provider); +#define DECLARE_BINARY_OP(name) \ + template \ + void name(int N, const T* a, const T* b, T* y, Provider* provider); \ + template \ + void name##ToRow(int M, int N, const T* a, const T* b, T* y, Provider* provider); \ + template \ + void name##ToRow(int M, int N, const T* x, T* y, Provider* provider); \ + template \ + void name##ToCol(int M, int N, const T* x, T* y, Provider* provider); DECLARE_BINARY_OP(Add); DECLARE_BINARY_OP(Sub); @@ -266,8 +252,7 @@ template void Set(int64_t N, T alpha, T* X, Provider* provider); template -void RandUniform(int n, T a, T b, T* r, - Provider* provider); +void RandUniform(int n, T a, T b, const T* r, Provider* provider); template void RandUniformUnique( @@ -280,12 +265,7 @@ void RandUniformUnique( Provider* provider); template -void RandGaussian( - int n, - T mean, - T std, - T* r, - Provider* provider); +void RandGaussian(int n, T mean, T std, const T* r, Provider* provider); // Dot matrix of vector a and b, and writes the result to a single value y. template @@ -359,26 +339,15 @@ struct Im2colNd { template struct Im2colNd { - void operator()( - const T* data_img, - const int64_t* im_shape, - const int64_t* col_shape, - const int64_t /*img_size*/, - const int64_t /*col_size*/, - const int64_t* kernel_shape, - const int64_t* stride, - const int64_t* dilation, - const int64_t* pad, - const int64_t N, - T* data_col, - Provider* /*provider*/, - bool accumulate_output = false, - T padding_value = 0) { + void operator()(const T* data_img, const int64_t* im_shape, const int64_t* col_shape, int64_t /*img_size*/, + int64_t /*col_size*/, const int64_t* kernel_shape, const int64_t* stride, const int64_t* dilation, + const int64_t* pad, int64_t N, T* data_col, Provider* /*provider*/, bool accumulate_output = false, + T padding_value = 0) { int64_t kernel_size = 1; for (int64_t i = 0; i < N; ++i) { kernel_size *= kernel_shape[i]; } - const int64_t channels_col = col_shape[0]; + int64_t channels_col = col_shape[0]; std::vector d_offset(N, 0); std::vector d_iter(N, 0); for (int64_t c_col = 0; c_col < channels_col; ++c_col) { @@ -397,9 +366,8 @@ struct Im2colNd { int64_t index_im = c_col / kernel_size; bool is_padding = false; for (int64_t d_i = 0; d_i < N; ++d_i) { - const int64_t d = d_iter[d_i]; - const int64_t d_im = - d * stride[d_i] - pad[d_i] + d_offset[d_i] * dilation[d_i]; + int64_t d = d_iter[d_i]; + int64_t d_im = d * stride[d_i] - pad[d_i] + d_offset[d_i] * dilation[d_i]; is_padding |= d_im < 0 || d_im >= im_shape[d_i + 1]; index_col *= col_shape[d_i + 1]; index_col += d; @@ -419,7 +387,7 @@ struct Im2colNd { // like counting. incremented = false; for (int64_t d_i = N - 1; d_i >= 0; --d_i) { - const int64_t d_max = col_shape[d_i + 1]; + int64_t d_max = col_shape[d_i + 1]; ORT_ENFORCE(d_iter[d_i] < d_max); if (d_iter[d_i] == d_max - 1) { d_iter[d_i] = 0; diff --git a/onnxruntime/core/util/math_cpu.cc b/onnxruntime/core/util/math_cpu.cc index 4d5cfa1e538c5..3359a2f3343d2 100644 --- a/onnxruntime/core/util/math_cpu.cc +++ b/onnxruntime/core/util/math_cpu.cc @@ -125,22 +125,12 @@ void GemmEigen( // (transpose) if the argument TransA or TransB is set to CblasNoTrans or // CblasTrans, respectively, for each of A and B. 
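The math_cpu.cc hunks that follow reflow the Gemm<> specializations: the float path dispatches to MlasSgemm or cblas_sgemm, while the double and integer paths fall back to GemmEigen. As a reading aid, here is a minimal, self-contained reference of the contract those specializations implement, C = alpha * op(A) * op(B) + beta * C with row-major storage and the same lda/ldb rule the patch passes along; this is an illustrative sketch, not code from the patch.

```cpp
#include <cstdint>

enum TransposeFlag { kNoTrans, kTrans };  // stand-in for CBLAS_TRANSPOSE

// Reference semantics of the Gemm<> specializations above:
//   C (M x N, row-major) = alpha * op(A) * op(B) + beta * C
// where op(X) is X or X^T depending on the flag. lda/ldb follow the same
// "K : M" and "N : K" rule used when calling MlasSgemm / cblas_sgemm.
template <typename T>
void GemmReference(TransposeFlag trans_a, TransposeFlag trans_b,
                   int64_t M, int64_t N, int64_t K, float alpha,
                   const T* A, const T* B, float beta, T* C) {
  const int64_t lda = (trans_a == kNoTrans) ? K : M;
  const int64_t ldb = (trans_b == kNoTrans) ? N : K;
  for (int64_t m = 0; m < M; ++m) {
    for (int64_t n = 0; n < N; ++n) {
      T acc = 0;
      for (int64_t k = 0; k < K; ++k) {
        const T a = (trans_a == kNoTrans) ? A[m * lda + k] : A[k * lda + m];
        const T b = (trans_b == kNoTrans) ? B[k * ldb + n] : B[n * ldb + k];
        acc += a * b;
      }
      C[m * N + n] = static_cast<T>(alpha * acc + beta * C[m * N + n]);
    }
  }
}
```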
template <> -void Gemm( - const CBLAS_TRANSPOSE TransA, - const CBLAS_TRANSPOSE TransB, - const int64_t M, - const int64_t N, - const int64_t K, - const float alpha, - const float* A, - const float* B, - const float beta, - float* C, - CPUMathUtil* /*provider*/, - MLDataType /*math_type*/) { +void Gemm(const CBLAS_TRANSPOSE TransA, const CBLAS_TRANSPOSE TransB, const int64_t M, + const int64_t N, const int64_t K, float alpha, const float* A, const float* B, float beta, + float* C, CPUMathUtil* /*provider*/, MLDataType /*math_type*/) { #if defined(USE_MLAS) - int lda = (int)((TransA == CblasNoTrans) ? K : M); - int ldb = (int)((TransB == CblasNoTrans) ? N : K); + int lda = static_cast((TransA == CblasNoTrans) ? K : M); + int ldb = static_cast((TransB == CblasNoTrans) ? N : K); // TODO: Make this use the operator threadpool MlasSgemm(TransA, TransB, M, N, K, alpha, A, lda, B, ldb, beta, C, N, nullptr); #else @@ -149,111 +139,49 @@ void Gemm( } template <> -void Gemm( - const CBLAS_TRANSPOSE TransA, - const CBLAS_TRANSPOSE TransB, - const int64_t M, - const int64_t N, - const int64_t K, - const float alpha, - const double* A, - const double* B, - const float beta, - double* C, - CPUMathUtil* /*provider*/, - MLDataType /*math_type*/) { +void Gemm(const CBLAS_TRANSPOSE TransA, const CBLAS_TRANSPOSE TransB, const int64_t M, + const int64_t N, const int64_t K, float alpha, const double* A, const double* B, + float beta, double* C, CPUMathUtil* /*provider*/, MLDataType /*math_type*/) { // No double precision Gemm offering from MLAS or MKLDNN. Directly fallback to Eigen. GemmEigen(TransA, TransB, M, N, K, alpha, A, B, beta, C); } template <> -void Gemm( - const CBLAS_TRANSPOSE TransA, - const CBLAS_TRANSPOSE TransB, - const int64_t M, - const int64_t N, - const int64_t K, - const float alpha, - const int32_t* A, - const int32_t* B, - const float beta, - int32_t* C, - CPUMathUtil* /*provider*/, - MLDataType /*math_type*/) { - // No int32_t Gemm offering from MLAS or MKLDNN. Directly fallback to Eigen. - GemmEigen(TransA, TransB, M, N, K, alpha, A, B, beta, C); +void Gemm(const CBLAS_TRANSPOSE TransA, const CBLAS_TRANSPOSE TransB, const int64_t M, + const int64_t N, const int64_t K, float alpha, const int32_t* A, const int32_t* B, + float beta, int32_t* C, CPUMathUtil* /*provider*/, MLDataType /*math_type*/) { + // No int32_t Gemm offering from MLAS or MKLDNN. Directly fallback to Eigen. + GemmEigen(TransA, TransB, M, N, K, alpha, A, B, beta, C); } template <> -void Gemm( - const CBLAS_TRANSPOSE TransA, - const CBLAS_TRANSPOSE TransB, - const int64_t M, - const int64_t N, - const int64_t K, - const float alpha, - const uint32_t* A, - const uint32_t* B, - const float beta, - uint32_t* C, - CPUMathUtil* /*provider*/, - MLDataType /*math_type*/) { - // No uint32_t Gemm offering from MLAS or MKLDNN. Directly fallback to Eigen. - GemmEigen(TransA, TransB, M, N, K, alpha, A, B, beta, C); +void Gemm(const CBLAS_TRANSPOSE TransA, const CBLAS_TRANSPOSE TransB, const int64_t M, + const int64_t N, const int64_t K, float alpha, const uint32_t* A, const uint32_t* B, + float beta, uint32_t* C, CPUMathUtil* /*provider*/, MLDataType /*math_type*/) { + // No uint32_t Gemm offering from MLAS or MKLDNN. Directly fallback to Eigen. 
+ GemmEigen(TransA, TransB, M, N, K, alpha, A, B, beta, C); } template <> -void Gemm( - const CBLAS_TRANSPOSE TransA, - const CBLAS_TRANSPOSE TransB, - const int64_t M, - const int64_t N, - const int64_t K, - const float alpha, - const int64_t* A, - const int64_t* B, - const float beta, - int64_t* C, - CPUMathUtil* /*provider*/, - MLDataType /*math_type*/) { - // No int64_t Gemm offering from MLAS or MKLDNN. Directly fallback to Eigen. - GemmEigen(TransA, TransB, M, N, K, alpha, A, B, beta, C); +void Gemm(const CBLAS_TRANSPOSE TransA, const CBLAS_TRANSPOSE TransB, const int64_t M, + const int64_t N, const int64_t K, float alpha, const int64_t* A, const int64_t* B, + float beta, int64_t* C, CPUMathUtil* /*provider*/, MLDataType /*math_type*/) { + // No int64_t Gemm offering from MLAS or MKLDNN. Directly fallback to Eigen. + GemmEigen(TransA, TransB, M, N, K, alpha, A, B, beta, C); } template <> -void Gemm( - const CBLAS_TRANSPOSE TransA, - const CBLAS_TRANSPOSE TransB, - const int64_t M, - const int64_t N, - const int64_t K, - const float alpha, - const uint64_t* A, - const uint64_t* B, - const float beta, - uint64_t* C, - CPUMathUtil* /*provider*/, - MLDataType /*math_type*/) { - // No uint64_t Gemm offering from MLAS or MKLDNN. Directly fallback to Eigen. - GemmEigen(TransA, TransB, M, N, K, alpha, A, B, beta, C); +void Gemm(const CBLAS_TRANSPOSE TransA, const CBLAS_TRANSPOSE TransB, const int64_t M, + const int64_t N, const int64_t K, float alpha, const uint64_t* A, const uint64_t* B, + float beta, uint64_t* C, CPUMathUtil* /*provider*/, MLDataType /*math_type*/) { + // No uint64_t Gemm offering from MLAS or MKLDNN. Directly fallback to Eigen. + GemmEigen(TransA, TransB, M, N, K, alpha, A, B, beta, C); } template <> -void GemmEx( - const CBLAS_TRANSPOSE TransA, - const CBLAS_TRANSPOSE TransB, - const int M, - const int N, - const int K, - const float alpha, - const float* A, - const int lda, - const float* B, - const int ldb, - const float beta, - float* C, - const int ldc, - CPUMathUtil*) { +void GemmEx(const CBLAS_TRANSPOSE TransA, const CBLAS_TRANSPOSE TransB, int M, int N, int K, + float alpha, const float* A, int lda, const float* B, int ldb, float beta, float* C, + int ldc, CPUMathUtil*) { #if defined(USE_MLAS) MlasSgemm(TransA, TransB, M, N, K, alpha, A, lda, B, ldb, beta, C, ldc, nullptr); #else @@ -306,17 +234,8 @@ void GemmEx( } template <> -void Gemv( - const CBLAS_TRANSPOSE TransA, - const int M, - const int N, - const float alpha, - const float* A, - const float* x, - const float beta, - float* y, - CPUMathUtil* /*provider*/, - MLDataType /*math_type*/) { +void Gemv(const CBLAS_TRANSPOSE TransA, int M, int N, float alpha, const float* A, const float* x, + float beta, float* y, CPUMathUtil* /*provider*/, MLDataType /*math_type*/) { EigenVectorMap y_vec(y, TransA == CblasNoTrans ? 
M : N); if (beta == 0) { // In Caffe2 we often do a lazy initialization, which may contain NaNs in @@ -341,54 +260,43 @@ void Gemv( } } -#define SPECIALIZED_SCALE(T) \ - template <> \ - void Scale( \ - const int n, const float alpha, const T* x, T* y, CPUMathUtil* /*provider*/) { \ - EigenVectorMap(y, n) = ConstEigenVectorMap(x, n) * alpha; \ - } \ - template <> \ - void Scale( \ - const int n, \ - const float* alpha, \ - const T* x, \ - T* y, \ - CPUMathUtil* /*provider*/) { \ - EigenVectorMap(y, n) = ConstEigenVectorMap(x, n) * (*alpha); \ +#define SPECIALIZED_SCALE(T) \ + template <> \ + void Scale(int n, float alpha, const T* x, T* y, CPUMathUtil* /*provider*/) { \ + EigenVectorMap(y, n) = ConstEigenVectorMap(x, n) * alpha; \ + } \ + template <> \ + void Scale(int n, const float* alpha, const T* x, T* y, CPUMathUtil* /*provider*/) { \ + EigenVectorMap(y, n) = ConstEigenVectorMap(x, n) * (*alpha); \ } SPECIALIZED_SCALE(float) #undef SPECIALIZED_SCALE -#define SPECIALIZED_DOT(T) \ - template <> \ - void Dot( \ - const int N, const T* a, const T* b, T* y, \ - CPUMathUtil* /*provider*/) { \ - *y = ConstEigenVectorMap(a, N).dot(ConstEigenVectorMap(b, N)); \ +#define SPECIALIZED_DOT(T) \ + template <> \ + void Dot(int N, const T* a, const T* b, T* y, CPUMathUtil* /*provider*/) { \ + *y = ConstEigenVectorMap(a, N).dot(ConstEigenVectorMap(b, N)); \ } SPECIALIZED_DOT(float) #undef SPECIALIZED_DOT -#define SPECIALIZED_AXPY(T) \ - template <> \ - void Axpy( \ - const int N, const T alpha, const T* x, T* Y, CPUMathUtil* /*provider*/) { \ - EigenVectorMap(Y, N) += ConstEigenVectorMap(x, N) * alpha; \ - } \ - template <> \ - void Axpy( \ - const int N, const T* alpha, const T* x, T* Y, CPUMathUtil* /*provider*/) { \ - EigenVectorMap(Y, N) += ConstEigenVectorMap(x, N) * (*alpha); \ +#define SPECIALIZED_AXPY(T) \ + template <> \ + void Axpy(int N, const T alpha, const T* x, T* Y, CPUMathUtil* /*provider*/) { \ + EigenVectorMap(Y, N) += ConstEigenVectorMap(x, N) * alpha; \ + } \ + template <> \ + void Axpy(int N, const T* alpha, const T* x, T* Y, CPUMathUtil* /*provider*/) { \ + EigenVectorMap(Y, N) += ConstEigenVectorMap(x, N) * (*alpha); \ } SPECIALIZED_AXPY(float) #undef SPECIALIZED_AXPY -#define SPECIALIZED_AXPBY(T) \ - template <> \ - void Axpby(const int N, const T alpha, const T* x, \ - const T beta, T* y, CPUMathUtil* /*context*/) { \ - EigenVectorMap y_vec(y, N); \ - y_vec = y_vec * beta + ConstEigenVectorMap(x, N) * alpha; \ +#define SPECIALIZED_AXPBY(T) \ + template <> \ + void Axpby(int N, const T alpha, const T* x, const T beta, T* y, CPUMathUtil* /*context*/) { \ + EigenVectorMap y_vec(y, N); \ + y_vec = y_vec * beta + ConstEigenVectorMap(x, N) * alpha; \ } SPECIALIZED_AXPBY(float) #undef SPECIALIZED_AXPBY @@ -396,19 +304,9 @@ SPECIALIZED_AXPBY(float) #else // USE_EIGEN_FOR_BLAS template <> -void Gemm( - const CBLAS_TRANSPOSE TransA, - const CBLAS_TRANSPOSE TransB, - const int64_t M, - const int64_t N, - const int64_t K, - const float alpha, - const float* A, - const float* B, - const float beta, - float* C, - CPUMathUtil* /*context*/, - MLDataType /*math_type*/) { +void Gemm(const CBLAS_TRANSPOSE TransA, const CBLAS_TRANSPOSE TransB, const int64_t M, + const int64_t N, const int64_t K, float alpha, const float* A, const float* B, float beta, + float* C, CPUMathUtil* /*context*/, MLDataType /*math_type*/) { int lda = gsl::narrow_cast((TransA == CblasNoTrans) ? K : M); int ldb = gsl::narrow_cast((TransB == CblasNoTrans) ? 
N : K); cblas_sgemm(CblasRowMajor, TransA, TransB, @@ -420,188 +318,101 @@ void Gemm( } template <> -void Gemm( - const CBLAS_TRANSPOSE TransA, - const CBLAS_TRANSPOSE TransB, - const int64_t M, - const int64_t N, - const int64_t K, - const float alpha, - const double* A, - const double* B, - const float beta, - double* C, - CPUMathUtil* /*provider*/, - MLDataType /*math_type*/) { - int lda = gsl::narrow_cast((TransA == CblasNoTrans) ? K : M); - int ldb = gsl::narrow_cast((TransB == CblasNoTrans) ? N : K); - cblas_dgemm(CblasRowMajor, TransA, TransB, - gsl::narrow_cast(M), - gsl::narrow_cast(N), - gsl::narrow_cast(K), - gsl::narrow_cast(alpha), A, lda, B, ldb, - gsl::narrow_cast(beta), C, gsl::narrow_cast(N)); +void Gemm(const CBLAS_TRANSPOSE TransA, const CBLAS_TRANSPOSE TransB, const int64_t M, + const int64_t N, const int64_t K, float alpha, const double* A, const double* B, + float beta, double* C, CPUMathUtil* /*provider*/, MLDataType /*math_type*/) { + int lda = gsl::narrow_cast((TransA == CblasNoTrans) ? K : M); + int ldb = gsl::narrow_cast((TransB == CblasNoTrans) ? N : K); + cblas_dgemm(CblasRowMajor, TransA, TransB, gsl::narrow_cast(M), gsl::narrow_cast(N), + gsl::narrow_cast(K), gsl::narrow_cast(alpha), A, lda, B, ldb, gsl::narrow_cast(beta), + C, gsl::narrow_cast(N)); } template <> -void Gemm( - const CBLAS_TRANSPOSE TransA, - const CBLAS_TRANSPOSE TransB, - const int64_t M, - const int64_t N, - const int64_t K, - const float alpha, - const int32_t* A, - const int32_t* B, - const float beta, - int32_t* C, - CPUMathUtil* /*provider*/, - MLDataType /*math_type*/) { - // No int32_t Gemm offering from MKLML. Directly fallback to Eigen. - GemmEigen(TransA, TransB, M, N, K, alpha, A, B, beta, C); +void Gemm(const CBLAS_TRANSPOSE TransA, const CBLAS_TRANSPOSE TransB, const int64_t M, + const int64_t N, const int64_t K, float alpha, const int32_t* A, const int32_t* B, + float beta, int32_t* C, CPUMathUtil* /*provider*/, MLDataType /*math_type*/) { + // No int32_t Gemm offering from MKLML. Directly fallback to Eigen. + GemmEigen(TransA, TransB, M, N, K, alpha, A, B, beta, C); } template <> -void Gemm( - const CBLAS_TRANSPOSE TransA, - const CBLAS_TRANSPOSE TransB, - const int64_t M, - const int64_t N, - const int64_t K, - const float alpha, - const uint32_t* A, - const uint32_t* B, - const float beta, - uint32_t* C, - CPUMathUtil* /*provider*/, - MLDataType /*math_type*/) { - // No uint32_t Gemm offering from MKLML. Directly fallback to Eigen. - GemmEigen(TransA, TransB, M, N, K, alpha, A, B, beta, C); +void Gemm(const CBLAS_TRANSPOSE TransA, const CBLAS_TRANSPOSE TransB, const int64_t M, + const int64_t N, const int64_t K, float alpha, const uint32_t* A, const uint32_t* B, + float beta, uint32_t* C, CPUMathUtil* /*provider*/, MLDataType /*math_type*/) { + // No uint32_t Gemm offering from MKLML. Directly fallback to Eigen. + GemmEigen(TransA, TransB, M, N, K, alpha, A, B, beta, C); } template <> -void Gemm( - const CBLAS_TRANSPOSE TransA, - const CBLAS_TRANSPOSE TransB, - const int64_t M, - const int64_t N, - const int64_t K, - const float alpha, - const int64_t* A, - const int64_t* B, - const float beta, - int64_t* C, - CPUMathUtil* /*provider*/, - MLDataType /*math_type*/) { - // No int64_t Gemm offering from MKLML. Directly fallback to Eigen. 
- GemmEigen(TransA, TransB, M, N, K, alpha, A, B, beta, C); +void Gemm(const CBLAS_TRANSPOSE TransA, const CBLAS_TRANSPOSE TransB, const int64_t M, + const int64_t N, const int64_t K, float alpha, const int64_t* A, const int64_t* B, + float beta, int64_t* C, CPUMathUtil* /*provider*/, MLDataType /*math_type*/) { + // No int64_t Gemm offering from MKLML. Directly fallback to Eigen. + GemmEigen(TransA, TransB, M, N, K, alpha, A, B, beta, C); } template <> -void Gemm( - const CBLAS_TRANSPOSE TransA, - const CBLAS_TRANSPOSE TransB, - const int64_t M, - const int64_t N, - const int64_t K, - const float alpha, - const uint64_t* A, - const uint64_t* B, - const float beta, - uint64_t* C, - CPUMathUtil* /*provider*/, - MLDataType /*math_type*/) { - // No uint64_t Gemm offering from MKLML. Directly fallback to Eigen. - GemmEigen(TransA, TransB, M, N, K, alpha, A, B, beta, C); +void Gemm(const CBLAS_TRANSPOSE TransA, const CBLAS_TRANSPOSE TransB, const int64_t M, + const int64_t N, const int64_t K, float alpha, const uint64_t* A, const uint64_t* B, + float beta, uint64_t* C, CPUMathUtil* /*provider*/, MLDataType /*math_type*/) { + // No uint64_t Gemm offering from MKLML. Directly fallback to Eigen. + GemmEigen(TransA, TransB, M, N, K, alpha, A, B, beta, C); } template <> -void GemmEx( - const CBLAS_TRANSPOSE TransA, - const CBLAS_TRANSPOSE TransB, - const int M, - const int N, - const int K, - const float alpha, - const float* A, - const int lda, - const float* B, - const int ldb, - const float beta, - float* C, - const int ldc, - CPUMathUtil* /*context*/) { +void GemmEx(const CBLAS_TRANSPOSE TransA, const CBLAS_TRANSPOSE TransB, int M, int N, int K, + float alpha, const float* A, int lda, const float* B, int ldb, float beta, float* C, + int ldc, CPUMathUtil* /*context*/) { cblas_sgemm(CblasRowMajor, TransA, TransB, M, N, K, alpha, A, lda, B, ldb, beta, C, ldc); } template <> -void Gemv( - const CBLAS_TRANSPOSE TransA, - const int M, - const int N, - const float alpha, - const float* A, - const float* x, - const float beta, - float* y, - CPUMathUtil* /*context*/, - MLDataType /*math_type*/) { +void Gemv(const CBLAS_TRANSPOSE TransA, int M, int N, float alpha, const float* A, const float* x, + float beta, float* y, CPUMathUtil* /*context*/, MLDataType /*math_type*/) { cblas_sgemv(CblasRowMajor, TransA, M, N, alpha, A, N, x, 1, beta, y, 1); } -#define CAFFE2_SPECIALIZED_SCALE(T, prefix) \ - template <> \ - void Scale( \ - const int n, const float alpha, const T* x, T* y, CPUMathUtil*) { \ - if (y != x) \ - cblas_##prefix##copy(n, x, 1, y, 1); \ - cblas_##prefix##scal(n, static_cast(alpha), y, 1); \ - } \ - template <> \ - void Scale( \ - const int n, const float* alpha, const T* x, T* y, CPUMathUtil*) { \ - if (y != x) \ - cblas_##prefix##copy(n, x, 1, y, 1); \ - cblas_##prefix##scal(n, static_cast(*alpha), y, 1); \ +#define CAFFE2_SPECIALIZED_SCALE(T, prefix) \ + template <> \ + void Scale(int n, float alpha, const T* x, T* y, CPUMathUtil*) { \ + if (y != x) cblas_##prefix##copy(n, x, 1, y, 1); \ + cblas_##prefix##scal(n, static_cast(alpha), y, 1); \ + } \ + template <> \ + void Scale(int n, const float* alpha, const T* x, T* y, CPUMathUtil*) { \ + if (y != x) cblas_##prefix##copy(n, x, 1, y, 1); \ + cblas_##prefix##scal(n, static_cast(*alpha), y, 1); \ } CAFFE2_SPECIALIZED_SCALE(float, s) #undef CAFFE2_SPECIALIZED_SCALE -#define CAFFE2_SPECIALIZED_DOT(T, prefix) \ - template <> \ - void Dot( \ - const int N, const T* a, const T* b, T* y, CPUMathUtil*) { \ - *y = cblas_##prefix##dot(N, a, 1, b, 1); \ 
+#define CAFFE2_SPECIALIZED_DOT(T, prefix) \ + template <> \ + void Dot(int N, const T* a, const T* b, T* y, CPUMathUtil*) { \ + *y = cblas_##prefix##dot(N, a, 1, b, 1); \ } CAFFE2_SPECIALIZED_DOT(float, s) #undef CAFFE2_SPECIALIZED_DOT -#define CAFFE2_SPECIALIZED_AXPY(T, prefix) \ - template <> \ - void Axpy( \ - const int N, const T alpha, const T* x, T* y, CPUMathUtil*) { \ - cblas_##prefix##axpy(N, alpha, x, 1, y, 1); \ - } \ - template <> \ - void Axpy( \ - const int N, const T* alpha, const T* x, T* y, CPUMathUtil*) { \ - cblas_##prefix##axpy(N, *alpha, x, 1, y, 1); \ +#define CAFFE2_SPECIALIZED_AXPY(T, prefix) \ + template <> \ + void Axpy(int N, const T alpha, const T* x, T* y, CPUMathUtil*) { \ + cblas_##prefix##axpy(N, alpha, x, 1, y, 1); \ + } \ + template <> \ + void Axpy(int N, const T* alpha, const T* x, T* y, CPUMathUtil*) { \ + cblas_##prefix##axpy(N, *alpha, x, 1, y, 1); \ } CAFFE2_SPECIALIZED_AXPY(float, s) #undef CAFFE2_SPECIALIZED_AXPY -#define CAFFE2_SPECIALIZED_AXPBY(T, prefix) \ - template <> \ - void Axpby( \ - const int N, \ - const T alpha, \ - const T* x, \ - const T beta, \ - T* y, \ - CPUMathUtil*) { \ - cblas_##prefix##scal(N, beta, y, 1); \ - cblas_##prefix##axpy(N, alpha, x, 1, y, 1); \ +#define CAFFE2_SPECIALIZED_AXPBY(T, prefix) \ + template <> \ + void Axpby(int N, const T alpha, const T* x, const T beta, T* y, CPUMathUtil*) { \ + cblas_##prefix##scal(N, beta, y, 1); \ + cblas_##prefix##axpy(N, alpha, x, 1, y, 1); \ } CAFFE2_SPECIALIZED_AXPBY(float, s) #undef CAFFE2_SPECIALIZED_AXPBY @@ -609,24 +420,11 @@ CAFFE2_SPECIALIZED_AXPBY(float, s) #endif // USE_EIGEN_FOR_BLAS template <> -void GemmBatched( - const CBLAS_TRANSPOSE TransA, - const CBLAS_TRANSPOSE TransB, - const int A_size, - const int A_batches, - const int B_size, - const int B_batches, - const int M, - const int N, - const int K, - const float /*alpha*/, - const float* A, - const float* B, - const float /*beta*/, - float* C, - CPUMathUtil* provider, - Tensor*, /* scratch */ - MLDataType /* math_type */) { +void GemmBatched(const CBLAS_TRANSPOSE TransA, const CBLAS_TRANSPOSE TransB, int A_size, + int A_batches, int B_size, int B_batches, int M, int N, int K, float /*alpha*/, + const float* A, const float* B, float /*beta*/, float* C, CPUMathUtil* provider, + Tensor*, /* scratch */ + MLDataType /* math_type */) { auto a_offset = A_size / A_batches; auto b_offset = B_size / B_batches; auto y_offset = M * N; @@ -657,10 +455,10 @@ void GemmBatched( // functions. 
//////////////////////////////////////////////////////////////////////////////// -#define DELEGATE_SIMPLE_UNARY_FUNCTION(T, Funcname, expr) \ - template <> \ - void Funcname(const int N, const T* x, T* y, CPUMathUtil*) { \ - EigenVectorMap(y, N) = ConstEigenVectorMap(x, N).array().expr(); \ +#define DELEGATE_SIMPLE_UNARY_FUNCTION(T, Funcname, expr) \ + template <> \ + void Funcname(int N, const T* x, T* y, CPUMathUtil*) { \ + EigenVectorMap(y, N) = ConstEigenVectorMap(x, N).array().expr(); \ } DELEGATE_SIMPLE_UNARY_FUNCTION(float, Exp, exp) DELEGATE_SIMPLE_UNARY_FUNCTION(float, Log, log) @@ -672,34 +470,28 @@ DELEGATE_SIMPLE_UNARY_FUNCTION(float, InvSqrt, rsqrt) DELEGATE_SIMPLE_UNARY_FUNCTION(float, Sqr, square) #undef DELEGATE_SIMPLE_UNARY_FUNCTION -#define DELEGATE_SINCOS_FUNCTION(T) \ - template <> \ - void SinCos( \ - const int N, const T* x, T* ys, T* yc, CPUMathUtil*) { \ - EigenVectorMap(ys, N) = ConstEigenVectorMap(x, N).array().sin(); \ - EigenVectorMap(yc, N) = ConstEigenVectorMap(x, N).array().cos(); \ +#define DELEGATE_SINCOS_FUNCTION(T) \ + template <> \ + void SinCos(int N, const T* x, T* ys, T* yc, CPUMathUtil*) { \ + EigenVectorMap(ys, N) = ConstEigenVectorMap(x, N).array().sin(); \ + EigenVectorMap(yc, N) = ConstEigenVectorMap(x, N).array().cos(); \ } DELEGATE_SINCOS_FUNCTION(float) DELEGATE_SINCOS_FUNCTION(double) #undef DELEGATE_SINCOS_FUNCTION -#define DELEGATE_POWX_FUNCTION(T) \ - template <> \ - void Powx(const int N, const T* a, T b, T* y, CPUMathUtil*) { \ - EigenVectorMap(y, N) = ConstEigenVectorMap(a, N).array().pow(b); \ +#define DELEGATE_POWX_FUNCTION(T) \ + template <> \ + void Powx(int N, const T* a, T b, T* y, CPUMathUtil*) { \ + EigenVectorMap(y, N) = ConstEigenVectorMap(a, N).array().pow(b); \ } DELEGATE_POWX_FUNCTION(float) #undef DELEGATE_POWX_FUNCTION -#define EIGEN_SIMPLE_BINARY_FUNCTION(T, Funcname, expr) \ - template <> \ - void Funcname( \ - const int N, const T* a, const T* b, T* y, \ - CPUMathUtil*) { \ - EigenVectorMap(y, N) = \ - ConstEigenVectorMap(a, N).array() expr \ - ConstEigenVectorMap(b, N) \ - .array(); \ +#define EIGEN_SIMPLE_BINARY_FUNCTION(T, Funcname, expr) \ + template <> \ + void Funcname(int N, const T* a, const T* b, T* y, CPUMathUtil*) { \ + EigenVectorMap(y, N) = ConstEigenVectorMap(a, N).array() expr ConstEigenVectorMap(b, N).array(); \ } #define DEFINE_SIMPLE_BINARY_FUNCTION(Funcname, expr) \ @@ -721,28 +513,18 @@ DEFINE_SIMPLE_BINARY_FUNCTION(Div, /) // Eigen or via custom code. 
//////////////////////////////////////////////////////////////////////////////// -#define SPECIALIZED_REDUCEMIN(T) \ - template <> \ - void ReduceMin( \ - const int N, \ - const T* x, \ - T* y, \ - Tensor* /*scratch_ptr*/, \ - CPUMathUtil* /*context*/) { \ - *y = *std::min_element(x, x + N); \ +#define SPECIALIZED_REDUCEMIN(T) \ + template <> \ + void ReduceMin(int N, const T* x, T* y, Tensor* /*scratch_ptr*/, CPUMathUtil* /*context*/) { \ + *y = *std::min_element(x, x + N); \ } SPECIALIZED_REDUCEMIN(float) #undef SPECIALIZED_REDUCEMIN -#define SPECIALIZED_REDUCEMAX(T) \ - template <> \ - void ReduceMax( \ - const int N, \ - const T* x, \ - T* y, \ - Tensor* /*scratch_ptr*/, \ - CPUMathUtil* /*context*/) { \ - *y = *std::max_element(x, x + N); \ +#define SPECIALIZED_REDUCEMAX(T) \ + template <> \ + void ReduceMax(int N, const T* x, T* y, Tensor* /*scratch_ptr*/, CPUMathUtil* /*context*/) { \ + *y = *std::max_element(x, x + N); \ } SPECIALIZED_REDUCEMAX(float) SPECIALIZED_REDUCEMAX(int32_t) @@ -750,63 +532,50 @@ SPECIALIZED_REDUCEMAX(int64_t) #undef SPECIALIZED_REDUCEMAX -#define SPECIALIZED_ROWWISESUM(T) \ - template <> \ - void RowwiseSum( \ - const int N, const int D, const T* x, T* y, CPUMathUtil*) { \ - EigenVectorMap(y, N) = \ - ConstEigenMatrixMap(x, D, N).colwise().sum(); \ +#define SPECIALIZED_ROWWISESUM(T) \ + template <> \ + void RowwiseSum(int N, int D, const T* x, T* y, CPUMathUtil*) { \ + EigenVectorMap(y, N) = ConstEigenMatrixMap(x, D, N).colwise().sum(); \ } SPECIALIZED_ROWWISESUM(float) #undef SPECIALIZED_ROWWISESUM -#define SPECIALIZED_COLWISESUM(T) \ - template <> \ - void ColwiseSum( \ - const int N, const int D, const T* x, T* y, CPUMathUtil*) { \ - EigenVectorMap(y, D) = \ - ConstEigenMatrixMap(x, D, N).rowwise().sum(); \ +#define SPECIALIZED_COLWISESUM(T) \ + template <> \ + void ColwiseSum(int N, int D, const T* x, T* y, CPUMathUtil*) { \ + EigenVectorMap(y, D) = ConstEigenMatrixMap(x, D, N).rowwise().sum(); \ } SPECIALIZED_COLWISESUM(float) #undef SPECIALIZED_COLWISESUM -#define SPECIALIZED_ROWWISEMAX(T) \ - template <> \ - void RowwiseMax( \ - const int N, const int D, const T* x, T* y, CPUMathUtil*) { \ - EigenVectorMap(y, N) = \ - ConstEigenMatrixMap(x, D, N).colwise().maxCoeff(); \ +#define SPECIALIZED_ROWWISEMAX(T) \ + template <> \ + void RowwiseMax(int N, int D, const T* x, T* y, CPUMathUtil*) { \ + EigenVectorMap(y, N) = ConstEigenMatrixMap(x, D, N).colwise().maxCoeff(); \ } SPECIALIZED_ROWWISEMAX(float) #undef SPECIALIZED_ROWWISEMAX -#define SPECIALIZED_COLWISEMAX(T) \ - template <> \ - void ColwiseMax( \ - const int N, const int D, const T* x, T* y, CPUMathUtil*) { \ - EigenVectorMap(y, D) = \ - ConstEigenMatrixMap(x, D, N).rowwise().maxCoeff(); \ +#define SPECIALIZED_COLWISEMAX(T) \ + template <> \ + void ColwiseMax(int N, int D, const T* x, T* y, CPUMathUtil*) { \ + EigenVectorMap(y, D) = ConstEigenMatrixMap(x, D, N).rowwise().maxCoeff(); \ } SPECIALIZED_COLWISEMAX(float) #undef SPECIALIZED_COLWISEMAX -#define SPECIALIZED_ELEMWISEMAX(T) \ - template <> \ - void ElemwiseMax( \ - const int N, const T* x, const T* y, T* z, CPUMathUtil* /*context*/) { \ - std::transform(x, x + N, y, z, [](const T& x_i, const T& y_i) { \ - return std::max(x_i, y_i); \ - }); \ +#define SPECIALIZED_ELEMWISEMAX(T) \ + template <> \ + void ElemwiseMax(int N, const T* x, const T* y, T* z, CPUMathUtil* /*context*/) { \ + std::transform(x, x + N, y, z, [](const T& x_i, const T& y_i) { return std::max(x_i, y_i); }); \ } SPECIALIZED_ELEMWISEMAX(float) #undef SPECIALIZED_ELEMWISEMAX 
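The reduction macros above (SPECIALIZED_ROWWISESUM, SPECIALIZED_COLWISESUM, SPECIALIZED_ROWWISEMAX, SPECIALIZED_COLWISEMAX) all lean on the row/column swap noted in the surrounding comments: the logical N x D row-major buffer is mapped as a D x N column-major Eigen matrix, so a logical row-wise reduction is written with colwise() and vice versa. A standalone illustration, assuming Eigen is available and using local Map types rather than ORT's EigenVectorMap/ConstEigenMatrixMap aliases:

```cpp
#include <Eigen/Dense>

// x holds N * D floats, row-major: x[i * D + j] is logical row i, column j.
// Mapping the same memory as a D x N column-major matrix makes Eigen column j
// equal to logical row j, so a logical row-wise reduction uses colwise() and a
// logical column-wise reduction uses rowwise(), exactly the swap used above.
void RowwiseSumSketch(int N, int D, const float* x, float* y) {
  Eigen::Map<const Eigen::MatrixXf> m(x, D, N);           // D rows, N cols
  Eigen::Map<Eigen::VectorXf>(y, N) = m.colwise().sum();  // one sum per logical row
}

void ColwiseMaxSketch(int N, int D, const float* x, float* y) {
  Eigen::Map<const Eigen::MatrixXf> m(x, D, N);
  Eigen::Map<Eigen::VectorXf>(y, D) = m.rowwise().maxCoeff();  // one max per logical column
}
```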
-#define SPECIALIZED_MAXIMUM(T) \ - template <> \ - void Maximum( \ - const int N, const float alpha, const T* x, T* y, CPUMathUtil* /*provider*/) { \ - std::transform( \ - x, x + N, y, [&alpha](const T& x_i) { return std::max(x_i, alpha); }); \ +#define SPECIALIZED_MAXIMUM(T) \ + template <> \ + void Maximum(int N, float alpha, const T* x, T* y, CPUMathUtil* /*provider*/) { \ + std::transform(x, x + N, y, [&alpha](const T& x_i) { return std::max(x_i, alpha); }); \ } SPECIALIZED_MAXIMUM(float) #undef SPECIALIZED_MAXIMUM @@ -814,25 +583,19 @@ SPECIALIZED_MAXIMUM(float) // AddToRow and AddToCol adds the corresponding row/col vector b to the matrix a // of shape M x N. The actual implementation uses eigen which is column major, // so notice the row/column swap in the actual implementation. -#define DELEGATE_BROADCAST_BINARY_FUNCTION(T, Funcname, expr) \ - template <> \ - void Funcname##ToRow( \ - const int M, const int N, const T* a, const T* b, T* y, CPUMathUtil*) { \ - EigenArrayMap(y, N, M) = ConstEigenArrayMap(a, N, M).colwise() \ - expr ConstEigenVectorArrayMap(b, N); \ - } \ - /* inplace versions */ \ - template <> \ - void Funcname##ToRow( \ - const int M, const int N, const T* x, T* y, CPUMathUtil*) { \ - EigenArrayMap(y, N, M).colwise() expr## = \ - ConstEigenVectorArrayMap(x, N); \ - } \ - template <> \ - void Funcname##ToCol( \ - const int M, const int N, const T* x, T* y, CPUMathUtil*) { \ - EigenArrayMap(y, N, M).rowwise() expr## = \ - ConstEigenVectorArrayMap(x, M).transpose(); \ +#define DELEGATE_BROADCAST_BINARY_FUNCTION(T, Funcname, expr) \ + template <> \ + void Funcname##ToRow(int M, int N, const T* a, const T* b, T* y, CPUMathUtil*) { \ + EigenArrayMap(y, N, M) = ConstEigenArrayMap(a, N, M).colwise() expr ConstEigenVectorArrayMap(b, N); \ + } \ + /* inplace versions */ \ + template <> \ + void Funcname##ToRow(int M, int N, const T* x, T* y, CPUMathUtil*) { \ + EigenArrayMap(y, N, M).colwise() expr## = ConstEigenVectorArrayMap(x, N); \ + } \ + template <> \ + void Funcname##ToCol(int M, int N, const T* x, T* y, CPUMathUtil*) { \ + EigenArrayMap(y, N, M).rowwise() expr## = ConstEigenVectorArrayMap(x, M).transpose(); \ } #define DEFINE_BROADCAST_BINARY_FUNCTION(name, op) \ @@ -870,25 +633,18 @@ SPECIALIZED_SET(uint8_t); SPECIALIZED_SET(uint16_t); #undef SPECIALIZED_SET -#define INSTANTIATE_BINARY_OP(name, op, T) \ - template <> \ - void name( \ - const int n, const T* a, const T* b, bool* y, CPUMathUtil*) { \ - for (int i = 0; i < n; ++i) { \ - y[i] = a[i] op b[i]; \ - } \ - } \ - template <> \ - void name##ToRow( \ - const int m, \ - const int n, \ - const T* a, \ - const T* b, \ - bool* y, \ - CPUMathUtil*) { \ - for (int i = 0; i < n * m; ++i) { \ - y[i] = a[i] op b[i % n]; \ - } \ +#define INSTANTIATE_BINARY_OP(name, op, T) \ + template <> \ + void name(int n, const T* a, const T* b, bool* y, CPUMathUtil*) { \ + for (int i = 0; i < n; ++i) { \ + y[i] = a[i] op b[i]; \ + } \ + } \ + template <> \ + void name##ToRow(int m, int n, const T* a, const T* b, bool* y, CPUMathUtil*) { \ + for (int i = 0; i < n * m; ++i) { \ + y[i] = a[i] op b[i % n]; \ + } \ } #define DEFINE_BINARY_OP(name, op) \ @@ -906,11 +662,7 @@ INSTANTIATE_BINARY_OP(And, &, bool); INSTANTIATE_BINARY_OP(Xor, ^, bool); template <> -void Not( - const int n, - const bool* x, - bool* y, - CPUMathUtil* /*context*/) { +void Not(int n, const bool* x, bool* y, CPUMathUtil* /*context*/) { for (int i = 0; i < n; ++i) { y[i] = !x[i]; } @@ -919,27 +671,19 @@ void Not( #undef DEFINE_BINARY_OP #undef INSTANTIATE_BINARY_OP 
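The INSTANTIATE_BINARY_OP macro above generates the name##ToRow variants, which broadcast a length-n vector b across an m x n row-major matrix a via the b[i % n] index. A small generic sketch of that broadcast pattern; the template and the std::less usage here are illustrative, not part of the patch:

```cpp
#include <functional>

// Broadcast shape used by the generated name##ToRow functions: 'a' is an
// m x n row-major matrix, 'b' is a length-n vector applied to every row,
// hence the i % n index on b.
template <typename T, typename Op>
void BinaryOpToRowSketch(int m, int n, const T* a, const T* b, bool* y, Op op) {
  for (int i = 0; i < m * n; ++i) {
    y[i] = op(a[i], b[i % n]);
  }
}

// Usage, mirroring what an LTToRow instantiation computes:
//   float a[6] = {1, 2, 3, 4, 5, 6}, b[3] = {2, 2, 2};
//   bool y[6];
//   BinaryOpToRowSketch(2, 3, a, b, y, std::less<float>());  // y = {1,0,0, 0,0,0}
```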
-#define SPECIALIZED_CPU_ADD_STRIPED_BATCH(T) \ - template <> \ - void AddStripedBatch( \ - const int N, \ - const T* first, \ - T* y, \ - const int stripe, \ - const int batch, \ - CPUMathUtil* provider) { \ - for (int j = 0; j < batch; j++) { \ - Add(N, first + j * stripe, y, y, provider); \ - } \ +#define SPECIALIZED_CPU_ADD_STRIPED_BATCH(T) \ + template <> \ + void AddStripedBatch(int N, const T* first, T* y, int stripe, int batch, CPUMathUtil* provider) { \ + for (int j = 0; j < batch; j++) { \ + Add(N, first + j * stripe, y, y, provider); \ + } \ } SPECIALIZED_CPU_ADD_STRIPED_BATCH(float); #undef SPECIALIZED_CPU_ADD_STRIPED_BATCH template <> -void RandUniform( - const int n, const float a, const float b, float* r, - CPUMathUtil* /*provider*/) { +void RandUniform(int n, float a, float b, const float* r, CPUMathUtil* /*provider*/) { std::uniform_real_distribution distribution(a, b); //todo: need implmenet "RandGenerator()" in execution provider ORT_UNUSED_PARAMETER(n); @@ -951,9 +695,7 @@ void RandUniform( } template <> -void RandUniform( - const int n, const int a, const int b, int* r, - CPUMathUtil* /*provider*/) { +void RandUniform(int n, int a, int b, const int* r, CPUMathUtil* /*provider*/) { std::uniform_int_distribution distribution(a, b); //todo: need implmenet "RandGenerator()" in execution provider ORT_UNUSED_PARAMETER(n); @@ -999,9 +741,7 @@ void RandUniform( //#undef CAFFE2_SPECIALIZED_RAND_UNIFORM_UNIQUE template <> -void RandGaussian( - const int n, const float mean, const float std, float* r, - CPUMathUtil* /*provider*/) { +void RandGaussian(int n, float mean, float std, const float* r, CPUMathUtil* /*provider*/) { std::normal_distribution distribution(mean, std); ORT_UNUSED_PARAMETER(n); ORT_UNUSED_PARAMETER(r); @@ -1011,15 +751,10 @@ void RandGaussian( }*/ } -#define SPECIALIZED_SUM(T) \ - template <> \ - void Sum( \ - const int N, \ - const T* x, \ - T* y, \ - CPUMathUtil* /* unused */, \ - Tensor* /* unused */) { \ - *y = ConstEigenVectorMap(x, N).sum(); \ +#define SPECIALIZED_SUM(T) \ + template <> \ + void Sum(int N, const T* x, T* y, CPUMathUtil* /* unused */, Tensor* /* unused */) { \ + *y = ConstEigenVectorMap(x, N).sum(); \ } SPECIALIZED_SUM(float); @@ -1029,23 +764,13 @@ SPECIALIZED_SUM(int64_t); #undef SPECIALIZED_SUM template <> -void SumSqr( - const int N, - const float* x, - float* y, - CPUMathUtil* /*context*/ /* unused */, - Tensor* /*scratch_ptr*/ /* unused */) { +void SumSqr(int N, const float* x, float* y, CPUMathUtil* /*context*/ /* unused */, + Tensor* /*scratch_ptr*/ /* unused */) { *y = ConstEigenVectorMap(x, N).squaredNorm(); } template <> -void Select( - const int N, - const int D, - const float* x, - const int* idx, - float* y, - CPUMathUtil* /*context*/) { +void Select(int N, int D, const float* x, const int* idx, float* y, CPUMathUtil* /*context*/) { for (int i = 0; i < N; ++i) { ORT_ENFORCE(idx[i] < D); y[i] = x[i * D + idx[i]]; @@ -1053,19 +778,11 @@ void Select( } template <> -void Col2imNd( - const float* data_col, - const int64_t* img_shape, - const int64_t* col_shape, - const int64_t img_size, - const int64_t col_size, - const int64_t* kernel_shape, - const int64_t* stride, - const int64_t* dilation, - const int64_t* pad, - const int64_t N, - float* data_img, - CPUMathUtil* context) { +void Col2imNd(const float* data_col, const int64_t* img_shape, + const int64_t* col_shape, int64_t img_size, int64_t col_size, + const int64_t* kernel_shape, const int64_t* stride, + const int64_t* dilation, const int64_t* pad, int64_t N, + float* 
data_img, CPUMathUtil* context) { Set(img_size, 0, data_img, context); Im2colNd()( data_col, @@ -1083,23 +800,14 @@ void Col2imNd( true); } -static void Im2colWithEqualPadding(int64_t output_h, int64_t output_w, const float* data_im, - const int64_t channels, - const int64_t height, - const int64_t width, - const int64_t kernel_h, - const int64_t kernel_w, - const int64_t dilation_h, - const int64_t dilation_w, - const int64_t pad_t, - const int64_t pad_l, - const int64_t stride_h, - const int64_t stride_w, - float* data_col) { +static void Im2colWithEqualPadding(int64_t output_h, int64_t output_w, const float* data_im, int64_t channels, + int64_t height, int64_t width, int64_t kernel_h, int64_t kernel_w, + int64_t dilation_h, int64_t dilation_w, int64_t pad_t, int64_t pad_l, + int64_t stride_h, int64_t stride_w, float* data_col) { // From Intel, https://github.com/BVLC/caffe/pull/3536 - const int64_t pad_h = pad_t; - const int64_t pad_w = pad_l; - const int64_t channel_size = height * width; + int64_t pad_h = pad_t; + int64_t pad_w = pad_l; + int64_t channel_size = height * width; for (int64_t channel = channels; channel--; data_im += channel_size) { for (int64_t kernel_row = 0; kernel_row < kernel_h; kernel_row++) { for (int64_t kernel_col = 0; kernel_col < kernel_w; kernel_col++) { @@ -1127,23 +835,11 @@ static void Im2colWithEqualPadding(int64_t output_h, int64_t output_w, const flo } } template <> -void Im2col( - const float* data_im, - const int64_t channels, - const int64_t height, - const int64_t width, - const int64_t kernel_h, - const int64_t kernel_w, - const int64_t dilation_h, - const int64_t dilation_w, - const int64_t pad_t, - const int64_t pad_l, - const int64_t pad_b, - const int64_t pad_r, - const int64_t stride_h, - const int64_t stride_w, - float* data_col, - CPUMathUtil* /*context*/) { +void Im2col(const float* data_im, int64_t channels, int64_t height, + int64_t width, int64_t kernel_h, int64_t kernel_w, + int64_t dilation_h, int64_t dilation_w, int64_t pad_t, + int64_t pad_l, int64_t pad_b, int64_t pad_r, int64_t stride_h, + int64_t stride_w, float* data_col, CPUMathUtil* /*context*/) { const int64_t output_h = (height + pad_b + pad_t - (dilation_h * (kernel_h - 1) + 1)) / stride_h + 1; @@ -1217,23 +913,11 @@ void Im2col( } template <> -void Im2col( - const float* data_im, - const int64_t channels, - const int64_t height, - const int64_t width, - const int64_t kernel_h, - const int64_t kernel_w, - const int64_t dilation_h, - const int64_t dilation_w, - const int64_t pad_t, - const int64_t pad_l, - const int64_t pad_b, - const int64_t pad_r, - const int64_t stride_h, - const int64_t stride_w, - float* data_col, - CPUMathUtil* /*context*/) { +void Im2col(const float* data_im, int64_t channels, int64_t height, + int64_t width, int64_t kernel_h, int64_t kernel_w, + int64_t dilation_h, int64_t dilation_w, int64_t pad_t, + int64_t pad_l, int64_t pad_b, int64_t pad_r, int64_t stride_h, + int64_t stride_w, float* data_col, CPUMathUtil* /*context*/) { const int64_t dkernel_h = dilation_h * (kernel_h - 1) + 1; const int64_t dkernel_w = dilation_w * (kernel_w - 1) + 1; @@ -1263,23 +947,11 @@ void Im2col( } template <> -void Col2im( - const float* data_col, - const int64_t channels, - const int64_t height, - const int64_t width, - const int64_t kernel_h, - const int64_t kernel_w, - const int64_t dilation_h, - const int64_t dilation_w, - const int64_t pad_t, - const int64_t pad_l, - const int64_t pad_b, - const int64_t pad_r, - const int64_t stride_h, - const int64_t stride_w, - 
float* data_im, - CPUMathUtil* context) { +void Col2im(const float* data_col, int64_t channels, int64_t height, + int64_t width, int64_t kernel_h, int64_t kernel_w, + int64_t dilation_h, int64_t dilation_w, int64_t pad_t, + int64_t pad_l, int64_t pad_b, int64_t pad_r, int64_t stride_h, + int64_t stride_w, float* data_im, CPUMathUtil* context) { const int64_t output_h = (height + pad_b + pad_t - (dilation_h * (kernel_h - 1) + 1)) / stride_h + 1; @@ -1379,23 +1051,11 @@ void Col2im( } template <> -void Col2im( - const float* data_col, - const int64_t channels, - const int64_t height, - const int64_t width, - const int64_t kernel_h, - const int64_t kernel_w, - const int64_t dilation_h, - const int64_t dilation_w, - const int64_t pad_t, - const int64_t pad_l, - const int64_t pad_b, - const int64_t pad_r, - const int64_t stride_h, - const int64_t stride_w, - float* data_im, - CPUMathUtil* context) { +void Col2im(const float* data_col, int64_t channels, int64_t height, + int64_t width, int64_t kernel_h, int64_t kernel_w, + int64_t dilation_h, int64_t dilation_w, int64_t pad_t, + int64_t pad_l, int64_t pad_b, int64_t pad_r, int64_t stride_h, + int64_t stride_w, float* data_im, CPUMathUtil* context) { const int64_t dkernel_h = dilation_h * (kernel_h - 1) + 1; const int64_t dkernel_w = dilation_w * (kernel_w - 1) + 1; @@ -1422,14 +1082,12 @@ void Col2im( } } - -#define SPECIALIZED_COPYVECTOR(T) \ - template <> \ - void CopyVector( \ - const int N, const T* src, T* dst, CPUMathUtil* /*context*/) { \ - if (src != dst && N > 0) { \ - memcpy(dst, src, sizeof(T) * N); \ - } \ +#define SPECIALIZED_COPYVECTOR(T) \ + template <> \ + void CopyVector(int N, const T* src, T* dst, CPUMathUtil* /*context*/) { \ + if (src != dst && N > 0) { \ + memcpy(dst, src, sizeof(T) * N); \ + } \ } SPECIALIZED_COPYVECTOR(float) #undef SPECIALIZED_COPYVECTOR diff --git a/onnxruntime/core/util/protobuf_parsing_utils.cc b/onnxruntime/core/util/protobuf_parsing_utils.cc index 4e46d0efcdb74..20969a140e087 100644 --- a/onnxruntime/core/util/protobuf_parsing_utils.cc +++ b/onnxruntime/core/util/protobuf_parsing_utils.cc @@ -169,11 +169,10 @@ int FileInputStream::CopyingFileInputStream::Read(void* buffer, int size) { int FileInputStream::CopyingFileInputStream::Skip(int count) { GOOGLE_CHECK(!is_closed_); - if (!previous_seek_failed_ && - lseek(file_, count, SEEK_CUR) != (off_t)-1) { + if (!previous_seek_failed_ && lseek(file_, count, SEEK_CUR) != static_cast(-1)) { // Seek succeeded. return count; - } else { + } // Failed to seek. // Note to self: Don't seek again. This file descriptor doesn't @@ -182,7 +181,6 @@ int FileInputStream::CopyingFileInputStream::Skip(int count) { // Use the default implementation. 
return CopyingInputStream::Skip(count); - } } // =================================================================== @@ -317,7 +315,7 @@ IstreamInputStream::CopyingIstreamInputStream::CopyingIstreamInputStream( std::istream* input) : input_(input) {} -IstreamInputStream::CopyingIstreamInputStream::~CopyingIstreamInputStream() {} +IstreamInputStream::CopyingIstreamInputStream::~CopyingIstreamInputStream() = default; int IstreamInputStream::CopyingIstreamInputStream::Read( void* buffer, int size) { @@ -354,8 +352,7 @@ OstreamOutputStream::CopyingOstreamOutputStream::CopyingOstreamOutputStream( std::ostream* output) : output_(output) {} -OstreamOutputStream::CopyingOstreamOutputStream::~CopyingOstreamOutputStream() { -} +OstreamOutputStream::CopyingOstreamOutputStream::~CopyingOstreamOutputStream() = default; bool OstreamOutputStream::CopyingOstreamOutputStream::Write( const void* buffer, int size) { @@ -417,9 +414,8 @@ bool ConcatenatingInputStream::Skip(int count) { int64 ConcatenatingInputStream::ByteCount() const { if (stream_count_ == 0) { return bytes_retired_; - } else { - return bytes_retired_ + streams_[0]->ByteCount(); } + return bytes_retired_ + streams_[0]->ByteCount(); } // =================================================================== @@ -463,19 +459,17 @@ bool LimitingInputStream::Skip(int count) { input_->Skip(static_cast(limit_)); limit_ = 0; return false; - } else { + } if (!input_->Skip(count)) return false; limit_ -= count; return true; - } } int64 LimitingInputStream::ByteCount() const { if (limit_ < 0) { return input_->ByteCount() + limit_ - prior_bytes_read_; - } else { - return input_->ByteCount() - prior_bytes_read_; } + return input_->ByteCount() - prior_bytes_read_; } // =================================================================== diff --git a/onnxruntime/core/util/protobuf_parsing_utils.h b/onnxruntime/core/util/protobuf_parsing_utils.h index 679b9bb42f39a..34c261b79f41e 100644 --- a/onnxruntime/core/util/protobuf_parsing_utils.h +++ b/onnxruntime/core/util/protobuf_parsing_utils.h @@ -83,24 +83,24 @@ class LIBPROTOBUF_EXPORT FileInputStream : public ZeroCopyInputStream { int GetErrno() { return copying_input_.GetErrno(); } // implements ZeroCopyInputStream ---------------------------------- - bool Next(const void** data, int* size); - void BackUp(int count); - bool Skip(int count); - int64 ByteCount() const; + bool Next(const void** data, int* size) override; + void BackUp(int count) override; + bool Skip(int count) override; + int64 ByteCount() const override; private: class LIBPROTOBUF_EXPORT CopyingFileInputStream : public CopyingInputStream { public: CopyingFileInputStream(int file_descriptor); - ~CopyingFileInputStream(); + ~CopyingFileInputStream() override; bool Close(); void SetCloseOnDelete(bool value) { close_on_delete_ = value; } int GetErrno() { return errno_; } // implements CopyingInputStream --------------------------------- - int Read(void* buffer, int size); - int Skip(int count); + int Read(void* buffer, int size) override; + int Skip(int count) override; private: // The file descriptor. @@ -140,7 +140,7 @@ class LIBPROTOBUF_EXPORT FileOutputStream : public ZeroCopyOutputStream { // that should be returned by Next(). Otherwise, a reasonable default // is used. explicit FileOutputStream(int file_descriptor, int block_size = -1); - ~FileOutputStream(); + ~FileOutputStream() override; // Flushes any buffers and closes the underlying file. Returns false if // an error occurs during the process; use GetErrno() to examine the error. 
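Editorial note: the reworked `CopyingFileInputStream::Skip` above keeps protobuf's two-path behaviour: try a cheap `lseek`, and only fall back to the read-based default when the descriptor cannot seek. Below is a minimal standalone sketch of that pattern, assuming a POSIX file descriptor; `SkipBytes` is a hypothetical helper, and the real class additionally records a failed seek in `previous_seek_failed_` so it never retries the seek.

```cpp
#include <unistd.h>   // lseek, read
#include <algorithm>  // std::min

// Hypothetical helper mirroring the Skip() logic: seek when possible,
// otherwise consume and discard the bytes by reading them.
static int SkipBytes(int fd, int count) {
  if (lseek(fd, count, SEEK_CUR) != static_cast<off_t>(-1)) {
    return count;  // Seek succeeded: all bytes skipped in one call.
  }
  // Not seekable (pipe, socket, ...): fall back to reading into scratch space.
  char scratch[4096];
  int skipped = 0;
  while (skipped < count) {
    ssize_t n = read(fd, scratch, std::min<size_t>(sizeof(scratch), count - skipped));
    if (n <= 0) break;  // EOF or error: report how far we got.
    skipped += static_cast<int>(n);
  }
  return skipped;
}
```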
@@ -166,22 +166,22 @@ class LIBPROTOBUF_EXPORT FileOutputStream : public ZeroCopyOutputStream { int GetErrno() { return copying_output_.GetErrno(); } // implements ZeroCopyOutputStream --------------------------------- - bool Next(void** data, int* size); - void BackUp(int count); - int64 ByteCount() const; + bool Next(void** data, int* size) override; + void BackUp(int count) override; + int64 ByteCount() const override; private: class LIBPROTOBUF_EXPORT CopyingFileOutputStream : public CopyingOutputStream { public: CopyingFileOutputStream(int file_descriptor); - ~CopyingFileOutputStream(); + ~CopyingFileOutputStream() override; bool Close(); void SetCloseOnDelete(bool value) { close_on_delete_ = value; } int GetErrno() { return errno_; } // implements CopyingOutputStream -------------------------------- - bool Write(const void* buffer, int size); + bool Write(const void* buffer, int size) override; private: // The file descriptor. @@ -216,19 +216,19 @@ class LIBPROTOBUF_EXPORT IstreamInputStream : public ZeroCopyInputStream { explicit IstreamInputStream(std::istream* stream, int block_size = -1); // implements ZeroCopyInputStream ---------------------------------- - bool Next(const void** data, int* size); - void BackUp(int count); - bool Skip(int count); - int64 ByteCount() const; + bool Next(const void** data, int* size) override; + void BackUp(int count) override; + bool Skip(int count) override; + int64 ByteCount() const override; private: class LIBPROTOBUF_EXPORT CopyingIstreamInputStream : public CopyingInputStream { public: CopyingIstreamInputStream(std::istream* input); - ~CopyingIstreamInputStream(); + ~CopyingIstreamInputStream() override; // implements CopyingInputStream --------------------------------- - int Read(void* buffer, int size); + int Read(void* buffer, int size) override; // (We use the default implementation of Skip().) private: @@ -257,21 +257,21 @@ class LIBPROTOBUF_EXPORT OstreamOutputStream : public ZeroCopyOutputStream { // that should be returned by Next(). Otherwise, a reasonable default // is used. explicit OstreamOutputStream(std::ostream* stream, int block_size = -1); - ~OstreamOutputStream(); + ~OstreamOutputStream() override; // implements ZeroCopyOutputStream --------------------------------- - bool Next(void** data, int* size); - void BackUp(int count); - int64 ByteCount() const; + bool Next(void** data, int* size) override; + void BackUp(int count) override; + int64 ByteCount() const override; private: class LIBPROTOBUF_EXPORT CopyingOstreamOutputStream : public CopyingOutputStream { public: CopyingOstreamOutputStream(std::ostream* output); - ~CopyingOstreamOutputStream(); + ~CopyingOstreamOutputStream() override; // implements CopyingOutputStream -------------------------------- - bool Write(const void* buffer, int size); + bool Write(const void* buffer, int size) override; private: // The stream. 
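Editorial note: the header-only changes in this file add `override` to every reimplemented virtual and default the empty destructors; behaviour is unchanged, but the compiler now rejects any accidental signature drift from the protobuf base classes. A tiny illustration with hypothetical stand-in types:

```cpp
#include <cstdint>

struct ZeroCopyInputLike {                    // stand-in for the protobuf interface
  virtual ~ZeroCopyInputLike() = default;
  virtual bool Skip(int count) = 0;
  virtual std::int64_t ByteCount() const = 0;
};

struct CountingStream : ZeroCopyInputLike {
  bool Skip(int count) override { skipped_ += count; return true; }
  std::int64_t ByteCount() const override { return skipped_; }
  // bool Skip(long count) override;          // error: does not override anything
  std::int64_t skipped_ = 0;
};
```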
@@ -302,10 +302,10 @@ class LIBPROTOBUF_EXPORT ConcatenatingInputStream : public ZeroCopyInputStream { ConcatenatingInputStream(ZeroCopyInputStream* const streams[], int count); // implements ZeroCopyInputStream ---------------------------------- - bool Next(const void** data, int* size); - void BackUp(int count); - bool Skip(int count); - int64 ByteCount() const; + bool Next(const void** data, int* size) override; + void BackUp(int count) override; + bool Skip(int count) override; + int64 ByteCount() const override; private: // As streams are retired, streams_ is incremented and count_ is @@ -324,13 +324,13 @@ class LIBPROTOBUF_EXPORT ConcatenatingInputStream : public ZeroCopyInputStream { class LIBPROTOBUF_EXPORT LimitingInputStream : public ZeroCopyInputStream { public: LimitingInputStream(ZeroCopyInputStream* input, int64 limit); - ~LimitingInputStream(); + ~LimitingInputStream() override; // implements ZeroCopyInputStream ---------------------------------- - bool Next(const void** data, int* size); - void BackUp(int count); - bool Skip(int count); - int64 ByteCount() const; + bool Next(const void** data, int* size) override; + void BackUp(int count) override; + bool Skip(int count) override; + int64 ByteCount() const override; private: ZeroCopyInputStream* input_; diff --git a/onnxruntime/python/onnxruntime_pybind_mlvalue.cc b/onnxruntime/python/onnxruntime_pybind_mlvalue.cc index 508c491d377b2..baaae6166b26e 100644 --- a/onnxruntime/python/onnxruntime_pybind_mlvalue.cc +++ b/onnxruntime/python/onnxruntime_pybind_mlvalue.cc @@ -24,12 +24,14 @@ int OnnxRuntimeTensorToNumpyType(const DataTypeImpl* tensor_type) { static std::map type_map{ {DataTypeImpl::GetType(), NPY_BOOL}, {DataTypeImpl::GetType(), NPY_FLOAT}, + {DataTypeImpl::GetType(), NPY_FLOAT16}, {DataTypeImpl::GetType(), NPY_DOUBLE}, - {DataTypeImpl::GetType(), NPY_INT}, {DataTypeImpl::GetType(), NPY_INT8}, {DataTypeImpl::GetType(), NPY_UINT8}, {DataTypeImpl::GetType(), NPY_INT16}, {DataTypeImpl::GetType(), NPY_UINT16}, + {DataTypeImpl::GetType(), NPY_INT}, + {DataTypeImpl::GetType(), NPY_UINT}, {DataTypeImpl::GetType(), NPY_LONGLONG}, {DataTypeImpl::GetType(), NPY_ULONGLONG}, {DataTypeImpl::GetType(), NPY_OBJECT}, @@ -47,15 +49,32 @@ const DataTypeImpl* NumpyToOnnxRuntimeTensorType(int numpy_type) { static std::map type_map{ {NPY_BOOL, DataTypeImpl::GetType()}, {NPY_FLOAT, DataTypeImpl::GetType()}, + // Special, not a C type expands to enum value of 16 + {NPY_FLOAT16, DataTypeImpl::GetType()}, {NPY_DOUBLE, DataTypeImpl::GetType()}, - {NPY_INT, DataTypeImpl::GetType()}, - {NPY_INT8, DataTypeImpl::GetType()}, - {NPY_UINT8, DataTypeImpl::GetType()}, - {NPY_INT16, DataTypeImpl::GetType()}, - {NPY_UINT16, DataTypeImpl::GetType()}, + // We don't want to use size specific types such + // as NPY_INT32 bc they are not enums but hash defines + // which may map into other enums and may conflict with other entries here + // also NPY docs define these sizes as platform specific, thus we + // choose to do some rudimentary checks for proper mapping on C++ size + {NPY_BYTE, DataTypeImpl::GetType()}, + {NPY_UBYTE, DataTypeImpl::GetType()}, + {NPY_SHORT, sizeof(short) == sizeof(int16_t) ? DataTypeImpl::GetType() + : DataTypeImpl::GetType()}, + {NPY_USHORT, sizeof(unsigned short) == sizeof(uint16_t) ? DataTypeImpl::GetType() + : DataTypeImpl::GetType()}, + {NPY_INT, + sizeof(int) == sizeof(int32_t) ? DataTypeImpl::GetType() + : DataTypeImpl::GetType()}, + {NPY_UINT, sizeof(int) == sizeof(int32_t) ? 
DataTypeImpl::GetType() + : DataTypeImpl::GetType()}, + {NPY_LONG, - sizeof(long) == sizeof(int) ? DataTypeImpl::GetType() - : DataTypeImpl::GetType()}, + sizeof(long) == sizeof(int32_t) ? DataTypeImpl::GetType() + : DataTypeImpl::GetType()}, + {NPY_ULONG, + sizeof(unsigned long) == sizeof(uint32_t) ? DataTypeImpl::GetType() + : DataTypeImpl::GetType()}, {NPY_LONGLONG, DataTypeImpl::GetType()}, {NPY_ULONGLONG, DataTypeImpl::GetType()}, {NPY_UNICODE, DataTypeImpl::GetType()}, @@ -76,7 +95,8 @@ bool PyObjectCheck_Array(PyObject* o) { return PyObject_HasAttrString(o, "__array_finalize__"); } -void CreateTensorMLValue(AllocatorPtr alloc, const std::string& name_input, PyArrayObject* pyObject, MLValue* p_mlvalue) { +void CreateTensorMLValue(AllocatorPtr alloc, const std::string& name_input, PyArrayObject* pyObject, + OrtValue* p_mlvalue) { PyArrayObject* darray = PyArray_GETCONTIGUOUS(pyObject); if (darray == NULL) { throw std::runtime_error(std::string("The object must be a contiguous array for input '") + name_input + std::string("'.")); @@ -229,9 +249,8 @@ void CreateMapMLValue_LoopIntoMap(Py_ssize_t& pos, PyObject*& key, const std::st template void CreateMapMLValue_Map(Py_ssize_t& pos, PyObject*& key, const std::string& name_input, PyObject*& value, - PyObject* item, - AllocatorPtr /*alloc*/, MLValue* p_mlvalue, - KeyGetterType keyGetter, ValueGetterType valueGetter) { + PyObject* item, AllocatorPtr /*alloc*/, OrtValue* p_mlvalue, KeyGetterType keyGetter, + ValueGetterType valueGetter) { std::unique_ptr> dst; dst = std::make_unique>(); CreateMapMLValue_LoopIntoMap(pos, key, name_input, value, item, *dst, keyGetter, valueGetter); @@ -241,8 +260,7 @@ void CreateMapMLValue_Map(Py_ssize_t& pos, PyObject*& key, const std::string& na template void CreateMapMLValue_VectorMap(Py_ssize_t& pos, PyObject*& key, const std::string& name_input, PyObject*& value, - PyObject* iterator, PyObject* item, - AllocatorPtr /*alloc*/, MLValue* p_mlvalue, + PyObject* iterator, PyObject* item, AllocatorPtr /*alloc*/, OrtValue* p_mlvalue, KeyGetterType keyGetter, ValueGetterType valueGetter) { std::unique_ptr>> dstVector; dstVector = std::make_unique>>(); @@ -259,8 +277,7 @@ void CreateMapMLValue_VectorMap(Py_ssize_t& pos, PyObject*& key, const std::stri } void CreateMapMLValue_AgnosticMap(Py_ssize_t& pos, PyObject*& key, const std::string& name_input, PyObject*& value, - PyObject* iterator, PyObject* item, - AllocatorPtr alloc, MLValue* p_mlvalue) { + PyObject* iterator, PyObject* item, AllocatorPtr alloc, OrtValue* p_mlvalue) { // If iterator is NULL, it returns a single Map, // if is not NULL, it returns a VectorMap. auto int64Getter = [](PyObject* obj, int64_t& value) -> bool { @@ -330,7 +347,8 @@ void CreateMapMLValue_AgnosticMap(Py_ssize_t& pos, PyObject*& key, const std::st } } -void CreateMapMLValue_AgnosticVectorMap(PyObject* iterator, PyObject* item, AllocatorPtr alloc, const std::string& name_input, MLValue* p_mlvalue) { +void CreateMapMLValue_AgnosticVectorMap(PyObject* iterator, PyObject* item, AllocatorPtr alloc, + const std::string& name_input, OrtValue* p_mlvalue) { // CreateMapMLValue is called by CreateGenericTerableMLValue // or CreateGenericMLValue which ensures // item is a dictionary, no need to check type again. 
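Editorial note: the comment above explains why the map is now keyed on `NPY_BYTE`/`NPY_SHORT`/`NPY_INT`/`NPY_LONG` rather than the size-specific defines: those constants name C types whose widths differ between ABIs (LLP64 Windows vs. LP64 Linux/macOS), so the ONNX Runtime element type has to be chosen by `sizeof`, not by name. A compile-time sketch of the same rule, independent of the NumPy headers:

```cpp
#include <cstdint>
#include <type_traits>

// NPY_LONG means "C long"; pick the fixed-width element type by width.
using NpyLongEquivalent =
    std::conditional_t<sizeof(long) == sizeof(std::int32_t), std::int32_t, std::int64_t>;

static_assert(sizeof(NpyLongEquivalent) == sizeof(long),
              "NPY_LONG must map to an integer type of the same width");
```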
@@ -354,9 +372,10 @@ void CreateMapMLValue_AgnosticVectorMap(PyObject* iterator, PyObject* item, Allo } } -void CreateGenericIterableMLValue(PyObject* iterator, AllocatorPtr alloc, const std::string& name_input, MLValue* p_mlvalue) { +void CreateGenericIterableMLValue(PyObject* iterator, AllocatorPtr alloc, const std::string& name_input, + OrtValue* p_mlvalue) { PyObject* item; - MLValue ml_value; + OrtValue ml_value; item = PyIter_Next(iterator); if (item == NULL) { throw std::runtime_error("Input '" + name_input + "' must not be empty."); @@ -380,7 +399,7 @@ void CreateGenericIterableMLValue(PyObject* iterator, AllocatorPtr alloc, const } } -void CreateGenericMLValue(AllocatorPtr alloc, const std::string& name_input, py::object& value, MLValue* p_mlvalue) { +void CreateGenericMLValue(AllocatorPtr alloc, const std::string& name_input, py::object& value, OrtValue* p_mlvalue) { if (PyObjectCheck_Array(value.ptr())) { // The most frequent case: input comes as an array. PyArrayObject* arr = reinterpret_cast(value.ptr()); diff --git a/onnxruntime/python/onnxruntime_pybind_mlvalue.h b/onnxruntime/python/onnxruntime_pybind_mlvalue.h index 8b47883700e8f..d37c772e06527 100644 --- a/onnxruntime/python/onnxruntime_pybind_mlvalue.h +++ b/onnxruntime/python/onnxruntime_pybind_mlvalue.h @@ -24,7 +24,7 @@ namespace py = pybind11; int OnnxRuntimeTensorToNumpyType(const DataTypeImpl* tensor_type); -void CreateGenericMLValue(AllocatorPtr alloc, const std::string& name_input, py::object& value, MLValue* p_mlvalue); +void CreateGenericMLValue(AllocatorPtr alloc, const std::string& name_input, py::object& value, OrtValue* p_mlvalue); } // namespace python } // namespace onnxruntime diff --git a/onnxruntime/python/onnxruntime_pybind_state.cc b/onnxruntime/python/onnxruntime_pybind_state.cc index 644c361fa7c3d..b4489b89d7c2d 100644 --- a/onnxruntime/python/onnxruntime_pybind_state.cc +++ b/onnxruntime/python/onnxruntime_pybind_state.cc @@ -111,10 +111,10 @@ static const SessionOptions& GetDefaultCPUSessionOptions() { } template -void AddNonTensor(onnxruntime::MLValue& val, vector& pyobjs) { +void AddNonTensor(OrtValue& val, vector& pyobjs) { pyobjs.push_back(py::cast(val.Get())); } -void AddNonTensorAsPyObj(onnxruntime::MLValue& val, vector& pyobjs) { +void AddNonTensorAsPyObj(OrtValue& val, vector& pyobjs) { // Should be in sync with core/framework/datatypes.h if (val.Type() == DataTypeImpl::GetType()) { AddNonTensor(val, pyobjs); @@ -149,7 +149,7 @@ void AddNonTensorAsPyObj(onnxruntime::MLValue& val, vector& pyobjs) } } -void AddTensorAsPyObj(onnxruntime::MLValue& val, vector& pyobjs) { +void AddTensorAsPyObj(OrtValue& val, vector& pyobjs) { const Tensor& rtensor = val.Get(); std::vector npy_dims; const TensorShape& shape = rtensor.Shape(); @@ -361,8 +361,6 @@ void addOpSchemaSubmodule(py::module& m){ #endif //onnxruntime_PYBIND_EXPORT_OPSCHEMA void addObjectMethods(py::module& m) { - // allow unit tests to redirect std::cout and std::cerr to sys.stdout and sys.stderr - py::add_ostream_redirect(m, "onnxruntime_ostream_redirect"); py::class_(m, "SessionOptions", R"pbdoc(Configuration information for a session.)pbdoc") .def(py::init()) .def_readwrite("enable_cpu_mem_arena", &SessionOptions::enable_cpu_mem_arena, @@ -491,7 +489,7 @@ including arg name, arg type (contains both type and shape).)pbdoc") .def("run", [](InferenceSession* sess, std::vector output_names, std::map pyfeeds, RunOptions* run_options = nullptr) -> std::vector { NameMLValMap feeds; for (auto _ : pyfeeds) { - MLValue ml_value; + OrtValue 
ml_value; CreateGenericMLValue(GetAllocator(), _.first, _.second, &ml_value); if (PyErr_Occurred()) { PyObject *ptype, *pvalue, *ptraceback; @@ -509,13 +507,17 @@ including arg name, arg type (contains both type and shape).)pbdoc") feeds.insert(std::make_pair(_.first, ml_value)); } - std::vector fetches; + std::vector fetches; common::Status status; - if (run_options != nullptr) { - status = sess->Run(*run_options, feeds, output_names, &fetches); - } else { - status = sess->Run(feeds, output_names, &fetches); + { + // release GIL to allow multiple python threads to invoke Run() in parallel. + py::gil_scoped_release release; + if (run_options != nullptr) { + status = sess->Run(*run_options, feeds, output_names, &fetches); + } else { + status = sess->Run(feeds, output_names, &fetches); + } } if (!status.IsOK()) { diff --git a/onnxruntime/python/tools/gen_doc.py b/onnxruntime/python/tools/gen_doc.py deleted file mode 100644 index ff84b69a6aacd..0000000000000 --- a/onnxruntime/python/tools/gen_doc.py +++ /dev/null @@ -1,379 +0,0 @@ -#!/usr/bin/env python - -# This file is copied and adapted from https://github.com/onnx/onnx repository. -# There was no copyright statement on the file at the time of copying. -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function -from __future__ import unicode_literals - -from collections import defaultdict -import io -import os -import sys -import argparse - -import numpy as np # type: ignore - -import onnxruntime as rt -import onnxruntime.capi.onnxruntime_pybind11_state as rtpy -from onnxruntime.capi.onnxruntime_pybind11_state import schemadef -from onnxruntime.capi.onnxruntime_pybind11_state.schemadef import OpSchema #, ONNX_DOMAIN, ONNX_ML_DOMAIN -from typing import Any, Text, Sequence, Dict, List, Type, Set, Tuple - - -ONNX_ML = not bool(os.getenv('ONNX_ML') == '0') -ONNX_DOMAIN = "onnx" -ONNX_ML_DOMAIN = "onnx-ml" - -if ONNX_ML: - ext = '-ml.md' -else: - ext = '.md' - - -def display_number(v): # type: (int) -> Text - if OpSchema.is_infinite(v): - return '∞' - return Text(v) - - -def should_render_domain(domain): # type: (Text) -> bool - if domain == ONNX_DOMAIN or domain == '' or domain == ONNX_ML_DOMAIN or domain == 'ai.onnx.ml': - return False - return True - - -def format_name_with_domain(domain, schema_name): # type: (Text, Text) -> Text - if domain: - return '{}.{}'.format(domain, schema_name) - else: - return schema_name - - -def display_attr_type(v): # type: (OpSchema.AttrType) -> Text - assert isinstance(v, OpSchema.AttrType) - s = Text(v) - s = s[s.rfind('.') + 1:].lower() - if s[-1] == 's': - s = 'list of ' + s - return s - - -def display_domain(domain): # type: (Text) -> Text - if domain: - return "the '{}' operator set".format(domain) - else: - return "the default ONNX operator set" - - -def display_domain_short(domain): # type: (Text) -> Text - if domain: - return domain - else: - return 'ai.onnx (default)' - - -def display_version_link(name, version): # type: (Text, int) -> Text - changelog_md = 'Changelog' + ext - name_with_ver = '{}-{}'.format(name, version) - return '{}'.format(changelog_md, name_with_ver, name_with_ver) - - -def display_function_version_link(name, version): # type: (Text, int) -> Text - changelog_md = 'FunctionsChangelog' + ext - name_with_ver = '{}-{}'.format(name, version) - return '{}'.format(changelog_md, name_with_ver, name_with_ver) - - -def get_attribute_value(attr): # type: (AttributeProto) -> Any - if attr.HasField('f'): - return attr.f - elif 
attr.HasField('i'): - return attr.i - elif attr.HasField('s'): - return attr.s - elif attr.HasField('t'): - return attr.t - elif attr.HasField('g'): - return attr.g - elif len(attr.floats): - return list(attr.floats) - elif len(attr.ints): - return list(attr.ints) - elif len(attr.strings): - return list(attr.strings) - elif len(attr.tensors): - return list(attr.tensors) - elif len(attr.graphs): - return list(attr.graphs) - else: - raise ValueError("Unsupported ONNX attribute: {}".format(attr)) - -def display_schema(schema, versions): # type: (OpSchema, Sequence[OpSchema]) -> Text - s = '' - - # doc - if schema.doc: - s += '\n' - s += '\n'.join(' ' + line - for line in schema.doc.lstrip().splitlines()) - s += '\n' - - # since version - s += '\n#### Version\n' - if schema.support_level == OpSchema.SupportType.EXPERIMENTAL: - s += '\nNo versioning maintained for experimental ops.' - else: - s += '\nThis version of the operator has been ' + ('deprecated' if schema.deprecated else 'available') + ' since version {}'.format(schema.since_version) - s += ' of {}.\n'.format(display_domain(schema.domain)) - if len(versions) > 1: - # TODO: link to the Changelog.md - s += '\nOther versions of this operator: {}\n'.format( - ', '.join(display_version_link(format_name_with_domain(v.domain, v.name), - v.since_version) for v in versions[:-1])) - - # If this schema is deprecated, don't display any of the following sections - if schema.deprecated: - return s - - # attributes - if schema.attributes: - s += '\n#### Attributes\n\n' - s += '
<dl>\n'
-        for _, attr in sorted(schema.attributes.items()):
-            # option holds either required or default value
-            opt = ''
-            if attr.required:
-                opt = 'required'
-            elif hasattr(attr, 'default_value') and attr.default_value.name:
-                default_value = get_attribute_value(attr.default_value)
-
-                def format_value(value):  # type: (Any) -> Text
-                    if isinstance(value, float):
-                        value = np.round(value, 5)
-                    if isinstance(value, (bytes, bytearray)) and sys.version_info[0] == 3:
-                        value = value.decode('utf-8')
-                    return str(value)
-
-                if isinstance(default_value, list):
-                    default_value = [format_value(val) for val in default_value]
-                else:
-                    default_value = format_value(default_value)
-                opt = 'default is {}'.format(default_value)
-
-            s += '<dt><tt>{}</tt> : {}{}</dt>\n'.format(
-                attr.name,
-                display_attr_type(attr.type),
-                ' ({})'.format(opt) if opt else '')
-            s += '<dd>{}</dd>\n'.format(attr.description)
-        s += '</dl>\n'
-
-    # inputs
-    s += '\n#### Inputs'
-    if schema.min_input != schema.max_input:
-        s += ' ({} - {})'.format(display_number(schema.min_input),
-                                 display_number(schema.max_input))
-    s += '\n\n'
-    if schema.inputs:
-        s += '<dl>\n'
-        for input in schema.inputs:
-            option_str = ""
-            if OpSchema.FormalParameterOption.Optional == input.option:
-                option_str = " (optional)"
-            elif OpSchema.FormalParameterOption.Variadic == input.option:
-                if input.isHomogeneous:
-                    option_str = " (variadic)"
-                else:
-                    option_str = " (variadic, heterogeneous)"
-            s += '<dt><tt>{}</tt>{} : {}</dt>\n'.format(input.name, option_str, input.typeStr)
-            s += '<dd>{}</dd>\n'.format(input.description)
-        s += '</dl>\n'
-
-    # outputs
-    s += '\n#### Outputs'
-    if schema.min_output != schema.max_output:
-        s += ' ({} - {})'.format(display_number(schema.min_output),
-                                 display_number(schema.max_output))
-    s += '\n\n'
-
-    if schema.outputs:
-        s += '<dl>\n'
-        for output in schema.outputs:
-            option_str = ""
-            if OpSchema.FormalParameterOption.Optional == output.option:
-                option_str = " (optional)"
-            elif OpSchema.FormalParameterOption.Variadic == output.option:
-                if output.isHomogeneous:
-                    option_str = " (variadic)"
-                else:
-                    option_str = " (variadic, heterogeneous)"
-            s += '<dt><tt>{}</tt>{} : {}</dt>\n'.format(output.name, option_str, output.typeStr)
-            s += '<dd>{}</dd>\n'.format(output.description)
-        s += '</dl>\n'
-
-    # type constraints
-    s += '\n#### Type Constraints'
-    s += '\n\n'
-    if schema.type_constraints:
-        s += '<dl>\n'
-        for type_constraint in schema.type_constraints:
-            allowedTypes = type_constraint.allowed_type_strs
-            allowedTypeStr = ''
-            if (len(allowedTypes) > 0):
-                allowedTypeStr = allowedTypes[0]
-            for allowedType in allowedTypes[1:]:
-                allowedTypeStr += ', ' + allowedType
-            s += '<dt><tt>{}</tt> : {}</dt>\n'.format(
-                type_constraint.type_param_str, allowedTypeStr)
-            s += '<dd>{}</dd>\n'.format(type_constraint.description)
-        s += '</dl>\n'
-
-    return s
-
-
-def display_function(function, versions, domain=ONNX_DOMAIN):  # type: (FunctionProto, List[int], Text) -> Text
-    s = ''
-
-    if domain:
-        domain_prefix = '{}.'.format(ONNX_ML_DOMAIN)
-    else:
-        domain_prefix = ''
-
-    # doc
-    if function.doc_string:
-        s += '\n'
-        s += '\n'.join('  ' + line
-                       for line in function.doc_string.lstrip().splitlines())
-        s += '\n'
-
-    # since version
-    s += '\n#### Version\n'
-    s += '\nThis version of the function has been available since version {}'.format(function.since_version)
-    s += ' of {}.\n'.format(display_domain(domain_prefix))
-    if len(versions) > 1:
-        s += '\nOther versions of this function: {}\n'.format(
-            ', '.join(display_function_version_link(domain_prefix + function.name, v) for v in versions if v != function.since_version))
-
-    # inputs
-    s += '\n#### Inputs'
-    s += '\n\n'
-    if function.input:
-        s += '<dl>\n'
-        for input in function.input:
-            s += '<dt>{};</dt>\n'.format(input)
-        s += '</dl>\n'
-
-    # outputs
-    s += '\n#### Outputs'
-    s += '\n\n'
-    if function.output:
-        s += '<dl>\n'
-        for output in function.output:
-            s += '<dt>{};</dt>\n'.format(output)
-        s += '</dl>\n'
-
-    # attributes
-    if function.attribute:
-        s += '\n#### Attributes\n\n'
-        s += '<dl>\n'
-        for attr in function.attribute:
-            s += '<dt>{};</dt>\n'.format(attr)
-        s += '</dl>
\n' - - return s - - -def support_level_str(level): # type: (OpSchema.SupportType) -> Text - return \ - "experimental " if level == OpSchema.SupportType.EXPERIMENTAL else "" - - -# def function_status_str(status=OperatorStatus.Value("EXPERIMENTAL")): # type: ignore -# return \ -# "experimental " if status == OperatorStatus.Value('EXPERIMENTAL') else "" # type: ignore - - -def main(args): # type: (Type[Args]) -> None - - with io.open(args.output, 'w', newline='', encoding="utf-8") as fout: - fout.write('## Contrib Operator Schemas\n') - fout.write( - "*This file is automatically generated from the\n" - " [def files](/onnxruntime/core/graph/contrib_ops/contrib_defs.cc) via [this script](/onnxruntime/python/tools/gen_doc.py).\n" - " Do not modify directly and instead edit operator definitions.*\n") - - # domain -> support level -> name -> [schema] - index = defaultdict(lambda: defaultdict(lambda: defaultdict(list))) # type: Dict[Text, Dict[int, Dict[Text, List[OpSchema]]]] - for schema in rtpy.get_all_operator_schema(): - index[schema.domain][int(schema.support_level)][schema.name].append(schema) - - fout.write('\n') - - # Preprocess the Operator Schemas - # [(domain, [(support_level, [(schema name, current schema, all versions schemas)])])] - operator_schemas = list() # type: List[Tuple[Text, List[Tuple[int, List[Tuple[Text, OpSchema, List[OpSchema]]]]]]] - exsting_ops = set() # type: Set[Text] - for domain, _supportmap in sorted(index.items()): - if not should_render_domain(domain): - continue - - processed_supportmap = list() - for _support, _namemap in sorted(_supportmap.items()): - processed_namemap = list() - for n, unsorted_versions in sorted(_namemap.items()): - versions = sorted(unsorted_versions, key=lambda s: s.since_version) - schema = versions[-1] - if schema.name in exsting_ops: - continue - exsting_ops.add(schema.name) - processed_namemap.append((n, schema, versions)) - processed_supportmap.append((_support, processed_namemap)) - operator_schemas.append((domain, processed_supportmap)) - - # Table of contents - for domain, supportmap in operator_schemas: - s = '* {}\n'.format(display_domain_short(domain)) - fout.write(s) - - for _, namemap in supportmap: - for n, schema, versions in namemap: - s = ' * {}{}\n'.format( - support_level_str(schema.support_level), - format_name_with_domain(domain, n), - format_name_with_domain(domain, n)) - fout.write(s) - - fout.write('\n') - - for domain, supportmap in operator_schemas: - s = '## {}\n'.format(display_domain_short(domain)) - fout.write(s) - - for _, namemap in supportmap: - for op_type, schema, versions in namemap: - # op_type - s = ('### {}**{}**' + (' (deprecated)' if schema.deprecated else '') + '\n').format( - support_level_str(schema.support_level), - format_name_with_domain(domain, op_type), - format_name_with_domain(domain, op_type.lower()), - format_name_with_domain(domain, op_type)) - - s += display_schema(schema, versions) - - s += '\n\n' - - fout.write(s) - - -if __name__ == '__main__': - parser = argparse.ArgumentParser(description='ONNX Runtime Operator Documentation Generator') - parser.add_argument('--output_path', help='output markdown file path', - default=os.path.join(os.path.dirname(os.path.realpath(__file__)), 'ContribOperators.md') - ) - args = parser.parse_args() - - - class Args(object): - output = args.output_path - main(Args) diff --git a/onnxruntime/test/framework/allocation_planner_test.cc b/onnxruntime/test/framework/allocation_planner_test.cc index d7467de703fb4..ccfff5363b40f 100644 --- 
a/onnxruntime/test/framework/allocation_planner_test.cc +++ b/onnxruntime/test/framework/allocation_planner_test.cc @@ -103,7 +103,7 @@ class AllocationPlanTestUtility { EXPECT_GE(index, 0); EXPECT_LT(index, num_ml_values); // An index should not be freed more than once - EXPECT_EQ(freed.count(index), 0) << "MLValue " << index << " freed multiple times"; + EXPECT_EQ(freed.count(index), 0) << "OrtValue " << index << " freed multiple times"; freed.insert(index); } // Check the free-index information for every execution step: they should cover the @@ -126,7 +126,7 @@ class SequentialPlannerTestContext : public ISequentialPlannerContext { public: SequentialPlannerTestContext(ShapeMap* shape_map) : shape_map_(shape_map) {} - virtual TensorShapeProto* GetShape(const onnxruntime::NodeArg& arg) const override { + TensorShapeProto* GetShape(const onnxruntime::NodeArg& arg) const override { auto iter = shape_map_->find(&arg); return (shape_map_->end() != iter) ? iter->second : nullptr; } @@ -160,7 +160,7 @@ class PlannerTest : public ::testing::Test { std::unique_ptr plan_; public: - PlannerTest() : model_("test"), graph_{model_.MainGraph()}, state_{execution_providers_} { + PlannerTest() : model_("test"), graph_{model_.MainGraph()}, state_{execution_providers_, false} { std_kernel_ = KernelDefBuilder().SetName("Transpose").Build(); in_place_kernel_ = KernelDefBuilder().SetName("Clip").MayInplace(0, 0).Build(); CPUExecutionProviderInfo epi; diff --git a/onnxruntime/test/framework/cuda/fence_cuda_test.cc b/onnxruntime/test/framework/cuda/fence_cuda_test.cc index 73f0730b7a6a8..65287e0cf6fc5 100644 --- a/onnxruntime/test/framework/cuda/fence_cuda_test.cc +++ b/onnxruntime/test/framework/cuda/fence_cuda_test.cc @@ -108,7 +108,7 @@ TEST(CUDAFenceTests, DISABLED_PartOnCPU) { shape, cpu_allocator); memcpy(p_tensor->MutableData(), data, sizeof(data)); - MLValue value; + OrtValue value; value.Init(p_tensor.release(), DataTypeImpl::GetType(), DataTypeImpl::GetType()->GetDeleteFunc()); @@ -121,10 +121,8 @@ TEST(CUDAFenceTests, DISABLED_PartOnCPU) { ASSERT_TRUE(session.Initialize().IsOK()); ASSERT_TRUE(1 == CountCopyNodes(graph)); - vector outputs; - session.Run(std::unordered_map{{"X1", value}}, - std::vector{"Out"}, - &outputs); + vector outputs; + session.Run(std::unordered_map{{"X1", value}}, std::vector{"Out"}, &outputs); ASSERT_TRUE(1 == outputs.size()); const Tensor& output = outputs[0].Get(); EXPECT_EQ(output.Shape(), shape); @@ -163,7 +161,7 @@ TEST(CUDAFenceTests, TileWithInitializer) { cpu_allocator); memcpy(p_tensor->MutableData(), data, sizeof(data)); - MLValue value; + OrtValue value; value.Init(p_tensor.release(), DataTypeImpl::GetType(), DataTypeImpl::GetType()->GetDeleteFunc()); @@ -175,10 +173,8 @@ TEST(CUDAFenceTests, TileWithInitializer) { session.RegisterExecutionProvider(std::make_unique(xp_info)); ASSERT_TRUE(session.Initialize().IsOK()); - vector outputs; - session.Run(std::unordered_map{{"X1", value}}, - std::vector{"Y"}, - &outputs); + vector outputs; + session.Run(std::unordered_map{{"X1", value}}, std::vector{"Y"}, &outputs); ASSERT_TRUE(1 == outputs.size()); const Tensor& output = outputs[0].Get(); EXPECT_EQ(output.Shape(), TensorShape({2, 4})); @@ -227,7 +223,7 @@ TEST(CUDAFenceTests, TileWithComputedInput) { cpu_allocator); memcpy(p_tensor->MutableData(), data, sizeof(data)); - MLValue value; + OrtValue value; value.Init(p_tensor.release(), DataTypeImpl::GetType(), DataTypeImpl::GetType()->GetDeleteFunc()); @@ -239,10 +235,8 @@ TEST(CUDAFenceTests, TileWithComputedInput) { 
session.RegisterExecutionProvider(std::make_unique(xp_info)); ASSERT_TRUE(session.Initialize().IsOK()); - vector outputs; - session.Run(std::unordered_map{{"X1", value}}, - std::vector{"Out"}, - &outputs); + vector outputs; + session.Run(std::unordered_map{{"X1", value}}, std::vector{"Out"}, &outputs); ASSERT_TRUE(1 == outputs.size()); const Tensor& output = outputs[0].Get(); EXPECT_EQ(output.Shape(), TensorShape({4, 4})); diff --git a/onnxruntime/test/framework/execution_frame_test.cc b/onnxruntime/test/framework/execution_frame_test.cc index 752edb35efed9..9b1b5f0775693 100644 --- a/onnxruntime/test/framework/execution_frame_test.cc +++ b/onnxruntime/test/framework/execution_frame_test.cc @@ -53,7 +53,7 @@ TEST(ExecutionFrameTest, TensorAllocationTest) { status = kernel_registry_manager.RegisterKernels(execution_providers); EXPECT_TRUE(status.IsOK()) << status.ErrorMessage(); - SessionState state{execution_providers}; + SessionState state{execution_providers, true}; state.SetGraphViewer(std::make_unique(graph)); MLValueNameIdxMap& mlvalue_name_idx_map{state.GetMLValueNameIdxMap()}; @@ -64,26 +64,27 @@ TEST(ExecutionFrameTest, TensorAllocationTest) { std::unique_ptr p_seq_exec_plan; // TODO below line is for testing only. In production use SequentialPlanner::CreatePlan() + SequentialPlannerContext context(false); status = SequentialPlanner::CreatePlan(nullptr, GraphViewer(graph), {}, execution_providers, kernel_registry_manager, - mlvalue_name_idx_map, p_seq_exec_plan); + mlvalue_name_idx_map, context, p_seq_exec_plan); EXPECT_TRUE(status.IsOK()) << status.ErrorMessage(); state.SetExecutionPlan(std::move(p_seq_exec_plan)); state.CalculateNodeIndexInfo(); - vector outputs; + vector outputs; ExecutionFrame frame({}, {}, {}, outputs, {}, state); int start_index = frame.GetNodeOffset(node->Index()); EXPECT_EQ(start_index, 0); TensorShape shape(std::vector{2, 3}); - MLValue& mlvalue0 = *frame.GetMutableNodeInputOrOutputMLValue(start_index); + OrtValue& mlvalue0 = *frame.GetMutableNodeInputOrOutputMLValue(start_index); status = frame.AllocateMLValueTensorSelfOwnBuffer(mlvalue0, start_index, DataTypeImpl::GetType(), execution_providers.Get(xp_typ)->GetAllocator(0, OrtMemTypeDefault)->Info(), shape); EXPECT_TRUE(status.IsOK()) << status.ErrorMessage(); - MLValue* p_ml_value = frame.GetMutableNodeInputOrOutputMLValue(0); + OrtValue* p_ml_value = frame.GetMutableNodeInputOrOutputMLValue(0); Tensor* p_tensor = p_ml_value ? p_ml_value->GetMutable() : nullptr; EXPECT_TRUE(p_tensor); EXPECT_EQ(p_tensor->Shape(), shape); @@ -91,7 +92,7 @@ TEST(ExecutionFrameTest, TensorAllocationTest) { //test share memory from tensor TensorShape shape2(std::vector{3, 2}); - MLValue& mlvalue1 = *frame.GetMutableNodeInputOrOutputMLValue(start_index + 1); + OrtValue& mlvalue1 = *frame.GetMutableNodeInputOrOutputMLValue(start_index + 1); status = frame.AllocateMLValueTensorPreAllocateBuffer(mlvalue1, start_index, DataTypeImpl::GetType(), @@ -99,7 +100,7 @@ TEST(ExecutionFrameTest, TensorAllocationTest) { shape2); EXPECT_TRUE(status.IsOK()) << status.ErrorMessage(); - const MLValue* p_ml_value_const = frame.GetNodeInputOrOutputMLValue(1); + const OrtValue* p_ml_value_const = frame.GetNodeInputOrOutputMLValue(1); auto tensor2 = p_ml_value_const ? 
&(p_ml_value_const->Get()) : nullptr; EXPECT_TRUE(tensor2); EXPECT_EQ(tensor2->Shape(), shape2); @@ -122,7 +123,7 @@ TEST(ExecutionFrameTest, FeedInDataTest) { std::unique_ptr p_tensor = std::make_unique(element_type, shape, cpu_allocator); - MLValue value; + OrtValue value; value.Init(p_tensor.release(), DataTypeImpl::GetType(), DataTypeImpl::GetType()->GetDeleteFunc()); @@ -135,7 +136,7 @@ TEST(ExecutionFrameTest, FeedInDataTest) { execution_providers.Add(xp_typ, std::move(cpu_xp)); EXPECT_TRUE(kernel_registry_manager.RegisterKernels(execution_providers).IsOK()); - SessionState state{execution_providers}; + SessionState state{execution_providers, true}; state.SetGraphViewer(std::make_unique(graph)); MLValueNameIdxMap& mlvalue_name_idx_map{state.GetMLValueNameIdxMap()}; @@ -144,10 +145,10 @@ TEST(ExecutionFrameTest, FeedInDataTest) { state.CalculateNodeIndexInfo(); - vector outputs; + vector outputs; ExecutionFrame frame({x_idx}, {value}, {y_idx}, outputs, {}, state); - MLValue* p_ml_value = frame.GetMutableNodeInputOrOutputMLValue(0); + OrtValue* p_ml_value = frame.GetMutableNodeInputOrOutputMLValue(0); Tensor* p_tensor_arg_0 = p_ml_value ? p_ml_value->GetMutable() : nullptr; EXPECT_TRUE(p_tensor_arg_0); EXPECT_EQ(p_tensor_arg_0->Shape(), shape); @@ -182,13 +183,12 @@ TEST(ExecutionFrameTest, MemPatternTest) { EXPECT_TRUE(status.IsOK()) << status.ErrorMessage(); KernelRegistryManager kernel_registry_manager; - kernel_registry_manager.RegisterKernelRegistry(cpu_xp->GetKernelRegistry()); ExecutionProviders execution_providers; execution_providers.Add(xp_type, std::move(cpu_xp)); - + kernel_registry_manager.RegisterKernels(execution_providers); //1. prepare input - SessionState state{execution_providers}; + SessionState state{execution_providers, true}; state.SetGraphViewer(std::make_unique(graph)); MLValueNameIdxMap& mlvalue_name_idx_map{state.GetMLValueNameIdxMap()}; @@ -202,7 +202,7 @@ TEST(ExecutionFrameTest, MemPatternTest) { auto cpu_allocator = execution_providers.Get(xp_type)->GetAllocator(0, OrtMemTypeDefault); - MLValue v1, v2, v3; + OrtValue v1, v2, v3; CreateMLValue(cpu_allocator, std::vector{1, 2}, std::vector{1.0f, 1.0f}, &v1); @@ -214,20 +214,21 @@ TEST(ExecutionFrameTest, MemPatternTest) { std::vector(6, 1.0f), &v3); std::unique_ptr p_seq_exec_plan = std::make_unique(); + SequentialPlannerContext context(false); status = SequentialPlanner::CreatePlan(nullptr, GraphViewer(graph), {}, execution_providers, kernel_registry_manager, - mlvalue_name_idx_map, p_seq_exec_plan); + mlvalue_name_idx_map, context, p_seq_exec_plan); EXPECT_TRUE(status.IsOK()) << status.ErrorMessage(); state.SetExecutionPlan(std::move(p_seq_exec_plan)); state.CalculateNodeIndexInfo(); - vector outputs; + vector outputs; ExecutionFrame frame({x1_idx, x2_idx, x3_idx}, {v1, v2, v3}, {t3_idx}, outputs, {}, state); - MLValue& mlvalue3 = *frame.GetMutableNodeInputOrOutputMLValue(3); - MLValue& mlvalue4 = *frame.GetMutableNodeInputOrOutputMLValue(4); - MLValue& mlvalue5 = *frame.GetMutableNodeInputOrOutputMLValue(5); + OrtValue& mlvalue3 = *frame.GetMutableNodeInputOrOutputMLValue(3); + OrtValue& mlvalue4 = *frame.GetMutableNodeInputOrOutputMLValue(4); + OrtValue& mlvalue5 = *frame.GetMutableNodeInputOrOutputMLValue(5); status = frame.AllocateMLValueTensorSelfOwnBuffer(mlvalue3, 3, DataTypeImpl::GetType(), diff --git a/onnxruntime/test/framework/float_16_test.cc b/onnxruntime/test/framework/float_16_test.cc index a3adb7414aefb..399ac6882d1e3 100644 --- a/onnxruntime/test/framework/float_16_test.cc +++ 
b/onnxruntime/test/framework/float_16_test.cc @@ -105,7 +105,7 @@ void RunSession(InferenceSession& session_object, std::vector& dims_y, std::vector& values_y) { // prepare inputs - MLValue ml_value; + OrtValue ml_value; CreateMLValue(TestCPUExecutionProvider()->GetAllocator(0, OrtMemTypeDefault), dims_x, values_x, &ml_value); NameMLValMap feeds; feeds.insert(std::make_pair("X", ml_value)); @@ -113,7 +113,7 @@ void RunSession(InferenceSession& session_object, // prepare outputs std::vector output_names; output_names.push_back("Y"); - std::vector fetches; + std::vector fetches; // Now run common::Status st = session_object.Run(run_options, feeds, output_names, &fetches); diff --git a/onnxruntime/test/framework/inference_session_test.cc b/onnxruntime/test/framework/inference_session_test.cc index 82caf20dc19c8..b3c0e4b0b8bc5 100644 --- a/onnxruntime/test/framework/inference_session_test.cc +++ b/onnxruntime/test/framework/inference_session_test.cc @@ -125,8 +125,7 @@ class FuseExecutionProvider : public IExecutionProvider { }; namespace test { -static void VerifyOutputs(const std::vector& fetches, - const std::vector& expected_dims, +static void VerifyOutputs(const std::vector& fetches, const std::vector& expected_dims, const std::vector& expected_values); static const std::string MODEL_URI = "testdata/mul_1.pb"; static const std::string MODEL_URI_NO_OPSET = "testdata/mul_1.pb.noopset"; @@ -167,8 +166,7 @@ static void CreateMatMulModel(std::unique_ptr& p_model, Prov ASSERT_TRUE(status.IsOK()) << status.ErrorMessage(); } -void VerifyOutputs(const std::vector& fetches, - const std::vector& expected_dims, +void VerifyOutputs(const std::vector& fetches, const std::vector& expected_dims, const std::vector& expected_values) { ASSERT_EQ(1, fetches.size()); auto& rtensor = fetches.front().Get(); @@ -185,7 +183,7 @@ void RunModel(InferenceSession& session_object, // prepare inputs std::vector dims_mul_x = {3, 2}; std::vector values_mul_x = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f}; - MLValue ml_value; + OrtValue ml_value; CreateMLValue(TestCPUExecutionProvider()->GetAllocator(0, OrtMemTypeDefault), dims_mul_x, values_mul_x, &ml_value); NameMLValMap feeds; @@ -194,7 +192,7 @@ void RunModel(InferenceSession& session_object, // prepare outputs std::vector output_names; output_names.push_back("Y"); - std::vector fetches; + std::vector fetches; if (is_preallocate_output_vec) { fetches.resize(output_names.size()); @@ -237,11 +235,11 @@ void RunModelWithBindingMatMul(InferenceSession& session_object, */ // bind one input to cpu allocator from bind_provider_type, and another on user provided CPU memory // so both code pathes are covered - MLValue input_ml_value_A; + OrtValue input_ml_value_A; std::vector dims_mul_x_A = {3, 4}; CreateMLValue(input_allocator, dims_mul_x_A, values_mul_x, &input_ml_value_A); - MLValue input_ml_value_B; + OrtValue input_ml_value_B; std::vector dims_mul_x_B = {4, 3}; CreateMLValue(TestCPUExecutionProvider()->GetAllocator(0, OrtMemTypeDefault), dims_mul_x_B, values_mul_x, &input_ml_value_B); @@ -251,7 +249,7 @@ void RunModelWithBindingMatMul(InferenceSession& session_object, // prepare outputs std::vector expected_output_dims = {3, 3}; - MLValue output_ml_value; + OrtValue output_ml_value; if (is_preallocate_output_vec) { if (allocation_provider == kCpuExecutionProvider) { AllocateMLValue(TestCPUExecutionProvider()->GetAllocator(0, OrtMemTypeDefault), expected_output_dims, @@ -281,7 +279,7 @@ void RunModelWithBindingMatMul(InferenceSession& session_object, allocation_provider == 
kCudaExecutionProvider) { #ifdef USE_CUDA // in this case we need to copy the tensor from cuda to cpu - vector& outputs = io_binding->GetOutputs(); + vector& outputs = io_binding->GetOutputs(); ASSERT_EQ(1, outputs.size()); auto& rtensor = outputs.front().Get(); auto element_type = rtensor.DataType(); @@ -292,7 +290,7 @@ void RunModelWithBindingMatMul(InferenceSession& session_object, cpu_allocator); st = TestCudaExecutionProvider()->CopyTensor(rtensor, *cpu_tensor.get()); ASSERT_TRUE(st.IsOK()); - MLValue ml_value; + OrtValue ml_value; ml_value.Init(cpu_tensor.release(), DataTypeImpl::GetType(), DataTypeImpl::GetType()->GetDeleteFunc()); @@ -314,8 +312,9 @@ TEST(InferenceSessionTests, NoTimeout) { so.session_logid = "InferenceSessionTests.NoTimeout"; InferenceSession session_object{so, &DefaultLoggingManager()}; - ASSERT_TRUE(session_object.Load(MODEL_URI).IsOK()); - ASSERT_TRUE(session_object.Initialize().IsOK()); + Status st; + ASSERT_TRUE((st = session_object.Load(MODEL_URI)).IsOK()) << st.ErrorMessage(); + ASSERT_TRUE((st = session_object.Initialize()).IsOK()) << st.ErrorMessage(); RunOptions run_options; run_options.run_tag = "one session/one tag"; @@ -721,7 +720,7 @@ TEST(InferenceSessionTests, TestIOBindingReuse) { Status st = session_object.NewIOBinding(&io_binding); ASSERT_TRUE(st.IsOK()); - MLValue ml_value1; + OrtValue ml_value1; vector v1{2.f}; CreateMLValue(TestCPUExecutionProvider()->GetAllocator(0, OrtMemTypeDefault), {1}, v1, &ml_value1); io_binding->BindOutput("foo", ml_value1); @@ -732,7 +731,7 @@ TEST(InferenceSessionTests, TestIOBindingReuse) { ASSERT_TRUE(v1[i] == span[i]); } - MLValue ml_value2; + OrtValue ml_value2; vector v2{3.f}; CreateMLValue(TestCPUExecutionProvider()->GetAllocator(0, OrtMemTypeDefault), {1}, v2, &ml_value2); io_binding->BindOutput("foo", ml_value2); @@ -759,7 +758,7 @@ TEST(InferenceSessionTests, InvalidInputTypeOfTensorElement) { // prepare inputs std::vector dims_mul_x = {3, 2}; std::vector values_mul_x = {1, 2, 3, 4, 5, 6}; - MLValue ml_value; + OrtValue ml_value; CreateMLValue(TestCPUExecutionProvider()->GetAllocator(0, OrtMemTypeDefault), dims_mul_x, values_mul_x, &ml_value); NameMLValMap feeds; @@ -768,7 +767,7 @@ TEST(InferenceSessionTests, InvalidInputTypeOfTensorElement) { // prepare outputs std::vector output_names; output_names.push_back("Y"); - std::vector fetches; + std::vector fetches; // prepare expected inputs and outputs std::vector expected_dims_mul_y = {3, 2}; @@ -890,15 +889,15 @@ static common::Status RunOptionalInputTest(bool add_required_input, std::vector optional_input_val = {10.f}; // override initializer value of 1 std::vector unknown_input_val = {20.f}; - MLValue required_input_mlvalue; + OrtValue required_input_mlvalue; CreateMLValue(TestCPUExecutionProvider()->GetAllocator(0, OrtMemTypeDefault), dims, required_input_val, &required_input_mlvalue); - MLValue optional_input_mlvalue; + OrtValue optional_input_mlvalue; CreateMLValue(TestCPUExecutionProvider()->GetAllocator(0, OrtMemTypeDefault), dims, optional_input_val, &optional_input_mlvalue); - MLValue unknown_input_mlvalue; + OrtValue unknown_input_mlvalue; CreateMLValue(TestCPUExecutionProvider()->GetAllocator(0, OrtMemTypeDefault), dims, unknown_input_val, &unknown_input_mlvalue); @@ -916,7 +915,7 @@ static common::Status RunOptionalInputTest(bool add_required_input, // prepare outputs std::vector output_names; output_names.push_back("add_output"); - std::vector fetches; + std::vector fetches; float expected_value = required_input_val[0]; expected_value += 
add_optional_input ? optional_input_val[0] : 1.f; @@ -924,7 +923,7 @@ static common::Status RunOptionalInputTest(bool add_required_input, status = session_object.Run(run_options, feeds, output_names, &fetches); if (status.IsOK()) { - MLValue& output = fetches.front(); + OrtValue& output = fetches.front(); const auto& tensor = output.Get(); float output_value = *tensor.Data(); if (output_value != expected_value) { @@ -1005,11 +1004,11 @@ TEST(ExecutionProviderTest, FunctionTest) { std::vector dims_mul_x = {3, 2}; std::vector values_mul_x = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f}; - MLValue ml_value_x; + OrtValue ml_value_x; CreateMLValue(testCPUExecutionProvider->GetAllocator(0, OrtMemTypeDefault), dims_mul_x, values_mul_x, &ml_value_x); - MLValue ml_value_y; + OrtValue ml_value_y; CreateMLValue(testCPUExecutionProvider->GetAllocator(0, OrtMemTypeDefault), dims_mul_x, values_mul_x, &ml_value_y); - MLValue ml_value_z; + OrtValue ml_value_z; CreateMLValue(testCPUExecutionProvider->GetAllocator(0, OrtMemTypeDefault), dims_mul_x, values_mul_x, &ml_value_z); NameMLValMap feeds; feeds.insert(std::make_pair("X", ml_value_x)); @@ -1019,7 +1018,7 @@ TEST(ExecutionProviderTest, FunctionTest) { // prepare outputs std::vector output_names; output_names.push_back("M"); - std::vector fetches; + std::vector fetches; // prepare expected inputs and outputs std::vector expected_dims_mul_m = {3, 2}; @@ -1109,13 +1108,13 @@ TEST(ExecutionProviderTest, FunctionInlineTest) { std::vector dims_mul_x = {2, 2}; std::vector values_mul_x = {1.0f, 2.0f, 3.0f, 4.0f}; - MLValue ml_value_x; + OrtValue ml_value_x; CreateMLValue(TestCPUExecutionProvider()->GetAllocator(0, OrtMemTypeDefault), dims_mul_x, values_mul_x, &ml_value_x); - MLValue ml_value_y; + OrtValue ml_value_y; CreateMLValue(TestCPUExecutionProvider()->GetAllocator(0, OrtMemTypeDefault), dims_mul_x, values_mul_x, &ml_value_y); - MLValue ml_value_z; + OrtValue ml_value_z; CreateMLValue(TestCPUExecutionProvider()->GetAllocator(0, OrtMemTypeDefault), dims_mul_x, values_mul_x, &ml_value_z); NameMLValMap feeds; @@ -1126,7 +1125,7 @@ TEST(ExecutionProviderTest, FunctionInlineTest) { // prepare outputs std::vector output_names; output_names.push_back("M"); - std::vector fetches; + std::vector fetches; // prepare expected inputs and outputs std::vector expected_dims_mul_m = {2, 2}; @@ -1209,7 +1208,7 @@ TEST(InferenceSessionTests, TestTruncatedSequence) { -2.5120965e-04f, -2.9920202e-03f, 3.0980256e-05f, -3.5933927e-03f}; - MLValue ml_value; + OrtValue ml_value; CreateMLValue(TestCPUExecutionProvider()->GetAllocator(0, OrtMemTypeDefault), X_dims, X, &ml_value); std::string input_name = "Input13165"; @@ -1227,7 +1226,7 @@ TEST(InferenceSessionTests, TestTruncatedSequence) { } std::vector output_names = {final_output_name}; - std::vector fetches; + std::vector fetches; // Now run the full sequence common::Status st = session_object.Run(run_options, feeds, output_names, &fetches); @@ -1255,7 +1254,7 @@ TEST(InferenceSessionTests, TestTruncatedSequence) { for (auto truncated_len : truncated_lengths) { std::vector truncated_input_dims = X_dims; truncated_input_dims[0] = truncated_len; - MLValue truncated_ml_value; + OrtValue truncated_ml_value; std::vector truncated_input(X.begin() + seq_start * seq_stride, X.begin() + (seq_start + truncated_len) * seq_stride); CreateMLValue(TestCPUExecutionProvider()->GetAllocator(0, OrtMemTypeDefault), truncated_input_dims, truncated_input, &truncated_ml_value); NameMLValMap truncated_feeds = {{input_name, truncated_ml_value}}; @@ -1268,7 
+1267,7 @@ TEST(InferenceSessionTests, TestTruncatedSequence) { truncated_feeds.insert(std::make_pair(iter->second, fetches[i_output])); } } - std::vector truncated_fetches; + std::vector truncated_fetches; st = session_object.Run(run_options, truncated_feeds, output_names, &truncated_fetches); if (!st.IsOK()) { std::cout << "Run returned status: " << st.ErrorMessage() << std::endl; @@ -1308,12 +1307,12 @@ TEST(InferenceSessionTests, TestCopyToFromDevices) { // prepare inputs std::vector dims_mul_x = {3, 2}; std::vector values_mul_x = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f}; - MLValue ml_value; + OrtValue ml_value; CreateMLValue(p_dummy_provider->GetAllocator(0, OrtMemTypeDefault), dims_mul_x, values_mul_x, &ml_value); std::vector feed_names; - std::vector feeds; + std::vector feeds; feed_names.push_back("X"); feeds.push_back(ml_value); @@ -1324,7 +1323,7 @@ TEST(InferenceSessionTests, TestCopyToFromDevices) { auto run_test = [&](int run_num) { // prepare outputs std::vector output_names; - std::vector fetches; + std::vector fetches; output_names.push_back("Y"); fetches.resize(output_names.size()); @@ -1394,5 +1393,28 @@ TEST(InferenceSessionTests, TestL1AndL2Transformers) { } } +#ifdef USE_CUDA + +TEST(InferenceSessionTests, TestParallelExecutionWithCudaProvider) { + string model_uri = "testdata/transform/fusion/fuse-conv-bn-mul-add-unsqueeze.onnx"; + + SessionOptions so; + so.enable_sequential_execution = false; + so.session_logid = "InferenceSessionTests.TestParallelExecutionWithCudaProvider"; + InferenceSession session_object{so}; + + CUDAExecutionProviderInfo epi; + epi.device_id = 0; + EXPECT_TRUE(session_object.RegisterExecutionProvider(std::make_unique(epi)).IsOK()); + + ASSERT_TRUE(session_object.Load(model_uri).IsOK()); + + auto status = session_object.Initialize(); + + ASSERT_TRUE(!status.IsOK()); +} + +#endif + } // namespace test } // namespace onnxruntime diff --git a/onnxruntime/test/framework/local_kernel_registry_test.cc b/onnxruntime/test/framework/local_kernel_registry_test.cc index 796221d4bd358..7cd6d3b6865c3 100644 --- a/onnxruntime/test/framework/local_kernel_registry_test.cc +++ b/onnxruntime/test/framework/local_kernel_registry_test.cc @@ -197,7 +197,7 @@ void RunSession(InferenceSession& session_object, std::vector& dims_y, std::vector& values_y) { // prepare inputs - MLValue ml_value; + OrtValue ml_value; CreateMLValue(TestCPUExecutionProvider()->GetAllocator(0, OrtMemTypeDefault), dims_x, values_x, &ml_value); NameMLValMap feeds; feeds.insert(std::make_pair("X", ml_value)); @@ -205,7 +205,7 @@ void RunSession(InferenceSession& session_object, // prepare outputs std::vector output_names; output_names.push_back("Y"); - std::vector fetches; + std::vector fetches; // Now run common::Status st = session_object.Run(run_options, feeds, output_names, &fetches); diff --git a/onnxruntime/test/framework/memcpy_transformer_test.cc b/onnxruntime/test/framework/memcpy_transformer_test.cc index b8c88c13b1b81..f8ef7b35622c5 100644 --- a/onnxruntime/test/framework/memcpy_transformer_test.cc +++ b/onnxruntime/test/framework/memcpy_transformer_test.cc @@ -68,9 +68,13 @@ void ExpectCopy(const onnxruntime::Node& source, const std::string copy_op, } EXPECT_TRUE(false) << "Copy node expected but not found"; } +#ifdef USE_CUDA TEST(TransformerTest, MemcpyTransformerTest) { - auto model = std::make_shared("test"); + std::unordered_map domain_to_version; + domain_to_version[kOnnxDomain] = 7; + auto model = std::make_shared("test", false, ModelMetaData(), IOnnxRuntimeOpSchemaRegistryList(), 
+ domain_to_version); onnxruntime::Graph& graph = model->MainGraph(); TypeProto tensor_float_type; @@ -95,9 +99,14 @@ TEST(TransformerTest, MemcpyTransformerTest) { auto status = graph.Resolve(); ASSERT_TRUE(status.IsOK()) << status.ErrorMessage(); - auto cpu_execution_provider = TestCPUExecutionProvider(); + KernelRegistryManager kernel_registry_manager; + ExecutionProviders execution_providers; + execution_providers.Add(onnxruntime::kCudaExecutionProvider, + std::make_unique(CUDAExecutionProviderInfo())); + execution_providers.Add(onnxruntime::kCpuExecutionProvider, + std::make_unique(CPUExecutionProviderInfo())); KernelRegistryManager test_registry_manager; - test_registry_manager.RegisterKernelRegistry(cpu_execution_provider->GetKernelRegistry()); + test_registry_manager.RegisterKernels(execution_providers); MemcpyTransformer transformer({onnxruntime::kCudaExecutionProvider}, test_registry_manager); @@ -116,7 +125,10 @@ TEST(TransformerTest, MemcpyTransformerTest) { } TEST(TransformerTest, MemcpyTransformerTestCudaFirst) { - auto model = std::make_shared("test"); + std::unordered_map domain_to_version; + domain_to_version[kOnnxDomain] = 7; + auto model = std::make_shared("test", false, ModelMetaData(), IOnnxRuntimeOpSchemaRegistryList(), + domain_to_version); onnxruntime::Graph& graph = model->MainGraph(); TypeProto tensor_float_type; @@ -133,7 +145,7 @@ TEST(TransformerTest, MemcpyTransformerTestCudaFirst) { node1.SetExecutionProviderType(onnxruntime::kCudaExecutionProvider); auto& node2 = graph.AddNode("node2", "MatMul", "cpu operator1", ArgMap{&o1_def, &i3_def}, ArgMap{&o2_def}); node2.SetExecutionProviderType(onnxruntime::kCpuExecutionProvider); - auto& node3 = graph.AddNode("node3", "Clip", "gpu operator2", ArgMap{&o2_def}, ArgMap{&o3_def}); + auto& node3 = graph.AddNode("node3", "Abs", "gpu operator2", ArgMap{&o2_def}, ArgMap{&o3_def}); node3.SetExecutionProviderType(onnxruntime::kCudaExecutionProvider); auto& node4 = graph.AddNode("node4", "MatMul", "cpu operator2", ArgMap{&o2_def, &o2_def}, ArgMap{&o4_def}); node4.SetExecutionProviderType(onnxruntime::kCpuExecutionProvider); @@ -141,9 +153,14 @@ TEST(TransformerTest, MemcpyTransformerTestCudaFirst) { auto status = graph.Resolve(); ASSERT_TRUE(status.IsOK()) << status.ErrorMessage(); - auto cpu_execution_provider = TestCPUExecutionProvider(); + KernelRegistryManager kernel_registry_manager; + ExecutionProviders execution_providers; + execution_providers.Add(onnxruntime::kCudaExecutionProvider, + std::make_unique(CUDAExecutionProviderInfo())); + execution_providers.Add(onnxruntime::kCpuExecutionProvider, + std::make_unique(CPUExecutionProviderInfo())); KernelRegistryManager test_registry_manager; - test_registry_manager.RegisterKernelRegistry(cpu_execution_provider->GetKernelRegistry()); + test_registry_manager.RegisterKernels(execution_providers); MemcpyTransformer transformer({onnxruntime::kCudaExecutionProvider}, test_registry_manager); @@ -160,6 +177,7 @@ TEST(TransformerTest, MemcpyTransformerTestCudaFirst) { ExpectSame(node2, node4, 0); ExpectSame(node2, node4, 1); } +#endif } // namespace test } // namespace onnxruntime diff --git a/onnxruntime/test/framework/opaque_kernels_test.cc b/onnxruntime/test/framework/opaque_kernels_test.cc index fe3367c86c2b4..ab994cce963f8 100644 --- a/onnxruntime/test/framework/opaque_kernels_test.cc +++ b/onnxruntime/test/framework/opaque_kernels_test.cc @@ -364,17 +364,17 @@ TEST_F(OpaqueTypeTests, RunModel) { std::vector val_dims = {2}; std::vector values = {1, 2}; // prepare inputs - 
MLValue ml_values; + OrtValue ml_values; CreateMLValue(TestCPUExecutionProvider()->GetAllocator(0, OrtMemTypeDefault), val_dims, values, &ml_values); std::vector ind_dims = {2}; std::vector indicies = {1, 4}; - MLValue ml_indicies; + OrtValue ml_indicies; CreateMLValue(TestCPUExecutionProvider()->GetAllocator(0, OrtMemTypeDefault), ind_dims, indicies, &ml_indicies); std::vector shape_dims = {1}; std::vector shape = {5}; - MLValue ml_shape; + OrtValue ml_shape; CreateMLValue(TestCPUExecutionProvider()->GetAllocator(0, OrtMemTypeDefault), shape_dims, shape, &ml_shape); NameMLValMap feeds; @@ -388,7 +388,7 @@ TEST_F(OpaqueTypeTests, RunModel) { std::vector output_names; output_names.push_back("sparse_tensor_shape"); - std::vector fetches; + std::vector fetches; EXPECT_TRUE(session_object.Run(run_options, feeds, output_names, &fetches).IsOK()); ASSERT_EQ(1, fetches.size()); diff --git a/onnxruntime/test/framework/session_state_test.cc b/onnxruntime/test/framework/session_state_test.cc index 62c6932c588a1..b499144883b4d 100644 --- a/onnxruntime/test/framework/session_state_test.cc +++ b/onnxruntime/test/framework/session_state_test.cc @@ -36,7 +36,7 @@ TEST(SessionStateTest, AddGetKernelTest) { .SetDoc("Input variable.") .Output(0, "output_1", "docstr for output_1.", "tensor(int32)"); ExecutionProviders execution_providers; - SessionState s{execution_providers}; + SessionState s{execution_providers, true}; onnxruntime::Model model("graph_1"); auto& graph = model.MainGraph(); diff --git a/onnxruntime/test/framework/test_utils.h b/onnxruntime/test/framework/test_utils.h index 2609f9acb6a72..417c1f5d03288 100644 --- a/onnxruntime/test/framework/test_utils.h +++ b/onnxruntime/test/framework/test_utils.h @@ -15,21 +15,22 @@ namespace onnxruntime { namespace test { +// Doesn't work with ExecutionProviders class and KernelRegistryManager IExecutionProvider* TestCPUExecutionProvider(); #ifdef USE_CUDA +// Doesn't work with ExecutionProviders class and KernelRegistryManager IExecutionProvider* TestCudaExecutionProvider(); #endif #ifdef USE_TENSORRT +// Doesn't work with ExecutionProviders class and KernelRegistryManager IExecutionProvider* TestTensorrtExecutionProvider(); #endif template -void CreateMLValue(AllocatorPtr alloc, - const std::vector& dims, - const std::vector& value, - MLValue* p_mlvalue) { +void CreateMLValue(AllocatorPtr alloc, const std::vector& dims, const std::vector& value, + OrtValue* p_mlvalue) { TensorShape shape(dims); auto element_type = DataTypeImpl::GetType(); std::unique_ptr p_tensor = std::make_unique(element_type, @@ -44,9 +45,7 @@ void CreateMLValue(AllocatorPtr alloc, } template -void AllocateMLValue(AllocatorPtr alloc, - const std::vector& dims, - MLValue* p_mlvalue) { +void AllocateMLValue(AllocatorPtr alloc, const std::vector& dims, OrtValue* p_mlvalue) { TensorShape shape(dims); auto element_type = DataTypeImpl::GetType(); std::unique_ptr p_tensor = std::make_unique(element_type, diff --git a/onnxruntime/test/ir/graph_test.cc b/onnxruntime/test/ir/graph_test.cc index 74ba5b025e727..992c88b7a3eed 100644 --- a/onnxruntime/test/ir/graph_test.cc +++ b/onnxruntime/test/ir/graph_test.cc @@ -229,7 +229,7 @@ TEST(ResolvingGraphTest, GraphConstruction_VerifyNoDuplicateName) { auto& node_with_dup_name = graph.AddNode("node_1", "Variable", "node 2", inputs, outputs); auto status = graph.Resolve(); EXPECT_FALSE(status.IsOK()); - EXPECT_EQ("Error: two nodes with same node name (node_1).", status.ErrorMessage()); + EXPECT_EQ("This is an invalid model. 
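test_utils.h above switches the CreateMLValue and AllocateMLValue helpers from MLValue* to OrtValue*. A short sketch of the typical call pattern from these tests, assuming float input data and int64_t dimensions:

std::vector<int64_t> dims_x = {1, 1, 3, 3};
std::vector<float> values_x(9, 1.0f);

OrtValue ml_value;
CreateMLValue<float>(TestCPUExecutionProvider()->GetAllocator(0, OrtMemTypeDefault),
                     dims_x, values_x, &ml_value);

NameMLValMap feeds;
feeds.insert(std::make_pair("X", ml_value));

std::vector<std::string> output_names{"Y"};
std::vector<OrtValue> fetches;

// session_object is an InferenceSession that has already been loaded and initialized.
RunOptions run_options;
common::Status st = session_object.Run(run_options, feeds, output_names, &fetches);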
Error: two nodes with same node name (node_1).", status.ErrorMessage()); graph.RemoveNode(node_with_dup_name.Index()); // Case 2: Adding two nodes with same output arg name should fail. @@ -258,7 +258,7 @@ TEST(ResolvingGraphTest, GraphConstruction_VerifyNodeAndOpMatch) { graph.AddNode("node_1", "OpNotExist", "node 1", inputs, outputs); auto status = graph.Resolve(); EXPECT_FALSE(status.IsOK()); - EXPECT_EQ(0, status.ErrorMessage().find_first_of("No Schema registered for OpNotExist")); + EXPECT_EQ(0, status.ErrorMessage().find_first_of("This is an invalid model. No Schema registered for OpNotExist")); } TEST(ResolvingGraphTest, GraphConstruction_CheckIsAcyclic) { @@ -626,7 +626,7 @@ TEST(ResolvingGraphTest, GraphConstruction_CheckIsNotAcyclic) { auto status = graph.Resolve(); EXPECT_FALSE(status.IsOK()); - EXPECT_EQ("Error: the graph is not acyclic.", status.ErrorMessage()); + EXPECT_EQ("This is an invalid model. Error: the graph is not acyclic.", status.ErrorMessage()); } TEST(ResolvingGraphTest, GraphConstruction_OnlyInitializer) { diff --git a/onnxruntime/test/ir/utils_test.cc b/onnxruntime/test/ir/utils_test.cc index 0cc0d98a0e509..96b87974ea4c6 100644 --- a/onnxruntime/test/ir/utils_test.cc +++ b/onnxruntime/test/ir/utils_test.cc @@ -207,7 +207,7 @@ static void UpdateSubgraphWhenRemovingNode(bool include_nested = false) { auto& node_to_remove = *graph.GetNode(1); const auto& if_node = *graph.GetNode(2); - bool removed = graph_utils::RemoveSingleInputNode(graph, node_to_remove); + bool removed = graph_utils::RemoveNode(graph, node_to_remove); ASSERT_TRUE(removed); // check subgraph implicit input was updated @@ -238,7 +238,7 @@ static void DontRemoveNodeIfItWillBreakSubgraph(bool test_nested = false) { auto& graph = model.MainGraph(); auto& node_to_remove = *graph.GetNode(1); - bool removed = graph_utils::RemoveSingleInputNode(graph, node_to_remove); + bool removed = graph_utils::RemoveNode(graph, node_to_remove); ASSERT_FALSE(removed); } @@ -287,7 +287,7 @@ TEST(GraphUtils, TestMultiEdgeRemovalNodes) { ASSERT_EQ(nodes[2]->GetOutputEdgesCount(), 2); // Remove id_2. This leaves id_0 with 3 output edges. id_0 is now incoming node to id_3 and id_4. 
- ASSERT_TRUE(graph_utils::RemoveSingleInputNode(graph, *nodes[2])); + ASSERT_TRUE(graph_utils::RemoveNode(graph, *nodes[2])); ASSERT_EQ(graph.NumberOfNodes(), 4); ASSERT_EQ(nodes[0]->GetOutputEdgesCount(), 3); ASSERT_EQ(nodes[3]->InputDefs().size(), 1); @@ -296,12 +296,61 @@ TEST(GraphUtils, TestMultiEdgeRemovalNodes) { ASSERT_TRUE(nodes[4]->InputDefs()[0]->Name() == "id_0_out"); // Remove id_0 - ASSERT_TRUE(graph_utils::RemoveSingleInputNode(graph, *nodes[0])); + ASSERT_TRUE(graph_utils::RemoveNode(graph, *nodes[0])); ASSERT_EQ(graph.NumberOfNodes(), 3); ASSERT_TRUE(nodes[1]->InputDefs()[0]->Name() == "id_0_in"); ASSERT_TRUE(nodes[3]->InputDefs()[0]->Name() == "id_0_in"); ASSERT_TRUE(nodes[4]->InputDefs()[0]->Name() == "id_0_in"); } +TEST(GraphUtils, TestMultiOutputRemoveNode) { + + Model model("MultiOutputRemovalGraph"); + auto& graph = model.MainGraph(); + + TypeProto float_tensor; + float_tensor.mutable_tensor_type()->set_elem_type(TensorProto_DataType_FLOAT); + float_tensor.mutable_tensor_type()->mutable_shape()->add_dim()->set_dim_value(1); + + TypeProto bool_tensor; + bool_tensor.mutable_tensor_type()->set_elem_type(TensorProto_DataType_BOOL); + bool_tensor.mutable_tensor_type()->mutable_shape()->add_dim()->set_dim_value(1); + + auto& do_0_in = graph.GetOrCreateNodeArg("do_0_in", &float_tensor); + auto& do_0_out = graph.GetOrCreateNodeArg("do_0_out", &float_tensor); + auto& do_0_out1 = graph.GetOrCreateNodeArg("do_0_out1", &bool_tensor); + auto& id_1_out = graph.GetOrCreateNodeArg("id_1_out", &float_tensor); + auto& id_2_out = graph.GetOrCreateNodeArg("id_2_out", &bool_tensor); + + std::vector nodes; + nodes.push_back(&graph.AddNode("do_0", "Dropout", "Dropout node 0", {&do_0_in}, {&do_0_out, &do_0_out1})); + nodes.push_back(&graph.AddNode("id_1", "Identity", "Identity node 1", {&do_0_out}, {&id_1_out})); + nodes.push_back(&graph.AddNode("id_2", "Identity", "Identity node 2", {&do_0_out1}, {&id_2_out})); + + std::vector graph_outputs; + graph_outputs.push_back(&id_2_out); + graph.SetOutputs(graph_outputs); + + auto status = graph.Resolve(); + ASSERT_TRUE(status.IsOK()) << status.ErrorMessage(); + + ASSERT_EQ(graph.NumberOfNodes(), 3); + + // Check inputs/outputs of do_0, id_1, id_2 + ASSERT_EQ(nodes[0]->GetOutputEdgesCount(), 2); + ASSERT_EQ(nodes[1]->GetInputEdgesCount(), 1); + ASSERT_EQ(nodes[2]->GetInputEdgesCount(), 1); + + // Try to remove do_0, which should return false + // because both outputs are consumed by downstream Operators. + ASSERT_FALSE(graph_utils::RemoveNode(graph, *nodes[0])); + + // Try removing do_0 after removing id_2, which should return true + // because it now has exactly one output consumed by downstream Operators. 
+ ASSERT_TRUE(graph_utils::RemoveNode(graph, *nodes[1])); + ASSERT_FALSE(graph_utils::IsOutputUsed(*nodes[0], 0)); + ASSERT_TRUE(graph_utils::RemoveNode(graph, *nodes[0])); +} + } // namespace test } // namespace onnxruntime diff --git a/onnxruntime/test/onnx/OrtValueList.h b/onnxruntime/test/onnx/OrtValueList.h index 391a21dc86ab0..a8de05f1f5b27 100644 --- a/onnxruntime/test/onnx/OrtValueList.h +++ b/onnxruntime/test/onnx/OrtValueList.h @@ -12,6 +12,7 @@ class OrtValueArray { std::vector values; public: + ORT_DISALLOW_COPY_ASSIGNMENT_AND_MOVE(OrtValueArray); //n must be non-negative OrtValueArray(int n) : values(static_cast(n), nullptr){}; ~OrtValueArray() { diff --git a/onnxruntime/test/onnx/TestCase.cc b/onnxruntime/test/onnx/TestCase.cc index bdf040aed93fc..d6ca87cc6bea2 100644 --- a/onnxruntime/test/onnx/TestCase.cc +++ b/onnxruntime/test/onnx/TestCase.cc @@ -522,7 +522,7 @@ void OnnxTestCase::LoadTestData(size_t id, HeapBuffer& b, std::unordered_map file_prefix = is_input ? ORT_TSTR("input_") : ORT_TSTR("output_"); - if (!filename_str.compare(0, file_prefix.length(), file_prefix.c_str())) { + if (!filename_str.compare(0, file_prefix.length(), file_prefix)) { std::basic_string p = ConcatPathComponent(dir_path, filename_str); test_data_pb_files.push_back(p); } diff --git a/onnxruntime/test/onnx/TestCase.h b/onnxruntime/test/onnx/TestCase.h index 5d7fe59453d78..56a5eb9dfe95f 100644 --- a/onnxruntime/test/onnx/TestCase.h +++ b/onnxruntime/test/onnx/TestCase.h @@ -5,6 +5,7 @@ #include #include #include +#include #include #include #include diff --git a/onnxruntime/test/onnx/main.cc b/onnxruntime/test/onnx/main.cc index 9ad622a74e0fe..008854e16c3ef 100644 --- a/onnxruntime/test/onnx/main.cc +++ b/onnxruntime/test/onnx/main.cc @@ -28,11 +28,13 @@ void usage() { "Options:\n" "\t-j [models]: Specifies the number of models to run simultaneously.\n" "\t-A : Disable memory arena\n" + "\t-M : Disable memory pattern\n" "\t-c [runs]: Specifies the number of Session::Run() to invoke simultaneously for each model.\n" "\t-r [repeat]: Specifies the number of times to repeat\n" "\t-v: verbose\n" "\t-n [test_case_name]: Specifies a single test case to run.\n" - "\t-e [EXECUTION_PROVIDER]: EXECUTION_PROVIDER could be 'cpu', 'cuda', 'mkldnn', 'tensorrt' or 'ngraph'. Default: 'cpu'.\n" + "\t-e [EXECUTION_PROVIDER]: EXECUTION_PROVIDER could be 'cpu', 'cuda', 'mkldnn', 'tensorrt' or 'ngraph'. 
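The new TestMultiOutputRemoveNode test depends on graph_utils::RemoveNode refusing to drop a node while more than one of its outputs is still consumed. Condensed, with dropout_node and identity_1 standing in for nodes[0] and nodes[1] from the test above:

// Dropout produces two outputs; both are consumed, so removal is rejected.
ASSERT_FALSE(graph_utils::RemoveNode(graph, *dropout_node));

// Remove the Identity that consumes the first Dropout output.
ASSERT_TRUE(graph_utils::RemoveNode(graph, *identity_1));
ASSERT_FALSE(graph_utils::IsOutputUsed(*dropout_node, 0));  // output 0 is no longer used

// Only one output is still consumed downstream, so the Dropout can now be removed.
ASSERT_TRUE(graph_utils::RemoveNode(graph, *dropout_node));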
" + "Default: 'cpu'.\n" "\t-x: Use parallel executor, default (without -x): sequential executor.\n" "\t-h: help\n"); } @@ -62,14 +64,14 @@ int GetNumCpuCores() { return processorCoreCount; } #else -int GetNumCpuCores() { return std::thread::hardware_concurrency(); } +int GetNumCpuCores() { return static_cast(std::thread::hardware_concurrency()); } #endif } // namespace #ifdef _WIN32 -int real_main(int argc, wchar_t* argv[], OrtEnv** p_env) { +int real_main(int argc, wchar_t* argv[], Ort::Env& env) { #else -int real_main(int argc, char* argv[], OrtEnv** p_env) { +int real_main(int argc, char* argv[], Ort::Env& env) { #endif // if this var is not empty, only run the tests with name in this list std::vector > whitelisted_test_cases; @@ -83,10 +85,11 @@ int real_main(int argc, char* argv[], OrtEnv** p_env) { bool enable_ngraph = false; bool enable_nuphar = false; bool enable_tensorrt = false; + bool enable_mem_pattern = true; OrtLoggingLevel logging_level = ORT_LOGGING_LEVEL_WARNING; { int ch; - while ((ch = getopt(argc, argv, ORT_TSTR("Ac:hj:m:n:r:e:xv"))) != -1) { + while ((ch = getopt(argc, argv, ORT_TSTR("Ac:hj:Mn:r:e:xv"))) != -1) { switch (ch) { case 'A': enable_cpu_mem_arena = false; @@ -115,8 +118,8 @@ int real_main(int argc, char* argv[], OrtEnv** p_env) { return -1; } break; - case 'm': - // ignore. + case 'M': + enable_mem_pattern = false; break; case 'n': // run only some whitelisted tests @@ -164,16 +167,14 @@ int real_main(int argc, char* argv[], OrtEnv** p_env) { usage(); return -1; } - OrtEnv* env; - { - OrtStatus* ost = OrtCreateEnv(logging_level, "Default", &env); - if (ost != nullptr) { - fprintf(stderr, "Error creating environment: %s \n", OrtGetErrorMessage(ost)); - OrtReleaseStatus(ost); - return -1; - } - *p_env = env; + + try { + env = Ort::Env{logging_level, "Default"}; + } catch (std::exception& ex) { + fprintf(stderr, "Error creating environment: %s \n", ex.what()); + return -1; } + std::vector > data_dirs; TestResultStat stat; @@ -184,11 +185,15 @@ int real_main(int argc, char* argv[], OrtEnv** p_env) { double per_sample_tolerance = 1e-3; // when cuda is enabled, set it to a larger value for resolving random MNIST test failure double relative_per_sample_tolerance = enable_cuda ? 
0.017 : 1e-3; - SessionOptionsWrapper sf(env); + Ort::SessionOptions sf; if (enable_cpu_mem_arena) sf.EnableCpuMemArena(); else sf.DisableCpuMemArena(); + if (enable_mem_pattern) + sf.EnableMemPattern(); + else + sf.DisableMemPattern(); if (enable_sequential_execution) sf.EnableSequentialExecution(); else @@ -240,14 +245,14 @@ int real_main(int argc, char* argv[], OrtEnv** p_env) { #if (defined(_WIN32) && !defined(_WIN64)) || (defined(__GNUG__) && !defined(__LP64__)) //Minimize mem consumption - LoadTests(data_dirs, whitelisted_test_cases, per_sample_tolerance, relative_per_sample_tolerance, [&stat, &sf, enable_cuda, &cuda_flaky_tests](ITestCase* l) { + LoadTests(data_dirs, whitelisted_test_cases, per_sample_tolerance, relative_per_sample_tolerance, [&stat, &sf, enable_cuda, &cuda_flaky_tests, &env](ITestCase* l) { std::unique_ptr test_case_ptr(l); if (enable_cuda && cuda_flaky_tests.find(l->GetTestCaseName()) != cuda_flaky_tests.end()) { return; } TestResultStat per_case_stat; std::vector per_case_tests = {l}; - TestEnv per_case_args(per_case_tests, per_case_stat, sf); + TestEnv per_case_args(per_case_tests, per_case_stat, env, sf); RunTests(per_case_args, 1, 1, 1, GetDefaultThreadPool(Env::Default())); stat += per_case_stat; }); @@ -265,7 +270,8 @@ int real_main(int argc, char* argv[], OrtEnv** p_env) { } } } - TestEnv args(tests, stat, sf); + + TestEnv args(tests, stat, env, sf); Status st = RunTests(args, p_models, concurrent_session_runs, static_cast(repeat_count), GetDefaultThreadPool(Env::Default())); if (!st.IsOK()) { @@ -315,10 +321,6 @@ int real_main(int argc, char* argv[], OrtEnv** p_env) { {"flatten_default_axis", "disable reason"}, {"gemm_broadcast", "disable reason"}, {"gemm_nobroadcast", "disable reason"}, - {"greater", "disable reason"}, - {"greater_bcast", "disable reason"}, - {"less", "disable reason"}, - {"less_bcast", "disable reason"}, {"matmul_2d", "disable reason"}, {"matmul_3d", "disable reason"}, {"matmul_4d", "disable reason"}, @@ -355,14 +357,15 @@ int real_main(int argc, char* argv[], OrtEnv** p_env) { {"tf_mobilenet_v2_1.4_224", "result mismatch"}, {"tf_mobilenet_v1_1.0_224", "result mismatch"}, {"mobilenetv2-1.0", "result mismatch"}, - {"mxnet_arcface", "result mismatch"}, - {"mod_float_mixed_sign_example", "faulty test"} + {"mxnet_arcface", "result mismatch"} }; #ifdef USE_NGRAPH broken_tests["dequantizelinear"] = "ambiguity in scalar dimensions [] vs [1]"; broken_tests["qlinearconv"] = "ambiguity in scalar dimensions [] vs [1]"; broken_tests["quantizelinear"] = "ambiguity in scalar dimensions [] vs [1]"; + broken_tests["tiny_yolov2"] = "temporarily disable due to graph resolve failure."; + broken_tests["operator_repeat_dim_overflow"] = "temporarily disable due to graph resolve failure."; #endif #ifdef USE_CUDA @@ -431,17 +434,16 @@ int wmain(int argc, wchar_t* argv[]) { #else int main(int argc, char* argv[]) { #endif - OrtEnv* env = nullptr; + Ort::Env env{nullptr}; int retval = -1; try { - retval = real_main(argc, argv, &env); + retval = real_main(argc, argv, env); } catch (std::exception& ex) { fprintf(stderr, "%s\n", ex.what()); retval = -1; } - if (env) { - OrtReleaseEnv(env); - } else { + // Release the protobuf library if we failed to create an env (the env will release it automatically on destruction) + if (!env) { ::google::protobuf::ShutdownProtobufLibrary(); } return retval; diff --git a/onnxruntime/test/onnx/microbenchmark/model_init.cc b/onnxruntime/test/onnx/microbenchmark/model_init.cc index f016c77230dac..5a841276e06bf 100644 --- 
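onnx_test_runner above now drives the public C++ API (Ort::Env, Ort::SessionOptions, Ort::Session) instead of SessionOptionsWrapper, and the new -M switch maps to DisableMemPattern(). The standalone shape of that setup, with the model path and header comment as placeholders:

// C++ API header (path assumed): "core/session/onnxruntime_cxx_api.h"
Ort::Env env{ORT_LOGGING_LEVEL_WARNING, "Default"};

Ort::SessionOptions sf;
sf.EnableCpuMemArena();          // -A disables this via DisableCpuMemArena()
sf.DisableMemPattern();          // what the new -M switch requests
sf.EnableSequentialExecution();  // -x switches to the parallel executor instead

// "model.onnx" is a placeholder; on Windows the path is a wide-character string.
Ort::Session session{env, "model.onnx", sf};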
a/onnxruntime/test/onnx/microbenchmark/model_init.cc +++ b/onnxruntime/test/onnx/microbenchmark/model_init.cc @@ -112,7 +112,7 @@ Status CreateKernelRegistryManagerFromModel(std::unique_ptr kernel_registry_manager = std::make_unique(); ORT_RETURN_IF_ERROR(kernel_registry_manager->RegisterKernels(*execution_providers)); - SessionState s{*execution_providers}; + SessionState s{*execution_providers, true}; s.SetLogger(logging::LoggingManager::DefaultLogger()); ORT_RETURN_IF_ERROR(model->MainGraph().Resolve()); @@ -179,7 +179,7 @@ static void BM_PartitionModel_tiny_yolo(benchmark::State& state) { for (auto _ : state) { state.PauseTiming(); std::shared_ptr model = std::make_shared(model_proto); - SessionState s{*execution_providers}; + SessionState s{*execution_providers, true}; s.SetLogger(logging::LoggingManager::DefaultLogger()); BM_BREAK_IF_ERROR(model->MainGraph().Resolve()); s.SetGraphViewer(std::make_unique(model->MainGraph())); @@ -209,7 +209,7 @@ static void BM_PartitionModel_inception_v4(benchmark::State& state) { for (auto _ : state) { state.PauseTiming(); std::shared_ptr model = std::make_shared(model_proto); - SessionState s{*execution_providers}; + SessionState s{*execution_providers, true}; s.SetLogger(logging::LoggingManager::DefaultLogger()); BM_BREAK_IF_ERROR(model->MainGraph().Resolve()); s.SetGraphViewer(std::make_unique(model->MainGraph())); diff --git a/onnxruntime/test/onnx/runner.cc b/onnxruntime/test/onnx/runner.cc index fc00f2ec5b4b2..b53ecf743ad9d 100644 --- a/onnxruntime/test/onnx/runner.cc +++ b/onnxruntime/test/onnx/runner.cc @@ -18,7 +18,7 @@ #include #endif -#include +#include #include "TestCase.h" #include "heap_buffer.h" #include "OrtValueList.h" @@ -30,11 +30,11 @@ using ::onnxruntime::common::Status; void ORT_CALLBACK RunTestCase(ORT_CALLBACK_INSTANCE pci, void* context, ORT_WORK work) { OnnxRuntimeCloseThreadpoolWork(work); assert(context != nullptr); - TestCaseTask* task((TestCaseTask*)context); + TestCaseTask* task(static_cast(context)); ITestCase* info = task->env.tests[task->task_id]; std::shared_ptr ret; try { - RunSingleTestCase(info, task->env.sf, task->concurrent_runs, task->repeat_count, task->pool, pci, [task](std::shared_ptr result, ORT_CALLBACK_INSTANCE pci) { + RunSingleTestCase(info, task->env.env, task->env.sf, task->concurrent_runs, task->repeat_count, task->pool, pci, [task](std::shared_ptr result, ORT_CALLBACK_INSTANCE pci) { return OnTestCaseFinished(pci, task, result); }); return; @@ -96,7 +96,7 @@ PTestRunner::PTestRunner(OrtSession* session1, void ORT_CALLBACK RunSingleDataItem(ORT_CALLBACK_INSTANCE instance, void* context, ORT_WORK work) { OnnxRuntimeCloseThreadpoolWork(work); - DataTask* task((DataTask*)context); + DataTask* task(static_cast(context)); PTestRunner* env = task->env; const size_t task_id = task->task_id; delete task; @@ -128,7 +128,7 @@ Status OnTestCaseFinished(ORT_CALLBACK_INSTANCE pci, TestCaseTask* task, std::sh //Do not run this function in the thread pool passed in static Status ParallelRunTests(TestEnv& env, int p_models, size_t current_runs, size_t repeat_count, PThreadPool pool) { - p_models = (int)std::min(p_models, env.tests.size()); + p_models = static_cast(std::min(p_models, env.tests.size())); LOGF_DEFAULT(ERROR, "Running tests in parallel: at most %d models at any time", p_models); env.next_test_to_run = p_models; for (int i = 0; i != p_models; ++i) { @@ -169,7 +169,7 @@ Status RunTests(TestEnv& env, int p_models, int concurrent_runs, size_t repeat_c ORT_EVENT ev; 
ORT_RETURN_IF_ERROR(CreateOnnxRuntimeEvent(&ev)); try { - RunSingleTestCase(env.tests[i], env.sf, concurrent_runs, repeat_count, tpool, nullptr, [repeat_count, &results, ev, concurrent_runs, test_case_name](std::shared_ptr result, ORT_CALLBACK_INSTANCE pci) { + RunSingleTestCase(env.tests[i], env.env, env.sf, concurrent_runs, repeat_count, tpool, nullptr, [repeat_count, &results, ev, concurrent_runs, test_case_name](std::shared_ptr result, ORT_CALLBACK_INSTANCE pci) { //TODO:output this information to a xml if (concurrent_runs == 1) { TIME_SPEC ts = result->GetSpentTime(); @@ -306,10 +306,6 @@ void DataRunner::RunTask(size_t task_id, ORT_CALLBACK_INSTANCE pci, bool store_r OnTaskFinished(task_id, res, pci); } -std::pair CompareGenericValue(const OrtValue* o, const OrtValue* expected_mlvalue, double per_sample_tolerance, double relative_per_sample_tolerance, - bool post_processing) { - return onnxruntime::CompareMLValue(*(MLValue*)o, *(MLValue*)expected_mlvalue, per_sample_tolerance, relative_per_sample_tolerance, post_processing); -} EXECUTE_RESULT DataRunner::RunTaskImpl(size_t task_id) { HeapBuffer holder; std::unordered_map feeds; @@ -338,7 +334,8 @@ EXECUTE_RESULT DataRunner::RunTaskImpl(size_t task_id) { ++input_index; } - TIME_SPEC start_time, end_time; + TIME_SPEC start_time; + TIME_SPEC end_time; OrtValueArray output_values(static_cast(output_count)); { std::vector output_names_raw_ptr(output_count); @@ -353,7 +350,8 @@ EXECUTE_RESULT DataRunner::RunTaskImpl(size_t task_id) { GetMonotonicTimeCounter(&end_time); AccumulateTimeSpec(&spent_time_, &start_time, &end_time); - double per_sample_tolerance, relative_per_sample_tolerance; + double per_sample_tolerance; + double relative_per_sample_tolerance; bool post_procesing; Status status; if (!(status = c_->GetPerSampleTolerance(&per_sample_tolerance)).IsOK()) { @@ -395,12 +393,14 @@ EXECUTE_RESULT DataRunner::RunTaskImpl(size_t task_id) { break; } OrtValue* actual_output_value = iter->second; - std::pair ret = CompareGenericValue(actual_output_value, expected_output_value, per_sample_tolerance, relative_per_sample_tolerance, post_procesing); + std::pair ret = + CompareOrtValue(*actual_output_value, *expected_output_value, per_sample_tolerance, + relative_per_sample_tolerance, post_procesing); COMPARE_RESULT compare_result = ret.first; if (compare_result == COMPARE_RESULT::SUCCESS) { const ONNX_NAMESPACE::ValueInfoProto* v = name_output_value_info_proto[output_name]; if (v == nullptr) continue; - ret = VerifyValueInfo(*v, actual_output_value); + ret = VerifyValueInfo(*v, Ort::Unowned{actual_output_value}); compare_result = ret.first; if (compare_result != COMPARE_RESULT::SUCCESS) { switch (compare_result) { @@ -458,16 +458,15 @@ void SeqTestRunner::Start(ORT_CALLBACK_INSTANCE pci, size_t) { finish(pci); } -void RunSingleTestCase(ITestCase* info, const onnxruntime::SessionOptionsWrapper& sf, size_t concurrent_runs, size_t repeat_count, PThreadPool tpool, ORT_CALLBACK_INSTANCE pci, TestCaseCallBack on_finished) { +void RunSingleTestCase(ITestCase* info, Ort::Env& env, const Ort::SessionOptions& sf, size_t concurrent_runs, size_t repeat_count, PThreadPool tpool, ORT_CALLBACK_INSTANCE pci, TestCaseCallBack on_finished) { std::shared_ptr ret; size_t data_count = info->GetDataCount(); try { DataRunner* r = nullptr; std::string node_name = info->GetNodeName(); - auto sf2 = sf.clone(); - sf2.SetSessionLogId(info->GetTestCaseName().c_str()); - std::unique_ptr session_object( - sf2.OrtCreateSession(info->GetModelUrl()), OrtReleaseSession); + auto 
sf2 = sf.Clone(); + sf2.SetLogId(info->GetTestCaseName().c_str()); + Ort::Session session_object{env, info->GetModelUrl(), sf2}; LOGF_DEFAULT(INFO, "testing %s\n", info->GetTestCaseName().c_str()); //temp hack. Because we have no resource control. We may not have enough memory to run this test in parallel if (info->GetTestCaseName() == "coreml_FNS-Candy_ImageNet") @@ -479,6 +478,13 @@ void RunSingleTestCase(ITestCase* info, const onnxruntime::SessionOptionsWrapper } r->Start(pci, concurrent_runs); return; + } catch (const Ort::Exception& ex) { + if (ex.GetOrtErrorCode() != ORT_NOT_IMPLEMENTED) + throw; + + LOGF_DEFAULT(ERROR, "Test %s failed:%s", info->GetTestCaseName().c_str(), ex.what()); + std::string node_name; + ret = std::make_shared(data_count, EXECUTE_RESULT::NOT_SUPPORT, ""); } catch (onnxruntime::NotImplementedException& ex) { LOGF_DEFAULT(ERROR, "Test %s failed:%s", info->GetTestCaseName().c_str(), ex.what()); std::string node_name; diff --git a/onnxruntime/test/onnx/runner.h b/onnxruntime/test/onnx/runner.h index a7821b5e9f351..9a26496b454f1 100644 --- a/onnxruntime/test/onnx/runner.h +++ b/onnxruntime/test/onnx/runner.h @@ -27,7 +27,6 @@ struct TestCaseTask { }; void ORT_CALLBACK RunTestCase(ORT_CALLBACK_INSTANCE instance, void* context, ORT_WORK work); -//TODO: implement this function for Linux void ORT_CALLBACK RunSingleDataItem(ORT_CALLBACK_INSTANCE instance, void* context, ORT_WORK work); ::onnxruntime::common::Status OnTestCaseFinished(ORT_CALLBACK_INSTANCE pci, TestCaseTask* task, std::shared_ptr result); @@ -136,5 +135,5 @@ void LoadTests(const std::vector>& input_paths //Do not run this function in the thread pool passed in ::onnxruntime::common::Status RunTests(TestEnv& env, int p_models, int concurrent_runs, size_t repeat_count, PThreadPool tpool); EXECUTE_RESULT StatusCodeToExecuteResult(int input); -void RunSingleTestCase(ITestCase* info, const onnxruntime::SessionOptionsWrapper& sf, size_t concurrent_runs, +void RunSingleTestCase(ITestCase* info, Ort::Env& env, const Ort::SessionOptions& sf, size_t concurrent_runs, size_t repeat_count, PThreadPool tpool, ORT_CALLBACK_INSTANCE pci, TestCaseCallBack on_finished); diff --git a/onnxruntime/test/onnx/simple_thread_pool.h b/onnxruntime/test/onnx/simple_thread_pool.h deleted file mode 100644 index 79b477ef33407..0000000000000 --- a/onnxruntime/test/onnx/simple_thread_pool.h +++ /dev/null @@ -1,154 +0,0 @@ -// Eigen is "a C++ template library for linear algebra: -//matrices, vectors, numerical solvers, and related algorithms." -//See http://eigen.tuxfamily.org/index.php?title=Main_Page. -//This material is licensed under the MPL v2.0. -// -// Copyright (C) 2014 Benoit Steiner -// -// This Source Code Form is subject to the terms of the Mozilla -// Public License v. 2.0. If a copy of the MPL was not distributed -// with this file, You can obtain one at http://mozilla.org/MPL/2.0/. - -#pragma once -#include -#include - -//: copied from Eigen, with just one tiny modification: remove the default vaule of the constructor of SimpleThreadPoolTempl -namespace onnxruntime { - -// The implementation of the ThreadPool type ensures that the Schedule method -// runs the functions it is provided in FIFO order when the scheduling is done -// by a single thread. -// Environment provides a way to create threads and also allows to intercept -// task submission and execution. -template -class SimpleThreadPoolTempl : public Eigen::ThreadPoolInterface { - public: - // Construct a pool that contains "num_threads" threads. 
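RunSingleTestCase above clones the shared Ort::SessionOptions per test case and treats an ORT_NOT_IMPLEMENTED error from the C++ API as "not supported" rather than as a hard failure. The shape of that pattern, with shared_options, test_case_name and model_path as stand-in names:

auto per_case_options = shared_options.Clone();
per_case_options.SetLogId(test_case_name.c_str());

try {
  Ort::Session session{env, model_path, per_case_options};
  // ... feed the loaded test data through the session and compare outputs ...
} catch (const Ort::Exception& ex) {
  if (ex.GetOrtErrorCode() != ORT_NOT_IMPLEMENTED)
    throw;  // a real failure: let it propagate
  // Otherwise record the case as NOT_SUPPORT and move on to the next test.
}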
- explicit SimpleThreadPoolTempl(int num_threads, const Environment& env) - : env_(env), threads_(num_threads), waiters_(num_threads) { - for (int i = 0; i < num_threads; i++) { - threads_.push_back(env.CreateThread([this, i]() { WorkerLoop(i); })); - } - } - - // Wait until all scheduled work has finished and then destroy the - // set of threads. - ~SimpleThreadPoolTempl() { - { - // Wait for all work to get done. - std::unique_lock l(mu_); - while (!pending_.empty()) { - empty_.wait(l); - } - exiting_ = true; - - // Wakeup all waiters. - for (auto w : waiters_) { - w->ready = true; - w->task.f = nullptr; - w->cv.notify_one(); - } - } - - // Wait for threads to finish. - for (auto t : threads_) { - delete t; - } - } - - // Schedule fn() for execution in the pool of threads. The functions are - // executed in the order in which they are scheduled. - void Schedule(std::function fn) final { - Task t = env_.CreateTask(std::move(fn)); - std::unique_lock l(mu_); - if (waiters_.empty()) { - pending_.push_back(std::move(t)); - } else { - Waiter* w = waiters_.back(); - waiters_.pop_back(); - w->ready = true; - w->task = std::move(t); - w->cv.notify_one(); - } - } - - int NumThreads() const final { - return static_cast(threads_.size()); - } - - int CurrentThreadId() const final { - const PerThread* pt = this->GetPerThread(); - if (pt->pool == this) { - return pt->thread_id; - } else { - return -1; - } - } - - protected: - void WorkerLoop(int thread_id) { - std::unique_lock l(mu_); - PerThread* pt = GetPerThread(); - pt->pool = this; - pt->thread_id = thread_id; - Waiter w; - Task t; - while (!exiting_) { - if (pending_.empty()) { - // Wait for work to be assigned to me - w.ready = false; - waiters_.push_back(&w); - while (!w.ready) { - w.cv.wait(l); - } - t = w.task; - w.task.f = nullptr; - } else { - // Pick up pending work - t = std::move(pending_.front()); - pending_.pop_front(); - if (pending_.empty()) { - empty_.notify_all(); - } - } - if (t.f) { - mu_.unlock(); - env_.ExecuteTask(t); - t.f = nullptr; - mu_.lock(); - } - } - } - - private: - typedef typename Environment::Task Task; - typedef typename Environment::EnvThread Thread; - - struct Waiter { - onnxruntime::OrtCondVar cv; - Task task; - bool ready; - }; - - struct PerThread { - constexpr PerThread() : pool(NULL), thread_id(-1) {} - SimpleThreadPoolTempl* pool; // Parent pool, or null for normal threads. - int thread_id; // Worker thread index in pool. - }; - - const Environment& env_; - onnxruntime::OrtMutex mu_; - Eigen::MaxSizeVector threads_; // All threads - Eigen::MaxSizeVector waiters_; // Stack of waiting threads. - std::deque pending_; // Queue of pending work - onnxruntime::OrtCondVar empty_; // Signaled on pending_.empty() - bool exiting_ = false; - - PerThread* GetPerThread() const { - EIGEN_THREAD_LOCAL PerThread per_thread; - return &per_thread; - } -}; - -} // namespace onnxruntime diff --git a/onnxruntime/test/onnx/sync_api.cc b/onnxruntime/test/onnx/sync_api.cc index 10265a6510a23..e75da8adc62ac 100644 --- a/onnxruntime/test/onnx/sync_api.cc +++ b/onnxruntime/test/onnx/sync_api.cc @@ -2,6 +2,7 @@ // Licensed under the MIT License. 
#include "sync_api.h" +#include #include #if defined(_MSC_VER) @@ -70,7 +71,7 @@ static std::once_flag default_pool_init; PThreadPool GetDefaultThreadPool(const onnxruntime::Env& env) { std::call_once(default_pool_init, [&env] { int core_num = env.GetNumCpuCores(); - default_pool.reset(new DefaultThreadPoolType(core_num)); + default_pool = std::make_unique(core_num); }); return default_pool.get(); } @@ -86,10 +87,9 @@ Status OnnxRuntimeSetEventWhenCallbackReturns(ORT_CALLBACK_INSTANCE pci, ORT_EVE } finish_event->finish_event_data.notify_all(); return Status::OK(); - } else { + } pci->AddEvent(finish_event); return Status::OK(); - } } void OnnxRuntimeCallbackInstance::AddEvent(ORT_EVENT event) { diff --git a/onnxruntime/test/onnx/testenv.cc b/onnxruntime/test/onnx/testenv.cc index 1826f60931283..b5cb918605928 100644 --- a/onnxruntime/test/onnx/testenv.cc +++ b/onnxruntime/test/onnx/testenv.cc @@ -5,11 +5,9 @@ #include "FixedCountFinishCallback.h" #include -using onnxruntime::SessionOptionsWrapper; - using onnxruntime::Status; -TestEnv::TestEnv(const std::vector& tests1, TestResultStat& stat1, SessionOptionsWrapper& sf1) - : tests(tests1), next_test_to_run(0), stat(stat1), finished(new FixedCountFinishCallback(static_cast(tests1.size()))), sf(sf1) { +TestEnv::TestEnv(const std::vector& tests1, TestResultStat& stat1, Ort::Env& env1, Ort::SessionOptions& sf1) + : tests(tests1), next_test_to_run(0), stat(stat1), finished(new FixedCountFinishCallback(static_cast(tests1.size()))), env(env1), sf(sf1) { } TestEnv::~TestEnv() { diff --git a/onnxruntime/test/onnx/testenv.h b/onnxruntime/test/onnx/testenv.h index 34b591f733b51..6bc3607da4398 100644 --- a/onnxruntime/test/onnx/testenv.h +++ b/onnxruntime/test/onnx/testenv.h @@ -20,8 +20,9 @@ class TestEnv { std::atomic_int next_test_to_run; TestResultStat& stat; FixedCountFinishCallback* finished; - const onnxruntime::SessionOptionsWrapper& sf; - TestEnv(const std::vector& tests, TestResultStat& stat1, onnxruntime::SessionOptionsWrapper& sf1); + Ort::Env& env; + const Ort::SessionOptions& sf; + TestEnv(const std::vector& tests, TestResultStat& stat1, Ort::Env& env, Ort::SessionOptions& sf1); ~TestEnv(); private: diff --git a/onnxruntime/test/optimizer/dummy_graph_transformer.h b/onnxruntime/test/optimizer/dummy_graph_transformer.h index b89f6c860be85..1bff4af37fcf3 100644 --- a/onnxruntime/test/optimizer/dummy_graph_transformer.h +++ b/onnxruntime/test/optimizer/dummy_graph_transformer.h @@ -47,7 +47,7 @@ class DummyRewriteRule : public RewriteRule { return true; } - Status Apply(Graph& /*graph*/, Node& /*node*/, bool& /*modified*/, bool& /*deleted*/) override { + Status Apply(Graph& /*graph*/, Node& /*node*/, RewriteRuleEffect& /*rule_effect*/) override { rewrite_rule_invoked_ = true; return Status::OK(); } diff --git a/onnxruntime/test/optimizer/graph_transform_test.cc b/onnxruntime/test/optimizer/graph_transform_test.cc index a2d96d6730fce..1d2facbbd7b30 100644 --- a/onnxruntime/test/optimizer/graph_transform_test.cc +++ b/onnxruntime/test/optimizer/graph_transform_test.cc @@ -7,6 +7,7 @@ #include "core/optimizer/graph_transformer.h" #include "core/optimizer/graph_transformer_mgr.h" #include "core/optimizer/identity_elimination.h" +#include "core/optimizer/dropout_elimination.h" #include "core/optimizer/slice_elimination.h" #include "core/optimizer/unsqueeze_elimination.h" #include "core/optimizer/conv_bn_fusion.h" @@ -52,16 +53,40 @@ TEST(GraphTransformationTests, IdentityElimination) { std::map op_to_count = CountOpsInGraph(graph); 
ASSERT_TRUE(op_to_count["Identity"] == 1); - auto rule_transformer = std::make_unique("RuleTransformer1"); - rule_transformer->Register(std::make_unique()); + auto rule_transformer_L1 = std::make_unique("RuleTransformer1"); + rule_transformer_L1->Register(std::make_unique()); onnxruntime::GraphTransformerManager graph_transformation_mgr{5}; - graph_transformation_mgr.Register(std::move(rule_transformer), TransformerLevel::Level1); + graph_transformation_mgr.Register(std::move(rule_transformer_L1), TransformerLevel::Level1); ASSERT_TRUE(graph_transformation_mgr.ApplyTransformers(graph, TransformerLevel::Level1).IsOK()); op_to_count = CountOpsInGraph(graph); ASSERT_TRUE(op_to_count["Identity"] == 0); } +TEST(GraphTransformationTests, DropoutEliminationSingleOutput) { + string model_uri = MODEL_FOLDER + "dropout.onnx"; + std::shared_ptr model; + ASSERT_TRUE(Model::Load(model_uri, model).IsOK()); + Graph& graph = model->MainGraph(); + std::map op_to_count = CountOpsInGraph(graph); + ASSERT_TRUE(op_to_count["Identity"] == 5); + ASSERT_TRUE(op_to_count["Dropout"] == 6); + + auto rule_transformer_L1 = std::make_unique("RuleTransformer1"); + rule_transformer_L1->Register(std::make_unique()); + onnxruntime::GraphTransformerManager graph_transformation_mgr{5}; + graph_transformation_mgr.Register(std::move(rule_transformer_L1), TransformerLevel::Level1); + ASSERT_TRUE(graph_transformation_mgr.ApplyTransformers(graph, TransformerLevel::Level1).IsOK()); + + op_to_count = CountOpsInGraph(graph); + // Of the 6 Dropout nodes in the graph, all but the ones named `d1` and `d6` should have been removed. + // A Dropout node can be removed if its second, optional output `mask` is either missing or unused downstream. + // `d1` cannot be removed because an Identity node has its `mask` output as an input; + // `d6` cannot be removed because its `mask` output is marked as a graph output. 
+ ASSERT_TRUE(op_to_count["Identity"] == 5); + ASSERT_TRUE(op_to_count["Dropout"] == 2); +} + TEST(GraphTransformationTests, SliceElimination) { string model_uri = MODEL_FOLDER + "slice-elim.onnx"; std::shared_ptr model; @@ -70,10 +95,10 @@ TEST(GraphTransformationTests, SliceElimination) { std::map op_to_count = CountOpsInGraph(graph); ASSERT_TRUE(op_to_count["Slice"] == 5); - auto rule_transformer = std::make_unique("RuleTransformer1"); - rule_transformer->Register(std::make_unique()); + auto rule_transformer_L1 = std::make_unique("RuleTransformer1"); + rule_transformer_L1->Register(std::make_unique()); onnxruntime::GraphTransformerManager graph_transformation_mgr{5}; - graph_transformation_mgr.Register(std::move(rule_transformer), TransformerLevel::Level1); + graph_transformation_mgr.Register(std::move(rule_transformer_L1), TransformerLevel::Level1); ASSERT_TRUE(graph_transformation_mgr.ApplyTransformers(graph, TransformerLevel::Level1).IsOK()); op_to_count = CountOpsInGraph(graph); @@ -116,7 +141,7 @@ TEST(GraphTransformationTests, SubgraphWithConstantInputs) { RunOptions run_options; std::vector output_names = {"output"}; - std::vector fetches; + std::vector fetches; ASSERT_TRUE(session_object.Run(run_options, feeds, output_names, &fetches).IsOK()); } @@ -129,7 +154,9 @@ TEST(GraphTransformationTests, FuseConvBNNoBias) { Graph& graph = p_model->MainGraph(); onnxruntime::GraphTransformerManager graph_transformation_mgr{5}; - graph_transformation_mgr.Register(std::make_unique(), TransformerLevel::Level2); + auto rule_transformer_L2 = std::make_unique("RuleTransformerL2"); + rule_transformer_L2->Register(std::make_unique()); + graph_transformation_mgr.Register(std::move(rule_transformer_L2), TransformerLevel::Level2); ASSERT_TRUE(graph_transformation_mgr.ApplyTransformers(graph, TransformerLevel::Level2).IsOK()); @@ -148,12 +175,15 @@ TEST(GraphTransformationTests, FuseConvBNMulAddUnsqueeze) { Graph& graph = p_model->MainGraph(); onnxruntime::GraphTransformerManager graph_transformation_mgr{5}; - auto rule_transformer = std::make_unique("RuleTransformer1"); - rule_transformer->Register(std::make_unique()); - graph_transformation_mgr.Register(std::move(rule_transformer), TransformerLevel::Level1); - graph_transformation_mgr.Register(std::make_unique(), TransformerLevel::Level2); - graph_transformation_mgr.Register(std::make_unique(), TransformerLevel::Level2); - graph_transformation_mgr.Register(std::make_unique(), TransformerLevel::Level2); + auto rule_transformer_L1 = std::make_unique("RuleTransformer1"); + rule_transformer_L1->Register(std::make_unique()); + graph_transformation_mgr.Register(std::move(rule_transformer_L1), TransformerLevel::Level1); + + auto rule_transformer_L2 = std::make_unique("RuleTransformerL2"); + rule_transformer_L2->Register(std::make_unique()); + rule_transformer_L2->Register(std::make_unique()); + rule_transformer_L2->Register(std::make_unique()); + graph_transformation_mgr.Register(std::move(rule_transformer_L2), TransformerLevel::Level2); ASSERT_TRUE(graph_transformation_mgr.ApplyTransformers(graph, TransformerLevel::Level1).IsOK()); ASSERT_TRUE(graph_transformation_mgr.ApplyTransformers(graph, TransformerLevel::Level2).IsOK()); @@ -201,10 +231,13 @@ TEST(GraphTransformationTests, FuseConvMulNoBias) { Graph& graph = p_model->MainGraph(); onnxruntime::GraphTransformerManager graph_transformation_mgr{5}; - auto rule_transformer = std::make_unique("RuleTransformer1"); - rule_transformer->Register(std::make_unique()); - 
graph_transformation_mgr.Register(std::move(rule_transformer), TransformerLevel::Level1); - graph_transformation_mgr.Register(std::make_unique(), TransformerLevel::Level2); + auto rule_transformer_L1 = std::make_unique("RuleTransformer1"); + rule_transformer_L1->Register(std::make_unique()); + graph_transformation_mgr.Register(std::move(rule_transformer_L1), TransformerLevel::Level1); + + auto rule_transformer_L2 = std::make_unique("RuleTransformerL2"); + rule_transformer_L2->Register(std::make_unique()); + graph_transformation_mgr.Register(std::move(rule_transformer_L2), TransformerLevel::Level2); ASSERT_TRUE(graph_transformation_mgr.ApplyTransformers(graph, TransformerLevel::Level1).IsOK()); ASSERT_TRUE(graph_transformation_mgr.ApplyTransformers(graph, TransformerLevel::Level2).IsOK()); @@ -222,10 +255,13 @@ TEST(GraphTransformationTests, FuseConvAddNoBias) { Graph& graph = p_model->MainGraph(); onnxruntime::GraphTransformerManager graph_transformation_mgr{5}; - auto rule_transformer = std::make_unique("RuleTransformer1"); - rule_transformer->Register(std::make_unique()); - graph_transformation_mgr.Register(std::move(rule_transformer), TransformerLevel::Level1); - graph_transformation_mgr.Register(std::make_unique(), TransformerLevel::Level2); + auto rule_transformer_L1 = std::make_unique("RuleTransformer1"); + rule_transformer_L1->Register(std::make_unique()); + graph_transformation_mgr.Register(std::move(rule_transformer_L1), TransformerLevel::Level1); + + auto rule_transformer_L2 = std::make_unique("RuleTransformerL2"); + rule_transformer_L2->Register(std::make_unique()); + graph_transformation_mgr.Register(std::move(rule_transformer_L2), TransformerLevel::Level2); ASSERT_TRUE(graph_transformation_mgr.ApplyTransformers(graph, TransformerLevel::Level1).IsOK()); ASSERT_TRUE(graph_transformation_mgr.ApplyTransformers(graph, TransformerLevel::Level2).IsOK()); @@ -243,8 +279,30 @@ TEST(GraphTransformationTests, FuseConvAddMul3D) { Graph& graph = p_model->MainGraph(); onnxruntime::GraphTransformerManager graph_transformation_mgr{5}; - graph_transformation_mgr.Register(std::make_unique(), TransformerLevel::Level2); - graph_transformation_mgr.Register(std::make_unique(), TransformerLevel::Level2); + auto rule_transformer_L2 = std::make_unique("RuleTransformerL2"); + rule_transformer_L2->Register(std::make_unique()); + rule_transformer_L2->Register(std::make_unique()); + graph_transformation_mgr.Register(std::move(rule_transformer_L2), TransformerLevel::Level2); + + ASSERT_TRUE(graph_transformation_mgr.ApplyTransformers(graph, TransformerLevel::Level2).IsOK()); + + std::map op_to_count = CountOpsInGraph(graph); + ASSERT_TRUE(op_to_count["Add"] == 0); + ASSERT_TRUE(op_to_count["Mul"] == 0); +} + +TEST(GraphTransformationTests, FuseConvAddMul3D_2) { + string model_uri = MODEL_FOLDER + "fusion/fuse-conv-add-mul-3d-2.onnx"; + + std::shared_ptr p_model; + ASSERT_TRUE(Model::Load(model_uri, p_model).IsOK()); + Graph& graph = p_model->MainGraph(); + + onnxruntime::GraphTransformerManager graph_transformation_mgr{5}; + auto rule_transformer_L2 = std::make_unique("RuleTransformerL2"); + rule_transformer_L2->Register(std::make_unique()); + rule_transformer_L2->Register(std::make_unique()); + graph_transformation_mgr.Register(std::move(rule_transformer_L2), TransformerLevel::Level2); ASSERT_TRUE(graph_transformation_mgr.ApplyTransformers(graph, TransformerLevel::Level2).IsOK()); @@ -315,19 +373,18 @@ TEST(GraphTransformationTests, FuseConvBnAddMulFloat16) { std::shared_ptr p_model; 
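The fusion tests above now register ConvBNFusion, ConvMulFusion and ConvAddFusion as rewrite rules inside a Level2 RuleBasedGraphTransformer instead of registering each transformer directly with the manager. A sketch of that registration, assuming the rule classes carry the names suggested by the includes at the top of graph_transform_test.cc:

onnxruntime::GraphTransformerManager graph_transformation_mgr{5};

auto rule_transformer_L2 = std::make_unique<RuleBasedGraphTransformer>("RuleTransformerL2");
rule_transformer_L2->Register(std::make_unique<ConvBNFusion>());
rule_transformer_L2->Register(std::make_unique<ConvMulFusion>());
rule_transformer_L2->Register(std::make_unique<ConvAddFusion>());
graph_transformation_mgr.Register(std::move(rule_transformer_L2), TransformerLevel::Level2);

ASSERT_TRUE(graph_transformation_mgr.ApplyTransformers(graph, TransformerLevel::Level2).IsOK());

// After the fused Conv absorbs the BN/Mul/Add nodes, none of them should remain in the graph.
std::map<std::string, int> op_to_count = CountOpsInGraph(graph);
ASSERT_TRUE(op_to_count["BatchNormalization"] == 0);
ASSERT_TRUE(op_to_count["Mul"] == 0);
ASSERT_TRUE(op_to_count["Add"] == 0);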
ASSERT_TRUE(Model::Load(model_uri, p_model).IsOK()); - std::unique_ptr ConvBNFusion_transformer = std::make_unique(); - std::unique_ptr ConvMulFusion_transformer = std::make_unique(); - std::unique_ptr ConvAddFusion_transformer = std::make_unique(); - session_object.RegisterGraphTransformer(std::move(ConvBNFusion_transformer)); - session_object.RegisterGraphTransformer(std::move(ConvMulFusion_transformer)); - session_object.RegisterGraphTransformer(std::move(ConvAddFusion_transformer)); + auto rule_transformer_L2 = std::make_unique("RuleTransformerL2"); + rule_transformer_L2->Register(std::make_unique()); + rule_transformer_L2->Register(std::make_unique()); + rule_transformer_L2->Register(std::make_unique()); + session_object.RegisterGraphTransformer(std::move(rule_transformer_L2), TransformerLevel::Level2); ASSERT_TRUE(session_object.Initialize().IsOK()); NameMLValMap feeds; RunOptions run_options; run_options.run_tag = "one session/one tag"; - MLValue ml_value_x; + OrtValue ml_value_x; auto x_f = MLFloat16(math::floatToHalf(1.0)); std::vector dims_x = {1, 1, 3, 3}; @@ -335,12 +392,13 @@ TEST(GraphTransformationTests, FuseConvBnAddMulFloat16) { for (int i = 0; i < 9; ++i) { values_x.push_back(x_f); } - CreateMLValue(TestCPUExecutionProvider()->GetAllocator(0, OrtMemTypeDefault), dims_x, values_x, &ml_value_x); + CreateMLValue(TestCPUExecutionProvider()->GetAllocator(0, OrtMemTypeDefault), + dims_x, values_x, &ml_value_x); feeds.insert(std::make_pair("X", ml_value_x)); std::vector output_names; output_names.push_back("PROD"); - std::vector fetches; + std::vector fetches; ASSERT_TRUE(session_object.Run(run_options, feeds, output_names, &fetches).IsOK()); @@ -355,7 +413,8 @@ TEST(GraphTransformationTests, FuseConvBnAddMulFloat16) { auto& rtensor = fetches.front().Get(); TensorShape expected_shape(expected_dims_prod); ASSERT_EQ(expected_shape, rtensor.Shape()); - const std::vector found(rtensor.template Data(), rtensor.template Data() + expected_dims_prod.size()); + const std::vector found(rtensor.template Data(), + rtensor.template Data() + expected_dims_prod.size()); ASSERT_EQ(expected_values_prod, found); } diff --git a/onnxruntime/test/optimizer/graph_transform_utils_test.cc b/onnxruntime/test/optimizer/graph_transform_utils_test.cc index 8d3e2a0d99ae6..b554224b0b209 100644 --- a/onnxruntime/test/optimizer/graph_transform_utils_test.cc +++ b/onnxruntime/test/optimizer/graph_transform_utils_test.cc @@ -34,30 +34,11 @@ TEST(GraphTransformerUtilsTests, TestGenerateRewriterules) { } TEST(GraphTransformerUtilsTests, TestGenerateGraphTransformers) { - auto transformers = transformer_utils::GenerateTransformers(TransformerLevel::Level2); - ASSERT_TRUE(transformers.size() != 0); - - // Transformer name match test - std::vector custom_list = {"EliminateIdentity", "ConvAddFusion", "ConvMulFusion", "abc", "def"}; - transformers = transformer_utils::GenerateTransformers(TransformerLevel::Level2, custom_list); - ASSERT_TRUE(transformers.size() == 2); - // validate each rule returned is present in the custom list - for (const auto& transformer : transformers) { - ASSERT_TRUE(std::find(custom_list.begin(), custom_list.end(), transformer->Name()) != custom_list.end()); - } - - // Transformer name no match test. When there is no match empty list is expected. 
- custom_list = {"EliminateIdentity"}; - transformers = transformer_utils::GenerateTransformers(TransformerLevel::Level2, custom_list); - ASSERT_TRUE(transformers.size() == 0); -} - -TEST(GraphTransformerUtilsTests, TestGenerateGraphTransformers_CustomList) { // custom list of rules and transformers std::string l1_rule1 = "EliminateIdentity"; std::string l1_transformer = "ConstantFolding"; - std::string l2_transformer = "ConvAddFusion"; - std::vector custom_list = {l1_rule1, l1_transformer, l2_transformer}; + std::string l2_rule1 = "ConvBNFusion"; + std::vector custom_list = {l1_rule1, l1_transformer, l2_rule1}; auto transformers = transformer_utils::GenerateTransformers(TransformerLevel::Level1, custom_list); ASSERT_TRUE(transformers.size() == 2); @@ -72,7 +53,8 @@ TEST(GraphTransformerUtilsTests, TestGenerateGraphTransformers_CustomList) { transformers = transformer_utils::GenerateTransformers(TransformerLevel::Level2, custom_list); ASSERT_TRUE(transformers.size() == 1); - ASSERT_TRUE(transformers[0]->Name() == l2_transformer); + rule_transformer = dynamic_cast(transformers[0].get()); + ASSERT_TRUE(rule_transformer->RulesCount() == 1); } } // namespace test diff --git a/onnxruntime/test/optimizer/optimizer_test.cc b/onnxruntime/test/optimizer/optimizer_test.cc index 268b908bb96bb..020e322051810 100644 --- a/onnxruntime/test/optimizer/optimizer_test.cc +++ b/onnxruntime/test/optimizer/optimizer_test.cc @@ -78,7 +78,7 @@ TEST(OptimizerTest, Basic) { kernel->Compute(&op_kernel_context); - std::vector fetches; + std::vector fetches; frame.GetOutputs(fetches); auto& tensor = fetches[0].Get(); const std::vector found(tensor.template Data(), tensor.template Data() + tensor_dim); diff --git a/onnxruntime/test/perftest/main.cc b/onnxruntime/test/perftest/main.cc index 8a5cbfe6fa25a..2b5483f17d36d 100644 --- a/onnxruntime/test/perftest/main.cc +++ b/onnxruntime/test/perftest/main.cc @@ -3,7 +3,7 @@ // onnxruntime dependencies #include - +#include #include "command_args_parser.h" #include "performance_runner.h" @@ -30,7 +30,8 @@ int real_main(int argc, char* argv[], OrtEnv** p_env) { } *p_env = env; } - perftest::PerformanceRunner perf_runner(env, test_config); + std::random_device rd; + perftest::PerformanceRunner perf_runner(env, test_config, rd); auto status = perf_runner.Run(); if (!status.IsOK()) { printf("Run failed:%s\n", status.ErrorMessage().c_str()); diff --git a/onnxruntime/test/perftest/ort_test_session.cc b/onnxruntime/test/perftest/ort_test_session.cc index f20edd29149fd..12d0dd1639165 100644 --- a/onnxruntime/test/perftest/ort_test_session.cc +++ b/onnxruntime/test/perftest/ort_test_session.cc @@ -11,10 +11,13 @@ namespace onnxruntime { namespace perftest { -std::chrono::duration OnnxRuntimeTestSession::Run(const OrtValue* const* input) { +std::chrono::duration OnnxRuntimeTestSession::Run() { + //Randomly pick one OrtValueArray from test_inputs_. 
(NOT ThreadSafe) + const std::uniform_int_distribution::param_type p(0, static_cast(test_inputs_.size() - 1)); + const size_t id = static_cast(dist_(rand_engine_, p)); + OrtValueArray* const input = test_inputs_.at(id); auto start = std::chrono::high_resolution_clock::now(); - - ORT_THROW_ON_ERROR(OrtRun(session_object_, nullptr, input_names_.data(), input, input_names_.size(), + ORT_THROW_ON_ERROR(OrtRun(session_object_, nullptr, input_names_.data(), input->Data(), input_names_.size(), output_names_raw_ptr.data(), output_names_raw_ptr.size(), output_values_.data())); auto end = std::chrono::high_resolution_clock::now(); std::chrono::duration duration_seconds = end - start; @@ -25,9 +28,10 @@ std::chrono::duration OnnxRuntimeTestSession::Run(const OrtValue* const* return duration_seconds; } -OnnxRuntimeTestSession::OnnxRuntimeTestSession(OrtEnv* env, const PerformanceTestConfig& performance_test_config, +OnnxRuntimeTestSession::OnnxRuntimeTestSession(OrtEnv* env, std::random_device& rd, + const PerformanceTestConfig& performance_test_config, const TestModelInfo* m) - : input_names_(m->GetInputCount()) { + : rand_engine_(rd()), input_names_(m->GetInputCount()), input_length_(m->GetInputCount()) { SessionOptionsWrapper sf(env); const bool enable_cpu_mem_arena = true; const std::string& provider_name = performance_test_config.machine_config.provider_type_name; diff --git a/onnxruntime/test/perftest/ort_test_session.h b/onnxruntime/test/perftest/ort_test_session.h index debec792f1a8f..3264a2a92d330 100644 --- a/onnxruntime/test/perftest/ort_test_session.h +++ b/onnxruntime/test/perftest/ort_test_session.h @@ -3,6 +3,7 @@ #pragma once #include +#include #include "test_configuration.h" #include "test_session.h" class TestModelInfo; @@ -10,7 +11,18 @@ namespace onnxruntime { namespace perftest { class OnnxRuntimeTestSession : public TestSession { public: - OnnxRuntimeTestSession(OrtEnv* env, const PerformanceTestConfig& performance_test_config, const TestModelInfo* m); + OnnxRuntimeTestSession(OrtEnv* env, std::random_device& rd, const PerformanceTestConfig& performance_test_config, + const TestModelInfo* m); + + void PreLoadTestData(size_t test_data_id, size_t input_id, OrtValue* value) override { + if (test_inputs_.size() < test_data_id + 1) { + test_inputs_.resize(test_data_id + 1); + } + if (test_inputs_.at(test_data_id) == nullptr) { + test_inputs_[test_data_id] = new OrtValueArray(input_length_); + } + test_inputs_[test_data_id]->Set(input_id, value); + } ~OnnxRuntimeTestSession() override { if (session_object_ != nullptr) OrtReleaseSession(session_object_); @@ -18,18 +30,22 @@ class OnnxRuntimeTestSession : public TestSession { free(p); } } - std::chrono::duration Run(const OrtValue* const* input) override; + std::chrono::duration Run() override; ORT_DISALLOW_COPY_ASSIGNMENT_AND_MOVE(OnnxRuntimeTestSession); private: OrtSession* session_object_ = nullptr; + std::mt19937 rand_engine_; + std::uniform_int_distribution dist_; + std::vector test_inputs_; std::vector output_names_; // The same size with output_names_. 
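OnnxRuntimeTestSession::Run() above now picks one of the preloaded OrtValueArray inputs at random on every call instead of always reusing a single feed. A sketch of that selection, assuming the distribution is over int indices as suggested by the member declarations:

// Members from ort_test_session.h (template arguments assumed):
std::mt19937 rand_engine_;                  // seeded from a std::random_device in the constructor
std::uniform_int_distribution<int> dist_;
std::vector<OrtValueArray*> test_inputs_;   // filled by PreLoadTestData()

// Inside Run(): pass an explicit param_type so the distribution's stored range is never relied on.
const std::uniform_int_distribution<int>::param_type p(0, static_cast<int>(test_inputs_.size() - 1));
const size_t id = static_cast<size_t>(dist_(rand_engine_, p));
OrtValueArray* const input = test_inputs_.at(id);  // not thread safe, as the comment above notes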
// TODO: implement a customized allocator, then we can remove output_names_ to simplify this code std::vector output_names_raw_ptr; std::vector output_values_; std::vector input_names_; + const int input_length_; }; } // namespace perftest diff --git a/onnxruntime/test/perftest/performance_runner.cc b/onnxruntime/test/perftest/performance_runner.cc index e3a8d06a2558e..51adecc478a8f 100644 --- a/onnxruntime/test/perftest/performance_runner.cc +++ b/onnxruntime/test/perftest/performance_runner.cc @@ -47,7 +47,7 @@ Status PerformanceRunner::Run() { } // warm up - RunOneIteration(true /*isWarmup*/); + RunOneIteration(); // TODO: start profiling // if (!performance_test_config_.run_config.profile_file.empty()) @@ -76,20 +76,6 @@ Status PerformanceRunner::Run() { return Status::OK(); } -Status PerformanceRunner::RunOneIteration(bool isWarmup) { - std::chrono::duration duration_seconds = session_->Run(inputs_.Data()); - if (!isWarmup) { - std::lock_guard guard(results_mutex_); - performance_result_.time_costs.emplace_back(duration_seconds.count()); - performance_result_.total_time_cost += duration_seconds.count(); - if (performance_test_config_.run_config.f_verbose) { - std::cout << "iteration:" << performance_result_.time_costs.size() << "," - << "time_cost:" << performance_result_.time_costs.back() << std::endl; - } - } - return Status::OK(); -} - Status PerformanceRunner::FixDurationTest() { if (performance_test_config_.run_config.concurrent_session_runs <= 1) { return RunFixDuration(); @@ -120,11 +106,11 @@ Status PerformanceRunner::RunParallelDuration() { do { // We will queue work as deep as requested, ignoring the size of the threadpool itself int count = counter.load(std::memory_order_seq_cst); - while (count < performance_test_config_.run_config.concurrent_session_runs) { + while (count < static_cast(performance_test_config_.run_config.concurrent_session_runs)) { count++; counter++; tpool->Schedule([this, &counter, &m, &cv]() { - RunOneIteration(); + session_->ThreadSafeRun(); // Simplified version of Eigen::Barrier std::lock_guard lg(m); counter--; @@ -157,7 +143,7 @@ Status PerformanceRunner::ForkJoinRepeat() { for (size_t i = 0; i != performance_test_config_.run_config.concurrent_session_runs; ++i) { counter++; tpool->Schedule([this, &counter, &m, &cv]() { - RunOneIteration(); + session_->ThreadSafeRun(); // Simplified version of Eigen::Barrier std::lock_guard lg(m); counter--; @@ -182,23 +168,23 @@ static TestModelInfo* CreateModelInfo(const PerformanceTestConfig& performance_t ORT_NOT_IMPLEMENTED(ToMBString(performance_test_config_.backend), " is not supported"); } -static TestSession* CreateSession(OrtEnv* env, const PerformanceTestConfig& performance_test_config_, +static TestSession* CreateSession(OrtEnv* env, std::random_device& rd, + const PerformanceTestConfig& performance_test_config_, TestModelInfo* test_model_info) { if (CompareCString(performance_test_config_.backend.c_str(), ORT_TSTR("ort")) == 0) { - return new OnnxRuntimeTestSession(env, performance_test_config_, test_model_info); + return new OnnxRuntimeTestSession(env, rd, performance_test_config_, test_model_info); } #ifdef HAVE_TENSORFLOW if (CompareCString(performance_test_config_.backend.c_str(), ORT_TSTR("tf")) == 0) { - return new TensorflowTestSession(performance_test_config_, test_model_info); + return new TensorflowTestSession(rd, performance_test_config_, test_model_info); } #endif ORT_NOT_IMPLEMENTED(ToMBString(performance_test_config_.backend), " is not supported"); } 
-PerformanceRunner::PerformanceRunner(OrtEnv* env, const PerformanceTestConfig& test_config) +PerformanceRunner::PerformanceRunner(OrtEnv* env, const PerformanceTestConfig& test_config, std::random_device& rd) : performance_test_config_(test_config), test_model_info_(CreateModelInfo(test_config)), - session_(CreateSession(env, test_config, test_model_info_)), - inputs_(test_model_info_->GetInputCount()) {} + session_(CreateSession(env, rd, test_config, test_model_info_)) {} PerformanceRunner::~PerformanceRunner() = default; @@ -220,22 +206,25 @@ bool PerformanceRunner::Initialize() { test_case_.reset(CreateOnnxTestCase(narrow_model_name, test_model_info_, 0.0, 0.0)); // TODO: Place input tensor on cpu memory if mkldnn provider type to avoid CopyTensor logic in CopyInputAcrossDevices - if (test_case_->GetDataCount() <= 0) { + size_t test_data_count = test_case_->GetDataCount(); + if (test_data_count == 0) { std::cout << "there is no test data for model " << test_case_->GetTestCaseName() << std::endl; return false; } - std::unordered_map feeds; - test_case_->LoadTestData(0 /* id */, b_, feeds, true); - // Discard the names in feeds - int input_count = test_model_info_->GetInputCount(); - for (int i = 0; i != input_count; ++i) { - auto iter = feeds.find(test_model_info_->GetInputName(i)); - if (iter == feeds.end()) { - std::cout << "there is no test input data for input " << test_model_info_->GetInputName(i) << " and model " - << test_case_->GetTestCaseName() << std::endl; - return false; + for (size_t test_data_id = 0; test_data_id != test_data_count; ++test_data_id) { + std::unordered_map feeds; + test_case_->LoadTestData(test_data_id /* id */, b_, feeds, true); + // Discard the names in feeds + int input_count = test_model_info_->GetInputCount(); + for (int i = 0; i != input_count; ++i) { + auto iter = feeds.find(test_model_info_->GetInputName(i)); + if (iter == feeds.end()) { + std::cout << "there is no test input data for input " << test_model_info_->GetInputName(i) << " and model " + << test_case_->GetTestCaseName() << std::endl; + return false; + } + session_->PreLoadTestData(test_data_id, static_cast(i), iter->second); } - inputs_.Set(static_cast(i), iter->second); } test_case_.reset(nullptr); test_model_info_ = nullptr; diff --git a/onnxruntime/test/perftest/performance_runner.h b/onnxruntime/test/perftest/performance_runner.h index 12af387f54c1f..258d4847e2777 100644 --- a/onnxruntime/test/perftest/performance_runner.h +++ b/onnxruntime/test/perftest/performance_runner.h @@ -8,7 +8,8 @@ #include #include #include - +#include +#include // onnxruntime dependencies #include #include @@ -72,7 +73,7 @@ struct PerformanceResult { class PerformanceRunner { public: - PerformanceRunner(OrtEnv* env, const PerformanceTestConfig& test_config); + PerformanceRunner(OrtEnv* env, const PerformanceTestConfig& test_config, std::random_device& rd); ~PerformanceRunner(); Status Run(); @@ -87,7 +88,21 @@ class PerformanceRunner { private: bool Initialize(); - Status RunOneIteration(bool isWarmup = false); + + template + Status RunOneIteration() { + std::chrono::duration duration_seconds = session_->Run(); + if (!isWarmup) { + std::lock_guard guard(results_mutex_); + performance_result_.time_costs.emplace_back(duration_seconds.count()); + performance_result_.total_time_cost += duration_seconds.count(); + if (performance_test_config_.run_config.f_verbose) { + std::cout << "iteration:" << performance_result_.time_costs.size() << "," + << "time_cost:" << performance_result_.time_costs.back() << std::endl; + 
} + } + return Status::OK(); + } Status FixDurationTest(); Status RepeatedTimesTest(); @@ -96,14 +111,14 @@ class PerformanceRunner { inline Status RunFixDuration() { while (performance_result_.total_time_cost < performance_test_config_.run_config.duration_in_seconds) { - ORT_RETURN_IF_ERROR(RunOneIteration()); + ORT_RETURN_IF_ERROR(RunOneIteration()); } return Status::OK(); } inline Status RunRepeatedTimes() { for (size_t ite = 0; ite < performance_test_config_.run_config.repeated_times; ite++) { - ORT_RETURN_IF_ERROR(RunOneIteration()); + ORT_RETURN_IF_ERROR(RunOneIteration()); } return Status::OK(); } @@ -113,7 +128,6 @@ class PerformanceRunner { PerformanceTestConfig performance_test_config_; TestModelInfo* test_model_info_; std::unique_ptr session_; - OrtValueArray inputs_; HeapBuffer b_; std::unique_ptr test_case_; diff --git a/onnxruntime/test/perftest/test_session.h b/onnxruntime/test/perftest/test_session.h index 11cf1f308e6dc..e03d61984a5ed 100644 --- a/onnxruntime/test/perftest/test_session.h +++ b/onnxruntime/test/perftest/test_session.h @@ -2,11 +2,21 @@ // Licensed under the MIT License. #pragma once +#include + +#include "OrtValueList.h" + namespace onnxruntime { namespace perftest { class TestSession { public: - virtual std::chrono::duration Run(const OrtValue* const* input) = 0; + virtual std::chrono::duration Run() = 0; + // TODO: implement it + // This function won't return duration, because it may vary largely. + // Please measure the perf at a higher level. + void ThreadSafeRun() { abort(); } + virtual void PreLoadTestData(size_t test_data_id, size_t input_id, OrtValue* value) = 0; + virtual ~TestSession() = default; }; } // namespace perftest diff --git a/onnxruntime/test/perftest/tf_test_session.h b/onnxruntime/test/perftest/tf_test_session.h index 78581de8df8ef..27170db1d796f 100644 --- a/onnxruntime/test/perftest/tf_test_session.h +++ b/onnxruntime/test/perftest/tf_test_session.h @@ -12,9 +12,12 @@ namespace onnxruntime { namespace perftest { class TensorflowTestSession : public TestSession { private: + std::mt19937 rand_engine_; + std::uniform_int_distribution dist_; OrtCallback model_deleter; std::vector feed_; std::vector fetches_; + std::vector> feed_tensors_; TF_Session* sess_; TF_Graph* tf_graph_; // This function is for both graph inputs and outputs @@ -39,114 +42,173 @@ class TensorflowTestSession : public TestSession { return ret; } - TF_Tensor* AllocateTFTensor(const OrtTensorTypeAndShapeInfo* shape, size_t& buffer_length) const { - size_t dim_count = OrtGetNumOfDimensions(shape); - std::vector dims(dim_count); + public: + TensorflowTestSession(std::random_device& rd, const PerformanceTestConfig& performance_test_config, + const TestModelInfo* m) + : rand_engine_(rd()) { + TF_Status* s = TF_NewStatus(); + tf_graph_ = TF_NewGraph(); + TF_ImportGraphDefOptions* opts = TF_NewImportGraphDefOptions(); + TF_ImportGraphDefOptionsSetPrefix(opts, ""); + TF_Buffer* graph_def = TF_NewBuffer(); + void* model_data; + auto st = Env::Default().ReadFileAsString(performance_test_config.model_info.model_file_path.c_str(), 0, model_data, + graph_def->length, model_deleter); + if (!st.IsOK()) + ORT_THROW("read file ", performance_test_config.model_info.model_file_path, " failed:", st.ErrorMessage()); + graph_def->data = model_data; + TF_GraphImportGraphDef(tf_graph_, graph_def, opts, s); + if (TF_GetCode(s) != TF_OK) ORT_THROW("load TF model failed:", TF_Message(s)); + TF_SessionOptions* session_opts = TF_NewSessionOptions(); + sess_ = TF_NewSession(tf_graph_, session_opts, 
s); + if (TF_GetCode(s) != TF_OK) ORT_THROW("load TF model failed:", TF_Message(s)); + feed_.resize(static_cast(m->GetInputCount())); + for (size_t i = 0; i != feed_.size(); ++i) { + feed_[i] = GetOutputFromGraph(m->GetInputName(i).c_str(), tf_graph_); + } + fetches_.resize(static_cast(m->GetOutputCount())); + for (size_t i = 0; i != fetches_.size(); ++i) { + fetches_[i] = GetOutputFromGraph(m->GetOutputName(i).c_str(), tf_graph_); + } + } + ORT_DISALLOW_COPY_ASSIGNMENT_AND_MOVE(TensorflowTestSession); + + bool isDimMatches(const std::vector& dims1, const std::vector& dims2) { + if (dims1.size() != dims2.size()) return false; + size_t len = dims1.size(); + for (size_t i = 0; i != len; ++i) { + if (dims1[i] > 0 && dims2[i] > 0 && dims1[i] != dims2[i]) return false; + } + return true; + } + + /** + * convert input from CHW format to HWC format + * \param input A single image. This float array has length of 3*h*w + * \param h image height + * \param w image width + * \param output A float array. should be freed by caller after use + */ + template + static void chw_to_hwc(const T* input, int64_t h, int64_t w, T* output_data) { + int64_t stride = h * w; + for (int c = 0; c != 3; ++c) { + int64_t t = c * stride; + for (int64_t i = 0; i != stride; ++i) { + output_data[i * 3 + c] = input[t + i]; + } + } + } + + void PreLoadTestData(size_t test_data_id, size_t input_id, OrtValue* value) override { + if (feed_tensors_.size() < test_data_id + 1) { + feed_tensors_.resize(test_data_id + 1); + } + if (feed_tensors_.at(test_data_id).size() < input_id + 1) { + feed_tensors_.at(test_data_id).resize(input_id + 1); + } + + TF_Status* s = TF_NewStatus(); + void* input_buffer = nullptr; + ORT_THROW_ON_ERROR(OrtGetTensorMutableData(const_cast(value), &input_buffer)); + assert(input_buffer != nullptr); + OrtTensorTypeAndShapeInfo* shape = nullptr; + ORT_THROW_ON_ERROR(OrtGetTensorTypeAndShape(value, &shape)); + size_t buffer_length = 0; + std::vector dims; + size_t dim_count = OrtGetDimensionsCount(shape); + dims.resize(dim_count); OrtGetDimensions(shape, dims.data(), dim_count); int64_t ele_count = OrtGetTensorShapeElementCount(shape); - TF_DataType d; + TF_DataType tf_datatype; switch (OrtGetTensorElementType(shape)) { case ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT: // maps to c type float buffer_length = ele_count * sizeof(float); - d = TF_FLOAT; + tf_datatype = TF_FLOAT; break; case ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT8: // maps to c type uint8_t buffer_length = ele_count * sizeof(uint8_t); - d = TF_UINT8; + tf_datatype = TF_UINT8; break; case ONNX_TENSOR_ELEMENT_DATA_TYPE_INT8: // maps to c type int8_t buffer_length = ele_count * sizeof(int8_t); - d = TF_INT8; + tf_datatype = TF_INT8; break; case ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT16: // maps to c type uint16_t buffer_length = ele_count * sizeof(uint16_t); - d = TF_UINT16; + tf_datatype = TF_UINT16; break; case ONNX_TENSOR_ELEMENT_DATA_TYPE_INT16: // maps to c type int16_t buffer_length = ele_count * sizeof(int16_t); - d = TF_INT16; + tf_datatype = TF_INT16; break; case ONNX_TENSOR_ELEMENT_DATA_TYPE_INT32: // maps to c type int32_t buffer_length = ele_count * sizeof(int32_t); - d = TF_INT32; + tf_datatype = TF_INT32; break; case ONNX_TENSOR_ELEMENT_DATA_TYPE_INT64: // maps to c type int64_t buffer_length = ele_count * sizeof(int64_t); - d = TF_INT64; + tf_datatype = TF_INT64; break; case ONNX_TENSOR_ELEMENT_DATA_TYPE_BOOL: buffer_length = ele_count * sizeof(bool); - d = TF_BOOL; + tf_datatype = TF_BOOL; break; case ONNX_TENSOR_ELEMENT_DATA_TYPE_DOUBLE: // maps to c 
type double buffer_length = ele_count * sizeof(double); - d = TF_DOUBLE; + tf_datatype = TF_DOUBLE; break; case ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT32: // maps to c type uint32_t buffer_length = ele_count * sizeof(uint32_t); - d = TF_UINT32; + tf_datatype = TF_UINT32; break; case ONNX_TENSOR_ELEMENT_DATA_TYPE_UINT64: // maps to c type uint64_t buffer_length = ele_count * sizeof(uint64_t); - d = TF_UINT64; + tf_datatype = TF_UINT64; break; default: ORT_NOT_IMPLEMENTED("unexpected input data type"); } - return TF_AllocateTensor(d, dims.data(), static_cast(dims.size()), buffer_length); + TF_Tensor* t = nullptr; + int tf_dims_count = TF_GraphGetTensorNumDims(tf_graph_, feed_[input_id], s); + if (TF_GetCode(s) != TF_OK || tf_dims_count < 0) ORT_THROW("run TF model failed:", TF_Message(s)); + std::vector tf_dims(static_cast(tf_dims_count)); + TF_GraphGetTensorShape(tf_graph_, feed_[input_id], tf_dims.data(), tf_dims_count, s); + if (TF_GetCode(s) != TF_OK || tf_dims_count < 0) ORT_THROW("run TF model failed:", TF_Message(s)); + if (!isDimMatches(dims, tf_dims)) { + // detect if it's NCHW, if it is, switch it to NHWC + // TODO: make this code more generic + if (dims.size() == 4 && tf_dims.size() == 4 && dims[0] == 1 && dims[1] == 3 && dims[2] == dims[3] && + (tf_dims[0] == 1 || tf_dims[0] == -1) && tf_dims[3] == 3 && tf_dims[1] == tf_dims[2] && tf_dims[1] > 0) { + tf_dims[0] = 1; + t = TF_AllocateTensor(tf_datatype, tf_dims.data(), static_cast(tf_dims.size()), buffer_length); + chw_to_hwc((const float*)input_buffer, tf_dims[1], tf_dims[2], (float*)TF_TensorData(t)); + } else + ORT_THROW("dimension doesn't match"); + } else { + t = TF_AllocateTensor(tf_datatype, dims.data(), static_cast(dims.size()), buffer_length); + memcpy(TF_TensorData(t), input_buffer, buffer_length); + } + assert(TF_TensorByteSize(t) == buffer_length); + assert(t != nullptr); + feed_tensors_[test_data_id][input_id] = t; } + std::chrono::duration Run() override { + //Randomly pick one OrtValueArray from feed_tensors_. 
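// A minimal standalone sketch of the data-selection pattern used in Run() above: every test-data set is
// preloaded once (PreLoadTestData), and each run draws a random index from a Mersenne Twister engine
// seeded by std::random_device, passing the valid range as a per-call distribution parameter.
// The container of strings and the exact template arguments below are assumptions for illustration only.
#include <cstddef>
#include <iostream>
#include <random>
#include <string>
#include <vector>

int main() {
  // Pretend these were preloaded up front, like feed_tensors_.
  std::vector<std::string> data_sets = {"set0", "set1", "set2"};

  std::random_device rd;
  std::mt19937 engine(rd());                // seeded once, reused for every run
  std::uniform_int_distribution<int> dist;  // the range is supplied per call via param_type
  using param_t = std::uniform_int_distribution<int>::param_type;

  for (int run = 0; run < 5; ++run) {
    const param_t p(0, static_cast<int>(data_sets.size()) - 1);
    const std::size_t id = static_cast<std::size_t>(dist(engine, p));
    std::cout << "run " << run << " feeds " << data_sets[id] << "\n";
  }
  return 0;
}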
(NOT ThreadSafe) + const std::uniform_int_distribution::param_type p(0, static_cast(feed_tensors_.size() - 1)); + const size_t id = static_cast(dist_(rand_engine_, p)); + std::vector& feed_tensors = feed_tensors_.at(id); - public: - TensorflowTestSession(const PerformanceTestConfig& performance_test_config, const TestModelInfo* m) { TF_Status* s = TF_NewStatus(); - tf_graph_ = TF_NewGraph(); - TF_ImportGraphDefOptions* opts = TF_NewImportGraphDefOptions(); - TF_ImportGraphDefOptionsSetPrefix(opts, ""); - TF_Buffer* graph_def = TF_NewBuffer(); - void* model_data; - auto st = Env::Default().ReadFileAsString(performance_test_config.model_info.model_file_path.c_str(), 0, model_data, - graph_def->length, model_deleter); - if (!st.IsOK()) - ORT_THROW("read file ", performance_test_config.model_info.model_file_path, " failed:", st.ErrorMessage()); - graph_def->data = model_data; - TF_GraphImportGraphDef(tf_graph_, graph_def, opts, s); - if (TF_GetCode(s) != TF_OK) ORT_THROW("load TF model failed:", TF_Message(s)); - TF_SessionOptions* session_opts = TF_NewSessionOptions(); - sess_ = TF_NewSession(tf_graph_, session_opts, s); - if (TF_GetCode(s) != TF_OK) ORT_THROW("load TF model failed:", TF_Message(s)); - feed_.resize(static_cast(m->GetInputCount())); - for (size_t i = 0; i != feed_.size(); ++i) { - feed_[i] = GetOutputFromGraph(m->GetInputName(i).c_str(), tf_graph_); - } - fetches_.resize(static_cast(m->GetOutputCount())); - for (size_t i = 0; i != fetches_.size(); ++i) { - fetches_[i] = GetOutputFromGraph(m->GetOutputName(i).c_str(), tf_graph_); - } - } - ORT_DISALLOW_COPY_ASSIGNMENT_AND_MOVE(TensorflowTestSession); - std::chrono::duration Run(const OrtValue* const* input) override { - size_t input_len = feed_.size(); - std::vector feed_tensors(input_len); - for (size_t i = 0; i != input_len; ++i) { - void* input_buffer = nullptr; - ORT_THROW_ON_ERROR(OrtGetTensorMutableData(const_cast(input[i]), &input_buffer)); - assert(input_buffer != nullptr); - OrtTensorTypeAndShapeInfo* shape; - ORT_THROW_ON_ERROR(OrtGetTensorShapeAndType(input[i], &shape)); - size_t buffer_length = 0; - TF_Tensor* t = AllocateTFTensor(shape, buffer_length); - assert(t != nullptr); - feed_tensors[i] = t; - assert(TF_TensorByteSize(t) == buffer_length); - memcpy(TF_TensorData(t), input_buffer, buffer_length); - } std::vector output_tensors(fetches_.size()); - TF_Status* s = TF_NewStatus(); auto start = std::chrono::high_resolution_clock::now(); TF_SessionRun(sess_, nullptr, feed_.data(), feed_tensors.data(), static_cast(feed_.size()), fetches_.data(), output_tensors.data(), static_cast(fetches_.size()), nullptr, 0, nullptr, s); auto end = std::chrono::high_resolution_clock::now(); if (TF_GetCode(s) != TF_OK) ORT_THROW("run TF model failed:", TF_Message(s)); + for (TF_Tensor* f : output_tensors) { + TF_DeleteTensor(f); + } TF_DeleteStatus(s); return end - start; } @@ -162,4 +224,4 @@ class TensorflowTestSession : public TestSession { }; } // namespace perftest -} // namespace onnxruntime \ No newline at end of file +} // namespace onnxruntime diff --git a/onnxruntime/test/providers/cpu/controlflow/scan_test.cc b/onnxruntime/test/providers/cpu/controlflow/scan_test.cc index 1e9e348e4c1e4..c38656a8cfff7 100644 --- a/onnxruntime/test/providers/cpu/controlflow/scan_test.cc +++ b/onnxruntime/test/providers/cpu/controlflow/scan_test.cc @@ -380,7 +380,7 @@ static void RunTest_v9(const std::string test_name, int64_t sequence_len, int64_ // skip if this is an invalid input test and axis is out of the valid range if (axis >= -rank 
&& axis < rank) { - std::vector permutations; + std::vector permutations; std::vector new_shape; scan::detail::CalculateTransposedShapeForOutput(output_shape, HandleNegativeAxis(axis, output_shape.size()), permutations, new_shape); diff --git a/onnxruntime/test/providers/cpu/math/element_wise_ops_test.cc b/onnxruntime/test/providers/cpu/math/element_wise_ops_test.cc index f56da177ef956..030f357b9629e 100644 --- a/onnxruntime/test/providers/cpu/math/element_wise_ops_test.cc +++ b/onnxruntime/test/providers/cpu/math/element_wise_ops_test.cc @@ -585,6 +585,21 @@ TEST(MathOpTest, Max_8) { test.Run(OpTester::ExpectResult::kExpectSuccess, "", {kTensorrtExecutionProvider}); //Input batch size is inconsistent } +TEST(MathOpTest, Max_8_2inputbroadcast) { + OpTester test("Max", 8); + test.AddInput("data_0", {1, 3}, + {1.0f, 2.0f, 3.0f}); + test.AddInput("data_1", {3, 3}, + {10.0f, 20.0f, 30.0f, + 40.0f, 50.0f, 60.0f, + 70.0f, 80.0f, 90.0f}); + test.AddOutput("max", {3, 3}, + {10.0f, 20.0f, 30.0f, + 40.0f, 50.0f, 60.0f, + 70.0f, 80.0f, 90.0f}); + test.Run(OpTester::ExpectResult::kExpectSuccess, "", {kTensorrtExecutionProvider}); //Input batch size is inconsistent +} + TEST(MathOpTest, Not) { OpTester test("Not"); std::vector dims{2}; @@ -1010,6 +1025,35 @@ TEST(MathOpTest, Erf) { test.Run(); } +TEST(MathOpTest, ErfMoreData) { + OpTester test("Erf", 9); + std::vector inputs{ + -3.625f, 3.375f, 0.0f, 0.00025f, 0.0005f, -0.00075f, -0.001f, 0.00125f, + 0.0015f, -3.125f, 0.00175f, 2.875f, 2.625f, 2.375f, 2.125f, 6.25e-05f, + 0.0003125f, 0.0005625f, -0.0008125f, 0.0010625f, 0.0013125f, 0.0015625f, 0.0018125f, 3.5625f, + 3.3125f, 3.0625f, 2.8125f, -2.5625f, 2.3125f, 2.0625f, 0.000125f, 0.000375f, + -0.000625f, -0.000875f, -0.001125f, -0.001375f, -0.001625f, -0.001875f, -3.5f, -3.25f, + 3.0f, 2.75f, -2.5f, -2.25f, -2.0f, -0.0001875f, 0.0004375f, 0.0006875f, + 2.1875f, -1.9375f, 0.0014375f, -0.0016875f, -0.0019375f, 3.4375f, 3.1875f, -2.9375f, + -2.4375f, -0.0009375f, 0.0011875f + }; + std::vector outputs{ + -1.0f, 0.999998f, 0.0f, 0.000282095f, 0.00056419f, -0.000846284f, -0.00112838f, 0.00141047f, + 0.00169257f, -0.99999f, 0.00197466f, 0.999952f, 0.999795f, 0.999217f, 0.997346f, 7.05237e-05f, + 0.000352618f, 0.000634713f, -0.000916808f, 0.0011989f, 0.001481f, 0.00176309f, 0.00204518f, 1.0f, + 0.999997f, 0.999985f, 0.99993f, -0.99971f, 0.998926f, 0.996464f, 0.000141047f, 0.000423142f, + -0.000705237f, -0.000987331f, -0.00126943f, -0.00155152f, -0.00183361f, -0.00211571f, -0.999999f, -0.999996f, + 0.999978f, 0.999899f, -0.999593f, -0.998537f, -0.995322f, -0.000211571f, 0.000493666f, 0.000775761f, + 0.998022f, -0.993857f, 0.00162204f, -0.00190414f, -0.00218623f, 0.999999f, 0.999993f, -0.999967f, + -0.999433f, -0.00105786f, 0.00133995f + }; + std::vector dims{static_cast(inputs.size())}; + + test.AddInput("A", dims, inputs); + test.AddOutput("B", dims, outputs); + test.Run(); +} + const int ModOp_ver = 10; TEST(ModOpTest, Fmod_float_mixed_sign) { diff --git a/onnxruntime/test/providers/cpu/ml/svmclassifier_test.cc b/onnxruntime/test/providers/cpu/ml/svmclassifier_test.cc index 9b75ebe5616b2..2928b39e6f5f0 100644 --- a/onnxruntime/test/providers/cpu/ml/svmclassifier_test.cc +++ b/onnxruntime/test/providers/cpu/ml/svmclassifier_test.cc @@ -10,14 +10,20 @@ namespace test { TEST(MLOpTest, SVMClassifierMulticlassSVC) { OpTester test("SVMClassifier", 1, onnxruntime::kMLDomain); - std::vector dual_coefficients = {1.14360327f, 1.95968249f, -1.175683f, -1.92760275f, -1.32575698f, -1.32575698f, 0.66332785f, 0.66242913f, 
0.53120854f, 0.53510444f, -1.06631298f, -1.06631298f, 0.66332785f, 0.66242913f, 0.53120854f, 0.53510444f, 1.f, -1.f}; - std::vector support_vectors = {0.f, 0.5f, 32.f, 2.f, 2.9f, -32.f, 1.f, 1.5f, 1.f, 3.f, 13.3f, -11.f, 12.f, 12.9f, -312.f, 43.f, 413.3f, -114.f}; + std::vector dual_coefficients = {1.14360327f, 1.95968249f, -1.175683f, -1.92760275f, -1.32575698f, + -1.32575698f, 0.66332785f, 0.66242913f, 0.53120854f, 0.53510444f, + -1.06631298f, -1.06631298f, 0.66332785f, 0.66242913f, 0.53120854f, + 0.53510444f, 1.f, -1.f}; + std::vector support_vectors = {0.f, 0.5f, 32.f, 2.f, 2.9f, -32.f, 1.f, 1.5f, 1.f, 3.f, + 13.3f, -11.f, 12.f, 12.9f, -312.f, 43.f, 413.3f, -114.f}; std::vector classes = {0, 1, 2, 3}; std::vector vectors_per_class = {2, 2, 1, 1}; std::vector rho = {0.5279583f, 0.32605162f, 0.32605162f, 0.06663721f, 0.06663721f, 0.f}; std::vector kernel_params = {0.001f, 0.f, 3.f}; //gamma, coef0, degree - std::vector X = {1.f, 0.0f, 0.4f, 3.0f, 44.0f, -3.f, 12.0f, 12.9f, -312.f, 23.0f, 11.3f, -222.f, 23.0f, 11.3f, -222.f, 23.0f, 3311.3f, -222.f, 23.0f, 11.3f, -222.f, 43.0f, 413.3f, -114.f}; + std::vector X = {1.f, 0.0f, 0.4f, 3.0f, 44.0f, -3.f, 12.0f, 12.9f, -312.f, 23.0f, + 11.3f, -222.f, 23.0f, 11.3f, -222.f, 23.0f, 3311.3f, -222.f, 23.0f, + 11.3f, -222.f, 43.0f, 413.3f, -114.f}; std::vector predictions = {1, 1, 2, 0, 0, 0, 0, 3}; std::vector scores = { -0.956958294f, 0.799815655f, 0.799815655f, 0.988598406f, 0.988598406f, 0, @@ -47,12 +53,18 @@ TEST(MLOpTest, SVMClassifierMulticlassSVC) { TEST(MLOpTest, SVMClassifierMulticlassLinearSVC) { OpTester test("SVMClassifier", 1, onnxruntime::kMLDomain); - std::vector dual_coefficients = {-1.55181212e-01f, 2.42698956e-01f, 7.01893432e-03f, 4.07614474e-01f, -3.24927823e-02f, 2.79897536e-04f, -1.95771302e-01f, -3.52437368e-01f, -2.15973096e-02f, -4.38190277e-01f, 4.56869105e-02f, -1.29375499e-02f}; + std::vector dual_coefficients = {-1.55181212e-01f, 2.42698956e-01f, 7.01893432e-03f, + 4.07614474e-01f, -3.24927823e-02f, 2.79897536e-04f, + -1.95771302e-01f, -3.52437368e-01f, -2.15973096e-02f, + -4.38190277e-01f, 4.56869105e-02f, -1.29375499e-02f}; std::vector classes = {0, 1, 2, 3}; std::vector rho = {-0.07489691f, -0.1764396f, -0.21167431f, -0.51619097f}; std::vector kernel_params = {0.001f, 0.f, 3.f}; //gamma, coef0, degree - std::vector X = {1.f, 0.0f, 0.4f, 3.0f, 44.0f, -3.f, 12.0f, 12.9f, -312.f, 23.0f, 11.3f, -222.f, 23.0f, 11.3f, -222.f, 23.0f, 3311.3f, -222.f, 23.0f, 11.3f, -222.f, 43.0f, 413.3f, -114.f}; + std::vector X = {1.f, 0.0f, 0.4f, 3.0f, 44.0f, -3.f, + 12.0f, 12.9f, -312.f, 23.0f, 11.3f, -222.f, + 23.0f, 11.3f, -222.f, 23.0f, 3311.3f, -222.f, + 23.0f, 11.3f, -222.f, 43.0f, 413.3f, -114.f}; std::vector predictions = {1, 0, 1, 1, 1, 0, 1, 0}; std::vector scores = { -0.227270544f, 0.332829535f, -0.279307127f, -0.518262208f, @@ -115,5 +127,89 @@ TEST(MLOpTest, SVMClassifierSVCProbabilities) { test.Run(); } +TEST(MLOpTest, SVMClassifierSVC) { + OpTester test("SVMClassifier", 1, onnxruntime::kMLDomain); + + std::vector coefficients = {1.14360327f, 1.95968249f, -1.175683f, -1.92760275f, -1.32575698f, -1.32575698f, + 0.66332785f, 0.66242913f, 0.53120854f, 0.53510444f, -1.06631298f, -1.06631298f, + 0.66332785f, 0.66242913f, 0.53120854f, 0.53510444f, 1.f, -1.f}; + std::vector support_vectors = {0.f, 0.5f, 32.f, 2.f, 2.9f, -32.f, + 1.f, 1.5f, 1.f, 3.f, 13.3f, -11.f, + 12.f, 12.9f, -312.f, 43.f, 413.3f, -114.f}; + std::vector rho = {0.5279583f}; + std::vector kernel_params = {0.001f, 0.f, 3.f}; //gamma, coef0, degree + std::vector classes 
= {0, 1}; + std::vector vectors_per_class = {3, 3}; + + std::vector X = {1.f, 0.0f, 0.4f, + 3.0f, 44.0f, -3.f, + 12.0f, 12.9f, -312.f, + 23.0f, 11.3f, -222.f, + 23.0f, 11.3f, -222.f}; + std::vector scores_predictions = { + 0.95695829391479492f, -0.95695829391479492f, + 0.1597825288772583f, -0.1597825288772583f, + 0.797798752784729f, -0.797798752784729f, + -0.52760261297225952f, 0.52760261297225952f, + -0.52760261297225952f, 0.52760261297225952f}; + std::vector class_predictions = {1, 1, 1, 0, 0}; + + test.AddAttribute("kernel_type", std::string("RBF")); + test.AddAttribute("coefficients", coefficients); + test.AddAttribute("support_vectors", support_vectors); + test.AddAttribute("vectors_per_class", vectors_per_class); + test.AddAttribute("rho", rho); + test.AddAttribute("kernel_params", kernel_params); + test.AddAttribute("classlabels_ints", classes); + + test.AddInput("X", {5, 3}, X); + test.AddOutput("Y", {5}, class_predictions); + test.AddOutput("Z", {5, 2}, scores_predictions); + + test.Run(); +} + +TEST(MLOpTest, SVMClassifierSVCDouble) { + OpTester test("SVMClassifier", 1, onnxruntime::kMLDomain); + + std::vector coefficients = {1.14360327f, 1.95968249f, -1.175683f, -1.92760275f, -1.32575698f, -1.32575698f, + 0.66332785f, 0.66242913f, 0.53120854f, 0.53510444f, -1.06631298f, -1.06631298f, + 0.66332785f, 0.66242913f, 0.53120854f, 0.53510444f, 1.f, -1.f}; + std::vector support_vectors = {0.f, 0.5f, 32.f, 2.f, 2.9f, -32.f, + 1.f, 1.5f, 1.f, 3.f, 13.3f, -11.f, + 12.f, 12.9f, -312.f, 43.f, 413.3f, -114.f}; + std::vector rho = {0.5279583f}; + std::vector kernel_params = {0.001f, 0.f, 3.f}; //gamma, coef0, degree + std::vector classes = {0, 1}; + std::vector vectors_per_class = {3, 3}; + + std::vector X = {1.f, 0.0f, 0.4f, + 3.0f, 44.0f, -3.f, + 12.0f, 12.9f, -312.f, + 23.0f, 11.3f, -222.f, + 23.0f, 11.3f, -222.f}; + std::vector scores_predictions = { + 0.95695829391479492f, -0.95695829391479492f, + 0.1597825288772583f, -0.1597825288772583f, + 0.797798752784729f, -0.797798752784729f, + -0.52760261297225952f, 0.52760261297225952f, + -0.52760261297225952f, 0.52760261297225952f}; + std::vector class_predictions = {1, 1, 1, 0, 0}; + + test.AddAttribute("kernel_type", std::string("RBF")); + test.AddAttribute("coefficients", coefficients); + test.AddAttribute("support_vectors", support_vectors); + test.AddAttribute("vectors_per_class", vectors_per_class); + test.AddAttribute("rho", rho); + test.AddAttribute("kernel_params", kernel_params); + test.AddAttribute("classlabels_ints", classes); + + test.AddInput("X", {5, 3}, X); + test.AddOutput("Y", {5}, class_predictions); + test.AddOutput("Z", {5, 2}, scores_predictions); + + test.Run(); +} + } // namespace test } // namespace onnxruntime diff --git a/onnxruntime/test/providers/cpu/ml/treeregressor_test.cc b/onnxruntime/test/providers/cpu/ml/treeregressor_test.cc index 335c17b856714..31cfcc7cadb5b 100644 --- a/onnxruntime/test/providers/cpu/ml/treeregressor_test.cc +++ b/onnxruntime/test/providers/cpu/ml/treeregressor_test.cc @@ -7,7 +7,8 @@ namespace onnxruntime { namespace test { -TEST(MLOpTest, TreeRegressorMultiTarget) { +void GenTreeAndRunTest(const std::vector& X, const std::vector& base_values, const std::vector& results, const std::string& aggFunction) +{ OpTester test("TreeEnsembleRegressor", 1, onnxruntime::kMLDomain); //tree @@ -25,10 +26,6 @@ TEST(MLOpTest, TreeRegressorMultiTarget) { std::vector target_weights = {1.5f, 27.5f, 2.25f, 20.75f, 2.f, 23.f, 3.f, 14.f, 0.f, 41.f, 1.83333333f, 24.5f, 0.f, 41.f, 2.75f, 16.25f, 2.f, 23.f, 3.f, 
14.f, 2.66666667f, 17.f, 2.f, 23.f, 3.f, 14.f}; std::vector classes = {0, 1}; - //test data - std::vector X = {1.f, 0.0f, 0.4f, 3.0f, 44.0f, -3.f, 12.0f, 12.9f, -312.f, 23.0f, 11.3f, -222.f, 23.0f, 11.3f, -222.f, 23.0f, 3311.3f, -222.f, 23.0f, 11.3f, -222.f, 43.0f, 413.3f, -114.f}; - std::vector results = {1.33333333f, 29.f, 3.f, 14.f, 2.f, 23.f, 2.f, 23.f, 2.f, 23.f, 2.66666667f, 17.f, 2.f, 23.f, 3.f, 14.f}; - //add attributes test.AddAttribute("nodes_truenodeids", lefts); test.AddAttribute("nodes_falsenodeids", rights); @@ -41,14 +38,88 @@ TEST(MLOpTest, TreeRegressorMultiTarget) { test.AddAttribute("target_nodeids", target_nodeids); test.AddAttribute("target_ids", target_classids); test.AddAttribute("target_weights", target_weights); + test.AddAttribute("base_values", base_values); test.AddAttribute("n_targets", (int64_t)2); - test.AddAttribute("aggregate_function", "AVERAGE"); + + if (aggFunction == "AVERAGE") { + test.AddAttribute("aggregate_function", "AVERAGE"); + } else if (aggFunction == "MIN") { + test.AddAttribute("aggregate_function", "MIN"); + } else if (aggFunction == "MAX") { + test.AddAttribute("aggregate_function", "MAX"); + } // default function is SUM + //fill input data test.AddInput("X", {8, 3}, X); test.AddOutput("Y", {8, 2}, results); test.Run(); } +TEST(MLOpTest, TreeRegressorMultiTargetAverage) { + std::vector X = {1.f, 0.0f, 0.4f, 3.0f, 44.0f, -3.f, 12.0f, 12.9f, -312.f, 23.0f, 11.3f, -222.f, 23.0f, 11.3f, -222.f, 23.0f, 3311.3f, -222.f, 23.0f, 11.3f, -222.f, 43.0f, 413.3f, -114.f}; + std::vector results = {1.33333333f, 29.f, 3.f, 14.f, 2.f, 23.f, 2.f, 23.f, 2.f, 23.f, 2.66666667f, 17.f, 2.f, 23.f, 3.f, 14.f}; + std::vector base_values{0.f, 0.f}; + GenTreeAndRunTest(X, base_values, results, "AVERAGE"); +} + +TEST(MLOpTest, TreeRegressorMultiTargetMin) { + std::vector X = {1.f, 0.0f, 0.4f, 3.0f, 44.0f, -3.f, 12.0f, 12.9f, -312.f, 23.0f, 11.3f, -222.f, 23.0f, 11.3f, -222.f, 23.0f, 3311.3f, -222.f, 23.0f, 11.3f, -222.f, 43.0f, 413.3f, -114.f}; + std::vector results = {5.f, 28.f, 8.f, 19.f, 7.f, 28.f, 7.f, 28.f, 7.f, 28.f, 7.f, 19.f, 7.f, 28.f, 8.f, 19.f}; + std::vector base_values{5.f, 5.f}; + GenTreeAndRunTest(X, base_values, results, "MIN"); +} + +TEST(MLOpTest, TreeRegressorMultiTargetMax) { + std::vector X = {1.f, 0.0f, 0.4f, 3.0f, 44.0f, -3.f, 12.0f, 12.9f, -312.f, 23.0f, 11.3f, -222.f, 23.0f, 11.3f, -222.f, 23.0f, 3311.3f, -222.f, 23.0f, 11.3f, -222.f, 43.0f, 413.3f, -114.f}; + std::vector results = {2.f, 41.f, 3.f, 14.f, 2.f, 23.f, 2.f, 23.f, 2.f, 23.f, 3.f, 23.f, 2.f, 23.f, 3.f, 14.f}; + std::vector base_values{0.f, 0.f}; + GenTreeAndRunTest(X, base_values, results, "MAX"); +} + +TEST(MLOpTest, TreeRegressorSingleTargetSum) { + OpTester test("TreeEnsembleRegressor", 1, onnxruntime::kMLDomain); + + //tree + std::vector lefts = {1, 0, 0, 1, 0, 0, 1, 0 ,0}; + std::vector rights = {2,0,0,2,0,0,2,0,0}; + std::vector treeids = {0,0,0,1,1,1,2,2,2}; + std::vector nodeids = {0,1,2,0,1,2,0,1,2}; + std::vector featureids = {0,0,0,0,0,0,1,0,0}; + std::vector thresholds = {1,0,0,0.5,0,0,0.5,0,0 }; + std::vector modes = {"BRANCH_LEQ", "LEAF", "LEAF", "BRANCH_LEQ", "LEAF", "LEAF", "BRANCH_LEQ", "LEAF", "LEAF"}; + + std::vector target_treeids = {0,0,1,1,2,2}; + std::vector target_nodeids = {1,2,1,2,1,2}; + std::vector target_classids = {0,0,0,0,0,0}; + std::vector target_weights = {33.33333f, 16.66666f, 33.33333f, -3.33333f, 16.66666f, -3.333333f}; + std::vector classes = {0, 1}; + + //test data + std::vector X = {0,1,1,1,2,0}; + std::vector results = {63.33333333f, 
26.66666667f, 30.0f}; + + //add attributes + test.AddAttribute("nodes_truenodeids", lefts); + test.AddAttribute("nodes_falsenodeids", rights); + test.AddAttribute("nodes_treeids", treeids); + test.AddAttribute("nodes_nodeids", nodeids); + test.AddAttribute("nodes_featureids", featureids); + test.AddAttribute("nodes_values", thresholds); + test.AddAttribute("nodes_modes", modes); + test.AddAttribute("target_treeids", target_treeids); + test.AddAttribute("target_nodeids", target_nodeids); + test.AddAttribute("target_ids", target_classids); + test.AddAttribute("target_weights", target_weights); + + test.AddAttribute("n_targets", (int64_t)1); + // SUM aggregation by default -- no need to add explicitly + + //fill input data + test.AddInput("X", {3, 2}, X); + test.AddOutput("Y", {3, 1}, results); + test.Run(); +} + } // namespace test } // namespace onnxruntime diff --git a/onnxruntime/test/providers/cpu/tensor/slice_op.test.cc b/onnxruntime/test/providers/cpu/tensor/slice_op.test.cc index a6dc7946d79ff..83ee06bed5af2 100644 --- a/onnxruntime/test/providers/cpu/tensor/slice_op.test.cc +++ b/onnxruntime/test/providers/cpu/tensor/slice_op.test.cc @@ -18,18 +18,18 @@ void RunSliceTest(const std::vector& input_dims, const std::vector& steps, const std::vector& output_dims, const std::vector& output_vals, - bool v10_only = false) { + bool v10_only = false) { // V1-9 if (!v10_only) { - OpTester testv9("Slice", 9); - testv9.AddAttribute("starts", starts); - testv9.AddAttribute("ends", ends); - if (axes.size() != 0) - testv9.AddAttribute("axes", axes); - testv9.AddInput("data", input_dims, input_vals); - testv9.AddOutput("output", output_dims, output_vals); - testv9.Run(OpTester::ExpectResult::kExpectSuccess, "", {kTensorrtExecutionProvider}); + OpTester testv9("Slice", 9); + testv9.AddAttribute("starts", starts); + testv9.AddAttribute("ends", ends); + if (axes.size() != 0) + testv9.AddAttribute("axes", axes); + testv9.AddInput("data", input_dims, input_vals); + testv9.AddOutput("output", output_dims, output_vals); + testv9.Run(OpTester::ExpectResult::kExpectSuccess, "", {kTensorrtExecutionProvider}); } // V10 @@ -70,7 +70,7 @@ TEST(SliceTest, Slice1D_ValidStartEndRange_NoOutput) { TEST(SliceTest, Slice1D_Regular) { RunSliceTest - ({6}, + ({6}, {0.0f, 1.0f, 2.0f, 3.0f, 4.0f, 5.0f}, {2}, {4}, @@ -236,7 +236,7 @@ TEST(SliceTest, Slice1D_WithPositiveSteps) { {2}, {3}, {0.0f, 2.0f, 4.0f}, - true); + true); } // In numpy: @@ -312,7 +312,7 @@ TEST(SliceTest, Slice2D_WithPositiveSteps_1) { {1, 2}, {1, 2}, {5.0f, 7.0f}, - true); + true); } TEST(SliceTest, Slice2D_WithPositiveSteps_2) { @@ -324,7 +324,7 @@ TEST(SliceTest, Slice2D_WithPositiveSteps_2) { {}, // default steps {1, 3}, {2.0f, 3.0f, 4.0f}, - true); + true); } TEST(SliceTest, Slice2D_WithNegativeSteps_1) { @@ -354,10 +354,10 @@ TEST(SliceTest, Slice2D_WithNegativeSteps_2) { TEST(SliceTest, Slice3D_WithPositiveSteps_AllAxes) { RunSliceTest({3, 3, 3}, {27, 20, 2, - 4, 26, 11, + 4, 26, 11, 26, 5, 17, - 0, 21, 6, + 0, 21, 6, 22, 13, 29, 19, 17, 27, @@ -370,7 +370,7 @@ TEST(SliceTest, Slice3D_WithPositiveSteps_AllAxes) { {2, 2, 2}, {2, 1, 1}, {26, 9}, - true); + true); } TEST(SliceTest, Slice3D_WithPositiveAndNegativeSteps_SubsetOfAxes_1) { @@ -414,7 +414,7 @@ TEST(SliceTest, Slice3D_WithPositiveAndNegativeSteps_SubsetOfAxes_2) { {3, -2}, {3, 1, 0}, {}, - true); + true); } // Slice for Reversing @@ -509,5 +509,24 @@ TEST(SliceTest, Slice2D_ImplicitCopyBySlicingADimensionFully) { true); } +TEST(SliceTest, OptionalAxesInputAloneMissing) { + std::vector 
input_dims = {6}; + auto input_vals = {0.0f, 1.0f, 2.0f, 3.0f, 4.0f, 5.0f}; + std::initializer_list starts = {2}; + std::initializer_list ends = {4}; + std::initializer_list steps = {1}; + std::vector output_dims = {2}; + auto output_vals = {2.0f, 3.0f}; + + OpTester testv10("Slice", 10); + testv10.AddInput("data", input_dims, input_vals); + testv10.AddInput("starts", {static_cast(starts.size())}, starts); + testv10.AddInput("ends", {static_cast(ends.size())}, ends); + testv10.AddMissingOptionalInput(); + testv10.AddInput("steps", {static_cast(steps.size())}, steps); + testv10.AddOutput("output", output_dims, output_vals); + testv10.Run(OpTester::ExpectResult::kExpectSuccess, "", {kTensorrtExecutionProvider}); +} + } // namespace test } // namespace onnxruntime diff --git a/onnxruntime/test/providers/cpu/tensor/split_op_test.cc b/onnxruntime/test/providers/cpu/tensor/split_op_test.cc index 964a0d73df284..1957f40908def 100644 --- a/onnxruntime/test/providers/cpu/tensor/split_op_test.cc +++ b/onnxruntime/test/providers/cpu/tensor/split_op_test.cc @@ -7,15 +7,18 @@ namespace onnxruntime { namespace test { -template using ShapeAndData = std::pair, const std::vector>; +template +using ShapeAndData = std::pair, const std::vector>; using ShapeAndFloatData = ShapeAndData; using ShapeAndInt32Data = ShapeAndData; +using ShapeAndStringData = ShapeAndData; using ExpectResult = OpTester::ExpectResult; -template void RunTest(int64_t axis, const std::vector split_sizes, const ShapeAndData& input, - const std::vector>& outputs, bool is_tensorrt_supported = true, - bool expect_failure = false, const std::string& err_msg = {}) { +template +void RunTest(int64_t axis, const std::vector split_sizes, const ShapeAndData& input, + const std::vector>& outputs, bool is_tensorrt_supported = true, + bool expect_failure = false, const std::string& err_msg = {}) { OpTester test("Split"); test.AddAttribute("axis", axis); @@ -40,16 +43,16 @@ template void RunTest(int64_t axis, const std::vector split test.Run(expect_failure ? 
ExpectResult::kExpectFailure : ExpectResult::kExpectSuccess, err_msg, excluded_providers); } -TEST(SplitOperatorTest, Axis0EqualSplit) { +TEST(SplitOperatorTest, Axis0EqualSplitFloat) { const int64_t axis = 0; std::vector outputs; // input shape and data ShapeAndFloatData input = {{4, 2}, // shape - {1.f, 2.f, - 3.f, 4.f, - 5.f, 6.f, - 7.f, 8.f}}; + {1.f, 2.f, + 3.f, 4.f, + 5.f, 6.f, + 7.f, 8.f}}; outputs.push_back({{2, 2}, {1.f, 2.f, @@ -59,7 +62,7 @@ TEST(SplitOperatorTest, Axis0EqualSplit) { {5.f, 6.f, 7.f, 8.f}}); - RunTest(axis, {}, input, outputs, false);//TensorRT parser: Assertion failed: axis != BATCH_DIM + RunTest(axis, {}, input, outputs, false); //TensorRT parser: Assertion failed: axis != BATCH_DIM } TEST(SplitOperatorTest, Axis0EqualSplitInt32) { @@ -81,19 +84,41 @@ TEST(SplitOperatorTest, Axis0EqualSplitInt32) { {5, 6, 7, 8}}); - RunTest(axis, {}, input, outputs, false);//TensorRT parser: Assertion failed: axis != BATCH_DIM + RunTest(axis, {}, input, outputs, false); //TensorRT parser: Assertion failed: axis != BATCH_DIM } -TEST(SplitOperatorTest, Axis0UnequalSplit) { +TEST(SplitOperatorTest, Axis0EqualSplitString) { + const int64_t axis = 0; + std::vector outputs; + + // input shape and data + ShapeAndStringData input = {{4, 2}, // shape + {"a", "b", + "c", "d", + "e", "f", + "g", "h"}}; + + outputs.push_back({{2, 2}, + {"a", "b", + "c", "d"}}); + + outputs.push_back({{2, 2}, + {"e", "f", + "g", "h"}}); + + RunTest(axis, {}, input, outputs, false); //TensorRT parser: Assertion failed: axis != BATCH_DIM +} + +TEST(SplitOperatorTest, Axis0UnequalSplitFloat) { const int64_t axis = 0; std::vector outputs; // input shape and data ShapeAndFloatData input = {{4, 2}, // shape - {1.f, 2.f, - 3.f, 4.f, - 5.f, 6.f, - 7.f, 8.f}}; + {1.f, 2.f, + 3.f, 4.f, + 5.f, 6.f, + 7.f, 8.f}}; std::vector splits{1, 3}; @@ -104,17 +129,40 @@ TEST(SplitOperatorTest, Axis0UnequalSplit) { 5.f, 6.f, 7.f, 8.f}}); - RunTest(axis, splits, input, outputs, false);//TensorRT parser: Assertion failed: axis != BATCH_DIM + RunTest(axis, splits, input, outputs, false); //TensorRT parser: Assertion failed: axis != BATCH_DIM +} + +TEST(SplitOperatorTest, Axis0UnequalSplitString) { + const int64_t axis = 0; + std::vector outputs; + + // input shape and data + ShapeAndStringData input = {{4, 2}, // shape + {"a", "b", + "c", "d", + "e", "f", + "g", "h"}}; + + std::vector splits{1, 3}; + + outputs.push_back({{1, 2}, {"a", "b"}}); + + outputs.push_back({{3, 2}, + {"c", "d", + "e", "f", + "g", "h"}}); + + RunTest(axis, splits, input, outputs, false); //TensorRT parser: Assertion failed: axis != BATCH_DIM } -TEST(SplitOperatorTest, Axis1EqualSplit) { +TEST(SplitOperatorTest, Axis1EqualSplitFloat) { const int64_t axis = 1; std::vector outputs; // input shape and data ShapeAndFloatData input = {{2, 4}, - {1.f, 2.f, 3.f, 4.f, - 5.f, 6.f, 7.f, 8.f}}; + {1.f, 2.f, 3.f, 4.f, + 5.f, 6.f, 7.f, 8.f}}; outputs.push_back({{2, 2}, {1.f, 2.f, @@ -127,14 +175,34 @@ TEST(SplitOperatorTest, Axis1EqualSplit) { RunTest(axis, {}, input, outputs, false); } -TEST(SplitOperatorTest, Axis1UnequalSplit) { +TEST(SplitOperatorTest, Axis1EqualSplitString) { + const int64_t axis = 1; + std::vector outputs; + + // input shape and data + ShapeAndStringData input = {{2, 4}, + {"a", "b", "c", "d", + "e", "f", "g", "h"}}; + + outputs.push_back({{2, 2}, + {"a", "b", + "e", "f"}}); + + outputs.push_back({{2, 2}, + {"c", "d", + "g", "h"}}); + + RunTest(axis, {}, input, outputs, false); +} + +TEST(SplitOperatorTest, Axis1UnequalSplitFloat) { const int64_t 
axis = 1; std::vector outputs; // input shape and data ShapeAndFloatData input = {{2, 4}, - {1.f, 2.f, 3.f, 4.f, - 5.f, 6.f, 7.f, 8.f}}; + {1.f, 2.f, 3.f, 4.f, + 5.f, 6.f, 7.f, 8.f}}; std::vector splits{3, 1}; @@ -149,6 +217,28 @@ TEST(SplitOperatorTest, Axis1UnequalSplit) { RunTest(axis, splits, input, outputs, false); } +TEST(SplitOperatorTest, Axis1UnequalSplitString) { + const int64_t axis = 1; + std::vector outputs; + + // input shape and data + ShapeAndStringData input = {{2, 4}, + {"a", "b", "c", "d", + "e", "f", "g", "h"}}; + + std::vector splits{3, 1}; + + outputs.push_back({{2, 3}, + {"a", "b", "c", + "e", "f", "g"}}); + + outputs.push_back({{2, 1}, + {"d", + "h"}}); + + RunTest(axis, splits, input, outputs, false); +} + ShapeAndFloatData CreateInput(std::vector shape) { auto size = TensorShape(shape).Size(); @@ -280,8 +370,8 @@ TEST(SplitOperatorTest, NegativeAxis) { // input shape and data ShapeAndFloatData input = {{2, 4}, - {1.f, 2.f, 3.f, 4.f, - 5.f, 6.f, 7.f, 8.f}}; + {1.f, 2.f, 3.f, 4.f, + 5.f, 6.f, 7.f, 8.f}}; outputs.push_back({{2, 2}, {1.f, 2.f, @@ -300,10 +390,10 @@ TEST(SplitOperatorTest, InvalidAxis) { // input shape and data ShapeAndFloatData input = {{4, 2}, // shape - {1.f, 2.f, - 3.f, 4.f, - 5.f, 6.f, - 7.f, 8.f}}; + {1.f, 2.f, + 3.f, 4.f, + 5.f, 6.f, + 7.f, 8.f}}; outputs.push_back({{1}, {0.f}}); @@ -317,17 +407,17 @@ TEST(SplitOperatorTest, SplitAttributeSumTooSmall) { // input shape and data ShapeAndFloatData input = {{4, 2}, // shape - {1.f, 2.f, - 3.f, 4.f, - 5.f, 6.f, - 7.f, 8.f}}; + {1.f, 2.f, + 3.f, 4.f, + 5.f, 6.f, + 7.f, 8.f}}; std::vector splits{1, 2}; // should sum to 4 outputs.push_back({{1, 2}, {1.f, 2.f}}); outputs.push_back({{2, 2}, {3.f, 4.f, 5.f, 6.f}}); - RunTest(axis, splits, input, outputs, false, true, "Cannot split using values in 'split' attribute");//TensorRT parser: Assertion failed: axis != BATCH_DIM + RunTest(axis, splits, input, outputs, false, true, "Cannot split using values in 'split' attribute"); //TensorRT parser: Assertion failed: axis != BATCH_DIM } TEST(SplitOperatorTest, InvalidValueInSplitAttribute) { @@ -336,16 +426,16 @@ TEST(SplitOperatorTest, InvalidValueInSplitAttribute) { // input shape and data ShapeAndFloatData input = {{4, 2}, // shape - {1.f, 2.f, - 3.f, 4.f, - 5.f, 6.f, - 7.f, 8.f}}; + {1.f, 2.f, + 3.f, 4.f, + 5.f, 6.f, + 7.f, 8.f}}; std::vector splits{1, 0, 3}; // 0 is not valid outputs.push_back({{1, 2}, {1.f, 2.f}}); outputs.push_back({{3, 2}, {3.f, 4.f, 5.f, 6.f, 7.f, 8.f}}); - RunTest(axis, splits, input, outputs, false, true, "Invalid value in 'split' attribute");//TensorRT parser: Assertion failed: axis != BATCH_DIM + RunTest(axis, splits, input, outputs, false, true, "Invalid value in 'split' attribute"); //TensorRT parser: Assertion failed: axis != BATCH_DIM } /* diff --git a/onnxruntime/test/providers/memcpy_test.cc b/onnxruntime/test/providers/memcpy_test.cc index e0ac5972f320e..38f46ba4d4c80 100644 --- a/onnxruntime/test/providers/memcpy_test.cc +++ b/onnxruntime/test/providers/memcpy_test.cc @@ -26,7 +26,7 @@ TEST(MemcpyTest, copy1) { CPUExecutionProviderInfo epi; auto st = execution_providers.Add(onnxruntime::kCpuExecutionProvider, std::make_unique(epi)); ASSERT_TRUE(st.IsOK()) << st.ErrorMessage(); - SessionState s{execution_providers}; + SessionState s{execution_providers, true}; s.SetLogger(logging::LoggingManager::DefaultLogger()); KernelRegistryManager kernel_registry_manager; kernel_registry_manager.RegisterKernels(execution_providers); @@ -42,8 +42,8 @@ TEST(MemcpyTest, copy1) { 
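// A minimal standalone sketch of what the Split tests above exercise: slicing a row-major 2-D tensor
// into pieces along axis 0 (rows) or axis 1 (columns). The test harness is templated so the same checks
// now also run on std::string data; Split2D below is a hypothetical helper, not the ORT Split kernel.
#include <cassert>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

template <typename T>
std::vector<std::vector<T>> Split2D(const std::vector<T>& data, int64_t rows, int64_t cols,
                                    int axis, const std::vector<int64_t>& splits) {
  std::vector<std::vector<T>> outputs;
  if (axis == 0) {
    int64_t row = 0;
    for (int64_t n : splits) {  // n consecutive rows per output
      outputs.emplace_back(data.begin() + row * cols, data.begin() + (row + n) * cols);
      row += n;
    }
  } else {  // axis == 1: n consecutive columns per output, gathered row by row
    int64_t col = 0;
    for (int64_t n : splits) {
      std::vector<T> out;
      for (int64_t r = 0; r < rows; ++r)
        out.insert(out.end(), data.begin() + r * cols + col, data.begin() + r * cols + col + n);
      outputs.push_back(std::move(out));
      col += n;
    }
  }
  return outputs;
}

int main() {
  // Same {4,2} string input as Axis0EqualSplitString: two {2,2} outputs.
  std::vector<std::string> s = {"a", "b", "c", "d", "e", "f", "g", "h"};
  auto sparts = Split2D(s, 4, 2, /*axis*/ 0, {2, 2});
  assert(sparts[0] == (std::vector<std::string>{"a", "b", "c", "d"}));
  assert(sparts[1] == (std::vector<std::string>{"e", "f", "g", "h"}));

  // Same {2,4} float input as Axis1UnequalSplitFloat with splits {3,1}.
  std::vector<float> f = {1, 2, 3, 4, 5, 6, 7, 8};
  auto fparts = Split2D(f, 2, 4, /*axis*/ 1, {3, 1});
  assert((fparts[0] == std::vector<float>{1, 2, 3, 5, 6, 7}));
  assert((fparts[1] == std::vector<float>{4, 8}));
  std::cout << "split checks passed\n";
  return 0;
}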
ASSERT_TRUE(st.IsOK()) << st.ErrorMessage(); s.SetGraphViewer(std::make_unique(model.MainGraph())); PutAllNodesOnOneProvider(model.MainGraph(), onnxruntime::kCpuExecutionProvider); - SessionStateInitializer session_initializer{ORT_TSTR(""), model.MainGraph(), s, execution_providers, - kernel_registry_manager}; + SessionStateInitializer session_initializer{true, ORT_TSTR(""), model.MainGraph(), + s, execution_providers, kernel_registry_manager}; st = session_initializer.CreatePlan(nullptr, {}, true); ASSERT_TRUE(st.IsOK()) << st.ErrorMessage(); st = session_initializer.InitializeAndSave(nullptr); @@ -54,10 +54,10 @@ TEST(MemcpyTest, copy1) { std::unique_ptr p_tensor = std::make_unique(data_type, TensorShape({3, 2}), allocator); float data[] = {1.f, 1.f, 0.f, 1.f, 1.f, 1.f}; memcpy(p_tensor->MutableData(), data, sizeof(data)); - MLValue input = - MLValue{p_tensor.release(), DataTypeImpl::GetType(), DataTypeImpl::GetType()->GetDeleteFunc()}; + OrtValue input = + OrtValue{p_tensor.release(), DataTypeImpl::GetType(), DataTypeImpl::GetType()->GetDeleteFunc()}; - MLValue output; + OrtValue output; st = utils::CopyOneInputAcrossDevices(s, "X", input, output); ASSERT_TRUE(st.IsOK()) << st.ErrorMessage(); } diff --git a/onnxruntime/test/providers/ngraph/ngraph_execution_provider_test.cc b/onnxruntime/test/providers/ngraph/ngraph_execution_provider_test.cc index 83bf930aaf725..0efbc6ee49c0c 100644 --- a/onnxruntime/test/providers/ngraph/ngraph_execution_provider_test.cc +++ b/onnxruntime/test/providers/ngraph/ngraph_execution_provider_test.cc @@ -64,7 +64,7 @@ ONNX_NAMESPACE::OpSchema GetUnSupportedOpSchema() { } void add_feeds(NameMLValMap& feeds, std::string name, std::vector dims, std::vector value) { - MLValue ml_value; + OrtValue ml_value; CreateMLValue(TestCPUExecutionProvider()->GetAllocator(0, OrtMemTypeDefault), dims, value, &ml_value); feeds.insert(std::make_pair(name, ml_value)); } @@ -104,7 +104,7 @@ void RunTest(const std::string& model_path, const NameMLValMap& feeds, const std run_options.run_tag = "nGraph EP test tag"; run_options.run_log_verbosity_level = 1; - std::vector fetches; + std::vector fetches; status = session_object.Run(run_options, feeds, output_names, &fetches); if (!status.IsOK()) { @@ -285,7 +285,6 @@ TEST(NGraphExecutionProviderTest, InOut_isAlso_GraphOut) { RunTest("testdata/ngraph/InOut_isAlso_GraphOut.onnx", feeds, {"Y", "Z"}, expected_shapes, expected_values); } - /* (A) (A) \ / @@ -341,7 +340,7 @@ This test is to ensure, we do not have cyclic dependent sub-graphs. Example: Sub-Graph-1{1,2,3,5} is invalid because the output of this cluster is input to UnSupportedOp whose output is again input to the same cluster. 
*/ -TEST(NGraphExecutionProviderTest, Independent_SubGraphs) { +TEST(NGraphExecutionProviderTest, DISABLED_Independent_SubGraphs) { NameMLValMap feeds; add_feeds(feeds, "A", {4}, {1.0f, 2.0f, 3.0f, 4.0f}); diff --git a/onnxruntime/test/providers/provider_test_utils.cc b/onnxruntime/test/providers/provider_test_utils.cc index 1422ff889835e..90e41885db36c 100644 --- a/onnxruntime/test/providers/provider_test_utils.cc +++ b/onnxruntime/test/providers/provider_test_utils.cc @@ -126,23 +126,26 @@ void Check(const OpTester::Data& expected_data, const T& run_output, const std:: } template -void CheckDispatch(MLDataType type, const OpTester::Data& expected_data, MLValue& mlvalue, const std::string& provider_type) { +void CheckDispatch(MLDataType type, const OpTester::Data& expected_data, OrtValue& ort_value, + const std::string& provider_type) { if (type == DataTypeImpl::GetType()) - Check(expected_data, mlvalue.Get(), provider_type); + Check(expected_data, ort_value.Get(), provider_type); else ORT_THROW("OpTester:Check() not implemented for output tensor type of ", type); } template -void CheckDispatch(MLDataType type, const OpTester::Data& expected_data, MLValue& mlvalue, const std::string& provider_type) { +void CheckDispatch(MLDataType type, const OpTester::Data& expected_data, OrtValue& ort_value, + const std::string& provider_type) { if (type == DataTypeImpl::GetType()) - Check(expected_data, mlvalue.Get(), provider_type); + Check(expected_data, ort_value.Get(), provider_type); else - CheckDispatch(type, expected_data, mlvalue, provider_type); + CheckDispatch(type, expected_data, ort_value, provider_type); } -void Check(const OpTester::Data& expected_data, MLValue& mlvalue, const std::string& provider_type) { - CheckDispatch(expected_data.data_.Type(), expected_data, mlvalue, provider_type); +void Check(const OpTester::Data& expected_data, OrtValue& ort_value, const std::string& provider_type) { + CheckDispatch(expected_data.data_.Type(), expected_data, ort_value, + provider_type); } void DebugTrap() { @@ -162,7 +165,7 @@ OpTester::~OpTester() { #endif } -void OpTester::FillFeedsAndOutputNames(std::unordered_map& feeds, +void OpTester::FillFeedsAndOutputNames(std::unordered_map& feeds, std::vector& output_names) { for (auto& output : output_data_) { if (output.def_.Exists()) @@ -265,13 +268,9 @@ std::unique_ptr OpTester::BuildGraph() { return p_model; } -void OpTester::ExecuteModel(Model& model, - InferenceSession& session_object, - ExpectResult expect_result, - const std::string& expected_failure_string, - const RunOptions* run_options, - std::unordered_map feeds, - std::vector output_names, +void OpTester::ExecuteModel(Model& model, InferenceSession& session_object, ExpectResult expect_result, + const std::string& expected_failure_string, const RunOptions* run_options, + std::unordered_map feeds, std::vector output_names, const std::string& provider_type) { std::string s1; const bool rc = model.ToProto().SerializeToString(&s1); @@ -306,7 +305,7 @@ void OpTester::ExecuteModel(Model& model, default_run_options.run_tag = op_; default_run_options.run_log_verbosity_level = 1; - std::vector fetches; + std::vector fetches; status = session_object.Run(run_options ? *run_options : default_run_options, feeds, output_names, &fetches); if (status.IsOK()) { EXPECT_TRUE(expect_result == ExpectResult::kExpectSuccess); @@ -330,9 +329,8 @@ void OpTester::ExecuteModel(Model& model, // Todo: support check output with map/sequence/.... 
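// A minimal standalone sketch of the recursive variadic dispatch that CheckDispatch above relies on:
// walk a compile-time list of candidate element types, compare each against the value's runtime type,
// and fall through to the remaining candidates (or fail) when it does not match. std::type_index stands
// in here for MLDataType; all names below are illustrative, not the OpTester implementation.
#include <cstdint>
#include <iostream>
#include <stdexcept>
#include <string>
#include <typeindex>
#include <typeinfo>

template <typename T>
void CheckOne(const std::string& label) {
  std::cout << label << " matched a candidate type of size " << sizeof(T) << " bytes\n";
}

// Base case: a single candidate type left; anything else is an error.
template <typename T>
void Dispatch(std::type_index runtime_type, const std::string& label) {
  if (runtime_type == std::type_index(typeid(T)))
    CheckOne<T>(label);
  else
    throw std::runtime_error("Check() not implemented for this output type");
}

// Recursive case: try T first, otherwise retry with the remaining candidates.
template <typename T, typename Next, typename... Rest>
void Dispatch(std::type_index runtime_type, const std::string& label) {
  if (runtime_type == std::type_index(typeid(T)))
    CheckOne<T>(label);
  else
    Dispatch<Next, Rest...>(runtime_type, label);
}

int main() {
  // e.g. an output that may be float, double or int64_t, and happens to be double at runtime
  Dispatch<float, double, int64_t>(std::type_index(typeid(double)), "output0");
  return 0;
}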
size_t idx = 0; for (auto& expected_data : output_data_) { - MLValue& mlvalue = fetches[idx]; - if (mlvalue.Fence()) - mlvalue.Fence()->BeforeUsingAsInput(onnxruntime::kCpuExecutionProvider, 0); + OrtValue& ort_value = fetches[idx]; + if (ort_value.Fence()) ort_value.Fence()->BeforeUsingAsInput(onnxruntime::kCpuExecutionProvider, 0); if (expected_data.def_.Exists()) { // optional outputs won't exist if (expected_data.data_.IsTensor()) { @@ -349,9 +347,9 @@ void OpTester::ExecuteModel(Model& model, EXPECT_EQ(expected_shape[d], inferred_dims[d]) << "Output idx = " << idx << " dim = " << d; } } - Check(expected_data, mlvalue.Get(), provider_type); + Check(expected_data, ort_value.Get(), provider_type); } else { - Check(expected_data, mlvalue, provider_type); + Check(expected_data, ort_value, provider_type); } ++idx; @@ -401,7 +399,7 @@ void OpTester::Run(ExpectResult expect_result, } // Hookup the inputs and outputs - std::unordered_map feeds; + std::unordered_map feeds; std::vector output_names; FillFeedsAndOutputNames(feeds, output_names); diff --git a/onnxruntime/test/providers/provider_test_utils.h b/onnxruntime/test/providers/provider_test_utils.h index 216a8bc8721fe..492fec979275b 100644 --- a/onnxruntime/test/providers/provider_test_utils.h +++ b/onnxruntime/test/providers/provider_test_utils.h @@ -179,7 +179,7 @@ class OpTester { template void AddInput(const char* name, const std::map& val) { std::unique_ptr> ptr = std::make_unique>(val); - MLValue value; + OrtValue value; value.Init(ptr.release(), DataTypeImpl::GetType>(), DataTypeImpl::GetType>()->GetDeleteFunc()); @@ -212,7 +212,7 @@ class OpTester { template void AddOutput(const char* name, const std::vector>& val) { auto ptr = std::make_unique>>(val); - MLValue ml_value; + OrtValue ml_value; ml_value.Init(ptr.release(), DataTypeImpl::GetType>>(), DataTypeImpl::GetType>>()->GetDeleteFunc()); @@ -248,7 +248,7 @@ class OpTester { struct Data { onnxruntime::NodeArg def_; - MLValue data_; + OrtValue data_; optional relative_error_; optional absolute_error_; }; @@ -261,7 +261,7 @@ class OpTester { void AddInitializers(onnxruntime::Graph& graph); - void FillFeedsAndOutputNames(std::unordered_map& feeds, + void FillFeedsAndOutputNames(std::unordered_map& feeds, std::vector& output_names); std::unique_ptr BuildGraph(); @@ -299,7 +299,7 @@ class OpTester { } TTypeProto type_proto(add_shape_to_tensor_data_ ? 
&dims_for_proto : nullptr); - MLValue value; + OrtValue value; value.Init(p_tensor.release(), DataTypeImpl::GetType(), DataTypeImpl::GetType()->GetDeleteFunc()); data.push_back({{name, &type_proto}, value, optional(), optional()}); if (is_initializer) @@ -310,13 +310,9 @@ class OpTester { } } - void ExecuteModel(Model& model, - InferenceSession& session_object, - ExpectResult expect_result, - const std::string& expected_failure_string, - const RunOptions* run_options, - std::unordered_map feeds, - std::vector output_names, + void ExecuteModel(Model& model, InferenceSession& session_object, ExpectResult expect_result, + const std::string& expected_failure_string, const RunOptions* run_options, + std::unordered_map feeds, std::vector output_names, const std::string& provider_type); const char* domain_; diff --git a/onnxruntime/test/providers/tensorrt/tensorrt_basic_test.cc b/onnxruntime/test/providers/tensorrt/tensorrt_basic_test.cc index fbd847ebb36aa..d5c7f33b6d9f7 100644 --- a/onnxruntime/test/providers/tensorrt/tensorrt_basic_test.cc +++ b/onnxruntime/test/providers/tensorrt/tensorrt_basic_test.cc @@ -14,8 +14,7 @@ using namespace ::onnxruntime::logging; namespace onnxruntime { namespace test { -void VerifyOutputs(const std::vector& fetches, - const std::vector& expected_dims, +void VerifyOutputs(const std::vector& fetches, const std::vector& expected_dims, const std::vector& expected_values) { ASSERT_EQ(1, fetches.size()); auto& rtensor = fetches.front().Get(); @@ -62,11 +61,11 @@ TEST(TensorrtExecutionProviderTest, FunctionTest) { std::vector dims_mul_x = {1, 3, 2}; std::vector values_mul_x = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f}; - MLValue ml_value_x; + OrtValue ml_value_x; CreateMLValue(TestTensorrtExecutionProvider()->GetAllocator(0, OrtMemTypeCPU), dims_mul_x, values_mul_x, &ml_value_x); - MLValue ml_value_y; + OrtValue ml_value_y; CreateMLValue(TestTensorrtExecutionProvider()->GetAllocator(0, OrtMemTypeCPU), dims_mul_x, values_mul_x, &ml_value_y); - MLValue ml_value_z; + OrtValue ml_value_z; CreateMLValue(TestTensorrtExecutionProvider()->GetAllocator(0, OrtMemTypeCPU), dims_mul_x, values_mul_x, &ml_value_z); NameMLValMap feeds; feeds.insert(std::make_pair("X", ml_value_x)); @@ -76,7 +75,7 @@ TEST(TensorrtExecutionProviderTest, FunctionTest) { // prepare outputs std::vector output_names; output_names.push_back("M"); - std::vector fetches; + std::vector fetches; // prepare expected inputs and outputs std::vector expected_dims_mul_m = {1, 3, 2}; diff --git a/onnxruntime/test/python/onnx_backend_test_series.py b/onnxruntime/test/python/onnx_backend_test_series.py index 0bd0c5a2618c3..abb09acde94ca 100644 --- a/onnxruntime/test/python/onnx_backend_test_series.py +++ b/onnxruntime/test/python/onnx_backend_test_series.py @@ -87,10 +87,10 @@ def assert_similar_outputs(cls, ref_outputs, outputs, rtol, atol): '|^test_mvn_cpu.*' '|^test_qlinearconv_cpu.*' '|^test_quantizelinear_cpu.*' -'|^test_mod_float_mixed_sign_example.*' '|^test_reversesequence_batch_cpu.*' '|^test_reversesequence_time_cpu.*' '|^test_roialign_cpu.*' +'|^test_operator_repeat_dim_overflow_cpu.*' ')') # import all test cases at global scope to make diff --git a/onnxruntime/test/python/onnxruntime_test_python.py b/onnxruntime/test/python/onnxruntime_test_python.py index 91851264cc940..9bb4b3488f877 100644 --- a/onnxruntime/test/python/onnxruntime_test_python.py +++ b/onnxruntime/test/python/onnxruntime_test_python.py @@ -4,10 +4,9 @@ # -*- coding: UTF-8 -*- import unittest import os -import sys import numpy as np import 
onnxruntime as onnxrt -from onnxruntime.capi._pybind_state import onnxruntime_ostream_redirect +import threading class TestInferenceSession(unittest.TestCase): @@ -25,6 +24,13 @@ def get_name(self, name): return res raise FileNotFoundError("Unable to find '{0}' or '{1}' or '{2}'".format(name, rel, res)) + def run_model(self, session_object, run_options): + x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], dtype=np.float32) + input_name = session_object.get_inputs()[0].name + res = session_object.run([], {input_name: x}, run_options=run_options) + output_expected = np.array([[1.0, 4.0], [9.0, 16.0], [25.0, 36.0]], dtype=np.float32) + np.testing.assert_allclose(output_expected, res[0], rtol=1e-05, atol=1e-08) + def testRunModel(self): sess = onnxrt.InferenceSession(self.get_name("mul_1.pb")) x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], dtype=np.float32) @@ -72,6 +78,22 @@ def testRunModel2(self): output_expected = np.array([[5.0], [11.0], [17.0]], dtype=np.float32) np.testing.assert_allclose(output_expected, res[0], rtol=1e-05, atol=1e-08) + def testRunModelMultipleThreads(self): + so = onnxrt.SessionOptions() + so.session_log_verbosity_level = 1 + so.session_logid = "MultiThreadsTest" + sess = onnxrt.InferenceSession(self.get_name("mul_1.pb"), sess_options=so) + ro1 = onnxrt.RunOptions() + ro1.run_tag = "thread1" + t1 = threading.Thread(target=self.run_model, args = (sess, ro1)) + ro2 = onnxrt.RunOptions() + ro2.run_tag = "thread2" + t2 = threading.Thread(target=self.run_model, args = (sess, ro2)) + t1.start() + t2.start() + t1.join() + t2.join() + def testRunDevice(self): device = onnxrt.get_device() self.assertTrue('CPU' in device or 'GPU' in device) @@ -315,29 +337,6 @@ def testModelMeta(self): self.assertEqual('', modelmeta.domain) self.assertEqual('', modelmeta.description) - def testConfigureSessionVerbosityLevel(self): - so = onnxrt.SessionOptions() - so.session_log_verbosity_level = 1 - - # use onnxruntime_ostream_redirect to redirect c++ stdout/stderr to python sys.stdout and sys.stderr - with onnxruntime_ostream_redirect(stdout=True, stderr=True): - sess = onnxrt.InferenceSession(self.get_name("matmul_1.pb"), sess_options=so) - output = sys.stderr.getvalue() - self.assertTrue('[I:onnxruntime:InferenceSession, inference_session' in output) - - def testConfigureRunVerbosityLevel(self): - ro = onnxrt.RunOptions() - ro.run_log_verbosity_level = 1 - ro.run_tag = "testtag123" - - # use onnxruntime_ostream_redirect to redirect c++ stdout/stderr to python sys.stdout and sys.stderr - with onnxruntime_ostream_redirect(stdout=True, stderr=True): - sess = onnxrt.InferenceSession(self.get_name("mul_1.pb")) - x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], dtype=np.float32) - sess.run([], {'X': x}, run_options=ro) - output = sys.stderr.getvalue() - self.assertTrue('[I:onnxruntime:testtag123,' in output) - def testProfilerWithSessionOptions(self): so = onnxrt.SessionOptions() so.enable_profiling = True @@ -459,4 +458,4 @@ def test_run_model_mlnet(self): if __name__ == '__main__': - unittest.main(module=__name__, buffer=True) + unittest.main() diff --git a/onnxruntime/test/shared_lib/fns_candy_style_transfer.c b/onnxruntime/test/shared_lib/fns_candy_style_transfer.c index a058268162647..5aa2f74c3f87f 100644 --- a/onnxruntime/test/shared_lib/fns_candy_style_transfer.c +++ b/onnxruntime/test/shared_lib/fns_candy_style_transfer.c @@ -96,8 +96,8 @@ static int read_png_file(const char* input_file, size_t* height, size_t* width, */ static int write_tensor_to_png_file(OrtValue* tensor, const 
char* output_file) { struct OrtTensorTypeAndShapeInfo* shape_info; - ORT_ABORT_ON_ERROR(OrtGetTensorShapeAndType(tensor, &shape_info)); - size_t dim_count = OrtGetNumOfDimensions(shape_info); + ORT_ABORT_ON_ERROR(OrtGetTensorTypeAndShape(tensor, &shape_info)); + size_t dim_count = OrtGetDimensionsCount(shape_info); if (dim_count != 4) { printf("output tensor must have 4 dimensions"); return -1; diff --git a/onnxruntime/test/shared_lib/test_allocator.cc b/onnxruntime/test/shared_lib/test_allocator.cc index 92d4533bacc10..28dd873ef95e8 100644 --- a/onnxruntime/test/shared_lib/test_allocator.cc +++ b/onnxruntime/test/shared_lib/test_allocator.cc @@ -18,17 +18,12 @@ TEST_F(CApiTest, allocation_info) { } TEST_F(CApiTest, DefaultAllocator) { - std::unique_ptr default_allocator; - { - OrtAllocator* ptr; - ORT_THROW_ON_ERROR(OrtCreateDefaultAllocator(&ptr)); - default_allocator.reset(ptr); - } - char* p = (char*)OrtAllocatorAlloc(default_allocator.get(), 100); + Ort::Allocator default_allocator = Ort::Allocator::CreateDefault(); + char* p = (char*)default_allocator.Alloc(100); ASSERT_NE(p, nullptr); memset(p, 0, 100); - OrtAllocatorFree(default_allocator.get(), p); - const OrtAllocatorInfo* info1 = OrtAllocatorGetInfo(default_allocator.get()); - const OrtAllocatorInfo* info2 = default_allocator->Info(default_allocator.get()); + default_allocator.Free(p); + const OrtAllocatorInfo* info1 = default_allocator.GetInfo(); + const OrtAllocatorInfo* info2 = static_cast(default_allocator)->Info(default_allocator); ASSERT_EQ(0, OrtCompareAllocatorInfo(info1, info2)); } diff --git a/onnxruntime/test/shared_lib/test_inference.cc b/onnxruntime/test/shared_lib/test_inference.cc index 8f7f18b6c5865..1b938db4d24ed 100644 --- a/onnxruntime/test/shared_lib/test_inference.cc +++ b/onnxruntime/test/shared_lib/test_inference.cc @@ -44,10 +44,10 @@ void RunSession(OrtAllocator* env, OrtSession* session_object, std::unique_ptr shape_info; { OrtTensorTypeAndShapeInfo* shape_info_ptr; - ORT_THROW_ON_ERROR(OrtGetTensorShapeAndType(output_tensor, &shape_info_ptr)); + ORT_THROW_ON_ERROR(OrtGetTensorTypeAndShape(output_tensor, &shape_info_ptr)); shape_info.reset(shape_info_ptr); } - size_t rtensor_dims = OrtGetNumOfDimensions(shape_info.get()); + size_t rtensor_dims = OrtGetDimensionsCount(shape_info.get()); std::vector shape_array(rtensor_dims); OrtGetDimensions(shape_info.get(), shape_array.data(), shape_array.size()); ASSERT_EQ(shape_array, dims_y); @@ -166,20 +166,11 @@ INSTANTIATE_TEST_CASE_P(CApiTestWithProviders, ::testing::Values(0, 1, 2, 3, 4)); struct OrtTensorDimensions : std::vector { - OrtTensorDimensions(onnxruntime::CustomOpApi ort, OrtValue* value) { - OrtTensorTypeAndShapeInfo* info = ort.GetTensorShapeAndType(value); - auto dimensionCount = ort.GetDimensionCount(info); - resize(dimensionCount); - ort.GetDimensions(info, data(), dimensionCount); + OrtTensorDimensions(Ort::CustomOpApi ort, const OrtValue* value) { + OrtTensorTypeAndShapeInfo* info = ort.GetTensorTypeAndShape(value); + std::vector::operator=(ort.GetTensorShape(info)); ort.ReleaseTensorTypeAndShapeInfo(info); } - - size_t ElementCount() const { - int64_t count = 1; - for (size_t i = 0; i < size(); i++) - count *= (*this)[i]; - return count; - } }; // Once we use C++17 this could be replaced with std::size @@ -187,28 +178,22 @@ template constexpr size_t countof(T (&)[N]) { return N; } struct MyCustomKernel { - MyCustomKernel(onnxruntime::CustomOpApi ort, const OrtKernelInfo* /*info*/) : ort_(ort) { - } - - void GetOutputShape(OrtKernelContext* 
context, size_t /*output_index*/, OrtTensorTypeAndShapeInfo* info) { - OrtValue* input_X = ort_.KernelContext_GetInput(context, 0); - OrtTensorDimensions dimensions(ort_, input_X); - ort_.SetDimensions(info, dimensions.data(), dimensions.size()); + MyCustomKernel(Ort::CustomOpApi ort, const OrtKernelInfo* /*info*/) : ort_(ort) { } void Compute(OrtKernelContext* context) { // Setup inputs - OrtValue* input_X = ort_.KernelContext_GetInput(context, 0); - OrtValue* input_Y = ort_.KernelContext_GetInput(context, 1); - float* X = ort_.GetTensorMutableData(input_X); - float* Y = ort_.GetTensorMutableData(input_Y); + const OrtValue* input_X = ort_.KernelContext_GetInput(context, 0); + const OrtValue* input_Y = ort_.KernelContext_GetInput(context, 1); + const float* X = ort_.GetTensorData(input_X); + const float* Y = ort_.GetTensorData(input_Y); // Setup output OrtTensorDimensions dimensions(ort_, input_X); OrtValue* output = ort_.KernelContext_GetOutput(context, 0, dimensions.data(), dimensions.size()); float* out = ort_.GetTensorMutableData(output); - OrtTensorTypeAndShapeInfo* output_info = ort_.GetTensorShapeAndType(output); + OrtTensorTypeAndShapeInfo* output_info = ort_.GetTensorTypeAndShape(output); int64_t size = ort_.GetTensorShapeElementCount(output_info); ort_.ReleaseTensorTypeAndShapeInfo(output_info); @@ -219,11 +204,11 @@ struct MyCustomKernel { } private: - onnxruntime::CustomOpApi ort_; + Ort::CustomOpApi ort_; }; -struct MyCustomOp : onnxruntime::CustomOpBase { - void* CreateKernel(onnxruntime::CustomOpApi api, const OrtKernelInfo* info) { return new MyCustomKernel(api, info); }; +struct MyCustomOp : Ort::CustomOpBase { + void* CreateKernel(Ort::CustomOpApi api, const OrtKernelInfo* info) { return new MyCustomKernel(api, info); }; const char* GetName() const { return "Foo"; }; size_t GetInputTypeCount() const { return 2; }; @@ -247,11 +232,10 @@ TEST_F(CApiTest, custom_op_handler) { std::vector expected_values_y = {2.0f, 4.0f, 6.0f, 8.0f, 10.0f, 12.0f}; MyCustomOp custom_op; - OrtCustomOpDomain* custom_op_domain = OrtCreateCustomOpDomain(""); - ORT_THROW_ON_ERROR(OrtCustomOpDomain_Add(custom_op_domain, &custom_op)); + Ort::CustomOpDomain custom_op_domain(""); + custom_op_domain.Add(&custom_op); TestInference(env, CUSTOM_OP_MODEL_URI, inputs, "Y", expected_dims_y, expected_values_y, 0, custom_op_domain); - OrtReleaseCustomOpDomain(custom_op_domain); } #ifdef ORT_RUN_EXTERNAL_ONNX_TESTS @@ -276,7 +260,7 @@ TEST_F(CApiTest, create_tensor) { std::unique_ptr shape_info; { OrtTensorTypeAndShapeInfo* shape_info_ptr; - ORT_THROW_ON_ERROR(OrtGetTensorShapeAndType(tensor.get(), &shape_info_ptr)); + ORT_THROW_ON_ERROR(OrtGetTensorTypeAndShape(tensor.get(), &shape_info_ptr)); shape_info.reset(shape_info_ptr); } int64_t len = OrtGetTensorShapeElementCount(shape_info.get()); @@ -308,7 +292,7 @@ TEST_F(CApiTest, create_tensor_with_data) { ORT_THROW_ON_ERROR(OrtGetTypeInfo(tensor.get(), &type_info)); const struct OrtTensorTypeAndShapeInfo* tensor_info = OrtCastTypeInfoToTensorInfo(type_info); ASSERT_NE(tensor_info, nullptr); - ASSERT_EQ(1, OrtGetNumOfDimensions(tensor_info)); + ASSERT_EQ(1, OrtGetDimensionsCount(tensor_info)); OrtReleaseTypeInfo(type_info); } diff --git a/onnxruntime/test/shared_lib/test_io_types.cc b/onnxruntime/test/shared_lib/test_io_types.cc index f67325f34f6fb..37651cd17cde6 100644 --- a/onnxruntime/test/shared_lib/test_io_types.cc +++ b/onnxruntime/test/shared_lib/test_io_types.cc @@ -33,7 +33,7 @@ static void TestModelInfo(const OrtSession* inference_session, bool is_input, 
co enum ONNXTensorElementDataType ele_type = OrtGetTensorElementType(p); ASSERT_EQ(ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT, ele_type); - ASSERT_EQ(dims.size(), OrtGetNumOfDimensions(p)); + ASSERT_EQ(dims.size(), OrtGetDimensionsCount(p)); std::vector real_dims(dims.size()); OrtGetDimensions(p, real_dims.data(), real_dims.size()); ASSERT_EQ(real_dims, dims); diff --git a/onnxruntime/test/shared_lib/test_model_loading.cc b/onnxruntime/test/shared_lib/test_model_loading.cc index 091aa5870c678..5537801649d35 100644 --- a/onnxruntime/test/shared_lib/test_model_loading.cc +++ b/onnxruntime/test/shared_lib/test_model_loading.cc @@ -5,6 +5,7 @@ #include "core/platform/env.h" #include "onnx_protobuf.h" #include +#include #include "test_fixture.h" #include "file_util.h" namespace onnxruntime { @@ -171,5 +172,26 @@ TEST_F(CApiTest, model_with_external_data) { OrtReleaseStatus(st); ::OrtReleaseSession(session); } + +TEST_F(CApiTest, model_from_array) { + const char* model_path = "testdata/matmul_1.pb"; + std::vector buffer; + { + std::ifstream file(model_path, std::ios::binary | std::ios::ate); + if (!file) + throw std::runtime_error("Error reading model"); + buffer.resize(file.tellg()); + file.seekg(0, std::ios::beg); + if (!file.read(buffer.data(), buffer.size())) + throw std::runtime_error("Error reading model"); + } + + std::unique_ptr so(OrtCreateSessionOptions()); + OrtSession* session; + auto st = ::OrtCreateSessionFromArray(env, buffer.data(), static_cast(buffer.size()), so.get(), &session); + ASSERT_EQ(st, nullptr) << OrtGetErrorMessage(st); + OrtReleaseStatus(st); + ::OrtReleaseSession(session); +} } // namespace test } // namespace onnxruntime diff --git a/onnxruntime/test/shared_lib/test_nontensor_types.cc b/onnxruntime/test/shared_lib/test_nontensor_types.cc index ca3cdc5cee80c..fd9e8c0357d89 100644 --- a/onnxruntime/test/shared_lib/test_nontensor_types.cc +++ b/onnxruntime/test/shared_lib/test_nontensor_types.cc @@ -28,109 +28,59 @@ struct RelAllocations { TEST_F(CApiTest, CreateGetVectorOfMapsInt64Float) { // support zipmap output type seq(map(int64, float)) // Creation - std::unique_ptr default_allocator(std::make_unique()); - OrtAllocatorInfo* info; - ORT_THROW_ON_ERROR(OrtCreateAllocatorInfo("Cpu", OrtDeviceAllocator, 0, OrtMemTypeDefault, &info)); - std::unique_ptr rel_info(info, OrtReleaseAllocatorInfo); - - RelAllocations rel(&OrtReleaseValue); - RelAllocations rels(&OrtReleaseStatus); + auto default_allocator = std::make_unique(); + Ort::AllocatorInfo info("Cpu", OrtDeviceAllocator, 0, OrtMemTypeDefault); const int N = 3; const int NUM_KV_PAIRS = 4; - std::vector in(N); + std::vector in; std::vector keys{3, 1, 2, 0}; std::vector dims = {4}; std::vector values{3.0f, 1.0f, 2.f, 0.f}; for (int i = 0; i < N; ++i) { // create key tensor - OrtValue* keys_tensor = OrtCreateTensorWithDataAsOrtValue(info, keys.data(), keys.size() * sizeof(int64_t), - dims, ONNX_TENSOR_ELEMENT_DATA_TYPE_INT64); - ASSERT_NE(keys_tensor, nullptr); - rel.add(keys_tensor); - + Ort::Value keys_tensor = Ort::Value::CreateTensor(info, keys.data(), keys.size() * sizeof(int64_t), + dims.data(), dims.size(), ONNX_TENSOR_ELEMENT_DATA_TYPE_INT64); // create value tensor - OrtValue* values_tensor = OrtCreateTensorWithDataAsOrtValue(info, values.data(), values.size() * sizeof(float), - dims, ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT); - ASSERT_NE(values_tensor, nullptr); - rel.add(values_tensor); - + Ort::Value values_tensor = Ort::Value::CreateTensor(info, values.data(), values.size() * sizeof(float), + dims.data(), dims.size(), 
ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT); // create map ort value - std::vector map_in{keys_tensor, values_tensor}; - OrtValue* map_ort = nullptr; - OrtStatus* stx = OrtCreateValue(map_in.data(), 2, ONNX_TYPE_MAP, &map_ort); - rels.add(stx); - rel.add(map_ort); - ASSERT_EQ(stx, nullptr); - ASSERT_NE(map_ort, nullptr); - - in[i] = map_ort; + in.emplace_back(Ort::Value::CreateMap(keys_tensor, values_tensor)); } // repeat above 3 steps N times and store the result in an OrtValue array // create sequence ort value - OrtValue* seq_ort = nullptr; - OrtStatus* sty = OrtCreateValue(in.data(), N, ONNX_TYPE_SEQUENCE, &seq_ort); - rels.add(sty); - rel.add(seq_ort); - ASSERT_EQ(sty, nullptr); - ASSERT_NE(seq_ort, nullptr); + Ort::Value seq_ort = Ort::Value::CreateSequence(in); // Get count - size_t num_values = 0; - OrtStatus* st2 = OrtGetValueCount(seq_ort, &num_values); - rels.add(st2); - ASSERT_EQ(st2, nullptr); + size_t num_values = seq_ort.GetCount(); ASSERT_EQ(num_values, N); // test negative case - OrtValue* tmp = nullptr; - OrtStatus* st_temp = OrtGetValue(seq_ort, 999, default_allocator.get(), &tmp); - rels.add(st_temp); - rel.add(tmp); - ASSERT_NE(st_temp, nullptr); + bool failed = false; + try { + auto temp = seq_ort.GetValue(999, default_allocator.get()); + } catch (const Ort::Exception& e) { + failed = e.GetOrtErrorCode() == ORT_RUNTIME_EXCEPTION; + } + ASSERT_EQ(failed, true); // Fetch for (int idx = 0; idx < N; ++idx) { - OrtValue* map_out = nullptr; - OrtStatus* st = OrtGetValue(seq_ort, idx, default_allocator.get(), &map_out); - rel.add(map_out); - rels.add(st); - ASSERT_EQ(st, nullptr); - ASSERT_NE(map_out, nullptr); + Ort::Value map_out = seq_ort.GetValue(idx, default_allocator.get()); // fetch the map // first fetch the keys - OrtValue* keys_ort = nullptr; - st = OrtGetValue(map_out, 0, default_allocator.get(), &keys_ort); - rel.add(keys_ort); - rels.add(st); - ASSERT_EQ(st, nullptr); - ASSERT_NE(keys_ort, nullptr); - - std::unique_ptr keys_ret_u; - int64_t* keys_ret = keys_ret_u.get(); - st = OrtGetTensorMutableData(keys_ort, reinterpret_cast(&keys_ret)); - rels.add(st); - ASSERT_EQ(st, nullptr); - ASSERT_NE(keys_ret, nullptr); + Ort::Value keys_ort = map_out.GetValue(0, default_allocator.get()); + + int64_t* keys_ret = keys_ort.GetTensorMutableData(); ASSERT_EQ(std::set(keys_ret, keys_ret + NUM_KV_PAIRS), std::set(std::begin(keys), std::end(keys))); // second fetch the values - OrtValue* values_ort = nullptr; - st = OrtGetValue(map_out, 1, default_allocator.get(), &values_ort); - rel.add(values_ort); - rels.add(st); - ASSERT_EQ(st, nullptr); - ASSERT_NE(values_ort, nullptr); - - std::unique_ptr values_ret_u; - float* values_ret = values_ret_u.get(); - st = OrtGetTensorMutableData(values_ort, reinterpret_cast(&values_ret)); - rels.add(st); - ASSERT_EQ(st, nullptr); - ASSERT_NE(values_ret, nullptr); + Ort::Value values_ort = map_out.GetValue(1, default_allocator.get()); + + float* values_ret = values_ort.GetTensorMutableData(); ASSERT_EQ(std::set(values_ret, values_ret + NUM_KV_PAIRS), std::set(std::begin(values), std::end(values))); } @@ -138,89 +88,50 @@ TEST_F(CApiTest, CreateGetVectorOfMapsInt64Float) { // support zipmap output ty TEST_F(CApiTest, CreateGetVectorOfMapsStringFloat) { // support zipmap output type seq(map(string, float)) // Creation - std::unique_ptr default_allocator(std::make_unique()); - OrtAllocatorInfo* info; - ORT_THROW_ON_ERROR(OrtCreateAllocatorInfo("Cpu", OrtDeviceAllocator, 0, OrtMemTypeDefault, &info)); - std::unique_ptr rel_info(info, 
OrtReleaseAllocatorInfo); - - RelAllocations rel(&OrtReleaseValue); - RelAllocations rels(&OrtReleaseStatus); + auto default_allocator = std::make_unique(); + Ort::AllocatorInfo info("Cpu", OrtDeviceAllocator, 0, OrtMemTypeDefault); const int N = 3; const int64_t NUM_KV_PAIRS = 4; - std::vector in(N); + std::vector in; const char* keys_arr[NUM_KV_PAIRS] = {"abc", "def", "ghi", "jkl"}; std::vector keys{keys_arr, keys_arr + NUM_KV_PAIRS}; std::vector dims = {NUM_KV_PAIRS}; std::vector values{3.0f, 1.0f, 2.f, 0.f}; for (int i = 0; i < N; ++i) { // create key tensor - OrtValue* keys_tensor = OrtCreateTensorWithDataAsOrtValue(info, keys.data(), keys.size() * sizeof(std::string), - dims, ONNX_TENSOR_ELEMENT_DATA_TYPE_STRING); - ASSERT_NE(keys_tensor, nullptr); - rel.add(keys_tensor); - + Ort::Value keys_tensor = Ort::Value::CreateTensor(info, keys.data(), keys.size() * sizeof(std::string), + dims.data(), dims.size(), ONNX_TENSOR_ELEMENT_DATA_TYPE_STRING); // create value tensor - OrtValue* values_tensor = OrtCreateTensorWithDataAsOrtValue(info, values.data(), values.size() * sizeof(float), - dims, ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT); - ASSERT_NE(values_tensor, nullptr); - rel.add(values_tensor); + Ort::Value values_tensor = Ort::Value::CreateTensor(info, values.data(), values.size() * sizeof(float), + dims.data(), dims.size(), ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT); // create map ort value - std::vector map_in{keys_tensor, values_tensor}; - OrtValue* map_ort = nullptr; - OrtStatus* stx = OrtCreateValue(map_in.data(), 2, ONNX_TYPE_MAP, &map_ort); - rels.add(stx); - rel.add(map_ort); - ASSERT_EQ(stx, nullptr); - ASSERT_NE(map_ort, nullptr); - - in[i] = map_ort; + in.emplace_back(Ort::Value::CreateMap(keys_tensor, values_tensor)); } // repeat above 3 steps N times and store the result in an OrtValue array // create sequence ort value - OrtValue* seq_ort = nullptr; - OrtStatus* sty = OrtCreateValue(in.data(), N, ONNX_TYPE_SEQUENCE, &seq_ort); - rels.add(sty); - rel.add(seq_ort); - ASSERT_EQ(sty, nullptr); - ASSERT_NE(seq_ort, nullptr); + Ort::Value seq_ort = Ort::Value::CreateSequence(in); // Get count - size_t num_values; - OrtStatus* st2 = OrtGetValueCount(seq_ort, &num_values); - rels.add(st2); - ASSERT_EQ(st2, nullptr); + size_t num_values = seq_ort.GetCount(); ASSERT_EQ(num_values, N); // Fetch for (int idx = 0; idx < N; ++idx) { - OrtValue* map_out = nullptr; - OrtStatus* st = OrtGetValue(seq_ort, idx, default_allocator.get(), &map_out); - rel.add(map_out); - rels.add(st); - ASSERT_EQ(st, nullptr); - ASSERT_NE(map_out, nullptr); + Ort::Value map_out = seq_ort.GetValue(idx, default_allocator.get()); // fetch the map // first fetch the keys - OrtValue* keys_ort = nullptr; - st = OrtGetValue(map_out, 0, default_allocator.get(), &keys_ort); - rel.add(keys_ort); - rels.add(st); - ASSERT_EQ(st, nullptr); - ASSERT_NE(keys_ort, nullptr); - - size_t data_len; - st = OrtGetStringTensorDataLength(keys_ort, &data_len); - rels.add(st); - ASSERT_EQ(st, nullptr); + Ort::Value keys_ort = map_out.GetValue(0, default_allocator.get()); + + size_t data_len = keys_ort.GetStringTensorDataLength(); std::string result(data_len, '\0'); std::vector offsets(NUM_KV_PAIRS); - st = OrtGetStringTensorContent(keys_ort, (void*)result.data(), data_len, offsets.data(), offsets.size()); - rels.add(st); + keys_ort.GetStringTensorContent((void*)result.data(), data_len, offsets.data(), offsets.size()); + const char* s = result.data(); std::set keys_ret; for (size_t i = 0; i < offsets.size(); ++i) { @@ -232,19 +143,9 @@ 
TEST_F(CApiTest, CreateGetVectorOfMapsStringFloat) { // support zipmap output t ASSERT_EQ(keys_ret, std::set(std::begin(keys), std::end(keys))); // second fetch the values - OrtValue* values_ort = nullptr; - st = OrtGetValue(map_out, 1, default_allocator.get(), &values_ort); - rel.add(values_ort); - rels.add(st); - ASSERT_EQ(st, nullptr); - ASSERT_NE(values_ort, nullptr); - - std::unique_ptr values_ret_u; - float* values_ret = values_ret_u.get(); - st = OrtGetTensorMutableData(values_ort, reinterpret_cast(&values_ret)); - rels.add(st); - ASSERT_EQ(st, nullptr); - ASSERT_NE(values_ret, nullptr); + Ort::Value values_ort = map_out.GetValue(1, default_allocator.get()); + + float* values_ret = values_ort.GetTensorMutableData(); ASSERT_EQ(std::set(values_ret, values_ret + NUM_KV_PAIRS), std::set(std::begin(values), std::end(values))); } diff --git a/onnxruntime/test/shared_lib/test_run_options.cc b/onnxruntime/test/shared_lib/test_run_options.cc index ea20ada5f1f36..5a07d663546e4 100644 --- a/onnxruntime/test/shared_lib/test_run_options.cc +++ b/onnxruntime/test/shared_lib/test_run_options.cc @@ -3,13 +3,12 @@ #include "core/session/onnxruntime_cxx_api.h" #include "test_fixture.h" -using namespace onnxruntime; TEST_F(CApiTest, run_options) { - std::unique_ptr options(OrtCreateRunOptions()); + Ort::RunOptions options; ASSERT_NE(options, nullptr); - ASSERT_EQ(OrtRunOptionsSetRunLogVerbosityLevel(options.get(), 1), nullptr); - ASSERT_EQ(OrtRunOptionsSetRunTag(options.get(), "abc"), nullptr); - ASSERT_STREQ(OrtRunOptionsGetRunTag(options.get()), "abc"); - ASSERT_EQ(OrtRunOptionsGetRunLogVerbosityLevel(options.get()), unsigned(1)); + options.SetRunLogVerbosityLevel(1); + options.SetRunTag("abc"); + ASSERT_STREQ(options.GetRunTag(), "abc"); + ASSERT_EQ(options.GetRunLogVerbosityLevel(), unsigned(1)); } diff --git a/onnxruntime/test/shared_lib/test_tensor_loader.cc b/onnxruntime/test/shared_lib/test_tensor_loader.cc index 4445770f40171..5fe845af11c17 100644 --- a/onnxruntime/test/shared_lib/test_tensor_loader.cc +++ b/onnxruntime/test/shared_lib/test_tensor_loader.cc @@ -2,6 +2,7 @@ // Licensed under the MIT License. #include "core/session/onnxruntime_cxx_api.h" +#include "core/common/common.h" #include "onnx_protobuf.h" #include "test_fixture.h" diff --git a/onnxruntime/test/tvm/tvm_basic_test.cc b/onnxruntime/test/tvm/tvm_basic_test.cc index 6924232797f26..9c2ffbce3db82 100644 --- a/onnxruntime/test/tvm/tvm_basic_test.cc +++ b/onnxruntime/test/tvm/tvm_basic_test.cc @@ -9,6 +9,7 @@ #include "core/graph/graph_viewer.h" #include "core/providers/cpu/cpu_execution_provider.h" #include "core/session/inference_session.h" +#include "core/session/onnxruntime_cxx_api.h" #include "core/common/logging/logging.h" #include "test/framework/test_utils.h" #include "test/test_environment.h" @@ -74,8 +75,8 @@ class UnionSet { std::vector farthers_; }; -static DLDataType GetDataType(DType type) { - if (type == DType::TDouble) { +static DLDataType GetDataType(ONNXTensorElementDataType type) { + if (type == ONNX_TENSOR_ELEMENT_DATA_TYPE_DOUBLE) { return {kDLFloat, 64, 1}; } else ORT_THROW("not implement."); @@ -209,48 +210,57 @@ class FuseExecutionProviderX : public CPUExecutionProvider { }; //we use lambda to capture the tvm model, so we can use it to get the funciton. 
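// A minimal sketch (illustrative, not part of the patch) of the Ort::Value helpers the
// rewritten test_nontensor_types.cc tests rely on: build one map<int64, float>, wrap it in
// a sequence, then read it back. It assumes onnxruntime_cxx_api.h is included; the function
// name and the `alloc` parameter are hypothetical stand-ins for whatever OrtAllocator* the
// caller already owns (the tests pass a mock allocator).
static void SketchSeqOfMaps(OrtAllocator* alloc) {
  Ort::AllocatorInfo info("Cpu", OrtDeviceAllocator, 0, OrtMemTypeDefault);
  std::vector<int64_t> keys{3, 1, 2, 0};
  std::vector<float> values{3.0f, 1.0f, 2.0f, 0.0f};
  std::vector<int64_t> dims{4};
  // One map<int64, float> built from a key tensor and a value tensor over user-owned buffers.
  Ort::Value keys_tensor = Ort::Value::CreateTensor(info, keys.data(), keys.size() * sizeof(int64_t),
                                                    dims.data(), dims.size(), ONNX_TENSOR_ELEMENT_DATA_TYPE_INT64);
  Ort::Value values_tensor = Ort::Value::CreateTensor(info, values.data(), values.size() * sizeof(float),
                                                      dims.data(), dims.size(), ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT);
  std::vector<Ort::Value> maps;
  maps.emplace_back(Ort::Value::CreateMap(keys_tensor, values_tensor));
  // Wrap the maps in a sequence, then walk back down: sequence -> map -> key tensor.
  Ort::Value seq = Ort::Value::CreateSequence(maps);
  Ort::Value map_out = seq.GetValue(0, alloc);
  Ort::Value keys_out = map_out.GetValue(0, alloc);
  int64_t* keys_data = keys_out.GetTensorMutableData<int64_t>();
  (void)keys_data;  // the tests compare these against the original keys
}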
- compute_info.compute_func = [](FunctionState state, ONNXRunTimeTensor* input_tensors, size_t num_inputs, ONNXRunTimeTensor* output_tensors, size_t num_outputs) { + compute_info.compute_func = [](FunctionState state, const OrtCustomOpApi* api, OrtKernelContext* context) { + Ort::CustomOpApi ort{*api}; + TVMFuncState* tvm_state = reinterpret_cast(state); + std::vector> input_shapes; + std::vector> output_shapes; + auto eval_func_name = "func"; DLContext cpu_context = {kDLCPU, 0}; + size_t num_inputs = ort.KernelContext_GetInputCount(context); + size_t num_outputs = ort.KernelContext_GetOutputCount(context); size_t n_args = num_inputs + num_outputs; std::vector dl_tensors(n_args); std::vector tvm_values(n_args); std::vector tvm_type_codes(n_args); for (auto i = 0; i < num_inputs; i++) { + const OrtValue* input_tensor = ort.KernelContext_GetInput(context, i); + auto tensor_info = ort.GetTensorTypeAndShape(input_tensor); + auto tensor_type = ort.GetTensorElementType(tensor_info); + input_shapes.emplace_back(ort.GetTensorShape(tensor_info)); + ort.ReleaseTensorTypeAndShapeInfo(tensor_info); + tvm_type_codes[i] = kNDArrayContainer; dl_tensors[i].ctx = cpu_context; - dl_tensors[i].dtype = GetDataType(input_tensors[i].dtype); + dl_tensors[i].dtype = GetDataType(tensor_type); dl_tensors[i].strides = nullptr; dl_tensors[i].byte_offset = 0; - dl_tensors[i].data = input_tensors[i].data; - dl_tensors[i].ndim = input_tensors[i].ndim; - dl_tensors[i].shape = input_tensors[i].shape; + dl_tensors[i].data = const_cast(ort.GetTensorData(input_tensor)); + dl_tensors[i].ndim = input_shapes.back().size(); + dl_tensors[i].shape = input_shapes.back().data(); tvm_values[i].v_handle = &dl_tensors[i]; } for (auto i = 0; i < num_outputs; i++) { //setup output tensor property //todo: type should be set by framework. 
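// A minimal sketch (illustrative, under the same OrtCustomOpApi contract used by the
// callback above) of allocating an output that mirrors an input's shape, and of querying
// the element type mentioned in the TODO. The helper name is hypothetical, and double is
// used only because this test's model is double-typed.
static void MirrorInputToOutput(Ort::CustomOpApi& ort, OrtKernelContext* context, size_t index) {
  const OrtValue* input = ort.KernelContext_GetInput(context, index);
  OrtTensorTypeAndShapeInfo* info = ort.GetTensorTypeAndShape(input);
  std::vector<int64_t> shape = ort.GetTensorShape(info);
  ONNXTensorElementDataType elem_type = ort.GetTensorElementType(info);
  ort.ReleaseTensorTypeAndShapeInfo(info);
  (void)elem_type;  // per the TODO above, the framework does not set the output type yet
  // Ask the kernel context for an output with the same shape and get a writable buffer.
  OrtValue* output = ort.KernelContext_GetOutput(context, index, shape.data(), shape.size());
  double* out = ort.GetTensorMutableData<double>(output);
  (void)out;
}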
- output_tensors[i].dtype = input_tensors[0].dtype; - //todo: shape inference - output_tensors[i].ndim = input_tensors[0].ndim; - output_tensors[i].shape = new int64_t[output_tensors[i].ndim]; - memcpy(output_tensors[i].shape, input_tensors[0].shape, sizeof(int64_t) * output_tensors[i].ndim); - int64_t size = 1; - for (auto j = 0; j < output_tensors[i].ndim; j++) - size *= output_tensors[i].shape[j]; - output_tensors[i].data = (*(tvm_state->test_allocate_func))(tvm_state->allocator, sizeof(double) * size, 64); + output_shapes.push_back(input_shapes[i]); + OrtValue* output_tensor = ort.KernelContext_GetOutput(context, i, output_shapes[i].data(), output_shapes[i].size()); + auto tensor_info = ort.GetTensorTypeAndShape(output_tensor); + auto tensor_type = ort.GetTensorElementType(tensor_info); + ort.ReleaseTensorTypeAndShapeInfo(tensor_info); tvm_type_codes[num_inputs + i] = kNDArrayContainer; dl_tensors[num_inputs + i].ctx = cpu_context; - dl_tensors[num_inputs + i].dtype = GetDataType(output_tensors[i].dtype); + dl_tensors[num_inputs + i].dtype = GetDataType(tensor_type); dl_tensors[num_inputs + i].strides = nullptr; dl_tensors[num_inputs + i].byte_offset = 0; - dl_tensors[num_inputs + i].data = output_tensors[i].data; - dl_tensors[num_inputs + i].ndim = output_tensors[i].ndim; - dl_tensors[num_inputs + i].shape = output_tensors[i].shape; + dl_tensors[num_inputs + i].data = ort.GetTensorMutableData(output_tensor); + dl_tensors[num_inputs + i].ndim = output_shapes.back().size(); + dl_tensors[num_inputs + i].shape = output_shapes.back().data(); tvm_values[num_inputs + i].v_handle = &dl_tensors[num_inputs + i]; } @@ -285,7 +295,7 @@ static void RunSession(InferenceSession& session_object, std::vector& dims_y, std::vector& values_y) { // prepare inputs - MLValue ml_value; + OrtValue ml_value; CreateMLValue(TestCPUExecutionProvider()->GetAllocator(0, OrtMemTypeDefault), dims_x, values_x, &ml_value); NameMLValMap feeds; feeds.insert(std::make_pair("X1", ml_value)); @@ -293,7 +303,7 @@ static void RunSession(InferenceSession& session_object, // prepare outputs std::vector output_names; output_names.push_back("Y4"); - std::vector fetches; + std::vector fetches; // Now run common::Status st = session_object.Run(run_options, feeds, output_names, &fetches); diff --git a/onnxruntime/test/util/compare_mlvalue.cc b/onnxruntime/test/util/compare_mlvalue.cc deleted file mode 100644 index 96492c8987184..0000000000000 --- a/onnxruntime/test/util/compare_mlvalue.cc +++ /dev/null @@ -1,395 +0,0 @@ -// Copyright (c) Microsoft Corporation. All rights reserved. -// Licensed under the MIT License. 
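// A minimal sketch (illustrative) of the renamed shape-query C API applied across the tests
// above: OrtGetTensorShapeAndType is now OrtGetTensorTypeAndShape and OrtGetNumOfDimensions
// is now OrtGetDimensionsCount. The helper name is hypothetical, and the release call is
// assumed to follow the usual OrtRelease* pattern used by the test smart-pointer deleters.
static std::vector<int64_t> GetTensorShapeOf(OrtValue* value) {
  OrtTensorTypeAndShapeInfo* info = nullptr;
  ORT_THROW_ON_ERROR(OrtGetTensorTypeAndShape(value, &info));
  std::vector<int64_t> shape(OrtGetDimensionsCount(info));
  OrtGetDimensions(info, shape.data(), shape.size());
  OrtReleaseTensorTypeAndShapeInfo(info);
  return shape;
}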
- -#include "test/compare_mlvalue.h" -#include -#include - -#ifdef USE_FULL_PROTOBUF -#include -#else -#include -#endif - -#include "core/graph/onnx_protobuf.h" -#include "core/framework/tensorprotoutils.h" -#include "Eigen/Core" -#include "Eigen/src/Core/arch/GPU/Half.h" - -using namespace onnxruntime; - -#if (!EIGEN_VERSION_AT_LEAST(3, 3, 6)) -namespace Eigen { -namespace half_impl { -using __half_raw = ::Eigen::half_impl::__half; -} -} // namespace Eigen -#endif - -namespace { - -template -bool IsResultCloselyMatch(const T& outvalue, const T& expected_value, const double diff, const double tol) { - if (diff > tol) return false; - if (std::isnan(diff) && !(std::isnan(outvalue) && std::isnan(expected_value)) && !(std::isinf(outvalue) && std::isinf(expected_value))) return false; - return true; -} - -template -std::pair CompareFloatResult(const Tensor& outvalue, const Tensor& expected_value, - double per_sample_tolerance, - double relative_per_sample_tolerance, - bool post_processing) { - const size_t size1 = expected_value.Shape().Size(); - const FLOAT_TYPE* expected_output = expected_value.template Data(); - const FLOAT_TYPE* real_output = outvalue.template Data(); - std::pair res = std::make_pair(COMPARE_RESULT::SUCCESS, ""); - double max_diff = 0; - size_t diff_count = 0; - for (size_t di = 0; di != size1; ++di) { - const double real_value = post_processing ? std::max(0.0, std::min(255.0, real_output[di])) - : real_output[di]; - const double diff = fabs(expected_output[di] - real_value); - const double tol = per_sample_tolerance + relative_per_sample_tolerance * fabs(expected_output[di]); - if (!IsResultCloselyMatch(real_value, expected_output[di], diff, tol)) { - res.first = COMPARE_RESULT::RESULT_DIFFERS; - // update error message if this is a larger diff - if (diff > max_diff || (std::isnan(diff) && !std::isnan(max_diff))) { - int64_t expected_int = 0; - int64_t real_int = 0; - memcpy(&expected_int, &expected_output[di], sizeof(FLOAT_TYPE)); - memcpy(&real_int, &real_output[di], sizeof(FLOAT_TYPE)); - - std::ostringstream oss; - oss << std::hex << "expected " << expected_output[di] << " (" << expected_int << "), got " - << real_value << " (" << real_int << ")" - << ", diff: " << diff << ", tol=" << tol << "."; - res.second = oss.str(); - max_diff = diff; - } - ++diff_count; - } - } - - if (res.first == COMPARE_RESULT::SUCCESS) return res; - - std::ostringstream oss; - oss << res.second << " " << diff_count << " of " << size1 << " differ"; - res.second = oss.str(); - return res; -} - -template -std::pair IsResultExactlyMatch(const Tensor& outvalue, const Tensor& expected_value) { - const size_t size1 = expected_value.Shape().Size(); - const T* expected_output = expected_value.template Data(); - const T* real_output = outvalue.template Data(); - for (size_t di = 0; di != size1; ++di) { - if (expected_output[di] != real_output[di]) { - std::ostringstream oss; - oss << "expected " << expected_output[di] << ", got " << real_output[di]; - return std::make_pair(COMPARE_RESULT::RESULT_DIFFERS, oss.str()); - } - } - return std::make_pair(COMPARE_RESULT::SUCCESS, ""); -} - -std::pair CompareFloat16Result(const Tensor& outvalue, const Tensor& expected_value, - double per_sample_tolerance, - double relative_per_sample_tolerance, - bool post_processing) { - const size_t size1 = expected_value.Shape().Size(); - const MLFloat16* expected_output = expected_value.template Data(); - const MLFloat16* real_output = outvalue.template Data(); - for (size_t di = 0; di != size1; ++di) { - float expected = 
Eigen::half_impl::half_to_float(Eigen::half_impl::__half_raw(expected_output[di].val)); - float real = Eigen::half_impl::half_to_float(Eigen::half_impl::__half_raw(real_output[di].val)); - real = post_processing ? std::max(0.0f, std::min(255.0f, real)) : real; - const double diff = fabs(expected - real); - const double rtol = per_sample_tolerance + relative_per_sample_tolerance * fabs(expected); - if (!IsResultCloselyMatch(real, expected, diff, rtol)) { - std::ostringstream oss; - oss << "expected " << expected << ", got " << real << ", diff: " << diff << ", tol=" << rtol; - - return std::make_pair(COMPARE_RESULT::RESULT_DIFFERS, oss.str()); - } - } - return std::make_pair(COMPARE_RESULT::SUCCESS, ""); -} - -std::pair CompareBFloat16Result(const Tensor& outvalue, const Tensor& expected_value, - double per_sample_tolerance, - double relative_per_sample_tolerance, - bool post_processing) { - const size_t size1 = expected_value.Shape().Size(); - const BFloat16* expected_output = expected_value.template Data(); - const BFloat16* real_output = outvalue.template Data(); - for (size_t di = 0; di != size1; ++di) { - float expected = expected_output[di].ToFloat(); - float real = real_output[di].ToFloat(); - real = post_processing ? std::max(0.0f, std::min(255.0f, real)) : real; - const double diff = fabs(expected - real); - const double rtol = per_sample_tolerance + relative_per_sample_tolerance * fabs(expected); - if (!IsResultCloselyMatch(real, expected, diff, rtol)) { - std::ostringstream oss; - oss << "expected " << expected << ", got " << real << ", diff: " << diff << ", tol=" << rtol; - - return std::make_pair(COMPARE_RESULT::RESULT_DIFFERS, oss.str()); - } - } - return std::make_pair(COMPARE_RESULT::SUCCESS, ""); -} - -std::pair CompareTwoTensors(const Tensor& outvalue, const Tensor& expected_tensor, - double per_sample_tolerance, - double relative_per_sample_tolerance, - bool post_processing) { - if (expected_tensor.Shape() != outvalue.Shape()) { - std::ostringstream oss; - oss << "shape mismatch, expect " << expected_tensor.Shape().ToString() << " got " << outvalue.Shape().ToString(); - return std::make_pair(COMPARE_RESULT::SHAPE_MISMATCH, oss.str()); - } - auto p1 = outvalue.DataType(); - if (p1 == DataTypeImpl::GetType()) { - return CompareFloatResult(outvalue, expected_tensor, - per_sample_tolerance, relative_per_sample_tolerance, post_processing); - } else if (p1 == DataTypeImpl::GetType()) { - return CompareFloatResult(outvalue, expected_tensor, - per_sample_tolerance, relative_per_sample_tolerance, post_processing); - } else if (p1 == DataTypeImpl::GetType()) { - return IsResultExactlyMatch(outvalue, expected_tensor); - } else if (p1 == DataTypeImpl::GetType()) { - return IsResultExactlyMatch(outvalue, expected_tensor); - } else if (p1 == DataTypeImpl::GetType()) { - return IsResultExactlyMatch(outvalue, expected_tensor); - } else if (p1 == DataTypeImpl::GetType()) { - return IsResultExactlyMatch(outvalue, expected_tensor); - } else if (p1 == DataTypeImpl::GetType()) { - return IsResultExactlyMatch(outvalue, expected_tensor); - } else if (p1 == DataTypeImpl::GetType()) { - return IsResultExactlyMatch(outvalue, expected_tensor); - } else if (p1 == DataTypeImpl::GetType()) { - return IsResultExactlyMatch(outvalue, expected_tensor); - } else if (p1 == DataTypeImpl::GetType()) { - return IsResultExactlyMatch(outvalue, expected_tensor); - } else if (p1 == DataTypeImpl::GetType()) { - return IsResultExactlyMatch(outvalue, expected_tensor); - } else if (p1 == DataTypeImpl::GetType()) { - 
return IsResultExactlyMatch(outvalue, expected_tensor); - } else if (p1 == DataTypeImpl::GetType()) { - return CompareFloat16Result(outvalue, expected_tensor, - per_sample_tolerance, relative_per_sample_tolerance, post_processing); - } else if (p1 == DataTypeImpl::GetType()) { - return CompareBFloat16Result(outvalue, expected_tensor, - per_sample_tolerance, relative_per_sample_tolerance, post_processing); - } else { - return std::make_pair(COMPARE_RESULT::NOT_SUPPORT, ""); - } -} -template -std::pair CompareSeqOfMapToFloat(const T& real_output_vector, const T& expected_value, - double per_sample_tolerance, - double relative_per_sample_tolerance, - bool post_processing) { - if (real_output_vector.size() != expected_value.size()) { - std::ostringstream oss; - oss << "vector size mismatch, expected " << expected_value.size() << ", got " << real_output_vector.size(); - return std::make_pair(COMPARE_RESULT::RESULT_DIFFERS, oss.str()); - } - for (size_t i = 0; i != real_output_vector.size(); ++i) { - const auto& expected_map = expected_value[i]; - //compare if expected_map equals real_output_vector[i] - if (real_output_vector[i].size() != expected_map.size()) { - std::ostringstream oss; - oss << "map size mismatch, expected " << expected_map.size() << ", got " << real_output_vector[i].size(); - return std::make_pair(COMPARE_RESULT::RESULT_DIFFERS, oss.str()); - } - - for (const auto& real_output_key_value_pair : real_output_vector[i]) { - auto expected_key_value_pair = expected_map.find(real_output_key_value_pair.first); - if (expected_key_value_pair == expected_map.end()) { - return std::make_pair(COMPARE_RESULT::RESULT_DIFFERS, ""); - } - const double real = post_processing - ? std::max(0.0, std::min(255.0, real_output_key_value_pair.second)) - : real_output_key_value_pair.second; - const double diff = fabs(expected_key_value_pair->second - real); - const double rtol = per_sample_tolerance + relative_per_sample_tolerance * fabs(expected_key_value_pair->second); - if (!IsResultCloselyMatch(real, expected_key_value_pair->second, diff, rtol)) { - std::ostringstream oss; - oss << "expected " << expected_key_value_pair->second << ", got " << real - << ", diff: " << diff << ", tol=" << rtol; - return std::make_pair(COMPARE_RESULT::RESULT_DIFFERS, oss.str()); - } - } - } - return std::make_pair(COMPARE_RESULT::SUCCESS, ""); -} - -const char* ElementTypeToString(MLDataType type) { - if (type == DataTypeImpl::GetType()) { - return "tensor(float)"; - } else if (type == DataTypeImpl::GetType()) { - return "tensor(bool)"; - } - - else if (type == DataTypeImpl::GetType()) { - return "tensor(int32)"; - } - - else if (type == DataTypeImpl::GetType()) { - return "tensor(double)"; - } - - else if (type == DataTypeImpl::GetType()) { - return "tensor(string)"; - } - - else if (type == DataTypeImpl::GetType()) { - return "tensor(uint8)"; - } - - else if (type == DataTypeImpl::GetType()) { - return "tensor(uint16)"; - } - - else if (type == DataTypeImpl::GetType()) { - return "tensor(int16)"; - } - - else if (type == DataTypeImpl::GetType()) { - return "tensor(int64)"; - } - - else if (type == DataTypeImpl::GetType()) { - return "tensor(uint32)"; - } - - else if (type == DataTypeImpl::GetType()) { - return "tensor(uint64)"; - } - - else if (type == DataTypeImpl::GetType()) { - return "tensor(MLFloat16)"; - } else if (type == DataTypeImpl::GetType()) { - return "tensor(bfloat16)"; - } else { - return "unknown"; - } -} - -//The expected_shape could contain unknown dimensions, but the real_shape cannot -bool 
AreShapesEqual(const std::vector& real_shape, const ::ONNX_NAMESPACE::TensorShapeProto& expected_shape) { - const int len = expected_shape.dim_size(); - if (len < 0) return false; - if (real_shape.size() != static_cast(len)) return false; - for (int i = 0; i != len; ++i) { - if (!expected_shape.dim(i).has_dim_value()) { - //symbolic shape, cannot validate it right now, assume it matches every thing - continue; - } - ::google::protobuf::int64 d = expected_shape.dim(i).dim_value(); - if (d != real_shape[i]) return false; - } - return true; -} - -template -std::ostringstream& VectorToString(const std::vector& input, std::ostringstream& oss) { - size_t len = input.size(); - oss << "["; - if (len > 0) { - oss << input[0]; - for (size_t i = 1; i != len; ++i) { - oss << ", " << input[i]; - } - } - oss << "]"; - return oss; -} - -} // namespace - -namespace onnxruntime { -std::pair CompareMLValue(const MLValue& o, const MLValue& expected_mlvalue, - double per_sample_tolerance, - double relative_per_sample_tolerance, - bool post_processing) { - if (o.IsTensor() != expected_mlvalue.IsTensor() || o.Type() != expected_mlvalue.Type()) { - return std::make_pair(COMPARE_RESULT::TYPE_MISMATCH, ""); - } - if (!o.IsTensor()) { - if (o.Type() == DataTypeImpl::GetType()) { - return CompareSeqOfMapToFloat(o.Get(), expected_mlvalue.Get(), - per_sample_tolerance, relative_per_sample_tolerance, post_processing); - } - if (o.Type() == DataTypeImpl::GetType()) { - return CompareSeqOfMapToFloat(o.Get(), expected_mlvalue.Get(), - per_sample_tolerance, relative_per_sample_tolerance, post_processing); - } - return std::make_pair(COMPARE_RESULT::NOT_SUPPORT, ""); - } - const Tensor& outvalue = o.Get(); - const Tensor& expected_tensor = expected_mlvalue.Get(); - if (outvalue.DataType() != expected_tensor.DataType()) { - std::ostringstream oss; - oss << "expect " << ElementTypeToString(expected_tensor.DataType()) - << " got " << ElementTypeToString(outvalue.DataType()); - return std::make_pair(COMPARE_RESULT::TYPE_MISMATCH, oss.str()); - } - return CompareTwoTensors(outvalue, expected_tensor, - per_sample_tolerance, relative_per_sample_tolerance, post_processing); -} - -std::pair VerifyValueInfo(const ONNX_NAMESPACE::ValueInfoProto& v, const OrtValue* o) { - if (!v.has_type()) return std::make_pair(COMPARE_RESULT::SUCCESS, ""); - if (v.type().has_tensor_type()) { - if (!OrtIsTensor(o)) { - return std::make_pair(COMPARE_RESULT::TYPE_MISMATCH, ""); - } - - ::ONNX_NAMESPACE::TypeProto_Tensor t = v.type().tensor_type(); - //below code doesn't work - //if (((TensorTypeBase*)o.Type())->GetElementType() != DataTypeImpl::ElementTypeFromProto(t.elem_type())) { - // return COMPARE_RESULT::TYPE_MISMATCH; - //} - std::unique_ptr info; - { - OrtTensorTypeAndShapeInfo* t1 = nullptr; - ORT_THROW_ON_ERROR(OrtGetTensorShapeAndType(o, &t1)); - info.reset(t1); - } - ONNXTensorElementDataType real_type = OrtGetTensorElementType(info.get()); - ONNXTensorElementDataType expected_type = onnxruntime::utils::CApiElementTypeFromProtoType(t.elem_type()); - if (real_type != expected_type) { - std::ostringstream oss; - oss << "expect " << ElementTypeToString((MLDataType)expected_type) - << " got " << ElementTypeToString((MLDataType)real_type); - - return std::make_pair(COMPARE_RESULT::TYPE_MISMATCH, oss.str()); - } - std::vector shape = GetTensorShape(info.get()); - const auto& tensor_shape_proto = t.shape(); - if (!AreShapesEqual(shape, tensor_shape_proto)) { - std::ostringstream oss; - oss << "Tensor shape mismatch, model file expects '"; - if 
(tensor_shape_proto.dim_size() == 0) { - oss << "(unknown)"; - } else { - oss << tensor_shape_proto; - } - oss << "', real output is "; - VectorToString(shape, oss); - return std::make_pair(COMPARE_RESULT::SHAPE_MISMATCH, oss.str()); - } - } else { - //Cannot do this check for tensor type. - //For tensor type, o.Type() is TensorTypeBase*, but p points to a subclass of TensorTypeBase - auto p = DataTypeImpl::TypeFromProto(v.type()); - if (((MLValue*)o)->Type() != p) { - return std::make_pair(COMPARE_RESULT::TYPE_MISMATCH, ""); - } - } - return std::make_pair(COMPARE_RESULT::SUCCESS, ""); -} -} // namespace onnxruntime diff --git a/onnxruntime/test/util/include/test/compare_mlvalue.h b/onnxruntime/test/util/include/test/compare_mlvalue.h deleted file mode 100644 index 32024287a76d4..0000000000000 --- a/onnxruntime/test/util/include/test/compare_mlvalue.h +++ /dev/null @@ -1,27 +0,0 @@ -// Copyright (c) Microsoft Corporation. All rights reserved. -// Licensed under the MIT License. - -#pragma once -//TODO(): move compare_mlvalue.{h,cc} to test dir - -#include -#include -#include - -namespace ONNX_NAMESPACE { -class ValueInfoProto; -} -namespace onnxruntime { -enum class COMPARE_RESULT { - SUCCESS, - RESULT_DIFFERS, - TYPE_MISMATCH, - SHAPE_MISMATCH, - NOT_SUPPORT -}; -std::pair CompareMLValue(const MLValue& real, const MLValue& expected, double per_sample_tolerance, - double relative_per_sample_tolerance, bool post_processing); - -//verify if the 'value' matches the 'expected' ValueInfoProto. 'value' is a model output -std::pair VerifyValueInfo(const ONNX_NAMESPACE::ValueInfoProto& expected, const OrtValue* value); -} // namespace onnxruntime diff --git a/tools/ci_build/build.py b/tools/ci_build/build.py index 7449970d8fc85..a0412384010c7 100755 --- a/tools/ci_build/build.py +++ b/tools/ci_build/build.py @@ -84,7 +84,7 @@ def parse_arguments(): # Python bindings parser.add_argument("--enable_pybind", action='store_true', help="Enable Python Bindings.") parser.add_argument("--build_wheel", action='store_true', help="Build Python Wheel. ") - parser.add_argument("--numpy_version", default='1.15.0', help="Installs a specific version of numpy " + parser.add_argument("--numpy_version", help="Installs a specific version of numpy " "before building the python binding.") parser.add_argument("--skip-keras-test", action='store_true', help="Skip tests with Keras if keras is installed") @@ -95,6 +95,11 @@ def parse_arguments(): # Build a shared lib parser.add_argument("--build_shared_lib", action='store_true', help="Build a shared library for the ONNXRuntime.") + # Build ONNX Runtime server + parser.add_argument("--build_server", action='store_true', help="Build server application for the ONNXRuntime.") + parser.add_argument("--enable_server_tests", action='store_true', help="Run server application tests.") + parser.add_argument("--enable_server_model_tests", action='store_true', help="Run server model tests.") + # Build options parser.add_argument("--cmake_extra_defines", nargs="+", help="Extra definitions to pass to CMake during build system generation. 
" + @@ -214,7 +219,7 @@ def install_ubuntu_deps(args): def install_python_deps(numpy_version=""): dep_packages = ['setuptools', 'wheel'] - dep_packages.append('numpy==%s' % numpy_version if numpy_version else 'numpy') + dep_packages.append('numpy=={}'.format(numpy_version) if numpy_version else 'numpy>=1.15.0') run_subprocess([sys.executable, '-m', 'pip', 'install', '--trusted-host', 'files.pythonhosted.org'] + dep_packages) def check_md5(filename, expected_md5): @@ -265,8 +270,10 @@ def download_test_data(build_dir, src_url, expected_md5, azure_sas_key): shutil.rmtree(models_dir) if shutil.which('unzip'): run_subprocess(['unzip','-qd', models_dir, local_zip_file]) - elif shutil.which('7za'): - run_subprocess(['7za','x', local_zip_file, '-y', '-o' + models_dir]) + elif shutil.which('7z'): # 7-Zip + run_subprocess(['7z','x', local_zip_file, '-y', '-o' + models_dir]) + elif shutil.which('7za'): # 7-Zip standalone + run_subprocess(['7za', 'x', local_zip_file, '-y', '-o' + models_dir]) else: #TODO: use python for unzip log.error("No unzip tool for use") @@ -324,9 +331,10 @@ def generate_build_tree(cmake_path, source_dir, build_dir, cuda_home, cudnn_home "-Donnxruntime_TENSORRT_HOME=" + (tensorrt_home if args.use_tensorrt else ""), # By default - we currently support only cross compiling for ARM/ARM64 (no native compilation supported through this script) "-Donnxruntime_CROSS_COMPILING=" + ("ON" if args.arm64 or args.arm else "OFF"), + "-Donnxruntime_BUILD_SERVER=" + ("ON" if args.build_server else "OFF"), "-Donnxruntime_BUILD_x86=" + ("ON" if args.x86 else "OFF"), # nGraph and TensorRT providers currently only supports full_protobuf option. - "-Donnxruntime_USE_FULL_PROTOBUF=" + ("ON" if args.use_full_protobuf or args.use_ngraph or args.use_tensorrt else "OFF"), + "-Donnxruntime_USE_FULL_PROTOBUF=" + ("ON" if args.use_full_protobuf or args.use_ngraph or args.use_tensorrt or args.build_server or args.gen_doc else "OFF"), "-Donnxruntime_DISABLE_CONTRIB_OPS=" + ("ON" if args.disable_contrib_ops else "OFF"), "-Donnxruntime_MSVC_STATIC_RUNTIME=" + ("ON" if args.enable_msvc_static_runtime else "OFF"), ] @@ -535,6 +543,7 @@ def run_onnxruntime_tests(args, source_dir, ctest_path, build_dir, configs, enab if onnxml_test: run_subprocess([sys.executable, 'onnxruntime_test_python_keras.py'], cwd=cwd, dll_path=dll_path) + def run_onnx_tests(build_dir, configs, onnx_test_data_dir, provider, enable_parallel_executor_test, num_parallel_models): for config in configs: cwd = get_config_build_dir(build_dir, config) @@ -565,6 +574,44 @@ def run_onnx_tests(build_dir, configs, onnx_test_data_dir, provider, enable_para run_subprocess([exe,'-x'] + cmd, cwd=cwd) +def run_server_tests(build_dir, configs): + pip_freeze_result = run_subprocess([sys.executable, '-m', 'pip', 'freeze'], capture=True).stdout + installed_packages = [r.decode().split('==')[0] for r in pip_freeze_result.split()] + if not (('requests' in installed_packages) and ('protobuf' in installed_packages) and ('numpy' in installed_packages)): + if hasattr(sys, 'real_prefix'): + # In virtualenv + run_subprocess([sys.executable, '-m', 'pip', 'install', '--trusted-host', 'files.pythonhosted.org', 'requests', 'protobuf', 'numpy']) + else: + # Outside virtualenv + run_subprocess([sys.executable, '-m', 'pip', 'install', '--user', '--trusted-host', 'files.pythonhosted.org', 'requests', 'protobuf', 'numpy']) + + for config in configs: + config_build_dir = get_config_build_dir(build_dir, config) + if is_windows(): + server_app_path = os.path.join(config_build_dir, 
config, 'onnxruntime_server.exe') + else: + server_app_path = os.path.join(config_build_dir, 'onnxruntime_server') + server_test_folder = os.path.join(config_build_dir, 'server_test') + server_test_data_folder = os.path.join(os.path.join(config_build_dir, 'testdata'), 'server') + run_subprocess([sys.executable, 'test_main.py', server_app_path, server_test_data_folder, server_test_data_folder], cwd=server_test_folder, dll_path=None) + + +def run_server_model_tests(build_dir, configs): + for config in configs: + config_build_dir = get_config_build_dir(build_dir, config) + if is_windows(): + server_app_path = os.path.join(config_build_dir, config, 'onnxruntime_server.exe') + test_raw_data_folder = os.path.join(config_build_dir, 'models') + else: + server_app_path = os.path.join(config_build_dir, 'onnxruntime_server') + test_raw_data_folder = os.path.join(build_dir, 'models') + + server_test_folder = os.path.join(config_build_dir, 'server_test') + server_test_data_folder = os.path.join(config_build_dir, 'server_test_data') + run_subprocess([sys.executable, 'model_zoo_data_prep.py', test_raw_data_folder, server_test_data_folder], cwd=server_test_folder, dll_path=None) + run_subprocess([sys.executable, 'model_zoo_tests.py', server_app_path, test_raw_data_folder, server_test_data_folder], cwd=server_test_folder, dll_path=None) + + def build_python_wheel(source_dir, build_dir, configs, use_cuda, use_ngraph, use_tensorrt, nightly_build = False): for config in configs: cwd = get_config_build_dir(build_dir, config) @@ -624,7 +671,7 @@ def generate_documentation(source_dir, build_dir, configs): operator_doc_path = os.path.join(source_dir, 'docs', 'ContribOperators.md') for config in configs: #copy the gen_doc.py - shutil.copy(os.path.join(source_dir,'onnxruntime','python','tools','gen_doc.py'), + shutil.copy(os.path.join(source_dir,'tools','python','gen_doc.py'), os.path.join(build_dir,config, config)) run_subprocess([ sys.executable, @@ -632,10 +679,16 @@ def generate_documentation(source_dir, build_dir, configs): '--output_path', operator_doc_path ], cwd = os.path.join(build_dir,config, config)) - - docdiff = run_subprocess(['git', 'diff', operator_doc_path], capture=True).stdout + docdiff = '' + try: + docdiff = subprocess.check_output(['git', 'diff', operator_doc_path]) + except subprocess.CalledProcessError: + print('git diff returned non-zero error code') + + if len(docdiff) > 0: - raise BuildError("The updated operator document file "+operator_doc_path+" must be checked in") + raise BuildError('The updated operator document file '+str(operator_doc_path)+' must be checked in.\n diff:\n'+str(docdiff)) + def main(): args = parse_arguments() @@ -741,7 +794,7 @@ def main(): if args.test : run_onnxruntime_tests(args, source_dir, ctest_path, build_dir, configs, - args.enable_pybind if not args.skip_onnx_tests else False, + args.enable_pybind and not args.skip_onnx_tests, args.use_tvm, args.use_tensorrt, args.use_ngraph) # run the onnx model tests if requested explicitly. 
if args.enable_onnx_tests and not args.skip_onnx_tests: @@ -766,12 +819,18 @@ def main(): if args.use_mkldnn: run_onnx_tests(build_dir, configs, onnx_test_data_dir, 'mkldnn', True, 1) + if args.build_server and args.enable_server_tests: + run_server_tests(build_dir, configs) + + if args.build_server and args.enable_server_model_tests: + run_server_model_tests(build_dir, configs) + if args.build: if args.build_wheel: nightly_build = bool(os.getenv('NIGHTLY_BUILD') == '1') build_python_wheel(source_dir, build_dir, configs, args.use_cuda, args.use_ngraph, args.use_tensorrt, nightly_build) - if args.gen_doc: + if args.gen_doc and (args.build or args.test): generate_documentation(source_dir, build_dir, configs) log.info("Build complete") diff --git a/tools/ci_build/github/azure-pipelines/android-arm64-crosscompile-ci-pipeline.yml b/tools/ci_build/github/azure-pipelines/android-arm64-crosscompile-ci-pipeline.yml index 0243752fbb570..cfd453fa31aad 100644 --- a/tools/ci_build/github/azure-pipelines/android-arm64-crosscompile-ci-pipeline.yml +++ b/tools/ci_build/github/azure-pipelines/android-arm64-crosscompile-ci-pipeline.yml @@ -4,7 +4,7 @@ jobs: steps: - template: templates/set-test-data-variables-step.yml - - script: 'tools/ci_build/github/linux/run_dockerbuild.sh -o android -r $(Build.BinariesDirectory) -d cpu' + - script: 'tools/ci_build/github/linux/run_dockerbuild.sh -o android -r $(Build.BinariesDirectory) -d cpu -x "--build_wheel"' displayName: 'Command Line Script' - task: ms.vss-governance-buildtask.governance-build-task-component-detection.ComponentGovernanceComponentDetection@0 diff --git a/tools/ci_build/github/azure-pipelines/azure-pipelines-py-packaging.yml b/tools/ci_build/github/azure-pipelines/azure-pipelines-py-packaging.yml index a2b873947f31d..2292df4089150 100644 --- a/tools/ci_build/github/azure-pipelines/azure-pipelines-py-packaging.yml +++ b/tools/ci_build/github/azure-pipelines/azure-pipelines-py-packaging.yml @@ -69,8 +69,8 @@ jobs: displayName: 'Clean untagged docker images' inputs: script: | - docker rm $(docker ps -a | grep Exited | awk '{print $1;}') || true - docker images -q --filter "dangling=true" | xargs -n1 -r docker rmi + docker container prune -f + docker image prune -f workingDirectory: $(Build.BinariesDirectory) continueOnError: true condition: always() diff --git a/tools/ci_build/github/azure-pipelines/c-api-packaging-pipelines.yml b/tools/ci_build/github/azure-pipelines/c-api-packaging-pipelines.yml index b4e874ade7257..22296ee997273 100644 --- a/tools/ci_build/github/azure-pipelines/c-api-packaging-pipelines.yml +++ b/tools/ci_build/github/azure-pipelines/c-api-packaging-pipelines.yml @@ -6,7 +6,7 @@ jobs: - template: templates/set-test-data-variables-step.yml - template: templates/set-version-number-variables-step.yml - - script: 'tools/ci_build/github/linux/run_dockerbuild.sh -o ubuntu16.04 -d cpu -r $(Build.BinariesDirectory) -x " --test_data_url $(TestDataUrl) --test_data_checksum $(TestDataChecksum)"' + - script: 'tools/ci_build/github/linux/run_dockerbuild.sh -o ubuntu16.04 -d cpu -r $(Build.BinariesDirectory) -x "--test_data_url $(TestDataUrl) --test_data_checksum $(TestDataChecksum)"' displayName: 'Build and Test Linux on Docker' - template: templates/c-api-artifacts-package-and-publish-steps-posix.yml parameters: @@ -22,7 +22,7 @@ jobs: - template: templates/set-test-data-variables-step.yml - template: templates/set-version-number-variables-step.yml - - script: 'tools/ci_build/github/linux/run_dockerbuild.sh -o ubuntu16.04 -d cpu -r 
$(Build.BinariesDirectory) -a x86 -x " --x86 --test_data_url $(TestDataUrl) --test_data_checksum $(TestDataChecksum)"' + - script: 'tools/ci_build/github/linux/run_dockerbuild.sh -o ubuntu16.04 -d cpu -r $(Build.BinariesDirectory) -a x86 -x "--x86 --test_data_url $(TestDataUrl) --test_data_checksum $(TestDataChecksum)"' displayName: 'Build and Test Linux on Docker' - template: templates/c-api-artifacts-package-and-publish-steps-posix.yml parameters: @@ -38,7 +38,7 @@ jobs: - template: templates/set-test-data-variables-step.yml - template: templates/set-version-number-variables-step.yml - - script: 'tools/ci_build/github/linux/run_dockerbuild.sh -o ubuntu16.04 -d gpu -c cuda9.1-cudnn7.1 -r $(Build.BinariesDirectory) -x " --test_data_url $(TestDataUrl) --test_data_checksum $(TestDataChecksum)"' + - script: 'tools/ci_build/github/linux/run_dockerbuild.sh -o ubuntu16.04 -d gpu -c cuda9.1-cudnn7.1 -r $(Build.BinariesDirectory) -x "--test_data_url $(TestDataUrl) --test_data_checksum $(TestDataChecksum)"' displayName: 'Build and Test Linux on Docker' - template: templates/c-api-artifacts-package-and-publish-steps-posix.yml parameters: diff --git a/tools/ci_build/github/azure-pipelines/linux-ci-pipeline.yml b/tools/ci_build/github/azure-pipelines/linux-ci-pipeline.yml index ab4c4575feda1..f350adaab192b 100644 --- a/tools/ci_build/github/azure-pipelines/linux-ci-pipeline.yml +++ b/tools/ci_build/github/azure-pipelines/linux-ci-pipeline.yml @@ -8,8 +8,8 @@ jobs: displayName: 'Clean untagged docker images' inputs: script: | - docker rm $(docker ps -a | grep Exited | awk '{print $1;}') || true - docker images -q --filter "dangling=true" | xargs -n1 -r docker rmi + docker container prune -f + docker image prune -f workingDirectory: $(Build.BinariesDirectory) continueOnError: true condition: always() @@ -30,7 +30,7 @@ jobs: pythonInterpreter: '/usr/bin/python3' workingDirectory: $(Build.BinariesDirectory) - - script: 'tools/ci_build/github/linux/run_dockerbuild.sh -o ubuntu16.04 -d cpu -r $(Build.BinariesDirectory) -x "--use_mklml --use_tvm"' + - script: 'tools/ci_build/github/linux/run_dockerbuild.sh -o ubuntu16.04 -d cpu -r $(Build.BinariesDirectory) -x "--use_mklml --use_tvm --build_wheel"' displayName: 'Command Line Script' - task: ms.vss-governance-buildtask.governance-build-task-component-detection.ComponentGovernanceComponentDetection@0 diff --git a/tools/ci_build/github/azure-pipelines/linux-gpu-ci-pipeline-cuda9.yml b/tools/ci_build/github/azure-pipelines/linux-gpu-ci-pipeline-cuda9.yml index d4f1d5636aa89..aa262490ee54f 100644 --- a/tools/ci_build/github/azure-pipelines/linux-gpu-ci-pipeline-cuda9.yml +++ b/tools/ci_build/github/azure-pipelines/linux-gpu-ci-pipeline-cuda9.yml @@ -20,7 +20,7 @@ jobs: pythonInterpreter: '/usr/bin/python3' workingDirectory: $(Build.BinariesDirectory) - - script: 'tools/ci_build/github/linux/run_dockerbuild.sh -o ubuntu16.04 -d gpu -c cuda9.1-cudnn7.1 -r $(Build.BinariesDirectory)' + - script: 'tools/ci_build/github/linux/run_dockerbuild.sh -o ubuntu16.04 -d gpu -c cuda9.1-cudnn7.1 -r $(Build.BinariesDirectory) -x "--build_wheel"' displayName: 'Command Line Script' - template: templates/clean-agent-build-directory-step.yml diff --git a/tools/ci_build/github/azure-pipelines/linux-gpu-ci-pipeline.yml b/tools/ci_build/github/azure-pipelines/linux-gpu-ci-pipeline.yml index d88daa67c5423..c19f646e6db01 100644 --- a/tools/ci_build/github/azure-pipelines/linux-gpu-ci-pipeline.yml +++ b/tools/ci_build/github/azure-pipelines/linux-gpu-ci-pipeline.yml @@ -8,8 +8,8 @@ jobs: 
displayName: 'Clean untagged docker images' inputs: script: | - docker rm $(docker ps -a | grep Exited | awk '{print $1;}') || true - docker images -q --filter "dangling=true" | xargs -n1 -r docker rmi + docker container prune -f + docker image prune -f workingDirectory: $(Build.BinariesDirectory) continueOnError: true condition: always() @@ -30,7 +30,7 @@ jobs: pythonInterpreter: '/usr/bin/python3' workingDirectory: $(Build.BinariesDirectory) - - script: 'tools/ci_build/github/linux/run_dockerbuild.sh -o ubuntu16.04 -d gpu -r $(Build.BinariesDirectory)' + - script: 'tools/ci_build/github/linux/run_dockerbuild.sh -o ubuntu16.04 -d gpu -r $(Build.BinariesDirectory) -x "--build_wheel"' displayName: 'Command Line Script' - task: ms.vss-governance-buildtask.governance-build-task-component-detection.ComponentGovernanceComponentDetection@0 diff --git a/tools/ci_build/github/azure-pipelines/linux-gpu-tensorrt-ci-pipeline.yml b/tools/ci_build/github/azure-pipelines/linux-gpu-tensorrt-ci-pipeline.yml index f1e1be7865d27..90f597a28a72a 100644 --- a/tools/ci_build/github/azure-pipelines/linux-gpu-tensorrt-ci-pipeline.yml +++ b/tools/ci_build/github/azure-pipelines/linux-gpu-tensorrt-ci-pipeline.yml @@ -5,7 +5,7 @@ jobs: - template: templates/set-test-data-variables-step.yml # There are some tests in 20190130.zip that TensorRT can't run. Instead here use 20181210 opset8 for TensorRT test. - - script: 'tools/ci_build/github/linux/run_dockerbuild.sh -o ubuntu16.04 -d tensorrt -r $(Build.BinariesDirectory) -x "--test_data_url https://onnxruntimetestdata.blob.core.windows.net/models/20181210.zip --test_data_checksum a966def7447f4ff04f5665bca235b3f3"' + - script: 'tools/ci_build/github/linux/run_dockerbuild.sh -o ubuntu16.04 -d tensorrt -r $(Build.BinariesDirectory) -x "--build_wheel --test_data_url https://onnxruntimetestdata.blob.core.windows.net/models/20181210.zip --test_data_checksum a966def7447f4ff04f5665bca235b3f3"' displayName: 'Command Line Script' diff --git a/tools/ci_build/github/azure-pipelines/linux-ngraph-ci-pipeline.yml b/tools/ci_build/github/azure-pipelines/linux-ngraph-ci-pipeline.yml index 24790be099e09..fe0769044e4d5 100644 --- a/tools/ci_build/github/azure-pipelines/linux-ngraph-ci-pipeline.yml +++ b/tools/ci_build/github/azure-pipelines/linux-ngraph-ci-pipeline.yml @@ -8,8 +8,8 @@ jobs: displayName: 'Clean untagged docker images' inputs: script: | - docker rm $(docker ps -a | grep Exited | awk '{print $1;}') || true - docker images -q --filter "dangling=true" | xargs -n1 -r docker rmi + docker container prune -f + docker image prune -f workingDirectory: $(Build.BinariesDirectory) continueOnError: true condition: always() @@ -30,7 +30,7 @@ jobs: pythonInterpreter: '/usr/bin/python3' workingDirectory: $(Build.BinariesDirectory) - - script: 'tools/ci_build/github/linux/run_dockerbuild.sh -o ubuntu16.04 -d ngraph -r $(Build.BinariesDirectory) -x "--use_ngraph"' + - script: 'tools/ci_build/github/linux/run_dockerbuild.sh -o ubuntu16.04 -d ngraph -r $(Build.BinariesDirectory) -x "--use_ngraph --build_wheel"' displayName: 'Command Line Script' - task: ms.vss-governance-buildtask.governance-build-task-component-detection.ComponentGovernanceComponentDetection@0 diff --git a/tools/ci_build/github/azure-pipelines/mac-ci-pipeline.yml b/tools/ci_build/github/azure-pipelines/mac-ci-pipeline.yml index 068ea58256cf3..f12a995d59caa 100644 --- a/tools/ci_build/github/azure-pipelines/mac-ci-pipeline.yml +++ b/tools/ci_build/github/azure-pipelines/mac-ci-pipeline.yml @@ -4,10 +4,24 @@ jobs: vmImage: 
'macOS-10.13' steps: - template: templates/set-test-data-variables-step.yml + - task: CmdLine@2 + displayName: 'Download azcopy' + inputs: + script: | + curl -so azcopy.tar.gz -L 'https://aka.ms/downloadazcopy-v10-mac' + tar -zxvf azcopy.tar.gz --strip 1 + workingDirectory: $(Build.BinariesDirectory) + - task: PythonScript@0 + displayName: 'Download test data' + inputs: + scriptPath: '$(Build.SourcesDirectory)/tools/ci_build/github/download_test_data.py' + arguments: --test_data_url $(TestDataUrl) --azure_region centralus + pythonInterpreter: '/usr/local/bin/python3' + workingDirectory: $(Build.BinariesDirectory) - script: | sudo python3 -m pip install numpy==1.15.0 sudo xcode-select --switch /Applications/Xcode_10.app/Contents/Developer - python3 $(Build.SourcesDirectory)/tools/ci_build/build.py --use_openmp --build_dir $(Build.BinariesDirectory) --build_wheel --skip_submodule_sync --parallel --build_shared_lib --enable_onnx_tests --test_data_url $(TestDataUrl) --test_data_checksum $(TestDataChecksum) + python3 $(Build.SourcesDirectory)/tools/ci_build/build.py --use_openmp --build_dir $(Build.BinariesDirectory) --build_wheel --skip_submodule_sync --parallel --build_shared_lib --enable_onnx_tests --config Debug Release displayName: 'Build and Test OnnxRuntime lib for MacOS' - task: ms.vss-governance-buildtask.governance-build-task-component-detection.ComponentGovernanceComponentDetection@0 diff --git a/tools/ci_build/github/azure-pipelines/win-ci-pipeline.yml b/tools/ci_build/github/azure-pipelines/win-ci-pipeline.yml index b63664792a4cd..b796042d13142 100644 --- a/tools/ci_build/github/azure-pipelines/win-ci-pipeline.yml +++ b/tools/ci_build/github/azure-pipelines/win-ci-pipeline.yml @@ -13,7 +13,7 @@ jobs: displayName: 'Download test data and generate cmake config' inputs: filename: '$(Build.BinariesDirectory)\packages\python\python.exe' - arguments: '$(Build.SourcesDirectory)\tools\ci_build\build.py --config Debug Release --build_dir $(Build.BinariesDirectory) --skip_submodule_sync --cmake_path $(Build.BinariesDirectory)\cmake\bin\cmake.exe --ctest_path $(Build.BinariesDirectory)\cmake\bin\ctest.exe --use_tvm --enable_pybind --use_mkldnn --use_mklml --use_openmp --build_shared_lib --enable_onnx_tests --test_data_url $(TestDataUrl) --test_data_checksum $(TestDataChecksum) --update' + arguments: '$(Build.SourcesDirectory)\tools\ci_build\build.py --config Debug Release --build_dir $(Build.BinariesDirectory) --skip_submodule_sync --cmake_path $(Build.BinariesDirectory)\cmake\bin\cmake.exe --ctest_path $(Build.BinariesDirectory)\cmake\bin\ctest.exe --use_tvm --enable_pybind --use_mkldnn --use_mklml --use_openmp --build_shared_lib --enable_onnx_tests --test_data_url $(TestDataUrl) --test_data_checksum $(TestDataChecksum) --gen_doc --update' workingDirectory: "$(Build.BinariesDirectory)" - task: VSBuild@1 @@ -30,7 +30,7 @@ jobs: displayName: 'Test Debug' inputs: filename: '$(Build.BinariesDirectory)\packages\python\python.exe' - arguments: '$(Build.SourcesDirectory)\tools\ci_build\build.py --config Debug --build_dir $(Build.BinariesDirectory) --skip_submodule_sync --cmake_path $(Build.BinariesDirectory)\cmake\bin\cmake.exe --ctest_path $(Build.BinariesDirectory)\cmake\bin\ctest.exe --use_tvm --enable_pybind --use_mkldnn --use_mklml --use_openmp --build_shared_lib --enable_onnx_tests --test_data_url $(TestDataUrl) --test_data_checksum $(TestDataChecksum) --test' + arguments: '$(Build.SourcesDirectory)\tools\ci_build\build.py --config Debug --build_dir $(Build.BinariesDirectory) 
     workingFolder: '$(Build.BinariesDirectory)'
   - task: VSBuild@1
     displayName: 'Build C# Debug'
diff --git a/tools/ci_build/github/download_test_data.py b/tools/ci_build/github/download_test_data.py
index 03f6bf82c7c46..dc475071265fc 100755
--- a/tools/ci_build/github/download_test_data.py
+++ b/tools/ci_build/github/download_test_data.py
@@ -17,13 +17,15 @@ def get_azure_region():

 def parse_arguments():
     parser = argparse.ArgumentParser(description="ONNXRuntime Data Downloader.")
     parser.add_argument("--test_data_url", help="Test data URL.")
+    parser.add_argument("--azure_region", help="Azure region")
     return parser.parse_args()

-def get_server_hostname():
-    #should be northcentralus or centralus
-    azure_location=get_azure_region()
-    print(azure_location)
+def get_server_hostname(azure_location):
+    if azure_location is None:
+        #should be northcentralus or centralus
+        azure_location=get_azure_region()
+    print("This VM is in azure location: %s" % azure_location)
     if azure_location == 'centralus':
         hostname='onnxruntimetestdata'
     elif azure_location == 'northcentralus':
@@ -34,7 +36,7 @@ def get_server_hostname():
     return hostname

 args = parse_arguments()
-hostname=get_server_hostname()
+hostname=get_server_hostname(args.azure_region)
 url=args.test_data_url.replace('onnxruntimetestdata', hostname)
 print('data url=%s' % url)
 subprocess.run(['./azcopy','cp', '--log-level','ERROR', url,'.'],check=True)
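Note: with the change above, download_test_data.py takes the storage region from the --azure_region argument when one is supplied and only falls back to probing the VM's Azure region otherwise, then hands the rewritten URL to azcopy. A minimal sketch of how a CI step or a local run might invoke it, assuming the repository root as the working directory and a placeholder URL in place of the pipeline's $(TestDataUrl) value:

    import subprocess

    # Roughly what the new PythonScript@0 task does: pass the region explicitly so the
    # script skips the instance-metadata probe. The script shells out to './azcopy',
    # so the azcopy binary is assumed to have been unpacked into the working directory.
    subprocess.run(
        ["python3", "tools/ci_build/github/download_test_data.py",
         "--test_data_url", "https://example.invalid/models/testdata.zip",  # placeholder
         "--azure_region", "centralus"],
        check=True,
    )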
diff --git a/tools/ci_build/github/linux/docker/Dockerfile.ubuntu b/tools/ci_build/github/linux/docker/Dockerfile.ubuntu
index e5156366561a4..c4e201106ab7f 100644
--- a/tools/ci_build/github/linux/docker/Dockerfile.ubuntu
+++ b/tools/ci_build/github/linux/docker/Dockerfile.ubuntu
@@ -9,21 +9,6 @@ RUN /tmp/scripts/install_ubuntu.sh -p ${PYTHON_VERSION} && /tmp/scripts/install_
 WORKDIR /root

-# Build and Install LLVM
-# ARG LLVM_VERSION=6.0.1
-# RUN cd /tmp && \
-#     wget --no-verbose http://releases.llvm.org/$LLVM_VERSION/llvm-$LLVM_VERSION.src.tar.xz && \
-#     xz -d llvm-$LLVM_VERSION.src.tar.xz && \
-#     tar xvf llvm-$LLVM_VERSION.src.tar && \
-#     cd llvm-$LLVM_VERSION.src && \
-#     mkdir -p build && \
-#     cd build && \
-#     cmake .. -DCMAKE_BUILD_TYPE=Release && \
-#     cmake --build . -- -j$(nproc) && \
-#     cmake -DCMAKE_INSTALL_PREFIX=/usr/local/llvm-$LLVM_VERSION -DBUILD_TYPE=Release -P cmake_install.cmake && \
-#     cd /tmp && \
-#     rm -rf llvm*
-
 ENV LD_LIBRARY_PATH /usr/local/openblas/lib:$LD_LIBRARY_PATH

 ARG BUILD_UID=1000
@@ -31,4 +16,3 @@ ARG BUILD_USER=onnxruntimedev
 WORKDIR /home/$BUILD_USER
 RUN adduser --gecos 'onnxruntime Build User' --disabled-password $BUILD_USER --uid $BUILD_UID
 USER $BUILD_USER
-
diff --git a/tools/ci_build/github/linux/docker/Dockerfile.ubuntu_for_android b/tools/ci_build/github/linux/docker/Dockerfile.ubuntu_for_android
index 822e4c39accdc..74301e1e75f3e 100644
--- a/tools/ci_build/github/linux/docker/Dockerfile.ubuntu_for_android
+++ b/tools/ci_build/github/linux/docker/Dockerfile.ubuntu_for_android
@@ -9,21 +9,6 @@ RUN /tmp/scripts/install_ubuntu_for_android.sh -p ${PYTHON_VERSION} && /tmp/scri
 WORKDIR /root

-# Build and Install LLVM
-# ARG LLVM_VERSION=6.0.1
-# RUN cd /tmp && \
-#     wget --no-verbose http://releases.llvm.org/$LLVM_VERSION/llvm-$LLVM_VERSION.src.tar.xz && \
-#     xz -d llvm-$LLVM_VERSION.src.tar.xz && \
-#     tar xvf llvm-$LLVM_VERSION.src.tar && \
-#     cd llvm-$LLVM_VERSION.src && \
-#     mkdir -p build && \
-#     cd build && \
-#     cmake .. -DCMAKE_BUILD_TYPE=Release && \
-#     cmake --build . -- -j$(nproc) && \
-#     cmake -DCMAKE_INSTALL_PREFIX=/usr/local/llvm-$LLVM_VERSION -DBUILD_TYPE=Release -P cmake_install.cmake && \
-#     cd /tmp && \
-#     rm -rf llvm*
-
 ENV LD_LIBRARY_PATH /usr/local/openblas/lib:$LD_LIBRARY_PATH

 ARG BUILD_UID=1000
@@ -31,5 +16,3 @@ ARG BUILD_USER=onnxruntimedev
 WORKDIR /home/$BUILD_USER
 RUN adduser --gecos 'onnxruntime Build User' --disabled-password $BUILD_USER --uid $BUILD_UID
 USER $BUILD_USER
-
-
diff --git a/tools/ci_build/github/linux/docker/Dockerfile.ubuntu_tensorrt b/tools/ci_build/github/linux/docker/Dockerfile.ubuntu_tensorrt
index 7771c33b2633d..678b976886200 100644
--- a/tools/ci_build/github/linux/docker/Dockerfile.ubuntu_tensorrt
+++ b/tools/ci_build/github/linux/docker/Dockerfile.ubuntu_tensorrt
@@ -21,21 +21,6 @@ RUN _CUDNN_VERSION=$(echo $CUDNN_VERSION | cut -d. -f1-2) && \
     ln -s /etc/alternatives/libcudnn_so /usr/local/cudnn-$_CUDNN_VERSION/cuda/lib64/libcudnn.so && \
     ln -s /usr/local/cudnn{-$_CUDNN_VERSION,}

-# Build and Install LLVM
-ARG LLVM_VERSION=6.0.1
-RUN cd /tmp && \
-    wget --no-verbose http://releases.llvm.org/$LLVM_VERSION/llvm-$LLVM_VERSION.src.tar.xz && \
-    xz -d llvm-$LLVM_VERSION.src.tar.xz && \
-    tar xvf llvm-$LLVM_VERSION.src.tar && \
-    cd llvm-$LLVM_VERSION.src && \
-    mkdir -p build && \
-    cd build && \
-    cmake .. -DCMAKE_BUILD_TYPE=Release && \
-    cmake --build . -- -j$(nproc) && \
-    cmake -DCMAKE_INSTALL_PREFIX=/usr/local/llvm-$LLVM_VERSION -DBUILD_TYPE=Release -P cmake_install.cmake && \
-    cd /tmp && \
-    rm -rf llvm*
-
 ENV LD_LIBRARY_PATH /usr/local/openblas/lib:$LD_LIBRARY_PATH

 ARG BUILD_USER=onnxruntimedev
@@ -43,4 +28,3 @@ ARG BUILD_UID=1000
 WORKDIR /home/$BUILD_USER
 RUN adduser --gecos 'onnxruntime Build User' --disabled-password $BUILD_USER --uid $BUILD_UID
 USER $BUILD_USER
-
diff --git a/tools/ci_build/github/linux/docker/scripts/install_deps.sh b/tools/ci_build/github/linux/docker/scripts/install_deps.sh
index ffb20d2e5e1e1..ff1a243e0865a 100755
--- a/tools/ci_build/github/linux/docker/scripts/install_deps.sh
+++ b/tools/ci_build/github/linux/docker/scripts/install_deps.sh
@@ -38,8 +38,8 @@ else
   #5af210ca8a1c73aa6bae8754c9346ec54d0a756e is v1.2.3
   #bae6333e149a59a3faa9c4d9c44974373dcf5256 is v1.3.0
   #9e55ace55aad1ada27516038dfbdc66a8a0763db is v1.4.1
-  #c1c04af4e9fa0c96fbc1fda7b330bb994118f3c5 is v1.4.1 latest
-  for onnx_version in "5af210ca8a1c73aa6bae8754c9346ec54d0a756e" "bae6333e149a59a3faa9c4d9c44974373dcf5256" "9e55ace55aad1ada27516038dfbdc66a8a0763db" "c1c04af4e9fa0c96fbc1fda7b330bb994118f3c5"; do
+  #7d7bc83d29a328233d3e8affa4c4ea8b3e3599ef is v1.5.0
+  for onnx_version in "5af210ca8a1c73aa6bae8754c9346ec54d0a756e" "bae6333e149a59a3faa9c4d9c44974373dcf5256" "9e55ace55aad1ada27516038dfbdc66a8a0763db" "7d7bc83d29a328233d3e8affa4c4ea8b3e3599ef"; do
     if [ -z ${lastest_onnx_version+x} ]; then
       echo "first pass";
     else
diff --git a/tools/ci_build/github/linux/docker/scripts/install_deps_x86.sh b/tools/ci_build/github/linux/docker/scripts/install_deps_x86.sh
index 722d1a1e4bbd0..27e417f62400c 100755
--- a/tools/ci_build/github/linux/docker/scripts/install_deps_x86.sh
+++ b/tools/ci_build/github/linux/docker/scripts/install_deps_x86.sh
@@ -32,8 +32,8 @@ else
  #5af210ca8a1c73aa6bae8754c9346ec54d0a756e is v1.2.3
  #bae6333e149a59a3faa9c4d9c44974373dcf5256 is v1.3.0
  #9e55ace55aad1ada27516038dfbdc66a8a0763db is v1.4.1
-  #27d4b617e7097cda7d0d4c45ff2b09d248f33179 is v1.4.1 latest
-  for onnx_version in "5af210ca8a1c73aa6bae8754c9346ec54d0a756e" "bae6333e149a59a3faa9c4d9c44974373dcf5256" "9e55ace55aad1ada27516038dfbdc66a8a0763db" "27d4b617e7097cda7d0d4c45ff2b09d248f33179"; do
+  #7d7bc83d29a328233d3e8affa4c4ea8b3e3599ef is v1.5.0
+  for onnx_version in "5af210ca8a1c73aa6bae8754c9346ec54d0a756e" "bae6333e149a59a3faa9c4d9c44974373dcf5256" "9e55ace55aad1ada27516038dfbdc66a8a0763db" "7d7bc83d29a328233d3e8affa4c4ea8b3e3599ef"; do
    if [ -z ${lastest_onnx_version+x} ]; then
      echo "first pass";
    else
diff --git a/tools/ci_build/github/linux/docker/scripts/install_ubuntu.sh b/tools/ci_build/github/linux/docker/scripts/install_ubuntu.sh
index 00bda4df84ecd..e8e6111418ce1 100755
--- a/tools/ci_build/github/linux/docker/scripts/install_ubuntu.sh
+++ b/tools/ci_build/github/linux/docker/scripts/install_ubuntu.sh
@@ -67,6 +67,7 @@ fi
 /usr/bin/python${PYTHON_VER} -m pip install --upgrade --force-reinstall pip==19.0.3
 /usr/bin/python${PYTHON_VER} -m pip install --upgrade --force-reinstall numpy==1.15.0
+/usr/bin/python${PYTHON_VER} -m pip install --upgrade --force-reinstall requests==2.21.0
 rm -rf /var/lib/apt/lists/*

 mkdir -p /tmp/azcopy
diff --git a/tools/ci_build/github/linux/run_build.sh b/tools/ci_build/github/linux/run_build.sh
index 30c9986c104cd..97279f9336fff 100755
--- a/tools/ci_build/github/linux/run_build.sh
+++ b/tools/ci_build/github/linux/run_build.sh
@@ -20,7 +20,7 @@ if [ $BUILD_OS = "android" ]; then
     /opt/cmake/bin/cmake -DCMAKE_TOOLCHAIN_FILE=/android-ndk/build/cmake/android.toolchain.cmake -DANDROID_CPP_FEATURES=exceptions -DANDROID_PLATFORM=android-28 -DANDROID_ABI=arm64-v8a -DCMAKE_BUILD_TYPE=Release -Donnxruntime_CROSS_COMPILING=ON -Donnxruntime_BUILD_x86=OFF -DONNX_CUSTOM_PROTOC_EXECUTABLE=/usr/bin/protoc ../cmake
     /opt/cmake/bin/cmake --build . -- -j$(nproc)
 else
-    COMMON_BUILD_ARGS="--skip_submodule_sync --enable_onnx_tests --parallel --build_shared_lib --build_wheel --use_openmp"
+    COMMON_BUILD_ARGS="--skip_submodule_sync --enable_onnx_tests --parallel --build_shared_lib --use_openmp"
     if [ $BUILD_DEVICE = "gpu" ]; then
         _CUDNN_VERSION=$(echo $CUDNN_VERSION | cut -d. -f1-2)
         python3 $SCRIPT_DIR/../../build.py --build_dir /build \
diff --git a/tools/ci_build/github/linux/run_dockerbuild.sh b/tools/ci_build/github/linux/run_dockerbuild.sh
index 8c0948d66747d..2bd5e06c16323 100755
--- a/tools/ci_build/github/linux/run_dockerbuild.sh
+++ b/tools/ci_build/github/linux/run_dockerbuild.sh
@@ -48,6 +48,7 @@ else
     else
         IMAGE="ubuntu16.04"
         if [ $BUILD_ARCH = "x86" ]; then
+            IMAGE="$IMAGE.x86"
             docker build -t "onnxruntime-$IMAGE" --build-arg BUILD_USER=onnxruntimedev --build-arg BUILD_UID=$(id -u) --build-arg OS_VERSION=16.04 --build-arg PYTHON_VERSION=${PYTHON_VER} -f Dockerfile.ubuntu_x86 .
         else
             docker build -t "onnxruntime-$IMAGE" --build-arg BUILD_USER=onnxruntimedev --build-arg BUILD_UID=$(id -u) --build-arg OS_VERSION=16.04 --build-arg PYTHON_VERSION=${PYTHON_VER} -f Dockerfile.ubuntu .