-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[MIGraphx EP] Sync AMD changes upstream #25338
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
7c3d02a
3836f43
a5ce8dc
1f43e34
ebb5583
8e1ce14
cd24224
867e216
e04609f
da8e778
526d27a
8150cce
a908755
6fe613e
9202f06
1a63b95
10e5eec
8fca486
dbbd483
9d930a2
af57264
02ff9cc
b1f603f
72fe23b
5adc480
17308b9
9a5ad86
e24ac6c
57e0a0c
dbb89fb
0ed6013
8c91aeb
071ca51
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,14 @@ | ||
| diff --git a/include/ck/ck.hpp b/include/ck/ck.hpp | ||
| index 55f562061..ee340eba1 100644 | ||
| --- a/include/ck/ck.hpp | ||
| +++ b/include/ck/ck.hpp | ||
| @@ -53,7 +53,7 @@ CK_DECLARE_ENV_VAR_BOOL(CK_LOGGING) | ||
|
|
||
| // define general macros for various architectures | ||
| #if defined(__gfx908__) || defined(__gfx90a__) || defined(__gfx940__) || defined(__gfx941__) || \ | ||
| - defined(__gfx942__) | ||
| + defined(__gfx942__) || defined(__gfx950__) | ||
| #define __gfx9__ | ||
| #endif | ||
| #if defined(__gfx940__) || defined(__gfx941__) || defined(__gfx942__) | ||
|
|
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -711,15 +711,13 @@ typedef struct OrtTensorRTProviderOptions { | |
| typedef struct OrtMIGraphXProviderOptions { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a breaking change. ONNX Runtime maintains ABI backwards compatibility. Certainly, this change breaks that. As it will only impact the MIGraphX EP, I could accept this change. So, ORT's ABI backwards compatibility promise becomes if ORT provides ABI backwards compatibility if you don't use the functions that are not backwards compatible. (A little bit funny) The correct way of doing this is to move the whole data structure out of this header file and make it opaque(invisible), and use getters/setters to access the fields. https://community.kde.org/Policies/Binary_Compatibility_Issues_With_C%2B%2B
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Was added in an earlier commit and I assume moving forward anyone using OnnxRT will be using the latest release supported with their ROCm install since we have a mapping in the docs. https://onnxruntime.ai/docs/execution-providers/MIGraphX-ExecutionProvider.html#requirements
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It is part of ORT's public API that everyone uses, no matter if MIGRAPHX EP is used or not.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we pad this instead if this is a concern then? We're already using this internally for our builds and release, adding two padding vars should keep things consistent since we're missing a
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Or is it better to have these items but tie them to nothing in the EP then and have a
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Padding is also a breaking change. Because clients built with older ORT versions will end-up with garbage data for the new fields. Please try to minimize the number of changes of this struct. Going forward you should move it to an opaque type like what CUDA EP did. |
||
| int device_id; // hip device id. | ||
| int migraphx_fp16_enable; // MIGraphX FP16 precision. Default 0 = false, nonzero = true | ||
| int migraphx_bf16_enable; // MIGraphX BF16 precision. Default 0 = false, nonzero = true | ||
| int migraphx_fp8_enable; // MIGraphX FP8 precision. Default 0 = false, nonzero = true | ||
| int migraphx_int8_enable; // MIGraphX INT8 precision. Default 0 = false, nonzero = true | ||
| int migraphx_use_native_calibration_table; // MIGraphx INT8 cal table. Default 0 = false, noznero = true | ||
| int migraphx_use_native_calibration_table; // MIGraphx INT8 cal table. Default 0 = false, nonzero = true | ||
| const char* migraphx_int8_calibration_table_name; // MIGraphx INT8 calibration table name | ||
| int migraphx_save_compiled_model; // migraphx save compiled model. Default 0 = false, noznero = true | ||
| const char* migraphx_save_model_path; // migraphx model path name | ||
| int migraphx_load_compiled_model; // migraphx int8 cal table. Default 0 = false, noznero = true | ||
| const char* migraphx_load_model_path; // migraphx model path name | ||
| bool migraphx_exhaustive_tune; // migraphx tuned compile Default = false | ||
| const char* migraphx_cache_dir; // MIGraphX model cache directory | ||
| int migraphx_exhaustive_tune; // MIGraphX tuned compile. Default = false, nonzero = true | ||
|
|
||
| /** \brief MIGraphX memory limit (To use all possible memory pass in maximum size_t) | ||
| * Defaults to SIZE_MAX. | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -562,7 +562,10 @@ static D3D12_COMMAND_LIST_TYPE CalculateCommandListType(ID3D12Device* d3d12_devi | |
| sizeof(feature_levels) | ||
| )); | ||
|
|
||
| auto use_compute_command_list = (feature_levels.MaxSupportedFeatureLevel <= D3D_FEATURE_LEVEL_1_0_CORE); | ||
| // Use compute queue whenever possible on supported hardware to avoid TDR and maintain UI QoS | ||
| // Core and generic devices only have compute queues, DX11 has "immediate" submission, DX12 has both | ||
| auto use_compute_command_list = (feature_levels.MaxSupportedFeatureLevel <= D3D_FEATURE_LEVEL_1_0_CORE) || | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How does it relate to MIGRAPHX?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Its a bunch of changes that are being integrated on the ROCm side for our EPs we maintain.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Who made this change? Was someone from AMD or not?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. From AMD
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Would it be possible to split this change to a separate PR? I need to find someone from the directml team to review this change.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure, I can pull this one out. I'm getting back to the other devs responsible for these to make sure these are up to standard to merge back upstream |
||
| (feature_levels.MaxSupportedFeatureLevel >= D3D_FEATURE_LEVEL_12_0); | ||
|
|
||
| if (use_compute_command_list) | ||
| { | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -23,11 +23,11 @@ void MIGraphXAllocator::CheckDevice() const { | |
| #endif | ||
| } | ||
|
|
||
| void* MIGraphXAllocator::Alloc(size_t size) { | ||
| void* MIGraphXAllocator::Alloc(const size_t size) { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why do you need to add const here?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We ran this through clang-tidy
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Our clang-tidy configs need to be improve. Our coding style is inherited from Google's coding style See https://github.com/microsoft/onnxruntime/blob/main/docs/Coding_Conventions_and_Standards.md. The document is the golden standard. |
||
| CheckDevice(); | ||
| void* p = nullptr; | ||
| if (size > 0) { | ||
| HIP_CALL_THROW(hipMalloc((void**)&p, size)); | ||
| HIP_CALL_THROW(hipMalloc(&p, size)); | ||
| } | ||
| return p; | ||
| } | ||
|
|
@@ -37,7 +37,7 @@ void MIGraphXAllocator::Free(void* p) { | |
| (void)hipFree(p); // do not throw error since it's OK for hipFree to fail during shutdown | ||
| } | ||
|
|
||
| void* MIGraphXExternalAllocator::Alloc(size_t size) { | ||
| void* MIGraphXExternalAllocator::Alloc(const size_t size) { | ||
| void* p = nullptr; | ||
| if (size > 0) { | ||
| p = alloc_(size); | ||
|
|
@@ -51,27 +51,27 @@ void* MIGraphXExternalAllocator::Alloc(size_t size) { | |
|
|
||
| void MIGraphXExternalAllocator::Free(void* p) { | ||
| free_(p); | ||
| std::lock_guard<std::mutex> lock(lock_); | ||
| auto it = reserved_.find(p); | ||
| if (it != reserved_.end()) { | ||
| std::lock_guard lock(lock_); | ||
| if (const auto it = reserved_.find(p); it != reserved_.end()) { | ||
| reserved_.erase(it); | ||
| if (empty_cache_) empty_cache_(); | ||
| } | ||
| } | ||
|
|
||
| void* MIGraphXExternalAllocator::Reserve(size_t size) { | ||
| void* MIGraphXExternalAllocator::Reserve(const size_t size) { | ||
| void* p = Alloc(size); | ||
| if (!p) return nullptr; | ||
| std::lock_guard<std::mutex> lock(lock_); | ||
| ORT_ENFORCE(reserved_.find(p) == reserved_.end()); | ||
| reserved_.insert(p); | ||
| if (p != nullptr) { | ||
| std::lock_guard lock(lock_); | ||
| ORT_ENFORCE(reserved_.find(p) == reserved_.end()); | ||
| reserved_.insert(p); | ||
| } | ||
| return p; | ||
| } | ||
|
|
||
| void* MIGraphXPinnedAllocator::Alloc(size_t size) { | ||
| void* MIGraphXPinnedAllocator::Alloc(const size_t size) { | ||
| void* p = nullptr; | ||
| if (size > 0) { | ||
| HIP_CALL_THROW(hipHostMalloc((void**)&p, size)); | ||
| HIP_CALL_THROW(hipHostMalloc(&p, size)); | ||
| } | ||
| return p; | ||
| } | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -11,27 +11,27 @@ namespace onnxruntime { | |
|
|
||
| class MIGraphXAllocator : public IAllocator { | ||
| public: | ||
| MIGraphXAllocator(int device_id, const char* name) | ||
| MIGraphXAllocator(const OrtDevice::DeviceId device_id, const char* name) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You do not need to use const here. See https://abseil.io/tips/109 and https://google.github.io/styleguide/cppguide.html#Use_of_const
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Your linter and tidy said otherwise. These were ran through clang-tidy
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Then I think we do not have good clang-tidy config here. |
||
| : IAllocator( | ||
| OrtMemoryInfo(name, OrtAllocatorType::OrtDeviceAllocator, | ||
| OrtDevice(OrtDevice::GPU, OrtDevice::MemType::DEFAULT, OrtDevice::VendorIds::AMD, | ||
| static_cast<OrtDevice::DeviceId>(device_id)), | ||
| device_id), | ||
| OrtMemTypeDefault)) {} | ||
|
|
||
| virtual void* Alloc(size_t size) override; | ||
| virtual void Free(void* p) override; | ||
| void* Alloc(size_t size) override; | ||
| void Free(void* p) override; | ||
|
|
||
| private: | ||
| void CheckDevice() const; | ||
| }; | ||
|
|
||
| class MIGraphXExternalAllocator : public MIGraphXAllocator { | ||
| class MIGraphXExternalAllocator final : public MIGraphXAllocator { | ||
| typedef void* (*ExternalAlloc)(size_t size); | ||
| typedef void (*ExternalFree)(void* p); | ||
| typedef void (*ExternalEmptyCache)(); | ||
|
|
||
| public: | ||
| MIGraphXExternalAllocator(OrtDevice::DeviceId device_id, const char* name, void* alloc, void* free, void* empty_cache) | ||
| MIGraphXExternalAllocator(const OrtDevice::DeviceId device_id, const char* name, void* alloc, void* free, void* empty_cache) | ||
| : MIGraphXAllocator(device_id, name) { | ||
| alloc_ = reinterpret_cast<ExternalAlloc>(alloc); | ||
| free_ = reinterpret_cast<ExternalFree>(free); | ||
|
|
@@ -52,11 +52,11 @@ class MIGraphXExternalAllocator : public MIGraphXAllocator { | |
|
|
||
| class MIGraphXPinnedAllocator final : public IAllocator { | ||
| public: | ||
| MIGraphXPinnedAllocator(const int device_id, const char* name) | ||
| MIGraphXPinnedAllocator(const OrtDevice::DeviceId device_id, const char* name) | ||
| : IAllocator( | ||
| OrtMemoryInfo(name, OrtDeviceAllocator, | ||
| OrtDevice(OrtDevice::GPU, OrtDevice::MemType::HOST_ACCESSIBLE, OrtDevice::VendorIds::AMD, | ||
| static_cast<OrtDevice::DeviceId>(device_id)), | ||
| device_id), | ||
| OrtMemTypeCPUOutput)) {} | ||
|
|
||
| void* Alloc(size_t size) override; | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you change it from onnxruntime_add_shared_library_module to onnxruntime_add_shared_library ? A Module Library is a plugin that may not be linked by other targets, but may be dynamically loaded at runtime using dlopen-like functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed due to development for our windows side run. Is there one way or the other preferred here? let me know so I can flow this back to the ROCm internal branch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change should not be made, since nobody would link to this library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a use case which does link to this. Hence why this was added and integrated back on the ROCm side. Let me know whats the optimal way to handle this. Looks like I'll need to handle the integration work on this. Appreciate your comments sand insights btw @snnn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then where are the header files of this library? I need to know more background.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And, different link type results different construction/destruction order for global vars, which is common source of crashes. It's not just a build type thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't disagree with you. Unfortunately this was pushed from windows side changes since the original interface was changed. Seems like a better solution is to keep the old flags then (save/load) and add the newer one (model cache) so that we don't break abi compatibility then and just extend it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We changed to
onnxruntime_add_shared_librarybecauseonnxruntime_add_shared_library_moduleputs the provider DLL inlibdirectory when installing withcmake --install ..., and we need the DLL inbindirectory, andonnxruntime_add_shared_librarydoes precisely that. The side effects are the header and interface library installed alongside.