Enable tests for externally-managed work areas by expanding hipfft_params #160
    [](size_t total, const gpubuf& buf) { return total + buf.size(); });
    }

    bool is_preventing_auto_allocation_at_generation() const
This seems like it's going to affect performance tests, which is also a target for hipfft_params.
AFAIK, the suggested changes should not modify the overall behavior of this class for configurations that could already be tested (at least not w.r.t. the allocation of work areas). With the suggested changes, any previously testable configuration remains testable, and its (newly introduced) member `auto_allocate` is then `fft_auto_allocation_default` (note: `fft_auto_allocation_on` behaves equivalently in the case of `hipfft_params`). In such cases, this specific member function always returns false and the logic flow of `create_plan` is globally unchanged (work areas are automatically allocated by hipFFT at plan generation).
For newly enabled configurations of `hipfft_params` though (i.e., cases with `auto_allocate` set to `fft_auto_allocation_off`), note that the additions to `create_plan` still fully set the required work areas (see `set_externally_managed_work_areas`) before returning. The changes do not rely on the backend library (e.g., rocFFT) allocating its own work area(s) at plan execution when none were provided beforehand.
Therefore, I do not expect a significant change in measured performance in either case. I verified this with a few different sizes on a gfx1201 (single-process, single-GPU cases), using hipfft-bench on this branch and on develop. Please let me know if there is a specific size you'd want me to look into.
malcolmroberts
left a comment
Does this affect a performance test?
I don't think it does.
…scription thereof
    // hipfftXtSetWorkArea would be required
    throw std::runtime_error(
        "cannot request externally-managed work areas with multi-gpu usage");
    }
This isn't really an error; we just haven't implemented it yet. Normally, one would just skip the test using gtest's mechanism, but we can't do that here because this code sits at the fft params level.
Thank you. I introduced a dedicated base-class-defined `unimplemented_exception` along with the corresponding logic in the accuracy test in 517f101: let me know whether that's more acceptable to you.
Note: thinking about it, I don't think the (current) lack of implementation support warrants an exception being thrown by `validate_fields`, so I removed it from there.
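For readers following the thread, here is a minimal sketch of the pattern discussed above: the params level throws a dedicated exception for a not-yet-implemented case, and the test level turns it into a skip rather than a failure. Everything except the name `unimplemented_exception` is hypothetical (`create_plan_sketch`, `run_accuracy_test_sketch`, `test_outcome`), and the actual definitions in 517f101 may well differ.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Assumed shape of the base-class-defined exception; the real one may differ.
struct unimplemented_exception : public std::runtime_error
{
    using std::runtime_error::runtime_error;
};

// Hypothetical outcome type standing in for the test framework's result.
enum class test_outcome
{
    passed,
    skipped,
    failed
};

// Hypothetical stand-in for plan creation at the params level, which cannot
// call the test framework's skip mechanism directly.
void create_plan_sketch(bool multi_gpu, bool externally_managed)
{
    if(multi_gpu && externally_managed)
        throw unimplemented_exception(
            "externally-managed work areas with multi-gpu usage are not implemented");
}

// Hypothetical test-level wrapper: "not implemented" becomes a skip,
// any other exception is reported as a failure.
test_outcome run_accuracy_test_sketch(bool multi_gpu, bool externally_managed)
{
    try
    {
        create_plan_sketch(multi_gpu, externally_managed);
    }
    catch(const unimplemented_exception&)
    {
        // A gtest-based caller would use GTEST_SKIP() at this point.
        return test_outcome::skipped;
    }
    catch(const std::exception&)
    {
        return test_outcome::failed;
    }
    return test_outcome::passed;
}
```

The point of the dedicated type is that the test can tell "not supported yet" apart from a genuine `std::runtime_error` without parsing messages.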
…nt failure for get_xt_api_execution_type in case of unexpected precision
Adding auto-allocation logic for rocfft tests

Following up here upon completion of ROCm/hipFFT#160. Note:
1. if auto-allocation is used, plans allocate resources at execution for rocFFT whereas they do it at plan generation for hipFFT;
2. the "default" auto-allocation flag for rocFFT tests is equivalent to "off", while the "default" auto-allocation behavior for hipFFT tests is equivalent to "on", consistently with the above;
3. if the "on" auto-allocation flag is used for a rocFFT test, that would force the underlying plan to allocate resources at execution time. It might be best to have a warning printed by bench in that case?

Imported from [ROCm/rocFFT#626](ROCm/rocFFT#626). Originally authored by @regan-amd. Co-authored-by: Raphael Egan <Raphael.Egan@amd.com>
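The three notes above can be sketched as a small decision table. This is only an illustration: the flag names `fft_auto_allocation_default`/`_on`/`_off` come from the discussion, while the enum type name, `backend_kind`, `library_allocation`, and all three functions are assumptions made for this sketch. The warning in `bench_should_warn` is the suggestion from note 3, not existing behavior.

```cpp
#include <cassert>

// Flag values named as in the discussion; the enum type name is an assumption.
enum fft_auto_allocation
{
    fft_auto_allocation_default,
    fft_auto_allocation_on,
    fft_auto_allocation_off
};

// Hypothetical tags used only by this sketch.
enum class backend_kind
{
    rocfft,
    hipfft
};

enum class library_allocation
{
    never, // the test provides the work area(s) itself
    at_plan_generation,
    at_plan_execution
};

// Note 2: "default" resolves to "off" for rocFFT tests and "on" for hipFFT tests.
bool auto_allocation_enabled(backend_kind backend, fft_auto_allocation flag)
{
    switch(flag)
    {
    case fft_auto_allocation_on:
        return true;
    case fft_auto_allocation_off:
        return false;
    default:
        return backend == backend_kind::hipfft;
    }
}

// Note 1: when enabled, hipFFT allocates at plan generation, rocFFT at execution.
library_allocation when_library_allocates(backend_kind backend, fft_auto_allocation flag)
{
    if(!auto_allocation_enabled(backend, flag))
        return library_allocation::never;
    return backend == backend_kind::hipfft ? library_allocation::at_plan_generation
                                           : library_allocation::at_plan_execution;
}

// Note 3 (suggestion only): an explicit "on" for rocFFT moves allocation into
// the execution path, which a benchmark may want to flag.
bool bench_should_warn(backend_kind backend, fft_auto_allocation flag)
{
    return backend == backend_kind::rocfft && flag == fft_auto_allocation_on;
}
```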
…e discarded + clear externally-managed work areas while runtime is initialized (#1388) This fixes failures for `hipfft-test` with CUDA backend for tests added in ROCm/hipFFT#160.
[hipfft-test] do not use nullptrs for pointers to workarea sizes to be discarded + clear externally-managed work areas while runtime is initialized (#1388) This fixes failures for `hipfft-test` with CUDA backend for tests added in #160.
…rams (#160)
* Enable (and test for) externally-managed workareas via hipfft_params
* Remove dependency on external random_seed and repair build for hipfft-bench
* Adding 'auto_allocation' option to hip-bench target and clarifying description thereof
* Using ad-hoc exception type for unimplemented cases and avoiding silent failure for get_xt_api_execution_type in case of unexpected precision
* 'HipFFT' -> 'hipFFT'

[ROCm/hipFFT commit: 3ee8789]
Summary: changes suggested to enable coverage for features relevant to externally-managed work areas with hipFFT.
Details
* … `size_t` values where `size_t* workSize` arguments of its various APIs. Some implementation changes may be required for multi-gpu usage, as only one of those values is currently written on AMD platforms anyway (unless I missed something); as I understand it, though, full testing cannot be done at the moment due to a missing API.

  Anyhow, this point likely motivated some gymnastics currently present within `hipfft_params` that make the base class member variable `workbuffersize` irrelevant/unused/possibly misleading in some cases. I suggest no longer declaring `workbuffersize` in the base class, so that derived classes can consistently roll their own logic, free of that "constraint". This in turn triggers some required changes for the `fft_params::work_buffer_alloc_failure` exception.
* … `rocfft_params`, "default:=off" whereas "default:=on" for `hipfft_params`.
* The `hipfft_params` class is expanded with a `static std::vector<gpubuf> externally_managed_workareas;` class member that is meant to be used by its instances if they are required to create plans without auto-allocation.

  `externally_managed_workareas` exists independently of any `hipfft_params` instance. Any instance instructed to create plans without auto-allocation will use the member(s) of `externally_managed_workareas` (after reallocating them if they are too small for its needs) and provide them as work areas to the created plans.

  If any plan creation fails due to a failing device allocation at any point, the specific `hipfft_params` instance will clear `externally_managed_workareas` and attempt the plan creation again before reporting the failure (or not), given that `externally_managed_workareas` may contain buffers that are not needed or are too large for the instance.

Please let me know if `hipfft_params` is supposed to be thread-safe, as more care would be required w.r.t. `externally_managed_workareas` in that case.
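To make the reuse-then-retry behavior described above concrete, here is a rough host-only sketch. It is not the code in this PR: `gpubuf_mock`, the fake device capacity counters, `try_set_workareas` and `set_workareas_with_retry` are all hypothetical, and `externally_managed_workareas` is modeled as a plain global instead of a static class member. No real device memory or hipFFT call is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Fake device memory accounting (assumption: 1024 bytes available in total).
std::size_t fake_device_capacity = 1024;
std::size_t fake_device_used     = 0;

// Hypothetical, move-only stand-in for gpubuf that only tracks a byte count.
class gpubuf_mock
{
    std::size_t bytes = 0;

public:
    gpubuf_mock()                              = default;
    gpubuf_mock(const gpubuf_mock&)            = delete;
    gpubuf_mock& operator=(const gpubuf_mock&) = delete;
    gpubuf_mock(gpubuf_mock&& other) noexcept : bytes(other.bytes) { other.bytes = 0; }
    gpubuf_mock& operator=(gpubuf_mock&& other) noexcept
    {
        if(this != &other)
        {
            free();
            bytes       = other.bytes;
            other.bytes = 0;
        }
        return *this;
    }
    ~gpubuf_mock() { free(); }

    // Releases any current allocation, then tries to reserve n bytes.
    bool alloc(std::size_t n)
    {
        free();
        if(fake_device_used + n > fake_device_capacity)
            return false;
        fake_device_used += n;
        bytes = n;
        return true;
    }
    void free()
    {
        fake_device_used -= bytes;
        bytes = 0;
    }
    std::size_t size() const { return bytes; }
};

// Shared pool, outliving any single params instance (not thread-safe).
std::vector<gpubuf_mock> externally_managed_workareas;

std::size_t total_workarea_bytes()
{
    return std::accumulate(
        externally_managed_workareas.begin(), externally_managed_workareas.end(),
        std::size_t{0},
        [](std::size_t total, const gpubuf_mock& buf) { return total + buf.size(); });
}

// Reuses pooled buffers that are large enough; reallocates the ones that are not.
bool try_set_workareas(const std::vector<std::size_t>& required)
{
    if(externally_managed_workareas.size() < required.size())
        externally_managed_workareas.resize(required.size());
    for(std::size_t i = 0; i < required.size(); ++i)
        if(externally_managed_workareas[i].size() < required[i]
           && !externally_managed_workareas[i].alloc(required[i]))
            return false;
    return true;
}

// On allocation failure, drops the whole pool (it may hold buffers that are
// unneeded or oversized for this caller) and tries once more.
bool set_workareas_with_retry(const std::vector<std::size_t>& required)
{
    if(try_set_workareas(required))
        return true;
    externally_managed_workareas.clear();
    return try_set_workareas(required);
}
```

With the assumed 1024-byte capacity, a request for `{600}` followed by `{100, 500}` fails on the first attempt (600 bytes are still pooled) and succeeds after the pool is cleared, whereas `{2000}` fails both times. Since the pool is shared mutable state with no locking, this sketch has the same thread-safety caveat raised above.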