Skip to content

Enable tests for externally-managed work areas by expanding hipfft_params#160

Merged
regan-amd merged 5 commits into
ROCm:developfrom
regan-amd:externally_managed_workareas_in_hipfftparams
Jul 18, 2025
Merged

Enable tests for externally-managed work areas by expanding hipfft_params#160
regan-amd merged 5 commits into
ROCm:developfrom
regan-amd:externally_managed_workareas_in_hipfftparams

Conversation

@regan-amd
Copy link
Copy Markdown
Contributor

Summary: changes suggested to enable coverage for features relevant to externally-managed work areas with hipFFT.

Details

  1. In general, hipFFT should assume/require a pointer to an array of $N_\text{GPU}$ size_t values where $N_\text{GPU}$ is the number of GPU devices being used, when it comes to the size_t* workSize arguments of its various APIs. Some implementation changes may be required for multi-gpu usage as only one of those values is written anyways on AMD platforms currently unless I missed something; full testing cannot be done at the moment due to a missing API in my understanding, though.
    Anyhow, this point likely motivated some gymnastics currently present within hipfft_params that makes the base class member variable workbuffersize irrelevant/unused/possibly misleading in some cases. I suggest to no longer declare workbuffersize in the base class to let derived classes consistently unroll their own logic, free of that "constraint". This also triggers some required changes for the fft_params::work_buffer_alloc_failure exception, in turn.
  2. The base class is expanded with a member enum variable identifying whether the encapsulated plan is allowed to "auto-allocate" its work area(s) or not; the possible values are basically "on", "off", and "default". For rocfft_params, "default:=off" whereas "default:=on" for hipfft_params.
  3. The hipfft_param class is expanded with a static std::vector<gpubuf> externally_managed_workareas; class member that is meant to be used by its instances if they're required to create plans without auto-allocation. externally_managed_workareas exists independently of any of the hipfft_params instance. Any instance instructed to create plans without auto-allocate will use the member(s) of externally_managed_workareas (after reallocating them if they're too small for their needs) and provide them as work areas to the created plans.
    If any plan creation is found to fail due to a failing device allocation at any point, the specific hipfft_params instance will clear externally_managed_workareas and attempt the plan creation again before reporting the failure (or not), given that externally_managed_workareas may contain buffers that are not needed or too large for the instance.
  4. I'm suggesting some implementation changes to hipFFT to prevent it from allocating more buffer(s) if a simple work buffer size or size estimate is queried.

Please let me know if hipfft_params is supposed to be thread-safe as more care would be required w.r.t. externally_managed_workareas in that case.

@regan-amd regan-amd changed the title Enable (and test for) externally-managed work areas by expanding hipfft_params Enable tests for externally-managed work areas by expanding hipfft_params Jul 3, 2025
Comment thread clients/hipfft_params.h
[](size_t total, const gpubuf& buf) { return total + buf.size(); });
}

bool is_preventing_auto_allocation_at_generation() const
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it's going to affect performance tests, which is also a target for hipfft_params.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, the suggested changes should not modify the overall behavior of this class for configurations that could be tested prior (at least not w.r.t. the allocations of work areas). With the suggested changes, any previously-testable configuration remains testable, and their (newly-introduced) member auto_allocate is then fft_auto_allocation_default (note: fft_auto_allocation_on behaves equivalently in case of hipfft_params). In such cases, this specific member function would always return false and the logic flow of create_plan is globally unchanged (work areas are automatically allocated by hipfft at plan generation).

For newly-enabled configurations of hipfft_params though (i.e., cases with auto_allocate set to fft_auto_allocation_off), note that the additions to create_plan will still fully set the required work areas (see set_externally_managed_work_areas) before returning. The changes do not rely on the backend library (e.g., rocfft) allocating its own work area(s) at plan execution if they were not provided with some prior.

Therefore, I do not expect a significant change in measured performance, in either case. I verified with a few various sizes on a gfx1201 (single-process, single-GPU cases), using hipfft-bench on this branch and develop. Please let me know if there is a specific size you'd want me to look into.

Copy link
Copy Markdown
Contributor

@malcolmroberts malcolmroberts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this affect a performance test?

@regan-amd
Copy link
Copy Markdown
Contributor Author

Does this affect a performance test?

I don't think it does.

@regan-amd regan-amd requested a review from malcolmroberts July 7, 2025 23:26
Comment thread clients/hipfft_params.h Outdated
// hipfftXtSetWorkArea would be required
throw std::runtime_error(
"cannot request externally-managed work areas with multi-gpu usage");
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't really an error; we just haven't implemented it yet. Normally, one would just skip the test using gtest's mechanism, but we can't do that here because it's in the fft params level.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. I introduced a dedicated base-class-defined unimplemented_exception along with the corresponding logic in the accuracy test in 517f101: let me know if that's more acceptable o you, or not.

Note: thinking about it, I don't think the (current) lack implementation support warrants an exception to be thrown by validate_fields, so I removed it therefrom.

Comment thread clients/hipfft_params.h
…nt failure for get_xt_api_execution_type in case of unexepected precision
@regan-amd regan-amd requested a review from malcolmroberts July 9, 2025 23:58
Comment thread clients/bench/bench.cpp Outdated
@regan-amd regan-amd merged commit 3ee8789 into ROCm:develop Jul 18, 2025
9 of 10 checks passed
regan-amd added a commit to ROCm/rocm-libraries that referenced this pull request Aug 27, 2025
Following up here upon completion of
ROCm/hipFFT#160.

Note:
1. if auto-allocation is used, plans allocate resources at execution for
rocFFT whereas they do it at plan generation for hipFFT;
2. "default" auto-allocation flag for rocFFT tests is equivalent to
"off", while "default" auto-allocation behavior for hipFFT tests is
equivalent to "on", consistently with the above;
3. If "on" auto-allocation flag is used for a rocFFT test, that would
force the underlying plan to allocate resources at execution time. It
might be best to have a warning printed by bench in that case?

---
🔁 Imported from
[ROCm/rocFFT#626](ROCm/rocFFT#626)
🧑‍💻 Originally authored by @regan-amd

Co-authored-by: Raphael Egan <Raphael.Egan@amd.com>
assistant-librarian Bot pushed a commit to ROCm/rocFFT that referenced this pull request Aug 27, 2025
Adding auto-allocation logic for rocfft tests
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Following up here upon completion of
ROCm/hipFFT#160.

Note:
1. if auto-allocation is used, plans allocate resources at execution for
rocFFT whereas they do it at plan generation for hipFFT;
2. "default" auto-allocation flag for rocFFT tests is equivalent to
"off", while "default" auto-allocation behavior for hipFFT tests is
equivalent to "on", consistently with the above;
3. If "on" auto-allocation flag is used for a rocFFT test, that would
force the underlying plan to allocate resources at execution time. It
might be best to have a warning printed by bench in that case?
regan-amd added a commit to ROCm/rocm-libraries that referenced this pull request Sep 6, 2025
…e discarded + clear externally-managed work areas while runtime is initialized (#1388)

This fixes failures for `hipfft-test` with CUDA backend for tests added
in ROCm/hipFFT#160.
assistant-librarian Bot pushed a commit that referenced this pull request Sep 6, 2025
[hipfft-test] do not use nullptrs for pointers to workarea
 sizes to be discarded + clear externally-managed work areas while runtime is
 initialized (#1388)

This fixes failures for `hipfft-test` with CUDA backend for tests added
in #160.
ammallya pushed a commit that referenced this pull request Oct 28, 2025
…rams (#160)

* Enable (and test for) externally-managed workareas via hipfft_params

* Remove dependency on external random_seed and repair build for hipfft-bench

* Adding 'auto_allocation' option to hip-bench target and clarifying description thereof

* Using ad-hoc exception type for unimplemented cases and avoiding silent failure for get_xt_api_execution_type in case of unexepected precision

* 'HipFFT' -> 'hipFFT'

[ROCm/hipFFT commit: 3ee8789]
ammallya pushed a commit that referenced this pull request Oct 28, 2025
…e discarded + clear externally-managed work areas while runtime is initialized (#1388)

This fixes failures for `hipfft-test` with CUDA backend for tests added
in #160.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants