Develop#1

Merged
VincentSC merged 2 commits into master from develop on Apr 17, 2019

@VincentSC
Contributor

Updated license texts

neon60 and others added 2 commits April 17, 2019 16:27
Update CUB license

See merge request amd/hipCUB!2
@VincentSC VincentSC merged commit 55db27a into master Apr 17, 2019
ammallya pushed a commit that referenced this pull request Oct 28, 2025
assistant-librarian Bot pushed a commit that referenced this pull request Jan 5, 2026
[rocPRIM] Config modernization

## Motivation

Our previous configuration system had become limiting in several ways.
Most importantly, it was not able to differentiate between individual
GPUs when selecting config parameters. This made proper tuning difficult
and prevented future work involving SPIR-V–specific tuning. In addition,
the old approach relied heavily on complex template metaprogramming,
which had become difficult to maintain. With the move to C++17, we now
have cleaner and more expressive language features available, making
this a good opportunity to redesign the system.

## Technical Details

All changes are internal. **There are no API changes for users.**

The majority of the diff in this PR consists of the new configuration
definitions themselves, so while the PR appears large, the actual code
changes are relatively small.

### New Configuration Structure

Each algorithm now defines a `<algo_name>_config_picker` function template,
parameterized on the target and the value type. Below is a simplified example:

```cpp
template<class Target, class value_type>
constexpr auto <algo_name>_config_picker()
    -> std::enable_if_t<
        std::is_same_v<Target,
                       comp_target<gen::gcn5, target_arch::gfx906, gpu::mi50, rep::amdgcn>>,
        <algo_name>_config_params>
{
    // Tuned configuration #1
    if constexpr (/* condition for this combination */)
    {
        return <algo_name>_config_params{ ... };
    }
    // Tuned configuration #2
    if constexpr (/* condition for this combination */)
    {
        return <algo_name>_config_params{ ... };
    }
    // Default for this target
    return <algo_name>_config_params_base<value_type>();
}
```

Each tuned target provides a similar overload. For untuned or unknown
targets, we provide a general fallback:

```cpp
template<class Target, class value_type>
constexpr auto <algo_name>_config_picker()
    -> std::enable_if_t<
        std::is_same_v<Target,
                       comp_target<gen::unknown, target_arch::unknown, gpu::generic, rep::amdgcn>>,
        <algo_name>_config_params>
{
    // Fallback: use a commonly tuned target (often MI100)
    return <algo_name>_config_picker<
        comp_target<gen::cdna1, target_arch::gfx908, gpu::mi100, rep::amdgcn>,
        value_type>();
}
```
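
The two overloads above use placeholders, so here is a minimal compilable sketch of the same pattern for an imaginary algorithm called `example`. Everything in it is an assumption made for illustration: the enumerator lists, the `example_config_params` fields, and all numeric values are placeholders rather than rocPRIM's actual definitions or tuned numbers.

```cpp
#include <type_traits>

// Illustrative stand-ins for the library's enums (enumerator lists are assumptions).
enum class gen { unknown, cdna1 };
enum class target_arch { unknown, gfx908 };
enum class gpu { generic, mi100 };
enum class rep { amdgcn };

template<gen, target_arch, gpu, rep>
struct comp_target {};

using mi100_target
    = comp_target<gen::cdna1, target_arch::gfx908, gpu::mi100, rep::amdgcn>;
using generic_target
    = comp_target<gen::unknown, target_arch::unknown, gpu::generic, rep::amdgcn>;

// Hypothetical parameter struct for the imaginary "example" algorithm.
struct example_config_params
{
    unsigned int block_size;
    unsigned int items_per_thread;
};

// Untuned default; the numbers are placeholders, not measured values.
template<class value_type>
constexpr example_config_params example_config_params_base()
{
    return {256u, sizeof(value_type) <= 4 ? 8u : 4u};
}

// Overload that participates only when Target is the MI100 target.
template<class Target, class value_type>
constexpr auto example_config_picker()
    -> std::enable_if_t<std::is_same_v<Target, mi100_target>, example_config_params>
{
    // Placeholder "tuned" entry for small value types.
    if constexpr(sizeof(value_type) <= 4)
    {
        return {512u, 8u};
    }
    // Default for this target.
    return example_config_params_base<value_type>();
}

// Fallback overload for the generic target: delegate to the MI100 picker.
template<class Target, class value_type>
constexpr auto example_config_picker()
    -> std::enable_if_t<std::is_same_v<Target, generic_target>, example_config_params>
{
    return example_config_picker<mi100_target, value_type>();
}

// The whole selection is resolved at compile time.
static_assert(example_config_picker<generic_target, char>().block_size == 512u);
```

Because exactly one overload survives SFINAE for a given `Target`, adding a newly tuned GPU in this sketch means adding one more overload without touching the existing ones.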

All available tuned targets are listed in a `comp_targets` type list:

```cpp
using <algo_name>_targets = comp_targets<
    comp_target<gen::gcn5, target_arch::gfx906, gpu::mi50, rep::amdgcn>,
    ...,
    comp_target<gen::unknown, target_arch::unknown, gpu::generic, rep::amdgcn>>;
```

### How Config Selection Works Now

In the new system, kernels are compiled for all tuned targets. At
runtime, if the current GPU has no dedicated tuning, the library
uses the `most_common_config` policy to choose the best-matching compiled
kernel.

The selection policy (tested in `test_config_dispatch.cpp`) attempts to
match, in decreasing priority:

1. Exact GPU model
2. Architecture
3. Generation

If no match is found, it falls back to the unknown target. If multiple
candidates match at the same priority, the last one listed in the
`comp_targets` type list is chosen, which gives us a controlled and
predictable fallback order.
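
The priority rules above can be sketched as a small `constexpr` function. This is not the library's `most_common_config` implementation. It is a simplified model that assumes targets are plain structs in an array, that the unknown target is listed last, and that "last listed wins" applies among candidates at the same match level. The names `target_desc`, `match_level`, and `select_target` are hypothetical.

```cpp
#include <array>
#include <cstddef>

// Illustrative stand-ins for the library's enums (enumerator lists are assumptions).
enum class gen { unknown, gcn5, cdna1 };
enum class target_arch { unknown, gfx906, gfx908 };
enum class gpu { generic, mi50, mi60, mi100 };

// Hypothetical runtime description of a target.
struct target_desc
{
    gen         generation;
    target_arch arch;
    gpu         model;
};

// 3 = exact GPU model, 2 = architecture, 1 = generation, 0 = no match.
// Unknown/generic fields never count as a match.
constexpr int match_level(const target_desc& candidate, const target_desc& current)
{
    if(candidate.model != gpu::generic && candidate.model == current.model)
        return 3;
    if(candidate.arch != target_arch::unknown && candidate.arch == current.arch)
        return 2;
    if(candidate.generation != gen::unknown && candidate.generation == current.generation)
        return 1;
    return 0;
}

// Returns the index of the chosen target. Assumes the unknown target is last.
template<std::size_t N>
constexpr std::size_t select_target(const std::array<target_desc, N>& targets,
                                    const target_desc&                current)
{
    std::size_t best       = N - 1; // fallback: the unknown target
    int         best_level = 0;
    for(std::size_t i = 0; i < N; ++i)
    {
        const int level = match_level(targets[i], current);
        // ">=" lets a later entry win a tie at the same level.
        if(level > 0 && level >= best_level)
        {
            best       = i;
            best_level = level;
        }
    }
    return best;
}

constexpr std::array<target_desc, 4> example_targets{{
    {gen::gcn5, target_arch::gfx906, gpu::mi50},
    {gen::gcn5, target_arch::gfx906, gpu::mi60},
    {gen::cdna1, target_arch::gfx908, gpu::mi100},
    {gen::unknown, target_arch::unknown, gpu::generic},
}};

// An exact model match beats a later architecture-only match.
static_assert(select_target(example_targets,
                            target_desc{gen::gcn5, target_arch::gfx906, gpu::mi50}) == 0);
```

In this model, a gfx906 device whose model is not in the list matches both gfx906 entries by architecture and resolves to the later one, while a device that matches nothing resolves to the last (unknown) entry.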

We also pass the selected target into kernel compilation, enabling
compile-time specialization based on GPU, architecture, and generation.

### Target struct

The target struct currently stores only:
- GPU generation
- Architecture
- GPU name
- Representation (`rep`), which distinguishes SPIR-V from native AMDGCN

The `rep` field is not yet functional (it requires compiler support), and the
dispatch policy does not consider it at the moment. The target struct
also makes it relatively easy to store more data in the future.
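
As a rough illustration of how such a target type can drive compile-time specialization, the sketch below models `comp_target` as a class template over four enum values. The member names, the `spirv` enumerator, the helper `example_unroll_factor`, and the returned numbers are all assumptions made for this example. Only the `comp_target<gen, target_arch, gpu, rep>` shape is taken from the snippets above.

```cpp
// Illustrative stand-ins for the library's enums (enumerator lists are assumptions).
enum class gen { unknown, gcn5, cdna1 };
enum class target_arch { unknown, gfx906, gfx908 };
enum class gpu { generic, mi50, mi100 };
enum class rep { amdgcn, spirv }; // the `spirv` enumerator name is hypothetical

// Hypothetical layout: each field is exposed as a static constant.
template<gen Generation, target_arch Arch, gpu Gpu, rep Rep>
struct comp_target
{
    static constexpr gen         generation     = Generation;
    static constexpr target_arch arch           = Arch;
    static constexpr gpu         model          = Gpu;
    static constexpr rep         representation = Rep;
};

// Hypothetical kernel-side helper: pick a value from the most specific
// property that matches. The numbers are arbitrary placeholders.
template<class Target>
constexpr unsigned int example_unroll_factor()
{
    if constexpr(Target::model == gpu::mi100)
        return 8;
    else if constexpr(Target::arch == target_arch::gfx906)
        return 4;
    else
        return 2;
}

using mi50_target = comp_target<gen::gcn5, target_arch::gfx906, gpu::mi50, rep::amdgcn>;
using mi100_target = comp_target<gen::cdna1, target_arch::gfx908, gpu::mi100, rep::amdgcn>;
using generic_target
    = comp_target<gen::unknown, target_arch::unknown, gpu::generic, rep::amdgcn>;

static_assert(example_unroll_factor<mi100_target>() == 8);
```

Under this layout, storing more data would amount to adding another template parameter and a matching static member.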

### Scripts

The Python script changes in this PR update the scripts that use the
configs as input or output.

### Summary of Improvements

- Better differentiation and selection across GPUs
- Cleaner C++17-based implementation
- Easier extension for future SPIR-V tuning
- Improved maintainability of config definitions
- More flexibility for future features

## Test Plan

Tests were added in `test_config_dispatch.cpp`; these and all existing
tests should pass. Everything also needs to be benchmarked to verify
that the correct configs are chosen.

## Test Result

All tests pass; benchmarks are still WIP.

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.