Added Jenkinsfile #2

Merged
saadrahim merged 6 commits into ROCm:develop from amdkila:Jenkins on May 7, 2019

Conversation

@amdkila amdkila commented May 5, 2019

  • Have to test this in Jenkins

@saadrahim saadrahim merged commit 7756cb3 into ROCm:develop May 7, 2019
@amdkila amdkila deleted the Jenkins branch May 7, 2019 20:35
stanleytsang-amd added a commit to stanleytsang-amd/hipCUB that referenced this pull request Jun 26, 2024
stanleytsang-amd pushed a commit to stanleytsang-amd/hipCUB that referenced this pull request Jun 26, 2024
Update thread load/store assembly for GFX12

The "s_waitcnt" instruction is used to avoid data hazards after some load and store
instructions. On gfx12, s_waitcnt has been deprecated and replaced with more
specific instructions for each individual type of counter (e.g. loadcnt, storecnt).

This change updates two locations where the old s_waitcnt instruction is used
within inline assembly. In these two cases, the instruction is replaced
with s_wait_[load/store]cnt_dscnt. The "dscnt" suffix ensures that we also wait for
any outstanding local memory operations to complete.
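
The replacement described above can be sketched on the host side as a simple mnemonic lookup. This is only an illustration: `mem_op`, `wait_mnemonic`, and the `is_gfx12` flag are hypothetical names, the gfx12 mnemonics are assumed to be spelled `s_wait_loadcnt_dscnt` and `s_wait_storecnt_dscnt`, and operands are omitted because they depend on the call site.

```cpp
#include <string_view>

// Hypothetical tag for the kind of memory operation that precedes the wait.
enum class mem_op { load, store };

// Returns the wait instruction mnemonic to place after a load or store.
// Assumption: pre-gfx12 targets use the single s_waitcnt instruction, while
// gfx12 uses one combined "<counter>_dscnt" instruction per operation type.
constexpr std::string_view wait_mnemonic(bool is_gfx12, mem_op op)
{
    if(!is_gfx12)
    {
        return "s_waitcnt";
    }
    return op == mem_op::load ? "s_wait_loadcnt_dscnt" : "s_wait_storecnt_dscnt";
}
```

In real inline assembly the choice would more likely be made with a preprocessor check on the compilation target, so that only one variant is ever emitted.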
ammallya pushed a commit that referenced this pull request Oct 28, 2025
Added Jenkinsfile

[ROCm/hipCUB commit: 7756cb3]
assistant-librarian Bot pushed a commit that referenced this pull request Jan 5, 2026
[rocPRIM] Config modernization

## Motivation

Our previous configuration system had become limiting in several ways.
Most importantly, it was not able to differentiate between individual
GPUs when selecting config parameters. This made proper tuning difficult
and prevented future work involving SPIR-V–specific tuning. In addition,
the old approach relied heavily on complex template metaprogramming,
which had become difficult to maintain. With the move to C++17, we now
have cleaner and more expressive language features available, making
this a good opportunity to redesign the system.

## Technical Details

All changes are internal. **There are no API changes for users.**

The majority of the diff in this PR consists of the new configuration
definitions themselves, so while the PR appears large, the actual code
changes are relatively small.

### New Configuration Structure

Each algorithm now defines a *_config_picker templated on the target and
value type. Below is a simplified example:

```cpp
template<class Target, class value_type>
constexpr auto <algo_name>_config_picker()
    -> std::enable_if_t<
        std::is_same_v<Target,
                       comp_target<gen::gcn5, target_arch::gfx906, gpu::mi50, rep::amdgcn>>,
        <algo_name>_config_params>
{
    // Tuned configuration #1
    if constexpr (/* condition for this combination */)
    {
        return <algo_name>_config_params{ ... };
    }
    // Tuned configuration #2
    if constexpr (/* condition for this combination */)
    {
        return <algo_name>_config_params{ ... };
    }
    // Default for this target
    return <algo_name>_config_params_base<value_type>();
}
```

Each tuned target provides a similar overload. For untuned or unknown
targets, we provide a general fallback:

```cpp
template<class Target, class value_type>
constexpr auto <algo_name>_config_picker()
    -> std::enable_if_t<
        std::is_same_v<Target,
                       comp_target<gen::unknown, target_arch::unknown, gpu::generic, rep::amdgcn>>,
        <algo_name>_config_params>
{
    // Fallback: use a commonly tuned target (often MI100)
    return <algo_name>_config_picker<
        comp_target<gen::cdna1, target_arch::gfx908, gpu::mi100, rep::amdgcn>,
        value_type>();
}
```
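
The two snippets above use placeholders, so here is a compilable miniature of the same pattern. It is a sketch under stated assumptions, not the library's code: the enums are reduced to the values used here, `demo_config_params` and `demo_config_picker` are hypothetical names, and the numbers are placeholders rather than tuned values.

```cpp
#include <type_traits>

// Hypothetical stand-ins for the library's enums (only the values used here).
enum class gen { unknown, gcn5, cdna1 };
enum class target_arch { unknown, gfx906, gfx908 };
enum class gpu { generic, mi50, mi100 };
enum class rep { amdgcn, spirv };

template<gen G, target_arch A, gpu P, rep R>
struct comp_target {};

using mi50_target    = comp_target<gen::gcn5, target_arch::gfx906, gpu::mi50, rep::amdgcn>;
using mi100_target   = comp_target<gen::cdna1, target_arch::gfx908, gpu::mi100, rep::amdgcn>;
using generic_target = comp_target<gen::unknown, target_arch::unknown, gpu::generic, rep::amdgcn>;

// Hypothetical parameter bundle; values below are placeholders, not tuned.
struct demo_config_params
{
    unsigned int block_size;
    unsigned int items_per_thread;
};

// Overload that participates only when Target is the MI50 target.
template<class Target, class value_type>
constexpr auto demo_config_picker()
    -> std::enable_if_t<std::is_same_v<Target, mi50_target>, demo_config_params>
{
    if constexpr(sizeof(value_type) <= 4)
    {
        return demo_config_params{256, 8};
    }
    else
    {
        return demo_config_params{256, 4};
    }
}

// Overload that participates only when Target is the MI100 target.
template<class Target, class value_type>
constexpr auto demo_config_picker()
    -> std::enable_if_t<std::is_same_v<Target, mi100_target>, demo_config_params>
{
    return demo_config_params{128, 16};
}

// Fallback for the generic target: forwards to the MI100 overload.
template<class Target, class value_type>
constexpr auto demo_config_picker()
    -> std::enable_if_t<std::is_same_v<Target, generic_target>, demo_config_params>
{
    return demo_config_picker<mi100_target, value_type>();
}
```

Because the `enable_if_t` conditions are mutually exclusive, exactly one overload is viable for each target, and the whole selection is resolved at compile time.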

All available tuned targets are listed in:
```cpp
using <algo_name>_targets = comp_targets<
    comp_target<gen::gcn5, target_arch::gfx906, gpu::mi50, rep::amdgcn>,
    ...,
    comp_target<gen::unknown, target_arch::unknown, gpu::generic, rep::amdgcn>>;
```

### How Config Selection Works Now

In the new system, kernels are compiled for all tuned targets. At
runtime, if the current GPU does not have dedicated tuning, the library
uses the most_common_config policy to choose the best matching compiled
kernel.

The selection policy (tested in test_config_dispatch.cpp) attempts to
match, in decreasing priority:
1. Exact GPU model
2. Architecture
3. Generation

If no match is found, it falls back to the unknown target. If multiple
candidates match, the last one listed in the comp_targets type list is
chosen, which gives us a controlled and predictable fallback order.
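
The matching order can be sketched as a small scoring function. This is an illustration only and not the code in the library or in test_config_dispatch.cpp: `target_id`, `match_priority`, and `select_target` are hypothetical names, the enum values are a reduced sample, and it assumes that "last one listed wins" applies among candidates that match at the same priority level.

```cpp
#include <array>
#include <cstddef>

// Hypothetical, reduced versions of the target enums.
enum class gen { unknown, gcn5, cdna1, cdna2, rdna3 };
enum class target_arch { unknown, gfx900, gfx906, gfx908, gfx90a, gfx1100 };
enum class gpu { generic, mi25, mi50, mi100, mi210, mi250, mi250x, rx7900xtx };

struct target_id
{
    gen         generation;
    target_arch arch;
    gpu         model;
};

// 3 = exact GPU model, 2 = architecture, 1 = generation, 0 = no match.
// Unknown/generic fields in a candidate never count as a match.
constexpr int match_priority(const target_id& device, const target_id& candidate)
{
    if(candidate.model != gpu::generic && candidate.model == device.model)
        return 3;
    if(candidate.arch != target_arch::unknown && candidate.arch == device.arch)
        return 2;
    if(candidate.generation != gen::unknown && candidate.generation == device.generation)
        return 1;
    return 0;
}

// Returns the index of the chosen target, or fallback_index if nothing matches.
// Using >= makes the last listed candidate win among equal priorities.
template<std::size_t N>
constexpr std::size_t select_target(const target_id&                device,
                                    const std::array<target_id, N>& tuned,
                                    std::size_t                     fallback_index)
{
    std::size_t best          = fallback_index;
    int         best_priority = 0;
    for(std::size_t i = 0; i < N; ++i)
    {
        const int priority = match_priority(device, tuned[i]);
        if(priority > 0 && priority >= best_priority)
        {
            best          = i;
            best_priority = priority;
        }
    }
    return best;
}

// Sample list; the last entry plays the role of the unknown target.
constexpr std::array<target_id, 5> tuned_targets{{
    {gen::gcn5, target_arch::gfx906, gpu::mi50},
    {gen::cdna1, target_arch::gfx908, gpu::mi100},
    {gen::cdna2, target_arch::gfx90a, gpu::mi210},
    {gen::cdna2, target_arch::gfx90a, gpu::mi250},
    {gen::unknown, target_arch::unknown, gpu::generic},
}};
```

With this list, a device that shares only its architecture with two entries (for example a gfx90a GPU that has no dedicated tuning) resolves to the later of the two, and a device with nothing in common resolves to the final unknown entry.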

We also pass the selected target into kernel compilation, enabling
compile-time specialization based on GPU, architecture, and generation.
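
As a rough illustration of what compile-time specialization on the target could look like, the sketch below branches on the target's fields with `if constexpr`. It assumes that the target type exposes its template arguments as static members, which the description above does not confirm; `demo_unroll_factor` and the returned values are hypothetical.

```cpp
// Hypothetical, reduced versions of the target enums.
enum class gen { unknown, gcn5, cdna1 };
enum class target_arch { unknown, gfx906, gfx908 };
enum class gpu { generic, mi50, mi100 };
enum class rep { amdgcn, spirv };

// Assumption: the target exposes its fields as static constexpr members.
template<gen G, target_arch A, gpu P, rep R>
struct comp_target
{
    static constexpr gen         generation     = G;
    static constexpr target_arch architecture   = A;
    static constexpr gpu         model          = P;
    static constexpr rep         representation = R;
};

// Stand-in for a decision made inside a kernel template. Only the branch
// that applies to Target is instantiated; the numbers are placeholders.
template<class Target>
constexpr unsigned int demo_unroll_factor()
{
    if constexpr(Target::model == gpu::mi100)
    {
        return 8; // most specific: exact GPU model
    }
    else if constexpr(Target::generation == gen::gcn5)
    {
        return 4; // coarser: whole generation
    }
    else
    {
        return 2; // everything else
    }
}
```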

### Target struct

The target struct currently stores only:
- GPU generation
- Architecture
- GPU Name
- Representation (rep), which distinguishes SPIR-V from native AMDGCN

The rep field is not yet functional (it requires compiler support), and the
dispatch policy does not consider it at the moment. The target struct also
makes it relatively easy to store more data in the future.

### Scripts

The Python script changes in this PR update the scripts that use the
configs as input or output.

### Summary of Improvements

- Better differentiation and selection across GPUs
- Cleaner C++17-based implementation
- Easier extension for future SPIR-V tuning
- Improved maintainability of config definitions
- More flexibility for future features

## Test Plan

Some tests were added in test_config_dispatch.cpp; these and all the
other tests should pass. Everything also needs to be benchmarked to
confirm that the correct configs are chosen.

## Test Result

All tests pass; benchmarks are still WIP.

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.