Added Jenkinsfile #2
Merged
amdkila (Contributor) commented on May 5, 2019
- Have to test this in Jenkins
stanleytsang-amd added a commit to stanleytsang-amd/hipCUB that referenced this pull request on Jun 26, 2024
stanleytsang-amd pushed a commit to stanleytsang-amd/hipCUB that referenced this pull request on Jun 26, 2024
Update thread load/store assembly for GFX12
The s_waitcnt instruction is used to avoid data hazards after some load and store instructions. On gfx12, s_waitcnt has been deprecated and replaced with more specific instructions for each individual type of counter (e.g. loadcnt, storecnt). This change updates two locations where the old s_waitcnt instruction is used within inline assembly. In these two cases, the instruction is replaced with s_wait_[load/store]cnt_dscnt. The "dscnt" suffix ensures that we also wait for any outstanding local memory operations to complete.
ammallya pushed a commit that referenced this pull request on Oct 28, 2025
Added Jenkinsfile [ROCm/hipCUB commit: 7756cb3]
assistant-librarian (bot) pushed a commit that referenced this pull request on Jan 5, 2026
[rocPRIM] Config modernization
## Motivation
Our previous configuration system had become limiting in several ways.
Most importantly, it was not able to differentiate between individual
GPUs when selecting config parameters. This made proper tuning difficult
and prevented future work involving SPIR-V–specific tuning. In addition,
the old approach relied heavily on complex template metaprogramming,
which had become difficult to maintain. With the move to C++17, we now
have cleaner and more expressive language features available, making
this a good opportunity to redesign the system.
## Technical Details
All changes are internal. **There are no API changes for users.**
The majority of the diff in this PR consists of the new configuration
definitions themselves, so while the PR appears large, the actual code
changes are relatively small.
### New Configuration Structure
Each algorithm now defines a `*_config_picker` function template,
parameterized on the target and value type. Below is a simplified example:
```cpp
template<class Target, class value_type>
constexpr auto <algo_name>_config_picker()
    -> std::enable_if_t<
        std::is_same_v<Target,
                       comp_target<gen::gcn5, target_arch::gfx906, gpu::mi50, rep::amdgcn>>,
        <algo_name>_config_params>
{
    // Tuned configuration #1
    if constexpr (/* condition for this combination */)
    {
        return <algo_name>_config_params{ ... };
    }
    // Tuned configuration #2
    if constexpr (/* condition for this combination */)
    {
        return <algo_name>_config_params{ ... };
    }
    // Default for this target
    return <algo_name>_config_params_base<value_type>();
}
```
Each tuned target provides a similar overload. For untuned or unknown
targets, we provide a general fallback:
```cpp
template<class Target, class value_type>
constexpr auto <algo_name>_config_picker()
    -> std::enable_if_t<
        std::is_same_v<Target,
                       comp_target<gen::unknown, target_arch::unknown, gpu::generic, rep::amdgcn>>,
        <algo_name>_config_params>
{
    // Fallback: use a commonly tuned target (often MI100)
    return <algo_name>_config_picker<
        comp_target<gen::cdna1, target_arch::gfx908, gpu::mi100, rep::amdgcn>,
        value_type>();
}
```
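The two simplified snippets above use placeholders, so a small compilable sketch of the same overload pattern may help. Everything named `demo_*`, the `mi100_target` and `generic_target` aliases, the empty `comp_target` stand-in, and the numeric values are assumptions made for illustration; they are not the library's definitions or real tuning data.

```cpp
#include <type_traits>

// Illustrative stand-ins for the library's enums and target tag type.
enum class gen { unknown, gcn5, cdna1 };
enum class target_arch { unknown, gfx906, gfx908 };
enum class gpu { generic, mi50, mi100 };
enum class rep { amdgcn };

template<gen G, target_arch A, gpu M, rep R>
struct comp_target {};

using mi100_target
    = comp_target<gen::cdna1, target_arch::gfx908, gpu::mi100, rep::amdgcn>;
using generic_target
    = comp_target<gen::unknown, target_arch::unknown, gpu::generic, rep::amdgcn>;

// Hypothetical parameter struct; the numbers below are placeholders.
struct demo_config_params
{
    unsigned block_size;
    unsigned items_per_thread;
};

// Overload that participates only when Target is the MI100 target.
template<class Target, class value_type>
constexpr auto demo_config_picker()
    -> std::enable_if_t<std::is_same_v<Target, mi100_target>, demo_config_params>
{
    if constexpr(sizeof(value_type) <= 4)
    {
        return demo_config_params{256, 8};
    }
    else
    {
        return demo_config_params{256, 4};
    }
}

// Fallback overload for the generic target: forwards to the MI100 picker.
template<class Target, class value_type>
constexpr auto demo_config_picker()
    -> std::enable_if_t<std::is_same_v<Target, generic_target>, demo_config_params>
{
    return demo_config_picker<mi100_target, value_type>();
}

// The picker is usable in constant expressions.
static_assert(demo_config_picker<generic_target, float>().items_per_thread == 8);
static_assert(demo_config_picker<mi100_target, double>().items_per_thread == 4);
```

Because each overload is constrained through its return type, requesting a target with no matching overload fails at compile time rather than silently picking an arbitrary configuration.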
All available tuned targets are listed in:
```cpp
using <algo_name>_targets = comp_targets<
    comp_target<gen::gcn5, target_arch::gfx906, gpu::mi50, rep::amdgcn>,
    ...,
    comp_target<gen::unknown, target_arch::unknown, gpu::generic, rep::amdgcn>>;
```
### How Config Selection Works Now
In the new system, kernels are compiled for all tuned targets. At
runtime, if the current GPU does not have dedicated tuning, the library
uses the most_common_config policy to choose the best matching compiled
kernel.
The selection policy (tested in test_config_dispatch.cpp) attempts to
match, in decreasing priority:
1. Exact GPU model
2. Architecture
3. Generation
If no match is found, it falls back to the unknown target. If multiple
candidates match, the last one listed in the comp_targets type list is
chosen, which gives us a controlled and predictable fallback order.
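The priority order described above can be sketched at the value level as follows. This is not the library's implementation (which works on the `comp_targets` type list); `target_desc`, `match_score`, `select_target`, and the `cdna2`/`gfx90a`/`mi210`/`mi250` enumerators are hypothetical names introduced for illustration, and the sketch assumes the unknown target is listed last.

```cpp
#include <cstddef>

// Illustrative enums; only some enumerators appear in the PR text.
enum class gen { unknown, gcn5, cdna1, cdna2 };
enum class target_arch { unknown, gfx906, gfx908, gfx90a };
enum class gpu { generic, mi50, mi100, mi210, mi250 };

// Hypothetical value-level description of a target.
struct target_desc
{
    gen         generation;
    target_arch arch;
    gpu         model;
};

// 3 = exact GPU model, 2 = same architecture, 1 = same generation, 0 = no match.
constexpr int match_score(const target_desc& candidate, const target_desc& current)
{
    if(current.model != gpu::generic && candidate.model == current.model)
        return 3;
    if(current.arch != target_arch::unknown && candidate.arch == current.arch)
        return 2;
    if(current.generation != gen::unknown && candidate.generation == current.generation)
        return 1;
    return 0;
}

// Returns the index of the chosen tuned target. Ties at the best score go to
// the last listed candidate; with no match, the last entry (assumed to be the
// unknown target) is returned.
template<std::size_t N>
constexpr std::size_t select_target(const target_desc (&tuned)[N],
                                    const target_desc& current)
{
    std::size_t best       = N - 1;
    int         best_score = 0;
    for(std::size_t i = 0; i < N; ++i)
    {
        const int score = match_score(tuned[i], current);
        if(score > 0 && score >= best_score)
        {
            best       = i;
            best_score = score;
        }
    }
    return best;
}

constexpr target_desc tuned_targets[] = {
    {gen::gcn5, target_arch::gfx906, gpu::mi50},        // index 0
    {gen::cdna1, target_arch::gfx908, gpu::mi100},      // index 1
    {gen::cdna2, target_arch::gfx90a, gpu::mi210},      // index 2
    {gen::cdna2, target_arch::gfx90a, gpu::mi250},      // index 3
    {gen::unknown, target_arch::unknown, gpu::generic}, // index 4 (fallback)
};

// Exact model match.
static_assert(select_target(tuned_targets,
                            target_desc{gen::cdna1, target_arch::gfx908, gpu::mi100}) == 1);
// Architecture-only match with two candidates: the last listed one wins.
static_assert(select_target(tuned_targets,
                            target_desc{gen::cdna2, target_arch::gfx90a, gpu::generic}) == 3);
```

Letting later entries win ties means the fallback order can be controlled purely by the order of the target list, which is the property the paragraph above relies on.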
We also pass the selected target into kernel compilation, enabling
compile-time specialization based on GPU, architecture, and generation.
### Target struct
The target struct currently stores only:
- GPU generation
- Architecture
- GPU name
- Representation (rep), which distinguishes SPIR-V from native AMDGCN
The rep field is not yet functional (it requires compiler support), and
the dispatch policy does not consider it at the moment. The target
struct also makes it relatively easy to store more data in the future.
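As a rough sketch of what such a target type could look like, assuming the four fields are carried as non-type template parameters (which matches how `comp_target<...>` is spelled in the snippets above): the member names, the `rep::spirv` enumerator, `has_dedicated_tuning`, and `demo_targets` are assumptions for illustration, not the library's actual definitions.

```cpp
#include <cstddef>

enum class gen { unknown, gcn5, cdna1 };
enum class target_arch { unknown, gfx906, gfx908 };
enum class gpu { generic, mi50, mi100 };
enum class rep { amdgcn, spirv }; // `spirv` is an assumed enumerator name

// Each target is a distinct type that exposes its fields as constants, so
// kernels can specialize on them at compile time.
template<gen Gen, target_arch Arch, gpu Gpu, rep Rep>
struct comp_target
{
    static constexpr gen         generation     = Gen;
    static constexpr target_arch arch           = Arch;
    static constexpr gpu         model          = Gpu;
    static constexpr rep         representation = Rep;
};

// Minimal type list of targets.
template<class... Targets>
struct comp_targets
{
    static constexpr std::size_t size = sizeof...(Targets);
};

using mi50_target
    = comp_target<gen::gcn5, target_arch::gfx906, gpu::mi50, rep::amdgcn>;
using generic_target
    = comp_target<gen::unknown, target_arch::unknown, gpu::generic, rep::amdgcn>;
using demo_targets = comp_targets<mi50_target, generic_target>;

// Hypothetical compile-time query on a target type.
template<class Target>
constexpr bool has_dedicated_tuning = Target::model != gpu::generic;

static_assert(mi50_target::arch == target_arch::gfx906);
static_assert(has_dedicated_tuning<mi50_target>);
static_assert(!has_dedicated_tuning<generic_target>);
static_assert(demo_targets::size == 2);
```

With this shape, storing more data would amount to adding another template parameter and a matching constant.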
### Scripts
The Python script changes in this PR update the scripts that use the
configs as input or output.
### Summary of Improvements:
- Better differentiation and selection across GPUs
- Cleaner C++17-based implementation
- Easier extension for future SPIR-V tuning
- Improved maintainability of config definitions
- More flexibility for future features
## Test Plan
Tests were added in test_config_dispatch.cpp; these and all existing
tests should pass. In addition, everything needs to be benchmarked to
confirm that the correct configs are chosen.
## Test Result
All tests pass; benchmarks are still in progress.
## Submission Checklist
- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.