Closed
…lds (llvm#4338) From: Jon Chesterfield <jon@spectralcompute.co.uk>
Fix for SWDEV-550687 - incorrect rpath for certain Red Hat systems --------- Co-authored-by: Nicole Aschenbrenner <nicole.aschenbrenner@amd.com>
Co-authored-by: Konstantin Zhuravlyov <kzhuravl_dev@outlook.com>
#435) … build Make use of find_package(AMDDeviceLibs) to search for the cmake config in the build tree instead of looping through various paths. Co-authored-by: Ethan Stewart <ethan.stewart@amd.com>
Co-authored-by: Yaxun (Sam) Liu <yaxun.liu@amd.com>
…4217) This reverts commit 78bf682. Original PR: llvm#157463 Revert PR: llvm#158566 The relevant buildbots have been updated to a ROCm version that does not use the macros anymore to avoid the failures. Implements SWDEV-522062.
The existing check for this case only comes after a dereference of what can be an iterator sentinel (leading to an assert). This may not be purely NFC in that it also avoids queuing the effectively-empty region for rescheduling, but AFAICT this should be purely an optimization. Testing this seems difficult, as the high-level scheduler avoids scheduling these "empty" regions. This means a reproducer has to depend on the behavior of the scheduler passes before PreRARematStage in order to craft a region which triggers the bug. Since this is a release blocker I am posting a PR now, as both Shore Shen and I have manually verified that this resolves the particular crash from [SWDEV-564142](https://ontrack-internal.amd.com/browse/SWDEV-564142), but I am still working on making a reasonable test. (cherry picked from commit 004cfea)
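The ordering issue described above can be modeled in a few lines. This is only a Python sketch of the general pattern (check for an empty range before touching its first element), not the scheduler code; `first_instruction_or_none`, `block`, `begin`, and `end` are hypothetical names introduced for illustration.

```python
def first_instruction_or_none(block, begin, end):
    """Return the first element of the half-open region [begin, end), or None.

    Hypothetical model: `end` plays the role of the iterator sentinel. When
    begin == end the region is empty and `begin` may itself be the sentinel,
    so the emptiness check has to come before any access to block[begin].
    """
    if begin == end:
        # Empty region: nothing to inspect and nothing worth queuing.
        return None
    return block[begin]
```

With the check placed first, an empty region at the very end of a block (`begin == end == len(block)`) returns `None` instead of indexing past the last element.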
This attribute isn't fully supported by GDB, and its usage was leading to some errors.
…e dependency on isZeroSize (llvm#96422)" (#156)"" (llvm#2065)"" This reverts commit e56121c.
Since llvm#156765 ("[AMDGPU] Define 1024 VGPRs on gfx1250") we have been considering unaddressable VGPRs when determining which to mark as undefined in CFI. The net result was a combination of redundant and nonsense records being generated.
…#552) We do not have native instructions for direct bfloat comparisons. However, we can expand bfloat to float, and do float comparison instead. TODO: handle bfloat comparison for ballot intrinsic on global isel path. Fixes: SWDEV-563403
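As a rough illustration of why the expansion is safe (this is not the AMDGPU lowering code), the Python sketch below models it at the bit level: a bfloat value is the upper 16 bits of an IEEE float32, so widening is exact and the float comparison gives the bfloat answer. The helper names `bf16_bits_to_float` and `bf16_less_than` are hypothetical.

```python
import struct


def bf16_bits_to_float(bits: int) -> float:
    """Widen a 16-bit bfloat pattern to float32 by placing it in the high half."""
    widened = (bits & 0xFFFF) << 16
    return struct.unpack("<f", struct.pack("<I", widened))[0]


def bf16_less_than(a_bits: int, b_bits: int) -> bool:
    """Compare two bfloat bit patterns by comparing their float expansions."""
    return bf16_bits_to_float(a_bits) < bf16_bits_to_float(b_bits)
```

For example, `0x3F80` and `0x4000` are the bfloat encodings of 1.0 and 2.0, so `bf16_less_than(0x3F80, 0x4000)` is true, while any comparison against a NaN pattern such as `0x7FC0` is false, as with ordinary float comparisons.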
…) (#541) This PR introduces a new pass, "lower-workdistribute".

Fortran array statements are lowered to fir as fir.do_loop unordered. The "lower-workdistribute" pass works mainly on identifying "fir.do_loop unordered" that is nested in target{teams{workdistribute{fir.do_loop unordered}}} and lowers it to target{teams{parallel{wsloop{loop_nest}}}}. It hoists all the other ops outside the target region. It replaces heap allocation on target with omp.target_allocmem and deallocation with omp.target_freemem from host. It also replaces the runtime function "Assign" with omp.target_memcpy from host.

This pass implements the following rewrites and optimisations:
- **FissionWorkdistribute**: finds the parallelizable ops within the teams {workdistribute} region and moves them to their own teams{workdistribute} region.
- **WorkdistributeRuntimeCallLower**: finds the FortranAAssign calls nested in teams {workdistribute{}} and lowers them to an unordered do loop if src is scalar and dest is array. Other runtime calls are not handled currently.
- **WorkdistributeDoLower**: finds the fir.do_loop unordered nested in teams {workdistribute{fir.do_loop unordered}} and lowers it to teams {parallel { distribute {wsloop {loop_nest}}}}.
- **TeamsWorkdistributeToSingle**: hoists all the ops inside teams {workdistribute{}} before the teams op.

The work in this PR is C-P and updated from @ivanradanov commits from the coexecute implementation: [flang_workdistribute_iwomp_2024](https://github.com/ivanradanov/llvm-project/commits/flang_workdistribute_iwomp_2024)

Paper related to this work by @ivanradanov: ["Automatic Parallelization and OpenMP Offloading of Fortran Array Notation"](https://www.osti.gov/servlets/purl/2449728)
will work on script changes for aomp, and npsdb after it lands
- add_subdirectory(utils) in offload CMakeLists.txt
- usage of new macro add_openmp_util to install utils into llvm/bin
rocm 7.2 changed pci layout/info
really messes up xnack=1 performance
necessitates forced path to numactl
-nr use numactl ROCR_VISIBLE_DEVICES
-nm use numactl OMPI_COMM_WORLD_LOCAL_RANK
Expose HW_REG_WAVE_SCHED_MODE to the s_getreg_b32, s_setreg_b32, s_setreg_imm32_b32 instructions. Co-authored-by: lancesix <lancelot.six@amd.com>
MIOpen failing for everyone on amd-mainline
[2025-12-07T19:34:15.558Z] -- Could NOT find rocMLIR (missing: rocMLIR_DIR)
[2025-12-07T19:34:15.558Z] -- Falling back to find library libMLIRMIOpen
[2025-12-07T19:34:15.558Z] CMake Error at CMakeLists.txt:445 (find_library):
[2025-12-07T19:34:15.558Z] Could not find LIBMLIRMIOPEN using the following names: MLIRMIOpen
Merge commit '669e22f6553c5f9bca2d40a34cbfde9a770033f8' into HEAD
…#858) …ipc_memory_create'. (#475)
- Use reinterpret_cast<uptr> for pointer arithmetic.
- Add sanitizer interception logic for api 'hsa_amd_pointer_info'.
- Allow only valid values of ptr and len in non-ASan mode.
  - ptr == Actual agentBaseAddress && len == original_len_used_in_alloc
- Allow only valid values of ptr and len in ASan mode.
  - Here pinfo is retrieved from external hsa_amd_pointer_info (not internal device allocator function AmdgpuMemFuncs::GetPointerInfo)
  - ptr == pinfo.agentBaseAddress && len == pinfo.sizeInBytes
  - ptr == original_ptr_returned_by_ASAN && len == original_len_used_in_alloc
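One possible reading of the ptr/len rules above, written as a small Python model rather than the sanitizer code itself. It assumes that in ASan mode either of the two listed pairs is accepted; the commit message does not state this explicitly. `PointerInfo`, `is_valid_request`, `alloc_len`, and `asan_ptr` are hypothetical names standing in for the values the message refers to.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PointerInfo:
    """Hypothetical stand-in for the fields read from hsa_amd_pointer_info."""
    agent_base_address: int
    size_in_bytes: int


def is_valid_request(ptr: int, length: int, pinfo: PointerInfo, alloc_len: int,
                     asan_enabled: bool, asan_ptr: Optional[int] = None) -> bool:
    """Model of the accepted (ptr, len) pairs; assumes the ASan pairs are alternatives."""
    matches_base = ptr == pinfo.agent_base_address
    if not asan_enabled:
        # Non-ASan mode: the base address with the originally requested length.
        return matches_base and length == alloc_len
    # ASan mode, first pair: base address with the size reported by pinfo.
    if matches_base and length == pinfo.size_in_bytes:
        return True
    # ASan mode, second pair: the pointer ASan returned with the original length.
    return asan_ptr is not None and ptr == asan_ptr and length == alloc_len
```

For instance, with `pinfo = PointerInfo(0x1000, 4096)` and an ASan-returned pointer of `0x1040` for a 256-byte request, both `(0x1000, 4096)` and `(0x1040, 256)` pass in ASan mode under this reading, while the mixed pair `(0x1040, 4096)` is rejected.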
Merge remote-tracking branch 'external-mirror/promotion/amd-mainline/2025.11.24' into HEAD
- added secret variables to configure sha's for spirv and hipify
- added secret variable to configure theROCK sha
- added fix for setting up PR_LABEL used in the CI framework
- Removed hardcoded path /__w/llvm-project/llvm-project
…cs" (llvm#174224) Reverts llvm#131759. Seeing regressions in PyTorch UT: 8 test cases failed in the "test_ops" test suite.
revert "AMDGPU: Do not infer implicit inputs for !nocallback intrinsics" (llvm#174224)
for PyTorch UT