Amd/dev/rlieberm/rev py torch ut by ronlieb · Pull Request #1015 · ROCm/llvm-project

ronlieb · 2026-01-06T17:51:13Z

revert "AMDGPU: Do not infer implicit inputs for !nocallback intrinsics" (llvm#174224)

for PyTorch UT

…lds (llvm#4338) From: Jon Chesterfield <jon@spectralcompute.co.uk>

Fix for SWDEV-550687 - incorrect rpath for certain Red Hat systems
Co-authored-by: Nicole Aschenbrenner <nicole.aschenbrenner@amd.com>

Co-authored-by: Konstantin Zhuravlyov <kzhuravl_dev@outlook.com>

#435) … build Make use of find_package(AMDDeviceLibs) to search for the cmake config in the build tree instead of looping through various paths. Co-authored-by: Ethan Stewart <ethan.stewart@amd.com>

Co-authored-by: Yaxun (Sam) Liu <yaxun.liu@amd.com>

…4217) This reverts commit 78bf682.
Original PR: llvm#157463
Revert PR: llvm#158566
The relevant buildbots have been updated to a ROCm version that does not use the macros anymore to avoid the failures. Implements SWDEV-522062.

…4217) (#492)

The existing check for this case only comes after a dereference of what can be an iterator sentinel (leading to an assert). This may not be purely NFC in that it also avoids queuing the effectively-empty region for rescheduling, but AFAICT this should be purely an optimization. Testing this seems difficult, as the high-level scheduler avoids scheduling these "empty" regions. This means a reproducer has to depend on behavior of the scheduler passes before PreRARematStage in order to craft a region which triggers the bug. Since this is a release blocker I am posting a PR now, as both Shore Shen and I have manually verified that this resolves the particular crash from [SWDEV-564142](https://ontrack-internal.amd.com/browse/SWDEV-564142), but I am still working on making a reasonable test. (cherry picked from commit 004cfea)
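The ordering problem described above can be sketched generically in C++. This is only an illustration, not the scheduler code: `firstElement` and the use of `std::vector<int>` as a "region" are hypothetical stand-ins. The point is that the comparison against `end()` has to come before the dereference.

```cpp
#include <optional>
#include <vector>

// Hypothetical helper, not taken from the LLVM scheduler: returns the first
// element of a region, or std::nullopt when the region is effectively empty.
std::optional<int> firstElement(const std::vector<int> &region) {
  auto it = region.begin();
  // Compare against the sentinel first. Dereferencing end() is undefined
  // behavior and trips an assert in checked iterator implementations.
  if (it == region.end())
    return std::nullopt;
  return *it;
}
```

With the check placed first, an empty region is rejected early instead of being inspected, which matches the "avoids queuing the effectively-empty region" side effect mentioned above.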

This attribute isn't fully supported by GDB and its usage was leading to some errors.

…e dependency on isZeroSize (llvm#96422)" (#156)"" (llvm#2065)"" This reverts commit e56121c.

Since llvm#156765 ("[AMDGPU] Define 1024 VGPRs on gfx1250") we have been considering unaddressable VGPRs when determining which to mark as undefined in CFI. The net result was a combination of redundant and nonsense records being generated.

…#552) We do not have native instructions for direct bfloat comparisons. However, we can expand bfloat to float, and do float comparison instead. TODO: handle bfloat comparison for ballot intrinsic on global isel path. Fixes: SWDEV-563403
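The expand-then-compare idea can be illustrated on the host in C++. This is a minimal sketch, not the AMDGPU lowering itself: `bf16ToFloat` and `bf16OrderedLess` are hypothetical names, and bfloat values are represented here as raw 16-bit patterns. It relies on the fact that bfloat16 is the upper 16 bits of an IEEE-754 binary32 value, so widening is exact.

```cpp
#include <cstdint>
#include <cstring>

// Widen a raw bfloat16 bit pattern to float by placing it in the high half
// of a 32-bit word; the low 16 mantissa bits become zero.
float bf16ToFloat(std::uint16_t bits) {
  std::uint32_t wide = static_cast<std::uint32_t>(bits) << 16;
  float f;
  std::memcpy(&f, &wide, sizeof f);
  return f;
}

// Hypothetical helper: ordered less-than on bfloat16 bit patterns,
// implemented by comparing the widened floats.
bool bf16OrderedLess(std::uint16_t a, std::uint16_t b) {
  return bf16ToFloat(a) < bf16ToFloat(b);
}
```

Comparing the raw bit patterns as integers would not work: negative values have the sign bit set and would sort above positive ones, and NaNs need the unordered semantics that the float comparison already provides.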

@ivanradanov

…) (#541) This PR introduces a new pass, "lower-workdistribute".

Fortran array statements are lowered to fir as fir.do_loop unordered. The "lower-workdistribute" pass works mainly on identifying "fir.do_loop unordered" that is nested in target{teams{workdistribute{fir.do_loop unordered}}} and lowers it to target{teams{parallel{wsloop{loop_nest}}}}. It hoists all the other ops outside the target region. It replaces heap allocation on target with omp.target_allocmem and deallocation with omp.target_freemem from host. It also replaces the runtime function "Assign" with omp.target_memcpy from host.

This pass implements the following rewrites and optimisations:

- **FissionWorkdistribute**: finds the parallelizable ops within the teams {workdistribute} region and moves them to their own teams{workdistribute} region.
- **WorkdistributeRuntimeCallLower**: finds the FortranAAssign calls nested in teams {workdistribute{}} and lowers them to an unordered do loop if src is scalar and dest is array. Other runtime calls are not handled currently.
- **WorkdistributeDoLower**: finds the fir.do_loop unordered nested in teams {workdistribute{fir.do_loop unordered}} and lowers it to teams {parallel { distribute {wsloop {loop_nest}}}}.
- **TeamsWorkdistributeToSingle**: hoists all the ops inside teams {workdistribute{}} before the teams op.

The work in this PR is cherry-picked and updated from @ivanradanov's commits from the coexecute implementation: [flang_workdistribute_iwomp_2024](https://github.com/ivanradanov/llvm-project/commits/flang_workdistribute_iwomp_2024)

Paper related to this work by @ivanradanov: ["Automatic Parallelization and OpenMP Offloading of Fortran Array Notation"](https://www.osti.gov/servlets/purl/2449728)

will work on script changes for aomp, and npsdb after it lands

- add_subdirectory(utils) in offload CMakeLists.txt
- usage of new macro add_openmp_util to install utils into llvm/bin

ROCm 7.2 changed the PCI layout/info, which really hurts xnack=1 performance and necessitates a forced path to numactl.
-nr: use numactl with ROCR_VISIBLE_DEVICES
-nm: use numactl with OMPI_COMM_WORLD_LOCAL_RANK

Expose HW_REG_WAVE_SCHED_MODE to the s_getreg_b32, s_setreg_b32, s_setreg_imm32_b32 instructions. Co-authored-by: lancesix <lancelot.six@amd.com>

MIOpen failing for everyone on amd-mainline
[2025-12-07T19:34:15.558Z] -- Could NOT find rocMLIR (missing: rocMLIR_DIR)
[2025-12-07T19:34:15.558Z] -- Falling back to find library libMLIRMIOpen
[2025-12-07T19:34:15.558Z] CMake Error at CMakeLists.txt:445 (find_library):
[2025-12-07T19:34:15.558Z] Could not find LIBMLIRMIOPEN using the following names: MLIRMIOpen

Merge commit '669e22f6553c5f9bca2d40a34cbfde9a770033f8' into HEAD

…#858) …ipc_memory_create'. (#475)
- Use reinterpret_cast<uptr> for pointer arithmetic.
- Add sanitizer interception logic for api 'hsa_amd_pointer_info'.
- Allow only valid values of ptr and len in non-ASan mode.
  - ptr == Actual agentBaseAddress && len == original_len_used_in_alloc
- Allow only valid values of ptr and len in ASan mode.
  - Here pinfo is retrieved from external hsa_amd_pointer_info (not internal device allocator function AmdgpuMemFuncs::GetPointerInfo)
  - ptr == pinfo.agentBaseAddress && len == pinfo.sizeInBytes
  - ptr == original_ptr_returned_by_ASAN && len == original_len_used_in_alloc
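The two validation rules listed above can be sketched as plain predicates. This is an illustrative C++ sketch under several assumptions: `PointerInfo` and `UserAllocation` are hypothetical stand-ins (not the real HSA or sanitizer types), the function names are invented, and the ASan-mode check assumes that either of the two listed (ptr, len) pairs is accepted, which the description does not state explicitly.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the two fields of the pointer-info query that the
// description mentions; this is not the real HSA structure.
struct PointerInfo {
  std::uintptr_t agentBaseAddress;
  std::size_t sizeInBytes;
};

// Hypothetical record of the pointer and length originally given to the user.
struct UserAllocation {
  std::uintptr_t ptr;
  std::size_t len;
};

// Non-ASan mode: accept only the actual base address and the original length.
bool isValidNonAsan(std::uintptr_t ptr, std::size_t len,
                    std::uintptr_t agentBase, std::size_t originalLen) {
  return ptr == agentBase && len == originalLen;
}

// ASan mode, assuming either listed pair is acceptable: the device-level
// (base, size) pair from the pointer-info query, or the user-level pair.
bool isValidAsan(std::uintptr_t ptr, std::size_t len, const PointerInfo &pinfo,
                 const UserAllocation &user) {
  bool matchesDevice =
      ptr == pinfo.agentBaseAddress && len == pinfo.sizeInBytes;
  bool matchesUser = ptr == user.ptr && len == user.len;
  return matchesDevice || matchesUser;
}
```

Addresses are held as `std::uintptr_t` so that comparisons and arithmetic happen on integers, in the same spirit as the `reinterpret_cast<uptr>` item in the list.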

Merge remote-tracking branch 'external-mirror/promotion/amd-mainline/2025.11.24' into HEAD

passed all but MIOpen https://compiler-ci.amd.com/job/compiler-psdb-amd-mainline/537

- added secret variables to configure SHAs for spirv and hipify
- added secret variable to configure theROCK sha
- added fix for setting up PR_LABEL used in the CI framework
- Removed hardcoded path /__w/llvm-project/llvm-project

…cs" (llvm#174224) Reverts llvm#131759. Seeing regressions in PyTorch UT: 8 test cases failed in the "test_ops" test suite.

z1-cciauto · 2026-01-06T17:51:45Z

PSDB Link: https://compiler-ci.amd.com/job/compiler-psdb-amd-staging/3473

hjagasiaAMD and others added 30 commits October 27, 2025 19:47

[AMDGPU] Fix 160181. Be less optimistic when allocating module scope …

74bb07b

…lds (llvm#4338) From: Jon Chesterfield <jon@spectralcompute.co.uk>

[AMDGPU] Fix 160181. Be less optimistic when allocating module scope … (

0ade4ba

llvm#4642)

Amd/dev/catmoore/rel path (llvm#4422)

93b6499

Fix for SWDEV-550687 - incorrect rpath for certain Red Hat systems
Co-authored-by: Nicole Aschenbrenner <nicole.aschenbrenner@amd.com>

Amd/dev/catmoore/rel path (llvm#4422) (llvm#4643)

e28322a

Adjust device-libs search (llvm#4517)

322e3dc

Adjust device-libs search (llvm#4517) (llvm#4644)

451a842

Add missing ${extra_cmake_args} (llvm#4603)

5d0fb29

Co-authored-by: Konstantin Zhuravlyov <kzhuravl_dev@outlook.com>

Add missing ${extra_cmake_args} (llvm#4603) (llvm#4645)

ce6e8d8

[openmp] - Update search method for rocm-device-libs during devicertl… (

cf5a4bf

#435) … build Make use of find_package(AMDDeviceLibs) to search for the cmake config in the build tree instead of looping through various paths. Co-authored-by: Ethan Stewart <ethan.stewart@amd.com>

convert HIP struct type vector to llvm vector type (#416)

87ad04d

Co-authored-by: Yaxun (Sam) Liu <yaxun.liu@amd.com>

convert HIP struct type vector to llvm vector type (#416) (#490)

445dd85

Reapply "[HIP][Clang] Remove __AMDGCN_WAVEFRONT_SIZE macros" (llvm#16…

abda683

…4217) (#492)

[device-libs][comgr] - Add gfx1250 and gfx1251 support (#553)

c514213

[HeterogeneousDWARF] Stop emitting DW_AT_address_class on pointer types

d95a90b

This attribute isn't fully supported by GDB and its usage was leading to some errors.

Revert "Revert "Revert "Reland "Revert "[clang][CGRecordLayout] Remov…

254d1a1

…e dependency on isZeroSize (llvm#96422)" (#156)"" (llvm#2065)"" This reverts commit e56121c.

Fixes for GDB testsuite failures (#564)

fbb4cb1

move gpurun to offload/utils

3f1ca97

will work on script changes for aomp, and npsdb after it lands

[offload][utils] - Add cmake install step for gpurun

0ac3344

- add_subdirectory(utils) in offload CMakeLists.txt
- usage of new macro add_openmp_util to install utils into llvm/bin

[gpurun] force numactl with rocr_vis_dev or mpi rank (#619)

7a00afd

ROCm 7.2 changed the PCI layout/info, which really hurts xnack=1 performance and necessitates a forced path to numactl.
-nr: use numactl with ROCR_VISIBLE_DEVICES
-nm: use numactl with OMPI_COMM_WORLD_LOCAL_RANK

[gpurun] add numactl check and fallback for -nm and -nr (#625)

49fea23

[gpurun] enable GPURUN_BYPASS after argument processing (#630)

f51cf9a

[gpurun] add to llvm-project , -nr and -nm to use numactl (#652)

b736d1f

[AMDGPU] Add support for HW_REG_WAVE_SCHED_MODE (llvm#169840) (#710)

0079ed9

Expose HW_REG_WAVE_SCHED_MODE to the s_getreg_b32, s_setreg_b32, s_setreg_imm32_b32 instructions. Co-authored-by: lancesix <lancelot.six@amd.com>

Bulk Promotion

3eb9e3e

Merge commit '669e22f6553c5f9bca2d40a34cbfde9a770033f8' into HEAD

ampandey-AMD and others added 7 commits December 17, 2025 07:33

enable Linux rock CI psdb (#912)

58339c2

Bulk Promotion from 2025.11.24

265f4e5

Merge remote-tracking branch 'external-mirror/promotion/amd-mainline/2025.11.24' into HEAD

Bulk Promotion from 2025.11.24 (#929)

08a72fc

passed all but MIOpen https://compiler-ci.amd.com/job/compiler-psdb-amd-mainline/537

Fix for multi SDMA error on strix Halo (#905) (#941)

68aab90

fix Linux ROCK CI trigger (#1000)

971d963

- added secret variables to configure SHAs for spirv and hipify
- added secret variable to configure theROCK sha
- added fix for setting up PR_LABEL used in the CI framework
- Removed hardcoded path /__w/llvm-project/llvm-project

Revert "AMDGPU: Do not infer implicit inputs for !nocallback intrinsi…

34bf9c4

…cs" (llvm#174224) Reverts llvm#131759. Seeing regressions in PyTorch UT: 8 test cases failed in the "test_ops" test suite.

ronlieb requested review from SyamaAmd and lajagapp January 6, 2026 17:51

ronlieb closed this Jan 6, 2026

ronlieb wants to merge 37 commits into amd-staging from amd/dev/rlieberm/RevPyTorchUT


14 participants
