
Amd/dev/rlieberm/rev py torch ut #1015

Closed
ronlieb wants to merge 37 commits into amd-staging from amd/dev/rlieberm/RevPyTorchUT

ronlieb (Collaborator) commented Jan 6, 2026

Revert "AMDGPU: Do not infer implicit inputs for !nocallback intrinsics" (llvm#174224)

For PyTorch UT.

hjagasiaAMD and others added 30 commits October 27, 2025 19:47
…lds (llvm#4338)

From: Jon Chesterfield <jon@spectralcompute.co.uk>
Fix for SWDEV-550687 - incorrect rpath for certain Red Hat systems

---------

Co-authored-by: Nicole Aschenbrenner <nicole.aschenbrenner@amd.com>
Co-authored-by: Konstantin Zhuravlyov <kzhuravl_dev@outlook.com>
#435)

… build

Make use of find_package(AMDDeviceLibs) to search for the cmake config
in the build tree instead of looping through various paths.

Co-authored-by: Ethan Stewart <ethan.stewart@amd.com>
Co-authored-by: Yaxun (Sam) Liu <yaxun.liu@amd.com>
…4217)

This reverts commit 78bf682.

Original PR: llvm#157463
Revert PR: llvm#158566

The relevant buildbots have been updated to a ROCm version that does not
use the macros anymore to avoid the failures.

Implements SWDEV-522062.
The existing check for this case only comes after a dereference of what
can be an iterator sentinel (leading to an assert).

This may not be purely NFC in that it also avoids queuing the
effectively-empty region for rescheduling, but AFAICT this should be
purely an optimization.

Testing this seems difficult, as the high-level scheduler avoids
scheduling these "empty" regions. This means a reproducer has to depend
on behavior of the scheduler passes before PreRARematStage in order to
craft a region which triggers the bug.

Since this is a release blocker, I am posting a PR now: both Shore
Shen and I have manually verified that this resolves the particular
crash from
[SWDEV-564142](https://ontrack-internal.amd.com/browse/SWDEV-564142), but
I am still working on making a reasonable test.

(cherry picked from commit 004cfea)
This attribute isn't fully supported by GDB, and its usage was leading to
some errors.
Since llvm#156765 ("[AMDGPU] Define 1024 VGPRs on gfx1250")
we have been considering unaddressable VGPRs when determining which to
mark as undefined in CFI. The net result was a combination of redundant
and nonsense records being generated.
…#552)

We do not have native instructions for direct bfloat comparisons.
However, we can expand bfloat to float and do a float comparison instead.

TODO: handle bfloat comparison for the ballot intrinsic on the GlobalISel path.

Fixes: SWDEV-563403
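A minimal Python sketch of the expand-then-compare idea, not the actual AMDGPU lowering: it assumes bfloat16 values are handled as raw 16-bit patterns. Widening is exact because bfloat16 is the upper half of a float32, so each operand is shifted into the high 16 bits, reinterpreted as float32, and compared as a float. The helper names `bf16_to_f32`, `bf16_lt`, and `bf16_eq` are hypothetical.

```python
import struct


def bf16_to_f32(bits: int) -> float:
    """Widen a raw bfloat16 bit pattern to its float32 value.

    bfloat16 keeps the sign bit, the 8-bit exponent and the top 7 mantissa
    bits of float32, so placing the 16 bits in the high half of a 32-bit
    word is an exact conversion.
    """
    word = (bits & 0xFFFF) << 16
    return struct.unpack("<f", struct.pack("<I", word))[0]


def bf16_lt(a: int, b: int) -> bool:
    # Ordered less-than on the widened values: false if either is NaN.
    return bf16_to_f32(a) < bf16_to_f32(b)


def bf16_eq(a: int, b: int) -> bool:
    # Ordered equality: +0.0 == -0.0 even though their bit patterns differ.
    return bf16_to_f32(a) == bf16_to_f32(b)


if __name__ == "__main__":
    one, two, nan = 0x3F80, 0x4000, 0x7FC0  # 1.0, 2.0 and a quiet NaN
    print(bf16_lt(one, two))
    print(bf16_lt(nan, one))
    print(bf16_eq(0x0000, 0x8000))
```

Comparing the raw bit patterns as integers would instead misorder negative values and treat +0.0 and -0.0 as different; going through float avoids both. Python's `float` is a double, but every float32 value is exactly representable in it, so the comparisons follow float32 semantics.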
…) (#541)

This PR introduces a new pass, "lower-workdistribute". Fortran array
statements are lowered to FIR as fir.do_loop unordered.
The "lower-workdistribute" pass works mainly by identifying a "fir.do_loop
unordered" that is nested in target{teams{workdistribute{fir.do_loop
unordered}}} and lowering it to
target{teams{parallel{wsloop{loop_nest}}}}. It hoists all the other ops
outside the target region, replaces heap allocation on the target with
omp.target_allocmem and deallocation with omp.target_freemem from the host,
and replaces the runtime function "Assign" with omp.target_memcpy from the
host.

This pass implements the following rewrites and optimisations:

- **FissionWorkdistribute**: finds the parallelizable ops within teams
{workdistribute} region and moves them to their own
teams{workdistribute} region.
- **WorkdistributeRuntimeCallLower**: finds the FortranAAssign calls
nested in teams {workdistribute{}} and lowers them to an unordered do loop if
the source is a scalar and the destination is an array. Other runtime calls
are not handled currently.
- **WorkdistributeDoLower**: finds the fir.do_loop unordered nested in
teams {workdistribute{fir.do_loop unordered}} and lowers it to teams
{parallel { distribute {wsloop {loop_nest}}}}.
- **TeamsWorkdistributeToSingle**: hoists all the ops inside teams
{workdistribute{}} before teams op.
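To make the shape of the WorkdistributeDoLower rewrite concrete, here is a toy Python model, not the MLIR pass itself. Ops are plain name/body records (the `Op` class and the `lower_do` and `render` helpers are hypothetical), and the pattern teams{workdistribute{fir.do_loop unordered}} is replaced by teams{parallel{distribute{wsloop{loop_nest}}}}, with the loop body moved into the loop_nest. It deliberately ignores operands, loop bounds, and the hoisting done by the other rewrites.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Op:
    """Toy stand-in for an operation: a name plus nested body ops."""
    name: str
    body: List["Op"] = field(default_factory=list)


def _is_do_lower_pattern(op: Op) -> bool:
    # Matches teams{workdistribute{fir.do_loop unordered}} with single-op bodies.
    return (
        op.name == "teams"
        and len(op.body) == 1
        and op.body[0].name == "workdistribute"
        and len(op.body[0].body) == 1
        and op.body[0].body[0].name == "fir.do_loop unordered"
    )


def lower_do(op: Op) -> Op:
    """Rewrite bottom-up, returning a new tree (the input is not mutated)."""
    lowered = Op(op.name, [lower_do(child) for child in op.body])
    if not _is_do_lower_pattern(lowered):
        return lowered
    loop = lowered.body[0].body[0]
    nest = Op("loop_nest", loop.body)  # the loop body moves into the loop nest
    return Op("teams", [Op("parallel", [Op("distribute", [Op("wsloop", [nest])])])])


def render(op: Op) -> str:
    """Print a tree in the brace notation used in the description."""
    if not op.body:
        return op.name
    return op.name + "{" + ",".join(render(child) for child in op.body) + "}"


if __name__ == "__main__":
    tree = Op("target", [Op("teams", [Op("workdistribute", [
        Op("fir.do_loop unordered", [Op("loop body")])])])])
    print(render(lower_do(tree)))
```

Rewriting children before checking the parent means a nested match is lowered before its enclosing op is examined, which mirrors a typical bottom-up pattern application; a real pass would also need to handle multiple ops per region.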

The work in this PR is cherry-picked and updated from @ivanradanov's commits
from the coexecute implementation:

[flang_workdistribute_iwomp_2024](https://github.com/ivanradanov/llvm-project/commits/flang_workdistribute_iwomp_2024)

Paper related to this work by @ivanradanov: ["Automatic Parallelization
and OpenMP Offloading of Fortran Array Notation"](https://www.osti.gov/servlets/purl/2449728)
Will work on script changes for aomp and npsdb after it lands.
- add_subdirectory(utils) in offload CMakeLists.txt
- usage of new macro add_openmp_util to install utils
  into llvm/bin
ROCm 7.2 changed the PCI layout/info.

This really messes up xnack=1 performance.

It necessitates a forced path to numactl:

      -nr  use numactl ROCR_VISIBLE_DEVICES
      -nm  use numactl OMPI_COMM_WORLD_LOCAL_RANK
Expose HW_REG_WAVE_SCHED_MODE to the s_getreg_b32, s_setreg_b32,
s_setreg_imm32_b32 instructions.

Co-authored-by: lancesix <lancelot.six@amd.com>
MIOpen failing for everyone on amd-mainline
[2025-12-07T19:34:15.558Z] -- Could NOT find rocMLIR (missing: rocMLIR_DIR)

[2025-12-07T19:34:15.558Z] -- Falling back to find library libMLIRMIOpen

[2025-12-07T19:34:15.558Z] CMake Error at CMakeLists.txt:445 (find_library):

[2025-12-07T19:34:15.558Z]   Could not find LIBMLIRMIOPEN using the following names: MLIRMIOpen
Merge commit '669e22f6553c5f9bca2d40a34cbfde9a770033f8' into HEAD
ampandey-AMD and others added 7 commits December 17, 2025 07:33
…#858)

…ipc_memory_create'. (#475)

- Use reinterpret_cast<uptr> for pointer arithmetic.
- Add sanitizer interception logic for API 'hsa_amd_pointer_info'.
- Allow only valid values of ptr and len in non-ASan mode:
  - ptr == actual agentBaseAddress && len == original_len_used_in_alloc
- Allow only valid values of ptr and len in ASan mode:
  - Here pinfo is retrieved from the external hsa_amd_pointer_info (not the
    internal device allocator function AmdgpuMemFuncs::GetPointerInfo):
    - ptr == pinfo.agentBaseAddress && len == pinfo.sizeInBytes
    - ptr == original_ptr_returned_by_ASAN && len == original_len_used_in_alloc
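The ptr/len rules above can be read as a single predicate. This is a minimal Python sketch under stated assumptions, not the sanitizer's actual code: `PointerInfo` is a hypothetical stand-in for the two fields the text reads from hsa_amd_pointer_info, `Allocation` is a hypothetical record of the pointer and length the caller originally obtained, and the two ASan-mode conditions are assumed to be alternatives (either one is accepted).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PointerInfo:
    """Hypothetical view of the fields read from hsa_amd_pointer_info."""
    agent_base_address: int
    size_in_bytes: int


@dataclass(frozen=True)
class Allocation:
    """Hypothetical record of the pointer and length the caller allocated."""
    ptr: int
    length: int


def ipc_args_valid(ptr: int, length: int, alloc: Allocation,
                   pinfo: PointerInfo, asan: bool) -> bool:
    if not asan:
        # Non-ASan: must be the actual base address and the original length.
        return ptr == pinfo.agent_base_address and length == alloc.length
    # ASan (assumed "either/or"): accept the whole region reported by the
    # external pointer-info query, or exactly what ASan returned to the caller.
    whole_region = (ptr == pinfo.agent_base_address
                    and length == pinfo.size_in_bytes)
    user_view = ptr == alloc.ptr and length == alloc.length
    return whole_region or user_view
```

For example, with a hypothetical allocation `Allocation(ptr=0x1000, length=256)` whose pointer info under ASan reports a larger region starting at `0x0F00`, both `(0x0F00, 512)` and `(0x1000, 256)` pass, while a mixed pair such as `(0x1000, 512)` is rejected.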
Merge remote-tracking branch 'external-mirror/promotion/amd-mainline/2025.11.24' into HEAD
- added secret variables to configure SHAs for spirv and hipify
- added secret variable to configure the TheRock SHA
- added fix for setting up PR_LABEL used in the CI framework
- removed hardcoded path /__w/llvm-project/llvm-project
…cs" (llvm#174224)

Reverts llvm#131759

Seeing regressions in PyTorch UT: 8 test cases failed in the "test_ops"
test suite.
@ronlieb ronlieb requested review from SyamaAmd and lajagapp January 6, 2026 17:51
@ronlieb ronlieb closed this Jan 6, 2026
