[experimental][FP16] Add native __half support for sum_functor #1655

mhalk · 2024-10-29T20:23:24Z

During the CMake step, set the environment variable `PYTORCH_REDUCESUM_ENABLE_NATIVE_HALF=1`.
This enables experimental FP16 support for `sum_functor`:
`operator+` will use `__half` types directly
instead of applying `static_cast<float>` to its arguments.

Note: only additions are affected by these changes.
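
For illustration only, the numerical effect of the two addition paths can be emulated on the host. The sketch below is not code from this PR. It assumes the native path rounds every partial sum to FP16, while the default path adds in `float` and rounds once at the end. The helper names (`to_half`, `to_float32`, `sum_native_half`, `sum_via_float`) are hypothetical, and Python's `struct` format codes `e` and `f` stand in for `__half` and `float`.

```python
import struct


def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_float32(x: float) -> float:
    """Round x to the nearest IEEE 754 binary32 (float) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def sum_native_half(values):
    """Emulates adding __half operands directly: each partial sum is FP16."""
    acc = to_half(0.0)
    for v in values:
        acc = to_half(acc + to_half(v))
    return acc


def sum_via_float(values):
    """Emulates static_cast<float> on both operands: partial sums are float."""
    acc = to_float32(0.0)
    for v in values:
        acc = to_float32(acc + to_float32(to_half(v)))
    return to_half(acc)  # the result is stored back as FP16


ones = [1.0] * 4096
print(sum_native_half(ones))  # 2048.0: 2049 rounds back to 2048 in FP16, so the sum stalls
print(sum_via_float(ones))    # 4096.0
```

Under these assumptions, short reductions of small values give the same result on both paths. The difference appears once partial sums reach 2048, where the spacing between adjacent FP16 values grows to 2.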

…t date This logic was present until release/2.2 (https://github.com/ROCm/pytorch/blob/4cd7f3ac9078ed449b8ae096887125f9b3b30659/.circleci/scripts/binary_populate_env.sh#L14) but was removed in release/2.3

…MENTS since we plan to use builder repo to set it for older and newer branches. Otherwise we end up with duplicate triton dependency specification eg. PYTORCH_EXTRA_INSTALL_REQUIREMENTS='pytorch-triton-rocm==2.3.0+rocm6.1.0.4804a0dd4a | pytorch-triton-rocm==2.3.0+rocm6.1.0.4804a0dd4a'

… sync (ROCm#1455) (ROCm#1472)

* [SWDEV-469514] hipGraphExecDestroy requires an explicit sync

There is a new HIP feature where hipGraph memory is not freed as soon as hipGraphExecDestroy is called. This is to support async work on the GPU. See this for more details: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#cuda-user-objects

We noticed this issue when an allreduce op inside a hipGraph hung. Essentially, ncclCommAbort was waiting for all GPU activity to finish. However, since hipGraph memory was technically still in use, we had an infinite hang. So, I added an extra hipDeviceSynchronize in CUDAGraph's destructor to ensure that memory is freed and got the test_allreduce_in_cudagraph UT to pass.

However, when I ran this on a CUDA machine, I noticed that it did not require this extra sync in order to successfully run the UT. It seems that CUDA was calling cudaGraphInstantiateWithFlags with cudaGraphInstantiateFlagAutoFreeOnLaunch, which aggressively frees memory after graph launch. There is support for this API in our ROCm stack, but we were missing CUDA-to-HIP mappings in PyTorch. So, I brought them in and added the necessary conditions to call this API in the HIP case also.

* Update comments
* Use USE_ROCM in keeping with convention
* Use USE_ROCM to match convention

---------

Co-authored-by: Jithun Nair <[email protected]>
(cherry picked from commit e752b4f)

…tree (ROCm#1494)

* Skip test__int_mm in 6.0 (cherry picked from commit bf4c478)
* [release/2.3] fix test_vmapvjpvjp and skip test_profiler_experimental_tree (ROCm#1460)
* fix test_vmapvjpvjp and skip test_profiler_experimental_tree (ROCm#1444) (cherry picked from commit 7e96391)
* remove trailing spaces

---------

Co-authored-by: Ramana Cherukuri <[email protected]> (cherry picked from commit 0766b9c)

* Reformat test_float8_basics for current rocm support (ROCm#1415) (cherry picked from commit cb0e9ad)

---------

Co-authored-by: Jack Taylor <[email protected]>
Co-authored-by: Andres Lugo <[email protected]>

[release/2.4] Cherry-picks from release/2.3

ROCm#1525

[MPS][TYPE_PROMOTION] Fix Clamp (pytorch#130226)

Summary:
1. Fixed pytorch#130201 by adding type promotion.
2. Added proper tests.
3. Found torch's type promotion is different from numpy as follows:

```python
import torch
import numpy as np
np.clip(np.array([1], dtype=np.float32), np.array([1], dtype=np.int32), None).dtype  # dtype('float64')
torch.clamp(torch.tensor([1], dtype=torch.float32), torch.tensor([1], dtype=torch.int32)).dtype  # torch.float32
```

~Not sure the proper way to handle it, it causes numpy ref tests to fail.~
Reason here, so think I'm gonna xfail it: https://github.com/pytorch/pytorch/blob/3c1cf03fde145bdbe1f5ffb81765d076c10b4c04/test/test_ops.py#L260-L264

Pull Request resolved: pytorch#130226
Approved by: https://github.com/malfet

(cherry picked from commit 99967e1)

Co-authored-by: Li-Huai (Allan) Lin <[email protected]>

@atalman

[Doc] update guide install mkl-static from conda to pip (pytorch#130026)

The conda intel channel is not available now. Use `pip` install instead of `conda`.

`Windows` and `Linux` are available. Binary list: https://pypi.org/project/mkl-static/#files
`MacOS` is available for an old version: https://pypi.org/project/mkl-static/2021.3.0/#files

TODO:
1. cherry-pick to `release/2.4` branch, @atalman.
2. fix it also in `release/2.3` branch: pytorch#131853

Pull Request resolved: pytorch#130026
Approved by: https://github.com/jgong5, https://github.com/atalman

(cherry picked from commit 484852c)

Co-authored-by: Xu Han <[email protected]>

pytorch#133346)

fix for launching kernel invalid config error when calling embedding … (pytorch#130994)

…with large index

Fixes pytorch#130806

When an output size of 2147483648 (=131072*16384) is expected in the above issue, it threw the following error:

RuntimeError: HIP error: invalid configuration argument

What happened was that the second parameter passed to hipLaunchKernel was an invalid {2147483648,1,1}.

Found two issues in Indexing.cu:
1: ptrdiff_t was used but it is signed int, so outTotalSize >= 2147483648 can cause overflow when doing [this](https://github.com/pytorch/pytorch/blame/39493aa93419532957e6e5ee97cae842b53b8b59/aten/src/ATen/native/cuda/Indexing.cu#L1367):
2: On ROCm, std::min -> ::min did not work as expected when outTotalSize>=2147483648

As a result, 2147483648 was sent to hipLaunchKernel, which the GPU does not support since this number specifies the number of threads per block. The original code intended to set 128 threads per block. This is debatable, as the perf would not be good on the latest powerful GPUs (maybe a TODO item to update for perf?), but at least it would not cause the `invalid configuration argument` error.

[Test]
Run the same code snippet in the [issue](pytorch#130806), and print the output, its dim and numel(), which looks like below now:

```
output=tensor([[ 0.4044, -0.0244, -0.6865, ..., -0.7800, 0.1175, 1.6726],
        [-1.0866, -0.1609, 0.3538, ..., 1.9105, 0.7882, 1.1583],
        [-2.2079, 0.3736, 0.3610, ..., -0.2658, -0.0459, 1.3077],
        ...,
        [ 0.8753, -0.7482, -0.1978, ..., 0.9016, 1.1501, -0.5178],
        [-1.5845, -0.6277, 1.4520, ..., 0.5733, -2.1198, -0.0915],
        [-0.6310, -1.0239, -0.1910, ..., 0.4309, 0.1630, 0.3239]], device='cuda:0'), dim=2, numel=2147483648
```

Added a large tensor unit test too.

```
/pytorch# pytest test/nn/test_embedding.py -k test_large_tensors
==================== test session starts ====================
platform linux -- Python 3.9.19, pytest-7.3.2, pluggy-1.4.0
rootdir: /dockerx/development/pytorch
configfile: pytest.ini
plugins: flakefinder-1.1.0, rerunfailures-14.0, xdist-3.3.1, xdoctest-1.1.0, cpp-2.3.0, hypothesis-5.35.1
collected 288 items / 287 deselected / 1 selected
Running 1 items in this shard

test/nn/test_embedding.py . [100%]

==================== 1 passed, 287 deselected in 3.16s ====================
```

Pull Request resolved: pytorch#130994
Approved by: https://github.com/jeffdaily, https://github.com/xw285cornell

(cherry picked from commit 637ab85)

Co-authored-by: hongxyan <[email protected]>

rocm-mici · 2024-11-14T02:42:52Z

Jenkins build for 82d9a8816220c4582c87e348a7acb66bda51682e commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

rocm-mici · 2024-11-14T05:38:42Z

Jenkins build for 82d9a8816220c4582c87e348a7acb66bda51682e commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

rocm-mici · 2024-11-15T06:16:34Z

Jenkins build for 82d9a8816220c4582c87e348a7acb66bda51682e commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

Detected error during Pytorch building:

  580 | #define DEPRECATED(msg) __attribute__ ((deprecated(msg)))
      |                                         ^
1 warning generated when compiling for gfx908.
[7950/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_shifted_chebyshev_polynomial_t.hip.o
[7951/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention_backward.hip.o
FAILED: caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention_backward.hip.o /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention_backward.hip.o 
cd /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip && /opt/conda/envs/py_3.10/bin/cmake -E make_directory /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/. && /opt/conda/envs/py_3.10/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=RELEASE -D generated_file:STRING=/var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/./torch_hip_generated_attention_backward.hip.o -P /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention_backward.hip.o.cmake
In file included from /var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/attention_backward.hip:49:
/var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/aotriton_adapter.h:120:10: error: no matching constructor for initialization of 'aotriton::TensorView<0>'
  120 |   return aotriton::TensorView<0>(reinterpret_cast<intptr_t>(q.data_ptr()),
      |          ^                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

rocm-mici · 2024-11-15T06:27:00Z

Jenkins build for 82d9a8816220c4582c87e348a7acb66bda51682e commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

rocm-mici · 2024-11-15T18:55:44Z

Jenkins build for 82d9a8816220c4582c87e348a7acb66bda51682e commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

Detected error during Pytorch building:

[7978/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_shifted_chebyshev_polynomial_t.hip.o
[7979/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_shifted_chebyshev_polynomial_u.hip.o
[7980/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_shifted_chebyshev_polynomial_v.hip.o
[7981/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_shifted_chebyshev_polynomial_w.hip.o
[7982/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention_backward.hip.o
FAILED: caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention_backward.hip.o /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention_backward.hip.o 
cd /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip && /opt/conda/envs/py_3.10/bin/cmake -E make_directory /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/. && /opt/conda/envs/py_3.10/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=RELEASE -D generated_file:STRING=/var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/./torch_hip_generated_attention_backward.hip.o -P /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention_backward.hip.o.cmake
In file included from /var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/attention_backward.hip:49:
/var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/aotriton_adapter.h:120:10: error: no matching constructor for initialization of 'aotriton::TensorView<0>'
  120 |   return aotriton::TensorView<0>(reinterpret_cast<intptr_t>(q.data_ptr()),
      |          ^                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

rocm-mici · 2024-11-19T20:26:10Z

Jenkins build for 82d9a8816220c4582c87e348a7acb66bda51682e commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

okakarpa · 2024-11-21T17:43:13Z

Jenkins build for 82d9a8816220c4582c87e348a7acb66bda51682e commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

okakarpa · 2024-11-22T18:39:45Z

Jenkins build for 82d9a8816220c4582c87e348a7acb66bda51682e commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

Detected error during Pytorch building:

[7944/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_shifted_chebyshev_polynomial_t.hip.o
[7945/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_fused_adam_impl.hip.o
[7946/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_fused_adam_amsgrad_impl.hip.o
[7947/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_Loss.hip.o
[7948/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/torch_hip_generated_flash_api.hip.o
FAILED: caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/torch_hip_generated_flash_api.hip.o /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/torch_hip_generated_flash_api.hip.o 
cd /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn && /opt/conda/envs/py_3.10/bin/cmake -E make_directory /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/. && /opt/conda/envs/py_3.10/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=RELEASE -D generated_file:STRING=/var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/./torch_hip_generated_flash_api.hip.o -P /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/torch_hip_generated_flash_api.hip.o.cmake
In file included from /var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/flash_attn/flash_api.hip:57:
/var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/aotriton_adapter.h:120:10: error: no matching constructor for initialization of 'aotriton::TensorView<0>'
  120 |   return aotriton::TensorView<0>(reinterpret_cast<intptr_t>(q.data_ptr()),
      |          ^                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

doru1004 · 2024-11-22T19:58:08Z

aten/src/ATen/native/cuda/ReduceSumProdKernel.cu

@@ -172,7 +172,7 @@ template <
    typename GeneralDispatcher>
 static void reduce_dispatch(TensorIterator& iter, GeneralDispatcher op) {
  if (iter.dtype() == kHalf) {
-    return OpFunctor<at::Half, float>{}(iter);


You need to ifdef this also.

Good catch -- Thank you!

okakarpa · 2024-11-23T09:43:27Z

Jenkins build for 82d9a8816220c4582c87e348a7acb66bda51682e commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

okakarpa · 2024-11-26T05:39:20Z

Jenkins build for 82d9a8816220c4582c87e348a7acb66bda51682e commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

okakarpa · 2024-11-26T06:30:51Z

Jenkins build for 82d9a8816220c4582c87e348a7acb66bda51682e commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

Detected error during Pytorch building:

Warning: Unused direct dependencies:
	/var/lib/jenkins/pytorch/build/lib/libshm.so
	/opt/rocm/lib/libhsa-runtime64.so.1
	/lib/x86_64-linux-gnu/libm.so.6
[7981/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/torch_hip_generated_flash_api.hip.o
FAILED: caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/torch_hip_generated_flash_api.hip.o /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/torch_hip_generated_flash_api.hip.o 
cd /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn && /opt/conda/envs/py_3.10/bin/cmake -E make_directory /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/. && /opt/conda/envs/py_3.10/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=RELEASE -D generated_file:STRING=/var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/./torch_hip_generated_flash_api.hip.o -P /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/torch_hip_generated_flash_api.hip.o.cmake
In file included from /var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/flash_attn/flash_api.hip:57:
/var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/aotriton_adapter.h:120:10: error: no matching constructor for initialization of 'aotriton::TensorView<0>'
  120 |   return aotriton::TensorView<0>(reinterpret_cast<intptr_t>(q.data_ptr()),
      |          ^                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

okakarpa · 2024-11-27T17:32:05Z

Jenkins build for 82d9a8816220c4582c87e348a7acb66bda51682e commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

Detected error during Pytorch building:

[7941/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_fused_adam_amsgrad_impl.hip.o
[7942/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_scaled_modified_bessel_k0.hip.o
[7943/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_modified_bessel_k1.hip.o
[7944/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_fused_adam_impl.hip.o
[7945/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/torch_hip_generated_flash_api.hip.o
FAILED: caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/torch_hip_generated_flash_api.hip.o /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/torch_hip_generated_flash_api.hip.o 
cd /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn && /opt/conda/envs/py_3.10/bin/cmake -E make_directory /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/. && /opt/conda/envs/py_3.10/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=RELEASE -D generated_file:STRING=/var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/./torch_hip_generated_flash_api.hip.o -P /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/flash_attn/torch_hip_generated_flash_api.hip.o.cmake
In file included from /var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/flash_attn/flash_api.hip:57:
/var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/aotriton_adapter.h:120:10: error: no matching constructor for initialization of 'aotriton::TensorView<0>'
  120 |   return aotriton::TensorView<0>(reinterpret_cast<intptr_t>(q.data_ptr()),
      |          ^                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

During the CMake step, set the environment variable `PYTORCH_REDUCESUM_ENABLE_NATIVE_HALF=1`. This enables experimental FP16 support for `sum_functor`: `operator+` will use `__half` types directly instead of applying `static_cast<float>` to its arguments. Note: only additions are affected by these changes.

mhalk · 2024-11-29T19:22:22Z

Applied feedback from @doru1004 and changed the environment variable name to be more expressive.
It is now `PYTORCH_REDUCESUM_ENABLE_NATIVE_HALF` (was: `PYTORCH_ENABLE_HALF`).
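
As a rough sketch of how a build-time switch like this typically works (an assumption, not the CMake code in this PR): the variable is read once while the build is configured and, if set to `1`, turned into a preprocessor definition that `#ifdef` guards in the sources can test. The function name `native_half_definitions` and the macro name it emits are hypothetical.

```python
import os

ENV_VAR = "PYTORCH_REDUCESUM_ENABLE_NATIVE_HALF"  # name taken from this PR


def native_half_definitions(env=None):
    """Return compiler definitions for the experimental FP16 sum path.

    Hypothetical helper: the macro name below is an assumption made for
    illustration and may differ from what the PR actually defines.
    """
    env = os.environ if env is None else env
    if env.get(ENV_VAR, "0") == "1":
        return ["-DPYTORCH_REDUCESUM_ENABLE_NATIVE_HALF"]
    return []  # default build: keep the static_cast<float> path


print(native_half_definitions({ENV_VAR: "1"}))  # flag present
print(native_half_definitions({}))              # flag absent
```

In this model the value is consumed at configure time, so changing the variable afterwards would have no effect until the build is reconfigured.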

okakarpa · 2024-11-29T20:37:19Z

Jenkins build for 2bfb4fcaf342a40c5df330fbb47fd7ae00647389 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

Detected error during Pytorch building:

/var/lib/jenkins/pytorch/aten/src/ATen/native/hip/DistributionTemplates.h:182:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
/var/lib/jenkins/pytorch/aten/src/ATen/native/hip/DistributionTemplates.h:182:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
/var/lib/jenkins/pytorch/aten/src/ATen/native/hip/DistributionTemplates.h:182:17: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
12 warnings generated when compiling for gfx942.
[7983/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention.hip.o
FAILED: caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention.hip.o /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention.hip.o 
cd /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip && /opt/conda/envs/py_3.10/bin/cmake -E make_directory /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/. && /opt/conda/envs/py_3.10/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=RELEASE -D generated_file:STRING=/var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/./torch_hip_generated_attention.hip.o -P /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention.hip.o.cmake
In file included from /var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/attention.hip:84:
/var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/aotriton_adapter.h:120:10: error: no matching constructor for initialization of 'aotriton::TensorView<0>'
  120 |   return aotriton::TensorView<0>(reinterpret_cast<intptr_t>(q.data_ptr()),
      |          ^                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

rocm-mici · 2024-12-03T18:20:01Z

Jenkins build for 2bfb4fcaf342a40c5df330fbb47fd7ae00647389 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

Detected error during Pytorch building:

  580 | #define DEPRECATED(msg) __attribute__ ((deprecated(msg)))
      |                                         ^
1 warning generated when compiling for gfx908.
[7944/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_modified_bessel_k1.hip.o
[7945/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention_backward.hip.o
FAILED: caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention_backward.hip.o /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention_backward.hip.o 
cd /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip && /opt/conda/envs/py_3.10/bin/cmake -E make_directory /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/. && /opt/conda/envs/py_3.10/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=RELEASE -D generated_file:STRING=/var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/./torch_hip_generated_attention_backward.hip.o -P /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention_backward.hip.o.cmake
In file included from /var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/attention_backward.hip:49:
/var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/aotriton_adapter.h:120:10: error: no matching constructor for initialization of 'aotriton::TensorView<0>'
  120 |   return aotriton::TensorView<0>(reinterpret_cast<intptr_t>(q.data_ptr()),
      |          ^                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

rocm-mici · 2024-12-04T06:02:19Z

Jenkins build for 2bfb4fcaf342a40c5df330fbb47fd7ae00647389 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

Detected error during Pytorch building:

[7942/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_modified_bessel_k1.hip.o
[7943/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_shifted_chebyshev_polynomial_t.hip.o
[7944/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_modified_bessel_k0.hip.o
[7945/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_fused_adam_impl.hip.o
[7946/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention_backward.hip.o
FAILED: caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention_backward.hip.o /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention_backward.hip.o 
cd /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip && /opt/conda/envs/py_3.10/bin/cmake -E make_directory /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/. && /opt/conda/envs/py_3.10/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=RELEASE -D generated_file:STRING=/var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/./torch_hip_generated_attention_backward.hip.o -P /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention_backward.hip.o.cmake
In file included from /var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/attention_backward.hip:49:
/var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/aotriton_adapter.h:120:10: error: no matching constructor for initialization of 'aotriton::TensorView<0>'
  120 |   return aotriton::TensorView<0>(reinterpret_cast<intptr_t>(q.data_ptr()),
      |          ^                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

rocm-repo-management-api · 2024-12-04T19:01:40Z

Jenkins build for 2bfb4fcaf342a40c5df330fbb47fd7ae00647389 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

Detected error during Pytorch building:

	/var/lib/jenkins/pytorch/build/lib/libshm.so
	/opt/rocm/lib/libhsa-runtime64.so.1
	/lib/x86_64-linux-gnu/libm.so.6
[7982/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_UnaryGeometricAcoshKernel.hip.o
[7983/8635] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention_backward.hip.o
FAILED: caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention_backward.hip.o /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention_backward.hip.o 
cd /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip && /opt/conda/envs/py_3.10/bin/cmake -E make_directory /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/. && /opt/conda/envs/py_3.10/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=RELEASE -D generated_file:STRING=/var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/./torch_hip_generated_attention_backward.hip.o -P /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/transformers/hip/torch_hip_generated_attention_backward.hip.o.cmake
In file included from /var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/attention_backward.hip:49:
/var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/aotriton_adapter.h:120:10: error: no matching constructor for initialization of 'aotriton::TensorView<0>'
  120 |   return aotriton::TensorView<0>(reinterpret_cast<intptr_t>(q.data_ptr()),
      |          ^                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

rocm-repo-management-api · 2024-12-06T01:16:59Z

Jenkins build for 2bfb4fcaf342a40c5df330fbb47fd7ae00647389 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

rocm-repo-management-api · 2024-12-06T15:04:57Z

Jenkins build for 2bfb4fcaf342a40c5df330fbb47fd7ae00647389 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

rocm-repo-management-api · 2024-12-09T06:20:58Z

Jenkins build for 2bfb4fcaf342a40c5df330fbb47fd7ae00647389 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

rocm-repo-management-api · 2024-12-11T09:57:15Z

Jenkins build for 2bfb4fcaf342a40c5df330fbb47fd7ae00647389 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

rocm-repo-management-api · 2024-12-11T22:21:43Z

Jenkins build for 2bfb4fcaf342a40c5df330fbb47fd7ae00647389 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

rocm-repo-management-api · 2025-03-17T22:35:48Z

Jenkins build for 2bfb4fcaf342a40c5df330fbb47fd7ae00647389 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

rocm-repo-management-api · 2025-03-17T22:47:26Z

Jenkins build for 2bfb4fcaf342a40c5df330fbb47fd7ae00647389 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

pruthvistony and others added 30 commits August 1, 2024 20:13

Updates to build on Jammy

16db906

- Fortran package installation moved after gcc
- Update libtinfo search code in cmake1
- Install libstdc++.so

Set ROCM_PATH ENV in Centos docker container

ca7fcb9

[UB22.04] Updates to support latest scipy

1e13b02

Updated condition for libstc++ for Jammy

7132452

Fix ROCm installation failure in Ubuntu22.04 (ROCm#1285)

e942f9c

Co-authored-by: Wang, Yanyao

Build required version of libpng for CentOS7

f548074

Changes to support docker v23

fbfb835

Reversed the condition as required

Skipped certain distributed tests (ROCm#1383)

a73847b

temporarily ignore certificate check for Miniconda

7727031

(cherry picked from commit 9848db1) (cherry picked from commit ae01701)

[release/2.1] Skip certificate check for CentOS7 since certificate expired (ROCm#1399)

6e9aa77

* Skip certificate check only for CentOS7 since certificate expired
* Naming

release/2.2 triton commit pin for rocm6.1 conditionalisation (ROCm#1369)

518a0a7

* Triton build conditionalized on ROCM_VERSION (cherry picked from commit 1a7e1fa)
* Update pinned commit for rocm6.1 conditionalisation

Co-authored-by: Pruthvi Madugundu

[release/2.3] Include ROCm patch version unconditionally in triton version (ROCm#1410)

c2c73d7

* Include ROCm patch version in triton version
* Always include patch version

(cherry picked from commit 9692570)

increase tensor size to force out of memory exception on MI300X (ROCm#1450)

5366b62

(cherry picked from commit 5783935)

Fix lxml requirement (ROCm#1454)

f66244b

[release/2.2] cudagraph explicit sync only after capture_begin() (ROCm#1492) (ROCm#1510)

2827617

* cudagraph explicit sync only after capture_begin
* use 'capture_dev_=-1' as not initialized value
* use named constant instead of magic '-1' value

(cherry picked from commit eb433b9)

SWDEV-472761: limit sympy version to 1.12.1 or lower (ROCm#1482)

f57dc6d

(cherry picked from commit 1feb1a8)

Co-authored-by: Jithun Nair

Update requirements.txt (ROCm#1498)

bc99aff

Use Triton commit with TRITON_LLVM_TARBALL_PATH changes

5144034

related_commits for release/2.4

f81a9ac

Merge pull request ROCm#1525 from ROCm/IFU_rel_2.4

07ce684

[release/2.4] Cherry-picks from release/2.3

Update lib paths for Almalinux/Manylinux2_28 and remove ROCm<6.0 support

dbbe001

Undo inadvertent introduction of scripts/amd/setup_rocm_libs.sh call by ROCm#1525

af2171b

Use triton with TRITON_BUILD_PROTON fix

b35eecf

doru1004 reviewed Nov 22, 2024


mhalk force-pushed the mhalk/native-half-support branch from 82d9a88 to 2bfb4fc on November 29, 2024 19:20

jithunnair-amd force-pushed the release/2.4 branch from 9f6c415 to 2f00d85 on March 17, 2025 22:17

jithunnair-amd requested review from jithunnair-amd and jataylo as code owners March 17, 2025 22:17


mhalk commented Oct 29, 2024

rocm-mici commented Nov 14, 2024

rocm-mici commented Nov 14, 2024

rocm-mici commented Nov 15, 2024

rocm-mici commented Nov 15, 2024

rocm-mici commented Nov 15, 2024

rocm-mici commented Nov 19, 2024

okakarpa commented Nov 21, 2024

okakarpa commented Nov 22, 2024

doru1004 Nov 22, 2024

mhalk Nov 22, 2024

okakarpa commented Nov 23, 2024

okakarpa commented Nov 26, 2024

okakarpa commented Nov 26, 2024

okakarpa commented Nov 27, 2024

mhalk commented Nov 29, 2024

okakarpa commented Nov 29, 2024

rocm-mici commented Dec 3, 2024

rocm-mici commented Dec 4, 2024

rocm-repo-management-api bot commented Dec 4, 2024

rocm-repo-management-api bot commented Dec 6, 2024

rocm-repo-management-api bot commented Dec 6, 2024

rocm-repo-management-api bot commented Dec 9, 2024

rocm-repo-management-api bot commented Dec 11, 2024

rocm-repo-management-api bot commented Dec 11, 2024

rocm-repo-management-api bot commented Mar 17, 2025

rocm-repo-management-api bot commented Mar 17, 2025
