Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Legion Build errors with CUDA 12.5+ (or CCCL 2.4+) #1775

Open
Jacobfaib opened this issue Oct 17, 2024 · 8 comments
Open

Legion Build errors with CUDA 12.5+ (or CCCL 2.4+) #1775

Jacobfaib opened this issue Oct 17, 2024 · 8 comments

Comments

@Jacobfaib
Copy link
Contributor

Jacobfaib commented Oct 17, 2024

Legion defines its own implementation of __half, and cuda::std::complex<__half>. Relatively recent versions of CUDA and libcuxx now also define these operators, both on host and device. The following set of patches allow Legion to be compiled with CUDA 12.2 - 12.5 (possibly also lower versions, but not tested).

They should not be applied as-is, however, as they don't address several other cases:

  1. Lower versions of CUDA (not proven they don't work, just didn't test lower than 12.2).
  2. HIP.
  3. Varying versions of both CUDA and CCCL.
  4. Building Legion on a system where __half should be defined, but including Legion headers later where __half is already defined. Specifically, the CMakeLists.txt check should somehow be moved to header files and be done by checking versions. The specific error case is: a Legion package is built on a cluster using CUDA 12.2. A user installs that package, but they themselves have CUDA 12.5. Legion isn't incompatible with 12.5, so this is OK (and package managers won't downgrade the version of CUDA). But now LEGION_HAVE_CUDA_HOST_HALF is undefined, when it really should be true.
runtime/CMakeLists.txt
diff --git a/runtime/CMakeLists.txt b/runtime/CMakeLists.txt
index 70de60b25..2e5917444 100644
--- a/runtime/CMakeLists.txt
+++ b/runtime/CMakeLists.txt
@@ -901,6 +901,35 @@ if(Legion_USE_CUDA)
         message(FATAL_ERROR "Legion_REDOP_COMPLEX requires CUDA >= 11.7 (found ${CUDAToolkit_VERSION}) or libcudacxx >\
= 1.8.0")
       endif()
     endif()
+    include(CheckSourceCompiles)
+    include(CMakePushCheckState)
+
+    cmake_push_check_state(RESET)
+    set(CMAKE_REQUIRED_LIBRARIES CUDA::toolkit)
+    check_source_compiles(CUDA
+      [=[
+  #include <cuda_fp16.h>
+
+  int main()
+  {
+    __half h = __half(1);
+
+    h *= h;
+    h += h;
+    h /= h;
+    h -= h;
+    h = __half(1);
+    h = h + h - h * h / h;
+    return h == h ? 0 : 1;
+  }
+  ]=] LEGION_HAVE_CUDA_HOST_HALF)
+    cmake_pop_check_state()
+
+    if(LEGION_HAVE_CUDA_HOST_HALF)
+      target_compile_definitions(LegionRuntime PUBLIC LEGION_HAVE_CUDA_HOST_HALF=1)
+      target_compile_definitions(RealmRuntime PUBLIC LEGION_HAVE_CUDA_HOST_HALF=1)
+    endif()
+
     # complex reduction ops bring in a public dependency on cuda headers
     target_link_libraries(LegionRuntime PUBLIC CUDA::toolkit)
   endif()
runtime/mathtypes/complex.h
diff --git a/runtime/mathtypes/complex.h b/runtime/mathtypes/complex.h
index 62dd69611..82a84932c 100644
--- a/runtime/mathtypes/complex.h
+++ b/runtime/mathtypes/complex.h
@@ -38,7 +38,11 @@
 // cuda 12 (https://github.com/StanfordLegion/legion/issues/1469#)
 // TODO: remove it once the bug is fixed in the future release of cuda.
 #include <cuda_runtime.h>
 #include <cuda/std/complex>
+#include <cuda/version> // CCCL_MAJOR_VERSION
+#if CCCL_MAJOR_VERSION > 2 || (CCCL_MAJOR_VERSION == 2 && CCCL_MINOR_VERSION >= 4)
+#define LEGION_HAVE_CUDA_COMPLEX_HALF 1
+#endif
 #define COMPLEX_NAMESPACE cuda::std
 #endif
 #elif defined(LEGION_USE_HIP) && defined(__HIP_PLATFORM_AMD__)
@@ -93,7 +98,7 @@ inline bool operator>=(const complex<T>& c1, const complex<T>& c2) {

 } // namespace COMPLEX_NAMESPACE

-#ifdef COMPLEX_HALF
+#if defined(COMPLEX_HALF) && !defined(LEGION_HAVE_CUDA_COMPLEX_HALF)
 template<>
 class COMPLEX_NAMESPACE::complex<__half> {
 public:
runtime/mathtypes/half.h
diff --git a/runtime/mathtypes/half.h b/runtime/mathtypes/half.h
index dce3249c7..d285932d0 100644
--- a/runtime/mathtypes/half.h
+++ b/runtime/mathtypes/half.h
@@ -16,6 +16,8 @@
 #ifndef __HALF_H__
 #define __HALF_H__

+#include <legion_defines.h>
+
 #include <stdint.h>
 #include <string.h> // memcpy
 #include <cmath>
@@ -138,202 +140,26 @@ inline float __convert_halfint_to_float(uint16_t __x)
 #if defined(__CUDA_FP16_H__)
 #error "This header must be included before cuda_fp16.h"
 #endif
-#define __CUDA_NO_HALF_OPERATORS__
 #include <cuda_fp16.h>
 #elif defined(LEGION_USE_HIP)
 #ifdef __HIP_PLATFORM_NVCC__
 #if defined(__CUDA_FP16_H__)
 #error "This header must be included before cuda_fp16.h"
 #endif
-#define __CUDA_NO_HALF_OPERATORS__
 #include <cuda_fp16.h>
 #else
 #if defined(HIP_INCLUDE_HIP_HIP_FP16_H)
 #error "This header must be included before hip_fp16.h"
 #endif
-#define __HIP_NO_HALF_OPERATORS__
 #include <hip/hip_fp16.h>
 #endif
 #endif

-__CUDA_HD__
-inline __half operator-(const __half &one)
-{
-#ifdef __CUDA_ARCH__
-#if __CUDA_ARCH__ >= 530 && __CUDACC_VER_MAJOR__ >= 8
-  return __hneg(one);
-#else
-  return __float2half(-__half2float(one));
+#if __has_include(<cuda/version>)
+#include <cuda/version> // CCCL_MAJOR_VERSION
 #endif
-#elif defined(__HIP_DEVICE_COMPILE__)
-  return __hneg(one);
-#else
-  return __half(-(float(one)));
-#endif
-}
-
-__CUDA_HD__
-inline __half operator+(const __half &one, const __half &two)
-{
-#ifdef __CUDA_ARCH__
-#if __CUDA_ARCH__ >= 530 && __CUDACC_VER_MAJOR__ >= 8
-  return __hadd(one, two);
-#else
-  return __float2half(__half2float(one) + __half2float(two));
-#endif
-#elif defined(__HIP_DEVICE_COMPILE__)
-  return __hadd(one, two);
-#else
-  return __half(float(one) + float(two));
-#endif
-}
-
-__CUDA_HD__
-inline __half operator-(const __half &one, const __half &two)
-{
-#ifdef __CUDA_ARCH__
-#if __CUDA_ARCH__ >= 530 && __CUDACC_VER_MAJOR__ >= 8
-  return __hsub(one, two);
-#else
-  return __float2half(__half2float(one) - __half2float(two));
-#endif
-#elif defined(__HIP_DEVICE_COMPILE__)
-  return __hsub(one, two);
-#else
-  return __half(float(one) - float(two));
-#endif
-}
-
-__CUDA_HD__
-inline __half operator*(const __half &one, const __half &two)
-{
-#ifdef __CUDA_ARCH__
-#if __CUDA_ARCH__ >= 530 && __CUDACC_VER_MAJOR__ >= 8
-  return __hmul(one, two);
-#else
-  return __float2half(__half2float(one) * __half2float(two));
-#endif
-#elif defined(__HIP_DEVICE_COMPILE__)
-  return __hmul(one, two);
-#else
-  return __half(float(one) * float(two));
-#endif
-}
-
-__CUDA_HD__
-inline __half operator/(const __half &one, const __half &two)
-{
-#ifdef __CUDA_ARCH__
-#if __CUDA_ARCH__ >= 530 && __CUDACC_VER_MAJOR__ == 8
-  return hdiv(one, two);
-#elif __CUDA_ARCH__ >= 530 && __CUDACC_VER_MAJOR__ >= 9
-  return __hdiv(one, two);
-#else
-  return __float2half(__half2float(one) / __half2float(two));
-#endif
-#elif defined(__HIP_DEVICE_COMPILE__)
-  return __hdiv(one, two);
-#else
-  return __half(float(one) / float(two));
-#endif
-}
-
-__CUDA_HD__
-inline bool operator==(const __half &one, const __half &two)
-{
-#ifdef __CUDA_ARCH__
-#if __CUDA_ARCH__ >= 530 && __CUDACC_VER_MAJOR__ >= 8
-  return __heq(one, two);
-#else
-  return (__half2float(one) == __half2float(two));
-#endif
-#elif defined(__HIP_DEVICE_COMPILE__)
-  return __heq(one, two);
-#else
-  return (float(one) == float(two));
-#endif
-}
-
-__CUDA_HD__
-inline bool operator!=(const __half &one, const __half &two)
-{
-#ifdef __CUDA_ARCH__
-#if __CUDA_ARCH__ >= 530 && __CUDACC_VER_MAJOR__ >= 8
-  return __hne(one, two);
-#else
-  return (__half2float(one) != __half2float(two));
-#endif
-#elif defined(__HIP_DEVICE_COMPILE__)
-  return __hne(one, two);
-#else
-  return (float(one) != float(two));
-#endif
-}
-
-__CUDA_HD__
-inline bool operator<(const __half &one, const __half &two)
-{
-#ifdef __CUDA_ARCH__
-#if __CUDA_ARCH__ >= 530 && __CUDACC_VER_MAJOR__ >= 8
-  return __hlt(one, two);
-#else
-  return (__half2float(one) < __half2float(two));
-#endif
-#elif defined(__HIP_DEVICE_COMPILE__)
-  return __hlt(one, two);
-#else
-  return (float(one) < float(two));
-#endif
-}
-
-__CUDA_HD__
-inline bool operator<=(const __half &one, const __half &two)
-{
-#ifdef __CUDA_ARCH__
-#if __CUDA_ARCH__ >= 530 && __CUDACC_VER_MAJOR__ >= 8
-  return __hle(one, two);
-#else
-  return (__half2float(one) <= __half2float(two));
-#endif
-#elif defined(__HIP_DEVICE_COMPILE__)
-  return __hle(one, two);
-#else
-  return (float(one) <= float(two));
-#endif
-}
-
-__CUDA_HD__
-inline bool operator>(const __half &one, const __half &two)
-{
-#ifdef __CUDA_ARCH__
-#if __CUDA_ARCH__ >= 530 && __CUDACC_VER_MAJOR__ >= 8
-  return __hgt(one, two);
-#else
-  return (__half2float(one) > __half2float(two));
-#endif
-#elif defined(__HIP_DEVICE_COMPILE__)
-  return __hgt(one, two);
-#else
-  return (float(one) > float(two));
-#endif
-}
-
-__CUDA_HD__
-inline bool operator>=(const __half &one, const __half &two)
-{
-#ifdef __CUDA_ARCH__
-#if __CUDA_ARCH__ >= 530 && __CUDACC_VER_MAJOR__ >= 8
-  return __hge(one, two);
-#else
-  return (__half2float(one) >= __half2float(two));
-#endif
-#elif defined(__HIP_DEVICE_COMPILE__)
-  return __hge(one, two);
-#else
-  return (float(one) >= float(two));
-#endif
-}

+#if CCCL_MAJOR_VERSION < 2 || (CCCL_MAJOR_VERSION == 2 && CCCL_MINOR_VERSION < 4)
 __CUDA_HD__
 inline __half asin(const __half &one)
 {
@@ -561,9 +387,11 @@ inline __half acos(const __half &one)
   return (__float2half(std::acos(__half2float(one))));
 #endif
 }
+#endif

-#else // not __CUDACC__ or __HIPCC__
-
+#elif defined(LEGION_USE_CUDA) && defined(LEGION_HAVE_CUDA_HOST_HALF)
+#include <cuda_fp16.h>
+#elif !defined(__CUDA_FP16_TYPES_EXIST__)
 struct __half
 {
   uint16_t __x;
runtime/legion/legion_redop.cc
diff --git a/runtime/legion/legion_redop.cc b/runtime/legion/legion_redop.cc
index 9d4cf49da..75ef9c3b3 100644
--- a/runtime/legion/legion_redop.cc
+++ b/runtime/legion/legion_redop.cc
@@ -20,10 +20,10 @@
 namespace Legion {

 #ifdef LEGION_REDOP_HALF
-  /*static*/ const __half SumReduction<__half>::identity = __half(0, false/*raw*/);
-  /*static*/ const __half DiffReduction<__half>::identity = __half(0, false/*raw*/);
-  /*static*/ const __half ProdReduction<__half>::identity = __half(1, false/*raw*/);
-  /*static*/ const __half DivReduction<__half>::identity = __half(1, false/*raw*/);
+  /*static*/ const __half SumReduction<__half>::identity = __half(0);
+  /*static*/ const __half DiffReduction<__half>::identity = __half(0);
+  /*static*/ const __half ProdReduction<__half>::identity = __half(1);
+  /*static*/ const __half DivReduction<__half>::identity = __half(1);
   /*static*/ const __half MaxReduction<__half>::identity = __half(-2e10);
   /*static*/ const __half MinReduction<__half>::identity = __half(2e10);
 #endif
@@ -45,10 +45,10 @@ namespace Legion {

 #ifdef LEGION_REDOP_COMPLEX
 #ifdef LEGION_REDOP_HALF
-  /*static*/ const complex<__half> SumReduction<complex<__half> >::identity = complex<__half>(__half(0, false/*raw*/),\
 __half(0, false/*raw*/));
-  /*static*/ const complex<__half> DiffReduction<complex<__half> >::identity = complex<__half>(__half(0, false/*raw*/)\
, __half(0, false/*raw*/));
-  /*static*/ const complex<__half> ProdReduction<complex<__half> >::identity = complex<__half>(__half(1, false/*raw*/)\
, __half(0, false/*raw*/));
-  /*static*/ const complex<__half> DivReduction<complex<__half> >::identity = complex<__half>(__half(1, false/*raw*/),\
 __half(0, false/*raw*/));
+  /*static*/ const complex<__half> SumReduction<complex<__half> >::identity = complex<__half>(__half(0), __half(0));
+  /*static*/ const complex<__half> DiffReduction<complex<__half> >::identity = complex<__half>(__half(0), __half(0));
+  /*static*/ const complex<__half> ProdReduction<complex<__half> >::identity = complex<__half>(__half(1), __half(0));
+  /*static*/ const complex<__half> DivReduction<complex<__half> >::identity = complex<__half>(__half(1), __half(0));
 #endif
   /*static*/ const complex<float> SumReduction<complex<float> >::identity = complex<float>(0.f, 0.f);
   /*static*/ const complex<float> DiffReduction<complex<float> >::identity = complex<float>(0.f, 0.f);
@Jacobfaib
Copy link
Contributor Author

cc @manopapad @elliottslaughter

@elliottslaughter
Copy link
Contributor

In the past we've relied on version checks rather than test-compiling programs to determine what's available. In the interest of consistency with the existing build, is there a reason not to do that?

Items 1-3 would be good to check. I'm not sure about item 4. Is this a use case we really want to support? I'm not against it per se, but this is the sort of thing that can add dramatically more complexity to our build process so unless the solution can be introduced without substantially more complexity, it doesn't honestly seem worth it.

@Jacobfaib
Copy link
Contributor Author

In the interest of consistency with the existing build, is there a reason not to do that?

No, and that's why the changes above shouldn't be applied as-is :). The CMakeLists.txt diff should be changed to do the same thing using version checks inside the header. I just was not sure exactly which magic combination of versions (CUDA, CCCL, libcucxx, etc.) was required...

Is this a use case we really want to support?

I think for the most part Legion already does. It does not explicitly rely on the build version of CUDA runtime (until now at least), and Realm dlsym()s all the symbols it needs from the CUDA driver at runtime.

We definitely want to support a single Legion binary for multiple CUDA versions in cunumeric/legate. Otherwise we might have to ship a copy of liblegion.so compiled for each Major/Minor pairing of the CUDA toolkit...

@manopapad
Copy link
Contributor

AFAIK the only CUDA Runtime dependency comes from legion_redop, which is the reason we're planning to absorb that code in Legate and build Legion w/o it (which should allow a single build of Legion to work across CUDA versions).

@elliottslaughter
Copy link
Contributor

The other thing we should check is whether the proposed change would be ABI compatible across versions. If we're playing tricks with version checks in the headers, my concern is that it would be easy for non-ABI compatibility to slip in there without us noticing. Sure, in principle it should work, but it's just very easy to make mistakes.

Overall I don't have an objection to portable solutions, I just want to be mindful about the tradeoffs we're picking up while we do it.

@Jacobfaib
Copy link
Contributor Author

The following set of achieves the same effect without build-time checks

runtime/mathtypes/complex.h

diff --git a/runtime/mathtypes/complex.h b/runtime/mathtypes/complex.h
index 62dd69611..82a84932c 100644
--- a/runtime/mathtypes/complex.h
+++ b/runtime/mathtypes/complex.h
@@ -38,7 +38,11 @@
 // cuda 12 (https://github.com/StanfordLegion/legion/issues/1469#)
 // TODO: remove it once the bug is fixed in the future release of cuda.
 #include <cuda_runtime.h>
 #include <cuda/std/complex>
+#include <cuda/version> // CCCL_MAJOR_VERSION
+#if CCCL_MAJOR_VERSION > 2 || (CCCL_MAJOR_VERSION == 2 && CCCL_MINOR_VERSION >= 4)
+#define LEGION_HAVE_CUDA_COMPLEX_HALF 1
+#endif
 #define COMPLEX_NAMESPACE cuda::std
 #endif
 #elif defined(LEGION_USE_HIP) && defined(__HIP_PLATFORM_AMD__)
@@ -93,7 +98,7 @@ inline bool operator>=(const complex<T>& c1, const complex<T>& c2) {

 } // namespace COMPLEX_NAMESPACE

-#ifdef COMPLEX_HALF
+#if defined(COMPLEX_HALF) && !defined(LEGION_HAVE_CUDA_COMPLEX_HALF)
 template<>
 class COMPLEX_NAMESPACE::complex<__half> {
 public:

runtime/mathtypes/half.h

diff --git a/runtime/mathtypes/half.h b/runtime/mathtypes/half.h
index dce3249c7..43e4e4f5e 100644
--- a/runtime/mathtypes/half.h
+++ b/runtime/mathtypes/half.h
@@ -16,6 +16,8 @@
 #ifndef __HALF_H__
 #define __HALF_H__
 
+#include <legion_defines.h>
+
 #include <stdint.h>
 #include <string.h> // memcpy
 #include <cmath>
@@ -131,209 +133,169 @@ inline float __convert_halfint_to_float(uint16_t __x)
   return result;
 }
 
-#if defined (__CUDACC__) || defined (__HIPCC__)
-// The CUDA Toolkit only provides device versions for half precision operators,
-// so we have to provide custom implementations below.
-#if defined(LEGION_USE_CUDA)
-#if defined(__CUDA_FP16_H__)
-#error "This header must be included before cuda_fp16.h"
-#endif
-#define __CUDA_NO_HALF_OPERATORS__
+#ifdef LEGION_USE_CUDA
 #include <cuda_fp16.h>
+// Must include cuda/std/cmath here because CCCL does e.g. "using ::isinf", and we want it to
+// pick up the std::isinf, not our isinf, because otherwise it results in multiple
+// definitions. I don't know why this fixes it (obviously, there still will be multiple
+// definitions of isinf()), but hey, I don't make the rules.
+#include <cuda/std/cmath>
 #elif defined(LEGION_USE_HIP)
 #ifdef __HIP_PLATFORM_NVCC__
-#if defined(__CUDA_FP16_H__)
-#error "This header must be included before cuda_fp16.h"
-#endif
-#define __CUDA_NO_HALF_OPERATORS__
 #include <cuda_fp16.h>
+#include <cuda/std/cmath>
 #else
-#if defined(HIP_INCLUDE_HIP_HIP_FP16_H)
-#error "This header must be included before hip_fp16.h"
-#endif
-#define __HIP_NO_HALF_OPERATORS__
 #include <hip/hip_fp16.h>
 #endif
+#elif __has_include(<cuda_fp16.h>)
+// Include this proactively because CCCL will if __has_include(<cuda_fp16.h>) is true, which
+// ultimately ends up with multiple definitions of __half
+#include <cuda_fp16.h>
+#include <cuda/std/cmath>
 #endif
 
-__CUDA_HD__
+#ifndef __CUDA_FP16_TYPES_EXIST__
+struct __half
+{
+  uint16_t __x{};
+
+  constexpr __half() = default;
+
+  /// Constructor from uint16_t
+  inline __half(short a, bool raw)
+  {
+    if (raw)
+      __x = a;
+    else
+      __x = __convert_float_to_halfint(float(a));
+  }
+
+  /// Constructor from float
+  inline explicit __half(float a)
+  {
+    __x = __convert_float_to_halfint(a);
+  }
+
+  inline __half& operator=(const float &rhs)
+  {
+    __x = __convert_float_to_halfint(rhs);
+    return *this;
+  }
+
+  /// Cast to float
+  inline operator float() const
+  {
+    return __convert_halfint_to_float(__x);
+  }
+
+  /// Get raw storage
+  inline uint16_t raw() const
+  {
+    return this->__x;
+  }
+
+  inline void set_raw(uint16_t raw)
+  {
+    this->__x = raw;
+  }
+
+  /// Increment
+  inline __half& operator +=(const __half &rhs)
+  {
+    *this = __half(float(*this) + float(rhs));
+    return *this;
+  }
+
+  /// Decrement
+  inline __half& operator -=(const __half&rhs)
+  {
+    *this = __half(float(*this) - float(rhs));
+    return *this;
+  }
+
+  /// Scale up
+  inline __half& operator *=(const __half &rhs)
+  {
+    *this = __half(float(*this) * float(rhs));
+    return *this;
+  }
+
+  /// Scale down
+  inline __half& operator /=(const __half &rhs)
+  {
+    *this = __half(float(*this) / float(rhs));
+    return *this;
+  }
+
+};
+
 inline __half operator-(const __half &one)
 {
-#ifdef __CUDA_ARCH__
-#if __CUDA_ARCH__ >= 530 && __CUDACC_VER_MAJOR__ >= 8
-  return __hneg(one);
-#else
-  return __float2half(-__half2float(one));
-#endif
-#elif defined(__HIP_DEVICE_COMPILE__)
-  return __hneg(one);
-#else
   return __half(-(float(one)));
-#endif
 }
 
-__CUDA_HD__
 inline __half operator+(const __half &one, const __half &two)
 {
-#ifdef __CUDA_ARCH__
-#if __CUDA_ARCH__ >= 530 && __CUDACC_VER_MAJOR__ >= 8
-  return __hadd(one, two);
-#else
-  return __float2half(__half2float(one) + __half2float(two));
-#endif
-#elif defined(__HIP_DEVICE_COMPILE__)
-  return __hadd(one, two);
-#else
   return __half(float(one) + float(two));
-#endif
 }
 
-__CUDA_HD__
 inline __half operator-(const __half &one, const __half &two)
 {
-#ifdef __CUDA_ARCH__
-#if __CUDA_ARCH__ >= 530 && __CUDACC_VER_MAJOR__ >= 8
-  return __hsub(one, two);
-#else
-  return __float2half(__half2float(one) - __half2float(two));
-#endif
-#elif defined(__HIP_DEVICE_COMPILE__)
-  return __hsub(one, two);
-#else
   return __half(float(one) - float(two));
-#endif
 }
 
-__CUDA_HD__
 inline __half operator*(const __half &one, const __half &two)
 {
-#ifdef __CUDA_ARCH__
-#if __CUDA_ARCH__ >= 530 && __CUDACC_VER_MAJOR__ >= 8
-  return __hmul(one, two);
-#else
-  return __float2half(__half2float(one) * __half2float(two));
-#endif
-#elif defined(__HIP_DEVICE_COMPILE__)
-  return __hmul(one, two);
-#else
   return __half(float(one) * float(two));
-#endif
 }
 
-__CUDA_HD__
 inline __half operator/(const __half &one, const __half &two)
 {
-#ifdef __CUDA_ARCH__
-#if __CUDA_ARCH__ >= 530 && __CUDACC_VER_MAJOR__ == 8
-  return hdiv(one, two);
-#elif __CUDA_ARCH__ >= 530 && __CUDACC_VER_MAJOR__ >= 9
-  return __hdiv(one, two);
-#else
-  return __float2half(__half2float(one) / __half2float(two));
-#endif
-#elif defined(__HIP_DEVICE_COMPILE__)
-  return __hdiv(one, two);
-#else
   return __half(float(one) / float(two));
-#endif
 }
 
-__CUDA_HD__
 inline bool operator==(const __half &one, const __half &two)
 {
-#ifdef __CUDA_ARCH__
-#if __CUDA_ARCH__ >= 530 && __CUDACC_VER_MAJOR__ >= 8
-  return __heq(one, two);
-#else
-  return (__half2float(one) == __half2float(two));
-#endif
-#elif defined(__HIP_DEVICE_COMPILE__)
-  return __heq(one, two);
-#else
   return (float(one) == float(two));
-#endif
 }
 
-__CUDA_HD__
 inline bool operator!=(const __half &one, const __half &two)
 {
-#ifdef __CUDA_ARCH__
-#if __CUDA_ARCH__ >= 530 && __CUDACC_VER_MAJOR__ >= 8
-  return __hne(one, two);
-#else
-  return (__half2float(one) != __half2float(two));
-#endif
-#elif defined(__HIP_DEVICE_COMPILE__)
-  return __hne(one, two);
-#else
   return (float(one) != float(two));
-#endif
 }
 
-__CUDA_HD__
 inline bool operator<(const __half &one, const __half &two)
 {
-#ifdef __CUDA_ARCH__
-#if __CUDA_ARCH__ >= 530 && __CUDACC_VER_MAJOR__ >= 8
-  return __hlt(one, two);
-#else
-  return (__half2float(one) < __half2float(two));
-#endif
-#elif defined(__HIP_DEVICE_COMPILE__)
-  return __hlt(one, two);
-#else
   return (float(one) < float(two));
-#endif
 }
 
-__CUDA_HD__
 inline bool operator<=(const __half &one, const __half &two)
 {
-#ifdef __CUDA_ARCH__
-#if __CUDA_ARCH__ >= 530 && __CUDACC_VER_MAJOR__ >= 8
-  return __hle(one, two);
-#else
-  return (__half2float(one) <= __half2float(two));
-#endif
-#elif defined(__HIP_DEVICE_COMPILE__)
-  return __hle(one, two);
-#else
   return (float(one) <= float(two));
-#endif
 }
 
-__CUDA_HD__
 inline bool operator>(const __half &one, const __half &two)
 {
-#ifdef __CUDA_ARCH__
-#if __CUDA_ARCH__ >= 530 && __CUDACC_VER_MAJOR__ >= 8
-  return __hgt(one, two);
-#else
-  return (__half2float(one) > __half2float(two));
-#endif
-#elif defined(__HIP_DEVICE_COMPILE__)
-  return __hgt(one, two);
-#else
   return (float(one) > float(two));
-#endif
 }
 
-__CUDA_HD__
 inline bool operator>=(const __half &one, const __half &two)
 {
-#ifdef __CUDA_ARCH__
-#if __CUDA_ARCH__ >= 530 && __CUDACC_VER_MAJOR__ >= 8
-  return __hge(one, two);
-#else
-  return (__half2float(one) >= __half2float(two));
-#endif
-#elif defined(__HIP_DEVICE_COMPILE__)
-  return __hge(one, two);
-#else
   return (float(one) >= float(two));
-#endif
 }
 
+inline __half __convert_float_to_half(const float &a)
+{
+  uint16_t temp = __convert_float_to_halfint(a);
+  __half result(0, true/*raw*/);
+  result.set_raw(temp);
+  return result;
+}
+#endif
+
+#if defined (__CUDACC__) || defined (__HIPCC__)
+// The CUDA Toolkit only provides device versions for half precision operators,
+// so we have to provide custom implementations below.
 __CUDA_HD__
 inline __half asin(const __half &one)
 {
@@ -564,146 +526,6 @@ inline __half acos(const __half &one)
 
 #else // not __CUDACC__ or __HIPCC__
 
-struct __half
-{
-  uint16_t __x;
-
-  inline __half(void)
-  {
-    __x = 0;
-  }
-
-  /// Constructor from uint16_t
-  inline __half(short a, bool raw)
-  {
-    if (raw)
-      __x = a;
-    else
-      __x = __convert_float_to_halfint(float(a));
-  }
-
-  /// Constructor from float
-  inline explicit __half(float a)
-  {
-    __x = __convert_float_to_halfint(a);
-  }
-
-  inline __half& operator=(const float &rhs)
-  {
-    __x = __convert_float_to_halfint(rhs);
-    return *this;
-  }
-
-  /// Cast to float
-  inline operator float() const
-  {
-    return __convert_halfint_to_float(__x);
-  }
-
-  /// Get raw storage
-  inline uint16_t raw() const
-  {
-    return this->__x;
-  }
-
-  inline void set_raw(uint16_t raw)
-  {
-    this->__x = raw;
-  }
-
-  /// Increment
-  inline __half& operator +=(const __half &rhs)
-  {
-    *this = __half(float(*this) + float(rhs));
-    return *this;
-  }
-
-  /// Decrement
-  inline __half& operator -=(const __half&rhs)
-  {
-    *this = __half(float(*this) - float(rhs));
-    return *this;
-  }
-
-  /// Scale up
-  inline __half& operator *=(const __half &rhs)
-  {
-    *this = __half(float(*this) * float(rhs));
-    return *this;
-  }
-
-  /// Scale down
-  inline __half& operator /=(const __half &rhs)
-  {
-    *this = __half(float(*this) / float(rhs));
-    return *this;
-  }
-
-};
-
-inline __half operator-(const __half &one)
-{
-  return __half(-(float(one)));
-}
-
-inline __half operator+(const __half &one, const __half &two)
-{
-  return __half(float(one) + float(two));
-}
-
-inline __half operator-(const __half &one, const __half &two)
-{
-  return __half(float(one) - float(two));
-}
-
-inline __half operator*(const __half &one, const __half &two)
-{
-  return __half(float(one) * float(two));
-}
-
-inline __half operator/(const __half &one, const __half &two)
-{
-  return __half(float(one) / float(two));
-}
-
-inline bool operator==(const __half &one, const __half &two)
-{
-  return (float(one) == float(two));
-}
-
-inline bool operator!=(const __half &one, const __half &two)
-{
-  return (float(one) != float(two));
-}
-
-inline bool operator<(const __half &one, const __half &two)
-{
-  return (float(one) < float(two));
-}
-
-inline bool operator<=(const __half &one, const __half &two)
-{
-  return (float(one) <= float(two));
-}
-
-inline bool operator>(const __half &one, const __half &two)
-{
-  return (float(one) > float(two));
-}
-
-inline bool operator>=(const __half &one, const __half &two)
-{
-  return (float(one) >= float(two));
-}
-
-inline __half __convert_float_to_half(const float &a)
-{
-  uint16_t temp = __convert_float_to_halfint(a);
-  __half result(0, true/*raw*/);
-  result.set_raw(temp);
-  return result;
-}
-
 inline __half floor(const __half &a)
 {
   return static_cast<__half>(::floor(static_cast<float>(a)));
@@ -774,6 +596,16 @@ inline __half sqrt(const __half &a)
   return static_cast<__half>(::sqrt(static_cast<float>(a)));
 }
 
+inline bool isinf(__half a)
+{
+  return std::isinf(static_cast<float>(a));
+}
+
+inline bool isnan(__half a)
+{
+  return std::isnan(static_cast<float>(a));
+}
+
 #endif // Not nvcc or hipcc
 
 #endif // __HALF_H__

runtime/legion/legion_redop.cc

diff --git a/runtime/legion/legion_redop.cc b/runtime/legion/legion_redop.cc
index 9d4cf49da..75ef9c3b3 100644
--- a/runtime/legion/legion_redop.cc
+++ b/runtime/legion/legion_redop.cc
@@ -20,10 +20,10 @@
 namespace Legion {

 #ifdef LEGION_REDOP_HALF
-  /*static*/ const __half SumReduction<__half>::identity = __half(0, false/*raw*/);
-  /*static*/ const __half DiffReduction<__half>::identity = __half(0, false/*raw*/);
-  /*static*/ const __half ProdReduction<__half>::identity = __half(1, false/*raw*/);
-  /*static*/ const __half DivReduction<__half>::identity = __half(1, false/*raw*/);
+  /*static*/ const __half SumReduction<__half>::identity = __half(0);
+  /*static*/ const __half DiffReduction<__half>::identity = __half(0);
+  /*static*/ const __half ProdReduction<__half>::identity = __half(1);
+  /*static*/ const __half DivReduction<__half>::identity = __half(1);
   /*static*/ const __half MaxReduction<__half>::identity = __half(-2e10);
   /*static*/ const __half MinReduction<__half>::identity = __half(2e10);
 #endif
@@ -45,10 +45,10 @@ namespace Legion {

 #ifdef LEGION_REDOP_COMPLEX
 #ifdef LEGION_REDOP_HALF
-  /*static*/ const complex<__half> SumReduction<complex<__half> >::identity = complex<__half>(__half(0, false/*raw*/), __half(0, false/*raw*/));
-  /*static*/ const complex<__half> DiffReduction<complex<__half> >::identity = complex<__half>(__half(0, false/*raw*/), __half(0, false/*raw*/));
-  /*static*/ const complex<__half> ProdReduction<complex<__half> >::identity = complex<__half>(__half(1, false/*raw*/), __half(0, false/*raw*/));
-  /*static*/ const complex<__half> DivReduction<complex<__half> >::identity = complex<__half>(__half(1, false/*raw*/), __half(0, false/*raw*/));
+  /*static*/ const complex<__half> SumReduction<complex<__half> >::identity = complex<__half>(__half(0), __half(0));
+  /*static*/ const complex<__half> DiffReduction<complex<__half> >::identity = complex<__half>(__half(0), __half(0));
+  /*static*/ const complex<__half> ProdReduction<complex<__half> >::identity = complex<__half>(__half(1), __half(0));
+  /*static*/ const complex<__half> DivReduction<complex<__half> >::identity = complex<__half>(__half(1), __half(0));
 #endif
   /*static*/ const complex<float> SumReduction<complex<float> >::identity = complex<float>(0.f, 0.f);
   /*static*/ const complex<float> DiffReduction<complex<float> >::identity = complex<float>(0.f, 0.f);

@lightsighter
Copy link
Contributor

I don't think we should be trying to make Legion's version of __half and complex play nice with CCCL. Instead users should be picking at build time either to use CCCL or Legion's version of these types. They need to pick one or the other when they run Legion's build system, but they shouldn't be trying to mix them. I think the only thing we should do is to change Legion's build systems to make __half and complex be opt-in features instead of opt-out and then users who don't have CCCL can opt-in and get the backwards support that they need from Legion's types. We should be doing the bare minimum of effort to support these types because they were only there as a bridge until CUDA got their shit together on this front (which they finally have with CCCL). I don't believe we should do anything to support them further.

@lightsighter
Copy link
Contributor

Further amending my comment here: it is not Legion's responsibility to provide a portability layer for CUDA. If users want to use CCCL's types, they should turn off Legion's support for __half and complex and then roll their own reduction operators using CCCL's types. Legion's built-in reduction operators are a convenience and come as they are. The built-in reduction operators are not the only reduction operators you can use and we allow users to register their own reduction operators for expressly this purpose: to support other types that Legion does not know about (you can even register them with the same reduction operator IDs if you want). I don't want to be in a position where Legion has to do different things depending on which version of CUDA someone is using and be willing to guarantee that portability to everyone.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants