Legion Build errors with CUDA 12.5+ (or CCCL 2.4+) #1775
Comments
In the past we've relied on version checks rather than test-compiling programs to determine what's available. In the interest of consistency with the existing build, is there a reason not to do that? Items 1-3 would be good to check. I'm not sure about item 4. Is this a use case we really want to support? I'm not against it per se, but this is the sort of thing that can add dramatically more complexity to our build process, so unless the solution can be introduced without substantially more complexity, it honestly doesn't seem worth it.
No, and that's why the changes above shouldn't be applied as-is :). The …
I think for the most part Legion already does. It does not explicitly rely on the build version of the CUDA runtime (until now, at least), and Realm … We definitely want to support a single Legion binary for multiple CUDA versions in cunumeric/legate. Otherwise we might have to ship a copy of …
AFAIK the only CUDA Runtime dependency comes from …
The other thing we should check is whether the proposed change would be ABI-compatible across versions. If we're playing tricks with version checks in the headers, my concern is that it would be easy for ABI incompatibilities to slip in without us noticing. Sure, in principle it should work, but it's just very easy to make mistakes. Overall I don't have an objection to portable solutions; I just want to be mindful of the tradeoffs we're picking up while we do it.
The following set of patches achieves the same effect without build-time checks:
runtime/mathtypes/complex.h
diff --git a/runtime/mathtypes/complex.h b/runtime/mathtypes/complex.h
index 62dd69611..82a84932c 100644
--- a/runtime/mathtypes/complex.h
+++ b/runtime/mathtypes/complex.h
@@ -38,7 +38,11 @@
// cuda 12 (https://github.com/StanfordLegion/legion/issues/1469#)
// TODO: remove it once the bug is fixed in the future release of cuda.
#include <cuda_runtime.h>
#include <cuda/std/complex>
+#include <cuda/version> // CCCL_MAJOR_VERSION
+#if CCCL_MAJOR_VERSION > 2 || (CCCL_MAJOR_VERSION == 2 && CCCL_MINOR_VERSION >= 4)
+#define LEGION_HAVE_CUDA_COMPLEX_HALF 1
+#endif
#define COMPLEX_NAMESPACE cuda::std
#endif
#elif defined(LEGION_USE_HIP) && defined(__HIP_PLATFORM_AMD__)
@@ -93,7 +98,7 @@ inline bool operator>=(const complex<T>& c1, const complex<T>& c2) {
} // namespace COMPLEX_NAMESPACE
-#ifdef COMPLEX_HALF
+#if defined(COMPLEX_HALF) && !defined(LEGION_HAVE_CUDA_COMPLEX_HALF)
template<>
class COMPLEX_NAMESPACE::complex<__half> {
public:
runtime/mathtypes/half.h
diff --git a/runtime/mathtypes/half.h b/runtime/mathtypes/half.h
index dce3249c7..43e4e4f5e 100644
--- a/runtime/mathtypes/half.h
+++ b/runtime/mathtypes/half.h
@@ -16,6 +16,8 @@
#ifndef __HALF_H__
#define __HALF_H__
+#include <legion_defines.h>
+
#include <stdint.h>
#include <string.h> // memcpy
#include <cmath>
@@ -131,209 +133,169 @@ inline float __convert_halfint_to_float(uint16_t __x)
return result;
}
-#if defined (__CUDACC__) || defined (__HIPCC__)
-// The CUDA Toolkit only provides device versions for half precision operators,
-// so we have to provide custom implementations below.
-#if defined(LEGION_USE_CUDA)
-#if defined(__CUDA_FP16_H__)
-#error "This header must be included before cuda_fp16.h"
-#endif
-#define __CUDA_NO_HALF_OPERATORS__
+#ifdef LEGION_USE_CUDA
#include <cuda_fp16.h>
+// Must include cuda/std/cmath here because CCCL does e.g. "using ::isinf", and we want it to
+// pick up the std::isinf, not our isinf, because otherwise it results in multiple
+// definitions. I don't know why this fixes it (obviously, there still will be multiple
+// definitions of isinf()), but hey, I don't make the rules.
+#include <cuda/std/cmath>
#elif defined(LEGION_USE_HIP)
#ifdef __HIP_PLATFORM_NVCC__
-#if defined(__CUDA_FP16_H__)
-#error "This header must be included before cuda_fp16.h"
-#endif
-#define __CUDA_NO_HALF_OPERATORS__
#include <cuda_fp16.h>
+#include <cuda/std/cmath>
#else
-#if defined(HIP_INCLUDE_HIP_HIP_FP16_H)
-#error "This header must be included before hip_fp16.h"
-#endif
-#define __HIP_NO_HALF_OPERATORS__
#include <hip/hip_fp16.h>
#endif
+#elif __has_include(<cuda_fp16.h>)
+// Include this proactively because CCCL will if __has_include(<cuda_fp16.h>) is true, which
+// ultimately ends up with multiple definitions of __half
+#include <cuda_fp16.h>
+#include <cuda/std/cmath>
#endif
-__CUDA_HD__
+#ifndef __CUDA_FP16_TYPES_EXIST__
+struct __half
+{
+ uint16_t __x{};
+
+ constexpr __half() = default;
+
+ /// Constructor from uint16_t
+ inline __half(short a, bool raw)
+ {
+ if (raw)
+ __x = a;
+ else
+ __x = __convert_float_to_halfint(float(a));
+ }
+
+ /// Constructor from float
+ inline explicit __half(float a)
+ {
+ __x = __convert_float_to_halfint(a);
+ }
+
+ inline __half& operator=(const float &rhs)
+ {
+ __x = __convert_float_to_halfint(rhs);
+ return *this;
+ }
+
+ /// Cast to float
+ inline operator float() const
+ {
+ return __convert_halfint_to_float(__x);
+ }
+
+ /// Get raw storage
+ inline uint16_t raw() const
+ {
+ return this->__x;
+ }
+
+ inline void set_raw(uint16_t raw)
+ {
+ this->__x = raw;
+ }
+
+ /// Increment
+ inline __half& operator +=(const __half &rhs)
+ {
+ *this = __half(float(*this) + float(rhs));
+ return *this;
+ }
+
+ /// Decrement
+ inline __half& operator -=(const __half&rhs)
+ {
+ *this = __half(float(*this) - float(rhs));
+ return *this;
+ }
+
+ /// Scale up
+ inline __half& operator *=(const __half &rhs)
+ {
+ *this = __half(float(*this) * float(rhs));
+ return *this;
+ }
+
+ /// Scale down
+ inline __half& operator /=(const __half &rhs)
+ {
+ *this = __half(float(*this) / float(rhs));
+ return *this;
+ }
+
+};
+
inline __half operator-(const __half &one)
{
-#ifdef __CUDA_ARCH__
-#if __CUDA_ARCH__ >= 530 && __CUDACC_VER_MAJOR__ >= 8
- return __hneg(one);
-#else
- return __float2half(-__half2float(one));
-#endif
-#elif defined(__HIP_DEVICE_COMPILE__)
- return __hneg(one);
-#else
return __half(-(float(one)));
-#endif
}
-__CUDA_HD__
inline __half operator+(const __half &one, const __half &two)
{
-#ifdef __CUDA_ARCH__
-#if __CUDA_ARCH__ >= 530 && __CUDACC_VER_MAJOR__ >= 8
- return __hadd(one, two);
-#else
- return __float2half(__half2float(one) + __half2float(two));
-#endif
-#elif defined(__HIP_DEVICE_COMPILE__)
- return __hadd(one, two);
-#else
return __half(float(one) + float(two));
-#endif
}
-__CUDA_HD__
inline __half operator-(const __half &one, const __half &two)
{
-#ifdef __CUDA_ARCH__
-#if __CUDA_ARCH__ >= 530 && __CUDACC_VER_MAJOR__ >= 8
- return __hsub(one, two);
-#else
- return __float2half(__half2float(one) - __half2float(two));
-#endif
-#elif defined(__HIP_DEVICE_COMPILE__)
- return __hsub(one, two);
-#else
return __half(float(one) - float(two));
-#endif
}
-__CUDA_HD__
inline __half operator*(const __half &one, const __half &two)
{
-#ifdef __CUDA_ARCH__
-#if __CUDA_ARCH__ >= 530 && __CUDACC_VER_MAJOR__ >= 8
- return __hmul(one, two);
-#else
- return __float2half(__half2float(one) * __half2float(two));
-#endif
-#elif defined(__HIP_DEVICE_COMPILE__)
- return __hmul(one, two);
-#else
return __half(float(one) * float(two));
-#endif
}
-__CUDA_HD__
inline __half operator/(const __half &one, const __half &two)
{
-#ifdef __CUDA_ARCH__
-#if __CUDA_ARCH__ >= 530 && __CUDACC_VER_MAJOR__ == 8
- return hdiv(one, two);
-#elif __CUDA_ARCH__ >= 530 && __CUDACC_VER_MAJOR__ >= 9
- return __hdiv(one, two);
-#else
- return __float2half(__half2float(one) / __half2float(two));
-#endif
-#elif defined(__HIP_DEVICE_COMPILE__)
- return __hdiv(one, two);
-#else
return __half(float(one) / float(two));
-#endif
}
-__CUDA_HD__
inline bool operator==(const __half &one, const __half &two)
{
-#ifdef __CUDA_ARCH__
-#if __CUDA_ARCH__ >= 530 && __CUDACC_VER_MAJOR__ >= 8
- return __heq(one, two);
-#else
- return (__half2float(one) == __half2float(two));
-#endif
-#elif defined(__HIP_DEVICE_COMPILE__)
- return __heq(one, two);
-#else
return (float(one) == float(two));
-#endif
}
-__CUDA_HD__
inline bool operator!=(const __half &one, const __half &two)
{
-#ifdef __CUDA_ARCH__
-#if __CUDA_ARCH__ >= 530 && __CUDACC_VER_MAJOR__ >= 8
- return __hne(one, two);
-#else
- return (__half2float(one) != __half2float(two));
-#endif
-#elif defined(__HIP_DEVICE_COMPILE__)
- return __hne(one, two);
-#else
return (float(one) != float(two));
-#endif
}
-__CUDA_HD__
inline bool operator<(const __half &one, const __half &two)
{
-#ifdef __CUDA_ARCH__
-#if __CUDA_ARCH__ >= 530 && __CUDACC_VER_MAJOR__ >= 8
- return __hlt(one, two);
-#else
- return (__half2float(one) < __half2float(two));
-#endif
-#elif defined(__HIP_DEVICE_COMPILE__)
- return __hlt(one, two);
-#else
return (float(one) < float(two));
-#endif
}
-__CUDA_HD__
inline bool operator<=(const __half &one, const __half &two)
{
-#ifdef __CUDA_ARCH__
-#if __CUDA_ARCH__ >= 530 && __CUDACC_VER_MAJOR__ >= 8
- return __hle(one, two);
-#else
- return (__half2float(one) <= __half2float(two));
-#endif
-#elif defined(__HIP_DEVICE_COMPILE__)
- return __hle(one, two);
-#else
return (float(one) <= float(two));
-#endif
}
-__CUDA_HD__
inline bool operator>(const __half &one, const __half &two)
{
-#ifdef __CUDA_ARCH__
-#if __CUDA_ARCH__ >= 530 && __CUDACC_VER_MAJOR__ >= 8
- return __hgt(one, two);
-#else
- return (__half2float(one) > __half2float(two));
-#endif
-#elif defined(__HIP_DEVICE_COMPILE__)
- return __hgt(one, two);
-#else
return (float(one) > float(two));
-#endif
}
-__CUDA_HD__
inline bool operator>=(const __half &one, const __half &two)
{
-#ifdef __CUDA_ARCH__
-#if __CUDA_ARCH__ >= 530 && __CUDACC_VER_MAJOR__ >= 8
- return __hge(one, two);
-#else
- return (__half2float(one) >= __half2float(two));
-#endif
-#elif defined(__HIP_DEVICE_COMPILE__)
- return __hge(one, two);
-#else
return (float(one) >= float(two));
-#endif
}
+inline __half __convert_float_to_half(const float &a)
+{
+ uint16_t temp = __convert_float_to_halfint(a);
+ __half result(0, true/*raw*/);
+ result.set_raw(temp);
+ return result;
+}
+#endif
+
+#if defined (__CUDACC__) || defined (__HIPCC__)
+// The CUDA Toolkit only provides device versions for half precision operators,
+// so we have to provide custom implementations below.
__CUDA_HD__
inline __half asin(const __half &one)
{
@@ -564,146 +526,6 @@ inline __half acos(const __half &one)
#else // not __CUDACC__ or __HIPCC__
-struct __half
-{
- uint16_t __x;
-
- inline __half(void)
- {
- __x = 0;
- }
-
- /// Constructor from uint16_t
- inline __half(short a, bool raw)
- {
- if (raw)
- __x = a;
- else
- __x = __convert_float_to_halfint(float(a));
- }
-
- /// Constructor from float
- inline explicit __half(float a)
- {
- __x = __convert_float_to_halfint(a);
- }
-
- inline __half& operator=(const float &rhs)
- {
- __x = __convert_float_to_halfint(rhs);
- return *this;
- }
-
- /// Cast to float
- inline operator float() const
- {
- return __convert_halfint_to_float(__x);
- }
-
- /// Get raw storage
- inline uint16_t raw() const
- {
- return this->__x;
- }
-
- inline void set_raw(uint16_t raw)
- {
- this->__x = raw;
- }
-
- /// Increment
- inline __half& operator +=(const __half &rhs)
- {
- *this = __half(float(*this) + float(rhs));
- return *this;
- }
-
- /// Decrement
- inline __half& operator -=(const __half&rhs)
- {
- *this = __half(float(*this) - float(rhs));
- return *this;
- }
-
- /// Scale up
- inline __half& operator *=(const __half &rhs)
- {
- *this = __half(float(*this) * float(rhs));
- return *this;
- }
-
- /// Scale down
- inline __half& operator /=(const __half &rhs)
- {
- *this = __half(float(*this) / float(rhs));
- return *this;
- }
-
-};
-
-inline __half operator-(const __half &one)
-{
- return __half(-(float(one)));
-}
-
-inline __half operator+(const __half &one, const __half &two)
-{
- return __half(float(one) + float(two));
-}
-
-inline __half operator-(const __half &one, const __half &two)
-{
- return __half(float(one) - float(two));
-}
-
-inline __half operator*(const __half &one, const __half &two)
-{
- return __half(float(one) * float(two));
-}
-
-inline __half operator/(const __half &one, const __half &two)
-{
- return __half(float(one) / float(two));
-}
-
-inline bool operator==(const __half &one, const __half &two)
-{
- return (float(one) == float(two));
-}
-
-inline bool operator!=(const __half &one, const __half &two)
-{
- return (float(one) != float(two));
-}
-
-inline bool operator<(const __half &one, const __half &two)
-{
- return (float(one) < float(two));
-}
-
-inline bool operator<=(const __half &one, const __half &two)
-{
- return (float(one) <= float(two));
-}
-
-inline bool operator>(const __half &one, const __half &two)
-{
- return (float(one) > float(two));
-}
-
-inline bool operator>=(const __half &one, const __half &two)
-{
- return (float(one) >= float(two));
-}
-
-inline __half __convert_float_to_half(const float &a)
-{
- uint16_t temp = __convert_float_to_halfint(a);
- __half result(0, true/*raw*/);
- result.set_raw(temp);
- return result;
-}
-
inline __half floor(const __half &a)
{
return static_cast<__half>(::floor(static_cast<float>(a)));
@@ -774,6 +596,16 @@ inline __half sqrt(const __half &a)
return static_cast<__half>(::sqrt(static_cast<float>(a)));
}
+inline bool isinf(__half a)
+{
+ return std::isinf(static_cast<float>(a));
+}
+
+inline bool isnan(__half a)
+{
+ return std::isnan(static_cast<float>(a));
+}
+
#endif // Not nvcc or hipcc
#endif // __HALF_H__
runtime/legion/legion_redop.cc
diff --git a/runtime/legion/legion_redop.cc b/runtime/legion/legion_redop.cc
index 9d4cf49da..75ef9c3b3 100644
--- a/runtime/legion/legion_redop.cc
+++ b/runtime/legion/legion_redop.cc
@@ -20,10 +20,10 @@
namespace Legion {
#ifdef LEGION_REDOP_HALF
- /*static*/ const __half SumReduction<__half>::identity = __half(0, false/*raw*/);
- /*static*/ const __half DiffReduction<__half>::identity = __half(0, false/*raw*/);
- /*static*/ const __half ProdReduction<__half>::identity = __half(1, false/*raw*/);
- /*static*/ const __half DivReduction<__half>::identity = __half(1, false/*raw*/);
+ /*static*/ const __half SumReduction<__half>::identity = __half(0);
+ /*static*/ const __half DiffReduction<__half>::identity = __half(0);
+ /*static*/ const __half ProdReduction<__half>::identity = __half(1);
+ /*static*/ const __half DivReduction<__half>::identity = __half(1);
/*static*/ const __half MaxReduction<__half>::identity = __half(-2e10);
/*static*/ const __half MinReduction<__half>::identity = __half(2e10);
#endif
@@ -45,10 +45,10 @@ namespace Legion {
#ifdef LEGION_REDOP_COMPLEX
#ifdef LEGION_REDOP_HALF
- /*static*/ const complex<__half> SumReduction<complex<__half> >::identity = complex<__half>(__half(0, false/*raw*/), __half(0, false/*raw*/));
- /*static*/ const complex<__half> DiffReduction<complex<__half> >::identity = complex<__half>(__half(0, false/*raw*/), __half(0, false/*raw*/));
- /*static*/ const complex<__half> ProdReduction<complex<__half> >::identity = complex<__half>(__half(1, false/*raw*/), __half(0, false/*raw*/));
- /*static*/ const complex<__half> DivReduction<complex<__half> >::identity = complex<__half>(__half(1, false/*raw*/), __half(0, false/*raw*/));
+ /*static*/ const complex<__half> SumReduction<complex<__half> >::identity = complex<__half>(__half(0), __half(0));
+ /*static*/ const complex<__half> DiffReduction<complex<__half> >::identity = complex<__half>(__half(0), __half(0));
+ /*static*/ const complex<__half> ProdReduction<complex<__half> >::identity = complex<__half>(__half(1), __half(0));
+ /*static*/ const complex<__half> DivReduction<complex<__half> >::identity = complex<__half>(__half(1), __half(0));
#endif
/*static*/ const complex<float> SumReduction<complex<float> >::identity = complex<float>(0.f, 0.f);
/*static*/ const complex<float> DiffReduction<complex<float> >::identity = complex<float>(0.f, 0.f);
I don't think we should be trying to make Legion's version of …
Further amending my comment here: it is not Legion's responsibility to provide a portability layer for CUDA. If users want to use CCCL's types, they should turn off Legion's support for …
Legion defines its own implementation of __half and cuda::std::complex<__half>. Relatively recent versions of CUDA and libcu++ now also define these operators, both on host and device. The following set of patches allows Legion to be compiled with CUDA 12.2 through 12.5 (possibly also lower versions, but this is untested). They should not be applied as-is, however, as they don't address several other cases:
… __half should be defined, but including Legion headers later where __half is already defined. Specifically, the CMakeLists.txt check should somehow be moved to the header files and be done by checking versions. The specific error case is: a Legion package is built on a cluster using CUDA 12.2. A user installs that package, but they themselves have CUDA 12.5. Legion isn't incompatible with 12.5, so this is OK (and package managers won't downgrade the version of CUDA). But now LEGION_HAVE_CUDA_HOST_HALF is undefined when it really should be defined.
runtime/CMakeLists.txt
runtime/mathtypes/complex.h
runtime/mathtypes/half.h
runtime/legion/legion_redop.cc