@@ -20,6 +20,56 @@ Notable changes include:
* Bug fixes/improvements:
+ Version 2025.03.0 -- Release date 2025-03-17
+ ============================================
+
+ This release contains new features, bug fixes, and updates to submodule
+ dependencies.
+
+ Notable changes include:
+
+ * New features / API changes:
+   * Added improved support for perfectly nested loops in RAJA::launch.
+   * Added helper methods to simplify the creation of RAJA View objects
+     with permutations of stride ordering. Examples and user docs have also
+     been added.
+   * Added GPU policies for CUDA and HIP that do not check loop bounds when
+     they do not need to be checked in a kernel. This can help improve
+     performance by up to 5%. The new policies are documented in the RAJA
+     User Guide and include `direct_unchecked` in their names.
+   * Refactored the new (experimental) RAJA reduction interface to have
+     consistent min/max/loc operator semantics and added type safety to
+     reduce erroneous usage. Changes are described in the RAJA User Guide.
+   * Added support for the new RAJA reduction interface to
+     RAJA::dynamic_forall and moved dynamic_forall out of the RAJA `expt`
+     namespace.
+   * Added `RAJA_HIP_WAVESIZE` CMake option to set the wave size for HIP
+     builds. It defaults to 64 but can be set to 32, for example, to
+     build RAJA to run on Radeon gaming cards.
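As a rough illustration of the bounds-check item above, the following plain C++ sketch emulates a one-thread-per-index GPU mapping on the host. It is not RAJA code: `launch_direct_checked` and `launch_direct_unchecked` are hypothetical names, and the idea that the check can be dropped when the block size evenly divides the iteration count is an assumption made for illustration, not a description of how the RAJA `direct_unchecked` policies are implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for a GPU "direct" mapping: each
// (block, thread) pair handles at most one index. The nested loops play
// the role of the hardware threads.
template <typename Body>
void launch_direct_checked(std::size_t n, std::size_t block_size, Body body)
{
  // Round up so that every index in [0, n) is covered by some thread.
  const std::size_t num_blocks = (n + block_size - 1) / block_size;
  for (std::size_t block = 0; block < num_blocks; ++block) {
    for (std::size_t thread = 0; thread < block_size; ++thread) {
      const std::size_t i = block * block_size + thread;
      if (i < n) {  // bounds check: the last block may overshoot n
        body(i);
      }
    }
  }
}

// Hypothetical unchecked variant. Assumption: the caller guarantees that
// block_size divides n exactly, so no thread can run past the end and the
// per-thread comparison can be omitted.
template <typename Body>
void launch_direct_unchecked(std::size_t n, std::size_t block_size, Body body)
{
  assert(block_size > 0 && n % block_size == 0);
  const std::size_t num_blocks = n / block_size;
  for (std::size_t block = 0; block < num_blocks; ++block) {
    for (std::size_t thread = 0; thread < block_size; ++thread) {
      body(block * block_size + thread);  // no bounds check
    }
  }
}
```

Calling `launch_direct_checked(10, 4, body)` runs three blocks and skips the two out-of-range threads, while `launch_direct_unchecked(8, 4, body)` runs two full blocks with no per-thread comparison.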
+
+ * Build changes/improvements:
+   * Updated BLT to v0.7.0 release.
+   * Updated camp submodule to v2025.03.0 release.
+   * Updated desul submodule to 6114dd25b54782678c555c0c1d2197f13cc8d2a0
+     commit.
+   * Added clang-format CI check (clang 14) that must pass before a PR can
+     be merged -- noted here so external contributors are aware.
+
+ * Bug fixes/improvements:
+   * Resolved undefined behavior related to constructing
+     uniform_int_distribution with min > max. This was causing some Windows
+     tests to fail.
+   * Corrected call to wrong global function when using a fixed CUDA policy
+     and reductions in RAJA::launch kernel -- potential performance issue.
+   * Fixed memory leak in RAJA::launch OpenMP back-end.
+   * Added missing host-device decorations to some math utility functions.
+   * Fixed MSVC compilation failures with 64-bit intrinsics in x86 Windows
+     builds.
+   * Fixed issue so that a kernel will no longer be launched when there is no
+     work for it to do; i.e., no active iteration space entries.
+   * Removed invalid C++ usage in implementation of RAJA::kernel `initLocalMem`
+     statement, which was causing large warning messages during compilation.
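The first fix in the list above concerns a standard-library precondition: `std::uniform_int_distribution<T>(a, b)` requires `a <= b`, and the behavior is undefined otherwise. A minimal sketch of one way to guard against it follows. `random_in_range` is a hypothetical helper, not a RAJA function, and reordering the bounds is only one possible policy; the actual RAJA fix may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <utility>

// Hypothetical helper (not part of RAJA): orders the two bounds before
// constructing the distribution, so the a <= b precondition of
// std::uniform_int_distribution always holds.
template <typename Int, typename Engine>
Int random_in_range(Engine& engine, Int a, Int b)
{
  // std::minmax returns the pair (smaller, larger); copy it by value.
  const std::pair<Int, Int> bounds = std::minmax(a, b);
  assert(bounds.first <= bounds.second);

  // Closed interval [bounds.first, bounds.second].
  std::uniform_int_distribution<Int> dist(bounds.first, bounds.second);
  return dist(engine);
}
```

With a `std::mt19937` engine, `random_in_range(gen, 10, 3)` yields a value in the closed range 3 to 10 instead of invoking undefined behavior. A test that should reject reversed bounds could check them and fail loudly instead of reordering.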
+
+
Version 2024.07.0 -- Release date 2024-07-24
============================================