
[AutoSparkUT] Re-enable 'normalize special floating numbers in subquery' test (issue #14116)#14400

Merged
wjxiz1992 merged 5 commits into NVIDIA:main from wjxiz1992:fix/14116-normalize-float-subquery
Mar 24, 2026

Conversation

Collaborator

@wjxiz1992 wjxiz1992 commented Mar 11, 2026

Summary

  • Root cause fixed upstream: The -0.0 normalization bug was in RMM's device_uvector::set_element_async, which used cudaMemsetAsync for zero values — clearing the sign bit of IEEE 754 -0.0. This has been fixed in rapidsai/rmm#2302.
  • This PR: Simply removes the .exclude() for "normalize special floating numbers in subquery" in RapidsSQLQuerySuite, re-enabling the test now that the upstream fix has landed in the spark-rapids-jni nightly SNAPSHOT.
  • No spark-rapids code changes needed: The original workaround in GpuScalar (using ColumnVector path to bypass Scalar.fromDouble) has been removed — the direct Scalar.fromDouble/Scalar.fromFloat calls now preserve -0.0 correctly.
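The failure mode hinges on a basic IEEE 754 fact: `-0.0 == 0.0` compares equal even though the two values have different bit patterns, so any `value == 0` fast path (like the `cudaMemsetAsync` special case in RMM) silently erases the sign bit. A minimal Java illustration:

```java
public class NegativeZeroBits {
  public static void main(String[] args) {
    double negZero = -0.0d;
    double posZero = 0.0d;
    // IEEE 754 equality treats the two zeros as equal...
    System.out.println(negZero == posZero); // true
    // ...but their raw bit patterns differ: only -0.0 has the sign bit set.
    System.out.println(Long.toHexString(Double.doubleToRawLongBits(negZero))); // 8000000000000000
    System.out.println(Long.toHexString(Double.doubleToRawLongBits(posZero))); // 0
  }
}
```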

Upstream fix chain

rapidsai/rmm#2302 (06c3562)  — remove zero-value cudaMemsetAsync special-casing
  → spark-rapids-jni cudf-pins updated (RMM pin now includes the fix)
    → spark-rapids-jni nightly SNAPSHOT rebuilt with fixed librmm.so
      → this test now passes on GPU without any spark-rapids workaround

RAPIDS test to Spark original mapping

| RAPIDS test | Spark original | Spark file | Lines |
| --- | --- | --- | --- |
| normalize special floating numbers in subquery (inherited) | normalize special floating numbers in subquery | sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala | 3620-3636 (permalink) |

Test plan

  • mvn package -pl tests -am -Dbuildver=330 -DwildcardSuites=RapidsSQLQuerySuite — 234 tests, 0 failures, 0 errors
  • The previously excluded test "normalize special floating numbers in subquery" now passes
  • No new test failures introduced
  • Verified on latest origin/main (commit b74f7f7) with upstream RMM fix in spark-rapids-jni SNAPSHOT

Closes #14116

Checklists

  • This PR has added documentation for new or modified features or
    behaviors.
  • This PR has added new tests or modified existing tests to cover
    new code paths.
    (Re-enabled the inherited Spark test "normalize special floating numbers in subquery" by removing its .exclude() entry. The test validates GPU-CPU parity for -0.0 in scalar subqueries.)
  • Performance testing has been performed and its results are added
    in the PR description. Or, an issue has been filed with a link in the PR
    description.

Made with Cursor

…VIDIA#14116)

cuDF's Scalar.fromDouble(-0.0) normalizes -0.0 to 0.0, losing the sign
bit. This caused GPU scalar subqueries to return 0.0 where CPU correctly
returns -0.0, violating GPU-CPU parity.

Root cause: the JNI path Scalar.fromDouble -> makeFloat64Scalar drops
the IEEE 754 sign bit of negative zero during scalar creation.

Fix: in GpuScalar.from(), create float/double scalars via a 1-element
ColumnVector + getScalarElement(0) instead of Scalar.fromDouble/fromFloat.
The column-based path preserves the exact bit pattern.

This re-enables the previously excluded test "normalize special floating
numbers in subquery" in RapidsSQLQuerySuite.

Closes NVIDIA#14116

Signed-off-by: Allen Xu <allxu@nvidia.com>
Made-with: Cursor
Copilot AI review requested due to automatic review settings March 11, 2026 09:18
Contributor

greptile-apps bot commented Mar 11, 2026

Greptile Summary

This PR re-enables the previously excluded "normalize special floating numbers in subquery" test in RapidsSQLQuerySuite (Spark 3.3.0) by removing its .exclude() entry from RapidsTestSettings.scala. The root cause — cudaMemsetAsync in RMM's device_uvector::set_element_async silently clearing the sign bit of IEEE 754 -0.0 — has been fixed upstream in rapidsai/rmm#2302 and is now available in the spark-rapids-jni nightly SNAPSHOT.

  • Removes the single .exclude("normalize special floating numbers in subquery", KNOWN_ISSUE(...)) line from RapidsTestSettings.scala
  • No spark-rapids production code changes required; the fix is entirely upstream in librmm
  • The test was verified to pass (234 tests, 0 failures) with the updated SNAPSHOT dependency

Confidence Score: 5/5

  • This PR is safe to merge — it is a minimal, low-risk change that only removes a test exclusion entry.
  • The change is a single-line deletion of a .exclude() call in a test settings file. There are no production code changes, no new logic introduced, and no other Spark version directories in the repository that would require a matching update. The PR description clearly documents the upstream fix chain and provides passing test evidence (234 tests, 0 failures).
  • No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| tests/src/test/spark330/scala/org/apache/spark/sql/rapids/utils/RapidsTestSettings.scala | Removes the .exclude() entry for "normalize special floating numbers in subquery" from RapidsSQLQuerySuite, re-enabling the test now that the upstream RMM bug (cudaMemsetAsync zeroing the sign bit of -0.0) has been fixed. |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["rapidsai/rmm#2302\nRemove zero-value cudaMemsetAsync\nspecial-casing for device_uvector"] --> B["spark-rapids-jni cudf-pins updated\nRMM pin now includes fix"]
    B --> C["spark-rapids-jni nightly SNAPSHOT rebuilt\nwith fixed librmm.so"]
    C --> D["GPU -0.0 sign bit preserved\nin Scalar.fromDouble / Scalar.fromFloat"]
    D --> E["RapidsSQLQuerySuite\n'normalize special floating numbers\nin subquery' now passes on GPU"]
    E --> F["Remove .exclude() entry\nfrom RapidsTestSettings.scala\n(this PR)"]
```

Last reviewed commit: "Merge branch 'main' ..."

Contributor

Copilot AI left a comment


Pull request overview

Fixes GPU/CPU parity for scalar subquery results involving signed zero by ensuring GpuScalar.from preserves the IEEE-754 -0.0 bit pattern for float/double values, and re-enables the previously excluded Spark-derived test that covers this case.

Changes:

  • Update GpuScalar.from to create float/double cuDF scalars via a 1-element ColumnVector + getScalarElement(0) to preserve -0.0.
  • Remove the exclusion for "normalize special floating numbers in subquery" from Spark 3.3.0 test settings.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| sql-plugin/src/main/scala/com/nvidia/spark/rapids/literals.scala | Changes float/double scalar creation path to preserve -0.0 bit pattern. |
| tests/src/test/spark330/scala/org/apache/spark/sql/rapids/utils/RapidsTestSettings.scala | Re-enables the previously excluded subquery normalization test. |


Comment on lines 283 to 294
case d: Double =>
  // cuDF Scalar.fromDouble normalizes -0.0 to 0.0 (see #14116).
  // Create via a 1-element column to preserve the exact bit pattern.
  withResource(ColumnVector.fromDoubles(d)) { cv =>
    cv.getScalarElement(0)
  }
case f: Float =>
  withResource(ColumnVector.fromDoubles(f.toDouble)) { cv =>
    cv.getScalarElement(0)
  }
case _ => throw new IllegalArgumentException(s"'$v: ${v.getClass}' is not supported" +
  s" for DoubleType, expecting Double or Float.")

Copilot AI Mar 11, 2026


Creating float/double scalars by materializing a 1-element ColumnVector will allocate/copy device memory on every scalar creation, which can be a noticeable perf/regression vs Scalar.fromDouble/fromFloat (GpuScalar.from is used in many code paths). Consider using the ColumnVector+getScalarElement workaround only for the specific problematic value (signed zero), and keep the fast Scalar.fromDouble path for all other doubles/floats.

Suggested change — from:

case d: Double =>
  // cuDF Scalar.fromDouble normalizes -0.0 to 0.0 (see #14116).
  // Create via a 1-element column to preserve the exact bit pattern.
  withResource(ColumnVector.fromDoubles(d)) { cv =>
    cv.getScalarElement(0)
  }
case f: Float =>
  withResource(ColumnVector.fromDoubles(f.toDouble)) { cv =>
    cv.getScalarElement(0)
  }
case _ => throw new IllegalArgumentException(s"'$v: ${v.getClass}' is not supported" +
  s" for DoubleType, expecting Double or Float.")

to:

case d: Double =>
  // cuDF Scalar.fromDouble normalizes -0.0 to 0.0 (see #14116).
  // Use a 1-element column only for negative zero to preserve the exact bit pattern.
  if (d == 0.0 && JDouble.doubleToRawLongBits(d) == JDouble.doubleToRawLongBits(-0.0d)) {
    withResource(ColumnVector.fromDoubles(d)) { cv =>
      cv.getScalarElement(0)
    }
  } else {
    Scalar.fromDouble(d)
  }
case f: Float =>
  // Preserve negative zero for floats as well.
  if (f == 0.0f && JFloat.floatToRawIntBits(f) == JFloat.floatToRawIntBits(-0.0f)) {
    withResource(ColumnVector.fromDoubles(f.toDouble)) { cv =>
      cv.getScalarElement(0)
    }
  } else {
    Scalar.fromDouble(f.toDouble)
  }
case _ => throw new IllegalArgumentException(s"'$v: ${v.getClass}' is not supported" +
  s" for DoubleType, expecting Double or Float.")

Collaborator

@revans2 revans2 left a comment


Why? I get that Spark has a test for this, but what value is there in keeping -0.0 instead of normalizing it to 0.0? IEEE treats them both as the same, and Spark has had bugs where -0.0 can cause issues. Does creating a column so that we can pull out a scalar from it cost any performance? It will cost some memory at least. Do we know where cuDF is normalizing this? I just want these questions answered before we put in a change like this.

Address review feedback: instead of routing all float/double scalar
creation through ColumnVector (which allocates device memory), detect
-0.0 via raw bit comparison and only use the slow path for that specific
value. All other values continue to use the fast Scalar.fromDouble/
Scalar.fromFloat path, making the common-case cost zero.

Signed-off-by: Allen Xu <allxu@nvidia.com>
Made-with: Cursor
Signed-off-by: Allen Xu <allxu@nvidia.com>
Collaborator Author

wjxiz1992 commented Mar 12, 2026

@revans2 Let me add more background on the UT fix work:

  1. We want to increase test coverage.

  2. The method is to port Spark unit tests directly (by extending them) via the rapidsTest framework, so they run on GPU.

  3. We want the GPU to produce exactly the same behavior as the CPU. For this case, even though we know 0.0 == -0.0, if the CPU produces -0.0 the GPU should also produce -0.0. Another check in Spark is this compare, which intentionally distinguishes them.

  4. The sign bit is lost at Java_ai_rapids_cudf_Scalar_makeFloat64Scalar.
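The distinction matters because, while primitive `==` treats the zeros as equal, total-ordering comparisons (as used by sorts and comparators) tell them apart. A small Java illustration of that distinction (not Spark's actual comparator code):

```java
public class ZeroOrdering {
  public static void main(String[] args) {
    // Primitive == follows IEEE 754: the two zeros compare equal.
    System.out.println(-0.0d == 0.0d); // true
    // Total ordering distinguishes them: -0.0 sorts strictly before 0.0.
    System.out.println(Double.compare(-0.0d, 0.0d)); // -1
    // Boxed equality also distinguishes them.
    System.out.println(Double.valueOf(-0.0d).equals(Double.valueOf(0.0d))); // false
  }
}
```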

The debug code used in the suite was:

testRapids("DEBUG #14116 - trace negative zero through join") {
  val negZeroRow = java.util.Arrays.asList(Row(-0.0))
  val posZeroRow = java.util.Arrays.asList(Row(0.0))
  val schema = org.apache.spark.sql.types.StructType(Seq(
    org.apache.spark.sql.types.StructField(
      "d", org.apache.spark.sql.types.DoubleType)))
  withTempView("v1", "v2") {
    spark.createDataFrame(negZeroRow, schema)
      .createTempView("v1")
    spark.createDataFrame(posZeroRow, schema)
      .createTempView("v2")

    def bits(d: Double): String =
      java.lang.Double.doubleToRawLongBits(d).toHexString

    val sb = new StringBuilder
    import com.nvidia.spark.rapids.{GpuScalar, Arm}
    Arm.withResource(
      GpuScalar((-0.0).asInstanceOf[Any],
        org.apache.spark.sql.types.DoubleType)
    ) { gs =>
      val s = gs.getBase
      sb.append(s"gpuscalar=${bits(s.getDouble)} ")
    }

    val tests = Seq(
      "plain_v1" -> "SELECT d FROM v1",
      "subq_no_join" -> "SELECT (SELECT d FROM v1)",
      "join_no_subq" ->
        "SELECT v1.d FROM v1 JOIN v2 ON v1.d = v2.d",
      "subq_with_join" -> ("SELECT (SELECT v1.d " +
        "FROM v1 JOIN v2 ON v1.d = v2.d)")
    )
    tests.foreach { case (name, query) =>
      val d = sql(query).collect().head.getDouble(0)
      sb.append(s"$name=${bits(d)} ")
    }

    val msg = sb.toString
    assert(msg.contains("gpuscalar=8000000000000000"),
      s"GpuScalar lost -0.0. All results: $msg")
    assert(msg.contains("subq_no_join=8000000000000000"),
      s"subq_no_join lost -0.0. All results: $msg")
    assert(msg.contains("subq_with_join=8000000000000000"),
      s"subq_with_join lost -0.0. All results: $msg")
  }
}

Test output was:

cudf_scalar=0 cudf_col=0 plain_v1=8000000000000000 subq_no_join=0 join_no_subq=8000000000000000 subq_with_join=0
| Test | Bits | Meaning |
| --- | --- | --- |
| cudf_scalar | 0 | Scalar.fromDouble(-0.0) → 0.0, sign bit lost |
| plain_v1 | 8000000000000000 | Direct SELECT via column path, -0.0 preserved |
| subq_no_join | 0 | Scalar subquery (no join), goes through GpuScalar → sign bit lost |
| join_no_subq | 8000000000000000 | Join without scalar subquery, -0.0 preserved |
| subq_with_join | 0 | Scalar subquery + join, same GpuScalar path → sign bit lost |

==================================

A more dedicated test:

NegativeZeroScalarRepro.java

import ai.rapids.cudf.ColumnVector;
import ai.rapids.cudf.Scalar;

/**
 * Standalone reproducer: cuDF Scalar.fromDouble(-0.0) loses the IEEE 754 sign bit.
 *
 * Run with:
 *   javac -cp <rapids-uber-jar> NegativeZeroScalarRepro.java
 *   java  -cp .:<rapids-uber-jar> NegativeZeroScalarRepro
 *
 * Expected: Scalar path should preserve -0.0 (sign bit = 0x8000000000000000)
 * Actual:   Scalar path normalizes -0.0 to 0.0 (sign bit lost)
 *
 * See: https://github.com/NVIDIA/spark-rapids/issues/14116
 */
public class NegativeZeroScalarRepro {
  public static void main(String[] args) {
    System.out.println("=== cuDF Scalar -0.0 sign bit reproducer ===\n");
    int failures = 0;

    // --- Double tests ---
    System.out.println("--- Double (-0.0d) ---");

    // Path 1: Scalar.fromDouble — this is the buggy path
    try (Scalar s = Scalar.fromDouble(-0.0d)) {
      long bits = Double.doubleToRawLongBits(s.getDouble());
      boolean preserved = bits == 0x8000000000000000L;
      System.out.printf("  Scalar.fromDouble(-0.0):        bits=0x%016x  %s%n",
          bits, preserved ? "PASS (sign bit preserved)" : "FAIL (sign bit lost)");
      if (!preserved) failures++;
    }

    // Path 2: ColumnVector — this preserves the bit pattern
    try (ColumnVector cv = ColumnVector.fromDoubles(-0.0d);
         Scalar s = cv.getScalarElement(0)) {
      long bits = Double.doubleToRawLongBits(s.getDouble());
      boolean preserved = bits == 0x8000000000000000L;
      System.out.printf("  ColumnVector(-0.0).getScalar:   bits=0x%016x  %s%n",
          bits, preserved ? "PASS (sign bit preserved)" : "FAIL (sign bit lost)");
      if (!preserved) failures++;
    }

    // --- Float tests ---
    System.out.println("\n--- Float (-0.0f) ---");

    // Path 1: Scalar.fromFloat
    try (Scalar s = Scalar.fromFloat(-0.0f)) {
      int bits = Float.floatToRawIntBits(s.getFloat());
      boolean preserved = bits == 0x80000000;
      System.out.printf("  Scalar.fromFloat(-0.0f):        bits=0x%08x  %s%n",
          bits, preserved ? "PASS (sign bit preserved)" : "FAIL (sign bit lost)");
      if (!preserved) failures++;
    }

    // Path 2: ColumnVector
    try (ColumnVector cv = ColumnVector.fromFloats(-0.0f);
         Scalar s = cv.getScalarElement(0)) {
      int bits = Float.floatToRawIntBits(s.getFloat());
      boolean preserved = bits == 0x80000000;
      System.out.printf("  ColumnVector(-0.0f).getScalar:  bits=0x%08x  %s%n",
          bits, preserved ? "PASS (sign bit preserved)" : "FAIL (sign bit lost)");
      if (!preserved) failures++;
    }

    // --- Positive zero sanity check ---
    System.out.println("\n--- Positive zero (sanity check) ---");
    try (Scalar s = Scalar.fromDouble(0.0d)) {
      long bits = Double.doubleToRawLongBits(s.getDouble());
      boolean ok = bits == 0L;
      System.out.printf("  Scalar.fromDouble(0.0):         bits=0x%016x  %s%n",
          bits, ok ? "PASS" : "FAIL");
      if (!ok) failures++;
    }
    try (Scalar s = Scalar.fromFloat(0.0f)) {
      int bits = Float.floatToRawIntBits(s.getFloat());
      boolean ok = bits == 0;
      System.out.printf("  Scalar.fromFloat(0.0f):         bits=0x%08x  %s%n",
          bits, ok ? "PASS" : "FAIL");
      if (!ok) failures++;
    }

    // --- User-visible impact ---
    System.out.println("\n--- User-visible impact (string representation) ---");
    System.out.printf("  Java Double.toString(-0.0): \"%s\"%n", Double.toString(-0.0));
    System.out.printf("  Java Double.toString(0.0):  \"%s\"%n", Double.toString(0.0));
    System.out.println("  => cast(-0.0 as string) would return \"0.0\" on GPU"
        + " instead of \"-0.0\" on CPU");

    System.out.printf("%n=== Result: %d failure(s) ===%n", failures);
    System.exit(failures > 0 ? 1 : 0);
  }
}

Repro:

# first compile
javac -cp <PATH_TO>/rapids-4-spark_2.12-26.04.0-SNAPSHOT-cuda12.jar NegativeZeroScalarRepro.java
...
# run
RAPIDS_JAR=<PATH_TO>/rapids-4-spark_2.12-26.04.0-SNAPSHOT-cuda12.jar && SLF4J_JAR=<PATH_TO>/org/slf4j/slf4j-api/1.7.36/slf4j-api-1.7.36.jar && java -cp .:${RAPIDS_JAR}:${SLF4J_JAR} NegativeZeroScalarRepro

==================================================

=== cuDF Scalar -0.0 sign bit reproducer ===

--- Double (-0.0d) ---
  Scalar.fromDouble(-0.0):        bits=0x0000000000000000  FAIL (sign bit lost)
  ColumnVector(-0.0).getScalar:   bits=0x8000000000000000  PASS (sign bit preserved)

--- Float (-0.0f) ---
  Scalar.fromFloat(-0.0f):        bits=0x00000000  FAIL (sign bit lost)
  ColumnVector(-0.0f).getScalar:  bits=0x80000000  PASS (sign bit preserved)

--- Positive zero (sanity check) ---
  Scalar.fromDouble(0.0):         bits=0x0000000000000000  PASS
  Scalar.fromFloat(0.0f):         bits=0x00000000  PASS

--- User-visible impact (string representation) ---
  Java Double.toString(-0.0): "-0.0"
  Java Double.toString(0.0):  "0.0"
  => cast(-0.0 as string) would return "0.0" on GPU instead of "-0.0" on CPU

=== Result: 2 failure(s) ===
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

I didn't file an issue against cuDF, as this 0.0 issue in this use case is more specific to Spark, but I can file one if you think it's needed.

  1. I updated the logic to use the ColumnVector workaround only for -0.0, for performance. All other values continue to use the fast Scalar.fromDouble/Scalar.fromFloat path.

@wjxiz1992 wjxiz1992 self-assigned this Mar 12, 2026
Collaborator

revans2 commented Mar 12, 2026

@wjxiz1992 I think you might have misunderstood my comments. -0.0 vs 0.0 is a very minor thing; it feels very inconsequential to have a scalar value be different. I get that we want to match all of Spark's unit tests, but if it is going to make the code less maintainable and possibly slower, then we need to think about the cost of being 100% compatible with Spark and decide if it is worth it. We also need to think about the change we are making and ask ourselves if this is the right place to make it. Here we are making multiple tiny memory allocations and copies so that we can preserve the -0.0. That is not worth it to me, which is why I asked where the error came from. We are not explicitly normalizing -0.0 to 0.0 in the code you pointed to, so I want to understand if there is something we can do there that would let us fix this in a much more efficient way. I also want to understand who added the test and why they added it. What possible case exists that we need to preserve this, especially when Spark strips it out in so many situations?

For me I am +1 if we can do this in a way that does not cause any more GPU memory allocations or GPU data movement, or make the code more difficult to work with. I am -1 in all other cases unless we can prove that this is critical to some real world situation.

Collaborator Author

wjxiz1992 commented Mar 13, 2026

@revans2 I see, thanks for the nice explanation!

  1. I've corrected my earlier assumption that "correctness (consistency with Spark) > performance" always holds; it really depends.
  2. Yes, the JNI code doesn't explicitly normalize it (the sign bit loss is just a side effect). makeFloat64Scalar creates a cudf::scalar_type_t<double> and calls set_value(static_cast<double>(value)). No explicit normalization here — the jdouble → double cast preserves the sign bit. Then in the cuDF C++ layer (cudf/cpp/src/scalar/scalar.cpp), fixed_width_scalar::set_value calls _data.set_value_async(value, stream), where _data is an rmm::device_scalar. This performs a host-to-device copy. The sign bit is lost somewhere in this RMM path — neither the JNI nor the cuDF C++ code explicitly normalizes -0.0.
  3. Let me file a cuDF issue for more investigation and discussion; this tiny unit test doesn't justify the GPU memory costs added in this PR.
  4. All [AutoSparkUT]-prefixed PRs originate from test-coverage concerns raised from up high; Mahone then introduced this "RapidsTest-extends" approach to migrate all Spark unit tests. And no, it's not from any actual customer use case — you can regard it as just a KPI ("how many Spark unit tests have been migrated"). All problematic tests are currently excluded in this file, and my job is to remove them from exclusion. I'll be more cautious so this doesn't hurt the spark-rapids project.

@wjxiz1992
Collaborator Author

@revans2 Update: I traced the root cause down to the RMM layer.

Root cause: rmm::device_uvector::set_element_async (device_uvector.hpp L222-226) has a zero-optimization that uses cudaMemsetAsync when value == value_type{0}. Since IEEE 754 defines -0.0 == 0.0 as true, -0.0 triggers the memset path, which clears all bits including the sign bit.

cudaMemcpy(-0.0):   bits=0x8000000000000000  PASS (sign bit preserved)
cudaMemset(0):      bits=0x0000000000000000  FAIL (sign bit lost)
-0.0 == 0.0 ? TRUE (IEEE 754)

The call chain is: Scalar.fromDouble → JNI makeFloat64Scalar → cudf::fixed_width_scalar::set_value → rmm::device_scalar::set_value_async → device_uvector::set_element_async → hits the == 0 optimization → cudaMemsetAsync → sign bit gone.

Filed as rapidsai/rmm#2298 with a CUDA reproducer and suggested fix (use memcmp instead of == for floating-point zero detection).

For this PR: since the proper fix belongs in RMM (zero GPU memory overhead once fixed there), I'll keep the exclude in place and revisit after the RMM fix lands.
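The suggested fix (detect zero by comparing raw bytes, memcmp-style, instead of using ==) can be sketched in Java terms — isZeroByBits below is a hypothetical helper mirroring the approach, not the actual RMM code:

```java
public class ZeroDetection {
  // Buggy-style check: -0.0 also matches, so a memset-style fast path
  // would wrongly clear the sign bit.
  static boolean isZeroByEquality(double v) {
    return v == 0.0d;
  }

  // memcmp-style check: compares the raw bit pattern, so only +0.0 matches
  // and -0.0 falls through to the ordinary copy path.
  static boolean isZeroByBits(double v) {
    return Double.doubleToRawLongBits(v) == 0L;
  }

  public static void main(String[] args) {
    System.out.println(isZeroByEquality(-0.0d)); // true  -> -0.0 takes the fast path
    System.out.println(isZeroByBits(-0.0d));     // false -> -0.0 is copied verbatim
    System.out.println(isZeroByBits(0.0d));      // true
  }
}
```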

The root cause (RMM set_element_async normalizing -0.0 to +0.0 via
cudaMemsetAsync) has been fixed upstream in rapidsai/rmm#2302
(commit 06c3562). The fix is now included in the spark-rapids-jni
nightly SNAPSHOT via the updated cudf-pins RMM pin.

Remove the ColumnVector-based workaround in GpuScalar and restore the
original direct Scalar.fromDouble/fromFloat calls. The exclusion
removal in RapidsTestSettings (from the prior commit) is retained —
the test now passes with the upstream fix alone.

Verified: RapidsSQLQuerySuite 234 tests, 0 failures, 0 errors.
Signed-off-by: Allen Xu <allxu@nvidia.com>
Made-with: Cursor
@wjxiz1992 wjxiz1992 changed the title [AutoSparkUT] Fix GpuScalar to preserve -0.0 for float/double (issue #14116) [AutoSparkUT] Re-enable 'normalize special floating numbers in subquery' test (issue #14116) Mar 18, 2026
@wjxiz1992
Collaborator Author

build

@wjxiz1992
Collaborator Author

@revans2 Hi Bobby, this PR has been updated based on your feedback:

  • The workaround is completely removed. The root cause was fixed upstream in Remove zero-value special casing in set_element_async to preserve IEEE 754 -0.0 rapidsai/rmm#2302 (the cudaMemsetAsync zero-optimization in device_uvector::set_element_async that was clearing the sign bit of -0.0). That fix is now included in the spark-rapids-jni nightly SNAPSHOT.
  • Zero production code changes — the literals.scala file is byte-for-byte identical to the base branch. No extra GPU memory allocations, no ColumnVector workaround, no performance cost.
  • The only change is removing one .exclude() line in RapidsTestSettings.scala to re-enable the test, which now passes with the upstream fix alone.
  • Test verified: RapidsSQLQuerySuite 234 tests, 0 failures, 0 errors.

Could you take another look when you get a chance? Thanks!

@wjxiz1992
Collaborator Author

build

@wjxiz1992 wjxiz1992 merged commit eb541ca into NVIDIA:main Mar 24, 2026
47 checks passed
@sameerz sameerz added the test Only impacts tests label Mar 25, 2026
wjxiz1992 added a commit to wjxiz1992/spark-rapids that referenced this pull request Mar 30, 2026
The stash pop three-way merge re-introduced exclusions for NVIDIA#14098,
NVIDIA#14110, and NVIDIA#14116 that were already removed by merged PRs NVIDIA#14446,
NVIDIA#14398, and NVIDIA#14400. Remove them to match origin/main.

Signed-off-by: Allen Xu <allxu@nvidia.com>
Made-with: Cursor

Labels

test Only impacts tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[AutoSparkUT]"normalize special floating numbers in subquery" in SQLQuerySuite failed

5 participants