[AutoSparkUT] Fix CollectLimitExec GPU replacement for Command results (SPARK-19650)#14398
Conversation
Skip GPU replacement of CollectLimitExec when the child plan already provides pre-computed results (CommandResultExec, ExecutedCommandExec, LocalTableScanExec). These plans override executeCollect/executeTake to return rows without triggering a Spark job. Replacing their parent CollectLimitExec with the GPU equivalent forces a job submission through GpuBringBackToHost's default executeCollect path.

This preserves GPU-CPU parity for SPARK-19650: the original Spark test now passes directly on GPU without any test adaptation.

Issue: NVIDIA#14110

Maven validation:
mvn package -pl tests -am -Dbuildver=330 \
  -Dmaven.repo.local=./.mvn-repo \
  -DwildcardSuites=...RapidsSQLQuerySuite
Tests: succeeded 215, failed 0, ignored 18

Signed-off-by: Allen Xu <allxu@nvidia.com>
Made-with: Cursor
Greptile Summary: This PR fixes a correctness issue where the GPU plugin was replacing CollectLimitExec over pre-computed command results with a GPU plan that triggers an unnecessary Spark job. Key changes:
Confidence Score: 5/5
Sequence Diagram:

```mermaid
sequenceDiagram
    participant User
    participant CollectLimitExec
    participant GpuCollectLimitMeta
    participant tagPlanForGpu
    User->>CollectLimitExec: sql("show databases").head()
    Note over GpuCollectLimitMeta: Plan tagging phase
    GpuCollectLimitMeta->>tagPlanForGpu: tagPlanForGpu()
    tagPlanForGpu->>tagPlanForGpu: check child class name
    alt child is CommandResultExec or ExecutedCommandExec
        tagPlanForGpu->>GpuCollectLimitMeta: willNotWorkOnGpu(...)
        Note over CollectLimitExec: Stays on CPU
        CollectLimitExec->>CollectLimitExec: executeCollect() → child.executeTake(limit)
        Note over CollectLimitExec: Returns pre-computed rows (no Spark job)
    else normal child (e.g. data query)
        tagPlanForGpu->>GpuCollectLimitMeta: (no-op, GPU replacement proceeds)
        Note over GpuCollectLimitMeta: convertToGpu()
        GpuCollectLimitMeta-->>User: GpuGlobalLimitExec → GpuShuffleExchangeExec → GpuLocalLimitExec
    end
```
Reviews (3): Last reviewed commit: "Use full class name for command exec typ..."
Pull request overview
Fixes a Spark RAPIDS GPU plugin behavior regression where CollectLimitExec wrapping command-style results (e.g., sql("show databases").head()) would be replaced with a GPU plan that triggers a Spark job, instead of preserving Spark’s job-free executeCollect/executeTake behavior for pre-computed command results.
Changes:
- Add a `GpuCollectLimitMeta.tagPlanForGpu()` guard to keep `CollectLimitExec` on CPU when its child is a pre-computed result exec (command/local table scan), avoiding an unnecessary Spark job.
- Remove the SPARK-19650 exclusion from the Spark 3.3.0 RAPIDS test settings since the original Spark test now passes.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `sql-plugin/src/main/scala/com/nvidia/spark/rapids/limit.scala` | Prevents GPU replacement of `CollectLimitExec` for command/local pre-computed children to preserve job-free collection semantics. |
| `tests/src/test/spark330/scala/org/apache/spark/sql/rapids/utils/RapidsTestSettings.scala` | Re-enables the SPARK-19650 test by removing the known-issue exclusion. |
```scala
override def tagPlanForGpu(): Unit = {
  val childName = collectLimit.child.getClass.getSimpleName
  if (childName == "CommandResultExec" ||
      childName == "ExecutedCommandExec" ||
      childName == "LocalTableScanExec") {
    willNotWorkOnGpu(
      s"child $childName already provides pre-computed " +
        "results; replacing CollectLimit would trigger " +
        "an unnecessary Spark job")
  }
}
```
Using getClass.getSimpleName string comparisons to detect child plan types is brittle (renames/relocation across Spark versions, subclasses/proxies) and makes the logic harder to maintain. Prefer type-based matching (e.g., collectLimit.child match { case _: CommandResultExec | _: ExecutedCommandExec | _: LocalTableScanExec => ... }) or a shim/helper that checks classes directly, and import the relevant Spark exec classes instead of comparing names as strings.
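The suggested type-based match can be sketched with self-contained stand-in classes (the exec classes below are hypothetical stubs, not Spark's; in the real plugin the match would be over Spark's physical plan nodes, which is exactly what the cross-version constraint discussed next rules out):

```scala
// Sketch comparing the reviewer's type-based match with the PR's
// name-based check, using local stub classes instead of Spark's.
object MatchDemo {
  class SparkPlan
  class CommandResultExec extends SparkPlan   // stand-in for Spark 3.2+ node
  class ExecutedCommandExec extends SparkPlan // stand-in for the older node
  class ProjectExec extends SparkPlan         // stand-in for a normal data exec

  // Type-based matching: compile-time checked, but requires the classes
  // to exist on every supported Spark version.
  def isPrecomputedByType(plan: SparkPlan): Boolean = plan match {
    case _: CommandResultExec | _: ExecutedCommandExec => true
    case _ => false
  }

  // Name-based matching: no compile-time dependency on the classes,
  // which is why the PR uses it in cross-version common code.
  def isPrecomputedByName(plan: SparkPlan): Boolean = {
    val n = plan.getClass.getSimpleName
    n == "CommandResultExec" || n == "ExecutedCommandExec"
  }

  def main(args: Array[String]): Unit = {
    assert(isPrecomputedByType(new CommandResultExec))
    assert(isPrecomputedByName(new CommandResultExec))
    assert(!isPrecomputedByType(new ProjectExec))
    assert(!isPrecomputedByName(new ProjectExec))
    println("ok")
  }
}
```

Both checks agree on these stubs; the trade-off is compile-time safety versus version portability, which the following replies discuss.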
This is the same concern addressed in 590dd86. String matching is intentional: CommandResultExec was introduced in Spark 3.2, and limit.scala is cross-version common code — importing it directly would cause compile failures on older Spark shims. The comment added in 590dd86 explains this design choice.
Maybe adding a list of supported commands in the shim layer would be better practice. I'm fine with the current code but would like to see other reviewers' opinions.
And the line-length limit issue seems to still be there, as in other recent PRs. (It's a nit for this PR, too.)
Agree that a shim-layer approach would be cleaner if the list grows. Right now it's only 2 types and both are stable Spark internals (CommandResultExec since 3.2, ExecutedCommandExec since early Spark), so the maintenance cost of string matching is low. Happy to refactor into a shim helper if other reviewers also prefer that — would make sense to do it once we need to add more types.
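The shim-layer alternative floated above could look something like this minimal sketch (all names here are illustrative, not the PR's code: each per-version shim would declare which exec classes return pre-computed results, and the common code would only consult that set):

```scala
// Hypothetical shim-layer design: version-specific knowledge lives in the
// shim, common code stays free of version-gated imports.
object ShimDemo {
  trait SparkShim {
    // Fully-qualified names of exec nodes whose results are pre-computed.
    def precomputedResultExecNames: Set[String]
  }

  // What a Spark 3.3.0 shim might provide; a pre-3.2 shim would simply
  // omit CommandResultExec, since that class does not exist there.
  object Spark330Shim extends SparkShim {
    val precomputedResultExecNames: Set[String] = Set(
      "org.apache.spark.sql.execution.CommandResultExec",
      "org.apache.spark.sql.execution.command.ExecutedCommandExec")
  }

  // Version-agnostic check, usable from common code such as limit.scala.
  def isPrecomputedResult(shim: SparkShim, className: String): Boolean =
    shim.precomputedResultExecNames.contains(className)

  def main(args: Array[String]): Unit = {
    assert(isPrecomputedResult(Spark330Shim,
      "org.apache.spark.sql.execution.CommandResultExec"))
    assert(!isPrecomputedResult(Spark330Shim,
      "org.apache.spark.sql.execution.ProjectExec"))
    println("ok")
  }
}
```

The design keeps the per-version lists in one place per shim, so adding a new pre-computed exec type on a future Spark version would not touch the common code path.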
No problem, but this has already passed IT and the lines are not too many. If new review comments come in, I'll fix the line length along with them.
- Remove LocalTableScanExec from the guard: it is used for general small datasets where GPU acceleration is still beneficial.
- Add a comment explaining why string matching is used instead of isInstanceOf (cross-version compatibility).

Signed-off-by: Allen Xu <allxu@nvidia.com>
Made-with: Cursor
build
Performance Analysis: This PR modifies `sql-plugin/src/main/scala/com/nvidia/spark/rapids/limit.scala`.

Affected code path: cold path (query planning only). The new `tagPlanForGpu()` check runs once per `CollectLimitExec` node during plan tagging, not during execution.

No hot-path changes: `convertToGpu()` and the per-batch execution code are unchanged.

Trigger condition is narrow: the guard only fires when `CollectLimitExec`'s child is a pre-computed command result exec; all other plans take the existing GPU replacement path.

Net effect for the triggered case is a performance improvement: before this fix, a `CollectLimitExec` over command results forced a Spark job through `GpuBringBackToHost.executeCollect()`; after it, the node stays on CPU and returns pre-computed rows without a job.

Conclusion: No benchmark needed. The change is on a cold path (planning-time only), does not touch the hot execution path, and the behavioral change for the narrow trigger case is strictly a performance improvement (fewer Spark jobs).
@sperlingxx @firestarman can you help take a look at this PR?
```scala
// String matching avoids compile errors on Spark versions
// where CommandResultExec (added in 3.2) does not exist.
val childName = collectLimit.child.getClass.getSimpleName
if (childName == "CommandResultExec" ||
```
NIT: Can we use the full package name instead of the shorter one?
Fixed in 2fc8235. Switched from getClass.getSimpleName to getClass.getName with endsWith for full package name matching (.execution.CommandResultExec and .execution.command.ExecutedCommandExec). This is also more robust since getSimpleName can throw InternalError for some inner/anonymous classes.
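The `getName`-with-`endsWith` idea can be illustrated with a small self-contained sketch (the nested class below is a hypothetical stand-in; the real suffixes matched in the fix are the package-qualified `.execution.CommandResultExec` and `.execution.command.ExecutedCommandExec`):

```scala
// Sketch: full-name matching via getClass.getName + endsWith, the approach
// adopted in 2fc8235. The class here is a local stand-in, not Spark's.
object FullNameDemo {
  class CommandResultExec // hypothetical stub

  def main(args: Array[String]): Unit = {
    val cls = (new CommandResultExec).getClass
    // getName includes the enclosing path (e.g. "FullNameDemo$CommandResultExec"),
    // so an endsWith check on a qualified suffix stays unambiguous between
    // similarly named classes in different packages. getName is also always
    // well-defined, while getSimpleName has inner/anonymous-class corner cases.
    assert(cls.getName.endsWith("CommandResultExec"))
    assert(!cls.getName.endsWith("ExecutedCommandExec"))
    println("ok")
  }
}
```

With the real Spark classes, matching on a suffix that includes the package segment (rather than the bare class name) is what makes the check both portable across shims and resistant to name collisions.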
Switch from getSimpleName to getName with endsWith for CommandResultExec/ExecutedCommandExec detection in tagPlanForGpu(). Full package name matching is more robust than simple name — getSimpleName can throw InternalError for some inner/anonymous classes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Allen Xu <allxu@nvidia.com>
build
The stash pop three-way merge re-introduced exclusions for NVIDIA#14098, NVIDIA#14110, and NVIDIA#14116 that were already removed by merged PRs NVIDIA#14446, NVIDIA#14398, and NVIDIA#14400. Remove them to match origin/main.

Signed-off-by: Allen Xu <allxu@nvidia.com>
Made-with: Cursor
Summary
Closes #14110.
Fixes GPU/CPU parity of `executeCollect` behavior for Command results (issue: [AutoSparkUT] "SPARK-19650: An action on a Command should not trigger a Spark job" in SQLQuerySuite failed #14110). When `CollectLimitExec` wraps a `CommandResultExec` (e.g., `sql("show databases").head()`), the GPU plugin was replacing it with `GpuGlobalLimitExec → GpuShuffleExchangeExec → GpuLocalLimitExec`, forcing result materialization through `GpuBringBackToHost`'s default `executeCollect()`, which triggers a Spark job. On CPU, `CollectLimitExec.executeCollect()` delegates to `CommandResultExec.executeTake()`, which returns pre-computed rows without a job. The fix adds a `tagPlanForGpu()` check in `GpuCollectLimitMeta`: when the child is `CommandResultExec` or `ExecutedCommandExec`, `CollectLimitExec` stays on CPU to preserve the job-free execution path.

Traceability
- Test suite: `RapidsSQLQuerySuite` (inherits the original test, no `testRapids` override)
- Original test: `test("SPARK-19650: An action on a Command should not trigger a Spark job")` in `SQLQuerySuite` (`sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala`)
- Fix: `GpuCollectLimitMeta.tagPlanForGpu()` in `limit.scala`

Test plan
- `mvn package -pl tests -am -Dbuildver=330 -Dmaven.repo.local=./.mvn-repo -DwildcardSuites=...RapidsSQLQuerySuite` (Tests: succeeded 215, failed 0, ignored 18; was 19 ignored before)
- The test runs as inherited from Spark (no `testRapids` override)

Performance
This PR modifies `sql-plugin/src/main/scala/com/nvidia/spark/rapids/limit.scala` (GPU execution path). Impact analysis:

Affected code path: cold path (query planning only). The new `tagPlanForGpu()` override in `GpuCollectLimitMeta` runs once per query plan during the plan-tagging phase, not per-row, not per-batch, not per-partition. The cost is one `getClass.getSimpleName` call plus two string comparisons per `CollectLimitExec` node encountered during planning, which is negligible relative to any query's total planning time.

No hot-path changes: `convertToGpu()` is unchanged; the GPU replacement logic for normal `CollectLimitExec` (e.g., `SELECT * FROM table LIMIT N`) is identical before and after this PR.

Trigger condition is narrow: the guard only fires when `CollectLimitExec.child` is `CommandResultExec` or `ExecutedCommandExec`, i.e. DDL/administrative command results (e.g., `SHOW DATABASES`, `DESCRIBE TABLE`), not data-intensive queries. For all other child types, the existing GPU replacement path is taken with zero behavioral difference.

Net effect for the triggered case is a performance improvement: before this fix, `CollectLimitExec(CommandResultExec)` was replaced with GPU operators that forced a Spark job submission through `GpuBringBackToHost.executeCollect()`. After this fix, `CollectLimitExec` stays on CPU and delegates to `CommandResultExec.executeTake()`, returning pre-computed rows without any Spark job and eliminating unnecessary job scheduling overhead.

Checklists
(The existing Spark test `SPARK-19650` in `RapidsSQLQuerySuite` now covers the fixed code path; the KNOWN_ISSUE exclusion was removed so the test runs directly.)

Made with Cursor