
[AutoSparkUT] Fix CollectLimitExec GPU replacement for Command results (SPARK-19650)#14398

Merged
wjxiz1992 merged 3 commits into NVIDIA:main from wjxiz1992:fix/14110-command-job-testrapids
Mar 25, 2026

Conversation

@wjxiz1992
Collaborator

@wjxiz1992 wjxiz1992 commented Mar 11, 2026

Summary

close #14110

  • Fix the GPU plugin to preserve job-free executeCollect behavior for Command results (issue #14110: [AutoSparkUT] "SPARK-19650: An action on a Command should not trigger a Spark job" in SQLQuerySuite failed).
  • When CollectLimitExec wraps a CommandResultExec (e.g., sql("show databases").head()), the GPU plugin was replacing it with GpuGlobalLimitExec → GpuShuffleExchangeExec → GpuLocalLimitExec, forcing result materialization through GpuBringBackToHost's default executeCollect() which triggers a Spark job. On CPU, CollectLimitExec.executeCollect() delegates to CommandResultExec.executeTake() which returns pre-computed rows without a job.
  • The fix adds a tagPlanForGpu() check in GpuCollectLimitMeta: when the child is CommandResultExec or ExecutedCommandExec, CollectLimitExec stays on CPU to preserve the job-free execution path.
  • Remove the KNOWN_ISSUE exclusion for SPARK-19650 — the original Spark test now passes directly on GPU.
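
The contrast in the summary above can be sketched as a toy model. All names here (PlanStub, CommandResultStub, the jobsSubmitted counter) are stand-ins, not Spark APIs; the counter stands in for Spark job submission:

```scala
object JobFreeDemo {
  // Stand-in for Spark job submissions triggered during collection.
  var jobsSubmitted = 0

  trait PlanStub { def executeTake(n: Int): Seq[String] }

  // Mirrors CommandResultExec: rows are pre-computed, so executeTake
  // returns them without running anything.
  class CommandResultStub(rows: Seq[String]) extends PlanStub {
    def executeTake(n: Int): Seq[String] = rows.take(n)
  }

  // Mirrors CollectLimitExec.executeCollect() on CPU: plain delegation
  // to the child's executeTake, hence no job.
  def cpuCollectLimit(plan: PlanStub, limit: Int): Seq[String] =
    plan.executeTake(limit)

  // Mirrors the pre-fix GPU path, where materialization through
  // GpuBringBackToHost.executeCollect() submits a job.
  def gpuCollect(plan: PlanStub, limit: Int): Seq[String] = {
    jobsSubmitted += 1
    plan.executeTake(limit)
  }

  def main(args: Array[String]): Unit = {
    val cmd = new CommandResultStub(Seq("default", "tpch"))
    assert(cpuCollectLimit(cmd, 1) == Seq("default"))
    assert(jobsSubmitted == 0) // CPU path: no job
    assert(gpuCollect(cmd, 1) == Seq("default"))
    assert(jobsSubmitted == 1) // pre-fix GPU path: one unnecessary job
    println("ok")
  }
}
```

The fix makes the real plugin follow the first path for command results, matching CPU Spark's behavior.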

Traceability

| Field | Value |
| --- | --- |
| RAPIDS test name | RapidsSQLQuerySuite (inherits original test, no testRapids override) |
| Spark original test | test("SPARK-19650: An action on a Command should not trigger a Spark job") in SQLQuerySuite |
| Spark source file | sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala |
| Line range | 2627–2649 |
| Source link (master) | SQLQuerySuite.scala#L2627-L2649 |
| Source link (pinned) | SQLQuerySuite.scala@f66c336#L2627-L2649 |
| Fix location | GpuCollectLimitMeta.tagPlanForGpu() in limit.scala |
| Issue | #14110 |

Test plan

  • mvn package -pl tests -am -Dbuildver=330 -Dmaven.repo.local=./.mvn-repo -DwildcardSuites=...RapidsSQLQuerySuiteTests: succeeded 215, failed 0, ignored 18 (was 19 ignored before)
  • SPARK-19650 test passes directly (original Spark test, no testRapids override)
  • No new failures introduced
  • Pre-merge CI

Performance

This PR modifies sql-plugin/src/main/scala/com/nvidia/spark/rapids/limit.scala (GPU execution path). Impact analysis:

Affected code path: cold path (query planning only). The new tagPlanForGpu() override in GpuCollectLimitMeta runs once per query plan during the plan-tagging phase — not per-row, not per-batch, not per-partition. The cost is one getClass.getSimpleName call + two string comparisons per CollectLimitExec node encountered during planning. This is negligible relative to any query's total planning time.

No hot-path changes:

  • convertToGpu() is unchanged — the GPU replacement logic for normal CollectLimitExec (e.g., SELECT * FROM table LIMIT N) is identical before and after this PR.
  • No new allocations, branches, or synchronization points are added to the normal GPU execution flow.
  • GPU memory usage patterns are unaffected — the change only prevents GPU replacement in a narrow case, it doesn't alter what the GPU path does when it is used.

Trigger condition is narrow: The guard only fires when CollectLimitExec.child is CommandResultExec or ExecutedCommandExec — DDL/administrative command results (e.g., SHOW DATABASES, DESCRIBE TABLE), not data-intensive queries. For all other child types, the existing GPU replacement path is taken with zero behavioral difference.

Net effect for the triggered case is a performance improvement: Before this fix, CollectLimitExec(CommandResultExec) was replaced with GPU operators that forced a Spark job submission through GpuBringBackToHost.executeCollect(). After this fix, CollectLimitExec stays on CPU and delegates to CommandResultExec.executeTake(), returning pre-computed rows without any Spark job — eliminating unnecessary job scheduling overhead.

Checklists

  • This PR has added documentation for new or modified features or behaviors.
  • This PR has added new tests or modified existing tests to cover new code paths.
    (The existing Spark test SPARK-19650 in RapidsSQLQuerySuite now covers the fixed code path — the KNOWN_ISSUE exclusion was removed so the test runs directly.)
  • Performance testing has been performed and its results are added in the PR description. Or, an issue has been filed with a link in the PR description.

Made with Cursor

[AutoSparkUT] Fix CollectLimitExec GPU replacement for Command results (NVIDIA#14110)

Skip GPU replacement of CollectLimitExec when the child plan already
provides pre-computed results (CommandResultExec, ExecutedCommandExec,
LocalTableScanExec). These plans override executeCollect/executeTake to
return rows without triggering a Spark job. Replacing their parent
CollectLimitExec with GPU equivalent forces a job submission through
GpuBringBackToHost's default executeCollect path.

This preserves GPU-CPU parity for SPARK-19650: the original Spark test
now passes directly on GPU without any test adaptation.

Issue: NVIDIA#14110

Maven validation:
  mvn package -pl tests -am -Dbuildver=330 \
    -Dmaven.repo.local=./.mvn-repo \
    -DwildcardSuites=...RapidsSQLQuerySuite
  Tests: succeeded 215, failed 0, ignored 18

Signed-off-by: Allen Xu <allxu@nvidia.com>
Made-with: Cursor
Copilot AI review requested due to automatic review settings March 11, 2026 06:52
@greptile-apps
Contributor

greptile-apps bot commented Mar 11, 2026

Greptile Summary

This PR fixes a correctness issue where the GPU plugin was replacing CollectLimitExec(CommandResultExec) with GPU operators (GpuGlobalLimitExec → GpuShuffleExchangeExec → GpuLocalLimitExec), causing GpuBringBackToHost.executeCollect() to launch an unnecessary Spark job even though the CPU path can return pre-computed rows job-free via CommandResultExec.executeTake().

Key changes:

  • Adds a tagPlanForGpu() override in GpuCollectLimitMeta that calls willNotWorkOnGpu when the child plan is a CommandResultExec or ExecutedCommandExec, keeping CollectLimitExec on CPU for DDL/command results (e.g. SHOW DATABASES, DESCRIBE TABLE).
  • Uses full class name matching with endsWith (rather than isInstanceOf) to avoid compile errors on Spark versions predating CommandResultExec (introduced in 3.2) — an explanatory comment was added in response to prior review feedback.
  • Removes the KNOWN_ISSUE exclusion for SPARK-19650 from RapidsTestSettings for Spark 3.3.0, allowing the original Spark test to run directly on GPU.
  • Prior concerns about the LocalTableScanExec case being over-broad and the missing comment explaining the string-matching approach have both been fully addressed in the latest revision.

Confidence Score: 5/5

  • This PR is safe to merge — the fix is narrowly scoped, correctly implemented, and all prior review concerns have been addressed.
  • The change is a single targeted guard in the plan-tagging phase (cold path only), all previous review feedback about LocalTableScanExec over-breadth and the missing string-matching comment has been addressed, the base tagPlanForGpu() is empty so omitting a super call is consistent with the codebase pattern, and the test exclusion removal is validated by the passing test run reported in the PR description.
  • No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| sql-plugin/src/main/scala/com/nvidia/spark/rapids/limit.scala | Adds tagPlanForGpu() override in GpuCollectLimitMeta that prevents GPU replacement when CollectLimitExec wraps a CommandResultExec or ExecutedCommandExec, using full class name matching via endsWith to stay shim-compatible across Spark versions. |
| tests/src/test/spark330/scala/org/apache/spark/sql/rapids/utils/RapidsTestSettings.scala | Removes the KNOWN_ISSUE exclusion for SPARK-19650 from RapidsSQLQuerySuite, allowing the original Spark test to run directly on GPU now that the underlying issue is fixed. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant CollectLimitExec
    participant GpuCollectLimitMeta
    participant tagPlanForGpu

    User->>CollectLimitExec: sql("show databases").head()

    Note over GpuCollectLimitMeta: Plan tagging phase
    GpuCollectLimitMeta->>tagPlanForGpu: tagPlanForGpu()
    tagPlanForGpu->>tagPlanForGpu: check child class name
    alt child is CommandResultExec or ExecutedCommandExec
        tagPlanForGpu->>GpuCollectLimitMeta: willNotWorkOnGpu(...)
        Note over CollectLimitExec: Stays on CPU
        CollectLimitExec->>CollectLimitExec: executeCollect() → child.executeTake(limit)
        Note over CollectLimitExec: Returns pre-computed rows (no Spark job)
    else normal child (e.g. data query)
        tagPlanForGpu->>GpuCollectLimitMeta: (no-op, GPU replacement proceeds)
        Note over GpuCollectLimitMeta: convertToGpu()
        GpuCollectLimitMeta-->>User: GpuGlobalLimitExec → GpuShuffleExchangeExec → GpuLocalLimitExec
    end
```
Contributor

Copilot AI left a comment


Pull request overview

Fixes a Spark RAPIDS GPU plugin behavior regression where CollectLimitExec wrapping command-style results (e.g., sql("show databases").head()) would be replaced with a GPU plan that triggers a Spark job, instead of preserving Spark’s job-free executeCollect/executeTake behavior for pre-computed command results.

Changes:

  • Add a GpuCollectLimitMeta.tagPlanForGpu() guard to keep CollectLimitExec on CPU when its child is a pre-computed result exec (command/local table scan), avoiding an unnecessary Spark job.
  • Remove the SPARK-19650 exclusion from the Spark 3.3.0 RAPIDS test settings since the original Spark test now passes.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| sql-plugin/src/main/scala/com/nvidia/spark/rapids/limit.scala | Prevents GPU replacement of CollectLimitExec for command/local pre-computed children to preserve job-free collection semantics. |
| tests/src/test/spark330/scala/org/apache/spark/sql/rapids/utils/RapidsTestSettings.scala | Re-enables the SPARK-19650 test by removing the known-issue exclusion. |


Comment on lines +196 to +205
```scala
override def tagPlanForGpu(): Unit = {
  val childName = collectLimit.child.getClass.getSimpleName
  if (childName == "CommandResultExec" ||
      childName == "ExecutedCommandExec" ||
      childName == "LocalTableScanExec") {
    willNotWorkOnGpu(
      s"child $childName already provides pre-computed " +
        "results; replacing CollectLimit would trigger " +
        "an unnecessary Spark job")
  }
```

Copilot AI Mar 11, 2026


Using getClass.getSimpleName string comparisons to detect child plan types is brittle (renames/relocation across Spark versions, subclasses/proxies) and makes the logic harder to maintain. Prefer type-based matching (e.g., collectLimit.child match { case _: CommandResultExec | _: ExecutedCommandExec | _: LocalTableScanExec => ... }) or a shim/helper that checks classes directly, and import the relevant Spark exec classes instead of comparing names as strings.

Collaborator Author


This is the same concern addressed in 590dd86. String matching is intentional: CommandResultExec was introduced in Spark 3.2, and limit.scala is cross-version common code — importing it directly would cause compile failures on older Spark shims. The comment added in 590dd86 explains this design choice.

Collaborator


Maybe adding a list of supported commands in the shim layer would be better practice. I'm fine with the current code but would like to see other reviewers' opinions.

Collaborator


And the line-length issue still seems to be there, as in other recent PRs. (It's a nit for this PR, too.)

Collaborator Author


Agree that a shim-layer approach would be cleaner if the list grows. Right now it's only 2 types and both are stable Spark internals (CommandResultExec since 3.2, ExecutedCommandExec since early Spark), so the maintenance cost of string matching is low. Happy to refactor into a shim helper if other reviewers also prefer that — would make sense to do it once we need to add more types.
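
For reference, the shim-layer alternative discussed here might look something like the following. This is a hypothetical sketch, not plugin code; ShimPrecomputedExecs and the suffix lists are invented names for illustration:

```scala
object ShimPrecomputedExecs {
  // Suffixes of exec classes that return pre-computed rows. Older Spark
  // versions only have ExecutedCommandExec.
  val baseSuffixes: Seq[String] =
    Seq(".execution.command.ExecutedCommandExec")

  // A hypothetical Spark 3.2+ shim adds CommandResultExec, which does
  // not exist on earlier versions.
  val spark320PlusSuffixes: Seq[String] =
    baseSuffixes :+ ".execution.CommandResultExec"

  // Each shim would supply its own suffix list; the common code only
  // does the matching.
  def isPrecomputed(suffixes: Seq[String], className: String): Boolean =
    suffixes.exists(s => className.endsWith(s))

  def main(args: Array[String]): Unit = {
    assert(isPrecomputed(spark320PlusSuffixes,
      "org.apache.spark.sql.execution.CommandResultExec"))
    assert(!isPrecomputed(baseSuffixes,
      "org.apache.spark.sql.execution.CommandResultExec"))
    println("ok")
  }
}
```

This would localize the per-version list in each shim while keeping the matching logic in common code, at the cost of more plumbing for what is currently a two-entry list.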

Collaborator Author

@wjxiz1992 wjxiz1992 Mar 18, 2026


No problem, but this has already passed IT and the lines are not too many. If new comments come in, I'll do the line-length change along with them.

- Remove LocalTableScanExec from the guard — it is used for general
  small datasets where GPU acceleration is still beneficial.
- Add comment explaining why string matching is used instead of
  isInstanceOf (cross-version compatibility).

Signed-off-by: Allen Xu <allxu@nvidia.com>
Made-with: Cursor
@wjxiz1992
Collaborator Author

build

@wjxiz1992
Collaborator Author

Performance Analysis

This PR modifies sql-plugin/src/main/scala/com/nvidia/spark/rapids/limit.scala, so addressing performance impact here.

Affected code path: cold path (query planning only)

The new tagPlanForGpu() override in GpuCollectLimitMeta runs once per query plan during the tagPlanForGpu phase — not per-row, not per-batch, not per-partition. The cost is one getClass.getSimpleName call + two string comparisons per CollectLimitExec node encountered during planning. This is negligible relative to any query's total planning time.

No hot-path changes:

  • convertToGpu() is unchanged — the GPU replacement logic for normal CollectLimitExec (e.g., SELECT * FROM table LIMIT N) is identical before and after this PR.
  • No new allocations, branches, or synchronization points are added to the normal GPU execution flow.
  • GPU memory usage patterns are unaffected — the change only prevents GPU replacement in a narrow case, it doesn't alter what the GPU path does when it is used.

Trigger condition is narrow:

The guard only fires when CollectLimitExec.child is CommandResultExec or ExecutedCommandExec — these are DDL/administrative command results (e.g., SHOW DATABASES, DESCRIBE TABLE), not data-intensive queries. For all other child types, the existing GPU replacement path is taken with zero behavioral difference.

Net effect for the triggered case is a performance improvement:

Before this fix, CollectLimitExec(CommandResultExec) was replaced with GpuGlobalLimitExec → GpuShuffleExchangeExec → GpuLocalLimitExec, which forced a Spark job submission through GpuBringBackToHost.executeCollect(). After this fix, CollectLimitExec stays on CPU and delegates to CommandResultExec.executeTake(), returning pre-computed rows without any Spark job — eliminating unnecessary job scheduling overhead.

Conclusion: No benchmark needed. The change is on a cold path (planning-time only), does not touch the hot execution path, and the behavioral change for the narrow trigger case is strictly a performance improvement (fewer Spark jobs).

@wjxiz1992
Collaborator Author

@sperlingxx @firestarman can you help take a look at this PR?

@wjxiz1992 wjxiz1992 requested a review from liurenjie1024 March 20, 2026 03:50
```scala
// String matching avoids compile errors on Spark versions
// where CommandResultExec (added in 3.2) does not exist.
val childName = collectLimit.child.getClass.getSimpleName
if (childName == "CommandResultExec" ||
```
Collaborator


NIT: Can we use the full package name instead of the shorter one ?

Collaborator Author


Fixed in 2fc8235. Switched from getClass.getSimpleName to getClass.getName with endsWith for full package name matching (.execution.CommandResultExec and .execution.command.ExecutedCommandExec). This is also more robust since getSimpleName can throw InternalError for some inner/anonymous classes.
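
The getName-plus-endsWith approach can be demonstrated in isolation. In this sketch the three exec classes are local stand-ins (the real guard matches full package suffixes like .execution.CommandResultExec, which local classes cannot reproduce):

```scala
object NameMatchDemo {
  // Stand-ins for Spark exec nodes, declared locally for the demo.
  class CommandResultExec
  class ExecutedCommandExec
  class FileSourceScanExec

  def isJobFreeCommand(plan: AnyRef): Boolean = {
    // getName rather than getSimpleName: the full name is unambiguous
    // across packages, and getSimpleName can throw InternalError for
    // some nested/anonymous classes.
    val n = plan.getClass.getName
    n.endsWith("CommandResultExec") || n.endsWith("ExecutedCommandExec")
  }

  def main(args: Array[String]): Unit = {
    assert(isJobFreeCommand(new CommandResultExec))
    assert(isJobFreeCommand(new ExecutedCommandExec))
    assert(!isJobFreeCommand(new FileSourceScanExec))
    println("ok")
  }
}
```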

Switch from getSimpleName to getName with endsWith for
CommandResultExec/ExecutedCommandExec detection in
tagPlanForGpu(). Full package name matching is more robust
than simple name — getSimpleName can throw InternalError
for some inner/anonymous classes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Allen Xu <allxu@nvidia.com>
@wjxiz1992
Collaborator Author

build

Collaborator

@firestarman firestarman left a comment


LGTM

@wjxiz1992 wjxiz1992 merged commit d4d2e1e into NVIDIA:main Mar 25, 2026
47 checks passed
@sameerz sameerz added the bug Something isn't working label Mar 25, 2026
wjxiz1992 added a commit to wjxiz1992/spark-rapids that referenced this pull request Mar 30, 2026
The stash pop three-way merge re-introduced exclusions for NVIDIA#14098,
NVIDIA#14110, and NVIDIA#14116 that were already removed by merged PRs
NVIDIA#14446, NVIDIA#14398, and NVIDIA#14400. Remove them to match
origin/main.

Signed-off-by: Allen Xu <allxu@nvidia.com>
Made-with: Cursor

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[AutoSparkUT]"SPARK-19650: An action on a Command should not trigger a Spark job" in SQLQuerySuite failed

6 participants