Merged
15 changes: 15 additions & 0 deletions sql-plugin/src/main/scala/com/nvidia/spark/rapids/limit.scala
@@ -193,6 +193,21 @@ class GpuCollectLimitMeta(
override val childParts: scala.Seq[PartMeta[_]] =
Seq(GpuOverrides.wrapPart(collectLimit.outputPartitioning, conf, Some(this)))

  override def tagPlanForGpu(): Unit = {
    // Use full class name to avoid compile errors on Spark versions
    // where CommandResultExec (added in 3.2) does not exist.
    val childClass = collectLimit.child.getClass.getName
    if (childClass.endsWith(".execution.CommandResultExec") ||
        childClass.endsWith(".execution.command.ExecutedCommandExec")) {
      willNotWorkOnGpu(
        s"child ${collectLimit.child.getClass.getSimpleName}" +
          " already provides pre-computed results; replacing" +
          " CollectLimit would trigger an unnecessary Spark job")
    }
Comment on lines +196 to +208
Copilot AI Mar 11, 2026

Using getClass.getSimpleName string comparisons to detect child plan types is brittle (renames/relocation across Spark versions, subclasses/proxies) and makes the logic harder to maintain. Prefer type-based matching (e.g., collectLimit.child match { case _: CommandResultExec | _: ExecutedCommandExec | _: LocalTableScanExec => ... }) or a shim/helper that checks classes directly, and import the relevant Spark exec classes instead of comparing names as strings.

Collaborator Author


This is the same concern addressed in 590dd86. String matching is intentional: CommandResultExec was introduced in Spark 3.2, and limit.scala is cross-version common code — importing it directly would cause compile failures on older Spark shims. The comment added in 590dd86 explains this design choice.
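The cross-version constraint can be illustrated in isolation. This is a minimal sketch, not the plugin's code: `NameCheckSketch` and its methods are hypothetical names, and the point is only that the check compiles without importing any Spark class:

```scala
// Minimal sketch of the name-based check. Nothing Spark-specific is
// imported, which is exactly why code like this compiles on pre-3.2
// shims where org.apache.spark.sql.execution.CommandResultExec
// does not exist.
object NameCheckSketch {
  private val precomputedSuffixes = Seq(
    ".execution.CommandResultExec",
    ".execution.command.ExecutedCommandExec")

  // Match on the fully qualified class name alone.
  def isPrecomputedCommandClassName(className: String): Boolean =
    precomputedSuffixes.exists(className.endsWith)

  // What tagPlanForGpu effectively does with the child plan.
  def isPrecomputedCommand(plan: AnyRef): Boolean =
    isPrecomputedCommandClassName(plan.getClass.getName)
}
```

The trade-off is exactly the one Copilot flags: a rename or relocation in a future Spark release would silently stop matching, whereas a `classOf`-based check would fail at compile time.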

Collaborator


Maybe adding a list of supported commands in the shim layer would be better practice. I'm fine with the current code but would like to see other reviewers' opinions.
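A shim-layer variant of that idea might look roughly like the following; `LimitShims`, `precomputedResultExecs`, and `StandInShim` are hypothetical names for this sketch, not the plugin's actual shim API:

```scala
// Hypothetical shim trait: each Spark-version shim enumerates the exec
// classes whose results are already computed, so cross-version common
// code never has to name a version-specific class itself.
trait LimitShims {
  def precomputedResultExecs: Set[Class[_]]

  def isPrecomputed(plan: AnyRef): Boolean =
    precomputedResultExecs.exists(_.isInstance(plan))
}

// A 3.2+ shim would list classOf[CommandResultExec] and
// classOf[ExecutedCommandExec]; this stand-in uses classOf[String]
// only so the sketch runs without Spark on the classpath.
object StandInShim extends LimitShims {
  override val precomputedResultExecs: Set[Class[_]] = Set(classOf[String])
}
```

Because each shim module compiles against its own Spark version, the 3.2+ shim can import `CommandResultExec` directly, and the brittle suffix matching disappears from common code.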

Collaborator


And the line-limit issue still seems to be there, as in other recent PRs. (It's a nit for this PR, too.)

Collaborator Author


Agree that a shim-layer approach would be cleaner if the list grows. Right now it's only 2 types and both are stable Spark internals (CommandResultExec since 3.2, ExecutedCommandExec since early Spark), so the maintenance cost of string matching is low. Happy to refactor into a shim helper if other reviewers also prefer that — would make sense to do it once we need to add more types.

Collaborator Author

@wjxiz1992 Mar 18, 2026


No problem, but this has already passed IT and the line count isn't excessive. If new review comments come in, I'll make the line-length change along with that.

}

override def convertToGpu(): GpuExec =
GpuGlobalLimitExec(collectLimit.limit,
GpuShuffleExchangeExec(
@@ -224,7 +224,6 @@ class RapidsTestSettings extends BackendTestSettings {
.exclude("Common subexpression elimination", KNOWN_ISSUE("https://github.com/NVIDIA/spark-rapids/issues/14106"))
.exclude("SPARK-27619: When spark.sql.legacy.allowHashOnMapType is true, hash can be used on Maptype", KNOWN_ISSUE("https://github.com/NVIDIA/spark-rapids/issues/14108"))
.exclude("SPARK-17515: CollectLimit.execute() should perform per-partition limits", KNOWN_ISSUE("https://github.com/NVIDIA/spark-rapids/issues/14109"))
.exclude("SPARK-19650: An action on a Command should not trigger a Spark job", KNOWN_ISSUE("https://github.com/NVIDIA/spark-rapids/issues/14110"))
.exclude("SPARK-31594: Do not display the seed of rand/randn with no argument in output schema", ADJUST_UT("Replaced by testRapids version with a correct regex expression to match the projectExplainOutput, randn isn't supported now. See https://github.com/NVIDIA/spark-rapids/issues/11613"))
.exclude("normalize special floating numbers in subquery", KNOWN_ISSUE("https://github.com/NVIDIA/spark-rapids/issues/14116"))
.exclude("SPARK-33677: LikeSimplification should be skipped if pattern contains any escapeChar", KNOWN_ISSUE("https://github.com/NVIDIA/spark-rapids/issues/14117"))