[AutoSparkUT] Fix cached table zero-column scan crash (issue #14098)#14446

Merged
wjxiz1992 merged 3 commits into NVIDIA:main from wjxiz1992:fix/14098-no-columns-from-cache
Mar 24, 2026

Conversation

@wjxiz1992
Collaborator

Summary

  • Fix ParquetCachedBatchSerializer crash when a cached table is scanned with zero selected columns (e.g., cross-join side that only needs row count)
  • Root cause: empty selectedAttributes was incorrectly treated as "select all columns", producing a full-column buffer that mismatched the broadcast exchange's empty output schema
  • Return row-only batches when no columns are selected, fixing both the GPU path (gpuConvertCachedBatchToColumnarBatch) and CPU fallback path (convertCachedBatchToColumnarBatch)

Test Plan

  • RapidsSQLQuerySuite passes with 234 tests, 0 failures (buildver=330)
  • "SPARK-6743: no columns from cache" test now passes on GPU — exclusion removed
  • No new test regressions

PR Traceability

| RAPIDS Test | Spark Original | Spark Source | Lines |
|---|---|---|---|
| RapidsSQLQuerySuite (inherited) | SPARK-6743: no columns from cache | sql/core/.../SQLQuerySuite.scala | 129-144 |

Performance

Cold-path-only change. Normal column selection (hot path) is unaffected. The zero-column edge case is now faster since it skips unnecessary Parquet decoding.

Checklists

- [ ] This PR has added documentation for new or modified features or behaviors.
- [x] This PR has added new tests or modified existing tests to cover new code paths.
- [x] Performance testing has been performed and its results are added in the PR description.

Closes #14098

🤖 Generated with Claude Code

…o-column scans (issue NVIDIA#14098)

When a cached table is used in a cross join and one side needs zero
columns (only row count), ParquetCachedBatchSerializer incorrectly
treated empty selectedAttributes as "select all columns". This caused
a column count mismatch when the broadcast exchange deserialized the
buffer with an empty output schema.

Return row-only batches when selectedAttributes is empty instead of
falling back to all cached columns. Fixes both the GPU path
(gpuConvertCachedBatchToColumnarBatch) and the CPU fallback path
(convertCachedBatchToColumnarBatch).
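The early-return pattern described above can be sketched in miniature. The snippet below is a toy, self-contained Scala model: `ParquetCachedBatch`, `ColumnarBatch`, and `convertCachedBatch` here are simplified stand-ins, not the real Spark or spark-rapids classes, and the normal decode path is deliberately left out.

```scala
// Toy stand-ins for the real Spark / spark-rapids types.
case class ParquetCachedBatch(numRows: Int)
case class ColumnarBatch(columns: Array[AnyRef], numRows: Int)

def convertCachedBatch(
    input: Seq[ParquetCachedBatch],
    selectedAttributes: Seq[String]): Seq[ColumnarBatch] = {
  // New fast path: no columns selected means no Parquet decoding is
  // needed; emit row-only batches that match the empty output schema.
  if (selectedAttributes.isEmpty) {
    return input.map(b => ColumnarBatch(Array.empty, b.numRows))
  }
  // The real implementation decodes the cached Parquet buffers here.
  sys.error("normal decode path not modeled in this sketch")
}

val out = convertCachedBatch(
  Seq(ParquetCachedBatch(10), ParquetCachedBatch(5)), Nil)
println(out.map(_.numRows).sum)        // 15: total row count preserved
println(out.forall(_.columns.isEmpty)) // true: no columns materialized
```

The key point the sketch illustrates is that the row count survives even though no column data is ever touched, which is exactly the shape a zero-column projection expects.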

### Performance

Cold-path only change. Normal column selection (hot path) is
unaffected. The zero-column edge case is now faster since it skips
unnecessary Parquet decoding.

### Checklists

- [ ] This PR has added documentation for new or modified features or behaviors.
- [x] This PR has added new tests or modified existing tests to cover new code paths.
- [x] Performance testing has been performed and its results are added in the PR description.

Closes NVIDIA#14098

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Allen Xu <allxu@nvidia.com>
Copilot AI review requested due to automatic review settings March 20, 2026 09:24
Keep lines that are under 85 chars on a single line instead of
splitting them across multiple lines.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Allen Xu <allxu@nvidia.com>
Contributor

Copilot AI left a comment


Pull request overview

Fixes a crash in the Parquet-based cached table serializer when a cached table scan selects zero columns (e.g., count-only/cross-join side row-count usage), by returning row-only ColumnarBatches instead of incorrectly treating “no selected attributes” as “select all columns”.

Changes:

  • Add an explicit zero-selected-columns fast-path in ParquetCachedBatchSerializer for both GPU conversion and CPU conversion paths.
  • Remove the RapidsSQLQuerySuite exclusion for SPARK-6743: no columns from cache now that the bug is fixed.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| tests/src/test/spark330/scala/org/apache/spark/sql/rapids/utils/RapidsTestSettings.scala | Re-enables the Spark-derived unit test previously excluded due to the crash. |
| sql-plugin/src/main/scala/com/nvidia/spark/rapids/parquet/ParquetCachedBatchSerializer.scala | Returns row-only batches on zero-column scans and avoids unnecessary Parquet decoding / schema mismatch. |


Comment on lines +469 to +479
```scala
// When no columns are selected (e.g., count-only scan or
// cross-join side that needs only row count), return
// row-only batches without decoding parquet data.
if (selectedAttributes.isEmpty) {
  return input.map {
    case parquetCB: ParquetCachedBatch =>
      new ColumnarBatch(Array.empty, parquetCB.numRows)
    case other =>
      throw new IllegalStateException(
        s"Expected ParquetCachedBatch but got ${other.getClass}")
  }
```

Copilot AI Mar 20, 2026


The zero-column fast-path mapping (selectedAttributes.isEmpty -> map ParquetCachedBatch to new ColumnarBatch(Array.empty, numRows)) is duplicated here and again in convertCachedBatchToColumnarBatch. Consider extracting a small private helper to keep behavior/exception text consistent and reduce the chance of one path diverging in future edits.

Collaborator Author


The two call sites differ in return type semantics (gpuConvertCachedBatchToColumnarBatch returns GPU-resident batches, convertCachedBatchToColumnarBatch returns host batches) so a shared helper would need to paper over that distinction. Given the logic is just new ColumnarBatch(Array.empty, numRows), the duplication is minimal and a helper would add more abstraction than it saves.
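For illustration only, the kind of helper under discussion might look like the toy sketch below. `ParquetCachedBatch` and `ColumnarBatch` are simplified stand-ins, not the real Spark types, and `toRowOnlyBatch` is a hypothetical name; the sketch mainly underscores how little shared logic such a helper would actually capture.

```scala
// Toy stand-ins for the real Spark / spark-rapids types.
case class ParquetCachedBatch(numRows: Int)
case class ColumnarBatch(columns: Array[AnyRef], numRows: Int)

// Hypothetical shared helper: the entire logic is mapping a cached
// batch to a row-only batch with the same row count.
def toRowOnlyBatch(cb: Any): ColumnarBatch = cb match {
  case p: ParquetCachedBatch => ColumnarBatch(Array.empty, p.numRows)
  case other => throw new IllegalStateException(
    s"Expected ParquetCachedBatch but got ${other.getClass}")
}

println(toRowOnlyBatch(ParquetCachedBatch(7)).numRows) // 7
```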

@greptile-apps
Contributor

greptile-apps bot commented Mar 20, 2026

Greptile Summary

This PR fixes a crash in ParquetCachedBatchSerializer when a cached table is scanned with zero selected columns (e.g., the broadcast side of a cross-join that only needs a row count). Both gpuConvertCachedBatchToColumnarBatch and convertCachedBatchToColumnarBatch previously "optimized" an empty selectedAttributes by substituting cacheAttributes (all columns), causing a full-column batch to be produced that mismatched the downstream consumer's empty schema. The fix adds an early return in both methods that directly maps each ParquetCachedBatch to ColumnarBatch(Array.empty, parquetCB.numRows), bypassing Parquet decoding entirely.

Key changes:

  • Replaces the misguided "select-all when none requested" pattern in both GPU and CPU paths with an early-exit that returns lightweight row-count-only batches
  • Correctly skips GpuSemaphore.acquireIfNecessary in the new path since no GPU resources are used
  • Removes the SPARK-6743: no columns from cache test exclusion, confirming the bug is resolved

Minor observation: The existing sizeInBytes == 0 branch inside convertCachedBatchToColumnarInternal (lines 497–504) was previously the secondary handler for zero-column cached batches, but is now unreachable via any normal code path: the only write-side producer of empty-buffer ParquetCachedBatch objects (numCols() == 0 caching branch) will also result in empty cacheAttributes, which triggers the new early-return before convertCachedBatchToColumnarInternal is ever called. This is harmless dead code but could be cleaned up in a follow-up.

Confidence Score: 5/5

  • This PR is safe to merge — the fix is minimal, well-targeted, and the normal (non-empty selection) hot path is completely unchanged.
  • Both changed code paths are correct: the early return produces the exact shape (ColumnarBatch with no columns and the right row count) that downstream consumers expect for a zero-column projection. No GPU semaphore is acquired (correct, since no GPU work is done), and existing test coverage directly validates the fix via SPARK-6743: no columns from cache. The only non-critical observation is that the sizeInBytes == 0 branch in convertCachedBatchToColumnarInternal is now dead code, which does not affect correctness.
  • No files require special attention.

Important Files Changed

| Filename | Overview |
|---|---|
| sql-plugin/src/main/scala/com/nvidia/spark/rapids/parquet/ParquetCachedBatchSerializer.scala | Adds zero-column early-return guards to both gpuConvertCachedBatchToColumnarBatch and convertCachedBatchToColumnarBatch; removes the old "optimize" pattern that incorrectly substituted all columns for an empty selection. A minor side effect is that the existing sizeInBytes == 0 branch in convertCachedBatchToColumnarInternal is now unreachable for the zero-column path, making it dead code. |
| tests/src/test/spark330/scala/org/apache/spark/sql/rapids/utils/RapidsTestSettings.scala | Removes the SPARK-6743: no columns from cache exclusion from RapidsSQLQuerySuite, correctly reflecting that the bug is now fixed. |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[convertCachedBatchToColumnarBatch\nor gpuConvertCachedBatchToColumnarBatch] --> B{selectedAttributes.isEmpty?}
    B -- Yes\n NEW path --> C[Return RDD mapping each\nParquetCachedBatch →\nColumnarBatch with Array.empty\nand parquetCB.numRows]
    B -- No\n Existing path --> D[getSupportedSchemaFromUnsupported\ncacheAttributes, selectedAttributes]
    D --> E{isSqlEnabled &&\nisSqlExecuteOnGPU &&\nisSchemaSupportedByCudf?}
    E -- Yes --> F[convertCachedBatchToColumnarInternal\nGPU decode path]
    F --> G[Copy GPU→Host columns\nwrap in CloseableColumnBatchIterator]
    E -- No --> H[CachedBatchIteratorConsumer\nCPU Parquet decode path]
    C --> I[Consumer receives\n0-column batch with\ncorrect row count ✓]
    G --> I
    H --> I
```

Last reviewed commit: "Restore original mul..."

Comment on lines +570 to 578
```scala
if (selectedAttributes.isEmpty) {
  return input.map {
    case parquetCB: ParquetCachedBatch =>
      new ColumnarBatch(Array.empty, parquetCB.numRows)
    case other =>
      throw new IllegalStateException(
        s"Expected ParquetCachedBatch but got ${other.getClass}")
  }
}
```
Contributor


P2 Inconsistency with CloseableColumnBatchIterator wrapping

The non-empty paths in convertCachedBatchToColumnarBatch both wrap their results in CloseableColumnBatchIterator (GPU path at line 592, CPU path via CachedBatchIteratorConsumer), but the new zero-column early-return does not. While this is functionally safe — ColumnarBatch(Array.empty, n) holds no closeable column vector resources — it is a structural inconsistency. Consider wrapping for uniformity:

Suggested change

```scala
if (selectedAttributes.isEmpty) {
  return input.map {
    case parquetCB: ParquetCachedBatch =>
      new ColumnarBatch(Array.empty, parquetCB.numRows)
    case other =>
      throw new IllegalStateException(
        s"Expected ParquetCachedBatch but got ${other.getClass}")
  }
}
```

would become:

```scala
// When no columns are selected, return row-only batches
if (selectedAttributes.isEmpty) {
  return input.mapPartitions { cbIter =>
    CloseableColumnBatchIterator(cbIter.map {
      case parquetCB: ParquetCachedBatch =>
        new ColumnarBatch(Array.empty, parquetCB.numRows)
      case other =>
        throw new IllegalStateException(
          s"Expected ParquetCachedBatch but got ${other.getClass}")
    })
  }
}
```

The same note applies to the analogous block in gpuConvertCachedBatchToColumnarBatch (lines 472–479).


Collaborator Author


As you noted, this is functionally safe — the empty ColumnarBatch holds no closeable resources, so wrapping it in CloseableColumnBatchIterator would be a no-op. Keeping the early return simple makes the intent clearer: no columns → no decoding, just row count.

@wjxiz1992 wjxiz1992 self-assigned this Mar 20, 2026
…rInternal call

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Allen Xu <allxu@nvidia.com>
@wjxiz1992
Collaborator Author

build

Collaborator

@firestarman firestarman left a comment


LGTM

@wjxiz1992 wjxiz1992 merged commit 6a9f655 into NVIDIA:main Mar 24, 2026
47 checks passed
@sameerz sameerz added the bug Something isn't working label Mar 25, 2026
wjxiz1992 added a commit to wjxiz1992/spark-rapids that referenced this pull request Mar 30, 2026
The stash pop three-way merge re-introduced exclusions for NVIDIA#14098,
NVIDIA#14110, and NVIDIA#14116 that were already removed by merged PRs NVIDIA#14446,
NVIDIA#14398, and NVIDIA#14400. Remove them to match origin/main.

Signed-off-by: Allen Xu <allxu@nvidia.com>
Made-with: Cursor

Development

Successfully merging this pull request may close these issues.

[AutoSparkUT]"SPARK-6743: no columns from cache" in SQLQuerySuite failed

5 participants