
Disable test_read_hive_fixed_length_char on Spark 3.4+. #8325

Merged (3 commits into NVIDIA:branch-23.06, May 19, 2023)

Conversation

mythrocks (Collaborator)

Fixes #8321.

This commit disables test_read_hive_fixed_length_char for Spark 3.4 until #8324 is resolved (i.e. the change in behaviour of CHAR columns is addressed).
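For context, a conditional skip like this is typically driven by a version gate. The sketch below is plain Python and purely illustrative; the helper names (`spark_version_tuple`, `should_skip_on_340_plus`) are assumptions, not the actual spark-rapids test utilities:

```python
# Hedged sketch of a version gate for skipping a test on Spark 3.4+.
# In the real integration tests, a check like this would feed a
# @pytest.mark.skipif(...) decorator; the names here are illustrative.

def spark_version_tuple(version_str):
    """Parse 'major.minor.patch' into a comparable tuple of ints."""
    return tuple(int(p) for p in version_str.split(".")[:3])

def should_skip_on_340_plus(version_str):
    """True on Spark 3.4.0 or later, where CHAR read behaviour changed."""
    return spark_version_tuple(version_str) >= (3, 4, 0)
```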

Signed-off-by: MithunR <[email protected]>
@mythrocks mythrocks self-assigned this May 18, 2023
mythrocks (Collaborator, Author)

Build

@sameerz sameerz added the Spark 3.4+ Spark 3.4+ issues label May 19, 2023
revans2 previously approved these changes May 19, 2023
tgravescs (Collaborator) commented May 19, 2023

So, to be clear: by disabling the test, do we fall back to the CPU here? I want to make sure, since #8324 is marked low priority. If that is the case, should we have a test verifying that it falls back? Maybe not required; just asking.

mythrocks (Collaborator, Author) commented May 19, 2023

It's not a fallback, exactly.

On 3.4, Spark adds a code-gen step to modify the results (i.e. pad the result column out to the required width). That step causes the Project exec to fall off the GPU:

+- *(1) Project [staticinvoke(class org.apache.spark.sql.catalyst.util.CharVarcharCodegenUtils, StringType, readSidePadding, foo#20, 3, true, false, true) AS foo#24]
   +- GpuColumnarToRow false
      +- GpuFileGpuScan orc spark_catalog.default.foobar[foo#20] Batched: true, DataFilters: [], Format: ORC, Location: InMemoryFileIndex[file:/tmp/foobar], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<foo:string>
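The padding behaviour itself is easy to illustrate. This is a plain-Python sketch of what the `readSidePadding` step does conceptually (not Spark code): values read from a `CHAR(n)` column are right-padded with spaces up to the declared width.

```python
# Conceptual sketch of Spark 3.4's read-side padding for CHAR(n) columns
# (the CharVarcharCodegenUtils.readSidePadding step in the plan above).
# Plain Python for illustration only.

def read_side_padding(value, char_width):
    """Right-pad a value with spaces to char_width, mirroring CHAR(n) reads."""
    if value is None:
        return None
    return value.ljust(char_width)

# A value 'fo' read from a CHAR(3) column comes back as 'fo ':
assert read_side_padding("fo", 3) == "fo "
```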

I don't know if it's worth adding a test verifying that Project isn't on GPU on 3.4. But I'm open to suggestions/improvements.

tgravescs (Collaborator)

OK, the main thing is that it falls back and doesn't fail. I'm fine with skipping the test, since we have the issue to track it.

mythrocks (Collaborator, Author)

Actually, you've convinced me. I've added an equivalent test for 3.4, allowing for ProjectExec fallback.

You're right: it would be best to codify that the GPU results match the CPU results, even when ProjectExec falls back.
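Conceptually, the fallback-tolerant test only needs to require identical rows from the CPU and GPU runs; which exec nodes fell back is checked separately by the framework. A minimal sketch of that row comparison, with illustrative helper names (the real spark-rapids test framework supplies its own assert utilities):

```python
# Hedged sketch of the core check in a fallback-tolerant test: the same
# query is collected on CPU and GPU, and the rows must match exactly even
# if part of the GPU plan (here, ProjectExec) ran on the CPU.
# Helper name is illustrative, not the spark-rapids API.

def assert_results_equal(cpu_rows, gpu_rows):
    """Require identical row sets regardless of where each exec ran."""
    assert sorted(cpu_rows) == sorted(gpu_rows), \
        f"CPU/GPU mismatch: {cpu_rows} vs {gpu_rows}"

# With read-side padding applied on both sides, CHAR(3) results agree:
cpu = ["foo", "ba ", "x  "]
gpu = ["x  ", "foo", "ba "]
assert_results_equal(cpu, gpu)
```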

@mythrocks mythrocks requested a review from tgravescs May 19, 2023 18:11
mythrocks (Collaborator, Author)

Build

@mythrocks mythrocks requested a review from tgravescs May 19, 2023 18:48
mythrocks (Collaborator, Author)

Build

@mythrocks mythrocks merged commit dc116c7 into NVIDIA:branch-23.06 May 19, 2023
Successfully merging this pull request may close these issues.

[BUG] test_read_hive_fixed_length_char integ test fails on Spark 3.4