
Disable test_read_hive_fixed_length_char on Spark 3.4+. #8325

Merged (3 commits into NVIDIA:branch-23.06, May 19, 2023)

Conversation

mythrocks (Collaborator)

Fixes #8321.

This commit disables test_read_hive_fixed_length_char for Spark 3.4 until #8324 is resolved (i.e. the change in behaviour of CHAR columns is addressed).
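For context, a conditional skip like this is typically driven by a version gate. The sketch below is plain Python and purely illustrative; the helper names (`spark_version_tuple`, `should_skip_on_340_plus`) are assumptions, not the actual spark-rapids test utilities:

```python
# Hedged sketch of a version gate for skipping a test on Spark 3.4+.
# In the real integration tests, a check like this would feed a
# @pytest.mark.skipif(...) decorator; the names here are illustrative.

def spark_version_tuple(version_str):
    """Parse 'major.minor.patch' into a comparable tuple of ints."""
    return tuple(int(p) for p in version_str.split(".")[:3])

def should_skip_on_340_plus(version_str):
    """True on Spark 3.4.0 or later, where CHAR read behaviour changed."""
    return spark_version_tuple(version_str) >= (3, 4, 0)
```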

Signed-off-by: MithunR <[email protected]>
@mythrocks mythrocks self-assigned this May 18, 2023
mythrocks (Collaborator, Author)

Build

@sameerz sameerz added the Spark 3.4+ Spark 3.4+ issues label May 19, 2023
revans2 previously approved these changes May 19, 2023
tgravescs (Collaborator) commented May 19, 2023

So, to be clear: by disabling the test, do we fall back to the CPU here? I want to make sure, since #8324 is marked low priority. If that is the case, should we have a test verifying that it falls back? Maybe not required; just asking.

mythrocks (Collaborator, Author) commented May 19, 2023

It's not a fallback, exactly.

On 3.4, Spark adds a code-gen step to modify the results (i.e. pad the result column out to the required width). That step causes the Project exec to fall off the GPU:

+- *(1) Project [staticinvoke(class org.apache.spark.sql.catalyst.util.CharVarcharCodegenUtils, StringType, readSidePadding, foo#20, 3, true, false, true) AS foo#24]
   +- GpuColumnarToRow false
      +- GpuFileGpuScan orc spark_catalog.default.foobar[foo#20] Batched: true, DataFilters: [], Format: ORC, Location: InMemoryFileIndex[file:/tmp/foobar], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<foo:string>
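The padding behaviour itself is easy to illustrate. This is a plain-Python sketch of what the `readSidePadding` step does conceptually (not Spark code): values read from a `CHAR(n)` column are right-padded with spaces up to the declared width.

```python
# Conceptual sketch of Spark 3.4's read-side padding for CHAR(n) columns
# (the CharVarcharCodegenUtils.readSidePadding step in the plan above).
# Plain Python for illustration only.

def read_side_padding(value, char_width):
    """Right-pad a value with spaces to char_width, mirroring CHAR(n) reads."""
    if value is None:
        return None
    return value.ljust(char_width)

# A value 'fo' read from a CHAR(3) column comes back as 'fo ':
assert read_side_padding("fo", 3) == "fo "
```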

I don't know if it's worth adding a test verifying that Project isn't on GPU on 3.4. But I'm open to suggestions/improvements.

tgravescs (Collaborator)

OK, the main thing is that it falls back and doesn't fail. I'm fine with skipping the test, since we have the issue to track it.

mythrocks (Collaborator, Author)

Actually, you've convinced me. I've added an equivalent test for 3.4, allowing for ProjectExec fallback.

You're right: it would be best to codify that the GPU results match the CPU results, even when ProjectExec falls back.
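Conceptually, the fallback-tolerant test only needs to require identical rows from the CPU and GPU runs; which exec nodes fell back is checked separately by the framework. A minimal sketch of that row comparison, with illustrative helper names (the real spark-rapids test framework supplies its own assert utilities):

```python
# Hedged sketch of the core check in a fallback-tolerant test: the same
# query is collected on CPU and GPU, and the rows must match exactly even
# if part of the GPU plan (here, ProjectExec) ran on the CPU.
# Helper name is illustrative, not the spark-rapids API.

def assert_results_equal(cpu_rows, gpu_rows):
    """Require identical row sets regardless of where each exec ran."""
    assert sorted(cpu_rows) == sorted(gpu_rows), \
        f"CPU/GPU mismatch: {cpu_rows} vs {gpu_rows}"

# With read-side padding applied on both sides, CHAR(3) results agree:
cpu = ["foo", "ba ", "x  "]
gpu = ["x  ", "foo", "ba "]
assert_results_equal(cpu, gpu)
```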

@mythrocks mythrocks requested a review from tgravescs May 19, 2023 18:11
mythrocks (Collaborator, Author)

Build

@mythrocks mythrocks requested a review from tgravescs May 19, 2023 18:48
mythrocks (Collaborator, Author)

Build

@mythrocks mythrocks merged commit dc116c7 into NVIDIA:branch-23.06 May 19, 2023
Successfully merging this pull request may close these issues.

[BUG] test_read_hive_fixed_length_char integ test fails on Spark 3.4