Rewrite two tests from AnsiCastOpSuite in Python and make compatible with Spark 3.4.0 #8102

andygrove · 2023-04-13T21:11:19Z

We had two tests that were failing with Spark 3.4.0. As discussed in the issue, the tests relied on some hacks to work around Scala Spark behavior, and moving these tests into Python avoids the need for these kinds of hacks.

Changes in this PR:

- Moves the two tests from Scala to Python
- AnsiUtil has a new isAnsiCast method
- ExecutionPlanCaptureCallback has a new assertContainsAnsiCast method
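
To illustrate the idea behind the two new helpers, here is a toy sketch in Python. It is not the plugin's code: `Expr`, `is_ansi_cast`, `contains_ansi_cast`, and `assert_contains_ansi_cast` are hypothetical names, and the sketch assumes an ANSI cast shows up either as a dedicated node or as a regular cast carrying an ANSI flag, depending on the Spark version.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Expr:
    """Toy expression node for illustration only; not a Spark class."""
    name: str
    ansi_enabled: bool = False
    children: List["Expr"] = field(default_factory=list)


def is_ansi_cast(e: Expr) -> bool:
    # Assumption: an ANSI cast is modelled either as a dedicated "AnsiCast"
    # node or as a "Cast" node with an ANSI flag set.
    return e.name == "AnsiCast" or (e.name == "Cast" and e.ansi_enabled)


def contains_ansi_cast(e: Expr) -> bool:
    # Depth-first search over the expression tree.
    return is_ansi_cast(e) or any(contains_ansi_cast(c) for c in e.children)


def assert_contains_ansi_cast(e: Expr) -> None:
    assert contains_ansi_cast(e), "plan does not contain an ANSI cast"
```

Keeping the version-specific check in one small predicate means the tree walk and the assertion can stay identical across versions, which is roughly the role a shim plays.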

Signed-off-by: Andy Grove <[email protected]>

andygrove · 2023-04-13T21:12:09Z

build

andygrove · 2023-04-13T22:15:47Z

build

andygrove · 2023-04-14T13:53:47Z

I have more shim work to do, so moving back to draft for now.

exec binary-dedupe.sh failed, exit code is 255, error msg is org/apache/spark/sql/rapids/ExecutionPlanCaptureCallback$.class is not bitwise-identical across shims


andygrove · 2023-04-14T14:06:25Z

build

andygrove · 2023-04-14T21:53:19Z

build

andygrove · 2023-04-14T22:20:10Z

build

andygrove · 2023-04-17T17:19:12Z

build

integration_tests/src/main/python/asserts.py


andygrove · 2023-04-18T16:30:46Z

build

sql-plugin/src/main/spark340/scala/com/nvidia/spark/rapids/shims/AnsiUtil.scala

sql-plugin/src/main/scala/org/apache/spark/sql/rapids/ExecutionPlanCaptureCallback.scala

integration_tests/src/main/python/asserts.py

andygrove · 2023-04-19T16:49:40Z

build

sql-plugin/src/main/scala/org/apache/spark/sql/rapids/ExecutionPlanCaptureCallback.scala

sql-plugin/src/main/spark311/scala/com/nvidia/spark/rapids/shims/AnsiCastShim.scala

andygrove · 2023-04-19T20:01:10Z

build

gerashegalov

LGTM

andygrove · 2023-04-19T23:25:00Z

build

gerashegalov · 2023-04-19T23:50:36Z

integration_tests/src/main/python/asserts.py

@@ -558,7 +558,36 @@ def run_on_gpu():
 def run_with_cpu_and_gpu(func,
    mode,
    conf={}):
-    from_cpu, _, from_gpu, _ = run_with_cpu_and_gpu_return_df(func, mode, conf)


I don't quite understand the fix yet.

The test failure was

> from_cpu, cpu_df = with_cpu_session(bring_back, conf=conf)
[2023-04-19T22:24:58.252Z] E ValueError: not enough values to unpack (expected 2, got 1)
../../src/main/python/asserts.py:533: ValueError

which points to

spark-rapids/integration_tests/src/main/python/asserts.py

Line 533 in 9cc035f

from_cpu, cpu_df = with_cpu_session(bring_back, conf=conf)

Yes, I added run_with_cpu_and_gpu_return_df in this PR to support the new tests. Because it was largely the same as run_with_cpu_and_gpu, I thought I would be smart and refactor run_with_cpu_and_gpu to call run_with_cpu_and_gpu_return_df and then throw away the unwanted values:

from_cpu, _, from_gpu, _ = run_with_cpu_and_gpu_return_df(func, mode, conf)

However, this was only safe for tests with the mode COLLECT_WITH_DATAFRAME because in other cases only one value is returned from with_cpu_session.

So run_with_cpu_and_gpu_return_df is possibly redundant since it should just return the tuple in this case? I will take a look tomorrow.

Yeah, this new function was not needed ... I just needed to call run_with_cpu_and_gpu with COLLECT_WITH_DATAFRAME. I have pushed changes for this.
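
The failure mode discussed in this thread can be reproduced with a minimal, self-contained sketch. The functions below are hypothetical stand-ins, not the code in asserts.py: `collect` plays the role of `with_cpu_session(bring_back, conf=conf)`, and its return shapes are assumed from the discussion above (a pair only in the COLLECT_WITH_DATAFRAME mode).

```python
def collect(mode):
    """Hypothetical stand-in for with_cpu_session(bring_back, conf=conf)."""
    rows = [("a", 1)]  # a single collected row
    if mode == "COLLECT_WITH_DATAFRAME":
        return rows, "<dataframe>"  # (rows, DataFrame) pair
    return rows  # rows only


def run_return_df(mode):
    # Always unpacks two values, so it is only safe for COLLECT_WITH_DATAFRAME.
    rows, df = collect(mode)
    return rows, df


def run(mode):
    # Returns whatever the mode produces; a caller that needs the DataFrame
    # passes COLLECT_WITH_DATAFRAME and unpacks the pair itself.
    return collect(mode)


def try_unpack(mode):
    # Report the unpacking error as a string instead of raising it.
    try:
        run_return_df(mode)
        return "ok"
    except ValueError as e:
        return str(e)
```

In this sketch, `try_unpack("COLLECT")` hits the same kind of `ValueError` as the CI log because a one-row list is unpacked into two names, while `run("COLLECT_WITH_DATAFRAME")` returns the pair without needing a separate helper.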

andygrove · 2023-04-20T14:01:52Z

build

andygrove added 6 commits April 13, 2023 13:05

Port Hive ansi tests to Python

e1fabbc

Merge remote-tracking branch 'nvidia/branch-23.06' into hive-ansi-340

330ee99

Signed-off-by: Andy Grove <[email protected]>

tests pass with 330

f11cb13

tests pass with 321

d1cff2c

remove debug println

25e1dfb

add newline

9c58435

andygrove added 2 commits April 13, 2023 15:35

scalastyle

f0bb9f8

revert some shim changes

b927b2a

andygrove changed the title ~~WIP: Rewrite two tests from AnsiCastOpSuite in Python and make compatible with Spark 3.4.0~~ Rewrite two tests from AnsiCastOpSuite in Python and make compatible with Spark 3.4.0 Apr 13, 2023

andygrove marked this pull request as draft April 14, 2023 13:53

remove UnaryExecNode handling from ExecutionPlanCaptureCallback.containsPlanMatching

ddd4094

andygrove marked this pull request as ready for review April 14, 2023 16:36

andygrove requested review from NVnavkumar and nartal1 April 14, 2023 16:36

andygrove linked an issue Apr 14, 2023 that may be closed by this pull request

[BUG] Unit tests failure in AnsiCastOpSuite on Spark-3.4 #7757

Closed

upmerge

3613f9f

update Arm usage in 340 AnsiUtil shim

395f7d9

sameerz added the Spark 3.4+ label Apr 16, 2023

andygrove self-assigned this Apr 17, 2023

Merge remote-tracking branch 'nvidia/branch-23.06' into hive-ansi-340

5a3d657

gerashegalov reviewed Apr 18, 2023

integration_tests/src/main/python/asserts.py (outdated, resolved)

andygrove added 2 commits April 18, 2023 07:26

refactor to reduce boilerplate, and rename assert_cpu_and_gpu_write_contains_ansi_cast to assert_cpu_and_gpu_contains_ansi_cast

c51ff13

check both cpu and gpu for ansi cast

6ad466b

andygrove added 2 commits April 18, 2023 08:02

add new assert function run_with_cpu_and_gpu_return_df

09ffa9a

revert unused helper function

f91992d

gerashegalov reviewed Apr 18, 2023

sql-plugin/src/main/spark340/scala/com/nvidia/spark/rapids/shims/AnsiUtil.scala (outdated, resolved)

sql-plugin/src/main/scala/org/apache/spark/sql/rapids/ExecutionPlanCaptureCallback.scala (outdated, resolved)

gerashegalov reviewed Apr 18, 2023

integration_tests/src/main/python/asserts.py (outdated, resolved)

andygrove added 4 commits April 19, 2023 10:11

Add PlanShims.isAnsiCast

6ebf78c

use placeholder for unused vars in python code

9cc035f

Refactor shims to avoid duplicating code

bd1d98e

remove unused import, update copyright years

2946a8e

gerashegalov reviewed Apr 19, 2023

sql-plugin/src/main/scala/org/apache/spark/sql/rapids/ExecutionPlanCaptureCallback.scala (outdated, resolved)

sql-plugin/src/main/spark311/scala/com/nvidia/spark/rapids/shims/AnsiCastShim.scala (resolved)

more shim refactoring

b97b011

gerashegalov previously approved these changes Apr 19, 2023


fix regression

87ad057

andygrove dismissed gerashegalov’s stale review via 87ad057 April 19, 2023 23:23

Merge remote-tracking branch 'nvidia/branch-23.06' into hive-ansi-340

d481567

gerashegalov reviewed Apr 20, 2023


remove run_with_cpu_and_gpu_return_df

eed2aa4

andygrove mentioned this pull request Apr 20, 2023

[BUG] get-shim-versions-from-dist workflow failing in CI #8153

Closed

Merge remote-tracking branch 'nvidia/branch-23.06' into hive-ansi-340

17951ba

gerashegalov approved these changes Apr 20, 2023


andygrove merged commit 4cd299b into NVIDIA:branch-23.06 Apr 20, 2023

andygrove deleted the hive-ansi-340 branch April 20, 2023 17:04

pxLi mentioned this pull request Apr 21, 2023

[BUG] failed AnsiCastShim build in datasbricks 11.3 runtime #8164

Closed

mattahrens added the feature request label Apr 21, 2023
