Rewrite two tests from AnsiCastOpSuite in Python and make compatible with Spark 3.4.0 #8102

Merged
merged 25 commits into from
Apr 20, 2023

Conversation

andygrove
Contributor

@andygrove andygrove commented Apr 13, 2023

Closes #7757

We had two tests that were failing with Spark 3.4.0. As discussed in the issue, the tests relied on some hacks to work around Scala Spark behavior, and moving these tests into Python avoids the need for these kinds of hacks.

Changes in this PR:

  • Moves the two tests from Scala to Python
  • AnsiUtil has new isAnsiCast method
  • ExecutionPlanCaptureCallback has new assertContainsAnsiCast method

@andygrove
Contributor Author

build

@andygrove
Contributor Author

build

@andygrove andygrove changed the title from "WIP: Rewrite two tests from AnsiCastOpSuite in Python and make compatible with Spark 3.4.0" to "Rewrite two tests from AnsiCastOpSuite in Python and make compatible with Spark 3.4.0" Apr 13, 2023
@andygrove
Contributor Author

I have more shim work to do, so moving back to draft for now.

exec binary-dedupe.sh failed, exit code is 255, error msg is org/apache/spark/sql/rapids/ExecutionPlanCaptureCallback$.class is not bitwise-identical across shims

@andygrove andygrove marked this pull request as draft April 14, 2023 13:53
@andygrove
Contributor Author

build

@andygrove andygrove marked this pull request as ready for review April 14, 2023 16:36
@andygrove andygrove requested review from NVnavkumar and nartal1 April 14, 2023 16:36
@andygrove andygrove linked an issue Apr 14, 2023 that may be closed by this pull request
@andygrove
Contributor Author

build

@andygrove
Contributor Author

build

@sameerz sameerz added the Spark 3.4+ Spark 3.4+ issues label Apr 16, 2023
@andygrove andygrove self-assigned this Apr 17, 2023
@andygrove
Contributor Author

build

@andygrove
Contributor Author

build

@andygrove
Contributor Author

build

@andygrove
Contributor Author

build

gerashegalov
gerashegalov previously approved these changes Apr 19, 2023
Collaborator

@gerashegalov gerashegalov left a comment

LGTM

@andygrove
Contributor Author

build

@@ -558,7 +558,36 @@ def run_on_gpu():
def run_with_cpu_and_gpu(func,
        mode,
        conf={}):
    from_cpu, _, from_gpu, _ = run_with_cpu_and_gpu_return_df(func, mode, conf)
Collaborator

I don't quite understand the fix yet.

The test failure was

>       from_cpu, cpu_df = with_cpu_session(bring_back, conf=conf)
[2023-04-19T22:24:58.252Z] E       ValueError: not enough values to unpack (expected 2, got 1)
../../src/main/python/asserts.py:533: ValueError

which points to

from_cpu, cpu_df = with_cpu_session(bring_back, conf=conf)

Contributor Author

Yes, I added run_with_cpu_and_gpu_return_df in this PR to support the new tests. Because it was largely the same as run_with_cpu_and_gpu, I thought I would be smart and refactor run_with_cpu_and_gpu to call run_with_cpu_and_gpu_return_df and then throw away the unwanted values:

from_cpu, _, from_gpu, _ = run_with_cpu_and_gpu_return_df(func, mode, conf)

However, this was only safe for tests with the mode COLLECT_WITH_DATAFRAME because in other cases only one value is returned from with_cpu_session.
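
The failure mode described above can be reproduced with a small toy sketch. Everything here is a hypothetical stand-in (`bring_back`, `run_return_df`, `demo`, and the string mode names are illustrative, not the real spark-rapids helpers); it only assumes, as stated in the comment, that the session helper returns a `(results, df)` pair in `COLLECT_WITH_DATAFRAME` mode and a single value otherwise.

```python
# Toy reproduction; all names below are hypothetical stand-ins,
# not the actual helpers in asserts.py.

def bring_back(mode):
    # Pretend this is the result of collecting a one-row DataFrame.
    rows = [("a", 1)]
    if mode == "COLLECT_WITH_DATAFRAME":
        # Only this mode returns a (results, dataframe) pair.
        return rows, "<dataframe>"
    # Every other mode returns just the results.
    return rows


def run_return_df(mode):
    # Unconditionally unpacking two values is only safe when the
    # mode actually produces a pair.
    from_cpu, cpu_df = bring_back(mode)
    return from_cpu, cpu_df


def demo(mode):
    # Return the result, or the error message if unpacking fails.
    try:
        return run_return_df(mode)
    except ValueError as e:
        return str(e)


print(demo("COLLECT_WITH_DATAFRAME"))
print(demo("COLLECT"))
```

In the second call the one-element list is what gets unpacked, which is why the message reports one value where two were expected.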

Contributor Author

So run_with_cpu_and_gpu_return_df is possibly redundant since it should just return the tuple in this case? I will take a look tomorrow.

Contributor Author

Yeah, this new function was not needed ... I just needed to call run_with_cpu_and_gpu with COLLECT_WITH_DATAFRAME. I have pushed changes for this.
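
A rough sketch of that resolution, under the assumption that a single runner returns whatever shape the mode produces and that callers wanting the DataFrame unpack the pair themselves. `FakeDataFrame`, `run_on`, `make_df`, and the mode constants are hypothetical, and this `run_with_cpu_and_gpu` is a simplified stand-in that omits `conf`; the real helper may differ.

```python
# Toy sketch; hypothetical stand-ins, not the real spark-rapids code.
COLLECT = "COLLECT"
COLLECT_WITH_DATAFRAME = "COLLECT_WITH_DATAFRAME"


class FakeDataFrame:
    """Minimal stand-in for a DataFrame tied to a CPU or GPU session."""

    def __init__(self, backend, rows):
        self.backend = backend
        self.rows = rows

    def collect(self):
        return list(self.rows)


def run_on(backend, func, mode):
    df = func(backend)
    if mode == COLLECT_WITH_DATAFRAME:
        # Return the DataFrame too, so the caller can inspect its plan.
        return df.collect(), df
    return df.collect()


def run_with_cpu_and_gpu(func, mode):
    # One function for every mode; no separate *_return_df variant.
    return run_on("cpu", func, mode), run_on("gpu", func, mode)


def make_df(backend):
    return FakeDataFrame(backend, [(1,), (2,)])


# Ordinary test: only the collected rows are needed.
from_cpu, from_gpu = run_with_cpu_and_gpu(make_df, COLLECT)

# Test that also needs the DataFrames: the caller unpacks each pair.
(cpu_rows, cpu_df), (gpu_rows, gpu_df) = run_with_cpu_and_gpu(
    make_df, COLLECT_WITH_DATAFRAME)
```

Keeping the unpacking at the call site means the shared runner never has to guess how many values a given mode returns.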

@andygrove
Contributor Author

build

@andygrove andygrove merged commit 4cd299b into NVIDIA:branch-23.06 Apr 20, 2023
@andygrove andygrove deleted the hive-ansi-340 branch April 20, 2023 17:04
@mattahrens mattahrens added the feature request New feature or request label Apr 21, 2023
Labels
feature request New feature or request Spark 3.4+ Spark 3.4+ issues
Development

Successfully merging this pull request may close these issues.

[BUG] Unit tests failure in AnsiCastOpSuite on Spark-3.4
5 participants