Rewrite two tests from AnsiCastOpSuite in Python and make compatible with Spark 3.4.0 #8102
Conversation
I have more shim work to do, so moving back to draft for now.
…ontains_ansi_cast to assert_cpu_and_gpu_contains_ansi_cast
sql-plugin/src/main/spark340/scala/com/nvidia/spark/rapids/shims/AnsiUtil.scala
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/ExecutionPlanCaptureCallback.scala
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/ExecutionPlanCaptureCallback.scala
sql-plugin/src/main/spark311/scala/com/nvidia/spark/rapids/shims/AnsiCastShim.scala
LGTM
@@ -558,7 +558,36 @@ def run_on_gpu():
def run_with_cpu_and_gpu(func,
                         mode,
                         conf={}):
    from_cpu, _, from_gpu, _ = run_with_cpu_and_gpu_return_df(func, mode, conf)
I don't quite understand the fix yet. The test failure was

> from_cpu, cpu_df = with_cpu_session(bring_back, conf=conf)
[2023-04-19T22:24:58.252Z] E   ValueError: not enough values to unpack (expected 2, got 1)
../../src/main/python/asserts.py:533: ValueError

which points to

from_cpu, cpu_df = with_cpu_session(bring_back, conf=conf)
Yes, I added run_with_cpu_and_gpu_return_df in this PR to support the new tests. Because it was largely the same as run_with_cpu_and_gpu, I thought I would be smart and refactor run_with_cpu_and_gpu to call run_with_cpu_and_gpu_return_df and then throw away the unwanted values:

from_cpu, _, from_gpu, _ = run_with_cpu_and_gpu_return_df(func, mode, conf)

However, this was only safe for tests with the mode COLLECT_WITH_DATAFRAME, because in other cases only one value is returned from with_cpu_session.
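The failure mode described above can be sketched with a minimal, self-contained example. The helper names mirror the real test utilities, but the bodies below are simplified hypothetical stand-ins, not the actual plugin code:

```python
# Minimal sketch of the unpacking bug: run_with_cpu_and_gpu_return_df
# assumes with_cpu_session hands back a (rows, dataframe) pair, but in
# other modes only one value comes back. Helper bodies are stand-ins.

COLLECT_WITH_DATAFRAME = "COLLECT_WITH_DATAFRAME"
COLLECT = "COLLECT"

def with_cpu_session(bring_back, conf={}):
    # Stand-in: just invokes the callback; the real helper runs it
    # inside a Spark session with the given conf.
    return bring_back()

def run_with_cpu_and_gpu_return_df(func, mode, conf):
    if mode == COLLECT_WITH_DATAFRAME:
        bring_back = lambda: ([1, 2, 3], "fake_df")   # (rows, dataframe) pair
    else:
        bring_back = lambda: ([1, 2, 3],)             # only the collected rows
    # This two-way unpack is only safe in COLLECT_WITH_DATAFRAME mode:
    from_cpu, cpu_df = with_cpu_session(bring_back, conf=conf)
    return from_cpu, cpu_df, from_cpu, cpu_df

# Safe:
rows, df, _, _ = run_with_cpu_and_gpu_return_df(None, COLLECT_WITH_DATAFRAME, {})
assert rows == [1, 2, 3]

# Reproduces the reported failure:
try:
    run_with_cpu_and_gpu_return_df(None, COLLECT, {})
except ValueError as e:
    print(e)  # not enough values to unpack (expected 2, got 1)
```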
So run_with_cpu_and_gpu_return_df is possibly redundant, since it should just return the tuple in this case? I will take a look tomorrow.
Yeah, this new function was not needed ... I just needed to call run_with_cpu_and_gpu with COLLECT_WITH_DATAFRAME. I have pushed changes for this.
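In sketch form, the corrected call shape is just the following (a hypothetical simplified signature mirroring the thread, not the actual test utility):

```python
# Hypothetical sketch: after the fix, callers that need the captured
# DataFrames call run_with_cpu_and_gpu directly in DATAFRAME mode, and
# the extra wrapper run_with_cpu_and_gpu_return_df is removed entirely.

COLLECT_WITH_DATAFRAME = "COLLECT_WITH_DATAFRAME"

def run_with_cpu_and_gpu(func, mode, conf={}):
    # Stand-in body: the return shape depends on the mode requested,
    # so no wrapper that unconditionally unpacks a pair is needed.
    if mode == COLLECT_WITH_DATAFRAME:
        return ([1, 2, 3], "cpu_df", [4, 5, 6], "gpu_df")
    return ([1, 2, 3], [4, 5, 6])

# Callers that need the DataFrames request them via the mode:
from_cpu, cpu_df, from_gpu, gpu_df = run_with_cpu_and_gpu(None, COLLECT_WITH_DATAFRAME)
```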
Closes #7757
We had two tests that were failing with Spark 3.4.0. As discussed in the issue, the tests relied on some hacks to work around Scala Spark behavior, and moving these tests into Python avoids the need for this kind of hack.
Changes in this PR:
- AnsiUtil has a new isAnsiCast method
- ExecutionPlanCaptureCallback has a new assertContainsAnsiCast method