
Handle PySparkException in case of literal expressions [databricks] #8192

Merged — 4 commits merged into NVIDIA:branch-23.06 from pyspark-exception on May 2, 2023

Conversation

razajafri (Collaborator) commented Apr 27, 2023

Spark 3.4.0, and possibly later versions, throws a PySparkException instead of a Py4JJavaError, but only when the expression is a literal. To handle this, this PR passes the expected exception message as a parameter to the test function and splits the expression tests from the literal tests.

We also renamed assert_py4j_exception to assert_spark_exception, since it now handles PySparkException in addition to Py4JJavaError.

fixes #8160
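For context on the rename above, here is a simplified, standalone sketch of what a version-agnostic assertion helper can look like. The real helper lives in asserts.py and uses pytest.raises (as visible in the CI log quoted later in this thread); this sketch avoids the pytest dependency and is illustrative only:

```python
def assert_spark_exception(func, error_message):
    """Assert that calling func raises an exception whose text contains
    error_message.

    This style of check works for both py4j.protocol.Py4JJavaError
    (Spark < 3.4) and the PySparkException subclasses raised by
    Spark 3.4+ for literal expressions, because both render the Spark
    error message in their string form.
    """
    try:
        func()
    except Exception as e:
        # Roughly mirror pytest's ExceptionInfo.exconly(): "module.Type: message"
        actual_error = f"{type(e).__module__}.{type(e).__name__}: {e}"
        assert error_message in actual_error, \
            f"Expected error '{error_message}' did not appear in '{actual_error}'"
    else:
        raise AssertionError(
            f"Expected error '{error_message}' but no exception was raised")
```

A caller then passes a message fragment that is stable across Spark versions, rather than a fully qualified Python exception class name.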

Pass exception message as param to the test

Signed-off-by: Raza Jafri <[email protected]>
@sameerz sameerz added the Spark 3.4+ Spark 3.4+ issues label Apr 27, 2023
@razajafri razajafri changed the title Handle PySparkException in case of literal expressions Handle PySparkException in case of literal expressions [databricks] Apr 27, 2023
@razajafri razajafri requested a review from jlowe April 28, 2023 16:01
jlowe
jlowe previously approved these changes Apr 28, 2023
razajafri (Collaborator, Author):

build

razajafri (Collaborator, Author):

CI failed due to

[2023-04-28T18:07:34.499Z] subprocess.CalledProcessError: Command 'ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null [email protected] -p 2200 -i **** 'SPARKSRCTGZ=/home/ubuntu/spark-rapids-ci.tgz BASE_SPARK_VERSION=3.3.0 BASE_SPARK_VERSION_TO_INSTALL_DATABRICKS_JARS=3.3.0 MVN_OPT=         bash /home/ubuntu/build.sh  2>&1 | tee buildout; if [ `echo ${PIPESTATUS[0]}` -ne 0 ]; then false; else true; fi'' returned non-zero exit status 1.

razajafri (Collaborator, Author):

build

razajafri (Collaborator, Author):

There seems to be a connectivity issue with the Databricks node that the CI runs tests on.

[2023-04-28T21:51:25.035Z] ERROR ../../src/main/python/prune_partition_column_test.py::test_prune_partition_column_when_fallback_filter_project[c-orc-True][ALLOW_NON_GPU(FilterExec)] - ConnectionRefusedError: [Errno 111] Connection refused

sameerz (Collaborator) commented May 1, 2023

build

NVnavkumar (Collaborator):

Looks like Databricks 11.3 doesn't throw a PySpark exception (unlike Spark 3.4.0); it still throws SparkArithmeticException:

[2023-05-01T21:21:59.670Z] =================================== FAILURES ===================================
[2023-05-01T21:21:59.670Z] _ test_div_overflow_exception_when_ansi_literal[True-CAST(-9223372036854775808L as LONG) DIV -1] _
[2023-05-01T21:21:59.670Z] [gw1] linux -- Python 3.8.10 /usr/bin/python
[2023-05-01T21:21:59.670Z]
[2023-05-01T21:21:59.670Z] expr = 'CAST(-9223372036854775808L as LONG) DIV -1', ansi_enabled = True
[2023-05-01T21:21:59.670Z]
[2023-05-01T21:21:59.670Z]     @pytest.mark.skipif(is_before_spark_320(), reason='https://github.com/apache/spark/pull/32260')
[2023-05-01T21:21:59.670Z]     @pytest.mark.parametrize('expr', ['CAST(-9223372036854775808L as LONG) DIV -1'])
[2023-05-01T21:21:59.670Z]     @pytest.mark.parametrize('ansi_enabled', [False, True])
[2023-05-01T21:21:59.670Z]     def test_div_overflow_exception_when_ansi_literal(expr, ansi_enabled):
[2023-05-01T21:21:59.670Z] >       _div_overflow_exception_when(expr, ansi_enabled, is_lit=True)
[2023-05-01T21:21:59.670Z]
[2023-05-01T21:21:59.670Z] ../../src/main/python/arithmetic_ops_test.py:1053:
[2023-05-01T21:21:59.670Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[2023-05-01T21:21:59.670Z] ../../src/main/python/arithmetic_ops_test.py:1028: in _div_overflow_exception_when
[2023-05-01T21:21:59.670Z]     assert_gpu_and_cpu_error(
[2023-05-01T21:21:59.670Z] ../../src/main/python/asserts.py:626: in assert_gpu_and_cpu_error
[2023-05-01T21:21:59.670Z]     assert_spark_exception(lambda: with_cpu_session(df_fun, conf), error_message)
[2023-05-01T21:21:59.670Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[2023-05-01T21:21:59.670Z]
[2023-05-01T21:21:59.670Z] func = <function assert_gpu_and_cpu_error.<locals>.<lambda> at 0x7f1a66316e50>
[2023-05-01T21:21:59.670Z] error_message = 'pyspark.errors.exceptions.captured.ArithmeticException: [ARITHMETIC_OVERFLOW] Overflow in integral divide'
[2023-05-01T21:21:59.670Z]
[2023-05-01T21:21:59.670Z]     def assert_spark_exception(func, error_message):
[2023-05-01T21:21:59.670Z]         """
[2023-05-01T21:21:59.670Z]         Assert that a specific Java exception is thrown
[2023-05-01T21:21:59.670Z]         :param func: a function to be verified
[2023-05-01T21:21:59.670Z]         :param error_message: a string such as the one produced by java.lang.Exception.toString
[2023-05-01T21:21:59.670Z]         :return: Assertion failure if no exception matching error_message has occurred.
[2023-05-01T21:21:59.670Z]         """
[2023-05-01T21:21:59.670Z]         with pytest.raises(Exception) as excinfo:
[2023-05-01T21:21:59.670Z]             func()
[2023-05-01T21:21:59.670Z]         actual_error = excinfo.exconly()
[2023-05-01T21:21:59.670Z] >       assert error_message in actual_error, f"Expected error '{error_message}' did not appear in '{actual_error}'"
[2023-05-01T21:21:59.670Z] E       AssertionError: Expected error 'pyspark.errors.exceptions.captured.ArithmeticException: [ARITHMETIC_OVERFLOW] Overflow in integral divide' did not appear in 'py4j.protocol.Py4JJavaError: An error occurred while calling o321840.collectToPython.
[2023-05-01T21:21:59.671Z] E       : org.apache.spark.SparkArithmeticException: [ARITHMETIC_OVERFLOW] Overflow in integral divide. Use 'try_divide' to tolerate overflow and return NULL instead. If necessary set spark.sql.ansi.enabled to "false" to bypass this error.
[2023-05-01T21:21:59.671Z] E       == SQL(line 1, position 1) ==
[2023-05-01T21:21:59.671Z] E       CAST(-9223372036854775808L as LONG) DIV -1
[2023-05-01T21:21:59.671Z] E       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
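The failure above happens because the test's expected message names the fully qualified Python exception class, which differs between the two stacks. One way to make such a test pass on both — an illustrative sketch, not necessarily the exact change made in this PR — is to assert on the error fragment shared by both renderings:

```python
# The two renderings of the same ANSI overflow error, taken from the CI log:
py4j_text = (
    "py4j.protocol.Py4JJavaError: An error occurred while calling "
    "o321840.collectToPython.\n"
    ": org.apache.spark.SparkArithmeticException: [ARITHMETIC_OVERFLOW] "
    "Overflow in integral divide."
)
pyspark_text = (
    "pyspark.errors.exceptions.captured.ArithmeticException: "
    "[ARITHMETIC_OVERFLOW] Overflow in integral divide"
)

# Matching on the fully qualified PySpark exception class only works on
# stacks that actually raise it (Spark 3.4+), not on Databricks 11.3:
pyspark_class = pyspark_text.split(":")[0]
assert pyspark_class not in py4j_text

# The bracketed error class plus message is common to both renderings, so a
# test that passes this fragment as its expected error_message works on both:
fragment = "[ARITHMETIC_OVERFLOW] Overflow in integral divide"
assert fragment in py4j_text
assert fragment in pyspark_text
```

The design trade-off is looser matching: the fragment no longer pins down which Python-side exception type surfaced, only that the Spark error itself was the expected one.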

Signed-off-by: Raza Jafri <[email protected]>
razajafri (Collaborator, Author):

build

@razajafri razajafri requested a review from jlowe May 2, 2023 04:50
NVnavkumar (Collaborator) left a comment:

LGTM

@razajafri razajafri merged commit 8a6ce2c into NVIDIA:branch-23.06 May 2, 2023
@razajafri razajafri deleted the pyspark-exception branch May 2, 2023 17:26
Labels: Spark 3.4+ (Spark 3.4+ issues)
Linked issue closed by this PR: [BUG] Arithmetic_ops_test failure for Spark 3.4 (#8160)
5 participants