[SPARK-19926][PYSPARK] make captured exception from JVM side user friendly #25814

advancedxy · 2019-09-17T08:50:19Z

What changes were proposed in this pull request?

The str of CapaturedException is now returned by str(self.desc) rather than repr(self.desc), which is more user-friendly. It also handles unicode under python2 specially.

Why are the changes needed?

This is an improvement, and makes exception more human readable in python side.

Does this PR introduce any user-facing change?

Before this pr, select 中文字段 throws exception something likes below:

Traceback (most recent call last):
  File "/Users/advancedxy/code_workspace/github/spark/python/pyspark/sql/tests/test_utils.py", line 34, in test_capture_user_friendly_exception
    raise e
AnalysisException: u"cannot resolve '`\u4e2d\u6587\u5b57\u6bb5`' given input columns: []; line 1 pos 7;\n'Project ['\u4e2d\u6587\u5b57\u6bb5]\n+- OneRowRelation\n"

after this pr:

Traceback (most recent call last):
  File "/Users/advancedxy/code_workspace/github/spark/python/pyspark/sql/tests/test_utils.py", line 34, in test_capture_user_friendly_exception
    raise e
AnalysisException: cannot resolve '`中文字段`' given input columns: []; line 1 pos 7;
'Project ['中文字段]
+- OneRowRelation

How was this patch

Add a new test to verify unicode are correctly converted and manual checks for thrown exceptions.

This pr's credits should go to @uncleGen and is based on #17267

…endly

advancedxy · 2019-09-17T08:51:43Z

Ping @HyukjinKwon and @viirya.

I believe this should be ported before #18324, let's check this pr first.

python/pyspark/sql/tests/test_utils.py

viirya · 2019-09-17T17:35:58Z

python/pyspark/sql/utils.py

+import sys
+
+if sys.version_info.major >= 3:
+    unicode = str


do you need this? you only use unicode when version major < 3, isn't?

Looks we don't.

I thought unicode should be available when running python3. But looks like isinstance is not evaluated, and it doesn't require unicode to be in the namespace.

HyukjinKwon · 2019-09-18T11:07:16Z

ok to test

SparkQA · 2019-09-18T11:12:11Z

Test build #110903 has finished for PR 25814 at commit 3c028e9.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2019-09-18T11:15:52Z

@advancedxy please fix the style. looks good otherwise.

HyukjinKwon · 2019-09-18T11:16:12Z

python/pyspark/sql/utils.py

+        desc = self.desc
+        # encode unicode instance for python2 for human readable description
+        if sys.version_info.major < 3 and isinstance(desc, unicode):
+            return str(desc.encode('utf-8'))


Let's remove this once we drop Python 2, which will be right away after Spark 3.0 release.

SparkQA · 2019-09-18T13:57:50Z

Test build #110910 has finished for PR 25814 at commit 9e4e7e9.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2019-09-18T14:31:36Z

Merged to master.

tgravescs · 2019-09-19T14:07:59Z

I'm seeing an error after this went in with unicode:

starting flake8 test...
flake8 checks failed:
./python/pyspark/sql/utils.py:31:60: F821 undefined name 'unicode'
if sys.version_info.major < 3 and isinstance(desc, unicode):
^
1 F821 undefined name 'unicode'

Where is unicode supposed to come from?

advancedxy · 2019-09-19T14:20:53Z

If it’s python2, unicode is built-in.
Otherwise, this part is never evaluated, thus irrelevant.

tgravescs · 2019-09-19T14:56:06Z

hmm, then why am I getting the error? I'm simply running the dev/lint-python script, my default system version is python 3.7.3. it seems to be checking for unicode package still. Its not clear to me which version Jenkins is using. I see it install python 3.5.6 but that is way after the lint script runs

Have you tried this with python3?

advancedxy · 2019-09-19T15:39:34Z

Have you tried this with python3?

Yes, I ran tests with python3.7 on my local machine, but not dev/lint-python.

Looks like flake cannot infer that unicode is never used for python3. The code is correct in the runtime.

Seems we don't run lint-python on the Jenkins? @HyukjinKwon can you help confirming that?

I'd like to address this problem in #25847, the fix is simple, just
add the following in the begin:

import py4j
import sys

if sys.version_info.major >= 3:
    unicode = str

Does that sound reasonable to you @HyukjinKwon ? Or I should submit a follow up pr for this problem? Also cc @viirya

tgravescs · 2019-09-19T15:54:47Z

I would like to see this fixed quickly because dev/lint-python is run by run-tests.py which if you are running with python 3 fails and thus makes my CI fail. I have to assume the Apache Spark Jenkins is using python 2 at that point. If #25847 is going to be committed today I'm ok with it, otherwise I think it should be separate pr

viirya · 2019-09-19T15:56:49Z

I'm fine either way. Looks #25847 was approved. It might be merged soon.

advancedxy · 2019-09-19T16:12:57Z

I have updated a new commit to #25847, hope it would be merged soon.

viirya · 2019-09-19T16:18:56Z

I will take a look, If @HyukjinKwon is unable to merge it due to time difference.

tgravescs · 2019-09-19T16:28:54Z

thanks!

viirya · 2019-09-19T21:24:11Z

@tgravescs I have a question on #25847, and I think that @advancedxy can't response in quick due to time difference. I am not sure how many people are affected by this, but in order to unblock it, I am going to create a follow up PR to include the fix.

…from python execution in Python 2 ### What changes were proposed in this pull request? This PR allows non-ascii string as an exception message in Python 2 by explicitly en/decoding in case of `str` in Python 2. ### Why are the changes needed? Previously PySpark will hang when the `UnicodeDecodeError` occurs and the real exception cannot be passed to the JVM side. See the reproducer as below: ```python def f(): raise Exception("中") spark = SparkSession.builder.master('local').getOrCreate() spark.sparkContext.parallelize([1]).map(lambda x: f()).count() ``` ### Does this PR introduce any user-facing change? User may not observe hanging for the similar cases. ### How was this patch tested? Added a new test and manually checking. This pr is based on #18324, credits should also go to dataknocker. To make lint-python happy for python3, it also includes a followup fix for #25814 Closes #25847 from advancedxy/python_exception_19926_and_21045. Authored-by: Xianjin YE <[email protected]> Signed-off-by: HyukjinKwon <[email protected]>

[SPARK-19926][PYSPARK] make captured exception from JVM side user fri…

3ff3ea4

…endly

dongjoon-hyun added the PYSPARK label Sep 17, 2019

viirya reviewed Sep 17, 2019

View reviewed changes

python/pyspark/sql/tests/test_utils.py Outdated Show resolved Hide resolved

viirya reviewed Sep 17, 2019

View reviewed changes

advancedxy added 2 commits September 18, 2019 10:10

Address comments

02988b1

Remove unused imports

3c028e9

HyukjinKwon approved these changes Sep 18, 2019

View reviewed changes

HyukjinKwon reviewed Sep 18, 2019

View reviewed changes

Fix imports

9e4e7e9

HyukjinKwon closed this in 203bf9e Sep 18, 2019

advancedxy deleted the python_exception_19926_and_21045 branch September 18, 2019 14:39

advancedxy mentioned this pull request Sep 19, 2019

[SPARK-21045][PYTHON] Allow non-ascii string as an exception message from python execution in Python 2 #25847

Closed

viirya mentioned this pull request Sep 19, 2019

[SPARK-19926][PYSPARK][Followup] Add unicode for flake8 check #25858

Closed

[SPARK-19926][PYSPARK] make captured exception from JVM side user friendly #25814

[SPARK-19926][PYSPARK] make captured exception from JVM side user friendly #25814

Uh oh!

Conversation

advancedxy commented Sep 17, 2019

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch

Uh oh!

advancedxy commented Sep 17, 2019

Uh oh!

Uh oh!

viirya Sep 17, 2019

Choose a reason for hiding this comment

Uh oh!

advancedxy Sep 18, 2019

Choose a reason for hiding this comment

Uh oh!

HyukjinKwon commented Sep 18, 2019

Uh oh!

SparkQA commented Sep 18, 2019

Uh oh!

HyukjinKwon commented Sep 18, 2019

Uh oh!

HyukjinKwon Sep 18, 2019

Choose a reason for hiding this comment

Uh oh!

advancedxy Sep 18, 2019

Choose a reason for hiding this comment

Uh oh!

SparkQA commented Sep 18, 2019

Uh oh!

HyukjinKwon commented Sep 18, 2019

Uh oh!

tgravescs commented Sep 19, 2019

Uh oh!

advancedxy commented Sep 19, 2019

Uh oh!

tgravescs commented Sep 19, 2019

Uh oh!

advancedxy commented Sep 19, 2019

Uh oh!

tgravescs commented Sep 19, 2019

Uh oh!

viirya commented Sep 19, 2019

Uh oh!

advancedxy commented Sep 19, 2019

Uh oh!

viirya commented Sep 19, 2019

Uh oh!

tgravescs commented Sep 19, 2019

Uh oh!

viirya commented Sep 19, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants