-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-19926][PYSPARK] make captured exception from JVM side user friendly #25814
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-19926][PYSPARK] make captured exception from JVM side user friendly #25814
Conversation
|
Ping @HyukjinKwon and @viirya. I believe this should be ported before #18324, let's check this pr first. |
python/pyspark/sql/utils.py
Outdated
| import sys | ||
|
|
||
| if sys.version_info.major >= 3: | ||
| unicode = str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you need this? you only use unicode when version major < 3, isn't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks we don't.
I thought unicode should be available when running python3. But looks like isinstance is not evaluated, and it doesn't require unicode to be in the namespace.
|
ok to test |
|
Test build #110903 has finished for PR 25814 at commit
|
|
@advancedxy please fix the style. looks good otherwise. |
| desc = self.desc | ||
| # encode unicode instance for python2 for human readable description | ||
| if sys.version_info.major < 3 and isinstance(desc, unicode): | ||
| return str(desc.encode('utf-8')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove this once we drop Python 2, which will be right away after Spark 3.0 release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
|
Test build #110910 has finished for PR 25814 at commit
|
|
Merged to master. |
|
I'm seeing an error after this went in with unicode: starting flake8 test... Where is unicode supposed to come from? |
|
If it’s python2, unicode is built-in. |
|
hmm, then why am I getting the error? I'm simply running the dev/lint-python script, my default system version is python 3.7.3. it seems to be checking for unicode package still. Its not clear to me which version Jenkins is using. I see it install python 3.5.6 but that is way after the lint script runs Have you tried this with python3? |
Yes, I ran tests with python3.7 on my local machine, but not Looks like flake cannot infer that unicode is never used for python3. The code is correct in the runtime. Seems we don't run lint-python on the Jenkins? @HyukjinKwon can you help confirming that? I'd like to address this problem in #25847, the fix is simple, just Does that sound reasonable to you @HyukjinKwon ? Or I should submit a follow up pr for this problem? Also cc @viirya |
|
I would like to see this fixed quickly because dev/lint-python is run by run-tests.py which if you are running with python 3 fails and thus makes my CI fail. I have to assume the Apache Spark Jenkins is using python 2 at that point. If #25847 is going to be committed today I'm ok with it, otherwise I think it should be separate pr |
|
I'm fine either way. Looks #25847 was approved. It might be merged soon. |
|
I have updated a new commit to #25847, hope it would be merged soon. |
|
I will take a look, If @HyukjinKwon is unable to merge it due to time difference. |
|
thanks! |
|
@tgravescs I have a question on #25847, and I think that @advancedxy can't response in quick due to time difference. I am not sure how many people are affected by this, but in order to unblock it, I am going to create a follow up PR to include the fix. |
…from python execution in Python 2
### What changes were proposed in this pull request?
This PR allows non-ascii string as an exception message in Python 2 by explicitly en/decoding in case of `str` in Python 2.
### Why are the changes needed?
Previously PySpark will hang when the `UnicodeDecodeError` occurs and the real exception cannot be passed to the JVM side.
See the reproducer as below:
```python
def f():
raise Exception("中")
spark = SparkSession.builder.master('local').getOrCreate()
spark.sparkContext.parallelize([1]).map(lambda x: f()).count()
```
### Does this PR introduce any user-facing change?
User may not observe hanging for the similar cases.
### How was this patch tested?
Added a new test and manually checking.
This pr is based on #18324, credits should also go to dataknocker.
To make lint-python happy for python3, it also includes a followup fix for #25814
Closes #25847 from advancedxy/python_exception_19926_and_21045.
Authored-by: Xianjin YE <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
What changes were proposed in this pull request?
The str of
CapaturedExceptionis now returned by str(self.desc) rather than repr(self.desc), which is more user-friendly. It also handles unicode under python2 specially.Why are the changes needed?
This is an improvement, and makes exception more human readable in python side.
Does this PR introduce any user-facing change?
Before this pr, select
中文字段throws exception something likes below:after this pr:
How was this patch
Add a new test to verify unicode are correctly converted and manual checks for thrown exceptions.
This pr's credits should go to @uncleGen and is based on #17267