Commit b8100b5
[SPARK-41586][PYTHON] Introduce `pyspark.errors` and error classes for PySpark
### What changes were proposed in this pull request?
This PR proposes to introduce `pyspark.errors` and error classes to unify and improve the errors generated by PySpark under a single path.
To summarize, this PR includes the changes below:
- Add `python/pyspark/errors/error_classes.py` to define the error classes for PySpark.
- Add `ErrorClassesReader` to manage the `error_classes.py`.
- Add `PySparkException` to handle the errors generated by PySpark.
- Add `check_error` for error class testing.
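The pieces listed above can be pictured with a minimal, self-contained sketch. It is not the code added by this PR: the JSON layout, the `get_error_message` method, the `<func_name>` placeholder syntax, and the constructor parameters are assumptions made for illustration. Only the names `ErrorClassesReader` and `PySparkException` and the `COLUMN_IN_LIST` class come from this description.

```python
import json

# Hypothetical stand-in for error_classes.py: error classes keyed by name,
# each with a message template. The layout is an assumption for this sketch.
ERROR_CLASSES_JSON = """
{
  "COLUMN_IN_LIST": {
    "message": ["<func_name> does not allow a column in a list."]
  }
}
"""


class ErrorClassesReader:
    """Sketch of a reader that resolves an error class to a message."""

    def __init__(self):
        self.error_info_map = json.loads(ERROR_CLASSES_JSON)

    def get_error_message(self, error_class, message_parameters):
        # Join the template lines, then substitute each <name> placeholder.
        template = "\n".join(self.error_info_map[error_class]["message"])
        for name, value in message_parameters.items():
            template = template.replace(f"<{name}>", value)
        return template


class PySparkException(Exception):
    """Sketch only: an exception identified by an error class."""

    def __init__(self, error_class, message_parameters):
        self.error_class = error_class
        self.message_parameters = message_parameters
        message = ErrorClassesReader().get_error_message(
            error_class, message_parameters
        )
        super().__init__(f"[{error_class}] {message}")


try:
    raise PySparkException("COLUMN_IN_LIST", {"func_name": "lit"})
except PySparkException as e:
    rendered = str(e)
    print(rendered)
```

Keeping the templates in one table means the call site only names the error class and its parameters, while the wording lives in a single place.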
This is an initial PR introducing an error framework for PySpark, to make errors easier to manage and to give users better, more consistent error messages.
While active work is under way on the [SQL side to improve error messages](https://issues.apache.org/jira/browse/SPARK-37935), so far there has been no comparable work to improve error messages in PySpark.
So I expect this PR to also kick off the error message improvement effort on the PySpark side.
Eventually, error messages will be shown as below, for example:
- PySpark, `PySparkException` (thrown by the Python driver):
```python
>>> from pyspark.sql.functions import lit
>>> lit([df.id, df.id])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../spark/python/pyspark/sql/utils.py", line 334, in wrapped
    return f(*args, **kwargs)
  File ".../spark/python/pyspark/sql/functions.py", line 176, in lit
    raise PySparkException(
pyspark.errors.exceptions.PySparkException: [COLUMN_IN_LIST] lit does not allow a column in a list.
```
- PySpark, `AnalysisException` (thrown by the JVM side and captured on the PySpark side):
```
>>> df.unpivot("id", [], "var", "val").collect()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../spark/python/pyspark/sql/dataframe.py", line 3296, in unpivot
    jdf = self._jdf.unpivotWithSeq(jids, jvals, variableColumnName, valueColumnName)
  File ".../spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
  File ".../spark/python/pyspark/sql/utils.py", line 209, in deco
    raise converted from None
pyspark.sql.utils.AnalysisException: [UNPIVOT_REQUIRES_VALUE_COLUMNS] At least one value column needs to be specified for UNPIVOT, all columns specified as ids;
'Unpivot ArraySeq(id#2L), ArraySeq(), var, [val]
+- LogicalRDD [id#2L, int#3L, double#4, str#5], false
```
- Spark, `AnalysisException`:
```scala
scala> df.select($"id").unpivot(Array($"id"), Array.empty, variableColumnName = "var", valueColumnName = "val")
org.apache.spark.sql.AnalysisException: [UNPIVOT_REQUIRES_VALUE_COLUMNS] At least one value column needs to be specified for UNPIVOT, all columns specified as ids;
'Unpivot ArraySeq(id#0L), ArraySeq(), var, [val]
+- Project [id#0L]
   +- Range (0, 10, step=1, splits=Some(16))
```
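All three outputs above share the `[ERROR_CLASS] message` prefix, whether the error originates in Python or in the JVM. As an illustration only, the sketch below splits such a message into its class and its text. The helper `split_error_class` and the regular expression are hypothetical and not part of PySpark.

```python
import re

# Matches a leading "[ERROR_CLASS]" tag followed by the message body.
# The allowed character set is an assumption based on the examples above.
ERROR_CLASS_PREFIX = re.compile(r"^\[([A-Z0-9_]+)\]\s*(.*)", re.DOTALL)


def split_error_class(message):
    """Return (error_class, text); error_class is None if no tag is present."""
    match = ERROR_CLASS_PREFIX.match(message)
    if match is None:
        return None, message
    return match.group(1), match.group(2)


python_side = "[COLUMN_IN_LIST] lit does not allow a column in a list."
jvm_side = (
    "[UNPIVOT_REQUIRES_VALUE_COLUMNS] At least one value column needs to be "
    "specified for UNPIVOT, all columns specified as ids;"
)

print(split_error_class(python_side)[0])
print(split_error_class(jvm_side)[0])
```

Because the class name is a stable identifier, logs and bug reports can be grouped by it even if the message wording changes later.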
**Next steps** after this PR include:
- Migrate more errors into `PySparkException` across all modules (e.g., Spark Connect, pandas API on Spark).
- Migrate more error tests into error class tests by using `check_error`.
- Define more error classes in `error_classes.py`.
- Add documentation.
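To give a sense of what an error class test might look like, here is a minimal sketch built on `unittest`. It does not reproduce the `check_error` helper added by this PR: the mixin, its signature, the stand-in exception, and the `lit_like` function are all assumptions made for illustration.

```python
import unittest


class SketchException(Exception):
    # Stand-in exception; the attribute names are assumptions for this sketch.
    def __init__(self, error_class, message_parameters):
        self.error_class = error_class
        self.message_parameters = message_parameters
        super().__init__(f"[{error_class}] {message_parameters}")


class ErrorClassTestMixin:
    def check_error(self, exception, error_class, message_parameters):
        # Compare the identifier and parameters, not the rendered message text.
        self.assertEqual(exception.error_class, error_class)
        self.assertEqual(exception.message_parameters, message_parameters)


def lit_like(value):
    # Hypothetical function that rejects list input, mimicking the lit example.
    if isinstance(value, list):
        raise SketchException("COLUMN_IN_LIST", {"func_name": "lit"})
    return value


class LitErrorTest(ErrorClassTestMixin, unittest.TestCase):
    def test_list_rejected(self):
        with self.assertRaises(SketchException) as ctx:
            lit_like(["a", "b"])
        self.check_error(ctx.exception, "COLUMN_IN_LIST", {"func_name": "lit"})


result = unittest.TextTestRunner().run(
    unittest.defaultTestLoader.loadTestsFromTestCase(LitErrorTest)
)
```

Asserting on the error class and its parameters keeps tests stable when the message wording is edited.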
### Why are the changes needed?
Centralizing error messages and introducing identified error classes provides the following benefits:
- Errors are searchable via their unique class names and are properly classified.
- The cost of future maintenance for PySpark errors is reduced.
- Users get consistent, actionable error messages.
- Translating error messages into different languages becomes easier.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Added unit tests and ran the existing static analysis tools (`dev/lint-python`).
Closes #39387 from itholic/SPARK-41586.
Authored-by: itholic <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
1 parent 47068db commit b8100b5
File tree
16 files changed (+398, -3 lines changed)
- dev
  - sparktestsupport
- python
  - docs/source/reference
  - pyspark
    - errors
      - tests
    - sql
      - tests
    - testing