[SPARK-40951][PYSPARK][TESTS] pyspark-connect tests should be skipped if pandas doesn't exist
#38426
Could you review this, @HyukjinKwon?
LGTM
Thank you, @zhengruifeng. I'll fix it.
The last commit only changes the annotation, and I verified it once more manually. Merged to master for Apache Spark 3.4.
Late LGTM. Thanks for this improvement!
HyukjinKwon left a comment:
LGTM. cc @grundprinzip FYI
Thank you, @HyukjinKwon.
### What changes were proposed in this pull request?
This PR aims to skip `pyspark-connect` unit tests when `pandas` is unavailable.
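A minimal sketch of the general technique, using only the standard library and not taken from this PR's diff: probe for `pandas` once at module load, then decorate the test class with `unittest.skipIf` so that every test in it is reported as skipped instead of erroring. `HAVE_PANDAS`, `PANDAS_REQUIREMENT_MESSAGE`, and `ExampleConnectTests` are hypothetical names chosen for illustration.

```python
import importlib.util
import unittest

# Hypothetical flag: True only if a `pandas` package can be located.
HAVE_PANDAS = importlib.util.find_spec("pandas") is not None

# Hypothetical skip reason, shown in the test report when pandas is missing.
PANDAS_REQUIREMENT_MESSAGE = "pandas is required for these tests."


@unittest.skipIf(not HAVE_PANDAS, PANDAS_REQUIREMENT_MESSAGE)
class ExampleConnectTests(unittest.TestCase):
    """Hypothetical test class; every test is skipped when pandas is absent."""

    def test_uses_pandas(self):
        # Import lazily so that merely loading this module never needs pandas.
        import pandas as pd

        self.assertEqual(len(pd.DataFrame({"a": [1, 2]})), 2)
```

Applying the skip at class level keeps the condition in one place, so newly added tests in the class are covered automatically.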
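### Why are the changes needed?
Without `pandas`, `pyspark.sql.tests.connect.test_connect_basic` fails at import time with `ModuleNotFoundError: No module named 'pandas'` instead of being skipped. After this PR, all 20 tests in the four `pyspark-connect` test modules are reported as skipped and the run passes.
**BEFORE**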
```
% python/run-tests --modules pyspark-connect
Running PySpark tests. Output is in /Users/dongjoon/APACHE/spark-merge/python/unit-tests.log
Will test against the following Python executables: ['python3.9']
Will test the following Python modules: ['pyspark-connect']
python3.9 python_implementation is CPython
python3.9 version is: Python 3.9.15
Starting test(python3.9): pyspark.sql.tests.connect.test_connect_plan_only (temp output: /Users/dongjoon/APACHE/spark-merge/python/target/f14573f1-131f-494a-a015-8b4762219fb5/python3.9__pyspark.sql.tests.connect.test_connect_plan_only__86sd4pxg.log)
Starting test(python3.9): pyspark.sql.tests.connect.test_connect_column_expressions (temp output: /Users/dongjoon/APACHE/spark-merge/python/target/51391499-d21a-4c1d-8b79-6ac52859a4c9/python3.9__pyspark.sql.tests.connect.test_connect_column_expressions__kn__9aur.log)
Starting test(python3.9): pyspark.sql.tests.connect.test_connect_basic (temp output: /Users/dongjoon/APACHE/spark-merge/python/target/7854cbef-e40d-4090-a37d-5a5314eb245f/python3.9__pyspark.sql.tests.connect.test_connect_basic__i1rutevd.log)
Starting test(python3.9): pyspark.sql.tests.connect.test_connect_select_ops (temp output: /Users/dongjoon/APACHE/spark-merge/python/target/6f947453-7481-4891-81b0-169aaac8c6ee/python3.9__pyspark.sql.tests.connect.test_connect_select_ops__5sxao0ji.log)
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python3.9/3.9.15/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/homebrew/Cellar/python3.9/3.9.15/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/dongjoon/APACHE/spark-merge/python/pyspark/sql/tests/connect/test_connect_basic.py", line 22, in <module>
    import pandas
ModuleNotFoundError: No module named 'pandas'
```
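The BEFORE traceback shows that the failure happens while `test_connect_basic.py` is being imported (`import pandas` at line 22), so no test-level skip ever gets a chance to run. A common remedy, sketched here under the assumption that an optional dependency should be treated as absent whenever it cannot be imported, is to guard the import. `optional_import` and `have_pandas` are hypothetical names, not PySpark APIs.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Import and return module `name`, or return None if it cannot be imported."""
    try:
        return importlib.import_module(name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so this covers it.
        return None


# Guarded import: this module still loads when pandas is missing.
pandas = optional_import("pandas")
have_pandas = pandas is not None
```

Catching `ImportError` rather than only `ModuleNotFoundError` also treats a partially broken installation as missing, which is usually what a skip check wants.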
**AFTER**
```
% python/run-tests --modules pyspark-connect
Running PySpark tests. Output is in /Users/dongjoon/APACHE/spark-merge/python/unit-tests.log
Will test against the following Python executables: ['python3.9']
Will test the following Python modules: ['pyspark-connect']
python3.9 python_implementation is CPython
python3.9 version is: Python 3.9.15
Starting test(python3.9): pyspark.sql.tests.connect.test_connect_basic (temp output: /Users/dongjoon/APACHE/spark-merge/python/target/571609c0-3070-476c-afbe-56e215eb5647/python3.9__pyspark.sql.tests.connect.test_connect_basic__4e9k__5x.log)
Starting test(python3.9): pyspark.sql.tests.connect.test_connect_column_expressions (temp output: /Users/dongjoon/APACHE/spark-merge/python/target/4a30d035-e392-4ad2-ac10-5d8bc5421321/python3.9__pyspark.sql.tests.connect.test_connect_column_expressions__c9x39tvp.log)
Starting test(python3.9): pyspark.sql.tests.connect.test_connect_plan_only (temp output: /Users/dongjoon/APACHE/spark-merge/python/target/eea0b5db-9a92-4fbb-912d-a59daaf73f8e/python3.9__pyspark.sql.tests.connect.test_connect_plan_only__0p9ivnod.log)
Starting test(python3.9): pyspark.sql.tests.connect.test_connect_select_ops (temp output: /Users/dongjoon/APACHE/spark-merge/python/target/6069c664-afd9-4a3c-a0cc-f707577e039e/python3.9__pyspark.sql.tests.connect.test_connect_select_ops__sxzrtiqa.log)
Finished test(python3.9): pyspark.sql.tests.connect.test_connect_column_expressions (1s) ... 2 tests were skipped
Finished test(python3.9): pyspark.sql.tests.connect.test_connect_select_ops (1s) ... 2 tests were skipped
Finished test(python3.9): pyspark.sql.tests.connect.test_connect_plan_only (1s) ... 10 tests were skipped
Finished test(python3.9): pyspark.sql.tests.connect.test_connect_basic (1s) ... 6 tests were skipped
Tests passed in 1 seconds
Skipped tests in pyspark.sql.tests.connect.test_connect_basic with python3.9:
test_limit_offset (pyspark.sql.tests.connect.test_connect_basic.SparkConnectTests) ... skip (0.002s)
test_schema (pyspark.sql.tests.connect.test_connect_basic.SparkConnectTests) ... skip (0.000s)
test_simple_datasource_read (pyspark.sql.tests.connect.test_connect_basic.SparkConnectTests) ... skip (0.000s)
test_simple_explain_string (pyspark.sql.tests.connect.test_connect_basic.SparkConnectTests) ... skip (0.000s)
test_simple_read (pyspark.sql.tests.connect.test_connect_basic.SparkConnectTests) ... skip (0.000s)
test_simple_udf (pyspark.sql.tests.connect.test_connect_basic.SparkConnectTests) ... skip (0.000s)
Skipped tests in pyspark.sql.tests.connect.test_connect_column_expressions with python3.9:
test_column_literals (pyspark.sql.tests.connect.test_connect_column_expressions.SparkConnectColumnExpressionSuite) ... skip (0.000s)
test_simple_column_expressions (pyspark.sql.tests.connect.test_connect_column_expressions.SparkConnectColumnExpressionSuite) ... skip (0.000s)
Skipped tests in pyspark.sql.tests.connect.test_connect_plan_only with python3.9:
test_all_the_plans (pyspark.sql.tests.connect.test_connect_plan_only.SparkConnectTestsPlanOnly) ... skip (0.002s)
test_datasource_read (pyspark.sql.tests.connect.test_connect_plan_only.SparkConnectTestsPlanOnly) ... skip (0.000s)
test_deduplicate (pyspark.sql.tests.connect.test_connect_plan_only.SparkConnectTestsPlanOnly) ... skip (0.001s)
test_filter (pyspark.sql.tests.connect.test_connect_plan_only.SparkConnectTestsPlanOnly) ... skip (0.000s)
test_limit (pyspark.sql.tests.connect.test_connect_plan_only.SparkConnectTestsPlanOnly) ... skip (0.000s)
test_offset (pyspark.sql.tests.connect.test_connect_plan_only.SparkConnectTestsPlanOnly) ... skip (0.000s)
test_relation_alias (pyspark.sql.tests.connect.test_connect_plan_only.SparkConnectTestsPlanOnly) ... skip (0.000s)
test_sample (pyspark.sql.tests.connect.test_connect_plan_only.SparkConnectTestsPlanOnly) ... skip (0.001s)
test_simple_project (pyspark.sql.tests.connect.test_connect_plan_only.SparkConnectTestsPlanOnly) ... skip (0.000s)
test_simple_udf (pyspark.sql.tests.connect.test_connect_plan_only.SparkConnectTestsPlanOnly) ... skip (0.000s)
Skipped tests in pyspark.sql.tests.connect.test_connect_select_ops with python3.9:
test_join_with_join_type (pyspark.sql.tests.connect.test_connect_select_ops.SparkConnectToProtoSuite) ... skip (0.002s)
test_select_with_columns_and_strings (pyspark.sql.tests.connect.test_connect_select_ops.SparkConnectToProtoSuite) ... skip (0.000s)
```
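The AFTER run still ends with `Tests passed` even though every test was skipped. That is consistent with standard `unittest` semantics: a skipped test is recorded in `TestResult.skipped` and does not make `wasSuccessful()` return `False`. A small self-contained illustration (`AlwaysSkipped` and `run_case` are hypothetical):

```python
import unittest


class AlwaysSkipped(unittest.TestCase):
    """Hypothetical test case whose only test is unconditionally skipped."""

    @unittest.skip("pandas is not installed")
    def test_needs_pandas(self):
        self.fail("never runs")


def run_case(case: type) -> unittest.TestResult:
    """Run every test in `case` and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    result = unittest.TestResult()
    suite.run(result)
    return result


result = run_case(AlwaysSkipped)
print(result.testsRun, len(result.skipped), result.wasSuccessful())  # 1 1 True
```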
### Does this PR introduce _any_ user-facing change?
No. This is a test-only PR.
### How was this patch tested?
Manually run the following.
```
$ pip3 uninstall pandas
$ python/run-tests --modules pyspark-connect
```
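To check the skip logic without uninstalling `pandas`, one option is to make it unimportable inside a single Python process. In CPython, a `None` entry in `sys.modules` makes `import pandas` raise `ModuleNotFoundError` and makes `importlib.util.find_spec("pandas")` return `None`. This is general import-system behavior, not a feature of `python/run-tests` (which starts its own subprocesses); the `block_module` helper is hypothetical.

```python
import contextlib
import importlib.util
import sys
from typing import Iterator


@contextlib.contextmanager
def block_module(name: str) -> Iterator[None]:
    """Temporarily make `import name` fail inside the current process."""
    sentinel = object()
    previous = sys.modules.get(name, sentinel)
    sys.modules[name] = None  # a None entry halts the import system
    try:
        yield
    finally:
        # Restore whatever was there before, or remove our entry.
        if previous is sentinel:
            del sys.modules[name]
        else:
            sys.modules[name] = previous


with block_module("pandas"):
    print(importlib.util.find_spec("pandas"))  # None
    try:
        import pandas  # noqa: F401
    except ModuleNotFoundError as exc:
        print(type(exc).__name__)  # ModuleNotFoundError
```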
Closes apache#38426 from dongjoon-hyun/SPARK-40951.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>