
Commit 00cb2f9

[SPARK-28881][PYTHON][TESTS] Add a test to make sure toPandas with Arrow optimization throws an exception per maxResultSize
### What changes were proposed in this pull request?

This PR proposes to add a test case for:

```bash
./bin/pyspark --conf spark.driver.maxResultSize=1m
```

```python
spark.conf.set("spark.sql.execution.arrow.enabled", True)
spark.range(10000000).toPandas()
```

```
Empty DataFrame
Columns: [id]
Index: []
```

which can silently return partial results (see #25593 (comment)). This regression appeared between Spark 2.3 and Spark 2.4 and was later fixed accidentally.

### Why are the changes needed?

To prevent the same regression in the future.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

A test was added.

Closes #25594 from HyukjinKwon/SPARK-28881.

Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
1 parent ab1819d commit 00cb2f9
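
For context, here is a minimal standalone sketch of the behavior the new test locks in. The configuration values mirror the diff below; the `appName` is illustrative. On an affected build, `toPandas()` silently returned an empty DataFrame instead of raising:

```python
from pyspark.sql import SparkSession

# Tiny maxResultSize must be set at session creation time (it is a static config).
spark = (SparkSession.builder
         .master("local[4]")
         .appName("max-result-size-repro")
         .config("spark.driver.maxResultSize", "10k")
         .getOrCreate())

# Explicitly enable Arrow and disable the non-Arrow fallback, as the test does.
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
spark.conf.set("spark.sql.execution.arrow.pyspark.fallback.enabled", "false")

try:
    # 100 partitions so the collected Arrow batches exceed the 10k limit.
    spark.range(0, 10000, 1, 100).toPandas()
except Exception as e:
    # Expected: an error whose message says the result size "is bigger than"
    # spark.driver.maxResultSize, rather than a silently truncated result.
    print(e)
finally:
    spark.stop()
```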

File tree

1 file changed (+30 −1 lines changed)

python/pyspark/sql/tests/test_arrow.py

Lines changed: 30 additions & 1 deletion
```diff
@@ -22,7 +22,7 @@
 import unittest
 import warnings
 
-from pyspark.sql import Row
+from pyspark.sql import Row, SparkSession
 from pyspark.sql.functions import udf
 from pyspark.sql.types import *
 from pyspark.testing.sqlutils import ReusedSQLTestCase, have_pandas, have_pyarrow, \
@@ -421,6 +421,35 @@ def run_test(num_records, num_parts, max_records, use_delay=False):
         run_test(*case)
 
 
+@unittest.skipIf(
+    not have_pandas or not have_pyarrow,
+    pandas_requirement_message or pyarrow_requirement_message)
+class MaxResultArrowTests(unittest.TestCase):
+    # These tests are separate as 'spark.driver.maxResultSize' configuration
+    # is a static configuration to Spark context.
+
+    @classmethod
+    def setUpClass(cls):
+        cls.spark = SparkSession.builder \
+            .master("local[4]") \
+            .appName(cls.__name__) \
+            .config("spark.driver.maxResultSize", "10k") \
+            .getOrCreate()
+
+        # Explicitly enable Arrow and disable fallback.
+        cls.spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
+        cls.spark.conf.set("spark.sql.execution.arrow.pyspark.fallback.enabled", "false")
+
+    @classmethod
+    def tearDownClass(cls):
+        if hasattr(cls, "spark"):
+            cls.spark.stop()
+
+    def test_exception_by_max_results(self):
+        with self.assertRaisesRegexp(Exception, "is bigger than"):
+            self.spark.range(0, 10000, 1, 100).toPandas()
+
+
 class EncryptionArrowTests(ArrowTests):
 
     @classmethod
```

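To exercise just the new class outside Spark's full test harness, a sketch along these lines should work, assuming a built PySpark with pandas and PyArrow importable (otherwise the class skips itself); Spark's own dev flow normally drives this via the `python/run-tests` script instead:

```python
# Hypothetical local runner for the new test class.
import unittest

from pyspark.sql.tests.test_arrow import MaxResultArrowTests

suite = unittest.TestLoader().loadTestsFromTestCase(MaxResultArrowTests)
unittest.TextTestRunner(verbosity=2).run(suite)
```

Keeping these tests in their own `unittest.TestCase` (rather than `ReusedSQLTestCase`) is deliberate: `spark.driver.maxResultSize` is static, so the class must build its own `SparkSession` instead of reusing the shared one.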