
Commit 879caf4

HyukjinKwon authored and Willi Raschkowski committed
[SPARK-28881][PYTHON][TESTS] Add a test to make sure toPandas with Arrow optimization throws an exception per maxResultSize
### What changes were proposed in this pull request?

This PR proposes to add a test case for:

```bash
./bin/pyspark --conf spark.driver.maxResultSize=1m
```

```python
spark.conf.set("spark.sql.execution.arrow.enabled", True)
spark.range(10000000).toPandas()
```

```
Empty DataFrame
Columns: [id]
Index: []
```

which can result in partial results (see apache#25593 (comment)). This regression was found between Spark 2.3 and Spark 2.4, and accidentally fixed.

### Why are the changes needed?

To prevent the same regression in the future.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Test was added.

Closes apache#25594 from HyukjinKwon/SPARK-28881.

Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
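For readers without Spark at hand, the driver-side guard this test exercises can be sketched in plain Python: a collector that raises once the accumulated serialized results exceed a configured limit, producing the same "is bigger than" message the new test matches on. This is a minimal illustration only — `parse_size` and `collect_with_limit` are hypothetical helpers, not Spark APIs.

```python
import unittest


def parse_size(value):
    """Parse sizes like '10k' or '1m' into bytes (hypothetical helper,
    loosely mirroring how spark.driver.maxResultSize strings are read)."""
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    suffix = value[-1].lower()
    if suffix in units:
        return int(value[:-1]) * units[suffix]
    return int(value)


def collect_with_limit(chunks, max_result_size):
    """Accumulate serialized result chunks, failing fast once the running
    total exceeds the limit -- the behavior the new test asserts on."""
    limit = parse_size(max_result_size)
    total, out = 0, []
    for chunk in chunks:
        total += len(chunk)
        if total > limit:
            raise RuntimeError(
                "Total size of serialized results (%d bytes) "
                "is bigger than maxResultSize (%d bytes)" % (total, limit))
        out.append(chunk)
    return b"".join(out)


class MaxResultSketchTests(unittest.TestCase):
    def test_exception_by_max_results(self):
        # 100 partitions of 200 bytes each blow past a 10k limit.
        chunks = [b"x" * 200] * 100
        # The PR uses the older assertRaisesRegexp alias; assertRaisesRegex
        # is the modern spelling of the same check.
        with self.assertRaisesRegex(Exception, "is bigger than"):
            collect_with_limit(chunks, "10k")
```

The failure mode being guarded against is exactly a limit that silently truncates instead of raising, which is how the partial-result regression (empty DataFrame) manifested.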
1 parent 2c4cfb1 commit 879caf4

File tree

1 file changed: +30 −1 lines


python/pyspark/sql/tests/test_arrow.py

Lines changed: 30 additions & 1 deletion
```diff
@@ -22,7 +22,7 @@
 import unittest
 import warnings
 
-from pyspark.sql import Row
+from pyspark.sql import Row, SparkSession
 from pyspark.sql.functions import udf
 from pyspark.sql.types import *
 from pyspark.testing.sqlutils import ReusedSQLTestCase, have_pandas, have_pyarrow, \
@@ -430,6 +430,35 @@ def run_test(num_records, num_parts, max_records, use_delay=False):
         run_test(*case)
 
 
+@unittest.skipIf(
+    not have_pandas or not have_pyarrow,
+    pandas_requirement_message or pyarrow_requirement_message)
+class MaxResultArrowTests(unittest.TestCase):
+    # These tests are separate as 'spark.driver.maxResultSize' configuration
+    # is a static configuration to Spark context.
+
+    @classmethod
+    def setUpClass(cls):
+        cls.spark = SparkSession.builder \
+            .master("local[4]") \
+            .appName(cls.__name__) \
+            .config("spark.driver.maxResultSize", "10k") \
+            .getOrCreate()
+
+        # Explicitly enable Arrow and disable fallback.
+        cls.spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
+        cls.spark.conf.set("spark.sql.execution.arrow.pyspark.fallback.enabled", "false")
+
+    @classmethod
+    def tearDownClass(cls):
+        if hasattr(cls, "spark"):
+            cls.spark.stop()
+
+    def test_exception_by_max_results(self):
+        with self.assertRaisesRegexp(Exception, "is bigger than"):
+            self.spark.range(0, 10000, 1, 100).toPandas()
+
+
 class EncryptionArrowTests(ArrowTests):
 
     @classmethod
```

0 commit comments