
[SPARK-45852][CONNECT][PYTHON] Gracefully deal with recursion error during logging #43732

Closed

grundprinzip wants to merge 2 commits into apache:master from grundprinzip:SPARK-45852

Conversation

@grundprinzip
Contributor

What changes were proposed in this pull request?

The Python client for Spark Connect logs the text representation of the proto message. However, for deeply nested objects this text conversion can raise a Python recursion error even before the gRPC message's maximum nesting limit is reached.

This patch fixes this issue by explicitly catching the recursion error during text conversion.
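The guard described above can be sketched as follows. This is a minimal illustration, not the exact helper added in the patch: `safe_plan_text` and the fallback string are assumed names for the purpose of the example.

```python
def safe_plan_text(msg) -> str:
    """Render a message for logging, degrading gracefully on deep nesting.

    Minimal sketch: `msg` stands in for a protobuf plan message, whose
    str() goes through text formatting and recurses once per nesting level.
    """
    try:
        return str(msg)
    except RecursionError:
        # Deeply nested plans can exhaust Python's recursion limit before
        # gRPC's nesting limit is hit; log a placeholder instead of crashing.
        return "<message too deeply nested to render>"
```

The key point is that the logging path must never propagate the `RecursionError`; rendering failure degrades to a placeholder string while the RPC itself proceeds normally.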

Why are the changes needed?

Stability

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit test.

Was this patch authored or co-authored using generative AI tooling?

No

@HyukjinKwon
Member

Merged to master.



class SparkConnectBasicTests(SparkConnectSQLTestCase):
    def test_recursion_handling_for_plan_logging(self):

Hi, @grundprinzip and @HyukjinKwon .

This test case seems to fail with Python 3.11. SPARK-45987 has been filed for the Python 3.11 failure.


# Calling schema will trigger logging the message that will in turn trigger the message
# conversion into protobuf that will then trigger the recursion error.
self.assertIsNotNone(cdf.schema)
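The trigger chain the comment above describes can be reproduced with a plain-Python stand-in. The `Relation` class below is hypothetical (it is not the Connect proto class): building a deep chain iteratively is cheap, but rendering it to text recurses once per nesting level, so the recursion limit is hit during formatting, not during construction.

```python
import sys

# Hypothetical stand-in for a Spark Connect plan message: each wrapper adds
# one nesting level, like chained DataFrame operations nesting relations.
class Relation:
    def __init__(self, input=None):
        self.input = input

    def __repr__(self):
        return f"Relation(input={self.input!r})"


plan = Relation()
for _ in range(sys.getrecursionlimit() * 3):
    plan = Relation(plan)  # iterative build: no recursion here

try:
    repr(plan)  # text rendering recurses once per nesting level
    hit_limit = False
except RecursionError:
    hit_limit = True
```

This mirrors why `cdf.schema` fails: accessing the schema logs the plan, and logging renders the deeply nested message to text.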

The failure happens here.

ERROR [3.874s]: test_recursion_handling_for_plan_logging (pyspark.sql.tests.connect.test_connect_basic.SparkConnectBasicTests.test_recursion_handling_for_plan_logging)
SPARK-45852 - Test that we can handle recursion in plan logging.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/sql/tests/connect/test_connect_basic.py", line 171, in test_recursion_handling_for_plan_logging
    self.assertIsNotNone(cdf.schema)
                         ^^^^^^^^^^
  File "/__w/spark/spark/python/pyspark/sql/connect/dataframe.py", line 1735, in schema
    return self._session.client.schema(query)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/__w/spark/spark/python/pyspark/sql/connect/client/core.py", line 924, in schema
    schema = self._analyze(method="schema", plan=plan).schema
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/__w/spark/spark/python/pyspark/sql/connect/client/core.py", line 1110, in _analyze
    self._handle_error(error)
  File "/__w/spark/spark/python/pyspark/sql/connect/client/core.py", line 1499, in _handle_error
    self._handle_rpc_error(error)
  File "/__w/spark/spark/python/pyspark/sql/connect/client/core.py", line 1570, in _handle_rpc_error
    raise SparkConnectGrpcException(str(rpc_error)) from None
pyspark.errors.exceptions.connect.SparkConnectGrpcException: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.INTERNAL
	details = "Exception serializing request!"
	debug_error_string = "None"


Thanks!!

@dongjoon-hyun
Member
