@xinrong-meng xinrong-meng commented Jan 10, 2024

What changes were proposed in this pull request?

When a pandas UDF takes or returns iterators and the user enables the profiling Spark conf, PySpark should raise a warning that profiling is not supported and leave profiling disabled.

Currently, however, the memory profiler is still enabled after the not-supported warning is raised.

This PR fixes that.
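
As a rough illustration of the intended control flow (this is not the actual PySpark code; `prepare_udf`, `with_profiler`, and the `is_profiled` marker are hypothetical names used only for this sketch), the important part is returning right after the warning so that no profiler is attached:

```python
import functools
import warnings
from typing import Any, Callable


def with_profiler(func: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical stand-in for a profiler wrapper; it only marks the function."""
    @functools.wraps(func)
    def profiled(*args: Any, **kwargs: Any) -> Any:
        return func(*args, **kwargs)

    profiled.is_profiled = True  # illustrative marker, not a real PySpark attribute
    return profiled


def prepare_udf(
    func: Callable[..., Any], is_iterator_udf: bool, profiling_enabled: bool
) -> Callable[..., Any]:
    """Decide whether a UDF gets wrapped with a profiler (assumed, simplified logic)."""
    if not profiling_enabled:
        return func
    if is_iterator_udf:
        warnings.warn(
            "Profiling UDFs with iterators input/output is not supported.",
            UserWarning,
        )
        # Return the UDF unchanged: warning and no profiler, never warning plus profiler.
        return func
    return with_profiler(func)
```

With this shape, an iterator UDF still triggers the warning but runs unprofiled, while a non-iterator UDF is wrapped as before.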

Why are the changes needed?

A bug fix to eliminate misleading behavior.

Does this PR introduce any user-facing change?

The change is noticeable only to users of the PySpark shell. There, the memory profiler previously raised an error, which in turn blocked execution of the UDF.

How was this patch tested?

Manual test.

Was this patch authored or co-authored using generative AI tooling?

Setup:

$ ./bin/pyspark --conf spark.python.profile=true

>>> from typing import Iterator
>>> from pyspark.sql.functions import *
>>> import pandas as pd
>>> @pandas_udf("long")
... def plus_one(iterator: Iterator[pd.Series]) -> Iterator[pd.Series]:
...     for s in iterator:
...         yield s + 1
... 
>>> df = spark.createDataFrame(pd.DataFrame([1, 2, 3], columns=["v"]))

Before:

>>> df.select(plus_one(df.v)).show()
UserWarning: Profiling UDFs with iterators input/output is not supported.
Traceback (most recent call last):
...
OSError: could not get source code

After:

>>> df.select(plus_one(df.v)).show()
/Users/xinrong.meng/spark/python/pyspark/sql/udf.py:417: UserWarning: Profiling UDFs with iterators input/output is not supported.
+-----------+                                                                   
|plus_one(v)|
+-----------+
|          2|
|          3|
|          4|
+-----------+

@xinrong-meng xinrong-meng changed the title from "Disable memory profiler for iterator UDFs" to "[SPARK-46663][PYTHON] Disable memory profiler for pandas UDFs with iterators" Jan 10, 2024
@xinrong-meng xinrong-meng marked this pull request as ready for review January 11, 2024 19:26
@xinrong-meng

@ueshin @HyukjinKwon @zhengruifeng may I get a review please?

@xinrong-meng xinrong-meng requested a review from ueshin January 11, 2024 19:27

@ueshin ueshin left a comment

Could you add a test for this?
Otherwise, LGTM.

@xinrong-meng

Thanks all! Merged to master, will do manual cherry-pick for branch-3.5
