[SPARK-47276][PYTHON][CONNECT] Introduce `spark.profile.clear` for SparkSession-based profiling #45378

xinrong-meng · 2024-03-05T01:17:43Z

What changes were proposed in this pull request?

Introduce spark.profile.clear for SparkSession-based profiling.

Why are the changes needed?

To provide a straightforward, unified interface for managing and resetting the profiling results of SparkSession-based profilers.

Does this PR introduce any user-facing change?

Yes. `spark.profile.clear` is now supported, as shown below.

Preparation:

>>> from pyspark.sql.functions import pandas_udf
>>> df = spark.range(3)
>>> @pandas_udf("long")
... def add1(x):
...   return x + 1
... 
>>> added = df.select(add1("id"))
>>> spark.conf.set("spark.sql.pyspark.udf.profiler", "perf")
>>> added.show()
+--------+
|add1(id)|
+--------+
...
+--------+
>>> spark.profile.show()
============================================================
Profile of UDF<id=2>
============================================================
         1410 function calls (1374 primitive calls) in 0.004 seconds
...

Example usage:

>>> spark.profile.profiler_collector._profile_results
{2: (<pstats.Stats object at 0x7ff6484d22e0>, None)}

>>> spark.profile.clear(1)  # id mismatch
>>> spark.profile.profiler_collector._profile_results
{2: (<pstats.Stats object at 0x7ff6484d22e0>, None)}

>>> spark.profile.clear(type="memory")  # type mismatch
>>> spark.profile.profiler_collector._profile_results
{2: (<pstats.Stats object at 0x7ff6484d22e0>, None)}

>>> spark.profile.clear()  # clear all
>>> spark.profile.profiler_collector._profile_results
{}
>>> spark.profile.show()
>>>
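The filtering behavior in this session can be modeled in a few lines of plain Python. This is a hypothetical model, not PySpark's implementation: it assumes, as the `_profile_results` output above suggests, that each UDF id maps to a `(perf_result, memory_result)` pair and that an entry is dropped once both halves are cleared. The `ProfileResults` class and the `ValueError` raised for an unknown type are illustrative assumptions.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical result store: UDF id -> (perf result, memory result).
Results = Dict[int, Tuple[Optional[Any], Optional[Any]]]


class ProfileResults:
    """Hypothetical model of clear(id, type=...) semantics; not PySpark code."""

    def __init__(self, results: Results) -> None:
        self._results: Results = dict(results)

    @property
    def results(self) -> Results:
        return dict(self._results)

    def clear(self, id: Optional[int] = None, *, type: Optional[str] = None) -> None:
        # Assumption: unknown profiler types are rejected.
        if type not in (None, "perf", "memory"):
            raise ValueError(f"type must be 'perf' or 'memory', got {type!r}")
        # No id means "every UDF"; an id with no stored results is ignored.
        ids = list(self._results) if id is None else [id]
        for udf_id in ids:
            if udf_id not in self._results:
                continue
            perf, memory = self._results[udf_id]
            if type in (None, "perf"):
                perf = None
            if type in (None, "memory"):
                memory = None
            if perf is None and memory is None:
                del self._results[udf_id]  # nothing left for this UDF
            else:
                self._results[udf_id] = (perf, memory)


# Mirrors the session above, with a string standing in for pstats.Stats.
store = ProfileResults({2: ("perf-stats", None)})
store.clear(1)  # id mismatch
print(store.results)  # {2: ('perf-stats', None)}
store.clear(type="memory")  # type mismatch
print(store.results)  # {2: ('perf-stats', None)}
store.clear()  # clear all
print(store.results)  # {}
```

In this model, `type` narrows which half of each pair is reset, while `id` narrows which UDFs are touched; the two filters compose independently.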

How was this patch tested?

Unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.

xinrong-meng · 2024-03-05T21:51:17Z

The failed tests are unrelated to the changes proposed in this PR. Reran the failed tests: https://github.com/xinrong-meng/spark/actions/runs/8162084262.

HyukjinKwon · 2024-03-06T01:23:32Z

python/pyspark/sql/profiler.py

+        """
+        Clear the perf profile results.
+
+        .. versionadded:: 4.0.0


Is this a user-facing API? If not, we don't need this version directive.

It is a user-facing API, along with profile.show and profile.dump. We will also add it to the API docs.

Actually, this one is not. The clear in Profile should be the user-facing API.
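The point of this thread is where the version directive belongs: on the public entry point rather than on internal helpers. A minimal sketch of that layering, assuming an internal collector with one clear helper per profiler type behind a thin public `Profile.clear`. The `Collector` class and the `clear_perf_profiles` / `clear_memory_profiles` helpers are hypothetical names, and the real PySpark classes may be organized differently.

```python
from typing import Dict, Optional


class Collector:
    """Hypothetical internal collector keeping one dict per profiler type."""

    def __init__(self) -> None:
        self.perf: Dict[int, str] = {}
        self.memory: Dict[int, str] = {}

    # Internal helpers are not user-facing, so they carry no version directive.
    def clear_perf_profiles(self, id: Optional[int] = None) -> None:
        """Clear the perf profile results."""
        if id is None:
            self.perf.clear()
        else:
            self.perf.pop(id, None)

    def clear_memory_profiles(self, id: Optional[int] = None) -> None:
        """Clear the memory profile results."""
        if id is None:
            self.memory.clear()
        else:
            self.memory.pop(id, None)


class Profile:
    """Hypothetical user-facing facade, reached as ``spark.profile``."""

    def __init__(self, collector: Collector) -> None:
        self._collector = collector

    def clear(self, id: Optional[int] = None, *, type: Optional[str] = None) -> None:
        """
        Clear the profile results.

        .. versionadded:: 4.0.0
        """
        # type=None clears both kinds; "perf" or "memory" narrows the scope.
        # In this sketch an unrecognized type simply clears nothing.
        if type in (None, "perf"):
            self._collector.clear_perf_profiles(id)
        if type in (None, "memory"):
            self._collector.clear_memory_profiles(id)


collector = Collector()
collector.perf[2] = "perf-stats"
collector.memory[2] = "memory-stats"
profile = Profile(collector)
profile.clear(2, type="memory")
print(collector.perf, collector.memory)  # {2: 'perf-stats'} {}
```

Keeping the directive only on `Profile.clear` means the generated API docs advertise a single stable method, while the helpers stay free to change.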

HyukjinKwon

Seems fine but cc @ueshin

python/pyspark/sql/profiler.py

ueshin

Otherwise, LGTM, pending tests.

python/pyspark/sql/profiler.py

zhengruifeng · 2024-03-07T04:03:36Z

python/pyspark/sql/tests/test_session.py

            },
        )

+    def test_clear_memory_type(self):


Nit: it seems we don't have a parity test for test_session. Does it make sense to move SparkSessionProfileTests out of test_session and add a parity test for it?

Good idea!

For now, all the logic tested by SparkSessionProfileTests is imported directly by Spark Connect without modification. Still, I agree that separating it will improve readability and ensure future parity. I'll refactor it later. Thanks!
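The suggested refactor is usually realized with a tests mixin: the test bodies live in a class that is not itself a `TestCase`, and the classic and Connect test classes differ only in their fixtures. Existing PySpark parity tests generally follow this pattern, but the sketch below is self-contained and uses a plain dict in place of a session. `clear_by_id`, `ProfileClearTestsMixin`, and both test class names are hypothetical.

```python
import unittest
from typing import Dict, Optional


def clear_by_id(results: Dict[int, str], id: Optional[int] = None) -> None:
    """Hypothetical shared logic, reused unchanged by both backends."""
    if id is None:
        results.clear()
    else:
        results.pop(id, None)


class ProfileClearTestsMixin:
    """Shared test bodies; not a TestCase itself, so unittest won't collect it."""

    def make_results(self) -> Dict[int, str]:
        raise NotImplementedError  # each backend supplies its own fixture

    def test_clear_id_mismatch(self) -> None:
        results = self.make_results()
        clear_by_id(results, 1)
        self.assertEqual(results, {2: "perf-stats"})

    def test_clear_all(self) -> None:
        results = self.make_results()
        clear_by_id(results)
        self.assertEqual(results, {})


class ClassicProfileClearTests(ProfileClearTestsMixin, unittest.TestCase):
    def make_results(self) -> Dict[int, str]:
        return {2: "perf-stats"}  # stand-in for a classic SparkSession fixture


class ConnectProfileClearParityTests(ProfileClearTestsMixin, unittest.TestCase):
    def make_results(self) -> Dict[int, str]:
        return {2: "perf-stats"}  # stand-in for a Spark Connect session fixture
```

Running the module with `python -m unittest` executes each shared test once per backend, so if the Connect path later diverges, only the parity class fails.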

xinrong-meng · 2024-03-07T21:22:06Z

Merged to master, thank you all!

xinrong-meng added 2 commits March 4, 2024 16:50

clear

52ccdde

mock test

2e05db8

github-actions bot added SQL PYTHON labels Mar 5, 2024

xinrong-meng added 2 commits March 5, 2024 11:43

fix clear

4b441ae

test

c3980bb

github-actions bot added the CORE label Mar 5, 2024

xinrong-meng changed the title ~~[WIP] Introduce spark.profile.clear for SparkSession-based profiling~~ [SPARK-47276][PYTHON][CONNECT] Introduce spark.profile.clear for SparkSession-based profiling Mar 5, 2024

xinrong-meng mentioned this pull request Mar 5, 2024

[SPARK-47078][DOCS][PYTHON] Documentation for SparkSession-based Profilers #45269

Closed

xinrong-meng marked this pull request as ready for review March 5, 2024 21:50

HyukjinKwon reviewed Mar 6, 2024


ueshin reviewed Mar 6, 2024


python/pyspark/sql/profiler.py (outdated)

fix + test

5881718

ueshin approved these changes Mar 6, 2024


python/pyspark/sql/profiler.py (outdated)

python/pyspark/sql/profiler.py (outdated)

fix

dc5c6ca

zhengruifeng reviewed Mar 7, 2024


fix

3de5afb

xinrong-meng closed this in 501999a Mar 7, 2024
