@LuciferYang
Contributor

What changes were proposed in this pull request?

This PR refines the docstrings of `flatten`, `sequence`, and `shuffle` and adds some new examples.

Why are the changes needed?

To improve the PySpark documentation.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Pass GitHub Actions

Was this patch authored or co-authored using generative AI tooling?

No


>>> import pyspark.sql.functions as sf
>>> df = spark.createDataFrame([([1, 20, 3, 5],)], ['data'])
>>> df.select(sf.shuffle(df.data)).show() # doctest: +SKIP
Contributor Author

Because the result of `shuffle` is non-deterministic, the examples here are only displayed and are not executed as doctests (hence `# doctest: +SKIP`). Also, I personally think the result of `shuffle` should not be sorted again just to make it testable. If there are any other opinions, please let me know.
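
For readers unfamiliar with the `+SKIP` directive, here is a minimal sketch of the trade-off described above. It uses only the Python standard library. `shuffled` is a hypothetical stand-in for `sf.shuffle` and is not part of PySpark, and the sorted check is shown only for contrast, not as what this PR does.

```python
import doctest
import random


def shuffled(values):
    """Return a shuffled copy of ``values`` (hypothetical stand-in for ``sf.shuffle``).

    This example is displayed in the docs but never executed, because its
    output can change from run to run:

    >>> shuffled([1, 20, 3, 5])  # doctest: +SKIP
    [20, 3, 5, 1]

    An order-independent property could still be tested, at the cost of
    no longer showing what the shuffled output looks like:

    >>> sorted(shuffled([1, 20, 3, 5]))
    [1, 3, 5, 20]
    """
    result = list(values)
    random.shuffle(result)
    return result


# Parse and run only the examples in the docstring above.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    shuffled.__doc__, {"shuffled": shuffled}, "shuffled", None, 0
)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
print("failed examples:", results.failed)
```

The skipped example cannot fail regardless of the order `random.shuffle` produces, which is why it is safe to show an illustrative output there.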

Contributor

Sounds reasonable to me

@github-actions github-actions bot removed the INFRA label Dec 31, 2023
@LuciferYang
Contributor Author

Merged into master. Thanks @HyukjinKwon @itholic

@LuciferYang LuciferYang deleted the SPARK-46551 branch May 1, 2025 13:40
3 participants