[SPARK-34214][SQL][PYTHON] Expose regexp_extract_all in PySpark #31306
Can one of the admins verify this patch?
python/pyspark/sql/functions.py
r"""Extract all matches of the given group in a regex, from the specified string column.
If the regex did not match, or the specified group did not match, an empty array is returned.
.. versionadded:: 3.1.0
Let's target 3.2.0
Oh, I thought this was the version where the Scala function was introduced. Changed.
Actually, why is
@HyukjinKwon According to the description, it seems fine to remove the Scala function version. Although I think is
Do you wish to remove support for
No, I meant to keep
@beliefer, can you make a PR to remove this in
HyukjinKwon left a comment
Let's drop this. I think we shouldn't have exposed this as a DSL function in the first place. It exists mainly for SQL compliance, and you can use it via expr anyway.
OK
@HyukjinKwon I created #31346
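As noted in the review, regexp_extract_all can already be called from PySpark through expr without a dedicated wrapper. One practical wrinkle is quoting the regex inside a SQL string literal. Under the default parser setting (spark.sql.parser.escapedStringLiterals=false, assumed here), a backslash in a SQL literal starts an escape sequence, so a pattern such as (\d+) needs its backslashes doubled. The helpers below, sql_string_literal and regexp_extract_all_sql, are hypothetical names used only for this sketch. They only build the expression string, so they run without a Spark session.

```python
def sql_string_literal(value: str) -> str:
    """Quote a Python string as a Spark SQL string literal (hypothetical helper).

    Assumes spark.sql.parser.escapedStringLiterals=false (the default), where a
    backslash inside a SQL string literal starts an escape sequence.
    """
    # Double every backslash first, then escape single quotes, so the
    # backslashes added for the quotes are not doubled again.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def regexp_extract_all_sql(column: str, pattern: str, idx: int = 1) -> str:
    """Build a regexp_extract_all(...) SQL expression string (hypothetical helper)."""
    # Backtick-quote the column name; a literal backtick is escaped by doubling it.
    quoted_column = "`" + column.replace("`", "``") + "`"
    return "regexp_extract_all({}, {}, {})".format(
        quoted_column, sql_string_literal(pattern), idx
    )


# Example use inside a Spark application (not executed here):
#   from pyspark.sql import functions as F
#   df.select(F.expr(regexp_extract_all_sql("s", r"(\d+)-(\d+)", 1)).alias("firsts"))
```

Building the string in one place keeps the escaping rules out of call sites; if a session enables escapedStringLiterals, the backslash doubling in sql_string_literal would have to be dropped.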
What changes were proposed in this pull request?
This PR implements SPARK-34214 by exposing the already existing regexp_extract_all SQL function in the PySpark API.
Why are the changes needed?
Please refer to SPARK-24884 for why this function is useful. This PR merely exposes it to the PySpark API, for added consistency and greater availability for users.
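To illustrate how the two functions differ, here is a small pure-Python model of their behaviour: regexp_extract returns the chosen group from the first match only, while regexp_extract_all returns it from every match. The model uses Python's re module rather than the java.util.regex engine Spark relies on, so it only mirrors the described semantics for simple patterns. The functions are local stand-ins named after the SQL functions, not PySpark APIs, and returning an empty string for an optional group that does not take part in a particular match is an assumption of this sketch.

```python
import re
from typing import List


def _compile(pattern: str, idx: int):
    compiled = re.compile(pattern)
    # Reject a group index that the pattern does not define.
    if idx < 0 or idx > compiled.groups:
        raise ValueError(f"group index {idx} out of range for {compiled.groups} group(s)")
    return compiled


def regexp_extract(s: str, pattern: str, idx: int = 1) -> str:
    """Group `idx` of the FIRST match, or "" when nothing matches."""
    m = _compile(pattern, idx).search(s)
    if m is None:
        return ""
    return m.group(idx) or ""  # an unmatched optional group also yields ""


def regexp_extract_all(s: str, pattern: str, idx: int = 1) -> List[str]:
    """Group `idx` of EVERY match, or [] when nothing matches."""
    # Assumption: an optional group absent from a particular match contributes "".
    return [m.group(idx) or "" for m in _compile(pattern, idx).finditer(s)]


text = "100-200, 300-400"
pattern = r"(\d+)-(\d+)"
print(regexp_extract(text, pattern, 1))          # 100
print(regexp_extract_all(text, pattern, 1))      # ['100', '300']
print(regexp_extract_all(text, pattern, 2))      # ['200', '400']
print(regexp_extract_all("no digits", pattern))  # []
```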
Does this PR introduce any user-facing change?
Yes, a new function is made available in the PySpark API, and its docstring is included. I also tweaked the description of the existing regexp_extract function to highlight how its behaviour differs from that of regexp_extract_all.
How was this patch tested?
I tested it locally in a PySpark console session. I didn't find any tests for regexp_extract, so I didn't add any for the new function, but I'm happy to do so if desired.