Skip to content

Conversation

@beliefer
Copy link
Contributor

What changes were proposed in this pull request?

#27507 implements regexp_extract_all and added the scala function version of it.
According https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L41-L59, it seems good for remove the scala function version. Although I think is regexp_extract_all is very useful, if we just reference the description.

Why are the changes needed?

regexp_extract_all is less common.

Does this PR introduce any user-facing change?

'No'. regexp_extract_all was added in Spark 3.1.0 which isn't released yet.

How was this patch tested?

Jenkins test.

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @beliefer. Looks good, pending tests.

@SparkQA
Copy link

SparkQA commented Jan 26, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39089/

@SparkQA
Copy link

SparkQA commented Jan 26, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39089/

@HyukjinKwon
Copy link
Member

@cloud-fan WDYT?

@SparkQA
Copy link

SparkQA commented Jan 26, 2021

Test build #134503 has finished for PR 31346 at commit 6a2d570.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you, @beliefer , @HyukjinKwon , @cloud-fan .

dongjoon-hyun pushed a commit that referenced this pull request Jan 26, 2021
…t_all

### What changes were proposed in this pull request?
#27507 implements `regexp_extract_all` and added the scala function version of it.
According https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L41-L59, it seems good for remove the scala function version. Although I think is regexp_extract_all is very useful, if we just reference the description.

### Why are the changes needed?
`regexp_extract_all` is less common.

### Does this PR introduce _any_ user-facing change?
'No'. `regexp_extract_all` was added in Spark 3.1.0 which isn't released yet.

### How was this patch tested?
Jenkins test.

Closes #31346 from beliefer/SPARK-24884-followup.

Authored-by: beliefer <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 99b6af2)
Signed-off-by: Dongjoon Hyun <[email protected]>
@beliefer
Copy link
Contributor Author

@dongjoon-hyun @HyukjinKwon @cloud-fan Thanks!

@MrPowers
Copy link
Contributor

Can someone help me understand why this was removed? I added this method to spark-daria a while back to fill the gap for the Spark community. Think the arguments for removing were cause this method was for SQL compliance and that it's accessible via expr.

I've needed this method a lot in practical projects. It's a "must have" not a nice to have for a lot of analyses. Invoking the method via expr isn't typically practical cause Scala API users have column objects, not strings.

Can we just implement all the SQL methods in Scala and Python so we have API consistency?

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Jan 27, 2021

Could you send your opinion on the dev mailing list, @MrPowers ? Specifically, the following. Thanks!

Can we just implement all the SQL methods in Scala and Python so we have API consistency?

@HyukjinKwon
Copy link
Member

Yes, this looks something we should discuss in the mailing list.

@HyukjinKwon
Copy link
Member

HyukjinKwon commented Jan 27, 2021

But I don't personally agree with that. Each language specific APIs should better focus on being the language friendly. Many of them are SQL specific and doesn't make much sense in other languages time to time. Also we have an API expr that allows users to use them too.

BTW, we should also think about R side APIs too not only Scala and Python.

skestle pushed a commit to skestle/spark that referenced this pull request Feb 3, 2021
…t_all

### What changes were proposed in this pull request?
apache#27507 implements `regexp_extract_all` and added the scala function version of it.
According https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L41-L59, it seems good for remove the scala function version. Although I think is regexp_extract_all is very useful, if we just reference the description.

### Why are the changes needed?
`regexp_extract_all` is less common.

### Does this PR introduce _any_ user-facing change?
'No'. `regexp_extract_all` was added in Spark 3.1.0 which isn't released yet.

### How was this patch tested?
Jenkins test.

Closes apache#31346 from beliefer/SPARK-24884-followup.

Authored-by: beliefer <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@leleogere
Copy link

Can we just implement all the SQL methods in Scala and Python so we have API consistency?

I very much agree with that. Having language API for all SQL functions makes developing applications in IDE way easier and more user-friendly (typing, autocompletion, less error-prone than using strings...).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants