[SPARK-33809][SQL] Support `all_columns_except` complex column expression #30805

AngersZhuuuu · 2020-12-16T14:49:40Z

What changes were proposed in this pull request?

Dropping columns is easy with the DF/DS `drop` API, but hard in SQL when a table has many columns.
This PR adds a function to support it:

  usage = """
    _FUNC_(expr*) - Returns all columns except the columns given as function parameters.
  """,
  examples = """
    Examples:
      > SELECT * FROM TBL;
       A B C
       1 2 3
      > SELECT _FUNC_(a, b) FROM TBL;
       C
       3
  """
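
The intended semantics can be sketched outside Spark as a plain filter over column names. This is only an illustration, not the PR's implementation: the class and method names are hypothetical, and matching names case-insensitively is an assumption made so that `a, b` excludes `A, B` as in the example above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch of the semantics described above; not Spark code.
public class AllColumnsExceptSketch {
    // Returns the columns whose names are not in `excluded`.
    // Assumption: names are compared case-insensitively.
    public static List<String> allColumnsExcept(List<String> columns, String... excluded) {
        Set<String> drop = Arrays.stream(excluded)
                .map(name -> name.toLowerCase(Locale.ROOT))
                .collect(Collectors.toSet());
        return columns.stream()
                .filter(name -> !drop.contains(name.toLowerCase(Locale.ROOT)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Mirrors the example: table TBL has columns A, B, C.
        System.out.println(allColumnsExcept(List.of("A", "B", "C"), "a", "b")); // prints [C]
    }
}
```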

The idea comes from a Chinese blog post:
https://mp.weixin.qq.com/s?src=11&timestamp=1608729276&ver=2784&signature=IMj38*tURJObyQv8z84ELR-Vfkw2becwRfzQ5*9p0X7ilIp-AJVmIFUGOHOr6DcEF5syqfEhiQHa87ZAdzQJHWmy7FjjvTYdEldLkc9GsqrOELpLQrQ5CVG2KdK0ZCgP&new=1

Why are the changes needed?

To provide a way to drop columns in SQL.

Does this PR introduce any user-facing change?

Yes. Users can call the all_columns_except function in the projection list to handle complex column-dropping scenarios.

How was this patch tested?

Added unit tests.

SparkQA · 2020-12-16T15:22:05Z

Test build #132893 has finished for PR 30805 at commit 7f98b93.

This patch fails to build.
This patch merges cleanly.
This patch adds the following public classes (experimental):
case class AllColumnsExcept(exclude: Expression*) extends Expression with CodegenFallback

SparkQA · 2020-12-16T16:04:11Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37495/

SparkQA · 2020-12-16T16:40:26Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37495/

HyukjinKwon · 2020-12-17T02:14:35Z

@AngersZhuuuu, do you have any references for an all_columns_except function in other DBMSes?

AngersZhuuuu · 2020-12-17T02:24:07Z

https://stackoverflow.com/questions/729197/sql-exclude-a-column-using-select-except-columna-from-tablea
https://dataschool.com/learn-sql/exclude-a-column/
https://www.postgresonline.com/journal/archives/41-How-to-SELECT--ALL-EXCEPT-some-columns-in-a-table.html

I didn't find any DBMS that supports such a feature yet, but I think many people need it. It would make writing SQL more convenient for our users and make it easier to convert DF/DS API code to SQL.

wangyum · 2020-12-17T03:09:55Z

How about this?

set spark.sql.parser.quotedRegexColumnNames=true;

EXPLAIN SELECT `(c_customer_sk|c_customer_id|c_current_cdemo_sk|c_current_hdemo_sk|c_current_addr_sk|c_first_shipto_date_sk|c_first_sales_date_sk)?+.+` FROM customer;

== Physical Plan ==
AdaptiveSparkPlan(isFinalPlan=false)
+- Project [c_salutation#1604, c_first_name#1605, c_last_name#1606, c_preferred_cust_flag#1607, c_birth_day#1608, c_birth_month#1609, c_birth_year#1610, c_birth_country#1611, c_login#1612, c_email_address#1613, c_last_review_date#1614]
   +- FileScan parquet carmel_tpcds5t.customer

EXPLAIN SELECT * FROM customer;
== Physical Plan ==
AdaptiveSparkPlan(isFinalPlan=false)
+- FileScan parquet carmel_tpcds5t.customer[c_customer_sk#1597, c_customer_id#1598, c_current_cdemo_sk#1599, c_current_hdemo_sk#1600, c_current_addr_sk#1601, c_first_shipto_date_sk#1602, c_first_sales_date_sk#1603, c_salutation#1604, c_first_name#1605, c_last_name#1606, c_preferred_cust_flag#1607, c_birth_day#1608, c_birth_month#1609, c_birth_year#1610, c_birth_country#1611, c_login#1612, c_email_address#1613, c_last_review_date#1614] Batched: true, DataFilters: [], Format: Parquet,
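
To see why a pattern of the form `(x|y)?+.+` keeps every column except `x` and `y`, the matching can be reproduced with plain `java.util.regex`. This is a sketch under assumptions, not Spark's code: the class and method names are hypothetical, and it assumes that a column is selected when the regex matches its whole name, case-insensitively. The possessive `?+` consumes an excluded name without backtracking, so the trailing `.+` has nothing left to match and that name is rejected.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch of regex-based column selection; not Spark code.
public class QuotedRegexColumns {
    // Keeps the column names that the regex matches in full.
    // Assumption: matching is case-insensitive.
    public static List<String> select(String regex, List<String> columns) {
        Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        return columns.stream()
                .filter(name -> pattern.matcher(name).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> columns = List.of(
                "c_customer_sk", "c_customer_id", "c_first_name", "c_last_name");
        // Same shape as the pattern in the query above, with fewer names.
        String regex = "(c_customer_sk|c_customer_id)?+.+";
        System.out.println(select(regex, columns)); // prints [c_first_name, c_last_name]
    }
}
```

A name that merely starts with an excluded name (for example `c_customer_sk2`) is still kept, because `.+` can match the remaining characters.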

wangyum · 2020-12-17T03:24:19Z

Can we document the usage of spark.sql.parser.quotedRegexColumnNames?

AngersZhuuuu · 2020-12-17T03:24:59Z

I think documenting quotedRegexColumnNames is necessary. It is very useful.

SparkQA · 2020-12-17T04:51:40Z

Test build #132913 has finished for PR 30805 at commit aedf260.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

da-tubi · 2020-12-23T13:25:19Z

@AngersZhuuuu
It would be better to mention the post that the original idea of all_columns_except comes from:

https://mp.weixin.qq.com/s?src=11&timestamp=1608729276&ver=2784&signature=IMj38*tURJObyQv8z84ELR-Vfkw2becwRfzQ5*9p0X7ilIp-AJVmIFUGOHOr6DcEF5syqfEhiQHa87ZAdzQJHWmy7FjjvTYdEldLkc9GsqrOELpLQrQ5CVG2KdK0ZCgP&new=1

Anyway, thanks for your work to bring it to Apache Spark.

@wangyum
It's OK to use quotedRegexColumnNames, but the syntax is a bit weird!

AngersZhuuuu · 2020-12-23T14:24:54Z

Since it's a Chinese blog, I didn't put it in the PR description at first. I have now added it.

…edRegexColumnNames` in the SQL documents

### What changes were proposed in this pull request?
According to #30805 (comment), document `spark.sql.parser.quotedRegexColumnNames`, since users need to know about it and it is useful.

![image](https://user-images.githubusercontent.com/46485123/103656543-afa4aa80-4fa3-11eb-8cd3-a9d1b87a3489.png)
![image](https://user-images.githubusercontent.com/46485123/103656551-b2070480-4fa3-11eb-9ce7-95cc424242a6.png)

### Why are the changes needed?
To complete the documentation.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Not needed.

Closes #30816 from AngersZhuuuu/SPARK-33818.

Authored-by: angerszhu <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 9b54da4)
Signed-off-by: Dongjoon Hyun <[email protected]>

AngersZhuuuu added 2 commits December 16, 2020 22:42

[SPARK-33809][SQL] Support all_columns_except complex column expression function (e0bd1c5)

[SPARK-33809][SQL] Support all_columns_except complex column expression (7f98b93)

AngersZhuuuu marked this pull request as draft December 16, 2020 14:49

github-actions bot added the SQL label Dec 16, 2020

AngersZhuuuu added 3 commits December 17, 2020 09:39

Update Analyzer.scala (a056cba)

Update Analyzer.scala (3981b45)

Merge branch 'master' into column_exclude (aedf260)

AngersZhuuuu marked this pull request as ready for review December 17, 2020 03:05

AngersZhuuuu mentioned this pull request Dec 17, 2020

[SPARK-33818][SQL][DOC] Add descriptions about spark.sql.parser.quotedRegexColumnNames in the SQL documents #30816

Closed

AngersZhuuuu closed this Feb 28, 2021
