@AngersZhuuuu (Contributor) commented Dec 16, 2020

What changes were proposed in this pull request?

Dropping columns is easy with the DataFrame/Dataset `drop` API, but difficult in SQL when a table has many columns.
This PR adds a function to support this:

  usage = """
    _FUNC_(expr*) - Returns all columns except the columns given as function parameters.
  """,
  examples = """
    Examples:
      > SELECT * FROM TBL;
       A B C
       1 2 3
      > SELECT _FUNC_(a, b) FROM TBL;
       C
       3
  """
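To make the intended semantics concrete, here is a minimal Java sketch of the name-filtering step such a function performs: keep every input column, in its original order, unless its name is in the exclude list. The helper `allColumnsExcept` and the case-insensitive comparison (approximating Spark's default `spark.sql.caseSensitive=false`) are assumptions for illustration; the PR's actual `AllColumnsExcept` class works on Catalyst expressions, not plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class AllColumnsExceptSketch {

    // Hypothetical helper (not the PR's code): returns the input column names in
    // their original order, minus every name that appears in `exclude`.
    static List<String> allColumnsExcept(List<String> columns, List<String> exclude,
                                         boolean caseSensitive) {
        // A TreeSet ordered by CASE_INSENSITIVE_ORDER makes contains() ignore case,
        // approximating Spark's default spark.sql.caseSensitive=false.
        Set<String> excluded = caseSensitive
                ? new TreeSet<String>()
                : new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
        excluded.addAll(exclude);

        List<String> result = new ArrayList<>();
        for (String col : columns) {
            if (!excluded.contains(col)) {
                result.add(col);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tbl = Arrays.asList("A", "B", "C");
        // Models: SELECT all_columns_except(a, b) FROM TBL;
        System.out.println(allColumnsExcept(tbl, Arrays.asList("a", "b"), false)); // [C]
    }
}
```

Like `Dataset.drop(String...)`, which is a no-op for names not in the schema, this sketch silently ignores unknown names; whether the SQL function should raise an error instead is a separate design choice.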

The idea comes from a Chinese blog post:
https://mp.weixin.qq.com/s?src=11&timestamp=1608729276&ver=2784&signature=IMj38*tURJObyQv8z84ELR-Vfkw2becwRfzQ5*9p0X7ilIp-AJVmIFUGOHOr6DcEF5syqfEhiQHa87ZAdzQJHWmy7FjjvTYdEldLkc9GsqrOELpLQrQ5CVG2KdK0ZCgP&new=1

Why are the changes needed?

To provide a way to drop columns in SQL.

Does this PR introduce any user-facing change?

Yes. Users can call the all_columns_except function in the projection list to handle complex column-dropping scenarios.

How was this patch tested?

Added unit tests.

@AngersZhuuuu marked this pull request as draft December 16, 2020 14:49
SparkQA commented Dec 16, 2020

Test build #132893 has finished for PR 30805 at commit 7f98b93.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class AllColumnsExcept(exclude: Expression*) extends Expression with CodegenFallback

SparkQA commented Dec 16, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37495/

SparkQA commented Dec 16, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37495/

github-actions bot added the SQL label Dec 16, 2020
@HyukjinKwon (Member)

@AngersZhuuuu, do you have any reference of all_columns_except function in other DBMSes?

@AngersZhuuuu (Contributor, Author) commented Dec 17, 2020

> @AngersZhuuuu, do you have any reference of all_columns_except function in other DBMSes?

https://stackoverflow.com/questions/729197/sql-exclude-a-column-using-select-except-columna-from-tablea
https://dataschool.com/learn-sql/exclude-a-column/
https://www.postgresonline.com/journal/archives/41-How-to-SELECT--ALL-EXCEPT-some-columns-in-a-table.html

I haven't found any DBMS that supports such a feature yet, but I think many people need it. It would make it easier for our users to write SQL and to convert DataFrame/Dataset API code to SQL.

@AngersZhuuuu marked this pull request as ready for review December 17, 2020 03:05
@wangyum (Member) commented Dec 17, 2020

How about this?

set spark.sql.parser.quotedRegexColumnNames=true;

EXPLAIN SELECT `(c_customer_sk|c_customer_id|c_current_cdemo_sk|c_current_hdemo_sk|c_current_addr_sk|c_first_shipto_date_sk|c_first_sales_date_sk)?+.+` FROM customer;

== Physical Plan ==
AdaptiveSparkPlan(isFinalPlan=false)
+- Project [c_salutation#1604, c_first_name#1605, c_last_name#1606, c_preferred_cust_flag#1607, c_birth_day#1608, c_birth_month#1609, c_birth_year#1610, c_birth_country#1611, c_login#1612, c_email_address#1613, c_last_review_date#1614]
   +- FileScan parquet carmel_tpcds5t.customer

EXPLAIN SELECT * FROM customer;

== Physical Plan ==
AdaptiveSparkPlan(isFinalPlan=false)
+- FileScan parquet carmel_tpcds5t.customer[c_customer_sk#1597, c_customer_id#1598, c_current_cdemo_sk#1599, c_current_hdemo_sk#1600, c_current_addr_sk#1601, c_first_shipto_date_sk#1602, c_first_sales_date_sk#1603, c_salutation#1604, c_first_name#1605, c_last_name#1606, c_preferred_cust_flag#1607, c_birth_day#1608, c_birth_month#1609, c_birth_year#1610, c_birth_country#1611, c_login#1612, c_email_address#1613, c_last_review_date#1614] Batched: true, DataFilters: [], Format: Parquet,
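Why the pattern above excludes the listed columns comes down to Java regex semantics: `?+` is a possessive quantifier, so when a column name equals one of the alternatives, the group consumes the whole name and refuses to give it back, leaving nothing for the mandatory `.+`, and the full match fails. For any other name the optional group matches empty and `.+` matches the whole name. The sketch below assumes, as an approximation rather than a quote of Spark's code, that the backquoted pattern is matched in full against each column name and case-insensitively (the default); the helper `selectMatching` and the shortened column list are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class RegexColumnExclusion {

    // Hypothetical helper: keeps the columns whose entire name matches the regex.
    static List<String> selectMatching(List<String> columns, String regex) {
        // CASE_INSENSITIVE approximates Spark's default case-insensitive resolution.
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        List<String> kept = new ArrayList<>();
        for (String col : columns) {
            // matches() requires the whole name to match, not just a prefix.
            if (p.matcher(col).matches()) {
                kept.add(col);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> customer = Arrays.asList(
                "c_customer_sk", "c_customer_id", "c_salutation", "c_first_name");
        // Same shape as the query above: (excluded1|excluded2)?+.+
        String regex = "(c_customer_sk|c_customer_id)?+.+";
        System.out.println(selectMatching(customer, regex)); // [c_salutation, c_first_name]
    }
}
```

One caveat of this trick: the possessive group also commits to the first alternative that matches, so if a shorter excluded name that is a prefix of another is listed first, as in `(c_customer|c_customer_sk)?+.+`, then `c_customer_sk` is still selected because `.+` matches the leftover `_sk`. Listing longer names first avoids this.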

@wangyum (Member) commented Dec 17, 2020

Can we add the usage of spark.sql.parser.quotedRegexColumnNames to the documentation?

@AngersZhuuuu (Contributor, Author)

> quotedRegexColumnNames

I think it's necessary; it's very useful.

SparkQA commented Dec 17, 2020

Test build #132913 has finished for PR 30805 at commit aedf260.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@da-tubi commented Dec 23, 2020

@AngersZhuuuu
It would be better if you mentioned the post where the original idea of all_columns_except came from:

https://mp.weixin.qq.com/s?src=11&timestamp=1608729276&ver=2784&signature=IMj38*tURJObyQv8z84ELR-Vfkw2becwRfzQ5*9p0X7ilIp-AJVmIFUGOHOr6DcEF5syqfEhiQHa87ZAdzQJHWmy7FjjvTYdEldLkc9GsqrOELpLQrQ5CVG2KdK0ZCgP&new=1

Anyway, thanks for your work bringing it to Apache Spark.

@wangyum
It's OK to use quotedRegexColumnNames, but the syntax is a bit weird!

@AngersZhuuuu (Contributor, Author)

Since it's a Chinese blog, I didn't put it in the PR description; I have now added it.

dongjoon-hyun pushed a commit that referenced this pull request Jan 8, 2021
…edRegexColumnNames` in the SQL documents

### What changes were proposed in this pull request?
According to #30805 (comment),
document `spark.sql.parser.quotedRegexColumnNames`, since it's useful and users need to know about it.

![image](https://user-images.githubusercontent.com/46485123/103656543-afa4aa80-4fa3-11eb-8cd3-a9d1b87a3489.png)
![image](https://user-images.githubusercontent.com/46485123/103656551-b2070480-4fa3-11eb-9ce7-95cc424242a6.png)

### Why are the changes needed?
To complete the documentation.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Not needed.

Closes #30816 from AngersZhuuuu/SPARK-33818.

Authored-by: angerszhu <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun pushed a commit that referenced this pull request Jan 8, 2021
…edRegexColumnNames` in the SQL documents
(cherry picked from commit 9b54da4)
Signed-off-by: Dongjoon Hyun <[email protected]>
xuanyuanking pushed a commit to xuanyuanking/spark that referenced this pull request Sep 29, 2021
…edRegexColumnNames` in the SQL documents