[SPARK-33809][SQL] Support all_columns_except complex column expression
#30805
Conversation
Test build #132893 has finished for PR 30805 at commit

Kubernetes integration test starting

Kubernetes integration test status failure
@AngersZhuuuu, do you have any reference of
https://stackoverflow.com/questions/729197/sql-exclude-a-column-using-select-except-columna-from-tablea I haven't found any DBMS that supports such a feature yet, but I think many people need it. It would make it more convenient for users to write SQL and to convert DF/DS API code to SQL.
How about?

```sql
set spark.sql.parser.quotedRegexColumnNames=true;

EXPLAIN SELECT `(c_customer_sk|c_customer_id|c_current_cdemo_sk|c_current_hdemo_sk|c_current_addr_sk|c_first_shipto_date_sk|c_first_sales_date_sk)?+.+` FROM customer;
```

```
== Physical Plan ==
AdaptiveSparkPlan(isFinalPlan=false)
+- Project [c_salutation#1604, c_first_name#1605, c_last_name#1606, c_preferred_cust_flag#1607, c_birth_day#1608, c_birth_month#1609, c_birth_year#1610, c_birth_country#1611, c_login#1612, c_email_address#1613, c_last_review_date#1614]
   +- FileScan parquet carmel_tpcds5t.customer
```

```sql
EXPLAIN SELECT * FROM customer;
```

```
== Physical Plan ==
AdaptiveSparkPlan(isFinalPlan=false)
+- FileScan parquet carmel_tpcds5t.customer[c_customer_sk#1597, c_customer_id#1598, c_current_cdemo_sk#1599, c_current_hdemo_sk#1600, c_current_addr_sk#1601, c_first_shipto_date_sk#1602, c_first_sales_date_sk#1603, c_salutation#1604, c_first_name#1605, c_last_name#1606, c_preferred_cust_flag#1607, c_birth_day#1608, c_birth_month#1609, c_birth_year#1610, c_birth_country#1611, c_login#1612, c_email_address#1613, c_last_review_date#1614] Batched: true, DataFilters: [], Format: Parquet,
```
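A note on why the backtick pattern works: `?+` is a possessive quantifier, so when a column name exactly matches the listed alternation, the group consumes the whole name without backtracking and the trailing `.+` has nothing left to match, which excludes that column; any other name still matches `.+`. A small Python sketch of the same effect (emulating the possessive match with `re.fullmatch`, since Python's `re` only gained `?+` in 3.11; the column names here are illustrative):

```python
import re

# The alternation that appears inside (...)?+ in the quoted-regex column name.
EXCLUDE = r"c_customer_sk|c_customer_id"

def matches_regex_column(name: str) -> bool:
    """True if `name` would match `(EXCLUDE)?+.+`.

    For a plain alternation of literal names, that pattern fails only
    when the name is exactly one of the excluded columns, so checking
    fullmatch against the alternation is equivalent.
    """
    return re.fullmatch(EXCLUDE, name) is None

cols = ["c_customer_sk", "c_customer_id", "c_first_name"]
print([c for c in cols if matches_regex_column(c)])
# -> ['c_first_name']
```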
Can we add the usage of

I think it's necessary. Very useful.
Test build #132913 has finished for PR 30805 at commit
@AngersZhuuuu Anyway, thanks for your work to bring it to Apache Spark. @wangyum
Since it's a Chinese blog, I didn't put it in the PR description. Updated the PR description.
…edRegexColumnNames` in the SQL documents

### What changes were proposed in this pull request?
According to #30805 (comment), document `spark.sql.parser.quotedRegexColumnNames`, since users need to know about it and it's useful.

### Why are the changes needed?
Complete doc

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Not needed

Closes #30816 from AngersZhuuuu/SPARK-33818.

Authored-by: angerszhu <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 9b54da4)
What changes were proposed in this pull request?
It is easy to drop columns with the DF/DS API `drop`, but difficult in SQL when there are many columns. Add a function to support this:
Idea from a Chinese blog
https://mp.weixin.qq.com/s?src=11×tamp=1608729276&ver=2784&signature=IMj38*tURJObyQv8z84ELR-Vfkw2becwRfzQ5*9p0X7ilIp-AJVmIFUGOHOr6DcEF5syqfEhiQHa87ZAdzQJHWmy7FjjvTYdEldLkc9GsqrOELpLQrQ5CVG2KdK0ZCgP&new=1
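The intended semantics of the proposed expression can be sketched in plain Python over a list of column names (a hypothetical illustration of what the projection should return, not Spark code; the function and column names here are assumptions):

```python
def all_columns_except(columns, excluded):
    """Return the projection list with the excluded columns removed,
    mimicking the column expression proposed in this PR."""
    excluded = set(excluded)
    return [c for c in columns if c not in excluded]

customer_cols = ["c_customer_sk", "c_customer_id", "c_first_name", "c_last_name"]
print(all_columns_except(customer_cols, ["c_customer_sk", "c_customer_id"]))
# -> ['c_first_name', 'c_last_name']
```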
Why are the changes needed?
Provide a way to handle dropping columns in SQL.
Does this PR introduce any user-facing change?
Yes, users can use the `all_columns_except` function in the projection list to handle complex drop-column scenarios.
How was this patch tested?
Added UT