-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-39859][SQL] Support v2 DESCRIBE TABLE EXTENDED for columns #40058
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 from my side.
cc @cloud-fan , @viirya , @sunchao , @parthchandra
| if (colStats.get.avgLen().isPresent) { | ||
| rows += toCatalystRow("max_col_len", colStats.get.avgLen().getAsLong.toString) | ||
| } else { | ||
| rows += toCatalystRow("max_col_len", "NULL") | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (colStats.get.avgLen().isPresent) { | |
| rows += toCatalystRow("max_col_len", colStats.get.avgLen().getAsLong.toString) | |
| } else { | |
| rows += toCatalystRow("max_col_len", "NULL") | |
| } | |
| if (colStats.get.maxLen().isPresent) { | |
| rows += toCatalystRow("max_col_len", colStats.get.maxLen().getAsLong.toString) | |
| } else { | |
| rows += toCatalystRow("max_col_len", "NULL") | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Thanks
| r.table.asReadable.newScanBuilder(CaseInsensitiveStringMap.empty()).build() match { | ||
| case s: SupportsReportStatistics => | ||
| val stats = s.estimateStatistics() | ||
| Some(stats.columnStats().get(FieldReference.column(c.name))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is columnStats case-sensitive or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is controlled by SQLConf.CASE_SENSITIVE
### What changes were proposed in this pull request? Support v2 DESCRIBE TABLE EXTENDED for columns ### Why are the changes needed? DS v1/v2 command parity ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? UT Closes #40058 from huaxingao/describe_col. Authored-by: huaxingao <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit ebab0ef) Signed-off-by: Dongjoon Hyun <[email protected]>
|
Merged to master/3.4 for Apache Spark 3.4.0. Thank you, @huaxingao and @viirya . cc @MaxGekk since he filed SPARK-39859 originally. |
|
Thanks @dongjoon-hyun @viirya |
| case c: Attribute => | ||
| DescribeColumnExec(output, c, isExtended) :: Nil | ||
| val colStats = | ||
| r.table.asReadable.newScanBuilder(CaseInsensitiveStringMap.empty()).build() match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this may not be very cheap. Shall we only do it if isExtended is true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and what if the table is not readable (e.g. write-only)? We should not fail but show no column stats.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, I think it's cleaner to pass v2 Table to DescribeColumnExec and move this code block to DescribeColumnExec
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan Thanks for your comments. I will have a follow-up to fix this and also move the test to parent suite.
| } | ||
|
|
||
| // TODO(SPARK-39859): Support v2 `DESCRIBE TABLE EXTENDED` for columns | ||
| test("describe extended (formatted) a column") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't have to be done immediately, but it's better to move this test to the parent suite, to make sure v1 and v2 commands have the same behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
### What changes were proposed in this pull request? Support v2 DESCRIBE TABLE EXTENDED for columns ### Why are the changes needed? DS v1/v2 command parity ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? UT Closes apache#40058 from huaxingao/describe_col. Authored-by: huaxingao <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit ebab0ef) Signed-off-by: Dongjoon Hyun <[email protected]>
### What changes were proposed in this pull request? Support v2 DESCRIBE TABLE EXTENDED with table stats ### Why are the changes needed? Similar to #40058, make DS v1/v2 command parity, e.g. DESC EXTENDED table | col_name | data_type | comment | |-------------------|---------------------------|------------| | ... | ... | ... | | Statistics | 864 bytes, 2 rows | | | ... | ... | ... | ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? add test `describe extended table with stats` ### Was this patch authored or co-authored using generative AI tooling? No Closes #44535 from Zouxxyy/dev/desc-table-stats. Lead-authored-by: zouxxyy <[email protected]> Co-authored-by: Zouxxyy <[email protected]> Signed-off-by: Max Gekk <[email protected]>
What changes were proposed in this pull request?
Support v2 DESCRIBE TABLE EXTENDED for columns
Why are the changes needed?
DS v1/v2 command parity
Does this PR introduce any user-facing change?
No
How was this patch tested?
UT