@huaxingao
Contributor

What changes were proposed in this pull request?

Support v2 DESCRIBE TABLE EXTENDED for columns

Why are the changes needed?

DS v1/v2 command parity

Does this PR introduce any user-facing change?

No

How was this patch tested?

UT

@github-actions github-actions bot added the SQL label Feb 16, 2023
Member

@dongjoon-hyun dongjoon-hyun left a comment

+1 from my side.

cc @cloud-fan , @viirya , @sunchao , @parthchandra

@viirya viirya changed the title from "[Spark-39859][SQL] Support v2 DESCRIBE TABLE EXTENDED for columns" to "[SPARK-39859][SQL] Support v2 DESCRIBE TABLE EXTENDED for columns" on Feb 16, 2023
Comment on lines 78 to 82
if (colStats.get.avgLen().isPresent) {
rows += toCatalystRow("max_col_len", colStats.get.avgLen().getAsLong.toString)
} else {
rows += toCatalystRow("max_col_len", "NULL")
}
Member

Suggested change
- if (colStats.get.avgLen().isPresent) {
-   rows += toCatalystRow("max_col_len", colStats.get.avgLen().getAsLong.toString)
- } else {
-   rows += toCatalystRow("max_col_len", "NULL")
- }
+ if (colStats.get.maxLen().isPresent) {
+   rows += toCatalystRow("max_col_len", colStats.get.maxLen().getAsLong.toString)
+ } else {
+   rows += toCatalystRow("max_col_len", "NULL")
+ }

Contributor Author

Fixed. Thanks
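
The fix discussed here comes down to reading each statistic from its own accessor and printing "NULL" when the value is absent. A minimal Scala sketch of that pattern follows. It assumes a hypothetical `LenStats` holder in place of the real column-statistics object, and it assumes `avg_col_len` as the label of the companion row (only `max_col_len` appears in this thread).

```scala
import java.util.OptionalLong

object ColStatRows {
  // Hypothetical holder for the two length statistics; not a Spark type.
  final case class LenStats(avgLen: OptionalLong, maxLen: OptionalLong)

  // Render an optional long as its decimal string, or "NULL" when it is absent.
  def render(value: OptionalLong): String =
    if (value.isPresent) value.getAsLong.toString else "NULL"

  // Each row label is paired with the accessor of the same name, which is
  // exactly the pairing the review comment asked for.
  def rows(stats: LenStats): Seq[(String, String)] = Seq(
    "avg_col_len" -> render(stats.avgLen), // label assumed for illustration
    "max_col_len" -> render(stats.maxLen)
  )
}
```

Routing both rows through one `render` helper leaves a single place where the accessor is chosen, which makes a label/accessor mismatch easier to spot.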

r.table.asReadable.newScanBuilder(CaseInsensitiveStringMap.empty()).build() match {
case s: SupportsReportStatistics =>
val stats = s.estimateStatistics()
Some(stats.columnStats().get(FieldReference.column(c.name)))
Member

Is columnStats case-sensitive or not?

Contributor Author

I think this is controlled by SQLConf.CASE_SENSITIVE
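
The thread does not settle whether the map lookup itself honors that setting. Purely as an illustration, a name lookup that respects a case-sensitivity flag could look like the sketch below. `ColumnStatLookup` and its `caseSensitive` parameter are hypothetical and are not part of Spark's API.

```scala
object ColumnStatLookup {
  // Look up a per-column value by name, honoring a case-sensitivity flag.
  // With caseSensitive = false, the first key that matches ignoring case wins.
  def find[V](stats: Map[String, V], name: String, caseSensitive: Boolean): Option[V] =
    if (caseSensitive) stats.get(name)
    else stats.collectFirst { case (k, v) if k.equalsIgnoreCase(name) => v }
}
```

For example, `ColumnStatLookup.find(Map("Id" -> 1), "id", caseSensitive = false)` finds the entry, while the same call with `caseSensitive = true` does not.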

dongjoon-hyun pushed a commit that referenced this pull request Feb 17, 2023
### What changes were proposed in this pull request?
Support v2 DESCRIBE TABLE EXTENDED for columns

### Why are the changes needed?
DS v1/v2 command parity

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
UT

Closes #40058 from huaxingao/describe_col.

Authored-by: huaxingao <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit ebab0ef)
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun
Member

dongjoon-hyun commented Feb 17, 2023

Merged to master/3.4 for Apache Spark 3.4.0. Thank you, @huaxingao and @viirya .

cc @MaxGekk since he filed SPARK-39859 originally.

@huaxingao
Contributor Author

Thanks @dongjoon-hyun @viirya

@huaxingao huaxingao deleted the describe_col branch February 17, 2023 04:43
case c: Attribute =>
DescribeColumnExec(output, c, isExtended) :: Nil
val colStats =
r.table.asReadable.newScanBuilder(CaseInsensitiveStringMap.empty()).build() match {
Contributor

Building a scan here may not be cheap. Shall we do it only when isExtended is true?

Contributor

And what if the table is not readable (e.g. write-only)? We should not fail, but instead show no column stats.

Contributor

BTW, I think it's cleaner to pass the v2 Table to DescribeColumnExec and move this code block into DescribeColumnExec.

Contributor Author

@cloud-fan Thanks for your comments. I will open a follow-up to fix this and also move the test to the parent suite.
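
The review points in this thread (look up stats only when `isExtended` is set, and show no stats for a non-readable table instead of failing) can be sketched as follows. The follow-up itself is not shown in this thread, so this is only one possible shape. `Table`, `ReadableTable`, `ColStats`, and `estimateColumnStats` are hypothetical stand-ins, not the real DS v2 interfaces.

```scala
import java.util.{Map => JMap}

object DescribeColumnSketch {
  // Hypothetical stand-ins for the DS v2 interfaces; these are not Spark types.
  final case class ColStats(distinctCount: Long)
  trait Table { def name: String }
  trait ReadableTable extends Table {
    // Stands in for building a scan and asking it for statistics (possibly costly).
    def estimateColumnStats(): JMap[String, ColStats]
  }

  def columnStats(table: Table, column: String, isExtended: Boolean): Option[ColStats] =
    if (!isExtended) None // plain DESCRIBE: skip the potentially expensive lookup
    else table match {
      // Option(...) rather than Some(...): a Java Map.get returns null for a missing key.
      case r: ReadableTable => Option(r.estimateColumnStats().get(column))
      // Write-only or otherwise non-readable table: report no stats instead of failing.
      case _ => None
    }
}
```

Because the function takes the table itself, it could live inside the exec node, which is the placement suggested in the third comment.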

}

// TODO(SPARK-39859): Support v2 `DESCRIBE TABLE EXTENDED` for columns
test("describe extended (formatted) a column") {
Contributor

This doesn't have to be done immediately, but it would be better to move this test to the parent suite, to make sure the v1 and v2 commands have the same behavior.

Member

+1
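
The idea of a shared parent suite can be illustrated without any test framework: the expectation is written once against an abstract command, and each catalog version only supplies its own way of running it. Everything in this sketch is hypothetical, including the trait name, the canned rows, and the row labels. It is not the layout of Spark's actual test suites.

```scala
object SharedDescribeSuiteSketch {
  // Parent "suite": the check is defined once for all catalog versions.
  trait DescribeColumnChecks {
    def catalogVersion: String
    // Each child supplies its own way of describing a column.
    def describeExtended(column: String): Seq[(String, String)]

    def checkDescribeExtended(): Unit = {
      val infoNames = describeExtended("key").map(_._1)
      assert(infoNames == Seq("col_name", "data_type", "max_col_len"),
        s"$catalogVersion returned unexpected rows: $infoNames")
    }
  }

  // Canned output standing in for a real command run (made-up values).
  private def cannedRows(column: String): Seq[(String, String)] =
    Seq("col_name" -> column, "data_type" -> "int", "max_col_len" -> "4")

  object V1Checks extends DescribeColumnChecks {
    val catalogVersion = "V1"
    def describeExtended(column: String): Seq[(String, String)] = cannedRows(column)
  }

  object V2Checks extends DescribeColumnChecks {
    val catalogVersion = "V2"
    def describeExtended(column: String): Seq[(String, String)] = cannedRows(column)
  }
}
```

If one implementation drifts from the other, the same inherited check fails for that child only, which is the parity guarantee the comment is after.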

snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023
MaxGekk pushed a commit that referenced this pull request Jan 2, 2024
### What changes were proposed in this pull request?

Support v2 DESCRIBE TABLE EXTENDED with table stats

### Why are the changes needed?

Similar to #40058, this brings the DS v1 and v2 commands to parity, e.g.

DESC EXTENDED table

| col_name          | data_type                 | comment    |
|-------------------|---------------------------|------------|
| ...               | ...                       | ...        |
| Statistics        | 864 bytes, 2 rows         |            |
| ...               | ...                       | ...        |

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added the test `describe extended table with stats`.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #44535 from Zouxxyy/dev/desc-table-stats.

Lead-authored-by: zouxxyy <[email protected]>
Co-authored-by: Zouxxyy <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
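
The `Statistics` row in the commit message above combines a size and a row count into one string. As a rough sketch of how such a value could be assembled from optional statistics: the `TableStatsRow` helper is hypothetical, and the output format is inferred only from the `864 bytes, 2 rows` example.

```scala
import java.util.OptionalLong

object TableStatsRow {
  // Build the value for a "Statistics" row, or None when nothing is known.
  // Parts that are absent are simply left out of the string.
  def render(sizeInBytes: OptionalLong, numRows: OptionalLong): Option[String] = {
    val parts = Seq(
      if (sizeInBytes.isPresent) Some(s"${sizeInBytes.getAsLong} bytes") else None,
      if (numRows.isPresent) Some(s"${numRows.getAsLong} rows") else None
    ).flatten
    if (parts.isEmpty) None else Some(parts.mkString(", "))
  }
}
```

Returning `None` when no statistic is present lets the caller omit the row entirely rather than print an empty value.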