@saitharun15

What changes were proposed in this pull request?

Backport #44524 to 3.4 for [SPARK-46535][SQL] Fix NPE when describe extended a column without col stats

Why are the changes needed?

Currently, running DESCRIBE TABLE EXTENDED on a column that has no column statistics in a v2 table throws a NullPointerException:

```
Cannot invoke "org.apache.spark.sql.connector.read.colstats.ColumnStatistics.min()" because the return value of "scala.Option.get()" is null
java.lang.NullPointerException: Cannot invoke "org.apache.spark.sql.connector.read.colstats.ColumnStatistics.min()" because the return value of "scala.Option.get()" is null
	at org.apache.spark.sql.execution.datasources.v2.DescribeColumnExec.run(DescribeColumnExec.scala:63)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:118)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:195)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:103)
```
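
The message says `scala.Option.get()` returned null. In Scala that can only happen when a `Some` wraps a null, for example `Some(m.get(key))` on a `java.util.Map` that has no entry for `key`. The following minimal sketch reproduces that failure mode outside Spark and shows the usual guard, `Option(...)`, which maps null to `None`. It assumes the per-column statistics are looked up in a `java.util.Map`; `ColStats`, `OptionNullDemo`, `lookupWithSome`, `lookupWithOption` and `describeMin` are hypothetical names, and this is not the actual code from #44524.

```scala
import java.util.{HashMap => JHashMap}

// Hypothetical stand-in for per-column statistics (the real connector type is
// org.apache.spark.sql.connector.read.colstats.ColumnStatistics).
final case class ColStats(min: Option[Any], max: Option[Any])

object OptionNullDemo {
  // Assumption: stats are keyed by column name in a Java map.
  // java.util.HashMap.get returns null for a missing key.
  val columnStats = new JHashMap[String, ColStats]()
  columnStats.put("id", ColStats(Some(0), Some(9)))

  // Failure mode: Some(...) wraps null, so nonEmpty is true and .get yields null.
  def lookupWithSome(col: String): Option[ColStats] = Some(columnStats.get(col))

  // Guard: Option(...) turns a null result into None.
  def lookupWithOption(col: String): Option[ColStats] = Option(columnStats.get(col))

  // Mirrors the "if stats are present, read min" shape; dereferences stats.get.
  def describeMin(stats: Option[ColStats]): String =
    if (stats.nonEmpty) stats.get.min.map(_.toString).getOrElse("NULL") else "NULL"

  def main(args: Array[String]): Unit = {
    println(describeMin(lookupWithOption("id")))   // column with stats
    println(describeMin(lookupWithOption("name"))) // column without stats
    try {
      describeMin(lookupWithSome("name"))
    } catch {
      case _: NullPointerException =>
        println("NullPointerException: Some(null).get returned null")
    }
  }
}
```

With the `Option(...)` lookup, a column that has no statistics yields `None`, so the caller can fall back (here, to the string NULL) instead of dereferencing null.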

Does this PR introduce any user-facing change?

How was this patch tested?

Added a new test, "describe extended (formatted) a column without col stats".

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Sep 19, 2024
@saitharun15
Author

@MaxGekk @yaooqinn @LuciferYang @guykhazma, this is a backport of #44524 to 3.4. Could you please review it?

Member

@MaxGekk MaxGekk left a comment

@saitharun15 Please enable GitHub Actions in your fork.

```scala
import org.apache.spark.sql.types.StringType
import org.apache.spark.util.Utils

```

Contributor

please remove this empty line.

Author

Done

@saitharun15
Author

@MaxGekk I have enabled GitHub Actions in my fork.

@saitharun15 saitharun15 requested a review from MaxGekk September 19, 2024 15:00
Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Thank you, @saitharun15, @LuciferYang, @MaxGekk.

Although @saitharun15 still needs to set up his fork properly, the CI runs show that the SQL module tests passed.

For the Scala linter, I verified it manually:

```
$ dev/scalastyle
Using SPARK_LOCAL_IP=localhost
Scalastyle checks passed.
```

Merged to branch-3.4 for Apache Spark 3.4.4.

dongjoon-hyun pushed a commit that referenced this pull request Sep 19, 2024
…ut col stats

### What changes were proposed in this pull request?

Backport #44524 to 3.4 for [[SPARK-46535]](https://issues.apache.org/jira/browse/SPARK-46535)[SQL] Fix NPE when describe extended a column without col stats

### Why are the changes needed?

Currently, running `DESCRIBE TABLE EXTENDED` on a column that has no column statistics in a v2 table throws a `NullPointerException`:

```
Cannot invoke "org.apache.spark.sql.connector.read.colstats.ColumnStatistics.min()" because the return value of "scala.Option.get()" is null
java.lang.NullPointerException: Cannot invoke "org.apache.spark.sql.connector.read.colstats.ColumnStatistics.min()" because the return value of "scala.Option.get()" is null
	at org.apache.spark.sql.execution.datasources.v2.DescribeColumnExec.run(DescribeColumnExec.scala:63)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:118)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:195)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:103)
```

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Added a new test, "describe extended (formatted) a column without col stats".

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #48160 from saitharun15/SPARK-46535-branch-3.4.

Lead-authored-by: saitharun15
Co-authored-by: Sai Tharun
Signed-off-by: Dongjoon Hyun

4 participants