@Zouxxyy
Contributor

@Zouxxyy Zouxxyy commented Dec 28, 2023

What changes were proposed in this pull request?

Why are the changes needed?

Currently, running DESCRIBE TABLE EXTENDED on a column without column statistics in a v2 table throws a NullPointerException.

Cannot invoke "org.apache.spark.sql.connector.read.colstats.ColumnStatistics.min()" because the return value of "scala.Option.get()" is null
java.lang.NullPointerException: Cannot invoke "org.apache.spark.sql.connector.read.colstats.ColumnStatistics.min()" because the return value of "scala.Option.get()" is null
	at org.apache.spark.sql.execution.datasources.v2.DescribeColumnExec.run(DescribeColumnExec.scala:63)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:118)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$6(SQLExecution.scala:150)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:241)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$1(SQLExecution.scala:116)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:918)

This PR fixes it.
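
The message in the trace, `the return value of "scala.Option.get()" is null`, is consistent with an `Option` that wraps a `null` instead of being `None`. The sketch below shows that pitfall and one way to guard against it. It is not the actual change made to `DescribeColumnExec`; `ColStats`, `statsByColumn`, `unsafeMin`, and `safeMin` are hypothetical names used only for illustration.

```scala
import java.util.{HashMap => JHashMap, Map => JMap}

// Hypothetical stand-in for a per-column statistics object; not Spark's ColumnStatistics.
final case class ColStats(min: Option[Long], max: Option[Long])

object MissingColStatsSketch {
  // A Java map returns null, not None, when the key is absent.
  val statsByColumn: JMap[String, ColStats] = new JHashMap[String, ColStats]()
  statsByColumn.put("id", ColStats(Some(1L), Some(42L)))

  // Unsafe: Some(null) is a defined Option, so .get hands back null
  // and the following .min call throws a NullPointerException.
  def unsafeMin(column: String): Option[Long] =
    Some(statsByColumn.get(column)).get.min

  // Safe: Option(null) evaluates to None, so a column without statistics yields None.
  def safeMin(column: String): Option[Long] =
    Option(statsByColumn.get(column)).flatMap(_.min)

  def main(args: Array[String]): Unit = {
    println(safeMin("id"))   // column with statistics
    println(safeMin("name")) // column without statistics
  }
}
```

Wrapping a possibly-null value with `Option(...)` rather than `Some(...)` lets the caller treat missing statistics as an ordinary empty case instead of a crash.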

Does this PR introduce any user-facing change?

How was this patch tested?

Added a new test, `describe extended (formatted) a column without col stats`.

Was this patch authored or co-authored using generative AI tooling?

@github-actions github-actions bot added the SQL label Dec 28, 2023
@Zouxxyy Zouxxyy changed the title [SPARK-39859][SQL] Fix describe extended column without col stats [SPARK-39859][SQL] Fix NPE when describe extended a column without col stats Dec 28, 2023
@Zouxxyy
Contributor Author

Zouxxyy commented Dec 28, 2023

@huaxingao Could you help review this?

@Zouxxyy Zouxxyy changed the title [SPARK-39859][SQL] Fix NPE when describe extended a column without col stats [SPARK-46535][SQL] Fix NPE when describe extended a column without col stats Dec 28, 2023
@MaxGekk
Member

MaxGekk commented Dec 28, 2023

Not related to this PR, but users shouldn't see an NPE. We should convert NPEs to internal errors. cc @cloud-fan

MaxGekk added a commit that referenced this pull request Dec 28, 2023
…errors

### What changes were proposed in this pull request?
In the PR, I propose to handle NPE and asserts from eagerly executed commands, and convert them to internal errors.

### Why are the changes needed?
To unify the approach for errors raised by Spark SQL.

### Does this PR introduce _any_ user-facing change?
Yes.

Before the changes:
```
Cannot invoke "org.apache.spark.sql.connector.read.colstats.ColumnStatistics.min()" because the return value of "scala.Option.get()" is null
java.lang.NullPointerException: Cannot invoke "org.apache.spark.sql.connector.read.colstats.ColumnStatistics.min()" because the return value of "scala.Option.get()" is null
	at org.apache.spark.sql.execution.datasources.v2.DescribeColumnExec.run(DescribeColumnExec.scala:63)
```

After:
```
org.apache.spark.SparkException: [INTERNAL_ERROR] Eagerly executed command failed. You hit a bug in Spark or the Spark plugins you use. Please, report this bug to the corresponding communities or vendors, and provide the full stack trace. SQLSTATE: XX000
	at org.apache.spark.SparkException$.internalError(SparkException.scala:107)
...
Caused by: java.lang.NullPointerException: Cannot invoke "org.apache.spark.sql.connector.read.colstats.ColumnStatistics.min()" because the return value of "scala.Option.get()" is null
	at org.apache.spark.sql.execution.datasources.v2.DescribeColumnExec.run(DescribeColumnExec.scala:63)
```
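
The conversion shown in the before/after output can be sketched roughly as follows. This is an illustration under assumptions, not the code from this commit: `SketchInternalError` and `runEagerly` are hypothetical names, and the real change relies on Spark's own `SparkException.internalError`, whose exact signature is not reproduced here.

```scala
// Hypothetical stand-in for Spark's internal-error exception; not the real SparkException API.
final class SketchInternalError(message: String, cause: Throwable)
  extends RuntimeException(s"[INTERNAL_ERROR] $message", cause)

object InternalErrorSketch {
  // Runs a command body and rethrows NPEs and assertion failures as internal errors.
  // The original exception is kept as the cause, so its stack trace is still reported.
  def runEagerly[T](body: => T): T =
    try body
    catch {
      case e @ (_: NullPointerException | _: AssertionError) =>
        throw new SketchInternalError("Eagerly executed command failed.", e)
    }

  def main(args: Array[String]): Unit = {
    val failure = scala.util.Try(
      runEagerly[String] { throw new NullPointerException("boom") })
    println(failure.failed.get.getMessage) // message of the wrapping error
    println(failure.failed.get.getCause)   // the original NullPointerException
  }
}
```

Other exception types pass through unchanged in this sketch, so only failures that indicate a bug are relabeled as internal errors.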

### How was this patch tested?
Manually, by running the test from another PR: #44524

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #44525 from MaxGekk/internal-error-eagerlyExecuteCommands.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
@MaxGekk
Member

MaxGekk commented Dec 28, 2023

@Zouxxyy This should be ported to only branch-3.5? The earlier version (branch-3.4) doesn't have the issue according to https://issues.apache.org/jira/browse/SPARK-46535, correct?

@Zouxxyy
Contributor Author

Zouxxyy commented Dec 28, 2023

> @Zouxxyy This should be ported to only branch-3.5? The earlier version (branch-3.4) doesn't have the issue according to https://issues.apache.org/jira/browse/SPARK-46535, correct?

Yes.

@MaxGekk
Member

MaxGekk commented Dec 28, 2023

The failed tests are not related to the changes.

Merging to master/3.5. Thank you, @Zouxxyy, and @yaooqinn and @LuciferYang for the review.

MaxGekk pushed a commit that referenced this pull request Dec 28, 2023
…l stats

### What changes were proposed in this pull request?

### Why are the changes needed?

Currently, running DESCRIBE TABLE EXTENDED on a column without column statistics in a v2 table throws a NullPointerException.

```text
Cannot invoke "org.apache.spark.sql.connector.read.colstats.ColumnStatistics.min()" because the return value of "scala.Option.get()" is null
java.lang.NullPointerException: Cannot invoke "org.apache.spark.sql.connector.read.colstats.ColumnStatistics.min()" because the return value of "scala.Option.get()" is null
	at org.apache.spark.sql.execution.datasources.v2.DescribeColumnExec.run(DescribeColumnExec.scala:63)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:118)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$6(SQLExecution.scala:150)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:241)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$1(SQLExecution.scala:116)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:918)
```

This PR fixes it.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Add a new test `describe extended (formatted) a column without col stats`

### Was this patch authored or co-authored using generative AI tooling?

Closes #44524 from Zouxxyy/dev/fix-stats.

Lead-authored-by: zouxxyy <[email protected]>
Co-authored-by: Kent Yao <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
(cherry picked from commit af8228c)
Signed-off-by: Max Gekk <[email protected]>
@MaxGekk MaxGekk closed this in af8228c Dec 28, 2023
@MaxGekk
Member

MaxGekk commented Dec 28, 2023

@Zouxxyy Congratulations on your first contribution to Apache Spark!

@Zouxxyy Zouxxyy deleted the dev/fix-stats branch September 13, 2024 16:53
dongjoon-hyun pushed a commit that referenced this pull request Sep 19, 2024
…ut col stats

### What changes were proposed in this pull request?

Backport [#44524] to 3.4 for [[SPARK-46535]](https://issues.apache.org/jira/browse/SPARK-46535)[SQL] Fix NPE when describe extended a column without col stats

### Why are the changes needed?

Currently, running DESCRIBE TABLE EXTENDED on a column without column statistics in a v2 table throws a NullPointerException.

```
Cannot invoke "org.apache.spark.sql.connector.read.colstats.ColumnStatistics.min()" because the return value of "scala.Option.get()" is null
java.lang.NullPointerException: Cannot invoke "org.apache.spark.sql.connector.read.colstats.ColumnStatistics.min()" because the return value of "scala.Option.get()" is null
	at org.apache.spark.sql.execution.datasources.v2.DescribeColumnExec.run(DescribeColumnExec.scala:63)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:118)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:195)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:103)
```

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Added a new test, `describe extended (formatted) a column without col stats`.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #48160 from saitharun15/SPARK-46535-branch-3.4.

Lead-authored-by: saitharun15 <[email protected]>
Co-authored-by: Sai Tharun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025
…l stats

### What changes were proposed in this pull request?

### Why are the changes needed?

Currently, running DESCRIBE TABLE EXTENDED on a column without column statistics in a v2 table throws a NullPointerException.

```text
Cannot invoke "org.apache.spark.sql.connector.read.colstats.ColumnStatistics.min()" because the return value of "scala.Option.get()" is null
java.lang.NullPointerException: Cannot invoke "org.apache.spark.sql.connector.read.colstats.ColumnStatistics.min()" because the return value of "scala.Option.get()" is null
	at org.apache.spark.sql.execution.datasources.v2.DescribeColumnExec.run(DescribeColumnExec.scala:63)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:118)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$6(SQLExecution.scala:150)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:241)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$1(SQLExecution.scala:116)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:918)
```

This PR fixes it.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Add a new test `describe extended (formatted) a column without col stats`

### Was this patch authored or co-authored using generative AI tooling?

Closes apache#44524 from Zouxxyy/dev/fix-stats.

Lead-authored-by: zouxxyy <[email protected]>
Co-authored-by: Kent Yao <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
(cherry picked from commit af8228c)
Signed-off-by: Max Gekk <[email protected]>