[SPARK-46535][SQL] Fix NPE when describe extended a column without col stats #44524

Zouxxyy · 2023-12-28T08:34:14Z

What changes were proposed in this pull request?

Why are the changes needed?

Currently, running DESCRIBE TABLE EXTENDED on a column that has no column statistics in a v2 table throws a NullPointerException.

Cannot invoke "org.apache.spark.sql.connector.read.colstats.ColumnStatistics.min()" because the return value of "scala.Option.get()" is null
java.lang.NullPointerException: Cannot invoke "org.apache.spark.sql.connector.read.colstats.ColumnStatistics.min()" because the return value of "scala.Option.get()" is null
	at org.apache.spark.sql.execution.datasources.v2.DescribeColumnExec.run(DescribeColumnExec.scala:63)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:118)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$6(SQLExecution.scala:150)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:241)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$1(SQLExecution.scala:116)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:918)

This PR fixes it.
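The message in the trace says that `scala.Option.get()` returned null, which can only happen when the `Option` was built as `Some(null)`. The sketch below reproduces that pattern outside Spark and contrasts it with `Option(...)`, which turns null into `None`. `FakeColStats`, `OptionNullSketch`, the map contents, and the `"NULL"` placeholder are all assumptions made for illustration; this is not Spark's `ColumnStatistics` API and not necessarily how the patch is written.

```scala
import java.util.{HashMap => JHashMap, Map => JMap}

// Hypothetical stand-in for a column-statistics object; not Spark's ColumnStatistics.
final case class FakeColStats(min: Int, max: Int)

object OptionNullSketch {
  // A Java map returns null for a key it does not contain.
  val statsByColumn: JMap[String, FakeColStats] = {
    val m = new JHashMap[String, FakeColStats]()
    m.put("id", FakeColStats(1, 10))
    m
  }

  // Unsafe: Some(null) is non-empty, so a nonEmpty check passes and .get yields null.
  def unsafeLookup(col: String): Option[FakeColStats] =
    Some(statsByColumn.get(col))

  // Safe: Option(null) is None, so missing statistics are represented explicitly.
  def safeLookup(col: String): Option[FakeColStats] =
    Option(statsByColumn.get(col))

  // Renders the minimum, or a placeholder when no statistics are available.
  def describeMin(stats: Option[FakeColStats]): String =
    stats.map(_.min.toString).getOrElse("NULL")
}
```

With these definitions, `describeMin(unsafeLookup("data"))` throws a `NullPointerException` because the wrapped value is null, while `describeMin(safeLookup("data"))` falls through to the placeholder.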

Does this PR introduce any user-facing change?

How was this patch tested?

Added a new test, `describe extended (formatted) a column without col stats`.

Was this patch authored or co-authored using generative AI tooling?

Zouxxyy · 2023-12-28T08:45:46Z

@huaxingao Can you help with a review~

sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/DescribeTableSuite.scala

MaxGekk · 2023-12-28T10:15:12Z

Not related to this PR, but users shouldn't see an NPE. We should convert the NPE to an internal error. cc @cloud-fan

…errors

### What changes were proposed in this pull request?

In the PR, I propose to handle NPE and asserts from eagerly executed commands, and convert them to internal errors.

### Why are the changes needed?

To unify the approach for errors raised by Spark SQL.

### Does this PR introduce _any_ user-facing change?

Yes. Before the changes:

```
Cannot invoke "org.apache.spark.sql.connector.read.colstats.ColumnStatistics.min()" because the return value of "scala.Option.get()" is null
java.lang.NullPointerException: Cannot invoke "org.apache.spark.sql.connector.read.colstats.ColumnStatistics.min()" because the return value of "scala.Option.get()" is null
	at org.apache.spark.sql.execution.datasources.v2.DescribeColumnExec.run(DescribeColumnExec.scala:63)
```

After:

```
org.apache.spark.SparkException: [INTERNAL_ERROR] Eagerly executed command failed. You hit a bug in Spark or the Spark plugins you use. Please, report this bug to the corresponding communities or vendors, and provide the full stack trace. SQLSTATE: XX000
	at org.apache.spark.SparkException$.internalError(SparkException.scala:107)
	...
Caused by: java.lang.NullPointerException: Cannot invoke "org.apache.spark.sql.connector.read.colstats.ColumnStatistics.min()" because the return value of "scala.Option.get()" is null
	at org.apache.spark.sql.execution.datasources.v2.DescribeColumnExec.run(DescribeColumnExec.scala:63)
```

### How was this patch tested?

Manually, by running the test from another PR: #44524

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #44525 from MaxGekk/internal-error-eagerlyExecuteCommands.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
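The idea of converting an NPE or a failed assert into an internal error, while keeping the original exception as the cause, can be sketched roughly as follows. `InternalErrorSketch` and `EagerCommandSketch.runEagerly` are hypothetical names introduced here; Spark's actual implementation (which the trace shows going through `SparkException.internalError`) is not reproduced, and the message text is only borrowed from the trace for illustration.

```scala
// Hypothetical stand-in for an internal-error exception type; not Spark's SparkException.
final class InternalErrorSketch(message: String, cause: Throwable)
  extends RuntimeException(s"[INTERNAL_ERROR] $message", cause)

object EagerCommandSketch {
  // Runs the command body; NPEs and assertion failures are rewrapped so that
  // callers see an internal error with the original exception attached as the cause.
  def runEagerly[T](body: => T): T =
    try body
    catch {
      case e @ (_: NullPointerException | _: AssertionError) =>
        throw new InternalErrorSketch("Eagerly executed command failed.", e)
    }
}
```

Other exception types pass through unchanged in this sketch, so only the two categories named in the commit message are reclassified.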

MaxGekk · 2023-12-28T14:43:31Z

@Zouxxyy This should be ported to only branch-3.5? The earlier version (branch-3.4) doesn't have the issue according to https://issues.apache.org/jira/browse/SPARK-46535, correct?

Zouxxyy · 2023-12-28T14:48:05Z

> @Zouxxyy This should be ported to only branch-3.5? The earlier version (branch-3.4) doesn't have the issue according to https://issues.apache.org/jira/browse/SPARK-46535, correct?

yes

MaxGekk · 2023-12-28T16:55:52Z

The failed tests are not related to the changes.

Merging to master/3.5. Thank you, @Zouxxyy, and @yaooqinn and @LuciferYang for the review.

…l stats

### What changes were proposed in this pull request?

### Why are the changes needed?

Currently executing DESCRIBE TABLE EXTENDED a column without col stats with v2 table will throw a null pointer exception.

```text
Cannot invoke "org.apache.spark.sql.connector.read.colstats.ColumnStatistics.min()" because the return value of "scala.Option.get()" is null
java.lang.NullPointerException: Cannot invoke "org.apache.spark.sql.connector.read.colstats.ColumnStatistics.min()" because the return value of "scala.Option.get()" is null
	at org.apache.spark.sql.execution.datasources.v2.DescribeColumnExec.run(DescribeColumnExec.scala:63)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:118)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$6(SQLExecution.scala:150)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:241)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$1(SQLExecution.scala:116)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:918)
```

This PR will fix it.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Add a new test `describe extended (formatted) a column without col stats`

### Was this patch authored or co-authored using generative AI tooling?

Closes #44524 from Zouxxyy/dev/fix-stats.

Lead-authored-by: zouxxyy <[email protected]>
Co-authored-by: Kent Yao <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
(cherry picked from commit af8228c)
Signed-off-by: Max Gekk <[email protected]>

MaxGekk · 2023-12-28T16:59:07Z

@Zouxxyy Congratulations on your first contribution to Apache Spark!

…ut col stats

### What changes were proposed in this pull request?

Backport [#44524 ] to 3.4 for [[SPARK-46535]](https://issues.apache.org/jira/browse/SPARK-46535)[SQL] Fix NPE when describe extended a column without col stats

### Why are the changes needed?

Currently executing DESCRIBE TABLE EXTENDED a column without col stats with v2 table will throw a null pointer exception.

```
Cannot invoke "org.apache.spark.sql.connector.read.colstats.ColumnStatistics.min()" because the return value of "scala.Option.get()" is null
java.lang.NullPointerException: Cannot invoke "org.apache.spark.sql.connector.read.colstats.ColumnStatistics.min()" because the return value of "scala.Option.get()" is null
	at org.apache.spark.sql.execution.datasources.v2.DescribeColumnExec.run(DescribeColumnExec.scala:63)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:118)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:195)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:103)
```

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

Add a new test describe extended (formatted) a column without col stats

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #48160 from saitharun15/SPARK-46535-branch-3.4.

Lead-authored-by: saitharun15 <[email protected]>
Co-authored-by: Sai Tharun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

[SPARK-39859][SQL] Fix describe extended (formatted) a column without…

c3a80e8

… col stats

github-actions bot added the SQL label Dec 28, 2023

Zouxxyy changed the title ~~[SPARK-39859][SQL] Fix describe extended column without col stats~~ [SPARK-39859][SQL] Fix NPE when describe extended a column without col stats Dec 28, 2023

yaooqinn reviewed Dec 28, 2023

sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/DescribeTableSuite.scala (outdated)

Update sql/core/src/test/scala/org/apache/spark/sql/execution/command…

7ce0040

…/v2/DescribeTableSuite.scala

yaooqinn approved these changes Dec 28, 2023

Zouxxyy commented Dec 28, 2023

sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/DescribeTableSuite.scala (outdated)

Zouxxyy changed the title ~~[SPARK-39859][SQL] Fix NPE when describe extended a column without col stats~~ [SPARK-46535][SQL] Fix NPE when describe extended a column without col stats Dec 28, 2023

update

abd0b61

yaooqinn approved these changes Dec 28, 2023

MaxGekk approved these changes Dec 28, 2023

LuciferYang approved these changes Dec 28, 2023

MaxGekk mentioned this pull request Dec 28, 2023

[SPARK-46537][SQL] Convert NPE and asserts from commands to internal errors #44525

Closed

MaxGekk closed this in af8228c Dec 28, 2023

Zouxxyy deleted the dev/fix-stats branch September 13, 2024 16:53

saitharun15 mentioned this pull request Sep 19, 2024

[SPARK-46535][SQL][3.4] Fix NPE when describe extended a column without col stats #48160

Closed
