[SPARK-46535][SQL] Fix NPE when describe extended a column without col stats #44524
Conversation
@huaxingao Can you help with a review~
Not related to this PR, but users shouldn't see an NPE. We should convert the NPE to an internal error. cc @cloud-fan
…errors

### What changes were proposed in this pull request?
In this PR, I propose to handle NPEs and asserts from eagerly executed commands, and convert them to internal errors.

### Why are the changes needed?
To unify the approach for errors raised by Spark SQL.

### Does this PR introduce _any_ user-facing change?
Yes. Before the changes:
```
Cannot invoke "org.apache.spark.sql.connector.read.colstats.ColumnStatistics.min()" because the return value of "scala.Option.get()" is null
java.lang.NullPointerException: Cannot invoke "org.apache.spark.sql.connector.read.colstats.ColumnStatistics.min()" because the return value of "scala.Option.get()" is null
	at org.apache.spark.sql.execution.datasources.v2.DescribeColumnExec.run(DescribeColumnExec.scala:63)
```
After:
```
org.apache.spark.SparkException: [INTERNAL_ERROR] Eagerly executed command failed. You hit a bug in Spark or the Spark plugins you use. Please, report this bug to the corresponding communities or vendors, and provide the full stack trace. SQLSTATE: XX000
	at org.apache.spark.SparkException$.internalError(SparkException.scala:107)
	...
Caused by: java.lang.NullPointerException: Cannot invoke "org.apache.spark.sql.connector.read.colstats.ColumnStatistics.min()" because the return value of "scala.Option.get()" is null
	at org.apache.spark.sql.execution.datasources.v2.DescribeColumnExec.run(DescribeColumnExec.scala:63)
```

### How was this patch tested?
Manually, by running the test from another PR: #44524

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #44525 from MaxGekk/internal-error-eagerlyExecuteCommands.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
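The follow-up above (#44525) wraps eagerly executed commands so that runtime bugs surface as tagged internal errors instead of raw NPEs. A minimal sketch of that pattern, assuming the `SparkException.internalError(msg, cause)` helper visible in the stack trace; the method name `executeEagerly` is illustrative, not the actual Spark internals (the real change lives in `QueryExecution`):

```scala
import org.apache.spark.SparkException

// Illustrative sketch only. `runCommand` stands in for executing the
// physical command node inside QueryExecution.eagerlyExecuteCommands.
def executeEagerly[T](runCommand: => T): T =
  try {
    runCommand
  } catch {
    // Bugs in Spark or its plugins (NPEs, failed asserts) should not leak
    // to users as-is; convert them to a SparkException tagged INTERNAL_ERROR.
    case e @ (_: NullPointerException | _: AssertionError) =>
      throw SparkException.internalError(
        "Eagerly executed command failed. You hit a bug in Spark or the " +
          "Spark plugins you use. Please, report this bug to the " +
          "corresponding communities or vendors, and provide the full " +
          "stack trace.",
        e)
  }
```

The original cause is kept as the `Caused by:` of the internal error, so the full stack trace users are asked to report is preserved.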
@Zouxxyy This should be ported to only

yes
The failed tests are not related to the changes. Merging to master/3.5. Thank you, @Zouxxyy, @yaooqinn and @LuciferYang, for the review.
…l stats

### What changes were proposed in this pull request?

### Why are the changes needed?
Currently, executing DESCRIBE TABLE EXTENDED on a column without col stats in a v2 table throws a NullPointerException:

```text
Cannot invoke "org.apache.spark.sql.connector.read.colstats.ColumnStatistics.min()" because the return value of "scala.Option.get()" is null
java.lang.NullPointerException: Cannot invoke "org.apache.spark.sql.connector.read.colstats.ColumnStatistics.min()" because the return value of "scala.Option.get()" is null
	at org.apache.spark.sql.execution.datasources.v2.DescribeColumnExec.run(DescribeColumnExec.scala:63)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:118)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$6(SQLExecution.scala:150)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:241)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$1(SQLExecution.scala:116)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:918)
```

This PR fixes it.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
Added a new test: `describe extended (formatted) a column without col stats`.

### Was this patch authored or co-authored using generative AI tooling?

Closes #44524 from Zouxxyy/dev/fix-stats.

Lead-authored-by: zouxxyy <[email protected]>
Co-authored-by: Kent Yao <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
(cherry picked from commit af8228c)
Signed-off-by: Max Gekk <[email protected]>
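Per the stack trace above, the NPE comes from calling `.get` on a statistics `Option` that may be empty (or wrap a null) in `DescribeColumnExec`. A hedged sketch of the guard, assuming the connector's `Optional`-based `min()`/`max()` getters; the helper `describeStats` and its return shape are illustrative, not the actual Spark code:

```scala
import org.apache.spark.sql.connector.read.colstats.ColumnStatistics

// Illustrative sketch of the fix in DescribeColumnExec.run.
// Before: colStats.get.min() NPE'd when the data source reported no
// statistics for the column. After: only read min/max when stats exist,
// falling back to NULL otherwise.
def describeStats(colStats: Option[ColumnStatistics]): Seq[(String, String)] =
  colStats match {
    case Some(stats) =>
      // Each connector-level value is itself a java.util.Optional;
      // render an absent value as NULL rather than dereferencing it.
      Seq(
        "min" -> stats.min().map[String](_.toString).orElse("NULL"),
        "max" -> stats.max().map[String](_.toString).orElse("NULL"))
    case None =>
      Seq("min" -> "NULL", "max" -> "NULL")
  }
```

This keeps DESCRIBE output stable for columns that have stats while making the no-stats path return NULL rows instead of crashing.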
@Zouxxyy Congratulations on your first contribution to Apache Spark!
…ut col stats

### What changes were proposed in this pull request?
Backport #44524 to branch-3.4 for [SPARK-46535](https://issues.apache.org/jira/browse/SPARK-46535): [SQL] Fix NPE when describe extended a column without col stats.

### Why are the changes needed?
Currently, executing DESCRIBE TABLE EXTENDED on a column without col stats in a v2 table throws a NullPointerException:

```
Cannot invoke "org.apache.spark.sql.connector.read.colstats.ColumnStatistics.min()" because the return value of "scala.Option.get()" is null
java.lang.NullPointerException: Cannot invoke "org.apache.spark.sql.connector.read.colstats.ColumnStatistics.min()" because the return value of "scala.Option.get()" is null
	at org.apache.spark.sql.execution.datasources.v2.DescribeColumnExec.run(DescribeColumnExec.scala:63)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:118)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:195)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:103)
```

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
Added a new test: `describe extended (formatted) a column without col stats`.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #48160 from saitharun15/SPARK-46535-branch-3.4.

Lead-authored-by: saitharun15 <[email protected]>
Co-authored-by: Sai Tharun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>