[SPARK-46443][SQL] Decimal precision and scale should be decided by H2 dialect. #44398
Conversation
I don't know the background, so I left it as the default implementation.
cc @JoshRosen
Where does … come from?
It comes from `spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala`, line 416 (at dc0bfc4).
The schema is `DecimalType(38, 38)`, but the data returned from H2 is a `java.math.BigDecimal(7, 2)`, so d = `java.math.BigDecimal(7, 2)`, p = 38, s = 38. The `Decimal(d, p, s)` call causes the exception.
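For illustration only, here is a minimal, hypothetical reproduction of that failure outside the JDBC path; the value `12345.67` is made up, it simply has more digits than a scale of 38 leaves room for:

```scala
import org.apache.spark.sql.types.Decimal

// A value with precision 7 / scale 2, like the H2 result described above.
val d = new java.math.BigDecimal("12345.67")

// Forcing it into precision 38 / scale 38 leaves no room for integer digits:
// rescaling to scale 38 pushes the precision above 38, so Decimal.set throws
// a DECIMAL_PRECISION_EXCEEDS_MAX_PRECISION error, the same error class as in
// the stack trace reported for this PR.
Decimal(d, 38, 38)
```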
So what we need is a cast? Seems …
This seems wrong. I think we should make sure the final Decimal instance we return has the same precision and scale as the JDBC column type.
I think this is already wrong. We should update the H2 dialect to return …
Yes.
Let me try this way.
this is too specific. Can we do it if precision > 38?
I suspect that H2 may only have this particular situation.
Other situations with precision greater than 38 have not actually been verified. Can we wait to expand this until we encounter other exceptions in the future?
Let's make the comment a bit clearer:
H2 supports very large decimal precision, like 100000, while the max precision in Spark is only 38.
Here we shrink both the precision and scale of the H2 decimal to fit Spark, and still keep the ratio between them.
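For reference, here is a rough sketch of what a dialect-level shrink along these lines could look like, built on the public `JdbcDialect.getCatalystType` hook. The match condition, the `"scale"` metadata key, and all names are illustrative assumptions, not necessarily the change that was merged.

```scala
import java.sql.Types

import org.apache.spark.sql.jdbc.JdbcDialect
import org.apache.spark.sql.types.{DataType, DecimalType, MetadataBuilder}

// Sketch only: map an oversized H2 NUMERIC to Spark's 38-digit limit while
// keeping the precision-to-scale ratio, e.g. (100000, 50000) -> (38, 19).
case object H2DialectSketch extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:h2")

  override def getCatalystType(
      sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] = {
    sqlType match {
      case Types.NUMERIC if size > DecimalType.MAX_PRECISION =>
        // Assumes JdbcUtils.getSchema has put the column's scale into the metadata.
        val scale = md.build().getLong("scale")
        val shrunkScale = (scale.toDouble / size * DecimalType.MAX_PRECISION).toInt
        Some(DecimalType(DecimalType.MAX_PRECISION, shrunkScale))
      case _ => None
    }
  }
}
```

In Spark itself this logic would live in the built-in H2 dialect; a third-party dialect like the sketch above would be registered with `JdbcDialects.registerDialect`.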
The GA failure is unrelated.
Thanks, merging to master/3.5!
[SPARK-46443][SQL] Decimal precision and scale should be decided by H2 dialect

### What changes were proposed in this pull request?
This PR fixes a bug by making the JDBC dialect decide the decimal precision and scale.

**How to reproduce the bug?**

#44397 proposed DS V2 pushdown of `PERCENTILE_CONT` and `PERCENTILE_DISC`. The bug fired when pushing the SQL below down to H2 over JDBC.

`SELECT "DEPT",PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY "SALARY" ASC NULLS FIRST) FROM "test"."employee" WHERE 1=0 GROUP BY "DEPT"`

**The root cause**

`getQueryOutputSchema` is used to get the output schema of a query by calling `JdbcUtils.getSchema`. For the query above, H2's `ResultSetMetaData` reports the following five values for the percentile column:

```
columnName = "PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY SALARY NULLS FIRST)"
dataType = 2
typeName = "NUMERIC"
fieldSize = 100000
fieldScale = 50000
```

We then derive the catalyst schema with `JdbcUtils.getCatalystType`, which actually calls `DecimalType.bounded(precision, scale)`. `DecimalType.bounded(100000, 50000)` returns `DecimalType(38, 38)`. Finally, `makeGetter` throws an exception:

```
Caused by: org.apache.spark.SparkArithmeticException: [DECIMAL_PRECISION_EXCEEDS_MAX_PRECISION] Decimal precision 42 exceeds max precision 38. SQLSTATE: 22003
  at org.apache.spark.sql.errors.DataTypeErrors$.decimalPrecisionExceedsMaxPrecisionError(DataTypeErrors.scala:48)
  at org.apache.spark.sql.types.Decimal.set(Decimal.scala:124)
  at org.apache.spark.sql.types.Decimal$.apply(Decimal.scala:577)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$4(JdbcUtils.scala:408)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.nullSafeConvert(JdbcUtils.scala:552)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$3(JdbcUtils.scala:408)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$3$adapted(JdbcUtils.scala:406)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:358)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:339)
```

### Why are the changes needed?
This PR fixes the bug that `JdbcUtils` can't get the correct decimal type.

### Does this PR introduce _any_ user-facing change?
Yes, it fixes a bug.

### How was this patch tested?
Manual tests in #44397.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #44398 from beliefer/SPARK-46443.

Authored-by: Jiaan Geng <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit a921da8)
Signed-off-by: Wenchen Fan <[email protected]>
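To make the arithmetic above concrete, here is an illustrative comparison in plain Scala (not a call into Spark's internal `DecimalType.bounded`) of clamping precision and scale independently versus shrinking them while keeping their ratio:

```scala
// H2 reports NUMERIC(100000, 50000) for the PERCENTILE_CONT column above.
val (precision, scale) = (100000, 50000)

// Clamping each bound to 38 independently gives (38, 38): scale == precision,
// so no digits are left for the integer part, which triggers the exception above.
val clamped = (math.min(precision, 38), math.min(scale, 38))

// Shrinking while keeping the precision-to-scale ratio gives (38, 19),
// which still leaves room for integer digits.
val ratioKept = (38, (scale.toDouble / precision * 38).toInt)
```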
@cloud-fan Thank you! |