[SPARK-37324][SQL] Adds support for decimal rounding mode up, down, half_down #34593
sathiyapk wants to merge 14 commits into apache:master
Conversation
qq: is there any reference of DBMSes that support down, up, etc.?
SQL Server does something similar:
https://docs.microsoft.com/en-us/sql/t-sql/functions/round-transact-sql?view=sql-server-ver15
Can we match the argument and usage with other references? I don't think we have to bother to come up with Spark's own argument style.
Could you please give me an example?
I currently don't have one - I haven't taken a look at other DBMSes for that.
SAP HANA's SQL ROUND function takes the rounding mode as an argument, as we do: https://help.sap.com/viewer/7c78579ce9b14a669c1f3295b0d8ca16/Cloud/en-US/20e6a27575191014bd54a07fd86c585d.html
ROUND(<number> [, <position> [, <rounding_mode>]])
Hmm, there seems to be no standard / de facto standard implementation of round that has the third argument.
As you mention, SAP HANA and MS SQL Server allow a third argument for round, but they are still not uniform.
I'm not sure about the changePrecision method for ROUND_UP, ROUND_DOWN and ROUND_HALF_DOWN. If anyone could help me with this, that would be great.
@HyukjinKwon Also, do you have any input on this, please? I need to integrate that in DecimalSuite.scala.
In the current implementation, changePrecision should not take ROUND_UP, ROUND_DOWN and ROUND_HALF_DOWN. If we have such a case, it's a bug, so how about having an assertion before this match expression or simply falling through to the default _ branch?
Hi @sarutak, thanks for your review. Out of curiosity, may I know why it would be a bug if changePrecision takes ROUND_UP, ROUND_DOWN or ROUND_HALF_DOWN?
So we could do something like this:
case ROUND_UP | ROUND_DOWN | ROUND_HALF_DOWN | _ =>
  throw QueryExecutionErrors.unsupportedRoundingMode(roundMode)
Out of curiosity, may I know why it would be a bug if changePrecision takes ROUND_UP, ROUND_DOWN or ROUND_HALF_DOWN?
ROUND_UP, ROUND_DOWN and ROUND_HALF_DOWN are not expected to be passed to changePrecision, and it's a private method.
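For context, here is a standalone Scala sketch (illustration only, not Spark's actual Decimal.changePrecision) of matching on a rounding mode, handling the modes discussed here and throwing for anything else, in the spirit of the default branch proposed above:

```scala
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

// Standalone illustration: round a value at a given scale, accepting only a
// fixed set of modes and rejecting any other mode with an error.
object RoundingModeSketch {
  def roundAt(value: JBigDecimal, scale: Int, mode: RoundingMode): JBigDecimal = mode match {
    case RoundingMode.HALF_UP | RoundingMode.HALF_EVEN |
         RoundingMode.UP | RoundingMode.DOWN | RoundingMode.HALF_DOWN =>
      value.setScale(scale, mode)
    case other =>
      throw new IllegalArgumentException(s"Unsupported rounding mode: $other")
  }

  def main(args: Array[String]): Unit = {
    val x = new JBigDecimal("2.5")
    println(roundAt(x, 0, RoundingMode.UP))        // 3
    println(roundAt(x, 0, RoundingMode.DOWN))      // 2
    println(roundAt(x, 0, RoundingMode.HALF_DOWN)) // 2
  }
}
```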
The test fails the compatibility check against spark-sql_2.12:3.2.0, but that's expected, no? Any idea how we can fix it? [error] spark-sql: Failed binary compatibility check against org.apache.spark:spark-sql_2.12:3.2.0! Found 2 potential problems (filtered 203)
@gengliangwang Could you please kindly take a look at this? Thank you very much.
ok to test
This breaks binary compatibility in Java because the default argument doesn't work there. We should overload it.
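A minimal sketch of the overloading approach, with hypothetical names rather than Spark's actual method signatures: a Scala default argument replaces the old two-argument method in the compiled bytecode (and Java callers cannot use defaults at all), while an explicit overload keeps the old descriptor and simply delegates to the new one.

```scala
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

// Hypothetical API sketch, not Spark's actual code.
object RoundApi {
  // The old two-argument entry point is kept as-is, so already-compiled callers
  // (including Java code) keep linking against the same method descriptor.
  def round(value: JBigDecimal, scale: Int): JBigDecimal =
    round(value, scale, RoundingMode.HALF_UP)

  // New overload carrying the rounding mode, instead of a default argument.
  def round(value: JBigDecimal, scale: Int, mode: RoundingMode): JBigDecimal =
    value.setScale(scale, mode)
}
```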
Thanks, that fixed it. I'm currently fixing the schema mismatch failures. But I'm not sure why the Kubernetes integration test is failing; do you have any input on that, please?
Review context (changed rows in sql-expression-schema.md):
| org.apache.spark.sql.catalyst.expressions.Right | right | SELECT right('Spark SQL', 3) | struct<right(Spark SQL, 3):string> |
| org.apache.spark.sql.catalyst.expressions.Rint | rint | SELECT rint(12.3456) | struct<rint(12.3456):double> |
| org.apache.spark.sql.catalyst.expressions.Round | round | SELECT round(2.5, 0) | struct<round(2.5, 0):decimal(2,0)> |
| org.apache.spark.sql.catalyst.expressions.Round | round | SELECT round(2.5, 0) | struct<round(2.5, 0, half_up):decimal(2,0)> |
Not every DBMS supports the third parameter, and some of the external connectors rely on the SQL representation string of expressions.
So, shall we hide the third parameter if it is the default "half_up"?
Yes, we can do that. I'll wait for the opinions of other committers too.
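A minimal sketch of that suggestion, using a hypothetical helper rather than Spark's actual Round SQL-generation code: include the mode in the printed expression only when it differs from the default.

```scala
// Hypothetical helper, not Spark's actual implementation: connectors that parse
// the SQL representation keep seeing the pre-existing two-argument form.
def roundSql(childSql: String, scaleSql: String, mode: String): String =
  if (mode.equalsIgnoreCase("half_up")) s"round($childSql, $scaleSql)"
  else s"round($childSql, $scaleSql, $mode)"

// roundSql("2.5", "0", "half_up")   // "round(2.5, 0)"
// roundSql("2.5", "0", "half_down") // "round(2.5, 0, half_down)"
```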
This doesn't seem related to the proposal in this PR.
I investigated whether the major DBMS and query processing systems (PostgreSQL, BigQuery, Snowflake, MySQL, SAP HANA, MS SQL Server, Firebird, DB2, Teradata, Hive) support the third parameter of round. @sathiyapk @HyukjinKwon @gengliangwang What do you think?
@sarutak Yeah, I did some investigation too. This feature doesn't seem to be commonly used.
I think people can just use floor and ceil.
floor and ceil can't control the position of rounding directly.
@sarutak Currently we are relying on a UDF for rounding. When I googled how to do it via Spark native methods, I found that a lot of people have the same issue. It's not complex to support it via Spark native methods, so I opened this PR.
@cloud-fan @gengliangwang I confirm
Then can we add a scale parameter to floor and ceil?
@cloud-fan I like the idea of adding a scale parameter to floor and ceil.
@sathiyapk you are right, but I can also argue that the ... We can collect more data points. I found one example: Snowflake supports a scale parameter in floor/ceil: https://docs.snowflake.com/en/sql-reference/functions/floor.html
@cloud-fan In our case, the first function we looked into was ... If everyone agrees, I will open a new PR for the scale parameter to floor and ceil.
+1 for adding the scale parameter to floor and ceil.
@gengliangwang @HyukjinKwon What do you think?
+1 for adding the scale parameter to floor and ceil. |
### What changes were proposed in this pull request?
Adds a `scale` parameter to the `floor`/`ceil` functions in order to allow users to control the rounding position. This feature is proposed in the PR: #34593

### Why are the changes needed?
Currently we support the decimal rounding modes HALF_UP (round) and HALF_EVEN (bround), but we have use cases that need RoundingMode.UP and RoundingMode.DOWN. The floor and ceil functions help to do this, but they don't support choosing the position of the rounding. Adding a `scale` parameter to these functions lets us control the rounding position.

Snowflake supports a `scale` parameter for `floor`/`ceil`: `FLOOR( <input_expr> [, <scale_expr> ] )`
REF: https://docs.snowflake.com/en/sql-reference/functions/floor.html

### Does this PR introduce _any_ user-facing change?
Now users can pass a `scale` parameter to the `floor` and `ceil` functions.
```
> SELECT floor(-0.1);
 -1.0
> SELECT floor(5);
 5
> SELECT floor(3.1411, 3);
 3.141
> SELECT floor(3.1411, -3);
 0.0
> SELECT ceil(-0.1);
 0.0
> SELECT ceil(5);
 5
> SELECT ceil(3.1411, 3);
 3.142
> SELECT ceil(3.1411, -3);
 1000.0
```

### How was this patch tested?
This patch was tested locally using unit tests and the Git workflow.

Closes #34729 from sathiyapk/SPARK-37475-floor-ceil-scale.
Authored-by: Sathiya KUMAR <ext.sathiyaprabhu.kumar@sncf.fr>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
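For reference, a minimal standalone Scala sketch of what floor/ceil with a scale mean numerically (illustration only, not the implementation merged in that follow-up PR):

```scala
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

// Round toward negative infinity (floor) or positive infinity (ceil) at a given
// decimal position; a negative scale rounds to the left of the decimal point.
def floorAt(x: JBigDecimal, scale: Int): JBigDecimal = x.setScale(scale, RoundingMode.FLOOR)
def ceilAt(x: JBigDecimal, scale: Int): JBigDecimal = x.setScale(scale, RoundingMode.CEILING)

// floorAt(new JBigDecimal("3.1411"), 3)  == 3.141
// ceilAt(new JBigDecimal("3.1411"), 3)   == 3.142
// floorAt(new JBigDecimal("3.1411"), -3) == 0E+3   (numerically 0)
// ceilAt(new JBigDecimal("3.1411"), -3)  == 1E+3   (numerically 1000)
```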
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
Adds support for the decimal rounding modes up, down and half_down by letting users pass a rounding mode to the round function, in addition to the existing half_up (round) and half_even (bround).
Why are the changes needed?
Currently there is no easy and straightforward way to round decimals with the modes UP, DOWN and HALF_DOWN. People need to use a UDF or complex operations to do the same.
Adding support for the other rounding modes could benefit a lot of use cases.
SAP HANA's SQL ROUND function does it:
ROUND(<number> [, <position> [, <rounding_mode>]])
REF: https://help.sap.com/viewer/7c78579ce9b14a669c1f3295b0d8ca16/Cloud/en-US/20e6a27575191014bd54a07fd86c585d.html
SQL Server does something similar:
ROUND ( numeric_expression , length [ ,function ] )
REF: https://docs.microsoft.com/en-us/sql/t-sql/functions/round-transact-sql?view=sql-server-ver15
Does this PR introduce any user-facing change?
Now users can specify the rounding mode when calling the round function, for the modes up, down and half_down. Calling round without a rounding mode defaults to half_up.
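A hypothetical usage sketch of the proposed change, assuming a local SparkSession and assuming the mode is spelled as a string literal; the exact argument syntax is defined by this PR's implementation, so this will not run on released Spark:

```scala
import org.apache.spark.sql.SparkSession

// Assumed syntax for illustration only: round(<expr>, <scale>, <mode>).
object RoundModeUsage {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("round-modes").getOrCreate()
    spark.sql("SELECT round(2.5, 0, 'up'), round(2.5, 0, 'down'), round(2.5, 0, 'half_down')").show()
    spark.sql("SELECT round(2.5, 0)").show() // no mode given: keeps the existing half_up behaviour
    spark.stop()
  }
}
```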
How was this patch tested?
This patch was tested locally using unit tests and the Git workflow.