[SPARK-29663][SQL] Support sum with interval type values #26325
|
Test build #112937 has finished for PR 26325 at commit
|
|
Test build #112981 has finished for PR 26325 at commit
|
MaxGekk
left a comment
Could you add an example with intervals to https://github.com/apache/spark/pull/26325/files#diff-84a56c8f7c77992f9f7294cea56734ceL30-L36
|
@MaxGekk good advice, thanks. |
|
Test build #113003 has finished for PR 26325 at commit
|
|
Test build #113021 has finished for PR 26325 at commit
|
|
retest this please |
|
Test build #113074 has finished for PR 26325 at commit
|
|
@MaxGekk please recheck thanks |
|
@MaxGekk Can we have this merged? Or could you help cc some other reviewer? Thank you very much. |
|
@yaooqinn I don't have permissions to merge this. You need someone from the list https://spark.apache.org/committers.html |
|
cc @maropu @HyukjinKwon thanks |
@@ -0,0 +1,32 @@
-- sum
Shall we make it clear that this only tests interval sum?
or you can add tests to group-by.sql
I can move them to group-by.sql
|
Test build #113205 has finished for PR 26325 at commit
|
-- empty set
select sum(cast(v as interval)) from VALUES ('1 seconds'), ('2 seconds'), (null) t(v) where 1=0;

--
basic interval sum
select sum(cast(v as interval)) from VALUES ('-1 seconds'), ('-2 seconds'), (null) t(v);
select sum(cast(v as interval)) from VALUES ('-1 weeks'), ('2 seconds'), (null) t(v);

--group by
one space after --
from VALUES (1, '-1 weeks'), (2, '2 seconds'), (3, null), (1, '5 days') t(i, v)
group by i;

--having
ditto
Updated with all of the above comments; thanks for reviewing.
|
Test build #113207 has finished for PR 26325 at commit
|
|
Test build #113209 has finished for PR 26325 at commit
|
|
thanks, merging to master! |
### What changes were proposed in this pull request?
This PR reverts #26325 and #26347
### Why are the changes needed?
When we do sum/avg, we need a wider type of input to hold the sum value, to reduce the possibility of overflow. For example, we use long to hold the sum of integral inputs, use double to hold the sum of float/double. However, we don't have a wider type of interval. Also the semantic is unclear: what if the days field overflows but the months field doesn't? Currently the avg of `1 month` and `2 month` is `1 month 15 days`, which assumes 1 month has 30 days and we should avoid this assumption.
### Does this PR introduce any user-facing change?
yes, remove 2 features added in 3.0
### How was this patch tested?
N/A

Closes #27619 from cloud-fan/revert.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: herman <[email protected]>
(cherry picked from commit 1b67d54)
Signed-off-by: herman <[email protected]>
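The two concerns in the revert message (no wider type to absorb overflow, and the 30-day month baked into avg) can be illustrated with a small model. This is only a sketch, not Spark's implementation: the `Interval` class, the 32-bit `months`/`days` fields, and the helpers `add` and `average` are assumptions made here for illustration.

```python
import math
from dataclasses import dataclass
from fractions import Fraction

# Assumed field widths for this sketch: 32-bit months and days.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
DAYS_PER_MONTH = 30  # the assumption the revert message wants to avoid
MICROS_PER_DAY = 24 * 60 * 60 * 1_000_000


@dataclass(frozen=True)
class Interval:
    """Hypothetical three-field interval: months, days, microseconds."""
    months: int = 0
    days: int = 0
    micros: int = 0


def checked_int32(value: int, field: str) -> int:
    # There is no wider interval type to fall back on, so a field either
    # fits or the whole sum fails, even if the other fields are fine.
    if not INT32_MIN <= value <= INT32_MAX:
        raise OverflowError(f"{field} field overflows: {value}")
    return value


def add(a: Interval, b: Interval) -> Interval:
    """Field-wise addition with an overflow check on months and days."""
    return Interval(
        checked_int32(a.months + b.months, "months"),
        checked_int32(a.days + b.days, "days"),
        a.micros + b.micros,
    )


def average(intervals: list) -> Interval:
    """Average that spills fractional months into days at 30 days per month."""
    n = len(intervals)
    months = Fraction(sum(i.months for i in intervals), n)
    whole_months = math.floor(months)
    days = Fraction(sum(i.days for i in intervals), n)
    days += (months - whole_months) * DAYS_PER_MONTH
    whole_days = math.floor(days)
    micros = Fraction(sum(i.micros for i in intervals), n)
    micros += (days - whole_days) * MICROS_PER_DAY
    return Interval(whole_months, whole_days, int(micros))


# avg of `1 month` and `2 month` under the 30-day assumption
print(average([Interval(months=1), Interval(months=2)]))
```

In this model the average of one month and two months comes out as one month and fifteen days only because `DAYS_PER_MONTH` is fixed at 30, and `add` shows the other problem: the days field can overflow while months is still well within range.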
### What changes were proposed in this pull request?
`sum` now supports interval values.
### Why are the changes needed?
Part of SPARK-27764, Feature Parity between PostgreSQL and Spark.
### Does this PR introduce any user-facing change?
Yes, `sum` can evaluate intervals.
### How was this patch tested?
Added unit tests.
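The cases exercised by the test queries in the review (empty input, null values, and group by) follow the usual SQL aggregate rules. The sketch below models them in Python under stated assumptions: the three-field `Interval` class and the helpers `interval_sum` and `grouped_sum` are hypothetical names for illustration, and the literals are translated by hand rather than parsed the way Spark would.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Optional

MICROS_PER_SECOND = 1_000_000


@dataclass(frozen=True)
class Interval:
    """Hypothetical three-field interval: months, days, microseconds."""
    months: int = 0
    days: int = 0
    micros: int = 0


def interval_sum(values: Iterable) -> Optional[Interval]:
    """Field-wise sum that skips nulls; returns None if nothing was summed."""
    total = None
    for v in values:
        if v is None:
            continue  # SQL aggregates ignore NULL inputs
        if total is None:
            total = v
        else:
            total = Interval(
                total.months + v.months,
                total.days + v.days,
                total.micros + v.micros,
            )
    return total


def grouped_sum(rows: Iterable) -> dict:
    """Sum intervals per key, like `select i, sum(v) ... group by i`."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {key: interval_sum(vals) for key, vals in groups.items()}


# Hand-translated version of the group-by test rows:
# (1, '-1 weeks'), (2, '2 seconds'), (3, null), (1, '5 days')
rows = [
    (1, Interval(days=-7)),
    (2, Interval(micros=2 * MICROS_PER_SECOND)),
    (3, None),
    (1, Interval(days=5)),
]
print(grouped_sum(rows))
```

An empty input and an all-null group both yield `None` here, which mirrors a SQL `NULL` result for `sum` over no non-null rows.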