Skip to content

Conversation

@yaooqinn
Copy link
Member

What changes were proposed in this pull request?

sum support interval values

Why are the changes needed?

Part of SPARK-27764 Feature Parity between PostgreSQL and Spark

Does this PR introduce any user-facing change?

yes, sum can evaluate intervals

How was this patch tested?

add ut

@SparkQA
Copy link

SparkQA commented Oct 30, 2019

Test build #112937 has finished for PR 26325 at commit a09ec61.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 31, 2019

Test build #112981 has finished for PR 26325 at commit 9f1ce88.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn
Copy link
Member Author

Copy link
Member

@MaxGekk MaxGekk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yaooqinn
Copy link
Member Author

@MaxGekk good advice, thanks.

@SparkQA
Copy link

SparkQA commented Oct 31, 2019

Test build #113003 has finished for PR 26325 at commit 1ff5135.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 31, 2019

Test build #113021 has finished for PR 26325 at commit e17dc68.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn
Copy link
Member Author

yaooqinn commented Nov 1, 2019

retest this please

@SparkQA
Copy link

SparkQA commented Nov 1, 2019

Test build #113074 has finished for PR 26325 at commit e17dc68.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn
Copy link
Member Author

yaooqinn commented Nov 1, 2019

@MaxGekk please recheck thanks

@yaooqinn
Copy link
Member Author

yaooqinn commented Nov 1, 2019

@MaxGekk Can we have this merged? Or you may help cc some other reviewer. thank youvery much.

@MaxGekk
Copy link
Member

MaxGekk commented Nov 1, 2019

@yaooqinn I don't have permissions to merge this. You need someone from the list https://spark.apache.org/committers.html

@yaooqinn
Copy link
Member Author

yaooqinn commented Nov 4, 2019

cc @maropu @HyukjinKwon thanks

@@ -0,0 +1,32 @@
-- sum
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we make it clear that this only test interval sum?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or you can add tests to group-by.sql

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can move them to group-by.sql

@SparkQA
Copy link

SparkQA commented Nov 4, 2019

Test build #113205 has finished for PR 26325 at commit e9e4b59.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

-- empty set
select sum(cast(v as interval)) from VALUES ('1 seconds'), ('2 seconds'), (null) t(v) where 1=0;

--
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

basic interval sum

select sum(cast(v as interval)) from VALUES ('-1 seconds'), ('-2 seconds'), (null) t(v);
select sum(cast(v as interval)) from VALUES ('-1 weeks'), ('2 seconds'), (null) t(v);

--group by
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one space after --

from VALUES (1, '-1 weeks'), (2, '2 seconds'), (3, null), (1, '5 days') t(i, v)
group by i;

--having
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated with all above comments, thank for reviewing

@SparkQA
Copy link

SparkQA commented Nov 4, 2019

Test build #113207 has finished for PR 26325 at commit 5415a85.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 4, 2019

Test build #113209 has finished for PR 26325 at commit b3c975f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan cloud-fan closed this in 44b8fbc Nov 4, 2019
@cloud-fan
Copy link
Contributor

thanks, merging to master!

@yaooqinn yaooqinn deleted the SPARK-29663 branch November 5, 2019 00:51
hvanhovell pushed a commit that referenced this pull request Feb 18, 2020
### What changes were proposed in this pull request?

This PR reverts #26325 and #26347

### Why are the changes needed?

When we do sum/avg, we need a wider type of input to hold the sum value, to reduce the possibility of overflow. For example, we use long to hold the sum of integral inputs, use double to hold the sum of float/double.

However, we don't have a wider type of interval. Also the semantic is unclear: what if the days field overflows but the months field doesn't? Currently the avg of `1 month` and `2 month` is `1 month 15 days`, which assumes 1 month has 30 days and we should avoid this assumption.

### Does this PR introduce any user-facing change?

yes, remove 2 features added in 3.0

### How was this patch tested?

N/A

Closes #27619 from cloud-fan/revert.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: herman <[email protected]>
hvanhovell pushed a commit that referenced this pull request Feb 18, 2020
### What changes were proposed in this pull request?

This PR reverts #26325 and #26347

### Why are the changes needed?

When we do sum/avg, we need a wider type of input to hold the sum value, to reduce the possibility of overflow. For example, we use long to hold the sum of integral inputs, use double to hold the sum of float/double.

However, we don't have a wider type of interval. Also the semantic is unclear: what if the days field overflows but the months field doesn't? Currently the avg of `1 month` and `2 month` is `1 month 15 days`, which assumes 1 month has 30 days and we should avoid this assumption.

### Does this PR introduce any user-facing change?

yes, remove 2 features added in 3.0

### How was this patch tested?

N/A

Closes #27619 from cloud-fan/revert.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: herman <[email protected]>
(cherry picked from commit 1b67d54)
Signed-off-by: herman <[email protected]>
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
### What changes were proposed in this pull request?

This PR reverts apache#26325 and apache#26347

### Why are the changes needed?

When we do sum/avg, we need a wider type of input to hold the sum value, to reduce the possibility of overflow. For example, we use long to hold the sum of integral inputs, use double to hold the sum of float/double.

However, we don't have a wider type of interval. Also the semantic is unclear: what if the days field overflows but the months field doesn't? Currently the avg of `1 month` and `2 month` is `1 month 15 days`, which assumes 1 month has 30 days and we should avoid this assumption.

### Does this PR introduce any user-facing change?

yes, remove 2 features added in 3.0

### How was this patch tested?

N/A

Closes apache#27619 from cloud-fan/revert.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: herman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants