-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-36741][SQL] ArrayDistinct handle duplicated Double.NaN and Float.Nan #33993
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Kubernetes integration test unable to build dist. exiting with code: 1 |
...st/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
Outdated
Show resolved
Hide resolved
|
Kubernetes integration test unable to build dist. exiting with code: 1 |
|
Test build #143262 has finished for PR 33993 at commit
|
| val elem = array.get(i, elementType) | ||
| if (isNaN(elem)) { | ||
| if (!hs.containsNaN) { | ||
| arrayBuffer += elem |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this, let's wait for the decision at the first PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this, let's wait for the decision at the first PR.
Done
|
Test build #143267 has finished for PR 33993 at commit
|
|
ping @cloud-fan @dongjoon-hyun |
| } | ||
|
|
||
| val processArray = withArrayNullAssignment( | ||
| def withNaNCheck(body: String): String = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we move these codegen utils to SQLOpenHashSet as well to reduce duplicated code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we move these codegen utils to
SQLOpenHashSetas well to reduce duplicated code?
How about to do this after all done. Not only this can be refactored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan Updated, How about current?
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #143307 has finished for PR 33993 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
| } | ||
|
|
||
| def valueNaN(dataType: DataType): Any = { | ||
| def isNaNFuncAndValueNaN(dataType: DataType, valueName: String): Option[(String, String)] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems we can remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, Done
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #143345 has finished for PR 33993 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #143360 has finished for PR 33993 at commit
|
...catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
Outdated
Show resolved
Hide resolved
|
Kubernetes integration test starting |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Kubernetes integration test status failure |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
...catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
Outdated
Show resolved
Hide resolved
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #143385 has finished for PR 33993 at commit
|
|
Test build #143387 has finished for PR 33993 at commit
|
|
Test build #143391 has finished for PR 33993 at commit
|
|
ping @cloud-fan |
…at.Nan
### What changes were proposed in this pull request?
For query
```
select array_distinct(array(cast('nan' as double), cast('nan' as double)))
```
This returns [NaN, NaN], but it should return [NaN].
This issue is caused by `OpenHashSet` can't handle `Double.NaN` and `Float.NaN` too.
In this pr fix this based on #33955
### Why are the changes needed?
Fix bug
### Does this PR introduce _any_ user-facing change?
ArrayDistinct won't show duplicated `NaN` value
### How was this patch tested?
Added UT
Closes #33993 from AngersZhuuuu/SPARK-36741.
Authored-by: Angerszhuuuu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit e356f6a)
Signed-off-by: Wenchen Fan <[email protected]>
…at.Nan
### What changes were proposed in this pull request?
For query
```
select array_distinct(array(cast('nan' as double), cast('nan' as double)))
```
This returns [NaN, NaN], but it should return [NaN].
This issue is caused by `OpenHashSet` can't handle `Double.NaN` and `Float.NaN` too.
In this pr fix this based on #33955
### Why are the changes needed?
Fix bug
### Does this PR introduce _any_ user-facing change?
ArrayDistinct won't show duplicated `NaN` value
### How was this patch tested?
Added UT
Closes #33993 from AngersZhuuuu/SPARK-36741.
Authored-by: Angerszhuuuu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit e356f6a)
Signed-off-by: Wenchen Fan <[email protected]>
…at.Nan
For query
```
select array_distinct(array(cast('nan' as double), cast('nan' as double)))
```
This returns [NaN, NaN], but it should return [NaN].
This issue is caused by `OpenHashSet` can't handle `Double.NaN` and `Float.NaN` too.
In this pr fix this based on #33955
Fix bug
ArrayDistinct won't show duplicated `NaN` value
Added UT
Closes #33993 from AngersZhuuuu/SPARK-36741.
Authored-by: Angerszhuuuu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit e356f6a)
Signed-off-by: Wenchen Fan <[email protected]>
|
thanks, merging to master/3.2/3.1/3.0! |
…at.Nan
### What changes were proposed in this pull request?
For query
```
select array_distinct(array(cast('nan' as double), cast('nan' as double)))
```
This returns [NaN, NaN], but it should return [NaN].
This issue is caused by `OpenHashSet` can't handle `Double.NaN` and `Float.NaN` too.
In this pr fix this based on apache#33955
### Why are the changes needed?
Fix bug
### Does this PR introduce _any_ user-facing change?
ArrayDistinct won't show duplicated `NaN` value
### How was this patch tested?
Added UT
Closes apache#33993 from AngersZhuuuu/SPARK-36741.
Authored-by: Angerszhuuuu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit e356f6a)
Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
For query
This returns [NaN, NaN], but it should return [NaN].
This issue is caused by
OpenHashSetcan't handleDouble.NaNandFloat.NaNtoo.In this pr fix this based on #33955
Why are the changes needed?
Fix bug
Does this PR introduce any user-facing change?
ArrayDistinct won't show duplicated
NaNvalueHow was this patch tested?
Added UT