[SPARK-20290][MINOR][PYTHON][SQL] Add PySpark wrapper for eqNullSafe #17605
Conversation
Test build #75700 has finished for PR 17605 at commit

LGTM thanks for adding this.

@holdenk Do you think it could be merged?

Test build #76159 has finished for PR 17605 at commit

Test build #76161 has finished for PR 17605 at commit

Test build #76164 has finished for PR 17605 at commit

LGTM too.
python/pyspark/sql/column.py
We might need to document that, unlike in Pandas, NaN is not treated as NULL.
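To make the distinction concrete, here is a plain-Python illustration (not the PySpark API itself) of the point above: NaN is an ordinary float value, while NULL corresponds to Python's None, so a null check does not catch NaN.

```python
import math

# NaN is a regular float value, not a missing value like NULL/None.
nan = float("nan")

print(nan is None)      # False: NaN is not NULL
print(math.isnan(nan))  # True: but it is NaN
print(None is None)     # True: NULL maps to None in Python terms
```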
Do you think a note is enough, or should we add an example?
Yeah, an example is needed.
@gatorsmile Done.
Force-pushed e5e4081 to 043880b
Test build #76308 has finished for PR 17605 at commit

Test build #76313 has finished for PR 17605 at commit
+----------------+---------------+----------------+
|(value <=> NULL)|(value <=> NaN)|(value <=> 42.0)|
+----------------+---------------+----------------+
|           false|           true|           false|
In Pandas/NumPy, NaNs don't compare equal, i.e., np.nan != np.nan, but in Spark we treat them as equal. Shall we document that too?
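For reference, this is standard IEEE 754 behavior, which plain Python floats (and NumPy's np.nan) both follow; a minimal check without any dependencies:

```python
# In IEEE 754 floating point (Python, NumPy, Pandas), NaN never
# compares equal to itself. Spark's <=> operator, by contrast,
# treats two NaNs as equal.
nan = float("nan")

print(nan == nan)  # False
print(nan != nan)  # True
```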
I think this is already covered by the SQL guide (https://spark.apache.org/docs/latest/sql-programming-guide.html#nan-semantics). Maybe a link would be better?
Sounds good to me.
Force-pushed 965396e to 673bf70
Test build #76337 has finished for PR 17605 at commit

Test build #76338 has finished for PR 17605 at commit

Test build #76339 has finished for PR 17605 at commit

LGTM

1 similar comment

LGTM

Thanks! Merging to master.

Thanks.
What changes were proposed in this pull request?

Adds Python bindings for Column.eqNullSafe.

How was this patch tested?

Manual tests, existing unit tests, doc build.
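As a sketch of the semantics being wrapped (plain Python, not Spark's actual implementation; the function name eq_null_safe is only for illustration), null-safe equality can be modeled as follows, reproducing the example output shown earlier in the review:

```python
import math

def eq_null_safe(a, b):
    """Model of Spark SQL's null-safe equality operator (<=>).

    - NULL <=> NULL is true; NULL <=> anything else is false.
    - Unlike IEEE 754 equality, NaN <=> NaN is true in Spark.
    """
    if a is None or b is None:
        # Null-safe: equal only if both sides are NULL.
        return a is None and b is None
    if isinstance(a, float) and isinstance(b, float):
        if math.isnan(a) and math.isnan(b):
            # Spark treats two NaNs as equal (see the SQL guide's
            # NaN semantics section).
            return True
    return a == b

value = float("nan")
print(eq_null_safe(value, None))          # False
print(eq_null_safe(value, float("nan")))  # True
print(eq_null_safe(value, 42.0))          # False
```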