[SPARK-9793] [MLlib] [PySpark] PySpark DenseVector, SparseVector implement __eq__ and __hash__ correctly #8166
Test build #40766 has finished for PR 8166 at commit
|
|
Jenkins, test this please. |
|
Test build #40949 has finished for PR 8166 at commit
|
nit: since k1 will be at most v1_size after the earlier while loop, checking for == here will suffice and is easier to read
ditto for k2
Actually, I think checking k1 >= v1_size is more robust than k1 == v1_size, and the Scala code also uses the former.
OK, that's fine with me
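For context, the check under discussion can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name nonzero_entries_equal and its argument layout (parallel index and value lists in index order) are assumptions. It walks both vectors, skips explicit zeros, and treats the vectors as equal only if both walks reach the end together.

```python
def nonzero_entries_equal(v1_indices, v1_values, v2_indices, v2_values):
    """Compare two vectors by their nonzero (index, value) pairs.

    Hypothetical helper for illustration; indices are assumed sorted.
    """
    v1_size = len(v1_values)
    v2_size = len(v2_values)
    k1 = 0
    k2 = 0
    while True:
        # Skip explicitly stored zeros on both sides.
        while k1 < v1_size and v1_values[k1] == 0:
            k1 += 1
        while k2 < v2_size and v2_values[k2] == 0:
            k2 += 1
        # If either side is exhausted, both must be exhausted.
        # ">=" and "==" behave the same here; ">=" is the defensive form.
        if k1 >= v1_size or k2 >= v2_size:
            return k1 >= v1_size and k2 >= v2_size
        if v1_indices[k1] != v2_indices[k2] or v1_values[k1] != v2_values[k2]:
            return False
        k1 += 1
        k2 += 1
```

With this layout, a dense vector can be passed as indices 0..n-1 plus its values, which is one way to compare it against a sparse vector.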
|
LGTM after docstring change |
|
Test build #41666 has finished for PR 8166 at commit
|
Should it return False?
|
@yanboliang Please update the PR to use the first 128 nonzero entries to compute the hash. |
d63d54e to 3b8ac7a
|
Test build #42420 has finished for PR 8166 at commit
|
We can make the code more readable:
    if isnan(value):
        value = float('nan')
    return struct.unpack('Q', struct.pack('d', value))[0]
|
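A self-contained sketch of how that helper might feed a hash over the first 128 nonzero entries, as requested earlier in this thread. The names double_to_bits, sketch_vector_hash and MAX_HASH_NNZ are hypothetical, and the 31-multiplier mixing is an assumption chosen for illustration; the merged code may differ.

```python
import math
import struct

MAX_HASH_NNZ = 128  # cap requested in the review; the constant name is hypothetical


def double_to_bits(value):
    """Return the IEEE 754 bit pattern of a float as an unsigned 64-bit int."""
    if math.isnan(value):
        # Collapse every NaN payload to one canonical NaN before reading bits.
        value = float('nan')
    return struct.unpack('Q', struct.pack('d', value))[0]


def sketch_vector_hash(size, indices, values):
    """Hash a vector from its size and its first MAX_HASH_NNZ nonzero entries.

    Assumed layout: parallel lists of indices and values in index order.
    """
    result = 31 + size
    nnz = 0
    for i, v in zip(indices, values):
        if nnz >= MAX_HASH_NNZ:
            break
        if v != 0:  # explicit zeros are skipped so they cannot change the hash
            bits = double_to_bits(v)
            result = 31 * result + i
            result = 31 * result + (bits ^ (bits >> 32))
            nnz += 1
    return result
```

Because zeros are skipped, a dense layout and a sparse layout of the same vector hash to the same value, which is what equality across the two types requires.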
Test build #42465 has finished for PR 8166 at commit
|
|
LGTM. Merged into master. @yanboliang |
|
@mengxr OK, I opened SPARK-10615 to track the |
The PySpark DenseVector and SparseVector __eq__ methods should use semantic equality, so that a DenseVector can be compared with a SparseVector.
Implement the PySpark DenseVector and SparseVector __hash__ methods based on the first 16 entries, so that PySpark Vector objects can be used in collections.
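To illustrate why __eq__ and __hash__ need to agree before objects can be used in sets or as dict keys, here is a toy sketch. ToyVector is a hypothetical stand-in, not the PySpark class, and hashing the first 16 nonzero entries is an assumption made for this example.

```python
class ToyVector:
    """Hypothetical stand-in for a vector; not the PySpark implementation."""

    def __init__(self, size, entries):
        self.size = size
        # Keep only nonzero entries, keyed by index, so that dense-style and
        # sparse-style inputs end up with the same internal representation.
        self.nonzeros = {i: v for i, v in entries.items() if v != 0}

    def __eq__(self, other):
        if not isinstance(other, ToyVector):
            return NotImplemented
        return self.size == other.size and self.nonzeros == other.nonzeros

    def __hash__(self):
        # Hash only a bounded prefix; equal vectors still share it.
        first = tuple(sorted(self.nonzeros.items())[:16])
        return hash((self.size, first))


dense_like = ToyVector(4, {0: 1.0, 1: 0.0, 2: 0.0, 3: 2.0})
sparse_like = ToyVector(4, {0: 1.0, 3: 2.0})
unique = {dense_like, sparse_like}  # both land in the same set slot
```

Hashing a bounded prefix keeps the cost constant for very long vectors; two vectors that differ only beyond the prefix collide, which is allowed because __eq__ still tells them apart.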