[SPARK-9525] [PySpark] [MLlib] Optimize SparseVector initialization #7854
MechCoder commented on Aug 1, 2015
- Remove the sorting of indices and assume that the user supplies indices (and their matching values) already sorted.
- Avoid iterating twice to get the indices and values when the argument provided is a dict.
- Add a check that the number of indices is less than the size provided.
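The three points above can be sketched in plain Python. This is only an illustration, not the code in this PR: `build_sparse` is a hypothetical helper, it uses lists rather than NumPy arrays, and it reads the length check as "no more indices than `size`", which is an assumption.

```python
def build_sparse(size, arg):
    """Hypothetical helper: return (indices, values) for a sparse vector.

    Assumption: `arg` is either a dict {index: value} or a pair
    (indices, values) whose indices the caller has already sorted.
    """
    if isinstance(arg, dict):
        # Single pass over the dict: collect indices and values together.
        indices, values = [], []
        for i, v in arg.items():
            indices.append(int(i))
            values.append(float(v))
        # No sorting here: the order follows the dict's iteration order.
    else:
        raw_indices, raw_values = arg
        indices = [int(i) for i in raw_indices]
        values = [float(v) for v in raw_values]

    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    # Assumed reading of the length check: at most `size` stored entries.
    if len(indices) > size:
        raise ValueError("more indices than the declared size")
    return indices, values


print(build_sparse(4, ([0, 2], [1.0, 3.0])))
print(build_sparse(4, {1: 2.0}))
```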
@JoshRosen Were the pylint checks removed?
I moved the buffer checks upward, so that it is easier to perform the (tuple / dict) conversion and check.
Test build #39358 has finished for PR 7854 at commit
@jkbradley should this find a place in 1.5 or can this wait?
items() has no guarantees on the ordering of the keys... Is it okay that indices may not be sorted after this change?
Yes, I think we should use an OrderedDict here
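A small sketch of the ordering concern, using only the standard library. An `OrderedDict` preserves insertion order, so its keys come out sorted only if the caller inserted them in ascending order. Sorting the items is one way to guarantee sorted indices, at the cost of an O(n log n) sort. The helper name `sorted_indices_values` is hypothetical.

```python
from collections import OrderedDict


def sorted_indices_values(d):
    """Hypothetical helper: split {index: value} into sorted parallel lists."""
    pairs = sorted(d.items())  # sorts by index, since dict keys are unique
    indices = [i for i, _ in pairs]
    values = [v for _, v in pairs]
    return indices, values


unordered = OrderedDict([(3, 1.0), (0, 2.0), (1, 5.0)])
print(list(unordered.keys()))            # insertion order: [3, 0, 1]
print(sorted_indices_values(unordered))  # ([0, 1, 3], [2.0, 5.0, 1.0])
```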
It looks like the Python constructors enforce sorted indices (consistent with Scala) only with numeric (and not String) args. This should be made consistent, but it's not clear whether we should guarantee ordering at the cost of sorting overhead or relax this constraint at the risk of breaking code elsewhere. We should clarify and clearly document if there is any guarantee on the ordering of indices, since features like #7794 depend on this ordering.
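For illustration only, one possible middle ground between always sorting and trusting the caller is a linear-time check that the indices are strictly increasing. This was not proposed in the thread; `is_strictly_increasing` is a hypothetical helper and the sketch assumes the indices are a plain Python list.

```python
def is_strictly_increasing(indices):
    """Hypothetical O(n) check: True if each index is greater than the previous one."""
    return all(a < b for a, b in zip(indices, indices[1:]))


print(is_strictly_increasing([0, 2, 5]))  # True
print(is_strictly_increasing([0, 5, 2]))  # False
print(is_strictly_increasing([1, 1]))     # False (duplicate index)
```

A constructor could run such a check and raise an error on failure, which keeps the sorted-indices guarantee without paying for a sort on every call.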
I might not be able to work on this. Feel free to cherry-pick and complete it.