[SPARK-8874] [ML] Add missing methods in Word2Vec #7263
@mengxr I would like to return this as a DataFrame with each row having two fields, a string and a float array. But I can't work out how to access the SparkContext here.
My understanding, from reading the code, is that it isn't possible to access the SparkContext here. Similar methods in other models (like predict in a classifier) can't access the SparkContext either; it's only in fit/transform, where a DataFrame is passed in, that we can access the SparkContext.
Thanks for clarifying. I just thought I was being ignorant.
Actually, we can call SparkContext.getOrCreate() to get the active SparkContext.
oh my bad, I didn't see that new functionality.
Test build #36711 has finished for PR 7263 at commit
393376d to 4dfd418
Test build #36760 has finished for PR 7263 at commit
9813fb8 to dcdcfa7
@mengxr I added support only for the Scala code right now. We can move the Python wrappers to another PR.
Test build #37040 has finished for PR 7263 at commit
dcdcfa7 to 69d91b7
Test build #37039 has finished for PR 7263 at commit
Test build #37041 has finished for PR 7263 at commit
ping @mengxr
I'm curious: did you discuss with @mengxr whether these methods should return the same types as the spark.mllib Scala API vs. DataFrames?
I remember discussing it somewhere (I think offline on chat). However, I think it would be neater to return DataFrames, and it would be consistent with spark.ml in general. What is your opinion?
OK, if it was discussed, then it's fine with me. After all, it's more Java-friendly.
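To make the return-type question above concrete, here is a minimal Spark-free sketch in plain Python of the two row shapes mentioned in this thread: one (word, vector) row per word for getVectors, and ranked (word, similarity) rows for findSynonyms. The names `get_vectors`, `find_synonyms`, and `EMBEDDINGS`, the toy vectors, and the cosine-similarity ranking are all assumptions for illustration; this is not the code in the PR and it does not use the Spark API.

```python
import math
from typing import Dict, List, Tuple

# Toy embedding table; the words and numbers are made up for illustration.
EMBEDDINGS: Dict[str, List[float]] = {
    "king": [1.0, 0.0],
    "queen": [0.9, 0.1],
    "apple": [0.0, 1.0],
}


def get_vectors(embeddings: Dict[str, List[float]]) -> List[Tuple[str, List[float]]]:
    """Return one (word, vector) row per word, sorted by word."""
    return [(word, list(vec)) for word, vec in sorted(embeddings.items())]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_synonyms(
    embeddings: Dict[str, List[float]], word: str, num: int
) -> List[Tuple[str, float]]:
    """Return up to `num` (word, similarity) rows, most similar first.

    The query word itself is excluded from the result.
    """
    query = embeddings[word]
    scored = [
        (other, cosine(query, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    scored.sort(key=lambda row: row[1], reverse=True)
    return scored[:num]


rows = get_vectors(EMBEDDINGS)
synonyms = find_synonyms(EMBEDDINGS, "king", 2)
```

Each result is a flat list of two-field rows, which is the shape that maps naturally onto a two-column DataFrame rather than onto a map or an array of tuples.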
"no." --> "number of"
Test build #39440 has finished for PR 7263 at commit
LGTM. Merging with master and branch-1.5
Add missing methods

1. getVectors
2. findSynonyms

to W2Vec scala and python API

mengxr

Author: MechCoder

Closes #7263 from MechCoder/missing_methods_w2vec and squashes the following commits:

149d5ca [MechCoder] minor doc
69d91b7 [MechCoder] [SPARK-8874] [ML] Add missing methods in Word2Vec

(cherry picked from commit 13675c7)
Signed-off-by: Joseph K. Bradley
After #7263 it is pretty straightforward to add Python wrappers.

Author: MechCoder

Closes #7930 from MechCoder/spark-9533 and squashes the following commits:

1bea394 [MechCoder] make getVectors a lazy val
5522756 [MechCoder] [SPARK-9533] [PySpark] [ML] Add missing methods in Word2Vec ML

(cherry picked from commit 076ec05)
Signed-off-by: Joseph K. Bradley
Add missing methods

1. getVectors
2. findSynonyms

to W2Vec Scala and Python API

@mengxr