[SPARK-11626][ML] ml.feature.Word2Vec.transform() function very slow #9592
… broadcast every sentence
|
Test build #2029 has finished for PR 9592 at commit
|
|
@979969786 Could you try marking |
|
OK, marking wordVectors as @transient is better than not marking it, but it is still very slow. |
|
@979969786 I sent you a PR at https://github.com/979969786/spark/pull/1. I think it should fix the problem (though I didn't test). |
fix serialization issues in ml.Word2Vec
|
I've tested it; it runs correctly and quickly. |
|
ok to test |
|
test this please |
|
Test build #45644 has finished for PR 9592 at commit
|
|
Merged into master and branch-1.6. Thanks! |
org.apache.spark.ml.feature.Word2Vec.transform() very slow. we should not read broadcast every sentence.
Author: Yuming Wang <[email protected]>
Author: yuming.wang <[email protected]>
Author: Xiangrui Meng <[email protected]>
Closes #9592 from 979969786/master.
(cherry picked from commit 27524a3)
Signed-off-by: Xiangrui Meng <[email protected]>
|
cc @yinxusen for the performance issue and the fix. |
|
Got it, thanks! |
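For readers skimming this thread, the point that transform() should not read the broadcast for every sentence can be illustrated with a small, Spark-free simulation. This is only a sketch of the general pattern, not the code merged in this PR: `CountingBroadcast`, `average`, `transform_slow`, and `transform_fast` are hypothetical names, the copy inside `value` merely stands in for an expensive read, and averaging over the sentence length is an illustrative choice.

```python
class CountingBroadcast:
    """Hypothetical stand-in for a broadcast variable; counts costly reads."""

    def __init__(self, vectors):
        self._vectors = vectors
        self.reads = 0

    @property
    def value(self):
        # Simulate an expensive read by copying the whole table each time.
        self.reads += 1
        return dict(self._vectors)


def average(vectors, sentence, size):
    """Sum the vectors of known words, then divide by the sentence length."""
    total = [0.0] * size
    for word in sentence:
        vec = vectors.get(word)
        if vec is not None:
            total = [t + v for t, v in zip(total, vec)]
    n = len(sentence)
    return [t / n for t in total] if n else total


def transform_slow(bc, sentences, size):
    # Reads the broadcast once per sentence (the pattern to avoid).
    return [average(bc.value, s, size) for s in sentences]


def transform_fast(bc, sentences, size):
    # Reads the broadcast once, then reuses the table for every sentence.
    vectors = bc.value
    return [average(vectors, s, size) for s in sentences]


table = {"spark": [1.0, 0.0], "fast": [0.0, 1.0]}
sentences = [["spark", "fast"], ["spark"], ["unknown", "fast"]]

slow_bc = CountingBroadcast(table)
fast_bc = CountingBroadcast(table)
slow_out = transform_slow(slow_bc, sentences, 2)
fast_out = transform_fast(fast_bc, sentences, 2)
```

Both functions return the same vectors; the difference is that `slow_bc.reads` grows with the number of sentences while `fast_bc.reads` stays at one, which is the kind of per-sentence overhead the thread describes.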