
@wangyum
Member

@wangyum wangyum commented Nov 10, 2015

org.apache.spark.ml.feature.Word2Vec.transform() is very slow. We should not read the broadcast variable for every sentence.
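
The cost pattern described above can be illustrated without Spark. The following is a minimal plain-Scala sketch, not the code from this PR: `ToyModel`, `getVectors`, `builds`, and the `HoistLookup` helpers are hypothetical names invented for illustration. It assumes the expensive step is rebuilding a word-to-vector table, and contrasts rebuilding it for every sentence with building it once before the per-sentence work.

```scala
// Hypothetical stand-in for a model whose lookup table is costly to build.
final class ToyModel(raw: Map[String, Array[Float]]) {
  // Counts how often the table is rebuilt (for illustration only).
  var builds: Int = 0

  def getVectors: Map[String, Array[Double]] = {
    builds += 1
    raw.map { case (word, vec) => word -> vec.map(_.toDouble) }
  }
}

object HoistLookup {
  // Sum the vectors of known words and divide by the sentence length.
  // Unknown words contribute nothing; an empty sentence yields zeros.
  def average(
      sentence: Seq[String],
      table: Map[String, Array[Double]],
      dim: Int): Array[Double] = {
    val sum = new Array[Double](dim)
    if (sentence.isEmpty) {
      sum
    } else {
      for (word <- sentence; vec <- table.get(word); i <- 0 until dim) {
        sum(i) += vec(i)
      }
      sum.map(_ / sentence.size)
    }
  }

  // Slow pattern: the table is rebuilt once per sentence.
  def transformSlow(model: ToyModel, data: Seq[Seq[String]], dim: Int): Seq[Array[Double]] =
    data.map(sentence => average(sentence, model.getVectors, dim))

  // Faster pattern: the table is built once and reused for every sentence.
  def transformFast(model: ToyModel, data: Seq[Seq[String]], dim: Int): Seq[Array[Double]] = {
    val table = model.getVectors
    data.map(sentence => average(sentence, table, dim))
  }
}
```

Both functions return the same vectors; only the number of table builds differs, growing with the number of sentences in the slow variant and staying at one in the fast variant.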

@SparkQA

SparkQA commented Nov 10, 2015

Test build #2029 has finished for PR 9592 at commit a9f894d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr
Contributor

mengxr commented Nov 10, 2015

@979969786 Could you try marking wordVectors @transient instead? I think that is the root cause.

https://github.com/979969786/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala#L151
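
For readers unfamiliar with the annotation, here is a small plain-Scala sketch of what `@transient` does under standard Java serialization. It is not taken from the Spark source: `Holder` and `roundTrip` are hypothetical names, and the connection to this PR is an assumption, namely that the suggestion aims to keep a large field out of whatever gets serialized along with the object.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical holder: `big` is marked @transient, so Java serialization skips it.
class Holder(val name: String, @transient val big: Array[Double]) extends Serializable

object TransientDemo {
  // Serialize an object to bytes using standard Java serialization.
  def toBytes(obj: AnyRef): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(obj)
    out.close()
    buffer.toByteArray
  }

  // Serialize and then deserialize a Holder, returning the copy.
  def roundTrip(holder: Holder): Holder = {
    val in = new ObjectInputStream(new ByteArrayInputStream(toBytes(holder)))
    try in.readObject().asInstanceOf[Holder]
    finally in.close()
  }
}
```

After a round trip, the copy keeps `name` but its `big` field is `null`, and the serialized form does not grow with the size of the array. Any code that needs the field after deserialization has to obtain it some other way.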

@wangyum
Member Author

wangyum commented Nov 11, 2015

OK, marking wordVectors @transient is better than not marking it, but it is still very slow.

@mengxr
Contributor

mengxr commented Nov 11, 2015

@979969786 I sent you a PR at https://github.com/979969786/spark/pull/1. I think it should fix the problem (though I didn't test).

@wangyum
Member Author

wangyum commented Nov 11, 2015

I've tested it; it runs correctly and quickly.

@mengxr
Contributor

mengxr commented Nov 11, 2015

ok to test

@mengxr
Contributor

mengxr commented Nov 11, 2015

test this please

@SparkQA

SparkQA commented Nov 11, 2015

Test build #45644 has finished for PR 9592 at commit 8b0df79.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr
Contributor

mengxr commented Nov 11, 2015

Merged into master and branch-1.6. Thanks!

@asfgit asfgit closed this in 27524a3 Nov 11, 2015
asfgit pushed a commit that referenced this pull request Nov 11, 2015
org.apache.spark.ml.feature.Word2Vec.transform() very slow. we should not read broadcast every sentence.

Author: Yuming Wang <[email protected]>
Author: yuming.wang <[email protected]>
Author: Xiangrui Meng <[email protected]>

Closes #9592 from 979969786/master.

(cherry picked from commit 27524a3)
Signed-off-by: Xiangrui Meng <[email protected]>
dskrvk pushed a commit to dskrvk/spark that referenced this pull request Nov 13, 2015
org.apache.spark.ml.feature.Word2Vec.transform() very slow. we should not read broadcast every sentence.

Author: Yuming Wang <[email protected]>
Author: yuming.wang <[email protected]>
Author: Xiangrui Meng <[email protected]>

Closes apache#9592 from 979969786/master.
@mengxr
Contributor

mengxr commented Nov 13, 2015

cc @yinxusen for the performance issue and the fix.

@yinxusen
Contributor

Got it, thanks!
