[SPARK-11626][ML]ml.feature.Word2Vec.transform() function very slow #9592

wangyum · 2015-11-10T08:37:46Z

org.apache.spark.ml.feature.Word2Vec.transform() is very slow. We should not read the broadcast variable for every sentence.
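To illustrate the cost being described, here is a minimal plain-Scala sketch (no Spark dependency) of the difference between materializing the word-to-vector map for every sentence and building it once. `ToyModel`, `buildCount`, `transformSlow`, and `transformFast` are hypothetical names invented for this illustration; this is not the code from the patch, only the general idea of hoisting expensive work out of the per-record function.

```scala
object HoistLookupSketch {
  // Hypothetical stand-in for a trained model: vectors are stored flat and
  // must be converted into a Map before they can be looked up by word.
  final class ToyModel(words: Array[String], flat: Array[Float], val dim: Int) {
    var buildCount = 0 // counts how often the map is materialized

    def getVectors: Map[String, Array[Float]] = {
      buildCount += 1
      words.zipWithIndex.map { case (w, i) =>
        w -> flat.slice(i * dim, (i + 1) * dim)
      }.toMap
    }
  }

  // Convert the model's Float vectors to Double vectors (one full pass).
  def toDoubleMap(model: ToyModel): Map[String, Array[Double]] =
    model.getVectors.map { case (w, v) => w -> v.map(_.toDouble) }

  // Sum the vectors of known words, then divide by the sentence length.
  def average(sentence: Seq[String],
              vectors: Map[String, Array[Double]],
              dim: Int): Array[Double] = {
    val sum = new Array[Double](dim)
    for (word <- sentence; v <- vectors.get(word); i <- 0 until dim) sum(i) += v(i)
    if (sentence.nonEmpty) for (i <- 0 until dim) sum(i) /= sentence.size
    sum
  }

  // Slow: the whole map is rebuilt inside the per-sentence function.
  def transformSlow(model: ToyModel, sentences: Seq[Seq[String]]): Seq[Array[Double]] =
    sentences.map(s => average(s, toDoubleMap(model), model.dim))

  // Fast: the map is built once and reused for every sentence.
  def transformFast(model: ToyModel, sentences: Seq[Seq[String]]): Seq[Array[Double]] = {
    val vectors = toDoubleMap(model)
    sentences.map(s => average(s, vectors, model.dim))
  }
}
```

Both functions return the same averages; only the number of times the map is built differs (once per sentence versus once per call).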

SparkQA · 2015-11-10T11:49:33Z

Test build #2029 has finished for PR 9592 at commit a9f894d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

mengxr · 2015-11-10T18:35:52Z

@979969786 Could you try marking wordVectors @transient instead? I think that is the root cause.
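For readers unfamiliar with the annotation, the sketch below shows what `@transient` does under Java serialization, which is the default mechanism Spark uses when shipping closures and the objects they reference. The classes `Eager` and `Skipping` and the helper `serialize` are hypothetical names for this illustration and are not taken from Word2Vec.scala; the point is only that a field marked `@transient` is left out of the serialized form.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

object TransientSketch {
  // The payload is an ordinary field, so it is written during serialization.
  class Eager(val payload: Array[Double]) extends Serializable

  // The payload is marked @transient, so Java serialization skips it.
  class Skipping(@transient val payload: Array[Double]) extends Serializable

  // Serialize an object to a byte array with plain Java serialization.
  def serialize(obj: AnyRef): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj) finally out.close()
    bytes.toByteArray
  }
}
```

Serializing an `Eager` instance that holds a large array produces a correspondingly large byte array, while a `Skipping` instance with the same array stays small. The trade-off is that a transient field is `null` after deserialization, so code running on the receiving side must not rely on it.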

https://github.com/979969786/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala#L151

wangyum · 2015-11-11T01:50:06Z

OK, marking wordVectors @transient is better than not marking it, but it is still very slow.

mengxr · 2015-11-11T03:34:20Z

@979969786 I sent you a PR at https://github.com/979969786/spark/pull/1. I think it should fix the problem (though I didn't test).

fix serialization issues in ml.Word2Vec

wangyum · 2015-11-11T05:35:59Z

I've tested it; it runs correctly and quickly.

mengxr · 2015-11-11T15:35:01Z

ok to test

mengxr · 2015-11-11T15:35:13Z

test this please

SparkQA · 2015-11-11T16:28:19Z

Test build #45644 has finished for PR 9592 at commit 8b0df79.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

mengxr · 2015-11-11T17:44:20Z

Merged into master and branch-1.6. Thanks!

org.apache.spark.ml.feature.Word2Vec.transform() very slow. we should not read broadcast every sentence.
Author: Yuming Wang
Author: yuming.wang
Author: Xiangrui Meng
Closes #9592 from 979969786/master.
(cherry picked from commit 27524a3)
Signed-off-by: Xiangrui Meng

org.apache.spark.ml.feature.Word2Vec.transform() very slow. we should not read broadcast every sentence.
Author: Yuming Wang
Author: yuming.wang
Author: Xiangrui Meng
Closes apache#9592 from 979969786/master.

mengxr · 2015-11-13T21:20:25Z

cc @yinxusen for the performance issue and the fix.

yinxusen · 2015-11-14T01:01:59Z

Got it, thanks!

ml.feature.Word2Vec.transform() function very slow, we shouldn't read broadcast every sentence

a9f894d

modify wordVectors to transient.

a2b2835

mengxr and others added 2 commits November 10, 2015 19:39

improve implementation to avoid mapping per record

fec6460

Merge pull request #1 from mengxr/SPARK-11626

8b0df79

fix serialization issues in ml.Word2Vec

asfgit closed this in 27524a3 Nov 11, 2015
