@MechCoder
Contributor

Add missing methods

  1. getVectors
  2. findSynonyms

to the Word2Vec Scala and Python APIs
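
For readers unfamiliar with the two methods, here is a minimal pure-Python sketch of what they compute conceptually. It is not the Spark implementation: the toy `VECTORS` table and the helper names `get_vectors`, `cosine`, and `find_synonyms` are hypothetical, and plain lists of tuples stand in for DataFrame rows. It assumes synonyms are ranked by cosine similarity and that the query word itself is excluded from the result.

```python
import math

# Hypothetical toy embeddings; a real model learns these from a corpus.
VECTORS = {
    "king": [0.9, 0.1, 0.0],
    "queen": [0.8, 0.2, 0.1],
    "apple": [0.0, 0.9, 0.4],
}


def get_vectors(vectors):
    """Return one (word, vector) row per vocabulary entry, sorted by word."""
    return [(word, list(vec)) for word, vec in sorted(vectors.items())]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_synonyms(vectors, word, num):
    """Return up to `num` (word, similarity) rows, most similar first."""
    query = vectors[word]
    scored = [
        (other, cosine(query, vec))
        for other, vec in vectors.items()
        if other != word  # assumption: the query word is not its own synonym
    ]
    scored.sort(key=lambda row: row[1], reverse=True)
    return scored[:num]
```

With this toy data, `find_synonyms(VECTORS, "king", 1)` ranks "queen" first, and `get_vectors(VECTORS)` yields one row per word, which is the shape (a string plus a float array) discussed in the review comments.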

@mengxr

Contributor Author

@mengxr I would like to return this as a dataframe with each row having two fields, a string and a float array. But I'm unable to understand how to access the SparkContext here.

Contributor

My understanding, from reading the code, is that it isn't possible to access the SparkContext here. Similar methods in other models (like predict in a classifier) can't access the SparkContext either; it's only when we apply fit/transform and a DataFrame is passed in that we can access it.

Contributor Author

Thanks for clarifying. I just thought I was being ignorant.

Contributor

Actually, we can call SparkContext.getOrCreate() to get the active SparkContext.
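
The get-or-create idea can be sketched without Spark. The `Context` and `Model` classes below are hypothetical stand-ins, not Spark's API; they only illustrate how a model method that receives no DataFrame can still reach a shared, already-active context.

```python
class Context:
    """Hypothetical stand-in for a shared context object (not Spark's class)."""

    _active = None  # the single registered instance, if any

    def __init__(self, app_name):
        self.app_name = app_name

    @classmethod
    def get_or_create(cls, app_name="default"):
        # Reuse the active instance if one exists; otherwise create and register one.
        if cls._active is None:
            cls._active = cls(app_name)
        return cls._active


class Model:
    """Hypothetical model whose method takes no DataFrame argument."""

    def __init__(self, vectors):
        self.vectors = vectors

    def get_vectors(self):
        # No context is passed in, so look up the active one instead.
        ctx = Context.get_or_create()
        return ctx.app_name, sorted(self.vectors.items())
```

Because the lookup goes through a class-level registry, a method such as `get_vectors` does not need the context threaded through its arguments; the trade-off is hidden global state.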

Contributor

Oh, my bad. I didn't see that new functionality.

@SparkQA

SparkQA commented Jul 7, 2015

Test build #36711 has finished for PR 7263 at commit 393376d.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MechCoder force-pushed the missing_methods_w2vec branch from 393376d to 4dfd418 on July 8, 2015 06:21
@SparkQA

SparkQA commented Jul 8, 2015

Test build #36760 has finished for PR 7263 at commit 4dfd418.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MechCoder force-pushed the missing_methods_w2vec branch 2 times, most recently from 9813fb8 to dcdcfa7, on July 10, 2015 13:47
@MechCoder
Contributor Author

@mengxr I have added support only for the Scala code for now. We can move the Python wrappers to another PR.

@SparkQA

SparkQA commented Jul 10, 2015

Test build #37040 has finished for PR 7263 at commit dcdcfa7.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MechCoder force-pushed the missing_methods_w2vec branch from dcdcfa7 to 69d91b7 on July 10, 2015 13:55
@SparkQA

SparkQA commented Jul 10, 2015

Test build #37039 has finished for PR 7263 at commit 9813fb8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 10, 2015

Test build #37041 has finished for PR 7263 at commit 69d91b7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MechCoder
Contributor Author

ping @mengxr

@jkbradley
Member

I'm curious: did you discuss with @mengxr whether these methods should return the same types as the spark.mllib Scala API or DataFrames?

@MechCoder
Contributor Author

I remember discussing it somewhere (offline on chat, I think). In any case, I think returning DataFrames would be neater and more consistent with spark.ml in general. What is your opinion?

@jkbradley
Member

OK, if it was discussed, then it's fine with me. After all, it's more Java-friendly.

Member

"no." --> "number of"

@SparkQA

SparkQA commented Aug 2, 2015

Test build #39440 has finished for PR 7263 at commit 149d5ca.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Member

LGTM. Merging with master and branch-1.5
Thanks!

@asfgit closed this in 13675c7 on Aug 3, 2015
asfgit pushed a commit that referenced this pull request Aug 3, 2015
Add missing methods

1. getVectors
2. findSynonyms

to W2Vec scala and python API

mengxr

Author: MechCoder

Closes #7263 from MechCoder/missing_methods_w2vec and squashes the following commits:

149d5ca [MechCoder] minor doc
69d91b7 [MechCoder] [SPARK-8874] [ML] Add missing methods in Word2Vec

(cherry picked from commit 13675c7)
Signed-off-by: Joseph K. Bradley
@MechCoder deleted the missing_methods_w2vec branch on August 4, 2015 05:07
asfgit pushed a commit that referenced this pull request Aug 6, 2015
After #7263 it is pretty straightforward to add Python wrappers.

Author: MechCoder

Closes #7930 from MechCoder/spark-9533 and squashes the following commits:

1bea394 [MechCoder] make getVectors a lazy val
5522756 [MechCoder] [SPARK-9533] [PySpark] [ML] Add missing methods in Word2Vec ML

(cherry picked from commit 076ec05)
Signed-off-by: Joseph K. Bradley
kiszk pushed a commit to kiszk/spark-gpu that referenced this pull request Dec 26, 2015
After apache/spark#7263 it is pretty straightforward to add Python wrappers.

Author: MechCoder

Closes #7930 from MechCoder/spark-9533 and squashes the following commits:

1bea394 [MechCoder] make getVectors a lazy val
5522756 [MechCoder] [SPARK-9533] [PySpark] [ML] Add missing methods in Word2Vec ML