[SPARK-6264] [MLLIB] Support FPGrowth algorithm in Python API #5213

yanboliang · 2015-03-26T15:58:49Z

Support FPGrowth algorithm in Python API.
Should we remove the "Experimental" tag that marks FPGrowth and FPGrowthModel in Scala? @jkbradley

SparkQA · 2015-03-26T16:03:23Z

Test build #29237 has started for PR 5213 at commit 4f26944.

This patch merges cleanly.

SparkQA · 2015-03-26T16:10:19Z

Test build #29237 has finished for PR 5213 at commit 4f26944.

This patch fails to build.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class FPGrowthModel(JavaModelWrapper):
- class FPGrowth(object):

AmplabJenkins · 2015-03-26T16:10:21Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29237/

SparkQA · 2015-03-26T16:28:17Z

Test build #29240 has started for PR 5213 at commit 546494a.

This patch merges cleanly.

SparkQA · 2015-03-26T17:51:41Z

Test build #29240 has finished for PR 5213 at commit 546494a.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class FPGrowthModel(JavaModelWrapper):
- class FPGrowth(object):

AmplabJenkins · 2015-03-26T17:51:45Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29240/

jkbradley · 2015-03-27T21:06:42Z

mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala

typo: "helper"

jkbradley · 2015-03-27T21:10:20Z

Let's keep it Experimental for now; we can hopefully remove that tag before the 1.4 release if no issues come up before then.

Also, can you please add doc to match the Scaladoc? (We've been lazy about this with Python but should be better about making the docs match.)

Please edit python/docs/pyspark.mllib.rst to generate docs for Python. I'd follow the "pyspark.mllib.recommendation module" for settings.

Thanks!

SparkQA · 2015-03-29T08:48:14Z

Test build #29364 has started for PR 5213 at commit e3f17cb.

This patch does not merge cleanly.

SparkQA · 2015-03-29T10:39:38Z

Test build #29364 has finished for PR 5213 at commit e3f17cb.

This patch passes all tests.
This patch does not merge cleanly.
This patch adds the following public classes (experimental):
- class FPGrowthModel(JavaModelWrapper):
- class FPGrowth(object):

AmplabJenkins · 2015-03-29T10:39:42Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29364/

jkbradley · 2015-03-30T20:10:30Z

@yanboliang Thanks for the updates. Can you please fix the merge issues? (Rebasing off of the current master is often easiest.)

Also, can you please add documentation to FPGrowth.train()? Copying algorithm + parameter documentation from the Scala docs should be fine. That should be it.

SparkQA · 2015-03-31T07:03:20Z

Test build #29463 has started for PR 5213 at commit a2d7cf7.

SparkQA · 2015-03-31T08:26:49Z

Test build #29463 has finished for PR 5213 at commit a2d7cf7.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class FPGrowthModel(JavaModelWrapper):
- class FPGrowth(object):
This patch does not change any dependencies.

AmplabJenkins · 2015-03-31T08:26:53Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29463/

mengxr · 2015-03-31T18:09:55Z

mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala

It might be simpler to move this out of class PythonMLLibAPI to use with py4j. See #5243.

mengxr · 2015-03-31T18:10:58Z

python/pyspark/mllib/fpm.py

An empty line is needed before this line, and the method needs a docstring. It might also be convenient to follow the Java/Scala implementation and wrap each result in a namedtuple, so that users can access items and freq instead of [0] and [1].
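A minimal sketch of the namedtuple idea, in plain Python without Spark. The record name `FreqItemset` and the sample data are assumptions made for illustration; only the field names `items` and `freq` come from the comment above, and this is not the merged implementation.

```python
from collections import namedtuple

# Hypothetical record type; the field names follow the review comment.
FreqItemset = namedtuple("FreqItemset", ["items", "freq"])

# Illustrative (items, frequency) pairs standing in for collected results.
raw_results = [(["a"], 4), (["a", "c"], 3)]

# Wrap each plain tuple so callers can use named fields.
wrapped = [FreqItemset(items, freq) for items, freq in raw_results]

for record in wrapped:
    # Named access replaces positional record[0] / record[1].
    print(record.items, record.freq)
```

Because a namedtuple is still a tuple, code that indexes with [0] and [1] would keep working after such a change.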

AmplabJenkins · 2015-04-04T18:28:16Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29713/

mengxr · 2015-04-05T23:11:36Z

python/pyspark/mllib/fpm.py

In Python docstrings, we limit the line width to 72 characters (following PEP 8). The limit does not apply to code examples in the docstring. Please update the docstrings in your PR.
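One way to check this rule locally is sketched below. The helper `long_docstring_lines` is hypothetical and not part of Spark's tooling; it assumes that doctest lines can be recognised by their `>>>` or `...` prefix and that the limit is measured on the raw line, indentation included. Expected-output lines of a doctest carry no prefix, so this sketch would still flag them.

```python
def long_docstring_lines(docstring, limit=72):
    """Return (line_number, line) pairs for lines wider than limit.

    Lines that start with '>>>' or '...' are treated as doctest
    code and skipped.
    """
    offenders = []
    for number, line in enumerate(docstring.splitlines(), start=1):
        stripped = line.strip()
        # Skip doctest prompts and continuation lines.
        if stripped.startswith(">>>") or stripped.startswith("..."):
            continue
        if len(line) > limit:
            offenders.append((number, line))
    return offenders


# Illustrative docstring text: line 3 is an over-long doctest line
# (exempt), line 4 is an over-long prose line (reported).
sample = (
    "Short summary line.\n"
    "\n"
    ">>> " + "x" * 80 + "\n"
    + "y" * 73 + "\n"
)
print(long_docstring_lines(sample))
```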

mengxr · 2015-04-05T23:12:13Z

The implementation looks good to me. There are some minor issues with the docstring style. Please fix them and it should be good to go. Thanks!

SparkQA · 2015-04-09T15:33:12Z

Test build #29941 has started for PR 5213 at commit 8ce0359.

AmplabJenkins · 2015-04-09T16:01:48Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29941/

shaneknapp · 2015-04-09T16:02:30Z

jenkins, test this please

SparkQA · 2015-04-09T16:05:50Z

Test build #29945 has started for PR 5213 at commit 8ce0359.

SparkQA · 2015-04-09T17:30:34Z

Test build #29945 has finished for PR 5213 at commit 8ce0359.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class FPGrowthModel(JavaModelWrapper):
- class FPGrowth(object):
This patch does not change any dependencies.

AmplabJenkins · 2015-04-09T17:30:41Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29945/

SparkQA · 2015-04-09T17:58:23Z

Test build #29952 has started for PR 5213 at commit ed62ead.

SparkQA · 2015-04-09T19:01:30Z

Test build #29952 has finished for PR 5213 at commit ed62ead.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class FPGrowthModel(JavaModelWrapper):
- class FPGrowth(object):
This patch does not change any dependencies.

AmplabJenkins · 2015-04-09T19:01:35Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29952/

mengxr · 2015-04-09T20:28:03Z

test this please

SparkQA · 2015-04-09T20:33:27Z

Test build #29961 has started for PR 5213 at commit ed62ead.

SparkQA · 2015-04-09T21:37:06Z

Test build #29961 has finished for PR 5213 at commit ed62ead.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class FPGrowthModel(JavaModelWrapper):
- class FPGrowth(object):
This patch does not change any dependencies.

AmplabJenkins · 2015-04-09T21:37:11Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29961/

mengxr · 2015-04-09T22:12:08Z

LGTM. Merged into master. (The failed tests are irrelevant.) @yanboliang Thanks! I created SPARK-6827 to wrap the records in freqItemsets with namedtuples.

jkbradley reviewed Mar 27, 2015

yanboliang added 6 commits March 31, 2015 14:54

- b96206a: Support FPGrowth algorithm in Python API
- 7f62c8f: add fpm to __init__.py
- 2c951b8: fix typos
- b18fd07: trigger jenkins
- dcf7d73: add python doc
- a2d7cf7: add doc for FPGrowth.train()

yanboliang force-pushed the spark-6264 branch from e3f17cb to a2d7cf7 Compare March 31, 2015 06:57

mengxr reviewed Mar 31, 2015

- 544c725: address comments

mengxr reviewed Apr 5, 2015

- 8ce0359: fix docstring style
- ed62ead: trigger jenkins

asfgit closed this in a0411ae Apr 9, 2015

yanboliang deleted the spark-6264 branch April 24, 2015 10:02
