-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-6264] [MLLIB] Support FPGrowth algorithm in Python API #5213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #29237 has started for PR 5213 at commit
|
|
Test build #29237 has finished for PR 5213 at commit
|
|
Test FAILed. |
|
Test build #29240 has started for PR 5213 at commit
|
|
Test build #29240 has finished for PR 5213 at commit
|
|
Test PASSed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: "helper"
|
Let's keep it Experimental for now; we can hopefully remove that tag before the 1.4 release if no issues come up before then. Also, can you please add doc to match the Scaladoc? (We've been lazy about this with Python but should be better about making the docs match.) Please edit python/docs/pyspark.mllib.rst to generate docs for Python. I'd follow the "pyspark.mllib.recommendation module" for settings. Thanks! |
|
Test build #29364 has started for PR 5213 at commit
|
|
Test build #29364 has finished for PR 5213 at commit
|
|
Test PASSed. |
|
@yanboliang Thanks for the updates. Can you please fix the merge issues? (Rebasing off of the current master is often easiest.) Also, can you please add documentation to FPGrowth.train()? Copying algorithm + parameter documentation from the Scala docs should be fine. That should be it. |
|
Test build #29463 has started for PR 5213 at commit |
|
Test build #29463 has finished for PR 5213 at commit
|
|
Test PASSed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be simpler to move this out of class PythonMLLibAPI to use with py4j. See #5243.
python/pyspark/mllib/fpm.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Empty line before this line and doc are needed. It might be convenient if we follow the Java/Scala implementation and use a namedtuple to wrap the result. So users can call items and freq instead of [0] and [1].
|
Test PASSed. |
python/pyspark/mllib/fpm.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Python doc, we limit the line width to 72 (following PEP8). This doesn't include the code example in the doc. Please update the doc strings in your PR.
|
The implementation looks good to me. There are some minor issues about the docstring style. Please fix it and it should be good to go. Thanks! |
|
Test build #29941 has started for PR 5213 at commit |
|
Test FAILed. |
|
jenkins, test this please |
|
Test build #29945 has started for PR 5213 at commit |
|
Test build #29945 has finished for PR 5213 at commit
|
|
Test FAILed. |
|
Test build #29952 has started for PR 5213 at commit |
|
Test build #29952 has finished for PR 5213 at commit
|
|
Test FAILed. |
|
test this please |
|
Test build #29961 has started for PR 5213 at commit |
|
Test build #29961 has finished for PR 5213 at commit
|
|
Test FAILed. |
|
LGTM. Merged into master. (The failed test are irrelevant.) @yanboliang Thanks! I created SPARK-6827 to wrap the records in freqItemsets with namedtuples. |
Support FPGrowth algorithm in Python API.
Should we remove "Experimental" which were marked for FPGrowth and FPGrowthModel in Scala? @jkbradley