[SPARK-13967] [PYSPARK][ML] Added binary Param to Python CountVectorizer #12308
|
cc @MLnick |
|
Since we seem to be adding the same param to multiple models, would it maybe make sense to make this a shared param? |
|
Test build #55541 has finished for PR 12308 at commit
|
Maybe, I'm not sure if there are other uses besides |
|
Will take a look |
|
@holdenk @BryanCutler I'd say we could make We could later add |
|
LGTM otherwise. |
|
+1 for not sharing the Param if the docs (and semantics) differ |
python/pyspark/ml/feature.py
Outdated
  outputCol=None):
  """
- setParams(self, minTF=1.0, minDF=1.0, vocabSize=1 << 18, inputCol=None, outputCol=None)
+ setParams(self, minTF=1.0, minDF=1.0, vocabSize=1 << 18, binary=False, inputCol=None,
Need a backslash at the end. Please check the generated HTML doc.
This should have caused the doc checks to fail, right? |
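For readers unfamiliar with the backslash requested above, here is a minimal sketch of what it does inside a regular (non-raw) Python docstring: a backslash immediately before the newline joins the two physical lines into one, so the signature stays a single line in the resulting docstring. The function names and the shortened signature are hypothetical and are not taken from `feature.py`.

```python
def set_params_unjoined():
    """
    setParams(self, minTF=1.0, binary=False, inputCol=None,
              outputCol=None)
    """


def set_params_joined():
    """
    setParams(self, minTF=1.0, binary=False, inputCol=None,\
              outputCol=None)
    """


def signature_lines(func):
    # Return the non-blank lines of the function's docstring.
    return [line for line in func.__doc__.splitlines() if line.strip()]


# Without the backslash the signature spans two docstring lines;
# with it, the backslash-newline pair is dropped and the lines are joined.
print(len(signature_lines(set_params_unjoined)))
print(len(signature_lines(set_params_joined)))
```

The joined variant keeps the leading spaces of the continuation line, so the rendered result is best confirmed in the generated HTML, as the reviewer suggests.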
|
Test build #55633 has finished for PR 12308 at commit
|
|
@BryanCutler it seems that (due to a bug introduced during code review) right now we only halt on doc build errors, not warnings as originally intended. Thanks for noticing this; I'll make a follow-up JIRA to deal with it. |
|
It looks like the option to treat warnings as errors is there; it just gets overwritten in the Makefile |
|
Yup, I've got a fix, but I'm also going to clean up the warnings that got in in the meantime. |
|
@mengxr @jkbradley anything further? |
|
LGTM, feel free to merge, thanks! |
|
Merged to master. Thanks! |
Added binary toggle param to CountVectorizer feature transformer in PySpark.
Created a unit test for using CountVectorizer with the binary toggle on.
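As a rough illustration of what the binary toggle means, the sketch below mimics the semantics in plain Python without Spark: with `binary=False` each vocabulary term maps to its count in the document, and with `binary=True` every nonzero count is set to 1. The helper `count_vector` and the sample data are hypothetical; this is not the PySpark implementation or the unit test added in this PR.

```python
from collections import Counter


def count_vector(tokens, vocabulary, binary=False):
    """Build a dense term-count vector over a fixed vocabulary (illustrative only)."""
    counts = Counter(tokens)
    vector = [float(counts[term]) for term in vocabulary]
    if binary:
        # Binary mode: keep only presence/absence of each term.
        vector = [1.0 if value > 0 else 0.0 for value in vector]
    return vector


vocab = ["a", "b", "c"]
doc = ["a", "b", "a", "c", "a"]

print(count_vector(doc, vocab))               # raw term counts
print(count_vector(doc, vocab, binary=True))  # presence indicators
```

Binary output of this kind is typically useful for models that work with term presence rather than term frequency.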