[SPARK-13047][PYSPARK][ML] Pyspark Params.hasParam should not throw an error #10962
Conversation
Test build #50242 has started for PR 10962 at commit
I think this makes sense, since it harmonizes the behaviour with the Scala API. Perhaps adding a note in the PyDoc about the new behaviour, as well as a test, would be good additions to the PR? Also, let's ping @jkbradley & @yanboliang for thoughts on changing the API like this.
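For reference, a minimal sketch of the kind of unit test being suggested; `TestParams` here is a hypothetical stand-in for the helper in `pyspark.ml.tests`, and the assertions assume the post-change return-a-bool behaviour:

```python
import unittest

from pyspark.ml.param import Param, Params


class TestParams(Params):
    """Hypothetical Params subclass exposing a single maxIter param."""

    def __init__(self):
        super(TestParams, self).__init__()
        self.maxIter = Param(self, "maxIter", "max number of iterations")


class ParamTests(unittest.TestCase):
    def test_hasParam(self):
        testParams = TestParams()
        # A known param name is reported as present...
        self.assertTrue(testParams.hasParam("maxIter"))
        # ...and an unknown name returns False instead of raising.
        self.assertFalse(testParams.hasParam("notAParam"))
```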
python/pyspark/ml/param/__init__.py
If we support a param of type `Param`, we should not only check `hasattr(self, param.name)` but also check `self.uid == param.parent`. You can call `_shouldOwn` directly to do this work: if you provide a `Param` to check whether it belongs to the instance, you should check both the uid and the name.
But I vote to only support `paramName`, which keeps the semantics consistent with Scala.
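For illustration, a minimal sketch (not the PR's code) of the dual check being described, written as a free function over a `Params` instance; `param.parent` holds the owner's uid in pyspark:

```python
from pyspark.ml.param import Param


def owns_param(instance, param):
    """Return True iff `param` belongs to `instance` (a Params object)."""
    if isinstance(param, Param):
        # A Param belongs to the instance only if its parent uid matches
        # AND the instance has an attribute with that name.
        return param.parent == instance.uid and hasattr(instance, param.name)
    # Otherwise treat the argument as a string param name.
    attr = getattr(instance, param, None)
    return isinstance(attr, Param)
```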
I tend to agree that we should only accept string types for this function. The reason I included the `Param` type is that in the current ml param tests there is a check where `hasParam` is called with a `Param` instance instead of a string, so that test would fail (see here). It is odd that the test passes a `Param` instance rather than a string, since the function describes itself as accepting strings, but, in an odd twist, the check works anyway.
If we do accept the `Param` type, we can't call `_shouldOwn` because it throws an error instead of returning `False` (by design?). At any rate, I vote to accept only strings and change the test to pass in the param name instead of the param.
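To make the two call styles concrete, an illustrative pairing (the exact test lives in `pyspark.ml.tests`; this snippet is not quoted from it):

```python
from pyspark.ml.classification import NaiveBayes

nb = NaiveBayes()
# What the test effectively does: pass the Param object itself. Under the
# pre-change implementation this happened to work anyway.
nb.hasParam(nb.smoothing)
# What the docstring describes: pass the param's string name.
nb.hasParam("smoothing")
```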
+1 for accepting only strings. Absent strong reasons otherwise, keeping consistent with Scala is the best choice.
Test build #50277 has finished for PR 10962 at commit
I wonder if there is support for also changing `shouldOwn`.
Looking at params.scala, `shouldOwn` uses a `require`, which we should probably stay consistent with (not saying we shouldn't change it per se; just that if we do, we should change both). I think throwing an exception is more reasonable for `shouldOwn`, since it is only for internal use, but `hasParam` is external-facing.
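For comparison, the Python side's `_shouldOwn` has the same raise-on-mismatch shape; a sketch that is close to, but not verbatim, the pyspark source:

```python
def _shouldOwn(self, param):
    """
    Validates that the input param belongs to this Params instance;
    raises (like Scala's require) rather than returning False.
    """
    if not (self.uid == param.parent and self.hasParam(param.name)):
        raise ValueError("Param %r does not belong to %r." % (param, self))
```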
Good point about mirroring Scala. In that case, it is probably best not to change it. |
python/pyspark/ml/param/__init__.py
Left a note per @holdenk's suggestion. I can reword if needed.
I would view this one as a bug fix and backport it to branch-1.6. So the note is not necessary.
Test build #50296 has finished for PR 10962 at commit
Maybe @davies or @jkbradley could take a look?
ping! @yanboliang @jkbradley This is a small change that will allow the Python API to match the Scala API, where the current behavior could reasonably be called a bug. Is there anything I need to change or update on the PR? I would very much appreciate your review.
python/pyspark/ml/param/__init__.py
```diff
- return param in self.params
+ if isinstance(paramName, str):
+     p = getattr(self, paramName, None)
+     return p is not None and isinstance(p, Param)
```
nit: `p is not None` is not needed
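Since `getattr(self, paramName, None)` yields `None` for a missing attribute and `isinstance(None, Param)` is already `False`, the extra guard can be dropped; a sketch of the method after addressing the nit (approximating, not quoting, the merged code):

```python
def hasParam(self, paramName):
    """
    Tests whether this instance contains a param with a given
    (string) name.
    """
    if isinstance(paramName, str):
        # isinstance(None, Param) is False, so no separate None check is needed.
        p = getattr(self, paramName, None)
        return isinstance(p, Param)
    else:
        raise TypeError("hasParam(): paramName must be a string")
```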
LGTM, just one minor comment.
@davies Thanks for taking a look! I have addressed your comment.
Test build #50826 has finished for PR 10962 at commit
Test build #2521 has finished for PR 10962 at commit
@sethah The changes look good to me, and thanks for fixing it! I would say this is a bug and we should backport it to branch-1.6, so the note on the behavior change is unnecessary.
Thanks for reviewing @mengxr! I have addressed your comment.
test this please
Test build #51152 has finished for PR 10962 at commit
Merged into master and branch-1.6. Thanks!
[SPARK-13047][PYSPARK][ML] Pyspark Params.hasParam should not throw an error
Pyspark Params class has a method `hasParam(paramName)` which returns `True` if the class has a parameter by that name, but throws an `AttributeError` otherwise. There is not currently a way of getting a Boolean to indicate if a class has a parameter. With Spark 2.0 we could modify the existing behavior of `hasParam` or add an additional method with this functionality.
In Python:
```python
from pyspark.ml.classification import NaiveBayes
nb = NaiveBayes()
print nb.hasParam("smoothing")
print nb.hasParam("notAParam")
```
produces:
> True
> AttributeError: 'NaiveBayes' object has no attribute 'notAParam'
However, in Scala:
```scala
import org.apache.spark.ml.classification.NaiveBayes
val nb = new NaiveBayes()
nb.hasParam("smoothing")
nb.hasParam("notAParam")
```
produces:
> true
> false
cc holdenk
Author: sethah <[email protected]>
Closes #10962 from sethah/SPARK-13047.
(cherry picked from commit b354673)
Signed-off-by: Xiangrui Meng <[email protected]>
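With the fix merged, the Python behavior matches Scala; for example:

```python
from pyspark.ml.classification import NaiveBayes

nb = NaiveBayes()
print(nb.hasParam("smoothing"))   # True
print(nb.hasParam("notAParam"))   # False, instead of raising AttributeError
```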