Skip to content

Conversation

@sethah
Copy link
Contributor

@sethah sethah commented Jan 28, 2016

Pyspark Params class has a method hasParam(paramName) which returns True if the class has a parameter by that name, but throws an AttributeError otherwise. There is not currently a way of getting a Boolean to indicate if a class has a parameter. With Spark 2.0 we could modify the existing behavior of hasParam or add an additional method with this functionality.

In Python:

from pyspark.ml.classification import NaiveBayes
nb = NaiveBayes()
print nb.hasParam("smoothing")
print nb.hasParam("notAParam")

produces:

True
AttributeError: 'NaiveBayes' object has no attribute 'notAParam'

However, in Scala:

import org.apache.spark.ml.classification.NaiveBayes
val nb  = new NaiveBayes()
nb.hasParam("smoothing")
nb.hasParam("notAParam")

produces:

true
false

cc @holdenk

@SparkQA
Copy link

SparkQA commented Jan 28, 2016

Test build #50242 has started for PR 10962 at commit d52b1de.

@holdenk
Copy link
Contributor

holdenk commented Jan 28, 2016

I think this makes sense since it is harmonizing the behaviour with the Scala API. Perhaps adding a note in the PyDoc about the new behaviour as well as a test might be good additions to the PR?

Also lets ping @jkbradley & @yanboliang for thoughts on changing the API like this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we support param of type Param, we should not only check the hasattr(self, param.name) but also check self.uid == param.parent. You can directly call _shouldOwn to do this work. It means if you provide a Param to check whether it belongs to the instance, you should both check uid and name.
But I vote to only support paramName that make consistent semantics with Scala.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to agree that we should only accept string types for this function. The reason I have included Param type is because in the current ml param tests, there is a check where hasParam is called by passing a Param instance instead of a string, so this test would fail (see here). It is odd that the test passes a Param instance and not a string, since the function describes itself as accepting strings, but, in an odd twist, the check works anyway.

If we do accept Param type, we can't call _shouldOwn because it throws an error instead of returning False (by design?). At any rate, I vote to accept only strings and change the test to pass in the param name instead of the param.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for accepting only strings. If no strong reasons, keep consistent with Scala is the best choice.

@SparkQA
Copy link

SparkQA commented Jan 28, 2016

Test build #50277 has finished for PR 10962 at commit 2989f93.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sethah
Copy link
Contributor Author

sethah commented Jan 28, 2016

I wonder if there is support for also changing _shouldOwn to return True/False instead of True/ValueError? The only placed it is currently used is in _resolveParam, which could be modified to keep the same behavior. There is a related PR that would benefit from a _shouldOwn True/False check. Plus, the name seems to imply returning True/False instead of throwing an error. Thoughts?

@holdenk
Copy link
Contributor

holdenk commented Jan 28, 2016

Looking at params.scala shouldOwn uses a require which we should probably stay consistent with (not saying we shouldn't change it per-se just if we do we should change both). I think throwing an exception is more reasonable for shouldOwn since its only an internal use but hasParam is external facing.

@sethah
Copy link
Contributor Author

sethah commented Jan 28, 2016

Good point about mirroring Scala. In that case, it is probably best not to change it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a note per @holdenk 's suggestion. I can reword if needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would view this one as a bug fix and backport it to branch-1.6. So the note is not necessary.

@SparkQA
Copy link

SparkQA commented Jan 28, 2016

Test build #50296 has finished for PR 10962 at commit a1b885c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@holdenk
Copy link
Contributor

holdenk commented Jan 28, 2016

Maybe @davies or @jkbradley could take a look?

@sethah
Copy link
Contributor Author

sethah commented Feb 5, 2016

ping! @yanboliang @jkbradley

This is a small change that will allow the Python API to match the Scala API, where the current behavior could be reasonably stated as a bug. Is there anything I need to change or update on the PR? I would very much appreciate your review.

return param in self.params
if isinstance(paramName, str):
p = getattr(self, paramName, None)
return p is not None and isinstance(p, Param)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: p is not None is not needed

@davies
Copy link
Contributor

davies commented Feb 5, 2016

LGTM, just one minor comment.

@sethah
Copy link
Contributor Author

sethah commented Feb 5, 2016

@davies Thanks for taking a look! I have addressed your comment.

@SparkQA
Copy link

SparkQA commented Feb 5, 2016

Test build #50826 has finished for PR 10962 at commit 8dc7d05.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 5, 2016

Test build #2521 has finished for PR 10962 at commit 8dc7d05.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr
Copy link
Contributor

mengxr commented Feb 11, 2016

@sethah The changes look good to me and thanks for fixing it! I would say this is a bug and we should backport it to branch-1.6. So the note on the behavior change is unnecessary.

@sethah
Copy link
Contributor Author

sethah commented Feb 11, 2016

Thanks for reviewing @mengxr! I have addressed your comment.

@mengxr
Copy link
Contributor

mengxr commented Feb 12, 2016

test this please

@SparkQA
Copy link

SparkQA commented Feb 12, 2016

Test build #51152 has finished for PR 10962 at commit 6581575.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr
Copy link
Contributor

mengxr commented Feb 12, 2016

Merged into master and branch-1.6. Thanks!

@asfgit asfgit closed this in b354673 Feb 12, 2016
asfgit pushed a commit that referenced this pull request Feb 12, 2016
…n error

Pyspark Params class has a method `hasParam(paramName)` which returns `True` if the class has a parameter by that name, but throws an `AttributeError` otherwise. There is not currently a way of getting a Boolean to indicate if a class has a parameter. With Spark 2.0 we could modify the existing behavior of `hasParam` or add an additional method with this functionality.

In Python:
```python
from pyspark.ml.classification import NaiveBayes
nb = NaiveBayes()
print nb.hasParam("smoothing")
print nb.hasParam("notAParam")
```
produces:
> True
> AttributeError: 'NaiveBayes' object has no attribute 'notAParam'

However, in Scala:
```scala
import org.apache.spark.ml.classification.NaiveBayes
val nb  = new NaiveBayes()
nb.hasParam("smoothing")
nb.hasParam("notAParam")
```
produces:
> true
> false

cc holdenk

Author: sethah <[email protected]>

Closes #10962 from sethah/SPARK-13047.

(cherry picked from commit b354673)
Signed-off-by: Xiangrui Meng <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants