Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add distribution parameters as named positional arguments #5880

Closed
thomasaarholt opened this issue Jun 11, 2022 · 7 comments
Closed

Add distribution parameters as named positional arguments #5880

thomasaarholt opened this issue Jun 11, 2022 · 7 comments
Labels
major Include in major changes release notes section

Comments

@thomasaarholt
Copy link
Contributor

Description of your problem

PyMC sideliner here, who is finally taking a proper crack at it with 4.0. I find the distribution docstrings difficult to parse when viewing them in my editor. As an example, the Bernoulli distribution in jupyter notebook (I imagine a lot of people are using notebooks):

import pymc as pm
pm.Bernoulli()

image

And with VSCode (I am using this):
image

Two points:

  1. My main gripe is with the arguments being shown as pm.Bernoulli(name, *args, **kwargs)/(name: Unknown, *args: Unknown, **kwargs: Unknown) -> TensorVariable. It would be much nicer to actually show the parameters taken by the distribution (in this case, p).
  2. The second point is a bit more complicated: VSCode tries to show the docstring of the __init__ method rather than the class docstring*. Since Bernoulli doesn't have an explicit init method, it inherits from the first one it finds, which seems to be the __new__ method of the Distribution class. Note that if you position the cursor (|) as Bernoulli|() when invoking the documentation, you get the Bernoulli class docs and not the parent new doc. It would be nice to supply a simple docstring to the init method in order to override the confusing parent DIstribution docstring.

I am quite happy to take a crack at this, though I do recognise that there are several other issues concerned with similar things #5308, #5353, #5358)

  • class docstring vs init docstring:
class A:
    "class docstring"
    def __init__(self, a: int, b: str = "hi"):
        "init docstring"
        print(a, b)

Versions and main components

  • PyMC/PyMC3 Version: 4.0
  • Python Version: 3.9
@michaelosthege
Copy link
Member

Yeah, this is indeed very annoying.

By deleting the Distribution.__new__ docstring, VS Code shows the right docstring, but still with the uninformative signature.

Maybe one can monkeypatch autogenerated __init__ methods onto all distributions that don't have them?
The .dist() classmethods are the ones that count and __init__ is just being forwarded internally.
Creating __init__s for all distributions with basically the same signature as the .dist() classmethods would be a lot of redundancy =/

@ricardoV94
Copy link
Member

We should be using functions instead of classes

@thomasaarholt
Copy link
Contributor Author

@michaelosthege I see you're the author of the distribution.__new__ aspect. I just went digging a bit into the code, and saw issues like #5308. Did you need to use __new__ due to how pymc checks if it is in a with model: context? Or were there other reasons? (I've read up on what __new__ does, but it's unclear to me why it was needed here)

I still think init methods for the parameters is a good idea, despite the redundancy, at least until someone has a better idea. As it currently stands, it does raise the barrier for new users.

@ricardoV94 I see your bullet list in #5308 - how are you proposing using functions instead of classes?

@ricardoV94
Copy link
Member

ricardoV94 commented Jun 13, 2022

@ricardoV94 I see your bullet list in #5308 - how are you proposing using functions instead of classes?

If you follow the call stack you'll notice that we never do anything with the classes other than accessing the static property cls.rv_op and the (easily static) classmethod cls.dist. That could easily be done in a function.

@michaelosthege
Copy link
Member

I think @ricardoV94 is right that we should refactor distributions to be functions.

This is, however, a major refactoring and needs to be done right. The logic from pm.Distribution doesn't go away and will have to be refactored into some other structure (maybe a decorator?) to make sure it still applies to all distribution implementations.

Also the more complicated distributions like GPs, timeseries, ODEs will have to be considered.
It's probably a good idea to refactor many auxiliary things into functions.

Just to thinking out loud how we could get there:

  • Refactoring logic from pm.Distribution into functions
  • Implementing a def Normal() function-based alternative to figure out how it should look like / what needs to be done
  • Transition period where both function- and class-based distributions exist while they're being migrated one by one?

One of the bigger questions I have is how to deal with pm.Normal(name, mu, sigma) vs pm.Normal.dist(mu, sigma).

  • Requiring all parameters to be explicit kwargs?
  • Requiring nullable name args every time?
  • pm.Normal vs. pm.normal to mimick the underlying Aesara RVs while still providing the alternative parametrizations?

We should continue this discussion elsewehere (a Discussion?), by the way. It's beyond the scope of this issue.

@ricardoV94
Copy link
Member

Agree with this being out of context, but regarding:

One of the bigger questions I have is how to deal with pm.Normal(name, mu, sigma) vs pm.Normal.dist(mu, sigma).

I also thought about pm.normal, but I think it's not distinctive enough. Perhaps pm.normal_dist or pm.Normal_dist could do it.

@canyon289
Copy link
Member

canyon289 commented Jun 18, 2022

Sidenote: I independently ran into this same issue, and agree that a whole refactoring is a major challenge.

Perhaps if there is a way to alleviate this issue without refactoring in the meanwhile we can consider that here

image

@michaelosthege michaelosthege added the major Include in major changes release notes section label Jul 13, 2022
@pymc-devs pymc-devs locked and limited conversation to collaborators Aug 30, 2022
@ricardoV94 ricardoV94 converted this issue into discussion #6083 Aug 30, 2022

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
major Include in major changes release notes section
Projects
None yet
Development

No branches or pull requests

4 participants