Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding meaningful str representations to PyMC3 objects #4076

Merged
merged 7 commits into from
Sep 15, 2020

Conversation

Spaak
Copy link
Member

@Spaak Spaak commented Sep 4, 2020

This is the second PR of three as discussed in #3956 (and likely the one that should get closest attention).

This PR adds semantically meaningful implementations of __str__ to all Distribution, Model, and PyMC3Variable instances, mirroring the already existing functionality for rendering such objects as LaTeX in the context of jupyter notebooks.

Specifically:

model = pm.Model()
with model:
    mu = pm.Normal('mu', mu=0, sd=10)
    sd = pm.HalfNormal('sd', sd=10)
    datobs = pm.Normal('datobs', mu=mu, sd=sd, observed=np.random.normal(size=100))


>>> print(model)
    mu ~ Normal(mu=0.0, sigma=10.0)
    sd ~ HalfNormal(sigma=10.0)
datobs ~ Normal(mu=mu, sigma=f(sd))


>>> print(mu)
mu ~ Normal(mu=0.0, sigma=10.0)

etc.

Of special attention here is that in several places throughout the code str(var) was used to get the name of a variable, for use e.g. in a dict. I've replaced these uses with a new get_var_name function that delegates to TensorVariable.__str__. After merging this PR, future cases where a variable name is required should use get_var_name instead of str.

@twiecki @ColCarroll

@codecov
Copy link

codecov bot commented Sep 4, 2020

Codecov Report

Merging #4076 into master will increase coverage by 0.35%.
The diff coverage is 92.10%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4076      +/-   ##
==========================================
+ Coverage   88.29%   88.64%   +0.35%     
==========================================
  Files          88       88              
  Lines       13860    13881      +21     
==========================================
+ Hits        12237    12305      +68     
+ Misses       1623     1576      -47     
Impacted Files Coverage Δ
pymc3/model_graph.py 88.48% <50.00%> (ø)
pymc3/tuning/scaling.py 58.18% <50.00%> (+0.77%) ⬆️
pymc3/distributions/distribution.py 94.42% <75.00%> (+1.62%) ⬆️
pymc3/backends/base.py 87.30% <100.00%> (+0.04%) ⬆️
pymc3/backends/sqlite.py 93.54% <100.00%> (+0.03%) ⬆️
pymc3/blocking.py 69.23% <100.00%> (+0.29%) ⬆️
pymc3/distributions/posterior_predictive.py 89.29% <100.00%> (ø)
pymc3/model.py 89.21% <100.00%> (+1.28%) ⬆️
pymc3/step_methods/arraystep.py 93.66% <100.00%> (+0.04%) ⬆️
pymc3/tests/sampler_fixtures.py 96.77% <100.00%> (+0.02%) ⬆️
... and 9 more

@twiecki
Copy link
Member

twiecki commented Sep 5, 2020

Love the new print output!

Why can't we overload __str__ and continue to use str() instead of a creating a new function?

Copy link
Member

@ColCarroll ColCarroll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- as you said, touches a lot of code, so it'll be good to have it on master and collect bug reports for a while.

@twiecki, I think the problem with just using str everywhere is that this function also handles theano tensors (but I could be wrong)?

@Spaak
Copy link
Member Author

Spaak commented Sep 7, 2020

@twiecki:

Why can't we overload __str__ and continue to use str() instead of a creating a new function?

I think in principle it would be possible to keep using str everywhere, but this would mean that in many places the 'name' of the variable would include the distribution details, i.e. before this PR, str(mu) == 'mu' (the default Theano behaviour); after the PR, str(mu) == 'mu ~ Normal(mu=1.0, sigma=0.0)'. I'd say we definitely want to keep referring to variables by just their name. While we can overload __str__ (which this PR overrides), this does not pass through to str, which cannot accept any extra arguments (i.e. str has a fixed signature). It seems to me the only option here is to add the new function. (It also makes sense semantically to me, str(x) gives an informative representation, if we're specifically after a variable's name, use get_var_name(x).)

@aseyboldt aseyboldt mentioned this pull request Sep 7, 2020
@Spaak
Copy link
Member Author

Spaak commented Sep 15, 2020

Anyone interested in pulling the trigger here? ;) @twiecki @ColCarroll

@twiecki twiecki merged commit 9eb69fc into pymc-devs:master Sep 15, 2020
@twiecki
Copy link
Member

twiecki commented Sep 15, 2020

Thanks @Spaak!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants