Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

return_inferencedata option for pm.sample #3911

Merged

Conversation

michaelosthege
Copy link
Member

@michaelosthege michaelosthege commented May 4, 2020

Most PyMC3 modeling workflows (should) at some point use ArviZ to save and load traces as InferenceData objects.

With a sample(…, return_inferencedata=True/False) option, the conversion may happen as early as possible, encouraging the user to use ArviZ from the start.

  • wait for next ArviZ release to happen

  • update PR such that

    • pin to latest ArviZ version
    • take out the NotImplementedError
  • there are no breaking changes to the user-facing API - but a FutureWarning is added, allowing us to default to return-inferencedata=True in v4.0.0 if it turns out to be a good idea

  • in use inference data in end of sampling report #3883, Oriol made some optimizations to avoid duplicate work of conversion. This PR does the same by doing the conversion already in sample() and passing the idata for convergence checks.

  • are the changes—especially new features—covered by tests and docstrings? -- test was added, other tests remain untouched, so they assert that the user-facing API did not break

  • consider adding/updating relevant example notebooks - I'll do that on a future PR

  • right before it's ready to merge, mention the PR in the RELEASE-NOTES.md

+ convert to InferenceData and save metadata to it already in sample()
+ pass idata instead of trace to convergence check, to avoid duplicate work
+ directly use arviz diagnostics instead of pymc3 aliases
@michaelosthege
Copy link
Member Author

This is the first stage - converting to InferenceData earlier.

I'm going to wait for the tests before adding the return_inferencedata option.

@codecov
Copy link

codecov bot commented May 4, 2020

Codecov Report

Merging #3911 into master will decrease coverage by 2.90%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3911      +/-   ##
==========================================
- Coverage   86.40%   83.49%   -2.91%     
==========================================
  Files          86      103      +17     
  Lines       13728    14192     +464     
==========================================
- Hits        11861    11849      -12     
- Misses       1867     2343     +476     
Impacted Files Coverage Δ
pymc3/backends/ndarray.py 92.63% <ø> (ø)
pymc3/backends/text.py 97.16% <ø> (ø)
pymc3/backends/report.py 95.31% <100.00%> (+2.23%) ⬆️
pymc3/sampling.py 86.47% <100.00%> (+0.27%) ⬆️
pymc3/gp/util.py 50.74% <0.00%> (-29.26%) ⬇️
pymc3/distributions/dist_math.py 91.50% <0.00%> (-0.20%) ⬇️
pymc3/tuning/starting.py 80.76% <0.00%> (-0.15%) ⬇️
pymc3/distributions/continuous.py 80.01% <0.00%> (ø)
pymc3/examples/gelman_bioassay.py 0.00% <0.00%> (ø)
pymc3/examples/factor_potential.py 0.00% <0.00%> (ø)
... and 16 more

+ set to None
+ defaults to False
@michaelosthege
Copy link
Member Author

Tests are green.
Now pushing the addition of the return_inferencedata kwarg. The tests should remain green...

Replaced "<varname>: <type>" with "<varname> : <type>" per numpy guidelines.

Fix spelling typo.
Copy link
Contributor

@rpgoldman rpgoldman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, pending tests.

Probably needs some additional tests that exercise returning InferenceData.

pymc3/sampling.py Outdated Show resolved Hide resolved
pymc3/sampling.py Outdated Show resolved Hide resolved
@michaelosthege
Copy link
Member Author

I've added a test for the new kwarg.

As .sample( appears another 78 times in the tests, I'd propose to update those tests with the future PR that changes the default behavior.

So this PR is ready for review - I'll update the release notes when I address your upcoming feedback.

@michaelosthege michaelosthege changed the title [WIP] return_inferencedata option for pm.sample return_inferencedata option for pm.sample May 4, 2020
@michaelosthege
Copy link
Member Author

You'll notice a NotImplementedError when return_inferencedata==True and discard_tuned_samples==False.

I'm in contact with Oriol to get arviz.from_pymc3 support for warmup draws. If this can be arranged in an ArviZ release in the near future, we can use that and pin that ArviZ version for the 3.9 release?

Copy link
Contributor

@AlexAndorra AlexAndorra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Michael, this looks good to me 👌
I'll merge tomorrow morning, in case @rpgoldman or @ColCarroll request some last-minute changes.

pymc3/sampling.py Outdated Show resolved Hide resolved
pymc3/sampling.py Outdated Show resolved Hide resolved
pymc3/sampling.py Outdated Show resolved Hide resolved
pymc3/sampling.py Outdated Show resolved Hide resolved
@AlexAndorra
Copy link
Contributor

@michaelosthege, as discussed, @OriolAbril's warmup PR was merged on ArviZ repo, and the next ArviZ release will be 0.8.0, so I think we can reactivate work on this PR and address the comments we made above 👌

@michaelosthege
Copy link
Member Author

Yes, we can wait for the ArviZ release.
I'll address the feedback when the release is out.

@fonnesbeck
Copy link
Member

We should probably either convert one of the example notebooks to using this throughout, or create a new one. The api_quickstart is probably the place to put it.

+ more direct use of ArviZ
+ some wording things
@michaelosthege
Copy link
Member Author

@AlexAndorra and @OriolAbril thank you for the feedback on the notebook! they were really straightforward to apply.

Copy link
Contributor

@AlexAndorra AlexAndorra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @michaelosthege ! There was a small typo left, and I left a question about how to handle warmups 😉 Going to review the NB now.

RELEASE-NOTES.md Outdated Show resolved Hide resolved
pymc3/sampling.py Outdated Show resolved Hide resolved
@review-notebook-app
Copy link

review-notebook-app bot commented May 26, 2020

View / edit / reply to this conversation on ReviewNB

AlexAndorra commented on 2020-05-26T10:37:17Z
----------------------------------------------------------------

Can you set tune to 1500 or 2000? As the default is now 1000, not sure it's a good idea to nudge users towards using less than default, especially as lots of them already don't pay a lot of attention to tuning samples. Plus, the sampler is complaining because not enough tuning here ;)


@review-notebook-app
Copy link

review-notebook-app bot commented May 26, 2020

View / edit / reply to this conversation on ReviewNB

AlexAndorra commented on 2020-05-26T10:37:18Z
----------------------------------------------------------------

Typo: "With discard_tuned_samples=False", not True

Also, maybe put this line below the next cell, otherwise it makes it seems like you're showing the place where tuning samples are kept in ID.


@review-notebook-app
Copy link

review-notebook-app bot commented May 26, 2020

View / edit / reply to this conversation on ReviewNB

AlexAndorra commented on 2020-05-26T10:37:18Z
----------------------------------------------------------------

You have to set the kwarg r_hat=True


@review-notebook-app
Copy link

review-notebook-app bot commented May 26, 2020

View / edit / reply to this conversation on ReviewNB

AlexAndorra commented on 2020-05-26T10:37:19Z
----------------------------------------------------------------

To use the Data container through the end you can set the new values like this, in the model context (see this tutorial for more):

with model:
    pm.set_data({'x_obs': [-1, 0, 1.0], "y_obs": [0, 0, 0]})
    post_pred = pm.sample_posterior_predictive(idata.posterior)

Let's also use the default number of samples, as earlier.


@AlexAndorra
Copy link
Contributor

AlexAndorra commented May 26, 2020

@michaelosthege , finished reviewing the NB -- just a few tweaks to add / typos to fix and we'll be good to go 👌

@michaelosthege
Copy link
Member Author

The pipelines fails because of arviz-devs/arviz#1210

I could make a workaround for the test case, but I also think having draws=0 is a valid use case..

Copy link
Contributor

@AlexAndorra AlexAndorra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The NB is a marvel now, thanks a lot for your patience and perseverance on this @michaelosthege !
Waiting for ArviZ 0.8.3 to be released, and I'll merge if tests pass then -- I really, really hope no other edge case will pop up this time 😅 🤞

@AlexAndorra
Copy link
Contributor

Problems seem to be fixed on ArviZ master, but before releasing 0.8.3, we need to make sure this really works. To do that, the best would be that Travis tests against ArviZ master, not only against the latest release -- @canyon289 do you know if it's possible and how easy it would be?
Also pinging @twiecki and @ColCarroll, who could have valuable input.

@OriolAbril
Copy link
Member

Given that @michaelosthege has successfully run the tests locally, I'll merge tomorrow to make release 0.8.3 and hope for the best.

@AlexAndorra
Copy link
Contributor

@OriolAbril was emboldened by your successful local tests and released ArviZ 0.8.3 @michaelosthege, so you can push your last changes and I'll merge if tests pass 🤞

@OriolAbril
Copy link
Member

🎉

Copy link
Contributor

@AlexAndorra AlexAndorra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've done it, well done guys 👏 Quite a team 💪

@AlexAndorra AlexAndorra merged commit aafa00b into pymc-devs:master May 28, 2020
@michaelosthege michaelosthege deleted the return-inferencedata-option branch May 28, 2020 18:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants