Season of Docs 2021 Information
Welcome to the ideas page for the PyMC3 entry in the 2021 Season of Docs. This page aggregates information regarding PyMC3's proposal, to give candidate writers an idea of what we are looking for to improve our project's documentation this year.
The core of PyMC3's documentation consists mainly of examples and tutorials in the form of Jupyter notebooks. Most of these are of high quality and have been useful resources for the library's user base, but they were created in isolation and essentially function in isolation, grouped together by topic in an ad hoc fashion. Moreover, because they were written by different authors, often with different goals and emphases, the amount and style of supporting text (and code) varies greatly among them. The goal of this project is to integrate the existing documentation logically and effectively into a more focused, cohesive system. We hope that the resulting product will better enable users to teach themselves Bayesian computation and, in particular, how to do Bayesian computation using PyMC3.
The Divio documentation system breaks down software documentation into four distinct quadrants:
- tutorials
- how-to guides
- explanation
- reference
Due in part to how it originated, PyMC3's documentation is stronger on the task-oriented, right-hand side of the Divio diagram (how-to guides and reference) than on the learning-oriented, left-hand side (tutorials and explanation). If a user has a well-defined, one-off task to complete, chances are they can find an adequate reference for implementing it among the project's collection of notebooks. For example, if you want to fit a latent variable Gaussian process, there is a high-quality, runnable notebook that can act as a template for your own analysis. However, if you merely have a dataset that might be well served by a latent variable Gaussian process, but you don't know about Gaussian processes, you may never find that notebook, and may end up using an inappropriate type of model instead. Thus, there is an element of learning and understanding that needs to take place in order to make sound decisions about your analysis and how to execute it with PyMC3. This project aims to provide the resources for such goals.
While some original writing will be required, much of the work associated with this project will involve thoughtfully refactoring the existing materials in the documentation to better facilitate learning. We recognize that there is no single best way to approach this problem and will be responsive to innovative and creative ways of developing the material.
The PyMC3 project is currently undergoing a major technological change. To date, PyMC3 has relied on the now-defunct Theano deep learning engine for many of its computational requirements, from constructing the model graph, to automatic differentiation for gradient-based MCMC methods, to the auto-generation of optimized C code. In the past year, the PyMC core development team has taken over the development of Theano from its original developers, with the aim of ensuring its long-term viability as the computational backbone of PyMC. During this time, we have discovered that it is possible to develop linkers for converting model graphs to executable functions using alternative backends. This creates a tremendous opportunity for PyMC to take advantage of newer computational innovations as they arise, including samplers written in other probabilistic programming frameworks, such as TensorFlow Probability and NumPyro. We are continuing to implement these changes to Theano to ensure its viability well into the future. To distinguish the updated backend from the original, it has been renamed Aesara.
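For writers unfamiliar with this kind of backend, the ideas in the paragraph above (a symbolic graph, differentiation of that graph, and interchangeable linkers that turn it into a callable) can be sketched in a few lines of plain Python. This is a toy illustration only: the names `Var`, `Const`, `Add`, `Mul`, `grad`, `link_interpreter` and `link_codegen` are hypothetical and are not part of the Theano or Aesara API, whose real graphs and linkers are considerably more involved.

```python
# Toy sketch: every name here is hypothetical, not Theano/Aesara API.
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class Add:
    left: object
    right: object


@dataclass(frozen=True)
class Mul:
    left: object
    right: object


def grad(node, wrt):
    """Differentiate a graph symbolically; the result is another graph."""
    if isinstance(node, Var):
        return Const(1.0 if node == wrt else 0.0)
    if isinstance(node, Const):
        return Const(0.0)
    if isinstance(node, Add):
        return Add(grad(node.left, wrt), grad(node.right, wrt))
    if isinstance(node, Mul):  # product rule
        return Add(Mul(grad(node.left, wrt), node.right),
                   Mul(node.left, grad(node.right, wrt)))
    raise TypeError(f"unsupported node: {node!r}")


def link_interpreter(node):
    """First 'linker': walk the graph every time the function is called."""
    def evaluate(n, env):
        if isinstance(n, Var):
            return env[n.name]
        if isinstance(n, Const):
            return n.value
        if isinstance(n, Add):
            return evaluate(n.left, env) + evaluate(n.right, env)
        if isinstance(n, Mul):
            return evaluate(n.left, env) * evaluate(n.right, env)
        raise TypeError(f"unsupported node: {n!r}")
    return lambda **env: evaluate(node, env)


def link_codegen(node, inputs):
    """Second 'linker': emit Python source once, then compile it."""
    def emit(n):
        if isinstance(n, Var):
            return n.name
        if isinstance(n, Const):
            return repr(n.value)
        if isinstance(n, Add):
            return f"({emit(n.left)} + {emit(n.right)})"
        if isinstance(n, Mul):
            return f"({emit(n.left)} * {emit(n.right)})"
        raise TypeError(f"unsupported node: {n!r}")
    source = f"lambda {', '.join(inputs)}: {emit(node)}"
    return eval(source)


# Build the graph for x*x + 3*y, then link it in two different ways.
x, y = Var("x"), Var("y")
expr = Add(Mul(x, x), Mul(Const(3.0), y))

f_interp = link_interpreter(expr)
f_codegen = link_codegen(expr, ("x", "y"))
df_dx = link_codegen(grad(expr, x), ("x", "y"))

print(f_interp(x=2.0, y=5.0), f_codegen(x=2.0, y=5.0), df_dx(x=2.0, y=5.0))
```

The point of the sketch is that the graph and its gradient are built once, independently of how they are executed, so a new execution strategy only requires a new linker function.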
To accompany these changes, we would like to develop architecture documentation that can be used as a resource for PyMC3 contributors and advanced users. We envision this as a developer-oriented walk-through of the new computational backend that can be updated as Aesara and PyMC3 evolve together. As the current developer documentation consists of an API reference and a single developer guide notebook, we also view this as an opportunity to make PyMC3's developer resources more robust, with the ultimate goal of attracting and retaining more contributors, and giving all users the opportunity to better understand the underlying implementation of their favorite probabilistic programming methods.
In addition to a rethinking and expansion of our current documentation materials, there is also scope for improving the PyMC3 documentation site itself. Our current documentation resides on a relatively simple Sphinx-generated website that aggregates our various documentation sources on a banner-style menu. We would welcome a more intuitive layout, which perhaps includes a navigation sidebar for easier movement through the documentation. We are also interested in a pivotal change to the organization of the documentation that would clearly separate library docs (i.e. material describing the software itself, including tutorials and reference material) from the project docs (i.e. those having to do with governance, community, and related issues). There are also important documents that currently reside on our project wiki that should be folded into the documentation site.
If you are interested in applying for the funded writer position, please send a CV and writing sample to fonnesbeck at gmail dot com. There will also be opportunities for the community to contribute to this effort; details will be posted to our Discourse site in the near future.