-
-
Notifications
You must be signed in to change notification settings - Fork 108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Build all generated documentation dynamically #1235
Conversation
We've historically had our (generated) API docs checked into git because running the separate sphinx-apidoc command on Read The Docs was kind of a hack. However, a newer API reference documentation generator called AutoAPI now integrates this process into the normal docs build, so we can easily generate the API reference whenever the docs are built. In addition, since we're now building within a conda environment on RTD, with pudl fully installed, we can call the metadata_to_rst function from within the docs build script, avoiding the need to separately generate the data dictionaries and store them in source control.
Codecov Report
@@ Coverage Diff @@
## dev #1235 +/- ##
==========================================
- Coverage 81.55% 81.53% -0.02%
==========================================
Files 52 52
Lines 6379 6379
==========================================
- Hits 5202 5201 -1
- Misses 1177 1178 +1
Continue to review full report at Codecov.
|
docs/conf.py
Outdated
# Generate a data dictionary for the documentation: | ||
# Doing this here means we don't have track script output with Git: | ||
skip_names = ["datasets", "accumulated_depreciation_ferc1"] | ||
names = [name for name in sorted(RESOURCE_METADATA) if name not in skip_names] | ||
package = Package.from_resource_ids(names) | ||
# Sort fields within each resource by name: | ||
for resource in package.resources: | ||
resource.schema.fields = sorted( | ||
resource.schema.fields, key=lambda x: x.name | ||
) | ||
package.to_rst(path="data_dictionaries/pudl_db.rst") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect that there's a better way to integrate this into the workflow of the docs build. Define a function here and register it with one of the Sphinx build / plugin hooks. Rather than just having it run on import of conf.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is clear and effective. Are there drawbacks to this solution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue I foresee is that this has substantial external side-effects on import -- any time this module gets imported, the data dictionary RST file gets regenerated. Which is definitely a no-no. I know that there are several places to hook into the Sphinx build API and if we did that, then this would only happen when the build is actually getting run, and at the appropriate time. The functionality would still be defined here but its execution would be more controlled / isolated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What we want to do is analogous to this example of deploying static assets into the docs build so I'm just going to go ahead and make the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all looks good to me!
Just curious, are there any cons to not tracking our docs in git? Will it be more difficult to track down any bugs in our docs?
docs/conf.py
Outdated
# Generate a data dictionary for the documentation: | ||
# Doing this here means we don't have track script output with Git: | ||
skip_names = ["datasets", "accumulated_depreciation_ferc1"] | ||
names = [name for name in sorted(RESOURCE_METADATA) if name not in skip_names] | ||
package = Package.from_resource_ids(names) | ||
# Sort fields within each resource by name: | ||
for resource in package.resources: | ||
resource.schema.fields = sorted( | ||
resource.schema.fields, key=lambda x: x.name | ||
) | ||
package.to_rst(path="data_dictionaries/pudl_db.rst") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is clear and effective. Are there drawbacks to this solution?
We're still tracking our docs in git -- we're just not tracking the same information in more than one place anymore. If we have bugs in the docs we'll fix them in the place where the information originates, rather than accidentally changing a generated file that's also checked into source control. This will also mean that people don't need to remember to generate the docs and check in the generated files after they're done with editing code or adding a new module. Which... almost nobody does. I usually end up doing it after reviewing someone's PR. |
@@ -110,6 +129,29 @@ | |||
html_static_path = ['_static'] | |||
|
|||
|
|||
# -- Custom build operations ------------------------------------------------- | |||
def metadata_to_rst(app): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay this is way cleaner. And plus now I know how to inject behavior into Sphinx! I think we'll probably want to do a lot more exporting of metadata and other information programmatically into the docs going forward, so this is good.
It also lets us cleanup after the build is finished, rather than needing to have a line in Tox that does it.
We've historically had our programmatically generated API docs checked into git because running the separate
sphinx-apidoc
command on Read The Docs was kind of a hack. However, a newer API reference documentation generator called AutoAPI now integrates this process into the normal docs build, so we can easily generate the API reference whenever the docs are built.In addition, since we're now building within a conda environment on RTD, with the PUDL package fully installed, we can call our own
metadata_to_rst()
function from within the docs build script (docs/conf.py
), avoiding the need to separately generate the data dictionaries and store them in source control.