Add dbt docs natively in Airflow via plugin#737
Conversation
I've been excited about this one! I haven't looked at this super in-depth yet, but how does the local filesystem option work? Could I add running |
Yes, exactly. For my own professional usage this is the option I am doing. As part of my deployment process, I I like this deployment approach because it relieves some compute pressure on the scheduler since it doesn't need to So when this feature is in, I would This approach does have a small downside as far as dbt docs are concerned (and now that I mention it I should note it in the documentation), which is that the Many users will be fine with these limitations of stale caching. If anyone wants low infra but up-to-date docs, you can
Overall no cloud infra + no pre-compile is an option, but it would require additional hacks and probably isn't something I'd encourage. |
Please bear with me as I just throw some commits at GitHub Actions in a desperate attempt to get the tests I added working. I cannot for the life of me get the tests working on my local machine. 😓 Also, I added the aforementioned caveats regarding local storage to the documentation. |
What's the status of this? 🤔 The issue we left off at was that @tatiana was having some issues with the HTTP method for retrieving files, but I was unable to replicate it or figure out what the problem was (it worked on my end...). We both were able to confirm that the local method worked, and the GCS/S3/Azure methods have not been fully integration tested. |
jbandoro
left a comment
Thanks for another great contribution @dwreeves, this is awesome!
I confirmed that the HTTP, local file, and GCS paths work with docker-compose. @tatiana mentioned to me that the HTTP path wasn't working when running airflow standalone. I also hit an issue in standalone mode: the webserver process kept exiting with signal 11 when I tried to view the dbt docs using the example docs dir you put in the docker-compose here. It did work with a local file path in airflow standalone mode, though.
Airflow recommends against using airflow standalone in production, and these issues could be related to the local system. Since this worked well in my testing with docker-compose, I'm happy to approve once the minor comment below is fixed.
Co-authored-by: Justin Bandoro <79104794+jbandoro@users.noreply.github.com>
|
Thanks for the approval. Just updated. |
|
@dwreeves how does this work when there are multiple dbt projects/doc dirs? |
|
@ms32035 I was a little worried someone would ask. It is not supported directly. It could be in theory, although I cannot think of an API+UI design for multiple dbt projects' docs that wouldn't complicate things a lot for users with just one project. I'm all ears for what user-facing interface you'd think is appropriate. Another solution, which I find reasonably elegant, is to import projects into a "docs project" so to speak. So you create a dbt project that has a packages config like this:

    packages:
      - local: ../project1
      - local: ../project2

It's possible that this pattern could/should be documented. Or we just support multiple projects' docs directly, although again, I don't know how to avoid the complications for the API. |
|
I actually do think at the very least this pattern should be documented. I'll open a PR in the next couple days. I'm aware that, due to very custom setups like passing vars into the dbt compile command, "just create a docs project" could be a nonstarter for many people. At that point though, I'd rather have a variable that turns off the blueprint for the plugin, and let users do their own thing by subclassing the plugin, and they can create their own dropdown with multiple blueprints. Most of the novel JavaScript and iframe machinery is there, after all. |
|
@dwreeves the only design idea I have is to have a list page as an entry point, where you'd have to pre-configure the folders as a comma-separated string in the Airflow conf. I did something similar here https://github.com/ms32035/airflow-multirepo-deploy/blob/f08d8a5863b3311bf4210d3dc2c835370dd0296d/multirepo-deploy-plugin/multirepo_deploy_plugin.py#L110 |
|
I would only support that solution if, when there is only one project, the dbt docs load normally without a list. So the list is just there if you have multiple projects. Making each attribute into a comma-separated list that gets zip()'d is something I briefly considered. It does help make things simple for single-project users, although it feels weird and slightly inexplicit as a data model. That's the trade-off, basically. |
|
If list length = 1 then redirect to the first element instead of rendering the table :D |
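The zip()-plus-redirect idea discussed above could be sketched roughly as follows. This is only an illustration, not code from the PR: the function names `parse_projects` and `entry_point` are hypothetical, and it assumes each setting arrives as a single comma-separated string, with missing connection IDs treated as `None`.

```python
from typing import List, Optional, Tuple

Project = Tuple[str, Optional[str]]


def parse_projects(dirs: str, conn_ids: str = "") -> List[Project]:
    """Split two comma-separated settings and pair them up by position.

    Hypothetical helper: the PR itself only supports a single docs dir.
    """
    dir_list = [d.strip() for d in dirs.split(",") if d.strip()]
    conn_list: List[Optional[str]] = (
        [c.strip() or None for c in conn_ids.split(",")] if conn_ids else []
    )
    # Pad missing connection IDs with None so zip() does not drop projects.
    conn_list += [None] * (len(dir_list) - len(conn_list))
    return list(zip(dir_list, conn_list))


def entry_point(projects: List[Project]):
    """Single project: go straight to its docs. Otherwise: show the list page."""
    if len(projects) == 1:
        return ("redirect", projects[0])
    return ("list", projects)
```

With one configured directory the user never sees a list page, which keeps the single-project experience unchanged; the cost is the positional pairing of two strings, which is the "inexplicit data model" concern raised above.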
Description
This PR adds a plugin (via the Airflow plugins entrypoint) that adds a menu item inside of Browse that renders the dbt docs:
And this is what it looks like. (This example is inside the dev docker compose):
The docs are rendered via an iframe with some additional hacks to make the page render in a user friendly way. I chose an iframe over vendoring the index.html in the templates for a few reasons, but mostly to support custom {% block __overview__ %} text. However, extracting the text from index.html and rendering it in a custom page is certainly an option too.
The dbt docs are specified in the Airflow config with the following parameters:
Note that the path can be a link to any of the following:
This is designed to work with the operators that dump the dbt docs, and the documentation changes I added make that clear.
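As a rough illustration of how a single path setting can cover local, HTTP, and cloud storage locations, the sketch below picks a backend from the URL scheme. It is not the plugin's actual implementation: the `docs_backend` function, the `SCHEME_TO_BACKEND` mapping, and the specific scheme strings (`s3`, `gs`, `wasb`) are assumptions made for the example.

```python
from urllib.parse import urlparse

# Assumed scheme-to-backend mapping, for illustration only.
SCHEME_TO_BACKEND = {
    "s3": "s3",
    "gs": "gcs",
    "wasb": "azure",
    "http": "http",
    "https": "http",
}


def docs_backend(dbt_docs_dir: str) -> str:
    """Guess which storage backend a docs path points at.

    Anything without a recognized scheme is treated as a local path.
    """
    scheme = urlparse(dbt_docs_dir).scheme.lower()
    return SCHEME_TO_BACKEND.get(scheme, "local")
```

A remote backend would presumably also need the connection ID setting to authenticate, whereas a local path would not.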
Lastly, if docs are not hooked up, a message comes up telling the user that they should set their dbt docs up:
Current limitations
dev/dags/ fails locally), so I actually have no idea whether the test suite works. I was just planning on letting GitHub Actions take a stab at it.
API Decisions
The core maintainers of the repo should provide some feedback on a few high level API decisions:
dbt_docs_dir and dbt_docs_conn_id are appropriate names. Other names could be like, dbt_docs_path, or dbt_docs_dir_conn_id, or dbt_docs_path_conn_id, etc.
Related Issue(s)
Closes #571.
Breaking Change?
This PR should not cause any breaking changes.
Checklist