Add support for InvocationMode.DBT_RUNNER for local execution mode#836
Add support for InvocationMode.DBT_RUNNER for local execution mode#836jbandoro wants to merge 21 commits into
InvocationMode.DBT_RUNNER for local execution mode#836Conversation
✅ Deploy Preview for sunny-pastelito-5ecb04 canceled.
|
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #836 +/- ##
==========================================
- Coverage 94.72% 94.70% -0.02%
==========================================
Files 56 56
Lines 2520 2589 +69
==========================================
+ Hits 2387 2452 +65
- Misses 133 137 +4 ☔ View full report in Codecov by Sentry. |
Invocation.DBT_RUNNER for local and virtualenv execution modes
Invocation.DBT_RUNNER for local and virtualenv execution modesInvocationMode.DBT_RUNNER for local and virtualenv execution modes
jlaneve
left a comment
There was a problem hiding this comment.
This is looking great! Will try it out locally over the weekend.
One thing I haven't thought a ton about is what config we should pass the performance tests. IMO the perf tests should cover the "best case" scenario (appropriately tune Cosmos for performance) so that we're always pushing the boundary vs the default. Thoughts on including a change in this PR to the perf tests to use this new method?
One other thought: do you think it's worth doing any auto-discovery to infer which invocation method is used? i.e. if you don't explicitly specify one, should we:
- try to import the dbt runner, if it works, great - we can use the more performant method
- if it doesn't work, no problem, we default to subprocess
InvocationMode.DBT_RUNNER for local and virtualenv execution modesInvocationMode.DBT_RUNNER for local execution mode
Description
This PR adds
dbtRunnerprogrammatic invocation forExecutionMode.LOCAL. I decided to not make a new execution mode for each (e.g.ExecutionMode.LOCAL_DBT_RUNNER) and all of the child operators but instead added an additional configExecutionConfig.invocation_modewhereInvocationMode.DBT_RUNNERcould be specified. This is so that users who are already using local execution mode could use dbt runner and see performance improvements.With the
dbtRunnerResultit makes it easy to know whether the dbt run was successful and logs do not need to be parsed but are still logged in the operator:Performance Testing
After #827 was added, I modified it slightly to use postgres adapter instead of sqlite because the latest dbt-core support for sqlite is 1.4 when programmatic invocation requires >=1.5.0. I got the following results comparing subprocess to dbt runner for 10 models:
InvocationMode.SUBPROCESS:Ran 10 models in 23.77661895751953 seconds NUM_MODELS=10 TIME=23.77661895751953InvocationMode.DBT_RUNNER:Ran 10 models in 8.390100002288818 seconds NUM_MODELS=10 TIME=8.390100002288818So using
InvocationMode.DBT_RUNNERis almost 3x faster, and can speed up dag runs if there are a lot of models that execute relatively quickly since there seems to be a 1-2s speed up per task.One thing I found while working on this is that a manifest is stored in the result if you parse a project with the runner, and can be reused in subsequent commands to avoid reparsing. This could be a useful way for caching the manifest if we use dbt runner for dbt ls parsing and could speed up the initial render as well.
I thought at first it would be easy to have this also work for virtualenv execution, since I at first thought the entire
executemethod was run in the virtualenv, which is not the case since the virtualenv operator creates a virtualenv and then passes the executable path to a subprocess. It may be possible to have this work for virtualenv and would be better suited for a follow-up PR.Related Issue(s)
closes #717
Breaking Change?
None
Checklist