Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make dbt CLI cold start load up to 8% faster with lazy loading of task modules and agate #9744

Merged
merged 5 commits into from
Mar 18, 2024

Conversation

dwreeves
Copy link
Contributor

@dwreeves dwreeves commented Mar 9, 2024

The dbt CLI takes a while to load, especially before .pyc files are compiled. This PR makes things go much faster by making sure a handful of resources only get loaded when they are needed.

In isolation, this PR speeds things up by about 5%. The additional 3% requires the following change to dbt-adapters: dbt-labs/dbt-adapters#126

Lazy loading task modules

The first change was already discussed, but issue #4627 was closed without ever being fully resolved:

Since the defaults are only used if a subcommand is being called, was thinking we could improve performance by only importing the relevant dbt.task. For example:

  1. If the user uses the command dbt build we will only import dbt.task.build as build_task
  2. If the user uses the command dbt clean we will only import dbt.task.clean as clean_task
  3. If the user uses the command dbt without a subcommand we will not import any classes from dbt.task

Implementing this speeds up the CLI by about 5% on a cold start.

Use TYPE_CHECKING to import agate

When I run PYTHONPROFILEIMPORTTIME=1 dbt --help to see where some additional free speed-ups may be, agate sticks out as taking a while to load; specifically it takes up about 3.5% of the total load time for the command dbt --help.

agate appears to only be used in three contexts across dbt (correct me if I am wrong): dbt unit tests, dbt docs generate, and dbt seeds.

Most agate imports are actually just for type annotations, meaning runtimes such as dbt run --select foo that do not require agate at all will end up loading it.

Important note: Gating import agate behind if TYPE_CHECKING: also requires making this change to the dbt-adapters library as well. Once both dbt-core and dbt-adapters use if TYPE_CHECKING to import agate in those contexts, then this change should shave another 3.5% off load times, and this speed improvement will impact not just dbt --help but all dbt command invocations that don’t require agate.

Before and after

I use the following script to test speeds before and after the changes. Note that this is only testing the lazy loading of modules, not if TYPE_CHECKING: import agate for reasons stated above.

In my testing, the before and after difference is about 5%.

#!/bin/bash
# OS X - requires `brew install coreutils` for gdate

for ((i=1; i<=60; i++))
do
  find "./venv" -type f -name "*.pyc" -delete
  find "./.venv" -type f -name "*.pyc" -delete
  find "./core" -type f -name "*.pyc" -delete

  start=$(gdate +%s%3N)
  dbt --help >/dev/null
  end=$(gdate +%s%3N)

  # Ignore first 5 runs
  if [ $i -gt 5 ]; then
    echo $((end - start))
  fi
done

If we compare the before and after outputs of the PYTHONPROFILEIMPORTTIME=1 diagnostics, we can see what modules that are no longer avoided which account for the 5% speed-up:

import csv
from pprint import pprint

before = []
after = []
with open(‘imports-before.txt’) as f:
    reader = csv.reader(f, delimiter='|')
    for row in reader:
        before.append(row[-1].strip())
with open(‘imports-after.txt') as f:
    reader = csv.reader(f, delimiter='|')
    for row in reader:
        after.append(row[-1].strip())

pprint([i for i in before if i not in after])

Output:

['dbt.task',
 'dbt.parser.sql',
 'sqlparse.tokens',
 'sqlparse.utils',
 'sqlparse.sql',
 'sqlparse.exceptions',
 'sqlparse.cli',
 'sqlparse.engine.grouping',
 'sqlparse.keywords',
 'sqlparse.lexer',
 'sqlparse.engine.statement_splitter',
 'sqlparse.engine.filter_stack',
 'sqlparse.engine',
 'sqlparse.filters.others',
 'sqlparse.filters.output',
 'sqlparse.filters.tokens',
 'sqlparse.filters.reindent',
 'sqlparse.filters.right_margin',
 'sqlparse.filters.aligned_indent',
 'sqlparse.filters',
 'sqlparse.formatter',
 'sqlparse',
 'dbt.compilation',
 'dbt.task.printer',
 'dbt.task.base',
 'multiprocessing.dummy.connection',
 'multiprocessing.dummy',
 'dbt.task.runnable',
 'dbt.task.compile',
 'dbt.task.run',
 'dbt.task.snapshot',
 'dbt.task.seed',
 'daff',
 'dbt.task.test',
 'dbt.task.build',
 'dbt.task.clean',
 'dbt.task.clone',
 'dbt.links',
 'dbt.task.debug',
 'dbt.deps',
 'dbt.deps.base',
 'dbt.deps.local',
 'dbt.deps.tarball',
 'packaging',
 'packaging._structures',
 'packaging.version',
 'dbt.clients.git',
 'dbt.deps.git',
 'dbt.clients.registry',
 'dbt.deps.registry',
 'dbt.deps.resolver',
 'dbt.task.deps',
 'dbt.task.docs',
 'dbt.task.docs.generate',
 'socketserver',
 'shlex',
 'webbrowser',
 'http.server',
 'dbt.task.docs.serve',
 'dbt.contracts.results',
 'dbt.task.freshness',
 'dbt.task.init',
 'dbt.task.list',
 'dbt.task.run_operation',
 'dbt.task.retry',
 'dbt.task.show']

@dwreeves dwreeves requested a review from a team as a code owner March 9, 2024 19:04
@cla-bot cla-bot bot added the cla:yes label Mar 9, 2024
Copy link
Contributor

github-actions bot commented Mar 9, 2024

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

1 similar comment
Copy link
Contributor

github-actions bot commented Mar 9, 2024

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

Copy link

codecov bot commented Mar 9, 2024

Codecov Report

Attention: Patch coverage is 85.36585% with 6 lines in your changes are missing coverage. Please review.

Project coverage is 88.11%. Comparing base (65b366b) to head (2f48446).
Report is 1 commits behind head on main.

Files Patch % Lines
core/dbt/artifacts/schemas/run/v5/run.py 75.00% 1 Missing ⚠️
core/dbt/cli/main.py 94.44% 1 Missing ⚠️
core/dbt/context/providers.py 80.00% 1 Missing ⚠️
core/dbt/exceptions.py 80.00% 1 Missing ⚠️
core/dbt/task/run_operation.py 75.00% 1 Missing ⚠️
core/dbt/task/test.py 80.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9744      +/-   ##
==========================================
- Coverage   88.18%   88.11%   -0.08%     
==========================================
  Files         178      178              
  Lines       22480    22487       +7     
==========================================
- Hits        19825    19814      -11     
- Misses       2655     2673      +18     
Flag Coverage Δ
integration 85.58% <85.36%> (-0.08%) ⬇️
unit 61.81% <39.02%> (-0.98%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dwreeves
Copy link
Contributor Author

dwreeves commented Mar 9, 2024

It looks like this PR needs a artifact_minor_upgrade label because I add if TYPE_CHECKING: to a file in core/dbt/artifacts.

I also don't know what to do about the coverage requirement? Mypy covers the TYPE_CHECKING just fine. Is it possible to just ignore that for this PR? It doesn't make sense to hit a coverage threshold for a change like this.

@MichelleArk MichelleArk added the artifact_minor_upgrade To bypass the CI check by confirming that the change is not breaking label Mar 18, 2024
@peterallenwebb
Copy link
Contributor

@dwreeves Thank you for making dbt better! We hope to have even more improvements to start-up time coming your way soon.

@MichelleArk MichelleArk merged commit 29395ac into dbt-labs:main Mar 18, 2024
60 of 61 checks passed
@dwreeves
Copy link
Contributor Author

Thanks @MichelleArk and @peterallenwebb!

The associated dbt-adapters PR is essential for realizing the speed boost across the dbt runtimes which don't use agate. Not sure who is in charge of that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
artifact_minor_upgrade To bypass the CI check by confirming that the change is not breaking cla:yes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants