Add optional orjson parser for faster dbt manifest loading#18
Merged
Conversation
Uses orjson (a Rust-based JSON library) to parse manifest.json when the `enable_orjson_parser` setting is enabled, falling back to stdlib json by default. Benchmarks show ~1.9x speedup (≈47% faster) across manifest sizes from 400 KB to 30 MB. The feature is disabled by default to remain backwards-compatible. Enable with: AIRFLOW__COSMOS__ENABLE_ORJSON_PARSER=True pip install 'astronomer-cosmos[orjson]' Made-with: Cursor
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This pull request introduces an experimental feature to speed up parsing of large
dbtmanifest files by optionally using the fast Rust-basedorjsonlibrary instead of the standard Pythonjsonmodule. The change is controlled by a new configuration setting and includes robust error handling and comprehensive tests to ensure correctness and backward compatibility.Experimental orjson parser for manifest loading:
enable_orjson_parser(default: False) to allow usingorjsonfor loadingmanifest.jsonfiles, with fallback to the standardjsonmodule if disabled or unavailable. This can significantly improve parsing speed for large manifests. [1] [2] [3] [4]orjsonis not installed when enabled.Dependency management:
orjsonas an optional dependency inpyproject.tomland included it in test dependencies for CI coverage. [1] [2]Testing and validation:
orjsonis missing, correctness of manifest loading with both parsers, and equivalence of parsed results.Uses orjson (a Rust-based JSON library) to parse manifest.json when theenable_orjson_parsersetting is enabled, falling back to stdlib json by default. Benchmarks show ~1.9x speedup (≈47% faster) across manifest sizes from 400 KB to 30 MB.The feature is disabled by default to remain backwards-compatible. Enable with:
AIRFLOW__COSMOS__ENABLE_ORJSON_PARSER=True
pip install 'astronomer-cosmos[orjson]'
Made-with: Cursor
Description
Related Issue(s)
Breaking Change?
Checklist