Skip to content

Add optional orjson parser for faster dbt manifest loading#18

Merged
corsettigyg merged 1 commit into
mainfrom
feat/orjson-manifest-parser
Apr 10, 2026
Merged

Add optional orjson parser for faster dbt manifest loading#18
corsettigyg merged 1 commit into
mainfrom
feat/orjson-manifest-parser

Conversation

@corsettigyg

Copy link
Copy Markdown
Owner

This pull request introduces an experimental feature to speed up parsing of large dbt manifest files by optionally using the fast Rust-based orjson library instead of the standard Python json module. The change is controlled by a new configuration setting and includes robust error handling and comprehensive tests to ensure correctness and backward compatibility.

Experimental orjson parser for manifest loading:

  • Added a new configuration option enable_orjson_parser (default: False) to allow using orjson for loading manifest.json files, with fallback to the standard json module if disabled or unavailable. This can significantly improve parsing speed for large manifests. [1] [2] [3] [4]
  • Updated documentation to describe the new setting, its benefits, installation instructions, and error handling if orjson is not installed when enabled.

Dependency management:

  • Added orjson as an optional dependency in pyproject.toml and included it in test dependencies for CI coverage. [1] [2]

Testing and validation:

  • Introduced a new test module to verify correct behavior of the orjson parser, including: default setting value, error handling when orjson is missing, correctness of manifest loading with both parsers, and equivalence of parsed results.Uses orjson (a Rust-based JSON library) to parse manifest.json when the enable_orjson_parser setting is enabled, falling back to stdlib json by default. Benchmarks show ~1.9x speedup (≈47% faster) across manifest sizes from 400 KB to 30 MB.

The feature is disabled by default to remain backwards-compatible. Enable with:
AIRFLOW__COSMOS__ENABLE_ORJSON_PARSER=True
pip install 'astronomer-cosmos[orjson]'

Made-with: Cursor

Description

Related Issue(s)

Breaking Change?

Checklist

  • I have made corresponding changes to the documentation (if required)
  • I have added tests that prove my fix is effective or that my feature works

Uses orjson (a Rust-based JSON library) to parse manifest.json when the
`enable_orjson_parser` setting is enabled, falling back to stdlib json
by default. Benchmarks show ~1.9x speedup (≈47% faster) across manifest
sizes from 400 KB to 30 MB.

The feature is disabled by default to remain backwards-compatible.
Enable with:
  AIRFLOW__COSMOS__ENABLE_ORJSON_PARSER=True
  pip install 'astronomer-cosmos[orjson]'

Made-with: Cursor
@corsettigyg corsettigyg merged commit 0c275df into main Apr 10, 2026
35 of 80 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant