[Experimental feature] enable orjson parser for whole project#2552

Merged
tatiana merged 17 commits into
astronomer:mainfrom
corsettigyg:main
May 12, 2026

Conversation

@corsettigyg
Collaborator

@corsettigyg corsettigyg commented Apr 10, 2026

Description

Closed the old #2320 because the code changes had become convoluted, and reopened it here.

The idea is the same: use orjson for faster manifest parsing, but keep it only where it matters.

Related Issue(s)

Breaking Change?

There should not be one.

Checklist

  • I have made corresponding changes to the documentation (if required)
  • I have added tests that prove my fix is effective or that my feature works

Benchmark

I patched the graph generator integration test and ran 10 iterations of it for each manifest size. Results:

| Manifest | Size | Nodes | json mean | orjson mean | Speedup |
|---|---|---|---|---|---|
| sample | 0.4 MB | 28 | 0.003 s | 0.002 s | 1.44× |
| sample | 5.1 MB | 2,436 | 0.052 s | 0.040 s | 1.31× |
| sample | 10.0 MB | 4,984 | 0.101 s | 0.086 s | 1.18× |
| sample | 15.0 MB | 7,532 | 0.169 s | 0.121 s | 1.39× |
| sample | 20.0 MB | 10,080 | 0.234 s | 0.180 s | 1.30× |
| sample | 25.0 MB | 12,628 | 0.345 s | 0.239 s | 1.44× |
| sample | 30.0 MB | 15,176 | 0.425 s | 0.284 s | 1.50× |
| real-world (the manifest we use for our Cosmos runs) | 26.7 MB | 1,533 | 0.159 s | 0.078 s | 2.03× |

What is interesting is that for our real-world manifest, which contains not only nodes but also descriptions, tests, tags, etc., the improvement from orjson is larger than for the mock manifests I created. I read about it, and the explanation I can think of is that even though the real manifest has fewer graph objects, it carries much more JSON per object, which is where orjson helps the most 🥇

If necessary, I can share the benchmark I ran. I did not commit it since it is not exactly a test per se.
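
Since the benchmark script was not committed, here is a minimal sketch of how a comparison like this could be run. It is not the author's script: `make_mock_manifest` and `time_loader` are hypothetical helpers, the mock manifest shape is an assumption, and the orjson half only runs when orjson is installed.

```python
import json
import statistics
import tempfile
import time
from pathlib import Path
from typing import Any, Callable


def make_mock_manifest(node_count: int) -> dict:
    """Build a small manifest-like dict (hypothetical shape, for illustration only)."""
    return {
        "nodes": {
            f"model.demo.m{i}": {"name": f"m{i}", "tags": ["demo"], "depends_on": {"nodes": []}}
            for i in range(node_count)
        }
    }


def time_loader(path: Path, loads: Callable[[bytes], Any], iterations: int = 10) -> float:
    """Return the mean wall-clock seconds taken to read and parse `path`."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        loads(path.read_bytes())
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples)


with tempfile.TemporaryDirectory() as tmp:
    manifest_path = Path(tmp) / "manifest.json"
    manifest_path.write_text(json.dumps(make_mock_manifest(1000)))

    json_mean = time_loader(manifest_path, json.loads)
    print(f"json mean:   {json_mean:.4f} s")

    try:
        import orjson  # optional dependency
    except ImportError:
        print("orjson not installed; skipping comparison")
    else:
        orjson_mean = time_loader(manifest_path, orjson.loads)
        print(f"orjson mean: {orjson_mean:.4f} s ({json_mean / orjson_mean:.2f}x speedup)")
```

Timing the read and the parse together mirrors what a manifest load pays in practice; timing only the parse call would isolate the parser itself.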

corsettigyg and others added 2 commits April 10, 2026 20:00
Uses orjson (a Rust-based JSON library) to parse manifest.json when the
`enable_orjson_parser` setting is enabled, falling back to stdlib json
by default. Benchmarks on this PR show roughly a 1.2x to 2.0x speedup across
manifest sizes from 400 KB to 30 MB.

The feature is disabled by default to remain backwards-compatible.
Enable with:
  AIRFLOW__COSMOS__ENABLE_ORJSON_PARSER=True
  pip install 'astronomer-cosmos[orjson]'

Made-with: Cursor
Add optional orjson parser for faster dbt manifest loading
Contributor

Copilot AI left a comment

Pull request overview

Adds an experimental configuration flag to parse dbt manifest.json with orjson for faster DAG parsing, along with documentation, packaging, and tests to support/validate the feature.

Changes:

  • Introduce settings.enable_orjson_parser and document the new Airflow config/env var.
  • Add optional dependency group orjson plus test-environment dependency installation.
  • Refactor DbtGraph.load_from_dbt_manifest() to parse manifests via a new _load_manifest_from_file() helper, and add unit tests for behavior/equivalence.
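
The opt-in behaviour summarised in this list can be sketched roughly as follows. This is not the code from cosmos/dbt/graph.py: the flag is read straight from the environment rather than through Airflow's config, `parse_manifest_bytes`, `orjson_enabled`, and `MissingOrjsonError` are hypothetical names, and the error wording is illustrative.

```python
import json
import os
from typing import Any

try:
    import orjson  # optional; only needed when the flag is enabled
except ImportError:
    orjson = None


class MissingOrjsonError(RuntimeError):
    """Hypothetical error for 'flag enabled but orjson not installed'."""


def orjson_enabled() -> bool:
    # Simplification: Cosmos reads this setting through Airflow's config;
    # here we only look at the environment variable named in the PR.
    value = os.environ.get("AIRFLOW__COSMOS__ENABLE_ORJSON_PARSER", "False")
    return value.strip().lower() in {"1", "true", "t", "yes"}


def parse_manifest_bytes(raw: bytes, use_orjson: bool) -> Any:
    """Parse raw manifest bytes with orjson when requested, else stdlib json."""
    if not use_orjson:
        return json.loads(raw)
    if orjson is None:
        raise MissingOrjsonError(
            "enable_orjson_parser is set but orjson is not installed; "
            "run `pip install orjson`"
        )
    return orjson.loads(raw)


print(parse_manifest_bytes(b'{"nodes": {}}', use_orjson=orjson_enabled() and orjson is not None))
```

Failing loudly when the flag is on but the package is missing avoids silently falling back to the slower parser.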

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| cosmos/dbt/graph.py | Adds optional orjson import and _load_manifest_from_file() used by manifest-based graph loading. |
| cosmos/settings.py | Adds enable_orjson_parser setting read from Airflow config. |
| docs/reference/configs/cosmos-conf.rst | Documents the new experimental setting and installation instructions. |
| pyproject.toml | Adds orjson optional extra and installs orjson in the hatch tests environment. |
| tests/dbt/test_orjson_parser.py | New unit tests for default behavior, missing dependency errors, and parser equivalence. |

Comment thread cosmos/dbt/graph.py Outdated
Comment thread cosmos/dbt/graph.py
Comment thread docs/reference/configs/cosmos-conf.rst
corsettigyg added a commit to corsettigyg/astronomer-cosmos that referenced this pull request Apr 10, 2026
- Use open("rb")/open("r") instead of read_bytes() in _load_manifest_from_file
  so the method works correctly with ObjectStoragePath (remote object storage),
  which may not implement Path.read_bytes()
- Restore the `or {}` fallback in load_from_dbt_manifest to guard against
  manifest.json files containing JSON null, preserving the previous behavior

Made-with: Cursor
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
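
The two points in the commit message above could look something like this sketch. It assumes only that the path object exposes `open(mode)`, as `pathlib.Path` does; `load_manifest` is a hypothetical name, and stdlib `json.loads` stands in for `orjson.loads` on the binary branch so the snippet runs without the optional dependency.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def load_manifest(path: Any, binary: bool) -> Any:
    """Parse a manifest through path.open() instead of path.read_bytes()."""
    if binary:
        # Binary branch: an orjson-based loader would call orjson.loads(fp.read()).
        with path.open("rb") as fp:
            return json.loads(fp.read()) or {}
    with path.open("r") as fp:
        # `or {}` keeps a JSON `null` manifest behaving like an empty one.
        return json.load(fp) or {}


with tempfile.TemporaryDirectory() as tmp:
    manifest = Path(tmp) / "manifest.json"
    manifest.write_text('{"nodes": {"model.demo.a": {}}}')
    # Both branches should produce the same parsed structure.
    assert load_manifest(manifest, binary=True) == load_manifest(manifest, binary=False)

    manifest.write_text("null")
    print(load_manifest(manifest, binary=False))
```
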
Copilot AI review requested due to automatic review settings April 10, 2026 18:48
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.


Comment thread cosmos/dbt/graph.py Outdated
Comment thread docs/reference/configs/cosmos-conf.rst
Comment thread tests/dbt/test_orjson_parser.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 10, 2026 18:54
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.


Comment thread cosmos/dbt/graph.py
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.


Comment thread cosmos/dbt/graph.py
…atibility

The new `_load_manifest_from_file` raised `CosmosLoadDbtException` for any
non-dict root, breaking the pre-existing `test_load_from_dbt_manifest_handles_null_manifest`
which expects a JSON `null` manifest to be treated as an empty dict (the old
`json.load(fp) or {}` behavior).

Treat `None` as `{}` while still rejecting genuinely invalid roots (lists,
strings, numbers). Also tightens the null-root test and adds coverage for
invalid non-null, non-dict roots.

Made-with: Cursor
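
A sketch of the root handling this commit describes, assuming it runs on the already-parsed value. `normalize_manifest_root` is a hypothetical name, and `ValueError` stands in for `CosmosLoadDbtException` so the snippet stays self-contained.

```python
from typing import Any, Dict


def normalize_manifest_root(parsed: Any) -> Dict[str, Any]:
    """Treat a JSON null root as an empty manifest and reject other non-dict roots."""
    if parsed is None:
        return {}
    if not isinstance(parsed, dict):
        raise ValueError(
            f"manifest root must be a JSON object, got {type(parsed).__name__}"
        )
    return parsed


print(normalize_manifest_root(None))
for bad_root in ([], "text", 42):
    try:
        normalize_manifest_root(bad_root)
    except ValueError as exc:
        print(exc)
```

Unlike a bare `or {}`, this check does not silently turn falsy non-dict roots such as `[]` or `0` into an empty dict.
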
@corsettigyg
Collaborator Author

Fixed the CI/CD issues.

@tatiana tatiana self-assigned this Apr 27, 2026
@codecov

codecov Bot commented Apr 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.02%. Comparing base (5b2a8ce) to head (1040847).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2552   +/-   ##
=======================================
  Coverage   98.02%   98.02%           
=======================================
  Files         105      105           
  Lines        7829     7844   +15     
=======================================
+ Hits         7674     7689   +15     
  Misses        155      155           


Collaborator

@tatiana tatiana left a comment

@corsettigyg This looks great, thank you very much, and I'm sorry for all the re-work on the original PR. I left two minor comments - if you could address them, it would be great. You have superpowers to merge these comments, and the checks pass.

Comment thread pyproject.toml Outdated
Comment thread docs/reference/configs/cosmos-conf.rst Outdated
Co-authored-by: Tatiana Al-Chueyr <tatiana.alchueyr@gmail.com>
Copilot AI review requested due to automatic review settings May 5, 2026 15:08
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.


Comment thread cosmos/dbt/graph.py Outdated
Comment thread docs/reference/configs/cosmos-conf.rst
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 5, 2026 15:13
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.


Comment thread cosmos/dbt/graph.py
Comment thread pyproject.toml
Comment thread docs/reference/configs/cosmos-conf.rst
Comment thread tests/dbt/test_orjson_parser.py
The test assertions expected 'astronomer-cosmos[orjson]' in the error
message, but the implementation raises an error suggesting
'pip install orjson'. Since no orjson extra is defined in pyproject.toml,
update both assertions to match the actual installation guidance.

Co-authored-by: Cursor <cursoragent@cursor.com>
@corsettigyg
Collaborator Author

@tatiana I agree with the comments :) It should all be good now 💪

Copilot AI review requested due to automatic review settings May 12, 2026 12:27
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comment thread docs/reference/configs/cosmos-conf.rst
@tatiana tatiana merged commit 63714cd into astronomer:main May 12, 2026
129 checks passed
@tatiana tatiana added this to the Cosmos 1.15.0 milestone May 28, 2026
3 participants