Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove manifest from catalog and connection method signatures #9242

Conversation

MichelleArk
Copy link
Contributor

@MichelleArk MichelleArk commented Dec 7, 2023

resolves #9218
resolves #9217

Problem

  1. Setting the relations cache requires an entire manifest currently, even though just a sliver of functionality is necessary for the task.
  2. An entire manifest object is passed from core to adapters for catalog methods, when all that is needed is an iterable of ResultConfig protocols to create relations from, and a set of used schemas.

Solution

Using up prior PRs to close these out on a clean PR: #9212, #9213

  1. Pre-compute the manifest nodes to populate the cache for - passing in an iterable of RelationConfigs to the adapter instead of the entire manifest.
  2. Refactor the generate task to pre-compute the list of catalogable relation configs + used schemas and pass those + plumb them through adapter methods instead of passing an overly broad manifest object through core -> adapters.

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX
  • This PR includes type annotations for new and modified functions

@@ -406,11 +406,21 @@ def populate_adapter_cache(
if not self.args.populate_cache:
return

if self.manifest is None:
raise DbtInternalError("manifest was None in populate_adapter_cache")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

carrying over prior conversation: #9213 (comment)

node
for node in self.manifest.nodes.values()
if (node.is_relational and not node.is_ephemeral_model and not node.is_external_node)
]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

carrying over prior conversatin: #9213 (comment)

Copy link

codecov bot commented Dec 7, 2023

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (f1c2f06) 86.59% compared to head (7dc70a2) 86.59%.

Files Patch % Lines
core/dbt/task/runnable.py 80.00% 1 Missing ⚠️
Additional details and impacted files
@@                           Coverage Diff                           @@
##           feature/decouple-adapters-from-core    #9242      +/-   ##
=======================================================================
- Coverage                                86.59%   86.59%   -0.01%     
=======================================================================
  Files                                      219      219              
  Lines                                    26888    26893       +5     
=======================================================================
+ Hits                                     23284    23288       +4     
- Misses                                    3604     3605       +1     
Flag Coverage Δ
integration 83.46% <96.29%> (-0.01%) ⬇️
unit 65.28% <55.55%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

) -> agate.Table:
kwargs = {"information_schema": information_schema, "schemas": schemas}
table = self.execute_macro(
GET_CATALOG_MACRO_NAME,
kwargs=kwargs,
# pass in the full manifest so we get any local project
# overrides
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

carrying over prior conversation that explains this removal: #9212 (comment)

@@ -1160,16 +1157,16 @@ def _get_one_catalog_by_relations(
table = self.execute_macro(
GET_CATALOG_RELATIONS_MACRO_NAME,
kwargs=kwargs,
# pass in the full manifest, so we get any local project
# overrides
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MichelleArk MichelleArk marked this pull request as ready for review December 7, 2023 01:46
@MichelleArk MichelleArk requested review from a team as code owners December 7, 2023 01:46
@MichelleArk MichelleArk requested review from ChenyuLInx and colin-rogers-dbt and removed request for ChenyuLInx December 7, 2023 01:46
@MichelleArk MichelleArk merged commit 26ddaaf into feature/decouple-adapters-from-core Dec 7, 2023
51 checks passed
@MichelleArk MichelleArk deleted the remove-manifest-from-catalog-and-connection-methods branch December 7, 2023 02:05
colin-rogers-dbt added a commit that referenced this pull request Dec 22, 2023
* remove dbt.contracts.connection imports from adapter module

* Move events to common (#8676)

* Move events to common

* More Type Annotations (#8536)

* Extend use of type annotations in the events module.

* Add return type of None to more __init__ definitions.

* Still more type annotations adding -> None to __init__

* Tweak per review

* Allow adapters to include python package logging in dbt logs (#8643)

* add set_package_log_level functionality

* set package handler

* set package handler

* add logging about stting up logging

* test event log handler

* add event log handler

* add event log level

* rename package and add unit tests

* revert logfile config change

* cleanup and add code comments

* add changie

* swap function for dict

* add additional unit tests

* fix unit test

* update README and protos

* fix formatting

* update precommit

---------

Co-authored-by: Peter Webb <[email protected]>

* fix import

* move types_pb2.py from events to common/events

* move agate_helper into common

* Add utils module (#8910)

* moving types_pb2.py to common/events

* split out utils into core/common/adapters

* add changie

* remove usage of dbt.config.PartialProject from dbt/adapters (#8909)

* remove usage of dbt.config.PartialProject from dbt/adapters

* add changie

---------

Co-authored-by: Colin <[email protected]>

* move agate_helper unit tests under tests/unit/common

* move agate_helper into common (#8911)

* move agate_helper into common

* add changie

---------

Co-authored-by: Colin <[email protected]>

* remove dbt.flags.MP_CONTEXT usage in dbt/adapters (#8931)

* remove dbt.flags.LOG_CACHE_EVENTS usage in dbt/adapters (#8933)

* Refactor Base Exceptions (#8989)

* moving types_pb2.py to common/events

* Refactor Base Exceptions

* update make_log_dir_if_missing to handle str

* move remaining adapters exception imports to common/adapters
---------

Co-authored-by: Michelle Ark <[email protected]>

* Remove usage of dbt.deprecations in dbt/adapters, enable core & adapter-specific (#9051)

* Decouple adapter constraints from core (#9054)

* Move constraints to dbt.common

* Move constraints to contracts folder, per review

* Add a changelog entry.

* move include/global_project to adapters (#8930)

* remove adapter.get_compiler (#9134)

* Move adapter logger to adapters (#9165)

* moving types_pb2.py to common/events

* Move AdapterLogger to adapter folder

* add changie

* delete accidentally merged types_pb2.py

* Move the semver package to common and alter references. (#9166)

* Move the semver package to common and alter references.

* Alter leftover references to dbt.semver, this time using from syntax.

---------

Co-authored-by: Mila Page <[email protected]>

* Refactor EventManager setup and interaction (#9180)

* moving types_pb2.py to common/events

* move event manager setup back to core, remove ref to global EVENT_MANAGER and clean up event manager functions

* move invocation_id from events to first class common concept

* move lowercase utils to common

* move lowercase utils to common

* ref CAPTURE_STREAM through method

* add changie

* first pass: adapter migration script (#9160)

* Decouple macro generator from adapters (#9149)

* Remove usage of dbt.contracts.relation in dbt/adapters (#9207)

* Remove ResultNode usage from connections (#9211)

* Add RelationConfig Protocol for use in Relation.create_from (#9210)

* move relation contract to dbt.adapters

* changelog entry

* first pass: clean up relation.create_from

* type ignores

* type ignore

* changelog entry

* update RelationConfig variable names

* Merge main into feature/decouple-adapters-from-core (#9240)

* moving types_pb2.py to common/events

* Restore warning on unpinned git packages (#9157)

* Support --empty flag for schema-only dry runs (#8971)

* Fix ensuring we produce valid jsonschema artifacts for manifest, catalog, sources, and run-results (#9155)

* Drop `all_refs=True` from jsonschema-ization build process

Passing `all_refs=True` makes it so that Everything is a ref, even
the top level schema. In jsonschema land, this essentially makes the
produced artifact not a full schema, but a fractal object to be included
in a schema. Thus when `$id` is passed in, jsonschema tools blow up
because `$id` is for identifying a schema, which we explicitly weren't
creating. The alternative was to drop the inclusion of `$id`. Howver, we're
intending to create a schema, and having an `$id` is recommended best
practice. Additionally since we were intending to create a schema,
not a fractal, it seemed best to create to full schema.

* Explicity produce jsonschemas using DRAFT_2020_12 dialect

Previously were were implicitly using the `DRAFT_2020_12` dialect through
mashumaro. It felt wise to begin explicitly specifying this. First, it
is closest in available mashumaro provided dialects to what we produced
pre 1.7. Secondly, if mashumaro changes its default for whatever reason
(say a new dialect is added, and mashumaro moves to that), we don't want
to automatically inherit that.

* Bump manifest version to v12

Core 1.7 released with manifest v11, and we don't want to be overriding
that with 1.8. It'd be weird for 1.7 and 1.8 to both have v11 manifests,
but for them to be different, right?

* Begin including schema dialect specification in produced jsonschema

In jsonschema's documentation they state
> It's not always easy to tell which draft a JSON Schema is using.
> You can use the $schema keyword to declare which version of the JSON Schema specification the schema is written to.
> It's generally good practice to include it, though it is not required.

and

> For brevity, the $schema keyword isn't included in most of the examples in this book, but it should always be used in the real world.

Basically, to know how to parse a schema, it's important to include what
schema dialect is being used for the schema specification. The change in
this commit ensures we include that information.

* Create manifest v12 jsonschema specification

* Add change documentation for jsonschema schema production fix

* Bump run-results version to v6

* Generate new v6 run-results jsonschema

* Regenerate catalog v1 and sources v3 with fixed jsonschema production

* Update tests to handle bumped versions of manifest and run-results

---------

Co-authored-by: Jeremy Cohen <[email protected]>
Co-authored-by: Michelle Ark <[email protected]>
Co-authored-by: Quigley Malcolm <[email protected]>

* Move BaseConfig to Common (#9224)

* moving types_pb2.py to common/events

* move BaseConfig and assorted dependencies to common

* move ShowBehavior and OnConfigurationChange to common

* add changie

* Remove manifest from catalog and connection method signatures (#9242)

* Add MacroResolverProtocol, remove lazy loading of manifest in adapter.execute_macro (#9243)

* remove manifest from adapter.execute_macro, replace with MacroResolver + remove lazy loading

* rename to MacroResolverProtocol

* pass MacroResolverProtcol in adapter.calculate_freshness_from_metadata

* changelog entry

* fix adapter.calculate_freshness call

* pass context to MacroQueryStringSetter (#9248)

* moving types_pb2.py to common/events

* remove manifest from adapter.execute_macro, replace with MacroResolver + remove lazy loading

* rename to MacroResolverProtocol

* pass MacroResolverProtcol in adapter.calculate_freshness_from_metadata

* changelog entry

* fix adapter.calculate_freshness call

* pass context to MacroQueryStringSetter

* changelog entry

---------

Co-authored-by: Colin <[email protected]>

* add macro_context_generator on adapter (#9251)

* moving types_pb2.py to common/events

* remove manifest from adapter.execute_macro, replace with MacroResolver + remove lazy loading

* rename to MacroResolverProtocol

* pass MacroResolverProtcol in adapter.calculate_freshness_from_metadata

* changelog entry

* fix adapter.calculate_freshness call

* add macro_context_generator on adapter

* fix adapter test setup

* changelog entry

* Update parser to support conversion metrics (#9173)

* added ConversionTypeParams classes

* updated parser for ConversionTypeParams

* added step to populate input_measure for conversion metrics

* version bump on DSI

* comment back manifest generating line

* updated v12 schemas

* added tests

* added changelog

* Add typing for macro_context_generator, fix query_header_context

---------

Co-authored-by: Colin <[email protected]>
Co-authored-by: William Deng <[email protected]>

* Pass mp_context to adapter factory (#9275)

* moving types_pb2.py to common/events

* require core to pass mp_context to adapter factory

* add changie

* fix SpawnContext annotation

* Fix include for decoupling (#9286)

* moving types_pb2.py to common/events

* fix include path in MANIFEST.in

* Fix include for decoupling (#9288)

* moving types_pb2.py to common/events

* fix include path in MANIFEST.in

* add index.html to in MANIFEST.in

* move system client to common (#9294)

* moving types_pb2.py to common/events

* move system.py to common

* add changie update README

* remove dbt.utils from semver.py

* remove aliasing connection_exception_retry

* Update materialized views to use RelationConfigs and remove refs to dbt.utils (#9291)

* moving types_pb2.py to common/events

* add AdapterRuntimeConfig protocol and clean up dbt-postgress core imports

* add changie

* remove AdapterRuntimeConfig

* update changelog

* Add config field to RelationConfig (#9300)

* moving types_pb2.py to common/events

* add config field to RelationConfig

* merge main into feature/decouple-adapters-from-core (#9305)

* moving types_pb2.py to common/events

* Update parser to support conversion metrics (#9173)

* added ConversionTypeParams classes

* updated parser for ConversionTypeParams

* added step to populate input_measure for conversion metrics

* version bump on DSI

* comment back manifest generating line

* updated v12 schemas

* added tests

* added changelog

* Remove `--dry-run` flag from `dbt deps` (#9169)

* Rm --dry-run flag for dbt deps

* Add changelog entry

* Update test

* PR feedback

* adding clean_up methods to basic and unique_id tests (#9195)

* init attempt of adding clean_up methods to basic and unique_id tests

* swapping cleanup method drop of test_schema to unique_schema to test breakage on docs_generate test

* moving the clean_up method down into class BaseDocsGenerate

* remove drop relation for unique_schema

* manually define alternate_schema for clean_up as not being seen as part of project_config

* add changelog

* remove unneeded changelog

* uncomment line that generates new manifest and delete manifest our changes created

* make sure the manifest test is deleted and readd older version of manifest.json to appease test

* manually revert file to previous commit

* Revert "manually revert file to previous commit"

This reverts commit a755419.

---------

Co-authored-by: William Deng <[email protected]>
Co-authored-by: Jeremy Cohen <[email protected]>
Co-authored-by: Matthew McKnight <[email protected]>

* resolve merge conflict on unparsed.py (#9309)

* moving types_pb2.py to common/events

* Update parser to support conversion metrics (#9173)

* added ConversionTypeParams classes

* updated parser for ConversionTypeParams

* added step to populate input_measure for conversion metrics

* version bump on DSI

* comment back manifest generating line

* updated v12 schemas

* added tests

* added changelog

* Remove `--dry-run` flag from `dbt deps` (#9169)

* Rm --dry-run flag for dbt deps

* Add changelog entry

* Update test

* PR feedback

* adding clean_up methods to basic and unique_id tests (#9195)

* init attempt of adding clean_up methods to basic and unique_id tests

* swapping cleanup method drop of test_schema to unique_schema to test breakage on docs_generate test

* moving the clean_up method down into class BaseDocsGenerate

* remove drop relation for unique_schema

* manually define alternate_schema for clean_up as not being seen as part of project_config

* add changelog

* remove unneeded changelog

* uncomment line that generates new manifest and delete manifest our changes created

* make sure the manifest test is deleted and readd older version of manifest.json to appease test

* manually revert file to previous commit

* Revert "manually revert file to previous commit"

This reverts commit a755419.

---------

Co-authored-by: William Deng <[email protected]>
Co-authored-by: Jeremy Cohen <[email protected]>
Co-authored-by: Matthew McKnight <[email protected]>

* Resolve unparsed.py conflict (#9311)

* Update parser to support conversion metrics (#9173)

* added ConversionTypeParams classes

* updated parser for ConversionTypeParams

* added step to populate input_measure for conversion metrics

* version bump on DSI

* comment back manifest generating line

* updated v12 schemas

* added tests

* added changelog

* Remove `--dry-run` flag from `dbt deps` (#9169)

* Rm --dry-run flag for dbt deps

* Add changelog entry

* Update test

* PR feedback

* adding clean_up methods to basic and unique_id tests (#9195)

* init attempt of adding clean_up methods to basic and unique_id tests

* swapping cleanup method drop of test_schema to unique_schema to test breakage on docs_generate test

* moving the clean_up method down into class BaseDocsGenerate

* remove drop relation for unique_schema

* manually define alternate_schema for clean_up as not being seen as part of project_config

* add changelog

* remove unneeded changelog

* uncomment line that generates new manifest and delete manifest our changes created

* make sure the manifest test is deleted and readd older version of manifest.json to appease test

* manually revert file to previous commit

* Revert "manually revert file to previous commit"

This reverts commit a755419.

---------

Co-authored-by: William Deng <[email protected]>
Co-authored-by: Jeremy Cohen <[email protected]>
Co-authored-by: Matthew McKnight <[email protected]>

---------

Co-authored-by: colin-rogers-dbt <[email protected]>
Co-authored-by: Peter Webb <[email protected]>
Co-authored-by: Colin <[email protected]>
Co-authored-by: Mila Page <[email protected]>
Co-authored-by: Mila Page <[email protected]>
Co-authored-by: Jeremy Cohen <[email protected]>
Co-authored-by: Quigley Malcolm <[email protected]>
Co-authored-by: William Deng <[email protected]>
Co-authored-by: Matthew McKnight <[email protected]>
Co-authored-by: Chenyu Li <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants