Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move manifest nodes to dbt/artifacts #9538

Merged
merged 25 commits into from
Feb 21, 2024
Merged

Conversation

gshank
Copy link
Contributor

@gshank gshank commented Feb 7, 2024

resolves #9388

Problem

As part of the initiative to make a package to handle artifact generation, serializing and validation we are moving things to a separate dbt/artifacts directory. This PR is to move the nodes/resources in the ManifestNodes union.

Solution

Move the nodes in the ManifestNodes union to the artifacts package.

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX
  • This PR includes type annotations for new and modified functions

@gshank gshank requested a review from a team as a code owner February 7, 2024 17:27
@gshank gshank requested a review from emmyoop February 7, 2024 17:27
@cla-bot cla-bot bot added the cla:yes label Feb 7, 2024
Copy link

codecov bot commented Feb 7, 2024

Codecov Report

Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (1ec5e22) 88.00% compared to head (07229e2) 88.00%.
Report is 2 commits behind head on main.

Files Patch % Lines
core/dbt/artifacts/resources/base.py 87.09% 4 Missing ⚠️
core/dbt/artifacts/resources/v1/snapshot.py 97.56% 1 Missing ⚠️
core/dbt/contracts/graph/nodes.py 96.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9538      +/-   ##
==========================================
- Coverage   88.00%   88.00%   -0.01%     
==========================================
  Files         168      176       +8     
  Lines       22210    22311     +101     
==========================================
+ Hits        19546    19634      +88     
- Misses       2664     2677      +13     
Flag Coverage Δ
integration 85.59% <97.51%> (-0.08%) ⬇️
unit 62.10% <94.54%> (+0.17%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@gshank
Copy link
Contributor Author

gshank commented Feb 7, 2024

Questions:

There's some validation logic in the configs that I'm not sure should be in artifacts. Should we attempt to move that out someplace else?

I put all of the "resources" into a manifest_nodes.py file, because a number of them were pretty small and it seemed like overkill to give each one separate files, but I don't have strong feelings about that.

Anything in the "base" resources dictionary will not be versioned, so we should be sure that it makes sense to put things there.

I did not put UnitTestNode into artifacts because right now that resource is never written out into json artifacts, it's only used internally in constructing the temporary unit testing manifest.

None of the embedded objects (i.e. Configs and things like RefArgs) can use the pattern of putting the fields and methods in different classes, because then there are different types in the artifacts and the nodes and mypy complains. It works fine for top-level resources though.

@gshank
Copy link
Contributor Author

gshank commented Feb 7, 2024

There's a couple of failing unit tests that I need to chase down, but wanted to give people on opportunity to review.

Comment on lines 120 to 149
class ParsedNodeMandatory(GraphResource, HasRelationMetadata):
alias: str
checksum: FileHash
config: NodeConfig = field(default_factory=NodeConfig)

@property
def identifier(self):
return self.alias


@dataclass
class ParsedNode(ParsedNodeMandatory):
tags: List[str] = field(default_factory=list)
description: str = field(default="")
columns: Dict[str, ColumnInfo] = field(default_factory=dict)
meta: Dict[str, Any] = field(default_factory=dict)
group: Optional[str] = None
docs: Docs = field(default_factory=Docs)
patch_path: Optional[str] = None
build_path: Optional[str] = None
deferred: bool = False
unrendered_config: Dict[str, Any] = field(default_factory=dict)
created_at: float = field(default_factory=lambda: time.time())
config_call_dict: Dict[str, Any] = field(default_factory=dict)
relation_name: Optional[str] = None
raw_code: str = ""


@dataclass
class CompiledNode(ParsedNode):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could rename these to ParsedResource (& ParsedResourceMandatory) + CompiledResource and define them in https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/artifacts/resources/base.py to be more consistent with the base + graph resource refactors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't add anything to classes in base.py because they aren't versioned. Are we sure that this is what we want?

Copy link
Contributor

@MichelleArk MichelleArk Feb 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline: there isn't really a clear policy shaking out to discern whether these type of base resource classes should be defined in v1/ or one directory above in base.py. Let's revisit this in later on once we have backward/forward compatibility considered and tested (issue: #9615) , potentially as part of a more holistic 'final' reorganization push.

@MichelleArk MichelleArk added the artifact_minor_upgrade To bypass the CI check by confirming that the change is not breaking label Feb 21, 2024
Copy link
Contributor

@MichelleArk MichelleArk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huge lift! Thank you 💪

@gshank gshank merged commit 7df747a into main Feb 21, 2024
56 of 57 checks passed
@gshank gshank deleted the gs/manifest_nodes_to_artifacts branch February 21, 2024 16:16
martynydbt pushed a commit that referenced this pull request Feb 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
artifact_minor_upgrade To bypass the CI check by confirming that the change is not breaking cla:yes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CT-3558] Move all ManifestNodes to dbt/artifacts
2 participants