Skip to content

Commit

Permalink
Move SourceDefinition to dbt/artifacts (#9543)
Browse files Browse the repository at this point in the history
* Move `ColumnInfo` to dbt/artifacts

* Move `Quoting` resource to dbt/artifacts

* Move `TimePeriod` to `types.py` in dbt/artifacts

* Move `Time` class to `components`

We need to move the data parts of the `Time` definition to dbt/artifacts.
That is not what we're doing in this commit. In this commit we're simply
moving the functional `Time` definition upstream of `unparsed` and `nodes`.
This does two things
  - Mirrors the import path that the resource `time` definition will have in dbt/artifacts
  - Reduces the chance of ciricular import problems between `unparsed` and `nodes`

* Move data part of `Time` definition to dbt/artifacts

* Move `FreshnessThreshold` class to components module

We need to move the data parts of the `FreshnessThreshold` definition to dbt/artifacts.
That is not what we're doing in this commit. In this commit we're simply
moving the functional `FreshnessThreshold` definition upstream of `unparsed` and `nodes`.
This does two things
  - Mirrors the import path that the resource `FreshnessThreshold` definition will have in dbt/artifacts
  - Reduces the chance of ciricular import problems between `unparsed` and `nodes`

* Move data part of `FreshnessThreshold` to dbt/artifacts

Note: We had to override some of the attrs of the `FreshnessThreshold`
resource because the resource version only has access to the resource
version of `Time`. The overrides in the functional definition of
`FreshnessThreshold` make it so the attrs use the functional version
of `Time`.

* Move `ExternalTable` and `ExternalPartition` to `source_definition` module in dbt/artifacts

* Move `SourceConfig` to `source_definition` module in dbt/artifacts

* Move `HasRelationMetadata` to core `components` module

This is a precursor to splitting `HasRelationMetadata` into it's
data and functional parts.

* Move data portion of `HasRelationMetadata` to dbt/artifacts

* Move `SourceDefinitionMandatory` to dbt/artifacts

* Move the data parts of `SourceDefinition` to dbt/artifacts

Something interesting here is that we had to override the `freshness`
property. We had to do this because if we didn't we wouldn't get the
functional parts of `FreshnessThreshold`, we'd only get the data parts.

Also of note, the `SourceDefintion` has a lot of `@property` methods that
on other classes would be actual attribute properties of the node. There is
an argument to be made that these should be moved as well, but thats perhaps
a separate discussion.

Finally, we have not (yet) moved `NodeInfoMixin`. It is an open discussion
whether we do or not. It seems primarily functional, as a means to update the
source freshness information. As the artifacts primarily deal with the shape
of the data, not how it should be set, it seems for now that `NodeInfoMixin`
should stay in core / not move to artifacts. This thinking may change though.

* Refactor `from_resource` to no longer use generics

In the next commit we're gonna add a `to_resource` method. As we don't
want to have to pass a resource into `to_resource`, the class itself
needs to expose what resource class should be built. Thus a type annotation
is no longer enough. To solve this we've added a class method to BaseNode
which returns the associated resource class. The method on BaseNode will
raise a NotImplementedError unless the the inheriting class has overridden
the `resouce_class` method to return the a resource class.

You may be thinking "Why not a class property"? And that is absolutely a
valid question. We used to be able to chain `@classmethod` with
`@property` to create a class property. However, this was deprecated in
python 3.11 and removed in 3.13 (details on why this happened can be found
[here](python/cpython#89519)). There is an
[alternate way to setup a class property](python/cpython#89519 (comment)),
however this seems a bit convoluted if a class method easily gets the job
done. The draw back is that we must do `.resource_class()` instead of
`.resource_class` and on classes implementing `BaseNode` we have to
override it with a method instead of a property specification.

Additionally, making it a class _instance_ property won't work because
we don't want to require an _instance_ of the class to get the
`resource_class` as we might not have an instance at our dispossal.

* Add `to_resource` method to `BaseNode`

Nodes have extra attributes. We don't want these extra attributes to
get serialized. Thus we're converting back to resources prior to
serialization. There could be a CPU hit here as we're now dictifying
and undictifying right before serialization. We can do some complicated
and non-straight-forward things to get around this. However, we want
to see how big of a perforance hit we actually have before going that
route.

* Drop `__post_serialize__` from `SourceDefinition` node class

The method `__post_serialize__` on the `SourceDefinition` was used for
ensuring the property `_event_status` didn't make it to the serialized
version of the node. Now that resource definition of `SourceDefinition`
handles serialization/deserialization, we can drop `__post_serialize__`
as it is no longer needed.

* Merge functional parts of `components` into their resource counter parts

We discussed this on the PR. It seems like a minimal lift, and minimal to
support. Doing so also has the benefit of reducing a bunch of the overriding
we were previously doing.

* Fixup:: Rename variable `name` to `node_id` in `_map_nodes_to_map_resources`

Naming is hard. That is all.

* Fixup: Ensure conversion of groups to resources for `WritableManifest`
  • Loading branch information
QMalcolm authored Feb 15, 2024
1 parent 2411f93 commit 8a1b927
Show file tree
Hide file tree
Showing 18 changed files with 303 additions and 220 deletions.
6 changes: 6 additions & 0 deletions .changes/unreleased/Under the Hood-20240208-120620.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
kind: Under the Hood
body: Move data parts of `SourceDefinition` class to dbt/artifacts
time: 2024-02-08T12:06:20.696709-08:00
custom:
Author: QMalcolm
Issue: "9384"
18 changes: 17 additions & 1 deletion core/dbt/artifacts/resources/__init__.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,16 @@
from dbt.artifacts.resources.base import BaseResource, GraphResource

# alias to latest resource definitions
from dbt.artifacts.resources.v1.components import DependsOn, NodeVersion, RefArgs
from dbt.artifacts.resources.v1.components import (
ColumnInfo,
DependsOn,
FreshnessThreshold,
HasRelationMetadata,
NodeVersion,
Quoting,
RefArgs,
Time,
)
from dbt.artifacts.resources.v1.documentation import Documentation
from dbt.artifacts.resources.v1.exposure import (
Exposure,
Expand Down Expand Up @@ -50,3 +59,10 @@
SemanticModel,
SemanticModelConfig,
)
from dbt.artifacts.resources.v1.source_definition import (
ExternalPartition,
ExternalTable,
SourceDefinition,
ParsedSourceMandatory,
SourceConfig,
)
9 changes: 9 additions & 0 deletions core/dbt/artifacts/resources/types.py
Original file line number Diff line number Diff line change
Expand Up @@ -54,3 +54,12 @@ class RunHookType(StrEnum):
class ModelLanguage(StrEnum):
python = "python"
sql = "sql"


class TimePeriod(StrEnum):
minute = "minute"
hour = "hour"
day = "day"

def plural(self) -> str:
return str(self) + "s"
90 changes: 88 additions & 2 deletions core/dbt/artifacts/resources/v1/components.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,12 @@
from dataclasses import dataclass, field
from datetime import timedelta
from dbt.artifacts.resources.types import TimePeriod
from dbt.artifacts.resources.v1.macro import MacroDependsOn
from dbt_common.dataclass_schema import dbtClassMixin
from typing import Dict, List, Optional, Union
from dbt_common.contracts.config.properties import AdditionalPropertiesMixin
from dbt_common.contracts.constraints import ColumnLevelConstraint
from dbt_common.contracts.util import Mergeable, Replaceable
from dbt_common.dataclass_schema import dbtClassMixin, ExtensibleDbtClassMixin
from typing import Any, Dict, List, Optional, Union


NodeVersion = Union[str, float]
Expand Down Expand Up @@ -35,3 +40,84 @@ def keyword_args(self) -> Dict[str, Optional[NodeVersion]]:
return {"version": self.version}
else:
return {}


@dataclass
class ColumnInfo(AdditionalPropertiesMixin, ExtensibleDbtClassMixin, Replaceable):
"""Used in all ManifestNodes and SourceDefinition"""

name: str
description: str = ""
meta: Dict[str, Any] = field(default_factory=dict)
data_type: Optional[str] = None
constraints: List[ColumnLevelConstraint] = field(default_factory=list)
quote: Optional[bool] = None
tags: List[str] = field(default_factory=list)
_extra: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Quoting(dbtClassMixin, Mergeable):
database: Optional[bool] = None
schema: Optional[bool] = None
identifier: Optional[bool] = None
column: Optional[bool] = None


@dataclass
class Time(dbtClassMixin, Mergeable):
count: Optional[int] = None
period: Optional[TimePeriod] = None

def exceeded(self, actual_age: float) -> bool:
if self.period is None or self.count is None:
return False
kwargs: Dict[str, int] = {self.period.plural(): self.count}
difference = timedelta(**kwargs).total_seconds()
return actual_age > difference

def __bool__(self):
return self.count is not None and self.period is not None


@dataclass
class FreshnessThreshold(dbtClassMixin, Mergeable):
warn_after: Optional[Time] = field(default_factory=Time)
error_after: Optional[Time] = field(default_factory=Time)
filter: Optional[str] = None

def status(self, age: float) -> "dbt.artifacts.schemas.results.FreshnessStatus": # type: ignore # noqa F821
from dbt.artifacts.schemas.results import FreshnessStatus

if self.error_after and self.error_after.exceeded(age):
return FreshnessStatus.Error
elif self.warn_after and self.warn_after.exceeded(age):
return FreshnessStatus.Warn
else:
return FreshnessStatus.Pass

def __bool__(self):
return bool(self.warn_after) or bool(self.error_after)


@dataclass
class HasRelationMetadata(dbtClassMixin, Replaceable):
database: Optional[str]
schema: str

# Can't set database to None like it ought to be
# because it messes up the subclasses and default parameters
# so hack it here
@classmethod
def __pre_deserialize__(cls, data):
data = super().__pre_deserialize__(data)
if "database" not in data:
data["database"] = None
return data

@property
def quoting_dict(self) -> Dict[str, bool]:
if hasattr(self, "quoting"):
return self.quoting.to_dict(omit_none=True)
else:
return {}
72 changes: 72 additions & 0 deletions core/dbt/artifacts/resources/v1/source_definition.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,72 @@
import time

from dataclasses import dataclass, field
from dbt.artifacts.resources.base import GraphResource
from dbt.artifacts.resources.types import NodeType
from dbt.artifacts.resources.v1.components import (
ColumnInfo,
FreshnessThreshold,
HasRelationMetadata,
Quoting,
)
from dbt_common.contracts.config.base import BaseConfig
from dbt_common.contracts.config.properties import AdditionalPropertiesAllowed
from dbt_common.contracts.util import Mergeable, Replaceable
from dbt_common.exceptions import CompilationError
from typing import Any, Dict, List, Literal, Optional, Union


@dataclass
class ExternalPartition(AdditionalPropertiesAllowed, Replaceable):
name: str = ""
description: str = ""
data_type: str = ""
meta: Dict[str, Any] = field(default_factory=dict)

def __post_init__(self):
if self.name == "" or self.data_type == "":
raise CompilationError("External partition columns must have names and data types")


@dataclass
class ExternalTable(AdditionalPropertiesAllowed, Mergeable):
location: Optional[str] = None
file_format: Optional[str] = None
row_format: Optional[str] = None
tbl_properties: Optional[str] = None
partitions: Optional[Union[List[str], List[ExternalPartition]]] = None

def __bool__(self):
return self.location is not None


@dataclass
class SourceConfig(BaseConfig):
enabled: bool = True


@dataclass
class ParsedSourceMandatory(GraphResource, HasRelationMetadata):
source_name: str
source_description: str
loader: str
identifier: str
resource_type: Literal[NodeType.Source]


@dataclass
class SourceDefinition(ParsedSourceMandatory):
quoting: Quoting = field(default_factory=Quoting)
loaded_at_field: Optional[str] = None
freshness: Optional[FreshnessThreshold] = None
external: Optional[ExternalTable] = None
description: str = ""
columns: Dict[str, ColumnInfo] = field(default_factory=dict)
meta: Dict[str, Any] = field(default_factory=dict)
source_meta: Dict[str, Any] = field(default_factory=dict)
tags: List[str] = field(default_factory=list)
config: SourceConfig = field(default_factory=SourceConfig)
patch_path: Optional[str] = None
unrendered_config: Dict[str, Any] = field(default_factory=dict)
relation_name: Optional[str] = None
created_at: float = field(default_factory=lambda: time.time())
2 changes: 1 addition & 1 deletion core/dbt/artifacts/schemas/freshness/v3/freshness.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@
from typing import Dict, Any, Sequence, List, Union, Optional
from datetime import datetime

from dbt.artifacts.resources import FreshnessThreshold
from dbt.artifacts.schemas.results import ExecutionResult, FreshnessStatus, NodeResult, TimingInfo
from dbt.artifacts.schemas.base import (
ArtifactMixin,
Expand All @@ -12,7 +13,6 @@
from dbt_common.dataclass_schema import dbtClassMixin, StrEnum
from dbt_common.exceptions import DbtInternalError

from dbt.contracts.graph.unparsed import FreshnessThreshold
from dbt.contracts.graph.nodes import SourceDefinition


Expand Down
19 changes: 13 additions & 6 deletions core/dbt/contracts/graph/manifest.py
Original file line number Diff line number Diff line change
Expand Up @@ -772,6 +772,9 @@ class ManifestStateCheck(dbtClassMixin):
project_hashes: MutableMapping[str, FileHash] = field(default_factory=dict)


NodeClassT = TypeVar("NodeClassT", bound="BaseNode")


@dataclass
class Manifest(MacroMethods, DataClassMessagePackMixin, dbtClassMixin):
"""The manifest for the full graph, after parsing and during compilation."""
Expand Down Expand Up @@ -1020,26 +1023,30 @@ def build_group_map(self):
group_map[node.group].append(node.unique_id)
self.group_map = group_map

@classmethod
def _map_nodes_to_map_resources(cls, nodes_map: MutableMapping[str, NodeClassT]):
return {node_id: node.to_resource() for node_id, node in nodes_map.items()}

def writable_manifest(self) -> "WritableManifest":
self.build_parent_and_child_maps()
self.build_group_map()
return WritableManifest(
nodes=self.nodes,
sources=self.sources,
sources=self._map_nodes_to_map_resources(self.sources),
macros=self.macros,
docs=self.docs,
exposures=self.exposures,
metrics=self.metrics,
groups=self.groups,
exposures=self._map_nodes_to_map_resources(self.exposures),
metrics=self._map_nodes_to_map_resources(self.metrics),
groups=self._map_nodes_to_map_resources(self.groups),
selectors=self.selectors,
metadata=self.metadata,
disabled=self.disabled,
child_map=self.child_map,
parent_map=self.parent_map,
group_map=self.group_map,
semantic_models=self.semantic_models,
semantic_models=self._map_nodes_to_map_resources(self.semantic_models),
unit_tests=self.unit_tests,
saved_queries=self.saved_queries,
saved_queries=self._map_nodes_to_map_resources(self.saved_queries),
)

def write(self, path):
Expand Down
6 changes: 1 addition & 5 deletions core/dbt/contracts/graph/model_config.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@
MetricConfig,
SavedQueryConfig,
SemanticModelConfig,
SourceConfig,
)
from dbt_common.contracts.config.base import BaseConfig, MergeBehavior, CompareBehavior
from dbt_common.contracts.config.materialization import OnConfigurationChangeOption
Expand Down Expand Up @@ -54,11 +55,6 @@ class Hook(dbtClassMixin, Replaceable):
index: Optional[int] = None


@dataclass
class SourceConfig(BaseConfig):
enabled: bool = True


@dataclass
class NodeAndTestConfig(BaseConfig):
enabled: bool = True
Expand Down
Loading

0 comments on commit 8a1b927

Please sign in to comment.