Allow use of sources as unit testing inputs #9059

Merged
gshank merged 5 commits into unit_testing_feature_branch from 8507-unit_testing_input_sources
Nov 15, 2023

Contributor

@gshank gshank commented Nov 12, 2023

resolves #8507

Problem

We want to support the use of sources as inputs in unit test cases.

Solution

Created a UnitTestSourceDefinition object, which acts as a source when resolving "source" calls but as a model when executing the test case.

Checklist

  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX
  • This PR includes type annotations for new and modified functions

@gshank gshank requested review from a team as code owners November 12, 2023 00:15
@cla-bot cla-bot bot added the cla:yes label Nov 12, 2023

codecov bot commented Nov 12, 2023

Codecov Report

Attention: 1 line in your changes is missing coverage. Please review.

Comparison is base (436dae6) 86.80% compared to head (c5b4428) 86.81%.

Files                           Patch %   Lines
core/dbt/context/providers.py   87.50%    1 Missing ⚠️
Additional details and impacted files
@@                     Coverage Diff                      @@
##           unit_testing_feature_branch    #9059   +/-   ##
============================================================
  Coverage                        86.80%   86.81%           
============================================================
  Files                              181      181           
  Lines                            27057    27075   +18     
============================================================
+ Hits                             23488    23505   +17     
- Misses                            3569     3570    +1     
Flag          Coverage Δ
integration   83.82% <97.22%> (+0.01%) ⬆️
unit          64.57% <36.11%> (-0.02%) ⬇️


@gshank gshank requested review from MichelleArk and removed request for mikealfare November 13, 2023 14:54
@@ -263,8 +271,10 @@ def create_from(
         node: ResultNode,
         **kwargs: Any,
     ) -> Self:
-        if node.resource_type == NodeType.Source:
-            if not isinstance(node, SourceDefinition):
+        if node.resource_type == NodeType.Source or isinstance(node, UnitTestSourceDefinition):
Member

Can we simplify the logic here?

Contributor Author

@gshank gshank Nov 13, 2023

How? We can't set the resource_type to Source because that breaks execution.

Contributor Author

gshank commented Nov 13, 2023

I looked at taking out the special casing of UnitTestSourceDefinition, but unfortunately there are subtle differences in the specification of quoting between sources and models, and so I think it's best to actually use the relation.create_from_source to get the quoting right.
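
A minimal sketch of the dispatch being discussed, using toy stand-in classes rather than the real dbt-core ones. The class bodies and the returned builder names are assumptions for illustration; only the condition itself mirrors the diff above. It shows why the check needs the extra isinstance clause: the unit test stand-in keeps resource_type Model so that execution still works, so a resource_type comparison alone would route it down the model path and pick up model quoting.

```python
from dataclasses import dataclass
from enum import Enum


class NodeType(str, Enum):
    Model = "model"
    Source = "source"


@dataclass
class ModelNode:
    name: str
    resource_type: NodeType = NodeType.Model


@dataclass
class SourceDefinition:
    name: str
    source_name: str
    resource_type: NodeType = NodeType.Source


@dataclass
class UnitTestSourceDefinition(ModelNode):
    # Stays a Model for execution, but remembers which source it stands in for.
    source_name: str = ""


def pick_relation_builder(node) -> str:
    """Return the name of the (illustrative) builder that would handle this node."""
    # Same condition as the changed line in the diff: real sources and
    # unit test stand-ins both take the source path.
    if node.resource_type == NodeType.Source or isinstance(node, UnitTestSourceDefinition):
        return "create_from_source"
    return "create_from_node"
```

With these toy classes, both a SourceDefinition and a UnitTestSourceDefinition take the source path, while a plain ModelNode does not.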

source_name=original_input_node.source_name, # needed for source lookup
)
# Sources need to go in the sources dictionary in order to create the right lookup
self.unit_test_manifest.sources[input_node.unique_id] = input_node # type: ignore
Contributor

Do we anticipate any issues from the sources dictionary containing a unique_id key that is prefixed with model instead of source here?

Contributor Author

It doesn't seem to care. We don't actually check the unique_id prefix that I can recall. If somebody starts parsing the unit_test_manifest, I suppose it might be confusing. But right now we're putting it in two places, so one of them will be wrong.

Contributor

@MichelleArk MichelleArk Nov 14, 2023

This probably isn't worth spending much time on, but I think it could be possible to avoid adding the node to manifest.sources and instead do the lookup from the .nodes collection in UnitTestRuntimeSourceResolver, since the unique_id will include source_name. Something like what's done here: https://github.com/dbt-labs/dbt-core/blob/unit_testing_feature_branch/core/dbt/context/providers.py#L578

I'm not entirely sure which is more readable or less complex in this case. I can imagine that maintaining UnitTestSourceDefinitions across both dictionaries could be error-prone, though.

> But right now we're putting it in two places, so one of them will be wrong.

Given that UnitTestSourceDefinition is a ModelNode, I think having it in nodes is 'more' correct

Contributor Author

I think the lookup behavior of sources and nodes is subtly different with regard to the meaning of package=None, so I don't think looking up sources as though they were nodes is worth it.
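
To make the trade-off in this thread concrete, here is a rough sketch of the "register in both dictionaries" approach, using a toy manifest. ToyManifest, InputNode, add_input and resolve_source are hypothetical names, not dbt-core APIs, and the sketch does not model the package=None difference mentioned above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class InputNode:
    unique_id: str
    name: str
    package_name: str
    source_name: Optional[str] = None  # set only for source stand-ins


@dataclass
class ToyManifest:
    nodes: Dict[str, InputNode] = field(default_factory=dict)
    sources: Dict[str, InputNode] = field(default_factory=dict)

    def add_input(self, node: InputNode) -> None:
        # Source stand-ins also go in `sources` so source lookups can find them.
        if node.source_name is not None:
            self.sources[node.unique_id] = node
        # Every input goes in `nodes`, whether it is a model or a source stand-in.
        self.nodes[node.unique_id] = node

    def resolve_source(self, source_name: str, table_name: str) -> Optional[InputNode]:
        # Lookup is by (source_name, table name), not by the unique_id prefix,
        # which is why a "model."-prefixed key in `sources` does not matter here.
        for node in self.sources.values():
            if node.source_name == source_name and node.name == table_name:
                return node
        return None
```

The cost of this design is the one raised above: the same object lives under two collections, so any code that mutates one has to remember the other.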

# Sources need to go in the sources dictionary in order to create the right lookup
self.unit_test_manifest.sources[input_node.unique_id] = input_node # type: ignore

# Both ModelNode and UnitTestSourceDefinition need to go in nodes dictionary
Contributor

@MichelleArk MichelleArk Nov 14, 2023

For my own understanding: is this to enable CTE injection?

Contributor Author

Yeah. There's code in compilation.py that checks whether the CTE exists in the nodes dictionary: if cte.id not in manifest.nodes:.

Contributor Author

In theory we could also check for a UnitTestSourceDefinition and look it up in sources, but that didn't feel like an improvement.
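
As an illustration of why the stand-in has to be in nodes, here is a small sketch of a membership check of the kind quoted above. InjectedCTE and missing_cte_ids are hypothetical names, and what compilation.py does when the check fails is not modeled here; the sketch only reports which ids would fail it.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class InjectedCTE:
    id: str
    sql: str


def missing_cte_ids(ctes: List[InjectedCTE], nodes: Dict[str, object]) -> List[str]:
    """Return ids of CTEs that would fail an `if cte.id not in nodes` check."""
    return [cte.id for cte in ctes if cte.id not in nodes]
```

A source stand-in registered only in a sources dictionary would show up in the returned list, which is the situation the "both dictionaries" comment avoids.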

"resource_type": NodeType.Model,
"package_name": package_name,
"original_file_path": original_input_node.original_file_path,
"unique_id": f"model.{package_name}.{input_name}",
Contributor

I think we may need to include source_name in input_name to avoid clobbering sources with the same table_name but different source_names when they are inserted into manifest.nodes and manifest.sources below.

Contributor Author

Good point.

Contributor Author

I've changed this to include the source_name. This does make for pretty long unique_ids. Do we have any concerns about that? It's not like we're using that name to construct tables or anything...

Contributor

So far I've noticed this issue creep up in #9015. I think we could shorten the node name for CTE generation (since it doesn't need to be unique) but keep the unique_id longer
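
The clobbering concern in this thread can be sketched as follows. The helper name and the "__" separator are assumptions for illustration; the thread only says that source_name was added to the id, not how it is joined.

```python
from typing import Optional


def unit_test_input_unique_id(
    package_name: str, input_name: str, source_name: Optional[str] = None
) -> str:
    """Build a model-style unique_id for a unit test input."""
    if source_name is not None:
        # Hypothetical separator; including source_name keeps same-named
        # tables from different sources apart.
        input_name = f"{source_name}__{input_name}"
    return f"model.{package_name}.{input_name}"


# Two sources that both define a table called "orders".
inputs = [("raw_a", "orders"), ("raw_b", "orders")]

# Without source_name both inputs map to one key, so one overwrites the other.
ids_without = {unit_test_input_unique_id("pkg", table) for _, table in inputs}

# With source_name each input gets its own key.
ids_with = {unit_test_input_unique_id("pkg", table, source) for source, table in inputs}
```

The longer ids are the price of uniqueness; as noted above, a shorter name could still be used wherever uniqueness is not required.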

@gshank gshank merged commit c6be2d2 into unit_testing_feature_branch Nov 15, 2023
49 checks passed
@gshank gshank deleted the 8507-unit_testing_input_sources branch November 15, 2023 15:31
gshank added a commit that referenced this pull request Jan 16, 2024
* Initial implementation of unit testing (from pr #2911)

Co-authored-by: Michelle Ark <[email protected]>

* 8295 unit testing artifacts (#8477)

* unit test config: tags & meta (#8565)

* Add additional functional test for unit testing selection, artifacts, etc (#8639)

* Enable inline csv format in unit testing (#8743)

* Support unit testing incremental models (#8891)

* update unit test key: unit -> unit-tests (#8988)


* convert to use unit test name at top level key (#8966)

* csv file fixtures (#9044)

* Unit test support for `state:modified` and `--defer` (#9032)

Co-authored-by: Michelle Ark <[email protected]>

* Allow use of sources as unit testing inputs (#9059)

* Use daff for diff formatting in unit testing (#8984)

* Fix #8652: Use seed file from disk for unit testing if rows not specified in YAML config (#9064)

Co-authored-by: Michelle Ark <[email protected]>
Fix #8652: Use seed value if rows not specified

* Move unit testing to test and build commands (#9108)

* Enable unit testing in non-root packages (#9184)

* convert test to data_test (#9201)

* Make fixtures files full-fledged members of manifest and enable partial parsing (#9225)

* In build command run unit tests before models (#9273)

---------

Co-authored-by: Michelle Ark <[email protected]>
Co-authored-by: Michelle Ark <[email protected]>
Co-authored-by: Emily Rockman <[email protected]>
Co-authored-by: Jeremy Cohen <[email protected]>
Co-authored-by: Kshitij Aranke <[email protected]>

4 participants