Fix filter parsing bug #9508

courtneyholcomb · 2024-02-02T00:42:23Z

resolves #9507

Problem

Currently, if you pass a string into a filter YAML param, it will be assumed to be a list. This means your semantic manifest ends up with a list of strings for where filters. For example, this YAML:

filter: |
    {{ Dimension('customer__customer_type') }}  = 'new'

would look like this in the semantic manifest:

{"where_filters": [{"where_sql_template": "{"}, {"where_sql_template": "{"}, {"where_sql_template": " "}, {"where_sql_template": "D"}, {"where_sql_template": "i"}, {"where_sql_template": "m"}, {"where_sql_template": "e"}, {"where_sql_template": "n"}, {"where_sql_template": "s"}, {"where_sql_template": "i"}, {"where_sql_template": "o"}, {"where_sql_template": "n"}, {"where_sql_template": "("}, {"where_sql_template": "'"}, {"where_sql_template": "c"}, {"where_sql_template": "u"}, {"where_sql_template": "s"}, {"where_sql_template": "t"}, {"where_sql_template": "o"}, {"where_sql_template": "m"}, {"where_sql_template": "e"}, {"where_sql_template": "r"}, {"where_sql_template": "_"}, {"where_sql_template": "_"}, {"where_sql_template": "c"}, {"where_sql_template": "u"}, {"where_sql_template": "s"}, {"where_sql_template": "t"}, {"where_sql_template": "o"}, {"where_sql_template": "m"}, {"where_sql_template": "e"}, {"where_sql_template": "r"}, {"where_sql_template": "_"}, {"where_sql_template": "t"}, {"where_sql_template": "y"}, {"where_sql_template": "p"}, {"where_sql_template": "e"}, {"where_sql_template": "'"}, {"where_sql_template": ")"}, {"where_sql_template": " "}, {"where_sql_template": "}"}, {"where_sql_template": "}"}, {"where_sql_template": " "}, {"where_sql_template": " "}, {"where_sql_template": "="}, {"where_sql_template": " "}, {"where_sql_template": "'"}, {"where_sql_template": "n"}, {"where_sql_template": "e"}, {"where_sql_template": "w"}, {"where_sql_template": "'"}]}

This is because the dataclass from_dict() method only works for unions if Union is the outermost type annotation. We had it nested in an Optional annotation, so this fixes that.
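
To illustrate the failure mode shown in the manifest output above, here is a minimal sketch. It assumes a hypothetical helper, `to_where_filters`, which is not dbt's actual code. The helper iterates over whatever it is given, so a bare string is split into characters, while a one-element list is kept whole.

```python
from typing import Dict, Iterable, List


def to_where_filters(raw_filter: Iterable[str]) -> Dict[str, List[Dict[str, str]]]:
    # Hypothetical helper (an assumption for illustration, not dbt's code):
    # it iterates over the input without checking whether it is a str.
    return {"where_filters": [{"where_sql_template": item} for item in raw_filter]}


yaml_filter = "{{ Dimension('customer__customer_type') }}  = 'new'"

# A str is itself iterable, so every character becomes its own filter entry.
broken = to_where_filters(yaml_filter)

# Wrapping the string in a list gives the single entry that was intended.
fixed = to_where_filters([yaml_filter])
```

Here `broken` ends up with one entry per character, matching the shape of the manifest above, while `fixed` has exactly one entry.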

Solution

Checklist

I have read the contributing guide and understand what's expected of me
I have run this code in development and it appears to resolve the stated issue
This PR includes tests, or tests are not required/relevant for this PR
This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX
This PR includes type annotations for new and modified functions

github-actions · 2024-02-02T00:42:38Z

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

codecov · 2024-02-02T00:44:19Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (2411f93) 87.97% compared to head (50fc004) 87.93%.
Report is 4 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #9508      +/-   ##
==========================================
- Coverage   87.97%   87.93%   -0.05%     
==========================================
  Files         167      167              
  Lines       22171    22171              
==========================================
- Hits        19506    19496      -10     
- Misses       2665     2675      +10

Flag	Coverage Δ
integration	`85.51% <100.00%> (-0.12%)`	⬇️
unit	`61.90% <100.00%> (ø)`

tlento

This seems reasonable to me based on my minuscule knowledge of Mashumaro (where the from_dict method apparently comes from), but I'm not a maintainer here so I'll let someone from the core team do the approving/merging.

tlento · 2024-02-02T00:58:31Z

core/dbt/contracts/graph/unparsed.py

@@ -564,7 +564,7 @@ def __bool__(self):
 @dataclass
 class UnparsedMetricInputMeasure(dbtClassMixin):
    name: str
-    filter: Optional[Union[str, List[str]]] = None
+    filter: Union[Optional[str], Optional[List[str]]] = None


Do we want to use an explicit None in the type spec instead of putting in an Optional on everything? Union[str, List[str], None] or str | List[str] | None is a little less repetitive, assuming either of those work.

Sure that works too! Just tested to confirm.

courtneyholcomb · 2024-02-02T17:01:51Z

based on my minuscule knowledge of Mashumaro (where the from_dict method apparently comes from)

@tlento I thought that too, but if you go into the definition in Mashumaro, it's just the type stubs. Took me forever to track this down as a result 😅 but turns out from_dict() is just coming from dataclass.

tlento · 2024-02-03T01:01:59Z

@tlento I thought that too, but if you go into the definition in Mashumaro, it's just the type stubs. Took me forever to track this down as a result 😅 but turns out from_dict() is just coming from dataclass.

Well I just learned something today. There is no from_dict method on a Python dataclass, which is why I was so confused at first, and that type stub is indeed non-functional as written, but I figured there was something that Mashumaro was doing to populate the body with some type-specific extraction.

Turns out the library adds it on the fly via a dynamic codegen + compile step invoked from the DataclassDictMixin.

Consequently, I have no idea what the resulting from_dict code is actually doing or why this was an issue or why or how this PR fixes it, but I 100% trust that this was the issue because I can easily imagine how it might happen.

Were you able to actually get to the from_dict method body in your investigation? I'd be curious to see what it looks like, since my feeble human brain is not able to assemble the recursively-generated code.

Wild stuff, either way.

Fatal1ty · 2024-02-03T05:58:53Z

Consequently, I have no idea what the resulting from_dict code is actually doing

You can see the generated code by enabling debug in the config parameter as described in the docs.

why this was an issue

Currently, Union types are handled naively by trying a handler for each variant type in a loop. Values of type str are handled as is by default, and that can be an issue in a Union. The most reliable solution in this case would be to register a custom deserialization method. For example, if str is the first variant in the list, then a simple isinstance check as in this example can help. Other options include a custom method for list[str] or for the whole union. You can play with debug mode to see which method works best for you. I understand that this imperfection might be confusing and dangerous in some cases, so I plan to redo the processing of unions in upcoming versions.

If you still have any questions or need help, I invite you to open an issue for discussion.
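
As a rough sketch of the isinstance-based approach mentioned above: the function below, `normalize_filter`, is a hypothetical name, written in plain Python and not taken from Mashumaro or dbt. It checks for str explicitly before treating the value as a list. How such a function would be registered as a custom deserialization method is not shown here, since that depends on the library's configuration.

```python
from typing import List, Optional, Union


def normalize_filter(value: Union[str, List[str], None]) -> Optional[List[str]]:
    # Hypothetical helper (an assumption, not part of Mashumaro or dbt).
    if value is None:
        return None
    # Check str first: a str is also iterable, so it must never reach
    # the list branch or it would be split into characters.
    if isinstance(value, str):
        return [value]
    if isinstance(value, list) and all(isinstance(item, str) for item in value):
        return list(value)
    raise TypeError(f"filter must be a str or a list of str, got {type(value).__name__}")
```

With this check in place, a string filter and a one-element list filter normalize to the same value, and anything else fails loudly instead of being silently misparsed.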

QMalcolm

We should absolutely get this in. Thank you for doing this work ❤️ We should add a test, though. As @Fatal1ty mentioned, the typing-based parser in Mashumaro will be seeing some changes in the future, so we want to guarantee that the filter continues to be parsed correctly as the things we depend on change how they operate. A good place to add a test would be in test_metrics.py.

QMalcolm · 2024-02-09T19:33:13Z

core/dbt/contracts/graph/unparsed.py

@@ -564,7 +564,7 @@ def __bool__(self):
 @dataclass
 class UnparsedMetricInputMeasure(dbtClassMixin):
    name: str
-    filter: Optional[Union[str, List[str]]] = None
+    filter: Union[str, List[str], None] = None


Nit: We should probably add a comment about what's going on here. I can imagine a future person like me accidentally changing it back to Optional during a refactor, as that is our usual pattern.

Good call, updated!

courtneyholcomb · 2024-02-13T19:10:15Z

Added tests! @QMalcolm thank you for pointing me to where those should be!

Note that in the process of writing tests, I discovered a separate bug. If you try to use a list filter that includes jinja on a metric or an input measure, you'll get an error when you run dbt parse. For example, parsing the following YAML:

metrics:
  - name: collective_tenure_measure_filter_list
    label: "Collective tenure2"
    description: Total number of years of team experience
    type: simple
    type_params:
      measure:
        name: "years_tenure"
        filter:
          - "{{ Dimension('id__loves_dbt') }} is true"

Gets this error:
CompilationError("Could not render {{ Dimension('id__loves_dbt') }} is true: 'Dimension' is undefined")
The exact same YAML works if you include the filter as a string.

I'll be putting up a separate issue for that bug. But that's the reason you don't see any test cases for input measures or metrics with list filters. The same bug does not exist for filters on saved queries, oddly, so I was able to include list filter tests for those.

courtneyholcomb · 2024-02-13T19:25:18Z

I'm not sure why the artifacts check is failing in CI - it looks like the action itself is erroring? But LMK if there's something I need to fix there!

QMalcolm · 2024-02-17T12:18:48Z

I'm not sure why the artifacts check is failing in CI - it looks like the action itself is erroring? But LMK if there's something I need to fix there!

The failing CI check is one we've added recently. I've added the label artifact_minor_upgrade which essentially skips the check.

Context: we want to strongly control changes to /artifacts to help identify and mitigate breaking changes. To do this we added a CI step which checks for any file changes in /artifacts. Once someone verifies the change isn't breaking, we add the artifact_minor_upgrade label to acknowledge this. We could implement a smarter check which recognizes what is and isn't a breaking change, and maybe we will in the future, but this gets the job done for now.
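
The logic of a check like the one described above can be sketched roughly as follows. This is not the actual CI action: the function names are hypothetical, and matching any path component named `artifacts` is an assumption about the repository layout. Only the label name comes from the comment above.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Label name taken from the comment above; everything else is illustrative.
BYPASS_LABEL = "artifact_minor_upgrade"


def touched_artifacts(changed_files: Iterable[str], artifacts_dir: str = "artifacts") -> List[str]:
    # Assumption: a file counts as an artifact change if any directory
    # component of its path is named `artifacts_dir`.
    return [path for path in changed_files if artifacts_dir in PurePosixPath(path).parts]


def artifacts_check_passes(changed_files: Iterable[str], labels: Iterable[str]) -> bool:
    # The label acknowledges that someone verified the change is not breaking.
    if BYPASS_LABEL in set(labels):
        return True
    return not touched_artifacts(changed_files)
```

The check is deliberately blunt: it does not try to tell breaking changes from non-breaking ones and leaves that judgment to the person who adds the label.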

QMalcolm

Looks good to me! Thank you for adding the tests ❤️

Use Union syntax that works for dataclasses

68f03a8

courtneyholcomb requested review from tlento and QMalcolm February 2, 2024 00:42

courtneyholcomb requested a review from a team as a code owner February 2, 2024 00:42

cla-bot bot added the cla:yes label Feb 2, 2024

Changelog

d66e48d

tlento reviewed Feb 2, 2024

dbeatty10 added the ready_for_review Externally contributed PR has functional approval, ready for code review from Core engineering label Feb 2, 2024

Use simpler syntax

b59f4d6

QMalcolm requested changes Feb 9, 2024

Add comments explaining this for future devs

3c96d52

courtneyholcomb force-pushed the court/dataclass-union-fix branch from 99f8e09 to f95123e Compare February 13, 2024 19:04

Write tests

a97fecd

courtneyholcomb force-pushed the court/dataclass-union-fix branch from f95123e to a97fecd Compare February 13, 2024 19:17

Merge branch 'main' into court/dataclass-union-fix

50fc004

courtneyholcomb requested a review from QMalcolm February 13, 2024 19:24

QMalcolm added the artifact_minor_upgrade To bypass the CI check by confirming that the change is not breaking label Feb 17, 2024

QMalcolm approved these changes Feb 17, 2024

QMalcolm merged commit 20f9049 into main Feb 20, 2024
57 of 58 checks passed

QMalcolm deleted the court/dataclass-union-fix branch February 20, 2024 19:37

tlento mentioned this pull request Feb 26, 2024

Update dbt-semantic-interfaces dependency to compatible range #9671

Merged
