fix: Change DefaultNoneColumnMapper to use a normal set #3580

evanh · 2023-01-09T21:44:21Z

Dataset configuration doesn't support complex objects for registered classes
like this. In order for the discover entity to be migrated to YAML, there needs
to be a different way to initialize this class that can be encoded in YAML.

This mapper is checking if a column name exists in the ColumnSet. That check
compares against the flattened column name stored in the ColumnSet. This can be
achieved by comparing to a normal set instead of a ColumnSet.
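
To illustrate the check described above, here is a minimal sketch, assuming the flattened column names are plain strings. The function `should_default_to_none` and the column names are hypothetical and are not taken from the Snuba codebase.

```python
# Hypothetical flattened column names; the real values would come from
# the Discover entity definition.
NONE_COLUMNS: set[str] = {"group_id", "transaction_name", "tags.key"}


def should_default_to_none(column_name: str, columns: set[str]) -> bool:
    """Return True when the flattened column name is one the mapper covers.

    A plain set of strings supports the same membership test that the
    mapper performed against the flattened names stored in a ColumnSet.
    """
    return column_name in columns


print(should_default_to_none("group_id", NONE_COLUMNS))
print(should_default_to_none("timestamp", NONE_COLUMNS))
```

Because the check only needs string membership, a structure that YAML can encode directly is enough here.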

Blast Radius

This should only concern anyone who works with the Discover entity.
Also, nothing should change when this is merged.

volokluev · 2023-01-09T21:47:00Z

I'm not sure this will take us in the direction we want to go in. The problem is more that the DefaultNoneColumnMapper needs to take the columns of the two tables that it's creating a merge table of. Is the plan to just re-enumerate all those columns in the discover entity yaml?

codecov-commenter · 2023-01-09T21:48:02Z

Codecov Report

Base: 92.20% // Head: 92.37% // Increases project coverage by +0.16% 🎉

Coverage data is based on head (580d297) compared to base (15d29cd).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #3580      +/-   ##
==========================================
+ Coverage   92.20%   92.37%   +0.16%     
==========================================
  Files         733      733              
  Lines       33963    33974      +11     
==========================================
+ Hits        31317    31382      +65     
+ Misses       2646     2592      -54

Impacted Files	Coverage Δ
snuba/datasets/entities/discover.py	`100.00% <ø> (ø)`
snuba/clickhouse/translators/snuba/allowed.py	`100.00% <100.00%> (ø)`
snuba/state/__init__.py	`70.64% <0.00%> (-0.46%)`	⬇️
snuba/datasets/entities/metrics.py	`100.00% <0.00%> (ø)`
tests/datasets/configuration/test_entity_loader.py	`98.55% <0.00%> (+0.16%)`	⬆️
snuba/datasets/pluggable_entity.py	`94.11% <0.00%> (+0.17%)`	⬆️
snuba/datasets/storages/factory.py	`95.95% <0.00%> (+2.02%)`	⬆️
snuba/cli/__init__.py	`86.36% <0.00%> (+18.18%)`	⬆️
snuba/utils/streams/metrics_adapter.py	`66.66% <0.00%> (+66.66%)`	⬆️
snuba/cli/consumer.py	`68.51% <0.00%> (+68.51%)`	⬆️


evanh · 2023-01-09T22:02:03Z

> Is the plan to just re-enumerate all those columns in the discover entity yaml?

That's what I was thinking. The list of transaction/event columns is going to need to be enumerated somewhere, since in theory the entities/discover.py file is going away.

That list of event/transaction specific columns is referenced in the class itself, but just as a convenience: instead of spelling out all the columns, it does <list of common columns> + event columns + transaction columns.

I think the Discover entity would list out all the columns (common, event, transactions) as its schema, and then in the mappers define which ones are event specific vs. transaction specific. That will result in a lot of lines of configuration, but more closely maps to how these values are actually used.

volokluev · 2023-01-09T22:25:41Z

snuba/clickhouse/translators/snuba/allowed.py

@@ -148,7 +149,7 @@ class DefaultNoneColumnMapper(ColumnMapper):
    the discover dataset file.
    """

-    columns: ColumnSet
+    columns: set[str]


Can we make this a list and then check uniqueness in the __post_init__? The YAML syntax for sets is pretty ugly.

Can we fix the docstring and explain that the columns list is a list of strings mapping to the column names?
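
A minimal sketch of what this suggestion could look like, assuming a dataclass that takes a list of column-name strings and rejects duplicates at construction time. The class name `ListBackedNoneMapper` and its `matches` method are hypothetical stand-ins, not the code that was merged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ListBackedNoneMapper:
    """Hypothetical stand-in for a mapper configured from YAML.

    `columns` is a list of strings, each one a flattened column name.
    A list is used because it is easy to write in YAML; uniqueness is
    enforced here instead of relying on a set.
    """

    columns: list[str]

    def __post_init__(self) -> None:
        # Collect every name that appears more than once in the list.
        seen: set[str] = set()
        duplicates: set[str] = set()
        for name in self.columns:
            if name in seen:
                duplicates.add(name)
            seen.add(name)
        if duplicates:
            raise ValueError(f"duplicate column names: {sorted(duplicates)}")

    def matches(self, column_name: str) -> bool:
        # Membership check against the configured column names.
        return column_name in self.columns


mapper = ListBackedNoneMapper(columns=["group_id", "transaction_name"])
print(mapper.matches("group_id"))
```

Validating only in `__post_init__` keeps the dataclass frozen, since no attribute is reassigned after construction.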

evanh requested a review from a team as a code owner January 9, 2023 21:44

volokluev approved these changes Jan 9, 2023

evanh added 2 commits January 10, 2023 15:14

update docs and take a list as input

50d79ce

fix entity

580d297

evanh merged commit fa65895 into master Jan 11, 2023

evanh deleted the evanh/fix/default-none-column-mapper branch January 11, 2023 16:01
