Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Change DefaultNoneColumnMapper to use a normal set #3580

Merged
merged 3 commits into from
Jan 11, 2023

Conversation

evanh
Copy link
Member

@evanh evanh commented Jan 9, 2023

Dataset configuration doesn't support complex objects for registered classes
like this. In order for the discover entity to be migration to YAML, there needs
to be a different way to initialize this class that can be encoded in YAML.

This mapper is checking if a column name exists in the ColumnSet. That check
compares against the flattened column name stored in the ColumnSet. This can be
achieved by comparing to a normal set instead of a ColumnSet.

Blast Radius

This should only concern anyone who works with the Discover entity.
Also, nothing should change when this is merged.

Dataset configuration doesn't support complex objects for registered classes
like this. In order for the discover entity to be migration to YAML, there needs
to be a different way to initialize this class that can be encoded in YAML.

This mapper is checking if a column name exists in the ColumnSet. That check
compares against the flattened column name stored in the ColumnSet. This can be
achieved by comparing to a normal set instead of a ColumnSet.
@evanh evanh requested a review from a team as a code owner January 9, 2023 21:44
@volokluev
Copy link
Member

I'm not sure this will take us in the direction we want to go in. The problem is more that the DefaultNoneColumnMapper needs to take the columns of the two tables that it's creating a merge table of. Is the plan to just re-enumerate all those columns in the discover entity yaml?

@codecov-commenter
Copy link

codecov-commenter commented Jan 9, 2023

Codecov Report

Base: 92.20% // Head: 92.37% // Increases project coverage by +0.16% 🎉

Coverage data is based on head (580d297) compared to base (15d29cd).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3580      +/-   ##
==========================================
+ Coverage   92.20%   92.37%   +0.16%     
==========================================
  Files         733      733              
  Lines       33963    33974      +11     
==========================================
+ Hits        31317    31382      +65     
+ Misses       2646     2592      -54     
Impacted Files Coverage Δ
snuba/datasets/entities/discover.py 100.00% <ø> (ø)
snuba/clickhouse/translators/snuba/allowed.py 100.00% <100.00%> (ø)
snuba/state/__init__.py 70.64% <0.00%> (-0.46%) ⬇️
snuba/datasets/entities/metrics.py 100.00% <0.00%> (ø)
tests/datasets/configuration/test_entity_loader.py 98.55% <0.00%> (+0.16%) ⬆️
snuba/datasets/pluggable_entity.py 94.11% <0.00%> (+0.17%) ⬆️
snuba/datasets/storages/factory.py 95.95% <0.00%> (+2.02%) ⬆️
snuba/cli/__init__.py 86.36% <0.00%> (+18.18%) ⬆️
snuba/utils/streams/metrics_adapter.py 66.66% <0.00%> (+66.66%) ⬆️
snuba/cli/consumer.py 68.51% <0.00%> (+68.51%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@evanh
Copy link
Member Author

evanh commented Jan 9, 2023

Is the plan to just re-enumerate all those columns in the discover entity yaml?

That's what I was thinking. The list of transaction/event columns is going to need to be enumerated somewhere, since in theory the entities/discover.py file is going away.

That list of event/transaction specific columns is referenced in the class itself, but just as a convenience (instead of spelling out all the columns it does <list of common columns> + event columns + transactions columns.

I think the Discover entity would list out all the columns (common, event, transactions) as its schema, and then in the mappers define which ones are event specific vs. transaction specific. That will result in a lot of lines of configuration, but more closely maps to how these values are actually used.

@@ -148,7 +149,7 @@ class DefaultNoneColumnMapper(ColumnMapper):
the discover dataset file.
"""

columns: ColumnSet
columns: set[str]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Can we make this a list and then check uniqueness in the __post_init__ the yaml syntax for sets is pretty ugly
  2. Can we fix the docstring and explain that the columns list is a list of strings mapping to the column names?

@evanh evanh merged commit fa65895 into master Jan 11, 2023
@evanh evanh deleted the evanh/fix/default-none-column-mapper branch January 11, 2023 16:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants