ref(MDC): Represent migrations in configuration #3071
A couple of initial thoughts/questions:
The general consensus I'm gathering on this is: Don't make individual migrations part of the config, but making migration groups part of the configuration makes sense. I'm going to move forward with that.
These are standard fields we are adding to all the config files, so they are self-describing and we can change the schemas more easily in the future.
Separate from the representation, I think there are a few other questions we will have to answer as part of this. How will the initial system migrations (which are mandatory) fit into this system? How will dependencies between migration groups be resolved? We currently use the OPTIONAL_MIGRATION_GROUPS mechanism for experimental product features; what is the future of this?
Should we make migration groups auto-discoverable?
```yaml
- add_column:
    storage_set: generic_metrics_sets
    table_name: generic_metric_sets_local
    column:
      name: _indexed_tags_hash
      type: Array
      args: { type: UInt, arg: 64 }
      modifiers: { materialized: "arrayMap((k, v) -> cityHash64(concat(toString(k), '=', toString(v))), tags.key, tags.indexed_value)" }
    after: null

- add_column:
    storage_set: generic_metrics_sets
    table_name: generic_metric_sets_local
    column:
      name: _raw_tags_hash
      type: Array
      args: { type: UInt, arg: 64 }
      modifiers: { materialized: "arrayMap((k, v) -> cityHash64(concat(toString(k), '=', v)), tags.key, tags.raw_value)" }
    after: null
```
@onewland, FYI, these columns should be created on the distributed table as well. In fact, they are missing in production.
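One way to keep local and distributed tables from drifting apart is to generate both operations from a single column spec. A minimal sketch, assuming operations are plain dicts shaped like the YAML above; the helper `add_column_everywhere` and the rule that the distributed table name swaps `_local` for `_dist` are hypothetical, not Snuba's naming convention.

```python
import copy
from typing import Any, Dict, List


def add_column_everywhere(
    storage_set: str, local_table: str, column: Dict[str, Any]
) -> List[Dict[str, Any]]:
    """Build add_column operations for a local table and its distributed table.

    Hypothetical naming rule: the distributed table name swaps the
    "_local" suffix for "_dist".
    """
    suffix = "_local"
    if not local_table.endswith(suffix):
        raise ValueError(f"expected a *_local table, got {local_table!r}")
    dist_table = local_table[: -len(suffix)] + "_dist"
    return [
        {
            "operation": "add_column",
            "storage_set": storage_set,
            "table_name": table,
            # Deep copy so the two operations never share nested dicts.
            "column": copy.deepcopy(column),
            "after": None,
        }
        for table in (local_table, dist_table)
    ]


ops = add_column_everywhere(
    "generic_metrics_sets",
    "generic_metric_sets_local",
    {"name": "_raw_tags_hash", "type": "Array", "args": {"type": "UInt", "arg": 64}},
)
print([op["table_name"] for op in ops])
# ['generic_metric_sets_local', 'generic_metric_sets_dist']
```

Whether the distributed column should also carry the `materialized` modifier depends on how the tables are set up; the sketch copies the column spec unchanged.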
Branch updated from 564b206 to d1e0ee9.
This PR has a migration; here is the generated SQL:

```
CONFIG {'version': 'v1', 'kind': 'migration_group', 'name': 'generic_metrics', 'optional': True, 'migrations': ['0001_sets_aggregate_table', '0002_sets_raw_table', '0003_sets_mv', '0004_sets_raw_add_granularities', '0005_sets_replace_mv', '0006_sets_raw_add_granularities_dist_table', '0007_distributions_aggregate_table', '0008_distributions_raw_table', '0009_distributions_mv']}
```
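The config above carries the standard self-describing `version` and `kind` fields alongside the group's own data. A minimal sketch of validating such a dict into a typed object before it is registered; `MigrationGroupConfig`, `parse_migration_group`, and the ordering rule are illustrative assumptions, not code from this PR.

```python
from dataclasses import dataclass
from typing import Any, List, Mapping


@dataclass(frozen=True)
class MigrationGroupConfig:
    """Hypothetical typed view of a `kind: migration_group` config."""

    name: str
    optional: bool
    migrations: List[str]


def parse_migration_group(raw: Mapping[str, Any]) -> MigrationGroupConfig:
    # Check the standard header first so that an unknown kind or schema
    # version fails loudly instead of being misread.
    if raw.get("kind") != "migration_group":
        raise ValueError(f"unexpected kind: {raw.get('kind')!r}")
    if raw.get("version") != "v1":
        raise ValueError(f"unsupported version: {raw.get('version')!r}")
    migrations = list(raw["migrations"])
    # Assumption: IDs carry a zero-padded numeric prefix, so the listed
    # order must match sorted order and no ID may repeat.
    if len(set(migrations)) != len(migrations):
        raise ValueError("duplicate migration IDs")
    if migrations != sorted(migrations):
        raise ValueError("migrations must be listed in apply order")
    return MigrationGroupConfig(
        name=raw["name"],
        optional=bool(raw.get("optional", False)),
        migrations=migrations,
    )


CONFIG = {
    "version": "v1",
    "kind": "migration_group",
    "name": "generic_metrics",
    "optional": True,
    "migrations": [
        "0001_sets_aggregate_table",
        "0002_sets_raw_table",
        "0003_sets_mv",
        "0004_sets_raw_add_granularities",
        "0005_sets_replace_mv",
        "0006_sets_raw_add_granularities_dist_table",
        "0007_distributions_aggregate_table",
        "0008_distributions_raw_table",
        "0009_distributions_mv",
    ],
}
group = parse_migration_group(CONFIG)
print(group.name, group.optional, len(group.migrations))  # generic_metrics True 9
```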
Codecov Report: Base: 92.84% // Head: 93.03% // Increases project coverage by +0.19%.

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##           master    #3071      +/-   ##
==========================================
+ Coverage   92.84%   93.03%   +0.19%
==========================================
  Files         673      676       +3
  Lines       30893    30914      +21
==========================================
+ Hits        28683    28762      +79
+ Misses       2210     2152      -58
```
StorageKey enum replaced with a class. Based on a working solution for MigrationGroup here: #3071
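Replacing an `Enum` with a plain class matters because an `Enum`'s members are fixed when the class body runs, so a key that exists only in a configuration file cannot be looked up (`StorageKeyEnum("...")` raises `ValueError`). A minimal sketch of a class-based key backed by a registry, assuming keys are plain strings registered as configuration is loaded; `register`, `get`, and the example values are illustrative, not Snuba's actual `StorageKey`.

```python
from enum import Enum
from typing import ClassVar, Dict


class StorageKeyEnum(Enum):
    # Enum members are fixed when the class body runs.
    EVENTS = "events"


class StorageKey:
    """Hypothetical class-based key whose values can be registered at runtime."""

    _registry: ClassVar[Dict[str, "StorageKey"]] = {}

    def __init__(self, value: str) -> None:
        self.value = value

    @classmethod
    def register(cls, value: str) -> "StorageKey":
        # Reuse an existing instance so each value maps to exactly one
        # object, which keeps Enum-like identity comparisons (`is`) working.
        if value not in cls._registry:
            cls._registry[value] = cls(value)
        return cls._registry[value]

    @classmethod
    def get(cls, value: str) -> "StorageKey":
        # Mirror Enum's behaviour of raising ValueError for unknown values.
        try:
            return cls._registry[value]
        except KeyError:
            raise ValueError(f"unknown storage key: {value!r}") from None

    def __repr__(self) -> str:
        return f"StorageKey({self.value!r})"


StorageKey.register("events")
# A key that might come from a configuration file rather than Python code.
sets_key = StorageKey.register("generic_metrics_sets")
assert StorageKey.get("generic_metrics_sets") is sets_key

try:
    StorageKeyEnum("generic_metrics_sets")
except ValueError:
    print("Enum lookup fails for a value defined outside the class body")
```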
Branch updated from a2055e0 to 82cb0a6.
This changes the migration groups to load automatically from configuration files and populate the various global dictionaries we are currently using. There is a new `ConfigurationLoader` which reads a migration group from a config path. That path is detected automatically. The loaders are then added to the global list of loaders and can be accessed in the same way as the hardcoded loaders. It also shows one way to solve the "Enum" problem we have with different components of Snuba.
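A rough sketch of that flow: discover config files in a directory, build one loader per file, and register each in the same global mapping as a hardcoded loader so callers cannot tell them apart. Everything here is hypothetical: `GroupLoader`, `HardcodedLoader`, `LOADERS`, `discover_config_loaders`, and the `system` entry are illustrative names, and the files are JSON rather than YAML only so the example needs nothing outside the standard library.

```python
import json
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, List, Sequence


class GroupLoader(ABC):
    """Stand-in for the loader interface: lists a group's migration IDs."""

    @abstractmethod
    def get_migrations(self) -> List[str]:
        """Return migration IDs in apply order."""


class HardcodedLoader(GroupLoader):
    def __init__(self, migrations: Sequence[str]) -> None:
        self._migrations = list(migrations)

    def get_migrations(self) -> List[str]:
        return list(self._migrations)


class ConfigurationLoader(GroupLoader):
    """Hypothetical loader that reads one migration group from a config file."""

    def __init__(self, config_path: Path) -> None:
        raw = json.loads(config_path.read_text())
        if raw.get("kind") != "migration_group":
            raise ValueError(f"{config_path} is not a migration_group config")
        self.name: str = raw["name"]
        self._migrations: List[str] = list(raw["migrations"])

    def get_migrations(self) -> List[str]:
        return list(self._migrations)


# Global registry shared by hardcoded and configuration-backed loaders.
LOADERS: Dict[str, GroupLoader] = {"system": HardcodedLoader(["0001_migrations"])}


def discover_config_loaders(config_dir: Path) -> None:
    """Register a ConfigurationLoader for every *.json file in config_dir."""
    for path in sorted(config_dir.glob("*.json")):
        loader = ConfigurationLoader(path)
        # Refuse to silently shadow a group that is already registered.
        if loader.name in LOADERS:
            raise ValueError(f"duplicate migration group: {loader.name}")
        LOADERS[loader.name] = loader
```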
Branch updated from 82cb0a6 to d990750.
```python
assert generic_metrics_loader.get_migrations() == [
    "0001_sets_aggregate_table",
    "0002_sets_raw_table",
    "0003_sets_mv",
    "0004_sets_raw_add_granularities",
    "0005_sets_replace_mv",
    "0006_sets_raw_add_granularities_dist_table",
    "0007_distributions_aggregate_table",
    "0008_distributions_raw_table",
    "0009_distributions_mv",
]

m = generic_metrics_loader.load_migration("0005_sets_replace_mv")
assert isinstance(m, Migration)
```
Since migrations are likely to be added, I would just iterate over them and call `load_migration` on each to make sure they are all indeed migrations. Maybe assert that `len(migrations) > 5` or something. This way, when another one is added, you won't need to change the test.
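The reviewer's suggestion could look roughly like this: iterate over whatever `get_migrations()` returns, load each entry, and assert only a lower bound on the count. A self-contained sketch; `Migration`, `FakeLoader`, `check_all_migrations_load`, and the threshold of 5 are stand-ins for the real Snuba types and data, not its API.

```python
from typing import Dict, List


class Migration:
    """Stand-in for the migration base class."""


class FakeLoader:
    """Hypothetical loader exposing the two methods the test relies on."""

    def __init__(self, migrations: Dict[str, Migration]) -> None:
        self._migrations = migrations

    def get_migrations(self) -> List[str]:
        return list(self._migrations)

    def load_migration(self, migration_id: str) -> Migration:
        return self._migrations[migration_id]


def check_all_migrations_load(loader: FakeLoader, minimum: int = 5) -> int:
    migration_ids = loader.get_migrations()
    # A lower bound instead of an exact list, so adding a migration
    # does not require editing the test.
    assert len(migration_ids) > minimum
    for migration_id in migration_ids:
        assert isinstance(loader.load_migration(migration_id), Migration)
    return len(migration_ids)


loader = FakeLoader({f"{i:04d}_example": Migration() for i in range(1, 10)})
print(check_all_migrations_load(loader))  # 9
```

The trade-off raised in the reply still applies: an exact list also catches a migration being dropped or reordered, which a lower bound does not.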
I'm going to leave this because I think we should have a test like this. I do agree, though, that it will be annoying for generic metrics to have to update this test as well. In the future, we may want a dummy dataset we can run these kinds of tests on.
None of my comments are required for merging.
@@ -73,6 +50,27 @@ def load_migration(self, migration_id: str) -> Migration:

```python
        raise MigrationDoesNotExist("Invalid migration ID")


class ConfigurationLoader(DirectoryLoader):
```
I think this should inherit from the base `GroupLoader`, not `DirectoryLoader`, which is supposed to be a completely separate implementation.
I did this because otherwise I'd have to copy the `load_migration` function over.
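One way to satisfy both points in this thread (extend the base `GroupLoader`, but don't copy `load_migration`) is to move the shared loading logic into the base class so subclasses only decide which migration IDs exist. A minimal sketch: this `GroupLoader` constructor, the `module_path` parameter, and the import-one-module-per-migration scheme are assumptions, reusing only the `MigrationDoesNotExist("Invalid migration ID")` error from the diff hunk; they are not Snuba's actual classes.

```python
import importlib
from abc import ABC, abstractmethod
from typing import Any, List


class MigrationDoesNotExist(Exception):
    pass


class GroupLoader(ABC):
    """Hypothetical base class that owns the shared load_migration logic."""

    def __init__(self, module_path: str) -> None:
        # Dotted package path containing one module per migration ID.
        self._module_path = module_path

    @abstractmethod
    def get_migrations(self) -> List[str]:
        """Return migration IDs in apply order; subclasses decide the source."""

    def load_migration(self, migration_id: str) -> Any:
        if migration_id not in self.get_migrations():
            raise MigrationDoesNotExist("Invalid migration ID")
        module = importlib.import_module(f"{self._module_path}.{migration_id}")
        # Assumption: each migration module defines a class named Migration.
        return module.Migration()


class ConfigurationLoader(GroupLoader):
    """Takes migration IDs from parsed configuration; inherits load_migration."""

    def __init__(self, module_path: str, migrations: List[str]) -> None:
        super().__init__(module_path)
        self._migrations = list(migrations)

    def get_migrations(self) -> List[str]:
        return list(self._migrations)
```

A directory-based loader would subclass `GroupLoader` the same way and implement only `get_migrations`, so neither concrete loader needs to inherit from the other.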
Please also update `MIGRATIONS.md`.
Reposting an earlier comment since it was marked resolved without a response. I'm still interested in your thoughts on this. If you have other reasons for keeping this interface I'll be happy to hear them but the only reason stated is to avoid changing a few spots in the codebase and that just doesn't seem right to me.
This deletes the Enum and removes the need for the metaclass.
Revert "ref(MDC): Represent migrations in configuration (#3071)"

It's not clear that migrations belong in configuration. This was already an issue with the previous work, since the migrations themselves are still part of the code. There is now a separate effort to figure out where migrations should live, so we are reverting these changes in the meantime to avoid leaving things in a half-finished state.

Before state: The generic metrics MigrationGroup was defined in configuration, and the groups code loaded that configuration. MigrationGroup was a string, to allow groups to be defined outside of Python code.

After state: All MigrationGroups are defined in the Python file, and MigrationGroup has been changed back to an Enum.

This reverts commit ae89a81.