Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ref(MDC): Represent migrations in configuration #3071

Merged
merged 5 commits into from
Sep 13, 2022

Conversation

evanh
Copy link
Member

@evanh evanh commented Aug 17, 2022

This changes the migration groups to load automatically from configuration files
and populate the various global dictionaries we are currently using.

There is a new ConfigurationLoader which reads a migration group from a config
path. That path is detected automatically. The loaders are then added to the
global list of loaders and can be accessed in the same method as the hardcoded
loaders.

It is also showing one way to solve the "Enum" problem we have with different
components of Snuba.

@lynnagara
Copy link
Member

A couple of initial thoughts/questions:

  1. What is the goal of representing individual migrations as configuration at this point? Is it due to safety concerns (running custom code)? Usability concerns with the current system? Even if we want to support this at a point, I believe we will always have a need for migrations as code, and am worried about taking this flexibility away from users. We needed code migrations a few times already such as when we rebuilt the errors table and I don't think this will go away.

  2. The migration groups mechanism seems fine to me - what is the purpose of having "kind" and "version" here though? Version at least seems like an entirely new concept, why do we need it?

@evanh
Copy link
Member Author

evanh commented Aug 19, 2022

The general consensus I'm gathering on this is: Don't make individual migrations part of the config, but making migration groups part of the configuration makes sense. I'm going to move forward with that.

what is the purpose of having "kind" and "version" here though?

These are standard fields we are adding to all the config files, so they are self describing and we can change the schemas a little easier in the future.

@lynnagara
Copy link
Member

Separate to the representation, I think there will be a few other questions we will have to answer as part of this. How will the initial system migrations (which are mandatory) be fit into this system. How will dependencies between migration groups be resolved? We currently use the OPTIONAL_MIGRATION_GROUPS mechanism for experimental product features - what is the future of this?

@fpacifici
Copy link
Contributor

Don't make individual migrations part of the config, but making migration groups part of the configuration makes sense. I'm going to move forward with that.

Should we make migration groups auto-discoverable ?
We did not do that at the beginning because there were corner cases that were not trivial to manage the sequence and potential merge conflicts in the same group.
Though in a world where we want to reduce the friction and reduce the places the product teams need to touch it may be a direction to reconsider.
We could do something like this:

  • only mention a new group in settings when they are created, or even in the dataset configuration.
  • auto-discover the migration files in the group directory
  • expect the migration files to be in sequence (keep the number at the beginning). It is something relatively easy to automatically test.

Comment on lines 48 to 66
- add_column
storage_set: generic_metrics_sets
table_name: generic_metric_sets_local
column:
name: _indexed_tags_hash
type: Array
args: { type: UInt, arg: 64 }
modifiers: { materialized: "arrayMap((k, v) -> cityHash64(concat(toString(k), '=', toString(v))), tags.key, tags.indexed_value)" }
after: null

- add_column
storage_set: generic_metrics_sets
table_name: generic_metric_sets_local
column:
name: _raw_tags_hash
type: Array
args: { type: UInt, arg: 64 }
modifiers: { materialized: "arrayMap((k, v) -> cityHash64(concat(toString(k), '=', v)), tags.key, tags.raw_value)" }
after: null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@onewland, fyi, these columns should be created on the distributed table as well. In fact they are missing in production.

@evanh evanh force-pushed the evanh/feat/migrations-as-config branch from 564b206 to d1e0ee9 Compare August 30, 2022 20:33
@evanh evanh marked this pull request as ready for review August 30, 2022 20:34
@evanh evanh requested a review from a team as a code owner August 30, 2022 20:34
@github-actions
Copy link

This PR has a migration; here is the generated SQL

CONFIG {'version': 'v1', 'kind': 'migration_group', 'name': 'generic_metrics', 'optional': True, 'migrations': ['0001_sets_aggregate_table', '0002_sets_raw_table', '0003_sets_mv', '0004_sets_raw_add_granularities', '0005_sets_replace_mv', '0006_sets_raw_add_granularities_dist_table', '0007_distributions_aggregate_table', '0008_distributions_raw_table', '0009_distributions_mv']}

@codecov-commenter
Copy link

codecov-commenter commented Aug 30, 2022

Codecov Report

Base: 92.84% // Head: 93.03% // Increases project coverage by +0.19% 🎉

Coverage data is based on head (550d0dd) compared to base (3bc37db).
Patch coverage: 96.92% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3071      +/-   ##
==========================================
+ Coverage   92.84%   93.03%   +0.19%     
==========================================
  Files         673      676       +3     
  Lines       30893    30914      +21     
==========================================
+ Hits        28683    28762      +79     
+ Misses       2210     2152      -58     
Impacted Files Coverage Δ
snuba/cli/migrations.py 40.95% <0.00%> (ø)
snuba/migrations/runner.py 91.66% <50.00%> (ø)
...ba/datasets/configuration/migration_json_schema.py 100.00% <100.00%> (ø)
snuba/datasets/configuration/migration_parser.py 100.00% <100.00%> (ø)
snuba/migrations/groups.py 95.09% <100.00%> (ø)
snuba/settings/__init__.py 94.48% <100.00%> (+0.04%) ⬆️
...ts/datasets/configuration/test_migration_groups.py 100.00% <100.00%> (ø)
tests/migrations/test_groups.py 100.00% <100.00%> (ø)
tests/migrations/test_runner.py 99.15% <100.00%> (ø)
tests/migrations/test_runner_individual.py 91.20% <100.00%> (-2.20%) ⬇️
... and 16 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

rahul-kumar-saini added a commit that referenced this pull request Sep 1, 2022
StorageKey enum replaced with a class.

Based on a working solution for MigrationGroup here:
#3071
@evanh evanh removed the do not merge label Sep 1, 2022
@evanh evanh force-pushed the evanh/feat/migrations-as-config branch from a2055e0 to 82cb0a6 Compare September 1, 2022 20:32
This changes the migration groups to load automatically from configuration files
and populate the various global dictionaries we are currently using.

There is a new `ConfigurationLoader` which reads a migration group from a config
path. That path is detected automatically. The loaders are then added to the
global list of loaders and can be accessed in the same method as the hardcoded
loaders.

It is also showing one way to solve the "Enum" problem we have with different
components of Snuba.
@evanh evanh force-pushed the evanh/feat/migrations-as-config branch from 82cb0a6 to d990750 Compare September 1, 2022 20:33
snuba/datasets/configuration/migration_parser.py Outdated Show resolved Hide resolved
snuba/migrations/groups.py Outdated Show resolved Hide resolved
Comment on lines +16 to +29
assert generic_metrics_loader.get_migrations() == [
"0001_sets_aggregate_table",
"0002_sets_raw_table",
"0003_sets_mv",
"0004_sets_raw_add_granularities",
"0005_sets_replace_mv",
"0006_sets_raw_add_granularities_dist_table",
"0007_distributions_aggregate_table",
"0008_distributions_raw_table",
"0009_distributions_mv",
]

m = generic_metrics_loader.load_migration("0005_sets_replace_mv")
assert isinstance(m, Migration)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since migrations are likely to be added, I would just iterate over them and load_migration in order to make sure that they are all indeed migrations. Maybe assert that the len(migrations) > 5 or something.
This way when another one is added, you won't need to change the test

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to leave this because I think we should have a test like this. I do agree though that it will be annoying for generic metrics to have to update this test as well. I think in the future we maybe will want a dummy dataset we can do these kinds of tests on.

Copy link
Member

@volokluev volokluev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all my comments are not required for merging

@@ -73,6 +50,27 @@ def load_migration(self, migration_id: str) -> Migration:
raise MigrationDoesNotExist("Invalid migration ID")


class ConfigurationLoader(DirectoryLoader):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should inherit from the base GroupLoader not DirectoryLoader which supposed to be a completely separate implementation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this because otherwise I'd have to copy the load_migration function over.

@lynnagara
Copy link
Member

Please also update MIGRATIONS.md

@lynnagara
Copy link
Member

lynnagara commented Sep 9, 2022

Reposting an earlier comment since it was marked resolved without a response. I'm still interested in your thoughts on this. If you have other reasons for keeping this interface I'll be happy to hear them but the only reason stated is to avoid changing a few spots in the codebase and that just doesn't seem right to me.

I don't think this enum like interface is the right one anymore since Migration groups can be defined on the fly. It made sense previously because all of the members were defined in code. The rest of the codebase never references MigrationGroup (only a handful of places within the migrations module) so there are not that many places to change.

This deletes the Enum and removes the need for the metaclass.
@evanh evanh merged commit ae89a81 into master Sep 13, 2022
@evanh evanh deleted the evanh/feat/migrations-as-config branch September 13, 2022 17:41
evanh added a commit that referenced this pull request Nov 2, 2022
It's not clear that migrations belong in configuration. This was already an
issue from the previous work, since the migrations themselves are still part of
the code. There is now a separate effort to figure out where migrations should
live, so reverting these changes in the mean time to avoid having things in a
half finished state.

Before state:

The generic metrics MigrationGroup was defined in configuration, and the groups
code was loading that configuration. The MigrationGroup was a string to allow
for groups being defined outside of Python code.

After state:

All MigrationGroups are defined in the python file, and MigrationGroup has been
changed back to an Enum.

This reverts commit ae89a81
evanh added a commit that referenced this pull request Nov 3, 2022
* Revert "ref(MDC): Represent migrations in configuration (#3071)"

It's not clear that migrations belong in configuration. This was already an
issue from the previous work, since the migrations themselves are still part of
the code. There is now a separate effort to figure out where migrations should
live, so reverting these changes in the mean time to avoid having things in a
half finished state.

Before state:

The generic metrics MigrationGroup was defined in configuration, and the groups
code was loading that configuration. The MigrationGroup was a string to allow
for groups being defined outside of Python code.

After state:

All MigrationGroups are defined in the python file, and MigrationGroup has been
changed back to an Enum.

This reverts commit ae89a81
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants