-
-
Notifications
You must be signed in to change notification settings - Fork 37.7k
New indexes for states and recording_runs tables #6688
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 4 commits
894699b
f9a0f23
77c007d
7d7c4e4
38aa348
d397ccb
c8e37e9
b3b1360
37e3e5d
7cd55c9
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -41,20 +41,39 @@ def _apply_update(engine, new_version): | |
| from sqlalchemy import Table | ||
| from . import models | ||
|
|
||
| if new_version == 1: | ||
| def create_index(table_name, column_name): | ||
| """Create an index for the specified table and column.""" | ||
| table = Table(table_name, models.Base.metadata) | ||
| name = "_".join(("ix", table_name, column_name)) | ||
| # Look up the index object that was created from the models | ||
| index = next(idx for idx in table.indexes if idx.name == name) | ||
| _LOGGER.debug("Creating index for table %s column %s", | ||
| table_name, column_name) | ||
| index.create(engine) | ||
| _LOGGER.debug("Index creation done for table %s column %s", | ||
| table_name, column_name) | ||
| def create_index(table_name, column_name): | ||
| """Create an index for the specified table and column.""" | ||
| table = Table(table_name, models.Base.metadata) | ||
| name = "_".join(("ix", table_name, column_name)) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What about something like this to combine these into one function? The pattern is usually called def create_index(table_name, *column_names):
name = "_".join("_".join("ix", table_name), column_names)
...Then it could be used for both index types like this create_index("recorder_runs", "start", "end")
create_index("states", "last_updated")
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So what happens when we load an old database with new models objects, is that the actual Index sqlalchemy metadata objects get initialized with all the parameters we used in models.py, but the index isn't created in the underlying database engine. These indexes are present in the next is a python built-in that returns the next item from an iterator. Since we create a new iterator by searching through table.indexes for idx.name == name, there's only going to be one index in that iterator. What that means is that we're getting the metadata object for the index that was created by models.py. Once we have the reference all we have to do is call index.create. |
||
| # Look up the index object that was created from the models | ||
| index = next(idx for idx in table.indexes if idx.name == name) | ||
| _LOGGER.debug("Creating index for table %s column %s", | ||
| table_name, column_name) | ||
| index.create(engine) | ||
| _LOGGER.debug("Index creation done for table %s column %s", | ||
| table_name, column_name) | ||
|
|
||
| def create_compound_index(table_name, column_names): | ||
| """Create an index for the specified table and columns.""" | ||
| table = Table(table_name, models.Base.metadata) | ||
| index_name = "ix_" + table_name + "_" + "_".join(column_names) | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Prefer string formatting over concatenation. index_name = "ix_{}_{}".format(table_name, "_".join(…))Although, actually I prefer @armills approach better. Extract a function: def _generate_name(*args):
"""Generate a name based on arguments."""
return "_".join(args)
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Per my comment about migration.py I think in order to do that we'd have to split out column names from extra arguments (e.g. unique=True). I generally prefer using something like field1_field2_idx for non-unique, field1_field2_uq for unique and field1_fk for indexes required for foreign keys (I think that may be MySQL dependent though). Otherwise, the column names could get kinda wonky. The other weird bit I ran into I'm not sure how to solve is the column name had to match what was in models.py. if it didn't, HA would get confused and fail (I'm not sure exactly why though). So I wonder if there was a way to either grab the index name from the models.py or have a function that generated the name in both places?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I described the procedure in the comment above, but basically the only thing this function is doing is looking up the index object by name. It isn't creating it or specifying any arguments such as unique that will be used by the index. The reason we are generating the name here is so that it will match the name generated by sqlalchemy for
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 💡 Aha ok that helps explain things quite a bit! If it's ok, I'll probably wrap the above explanation around some in-line comments since that might be helpful to future folks trying to figure out how that works.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think expanding the documentation never hurts. Obviously we should leave out
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm in terms of the index naming, would it make more sense to use literals here? If we have to manually create the index name on models.py, it seems like it might be easier to do that here? So something like I'm normally not a fan of literals like that, but unless there is a way to auto-generate an index name in models.py, it might keep things a bit less error-prone for folks that don't know about the auto-naming bits going on in migration.py?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm OK with that myself. We should have one function that takes the table and index name as it's inputs. We can then have a separate function like balloob described that can generate index names for single column indexes, which gets passed into the first function.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah good point, we could. Actually I wonder if we can reuse whatever function SQLAlchemy uses for that (I pulled up the docs to figure out how it names its indexes when using Index=True but it just provided the naming convention). For more clarity, here's what I've done based on the above: |
||
| # Look up the index object that was created from the models | ||
| _LOGGER.debug("Looking up index for table %s", table_name) | ||
| index = next(idx for idx in table.indexes if idx.name == index_name) | ||
| _LOGGER.debug("Creating %s index for table %s", | ||
| index_name, table_name) | ||
| index.create(engine) | ||
| _LOGGER.debug("Done creating %s index for table %s", | ||
| index_name, table_name) | ||
|
|
||
| if new_version == 1: | ||
| create_index("events", "time_fired") | ||
| elif new_version == 2: | ||
| # Create compound start/end index for recorder_runs | ||
| create_compound_index("recorder_runs", ("start", "end")) | ||
| # Create indexes for states | ||
| create_index("states", "last_updated") | ||
| create_index("states", "created") | ||
|
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I managed to get a compound index in here for recorder_runs. It's not a huge table (at least in my own DB) relative to states, but was causing a non-indexed table-scan. I really wanted to convert the create_index function to handle single or multiple indexes, but given how the Index constructor works, I was having a heck of a time. In either case, not sure how pruned indexes (e.g. I mention that because some of the types within tables ideally should be an ENUM for performance, but from the standpoint of flexibility are varchars. A reasonable compromise for that is to use a small pruned index which saves space but still gives decent cardinality. That's outside the scope of this PR, but sort of explaining my thought process here.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The only thing
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Took me a while to figure it out, but you're right that does work. I'll be submitting an update to the PR shortly. |
||
| else: | ||
| raise ValueError("No schema migration defined for version {}" | ||
| .format(new_version)) | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move these methods out of the
_apply_updatemethod?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd defer to @armills on that one since my PR uses mostly what was already there. I would imagine these functions would only be useful during a migration unless things were getting very complex (say by creating a temporary table and creating an index on that for later use) since, otherwise, the real places index definitions live is in the models?
The HA approach to migrations is pretty new to me so I'm still wrapping my head around it, to be fair.