-
-
Notifications
You must be signed in to change notification settings - Fork 362
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve tables metadata #3672
Improve tables metadata #3672
Conversation
- Rename from patch to set - Make it work when no metadata record exists yet - Rename the metadata_dict parameter to metadata
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly okay, but I have a change to request:
Please remove the Table Metadata key set from the models file (and its use in the tables utilities file). It's not needed, and will inevitably get out-of-sync with either the model, or the relevant TypedDict
, or both.
mathesar/models/base.py
Outdated
table_metadata_fields = { | ||
'import_verified', | ||
'column_order', | ||
'record_summary_customized', | ||
'record_summary_template', | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed. See my other comment.
mathesar/utils/tables.py
Outdated
TableMetaData.objects.update_or_create( | ||
database=Database.objects.get(id=database_id), | ||
table_oid=table_oid, | ||
defaults={f: v for f, v in metadata.items() if f in table_metadata_fields} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can just pass the metadata
dictionary directly. I can think of no cases where the behavior would be different. E.g., if you have extra keys in that dictionary, they'll be silently ignored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently it's a bit more subtle than this. I'd still like to simplify this to match a more typical example. Given that the dictionary to be passed is documented and indeed typed, I want to avoid maintaining these fields in so many places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can think of no cases where the behavior would be different. E.g., if you have extra keys in that dictionary, they'll be silently ignored.
Here are some cases:
-
For an existing TableMetaData record, pass a
metadata
object like:{ "table_oid": 99 }
The associated table is changed.
-
For an existing TableMetaData record, pass a
metadata
object like:{ "created_at": "1999-01-01 00:00:00" }
The creation timestamp is changed.
-
For a nonexistent TableMetaData record, pass a
metadata
object like:{ "id": 99 }
This creates a new record with the supplied id, messing up the PostgreSQL sequence, preventing any subsequent records from being created through the normal process
For stability's sake, I would prefer that the API not allow any of the above behavior.
I'd also like to point out that this field validation was already in place before my PR, right here. All I did was move the list of fields closer to the other code that it mirrors. The reason I moved it is precisely because I share your concern about that code getting out of sync. I actually hemmed and hawed about this for a while during this PR. I tried a played around with a few different approaches, basically kind of hating all of them. I'm finding that there is really an absurd amount of duplication and boilerplate with the patterns we're using. I don't blame us for this. I blame python. For example in rpc/tables/metadata.py
there are eight occurrences of record_summary_customized
. To me that's crazy. It's crazy that we can't easily be more expressive about our intent with these fields.
I don't like having an additional list of metadata fields in mathesar/models/base.py
either (as I originally had in this PR when I opened it). I tried two other approaches to maintaining the field validation while avoiding the need to maintain a list of fields.
-
I tried finding a way to do some runtime introspection on the
TableMetaData
class, to get a list of field names set on that class but not its ancestors. And I wanted to then manually ignore fields likedatabase
andtable_oid
. That got kind of messy and started to bump up against the limits of my immediate knowledge of Python metaprogramming. I imagine this approach is possible. I just didn't want to continue spending time trying to figure out a clean way to do this. -
I also tried moving the
TableMetaDataBlob
TypedDict class lower down in the module graph. I tried putting it inmathesar/models/base.py
. Then I imported it from there tomathesar/utils/tables.py
so I could iterate over the fields for the sake of validation. But I couldn't figure out how to appropriate incorporate the documentation of that class into the rendered output in MkDocs. I wrestled with this for quite a while actually. Probably at least 1 hour. I read over the mkdocstrings docs trying to figure out a way to do this. I imagine it's possible. But again, I didn't want to spend more time on it.
Ultimately I chose to prioritize application stability (validating the metadata fields) and development velocity (abandoning the above approaches) at the expense of code duplication. I still think that's the right balance to strike.
Given the counter examples I posed at the start of this comment, do you still maintain that we don't need any field validation here?
Do you have any better ideas for reducing code duplication?
Given that the list of fields was already in place before this PR, do you think that this code duplication is important enough to address in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this in our technical beta meeting today and decided to take Brent's approach for now but to hopefully come back to this topic later and find a better way to validate RPC request inputs in general.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, in the interest of speed and breaking things, I've made the change I suggested. @seancolsen , if you're okay with this, go ahead and merge. If not, we can go back and forth on it.
mathesar/rpc/tables/metadata.py
Outdated
) -> TableMetaData: | ||
def set_( | ||
*, table_oid: int, metadata: TableMetaDataBlob, database_id: int, **kwargs | ||
) -> TableMetaDataRecord: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
) -> TableMetaDataRecord: | |
) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 3396114. Thanks!
Fixes #3671
Notes
I didn't implement
tables.get_with_metadata
(as specified in the issue). I think this is fine for now. I'm not 100% sure I'll actually needtables.get_with_metadata
. I'll implement it later if I do find that I need it.In 243823f I removed
test_tables_meta_data_set
. This is the type of test that I have advocated for not writing (in a meeting recently). I still maintain this having this sort of test in the codebase does us more harm than good at this stage. It slowed me down while making this PR. I wanted to remove the return value fromset_table_meta_data
and fromtables.metadata.set
. I don't want a return value there because I see no need for API consumers to utilize a return value. I want to keep these things as simple as possible. In removing the return value, the test broke. And I didn't have an easy way to refactor the test to fix the assertion. So I just deleted it.Checklist
Update index.md
).develop
branch of the repositoryDeveloper Certificate of Origin
Developer Certificate of Origin