Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve tables metadata #3672

Merged
merged 6 commits into from
Jul 11, 2024
Merged

Improve tables metadata #3672

merged 6 commits into from
Jul 11, 2024

Conversation

seancolsen
Copy link
Contributor

Fixes #3671

Notes

  • I didn't implement tables.get_with_metadata (as specified in the issue). I think this is fine for now. I'm not 100% sure I'll actually need tables.get_with_metadata. I'll implement it later if I do find that I need it.

  • In 243823f I removed test_tables_meta_data_set. This is the type of test that I have advocated for not writing (in a meeting recently). I still maintain this having this sort of test in the codebase does us more harm than good at this stage. It slowed me down while making this PR. I wanted to remove the return value from set_table_meta_data and from tables.metadata.set. I don't want a return value there because I see no need for API consumers to utilize a return value. I want to keep these things as simple as possible. In removing the return value, the test broke. And I didn't have an easy way to refactor the test to fix the assertion. So I just deleted it.

Checklist

  • My pull request has a descriptive title (not a vague title like Update index.md).
  • My pull request targets the develop branch of the repository
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no visible errors.

Developer Certificate of Origin

Developer Certificate of Origin
Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

- Rename from patch to set
- Make it work when no metadata record exists yet
- Rename the metadata_dict parameter to metadata
@seancolsen seancolsen added this to the Beta milestone Jul 10, 2024
@seancolsen seancolsen added the pr-status: review A PR awaiting review label Jul 10, 2024
Copy link
Contributor

@mathemancer mathemancer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly okay, but I have a change to request:

Please remove the Table Metadata key set from the models file (and its use in the tables utilities file). It's not needed, and will inevitably get out-of-sync with either the model, or the relevant TypedDict, or both.

Comment on lines 143 to 148
table_metadata_fields = {
'import_verified',
'column_order',
'record_summary_customized',
'record_summary_template',
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed. See my other comment.

TableMetaData.objects.update_or_create(
database=Database.objects.get(id=database_id),
table_oid=table_oid,
defaults={f: v for f, v in metadata.items() if f in table_metadata_fields}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just pass the metadata dictionary directly. I can think of no cases where the behavior would be different. E.g., if you have extra keys in that dictionary, they'll be silently ignored.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently it's a bit more subtle than this. I'd still like to simplify this to match a more typical example. Given that the dictionary to be passed is documented and indeed typed, I want to avoid maintaining these fields in so many places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can think of no cases where the behavior would be different. E.g., if you have extra keys in that dictionary, they'll be silently ignored.

Here are some cases:

  • For an existing TableMetaData record, pass a metadata object like:

    {
      "table_oid": 99
    }

    The associated table is changed.

  • For an existing TableMetaData record, pass a metadata object like:

    {
      "created_at": "1999-01-01 00:00:00"
    }

    The creation timestamp is changed.

  • For a nonexistent TableMetaData record, pass a metadata object like:

    {
      "id": 99
    }

    This creates a new record with the supplied id, messing up the PostgreSQL sequence, preventing any subsequent records from being created through the normal process

For stability's sake, I would prefer that the API not allow any of the above behavior.

I'd also like to point out that this field validation was already in place before my PR, right here. All I did was move the list of fields closer to the other code that it mirrors. The reason I moved it is precisely because I share your concern about that code getting out of sync. I actually hemmed and hawed about this for a while during this PR. I tried a played around with a few different approaches, basically kind of hating all of them. I'm finding that there is really an absurd amount of duplication and boilerplate with the patterns we're using. I don't blame us for this. I blame python. For example in rpc/tables/metadata.py there are eight occurrences of record_summary_customized. To me that's crazy. It's crazy that we can't easily be more expressive about our intent with these fields.

I don't like having an additional list of metadata fields in mathesar/models/base.py either (as I originally had in this PR when I opened it). I tried two other approaches to maintaining the field validation while avoiding the need to maintain a list of fields.

  • I tried finding a way to do some runtime introspection on the TableMetaData class, to get a list of field names set on that class but not its ancestors. And I wanted to then manually ignore fields like database and table_oid. That got kind of messy and started to bump up against the limits of my immediate knowledge of Python metaprogramming. I imagine this approach is possible. I just didn't want to continue spending time trying to figure out a clean way to do this.

  • I also tried moving the TableMetaDataBlob TypedDict class lower down in the module graph. I tried putting it in mathesar/models/base.py. Then I imported it from there to mathesar/utils/tables.py so I could iterate over the fields for the sake of validation. But I couldn't figure out how to appropriate incorporate the documentation of that class into the rendered output in MkDocs. I wrestled with this for quite a while actually. Probably at least 1 hour. I read over the mkdocstrings docs trying to figure out a way to do this. I imagine it's possible. But again, I didn't want to spend more time on it.

Ultimately I chose to prioritize application stability (validating the metadata fields) and development velocity (abandoning the above approaches) at the expense of code duplication. I still think that's the right balance to strike.

Given the counter examples I posed at the start of this comment, do you still maintain that we don't need any field validation here?

Do you have any better ideas for reducing code duplication?

Given that the list of fields was already in place before this PR, do you think that this code duplication is important enough to address in this PR?

Copy link
Contributor Author

@seancolsen seancolsen Jul 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this in our technical beta meeting today and decided to take Brent's approach for now but to hopefully come back to this topic later and find a better way to validate RPC request inputs in general.

@mathemancer mathemancer self-requested a review July 11, 2024 05:32
Copy link
Contributor

@mathemancer mathemancer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, in the interest of speed and breaking things, I've made the change I suggested. @seancolsen , if you're okay with this, go ahead and merge. If not, we can go back and forth on it.

) -> TableMetaData:
def set_(
*, table_oid: int, metadata: TableMetaDataBlob, database_id: int, **kwargs
) -> TableMetaDataRecord:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
) -> TableMetaDataRecord:
) -> None:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 3396114. Thanks!

@seancolsen seancolsen added this pull request to the merge queue Jul 11, 2024
Merged via the queue into develop with commit 024c1b5 Jul 11, 2024
37 checks passed
@seancolsen seancolsen deleted the tables_metadata_improvements branch July 11, 2024 15:13
@kgodey kgodey modified the milestones: Beta, Pre-beta test build #1 Sep 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-status: review A PR awaiting review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve tables metadata
4 participants