
Modern rpc prototype #3524

Merged
merged 26 commits into from
Apr 15, 2024

Conversation

mathemancer
Contributor

@mathemancer mathemancer commented Apr 3, 2024

Fixes #3519
Fixes #3518

Summary

This adds a new RPC endpoint, available at /rpc/. At the moment, the endpoint is available only when logged in, and the functions are available only for superusers. The functions are:

  • connections.create_from_known_connection, and
  • connections.create_from_scratch.
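As a rough sketch, a client would invoke one of these functions by POSTing a JSON-RPC 2.0 request body to `/rpc/`. The payload below is illustrative only; the parameter values are made up, and the exact parameter list is an assumption, not the final API:

```python
import json

# Illustrative JSON-RPC 2.0 request body for the /rpc/ endpoint.
# The method name comes from this PR; the params are hypothetical examples.
request_body = {
    "jsonrpc": "2.0",
    "id": 0,
    "method": "connections.create_from_scratch",
    "params": {
        "nickname": "my_connection",
        "db_name": "my_db",
    },
}

# This string would be the body of an authenticated POST to /rpc/.
payload = json.dumps(request_body)
print(payload)
```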

Technical details

  • The functions are documented via auto-generated docs (created from their docstrings) in the new documentation page at /user-guide/rpc-functions/.

  • I elected to go against my previous plan and use django-modern-rpc.

    • This seems to be more actively maintained, and
    • This integrates with Django auth.
  • Unfortunately, this meant the auto-generated docs were not good enough. So, I decided to use the mkdocs plugin to include RPC function docs in the docs site instead.

  • This PR disables the check that prevents type hints!!!

    • This was necessary to get the documentation generated automatically.
    • Another option is to use the Sphinx (rather than Google) docstring style for the RPC functions, but I find the Google docstring style much more readable. However, the Sphinx style is better for documenting types without type hinting.
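For illustration, a Google-style docstring on an RPC function might look like the sketch below. The function name, parameters, and return description are hypothetical, not the actual code from this PR:

```python
def create_from_scratch(nickname: str, db_name: str, create_db: bool = False) -> int:
    """Create a new connection from scratch.

    Hypothetical sketch showing the Google docstring style discussed above.

    Args:
        nickname: A human-readable name for the connection.
        db_name: The name of the database to connect to.
        create_db: Whether to create the database if it does not exist.

    Returns:
        The ID of the new connection.
    """
    raise NotImplementedError
```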

Additional context

Checklist

  • My pull request has a descriptive title (not a vague title like Update index.md).
  • My pull request targets the develop branch of the repository.
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no
    visible errors.

Developer Certificate of Origin

Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@mathemancer mathemancer requested a review from Anish9901 April 3, 2024 02:15
@mathemancer
Contributor Author

@Anish9901 if you're far enough along on figuring out testing for these, go ahead and add a couple tests. Otherwise, I think we should merge this untested (I did check manually to make sure things were working).

@seancolsen @pavish Take a look and see what you think of:

  • The style of the docs
  • The error responses when parameters are missing
  • The error responses when you call the functions in some illegal manner (e.g., violating a uniqueness constraint on DB keys)

@kgodey It might be worth a glance from you, since this is the initial PR that will define how the RPC endpoint looks and works.

@mathemancer
Contributor Author

mathemancer commented Apr 3, 2024

This is ready to review, but I'm leaving it as a draft until @Anish9901 replies with an opinion on whether to add tests into this PR.

@mathemancer mathemancer added the pr-status: review A PR awaiting review label Apr 3, 2024
@kgodey kgodey self-assigned this Apr 3, 2024
@Anish9901
Member

I'll add some tests.

@@ -0,0 +1,23 @@
# Using the RPC endpoint
Contributor

@kgodey kgodey Apr 3, 2024


This is likely to confuse the intended audience of the "User Guide" section. I would either put this in the wiki (audience: developers) or create a "Developer Guide" or "API" section to put this in.

Contributor


I had the same thought @kgodey. I pushed f2433eb to fix this.

Contributor

@seancolsen seancolsen left a comment


This is a great start and it's getting me excited about this new API architecture. I really like it!

Since this is the first RPC PR and we're seeking to establish patterns, I'm being a bit scrupulous with this review.

The auto-generated API docs are awesome! Thanks for figuring that out. I pushed f2433eb to improve some minor things there.

Critique

Standard verbs

The standard verbs we plan to use within method names are still an open question. This PR seems to suggest (at least tentatively) that we'd use "create" as a standard verb. I have a slight preference for standardizing on "add" instead, simply because it's shorter. I think it would be nice for us to decide on those standard verbs before merging this PR, and to get input from other team members in doing so.

My personal preference for standard verbs is: list, get, add, patch, replace, delete.

Objects as request parameters

I notice that the request schema is flattened out a bit for these methods compared to what we had in the REST API. I'm curious what the rationale is for this flattening. In particular, there's one change I would suggest making...

For the create_from_known_connection method, one way to describe the current request schema would be with the following TypeScript:

type CreateFromKnownConnection_Params = {
  nickname: string;
  db_name: string;
  create_db: boolean;
  connection_type: 'internal_database' | 'user_database';
  connection_id?: number;
  sample_data: ('library_management' | 'movie_collection')[];
};

But at the type level, there's a shortcoming with the above type definition. As noted in the API documentation, connection_id is required when connection_type is 'user_database' and ignored otherwise. In general, I try to encode requirements like this in the type system when possible because it helps us reduce front end bugs. We can describe that interdependence of properties by altering the TypeScript to be:

type Base = {
  nickname: string;
  db_name: string;
  create_db: boolean;
  sample_data: ('library_management' | 'movie_collection')[];
};
type CreateFromKnownConnection_Params =
  | (Base & {
      connection_type: 'internal_database';
    })
  | (Base & {
      connection_type: 'user_database';
      connection_id: number;
    });

What we've done here is make the entire request type a union of two types at the top level. This is pretty ugly, though. And it gets much worse if that top level actually contains more than one such interdependence of properties.

What I'd prefer to do is to change the request schema by adding some nesting. The TypeScript would be:

type CreateFromKnownConnection_Params = {
  nickname: string;
  db_name: string;
  create_db: boolean;
  connection:
    | {
        type: 'internal_database';
      }
    | {
        type: 'user_database';
        id: number;
      };
  sample_data: ('library_management' | 'movie_collection')[];
};

With this schema the connection property can either be:

  •   {
        "type": "internal_database"
      }

    Or something like:

  •   {
        "type": "user_database",
        "id": 42
      }

This change would introduce a function parameter that in JSON is an object. I don't see such a function parameter anywhere in this PR and I'm a little bit curious if your choice of flattening parameters is due to your reluctance to support object-type parameters. My guess is that we'll run into the need for object parameters at some point and I'm wondering how we'd handle that within Python. Would such a parameter become a NamedTuple? A TypedDict? A dataclass? How would it appear in the auto-generated docs?

I don't feel too strongly about the change I'm proposing above. But if you're open to pursuing it, then it might actually be a good opportunity to establish a pattern for object-type parameters going forward.
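One hedged answer to the "NamedTuple? TypedDict? dataclass?" question above: the JSON-RPC layer would hand the backend a plain dict, which a small dataclass could validate and wrap on the Python side. All names here are hypothetical, for illustration only:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConnectionRef:
    """Hypothetical wrapper for the nested `connection` request parameter."""

    type: str                 # 'internal_database' or 'user_database'
    id: Optional[int] = None  # required only when type is 'user_database'

    @classmethod
    def from_dict(cls, raw: dict) -> "ConnectionRef":
        # The RPC layer delivers a dict; validate the interdependence here.
        if raw["type"] == "user_database" and "id" not in raw:
            raise TypeError('"id" is required when type is "user_database"')
        return cls(type=raw["type"], id=raw.get("id"))


internal = ConnectionRef.from_dict({"type": "internal_database"})
user = ConnectionRef.from_dict({"type": "user_database", "id": 42})
```

A wrapper like this keeps the validation logic in one place, though (as discussed below) the type hint in the function signature would still describe the dataclass rather than the dict actually received.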

Extensible return values

The methods added in this PR both return primitive numbers, like this:

42

As a rule of thumb, I suggest wrapping such primitives in objects in most cases:

{
  "id": 42
}

Rationale: Later we might decide that we need to return more information here. For example we might want to indicate whether the user database was newly created or whether it already existed. If we begin the API by returning a primitive number, then any addition of return value becomes a breaking API change. With an object, we can add more properties in a backwards compatible manner.

Note: This is not to suggest that I think return values must absolutely never be primitives. I can imagine an RPC function that returns the nth prime number. That's a case where I think it might be appropriate to favor simplicity over extensibility. But my hunch is that for Mathesar any time we're returning something we probably ought to wrap it in an object to guard against breaking changes.

It's also worth mentioning that I'd be inclined to return null for most (all?) of the methods that delete things. I think we ought to think of null return values like void functions. Those would be functions where we intend for the client to make no use of the return value. And in those cases we could transition from a null return value to a non-null return value without making a breaking change.
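The rule of thumb above can be sketched as follows. Both functions are hypothetical stand-ins, and the `already_existed` key is an imagined future addition, not part of this PR:

```python
def add_connection_v1(nickname: str) -> dict:
    # Wrapping the primitive in an object from day one...
    return {"id": 42}


def add_connection_v2(nickname: str) -> dict:
    # ...lets us add properties later without breaking old clients,
    # which still only read result["id"].
    return {"id": 42, "already_existed": False}
```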

Return values compatible with REST APIs

Say the front end calls connections.create_from_known_connection with these params:

{
  "nickname": "apple",
  "db_name": "apple",
  "create_db": true,
  "connection_type": "internal_database",
  "sample_data": []
}
  • In our old REST API, the front end was previously getting a response like this:

    {
      "id": 42,
      "nickname": "apple",
      "database": "apple",
      "supported_types_url": "http://localhost:8000/api/ui/v0/connections/42/types/",
      "username": "mathesar",
      "host": "mathesar_dev_db",
      "port": 5432
    }
  • With the "Extensible return values" change implemented, the front end would only get this:

    {
      "id": 42
    }

What's difficult about this change is that the front end is currently relying on response properties like "port" which are not necessarily present in the request. Removing those properties is going to be a small headache for me. To be clear: the front end does have that information — it's just higher up the call stack. It would be possible for the front end to adapt to a partial return value, but it would be more work.

In general I support RPC API methods that return partial or minimal information. But given that the scope of this API transition is quite large, I would be inclined to err on the side of returning a response that is more or less compatible with the response of the equivalent REST endpoint, so long as that's relatively easy for the backend. We might need to take this decision on a case by case basis though in pursuit of the fastest approach across the whole stack.

For new RPC methods (that don't serve to replace REST endpoints), I'd be inclined to err on the side of simplicity and minimalism.

@seancolsen seancolsen removed their assignment Apr 3, 2024
@seancolsen
Contributor

Also, I made a couple edits to the wiki to bring the RPC transition plan into conformance with this PR (e.g. django-modern-rpc and dots in method names).


Mathesar has a REST API that the front end uses to interact with the backend.

For Mathesar's beta release, we are actively transitioning to a new [RPC-style API](https://wiki.mathesar.org/projects/2024/architecture-transition/rpc/) and will soon be phasing out the REST API entirely.
Contributor


@mathemancer are we going to be entirely removing the REST API?

Contributor Author


Eventually, that's the goal. I doubt we'll achieve that goal by the beta release.

Contributor


I'm wondering if it makes sense to have a REST API at some point in the future for other API consumers / apps, etc., but I don't think we need to think about that now.

@mathemancer
Contributor Author

Thank you for this thoughtful feedback, @seancolsen !

You're correct that dependencies between function parameters are a smell. As I read your thoughts, I realized that the connection_type parameter adds precisely zero information, and can only ever produce an error. We should remove it and just accept an optional connection_id.

As for your broader point: I typically prefer primitive types for "public" function arguments, and prefer them even more strongly in this case where there's a translation between two languages. I'm using "primitive" a bit loosely here to mean any non-composite type. My reasoning is that

  • Primitives translate more cleanly in general (for example, the datetime types differ between languages).
  • Primitives have the advantage of being well-understood and self-documenting when used in a function signature. For example, there's pretty much no question about what an int parameter will accept as valid input, but there are many potential questions about what a Connection parameter will accept as valid input.
  • Primitives have the advantage that exceptions produced will be more likely to be automatically useful (e.g., forgetting the connection_id when submitting a connection_type should produce a KeyError: "connection_id" error without too much (maybe any) custom handling, and could even be upgraded to
    TypeError: connection_id keyword arg is required when connection_type="user_database".
    
    if we wanted custom handling. If it's nested within an object, we then have to provide some nesting in our error response:
    TypeError: the connection keyword arg requires a connection_id attribute whenever its connection_type is "user_database".
    
    The more nested the object in question is, the worse that problem gets.

With that said, I agree that we'll likely have to handle some kind of nesting if polymorphism is required. In a different language, I'd probably use multiple dispatch in the API-facing functions to sort that whenever possible, but oh well. Another option (even in python) is to simply have differently named methods for different combinations of arguments. Of the concerns above, the self-documenting part is what I'm most loathe to give up.

The most self-documenting solution I can think of is to add data classes representing any nested object to the same module as the function(s) that accept(s) it, and use type hinting to specify that a given argument should be of the type of that data class. The class would be picked up by the mkdocs plugin, and even cross-referenced with the type hint. However, that type hint would be a lie! I.e., the actual type received via API call would always be a dict! So, we'd then have to add some branching logic to handle the possible inputs, and document how to interpret the documentation of those data classes. We could try to create a custom __init__, but I expect that would confuse the auto-generated documentation.

A less self-documenting, but in some ways cleaner approach would be to specify that the argument is a dict via type hinting (and so in the docs), and then specify details about the form of the dict in the docstring of the API-facing function.

I suppose I'm just trying to avoid the nesting problem whenever possible. Whenever absolutely necessary, I prefer the latter solution.
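A minimal sketch of the flat-primitive approach described above, assuming the proposal to drop connection_type and accept an optional connection_id (the signature and names are hypothetical):

```python
from typing import Optional


def create_from_known_connection(
    nickname: str,
    db_name: str,
    create_db: bool = False,
    connection_id: Optional[int] = None,
) -> dict:
    """Hypothetical flat signature after dropping `connection_type`."""
    return {
        "nickname": nickname,
        "db_name": db_name,
        "create_db": create_db,
        "connection_id": connection_id,
    }


# With flat primitives, Python's built-in errors are already informative:
try:
    create_from_known_connection(db_name="apple")  # forgot nickname
except TypeError as exc:
    message = str(exc)  # mentions the missing 'nickname' argument
```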

Regarding the return type: I tend to prefer returning primitives when possible, for most of the same reasons (translation and self-documentation). With that said, I don't have a fundamental problem with returning a flattish object from functions in this context. However, that still leaves the issue of making the function self-documenting. I think the same options mentioned above apply, but I want to tinker with that a bit. I do see your point about non-breaking API changes...it's a good one. With that said, I do think that if a public function is modified to return a different type, it should just be a new function instead.

@mathemancer
Contributor Author

Also, I made a couple edits to the wiki to bring the RPC transition plan into conformance with this PR (e.g. django-modern-rpc and dots in method names).

Thank you for that, and for improving the documentation structure in this PR!

@mathemancer mathemancer marked this pull request as ready for review April 11, 2024 09:35
@mathemancer
Contributor Author

I think this is ready for re-review; all current feedback is handled (I hope).

Contributor

@seancolsen seancolsen left a comment


@mathemancer

  • In both methods, I see db_name in the request but database in response. I'd like these to be consistent.

    Can we change the request to use database?

  • I'm not sure what the purpose of the get_error_code method is. I tried playing with it to see if I could understand it better, but I wasn't able to get it to work. I tried sending:

    {
      "jsonrpc": "2.0",
      "method": "get_error_code",
      "id": 0,
      "params": {
        "err": -26035
      }
    }

    And I got back:

    {
      "id": 0,
      "jsonrpc": "2.0",
      "error": {
        "code": -32601,
        "message": "Method not found: \"get_error_code\""
      }
    }

    However, I suggest merging this PR without addressing anything else about error handling. We can work on it more in another PR. I'd like to get this PR merged so that I can base multiple things on top of it without stacking too high.

@mathemancer
Contributor Author

@mathemancer

  • In both methods, I see db_name in the request but database in response. I'd like these to be consistent.
    Can we change the request to use database?

Sure, that's fine with me. I just left it that way since I figured it would make the front end changes more minimal. As I type this, though, it seems like it's unlikely to produce a bunch of work for y'all to make it more consistent.

  • I'm not sure what the purpose of the get_error_code method is. I tried playing with it to see if I could understand it better, but I wasn't able to get it to work. I tried sending:
    ...

Argh. I see now that my attempts to document the error handling have produced more confusion than clarity. The get_error_code function isn't intended to be called from the RPC endpoint. I added it to the documentation simply to provide the error code schema (i.e., which numbers go with which libraries, etc.). I'll try something different.

@mathemancer mathemancer requested a review from seancolsen April 12, 2024 07:27
Member

@pavish pavish left a comment


Looks good to me!

@pavish pavish removed their assignment Apr 12, 2024
Contributor

@seancolsen seancolsen left a comment


Excited to get this merged!

@Anish9901 Anish9901 added this pull request to the merge queue Apr 15, 2024
Merged via the queue into develop with commit 82fa1e3 Apr 15, 2024
37 checks passed
@Anish9901 Anish9901 deleted the modern_rpc_prototype branch April 15, 2024 09:52
@Anish9901 Anish9901 mentioned this pull request Apr 17, 2024
7 tasks
@seancolsen seancolsen mentioned this pull request Apr 24, 2024

Successfully merging this pull request may close these issues.

  • Create new RPC endpoint
  • Get testing for RPC functions organized and set up.