
Modern rpc prototype #3524

Merged
merged 26 commits into from
Apr 15, 2024

Conversation

mathemancer
Contributor

@mathemancer mathemancer commented Apr 3, 2024

Fixes #3519
Fixes #3518

Summary

This adds a new RPC endpoint, available at /rpc/. At the moment, the endpoint is available only when logged in, and the functions are available only for superusers. The functions are:

  • connections.create_from_known_connection, and
  • connections.create_from_scratch.
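As a rough sketch, a client would invoke one of these functions by POSTing a JSON-RPC 2.0 request body to `/rpc/`. The payload below is illustrative only; the parameter values are made up, and the exact parameter list is an assumption, not the final API:

```python
import json

# Illustrative JSON-RPC 2.0 request body for the /rpc/ endpoint.
# The method name comes from this PR; the params are hypothetical examples.
request_body = {
    "jsonrpc": "2.0",
    "id": 0,
    "method": "connections.create_from_scratch",
    "params": {
        "nickname": "my_connection",
        "db_name": "my_db",
    },
}

# This string would be the body of an authenticated POST to /rpc/.
payload = json.dumps(request_body)
print(payload)
```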

Technical details

  • The functions are documented via auto-generated docs (created from their docstrings) in the new documentation page at /user-guide/rpc-functions/.

  • I elected to go against my previous plan and use django-modern-rpc.

    • This seems to be more actively maintained, and
    • This integrates with Django auth.
  • Unfortunately, this meant the auto-generated docs were not good enough. So, I decided to use the mkdocs plugin to include RPC function docs in the docs site instead.

  • This PR disables the check that prevents type hints!!!

    • This was necessary to get the documentation generated automatically.
    • Another option is to use the Sphinx (rather than Google) docstring style for the RPC functions, but I find the Google docstring style much more readable. However, the Sphinx style is better for documenting types without type hinting.
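For illustration, a Google-style docstring on an RPC function might look like the sketch below. The function name, parameters, and return description are hypothetical, not the actual code from this PR:

```python
def create_from_scratch(nickname: str, db_name: str, create_db: bool = False) -> int:
    """Create a new connection from scratch.

    Hypothetical sketch showing the Google docstring style discussed above.

    Args:
        nickname: A human-readable name for the connection.
        db_name: The name of the database to connect to.
        create_db: Whether to create the database if it does not exist.

    Returns:
        The ID of the new connection.
    """
    raise NotImplementedError
```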

Additional context

Checklist

  • My pull request has a descriptive title (not a vague title like Update index.md).
  • My pull request targets the develop branch of the repository.
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no
    visible errors.

Developer Certificate of Origin

Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@mathemancer mathemancer requested a review from Anish9901 April 3, 2024 02:15
@mathemancer
Contributor Author

@Anish9901 if you're far enough along on figuring out testing for these, go ahead and add a couple tests. Otherwise, I think we should merge this untested (I did check manually to make sure things were working).

@seancolsen @pavish Take a look and see what you think of:

  • The style of the docs
  • The error responses when parameters are missing
  • The error responses when you call the functions in some illegal manner (e.g., violating a uniqueness constraint on DB keys)

@kgodey It might be worth a glance from you, since this is the initial PR that will define how the RPC endpoint looks and works.

@mathemancer
Contributor Author

mathemancer commented Apr 3, 2024

This is ready to review, but I'm leaving it as a draft until @Anish9901 replies with an opinion on whether to add tests into this PR.

@mathemancer mathemancer added the pr-status: review A PR awaiting review label Apr 3, 2024
@kgodey kgodey self-assigned this Apr 3, 2024
@Anish9901
Member

I'll add some tests.

@@ -0,0 +1,23 @@
# Using the RPC endpoint
Contributor

@kgodey kgodey Apr 3, 2024


This is likely to confuse the intended audience of the "User Guide" section. I would either put this in the wiki (audience: developers) or create a "Developer Guide" or "API" section to put this in.

Contributor


I had the same thought @kgodey. I pushed f2433eb to fix this.

Contributor

@seancolsen seancolsen left a comment


This is a great start and it's getting me excited about this new API architecture. I really like it!

Since this is the first RPC PR and we're seeking to establish patterns, I'm being a bit scrupulous with this review.

The auto-generated API docs are awesome! Thanks for figuring that out. I pushed f2433eb to improve some minor things there.

Critique

Standard verbs

The standard verbs we plan to use within method names are still an open question. This PR seems to suggest (at least tentatively) that we'd use "create" as a standard verb. I have a slight preference for standardizing on "add" instead, simply because it's shorter. I think it would be nice for us to decide on those standard verbs before merging this PR, and to get input from other team members in doing so.

My personal preference for standard verbs is: list, get, add, patch, replace, delete.

Objects as request parameters

I notice that the request schema is flattened out a bit for these methods compared to what we had in the REST API. I'm curious what the rationale is for this flattening. In particular, there's one change I would suggest making...

For the create_from_known_connection method, one way to describe the current request schema would be with the following TypeScript:

type CreateFromKnownConnection_Params = {
  nickname: string;
  db_name: string;
  create_db: boolean;
  connection_type: 'internal_database' | 'user_database';
  connection_id?: number;
  sample_data: ('library_management' | 'movie_collection')[];
};

But at the type level, there's a shortcoming with the above type definition. As noted in the API documentation, connection_id is required when connection_type is 'user_database' and ignored otherwise. In general, I try to encode requirements like this in the type system when possible because it helps us reduce front end bugs. We can describe that interdependence of properties by altering the TypeScript to be:

type Base = {
  nickname: string;
  db_name: string;
  create_db: boolean;
  sample_data: ('library_management' | 'movie_collection')[];
};
type CreateFromKnownConnection_Params =
  | (Base & {
      connection_type: 'internal_database';
    })
  | (Base & {
      connection_type: 'user_database';
      connection_id: number;
    });

What we've done here is make the entire request type a union of two types at the top level. This is pretty ugly, though. And it gets much worse if that top level actually contains more than one such interdependence of properties.

What I'd prefer to do is to change the request schema by adding some nesting. The TypeScript would be:

type CreateFromKnownConnection_Params = {
  nickname: string;
  db_name: string;
  create_db: boolean;
  connection:
    | {
        type: 'internal_database';
      }
    | {
        type: 'user_database';
        id: number;
      };
  sample_data: ('library_management' | 'movie_collection')[];
};

With this schema the connection property can either be:

  •   {
        "type": "internal_database"
      }

    Or something like:

  •   {
        "type": "user_database",
        "id": 42
      }

This change would introduce a function parameter that in JSON is an object. I don't see such a function parameter anywhere in this PR and I'm a little bit curious if your choice of flattening parameters is due to your reluctance to support object-type parameters. My guess is that we'll run into the need for object parameters at some point and I'm wondering how we'd handle that within Python. Would such a parameter become a NamedTuple? A TypedDict? A dataclass? How would it appear in the auto-generated docs?

I don't feel too strongly about the change I'm proposing above. But if you're open to pursuing it, then it might actually be a good opportunity to establish a pattern for object-type parameters going forward.
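One hedged answer to the "NamedTuple? TypedDict? dataclass?" question above: the JSON-RPC layer would hand the backend a plain dict, which a small dataclass could validate and wrap on the Python side. All names here are hypothetical, for illustration only:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConnectionRef:
    """Hypothetical wrapper for the nested `connection` request parameter."""

    type: str                 # 'internal_database' or 'user_database'
    id: Optional[int] = None  # required only when type is 'user_database'

    @classmethod
    def from_dict(cls, raw: dict) -> "ConnectionRef":
        # The RPC layer delivers a dict; validate the interdependence here.
        if raw["type"] == "user_database" and "id" not in raw:
            raise TypeError('"id" is required when type is "user_database"')
        return cls(type=raw["type"], id=raw.get("id"))


internal = ConnectionRef.from_dict({"type": "internal_database"})
user = ConnectionRef.from_dict({"type": "user_database", "id": 42})
```

A wrapper like this keeps the validation logic in one place, though (as discussed below) the type hint in the function signature would still describe the dataclass rather than the dict actually received.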

Extensible return values

The methods added in this PR both return primitive numbers, like this:

42

As a rule of thumb, I suggest wrapping such primitives in objects in most cases:

{
  "id": 42
}

Rationale: Later we might decide that we need to return more information here. For example we might want to indicate whether the user database was newly created or whether it already existed. If we begin the API by returning a primitive number, then any addition of return value becomes a breaking API change. With an object, we can add more properties in a backwards compatible manner.

Note: This is not to suggest that I think return values must absolutely never be primitives. I can imagine an RPC function that returns the nth prime number. That's a case where I think it might be appropriate to favor simplicity over extensibility. But my hunch is that for Mathesar any time we're returning something we probably ought to wrap it in an object to guard against breaking changes.

It's also worth mentioning that I'd be inclined to return null for most (all?) of the methods that delete things. I think we ought to think of null return values like void functions. Those would be functions where we intend for the client to make no use of the return value. And in those cases we could transition from a null return value to a non-null return value without making a breaking change.
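The rule of thumb above can be sketched as follows. Both functions are hypothetical stand-ins, and the `already_existed` key is an imagined future addition, not part of this PR:

```python
def add_connection_v1(nickname: str) -> dict:
    # Wrapping the primitive in an object from day one...
    return {"id": 42}


def add_connection_v2(nickname: str) -> dict:
    # ...lets us add properties later without breaking old clients,
    # which still only read result["id"].
    return {"id": 42, "already_existed": False}
```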

Return values compatible with REST APIs

Say the front end calls connections.create_from_known_connection with these params:

{
  "nickname": "apple",
  "db_name": "apple",
  "create_db": true,
  "connection_type": "internal_database",
  "sample_data": []
}
  • In our old REST API, the front end was previously getting a response like this:

    {
      "id": 42,
      "nickname": "apple",
      "database": "apple",
      "supported_types_url": "http://localhost:8000/api/ui/v0/connections/42/types/",
      "username": "mathesar",
      "host": "mathesar_dev_db",
      "port": 5432
    }
  • With the "Extensible return values" change implemented, the front end would only get this:

    {
      "id": 42
    }

What's difficult about this change is that the front end is currently relying on response properties like "port" which are not necessarily present in the request. Removing those properties is going to be a small headache for me. To be clear: the front end does have that information — it's just higher up the call stack. It would be possible for the front end to adapt to a partial return value, but it would be more work.

In general I support RPC API methods that return partial or minimal information. But given that the scope of this API transition is quite large, I would be inclined to err on the side of returning a response that is more or less compatible with the response of the equivalent REST endpoint, so long as that's relatively easy for the backend. We might need to take this decision on a case by case basis though in pursuit of the fastest approach across the whole stack.

For new RPC methods (that don't serve to replace REST endpoints), I'd be inclined to err on the side of simplicity and minimalism.

@seancolsen seancolsen removed their assignment Apr 3, 2024
@seancolsen
Contributor

Also, I made a couple edits to the wiki to bring the RPC transition plan into conformance with this PR (e.g. django-modern-rpc and dots in method names).


Mathesar has a REST API that the front end uses to interact with the backend.

For Mathesar's beta release, we are actively transitioning to a new [RPC-style API](https://wiki.mathesar.org/projects/2024/architecture-transition/rpc/) and will soon be phasing out the REST API entirely.
Contributor


@mathemancer are we going to be entirely removing the REST API?

Contributor Author


Eventually, that's the goal. I doubt we'll achieve that goal by the beta release.

Contributor


I'm wondering if it makes sense to have a REST API at some point in the future for other API consumers / apps, etc., but I don't think we need to think about that now.

@mathemancer
Contributor Author

Thank you for this thoughtful feedback, @seancolsen !

You're correct that dependencies between function parameters are a smell. As I read your thoughts, I realized that the connection_type parameter adds precisely zero information, and can only ever produce an error. We should remove it and just accept an optional connection_id.

As for your broader point: I typically prefer primitive types for "public" function arguments, and prefer them even more strongly in this case where there's a translation between two languages. I'm using "primitive" a bit loosely here to mean any non-composite type. My reasoning is that

  • Primitives translate more cleanly in general (for example, the datetime types differ between languages).
  • Primitives have the advantage of being well-understood and self-documenting when used in a function signature. For example, there's pretty much no question about what an int parameter will accept as valid input, but there are many potential questions about what a Connection parameter will accept as valid input.
  • Primitives have the advantage that exceptions produced will be more likely to be automatically useful (e.g., forgetting the connection_id when submitting a connection_type should produce a KeyError: "connection_id" error without too much (maybe any) custom handling, and could even be upgraded to
    TypeError: connection_id keyword arg is required when connection_type="user_database".
    
    if we wanted custom handling. If it's nested within an object, we then have to provide some nesting in our error response:
    TypeError: the connection keyword arg requires a connection_id attribute whenever its connection_type is "user_database".
    
    The more nested the object in question is, the worse that problem gets.

With that said, I agree that we'll likely have to handle some kind of nesting if polymorphism is required. In a different language, I'd probably use multiple dispatch in the API-facing functions to sort that whenever possible, but oh well. Another option (even in python) is to simply have differently named methods for different combinations of arguments. Of the concerns above, the self-documenting part is what I'm most loathe to give up.

The most self-documenting solution I can think of is to add data classes representing any nested object to the same module as the function(s) that accept(s) it, and use type hinting to specify that a given argument should be of the type of that data class. The class would be picked up by the mkdocs plugin, and even cross-referenced with the type hint. However, that type hint would be a lie! I.e., the actual type received via API call would always be a dict! So, we'd then have to add some branching logic to handle the possible inputs, and document how to interpret the documentation of those data classes. We could try to create a custom __init__, but I expect that would confuse the auto-generated documentation.

A less self-documenting, but in some ways cleaner approach would be to specify that the argument is a dict via type hinting (and so in the docs), and then specify details about the form of the dict in the docstring of the API-facing function.

I suppose I'm just trying to avoid the nesting problem whenever possible. Whenever absolutely necessary, I prefer the latter solution.
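A minimal sketch of the flat-primitive approach described above, assuming the proposal to drop connection_type and accept an optional connection_id (the signature and names are hypothetical):

```python
from typing import Optional


def create_from_known_connection(
    nickname: str,
    db_name: str,
    create_db: bool = False,
    connection_id: Optional[int] = None,
) -> dict:
    """Hypothetical flat signature after dropping `connection_type`."""
    return {
        "nickname": nickname,
        "db_name": db_name,
        "create_db": create_db,
        "connection_id": connection_id,
    }


# With flat primitives, Python's built-in errors are already informative:
try:
    create_from_known_connection(db_name="apple")  # forgot nickname
except TypeError as exc:
    message = str(exc)  # mentions the missing 'nickname' argument
```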

Regarding the return type: I tend to prefer returning primitives when possible, for most of the same reasons (translation and self-documentation). With that said, I don't have a fundamental problem with returning a flattish object from functions in this context. However, that still leaves the issue of making the function self-documenting. I think the same options mentioned above apply, but I want to tinker with that a bit. I do see your point about non-breaking API changes...it's a good one. With that said, I do think that if a public function is modified to return a different type, it should just be a new function instead.

@mathemancer
Contributor Author

Also, I made a couple edits to the wiki to bring the RPC transition plan into conformance with this PR (e.g. django-modern-rpc and dots in method names).

Thank you for that, and for improving the documentation structure in this PR!

@mathemancer mathemancer marked this pull request as ready for review April 11, 2024 09:35
@mathemancer
Contributor Author

I think this is ready for re-review; all current feedback is handled (I hope).

Contributor

@seancolsen seancolsen left a comment


@mathemancer

  • In both methods, I see db_name in the request but database in response. I'd like these to be consistent.

    Can we change the request to use database?

  • I'm not sure what the purpose of the get_error_code method is. I tried playing with it to see if I could understand it better, but I wasn't able to get it to work. I tried sending:

    {
      "jsonrpc": "2.0",
      "method": "get_error_code",
      "id": 0,
      "params": {
        "err": -26035
      }
    }

    And I got back:

    {
      "id": 0,
      "jsonrpc": "2.0",
      "error": {
        "code": -32601,
        "message": "Method not found: \"get_error_code\""
      }
    }

    However, I suggest merging this PR without addressing anything else about error handling. We can work on it more in another PR. I'd like to get this PR merged so that I can base multiple things on top of it without stacking too high.

@mathemancer
Contributor Author

@mathemancer

  • In both methods, I see db_name in the request but database in response. I'd like these to be consistent.
    Can we change the request to use database?

Sure, that's fine with me. I just left it that way since I figured it would make the front end changes more minimal. As I type this, though, it seems like it's unlikely to produce a bunch of work for y'all to make it more consistent.

  • I'm not sure what the purpose of the get_error_code method is. I tried playing with it to see if I could understand it better, but I wasn't able to get it to work. I tried sending:
    ...

Argh. I see now that my attempts to document the error handling have produced more confusion than clarity. The get_error_code function isn't intended to be called from the RPC endpoint. I added it to the documentation simply to provide the error code schema (i.e., which numbers go with which libraries, etc.). I'll try something different.

@mathemancer mathemancer requested a review from seancolsen April 12, 2024 07:27
Member

@pavish pavish left a comment


Looks good to me!

@pavish pavish removed their assignment Apr 12, 2024
Contributor

@seancolsen seancolsen left a comment


Excited to get this merged!

@Anish9901 Anish9901 added this pull request to the merge queue Apr 15, 2024
Merged via the queue into develop with commit 82fa1e3 Apr 15, 2024
37 checks passed
@Anish9901 Anish9901 deleted the modern_rpc_prototype branch April 15, 2024 09:52
@Anish9901 Anish9901 mentioned this pull request Apr 17, 2024
7 tasks
@seancolsen seancolsen mentioned this pull request Apr 24, 2024

Successfully merging this pull request may close these issues.

  • Create new RPC endpoint
  • Get testing for RPC functions organized and set up.