Add betterproto to experimental module #360
In tox tests, one of the spout tests is failing due to the bug @PFedak discovered. Since this PR doesn't depend on spouts, I'm okay with merging this after review.
[
    n
    for n in attrs(request_cls)
    if n not in PROTO_OBJECT_BUILTINS and "FIELD_NUMBER" not in n
]
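The comprehension above keeps only a request class's own field names, dropping names inherited from the base message class and the generated `*_FIELD_NUMBER` constants. Here is a self-contained sketch of that filtering, assuming `attrs()` behaves roughly like `dir()`; `FakeMessageBase`, `FakeInspectCommitRequest`, and the way `PROTO_OBJECT_BUILTINS` is built here are hypothetical stand-ins, not the PR's definitions:

```python
class FakeMessageBase:
    """Hypothetical stand-in for a generated protobuf message base class."""

    def SerializeToString(self):
        return b""


# Approximates PROTO_OBJECT_BUILTINS: every name a bare message already has.
PROTO_OBJECT_BUILTINS = set(dir(FakeMessageBase))


class FakeInspectCommitRequest(FakeMessageBase):
    """Hypothetical request class with generated-style FIELD_NUMBER constants."""

    COMMIT_FIELD_NUMBER = 1
    WAIT_FIELD_NUMBER = 2
    commit = None
    wait = None


def attrs(cls):
    # Assumption: attrs() lists attribute names much like dir() does.
    return dir(cls)


def request_field_names(request_cls):
    # Keep only names the request defines itself, minus FIELD_NUMBER constants.
    return [
        n
        for n in attrs(request_cls)
        if n not in PROTO_OBJECT_BUILTINS and "FIELD_NUMBER" not in n
    ]


print(request_field_names(FakeInspectCommitRequest))  # -> ['commit', 'wait']
```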
Notes to self/initial thoughts:
- Is there some way we could preview the generated docs, since that's a big part of what we hope to get out of this change?
- Did we ever figure out why betterproto-generated objects sometimes use ints for enums and other times use Python enums?
- I'm only now realizing that this will double the amount of garbage generated per request. That seems mostly fine, but it could be a problem for PutFile/GetFile. We should have wrappers for those (ModifyFileClient et al.), but is there some way this could become a problem? Also: confirm that the wrappers use the Google protobuf objects for efficiency.
Basically, I just meant that, with PutFile in particular, one logical request might consist of many smaller stream RPC messages. Previously, each of those messages was a single Google protobuf object; with this change, we typically create two objects per message, a BetterProto object and a Google protobuf object, so putting a large file into Pachyderm might generate twice as much memory garbage. It's probably fine, but I wanted to check that, e.g., ModifyFileClient, which just takes a path and a byte source, doesn't generate BetterProto objects behind its API and convert them to Google protobuf objects, and instead sends Google protobuf objects directly. That's all.
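To make the garbage concern concrete, here is a minimal sketch that counts objects created while streaming one logical upload as many chunked messages. `BpPutFileRequest`, `PbPutFileRequest`, and the chunking helper are hypothetical stand-ins (plain dataclasses rather than real generated messages), and the object count is only a rough proxy for allocation pressure:

```python
from dataclasses import dataclass
from typing import Dict, Iterator


@dataclass
class BpPutFileRequest:
    """Hypothetical stand-in for a betterproto-generated request."""

    path: str
    raw: bytes


@dataclass
class PbPutFileRequest:
    """Hypothetical stand-in for a Google protobuf request."""

    path: str
    raw: bytes


def chunked(data: bytes, size: int) -> Iterator[bytes]:
    """Split one logical upload into many smaller stream messages."""
    for start in range(0, len(data), size):
        yield data[start : start + size]


def stream_via_betterproto(path: str, data: bytes, size: int, stats: Dict[str, int]):
    for chunk in chunked(data, size):
        bp_msg = BpPutFileRequest(path, chunk)  # object 1: garbage after conversion
        pb_msg = PbPutFileRequest(bp_msg.path, bp_msg.raw)  # object 2: what gets sent
        stats["objects"] += 2
        yield pb_msg


def stream_direct(path: str, data: bytes, size: int, stats: Dict[str, int]):
    for chunk in chunked(data, size):
        stats["objects"] += 1
        yield PbPutFileRequest(path, chunk)  # the only object per chunk


data = b"x" * 10  # 10 bytes in 3-byte chunks -> 4 stream messages
for stream in (stream_via_betterproto, stream_direct):
    stats = {"objects": 0}
    sent = list(stream("/big_file.bin", data, 3, stats))
    print(stream.__name__, len(sent), stats["objects"])
```

Both paths send 4 messages, but the betterproto path creates 8 objects versus 4. The stand-ins share the same chunk bytes; a real serialize/parse conversion would also copy each payload, so the actual overhead may be larger.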
For posterity/my personal sanity, the issue is this one:
Further discussion in #pach-core (and here)
TODO: Link to core pach PR (if any)
… PFS and PPS proto docs
msteffen left a comment
Sorry again that this took so long, and thanks for being so patient! Now that I've finally gotten through the review, I see how much tedious rewiring went into making this work. I'm really glad we got this experiment done; I still really think this is something we need to try. Really awesome work.
LGTM
- PFS: ``mount()``/``unmount()`` - Mounting/unmounting Pachyderm repos locally

**Note:** The experimental module WILL NOT follow semver. Breaking changes can
be introduced in the next minor version of :mod:`python_pachyderm`.
👍
bp_obj = bp_class().parse(pb_obj.SerializeToString())
👍
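The line above converts a Google protobuf object into a betterproto object by serializing it to bytes and parsing those bytes into a fresh betterproto instance. A minimal sketch of both directions, using hypothetical stand-in classes that mimic the relevant methods (`SerializeToString()`/`FromString()` on the Google side, `parse()`/`bytes()` on the betterproto side) and JSON in place of the real protobuf wire format:

```python
import json
from dataclasses import asdict, dataclass


class FakePbCommit:
    """Hypothetical stand-in for a Google protobuf message."""

    def __init__(self, repo: str = "", id: str = ""):
        self.repo = repo
        self.id = id

    def SerializeToString(self) -> bytes:
        # JSON stands in for the protobuf wire format in this sketch.
        return json.dumps({"repo": self.repo, "id": self.id}).encode()

    @classmethod
    def FromString(cls, data: bytes) -> "FakePbCommit":
        return cls(**json.loads(data))


@dataclass
class FakeBpCommit:
    """Hypothetical stand-in for a betterproto message."""

    repo: str = ""
    id: str = ""

    def parse(self, data: bytes) -> "FakeBpCommit":
        # Fill this instance from bytes and return it, so calls can be chained.
        for key, value in json.loads(data).items():
            setattr(self, key, value)
        return self

    def __bytes__(self) -> bytes:
        return json.dumps(asdict(self)).encode()


def pb_to_bp(pb_obj, bp_class):
    # Same shape as the reviewed line: serialize, then parse into a new object.
    return bp_class().parse(pb_obj.SerializeToString())


def bp_to_pb(bp_obj, pb_class):
    return pb_class.FromString(bytes(bp_obj))


bp = pb_to_bp(FakePbCommit(repo="images", id="abc"), FakeBpCommit)
print(bp)  # FakeBpCommit(repo='images', id='abc')
```

Because each direction goes through a full serialize/parse cycle, every conversion allocates an intermediate byte string plus a new message object, which is the cost discussed earlier in the review.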
from .service import BP_MODULES

class ProtoIterator:
👍
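Only the `class ProtoIterator:` line is visible in this excerpt, so its body isn't shown here. Given that the PR converts between Google protobuf and betterproto objects around each gRPC call, one plausible role for such a class is wrapping a streaming response and converting each message as it's pulled. The following is a hypothetical sketch of that pattern (the name `LazyConvertingIterator` and its arguments are invented for illustration), not the PR's implementation:

```python
from typing import Callable, Generic, Iterable, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class LazyConvertingIterator(Generic[T, U]):
    """Hypothetical: convert each item of a stream only when it is requested."""

    def __init__(self, stream: Iterable[T], convert: Callable[[T], U]):
        self._stream = iter(stream)
        self._convert = convert

    def __iter__(self) -> "LazyConvertingIterator[T, U]":
        return self

    def __next__(self) -> U:
        # next() raises StopIteration when the underlying stream is exhausted,
        # which also ends iteration over this wrapper.
        return self._convert(next(self._stream))


print(list(LazyConvertingIterator([1, 2, 3], str)))  # ['1', '2', '3']
```

Converting lazily means a long response stream never has to be fully materialized in both representations at once.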
path=path, datum=datum, raw=wrappers_pb2.BytesValue(value=chunk)
path=path,
datum=datum,
raw=chunk,
These are some nice ergonomic improvements with BetterProto, honestly.
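The diff drops the explicit `wrappers_pb2.BytesValue(value=chunk)` because betterproto maps the `google.protobuf` wrapper types to plain `Optional` Python values. A minimal sketch of the difference, with hypothetical dataclass stand-ins for both styles of an add-file request:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BytesValue:
    """Stand-in for google.protobuf.BytesValue: a message that wraps bytes."""

    value: bytes = b""


@dataclass
class PbStyleAddFile:
    """Google protobuf style (hypothetical): the payload must be wrapped."""

    path: str = ""
    datum: str = ""
    raw: Optional[BytesValue] = None


@dataclass
class BpStyleAddFile:
    """betterproto style (hypothetical): the wrapper becomes Optional[bytes]."""

    path: str = ""
    datum: str = ""
    raw: Optional[bytes] = None


chunk = b"hello"
old = PbStyleAddFile(path="/a.txt", datum="default", raw=BytesValue(value=chunk))
new = BpStyleAddFile(path="/a.txt", datum="default", raw=chunk)
print(old.raw.value == new.raw)  # True
```

`None` still distinguishes an unset payload from an empty one, which is what the wrapper message was for.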
return repo_name

def create_test_pipeline(client: python_pachyderm.Client, test_name):
Do you happen to know where this gets used? Is this just carried over from the non-experimental client?
There are 3 references to create_test_pipeline() in tests/experimental/test_pps.py.
from python_pachyderm.proto.v2 import pfs
from python_pachyderm.service import pfs_proto, Service
from python_pachyderm.experimental.pfs import commit_from, Commit, uuid_re
from python_pachyderm.service import Service # , pfs_proto
Clarifying my above comment and pinning this for posterity: as a later optimization, we could import python_pachyderm.service.pfs_proto and change the parts of ModifyFileClient's API that take a URL, a byte iterator, or anything else that isn't a protobuf so that they use the non-experimental client's gRPC API. Thus, instead of:
>>> with client.modify_file_client(c) as mfc:
...     mfc.put_file_bytes(
...         "/new_file.txt",
...         data)
becoming send(bp_to_pb(experimental.proto.PutFileRequest("/new_file.txt", data))), it could become send(proto.PutFileRequest("/new_file.txt", data)) and avoid the bp_to_pb round trip.
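The optimization suggested above amounts to choosing the wire message based on what the caller hands in: inputs that aren't protobufs (raw bytes here) go straight to the Google protobuf request, and only betterproto input pays for `bp_to_pb`. A hypothetical sketch; `to_wire` and both request classes are stand-ins, not part of python_pachyderm:

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class PbPutFileRequest:
    """Stand-in for the non-experimental (Google protobuf) request."""

    path: str
    data: bytes


@dataclass
class BpPutFileRequest:
    """Stand-in for the experimental (betterproto) request."""

    path: str
    data: bytes


def bp_to_pb(msg: BpPutFileRequest) -> PbPutFileRequest:
    # Stand-in for the serialize/parse round trip the comment wants to avoid.
    return PbPutFileRequest(msg.path, msg.data)


Payload = Union[bytes, bytearray, BpPutFileRequest, PbPutFileRequest]


def to_wire(path: str, payload: Payload) -> PbPutFileRequest:
    if isinstance(payload, PbPutFileRequest):
        return payload  # already wire-ready, no conversion
    if isinstance(payload, BpPutFileRequest):
        return bp_to_pb(payload)  # only betterproto input pays for conversion
    if isinstance(payload, (bytes, bytearray)):
        return PbPutFileRequest(path, bytes(payload))  # skips betterproto entirely
    raise TypeError(f"unsupported payload type: {type(payload).__name__}")


print(to_wire("/new_file.txt", b"data"))
```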
I can add this change so that the ModifyFile RPC route doesn't use any betterproto objects.
msteffen left a comment
The new ModifyFileClient LGTM! Nice job keeping the change small!
Switches the proto module used by the mixins in the experimental module from Google protobuf to betterproto-generated code. The main focus for review is client.py, where the conversion between Google protobuf and betterproto objects occurs, sandwiching the gRPC call (which happens in _Client._req()).

Additionally:
- tests/ python_pachyderm.ExperimentalClient() to python_pachyderm.experimental.Client()
- contributing.md, __init__.py)
- pfs.py, util.py, service.py to experimental module
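The description says the conversion sandwiches the gRPC call in `_Client._req()`. A minimal sketch of that shape, assuming a stub method that only accepts Google protobuf messages; `req`, the fake stub, and the request/response classes are hypothetical stand-ins, not the PR's code:

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class PbInspectRepoRequest:
    """Stand-in for a Google protobuf request message."""

    repo: str


@dataclass
class PbRepoInfo:
    """Stand-in for a Google protobuf response message."""

    repo: str
    size_bytes: int


@dataclass
class BpInspectRepoRequest:
    """Stand-in for the matching betterproto request."""

    repo: str


@dataclass
class BpRepoInfo:
    """Stand-in for the matching betterproto response."""

    repo: str
    size_bytes: int


def fake_inspect_repo_stub(request: PbInspectRepoRequest) -> PbRepoInfo:
    # Like a generated gRPC stub, this only understands Google protobuf objects.
    if not isinstance(request, PbInspectRepoRequest):
        raise TypeError("stub expects a Google protobuf message")
    return PbRepoInfo(repo=request.repo, size_bytes=0)


def req(
    stub_method: Callable[[Any], Any],
    bp_request: Any,
    to_pb: Callable[[Any], Any],
    to_bp: Callable[[Any], Any],
) -> Any:
    pb_request = to_pb(bp_request)  # betterproto -> Google protobuf
    pb_response = stub_method(pb_request)  # the (here: fake) gRPC call
    return to_bp(pb_response)  # Google protobuf -> betterproto


info = req(
    fake_inspect_repo_stub,
    BpInspectRepoRequest(repo="images"),
    to_pb=lambda m: PbInspectRepoRequest(repo=m.repo),
    to_bp=lambda m: BpRepoInfo(repo=m.repo, size_bytes=m.size_bytes),
)
print(info)  # BpRepoInfo(repo='images', size_bytes=0)
```

Keeping the conversion in one place means the mixins only ever see betterproto objects and the stub only ever sees Google protobuf objects.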