chore(engine): serializable physical plans #19672
Updates all physical nodes to use a ULID as their ID, and makes the field public for explicit node construction (which will be used for protobuf conversion). Unit tests which previously explicitly set the ULID have been updated to leave the ID as the empty ULID. Currently this field is never set (but will be in the following commit).
When creating a physical plan, each plan node will now have a unique ULID. The Clone method has been updated to generate a new ULID for the resulting cloned node. Workflows, for the time being, will reuse some node ULIDs when a node is found across multiple sharded tasks.
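A minimal sketch of the ID behaviour described above, under stated assumptions: `ID`, `newID`, and `Filter` are hypothetical names, and the 16-byte identifier built here from the standard library is only a stand-in for a real ULID library, not the engine's actual types.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
	"time"
)

// ID is a hypothetical stand-in for a ULID: 48 bits of millisecond
// timestamp followed by 80 bits of randomness.
type ID [16]byte

// newID returns a fresh identifier. The zero value of ID plays the role
// of the "empty ULID" that tests leave unset.
func newID() ID {
	var id ID
	var ts [8]byte
	binary.BigEndian.PutUint64(ts[:], uint64(time.Now().UnixMilli()))
	copy(id[:6], ts[2:]) // keep the low 48 bits of the timestamp
	if _, err := rand.Read(id[6:]); err != nil {
		panic(err)
	}
	return id
}

// Filter is a hypothetical plan node with a public ID field, so the
// node can be constructed explicitly (for example during conversion).
type Filter struct {
	ID        ID
	Predicate string
}

// Clone copies the node's contents but assigns a new ID, so the clone
// is a distinct node in the plan.
func (f *Filter) Clone() *Filter {
	return &Filter{ID: newID(), Predicate: f.Predicate}
}

func main() {
	orig := &Filter{ID: newID(), Predicate: "level = \"error\""}
	clone := orig.Clone()
	fmt.Println(orig.ID != clone.ID, orig.Predicate == clone.Predicate)
}
```

Because the ID field is public, a test can also build `&Filter{Predicate: "..."}` and leave the ID at its zero value.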
7ac0a91 to ff047e3
LGTM
// MarshalPhysical converts a protobuf plan into standard representation.
// Returns an error if the conversion fails or is unsupported.
(Here, and in all of the node marshal / unmarshal funcs.) Shouldn't the Marshal / Unmarshal semantics be the other way around?
- Marshalling - standard representation -> proto
- Unmarshalling - proto -> standard representation
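For comparison, Go's standard `encoding/json` package follows the direction described in this comment. The sketch below only illustrates that convention; `point` and `roundTrip` are hypothetical names and are not part of this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// point is a hypothetical in-memory type used only for illustration.
type point struct {
	X int `json:"x"`
	Y int `json:"y"`
}

// roundTrip marshals p into JSON and unmarshals the result back.
func roundTrip(p point) (string, point, error) {
	// Marshal: in-memory value -> serialized form.
	data, err := json.Marshal(p)
	if err != nil {
		return "", point{}, err
	}
	// Unmarshal: serialized form -> in-memory value.
	var out point
	if err := json.Unmarshal(data, &out); err != nil {
		return "", point{}, err
	}
	return string(data), out, nil
}

func main() {
	wire, back, err := roundTrip(point{X: 1, Y: 2})
	if err != nil {
		panic(err)
	}
	fmt.Println(wire, back)
}
```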
I think it could go either way, and it's relative to the package the conversion logic is defined in.
This signature
func (*Plan) MarshalPhysical() (*physical.Plan, error)
looks more correct to me than
func (*Plan) UnmarshalPhysical() (*physical.Plan, error)
I'd like to keep this one as-is for now, but let's revisit if it gets confusing.
Tbh I thought that "marshalling" has unidirectional semantics: from a "standard" object to a DTO. The only "formal" definition I found is from Wiki (sorry 🤦):
marshalling is the process of transforming the memory representation of an object into a data format suitable for storage or transmission.
More importantly, I see many projects stick to these semantics, as seen in the following examples:
I would propose something like this for the next iterations, wdyt?
// If we want to keep both methods on the Plan pointer receiver, we could get rid of
// the marshalling / unmarshalling semantics (given that Plan.Marshal and Plan.Unmarshal
// already exist and operate on bytes) and rename the methods to something more "mapping"-like.
func (*Plan) ToPhysical() (*physical.Plan, error)
func (*Plan) FromPhysical(from *physical.Plan) error
This adds two new packages, expressionpb and physicalpb, which are serializable representations of physical.Expression and physical.Plan, respectively. These packages include utility functions to convert between the protobuf representations and the planner types.
A translation layer is used due to the complexity of integrating protobuf throughout the engine, as well as difficulties with finding a clean pattern to construct node types. #19638 took an initial attempt at fully integrating the protobuf types, but revealed that it is very challenging.
While investigating the code, I observed that it's very clunky to work with the protobuf types, especially with how often we rely on interface values.
It's clear to me that we will want to eventually remove our translation layer, but doing it too soon means needing to update the entire engine code path twice. It is a much safer bet to start with a translation layer, find the right abstraction for constructing the protobuf, and then migrate once we have confidence in the pattern.
Co-authored-by: Sophie Waldman
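The mapping-style names proposed in the review thread could be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `PhysicalPlan` stands in for physical.Plan, `Plan` stands in for the protobuf-side message, and both are reduced to a list of node names.

```go
package main

import (
	"errors"
	"fmt"
)

// PhysicalPlan is a hypothetical, simplified stand-in for physical.Plan.
type PhysicalPlan struct {
	NodeIDs []string
}

// Plan is a hypothetical, simplified stand-in for the serializable plan.
type Plan struct {
	Nodes []string
}

// ToPhysical maps the serializable plan to the planner representation.
func (p *Plan) ToPhysical() (*PhysicalPlan, error) {
	if p == nil {
		return nil, errors.New("nil plan")
	}
	// Copy the slice so the two representations do not share storage.
	return &PhysicalPlan{NodeIDs: append([]string(nil), p.Nodes...)}, nil
}

// FromPhysical overwrites p with the contents of from.
func (p *Plan) FromPhysical(from *PhysicalPlan) error {
	if p == nil || from == nil {
		return errors.New("nil plan")
	}
	p.Nodes = append([]string(nil), from.NodeIDs...)
	return nil
}

func main() {
	src := &Plan{Nodes: []string{"scan", "filter"}}
	phys, err := src.ToPhysical()
	if err != nil {
		panic(err)
	}
	var dst Plan
	if err := dst.FromPhysical(phys); err != nil {
		panic(err)
	}
	fmt.Println(dst.Nodes)
}
```

With these names the direction is carried by To/From rather than by Marshal/Unmarshal, which leaves the existing byte-oriented Plan.Marshal and Plan.Unmarshal unambiguous.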
As all usages of DAGs (physical plans, workflows) now use ULID for uniquely representing nodes, we no longer need to have a stringified ID method.
ff047e3 to 39cd797
This adds two new packages, expressionpb and physicalpb, which are serializable representations of physical.Expression and physical.Plan, respectively.
These packages include utility functions to convert between the protobuf representations and the planner types.
A translation layer is used due to the complexity of integrating protobuf throughout the engine, as well as difficulties with finding a clean pattern to construct node types. #19638 took an initial attempt at fully integrating the protobuf types, but revealed that it is very challenging.
While helping with #19638, I observed that it's very clunky to work with the protobuf types, especially with how often we rely on interface values; these do not work as smoothly with protobuf's oneofs, resulting in quite painful code.
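To illustrate the friction between interface values and oneofs, here is a hedged sketch of a translation function. All types are hypothetical and hand-written: the `PB*` types only mimic the shape that Go protobuf code generation gives a oneof (an interface-typed field plus one wrapper struct per alternative) and are not the generated expressionpb types.

```go
package main

import "fmt"

// Planner side: a plain Go interface with two hypothetical implementations.
type Expression interface{ String() string }

type ColumnExpr struct{ Name string }
type LiteralExpr struct{ Value int64 }

func (c ColumnExpr) String() string  { return c.Name }
func (l LiteralExpr) String() string { return fmt.Sprint(l.Value) }

// Protobuf side (hand-written to mimic generated oneof code).
type PBColumn struct{ Name string }
type PBLiteral struct{ Value int64 }

type isPBExpressionKind interface{ isPBExpressionKind() }

type PBExpressionColumn struct{ Column *PBColumn }
type PBExpressionLiteral struct{ Literal *PBLiteral }

func (*PBExpressionColumn) isPBExpressionKind()  {}
func (*PBExpressionLiteral) isPBExpressionKind() {}

type PBExpression struct{ Kind isPBExpressionKind }

// toProto wraps each concrete expression in the matching oneof wrapper.
func toProto(e Expression) (*PBExpression, error) {
	switch e := e.(type) {
	case ColumnExpr:
		return &PBExpression{Kind: &PBExpressionColumn{Column: &PBColumn{Name: e.Name}}}, nil
	case LiteralExpr:
		return &PBExpression{Kind: &PBExpressionLiteral{Literal: &PBLiteral{Value: e.Value}}}, nil
	default:
		return nil, fmt.Errorf("unsupported expression type %T", e)
	}
}

// fromProto unwraps the oneof back into a planner expression.
func fromProto(p *PBExpression) (Expression, error) {
	if p == nil {
		return nil, fmt.Errorf("nil expression")
	}
	switch k := p.Kind.(type) {
	case *PBExpressionColumn:
		return ColumnExpr{Name: k.Column.Name}, nil
	case *PBExpressionLiteral:
		return LiteralExpr{Value: k.Literal.Value}, nil
	default:
		return nil, fmt.Errorf("unsupported oneof case %T", k)
	}
}

func main() {
	pb, err := toProto(ColumnExpr{Name: "timestamp"})
	if err != nil {
		panic(err)
	}
	back, err := fromProto(pb)
	if err != nil {
		panic(err)
	}
	fmt.Println(back)
}
```

Every call site that works with the protobuf shape directly needs the same wrap and unwrap steps, which is the kind of overhead a translation layer confines to one place.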
It's clear to me that we will want to eventually remove the translation layer, but we need more time to figure out how we should interact with the protobuf types cleanly throughout the codebase. Skipping straight to using the protobuf types now carries too much risk of needing another massive PR. Given this, it's a much safer bet to start with a translation layer, find the right abstraction for constructing the protobuf, and then migrate once we have confidence in the pattern.
Closes #19638.