Member

@rfratto rfratto commented Nov 1, 2025

This adds two new packages, expressionpb and physicalpb, which are serializable representations of physical.Expression and physical.Plan, respectively.

These packages include utility functions to convert between the protobuf representations and the planner types.

A translation layer is used due to the complexity of integrating protobuf throughout the engine, as well as difficulties with finding a clean pattern to construct node types. #19638 took an initial attempt at fully integrating the protobuf types, but revealed that it is very challenging.

While helping with #19638, I observed that it's very clunky to work with the protobuf types, especially with how often we rely on interface values; these do not work as smoothly with protobuf's oneofs, resulting in quite painful code.
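A minimal sketch of the friction described above, assuming a hypothetical two-variant expression type. The `PB*` types are hand-written imitations of the wrapper-per-variant shape that Go protobuf code generation produces for a oneof; none of these names come from expressionpb or physicalpb.

```go
package main

import "fmt"

// Planner-side shape (hypothetical): every node satisfies one interface,
// so callers pass values around without unwrapping anything.
type Expression interface{ String() string }

type Literal struct{ Value int64 }
type ColumnRef struct{ Name string }

func (l *Literal) String() string   { return fmt.Sprintf("%d", l.Value) }
func (c *ColumnRef) String() string { return c.Name }

// Protobuf-side shape (hand-written imitation of generated oneof code):
// a container message with an interface-typed field, plus one wrapper
// struct per variant that holds the actual payload.
type PBExpression struct {
	Kind isPBExpressionKind
}

type isPBExpressionKind interface{ isPBExpressionKind() }

type PBLiteral struct{ Value int64 }
type PBColumnRef struct{ Name string }

type PBExpressionLiteral struct{ Literal *PBLiteral }
type PBExpressionColumnRef struct{ ColumnRef *PBColumnRef }

func (*PBExpressionLiteral) isPBExpressionKind()   {}
func (*PBExpressionColumnRef) isPBExpressionKind() {}

// describe has to peel off the wrapper before it can reach the payload,
// and every new variant needs another case here.
func describe(e *PBExpression) string {
	switch k := e.Kind.(type) {
	case *PBExpressionLiteral:
		return fmt.Sprintf("%d", k.Literal.Value)
	case *PBExpressionColumnRef:
		return k.ColumnRef.Name
	default:
		return "<unset>"
	}
}

func main() {
	// Interface value: construct and use directly.
	var planner Expression = &ColumnRef{Name: "ts"}
	fmt.Println(planner.String())

	// Oneof: three nested allocations to express the same thing.
	pb := &PBExpression{Kind: &PBExpressionColumnRef{ColumnRef: &PBColumnRef{Name: "ts"}}}
	fmt.Println(describe(pb))
}
```

The extra wrapper layer is what makes construction and type switches verbose once many node types are involved.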

It's clear to me that we will want to eventually remove the translation layer, but we need more time to figure out how we should interact with the protobuf types cleanly throughout the codebase. Skipping straight to using the protobuf types now carries too much risk of needing another massive PR. Given this, it's a much safer bet to start with a translation layer, find the right abstraction for constructing the protobuf, and then migrate once we have confidence in the pattern.

Closes #19638.

Updates all physical nodes to use a ULID as their ID, and makes the
field public for explicit node construction (which will be used for
protobuf conversion).

Unit tests which previously explicitly set the ULID have been updated to
leave the ID as the empty ULID.

Currently this field is never set (but will be in the following commit).
When creating a physical plan, each plan node will now have a unique
ULID. The Clone method has been updated to generate a new ULID for the
resulting cloned node.
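The Clone behavior described in this commit message could look roughly like the sketch below. It is an illustration only: `NodeID`, `newNodeID`, and `Filter` are hypothetical names, and `NodeID` is a random 16-byte stand-in rather than a real ULID (which also encodes a timestamp), so that the example needs only the standard library.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// NodeID is a 16-byte stand-in for a ULID (assumption: the real code uses
// a ULID library; this type only mimics the size and uniqueness).
type NodeID [16]byte

// newNodeID returns a random identifier. Hypothetical helper.
func newNodeID() NodeID {
	var id NodeID
	if _, err := rand.Read(id[:]); err != nil {
		panic(err)
	}
	return id
}

// Filter is a hypothetical plan node with a public ID field, so that it
// can be constructed explicitly (for example during conversion).
type Filter struct {
	ID         NodeID
	Predicates []string
}

// Clone copies the node's contents but assigns a fresh ID, so the clone
// is a distinct node in the plan graph.
func (f *Filter) Clone() *Filter {
	return &Filter{
		ID:         newNodeID(),
		Predicates: append([]string(nil), f.Predicates...),
	}
}

func main() {
	orig := &Filter{ID: newNodeID(), Predicates: []string{"level=error"}}
	clone := orig.Clone()
	fmt.Println(orig.ID != clone.ID) // the two nodes carry different IDs
}
```

A node built with a struct literal that omits `ID` keeps the zero value, which corresponds to leaving the ID as the empty ULID in tests.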

Workflows, for the time being, will reuse some node ULIDs when a node is
found across multiple sharded tasks.
@rfratto rfratto requested a review from a team as a code owner November 1, 2025 03:07
@rfratto rfratto force-pushed the physical-plan-proto branch 2 times, most recently from 7ac0a91 to ff047e3 Compare November 1, 2025 03:27
Contributor

@ashwanthgoli ashwanthgoli left a comment

LGTM

Comment on lines +12 to +13
// MarshalPhysical converts a protobuf plan into standard representation.
// Returns an error if the conversion fails or is unsupported.
Contributor

(here and also in all the node marshal / unmarshal funcs) shouldn't the Marshal / Unmarshal semantics be the other way around?

  • Marshalling - standard representation -> proto
  • Unmarshalling - proto -> standard representation

Member Author

I think it could go either way, and it's relative to the package the conversion logic is defined in.

This signature

func (*Plan) MarshalPhysical() (*physical.Plan, error)

looks more correct to me than

func (*Plan) UnmarshalPhysical() (*physical.Plan, error) 

I'd like to keep this one as-is for now, but let's revisit if it gets confusing

Contributor

Tbh I thought that "marshalling" has unidirectional semantics: from a "standard" object to a DTO. The only "formal" definition I found is from Wiki (sorry 🤦)

marshalling is the process of transforming the memory representation of an object into a data format suitable for storage or transmission.

More importantly, I see many projects stick to these semantics, as seen in the following examples:

I would propose something like this for the next iterations, wdyt?

// if we want to keep both methods on plan pointer receiver we could get rid of 
// marshalling / unmarshalling semantics (given that Plan.Marshal and Plan.Unmarshal already exist
// and operate on bytes) and rename the methods to something more "mapping"-like
func (*Plan) ToPhysical() (*physical.Plan, error) 
func (*Plan) FromPhysical(from *physical.Plan) error
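A rough sketch of how the proposed pair might behave, using stand-in types. `PhysicalPlan` substitutes for `physical.Plan` and `Plan` for the generated protobuf message; the fields and the copying logic are assumptions made purely for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// PhysicalPlan stands in for physical.Plan (hypothetical fields).
type PhysicalPlan struct{ NodeNames []string }

// Plan stands in for the protobuf plan message (hypothetical fields).
type Plan struct{ Nodes []string }

// ToPhysical maps the protobuf-side plan to the planner-side plan.
func (p *Plan) ToPhysical() (*PhysicalPlan, error) {
	if p == nil {
		return nil, errors.New("nil plan")
	}
	return &PhysicalPlan{NodeNames: append([]string(nil), p.Nodes...)}, nil
}

// FromPhysical overwrites the receiver with the contents of a
// planner-side plan.
func (p *Plan) FromPhysical(from *PhysicalPlan) error {
	if from == nil {
		return errors.New("nil physical plan")
	}
	p.Nodes = append([]string(nil), from.NodeNames...)
	return nil
}

func main() {
	src := &PhysicalPlan{NodeNames: []string{"scan", "filter"}}

	var pb Plan
	if err := pb.FromPhysical(src); err != nil {
		panic(err)
	}

	back, err := pb.ToPhysical()
	if err != nil {
		panic(err)
	}
	fmt.Println(len(back.NodeNames)) // same number of nodes after the round trip
}
```

With To/From naming, the direction is readable from the method name alone and does not collide with byte-level `Marshal` / `Unmarshal` methods on the same type.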

rfratto and others added 2 commits November 3, 2025 08:44
This adds two new packages, expressionpb and physicalpb, which are
serializable representations of physical.Expression and physical.Plan,
respectively.

These packages include utility functions to convert between the protobuf
representations and the planner types.

A translation layer is used due to the complexity of integrating
protobuf throughout the engine, as well as difficulties with finding a
clean pattern to construct node types. #19638 took an
initial attempt at fully integrating the protobuf types, but revealed
that it is very challenging.

While investigating the code, I observed that it's very clunky to work
with the protobuf types, especially with how often we rely on interface
values. It's clear to me that we will want to eventually remove our
translation layer, but doing it too soon means needing to update the
entire engine code path twice. It is a much safer bet to start with a
translation layer, find the right abstraction for constructing the
protobuf, and then migrate once we have confidence in the pattern.

Co-authored-by: Sophie Waldman <[email protected]>
As all usages of DAGs (physical plans, workflows) now use ULID for
uniquely representing nodes, we no longer need to have a stringified ID
method.
@rfratto rfratto force-pushed the physical-plan-proto branch from ff047e3 to 39cd797 Compare November 3, 2025 13:44
@rfratto rfratto enabled auto-merge (squash) November 3, 2025 13:49
@rfratto rfratto merged commit 016ee7d into main Nov 3, 2025
67 checks passed
@rfratto rfratto deleted the physical-plan-proto branch November 3, 2025 13:50
4 participants