chore(engine): serializable physical plans #19672

rfratto · 2025-11-01T03:07:02Z

This adds two new packages, expressionpb and physicalpb, which are serializable representations of physical.Expression and physical.Plan, respectively.

These packages include utility functions to convert between the protobuf representations and the planner types.

A translation layer is used due to the complexity of integrating protobuf throughout the engine, as well as difficulties with finding a clean pattern to construct node types. #19638 took an initial attempt at fully integrating the protobuf types, but revealed that it is very challenging.

While helping with #19638, I observed that it's very clunky to work with the protobuf types, especially with how often we rely on interface values; these do not work as smoothly with protobuf's oneofs, resulting in quite painful code.
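To make the friction concrete, here is a minimal sketch contrasting a plain Go interface with a oneof-shaped envelope. Every type here is a hand-written stand-in invented for illustration (none of them come from expressionpb or physicalpb); the `PB*` types only imitate the wrapper-per-variant shape that Go protobuf code generation typically produces for a oneof.

```go
package main

import "fmt"

// Planner-style expression: a plain interface with concrete node types.
// (Hypothetical names, not the real physical.Expression.)
type Expression interface{ String() string }

type ColumnExpr struct{ Name string }
type LiteralExpr struct{ Value int64 }

func (c *ColumnExpr) String() string  { return "col(" + c.Name + ")" }
func (l *LiteralExpr) String() string { return fmt.Sprintf("lit(%d)", l.Value) }

// Oneof-style expression: an envelope holds a wrapper, and the wrapper
// holds the concrete message. These are hand-written stand-ins.
type PBExpression struct{ Kind isPBExpressionKind }

type isPBExpressionKind interface{ isPBExpressionKind() }

type PBColumn struct{ Name string }
type PBLiteral struct{ Value int64 }

type PBExpressionColumn struct{ Column *PBColumn }
type PBExpressionLiteral struct{ Literal *PBLiteral }

func (*PBExpressionColumn) isPBExpressionKind()  {}
func (*PBExpressionLiteral) isPBExpressionKind() {}

// describePB has to switch on the wrapper type and then reach through it
// to the concrete message, instead of calling a method on an interface.
func describePB(e *PBExpression) string {
	switch k := e.Kind.(type) {
	case *PBExpressionColumn:
		return "col(" + k.Column.Name + ")"
	case *PBExpressionLiteral:
		return fmt.Sprintf("lit(%d)", k.Literal.Value)
	default:
		return "unknown"
	}
}

func main() {
	// One allocation and one method call with the interface form.
	var native Expression = &ColumnExpr{Name: "ts"}
	fmt.Println(native.String())

	// Three nested values and a type switch with the oneof form.
	wrapped := &PBExpression{Kind: &PBExpressionColumn{Column: &PBColumn{Name: "ts"}}}
	fmt.Println(describePB(wrapped))
}
```

Both calls describe the same column, but the oneof form needs two extra layers at construction time and a type switch at every use site, which is roughly the kind of overhead a translation layer keeps out of the rest of the engine.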

It's clear to me that we will want to eventually remove the translation layer, but we need more time to figure out how we should interact with the protobuf types cleanly throughout the codebase. Skipping straight to using the protobuf types now carries too much risk of needing another massive PR. Given this, it's a much safer bet to start with a translation layer, find the right abstraction for constructing the protobuf, and then migrate once we have confidence in the pattern.

Closes #19638.

Updates all physical nodes to use a ULID as their ID, and makes the field public for explicit node construction (which will be used for protobuf conversion). Unit tests which previously explicitly set the ULID have been updated to leave the ID as the empty ULID. Currently this field is never set (but will be in the following commit).

When creating a physical plan, each plan node will now have a unique ULID. The Clone method has been updated to generate a new ULID for the resulting cloned node. Workflows, for the time being, will reuse some node ULIDs when a node is found across multiple sharded tasks.
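The Clone behaviour described above can be sketched as follows. This is an illustration only: `ULID`, `newULID`, and `LimitNode` are hypothetical stand-ins built on the standard library, not the types from pkg/engine/internal/util/ulid or the physical package. The stand-in follows the general ULID layout (a 48-bit millisecond timestamp followed by 80 random bits).

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
	"time"
)

// ULID is a hypothetical 16-byte identifier: 6 timestamp bytes followed
// by 10 random bytes. The zero value plays the role of the "empty ULID".
type ULID [16]byte

// newULID builds a fresh identifier from the current time and random bytes.
func newULID() ULID {
	var id ULID
	var ts [8]byte
	binary.BigEndian.PutUint64(ts[:], uint64(time.Now().UnixMilli()))
	copy(id[:6], ts[2:]) // keep the low 48 bits of the timestamp
	if _, err := rand.Read(id[6:]); err != nil {
		panic(err)
	}
	return id
}

// LimitNode is a hypothetical plan node with a public ID field, so that a
// node can be constructed explicitly (for example, during conversion).
type LimitNode struct {
	ID    ULID
	Fetch uint32
}

// Clone copies the node's settings but assigns a new ID, so the clone is a
// distinct node in the plan.
func (n *LimitNode) Clone() *LimitNode {
	return &LimitNode{ID: newULID(), Fetch: n.Fetch}
}

func main() {
	orig := &LimitNode{ID: newULID(), Fetch: 10}
	dup := orig.Clone()
	fmt.Println("same settings:", dup.Fetch == orig.Fetch)
	fmt.Println("distinct IDs: ", dup.ID != orig.ID)
}
```

Because the ID is an array type, it is comparable and can be used directly as a map key, which is convenient for graph bookkeeping.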

ashwanthgoli

LGTM

pkg/engine/internal/util/ulid/ulid.go

pkg/engine/internal/proto/ulid/ulid.proto

pkg/engine/internal/planner/physical/plan.go

pkg/engine/internal/util/ulid/ulid.go

ivkalita · 2025-11-03T10:45:38Z

pkg/engine/internal/proto/physicalpb/marshal.go

+// MarshalPhysical converts a protobuf plan into standard representation.
+// Returns an error if the conversion fails or is unsupported.


(here and also in all the nodes' marshal / unmarshal funcs) shouldn't the Marshal / Unmarshal semantics be the other way around?

Marshalling - standard representation -> proto

Unmarshalling - proto -> standard representation

I think it could go either way, and it's relative to the package the conversion logic is defined in.

This signature

func (*Plan) MarshalPhysical() (*physical.Plan, error)

looks more correct to me than

func (*Plan) UnmarshalPhysical() (*physical.Plan, error)

I'd like to keep this one as-is for now, but let's revisit if it gets confusing

Tbh I thought that "marshalling" has unidirectional semantics: from a "standard" object to a DTO. The only "formal" definition I found is from Wiki (sorry 🤦)

marshalling is the process of transforming the memory representation of an object into a data format suitable for storage or transmission.

More importantly, I see many projects stick to these semantics, as seen in the following examples:

python

go encoding/json

Ruby Marshal

Rust binmarshal
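For reference, the direction used by Go's encoding/json can be shown with a short round trip. The `Point` type and `roundTrip` helper are made up for this example; `json.Marshal` and `json.Unmarshal` are the standard library functions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Point is a hypothetical in-memory value used only for this example.
type Point struct {
	X int `json:"x"`
	Y int `json:"y"`
}

// roundTrip marshals p to JSON and unmarshals it back.
func roundTrip(p Point) (string, Point, error) {
	// Marshal: in-memory value -> transmissible representation.
	data, err := json.Marshal(p)
	if err != nil {
		return "", Point{}, err
	}

	// Unmarshal: transmissible representation -> in-memory value.
	var back Point
	if err := json.Unmarshal(data, &back); err != nil {
		return "", Point{}, err
	}
	return string(data), back, nil
}

func main() {
	encoded, decoded, err := roundTrip(Point{X: 1, Y: 2})
	if err != nil {
		panic(err)
	}
	fmt.Println(encoded) // {"x":1,"y":2}
	fmt.Printf("%+v\n", decoded)
}
```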

I would propose something like this for the next iterations, wdyt?

// if we want to keep both methods on plan pointer receiver we could get rid of
// marshalling / unmarshalling semantics (given that Plan.Marshal and Plan.Unmarshal already exist
// and operate on bytes) and rename the methods to something more "mapping"-like
func (*Plan) ToPhysical() (*physical.Plan, error)
func (*Plan) FromPhysical(from *physical.Plan) error
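A rough sketch of how the proposed naming could read at a call site. `PhysicalPlan` and `Plan` are simplified stand-ins for physical.Plan and the protobuf plan message; their fields are invented here and do not reflect the real types.

```go
package main

import (
	"errors"
	"fmt"
)

// PhysicalPlan stands in for the planner's plan type (hypothetical fields).
type PhysicalPlan struct{ NodeNames []string }

// Plan stands in for the serializable plan message (hypothetical fields).
type Plan struct{ Nodes []string }

// ToPhysical maps the serializable plan to the planner representation.
func (p *Plan) ToPhysical() (*PhysicalPlan, error) {
	if p == nil {
		return nil, errors.New("nil plan")
	}
	// Copy the slice so the two representations do not share storage.
	return &PhysicalPlan{NodeNames: append([]string(nil), p.Nodes...)}, nil
}

// FromPhysical fills the serializable plan from the planner representation.
func (p *Plan) FromPhysical(from *PhysicalPlan) error {
	if from == nil {
		return errors.New("nil physical plan")
	}
	p.Nodes = append([]string(nil), from.NodeNames...)
	return nil
}

func main() {
	var pb Plan
	if err := pb.FromPhysical(&PhysicalPlan{NodeNames: []string{"scan", "limit"}}); err != nil {
		panic(err)
	}

	out, err := pb.ToPhysical()
	if err != nil {
		panic(err)
	}
	fmt.Println(out.NodeNames)
}
```

With To/From names, the direction is stated relative to the receiver, so the question of which side counts as "marshalled" does not come up.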

This adds two new packages, expressionpb and physicalpb, which are serializable representations of physical.Expression and physical.Plan, respectively. These packages include utility functions to convert between the protobuf representations and the planner types.

A translation layer is used due to the complexity of integrating protobuf throughout the engine, as well as difficulties with finding a clean pattern to construct node types. #19638 took an initial attempt at fully integrating the protobuf types, but revealed that it is very challenging.

While investigating the code, I observed that it's very clunky to work with the protobuf types, especially with how often we rely on interface values. It's clear to me that we will want to eventually remove our translation layer, but doing it too soon means needing to update the entire engine code path twice. It is a much safer bet to start with a translation layer, find the right abstraction for constructing the protobuf, and then migrate once we have confidence in the pattern.

Co-authored-by: Sophie Waldman <[email protected]>

As all usages of DAGs (physical plans, workflows) now use ULID for uniquely representing nodes, we no longer need to have a stringified ID method.

rfratto added 2 commits October 31, 2025 23:00

rfratto requested a review from a team as a code owner November 1, 2025 03:07

pull-request-size bot added the size/XXL label Nov 1, 2025

chore(engine): add ULID method to physical.Node interface

f6c1b12

rfratto force-pushed the physical-plan-proto branch 2 times, most recently from 7ac0a91 to ff047e3 Compare November 1, 2025 03:27

ashwanthgoli approved these changes Nov 3, 2025

pkg/engine/internal/util/ulid/ulid.go (outdated)

chaudum reviewed Nov 3, 2025

pkg/engine/internal/proto/ulid/ulid.proto

ivkalita reviewed Nov 3, 2025

pkg/engine/internal/planner/physical/plan.go (outdated)

ivkalita reviewed Nov 3, 2025

pkg/engine/internal/planner/physical/plan.go

ivkalita reviewed Nov 3, 2025

rfratto and others added 2 commits November 3, 2025 08:44

chore(engine): update dag package to use ULID for node IDs

39cd797

rfratto force-pushed the physical-plan-proto branch from ff047e3 to 39cd797 Compare November 3, 2025 13:44

rfratto mentioned this pull request Nov 3, 2025

chore(topk): improvements to reduce alloc bytes and alloc space #19660

Merged

6 tasks

rfratto enabled auto-merge (squash) November 3, 2025 13:49

rfratto merged commit 016ee7d into main Nov 3, 2025
67 checks passed

rfratto deleted the physical-plan-proto branch November 3, 2025 13:50

rfratto mentioned this pull request Nov 3, 2025

chore(engine): Add HTTP/2 transport for wire protocol #19575

Open

6 tasks

ivkalita Nov 3, 2025

rfratto Nov 3, 2025

ivkalita Nov 4, 2025

4 participants
