Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deterministic IDs for ExecutionPlan #11364

Open
ameyc opened this issue Jul 9, 2024 · 7 comments · May be fixed by #12186
Open

Deterministic IDs for ExecutionPlan #11364

ameyc opened this issue Jul 9, 2024 · 7 comments · May be fixed by #12186
Labels
enhancement New feature or request

Comments

@ameyc
Copy link
Contributor

ameyc commented Jul 9, 2024

Is your feature request related to a problem or challenge?

Currently execution plans do not have an id associated with them this makes comparison of metrics across the runs. Additionally we would like to add a UI to our project to display the metrics as well snapshot our plans.

Looking at the code, it seems to me that the the physical_planner does record the node index which is not passed down to map_logical_node_to_physical.

Describe the solution you'd like

It would quite trivial to pass the node index to ExecutionPlan creation step and add additional field operator_id plan_id to ExecutionPlanProperties.

Describe alternatives you've considered

We looked into using task_id field, but it always seems to be set to None.

Additional context

No response

@ameyc ameyc added the enhancement New feature or request label Jul 9, 2024
@mustafasrepo
Copy link
Contributor

During planning there are couple of physical optimization rules which modify the ExecutionPlan (rules can be found in the link).
Hence, I think proper place to generate id for an ExecutionPlan is the end of all optimizations where ExecutionPlan is finalized (doing this during the conversion from logical plan to physical plan may be premature).

I guess, once ExecutionPlan is finalized, we can generate an id for the plan using TreeNode API.

@ameyc
Copy link
Contributor Author

ameyc commented Aug 24, 2024

@ozankabak @alamb taking a look at this again as we are working on snapshotting and UI for Denoramlized. It seems to me that the ideal place to add these is in the PlanProperties with something like

pub struct PlanProperties { pub eq_properties: EquivalenceProperties, pub partitioning: Partitioning, pub execution_mode: ExecutionMode, output_ordering: Option<LexOrdering>, node_id: Option<usize>, }

with a method on ExecutionPlan to set the node_id (as this isnt available when the compute_properties() is called on the node at creation time.
fn with_node_id(&mut self, _node_id: usize) {}

Once the final physical plan is generated in create_physical_plan() after all the optimizer passes, we can then visit all the nodes and call "with_node_id" on them. Does this sound like a reasonable approach?

@alamb
Copy link
Contributor

alamb commented Aug 25, 2024

Once the final physical plan is generated in create_physical_plan() after all the optimizer passes, we can then visit all the nodes and call "with_node_id" on them. Does this sound like a reasonable approach?

I do think this sounds reasonable

Some questions and thoughts:

  1. What would be the default value of id (if with_node_id wasn't called)?
  2. You might want to consider an API like pub fn with_node_id(self: Arc<...>, node_id: usize if you can as the ExcutionPlan nodes are almost always Arcd)
  3. I would recommend not showing this id in the explain plan if they change / are not deterministic
  4. I would recommend extending one of the examples (perhaps related to planning, or make one related to streaming) showing how you use this field both as a test as well as to ensure the API works well

@ozankabak
Copy link
Contributor

Overall sounds reasonable, I will circle back tomorrow after discussing with Synnada folks. Maybe we can upstream some code to help.

@ameyc
Copy link
Contributor Author

ameyc commented Aug 26, 2024

  • What would be the default value of id (if with_node_id wasn't called)?

so was thinking if this isn't set, simply letting it be None in the default. this will be mostly to account for UserDefined ExecutionPlans that haven't overridden that method.

@ameyc ameyc linked a pull request Aug 27, 2024 that will close this issue
@ameyc
Copy link
Contributor Author

ameyc commented Aug 27, 2024

Just cut a PR based on feedback from @alamb

@ozankabak
Copy link
Contributor

We have been thinking about this, I will reply in the PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants