Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance short circuit handling in CommonSubexprEliminate #11197

Merged
merged 3 commits into from
Jul 3, 2024

Conversation

peter-toth
Copy link
Contributor

Which issue does this PR close?

Part of #11194.

Rationale for this change

Curently CommonSubexprEliminate always skips expressions that can short circuit, but that's not quite right as only the children of those expressions should be skipped.

I.e. from the plan:

Projection (a OR b) AS c1, (a OR b) AS c2

the expression a OR b can be extracted as common expression, but from the plan:

Projection (c OR (a + b)) AS c1, (d OR (a + b)) AS c2

the expression a + b can't be extracted.

What changes are included in this PR?

Changes the rule to handle short circuiting expressions better. As VisitRecord::JumpMark is no longer needed this PR removes it.

Are these changes tested?

Yes, added new UT.

Are there any user-facing changes?

No.

@alamb
Copy link
Contributor

alamb commented Jul 1, 2024

Thanks @peter-toth -- I plan to review this tomorrow morning

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @peter-toth -- I spent some time this morning playing with this PR and I think it makes sense to me.

Maybe @haohuaijin (as the original author of this feature in #8928) might have some time to review this PR as well

}
let is_tree = expr.short_circuits();
let tnr = if is_tree {
TreeNodeRecursion::Jump
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the meaning of is_tree here.

Maybe we could add a comment explaining that Jump will skip children but continue with siblings

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can rename is_tree to is_short_circuits

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I forgot to add a comment here. Basically I wanted to express that we handle the expression as a subtree (not just a node) in this case.
I added a comment in c02bae9.


let extracted_short_circuit = col("a").eq(lit(0)).or(col("b").eq(lit(0)));
let not_extracted_short_circuit_leg = (col("a") + col("b")).eq(lit(0));
let plan = LogicalPlanBuilder::from(table_scan.clone())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this test covers the negative case too.

Copy link
Contributor

@haohuaijin haohuaijin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @peter-toth and @alamb,

assert_optimized_plan_eq(expected, plan, None);

Ok(())
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test case like the one below to check if (a or b) can be extracted as a common subexpr?

select ((a or b) or d) as f1, ((a or b) or c) as f2 from t;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR doesn't extract surely evaluated children of short circuiting expressions so I kept that TODO yet ( https://github.com/apache/datafusion/pull/11197/files#diff-351499880963d6a383c92e156e75019cd9ce33107724a9635853d7d4cd1898d0R955).
I will address that in a separate follow-up PR...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to add the test (as a negative test perhaps)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've adjusted the test case to test both legs of an OR expression: 8b75d82. In this PR none of them are extraceted, but the after that follow-up PR the srurely evaluated first leg (called not_extracted_short_circuit_leg_1 now) will be extracted.

}
let is_tree = expr.short_circuits();
let tnr = if is_tree {
TreeNodeRecursion::Jump
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can rename is_tree to is_short_circuits

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @peter-toth and @haohuaijin -- I plan to merge this PR tomorrow unless there are other comments or other reviewers would like more time to review

@alamb alamb merged commit 5bdc745 into apache:main Jul 3, 2024
23 checks passed
@alamb
Copy link
Contributor

alamb commented Jul 3, 2024

Thanks again everyone

comphead pushed a commit to comphead/arrow-datafusion that referenced this pull request Jul 8, 2024
)

* Enhance short circuit handling in `CommonSubexprEliminate`

* explain is_tree

* adjust test
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
)

* Enhance short circuit handling in `CommonSubexprEliminate`

* explain is_tree

* adjust test
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
optimizer Optimizer rules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants