Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: avoid repeat format in calc_func_dependencies_for_project #12305

Merged
merged 1 commit into from
Sep 4, 2024

Conversation

haohuaijin
Copy link
Contributor

@haohuaijin haohuaijin commented Sep 3, 2024

Which issue does this PR close?

Closes #12304

Rationale for this change

see #12304; the reason for the performance degradation is that we format the same expr in a loop.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the logical-expr Logical plan and expressions label Sep 3, 2024
@haohuaijin
Copy link
Contributor Author

the benchmarks are below, they compare main and this pr
image
the below benchmarks are compared main, datafusion v41.0.0 and this pr
image

Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @haohuaijin

.map(|i| vec![i])
.unwrap_or(vec![])),
Expr::Alias(alias) => {
let name = format!("{}", alias.expr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surprise that rust compile doesn't optimize on this 😮

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe because the rust doesn't know the length of inputs_fields at compile time, it doesn't optimize this case.

@@ -2793,22 +2793,28 @@ fn calc_func_dependencies_for_project(
.filter_map(|(qualifier, f)| {
let flat_name = qualifier
.map(|t| format!("{}.{}", t, f.name()))
.unwrap_or(f.name().clone());
.unwrap_or_else(|| f.name().clone());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arguments passed to unwrap_or are eagerly evaluated; if you are passing the result of a function call, it is recommended to use unwrap_or_else, which is lazily evaluated.

👍

@lewiszlw lewiszlw merged commit 6bbad7e into apache:main Sep 4, 2024
26 checks passed
@haohuaijin haohuaijin deleted the improve-plan-time branch September 4, 2024 08:51
@haohuaijin haohuaijin restored the improve-plan-time branch September 4, 2024 15:34
@haohuaijin haohuaijin deleted the improve-plan-time branch September 19, 2024 03:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

significant degraded performance of select * from large tables in main compared to datafusion v41.0.0
6 participants