Skip to content

[native] Push down FilterNode into TableScan#23755

Merged
aditi-pandit merged 1 commit intoprestodb:masterfrom
Yuhta:tasks/T202164911/0
Oct 2, 2024
Merged

[native] Push down FilterNode into TableScan#23755
aditi-pandit merged 1 commit intoprestodb:masterfrom
Yuhta:tasks/T202164911/0

Conversation

@Yuhta
Copy link
Contributor

@Yuhta Yuhta commented Oct 1, 2024

Sometimes the planner put a FilterNode right after TableScanNode. This change merges these 2 nodes by moving the filter expression into remaining filter of the table scan when possible. This results in fewer IO and CPU because table scan can leverage the information in remaining filter to skip file stripes in case of random sampling.

Also add $path and $bucket in split info columns and fix split counts in coordinator UI.

@Yuhta Yuhta force-pushed the tasks/T202164911/0 branch from 493b3ce to 8c97879 Compare October 1, 2024 15:45
@Yuhta Yuhta marked this pull request as ready for review October 1, 2024 17:32
@Yuhta Yuhta requested a review from a team as a code owner October 1, 2024 17:32
Copy link
Contributor

@aditi-pandit aditi-pandit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Yuhta. Had a high level question:

Am curious why we are doing the pushdown of FilterNode into TableScan during Presto -> Velox plan conversion ? This could be an optimization in Velox Local Planner as well. That would make it common to Presto and Spark.

@Yuhta
Copy link
Contributor Author

Yuhta commented Oct 1, 2024

@aditi-pandit Local planner has no dependency on Hive so it does not have knowledge about remaining filter. I guess this was the same reason why Presto planner does not put it in remaining filter as well (did not verify). Seems Prestissimo is a reasonable place to fix this, as we have all the dependency needed and we have the knowledge that this will benefit the execution layer (Velox).

@aditi-pandit
Copy link
Contributor

@aditi-pandit Local planner has no dependency on Hive so it does not have knowledge about remaining filter. I guess this was the same reason why Presto planner does not put it in remaining filter as well (did not verify). Seems Prestissimo is a reasonable place to fix this, as we have all the dependency needed and we have the knowledge that this will benefit the execution layer (Velox).

Hmm... Yeah, Velox Local Planner doesn't have dependency on Hive. Alright, your approach is reasonable.

@Yuhta Yuhta force-pushed the tasks/T202164911/0 branch from 8c97879 to ee4bb8e Compare October 2, 2024 15:10
Also add `$path` and `$bucket` in split info columns and fix split counts in
coordinator UI.
@Yuhta Yuhta force-pushed the tasks/T202164911/0 branch from ee4bb8e to cff51f6 Compare October 2, 2024 15:10
Copy link
Contributor

@aditi-pandit aditi-pandit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Yuhta for this code, and answering all my questions.

@aditi-pandit aditi-pandit merged commit 176e886 into prestodb:master Oct 2, 2024
@jaystarshot jaystarshot mentioned this pull request Nov 1, 2024
25 tasks
@jaystarshot
Copy link
Member

Please consider adding release notes following our release notes guide - link.

Using below for now

*Merged FilterNode into TableScanNode where possible, reducing I/O and CPU; added $path and $bucket to split info, and fixed split counts in coordinator UI. :pr:23755``

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

from:Meta PR from Meta

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants

Comments