Conversation
|
any benchmark? |
|
@tooptoop4 We benchmarked with 6-month GitHub archive data (220 GB, 450 million records) and observed significant improvements with column stats and clustering enabled. |
|
First of all a big thanks from me for the work on the connector and on hudi in general 👍. Is there a rough ETA on when this will be merged and be available in trino? Thanks again. |
|
@codope do you plan to rebase and update this PR soon? |
|
Is there any plan to support the use case where an operation such as =, >, or < is applied to a function or expression on a column? |
The current PR does not support that. In Hudi 1.0, we are adding functional indexes, which can support skipping data based on a function or expression on one or more columns. |
|
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua |
|
👋 @codope .. I assume you are continuing this work at some stage and will leave the PR open. |
|
I am closing this PR. Once we upgrade the Hudi version in Trino (with the updated Hadoop-independent abstraction), we'll revive or create a new PR for data skipping. |

Description
Use Hudi column stats to skip data and improve query latency.
Stacked on top of #16034
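For readers unfamiliar with the idea, here is a minimal sketch of min/max-based data skipping, the general technique that column stats enable. It is not the code in this PR and does not use Hudi's or Trino's APIs: `ColumnStats`, `may_contain`, and `prune_files` are hypothetical names, and the per-file stats are assumed to be integer min/max values for a single column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnStats:
    """Per-file min/max for one column (hypothetical shape, not Hudi's metadata schema)."""
    file_name: str
    min_value: int
    max_value: int


def may_contain(stats: ColumnStats, op: str, literal: int) -> bool:
    """Return False only when the file provably has no row matching `column <op> literal`."""
    if op == "=":
        return stats.min_value <= literal <= stats.max_value
    if op == ">":
        return stats.max_value > literal
    if op == "<":
        return stats.min_value < literal
    # Unknown operator: be conservative and never skip the file.
    return True


def prune_files(all_stats: list[ColumnStats], op: str, literal: int) -> list[str]:
    """Keep only the files whose min/max range could satisfy the predicate."""
    return [s.file_name for s in all_stats if may_contain(s, op, literal)]


# Illustrative stats for three files; the values are made up for the example.
files = [
    ColumnStats("f1.parquet", 0, 99),
    ColumnStats("f2.parquet", 100, 199),
    ColumnStats("f3.parquet", 200, 299),
]
selected = prune_files(files, ">", 150)  # f1.parquet is skipped because its max is 99
```

Skipping is only safe when the stats prove that no row can match, so anything the sketch does not understand falls back to reading the file.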
Additional context and related issues
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text: