Adding non-recursive directory listing support in Trino filesystem and using the non-recursive API in Hudi metadata dir listing #20686
Conversation
@ryadav-uptycs Thanks for the contribution! I think 7da7882 missed filtering a lot of files under
@electrum Could you please review it?
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua
dain left a comment
This doesn't seem like the right approach to me. Instead, can't Hudi just filter the results to remove the entries it doesn't want?
@dain Hudi can, but listing will be faster if we don't list the dir in the first place. A filter will help get the desired result, but it will add a performance impact.
Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.
Will leave this closed, since the upcoming Hudi release and work from @yihua will affect how (and whether) to proceed on this work.
Details are captured in #20253
Hudi connector shows incorrect table count for some of the tables where we run clustering
After digging deeper into the code, we found that in newer versions of Trino the .hoodie meta path listing is done recursively (https://github.com/trinodb/trino/blob/430/plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/table/HudiTableMetaClient.java#L166), which causes an incorrect oldest-timestamp file calculation. Because the listing is recursive, it also reads commit files from .hoodie/.bucket_index/consistent_hashing_metadata//20231101000038468.commit, which were placed there to record the state of clustering. This causes some of the parquet files to be skipped while listing the directory, and hence the incorrect data count. These clustering commit files were introduced in Hudi as part of apache/hudi#8503.
If we add a filter after getting the listing result, the query spends extra time in the planning phase; with the non-recursive implementation, planning finishes quickly.
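To illustrate the problem described above, here is a minimal, self-contained sketch. It uses plain java.nio rather than the Trino filesystem API, and the directory layout, class name, and file names are made up for the example: a recursive walk of a `.hoodie`-style directory also picks up a clustering commit nested under `.bucket_index`, while a flat (non-recursive) listing never touches that subdirectory at all.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ListingDemo {
    // Build a toy layout mimicking a Hudi .hoodie directory:
    // one timeline commit directly under .hoodie, plus a clustering
    // metadata commit nested under .bucket_index (names are illustrative).
    static Path buildDemoLayout() throws IOException {
        Path hoodie = Files.createDirectories(
                Files.createTempDirectory("hoodie-demo").resolve(".hoodie"));
        Files.createFile(hoodie.resolve("20231101000001.commit"));
        Path nested = Files.createDirectories(
                hoodie.resolve(".bucket_index").resolve("consistent_hashing_metadata"));
        Files.createFile(nested.resolve("20231101000038468.commit"));
        return hoodie;
    }

    // List *.commit files either recursively (Files.walk) or only among
    // the direct children of the directory (Files.list).
    static List<Path> listCommits(Path dir, boolean recursive) throws IOException {
        try (Stream<Path> s = recursive ? Files.walk(dir) : Files.list(dir)) {
            return s.filter(p -> p.toString().endsWith(".commit"))
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path hoodie = buildDemoLayout();
        // The recursive walk also returns the nested clustering commit,
        // which would skew the oldest-timestamp calculation.
        System.out.println("recursive: " + listCommits(hoodie, true).size());
        // The flat listing sees only the timeline commit.
        System.out.println("flat: " + listCommits(hoodie, false).size());
    }
}
```

Post-filtering the recursive result gives the same final list, but only after every nested entry has already been fetched; the non-recursive listing avoids that work entirely, which is why planning finishes faster.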
PS: This is the same PR as #20255, after resolving all conflicts.