[HUDI-7105] support filesystem view configuable to avoid clean oom #10116
| LOG.info("Creating remote table view first"); | ||
| return new PriorityBasedFileSystemView(remoteFileSystemView, secondaryView); | ||
| } else { | ||
| LOG.info("Creating secondary table view first"); |
cc @zhedoubushishi, who has also encountered OOM in async cleaning.
This PR may fix a similar issue: #10002 (comment)
Nice~ That PR addresses the root cause of the clean OOM problem. One small difference, though: Flink has only one clean operator, while Spark can perform the clean operation in parallel. Providing a config that allows using the secondary table view first is another option, and it can also avoid the clean OOM.
So only the clean task uses the local fs view? I didn't see that change in Flink.
Yes, only the clean plan generator uses the local fs view. Flink has only one clean operator, so this config may not be helpful for Flink. I think we can still add it: it does not affect Flink jobs, and it helps clean planning for Spark.
#10002 may already resolve the OOM pressure. Besides the cleaning service, it also optimizes compaction and clustering, so it should be a thorough solution.
Change Logs
With many partitions and files, generating the clean plan can easily throw an OOM exception even when a large amount of memory is configured. By default the remote table view is tried first, and it cannot fall back to the secondary table view if the remote view throws an OOM exception. Using the secondary view first is more stable than using the remote view.
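The ordering choice described above can be sketched roughly as follows. This is a minimal illustration, not Hudi's actual code: `FileView`, `PriorityView`, `ViewFactory`, and the `secondaryFirst` flag are hypothetical names standing in for the real file-system view classes and the config proposed in this PR. The sketch assumes the wrapper falls back only on a `RuntimeException`; since `OutOfMemoryError` is a `java.lang.Error`, such a fallback would not trigger on OOM, which is one plausible reading of why the view order matters.

```java
import java.util.List;

// Hypothetical stand-in for a file-system view; not Hudi's actual interface.
interface FileView {
    List<String> listFiles(String partition);
}

// Tries the preferred view first and falls back to the other view on failure.
final class PriorityView implements FileView {
    private final FileView preferred;
    private final FileView fallback;

    PriorityView(FileView preferred, FileView fallback) {
        this.preferred = preferred;
        this.fallback = fallback;
    }

    @Override
    public List<String> listFiles(String partition) {
        try {
            return preferred.listFiles(partition);
        } catch (RuntimeException e) {
            // Assumption of this sketch: only RuntimeException triggers the fallback.
            // OutOfMemoryError is an Error, so it would propagate instead of being handled here.
            return fallback.listFiles(partition);
        }
    }
}

final class ViewFactory {
    // secondaryFirst is a hypothetical flag modelling the config discussed in this PR.
    static FileView create(FileView remote, FileView secondary, boolean secondaryFirst) {
        return secondaryFirst
            ? new PriorityView(secondary, remote)
            : new PriorityView(remote, secondary);
    }
}
```

With `secondaryFirst` set to `true`, the remote view is consulted only if the secondary view fails, so a clean planner that lists many partitions would not touch the remote view on the normal path.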
Impact
N/A
Risk level (write none, low, medium or high below)
none
Documentation Update
N/A
ticket number here and follow the instruction to make
changes to the website.
Contributor's checklist