@Askwang Askwang commented Nov 16, 2023

Change Logs

When a table has many partitions and files, generating the clean plan can easily throw an OOM exception, even with a large memory configuration. By default the remote table view is tried first, and it cannot fall back to the secondary table view if the remote view throws an OOM exception. Using the secondary view first is more stable than using the remote view first.
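
A minimal sketch of the behavior described above, under stated assumptions: `FileView` and `PriorityViewSketch` are hypothetical, simplified stand-ins and not Hudi's actual classes, and the `secondaryFirst` flag is an illustrative parameter rather than the real config. The sketch assumes the fallback wrapper handles only runtime exceptions, which is one plausible reason an `OutOfMemoryError` (an `Error`, not a `RuntimeException`) raised by the first view would not trigger the fallback.

```java
import java.util.List;

// Hypothetical, simplified stand-in for a file system view (not Hudi's real interface).
interface FileView {
    List<String> listFiles(String partition);
}

// Tries the preferred view first and falls back to the other view on RuntimeException only.
final class PriorityViewSketch implements FileView {
    private final FileView preferred;
    private final FileView fallback;

    PriorityViewSketch(FileView preferred, FileView fallback) {
        this.preferred = preferred;
        this.fallback = fallback;
    }

    @Override
    public List<String> listFiles(String partition) {
        try {
            return preferred.listFiles(partition);
        } catch (RuntimeException e) {
            // OutOfMemoryError is an Error, not a RuntimeException, so it
            // bypasses this handler and propagates to the caller.
            return fallback.listFiles(partition);
        }
    }

    // Builds the wrapper in either order; secondaryFirst is an assumed flag.
    static FileView create(FileView remote, FileView secondary, boolean secondaryFirst) {
        return secondaryFirst
            ? new PriorityViewSketch(secondary, remote)
            : new PriorityViewSketch(remote, secondary);
    }
}
```

With the remote view first, a simulated `OutOfMemoryError` from the remote view reaches the caller; with the secondary view first, the remote view is never invoked unless the secondary view itself fails with a runtime exception.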

Impact

N/A

Risk level (write none, low medium or high below)

none

Documentation Update

N/A

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instruction to make
    changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

LOG.info("Creating remote table view first");
return new PriorityBasedFileSystemView(remoteFileSystemView, secondaryView);
} else {
LOG.info("Creating secondary table view first");

cc @zhedoubushishi, who has also encountered OOM for async cleaning.

@Askwang Askwang changed the title [HUDI-7105] support filesystem view configuable [HUDI-7105] support filesystem view configuable to avoid clean oom Nov 20, 2023
@danny0405

This PR may fix the similar issue: #10002 (comment)

Askwang commented Nov 21, 2023

> This PR may fix the similar issue: #10002 (comment)

Nice~ That PR addresses the root cause of the clean OOM problem. One small difference: Flink has only one clean operator, while Spark can perform clean operations in parallel. Providing a config that allows using the secondary table view first is another option, and it can also avoid the clean OOM.

@hudi-bot

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

danny0405 commented Nov 21, 2023

> We provide a config to allow using secondary table view first is also a optional way

So only the clean task uses the local fs view? I didn't see that change in Flink.

Askwang commented Nov 24, 2023

> > We provide a config to allow using secondary table view first is also a optional way
>
> So only the clean task uses the local fs view? I didn't see that change in Flink.

Yes, only the clean plan generator uses the local fs view. Flink has only one clean operator, so this config may not be helpful for Flink. I think we can still add this config: it does not affect Flink jobs, and it is helpful for the clean plan task in Spark.

@danny0405

#10002 may already resolve the OOM pressure. Besides the cleaning service, it also optimizes compaction and clustering, so it should be a thorough solution.
