@hbgstc123 hbgstc123 commented Dec 8, 2022

Change Logs

The relevant code is in org.apache.hudi.sink.compact.CompactOperator (shown as a screenshot in the original PR).

In Flink inline async compaction, if an OOM error occurs during the execution of doCompaction(...), the exception hook is not executed and no CompactionCommitEvent is sent to CompactionCommitSink. As a result, the compaction instant stays stuck in the "inflight" state and is never completed or rolled back, even though the compaction has in fact failed.

I changed the execution order so that the exception hook runs before ExceptionUtils.rethrowIfFatalErrorOrOOM rethrows the error.

I tested the change: when an OOM happens during compaction, a CompactionCommitEvent is now sent to the compaction sink, which rolls back the failed compaction.
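The reordering described above can be illustrated with a minimal, self-contained sketch. This is not the actual Hudi executor code: the class HookBeforeRethrowSketch, the ThrowingAction interface, and the isFatalOrOom helper are hypothetical names invented for illustration, and the fatal-error check is a simplified stand-in for Flink's ExceptionUtils.rethrowIfFatalErrorOrOOM (here it just treats any VirtualMachineError, which includes OutOfMemoryError, as fatal).

```java
import java.util.function.Consumer;

/** Hypothetical stand-in for the executor's error handling; not Hudi's actual class. */
final class HookBeforeRethrowSketch {

  /** An action that may throw anything, including Errors such as OutOfMemoryError. */
  @FunctionalInterface
  interface ThrowingAction {
    void run() throws Throwable;
  }

  /** Simplified assumption: every VirtualMachineError (OOM included) counts as fatal. */
  static boolean isFatalOrOom(Throwable t) {
    return t instanceof VirtualMachineError;
  }

  /**
   * Runs the action. On any failure it logs, then notifies the hook, and only
   * afterwards rethrows fatal errors, so the hook still fires on OOM.
   */
  static void execute(ThrowingAction action, Consumer<Throwable> hook, String actionName) {
    try {
      action.run();
    } catch (Throwable t) {
      // Log first, while the message is still cheap to build.
      System.err.println(String.format("Executor executes action [%s] error: %s", actionName, t));
      // Notify the hook BEFORE rethrowing, so downstream can roll back the failed work.
      if (hook != null) {
        hook.accept(t);
      }
      // Only now propagate fatal errors; the cast is safe because
      // VirtualMachineError is a subclass of Error.
      if (isFatalOrOom(t)) {
        throw (Error) t;
      }
    }
  }
}
```

With this ordering, calling execute with an action that throws OutOfMemoryError invokes the hook once and then rethrows the error to the caller. With the original ordering (rethrow first), the hook would be skipped, which corresponds to the lost CompactionCommitEvent described in this PR.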

Impact

no

Risk level (write none, low, medium or high below)

low

Documentation Update

no

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

ExceptionUtils.rethrowIfFatalErrorOrOOM(t);
final String errMsg = String.format("Executor executes action [%s] error", actionString.get());
logger.error(errMsg, t);
if (hook != null) {
Contributor:

The log statement can be moved as well ~

Contributor Author:

Thanks for the review, moved.

@danny0405 danny0405 self-assigned this Dec 9, 2022
@danny0405 danny0405 added the area:table-service (Table services), engine:flink (Flink integration), and priority:blocker (Production down; release blocker) labels Dec 9, 2022
@hbgstc123 hbgstc123 force-pushed the oom_compact_event_lost branch from 53bde66 to 913d570 Compare December 9, 2022 02:29
hudi-bot commented Dec 9, 2022

@danny0405 danny0405 left a comment

+1, nice catch ~

@danny0405 danny0405 changed the title from "[HUDI-5350]fix oom cause compaction event lost problem." to "[HUDI-5350] Fix oom cause compaction event lost problem" Dec 9, 2022
@danny0405 danny0405 merged commit 115584c into apache:master Dec 9, 2022
nsivabalan pushed a commit that referenced this pull request Dec 13, 2022
alexeykudinkin pushed a commit to onehouseinc/hudi that referenced this pull request Dec 14, 2022
alexeykudinkin pushed a commit that referenced this pull request Dec 14, 2022
XuQianJin-Stars pushed a commit that referenced this pull request Jan 4, 2023
fengjian428 pushed a commit to fengjian428/hudi that referenced this pull request Apr 5, 2023