@alexeykudinkin alexeykudinkin commented Nov 19, 2022

Change Logs

This PR addresses #7234, which concerns the HoodieMergeHandle shutdown sequence. In #4264 we changed the order in which we shut down the handle relative to the executor:

Before it was

  1. Handle
  2. Executor

After

  1. Executor
  2. Handle

The order was switched to handle the case where, when an exception is thrown, the executor might still be writing out records; closing the handle before the executor was leaving some of the produced Parquet files corrupted.

This PR addresses the issue by making sure that on the successful path we close the handle as soon as writing has finished (before we shut down the executor), so that this no longer results in PipeBroken exceptions on GCS.
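The ordering described above can be illustrated with a minimal, self-contained sketch. This is not the actual Hudi code: `ShutdownOrderSketch`, `SketchHandle`, and `runMerge` are hypothetical names, and a plain `ExecutorService` stands in for Hudi's executor. It only shows the intended sequencing: on success the handle is closed before the executor is shut down, while on failure the executor is stopped first and the handle is closed afterwards.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class ShutdownOrderSketch {

    /** Hypothetical stand-in for a write handle; it only records lifecycle events. */
    static class SketchHandle implements AutoCloseable {
        private final List<String> events;
        private boolean closed = false;

        SketchHandle(List<String> events) {
            this.events = events;
        }

        synchronized void write(String record) {
            if (closed) {
                throw new IllegalStateException("write after close");
            }
            events.add("write:" + record);
        }

        @Override
        public synchronized void close() {
            // Idempotent: the second call on the success path is a no-op.
            if (!closed) {
                closed = true;
                events.add("handle-closed");
            }
        }
    }

    /** Runs the writer on an executor and returns the observed event order. */
    static List<String> runMerge(List<String> records, boolean failOnBad) throws Exception {
        List<String> events = Collections.synchronizedList(new ArrayList<>());
        SketchHandle handle = new SketchHandle(events);
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<?> done = executor.submit(() -> {
                for (String r : records) {
                    if (failOnBad && r.equals("bad")) {
                        throw new RuntimeException("simulated write failure");
                    }
                    handle.write(r);
                }
            });
            done.get(); // blocks until writing finishes; throws if the writer failed
            // Success path: close the handle right away, before executor shutdown.
            handle.close();
        } catch (ExecutionException e) {
            events.add("failed");
        } finally {
            // Stop the executor and wait for it, so nothing is still writing.
            executor.shutdownNow();
            executor.awaitTermination(5, TimeUnit.SECONDS);
            events.add("executor-shutdown");
            // Failure path: only now is it safe to close the handle.
            handle.close();
        }
        return new ArrayList<>(events);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runMerge(List.of("a", "b"), false));
        System.out.println(runMerge(List.of("a", "bad", "c"), true));
    }
}
```

Making `close()` idempotent is a design choice of this sketch: it lets the `finally` block close the handle unconditionally without tracking which path was taken.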

Impact

No impact

Risk level (write none, low, medium or high below)

Low

Documentation Update

N/A

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@codope codope added priority:critical Production degraded; pipelines stalled writer-core labels Nov 28, 2022
@nsivabalan nsivabalan added priority:blocker Production down; release blocker and removed priority:critical Production degraded; pipelines stalled labels Dec 6, 2022
@nsivabalan nsivabalan self-assigned this Dec 6, 2022
@nsivabalan nsivabalan added the release-0.12.2 Patches targetted for 0.12.2 label Dec 6, 2022
@alexeykudinkin alexeykudinkin removed the release-0.12.2 Patches targetted for 0.12.2 label Dec 6, 2022
@codope codope added the release-0.12.2 Patches targetted for 0.12.2 label Dec 7, 2022
@alexeykudinkin alexeykudinkin changed the title [HUDI-5238][Stacked on 7238] Fixing HoodieMergeHandle shutdown sequence [HUDI-5238] Fixing HoodieMergeHandle shutdown sequence Dec 8, 2022
@alexeykudinkin alexeykudinkin removed the release-0.12.2 Patches targetted for 0.12.2 label Dec 8, 2022
@alexeykudinkin alexeykudinkin added priority:critical Production degraded; pipelines stalled and removed priority:blocker Production down; release blocker labels Jan 25, 2023
@alexeykudinkin alexeykudinkin force-pushed the ak/exeq-trdwn-fix branch 2 times, most recently from 85eaa8d to 9fd17ce Compare February 17, 2023 18:42
@xushiyan xushiyan force-pushed the ak/exeq-trdwn-fix branch from 49aea36 to f5753cd Compare May 19, 2023 08:28
@hudi-bot

CI report:

Bot commands

@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

@xushiyan xushiyan merged commit 4c980f3 into apache:master May 19, 2023
@prithiviiiiiiii

@xushiyan / @alexeykudinkin / @nsivabalan I am facing this exact issue (#7234) with 0.12.0.

One question: why does this happen selectively? In my case, upserts on many tables run fine with 0.12.0, except for one table. I just want to understand what is different about this one table such that it does not upsert on 0.12.0.

Also, how should I fix this? Do I need to migrate to a later version, or is there a better way to fix it?

Labels

priority:critical Production degraded; pipelines stalled release-0.14.0
