Skip to content

Conversation

@hddong
Copy link
Contributor

@hddong hddong commented Apr 27, 2020

Tips

What is the purpose of the pull request

When roll over is true, HoodieLogFormatWriter will create next version log file. But it always left a blank file when close.

Brief change log

(for example:)

  • Clean blank file created by HoodieLogFormatWriter

Verify this pull request

This pull request is a trivial rework / code cleanup without any test coverage.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@hddong
Copy link
Contributor Author

hddong commented Apr 27, 2020

@yanghua please have a review when free.

@yanghua yanghua self-requested a review April 27, 2020 15:10
@yanghua yanghua self-assigned this Apr 27, 2020
@vinothchandar
Copy link
Member

@n3nash could you also review this since it touches the common log format writer as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hddong : Any possible reasons why blank file is created in the first place ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hddong : Any possible reasons why blank file is created in the first place ?

Create blank file for appendBlock when new HoodieLogFormatWriter
https://github.com/apache/incubator-hudi/blob/f1592be629c3f9762f62d4e1dbf3be54f213d92d/hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFormatWriter.java#L105-L108
But there is a special case, when roll over is true(block size is past the threshold), we will close the old writer and create a new writer . And if we close a new writer created by rolloverIfNeeded , there will left a blank file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough. but this is effectively doing an extra RPC every close, just to handle this corner case. what is the actual issue caused by the empty file? does the query/writing fail or get stuck?

@yanghua
Copy link
Contributor

yanghua commented Jul 9, 2020

@hddong This PR has conflicts files.

@hddong hddong force-pushed the remove-blank-file-HoodieLogFormatWriter branch from f1592be to fb2bf16 Compare July 10, 2020 02:09
@hddong
Copy link
Contributor Author

hddong commented Jul 10, 2020

@yanghua had rebase this.

@hddong
Copy link
Contributor Author

hddong commented Jul 27, 2020

@vinothchandar :

public static MessageType readSchemaFromLogFile(FileSystem fs, Path path) throws IOException {

readSchemaFromLogFile may read the blank file and the blank file had been read before in HoodieLogFileCommand(modified to avoid reading the blank files)

@bvaradar bvaradar force-pushed the remove-blank-file-HoodieLogFormatWriter branch from fb2bf16 to 54b0ed4 Compare September 25, 2020 04:35
@bvaradar
Copy link
Contributor

@hddong : I went ahead and redid this change in the interest of time :)
Instead of deleting on close, I have made changes to lazily create the log file when appending next block. This should avoid creating empty files.
(cc @vinothchandar )

@bvaradar bvaradar force-pushed the remove-blank-file-HoodieLogFormatWriter branch from 54b0ed4 to 96adac4 Compare September 25, 2020 06:06
@vinothchandar
Copy link
Member

LGTM

@vinothchandar vinothchandar merged commit 32c9cad into apache:master Sep 29, 2020
prashantwason pushed a commit to prashantwason/incubator-hudi that referenced this pull request Feb 22, 2021
kroushan-nit pushed a commit to kroushan-nit/hudi-oss-fork that referenced this pull request Nov 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants