-
Notifications
You must be signed in to change notification settings - Fork 3.4k
HBASE-24625 AsyncFSWAL.getLogFileSizeIfBeingWritten does not return the expected synced file length. #1970
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
1 similar comment
|
💔 -1 overall
This message was automatically generated. |
|
retest build |
wchevreuil
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is to say, AsyncFSWAL.getLogFileSizeIfBeingWritten could not reflect the file length which successfully synced to underlying HDFS, which is not as expected.
Wasn't that intentional, as a mean to proper track WAL files still open for write? For example, in case of replication, it should go as far as any entry got already appended, no? Ping @Apache9 who worked on this before to give more thoughts.
|
|
||
| protected FSDataOutputStream output; | ||
|
|
||
| private volatile long syncedLength = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use an AtomicLong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems that using AtomicLong is unnecessary, because AtomicLong could not provide update if greater than semantics, so I used synchronized keyword here when updating the syncedLength for simplicity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use AtomicUtils.updateMax. It is a util class in hbase-common.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Used AtomicUtils to replace synchronized, thank you very much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: why do we have AtomicUtils.updateMax? It seems getAndAccumulate is designed for this use case, i.e., syncedLength.getAndAccumulate(fsdos.getPos(), Math::max)
This guy contacted me offline and I confirmed that this should be a problem. What I can recall is that, when doing some bug fixes and improving the performance in AsyncFSWAL, I changed the way we calculate the length of the writer. Maybe I forget the assumption in HBASE-14004 when doing these changes and lead to the problem. So @comnetwork , please add more comments to say why we need the getSyncedLength method in the WAL.Writer interface? So later people will not break it again. Thanks. |
|
🎊 +1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
@Apache9 , I already added comments for /**
|
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
|
||
| protected FSDataOutputStream output; | ||
|
|
||
| private volatile long syncedLength = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use AtomicUtils.updateMax. It is a util class in hbase-common.
|
@Apache9 , Used |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
Apache9
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. But please fix the javadoc issues? After you fix the javadoc issues I will merge the PR.
Thanks.
| interface WriterBase extends Closeable { | ||
| long getLength(); | ||
| /** | ||
| * <pre> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not need to use pre here? These are just normal text.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I would fix it.
|
Hi, any updates here? @comnetwork |
|
Removed the pre tag in the comment and fix the javadoc bug. |
|
🎊 +1 overall
This message was automatically generated. |
|
There are still checkstyle issues? |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
Fixed the checkstyle errors. |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
…he expected synced file length. (#1970) Signed-off-by: Duo Zhang <[email protected]>
…he expected synced file length. (#1970) Signed-off-by: Duo Zhang <[email protected]>
…he expected synced file length. (apache#1970) Signed-off-by: Duo Zhang <[email protected]>
| future.complete(out.getPos()); | ||
| long pos = out.getPos(); | ||
| if(pos > this.syncedLength) { | ||
| this.syncedLength = pos; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This read-followedby-update also needs to be atomic, yes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is just for test so not a big problem but aligning with other producation implementations is better. Can have an addendum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, reviewed the code again, actuall, the flush0 method can only be executed in a single thread so no need to use AtomicUtils.updateMax. The AtomicLong is in the ProtobufLogWriter, not the output stream. But the 'if(pos > this.syncedLength) {' is a bit confusing to developers, I prefer we just remove this check...
…he expected synced file length. (apache#1970) Signed-off-by: Duo Zhang <[email protected]>
…return the expected synced file length. (apache#1970)" This reverts commit f834919.
…he expected synced file length. (apache#1970) Author: chenglei Reason: Bug Ref: CDPD-15964 Signed-off-by: Duo Zhang <[email protected]> Change-Id: Ie87d5e6f1eb47c48f413a31a0e5507b0090f5fe1 (cherry picked from commit e064d08)
…he expected synced file length. (apache#1970) Signed-off-by: Duo Zhang <[email protected]> (cherry picked from commit bf587fa) Change-Id: Ie87d5e6f1eb47c48f413a31a0e5507b0090f5fe1
…return the expected synced file length. (apache#1970)" This reverts commit bf587fa. (cherry picked from commit 160c229) Change-Id: Ifffde0910e6a219956697f417b76da31816a7852
In the PR, I added a new method
getSyncedLengthtoWriterBaseinterface to return the length which successfully synced to underlying fileSystem.and
AbstractFSWAL.getLogFileSizeIfBeingWrittenis relied on the addedWriteBase.getSyncedLengthmethod: