HBASE-26093 Replication is stuck due to zero length wal file in oldWALs directory #3504
Changes from 3 commits
36818f7
5a86597
c762357
397d325
a0d092b
21827b8
7a220d6
c53892b
8450440
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -316,9 +316,8 @@ private boolean openNextLog() throws IOException { | |
| return false; | ||
| } | ||
|
|
||
| private Path getArchivedLog(Path path) throws IOException { | ||
| Path getArchivedLog(Path path) throws IOException { | ||
|
||
| Path walRootDir = CommonFSUtils.getWALRootDir(conf); | ||
|
|
||
| // Try found the log in old dir | ||
| Path oldLogDir = new Path(walRootDir, HConstants.HREGION_OLDLOGDIR_NAME); | ||
| Path archivedLogLocation = new Path(oldLogDir, path.getName()); | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -52,6 +52,7 @@ | |
| import org.apache.hadoop.hbase.KeyValue; | ||
| import org.apache.hadoop.hbase.Server; | ||
| import org.apache.hadoop.hbase.TableName; | ||
| import org.apache.hadoop.hbase.Waiter; | ||
| import org.apache.hadoop.hbase.Waiter.ExplainingPredicate; | ||
| import org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL; | ||
| import org.apache.hadoop.hbase.regionserver.wal.WALCellCodec; | ||
|
|
@@ -716,4 +717,52 @@ public void testCleanClosedWALs() throws Exception { | |
| assertEquals(0, logQueue.getMetrics().getUncleanlyClosedWALs()); | ||
| } | ||
| } | ||
|
|
||
| /** | ||
| * Tests that we handle EOFException properly if the wal has moved to oldWALs directory. | ||
| * @throws Exception exception | ||
| */ | ||
| @Test | ||
| public void testEOFExceptionInOldWALsDirectory() throws Exception { | ||
| assertEquals(1, logQueue.getQueueSize(fakeWalGroupId)); | ||
| AbstractFSWAL abstractWAL = (AbstractFSWAL)log; | ||
|
Contributor
nit: We don't need this downcast, do we?
Contributor
Author
@bharathv It is needed. We get the current WAL file name via AbstractFSWAL#getCurrentFileName so that we can truncate that file to create a zero-length WAL file. The WAL interface doesn't have this method.
||
| Path emptyLogFile = abstractWAL.getCurrentFileName(); | ||
| log.rollWriter(true); | ||
|
|
||
| // AsyncFSWAl and FSHLog both moves the log from WALs to oldWALs directory asynchronously. | ||
| // Wait for in flight wal close count to become 0. This makes sure that empty wal is moved to | ||
| // oldWALs directory. | ||
| Waiter.waitFor(CONF, 5000, | ||
| (Waiter.Predicate<Exception>) () ->abstractWAL.getInflightWALCloseCount() == 0); | ||
| // There will 2 logs in the queue. | ||
| assertEquals(2, logQueue.getQueueSize(fakeWalGroupId)); | ||
|
|
||
| Configuration localConf = new Configuration(CONF); | ||
| localConf.setInt("replication.source.maxretriesmultiplier", 1); | ||
| localConf.setBoolean("replication.source.eof.autorecovery", true); | ||
|
|
||
| try (WALEntryStream entryStream = new WALEntryStreamWithRetries(logQueue, localConf, 0, log, | ||
| null, logQueue.getMetrics(), fakeWalGroupId)) { | ||
| // Get the archived dir path for the first wal. | ||
| Path archivePath = entryStream.getArchivedLog(emptyLogFile); | ||
| // Make sure that the wal path is not the same as archived Dir path. | ||
| assertNotEquals(emptyLogFile.toString(), archivePath.toString()); | ||
| assertTrue(fs.exists(archivePath)); | ||
| fs.truncate(archivePath, 0); | ||
| // make sure the size of the wal file is 0. | ||
| assertEquals(0, fs.getFileStatus(archivePath).getLen()); | ||
| } | ||
|
|
||
| ReplicationSourceManager mockSourceManager = Mockito.mock(ReplicationSourceManager.class); | ||
| ReplicationSource source = Mockito.mock(ReplicationSource.class); | ||
| when(source.isPeerEnabled()).thenReturn(true); | ||
| when(mockSourceManager.getTotalBufferUsed()).thenReturn(new AtomicLong(0)); | ||
|
|
||
| // Start the reader thread. | ||
| createReader(false, localConf); | ||
| // Wait for the replication queue size to be 1. This means that we have handled | ||
| // 0 length wal from oldWALs directory. | ||
| Waiter.waitFor(localConf, 10000, | ||
| (Waiter.Predicate<Exception>) () -> logQueue.getQueueSize(fakeWalGroupId) == 1); | ||
| } | ||
| } | ||
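To illustrate why a zero-length file in oldWALs surfaces as an EOFException, here is a minimal, self-contained sketch in plain Java. It does not use HBase's WAL reader; the class name `EmptyFileEofDemo`, the method `headerReadHitsEof`, and the 4-byte "header" are assumptions made only for the example. The point is that any reader that expects a fixed-size header hits end-of-file immediately on an empty file.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class EmptyFileEofDemo {

    /**
     * Hypothetical helper: tries to read a 4-byte header from the file and
     * reports whether the read ran into end-of-file before the header was complete.
     */
    static boolean headerReadHitsEof(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            byte[] header = new byte[4];
            // readFully throws EOFException if fewer than 4 bytes are available.
            in.readFully(header);
            return false;
        } catch (EOFException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        // createTempFile produces an empty (zero-length) file.
        Path empty = Files.createTempFile("empty-wal", ".log");
        try {
            System.out.println("EOF on empty file: " + headerReadHitsEof(empty));
        } finally {
            Files.deleteIfExists(empty);
        }
    }
}
```

A reader that retries the same file on EOF would loop forever on such a file, which is the stuck-replication symptom this PR's test reproduces by truncating the archived WAL to length 0.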
What is the advantage of using try/finally here instead of try-with-resources?
I want to use the entryStream object in the handleEofException method. If I use try-with-resources, I don't have access to entryStream in the catch block.
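The scoping point above is a general Java rule and can be sketched without any HBase classes. In the example below, `TryScopeDemo` and its nested `Stream` class are hypothetical stand-ins, not the PR's code. A resource declared in a try-with-resources header is out of scope in the catch block (and has already been closed by the time the catch block runs), whereas a variable declared before a plain try is still visible and still open in the handler.

```java
import java.io.EOFException;

public class TryScopeDemo {

    /** Hypothetical stand-in for a closable entry stream. */
    static final class Stream implements AutoCloseable {
        private boolean closed = false;

        String describe() {
            return closed ? "closed" : "open";
        }

        void read() throws EOFException {
            throw new EOFException("zero-length file");
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** try/finally: the stream is declared outside the try, so the handler can use it. */
    static String withTryFinally() {
        Stream stream = new Stream();
        try {
            stream.read();
            return "no error";
        } catch (EOFException e) {
            // 'stream' is in scope here and has not been closed yet.
            return "handler saw " + stream.describe() + " stream";
        } finally {
            stream.close();
        }
    }

    /** try-with-resources: the stream is scoped to the try block only. */
    static String withTryWithResources() {
        try (Stream stream = new Stream()) {
            stream.read();
            return "no error";
        } catch (EOFException e) {
            // Referring to 'stream' here would not compile; it was also
            // closed before this catch block started executing.
            return "handler has no stream";
        }
    }

    public static void main(String[] args) {
        System.out.println(withTryFinally());
        System.out.println(withTryWithResources());
    }
}
```

If the handler only needs data that can be computed without the stream, for example a path resolved by a static utility, try-with-resources becomes usable again.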
OK, so if we move the main logic of WALEntryStream.getArchivedLog to a util class, we no longer need the try/finally, right? In general, I do not think we should expose WALEntryStream.getArchivedLog directly. Let's add a public static method in AbstractFSWALProvider, just below the getWALArchiveDirectoryName method. I think that is the right place for a public method.
Thank you @Apache9 for the review. I didn't know that a utility method, AbstractFSWALProvider#getArchivedLogPath, already exists.
Removed all the changes made to the WALEntryStream class. Please review again.
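For readers unfamiliar with the lookup being discussed, the idea of a static "find the archived log" utility can be sketched in plain Java with java.nio. This is not the signature or behavior of AbstractFSWALProvider#getArchivedLogPath; the class `ArchivedLogSketch`, the method `findLog`, and the fallback of returning the original path are assumptions for illustration. The directory name "oldWALs" is taken from the PR title.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ArchivedLogSketch {

    // Assumed archive directory name, as in the PR title ("oldWALs directory").
    static final String OLD_WALS_DIR = "oldWALs";

    /**
     * Hypothetical helper: returns the log's current location. If the log no
     * longer exists at its original path, looks for a file with the same name
     * under walRootDir/oldWALs. Falls back to the original path if neither exists.
     */
    static Path findLog(Path walRootDir, Path log) {
        if (Files.exists(log)) {
            return log;
        }
        Path archived = walRootDir.resolve(OLD_WALS_DIR).resolve(log.getFileName());
        return Files.exists(archived) ? archived : log;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("wal-root");
        Path oldWals = Files.createDirectories(root.resolve(OLD_WALS_DIR));
        // Simulate a WAL that has already been moved to the archive directory.
        Files.createFile(oldWals.resolve("wal.1"));

        Path original = root.resolve("WALs").resolve("wal.1");
        System.out.println(findLog(root, original));
    }
}
```

Because such a helper is static and needs only paths, an exception handler can call it without holding a reference to the stream that failed.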