Conversation

@jhungund (Contributor) commented Sep 16, 2024

During retrieval of the bucket cache from persistence, it was observed that if an exception other than an IOException occurs, the exception is not logged and the retrieval thread exits, leaving the bucket cache in an uninitialised and unusable state.

This change enables the retrieval thread to print all types of exceptions, and reinitialises the bucket cache so that it remains usable.

Unfortunately, the exception was caused by an eviction running in parallel with the persistence of the backing map, so this use case cannot easily be covered by a unit test.

Change-Id: I81b7f5fe06945702bbc59df96d054f95f03de499

@wchevreuil (Contributor) left a comment:


We should rather understand and solve the trigger than put this bandaid here.

@jhungund (Contributor, Author) replied:

> We should rather understand and solve the trigger than put this bandaid here.

@wchevreuil, unfortunately the exception currently gets swallowed and is never printed, so to understand this further we first need this change, which will print the exception that is being missed. The stack trace of the exception will then reveal the exact details of the issue, which can be handled from there.


…t-cache from persistence.

In certain scenarios there is a discrepancy between the number of chunks recorded in the persisted metadata and the number of chunks actually stored in the persistence file, because the bucket cache may be operated on in parallel while it is being persisted.

During retrieval of the bucket cache from persistence, it was observed that if an exception other than an IOException occurs, the exception is not logged and the retrieval thread exits, leaving the bucket cache in an uninitialised and unusable state.

With this change, the retrieval code no longer relies on the metadata (the number of chunks); instead, it reads from the file stream for as long as data is available.

This change enables the retrieval thread to print all types of exceptions, and reinitialises the bucket cache so that it remains usable.

Change-Id: I81b7f5fe06945702bbc59df96d054f95f03de499

- } catch (IOException ioex) {
-   LOG.error("Can't restore from file[{}] because of ", persistencePath, ioex);
+ } catch (Throwable ex) {
+   LOG.error("Can't restore from file[{}] because of ", persistencePath, ex);
@wchevreuil (Contributor):

nit: since we are handling the error, shouldn't this be a warn? And let's explain that the cache will be reset and a reload will happen.

@jhungund (Contributor, Author):
ack
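
A sketch of what the acked adjustment might look like (the exact log message is illustrative, not the committed wording):

```java
} catch (Throwable ex) {
  // The error is handled: the cache is reset below and will be rebuilt
  // as blocks are cached again, so WARN is more appropriate than ERROR.
  LOG.warn("Can't restore from file[{}]. The bucket cache will be reset "
    + "and reloaded from scratch.", persistencePath, ex);
}
```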

Comment on lines 83 to 89
if (firstChunkPersisted == false) {
// Persist all details along with the first chunk into BucketCacheEntry
BucketProtoUtils.toPB(cache, builder.build()).writeDelimitedTo(fos);
firstChunkPersisted = true;
} else {
// Directly persist subsequent backing-map chunks.
builder.build().writeDelimitedTo(fos);
@wchevreuil (Contributor):

Maybe for consistency and simplicity we should just write the BucketCacheProtos.BucketCacheEntry before the for loop, then write only the backing-map chunks within the loop. That way we wouldn't need this firstChunkPersisted flag.

@jhungund (Contributor, Author):

ack!


Change-Id: Icf6cdcf829e7d4bd16f50f48fd02059b415f2d09

@wchevreuil (Contributor) left a comment:

Needs to also address the spotless issues.

Comment on lines 78 to 88
// Create the first chunk and persist all details along with it.
while (entrySetIter.hasNext()) {
blockCount++;
Map.Entry<BlockCacheKey, BucketEntry> entry = entrySetIter.next();
addToBuilder(entry, entryBuilder, builder);
if (blockCount % chunkSize == 0 || (blockCount == backingMapSize)) {
chunkCount++;
if (chunkCount == 1) {
// Persist all details along with the first chunk into BucketCacheEntry
BucketProtoUtils.toPB(cache, builder.build()).writeDelimitedTo(fos);
} else {
// Directly persist subsequent backing-map chunks.
break;
}
}
builder.clear();
@wchevreuil (Contributor):

Why do we need two while loops? Just do the BucketProtoUtils.toPB(cache, builder.build()).writeDelimitedTo(fos); before the loop. That would write all the BucketCacheEntry meta info at the beginning of the file with an empty backingMap at this point, but that's fine, as we'll write all the backingMap chunks subsequently, and this code would be cleaner, I suppose.

@jhungund (Contributor, Author):

ack!
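
A sketch of the acked single-loop shape (builder, entryBuilder, chunkSize and backingMapSize follow the snippet above; this is illustrative, not the committed code):

```java
// Write the BucketCacheEntry metadata once, up front, with an empty
// backing map; the real entries follow as separate delimited chunks.
BucketProtoUtils.toPB(cache, BucketCacheProtos.BackingMap.newBuilder().build())
  .writeDelimitedTo(fos);

long blockCount = 0;
for (Map.Entry<BlockCacheKey, BucketEntry> entry : cache.backingMap.entrySet()) {
  blockCount++;
  addToBuilder(entry, entryBuilder, builder);
  // Flush a chunk every chunkSize blocks, and once more for the tail.
  if (blockCount % chunkSize == 0 || blockCount == backingMapSize) {
    builder.build().writeDelimitedTo(fos);
    builder.clear();
  }
}
```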

Comment on lines 1646 to 1651
  BucketCacheProtos.BucketCacheEntry firstChunk =
    BucketCacheProtos.BucketCacheEntry.parseDelimitedFrom(in);
  parseFirstChunk(firstChunk);

  // Subsequent chunks have the backingMap entries.
- for (int i = 1; i < numChunks; i++) {
-   LOG.info("Reading chunk no: {}", i + 1);
+ int numChunks = 0;
+ while (in.available() > 0) {
    parseChunkPB(BucketCacheProtos.BackingMap.parseDelimitedFrom(in),
      firstChunk.getDeserializersMap());
-   LOG.info("Retrieved chunk: {}", i + 1);
+   numChunks++;
  }
@wchevreuil (Contributor):

If we change the way we write to the file as I suggested, so that we have only the BucketCacheProtos.BucketCacheEntry at the beginning followed by all the BucketCacheProtos.BackingMap chunks, then we should change the naming here: firstChunk should now be entry. And these parseFirstChunk and parseChunkPB methods are not really parsing anything (parsing is delegated to the proto utils) but rather updating the cache index structures, so we should rename them to something like updateCacheIndex.

Also, looking at parseFirstChunk and parseChunkPB, we should replace the duplicated code inside parseFirstChunk with a call to parseChunkPB. Or, since parseFirstChunk is only used here, we could just get rid of it and simply do:

fullyCachedFiles.clear();
fullyCachedFiles.putAll(BucketProtoUtils.fromPB(entry.getCachedFilesMap()));

whilst the backingMap and blocksByHfile would get updated in the loop.

@jhungund (Contributor, Author):

ack!
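
Applied to the read side, the acked shape might look like this (updateCacheIndex is the suggested rename; a sketch, not the committed code):

```java
// The file now starts with a single BucketCacheEntry carrying all metadata.
BucketCacheProtos.BucketCacheEntry entry =
  BucketCacheProtos.BucketCacheEntry.parseDelimitedFrom(in);
fullyCachedFiles.clear();
fullyCachedFiles.putAll(BucketProtoUtils.fromPB(entry.getCachedFilesMap()));

// Read backing-map chunks until the stream is exhausted, rather than
// trusting a persisted chunk count that may be stale.
int numChunks = 0;
while (in.available() > 0) {
  updateCacheIndex(BucketCacheProtos.BackingMap.parseDelimitedFrom(in),
    entry.getDeserializersMap());
  numChunks++;
}
LOG.info("Retrieved {} backing-map chunks from {}", numChunks, persistencePath);
```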

Change-Id: I97936a683673ff89e04a15bc66542fb93a32fe8a
@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 29s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ master Compile Tests _
+1 💚 mvninstall 3m 8s master passed
+1 💚 compile 3m 6s master passed
+1 💚 checkstyle 0m 35s master passed
+1 💚 spotbugs 1m 33s master passed
+1 💚 spotless 0m 44s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+1 💚 mvninstall 2m 55s the patch passed
+1 💚 compile 3m 6s the patch passed
+1 💚 javac 3m 6s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 35s the patch passed
+1 💚 spotbugs 1m 36s the patch passed
+1 💚 hadoopcheck 10m 48s Patch does not cause any errors with Hadoop 3.3.6 3.4.0.
+1 💚 spotless 0m 42s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 0m 10s The patch does not generate ASF License warnings.
36m 10s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6250/7/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #6250
JIRA Issue HBASE-28839
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname Linux 479e2404e5d0 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 58ebef3
Default Java Eclipse Adoptium-17.0.11+9
Max. process+thread count 84 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6250/7/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 37s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 4m 6s master passed
+1 💚 compile 1m 15s master passed
+1 💚 javadoc 0m 31s master passed
+1 💚 shadedjars 6m 26s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 3m 13s the patch passed
+1 💚 compile 1m 1s the patch passed
+1 💚 javac 1m 1s the patch passed
+1 💚 javadoc 0m 28s the patch passed
+1 💚 shadedjars 5m 53s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
-1 ❌ unit 217m 32s /patch-unit-hbase-server.txt hbase-server in the patch failed.
245m 51s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6250/7/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #6250
JIRA Issue HBASE-28839
Optional Tests javac javadoc unit compile shadedjars
uname Linux 5a31b3419c94 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 58ebef3
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6250/7/testReport/
Max. process+thread count 5686 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6250/7/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 30s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 3m 20s master passed
+1 💚 compile 1m 3s master passed
+1 💚 javadoc 0m 29s master passed
+1 💚 shadedjars 5m 58s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 3m 12s the patch passed
+1 💚 compile 1m 2s the patch passed
+1 💚 javac 1m 2s the patch passed
+1 💚 javadoc 0m 29s the patch passed
+1 💚 shadedjars 5m 53s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
+1 💚 unit 219m 52s hbase-server in the patch passed.
245m 41s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6250/8/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #6250
JIRA Issue HBASE-28839
Optional Tests javac javadoc unit compile shadedjars
uname Linux cbcb8dc0e810 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 409554d
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6250/8/testReport/
Max. process+thread count 5266 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6250/8/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@jhungund (Contributor, Author) commented:

Hi @wchevreuil, the failing test seems to have passed in the rerun:
https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6250/8/testReport/org.apache.hadoop.hbase.security/TestSecureIPC/

I have addressed your review comments. Please take a look.
Thanks,
Janardhan

  .putAllDeserializers(CacheableDeserializerIdManager.save())
- .putAllCachedFiles(toCachedPB(cache.fullyCachedFiles)).setBackingMap(backingMap)
+ .putAllCachedFiles(toCachedPB(cache.fullyCachedFiles))
+ .setBackingMap(backingMapBuilder.build())
@jhungund (Contributor, Author):

We need to pass an empty backing map here, otherwise we see an exception about an incomplete object. Hence we pass an empty backing map along with the metadata, and subsequently persist all the backing-map entries in chunks.

@wchevreuil (Contributor):

We don't need a builder, just pass an empty map. Or, since we don't persist any map entries within the BucketCacheEntry proto object, just remove the map from the protobuf message. We already changed the persistent file format in HBASE-28805; as long as this can land on all related branches while HBASE-28805 has not made it into any release, we are free to change the format.

@jhungund (Contributor, Author) commented Sep 23, 2024:

Hi @wchevreuil, we will need to retain the old format of the protobuf message to maintain backward compatibility, so we cannot change the protobuf message. We can reuse this protobuf message by persisting an empty backing map, instead of introducing a new version.
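
A sketch of how the metadata entry can keep the old message shape while carrying an empty map (other builder fields are elided; this is illustrative, not the committed code):

```java
// Reuse the existing BucketCacheEntry message for compatibility: the
// backing-map field is present but empty, since the real entries are
// persisted afterwards as separate delimited chunks.
BucketCacheProtos.BucketCacheEntry metadata = BucketCacheProtos.BucketCacheEntry.newBuilder()
  .putAllDeserializers(CacheableDeserializerIdManager.save())
  .putAllCachedFiles(toCachedPB(cache.fullyCachedFiles))
  .setBackingMap(BucketCacheProtos.BackingMap.newBuilder().build()) // empty map
  .build();
metadata.writeDelimitedTo(fos);
```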


@wchevreuil wchevreuil merged commit bd1bee0 into apache:master Sep 23, 2024
wchevreuil pushed a commit that referenced this pull request Sep 23, 2024
…t-cache from persistence. (#6250)

Signed-off-by: Wellington Chevreuil <[email protected]>
jhungund added a commit to janardhanrh/hbase that referenced this pull request Sep 24, 2024
…t-cache from persistence. (apache#6250)

Signed-off-by: Wellington Chevreuil <[email protected]>
wchevreuil pushed a commit that referenced this pull request Sep 24, 2024
…t-cache from persistence. (#6250) (#6288)

Signed-off-by: Wellington Chevreuil <[email protected]>
szucsvillo pushed a commit to szucsvillo/hbase that referenced this pull request Feb 7, 2025
HBASE-28839: Handle all types of exceptions during retrieval of bucket-cache from persistence. (apache#6250)

Signed-off-by: Wellington Chevreuil <[email protected]>
Change-Id: I1e8147bffcc456a59375ec67471e736079e5e107
(cherry picked from commit 2e0b01f)
stoty pushed a commit to stoty/hbase that referenced this pull request Nov 22, 2025
HBASE-28839: Handle all types of exceptions during retrieval of bucket-cache from persistence. (apache#6250)

Signed-off-by: Wellington Chevreuil <[email protected]>
Change-Id: I1e8147bffcc456a59375ec67471e736079e5e107
(cherry picked from commit 2e0b01f)
wchevreuil pushed a commit to wchevreuil/hbase that referenced this pull request Dec 28, 2025
…t-cache from persistence. (apache#6250) (apache#6288)

Signed-off-by: Wellington Chevreuil <[email protected]>
Change-Id: Ied978410cc7d353e675144b877365465fcf96c67