Merged
@@ -219,13 +219,14 @@ public void run() {
} catch (IOException e) {
// The service itself failed . It may be an error coming from the communication
// layer, but, as well, a functional error raised by the server.
receiveGlobalFailure(multiAction, server, numAttempt, e, true);

receiveGlobalFailure(multiAction, server, numAttempt, e);
return;
} catch (Throwable t) {
// This should not happen. Let's log & retry anyway.
LOG.error("id=" + asyncProcess.id + ", caught throwable. Unexpected."
+ " Retrying. Server=" + server + ", tableName=" + tableName, t);
receiveGlobalFailure(multiAction, server, numAttempt, t, true);
receiveGlobalFailure(multiAction, server, numAttempt, t);
return;
}
if (res.type() == AbstractResponse.ResponseType.MULTI) {
@@ -570,7 +571,6 @@ private RegionLocations findAllLocationsOrFail(Action action, boolean useCache)
*/
void sendMultiAction(Map<ServerName, MultiAction> actionsByServer, int numAttempt,
List<Action> actionsForReplicaThread, boolean reuseThread) {
boolean clearServerCache = true;
// Run the last item on the same thread if we are already on a send thread.
// We hope most of the time it will be the only item, so we can cut down on threads.
int actionsRemaining = actionsByServer.size();
@@ -606,15 +606,14 @@ void sendMultiAction(Map<ServerName, MultiAction> actionsByServer, int numAttemp
LOG.warn("id=" + asyncProcess.id + ", task rejected by pool. Unexpected." + " Server="
+ server.getServerName(), t);
// Do not update cache if exception is from failing to submit action to thread pool
clearServerCache = false;
} else {
// see #HBASE-14359 for more details
LOG.warn("Caught unexpected exception/error: ", t);
}
asyncProcess.decTaskCounters(multiAction.getRegions(), server);
// We're likely to fail again, but this will increment the attempt counter,
// so it will finish.
receiveGlobalFailure(multiAction, server, numAttempt, t, clearServerCache);
receiveGlobalFailure(multiAction, server, numAttempt, t);
}
}
}
@@ -764,13 +763,24 @@ private void failAll(MultiAction actions, ServerName server, int numAttempt,
* @param t the throwable (if any) that caused the resubmit
*/
private void receiveGlobalFailure(MultiAction rsActions, ServerName server, int numAttempt,
Throwable t, boolean clearServerCache) {
Throwable t) {
errorsByServer.reportServerError(server);
Retry canRetry = errorsByServer.canTryMore(numAttempt) ? Retry.YES : Retry.NO_RETRIES_EXHAUSTED;
boolean clearServerCache;

if (t instanceof RejectedExecutionException) {
Member

I think that we should add RejectedExecutionException to the predicate in ClientExceptionsUtil#isMetaClearingException(). It seems dicey to have a special-case test here. Or am I missing some wider context?
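
To make the two options concrete, here is a minimal, self-contained sketch. It is not the HBase implementation: `CacheClearSketch`, `shouldClearServerCache`, and `foldIntoPredicate` are hypothetical names, and the `Predicate<Throwable>` parameter is only a stand-in for `ClientExceptionsUtil#isMetaClearingException()`, whose actual rules are not shown in this diff.

```java
import java.io.IOException;
import java.util.concurrent.RejectedExecutionException;
import java.util.function.Predicate;

public class CacheClearSketch {

  // Option A (what this diff does): special-case the thread-pool rejection at the
  // call site, then defer to the predicate for everything else.
  // `isMetaClearing` is a hypothetical stand-in for the real HBase predicate.
  public static boolean shouldClearServerCache(Throwable t, Predicate<Throwable> isMetaClearing) {
    if (t instanceof RejectedExecutionException) {
      // The task never reached the server, so the cached location is not suspect.
      return false;
    }
    return isMetaClearing.test(t);
  }

  // Option B (the reviewer's suggestion): fold the exclusion into the predicate
  // itself, so callers need no special case.
  public static Predicate<Throwable> foldIntoPredicate(Predicate<Throwable> base) {
    return t -> !(t instanceof RejectedExecutionException) && base.test(t);
  }

  public static void main(String[] args) {
    // Assumption for the demo only: every exception counts as cache-clearing.
    Predicate<Throwable> base = t -> true;
    Predicate<Throwable> folded = foldIntoPredicate(base);

    Throwable rejected = new RejectedExecutionException("pool full");
    Throwable io = new IOException("connection reset");

    System.out.println(shouldClearServerCache(rejected, base)); // false
    System.out.println(folded.test(rejected));                  // false
    System.out.println(shouldClearServerCache(io, base));       // true
    System.out.println(folded.test(io));                        // true
  }
}
```

Under these assumptions both options yield the same decisions; the difference is only where the rule lives, which matters if other callers of the predicate should also skip the cache clear on a pool rejection.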

clearServerCache = false;
} else {
clearServerCache = ClientExceptionsUtil.isMetaClearingException(t);
}

// Do not update cache if exception is from failing to submit action to thread pool
if (clearServerCache) {
cleanServerCache(server, t);

if (LOG.isTraceEnabled()) {
LOG.trace("Cleared meta cache for server {} due to global failure {}", server, t);
}
}

int failed = 0;
@@ -779,12 +789,8 @@ private void receiveGlobalFailure(MultiAction rsActions, ServerName server, int
for (Map.Entry<byte[], List<Action>> e : rsActions.actions.entrySet()) {
byte[] regionName = e.getKey();
byte[] row = e.getValue().get(0).getAction().getRow();
// Do not use the exception for updating cache because it might be coming from
// any of the regions in the MultiAction and do not update cache if exception is
// from failing to submit action to thread pool
if (clearServerCache) {
updateCachedLocations(server, regionName, row,
ClientExceptionsUtil.isMetaClearingException(t) ? null : t);
updateCachedLocations(server, regionName, row, t);
Contributor Author

@hgromer hgromer May 6, 2025

This also solves the frustration of seeing "UnknownException" when inspecting meta cache clear exception metrics. This has made it quite difficult to track down what triggered the meta cache clear.

I think it's always better to provide more context than less. Even if an exception is meta cache clearing (though it will be now), I'd still prefer to know the exact exception type that cleared the meta cache.

Contributor

+1 would be good to preserve the exception for updateCachedLocations

Since we currently pass null to updateCachedLocations if we have a meta cache clearing exception, does that mean that we never update the cache clearing exception metric properly for cache clears coming from receiveGlobalFailure?

Contributor Author

What we do today is basically "mask" the cache clearing exception by reporting an UnknownException. The code for that lives in the metrics class. It's annoying because that, coupled with the lack of any logging in this code path, makes it really difficult to determine what caused these meta cache clears.

}
for (Action action : e.getValue()) {
Retry retry =