-
Notifications
You must be signed in to change notification settings - Fork 3.4k
HBASE-29372: Meta cache clear metrics and logs shouldn't use "UnknownException" #6961
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
| if (clearServerCache) { | ||
| updateCachedLocations(server, regionName, row, | ||
| ClientExceptionsUtil.isMetaClearingException(t) ? null : t); | ||
| updateCachedLocations(server, regionName, row, t); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also solves the frustration of seeing "UnknownException" when inspecting meta cache clear exception metrics. This has made it quite difficult to track down what triggered the meta cache clear.
I think it's always better to provide more context than less. Even if an exception is meta cache clearing (though it will be now), I'd still prefer to know the exact exception type that cleared the meta cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 would be good to preserve the exception for updateCachedLocations
Since we currently pass null to updateCachedLocations if we have a meta cache clearing exception, does that means that we never update the cache clearing exception metric properly for cache clears coming from receiveGlobalFailure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What we'll do is basically "mask" the cache clearing exception by report an UnknownException. The code for that lives in the metrics class. It's annoying b/c that coupled with the lack of any logging in this code path makes it really difficult to determine what caused these meta cache clears.
| Retry canRetry = errorsByServer.canTryMore(numAttempt) ? Retry.YES : Retry.NO_RETRIES_EXHAUSTED; | ||
| boolean clearServerCache = false; | ||
|
|
||
| if (!(t instanceof RejectedExecutionException)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enforces the constraints added in https://issues.apache.org/jira/browse/HBASE-27491
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better if you instead push RejectedExecutionException down into ClientExceptionsUtil.isMetaClearingException.
How about adding another collection of execution-exceptions for the family of various ExecutorService interaction errors, like is done with networking/connection exceptions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted, adding that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hgromer I think you dropped my earlier comment.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncRequestFutureImpl.java
Show resolved
Hide resolved
| if (clearServerCache) { | ||
| updateCachedLocations(server, regionName, row, | ||
| ClientExceptionsUtil.isMetaClearingException(t) ? null : t); | ||
| updateCachedLocations(server, regionName, row, t); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 would be good to preserve the exception for updateCachedLocations
Since we currently pass null to updateCachedLocations if we have a meta cache clearing exception, does that means that we never update the cache clearing exception metric properly for cache clears coming from receiveGlobalFailure?
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncRequestFutureImpl.java
Show resolved
Hide resolved
droudnitsky
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the new scope of the work is
- Pass the meta clearing exception to
updateCachedLocationsso we can properly capture the exception in the client metric instead of unknown exception - Pushing
RejectedExecutionExceptiondown intoisMetaClearingException
I am wondering if it make sense to split these into two different issues given the change in scope? My initial thought is in agreement that RejectedExecutionException should not be meta cache clearing, but maybe it would be beneficial to handle that change independently of the client metric fix , there may be implications beyond batch operations for that change.
hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncRequestFutureImpl.java
Show resolved
Hide resolved
hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncRequestFutureImpl.java
Show resolved
Hide resolved
It'd make sense for me to have that be it's own thing (one I probably wouldn't own to be completely honest), cc @ndimiduk since I was following his suggestion |
…t 'UnknownException'
|
I've updated the PR to soleley add some logging and emitting the real meta cache clearing exception in meta cache clear metrics. cc @ndimiduk @droudnitsky |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
The hadoopcheck issue seems unrelated |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
ndimiduk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is devilishly difficult to follow. There's a complex commit history around this behavior that includes attempted refactors and reverts. It seems like the test harness is pretty good at catching issues though.
It looks like you're partially undoing #4914, which I think is fine. Reading though that PR, I believe the correct implementation would have been to push knowledge of the RejectedExecutionException down into ClientExceptionsUtil#isMetaClearingException(). Instead it externalised the logic up here in AsyncRequestFutureImpl, making it more difficult to follow. I think you haven't done enough and should go ahead and push the exception check down further.
Do you have any suggestions for we we verify that this exception/null pass-down hasn't subtly broken something?
| Retry canRetry = errorsByServer.canTryMore(numAttempt) ? Retry.YES : Retry.NO_RETRIES_EXHAUSTED; | ||
| boolean clearServerCache; | ||
|
|
||
| if (t instanceof RejectedExecutionException) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we should add RejectedExecutionException to the predicate in ClientExceptionsUtil#isMetaClearingException(). Seems dicy to have a special case test here. Or am I missing some wider context?
hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncRequestFutureImpl.java
Show resolved
Hide resolved
Oh. I see now that you and droudnitsky are attempting to cut scope. I don't feel great about leaving the |
|
The refactor on #578 looked like a good effort. Too bad it had to be reverted. |
droudnitsky
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ndimiduk I agree it would have made sense for #4914 to introduce the change through isMetaClearingException. My suggestion to split the scope here is driven by the complexity in this codepath , my thinking being that it'd be easier to reason about the metrics fix and pushing RejectedExecutionException down into isMetaClearingException in two seperate PRs, the latter change is wider in scope. I cannot think of any reason not to push the exception down into isMetaClearingException but it may benefit from some thinking/review independent of the metrics fix, thats just my personal thought.
Looks good to me thank you Hernan
|
Okay fair enough, let's take this as it is. |
…xception" (apache#6961) Co-authored-by: Hernan Gelaf-Romer <[email protected]> Signed-off-by: Duo Zhang <[email protected]> Signed-off-by: Ray Mattingly <[email protected]>
…xception" (apache#6961) Co-authored-by: Hernan Gelaf-Romer <[email protected]> Signed-off-by: Duo Zhang <[email protected]> Signed-off-by: Ray Mattingly <[email protected]>
…xception" (apache#6961) Co-authored-by: Hernan Gelaf-Romer <[email protected]> Signed-off-by: Duo Zhang <[email protected]> Signed-off-by: Ray Mattingly <[email protected]>
…xception" (apache#6961) (#180) Signed-off-by: Duo Zhang <[email protected]> Signed-off-by: Ray Mattingly <[email protected]> Co-authored-by: Hernan Gelaf-Romer <[email protected]>
…xception" (apache#6961) (#180) Signed-off-by: Duo Zhang <[email protected]> Signed-off-by: Ray Mattingly <[email protected]> Co-authored-by: Hernan Gelaf-Romer <[email protected]>
…xception" (apache#6961) (#180) Signed-off-by: Duo Zhang <[email protected]> Signed-off-by: Ray Mattingly <[email protected]> Co-authored-by: Hernan Gelaf-Romer <[email protected]>
…xception" (apache#6961) Co-authored-by: Hernan Gelaf-Romer <[email protected]> Signed-off-by: Duo Zhang <[email protected]> Signed-off-by: Ray Mattingly <[email protected]>
No description provided.