-
Notifications
You must be signed in to change notification settings - Fork 588
HDDS-4441. Add metrics for ACL related operations.(Addendum for HA). #1584
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
This is beyond my comfort zone. @bharatviswa504 would you like to review this since it touches OM HA? |
bharatviswa504
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ayushtkn for the work, overall looks good to me except one question.
How do you think if we move the acl-metric code to true branch of onComplete() of every requests ? (Feel free to correct me if I miss something, thanks)
| @Override | ||
| public OMClientResponse validateAndUpdateCache(OzoneManager ozoneManager, | ||
| long trxnLogIndex, OzoneManagerDoubleBufferHelper omDoubleBufferHelper) { | ||
| ozoneManager.getMetrics().incNumAddAcl(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about moving this line after true operationResult branch of onComplete() at line-120 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanx @cxorm for the review. In onComplete(), there doesn't seems to be any instance of ozoneManager available, and I quickly had a check over, couple of other files, they too were incrementing metrics in validateAndUpdateCache, I think we can be in sync with them.
Let me know, if you are ok going ahead this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanx @cxorm for the review. In
onComplete(), there doesn't seems to be any instance ofozoneManageravailable, and I quickly had a check over, couple of other files, they too were incrementing metrics invalidateAndUpdateCache, I think we can be in sync with them.
Thanks @ayushtkn for the comment.
IMHO the onComplete() is called by super class validateAndUpdateCache (for example, in this part its super class is OMBucketAclRequest), and it has instance of ozoneManager.
An operation scenario is that when we call ozoneManager.getMetrics().incNumAddAcl() before super.validateAndUpdateCache() and failure occurs in validateAndUpdateCache(), the incNumAddAcl() shouldn't take effect, but it did.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanx @cxorm
Are you talking about this onComplete?
https://github.com/apache/ozone/blob/master/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/volume/acl/OMVolumeAddAclRequest.java#L114
If not can you point me to that. It doesn't have ozonemanager
An operation scenario is that when we call ozoneManager.getMetrics().incNumAddAcl() before super.validateAndUpdateCache() and failure occurs in validateAndUpdateCache(), the incNumAddAcl() shouldn't take effect, but it did.
Aren't these metrics suppose to track the number of calls? not just success?
validateAndUpdateCache too has a metrics getting incremented, omMetrics.incNumVolumeUpdates();, if a failure occurs post that, it isn't bothered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A reference from HDFS namenode.
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java#L1960
Here too, the metrics is incremented before and the execution is done post that, even if the call fails , the metrics shall still track that call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Ozone, we increment these metrics represents number of operations, (Not the number of successful operations)
IMHO, this approach matches with all other requests, and LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @bharatviswa504 for the clarification (and sorry for my previous view.)
I'm +1 for the patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ayushtkn for the work and discussion, and @bharatviswa504 for the clarification and review.
I'm merging it.
* master: (53 commits) HDDS-4458. Fix Max Transaction ID value in OM. (apache#1585) HDDS-4442. Disable the location information of audit logger to reduce overhead (apache#1567) HDDS-4441. Add metrics for ACL related operations.(Addendum for HA). (apache#1584) HDDS-4081. Create ZH translation of StorageContainerManager.md in doc. (apache#1558) HDDS-4080. Create ZH translation of OzoneManager.md in doc. (apache#1541) HDDS-4079. Create ZH translation of Containers.md in doc. (apache#1539) HDDS-4184. Add Features menu for Chinese document. (apache#1547) HDDS-4235. Ozone client FS path validation is not present in OFS. (apache#1582) HDDS-4338. Fix the issue that SCM web UI banner shows "HDFS SCM". (apache#1583) HDDS-4337. Implement RocksDB options cache for new datanode DB utilities. (apache#1544) HDDS-4083. Create ZH translation of Recon.md in doc (apache#1575) HDDS-4453. Replicate closed container for random selected datanodes. (apache#1574) HDDS-4408: terminate Datanode when Datanode State Machine Thread got uncaught exception. (apache#1533) HDDS-4443. Recon: Using Mysql database throws exception and fails startup (apache#1570) HDDS-4315. Use Epoch to generate unique ObjectIDs (apache#1480) HDDS-4455. Fix typo in README.md doc (apache#1578) HDDS-4441. Add metrics for ACL related operations. (apache#1571) HDDS-4437. Avoid unnecessary builder conversion in setting volume Quota/Owner request (apache#1564) HDDS-4417. Simplify Ozone client code with configuration object (apache#1542) HDDS-4363. Add metric to track the number of RocksDB open/close operations. (apache#1530) ...
What changes were proposed in this pull request?
Acl metrics in case of HA
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-4441
How was this patch tested?
Added UT