Skip to content

Conversation

@ayushtkn
Copy link
Member

What changes were proposed in this pull request?

Acl metrics in case of HA

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-4441

How was this patch tested?

Added UT

@jojochuang
Copy link
Contributor

This is beyond my comfort zone. @bharatviswa504 would you like to review this since it touches OM HA?

Copy link
Contributor

@bharatviswa504 bharatviswa504 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 LGTM.

Copy link
Member

@cxorm cxorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ayushtkn for the work, overall looks good to me except one question.

How do you think if we move the acl-metric code to true branch of onComplete() of every requests ? (Feel free to correct me if I miss something, thanks)

@Override
public OMClientResponse validateAndUpdateCache(OzoneManager ozoneManager,
long trxnLogIndex, OzoneManagerDoubleBufferHelper omDoubleBufferHelper) {
ozoneManager.getMetrics().incNumAddAcl();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about moving this line after true operationResult branch of onComplete() at line-120 ?

Copy link
Member Author

@ayushtkn ayushtkn Nov 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanx @cxorm for the review. In onComplete(), there doesn't seems to be any instance of ozoneManager available, and I quickly had a check over, couple of other files, they too were incrementing metrics in validateAndUpdateCache, I think we can be in sync with them.

Let me know, if you are ok going ahead this way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanx @cxorm for the review. In onComplete(), there doesn't seems to be any instance of ozoneManager available, and I quickly had a check over, couple of other files, they too were incrementing metrics in validateAndUpdateCache, I think we can be in sync with them.

Thanks @ayushtkn for the comment.

IMHO the onComplete() is called by super class validateAndUpdateCache (for example, in this part its super class is OMBucketAclRequest), and it has instance of ozoneManager.

An operation scenario is that when we call ozoneManager.getMetrics().incNumAddAcl() before super.validateAndUpdateCache() and failure occurs in validateAndUpdateCache(), the incNumAddAcl() shouldn't take effect, but it did.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanx @cxorm
Are you talking about this onComplete?
https://github.com/apache/ozone/blob/master/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/volume/acl/OMVolumeAddAclRequest.java#L114

If not can you point me to that. It doesn't have ozonemanager

An operation scenario is that when we call ozoneManager.getMetrics().incNumAddAcl() before super.validateAndUpdateCache() and failure occurs in validateAndUpdateCache(), the incNumAddAcl() shouldn't take effect, but it did.

Aren't these metrics suppose to track the number of calls? not just success?

validateAndUpdateCache too has a metrics getting incremented, omMetrics.incNumVolumeUpdates();, if a failure occurs post that, it isn't bothered.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A reference from HDFS namenode.
https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java#L1960

Here too, the metrics is incremented before and the execution is done post that, even if the call fails , the metrics shall still track that call.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Ozone, we increment these metrics represents number of operations, (Not the number of successful operations)
IMHO, this approach matches with all other requests, and LGTM.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @bharatviswa504 for the clarification (and sorry for my previous view.)
I'm +1 for the patch.

Copy link
Member

@cxorm cxorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ayushtkn for the work and discussion, and @bharatviswa504 for the clarification and review.

I'm merging it.

@cxorm cxorm merged commit 585984e into apache:master Nov 16, 2020
errose28 added a commit to errose28/ozone that referenced this pull request Nov 18, 2020
* master: (53 commits)
  HDDS-4458. Fix Max Transaction ID value in OM. (apache#1585)
  HDDS-4442. Disable the location information of audit logger to reduce overhead (apache#1567)
  HDDS-4441. Add metrics for ACL related operations.(Addendum for HA). (apache#1584)
  HDDS-4081. Create ZH translation of StorageContainerManager.md in doc. (apache#1558)
  HDDS-4080. Create ZH translation of OzoneManager.md in doc. (apache#1541)
  HDDS-4079. Create ZH translation of Containers.md in doc. (apache#1539)
  HDDS-4184. Add Features menu for Chinese document. (apache#1547)
  HDDS-4235. Ozone client FS path validation is not present in OFS. (apache#1582)
  HDDS-4338. Fix the issue that SCM web UI banner shows "HDFS SCM". (apache#1583)
  HDDS-4337. Implement RocksDB options cache for new datanode DB utilities. (apache#1544)
  HDDS-4083. Create ZH translation of Recon.md in doc (apache#1575)
  HDDS-4453. Replicate closed container for random selected datanodes. (apache#1574)
  HDDS-4408: terminate Datanode when Datanode State Machine Thread got uncaught exception. (apache#1533)
  HDDS-4443. Recon: Using Mysql database throws exception and fails startup (apache#1570)
  HDDS-4315. Use Epoch to generate unique ObjectIDs (apache#1480)
  HDDS-4455. Fix typo in README.md doc (apache#1578)
  HDDS-4441. Add metrics for ACL related operations. (apache#1571)
  HDDS-4437. Avoid unnecessary builder conversion in setting volume Quota/Owner request (apache#1564)
  HDDS-4417. Simplify Ozone client code with configuration object (apache#1542)
  HDDS-4363. Add metric to track the number of RocksDB open/close operations. (apache#1530)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants