Skip to content

Conversation

@ChenSammi
Copy link
Contributor

@ChenSammi
Copy link
Contributor Author

Hey @nandakumar131 , would you help to review the patch? The failed flaky test is irrelevant. They can pass locally.

@adoroszlai
Copy link
Contributor

The failed flaky test is irrelevant.

May be irrelevant here, but please see #3376 (comment) and below.

@ChenSammi ChenSammi requested a review from nandakumar131 May 10, 2022 02:07
@adoroszlai
Copy link
Contributor

@neils-dev if you would like to review

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ChenSammi. I see that retrievePassword method now not only throws an exception when the s3 user cannot be authenticated but now also when the OM is not the leader. With this new InvaidToken exception (OMNotLeader / NotReady exception), the users of the retrievePassword may not accurately process the error.
Do we need to change the S3SecurityUtil.validateS3Credential handling of an exception thrown by its call to retrievePassword?

Copy link
Contributor Author

@ChenSammi ChenSammi May 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense. Will take care of that in next patch.

Copy link
Contributor

@neils-dev neils-dev May 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With validateS3Credential() checking the leader status when validating the password (S3Token) from the OMRequest, is the OzoneManagerRatisUtils.checkLeaderStatus() in the line above it unnecessary, redundant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if I'm wrong. With HDDS-4440 and HDDS-5881, S3 authentication will be carried as part of the OMRequest. It doesn't leverage the underline hadoop RPC SASL authentication, and these requests will not be processed by SASL. So the leader check is still needed here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ChenSammi, thanks for the comment and for the help you've given me when I started with ozone and ozone development.

With HDDS-4440 and HDDS-5881, S3 authentication will be carried as part of the OMRequest.

Yes, authentication is done using the S3Authentication fields of the OMRequest. This processRequest() in the ProtocalServerSideTranslatorPB is part of the path for processing requests with the authentication through S3SecurityUtil.validateS3Credential.

and these requests will not be processed by SASL. So the leader check is still needed here.

Yes, the leader check is needed here, however isn't the leader check done along with the call to S3SecurityUtil.validateS3Credentials as well? So in this code path the leader is checked twice. Once just prior to the call to S3SecurityUtil.ValidateS3Credential and once inside the call (within retrievePassword). Is the call to OzoneManagerRatisUtils.checkLeaderStatus(ozoneManager) still needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Will remove it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why go through this nesting? alternatively we could ozoneManager.checkLeaderStatus

Suggested change
OzoneManagerRatisUtils.checkLeaderStatus(ozoneManager);
ozoneManager.checkLeaderStatus();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All protocol service API need throws ServiceException, while ozoneManager.checkLeaderStatus throws OMNotLeaderException and OMLeaderNotReadyException. So we need OzoneManagerRatisUtils.checkLeaderStatus wrapper to covert exceptions.

ozoneManager.checkLeaderStatus is used both in SASL path authentication and protocol level authentication. It throws the core exception. Let the caller wrap the core exception into desired outer exceptions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@neils-dev, could you help to take another look?

Copy link
Contributor

@neils-dev neils-dev May 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ChenSammi, thanks for your updated comment on handling the leader exceptions.

All protocol service API need throws ServiceException, while ozoneManager.checkLeaderStatus throws OMNotLeaderException and OMLeaderNotReadyException. So we need OzoneManagerRatisUtils.checkLeaderStatus wrapper to covert exceptions.

Having that, the OzoneManagerProtocolServerServerSideTranslatorPB.processRequest needs to throw a ServiceException should isRatisEnabled == true and it is not a leader. There looks to be two cases where this happens in processRequest:

  1. For the case that isRatisEnabled == true and s3Auth == false, that's what happens with your leader check OzoneManagerRatisUtils.checkLeaderStatus. It is handled.
  2. For the case that isRatisEnabled == true and request.hasS3Authentication == true it needs to be handled so that the ServiceException is thrown in the event of a leadership check exception. This looks to be currently wrapping the leadership exception in an IOException (InvalidToken) and returned to the caller through the error OmResponse. It is handled by the validateS3Credential (missed it on initial read). Thanks.

The javadoc for validateS3Credential can be updated to include the ServiceException that may be thrown.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @neils-dev, sorry for the late response. Just want to confirm that the comments left to address is the following, right?

The javadoc for validateS3Credential can be updated to include the ServiceException that may be thrown.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@neils-dev , the javadoc is updated. Would you like to take another look?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ChenSammi. Looks great. I tested the changes on the secure docker development clusters as well without issues - clients handle OMException raised as expected for authentication errors (S3) and client failovers for raised ServiceException wrapped OMNotALeader exceptions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @neils-dev . Do you think the PR can be merged now?

Copy link
Contributor

@neils-dev neils-dev Jun 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (I am unable to merge.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
OzoneManagerRatisUtils.checkLeaderStatus(ozoneManager);
ozoneManager.checkLeaderStatus();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can drop this method.

@neils-dev
Copy link
Contributor

The failed flaky test is irrelevant.

May be irrelevant here, but please see #3376 (comment) and below.

Applying Ratis patch [RATIS-1481], I've been able to run the TestOzonePrepare#testPrepareDownedOM without failure. see comment #3186 (comment).

Copy link
Contributor

@swagle swagle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 LGTM

@ChenSammi ChenSammi merged commit 3397db9 into apache:master Jun 7, 2022
@ChenSammi
Copy link
Contributor Author

Thanks @adoroszlai @kerneltime @neils-dev @swagle for the code review.

errose28 added a commit to errose28/ozone that referenced this pull request Jun 7, 2022
* master: (87 commits)
  HDDS-6686. Do Leadship check before SASL token verification. (apache#3382)
  HDDS-4364: [FSO]List FileStatus : startKey can be a non-existed path (apache#3481)
  HDDS-6091. Add file checksum to OmKeyInfo (apache#3201)
  HDDS-6706. Exposing Volume Information Metrics to the DataNode UI (apache#3478)
  HDDS-6759: Add listblock API in MockDatanodeStorage (apache#3452)
  HDDS-5821 Container cache management for closing RockDB  (apache#3426)
  HDDS-6683. Refactor OM server bucket layout configuration usage (apache#3477)
  HDDS-6824. Revert changes made in proto.lock by HDDS-6768. (apache#3480)
  HDDS-6811. Bucket create message with layout type (apache#3479)
  HDDS-6810. Add a optional flag to trigger listStatus as part of listKeys for FSO buckets. (apache#3461)
  HDDS-6828. Revert RockDB version pending leak fixes (apache#3475)
  HDDS-6764: EC: DN ability to create RECOVERING containers for EC reconstruction. (apache#3458)
  HDDS-6795: EC: PipelineStateMap#addPipeline should not have precondition checks post db updates (apache#3453)
  HDDS-6823. Intermittent failure in TestOzoneECClient#testExcludeOnDNMixed (apache#3476)
  HDDS-6820. Bucket Layout Post-Finalization Validators for ACL Requests. (apache#3472)
  HDDS-6819. Add LEGACY to AllowedBucketLayouts in CreateBucketHandler (apache#3473)
  HDDS-4859. [FSO]ListKeys: seek all the files/dirs from startKey to keyPrefix (apache#3466)
  HDDS-6705 Add metrics for volume statistics including disk capacity, usage, Reserved (apache#3430)
  HDDS-6474. Add test to cover the FSO bucket list status with beyond batch boundary and cache. (apache#3379). Contributed by aswinshakil
  HDDS-6280. Support Container Balancer HA (apache#3423)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants