-
Notifications
You must be signed in to change notification settings - Fork 588
HDDS-5075. [SCM HA Security] Remove code of not starting ozone services when Security is enabled on SCM HA cluster #2132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…es when Security is enabled on SCM HA cluster.
|
Thanks for creating this PR @bharatviswa504 Does it mean that security of SCM-HA is fully finished and production ready? |
@elek |
|
@adoroszlai I have added for almost all SCM HA security tasks the [SCM HA Security] prefix, so following the same notion for this task also.Hope you are okay with it? |
|
Thanks the answer @bharatviswa504
Based on my understanding without HDDS-5060 it can not be secure in production. We shouldn't release the current code it without this safety check (or we need to issue a cve... ;-) ) Is there any specific reason to remove this check before having full security? I mean: if it's required for acceptance tests, we can directly configure this value in docker-config files. Or do we have any other motivations? I have a strange feeling about committing something to the master which makes the branch non-releaseable... |
Is there any release planned from master, before next release we will make sure we shall have this fix. I belive we don't need to issue a cve for this, as we shall take care of this issue before the next release from apache master branch.
Currently, in docker-tests we have that way. If you see this PR, this PR is removing the config in docker tests.
The motivation for this is in Cloudera CM integration we don't need this additional config for CM integration of SCM HA. |
Sure, it's OK, although I think it makes titles too long without adding much value. Both SCM HA and security are already part of the title in this case. Labels also help categorize PRs and Jira issues. I just wanted to add the missing Jira ID. |
mukul1987
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM
|
I think the release is far away. HDDS-5060 is required only if log purging is enabled for SCM HA which is currently disabled by default. I prefer to merge this change right away. Thanks @elek , @elek and @adoroszlai for suggestions/feedbacks. |
I am not sure. 1.1.0 is merged a long time ago, so it seems to be reasonable to publish sg. in the next 1-2 months. (IMHO).
I have a very mixed feeling about this motivation. If it's a problem on the vendor side, I strongly recommend fixing it on the vendor side instead of Apache master. Especially if it can be done easily. There are other users of Apache Ozone and I know at least one who uses snapshots from the master branch. I think master should be as stable and secure as possible. This is the reason why we use feature branches to stabilize all the works outside the master.
This is a very good argument, and I am fine with that if this is the case. Can you please share how is this log purging used? Isn't it required for bootstrapping/restoring Ratis nodes? Is this grpc endpoint turned off by default? How is ratis snapshot moved between nodes when this is turned off?
My preference is keeping master always as secure and stable as possible. Our 1.1.0 release was delayed with 3-4 months because earlier we used snapshot dependencies from Ratis. While I think it was a good decision earlier, my preference is improving our practice and keeping the master always releasable. (feature flags are good, but we should make it impossible to start something which is supposed to be secure but not). |
Log purging can be enabled in SCM HA by setting the config "ozone.scm.ha.ratis.log.purge.enabled" set to true which is set to false by default. For more details, please refer SCMHAConfiguration.java class. Unless, the logs are purged, ratis uses the same appendLog protocol using grpc to replicate the scm metadata. Unless, the logs are purged, no install snapshot notification will get initiated to the follower nodes. There is no requiemnet to moves across ratis snapshots. #2155 will fix the install snapshot mechanism which incoroporates some ratis fixes to make the whole install snapshot mechanism work. Once this gets done, the secure grpc channel work can start. |
What changes were proposed in this pull request?
Now SCM HA security is implemented, we can remove additional config of not starting ozone services when SCM HA is enabled on a secure cluster.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-5075
How was this patch tested?
Existing docker tests.