Skip to content

Conversation

@bharatviswa504
Copy link
Contributor

What changes were proposed in this pull request?

Now SCM HA security is implemented, we can remove additional config of not starting ozone services when SCM HA is enabled on a secure cluster.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-5075

How was this patch tested?

Existing docker tests.

…es when Security is enabled on SCM HA cluster.
@elek
Copy link
Member

elek commented Apr 8, 2021

Thanks for creating this PR @bharatviswa504

Does it mean that security of SCM-HA is fully finished and production ready?

@adoroszlai adoroszlai changed the title [SCM HA Security] Remove code of not starting ozone services when Security is enabled on SCM HA cluster HDDS-5075. Remove code of not starting ozone services when Security is enabled on SCM HA cluster Apr 8, 2021
@bharatviswa504 bharatviswa504 changed the title HDDS-5075. Remove code of not starting ozone services when Security is enabled on SCM HA cluster HDDS-5075. [SCM HA Security] Remove code of not starting ozone services when Security is enabled on SCM HA cluster Apr 9, 2021
@bharatviswa504
Copy link
Contributor Author

bharatviswa504 commented Apr 9, 2021

Thanks for creating this PR @bharatviswa504

Does it mean that security of SCM-HA is fully finished and production ready?

@elek
Yes, planned phase-1 development tasks are almost completed. We have one in-progress HDDS-5060.
And also we have a docker-compose ozone-secure-ha which starts SCM HA in a secure environment and CI is running tests in this env.

@bharatviswa504
Copy link
Contributor Author

bharatviswa504 commented Apr 9, 2021

@adoroszlai I have added for almost all SCM HA security tasks the [SCM HA Security] prefix, so following the same notion for this task also.Hope you are okay with it?

@elek
Copy link
Member

elek commented Apr 9, 2021

Thanks the answer @bharatviswa504

Yes, planned phase-1 development tasks are almost completed. We have one in-progress HDDS-5060.

Based on my understanding without HDDS-5060 it can not be secure in production. We shouldn't release the current code it without this safety check (or we need to issue a cve... ;-) )

Is there any specific reason to remove this check before having full security? I mean: if it's required for acceptance tests, we can directly configure this value in docker-config files.

Or do we have any other motivations?

I have a strange feeling about committing something to the master which makes the branch non-releaseable...

@bharatviswa504
Copy link
Contributor Author

bharatviswa504 commented Apr 9, 2021

Thanks the answer @bharatviswa504

Yes, planned phase-1 development tasks are almost completed. We have one in-progress HDDS-5060.

Based on my understanding without HDDS-5060 it can not be secure in production. We shouldn't release the current code it without this safety check (or we need to issue a cve... ;-) )

Is there any release planned from master, before next release we will make sure we shall have this fix. I belive we don't need to issue a cve for this, as we shall take care of this issue before the next release from apache master branch.

Is there any specific reason to remove this check before having full security? I mean: if it's required for acceptance tests, we can directly configure this value in docker-config files.

Currently, in docker-tests we have that way. If you see this PR, this PR is removing the config in docker tests.

Or do we have any other motivations?

I have a strange feeling about committing something to the master which makes the branch non-releaseable...

The motivation for this is in Cloudera CM integration we don't need this additional config for CM integration of SCM HA.
SCM HA work is merged to master, so SCM HA security tasks are being worked on in master directly, and as said above this will be addressed before the next release. But if you feel strongly this cannot be committed until we have HDDS-5060 I am fine with it.

@adoroszlai
Copy link
Contributor

I have added for almost all SCM HA security tasks the [SCM HA Security] prefix, so following the same notion for this task also.Hope you are okay with it?

Sure, it's OK, although I think it makes titles too long without adding much value. Both SCM HA and security are already part of the title in this case. Labels also help categorize PRs and Jira issues.

I just wanted to add the missing Jira ID.

Copy link
Contributor

@mukul1987 mukul1987 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM

@bshashikant
Copy link
Contributor

bshashikant commented Apr 19, 2021

I think the release is far away. HDDS-5060 is required only if log purging is enabled for SCM HA which is currently disabled by default. I prefer to merge this change right away.

Thanks @elek , @elek and @adoroszlai for suggestions/feedbacks.

@bshashikant bshashikant merged commit 014d0a2 into apache:master Apr 19, 2021
@elek
Copy link
Member

elek commented Apr 20, 2021

I think the release is far away

I am not sure. 1.1.0 is merged a long time ago, so it seems to be reasonable to publish sg. in the next 1-2 months. (IMHO).

The motivation for this is in Cloudera CM integration we don't need this additional config for CM integration of SCM HA.

I have a very mixed feeling about this motivation. If it's a problem on the vendor side, I strongly recommend fixing it on the vendor side instead of Apache master. Especially if it can be done easily. There are other users of Apache Ozone and I know at least one who uses snapshots from the master branch. I think master should be as stable and secure as possible. This is the reason why we use feature branches to stabilize all the works outside the master.

HDDS-5060 is required only if log purging is enabled for SCM HA which is currently disabled by default.

This is a very good argument, and I am fine with that if this is the case. Can you please share how is this log purging used? Isn't it required for bootstrapping/restoring Ratis nodes? Is this grpc endpoint turned off by default? How is ratis snapshot moved between nodes when this is turned off?

I prefer to merge this change right away.

My preference is keeping master always as secure and stable as possible. Our 1.1.0 release was delayed with 3-4 months because earlier we used snapshot dependencies from Ratis. While I think it was a good decision earlier, my preference is improving our practice and keeping the master always releasable. (feature flags are good, but we should make it impossible to start something which is supposed to be secure but not).

@bshashikant
Copy link
Contributor

I think the release is far away

I am not sure. 1.1.0 is merged a long time ago, so it seems to be reasonable to publish sg. in the next 1-2 months. (IMHO).

The motivation for this is in Cloudera CM integration we don't need this additional config for CM integration of SCM HA.

I have a very mixed feeling about this motivation. If it's a problem on the vendor side, I strongly recommend fixing it on the vendor side instead of Apache master. Especially if it can be done easily. There are other users of Apache Ozone and I know at least one who uses snapshots from the master branch. I think master should be as stable and secure as possible. This is the reason why we use feature branches to stabilize all the works outside the master.

HDDS-5060 is required only if log purging is enabled for SCM HA which is currently disabled by default.

This is a very good argument, and I am fine with that if this is the case. Can you please share how is this log purging used? Isn't it required for bootstrapping/restoring Ratis nodes? Is this grpc endpoint turned off by default? How is ratis snapshot moved between nodes when this is turned off?

I prefer to merge this change right away.

My preference is keeping master always as secure and stable as possible. Our 1.1.0 release was delayed with 3-4 months because earlier we used snapshot dependencies from Ratis. While I think it was a good decision earlier, my preference is improving our practice and keeping the master always releasable. (feature flags are good, but we should make it impossible to start something which is supposed to be secure but not).

Log purging can be enabled in SCM HA by setting the config "ozone.scm.ha.ratis.log.purge.enabled" set to true which is set to false by default. For more details, please refer SCMHAConfiguration.java class.

Unless, the logs are purged, ratis uses the same appendLog protocol using grpc to replicate the scm metadata. Unless, the logs are purged, no install snapshot notification will get initiated to the follower nodes. There is no requiemnet to moves across ratis snapshots.

#2155 will fix the install snapshot mechanism which incoroporates some ratis fixes to make the whole install snapshot mechanism work. Once this gets done, the secure grpc channel work can start.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants