Skip to content

Conversation

@bshashikant
Copy link
Contributor

What changes were proposed in this pull request?

Enable ratis by default for SCM.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-5740

How was this patch tested?

Existing Uts

@guihecheng
Copy link
Contributor

guihecheng commented Sep 15, 2021

Hi @bshashikant the patch itself looks good and I think this is to say that SCM-HA is a stable and production ready from now on.
Actually we've got a benchmark tool for SCM throughput and by using it we detected and fixed a serious throughput degradation on the ratis side these days.
FYI:
https://issues.apache.org/jira/browse/HDDS-5702
https://issues.apache.org/jira/browse/HDDS-5703

So IMHO, we could consider bump up the ratis version sometime(maybe the next release?), so we could have a more stable SCM-HA, thanks~
cc @JacksonYao287

@errose28
Copy link
Contributor

@guihecheng We have recently bumped Ratis to 2.2.0 on master fyi

@guihecheng
Copy link
Contributor

@guihecheng We have recently bumped Ratis to 2.2.0 on master fyi

Ah, that's a good news, thanks~

Copy link
Contributor

@mukul1987 mukul1987 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM

Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @bshashikant for working on this. I have a question about upgrades, and a minor code change request regarding imports.

import static org.apache.hadoop.hdds.protocol.proto.StorageContainerDatanodeProtocolProtos.SCMRegisteredResponseProto.ErrorCode.success;
import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_PIPELINE_AUTO_CREATE_FACTOR_ONE;
import static org.apache.hadoop.hdds.scm.ScmConfigKeys.OZONE_SCM_RATIS_PIPELINE_LIMIT;
import static org.apache.hadoop.hdds.scm.ScmConfigKeys.*;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid star import. (Also in a few other files.)

Comment on lines +27 to +29
// setting ozone.scm.ratis.enable to false for now, as scm ha upgrade is
// not supported yet. This is supposed to work without SCM HA configuration
OZONE-SITE.XML_ozone.scm.ratis.enable=false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If upgrade to SCM HA is not yet supported, why are we enabling it by default? Shouldn't we implement upgrade first, then enable by default?

My concern is this: consider a non SCM HA cluster, where ozone.scm.ratis.enable=false comes from the default config. If upgraded, will SCM HA now be suddenly enabled after upgrade? Will the admin need to disable SCM HA manually to make it work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is safe to use, in case of non HA, it should just startup a single node ratis server. This needs to be enabled, so as to make sure the the tests excercise the HA path, not the standalone path.

@bshashikant bshashikant merged commit 3eb7235 into apache:master Jan 13, 2022
arp7 pushed a commit that referenced this pull request Feb 17, 2022
…ansport support (#3074)

* HDDS-6149. Remove unused keytabs (#2960)

* HDDS-6094. Some unit tests are skipped due to using JUnit4 runner (#2909)

* HDDS-6075. OzoneConfiguration constructor overrides input configuration keys. (#2921)

* HDDS-4177. SCM Container DB bootstrap on Recon startup (#2942)

* HDDS-6086. Compute MD5MD5CRC file checksum using chunk checksums from DataNodes (#2919)

* HDDS-6148. Validate ContainerBalancerConfiguration before start ContainerBalancer (#2957)

* HDDS-6161. SCM StateMachine failing to reinitialize doesn't terminate the process. (#2971)

* HDDS-6134. Move replication-specific config to ReplicationServer (#2943)

* HDDS-4010. S3G startup fails when multiple service ids are configured. (#2976)

* HDDS-6170. Add metrics to replication manager to track container health states (#2975)

* HDDS-3231. Cleanup KeyManagerImpl (#2961)

* HDDS-5927. Improve defaults in ContainerBalancerConfiguration (#2892)

* HDDS-6157. More consistent synchronization in InputStreams (#2965)

* HDDS-4743. [FSO] Add FSO variant of ITestOzoneContractDistcp. (#2980)

* HDDS-6114. Intermittent error due to Failed to init RocksDB (#2947)

* HDDS-6175. Use s3Auth during proxy during decrypt in RpcClient. (#2981)

* HDDS-6175. Use s3Auth during proxy during decrypt in RpcClient.

* HDDS-5740. Enable ratis by default for SCM. (#2637)

* HDDS-6183. Intermittent failure in TestKeyDeletingService.checkIfDeleteServiceWithFailingSCM. (#2991)

* HDDS-4190. Intermittent failure in TestOMVolumeSetOwnerRequest and TestOMVolumeSetQuotaRequest. (#2982)

* HDDS-6120. Compute block checksum using chunk checksums (#2930)

* HDDS-6147. Add ability in OM to get limited delta updates (#2956)

* HDDS-6195. Remove unused jmh-core dependency. (#2997)

* HDDS-6167. Update ozone-runner version to 20211202-1 (#2969)

* HDDS-6171. Create an API to change Bucket Owner (#2988)

* HDDS-6163. Fix PATH in docker image (#2967)

* HDDS-6202. Avoid using jmh-generator-annprocess since it is GPL2.0. (#2998)

* HDDS-6135. SCM Container DB bootstrap on Recon startup for SCM HA. (#2972)

* HDDS-6109. Preserve the underlying exception raised in client lib. (#2989)

* HDDS-3408. Rename ChunkLayOutVersion to ContainerLayoutVersion. (#2983)

* HDDS-6203. CleanUp incomplete gz files during Container move (#3000)

* HDDS-6216. Move OMOpenKeysDeleteRequest to package om.request.key (#3011)

* HDDS-6191. Intermittent failure in TestDeleteWithSlowFollower (#3015)

* HDDS-6128. CLI tool that downloads all the block replicas and creates a manifest file (#2987)

* HDDS-6177. Extend container info command to include replica details  (#2995)

* HDDS-6211. [Docs] Image styling on deployed site does not replicate local builds. (#3007)

* HDDS-6219. Switch to RATIS ReplicationType from STAND_ALONE in our tests. (#3014)

* HDDS-6192. feature/Observability.md translated to Chinese (#2994)

* HDDS-6205. Add CLI command to display the latest Replication Manager report (#3013)

* HDDS-6227. Test helpers should observe naming conditions (#3020)

* HDDS-6239. ozonesecure-mr failing with No URLs in mirrorlist (#3029)

* HDDS-6201. Fix NPE for DataScanner with scanned container deleted by others. (#3005)

* HDDS-5529. For any IOexception from @REPLicated method we should throw it (#2788)

* HDDS-6181. Change SCMHAInvocationHandler#invokeRatis() logging to TRACE (#2992)

* HDDS-6206. Application errors must not flood system log (#3001)

* HDDS-6245. Add BucketLayout logging to Audit Logs (#3040)

* HDDS-6238 Reduce memory requirements for list keys. (#3032)

* HDDS-2919. Intermittent failure in TestRDBStore (#3028)

* HDDS-6253. Unnecessary duplicate smoketest after defaulting to FSO (#3036)

* HDDS-6204. Cleanup handling malformed authorization header (#2999)

* HDDS-6169. Selective checks: skip junit tests on ozone-runner image update (#2974)

* HDDS-6270. Use a dedicated file instead of /etc/passwd for xcompat acceptance test (#3050)

* HDDS-6273. Amend doc SecuringTDE.md (#3047)

* HDDS-6140. Selective checks: skip unit check for integration-test changes (#2948)

* HDDS-6215. Recon get limited delta updates from OM (#3009)

* HDDS-6125. Recon get limited delta updates from OM

* HDDS-6215. Fix unit test

* trigger new CI check

* HDDS-6215. Fix typo

* trigger new CI check

Co-authored-by: Symious <[email protected]>

* HDDS-6226. Run tests for selective CI checks in CI (#3019)

* HDDS-6247. Avoid logging stack trace for user input problems (#3039)

* HDDS-6208. New checkstyle: WhitespaceAround (#3003)

* HDDS-6289. Upgrade acceptance test log flooded with parse error (#3063)

Co-authored-by: Siyao Meng <[email protected]>

* Trigger Build

* Fix integration test for added configuation field for selecting OmTransport for s3 gateway - TestOzoneConfigurationFields (added config key not in xml).

Co-authored-by: Doroszlai, Attila <[email protected]>
Co-authored-by: Lokesh Jain <[email protected]>
Co-authored-by: Aswin Shakil Balasubramanian <[email protected]>
Co-authored-by: Wei-Chiu Chuang <[email protected]>
Co-authored-by: Symious <[email protected]>
Co-authored-by: Bharat Viswanadham <[email protected]>
Co-authored-by: Stephen O'Donnell <[email protected]>
Co-authored-by: GeorgeJahad <[email protected]>
Co-authored-by: Siddhant Sangwan <[email protected]>
Co-authored-by: Jyotinder Singh <[email protected]>
Co-authored-by: Shashikant Banerjee <[email protected]>
Co-authored-by: Tsz-Wo Nicholas Sze <[email protected]>
Co-authored-by: Kiyoshi Mizumaru <[email protected]>
Co-authored-by: Ritesh H Shukla <[email protected]>
Co-authored-by: Nibiru <[email protected]>
Co-authored-by: Kaijie Chen <[email protected]>
Co-authored-by: Zita Dombi <[email protected]>
Co-authored-by: Istvan Fajth <[email protected]>
Co-authored-by: steinsgateted <[email protected]>
Co-authored-by: Gui Hecheng <[email protected]>
Co-authored-by: Jackson Yao <[email protected]>
Co-authored-by: Keyi Song <[email protected]>
Co-authored-by: UENISHI Kota <[email protected]>
Co-authored-by: Siyao Meng <[email protected]>
Co-authored-by: Symious <[email protected]>
adoroszlai added a commit to adoroszlai/ozone that referenced this pull request Apr 28, 2022
adoroszlai added a commit that referenced this pull request May 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants