
@adoroszlai
Contributor

What changes were proposed in this pull request?

During startup, SCM sends an addSCM request to its peers. It tries to exclude itself from the target list:

OzoneConfiguration config = SCMHAUtils.removeSelfId(conf, selfId);
try {
return getScmBlockClient(config).addSCM(request);

but removeSelfId does not work: it sets the generic ozone.scm.nodes property and leaves the SCM service-specific config key ozone.scm.nodes.<service> unchanged, so that key still lists all 3 nodes.
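
To illustrate the difference between the two keys, here is a minimal, self-contained sketch. It is not the Ozone code: a plain `Map` stands in for `OzoneConfiguration`, and the class name, the method names, the `scmservice` service ID, and the lookup rule in `resolveNodes` (prefer the service-specific key, fall back to the generic one) are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-in for the real classes; a Map plays the role of the configuration.
public class RemoveSelfIdSketch {
  static final String NODES_KEY = "ozone.scm.nodes";

  // Assumed lookup rule: the service-specific key wins over the generic key.
  static String resolveNodes(Map<String, String> conf, String serviceId) {
    String specific = conf.get(NODES_KEY + "." + serviceId);
    return specific != null ? specific : conf.get(NODES_KEY);
  }

  // Drops selfId from a comma-separated node list; null is treated as empty.
  static String withoutSelf(String nodes, String selfId) {
    if (nodes == null) {
      return "";
    }
    return Arrays.stream(nodes.split(","))
        .map(String::trim)
        .filter(id -> !id.isEmpty() && !id.equals(selfId))
        .collect(Collectors.joining(","));
  }

  // Mirrors the described bug: the filtered list goes to the generic key only.
  static Map<String, String> removeSelfIdGenericKey(
      Map<String, String> conf, String serviceId, String selfId) {
    Map<String, String> copy = new HashMap<>(conf);
    copy.put(NODES_KEY, withoutSelf(resolveNodes(conf, serviceId), selfId));
    return copy;
  }

  // Sketch of the intended behaviour: update the service-specific key.
  static Map<String, String> removeSelfIdServiceKey(
      Map<String, String> conf, String serviceId, String selfId) {
    Map<String, String> copy = new HashMap<>(conf);
    copy.put(NODES_KEY + "." + serviceId,
        withoutSelf(resolveNodes(conf, serviceId), selfId));
    return copy;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(NODES_KEY + ".scmservice", "scm1,scm2,scm3");

    Map<String, String> buggy = removeSelfIdGenericKey(conf, "scmservice", "scm2");
    Map<String, String> fixed = removeSelfIdServiceKey(conf, "scmservice", "scm2");

    // The service-specific key still shadows the generic one in the first case.
    System.out.println("generic key only: " + resolveNodes(buggy, "scmservice"));
    System.out.println("service key:      " + resolveNodes(fixed, "scmservice"));
  }
}
```

Under these assumptions, a lookup after the first variant still sees scm2 in its own peer list, while the second variant yields only scm1 and scm3.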

https://issues.apache.org/jira/browse/HDDS-6749

How was this patch tested?

Added unit test.

Also added a log message that shows the list of peers the fail-over proxy is created with. Output from the HA acceptance test:

scm2_1      | 2022-05-13 22:49:10,532 [main] INFO proxy.SCMBlockLocationFailoverProxyProvider: Created block location fail-over proxy with 2 nodes: [nodeId=scm1,nodeAddress=scm1/172.18.0.10:9863, nodeId=scm3,nodeAddress=scm3/172.18.0.6:9863]
scm2_1      | 2022-05-13 22:49:36,580 [Listener at 0.0.0.0/9860] INFO proxy.SCMBlockLocationFailoverProxyProvider: Created block location fail-over proxy with 2 nodes: [nodeId=scm1,nodeAddress=scm1/172.18.0.10:9863, nodeId=scm3,nodeAddress=scm3/172.18.0.6:9863]
scm3_1      | 2022-05-13 22:49:38,931 [main] INFO proxy.SCMBlockLocationFailoverProxyProvider: Created block location fail-over proxy with 2 nodes: [nodeId=scm2,nodeAddress=scm2/172.18.0.9:9863, nodeId=scm1,nodeAddress=scm1/172.18.0.10:9863]
scm3_1      | 2022-05-13 22:49:46,296 [Listener at 0.0.0.0/9860] INFO proxy.SCMBlockLocationFailoverProxyProvider: Created block location fail-over proxy with 2 nodes: [nodeId=scm2,nodeAddress=scm2/172.18.0.9:9863, nodeId=scm1,nodeAddress=scm1/172.18.0.10:9863]

https://github.com/adoroszlai/hadoop-ozone/actions/runs/2322096338

@adoroszlai adoroszlai self-assigned this May 14, 2022
@adoroszlai adoroszlai added the bug and scm labels May 14, 2022
@adoroszlai adoroszlai requested review from avijayanhwx and smengcl May 14, 2022 07:35
Contributor

@swagle swagle left a comment


+1 LGTM

@adoroszlai adoroszlai requested a review from lokeshj1703 May 16, 2022 15:40
Contributor

@smengcl smengcl left a comment


lgtm. Thanks @adoroszlai for the patch

Contributor

@lokeshj1703 lokeshj1703 left a comment


Thanks @adoroszlai for working on this! The changes look good to me. +1.

@adoroszlai adoroszlai merged commit fb09173 into apache:master May 17, 2022
@adoroszlai adoroszlai deleted the HDDS-6749 branch May 17, 2022 06:03
@adoroszlai
Contributor Author

Thanks @lokeshj1703, @smengcl, @swagle for the review.

errose28 added a commit to errose28/ozone that referenced this pull request May 20, 2022
* master: (96 commits)
  HDDS-6738. Migrate tests with rules in hdds-server-framework to JUnit5 (apache#3415)
  HDDS-6650. S3MultipartUpload support update bucket usedNamespace. (apache#3404)
  HDDS-6491. Support FSO keys in getExpiredOpenKeys (apache#3226)
  HDDS-6596. EC: Support ListBlock from CoordinatorDN (apache#3410)
  HDDS-6737. Migrate parameterized tests in hdds-server-framework to JUnit5 (apache#3414)
  HDDS-6660: EC: Add the DN side Reconstruction Handler class. (apache#3399)
  HDDS-6750. Migrate simple tests in hdds-server-scm to JUnit5 (apache#3417)
  HDDS-6749. SCM includes itself as peer in addSCM request (apache#3413)
  HDDS-6657. Improve Ozone integrated Ranger configuration instructions (apache#3365)
  HDDS-6742. Audit operation category mismatch (apache#3407)
  HDDS-6748. Intermittent timeout in TestECBlockReconstructedInputStream#testReadDataWithUnbuffer (apache#3416)
  HDDS-6731. Migrate simple tests in hdds-server-framework to JUnit5 (apache#3412)
  HDDS-5919. In kubernetes OM HA has circular dependency on service availability (apache#3185)
  HDDS-6730. Migrate tests in hdds-tools to JUnit5 (apache#3402)
  HDDS-6630. Explicitly remove node after being chosen (apache#3332)
  HDDS-6560. Add general Grafana dashboard (apache#3285)
  HDDS-6704. EC: ReplicationManager - create version of ContainerReplicaCounts applicable to EC (apache#3405)
  HDDS-6680. Pre-Finalize behaviour for Bucket Layout Feature. (apache#3377)
  HDDS-6619. Add freon command to run r/w mix workload using ObjectStore APIs (apache#3383)
  HDDS-6734. ozone admin pipeline list CLI is not backward compatible (apache#3406)
  ...

Conflicts:
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/metadata/SCMMetadataStore.java
hadoop-hdds/interface-server/src/main/proto/SCMRatisProtocol.proto
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/metadata/SCMDBDefinition.java
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/metadata/SCMMetadataStoreImpl.java
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
4 participants