Skip to content

Conversation

@adoroszlai
Copy link
Contributor

What changes were proposed in this pull request?

datanode.replication.limit < datanode.reconstruction.weight should not be allowed in configuration, otherwise EC reconstruction will never be scheduled.

https://issues.apache.org/jira/browse/HDDS-8898

How was this patch tested?

Tested by setting replication limit to 2:

scm_1       | [main] ERROR server.StorageContainerManagerStarter: SCM start failed with exception
scm_1       | java.lang.IllegalArgumentException: replicationLimit should be >= reconstruction weight 3, but was set to 2
scm_1       | 	at org.apache.hadoop.hdds.scm.container.replication.ReplicationManager$ReplicationManagerConfiguration.validate(ReplicationManager.java:1340)
scm_1       | 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
scm_1       | 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
scm_1       | 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
scm_1       | 	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
scm_1       | 	at org.apache.hadoop.hdds.conf.ConfigurationReflectionUtil.callPostConstruct(ConfigurationReflectionUtil.java:220)
scm_1       | 	at org.apache.hadoop.hdds.conf.ConfigurationSource.getObject(ConfigurationSource.java:180)
scm_1       | 	at org.apache.hadoop.hdds.scm.container.replication.LegacyReplicationManager.<init>(LegacyReplicationManager.java:313)
scm_1       | 	at org.apache.hadoop.hdds.scm.server.StorageContainerManager.initializeSystemManagers(StorageContainerManager.java:778)
scm_1       | 	at org.apache.hadoop.hdds.scm.server.StorageContainerManager.<init>(StorageContainerManager.java:393)
scm_1       | 	at org.apache.hadoop.hdds.scm.server.StorageContainerManager.createSCM(StorageContainerManager.java:585)
scm_1       | 	at org.apache.hadoop.hdds.scm.server.StorageContainerManager.createSCM(StorageContainerManager.java:597)

@adoroszlai adoroszlai self-assigned this Jun 22, 2023
@adoroszlai adoroszlai added the EC label Jun 22, 2023
}
if (datanodeReplicationLimit < reconstructionCommandWeight) {
throw new IllegalArgumentException("replicationLimit should be >= "
+ "reconstruction weight " + reconstructionCommandWeight + ", but "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the above if check, it is denoting it as "reconstructionCommandWeight", can we change

+ "reconstruction weight " + reconstructionCommandWeight + ", but "

to

+ "reconstructionCommandWeight: " + reconstructionCommandWeight + ", but "

Copy link
Contributor Author

@adoroszlai adoroszlai Jun 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Note that names like reconstructionCommandWeight are meaningful only in Java code. Ideally we should reference them by property name to help users fix such config errors.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, I was also thinking, why they didn't add the config name itself in the above one, ideally it should be config name only

@adoroszlai adoroszlai merged commit 601fadf into apache:master Jun 22, 2023
@adoroszlai adoroszlai deleted the HDDS-8898 branch June 22, 2023 16:44
@adoroszlai
Copy link
Contributor Author

Thanks @ayushtkn, @sodonnel for the review.

errose28 added a commit to errose28/ozone that referenced this pull request Jun 25, 2023
* master: (96 commits)
  HDDS-8586 Recon. - API for Count of deletePending keys and amount of data mapped to such keys. (apache#4923)
  HDDS-8908. Intermittent failure in TestBlockDeletion#testBlockDeletion (apache#4958)
  HDDS-8910. Replace LockManager with striped lock in ContainerStateManager (apache#4962)
  HDDS-8917. Move protobuf conversion out of the lock in PipelineStateManagerImpl (apache#4965)
  HDDS-8825. Use apache/hadoop 3.3.5 docker image (apache#4963)
  HDDS-8906. Avoid stream when getting in-service healthy nodes (apache#4960)
  HDDS-8907. Store volume count when storage report is updated (apache#4957)
  HDDS-8905. PipelineManager metrics should not be synchronized (apache#4959)
  HDDS-8553. Improve scanner integration tests. (apache#4936)
  HDDS-8854. Avoid unnecessary DatanodeDetails creation for NodeStateManager lookup (apache#4925)
  HDDS-8315. [Snapshot] Added unit tests for SnapshotDiffManager (apache#4716)
  HDDS-7968. [Snapshot] Improve KeyDeletingService to reclaim eligible key blocks in snapshot's deletedTable (apache#4935)
  HDDS-8838. Update default datanode check empty containter on disk to false (apache#4937)
  HDDS-8763. Support RocksDB iterator with ByteBuffer. (apache#4942)
  HDDS-8543. FSO directory should reflect bucket/cluster default replication (apache#4947)
  HDDS-8898. Replication limit should not be less than reconstruction weight (apache#4954)
  HDDS-8739. Snapdiff should return complete absolute path in Diff Entry (apache#4823)
  HDDS-8908. Mark TestBlockDeletion#testBlockDeletion as flaky
  HDDS-8534. Support asynchronous service logging (apache#4663)
  HDDS-8879. Cleanup SecurityConfig and related class initialization (apache#4921)
  ...
errose28 added a commit to errose28/ozone that referenced this pull request Jun 26, 2023
* tmp-dir-refactor: (99 commits)
  HDDS-8586 Recon. - API for Count of deletePending keys and amount of data mapped to such keys. (apache#4923)
  Fix SCM HA finalization compat test
  HDDS-8908. Intermittent failure in TestBlockDeletion#testBlockDeletion (apache#4958)
  HDDS-8910. Replace LockManager with striped lock in ContainerStateManager (apache#4962)
  HDDS-8917. Move protobuf conversion out of the lock in PipelineStateManagerImpl (apache#4965)
  HDDS-8825. Use apache/hadoop 3.3.5 docker image (apache#4963)
  HDDS-8906. Avoid stream when getting in-service healthy nodes (apache#4960)
  HDDS-8907. Store volume count when storage report is updated (apache#4957)
  HDDS-8905. PipelineManager metrics should not be synchronized (apache#4959)
  HDDS-8553. Improve scanner integration tests. (apache#4936)
  HDDS-8854. Avoid unnecessary DatanodeDetails creation for NodeStateManager lookup (apache#4925)
  HDDS-8315. [Snapshot] Added unit tests for SnapshotDiffManager (apache#4716)
  HDDS-7968. [Snapshot] Improve KeyDeletingService to reclaim eligible key blocks in snapshot's deletedTable (apache#4935)
  HDDS-8838. Update default datanode check empty containter on disk to false (apache#4937)
  HDDS-8763. Support RocksDB iterator with ByteBuffer. (apache#4942)
  HDDS-8543. FSO directory should reflect bucket/cluster default replication (apache#4947)
  HDDS-8898. Replication limit should not be less than reconstruction weight (apache#4954)
  HDDS-8739. Snapdiff should return complete absolute path in Diff Entry (apache#4823)
  HDDS-8908. Mark TestBlockDeletion#testBlockDeletion as flaky
  HDDS-8534. Support asynchronous service logging (apache#4663)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants