Skip to content

Conversation

@guihecheng
Copy link
Contributor

@guihecheng guihecheng commented May 12, 2021

What changes were proposed in this pull request?

Datanode has config 'dfs.datanode.failed.volumes.tolerated', but it does not work,
this makes it to work.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-5219

How was this patch tested?

a new integration-test case
an extended ut

@guihecheng guihecheng force-pushed the HDDS-5219 branch 3 times, most recently from 005f3b0 to 34cf431 Compare May 14, 2021 02:03
@guihecheng
Copy link
Contributor Author

Hi @bshashikant @ChenSammi , the config support is added as you suggested, please help review this, thanks~

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DFSConfigKeysLegacy is a deprecated Class. Since Ozone is spinned out from Hadoop, as a new project, we should avoid use HDFS configuration key anymore. We can define a similar key for Ozone.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChenSammi oh, then I shall add this key in some other place.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indent is incorrect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference of if else here, could you do a little explaination?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is borrowed from hdfs, when we set this 'tolerated' to -1, we mean unlimited number of bad volumes but we still should have at least 1 good volume left.
Otherwise, we should have fewer or equal number of bad volumes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some comments here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bshashikant sure, will cp the lines above into code comments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A more detail error message is preferred here since we know the right reason.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, I'll put a more clear message here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer more detail error message, such as how may volumes configured, and how many failed volumes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure.

@bshashikant
Copy link
Contributor

@guihecheng , thanks for the work. The config needs to be defined in ozone-site.xml and should be well documented in ozone-default.xml.

@guihecheng guihecheng force-pushed the HDDS-5219 branch 2 times, most recently from b00c2a1 to fc40473 Compare May 24, 2021 06:10
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel, we should make the default as -1 rather than zero by default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bshashikant here I borrowed the default 0 from HDFS to stay aligned, but I also think that -1 is reasonable.
With -1 as the default, we introduce no behavioral change to existing clusters when we have a few(1~2) volumes failed.
Will update, thanks.

Copy link
Contributor

@bshashikant bshashikant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good with minor suggestion.

@guihecheng
Copy link
Contributor Author

@bshashikant updated, thanks

@bshashikant bshashikant merged commit 2671b48 into apache:master May 26, 2021
errose28 added a commit to errose28/ozone that referenced this pull request Jun 1, 2021
…ing-upgrade-master-merge

* upstream/master: (76 commits)
  HDDS-5280. Make XceiverClientManager creation when necessary in ContainerOperationClient (apache#2289)
  HDDS-5272. Make ozonefs.robot execution repeatable (apache#2280)
  HDDS-5123. Use the pre-created apache/ozone-testkrb5 image during secure acceptance tests (apache#2165)
  HDDS-4993. Add guardrail for reserved buffer size when DN reads a chunk (apache#2058)
  HDDS-4936. Change ozone groupId from org.apache.hadoop to org.apache.ozone (apache#2018)
  HDDS-4043. allow deletion from Trash directory without -skipTrash option (apache#2110)
  HDDS-4927. Determine over and under utilized datanodes in Container Balancer. (apache#2230)
  HDDS-5273. Handle unsecure cluster convert to secure cluster for SCM. (apache#2281)
  HDDS-5158. Add documentation for SCM HA Security. (apache#2205)
  HDDS-5275. Datanode Report Publisher publishes one extra report after DN shutdown (apache#2283)
  HDDS-5241. SCM UI should have leader/follower and Primordial SCM information (apache#2260)
  HDDS-5219. Limit number of bad volumes by dfs.datanode.failed.volumes.tolerated. (apache#2243)
  HDDS-5252. PipelinePlacementPolicy filter out datanodes with not enough space. (apache#2271)
  HDDS-5191. Increase default pvc storage size (apache#2219)
  HDDS-5073. Use ReplicationConfig on client side  (apache#2136)
  HDDS-5250. Build integration tests with Maven cache (apache#2269)
  HDDS-5236. Require block token for more operations (apache#2254)
  HDDS-5266 Misspelt words in S3MultipartUploadCommitPartRequest.java line 202 (apache#2279)
  HDDS-5249. Race Condition between Full and Incremental Container Reports (apache#2268)
  HDDS-5142. Make generic streaming client/service for container re-replication, data read, scm/om snapshot download (apache#2256)
  ...

Conflicts:
	hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/protocol/StorageContainerLocationProtocol.java
	hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/protocolPB/StorageContainerLocationProtocolClientSideTranslatorPB.java
	hadoop-hdds/interface-admin/src/main/proto/ScmAdminProtocol.proto
	hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
	hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/MockNodeManager.java
	hadoop-ozone/dist/src/main/compose/testlib.sh
	hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestStorageContainerManager.java
	hadoop-ozone/interface-client/src/main/proto/OmClientProtocol.proto
	hadoop-ozone/ozone-manager/pom.xml
	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/ratis/utils/OzoneManagerRatisUtils.java
	hadoop-ozone/s3gateway/pom.xml
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants