@elek commented Apr 27, 2021

What changes were proposed in this pull request?

TestContainerDeletionChoosingPolicy#testRandomChoosingPolicy has been reported to fail intermittently.

Observed in this CI run: https://github.com/apache/ozone/pull/2179/checks?check_run_id=2423463857

The bundle is attached to the issue.

It seems that #1885 introduced a few problems. (This code was new to me, so I explain the changes here.)

-data.incrPendingDeletionBlocks(random.nextInt(numContainers) + 1);
+data.incrPendingDeletionBlocks(20);

It's better to use a fixed number of pending deletion blocks per container, as the test's behavior is very hard to predict otherwise.
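To illustrate why a fixed size makes the outcome predictable, here is a minimal sketch of a block-limited random selection. It assumes (this is not confirmed by the PR) that the policy visits containers in shuffled order and takes min(remaining limit, pending blocks) from each one until the limit is used up. BlockLimitedSelection and select are hypothetical names, not Ozone's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical, simplified model of a block-limited random choosing policy.
// An illustration under stated assumptions, not Ozone's implementation.
final class BlockLimitedSelection {

  // Returns how many blocks are taken from each selected container, in selection order.
  static List<Integer> select(int[] pendingBlocks, int blockLimit, Random random) {
    List<Integer> order = new ArrayList<>();
    for (int i = 0; i < pendingBlocks.length; i++) {
      order.add(i);
    }
    // Assumed random policy: visit the containers in shuffled order.
    Collections.shuffle(order, random);

    List<Integer> taken = new ArrayList<>();
    int remaining = blockLimit;
    for (int idx : order) {
      if (remaining == 0) {
        break; // block limit reached
      }
      int pending = pendingBlocks[idx];
      if (pending > 0) {
        // Assumed: take at most the blocks the limit still allows.
        int n = Math.min(remaining, pending);
        taken.add(n);
        remaining -= n;
      }
    }
    return taken;
  }

  public static void main(String[] args) {
    Random random = new Random();
    int numContainers = 10; // hypothetical container count

    // Fixed size: every container has 20 pending deletion blocks.
    int[] fixed = new int[numContainers];
    Arrays.fill(fixed, 20);
    System.out.println("fixed:  " + select(fixed, 50, random)); // always [20, 20, 10]

    // Random size, like the removed line: 1..numContainers blocks per container.
    int[] varied = new int[numContainers];
    for (int i = 0; i < numContainers; i++) {
      varied[i] = random.nextInt(numContainers) + 1;
    }
    System.out.println("varied: " + select(varied, 50, random)); // size may vary between runs
  }
}
```

With fixed sizes the shuffle only changes which containers are picked, never how many; with random sizes even the number of selected containers changes from run to run.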

 for (int i = 0; i < result1.size(); i++) {

That's the root cause of the problem. As the selection is based on the number of blocks, fewer containers may be returned than the number requested, so the loop has to iterate over the actual size of the result.
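A minimal sketch of the comparison step, assuming each result is reduced to a list of container IDs. The previous loop is not shown here, so sameOrderFixedBound only models the assumed pre-fix pattern of a loop bounded by a fixed count; ResultComparison and both method names are hypothetical.

```java
import java.util.List;

// Hypothetical helpers mirroring the comparison loop in the test.
final class ResultComparison {

  // Returns true if both selections picked the same container IDs in the same order.
  static boolean sameOrder(List<Long> result1, List<Long> result2) {
    if (result1.size() != result2.size()) {
      return false;
    }
    // Bound the loop by the actual result size: a block-limited selection
    // may return fewer containers than the number passed in.
    for (int i = 0; i < result1.size(); i++) {
      if (!result1.get(i).equals(result2.get(i))) {
        return false;
      }
    }
    return true;
  }

  // Assumed pre-fix pattern: a loop bounded by a fixed count.
  static boolean sameOrderFixedBound(List<Long> result1, List<Long> result2, int bound) {
    for (int i = 0; i < bound; i++) {
      // Throws IndexOutOfBoundsException once i reaches result1.size().
      if (!result1.get(i).equals(result2.get(i))) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<Long> r1 = List.of(4L, 7L, 1L);
    List<Long> r2 = List.of(4L, 2L, 9L);
    System.out.println(sameOrder(r1, r2)); // false
    try {
      sameOrderFixedBound(r1, r1, 10);
    } catch (IndexOutOfBoundsException e) {
      System.out.println("a fixed bound of 10 fails on a 3-element result");
    }
  }
}
```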

List<ContainerBlockInfo> result1 = blockDeletingService
              .chooseContainerForBlockDeletion(50, deletionPolicy);

Here we should pass the number of blocks (50) instead of the number of containers (10 earlier).
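A worked version of the arithmetic, assuming (not confirmed by this PR) that the policy takes min(remaining limit, pending blocks) from each container it visits. With 20 pending blocks per container and a limit of 50 blocks:

```math
\left\lceil \frac{50}{20} \right\rceil = 3 \text{ containers}, \qquad 50 = 20 + 20 + 10
```

Under the same assumption, passing 10 would be read as a 10-block limit and select a single container, since min(10, 20) = 10.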

for (int j = 0; j < 100; j++) {

As result1 and result2 each contain only 3 containers, it's better to check the randomness multiple times.
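A minimal sketch of why repeating the check helps. It assumes a random policy that shuffles the container IDs and keeps the first k; RepeatedRandomnessCheck, choose and sawShuffle are hypothetical names, and 10 containers is an illustrative count, not taken from the test. In this model a single comparison of two 3-container picks matches by chance with probability 1/720, a flaky failure rate that 100 trials push down to (1/720)^100.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of checking randomness over repeated trials.
final class RepeatedRandomnessCheck {

  // Assumed random policy: shuffle the container IDs and keep the first k.
  static List<Integer> choose(int numContainers, int k, Random random) {
    List<Integer> ids = new ArrayList<>();
    for (int i = 0; i < numContainers; i++) {
      ids.add(i);
    }
    Collections.shuffle(ids, random);
    return new ArrayList<>(ids.subList(0, k));
  }

  // Returns true if any trial produced two differently ordered selections.
  static boolean sawShuffle(int numContainers, int k, int trials, Random random) {
    for (int j = 0; j < trials; j++) {
      List<Integer> result1 = choose(numContainers, k, random);
      List<Integer> result2 = choose(numContainers, k, random);
      if (!result1.equals(result2)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    Random random = new Random();
    // With 3 of 10 containers chosen, one comparison matches by chance
    // with probability 1 / (10 * 9 * 8) = 1/720; 100 trials make that negligible.
    System.out.println(sawShuffle(10, 3, 100, random));
  }
}
```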

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-5147

How was this patch tested?

The unit test passed 7000 times locally. Without the patch, it fails within the first 50-100 executions.

@adoroszlai requested a review from @lokeshj1703 on April 27, 2021 12:15
@errose28 commented

Thanks for the quick fix, Marton! This LGTM +1.

@lokeshj1703 left a comment

@elek Thanks for working on this! The changes look good to me. +1.

lokeshj1703 pushed a commit that referenced this pull request Apr 28, 2021
@lokeshj1703 commented

@elek Thanks for the contribution! @errose28 Thanks for the review! I have committed the PR to the master branch.

errose28 added a commit to errose28/ozone that referenced this pull request May 4, 2021
…ing-upgrade-master-merge2

* upstream/master: (56 commits)
  HDDS-2212. Genconf tool should generate config files for secure clust… (apache#1788)
  HDDS-5166. Remove duplicate assignment of OZONE_OPTS for freon and sh (apache#2195)
  Revert "HDDS-5144. Create github check to alert when dependency tree is changed (apache#2177)"
  HDDS-4983. Display key offset for each block in command key info (apache#2051)
  HDDS-5144. Create github check to alert when dependency tree is changed (apache#2177)
  HDDS-4585. Support bucket acl operation in S3g (apache#1701)
  HDDS-5153. Decommissioning a dead node should complete immediately (apache#2190)
  HDDS-5147. Intermittent test failure in TestContainerDeletionChoosingPolicy#testRandomChoosingPolicy (apache#2188)
  HDDS-5152. Fix Suggested leader in Client. (apache#2189)
  HDDS-5148. Bump ratis version to 2.1.0-ff8aa66-SNAPSHOT (apache#2184)
  HDDS-4515. Datanodes should be able to persist and load CRL (apache#2181)
  HDDS-5060. [SCM HA Security] Make InterSCM grpc channel secure. (apache#2187)
  HDDS-5051. Ensure failover to suggested leader if any for NotLeaderException. (apache#2141)
  HDDS-5127. Fix getServiceList when SCM HA is enabled (apache#2173)
  HDDS-4889. Add simple CI check for docs (apache#2156)
  HDDS-5131. Use timeout in github actions (apache#2176)
  HDDS-5103. Fix Install Snapshot Mechanism in SCMStateMachine. (apache#2155)
  HDDS-5124. Use OzoneConsts.OZONE_TIME_ZONE instead of "GMT" (apache#2166)
  HDDS-5047. Refactor Pipeline to use ReplicationConfig instead of factor/type (apache#2096)
  HDDS-5083. Bump version of common-compress (apache#2139)
  ...

Conflicts:
	hadoop-hdds/common/pom.xml
	hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/OzoneConfigKeys.java
	hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/OzoneConsts.java
	hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/HddsDatanodeService.java
	hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/SCMNodeManager.java
	hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipeline/PipelineManager.java
	hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/SCMStorageConfig.java
	hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/StorageContainerManager.java
	hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/node/TestSCMNodeManager.java
	hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/MiniOzoneClusterImpl.java
	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OMStorage.java
	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OzoneManager.java
	hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/scm/ReconStorageContainerManagerFacade.java
errose28 added a commit to errose28/ozone that referenced this pull request May 13, 2021
…k-in-auth

* HDDS-3698-nonrolling-upgrade: (57 commits)
  Fix compilation errors after merge
  Update javassist in recon pom
  Fix changes introduced in merge that failed TestSCMNodeManager upgrade tests
  Fix checkstyle
  Fix intermittent test failure TestSCMNodeManager#testSetNodeOpStateAndCommandFired after merge
  Skip scm init default layout version in TestOzoneConfigurationFields
  ...