@elek
Member

@elek elek commented Oct 13, 2019

What changes were proposed in this pull request?

Fixing an intermittent unit test.

What is the problem

For example, from the nightly build:

  <testcase name="testNoFallback[8]" classname="org.apache.hadoop.hdds.scm.container.placement.algorithms.TestSCMContainerPlacementRackAware" time="0.014">
            <failure type="java.lang.AssertionError">java.lang.AssertionError
        	at org.junit.Assert.fail(Assert.java:86)
        	at org.junit.Assert.assertTrue(Assert.java:41)
        	at org.junit.Assert.assertTrue(Assert.java:52)
        	at org.apache.hadoop.hdds.scm.container.placement.algorithms.TestSCMContainerPlacementRackAware.testNoFallback(TestSCMContainerPlacementRackAware.java:276)
        	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        	at java.lang.reflect.Method.invoke(Method.java:498)
        	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)

The problem is in testNoFallback:

Let's say we have 11 nodes (from parameter) and we would like to choose 5 nodes (hard coded in the test).

As the first two replicas are chosen from the same rack and all the others from different racks, placing 5 replicas is not possible with only three racks, so we expect a failure.

However, the test also asserts that the success count is at least 3. This holds only if the first two replicas are placed on rack1 (5 nodes) or rack2 (5 nodes). If the first replica is placed on rack3 (one node), placement fails immediately:

Lucky case when we have success count > 3

 rack1 -- node1 
 rack1 -- node2 -- FIRST replica
 rack1 -- node3 -- SECOND replica
 rack1 -- node4
 rack1 -- node5 
 rack2 -- node6
 rack2 -- node7 -- THIRD replica
 rack2 -- node8
 rack2 -- node9 
 rack2 -- node10
 rack3 -- node11 -- FOURTH replica

The specific case where the success count == 1, because the second replica cannot be placed on rack3 (this is when the test fails):

 rack1 -- node1 
 rack1 -- node2
 rack1 -- node3
 rack1 -- node4
 rack1 -- node5 
 rack2 -- node6
 rack2 -- node7
 rack2 -- node8
 rack2 -- node9 
 rack2 -- node10
 rack3 -- node11 -- FIRST replica
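
To make the two scenarios concrete, here is a small, simplified Java model of the placement rule as described in this PR. It is not the actual SCMContainerPlacementRackAware code. The class name `PlacementSketch`, the `successCount` helper, and the deterministic "take the first eligible node" choice are assumptions made for illustration. Only the 5/5/1 rack layout and the rule "second replica on the first replica's rack, later replicas on racks not yet used" come from the description.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of the placement rule described in this PR.
public class PlacementSketch {
    // Rack of each node; index 0..10 corresponds to node1..node11.
    static final int[] RACK_OF_NODE = {1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3};

    /** Returns how many replicas get placed before placement fails. */
    static int successCount(int firstNode, int wanted) {
        Set<Integer> usedNodes = new HashSet<>();
        Set<Integer> usedRacks = new HashSet<>();
        usedNodes.add(firstNode);
        usedRacks.add(RACK_OF_NODE[firstNode]);
        int placed = 1;
        while (placed < wanted) {
            int next = -1;
            for (int n = 0; n < RACK_OF_NODE.length && next < 0; n++) {
                if (usedNodes.contains(n)) {
                    continue;
                }
                // Second replica: same rack as the first one.
                // Later replicas: a rack that holds no replica yet.
                boolean eligible = (placed == 1)
                    ? RACK_OF_NODE[n] == RACK_OF_NODE[firstNode]
                    : !usedRacks.contains(RACK_OF_NODE[n]);
                if (eligible) {
                    next = n;
                }
            }
            if (next < 0) {
                break; // no eligible node left, placement fails here
            }
            usedNodes.add(next);
            usedRacks.add(RACK_OF_NODE[next]);
            placed++;
        }
        return placed;
    }

    public static void main(String[] args) {
        for (int first = 0; first < RACK_OF_NODE.length; first++) {
            System.out.println("first replica on node" + (first + 1)
                + " -> success count " + successCount(first, 5));
        }
    }
}
```

In this model, a first pick on rack1 or rack2 gives a success count of 4, while a first pick on node11 gives 1, which is the case that breaks the "at least 3" assertion. If the first node is picked uniformly at random (an assumption here), that is a 1-in-11 chance per run.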

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-2289

How was this patch tested?

With IntelliJ you can run the unit test repeatedly (1000x) or until the first failure. Run it with and without the patch. Without the patch, I usually hit the problem within the first 100 executions.

@elek
Member Author

elek commented Oct 13, 2019

/retest

Contributor

@adoroszlai adoroszlai left a comment

Copying my +1 from hadoop repo:

Thanks @elek for digging into this intermittent issue.

@adoroszlai
Contributor

@ChenSammi could you please review this fix when you have some time?

@xiaoyuyao
Contributor

Thanks @elek for fixing this and @adoroszlai for the review. The change LGTM. I will merge it shortly.

@xiaoyuyao xiaoyuyao merged commit fdd1b15 into apache:master Oct 22, 2019