Skip to content

Conversation

@wycccccc
Copy link
Contributor

@wycccccc wycccccc commented Nov 12, 2020

What changes were proposed in this pull request?

Fix the bug about org.apache.hadoop.ozone.om.TestOzoneManagerHAMetadataOnly.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-4428

How was this patch tested?

In my original test, an error occurred on average 30 times,but now there are no more errors in several tests.
Screenshot from 2020-11-12 18-05-03
``

Assert.assertTrue(logCapturer.getOutput().contains("created volume:"
+ volumeName));

Thread.sleep(100L);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for this sleep?
Is created volume: logged more than once?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error occurred on line 423,I think that when logCapturer.clearOutput(); is executed, there will be a situation where the previous instruction has not been completed, which will cause the data in the cache to not be completely cleared.This will cause this error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there will be a situation where the previous instruction has not been completed

If that is so, the assertion above shouldn't fail?

  Assert.assertTrue(logCapturer.getOutput().contains("created volume:"
        + volumeName));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error is Assert.assertFalse(logCapturer.getOutput().contains("created volume:" + volumeName));,
not Assert.assertTrue(logCapturer.getOutput().contains("created volume:" + volumeName));
So I think this problem is caused by not being cleared properly.
Screenshot from 2020-11-27 17-11-09

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the assert true ensures, that created volume is loged once for the volume, then it clears the log. Then on the same request, it expects the same log line isn't logged again. I think the create Volume line is logged more than once. So, it passes assertTrue. since the line was logged atleast once, then clears it and the other occurrences come post clearing it, thus the failure.

I was running the test, and it logged multiple time -

2020-11-28 12:51:24,067 [OM StateMachine ApplyTransaction Thread - 0] INFO  volume.OMVolumeCreateRequest (OMVolumeCreateRequest.java:validateAndUpdateCache(196)) - created volume:7249ac13-c180-40f8-a636-8d4eb481ca48 for user:ayushsaxena
2020-11-28 12:51:24,067 [OM StateMachine ApplyTransaction Thread - 0] INFO  volume.OMVolumeCreateRequest (OMVolumeCreateRequest.java:validateAndUpdateCache(196)) - created volume:7249ac13-c180-40f8-a636-8d4eb481ca48 for user:ayushsaxena
2020-11-28 12:51:24,067 [OM StateMachine ApplyTransaction Thread - 0] INFO  volume.OMVolumeCreateRequest (OMVolumeCreateRequest.java:validateAndUpdateCache(196)) - created volume:7249ac13-c180-40f8-a636-8d4eb481ca48 for user:ayushsaxena

So chances are we cleared the log, after first occurrence.
Do you have the logs when this test fails, Can you check this there whether created volume is logged multiple times, I think it is getting logged once per OM.(Wild Guess)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If in case you are able to repro, try something like this -

-    Assert.assertTrue(logCapturer.getOutput().contains("created volume:"
-        + volumeName));
+    GenericTestUtils.waitFor(() -> StringUtils
+        .countMatches(logCapturer.getOutput(), "created volume:" + volumeName)
+        == getCluster().getOzoneManagersList().size() + 1, 100, 5000);

If this fixes...

Copy link
Contributor

@bharatviswa504 bharatviswa504 Dec 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fixed, the issue was previously it used to check Create Volume, but as it is running on all 3 OM's So, other OM's when they run they have this in the log, even after log capture clear. I have changed the log output condition check, which has resolved this issue test failing.

For more info refer below commit

HDDS-4262. Use ClientID and CallID from Rpc Client to detect retry requests (#1436)

I think it is getting logged once per OM.(Wild Guess)

Yes it is logged once per OM.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is still failing
https://github.com/apache/ozone/pull/1538/checks?check_run_id=1336371166#step:4:3235

There is one more link in the jira, I think it is failing in the code added as part of HDDS-4262?

Copy link
Contributor

@bharatviswa504 bharatviswa504 Dec 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I gave the wrong Jira reference.
The issue is fixed by below commit.
4669043

This is fixed by HDDS-4329, and the link provided run is Oct 31st, and HDDS-4329 got committed Dec 8th, 2020.

Check is changed as below in the test.

 // If 2nd time executed, it will fail with Volume creation failed. check
    // for that.
    Assert.assertFalse(logCapturer.getOutput().contains(
        "Volume creation failed"));

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanx Bharat, Yes, This one should solve the issue according to me.
@wycccccc can you give a try as well, If it is working for you as well, we can close this.
Thanx for the efforts here. :-)

@wycccccc wycccccc requested a review from ayushtkn November 26, 2020 08:36
Copy link
Contributor

@bharatviswa504 bharatviswa504 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I beleive we don't require the changes, this is fixed as part od HDDS-4262.

Are you still seeing this issue, In recent runs, I don't see this test failing?

Assert.assertTrue(logCapturer.getOutput().contains("created volume:"
+ volumeName));

Thread.sleep(100L);
Copy link
Contributor

@bharatviswa504 bharatviswa504 Dec 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fixed, the issue was previously it used to check Create Volume, but as it is running on all 3 OM's So, other OM's when they run they have this in the log, even after log capture clear. I have changed the log output condition check, which has resolved this issue test failing.

For more info refer below commit

HDDS-4262. Use ClientID and CallID from Rpc Client to detect retry requests (#1436)

I think it is getting logged once per OM.(Wild Guess)

Yes it is logged once per OM.

@wycccccc
Copy link
Contributor Author

Thanx @ayushtkn @bharatviswa504 review. I agree with you that the issue has been resolved.So I will close this pr.

@wycccccc wycccccc closed this Dec 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants