Skip to content
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -406,6 +406,7 @@ public void testOMRetryCache() throws Exception {
Assert.assertTrue(logCapturer.getOutput().contains("created volume:"
+ volumeName));

Thread.sleep(100L);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for this sleep?
Is created volume: logged more than once?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error occurred on line 423,I think that when logCapturer.clearOutput(); is executed, there will be a situation where the previous instruction has not been completed, which will cause the data in the cache to not be completely cleared.This will cause this error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there will be a situation where the previous instruction has not been completed

If that is so, the assertion above shouldn't fail?

  Assert.assertTrue(logCapturer.getOutput().contains("created volume:"
        + volumeName));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error is Assert.assertFalse(logCapturer.getOutput().contains("created volume:" + volumeName));,
not Assert.assertTrue(logCapturer.getOutput().contains("created volume:" + volumeName));
So I think this problem is caused by not being cleared properly.
Screenshot from 2020-11-27 17-11-09

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the assert true ensures, that created volume is loged once for the volume, then it clears the log. Then on the same request, it expects the same log line isn't logged again. I think the create Volume line is logged more than once. So, it passes assertTrue. since the line was logged atleast once, then clears it and the other occurrences come post clearing it, thus the failure.

I was running the test, and it logged multiple time -

2020-11-28 12:51:24,067 [OM StateMachine ApplyTransaction Thread - 0] INFO  volume.OMVolumeCreateRequest (OMVolumeCreateRequest.java:validateAndUpdateCache(196)) - created volume:7249ac13-c180-40f8-a636-8d4eb481ca48 for user:ayushsaxena
2020-11-28 12:51:24,067 [OM StateMachine ApplyTransaction Thread - 0] INFO  volume.OMVolumeCreateRequest (OMVolumeCreateRequest.java:validateAndUpdateCache(196)) - created volume:7249ac13-c180-40f8-a636-8d4eb481ca48 for user:ayushsaxena
2020-11-28 12:51:24,067 [OM StateMachine ApplyTransaction Thread - 0] INFO  volume.OMVolumeCreateRequest (OMVolumeCreateRequest.java:validateAndUpdateCache(196)) - created volume:7249ac13-c180-40f8-a636-8d4eb481ca48 for user:ayushsaxena

So chances are we cleared the log, after first occurrence.
Do you have the logs when this test fails, Can you check this there whether created volume is logged multiple times, I think it is getting logged once per OM.(Wild Guess)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If in case you are able to repro, try something like this -

-    Assert.assertTrue(logCapturer.getOutput().contains("created volume:"
-        + volumeName));
+    GenericTestUtils.waitFor(() -> StringUtils
+        .countMatches(logCapturer.getOutput(), "created volume:" + volumeName)
+        == getCluster().getOzoneManagersList().size() + 1, 100, 5000);

If this fixes...

Copy link
Contributor

@bharatviswa504 bharatviswa504 Dec 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fixed, the issue was previously it used to check Create Volume, but as it is running on all 3 OM's So, other OM's when they run they have this in the log, even after log capture clear. I have changed the log output condition check, which has resolved this issue test failing.

For more info refer below commit

HDDS-4262. Use ClientID and CallID from Rpc Client to detect retry requests (#1436)

I think it is getting logged once per OM.(Wild Guess)

Yes it is logged once per OM.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is still failing
https://github.com/apache/ozone/pull/1538/checks?check_run_id=1336371166#step:4:3235

There is one more link in the jira, I think it is failing in the code added as part of HDDS-4262?

Copy link
Contributor

@bharatviswa504 bharatviswa504 Dec 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I gave the wrong Jira reference.
The issue is fixed by below commit.
4669043

This is fixed by HDDS-4329, and the link provided run is Oct 31st, and HDDS-4329 got committed Dec 8th, 2020.

Check is changed as below in the test.

 // If 2nd time executed, it will fail with Volume creation failed. check
    // for that.
    Assert.assertFalse(logCapturer.getOutput().contains(
        "Volume creation failed"));

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanx Bharat, Yes, This one should solve the issue according to me.
@wycccccc can you give a try as well, If it is working for you as well, we can close this.
Thanx for the efforts here. :-)

logCapturer.clearOutput();

raftClientReply =
Expand Down