Skip to content

Conversation

@bharatviswa504
Copy link
Contributor

Add a docker compose OM HA cluster with S3.
Run the S3 test suite on the OM HA cluster.
https://issues.apache.org/jira/browse/HDDS-2278

@bharatviswa504 bharatviswa504 requested review from arp7 and elek October 15, 2019 02:34
@bharatviswa504 bharatviswa504 changed the title HDDS-2278. Run S3 test suite on OM HA cluste. HDDS-2278. Run S3 test suite on OM HA cluster. Oct 15, 2019
@bharatviswa504 bharatviswa504 self-assigned this Oct 15, 2019
Copy link
Member

@elek elek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 thanks the patch @bharatviswa504

Note: While I am happy to have more and more environments, I feel that it's harder and harder to manage them and some of them are confusing. I am thinking about to merge some of the environments, for example include s3 in the simple ozone (and ozone-om-ha) and delete ozone-s3.

S3 doesn't have huge resource requirement and very core of our provided functionality.

This is not related to this patch (as this patch follows the current practice) but I am very interested about your opinion.

@elek
Copy link
Member

elek commented Oct 16, 2019

UPDATE: it's failing during my local test all the time. the ozones3 is working but not the new tests. Let's try to request a new test run...

@elek
Copy link
Member

elek commented Oct 16, 2019

/retest

@bharatviswa504
Copy link
Contributor Author

Thank You @elek for the review.
I see newly acceptance tests are passing. Let me know if you still see errors in your local run.

@bharatviswa504
Copy link
Contributor Author

+1 thanks the patch @bharatviswa504

Note: While I am happy to have more and more environments, I feel that it's harder and harder to manage them and some of them are confusing. I am thinking about to merge some of the environments, for example include s3 in the simple ozone (and ozone-om-ha) and delete ozone-s3.

S3 doesn't have huge resource requirement and very core of our provided functionality.

This is not related to this patch (as this patch follows the current practice) but I am very interested about your opinion.

Yes, I am with you on this, S3 now became core functionality, we don't need a separate environment for it, we can include S3 in Ozone compose. And also this helps in reducing run time of our acceptance test suite.

@bharatviswa504
Copy link
Contributor Author

@elek can we commit this or still some issues you are seeing?

@elek
Copy link
Member

elek commented Oct 18, 2019

I see a problem locally, which is passed on the server. Don't know what is the exact problem...

@adoroszlai
Copy link
Contributor

@bharatviswa504 @elek I think the problem is with the upgraded Ratis (#26 (review)). Leader election problem affects OM HA.

  1. The pull request source branch is based on 920dde9, which precedes Ratis upgrade 546e36b. Test passes locally when checking out this branch. This is what CI is testing.
  2. Applying the patch on current master (2529cee), the test fails locally. Log is full of leader election attempts:
om1_1       | 2019-10-18 15:55:51,883 INFO impl.RaftServerImpl: om1@group-562213E44849: changes role from CANDIDATE to FOLLOWER at term 3601 for DISCOVERED_A_NEW_TERM
...
om1_1       | 2019-10-18 15:55:52,026 INFO impl.RaftServerImpl: om1@group-562213E44849: changes role from  FOLLOWER to CANDIDATE at term 3601 for changeToCandidate
...
om1_1       | 2019-10-18 15:55:52,038 INFO impl.RaftServerImpl: om1@group-562213E44849: changes role from CANDIDATE to FOLLOWER at term 3602 for DISCOVERED_A_NEW_TERM

@bharatviswa504
Copy link
Contributor Author

Thank You @adoroszlai for the analysis.
I will wait until we upgrade to the latest Ratis version to commit this.

@bharatviswa504
Copy link
Contributor Author

Rebased with the latest master(after ratis version upgrade)

@bharatviswa504
Copy link
Contributor Author

/retest

@bharatviswa504
Copy link
Contributor Author

Newly added tests are passing, test failures are not related to this patch.

Copy link
Contributor

@dineshchitlangia dineshchitlangia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 LGTM, failures are unrelated

@dineshchitlangia dineshchitlangia merged commit 1baa5a1 into apache:master Oct 23, 2019
@dineshchitlangia
Copy link
Contributor

Thanks @elek and @adoroszlai for reviews. Thanks @bharatviswa504 for this contribution. Committed this change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants