Conversation

@simplynaveen20
Member

This PR contains miscellaneous fixes for disaster recovery.

  1. Force a refresh of the database account on a write-forbidden exception.
  2. Force a refresh when a read request fails because the endpoint is unavailable and none of the preferred regions listed in the connection policy is available.
  3. If we hit a scenario such as region gone or a write-forbidden exception a second time, the code did not refresh the cache, which means it could handle only one disaster recovery event.
  4. One bug is still open: we do not run the background location refresh if the account is multi-master or the user has not set useMultipleWriteLocations = true. This needs discussion.
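
As a rough illustration of items 1 to 3, the refresh decision can be sketched as a stateless check that is evaluated on every failure, so a second region-gone or write-forbidden event triggers a refresh just like the first. This is only a sketch under assumptions: `FailureKindSketch` and `RefreshDecisionSketch` are hypothetical names, not types from the SDK, and the real change lives in the SDK's retry and location-cache code.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical failure categories; the names are illustrative, not SDK types. */
enum FailureKindSketch { WRITE_FORBIDDEN, READ_ENDPOINT_UNAVAILABLE, OTHER }

/** Hypothetical helper that decides whether to force a database account refresh. */
final class RefreshDecisionSketch {
    private RefreshDecisionSketch() { }

    /**
     * Stateless on purpose: the decision is re-evaluated for every failure,
     * so repeated failovers are handled the same way as the first one.
     */
    static boolean shouldForceRefresh(FailureKindSketch kind,
                                      List<String> preferredRegions,
                                      Set<String> unavailableRegions) {
        switch (kind) {
            case WRITE_FORBIDDEN:
                // Item 1: always refresh on a write-forbidden exception.
                return true;
            case READ_ENDPOINT_UNAVAILABLE:
                // Item 2: refresh only when no preferred region is still available.
                // An empty preference list also yields true (assumption of this sketch).
                return unavailableRegions.containsAll(preferredRegions);
            default:
                return false;
        }
    }
}
```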

@simplynaveen20
Member Author

/azp run java - cosmos - tests

@azure-pipelines

Commenter does not have sufficient privileges for PR 6139 in repo Azure/azure-sdk-for-java

@kushagraThapar
Member

/azp run java - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Contributor

@moderakh moderakh left a comment

Please consider running the following scenario as a test:

  1. Create a multi-master account with two regions enabled (eastus, westus).
  2. Start ReadMyWrites with the preferred region set to eastus.
  3. Remove the eastus region from the portal.
  4. Let the ReadMyWrites test keep running until eastus is fully removed from the portal (wait 20 min).
  5. Verify that we are sending requests to westus (from the log).
  6. Add the eastus region again from the portal.
  7. Wait until adding the eastus region completes.
  8. Wait 20 min.
  9. Verify that we are targeting eastus (from the log).
  10. Remove the westus region from the portal.
  11. Verify that ReadMyWrites doesn't have any failures.
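
The region each verification step expects can be written down as a small helper, which may be handy when checking the logs. This is a minimal sketch, not the SDK's routing logic: `ExpectedRegionSketch` is a hypothetical name, and falling back to the first account region when no preferred region exists is an assumption.

```java
import java.util.List;

/** Hypothetical helper that computes the region the scenario expects requests to target. */
final class ExpectedRegionSketch {
    private ExpectedRegionSketch() { }

    /**
     * Returns the first preferred region that currently exists on the account.
     * If none exists, falls back to the first account region (assumption of this sketch).
     * Returns null when the account has no regions at all.
     */
    static String expectedTarget(List<String> preferredRegions, List<String> accountRegions) {
        for (String region : preferredRegions) {
            if (accountRegions.contains(region)) {
                return region;
            }
        }
        return accountRegions.isEmpty() ? null : accountRegions.get(0);
    }
}
```

With the preferred region set to eastus, the helper yields westus while eastus is removed (step 5) and eastus again once it has been added back (step 9).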

@simplynaveen20
Member Author

/azp run java - cosmos - tests

@mbhaskar
Member

mbhaskar commented Nov 6, 2019

/azp run java - cosmos - tests

@kushagraThapar
Member

/azp run java - cosmos - tests

@JimSuplizio
Contributor

/azp run java - cosmos - ci

@moderakh moderakh self-requested a review November 8, 2019 01:22
Member Author

@simplynaveen20 simplynaveen20 left a comment

I ran the scenario @moderakh described above and sent the results over email. It works as expected.

@moderakh
Contributor

moderakh commented Nov 8, 2019

/azp run java - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@kushagraThapar
Member

/azp run java - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

This was referenced Nov 11, 2019
@kirankumarkolli
Member

@moderakh thoughts on UT?

@moderakh
Contributor

moderakh commented Nov 15, 2019

@kirankumarkolli for new customers we always need to validate the multi-region failover scenario.
I have thought before about how we could automate this to save us time.

My thoughts on automated testing:
We can have a test that uses the Java Azure management SDK to add and remove a region (the scenario above) and, while that is happening, validates in a loop that some basic operations (read/write/query) keep working during failover.
This should be automatable and will save us time. If we decide to add this automated test, I would estimate half a week of work for the test setup.
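
The validation loop could look roughly like the sketch below. It only covers the loop itself: `FailoverProbeSketch` and `Op` are hypothetical names, the operations are plain callbacks standing in for real read/write/query calls, and the management SDK calls that add or remove a region are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Hypothetical probe that runs basic operations repeatedly and records failures. */
final class FailoverProbeSketch {
    private FailoverProbeSketch() { }

    /** A named operation (for example read, write or query) that returns true on success. */
    static final class Op {
        final String name;
        final BooleanSupplier action;

        Op(String name, BooleanSupplier action) {
            this.name = name;
            this.action = action;
        }
    }

    /**
     * Runs every operation once per iteration and returns one "name@iteration"
     * entry for each operation that returned false or threw.
     */
    static List<String> runLoop(List<Op> ops, int iterations) {
        List<String> failures = new ArrayList<>();
        for (int i = 0; i < iterations; i++) {
            for (Op op : ops) {
                boolean ok;
                try {
                    ok = op.action.getAsBoolean();
                } catch (RuntimeException e) {
                    // Treat any exception as a failed operation for this iteration.
                    ok = false;
                }
                if (!ok) {
                    failures.add(op.name + "@" + i);
                }
            }
        }
        return failures;
    }
}
```

A real test would run this loop while the region change is in progress and assert that the returned list is empty.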

@simplynaveen20
Member Author

/azp run java - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@simplynaveen20
Member Author

/azp run java - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@simplynaveen20
Member Author

/azp run java - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@kushagraThapar
Member

/azp run java - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Member

@kushagraThapar kushagraThapar left a comment

lgtm

@kushagraThapar kushagraThapar merged commit c96f52b into Azure:master Dec 3, 2019
xseeseesee pushed a commit that referenced this pull request Dec 10, 2019
* DR fixes for java sdk

* correct comment texts

* fixing existing location cache test

* formatting change

* correct text in mock account name

* test fix as per #6352

* removing flaky assert from the test case

6 participants