
Conversation

@xinlian12 (Member) commented Oct 6, 2020

This PR is mostly a duplicate of the ConnectionStateListener change included in PR #14697, with a few differences.

Implementation:
Each RntbdServiceEndpoint has a RntbdConnectionStateListener that tracks the set of PartitionKeyRangeIdentity values routed through that endpoint. A new PartitionKeyRangeIdentity is added whenever RntbdServiceEndpoint.request() is called.

When we detect that a server is going down, we remove all of the affected PartitionKeyRangeIdentity entries from the gateway address cache. Currently, only ClosedChannelException triggers onConnectionEvent, since it is the most reliable signal that the server is going down.
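
A minimal sketch of the tracking-and-eviction idea, assuming a hypothetical ConnectionStateListenerSketch class and a Consumer-based eviction callback (the real RntbdConnectionStateListener differs in its exact shape):

```java
// Illustrative sketch only; not the SDK's actual RntbdConnectionStateListener.
import java.nio.channels.ClosedChannelException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

import com.azure.cosmos.implementation.PartitionKeyRangeIdentity;

public final class ConnectionStateListenerSketch {

    // partition key ranges that have been routed through this endpoint
    private final Set<PartitionKeyRangeIdentity> partitionAddressCache =
        ConcurrentHashMap.newKeySet();

    // callback that evicts the gateway address cache entry for one partition key range
    private final Consumer<PartitionKeyRangeIdentity> addressCacheRemover;

    public ConnectionStateListenerSketch(Consumer<PartitionKeyRangeIdentity> addressCacheRemover) {
        this.addressCacheRemover = addressCacheRemover;
    }

    // Called from the endpoint's request path: track the partition key range of each request.
    public void onBeforeSendRequest(PartitionKeyRangeIdentity pkRangeIdentity) {
        this.partitionAddressCache.add(pkRangeIdentity);
    }

    // Only a closed channel is treated as a reliable "server going down" signal.
    public void onException(Throwable error) {
        if (error instanceof ClosedChannelException) {
            this.onConnectionEvent();
        }
    }

    private void onConnectionEvent() {
        // Evict the cached addresses of every partition key range served through this
        // endpoint so the next request triggers an address refresh from the gateway.
        this.partitionAddressCache.forEach(this.addressCacheRemover);
    }
}
```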

Workflow:
Three major workflows can be triggered (a handler sketch follows the list).

  1. The server starts a graceful connection closure (FIN).
    The normal Netty channel close flow is triggered (closeFuture, inactive, unregister). RntbdRequestManager completes all pending requests on the channel with ClosedChannelException.

  2. The server closes a connection ungracefully (RST).
    This triggers Netty's exceptionCaught() flow. `RntbdRequestManager.exceptionCaught()` is called and completes all pending requests on the channel with IOException. The channel is then closed.

  3. The server is down and the client tries to open a new connection.
    The connection attempt eventually fails with ConnectTimeoutException.
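
Below is a minimal Netty handler sketch of the first two failure paths (the third happens at connect time, before a channel exists). It is only an illustration under assumed names; PendingRequestHandlerSketch and its pendingRequests map are hypothetical and much simpler than the SDK's RntbdRequestManager.

```java
// Minimal Netty handler sketch of the two channel-level failure paths described above.
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

import java.io.IOException;
import java.nio.channels.ClosedChannelException;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

public final class PendingRequestHandlerSketch extends ChannelInboundHandlerAdapter {

    // requestId -> future awaiting the response for that request
    private final Map<Long, CompletableFuture<Object>> pendingRequests = new ConcurrentHashMap<>();

    // Workflow 1 (FIN): a graceful close ends in channelInactive; fail every pending
    // request with ClosedChannelException, which is the signal the listener reacts to.
    @Override
    public void channelInactive(ChannelHandlerContext context) throws Exception {
        this.completeAllPendingRequests(new ClosedChannelException());
        super.channelInactive(context);
    }

    // Workflow 2 (RST): an abortive close surfaces through exceptionCaught; fail the
    // pending requests with an IOException and close the channel.
    @Override
    public void exceptionCaught(ChannelHandlerContext context, Throwable cause) {
        this.completeAllPendingRequests(new IOException("channel failed", cause));
        context.close();
    }

    private void completeAllPendingRequests(Exception error) {
        this.pendingRequests.values().forEach(pending -> pending.completeExceptionally(error));
        this.pendingRequests.clear();
    }
}
```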

Test:
Test33:
Captured TCP packets during a package upgrade and a manual VM restart. Based on the TCP traces, the first and third workflows were triggered.

Test Results from benchmark run:
Pending

Test Results from CTL run:
Pending

Annie Liang added 2 commits October 6, 2020 10:31
@xinlian12 (Member Author):

/azp run java - cosmos - tests

@azure-pipelines:

Azure Pipelines successfully started running 1 pipeline(s).

@moderakh (Contributor) left a comment

RntbdConnectionStateListener::partitionAddressCache is a local cache and can be stale. As a result, RntbdConnectionStateListener may remove addresses of unrelated partition key ranges, causing unnecessary address refreshes.

Consider this scenario (a toy illustration follows the list):

  1. PKR1, PKR2, and PKR3 are hosted on the same physical node.
  2. request1 (PKR1), request2 (PKR2), and request3 (PKR3) are sent; as a result, RntbdConnectionStateListener::partitionAddressCache is populated with PKR1, PKR2, and PKR3.
  3. Due to partition movement, PKR1 moves off the physical node, but PKR2 and PKR3 stay.
  4. The node hosting PKR2 and PKR3 now shuts down.
  5. RntbdConnectionStateListener still thinks PKR1 is on this physical node and therefore removes the addresses of PKR1, PKR2, and PKR3, even though PKR1's addresses should not have been removed.
  6. This results in an unnecessary address refresh for PKR1.
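
A toy illustration of the over-invalidation, reusing the hypothetical ConnectionStateListenerSketch from the sketch in the PR description above and assuming PartitionKeyRangeIdentity's (collectionRid, partitionKeyRangeId) constructor:

```java
// Toy illustration of the stale-cache scenario (hypothetical names, not SDK code).
import java.nio.channels.ClosedChannelException;

import com.azure.cosmos.implementation.PartitionKeyRangeIdentity;

public final class StaleCacheScenario {
    public static void main(String[] args) {
        ConnectionStateListenerSketch listener = new ConnectionStateListenerSketch(
            pkRange -> System.out.println("address refresh forced for " + pkRange));

        PartitionKeyRangeIdentity pkr1 = new PartitionKeyRangeIdentity("collectionRid", "1");
        PartitionKeyRangeIdentity pkr2 = new PartitionKeyRangeIdentity("collectionRid", "2");
        PartitionKeyRangeIdentity pkr3 = new PartitionKeyRangeIdentity("collectionRid", "3");

        // Step 2: requests for PKR1, PKR2, and PKR3 all go through the same endpoint.
        listener.onBeforeSendRequest(pkr1);
        listener.onBeforeSendRequest(pkr2);
        listener.onBeforeSendRequest(pkr3);

        // Step 3: PKR1 moves to another node, but the local set is never trimmed.

        // Steps 4-6: the node goes down; the listener evicts PKR1 along with PKR2 and
        // PKR3, forcing an unnecessary address refresh for PKR1.
        listener.onException(new ClosedChannelException());
    }
}
```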

@moderakh (Contributor) left a comment

LGTM (please make sure we run CTL).

@xinlian12, great work.

Please track the following as an improvement outside of the scope of this PR:

RntbdConnectionStateListener::partitionAddressCache is a local cache and can be stale; as a result, RntbdConnectionStateListener may remove addresses of unrelated partition key ranges, causing unnecessary address refreshes (see the scenario described in the earlier review comment).

@moderakh (Contributor) left a comment

The PR looks good code-wise.

There are a few improvements that Annie is tracking outside of the scope of this PR.

For this PR, however, we are waiting on:

  1. CTL results.
  2. Perf testing (there was some discussion that enabling this feature on the .NET side resulted in a perf impact); we need to measure whether that is also the case in Java.

@xinlian12 (Member Author):

/azp run java - cosmos - tests

@azure-pipelines:

Azure Pipelines successfully started running 1 pipeline(s).

@xinlian12 (Member Author) commented Oct 19, 2020

Test Results:

1. Adobe Tests (default config):
WestUS2 database, WestUS2 VM (16 cores)
(screenshots)

2. Benchmark Runner - ReadLatency:
Test33, 10,000 documents, 1-1024 concurrency, each concurrency level running for 5 minutes:

P95: (chart)
P99: (chart)
P999: (chart)
Throughput: (chart)

3. CTL Run:
Test33
(screenshots)

Test results from a 4-core CPU:

Adobe Tests: (screenshots)
Benchmark test results: (screenshots)
@xinlian12 xinlian12 merged commit f0445e4 into Azure:master Oct 21, 2020
@xinlian12 xinlian12 deleted the ConnectionStateListener branch February 3, 2021 22:13