@kundadebdatta kundadebdatta commented Jan 19, 2024

Pull Request Template

Description

Problem Statement:

In the real world, there are many reasons an endpoint could become non-responsive and stop fulfilling incoming requests, for example network packet drops, issues on the server node, or a larger outage. Today, during initialization, the .NET v3 SDK must fetch the account metadata from the routing gateway using the global account endpoint. This metadata is needed to determine the read and write regions, the resource identifiers, the ETag, etc., which the SDK uses to perform read/write operations. The global account endpoint is passed through the CosmosClient constructor (see the example below).

CosmosClientOptions clientOptions = new CosmosClientOptions()
{
    ApplicationPreferredRegions = new List<string>()
    {
        Regions.NorthCentralUS,
        Regions.EastAsia,
    },
    EnablePartitionLevelFailover = true,
    ConnectionMode = ConnectionMode.Direct,
};

CosmosClient client = new CosmosClient(
    "https://testaccount.documents-test.windows-int.net:443/",
    "key==",
    clientOptions
);

However, if the global account endpoint becomes non-responsive for some unforeseen reason, there is currently no way to fetch the account metadata, and the CosmosClient initialization fails.

Proposed Solution:

The above problem could be solved if the global account metadata is hosted within a private domain name. During an outage, if the primary global account endpoint becomes non-responsive, the custom domain names can be used to route the Get Account metadata requests to the custom endpoints.

Below is an example of how the SDK will capture the regional endpoints from the end user.

CosmosClientOptions clientOptions = new CosmosClientOptions()
{
    ApplicationPreferredRegions = new List<string>()
    {
        Regions.P1,
        Regions.P2,
        Regions.P3,
    },
    AccountInitializationCustomEndpoints = new HashSet<string>()
    {
        "custom.p-1.documents.azure.com",
        "custom.p-2.documents.azure.com",
    },
    EnablePartitionLevelFailover = true,
};
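
To make the intended fallback behavior concrete, here is a minimal sketch of trying the global account endpoint first and then each custom endpoint in turn. It is not the SDK's implementation: `AccountMetadataFallback`, `GetWithFallbackAsync`, and the `fetchAsync` delegate are hypothetical names introduced for illustration, endpoints are modeled as `Uri` values rather than the host name strings shown above, and the sequential ordering is an assumption (the SDK may probe endpoints differently).

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper for illustration only; not part of the Cosmos DB SDK.
public static class AccountMetadataFallback
{
    // Tries the global endpoint first, then each custom endpoint in order,
    // and returns the first successful result from fetchAsync.
    public static async Task<T> GetWithFallbackAsync<T>(
        Uri globalEndpoint,
        IEnumerable<Uri> customEndpoints,
        Func<Uri, Task<T>> fetchAsync)
    {
        var candidates = new List<Uri> { globalEndpoint };
        candidates.AddRange(customEndpoints);

        // Collect every failure so the caller can see why each endpoint was skipped.
        var failures = new List<Exception>();

        foreach (Uri endpoint in candidates)
        {
            try
            {
                return await fetchAsync(endpoint);
            }
            catch (Exception ex)
            {
                failures.Add(ex);
            }
        }

        throw new AggregateException(
            "No endpoint returned the account metadata.", failures);
    }
}
```

A caller would pass whatever delegate actually performs the Get Account request. If every endpoint fails, the `AggregateException` carries one inner exception per attempted endpoint.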

Type of change

  • New feature (non-breaking change which adds functionality)

Closing issues

To automatically close an issue: closes #4236

@kundadebdatta kundadebdatta self-assigned this Jan 23, 2024
@kundadebdatta kundadebdatta marked this pull request as ready for review January 23, 2024 21:37
@kundadebdatta kundadebdatta changed the title CosmosClientOptions: Adds Custom Domain Names CosmosClientOptions: Adds Regional Private NameEndpoints Jan 23, 2024
@kundadebdatta kundadebdatta changed the title CosmosClientOptions: Adds Regional Private NameEndpoints CosmosClientOptions: Adds Regional Private Endpoints Jan 23, 2024
@kirankumarkolli kirankumarkolli left a comment

Conceptually, we can imagine that this.serviceEndpointEnumerator is either user-given or system-populated, and GlobalEndpointManager.TryGetAccountPropertiesFromAllLocationsAsync should use it.

@FabianMeiswinkel FabianMeiswinkel left a comment

LGTM, except for the one open issue around picking the right set of endpoints to reach out to in order to get account info.

kirankumarkolli previously approved these changes Feb 6, 2024
@kundadebdatta kundadebdatta added the auto-merge Enables automation to merge PRs label Feb 9, 2024
@FabianMeiswinkel FabianMeiswinkel left a comment

LGTM - Thanks

Labels

auto-merge Enables automation to merge PRs PerPartitionAutomaticFailover

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[Per Partition Automatic Failover] Utilize CosmosClientOptions to Capture Custom Domain Names to resolve Cx Specified Endpoints
