[Internal] Binary Encoding: Adds Binary Encoding Support for Point Operations #4652

@kundadebdatta kundadebdatta commented Aug 23, 2024

Pull Request Template

Description

This PR introduces binary encoding support for requests and responses across the different point operations.

What is Binary Encoding?

As the name suggests, binary encoding is an encoding mechanism in which the request payload is first encoded to binary and then sent to the backend for processing. Decoding to text happens on the response path. The biggest benefit of binary encoding is reduced backend storage cost, which helps lower the overall COGS.

Scope

The point operations that are currently in and out of scope for binary encoding are listed in the table below:

| Operations Currently in Scope | Operations Currently Out of Scope | Reason for Out of Scope |
| --- | --- | --- |
| CreateItemAsync() | PatchItemAsync() | Operation currently not supported in BE |
| CreateItemStreamAsync() | PatchItemStreamAsync() | Operation currently not supported in BE |
| ReadItemAsync() | TransactionalBatches | Operation currently not supported in BE |
| ReadItemStreamAsync() | Bulk APIs | Operation currently not supported in BE |
| UpsertItemAsync() | | |
| UpsertItemStreamAsync() | | |
| ReplaceItemAsync() | | |
| ReplaceItemStreamAsync() | | |
| DeleteItemAsync() | | |
| DeleteItemStreamAsync() | | |

How to Enable Binary Encoding?

This PR introduces a new environment variable, AZURE_COSMOS_BINARY_ENCODING_ENABLED, to opt in to or out of the binary encoding feature on demand. Setting this environment variable to True enables binary encoding support.
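
The opt-in check can be pictured with a minimal sketch. It is written in Python purely for illustration (the SDK itself is .NET). The variable name comes from this PR, but the helper name `is_binary_encoding_enabled` and the case-insensitive parsing are assumptions, not the SDK's actual behavior.

```python
import os

# Variable name taken from the PR description.
ENV_VAR = "AZURE_COSMOS_BINARY_ENCODING_ENABLED"


def is_binary_encoding_enabled(environ=None):
    """Return True only when the variable is set to "true" (any casing).

    Assumption: unset, empty, or unrecognized values leave the feature off.
    """
    env = os.environ if environ is None else environ
    return env.get(ENV_VAR, "").strip().lower() == "true"
```

Passing a plain dictionary instead of `os.environ` keeps the check easy to exercise in tests without mutating the process environment.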

How Has Binary Encoding Been Achieved?

Binary encoding in the .NET SDK is divided into two parts, which apply differently to the ItemAsync() and ItemStreamAsync() APIs. The details are given below:

  • ItemAsync() APIs: The CosmosJsonDotNetSerializer has been refactored to read and write binary directly from and to the stream. This avoids converting between text and binary streams, which makes serialization and deserialization faster.

  • ItemStreamAsync() APIs: For these APIs, no serializers are involved and the stream is returned directly to the caller. Therefore, this flow converts a text stream into binary on the request path and does the opposite on the response path. This conversion is somewhat costlier than writing the binary stream directly with the serializer. Note that, regardless of whether binary encoding is enabled, the output stream is always in text format unless binary is explicitly requested.
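
The request-path decision described above (pass a binary stream through, convert a text stream first) can be sketched as follows. This is an illustrative Python model, not SDK code: the marker byte `0x80`, the toy `text_to_binary` conversion, and all function names are assumptions, since this PR does not describe the wire format.

```python
import json

# Assumption: a one-byte marker distinguishes binary payloads from text.
# The real binary format is not described in this PR.
BINARY_MARKER = 0x80


def is_binary(payload: bytes) -> bool:
    """Report whether the payload starts with the (assumed) binary marker."""
    return len(payload) > 0 and payload[0] == BINARY_MARKER


def text_to_binary(payload: bytes) -> bytes:
    """Toy stand-in for the conversion: validate the JSON, then prefix the marker."""
    json.loads(payload.decode("utf-8"))  # raises ValueError on invalid input
    return bytes([BINARY_MARKER]) + payload


def prepare_request_stream(payload: bytes, binary_encoding_enabled: bool) -> bytes:
    """Convert only when the feature is on and the input is still text."""
    if not binary_encoding_enabled or is_binary(payload):
        return payload
    return text_to_binary(payload)
```

In this model a serializer that already emits binary pays no conversion cost, which mirrors why the ItemAsync() path is described as cheaper than the ItemStreamAsync() path.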

Is There Any Way to Request Binary Bits on Response?

The answer is yes. We introduced a new internal request option, EnableBinaryResponseOnPointOperations, in the ItemRequestOptions. Setting this flag to True skips the text conversion and returns the raw binary bits to the caller. However, please note that this option applies only to the ItemStreamAsync() APIs and is intended to help some of the internal teams.
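
A small sketch of the response-path behavior, again as an illustrative Python model rather than SDK code. The `enable_binary_response` parameter stands in for EnableBinaryResponseOnPointOperations; the marker byte and the toy `binary_to_text` conversion are assumptions.

```python
# Assumption: a one-byte marker identifies binary payloads (stand-in format).
BINARY_MARKER = 0x80


def is_binary(payload: bytes) -> bool:
    """Report whether the payload starts with the (assumed) binary marker."""
    return len(payload) > 0 and payload[0] == BINARY_MARKER


def binary_to_text(payload: bytes) -> bytes:
    """Toy stand-in for the conversion: strip the marker byte."""
    return payload[1:]


def finalize_response_stream(payload: bytes, enable_binary_response: bool = False) -> bytes:
    """Return text by default; return raw binary only when explicitly requested."""
    if is_binary(payload) and not enable_binary_response:
        return binary_to_text(payload)
    return payload
```

With the flag left at its default, callers always receive text, whether or not the backend answered in binary.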

Flow Diagrams

To understand the changes better, please take a look at the flow diagrams below for both ItemAsync() and ItemStreamAsync() APIs.

Flow Diagram for ItemAsync() APIs that are in Scope per the Above Table:

flowchart TD
    A[All 'ItemAsync' APIs in Scope] -->|SerializerCore.ToStream| B{Select <br> Serializer}
    B -->|One| C[CosmosJsonDotNetSerializer]
    B -->|Two| D[CosmosSystemTextJsonSerializer]
    B -->|Three| E[Any Custom <br> Serializer]
    C -->|Serialize to <br> Binary Stream| F[ContainerCore<br>.ProcessItemStreamAsync]
    D -->|Serialize to <br> Text Stream| F[ContainerCore<br>.ProcessItemStreamAsync]
    E -->|Stream may or <br> may not be <br> Serialized to Binary| F[ContainerCore<br>.ProcessItemStreamAsync]
    F --> G{Is Input <br> Stream in <br> Binary ?}
    G -->|True| I[ProcessResourceOperationStreamAsync]
    G -->|False| H[Convert Input Text <br> Stream to <br> Binary Stream]
    H --> I 
    I --> |SendAsync| J[RequestInvokerHandler]
    J --> |Sets following headers to request response in binary format: 
    x-ms-cosmos-supported-serialization-formats = CosmosBinary
    x-ms-documentdb-content-serialization-format = CosmosBinary| K[TransportHandler]
    K --> |Binary Response <br> Stream|L[ContainerCore<br>.ProcessItemStreamAsync]
    L --> |Note: No explicit conversion to binary stream happens because we let the serializer directly de-serialize the binary stream into text. SerializerCore.FromStream| M{Select <br> Serializer}
    M -->|One| N[CosmosJsonDotNetSerializer]
    M -->|Two| O[CosmosSystemTextJsonSerializer]
    M -->|Three| P[Any Custom <br> Serializer]    
    N -->|De-Serialize to <br> Text Stream| Q[Container<br>.ItemAsync Response]
    O -->|De-Serialize to <br> Text Stream| Q[Container<br>.ItemAsync Response]
    P -->|Stream may or <br> may not be <br> De-Serialized to Text| Q[Container<br>.ItemAsync Response <br> in Text]

Flow Diagram for ItemStreamAsync() APIs that are in Scope per the Above Table:

flowchart TD
    A[All 'ItemStreamAsync' APIs in Scope]
    A -->|Stream may or <br> may not be <br> Serialized to Binary| F[ContainerCore<br>.ProcessItemStreamAsync]
    F --> G{Is Input <br> Stream in <br> Binary ?}
    G -->|True| I[ProcessResourceOperationStreamAsync]
    G -->|False| H[Convert Input Text <br> Stream to <br> Binary Stream]
    H --> I 
    I --> |SendAsync| J[RequestInvokerHandler]
    J --> |Sets following headers to get binary response: 
    x-ms-cosmos-supported-serialization-formats = CosmosBinary
    x-ms-documentdb-content-serialization-format = CosmosBinary| K[TransportHandler]
    K --> |Binary Response <br> Stream|L[ContainerCore<br>.ProcessItemStreamAsync]
    L --> M{Is Response <br> Stream in <br> Binary ?}
    M -->|Yes| N[CosmosSerializationUtil]
    M -->|No| Q
    N -->|Convert Binary Stream to <br> Text Stream| Q[Container<br>.ItemAsync Response]
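
Both diagrams show the RequestInvokerHandler step setting two headers to ask for a binary response. A minimal Python sketch of that step is shown here for illustration; the header names and the CosmosBinary value come from the diagrams, while the function name and the dictionary-based header model are assumptions.

```python
# Header names and value as shown in the flow diagrams.
SUPPORTED_FORMATS_HEADER = "x-ms-cosmos-supported-serialization-formats"
CONTENT_FORMAT_HEADER = "x-ms-documentdb-content-serialization-format"
COSMOS_BINARY = "CosmosBinary"


def with_binary_headers(headers: dict, binary_encoding_enabled: bool) -> dict:
    """Return a copy of the headers, adding the binary hints when enabled."""
    result = dict(headers)
    if binary_encoding_enabled:
        result[SUPPORTED_FORMATS_HEADER] = COSMOS_BINARY
        result[CONTENT_FORMAT_HEADER] = COSMOS_BINARY
    return result
```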

Performance Testing

Below are the comparison results for the perf testing done on the master branch and the current feature branch with binary encoding disabled:

BenchmarkDotNet=v0.13.5, OS=ubuntu 20.04
Intel Xeon Platinum 8272CL CPU 2.60GHz, 1 CPU, 16 logical and 8 physical cores
.NET SDK=6.0.427
  [Host]  : .NET 6.0.35 (6.0.3524.45918), X64 RyuJIT AVX2
  LongRun : .NET 6.0.35 (6.0.3524.45918), X64 RyuJIT AVX2

Job=LongRun  IterationCount=100  LaunchCount=3  
RunStrategy=Throughput  WarmupCount=15

Benchmark Results with No Binary Encoding on master branch:

image

Benchmark Results with Binary Encoding Disabled on feature branch:

image

Benchmark results comparison in terms of percentage between master and feature branch:

image

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)

Closing issues

To automatically close an issue: closes #4644

@kundadebdatta kundadebdatta self-assigned this Aug 23, 2024
@kundadebdatta kundadebdatta added the Do Not Review Marks a PR in "work in progress" state. label Aug 23, 2024
@kundadebdatta kundadebdatta force-pushed the users/dkunda/4644_binary_encoding_for_point_ops branch from 7373fbc to 5173b24 Compare August 23, 2024 06:46
@kundadebdatta kundadebdatta changed the title Users/dkunda/4644 binary encoding for point ops [Internal] Binary Encoding: Adds Binary Encoding Support for Point Operations Aug 31, 2024
Code changes to update STJ Serializer

Adding more tests

Fixing item emulator test.

Minor cosmetic changes.

Adding more tests.

Fixing the cdb to newtonsoft serializer.

Code changes to fix ns reader. Adding more tests.

Minor refactoring.

Optimizing some of the serialization code.

Code changes to change serializer for patch operations.

Modularizing the codebase.

Adding summary for serialization utils.

Code changes to add changes and tests for patch operation.

Adding conversion logic in patch op.

Code changes to add tests for patch operation.

Code changes to refactor binary conversion logic.

Some refactor. Added required unit tests.

Code changes to organize serializer and de-serializer. Modified default json serializer.

Provide option to request binary from item request options.

Code changes to add binary serializer for non stream apis.

Changes in request invocation handler.

remove unnecessary using

Further optimizations.

Code changes to refactor serialization and de-serialization logic.
@kundadebdatta kundadebdatta force-pushed the users/dkunda/4644_binary_encoding_for_point_ops branch from 094faa0 to 2cdf16b Compare September 18, 2024 19:36
@kundadebdatta kundadebdatta added BinaryEncoding binary encoding in .NET sdk and removed Do Not Review Marks a PR in "work in progress" state. labels Sep 18, 2024
@kundadebdatta kundadebdatta added the auto-merge Enables automation to merge PRs label Oct 18, 2024
@FabianMeiswinkel FabianMeiswinkel left a comment

LGTM except for a few minor comments

@FabianMeiswinkel FabianMeiswinkel left a comment

LGTM

@microsoft-github-policy-service microsoft-github-policy-service bot merged commit f6ae4c4 into master Oct 23, 2024
23 checks passed
@microsoft-github-policy-service microsoft-github-policy-service bot deleted the users/dkunda/4644_binary_encoding_for_point_ops branch October 23, 2024 23:14
kirankumarkolli pushed a commit that referenced this pull request Nov 25, 2024
# Pull Request Template

## Description

Recently during our v3 sdk CI rolling runs, we observed some performance
regressions on the `ItemStreamAsync()` APIs. They regressed beyond 5%.


![image](https://github.com/user-attachments/assets/66cc4f01-2ec6-47e0-b885-5ad74e02bb63)

Upon further investigation, we found that during the
non-binary flow we end up converting the incoming stream into a
`CloneableStream`, which might be the reason for this regression. Please
note that this was not caught during the [original version of
the binary encoding
PR](#4652) because
the performance test used to capture the benchmark for the original PR
targeted a real Cosmos container, whereas the CI runs use our
mocked containers.

This PR skips the `CloneableStream` conversion for the non-binary encoding
flow.

With the above change in place, our CI builds started passing:


![image](https://github.com/user-attachments/assets/8293a6e5-6fbc-4953-9de0-37162a081194)

## Type of change

Please delete options that are not relevant.

- [x] Bug fix (non-breaking change which fixes an issue)

## Closing issues

To automatically close an issue: closes #IssueNumber
kundadebdatta added a commit that referenced this pull request Jan 8, 2025
kirankumarkolli pushed a commit that referenced this pull request Jan 8, 2025
…#4953)

Labels
auto-merge Enables automation to merge PRs BinaryEncoding binary encoding in .NET sdk
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Binary Encoding - Add Support for Point Operations
4 participants