Skip to content

For CosmosDB bulk api added support for splitting of batch based on size.#23987

Merged
sajeetharan merged 8 commits intoAzure:mainfrom
v1k1:batch-size-2mb-limit
Feb 10, 2023
Merged

For CosmosDB bulk api added support for splitting of batch based on size.#23987
sajeetharan merged 8 commits intoAzure:mainfrom
v1k1:batch-size-2mb-limit

Conversation

@v1k1
Copy link
Copy Markdown
Contributor

@v1k1 v1k1 commented Nov 24, 2022

Packages impacted by this PR

@azure/cosmos

Issues associated with this PR

#23923

Describe the problem that is addressed by this PR

CosmosDB Items.bulk api doens't honour 2Mb cap imposed on a single batch request. With these changes if size of a batch (cumulative size of it's operations) exceeds 2Mb it is split into smaller batches before sending.

What are the possible designs available to address the problem? If there are more than one possible design, why was the one in this PR chosen?

Are there test cases added in this PR? (If not, why?)

Yes

Provide a list of related PRs (if any)

Command used to generate this PR:**(Applicable only to SDK release request PRs)

Checklists

  • Added impacted package name to the issue description
  • Does this PR needs any fixes in the SDK Generator?** (If so, create an Issue in the Autorest/typescript repository and link it here)
  • Added a changelog (if necessary)

@azure-sdk
Copy link
Copy Markdown
Collaborator

API change check

APIView has identified API level changes in this PR and created following API reviews.

azure-cosmos

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be added under unreleased 3.17.3

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack. Will remove this.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this specific size?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have taken this constant from Java and .Net SDK. this is slightly lower than 2Mb but this seems to the value used by both SDKs.

@v1k1 v1k1 force-pushed the batch-size-2mb-limit branch from 50c8816 to 99e04a4 Compare November 24, 2022 09:07
Copy link
Copy Markdown
Member

@sajeetharan sajeetharan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Member

@sajeetharan sajeetharan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Member

@witemple-msft witemple-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach seems sound to me, but I think there are a couple of bugs that are worth fixing.

* @hidden
*/
export function splitBatchBasedOnBodySize(originalBatch: Batch): Batch[] {
if (originalBatch?.operations === undefined && originalBatch.operations.length < 1) return [];
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be an OR rather than AND? If no operations are defined OR the operations are of length zero? This condition seems like it will have the opposite of the intended effect because if originalBatch?.operations is undefined then the next condition must throw an error.

* @hidden
*/
export function calculateObjectSizeInBytes(obj: unknown): number {
return new TextEncoder().encode(bodyFromData(obj as any)).length;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just leaving a note that this feels expensive. You're basically encoding the body into a buffer, which the request pipeline must do anyway, for the sole purpose of measuring the body's length. I'm not sure if there's a great alternative to doing this to be honest, but it sticks out at me as being a costly operation.

@sajeetharan sajeetharan merged commit 17ef738 into Azure:main Feb 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants