[Cosmos] [Don't Review] add azcosmos_perf — Go SDK performance testing tool#26764
Draft
tvaron3 wants to merge 2 commits into
…and ReadMany ops
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ource
When an operation panics and the recover handler in runIteration catches it,
also pass the captured goroutine stack trace through to the ErrorResults
document via a new UpsertErrorWithSource helper. Previously only
fmt.Errorf("panic: %v", r) reached ADX and the stack went solely to
container stderr, making post-mortem investigation impossible once the pod
recycled.
UpsertError remains an unchanged convenience wrapper for non-panic call sites.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
What this PR does
Adds a new Go performance testing tool (sdk/data/azcosmos_perf) that mirrors the Rust azure_data_cosmos_perf crate, intended for steady-state benchmarking of the Go Cosmos DB SDK on VMs / AKS with metrics emitted to Azure Data Explorer / Grafana.
Features
- ReadItem, QueryItems, ReadManyItems, UpsertItem, CreateItem, ChangeFeedItems
- x-ms-request-duration-ms accounting
- recover() keeps the process alive on a panic in any single op and writes the failure (with full goroutine stack trace in source_message) to the ADX ErrorResults table for post-mortem
- MaxItemCount, ReadMany batch size
- … PerfResults, ErrorResults) with per-interval upserts so dashboards stay live
- entrypoint.sh and run_perf.sh for the AKS deployment used in our internal runs
Commits
- feat(azcosmos_perf): add Go performance testing tool with ChangeFeed and ReadMany ops - the package itself
- feat(azcosmos_perf): persist panic stack trace to ADX error_message source - the recover handler now persists the stack trace, not just the panic message
What this PR deliberately does not include
- getPartitionKeyRanges nil-deref fix in cosmos_container.go that this perf tool uncovered in production at concurrency=200. That exact same fix is already present in @simorenoh's open PR #26723 ("[Cosmos] add container cache and pk range cache"). No need to duplicate or risk a conflict.
- newResponse(nil) hardening in cosmos_response.go - best as its own small SDK PR.
Production validation
The tool has been running on AKS (2 pods, concurrency=50) backed by ADX/Grafana. After applying the panic-handler stack-trace fix and (locally) the pk-range fix from #26723, the runs show 0 panics / 0 errors across all 6 operations at ~135K ops/op/5min sustained throughput. Earlier runs at concurrency=200 reproduced the panic 69 times in 4 hours, exclusively in ReadManyItems and ChangeFeedItems, which both call getPartitionKeyRanges.
Draft because
- … sdk/data/azcosmos/perf vs top-level azcosmos_perf)
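
For readers comparing against other benchmarks, the figures quoted under Production validation can be converted to per-second rates. The sketch below is plain arithmetic on the numbers stated there. The description does not say whether ~135K ops/op/5min is per pod or aggregated across both pods, so the result should be read with the same ambiguity.

```go
package main

import "fmt"

// perSecond converts a count observed over a window (in seconds)
// into a per-second rate.
func perSecond(count, windowSeconds float64) float64 {
	return count / windowSeconds
}

func main() {
	const opsPerOperation = 135_000.0 // ~135K ops per operation type per window
	const windowSeconds = 5 * 60.0    // 5-minute reporting window
	const operationTypes = 6.0        // ReadItem, QueryItems, ReadManyItems, ...

	perOp := perSecond(opsPerOperation, windowSeconds)
	fmt.Printf("per operation type: %.0f ops/s\n", perOp)               // 450
	fmt.Printf("all six combined:   %.0f ops/s\n", perOp*operationTypes) // 2700

	// Earlier failure rate at concurrency=200: 69 panics in 4 hours.
	fmt.Printf("panics per hour:    %.2f\n", 69.0/4.0) // 17.25
}
```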