Properly handle parsing of large request documents#9133
Conversation
6c8fd13 to
93c0b99
Compare
There was a problem hiding this comment.
Pull request overview
This PR improves handling of large GraphQL request documents by supporting multi-segment UTF-8 input (ReadOnlySequence<byte>) through the request parser and document hash providers, with added tests for these scenarios.
Changes:
- Extend
Utf8GraphQLRequestParserto parsequeryvalues provided asUtf8JsonReader.ValueSequence(multi-segment). - Add
ReadOnlySequence<byte>hashing support toIDocumentHashProviderand concrete hash providers (MD5/SHA1/SHA256). - Add/expand tests for multi-segment parsing and hashing (including a large query case).
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| src/HotChocolate/Language/test/Language.Web.Tests/Utf8GraphQLRequestParserTests.cs | Adds coverage for parsing large query values delivered as a multi-segment sequence. |
| src/HotChocolate/Language/test/Language.Web.Tests/Sha256DocumentHashProviderTests.cs | Adds hashing tests for single- and multi-segment sequences (Base64/Hex). |
| src/HotChocolate/Language/test/Language.Web.Tests/Sha1DocumentHashProviderTests.cs | Adds hashing tests for single- and multi-segment sequences (Base64/Hex). |
| src/HotChocolate/Language/test/Language.Web.Tests/MD5DocumentHashProviderTests.cs | Adds hashing tests for single- and multi-segment sequences (Base64/Hex). |
| src/HotChocolate/Language/test/Language.Web.Tests/SequenceHelper.cs | Introduces a helper to build multi-segment ReadOnlySequence<byte> instances for tests. |
| src/HotChocolate/Language/src/Language.Web/Utf8GraphQLRequestParser.cs | Adds sequence-aware query extraction and a ParseDocument(ReadOnlySequence<byte>) overload. |
| src/HotChocolate/Language/src/Language.Web/Sha256DocumentHashProvider.cs | Implements sequence hashing via incremental hashing. |
| src/HotChocolate/Language/src/Language.Web/Sha1DocumentHashProvider.cs | Implements sequence hashing via incremental hashing. |
| src/HotChocolate/Language/src/Language.Web/MD5DocumentHashProvider.cs | Implements sequence hashing via incremental hashing. |
| src/HotChocolate/Language/src/Language.Web/IDocumentHashProvider.cs | Adds a new ComputeHash(ReadOnlySequence<byte>) API surface. |
| src/HotChocolate/Language/src/Language.Web/DocumentHashProviderBase.cs | Adds base implementation for sequence hashing with framework-conditional behavior. |
Comments suppressed due to low confidence (1)
src/HotChocolate/Language/src/Language.Web/Utf8GraphQLRequestParser.cs:329
- Persisted-query detection only checks
documentSpan.IsEmpty. When thequeryJSON string is delivered viaUtf8JsonReader.ValueSequence(large / multi-segment),documentSpanstays empty even though a query was provided, so this block can incorrectly treat the request as a persisted-query request ifextensionscontains a hash. Update the condition to ensure bothdocumentSpananddocumentSequenceare empty before attempting to extract a persisted-query hash.
// Handle persisted queries via extensions
if (documentSpan.IsEmpty
&& documentId.IsEmpty
&& _useCache
&& extensions is not null
&& TryExtractHash(extensions, out var hash))
{
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| var r = Assert.Single(batch); | ||
| Assert.Null(r.OperationName); | ||
| Assert.Null(r.DocumentId); | ||
| Assert.Null(r.Variables); | ||
| Assert.Null(r.Extensions); | ||
| r.Document.MatchSnapshot(); | ||
| } |
There was a problem hiding this comment.
This test snapshot is extremely large (the generated Parse_Large_Query_Sequence.snap is ~23k lines). Very large snapshots slow down review/CI and are hard to maintain. Prefer asserting specific properties (e.g., selection count, a few representative field names, and/or that parsing succeeds without allocating a single segment) instead of snapshotting the entire parsed document.
| OperationDocumentHash ComputeHash(ReadOnlySpan<byte> document); | ||
|
|
||
| /// <summary> | ||
| /// Computes the hash of a GraphQL operation document. | ||
| /// </summary> | ||
| /// <param name="document">The GraphQL operation document.</param> | ||
| /// <returns>The hash of the GraphQL operation document.</returns> | ||
| OperationDocumentHash ComputeHash(ReadOnlySequence<byte> document); |
There was a problem hiding this comment.
Adding a new member to the public IDocumentHashProvider interface is a breaking change for any external consumers that implement this interface. If backwards compatibility is required, consider introducing a new interface (e.g., IDocumentHashProvider2) or providing sequence hashing via an extension/helper that falls back to the span overload for single-segment sequences.
| #if NETSTANDARD2_0 | ||
| var length = checked((int)document.Length); | ||
| var rented = ArrayPool<byte>.Shared.Rent(length); | ||
|
|
||
| try | ||
| { | ||
| document.CopyTo(rented); | ||
| return ComputeHash(rented.AsSpan(0, length)); | ||
| } | ||
| finally | ||
| { | ||
| ArrayPool<byte>.Shared.Return(rented); | ||
| } |
There was a problem hiding this comment.
On NETSTANDARD2_0, the multi-segment ComputeHash(ReadOnlySequence<byte>) path rents/copies the sequence into rented, but then calls ComputeHash(rented.AsSpan(...)), which in turn rents and copies again in the span overload (because the span overload always copies into an ArrayPool buffer on NETSTANDARD2_0). This causes a double rent+copy for multi-segment sequences. Consider computing the hash directly from the first rented buffer by calling the ComputeHash(byte[] document, int length) abstraction and formatting it, to avoid the extra allocation/copy.
| if (isEscaped) | ||
| { | ||
| var length = checked((int)documentBody.Length); | ||
| var rented = s_bytePool.Rent(length); | ||
|
|
||
| try | ||
| { | ||
| documentBody.CopyTo(rented); | ||
| ParseDocument(rented.AsSpan(0, length), isEscaped, ref document, ref documentHash, ref documentId); | ||
| } |
There was a problem hiding this comment.
In the ReadOnlySequence<byte> overload, the escaped path rents a buffer to make the sequence contiguous and then calls the span overload with isEscaped: true, which causes the span overload to rent another buffer for unescaping. This doubles peak memory and copies for large escaped documents. Consider unescaping directly from the contiguous buffer (e.g., call the span overload with isEscaped: false after unescaping yourself) or refactor so the unescape buffer can be reused instead of renting twice.
Fusion Gateway Performance ResultsSimple Composite Query
Response Times & Query
query TestQuery {
topProducts(first: 5) {
inStock
name
price
shippingEstimate
upc
weight
reviews {
id
body
author {
id
username
name
}
}
}
}Deep Recursion Query
Response Times & Query
query TestQuery {
users {
id
username
name
reviews {
id
body
product {
inStock
name
price
shippingEstimate
upc
weight
reviews {
id
body
author {
id
username
name
reviews {
id
body
product {
inStock
name
price
shippingEstimate
upc
weight
}
}
}
}
}
}
}
topProducts(first: 5) {
inStock
name
price
shippingEstimate
upc
weight
reviews {
id
body
author {
id
username
name
reviews {
id
body
product {
inStock
name
price
shippingEstimate
upc
weight
}
}
}
}
}
}Variable Batching Throughput
Response Times & Query
query TestQuery($upc: ID!, $price: Long!, $weight: Long!) {
productByUpc(upc: $upc) {
inStock
shippingEstimate(weight: $weight, price: $price)
}
}Variables (5 sets batched per request) [
{ "upc": "1", "price": 899, "weight": 100 },
{ "upc": "2", "price": 1299, "weight": 1000 },
{ "upc": "3", "price": 15, "weight": 20 },
{ "upc": "4", "price": 499, "weight": 100 },
{ "upc": "5", "price": 1299, "weight": 1000 }
]Run 22139169896 • Commit e8f0cb2 • Wed, 18 Feb 2026 12:39:04 GMT |
No description provided.