Merged
garypen
approved these changes
Feb 5, 2025
🚀 Features
Improve BatchProcessor observability (Issue #6558)
A new metric has been introduced to allow observation of how many spans are being dropped by a telemetry batch processor.
apollo.router.telemetry.batch_processor.errors - The number of errors encountered by exporter batch processors.
- name: One of apollo-tracing, datadog-tracing, jaeger-collector, otlp-tracing, zipkin-tracing.
- error: One of channel closed, channel full.
By observing the number of spans dropped it is possible to estimate what batch processor settings will work for you.
In addition, the log message for dropped spans will now indicate which batch processor is affected.
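To make the two error values concrete, here is a minimal sketch of a bounded queue that counts dropped spans by cause. It is not the router's implementation: Span, CountingSender, and max_queue_size are hypothetical names, and the standard library's sync_channel stands in for whatever queue the exporters actually use.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical stand-in for a span; the router's real span type is not shown here.
pub struct Span(pub u64);

/// Counts failed sends per error label, mirroring the `error` attribute
/// values "channel full" and "channel closed".
pub struct CountingSender {
    tx: SyncSender<Span>,
    pub errors: HashMap<&'static str, u64>,
}

impl CountingSender {
    /// Creates a bounded queue with room for `max_queue_size` spans.
    pub fn new(max_queue_size: usize) -> (Self, Receiver<Span>) {
        let (tx, rx) = sync_channel(max_queue_size);
        (Self { tx, errors: HashMap::new() }, rx)
    }

    /// Tries to enqueue a span without blocking. On failure the span is
    /// dropped and the matching error counter is incremented.
    pub fn send(&mut self, span: Span) -> bool {
        match self.tx.try_send(span) {
            Ok(()) => true,
            Err(TrySendError::Full(_)) => {
                *self.errors.entry("channel full").or_insert(0) += 1;
                false
            }
            Err(TrySendError::Disconnected(_)) => {
                *self.errors.entry("channel closed").or_insert(0) += 1;
                false
            }
        }
    }
}
```

In this sketch, a steadily growing "channel full" count suggests the queue is too small or is drained too slowly for the span volume, which is the kind of signal the new metric is meant to expose.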
By @bryncooke in #6558
🐛 Fixes
Improve performance of query hashing by using a precomputed schema hash (PR #6622)
The router now uses a simpler and faster query hashing algorithm with more predictable CPU and memory usage. This improvement is enabled by using a precomputed hash of the entire schema, rather than computing and hashing the subset of types and fields used by each query.
For more details on why these design decisions were made, see the PR description.
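The idea of combining a precomputed schema hash with the query can be sketched as follows. This is an illustration only: schema_hash and query_hash are hypothetical function names, and DefaultHasher is a stand-in for whichever hash function the router really uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hashes the full schema text once, for example when the schema is loaded.
pub fn schema_hash(schema_sdl: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    schema_sdl.hash(&mut hasher);
    hasher.finish()
}

/// Hashes a query by combining the precomputed schema hash with the query text.
/// The work done here depends on the query length only, not on which schema
/// types and fields the query touches.
pub fn query_hash(precomputed_schema_hash: u64, query: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    precomputed_schema_hash.hash(&mut hasher);
    query.hash(&mut hasher);
    hasher.finish()
}
```

One consequence of this sketch is that any change to the schema text changes every query hash, including hashes of queries that do not use the changed types.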
By @IvanGoncharov in #6622
Truncate invalid error paths (PR #6359)
This fix addresses an issue where the router was silently dropping subgraph errors that included invalid paths.
According to the GraphQL Specification, an error path must point to a response field.
The router now truncates the path to the nearest valid field path if a subgraph error includes a path that can't be matched to a response field.
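A rough sketch of the truncation idea: walk the error path and keep the longest prefix that can still be followed. How the router actually matches paths is not described here, so this version simply walks a response value. Value, Segment, and truncate_path are hypothetical names.

```rust
use std::collections::HashMap;

/// Minimal stand-in for a JSON response value (hypothetical, for illustration only).
pub enum Value {
    Null,
    Scalar(String),
    List(Vec<Value>),
    Object(HashMap<String, Value>),
}

/// One segment of a GraphQL error path: a field name or a list index.
#[derive(Clone, Debug, PartialEq)]
pub enum Segment {
    Key(String),
    Index(usize),
}

/// Returns the longest prefix of `path` that can be followed in `data`.
pub fn truncate_path(data: &Value, path: &[Segment]) -> Vec<Segment> {
    let mut current = data;
    let mut valid = Vec::new();
    for segment in path {
        // A key only matches an object field; an index only matches a list item.
        let next = match (current, segment) {
            (Value::Object(fields), Segment::Key(name)) => fields.get(name),
            (Value::List(items), Segment::Index(i)) => items.get(*i),
            _ => None,
        };
        match next {
            Some(value) => {
                valid.push(segment.clone());
                current = value;
            }
            // Stop at the first segment that cannot be matched.
            None => break,
        }
    }
    valid
}
```

With this approach an error whose path is users, 0, email on a response that has no email field would be reported at users, 0 rather than being dropped.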
By @IvanGoncharov in #6359
Eagerly init subgraph operation for subscription primary nodes (PR #6509)
When subgraph operations are deserialized, typically from a query plan cache, they are not automatically parsed into a full document. Instead, each node needs to initialize its operation(s) prior to execution. With this change, the primary node inside SubscriptionNode is initialized in the same way as other nodes in the plan.
By @tninesling in #6509
Fix increased memory usage in sysinfo since Router 1.59.0 (PR #6634)
In version 1.59.0, Apollo Router started using the sysinfo crate to gather metrics about available CPUs and RAM. By default, that crate uses rayon internally to parallelize its handling of system processes. In turn, rayon creates a pool of long-lived threads.
In a particular benchmark on a 32-core Linux server, this caused resident memory use to increase by about 150 MB. This is likely a combination of stack space (which only gets freed when the thread terminates) and per-thread space reserved by the heap allocator to reduce cross-thread synchronization cost.
This regression is now fixed by:
- Disabling sysinfo’s use of rayon, so the thread pool is not created and system processes information is gathered in a sequential loop.
- Making sysinfo not gather that information in the first place since Router does not use it.
By @SimonSapin in #6634
Optimize demand control lookup (PR #6450)
The performance of demand control in the router has been optimized.
Previously, demand control could reduce router throughput due to its extra processing required for scoring.
This fix improves performance by shifting more data to be computed at plugin initialization and consolidating lookup queries.
By @tninesling in #6450
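The general pattern behind the demand control change, computing lookup data once at initialization so that scoring needs only a map lookup, can be sketched like this. It is not the router's code: FieldDef, CostTable, and cost_of are hypothetical names, and the cost values are invented for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical schema entry: a field on a type with an optional cost weight.
pub struct FieldDef {
    pub type_name: String,
    pub field_name: String,
    pub cost: Option<f64>,
}

/// Lookup table built once at initialization, keyed by type name and then field name.
pub struct CostTable {
    costs: HashMap<String, HashMap<String, f64>>,
}

impl CostTable {
    /// Scans the field definitions a single time and stores only annotated fields.
    pub fn new(fields: &[FieldDef]) -> Self {
        let mut costs: HashMap<String, HashMap<String, f64>> = HashMap::new();
        for field in fields {
            if let Some(cost) = field.cost {
                costs
                    .entry(field.type_name.clone())
                    .or_default()
                    .insert(field.field_name.clone(), cost);
            }
        }
        Self { costs }
    }

    /// Per-field lookup at scoring time; fields without a stored cost use `default`.
    pub fn cost_of(&self, type_name: &str, field_name: &str, default: f64) -> f64 {
        self.costs
            .get(type_name)
            .and_then(|fields| fields.get(field_name))
            .copied()
            .unwrap_or(default)
    }
}
```

The trade-off in this sketch is a slightly longer startup and some extra memory in exchange for doing no schema traversal on the request path.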
Fix missing Content-Length header in subgraph requests (Issue #6503)
A change in 1.59.0 caused the Router to send requests to subgraphs without a Content-Length header, which would cause issues with some GraphQL servers that depend on that header.
This solves the underlying bug and reintroduces the Content-Length header.
By @nmoutschen in #6538
🛠 Maintenance
Remove the legacy query planner (PR #6418)
The legacy query planner has been removed in this release. In the previous release, router v1.58, it was no longer used by default but was still available through the experimental_query_planner_mode configuration key. That key is now removed.
Also removed are configuration keys which were only relevant to the legacy planner:
- supergraph.query_planning.experimental_parallelism: the new planner can always use available parallelism.
- supergraph.experimental_reuse_query_fragments: this experimental algorithm that attempted to reuse fragments from the original operation while forming subgraph requests is no longer present. Instead, by default new fragment definitions are generated based on the shape of the subgraph operation.
By @SimonSapin in #6418
Migrate various metrics to OTel instruments (PR #6476, PR #6356, PR #6539)
Various metrics using our legacy mechanism based on the tracing crate are migrated to OTel instruments.
By @goto-bus-stop in #6476, #6356, #6539
📚 Documentation
Add instrumentation configuration examples (PR #6487)
The docs for router telemetry have new example configurations for common use cases for selectors and conditions.
By @shorgi in #6487
🧪 Experimental
Remove experimental_retry option (PR #6338)
The experimental_retry option has been removed due to its limited use and functionality during its experimental phase.
By @bnjjj in #6338