Merged
Approved by SimonSapin and garypen on Jul 30, 2024.
🚀 Features
Provide Helm support for when the router's default health_check path is not being used (Issue #5652)
When the Helm chart defines the liveness and readiness probes, it now uses the router's configured health_check path if one has been set, rather than the default (`/health`).
By Jon Christiansen in #5653
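The chart implements this in its Helm templates; the Python sketch below only mirrors the fallback logic, assuming the path is read from a `health_check.path` key in the router configuration. The function names are hypothetical, and the probe shape is a standard Kubernetes `httpGet` probe.

```python
DEFAULT_HEALTH_CHECK_PATH = "/health"


def health_check_path(router_config: dict) -> str:
    """Return the configured health_check path, or the default if none is set."""
    health_check = router_config.get("health_check") or {}
    return health_check.get("path") or DEFAULT_HEALTH_CHECK_PATH


def http_probe(router_config: dict, port: int) -> dict:
    """Build a Kubernetes-style httpGet probe that targets the health check path."""
    return {"httpGet": {"path": health_check_path(router_config), "port": port}}
```

The same probe dictionary would be used for both liveness and readiness, so a custom path only has to be configured once.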
Support new span and metrics formats for entity caching (PR #5625)
Metrics of the router's entity cache have been converted to the latest format with support for custom telemetry.
The new format supports the `cache` instrument, the `cache` selector in the subgraph service, and the `cache` attribute of a subgraph span.
To learn more, go to the Entity caching docs.
By @Geal and @bnjjj in #5625
Helm: Support renaming key for retrieving APOLLO_KEY secret (Issue #5661)
A user of the router Helm chart can now rename the key used to retrieve the value of the secret referenced by `APOLLO_KEY`.
Previously, the router Helm chart hardcoded the key name to `managedFederationApiKey`. This didn't support users whose infrastructure required custom key names when getting secrets, such as Kubernetes users who need specific key names to access a `secretStore` or `externalSecret`. This change lets users control the name of the key used to retrieve that value.
By Jon Christiansen in #5662
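As a rough illustration of the lookup, the sketch below reads `APOLLO_KEY` from a Kubernetes Secret manifest. The `key_name` override is a hypothetical stand-in for the new chart option (its real values field may be named differently); when it is unset, the previously hardcoded `managedFederationApiKey` is used.

```python
import base64
from typing import Optional

DEFAULT_SECRET_KEY_NAME = "managedFederationApiKey"


def apollo_key_from_secret(secret: dict, key_name: Optional[str] = None) -> str:
    """Read APOLLO_KEY from a Secret manifest, using a hypothetical key-name override."""
    name = key_name or DEFAULT_SECRET_KEY_NAME
    try:
        encoded = secret["data"][name]
    except KeyError:
        raise KeyError(f"secret has no key {name!r}") from None
    # Values under a Kubernetes Secret's `data` field are base64-encoded.
    return base64.b64decode(encoded).decode("utf-8")
```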
🐛 Fixes
Prevent Datadog timeout errors in logs (Issue #2058)
The router's Datadog exporter has been updated to reduce the frequency of logged errors related to connection pools.
Previously, the connection pools used by the Datadog exporter frequently timed out, and each timeout logged an error.
Now, the pool timeout for the Datadog exporter has been changed so that timeout errors happen much less frequently.
By @BrynCooke in #5692
Allow service version overrides (PR #5689)
The router now supports configuring `service.version` via YAML file configuration. This enables users to produce custom versioned builds of the router, for example by overriding the version to be `1.0`.
By @BrynCooke in #5689
Populate Datadog `span.kind` (PR #5609)
Because Datadog traces use `span.kind` to differentiate between types of spans, the router now ensures that `span.kind` is correctly populated from the OpenTelemetry span kind, which maps one-to-one onto the kinds defined in dd-trace.
By @BrynCooke in #5609
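A minimal Python sketch of that one-to-one mapping follows. The enum mirrors OpenTelemetry's five span kinds and the strings are the `span.kind` values dd-trace uses; the router does this in Rust, and the names here are hypothetical.

```python
from enum import Enum, auto


class OtelSpanKind(Enum):
    """The five span kinds defined by OpenTelemetry."""
    INTERNAL = auto()
    SERVER = auto()
    CLIENT = auto()
    PRODUCER = auto()
    CONSUMER = auto()


# Each OpenTelemetry kind maps to exactly one dd-trace span.kind value.
DD_SPAN_KIND = {
    OtelSpanKind.INTERNAL: "internal",
    OtelSpanKind.SERVER: "server",
    OtelSpanKind.CLIENT: "client",
    OtelSpanKind.PRODUCER: "producer",
    OtelSpanKind.CONSUMER: "consumer",
}


def with_span_kind(tags: dict, kind: OtelSpanKind) -> dict:
    """Return a copy of the span tags with span.kind populated."""
    return {**tags, "span.kind": DD_SPAN_KIND[kind]}
```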
Remove unnecessary internal metric events from traces and spans (PR #5649)
The router no longer includes some internal metric events in traces and spans that shouldn't have been included originally.
By @bnjjj in #5649
Support Datadog span metrics (PR #5609)
When using the APM view in Datadog, the router now displays span metrics for top-level spans or spans with the `_dd.measured` flag set.
The router sets the `_dd.measured` flag by default for the following spans:
- `request`
- `router`
- `supergraph`
- `subgraph`
- `subgraph_request`
- `http_request`
- `query_planning`
- `execution`
- `query_parsing`

To enable or disable span metrics for any span, configure `span_metrics` for the Datadog exporter.
By @BrynCooke in #5609 and #5703
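The decision of whether a span gets `_dd.measured` can be sketched as below, assuming `span_metrics` behaves as a map from span name to a boolean that overrides the default set. This is a hypothetical model, not the exporter's code; top-level spans, which Datadog shows regardless of the flag, are not modeled.

```python
# Spans flagged with _dd.measured by default, as listed above.
DEFAULT_MEASURED_SPANS = frozenset({
    "request", "router", "supergraph", "subgraph", "subgraph_request",
    "http_request", "query_planning", "execution", "query_parsing",
})


def is_measured(span_name: str, span_metrics: dict) -> bool:
    """An explicit span_metrics entry wins; otherwise fall back to the defaults."""
    if span_name in span_metrics:
        return bool(span_metrics[span_name])
    return span_name in DEFAULT_MEASURED_SPANS
```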
Use spawn_blocking for query parsing and validation (PR #5235)
To prevent its executor threads from blocking on large queries, the router now runs query parsing and validation in a Tokio blocking task.
By @xuorig in #5235
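Tokio's `spawn_blocking` moves work onto a dedicated blocking thread pool so async executor threads stay free. The closest Python analog is `asyncio.to_thread` (Python 3.9+), used in the sketch below; `parse_and_validate` is a hypothetical stand-in that only checks brace balance. Because of Python's GIL this does not add parallelism for pure-Python work, so the sketch illustrates only the offloading pattern.

```python
import asyncio


def parse_and_validate(query: str) -> dict:
    """Hypothetical stand-in for CPU-heavy parsing: checks brace balance and depth."""
    depth = max_depth = 0
    for ch in query:
        if ch == "{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "}":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced braces")
    if depth != 0:
        raise ValueError("unbalanced braces")
    return {"max_depth": max_depth}


async def handle_request(query: str) -> dict:
    # Run the blocking work in a worker thread instead of on the event loop.
    return await asyncio.to_thread(parse_and_validate, query)
```

Exceptions raised in the worker thread propagate through the `await`, so callers handle validation errors as usual.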
🛠 Maintenance
chore: Update rhai to latest release (1.19.0) (PR #5655)
In Rhai 1.18.0, there were changes to how exceptions within functions were created. For details see: https://github.com/rhaiscript/rhai/blob/7e0ac9d3f4da9c892ed35a211f67553a0b451218/CHANGELOG.md?plain=1#L12
We've modified how we handle errors raised by Rhai to comply with this change, which affects error message output: errors in functions no longer report which function the error occurred in.
Making this change allows us to keep up with the latest version (1.19.0) of Rhai.
By @garypen in #5655
Add version in the entity cache hash (PR #5701)
The hashing algorithm of the router's entity cache has been updated to include the entity cache version.
[!IMPORTANT]
If you have previously enabled entity caching, you should expect additional cache regeneration costs when updating to this version of the router while the new hashing algorithm comes into service.
By @bnjjj in #5701
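To see why existing entries must be regenerated, consider the sketch below. The key layout (version, subgraph, type name, representation) is a hypothetical illustration, not the router's actual format: once the version is part of the hashed input, keys written before the change no longer match, so those lookups miss until the cache is repopulated.

```python
import hashlib
import json


def entity_cache_key(version: int, subgraph: str, typename: str, representation: dict) -> str:
    """Hash a (hypothetical) cache version together with the entity's identity."""
    payload = json.dumps(
        {
            "version": version,
            "subgraph": subgraph,
            "typename": typename,
            "representation": representation,
        },
        sort_keys=True,  # stable serialization so equal inputs hash equally
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```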
Improve testing by avoiding cache effects and redacting tracing details (PR #5638)
We've had some problems with flaky tests and this PR addresses some of them.
The router executes in parallel and concurrently. Many of our tests use snapshots to try and make assertions that functionality is continuing to work correctly. Unfortunately, concurrent/parallel execution and static snapshots don't co-operate very well. Results may appear in pseudo-random order (compared to snapshot expectations) and so tests become flaky and fail without obvious cause.
The problem becomes particularly acute with features which are specifically designed for highly concurrent operation, such as batching.
This set of changes addresses some of the router testing problems by avoiding cache effects and redacting tracing details.
By @garypen in #5638
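The general technique for making snapshot assertions stable under concurrency can be sketched as: redact values that change on every run, then sort results so completion order doesn't matter. This is a hypothetical Python illustration, not the PR's test code, and the field names in `VOLATILE_FIELDS` are assumptions.

```python
import json

# Hypothetical fields whose values differ on every run.
VOLATILE_FIELDS = {"trace_id", "span_id", "duration_ms"}


def redact(value):
    """Recursively replace volatile fields with a fixed placeholder."""
    if isinstance(value, dict):
        return {
            k: "[redacted]" if k in VOLATILE_FIELDS else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(v) for v in value]
    return value


def normalize(results: list) -> list:
    """Redact volatile fields, then sort into a deterministic order for snapshots."""
    redacted = [redact(r) for r in results]
    return sorted(redacted, key=lambda r: json.dumps(r, sort_keys=True))
```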
📚 Documentation
Update router naming conventions (PR #5400)
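We've renamed our router product to distinguish between our non-commercial and commercial offerings. Instead of referring to the Apollo Router, we now refer to the following:
- Apollo Router Core, the non-commercial offering
- GraphOS Router, the commercial offering, which is based on Apollo Router Core and fully integrated with GraphOS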
By @shorgi in #5400
🧪 Experimental
Enable Rust-based API schema implementation (PR #5623)
The router has transitioned to solely using a Rust-based API schema generation implementation.
Previously, the router used a JavaScript-based implementation. After testing it for a few months, we've validated the improved performance and robustness of the new Rust-based implementation, so the router now uses it exclusively.
By @goto-bus-stop in #5623