Provide a backpressure enabled pipeline#6486
Merged
Conversation
There's a long way to go, but some interesting aspects of the required changes are implemented here. I noted that a significant number of errors and glitches and tests are resolved by making these changes. Mainly places where we were calling a service without readying it first or places where we were cloning an inner service without doing the memory "replace" dance. Because we are using a "trick" to make the router service cloneable right now, there are a few tests which don't work properly. I think for the "full" work, we'll need to make the router service properly cloneable (without requiring a mutex). This will require some fairly substantial re-working of a wide variety of services and layers. On the plus side, once we've done that work we'll be able to retire a bunch of code that we've written that we will no longer require. I'll pick this up in the New Year...
Collaborator
✅ Docs preview readyThe preview is ready to be viewed. View the preview File Changes 0 new, 1 changed, 0 removedBuild ID: c58fa85d7d545d7bab96c675 URL: https://www.apollographql.com/docs/deploy-preview/c58fa85d7d545d7bab96c675 |
|
CI performance tests
|
That test is just old on dev/next, so fixing it makes sense Also: add a note to coprocessor test changes to remind me that I need to understand what is happening there before this branch can merge.
re-order use statements to keep cargo fmt happy
To see how far off passing we are in CI
IMPORTANT: This change modifies the supergraph method invocation test to be a router_service service invocation test. Amongst other important details (such as we are now really testing the service through the full pipeline), it's important to note that we can't use `oneshot()` and just re-create the service every time we want to call it. If we do, then the rate limiting details are lost. So, we must re-use our service to make sure that state isn't lost.
Also: - update snapshot so it knows about concurrency - replace a bad supergraph test with a broken router test - re-order traffic shaping so that timeouts > concurrency > rate-limit
The code added yesterday was creating errors as data and using numeric codes (rather than the magic strings in 1.x). Re-instated the 1.x behaviour for reporting errors and also fixed the new timeout test.
Since we now fail traffic shaping tests at the router_service, they are not counted as graphql_errors (which are only processed, as they should be at the supergraph_service). IMO, these should never have been counted as graphql errors anyway, since they clearly aren't graphql errors but are traffic shaping (rate limit, timeout, etc...) errors. We'll still report them to the user as a 504 or a 503 or whatever, but they won't count towards the graphql_error metric. I've also updated a snapshot to reflect the error message we now provide.
Not required for GA. Implement later as a separate project.
bnjjj
reviewed
Jan 13, 2025
added 7 commits
January 13, 2025 17:59
This breaks all the http_post_mutation tests because of changes in expectation.
The AsyncCheckpoint layer was using `oneshot` and wasn't calling the prepared service. I've fixed that. This affects some of the tests, so I've fixed them as well.
Make it pass until subgraph rate limiting is changed. We'll need to update the test agains at that point.
The various tests in the limits layer should be readying their service before they call it. The deduplication function (dedup) accepts a readied service and then used ready_oneshot(). Since it can only accept a readied service, modify this to be a simple call() invocation.
added 5 commits
January 23, 2025 08:47
A bit fiddly to find and fix, so I only just spotted it.
I've left the `fake` "builders" taking an Option and using String::default() if None is supplied. This seems like a nice compromise.
BrynCooke
reviewed
Jan 24, 2025
BrynCooke
requested changes
Jan 24, 2025
apollo-router/src/services/layers/allow_only_http_post_mutations.rs
Outdated
Show resolved
Hide resolved
You can't Arc Batching in RouterService. It contains request specific state about the progress of a Batch and is broken if shared across multiple services. I didn't correctly poll the entity cache inner service. This is now fixed.
It's simpler to understand and fits in well with the Tower method. It may improve performance as well. also: Add a big comment to DEFAULT_BUFFER_SIZE to try and explain what it represents.
Put some basic details into the migration guide to be enhanced as required before release. also: Update the traffic shaping docs for concurrency.
This now just uses http extensions to store the control. Test added for service reuse, and limits dynamic update fixed.
Make sure the migration docs references the backpressure PR.
…vice_closed` test This appears to test an intended-enough behaviour. Originally introduced: #918 Comment outdated as of: #1440 This is a behaviour change compared to #1440, but the intended behaviour from #918. What's a bit worrying is that this depends on the internal composition of our pipeline, but is externally observable?
Move setting of if there was a graphql error into the router and supergraph builders. This will enable users to use on_graphql_error for all cases where we return an error.
BrynCooke
approved these changes
Jan 27, 2025
added 2 commits
January 27, 2025 15:08
apollo-router/src/plugins/telemetry/config_new/spans.rs The deleted line at line:443 caused the following test to fail. plugins::telemetry::config_new::spans::test::test_router_request_custom_attribute_on_graphql_error I've restored that line and the test is now passing.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Most of the functionality is now in place for effective traffic-shaping at the router and subgraph levels.
Please read the review guidance before starting your review.
dev-docs/BACKPRESSURE_REVIEW_NOTES.md