use spawn_blocking for parsing by Geal · Pull Request #5582 · apollographql/router

Geal · 2024-07-02T08:36:19Z

This leverages the work that @xuorig started in #5235 with a number of necessary follow-up fixes to get it into the shape we'll need to land it.

I have checked manually that the span duplication in the snapshot has no impact on traces reported to aggregators; it is more an artifact of how we gather spans in the test.


router-perf · 2024-07-02T08:36:51Z

.changesets/fix_spawn_blocking_parser.md

Geal · 2024-07-02T17:24:01Z

It looks like there's some flakiness in the tests with the snapshots: https://app.circleci.com/pipelines/github/apollographql/router/23631/workflows/0230bc91-f65c-47a1-ae33-6128fae26325/jobs/165914?invite=true#step-117-361965_43
@garypen, if I cannot fix that tomorrow, can you take over this PR?

abernix · 2024-07-05T14:52:44Z

There is indeed something ongoing in the tests. @BrynCooke did try to look at this today, but will not be around for the next couple of weeks, and I'm not sure they will get back to it.

I will highlight that there is still conversation that wasn't answered from the original PR that might be worth discussing:

@xuorig wrote:

Few other questions:

Is specific back pressure for the spawn blocking needed at this level? I'm thinking this is fine for now, back pressure can happen as a concurrency limiter / rate limiter at ingress.

Would a wait map similar to planning make sense eventually here?
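To make the wait-map question concrete, here is a minimal std-only sketch. It assumes "wait map" means a structure in which concurrent requests for the same query wait on a single in-flight parse instead of each parsing it again. `WaitMap`, `get_or_parse`, and `expensive_parse` are hypothetical names for illustration, not the router's actual types.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical stand-in for an expensive parse: counts whitespace-separated tokens.
fn expensive_parse(query: &str) -> usize {
    query.split_whitespace().count()
}

/// Hypothetical wait map: one shared slot per distinct query string.
struct WaitMap {
    slots: Mutex<HashMap<String, Arc<OnceLock<usize>>>>,
    /// Number of times the expensive parse actually ran.
    parses: AtomicUsize,
}

impl WaitMap {
    fn new() -> Self {
        WaitMap {
            slots: Mutex::new(HashMap::new()),
            parses: AtomicUsize::new(0),
        }
    }

    fn get_or_parse(&self, query: &str) -> usize {
        // Hold the map lock only long enough to find or create the slot.
        let slot = {
            let mut slots = self.slots.lock().unwrap();
            slots.entry(query.to_string()).or_default().clone()
        };
        // The first caller runs the closure. Concurrent callers for the same
        // query block here until it finishes, then read the stored result.
        *slot.get_or_init(|| {
            self.parses.fetch_add(1, Ordering::SeqCst);
            expensive_parse(query)
        })
    }
}
```

With several threads calling `get_or_parse` on the same query, the parse runs once and every caller gets the same result. A real implementation would also need eviction and error handling, which this sketch leaves out.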


Geal · 2024-07-09T17:38:18Z

Thanks @garypen for finishing that one

abernix · 2024-07-10T18:50:49Z

This PR is causing some tests to flake at a somewhat pronounced rate. We believe the problem is actually just in the tests, or in the way we tested, but we'll revert this for now.

This reverts commit 47a7386.

The parsing and validation step is a blocking task that can be expensive on large queries. When it runs on a tokio executor thread, that thread is unavailable to handle other asynchronous requests in the meantime. If many large queries arrive at once, they can occupy all of the executor threads, and the router then stops handling traffic.

This moves parsing and validation into tokio's blocking task pool, a set of threads dedicated to blocking tasks that exposes an async interface to the rest of the code. Executor threads can offload parsing there, then return to the rest of the traffic while the query is parsed. If too many large queries arrive at once, the blocking pool may be temporarily saturated, but this does not affect traffic for queries that were already parsed and planned, and the request handling timeout can still trigger if a request waits too long for its query to be parsed.

Co-authored-by: Marc-Andre Giroux <mgiroux@netflix.com>
Co-authored-by: Marc-Andre Giroux <mgiroux0@gmail.com>
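The offloading idea in this commit message can be modeled with the standard library alone. The sketch below is not the router's code and does not use tokio: it swaps in `std::thread::spawn` and a channel to show the shape of "hand the blocking parse to another thread and collect the result later". `parse_and_validate` and `parse_offloaded` are hypothetical names, and the brace-depth check is only a stand-in for real GraphQL parsing.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for the blocking parse-and-validate step:
/// returns the maximum brace nesting depth, or an error if braces are unbalanced.
fn parse_and_validate(query: &str) -> Result<usize, String> {
    let mut depth = 0usize;
    let mut max_depth = 0usize;
    for c in query.chars() {
        match c {
            '{' => {
                depth += 1;
                max_depth = max_depth.max(depth);
            }
            '}' => {
                depth = depth
                    .checked_sub(1)
                    .ok_or_else(|| "unbalanced '}'".to_string())?;
            }
            _ => {}
        }
    }
    if depth == 0 {
        Ok(max_depth)
    } else {
        Err("unclosed '{'".to_string())
    }
}

/// Runs the blocking parse on a separate thread and returns a receiver for
/// the result, so the caller is free to do other work in the meantime.
/// Assumption: this spawns a fresh thread per call, whereas a real blocking
/// pool reuses a bounded set of threads.
fn parse_offloaded(query: String) -> mpsc::Receiver<Result<usize, String>> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Ignore the send error: it only means the caller stopped waiting.
        let _ = tx.send(parse_and_validate(&query));
    });
    rx
}
```

A caller would hold on to the receiver and call `recv()` (or `recv_timeout` to model the request timeout) once it needs the result. In an async tokio application the equivalent step is `tokio::task::spawn_blocking`, which takes a closure and returns a handle that can be awaited.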

Marc-Andre Giroux and others added 14 commits May 24, 2024 10:09

use std mutex for query analysis cache, use spawn_blocking for parsing

67c9361

changeset

98908a1

fmt

8e5e47f

Merge branch 'dev' into spawn-blocking-parser

5ce278e

dont change mutex yet

70f1f1b

refactor, spawn_blocking in warm up as well

b4625cc

lint

df8a6b6

Merge branch 'dev' into spawn-blocking-parser

1aee208

Merge branch 'dev' into spawn-blocking-parser

128c55c

Merge branch 'dev' into spawn-blocking-parser

a7f1cc2

Merge branch 'dev' into spawn-blocking-parser

9e6ea3e

lint

c3493e9

Merge branch 'dev' into spawn-blocking-parser

1886e58

update snapshots

1c6df83

Geal requested a review from a team July 2, 2024 08:36

Geal requested a review from a team as a code owner July 2, 2024 08:36

apollo-bot2 assigned Geal Jul 2, 2024

Geal commented Jul 2, 2024

.changesets/fix_spawn_blocking_parser.md (outdated, resolved)

Update .changesets/fix_spawn_blocking_parser.md

af60e73

Geal commented Jul 2, 2024

.changesets/fix_spawn_blocking_parser.md (outdated, resolved)

Update .changesets/fix_spawn_blocking_parser.md

6a994d5

garypen approved these changes Jul 2, 2024


check snapshots again

f0fed76

Geal enabled auto-merge (squash) July 2, 2024 09:47

Geal mentioned this pull request Jul 5, 2024

use spawn_blocking for parsing #5235

Closed

Gary Pennington added 2 commits July 9, 2024 14:37

Changing span processing from sync to async requires special handling

8836e0e

As per the extensive commentary here: https://docs.rs/tracing/latest/tracing/span/struct.Span.html#method.or_current. Let's see if this fixes things.

revert snapshot changes

0c5d696

They should be the same as in dev.

Gary Pennington added 3 commits July 9, 2024 15:27

Merge branch 'dev' into spawn-blocking-parser

b1389e3

Try to cleanup snapshots

f08769f

Not sure exactly what is happening here, but ...

Another try at fixing the snapshot inconsistency

996914c

This might be the solution I've been looking for. Make sure to create the span outside of the call to spawn_blocking.

Geal merged commit 47a7386 into dev Jul 9, 2024

Geal deleted the spawn-blocking-parser branch July 9, 2024 16:02

abernix added a commit that referenced this pull request Jul 10, 2024

Revert "use spawn_blocking for parsing (#5582)"

fe204fa

This reverts commit 47a7386.

This was referenced Jul 10, 2024

Revert "use spawn_blocking for parsing" #5643

Merged

Reintroduce "use spawn_blocking for parsing" #5644

Merged

Geal merged 22 commits into dev from spawn-blocking-parser


4 participants
