Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

supergraph compose should call SubgraphFetchQuery only one time, not once for every subgraph #992

Closed
theJC opened this issue Feb 10, 2022 · 11 comments · Fixed by #1000, #1056, #1063, #1079 or #1392
Assignees

Comments

@theJC
Copy link

theJC commented Feb 10, 2022

Description

It appears Rover supergraph compose makes a query SubgraphFetchQuery call for each subgraph, but the call gets all schemas for all subgraphs. Our supergraph in one of our environments has 70 subgraphs in it, and if I run the rover supergraph compose command with --log trace I see 70 calls with:

DEBUG rover_client::blocking::client: Request Body: {"variables":{"graph_id":"indeed-onegraph","variant":"qa"},"query":"query SubgraphFetchQuery($graph_id: ID!, $variant: String!) {\n service(id: $graph_id) {\n implementingServices(graphVariant: $variant) {\n __typename\n ... on FederatedImplementingServices {\n services {\n name\n url\n activePartialSchema {\n sdl\n }\n }\n }\n }\n }\n}\n","operationName":"SubgraphFetchQuery"}

And the command takes close to 50 seconds to complete.

Steps to reproduce

Cannot share exact supergraph yaml file being used as it contains specifics about our environment that I cannot share.

Expected result

Only one call to call the SubgraphFetchQuery should be made per execution of the rover supergraph compose command and it should take < 5 seconds to the command to complete.

Actual result

It took 50 seconds to accomplish and made 70 SubgraphFetchQuery calls

Environment

Run rover info and paste the results here
Rover Info:
Version: 0.4.1
Install Location: /Users/jchristiansen/.rover/bin/rover
OS: Mac OS 12.2.0 [64-bit]
Shell: /bin/zsh

@theJC theJC added bug 🐞 triage issues and PRs that need to be triaged labels Feb 10, 2022
@theJC theJC changed the title supergraph compose should call SubgraphFetchQuery just once, not once for every subgraph supergraph compose should call SubgraphFetchQuery only one time, not once for every subgraph Feb 11, 2022
@EverlastingBugstopper
Copy link
Contributor

Hey @theJC, thanks for the report. I agree that this behavior is not ideal and there are many reasons to improve this.

Unfortunately, I don't think the Studio API currently supports fetching more than one subgraph schema with a given query so putting everything into a single query is blocked until that support comes.

We are talking about introducing a .rover "project" directory that might have some sort of "graph_modules" subdirectory akin to "node_modules" that would have copies of all of the resolved subgraphs. In order to not have to redownload that every time we'd need to store the history somehow, either with something like an etag or a better hash that we can get from the API.

Unfortunately I don't think this will be solved in the near future as we are still in design phase for this.

As a workaround for now until we have first-class support for this, you may be able to roll your own sorta hack around it if you write an "update" script that loops through the output of rover subgraph list, and then uses those subgraph names to run rover subgraph fetch and output them to a local directory. If you then refer to those local subgraph schemas in your supergraph.yml, it won't redownload the subgraph schemas, it'll read from what you have locally. You can then run your update script manually and while it'll be doing the same exact thing as rover supergraph compose that is inefficient and may take a while, it won't run on every composition which should speed up your iteration process.

Hope this helps!

@theJC
Copy link
Author

theJC commented Feb 11, 2022

@EverlastingBugstopper -- The response coming back a single execution of that query CONTAINS ALL of the subgraphs and their activePartialSchemas, it isn't returning just a single individual subgraph each time its called.

query SubgraphFetchQuery($graph_id: ID!, $variant: String!) {
  service(id: $graph_id) {
    implementingServices(graphVariant: $variant) {
      __typename
      ... on FederatedImplementingServices {
        services {
          name
          url
          activePartialSchema {
            sdl
          }
        }
      }
    }
  }
}

@EverlastingBugstopper
Copy link
Contributor

AH! You're so right here! Ok wow this is an expensive query. It seems like the opposite of what I previously said is true, you can only fetch all subgraphs from a supergraph. for your case, these are all likely coming from the same graph, but it's feasible that you could be creating a supergraph from subgraphs that are deployed across multiple graphs or multiple variants. so, we need to query for all of those. however we could definitely go through and find all of the subgraphs we need and then batch those requests so that one request is performed per supergraph variant.

thanks for the report on this, we'll want to get this fixed up for sure.

@theJC

This comment was marked as off-topic.

@EverlastingBugstopper

This comment was marked as off-topic.

@theJC
Copy link
Author

theJC commented Mar 4, 2022

@EverlastingBugstopper -- Can we get this being fixed removed from: #1000 -- it looks like it causes the top of this page show that its been fixed.

Also, can you update this ticket on status on if any progress is stalled out on this due to blockers on the Apollo studio API front? Any ticket you can point to to identify that work would be helpful.

@EverlastingBugstopper
Copy link
Contributor

@theJC you're totally right - we made an improvement but not an actual fix for this. Re-opening and we'll take a look.

@theJC
Copy link
Author

theJC commented Oct 14, 2022

@EverlastingBugstopper -- Note, this is still showing up as closed.

@theJC
Copy link
Author

theJC commented Oct 14, 2022

FYI, optimizations here would provide significant developer experience lift:

Our QA env now consists of 149 subgraphs
Our PROD env now consists of 121 subgraphs

Here's our daily p95 and max for the past 60 days of our "subgraph overlay" activity in QA which consists of doing one rover subgraph list command, writing out of a yaml file based off it, and then one rover subgraph compose command:

Units are elapsed seconds:

image

@EverlastingBugstopper
Copy link
Contributor

🤔 So, looking into this it seems like each of those queries is fetching one subgraph schema at a time and also fetching the names of all the subgraphs, like so:

query SubgraphFetchQuery($graphId: ID!, $variant: String!, $subgraphName: ID!) {
  graph(id: $graphId) {
    variant(name: $variant) {
      subgraph(name: $subgraphName) {
        name
        url
        activePartialSchema {
          sdl
        }
      }
      subgraphs {
        name
      }
    }
  }
}

Do you think the fix is as simple as removing the section that requests the names of all the subgraphs? I could imagine a world where we perform a second query after the initial one if no subgraph with that subgraphName was found in order to provide the nice "did you mean to type this subgraph" error message, but I'm not sure if that alone would fix the issue here.

One thing we could do here is to parallelize the requests being made in order to get some short term speed improvements and better CPU utilization.

@EverlastingBugstopper
Copy link
Contributor

Looked into this a bit more today - removing the error handling with the list of subgraph names does not seem to meaningfully impact the time to config resolution.

EverlastingBugstopper added a commit that referenced this issue Nov 14, 2022
# [0.10.0] - 2022-11-10

> Important: 1 potentially breaking change below, indicated by **❗
BREAKING ❗**

## ❗ BREAKING ❗

- **Fix implementation of `--header` argument - @EverlastingBugstopper,
#1369 fixes #1365**

This change tightens up usage of the `--header` argument used for
`introspect` commands by disallowing previously valid (but undocumented)
usage like this: `--header "Header-1: value" "Header-2: value"`. After
this change, you _must_ conform to what we have in the documentation,
which indicates separate instances of the `--header` argument for each
header, like so: `--header "Header-1: value" --header "Header-2:
value"`.

## 🚀 Features

- **Provide prebuilt binaries for ARM devices - @EverlastingBugstopper,
#1356 fixes #582**

As of this release, [`rover.apollo.dev`](https://rover.apollo.dev)
delivers prebuilt binaries from our GitHub release for ARM devices. Most
notably this means that Docker on M1 devices should work out of the box.
You should be able to replace any custom builds in your tooling pipeline
with a call to the [official curl
installer](https://www.apollographql.com/docs/rover/getting-started/#linux--macos-installer).

- **Report downstream check task results - @sachindshinde, #1385**

When running `rover subgraph check` commands, if the proposed schema
would cause downstream failures (i.e. with contracts), those failures
are now reported in the check response.

- **Faster `rover supergraph compose` - @EverlastingBugstopper, #1392
fixes #992**

Rover now resolves all subgraph schemas in parallel when running `rover
supergraph compose` on a `supergraph.yaml` file. This should improve the
speed to compose large supergraphs significantly. This change also
drastically improves error handling by reporting _all_ issues with
resolving subgraph schemas (and informing you which schema(s) failed to
resolve) rather than exiting on the first failed schema resolution.

- **Add `--polling-interval` to `rover dev` - @patrick91, #1377 fixes
#1221**

You can now set `--polling-interval` when running `rover dev` to change
the frequency of introspection poll requests for subgraphs that don't
provide the schema from the file system with the `--schema` argument.

- **Adds `--skip-update-check` to skip the once-per-day update check -
@Tsing, #1396 fixes #1394**

Once per day, Rover checks if there is a new version available for
update and notifies the user if there is. There is now a flag you can
pass to disable this check: `--skip-update-check`.

- **Respect the `NO_COLOR` environment variable - @chnn, #1360**

`rover` will not use color in any output when executed with the
`NO_COLOR` environment variable set to `true`.

## 🛠 Maintenance

- **Updates from clap v3 to clap v4 - @EverlatingBugstopper, #1404 fixes
#1400**

This release updated the command line argument parsing library to major
version 4. There should be no noticeable compatibility issues with this
update, only lighter binaries. The look and feel of the main `rover
--help` output has changed to a neutral color palette along with this
change.

- **Updates Rust to 1.65.0 - @EverlastingBugstopper, #1399**

- **Updates node.js to v18 - @renovate, #1389**

- **Updates node dev-dependencies - @renovate, #1204 and zs#1398**

- **Remove dependency on the `saucer` crate - @EverlastingBugstopper,
#1402**

- **Updates `introspector-gadget` to 0.2.0 - @EverlastingBugstopper,
#1386**

- **Only cache dependencies in CI, not whole `/target` -
@EverlastingBugstopper, #1387**

- **Use `engine@main` instead of `engine@current` to fetch the API
schema - @EverlastingBugstopper, #1368**

- **Use `lychee` as a link checker instead of npm - @ptondereau, #1328
fixes #1306**

We now use a Rust-based link checker to check the links in the Rover
repository instead of a node-based link checker (that was much more
flaky).

- **Describe latest federation versions in
`./latest_plugin_versions.json` - @EverlastingBugstopper, #1363**

When you run `rover supergraph compose`, the latest version of
composition is automatically downloaded to your machine, these latest
version numbers are now stored in `./latest_plugin_versions.json` in the
Rover repo.

- **Rename `apollo-` headers to `apollographql-` headers - @jsegaran,
#1411**

- **Update npm to v9 - @renovate, #1412**

## 📚 Documentation

- **Update studio algolia key to graphos - @trevorblades, #1384**

- **Fix some broken links - @StephenBarlow, #1376**

- **Fix a typo in the migration guide instructing the use of `check`
instead of `publish` - @EverlastingBugstopper, #1364 fixes #1361**
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment