Skip to content

prep release: v1.59.2#6672

Merged
abernix merged 2 commits into1.59.2from
prep-1.59.2
Jan 29, 2025
Merged

prep release: v1.59.2#6672
abernix merged 2 commits into1.59.2from
prep-1.59.2

Conversation

@abernix
Copy link
Member

@abernix abernix commented Jan 28, 2025

Note

When approved, this PR will merge into the 1.59.2 branch which will — upon being approved itself — merge into main.

Things to review in this PR:

  • Changelog correctness (There is a preview below, but it is not necessarily the most up to date. See the Files Changed for the true reality.)
  • Version bumps
  • That it targets the right release branch (1.59.2 in this case!).

Important

This release contains important fixes which address resource utilization regressions which impacted Router v1.59.0 and v1.59.1. These regressions were in the form of:

  1. A small baseline increase in memory usage; AND
  2. Additional per-request CPU and memory usage for queries which included references to abstract types with a large number of implementations

If you have enabled Distributed query plan caching, this release contains changes which necessarily alter the hashing algorithm used for the cache keys. On account of this, you should anticipate additional cache regeneration cost when updating between these versions while the new hashing algorithm comes into service.

🐛 Fixes

Improve performance of query hashing by using a precomputed schema hash (PR #6622)

The router now uses a simpler and faster query hashing algorithm with more predictable CPU and memory usage. This improvement is enabled by using a precomputed hash of the entire schema, rather than computing and hashing the subset of types and fields used by each query.

For more details on why these design decisions were made, please see the PR description

By @IvanGoncharov in #6622

Fix increased memory usage in sysinfo since Router 1.59.0 (PR #6634)

In version 1.59.0, Apollo Router started using the sysinfo crate to gather metrics about available CPUs and RAM. By default, that crate uses rayon internally to parallelize its handling of system processes. In turn, rayon creates a pool of long-lived threads.

In a particular benchmark on a 32-core Linux server, this caused resident memory use to increase by about 150 MB. This is likely a combination of stack space (which only gets freed when the thread terminates) and per-thread space reserved by the heap allocator to reduce cross-thread synchronization cost.

This regression is now fixed by:

  • Disabling sysinfo’s use of rayon, so the thread pool is not created and system processes information is gathered in a sequential loop.
  • Making sysinfo not gather that information in the first place since Router does not use it.

By @SimonSapin in #6634

@svc-apollo-docs
Copy link
Collaborator

svc-apollo-docs commented Jan 28, 2025

⚠️ Docs preview not attached to branch

The preview was not built because the PR's base branch 1.59.2 is not in the list of sources.

An Apollo team member can comment one of the following commands to dictate which branch to attach the preview to:

  • !docs set-base-branch dev
  • !docs set-base-branch 1.x

Build ID: ab44ae95a43d262aec4ae808

@abernix abernix merged commit d623447 into 1.59.2 Jan 29, 2025
2 checks passed
@abernix abernix deleted the prep-1.59.2 branch January 29, 2025 12:02
@router-perf
Copy link

router-perf bot commented Jan 31, 2025

CI performance tests

  • connectors-const - Connectors stress test that runs with a constant number of users
  • const - Basic stress test that runs with a constant number of users
  • demand-control-instrumented - A copy of the step test, but with demand control monitoring and metrics enabled
  • demand-control-uninstrumented - A copy of the step test, but with demand control monitoring enabled
  • enhanced-signature - Enhanced signature enabled
  • events - Stress test for events with a lot of users and deduplication ENABLED
  • events_big_cap_high_rate - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity
  • events_big_cap_high_rate_callback - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity using callback mode
  • events_callback - Stress test for events with a lot of users and deduplication ENABLED in callback mode
  • events_without_dedup - Stress test for events with a lot of users and deduplication DISABLED
  • events_without_dedup_callback - Stress test for events with a lot of users and deduplication DISABLED using callback mode
  • extended-reference-mode - Extended reference mode enabled
  • large-request - Stress test with a 1 MB request payload
  • no-tracing - Basic stress test, no tracing
  • reload - Reload test over a long period of time at a constant rate of users
  • step-jemalloc-tuning - Clone of the basic stress test for jemalloc tuning
  • step-local-metrics - Field stats that are generated from the router rather than FTV1
  • step-with-prometheus - A copy of the step test with the Prometheus metrics exporter enabled
  • step - Basic stress test that steps up the number of users over time
  • xlarge-request - Stress test with 10 MB request payload
  • xxlarge-request - Stress test with 100 MB request payload

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants