
feat(router): entity caching with L1/L2, shadow mode, and analytics#2677

Closed
jensneuse wants to merge 3166 commits into main from jensneuse/entity-caching

Conversation

@jensneuse
Member

@jensneuse commented Mar 23, 2026

Summary

  • Entity-level caching for federated GraphQL: caches individual entities by @key fields at the router level, so a Product cached by one query is reused by every query that needs it — regardless of query shape or subgraph
  • Two-layer architecture: L1 (per-request dedup via sync.Map) eliminates redundant entity fetches within a single request; L2 (Redis or in-memory ristretto) shares cached entities across requests with per-entity TTLs via @entityCache(maxAge)
  • Shadow mode for safe rollout: cache reads/writes happen but cached data is never served — compares cached vs fresh responses field-by-field to measure staleness before going live
  • Analytics pipeline: entity cache hit/miss/stale metrics exported via OpenTelemetry for observability dashboards
  • Circuit breaker on L2: prevents cascade failures when Redis is degraded
  • Composition support: new @entityCache and @entityCacheTTL directives validated and propagated through the composition layer
  • ~40K lines across composition (TypeScript), engine (graphql-go-tools), router (Go), and proto layers

Companion PR: graphql-go-tools #1259

Documentation

All entity caching docs live in docs/entity-caching/:

Document Description
engineering-brief.md Full engineering brief — architecture, design decisions, review focus areas
directives.md @entityCache and @entityCacheTTL directive definitions and usage
configuration.md Router config reference (L1/L2, circuit breaker, env vars)
analytics.md Entity cache analytics and observability
analytics-rfc.md RFC for the entity cache analytics pipeline

Start with the engineering brief for a full overview of the design and review focus areas.

Test plan

  • Composition directive validation tests (composition/tests/v1/directives/entity-caching.test.ts)
  • Router entity cache unit tests (router/core/executor_entity_cache_test.go)
  • Entity cache package tests (memory, redis, circuit breaker)
  • Integration tests (router-tests/entity_caching/)
  • Standard subgraph entity caching integration tests
  • Shadow mode correctness verification
  • CI green on all platforms

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Entity caching: directives for object/root/mutation caching with L1 (per-request) and L2 (memory/Redis), per-subgraph overrides, shadow mode, partial loads, and circuit-breaker handling
    • Request-scoped field deduplication via request-scoped keys
    • Playground: cache-mode controls, cache badges, cache details/keys viewer, Cache Explorer UX
    • Cache analytics: snapshotting and observable metrics; demo & benchmark tooling
  • Documentation

    • New RFCs, guides, and engineering briefs covering entity caching, directives, config, analytics, and demos
  • Tests

    • Extensive unit/integration suites and router tests covering caching scenarios and backends

hardworker-bot and others added 30 commits February 21, 2026 21:35
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
 - aws-lambda-router@0.39.2
 - router@0.285.0
 - cdn@0.16.2
 - controlplane@0.191.3
 - keycloak@0.11.3
 - wgc@0.105.5
 - @wundergraph/cosmo-connect@0.131.0
 - controlplane@0.192.0
 - keycloak@0.12.0
 - @wundergraph/protographic@0.16.0
 - router@0.286.0
 - @wundergraph/cosmo-shared@0.43.4
 - studio@0.155.1
The Cosmo signup page can now be controlled with the uc param to display corresponding content, e.g. uc=apollo to address users reacting to an apollo user campaign.
Co-authored-by: Ludwig Bedacht <ludwig.bedacht@gmail.com>
 - controlplane@0.192.1
 - studio@0.156.0
 - wgc@0.105.6
 - @wundergraph/composition@0.51.2
 - controlplane@0.192.2
 - @wundergraph/protographic@0.16.1
 - @wundergraph/cosmo-shared@0.43.5
 - studio@0.156.1
Co-authored-by: Dustin Deus <deusdustin@gmail.com>
 - wgc@0.105.7
 - @wundergraph/cosmo-connect@0.132.0
 - controlplane@0.193.0
 - @wundergraph/protographic@0.16.2
 - @wundergraph/cosmo-shared@0.43.6
 - studio@0.157.0
 - router@0.286.1
 - studio@0.158.0
 - controlplane@0.193.1
Resolve conflicts: keep local entity-caching changes (graphql-go-tools rc.270 with
@requestScoped, EntityCachingConfiguration, MemoryStorageProvider, normalization
factory typing) and merge upstream additions (CostControl ValidateSliceArguments,
StorageProvider envPrefix, PQLManifestConfig tests, Cosmo Connect args + @requires).

Regenerate proto bindings (make generate-go), composition-go bundle, and demo
plugins (cd demo && make plugin-build-ci). Add staticcheck.conf to remaining
entity_caching subgraphs to silence SA4004 in gqlgen-generated federation code.
Apply gofmt to attributes.go and router_entity_cache_test.go.

Includes a regression test (TestEntityCaching) that asserts @cachePopulate must
not write to L2 when router L2 caching is disabled.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jensneuse added a commit that referenced this pull request Apr 19, 2026
See TODO.md for the per-item resolution log.

Highlights:
- Namespace 6 new entity-caching directives as @openfed__ (Aenimus).
- Proto field rename negative_cache_ttl_seconds -> not_found_cache_ttl_seconds
  (field #6 preserved, old name reserved) + TTL-semantics field comments.
- Composition correctness: nested-key mapping iterates all + carries
  correct isBatch on list returns; renamed root types attach cache config
  under the canonical root; TypeScript hygiene (branded aliases, Array<T>,
  ordinal helper, alphabetical constants, reuse isTypeNodeListType).
- Router: default cache stored under both "default" and its provider_id so
  overrides reuse the instance; executor nil-safety + skip+warn on unknown
  subgraph ID; buildRedisClient rename; logEntityCacheOverrideIssues
  rename; metrics source labels via CacheOperationSource; otel attribute
  constants; config.schema.json memory max_size bytes-string format.
- Test truthfulness: 10 router-tests subtests now assert counters.* and/or
  cache.Len(); 9 of 10 fail with L2 disabled (was 0). negative_cache tests
  redesigned with mutation-created IDs; items subgraph mutations persist
  and emit events. Circuit breaker concurrent tests; redis post-close Set
  and Delete coverage; memory concurrency test error propagation;
  ristretto bounded assertion; post-close len guard.
- New unit coverage: entity_cache_metrics_test.go (0% -> ~94%),
  graphql_handler caching-options tests, router_entity_cache_test.go
  covering default-cache reuse via require.Same.
- Playground: functional setHeaders (no stale closure), Math.max zero
  duration clamp, controlled Tabs.
- Docs: PII redaction on raw keys, namespace sweep across all references,
  ENTITY_CACHING_DEMO SDL refresh, CLAUDE.md cheatsheet + code-block tags.

Flagged follow-up: @openfed__cachePopulate writes still land in L2 when
L2.Enabled=false. Read path correctly gates so user-visible impact is
wasted writes only. Fix location: cache-populate write path must respect
EntityCachingConfiguration.L2.Enabled.

Verified:
- composition: 1194 tests pass.
- router, router-tests: -race clean.
- router-tests/entity_caching: 63/64 pass (1 pre-existing skip).
- cache-demo e2e: every documented query + all three
  X-WG-Disable-Entity-Cache* header gates behave as documented.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jensneuse added a commit that referenced this pull request Apr 19, 2026
TODO.md - per-item resolution for every Aenimus / SkArchon / Noroth /
CodeRabbit review comment: what was done, where disagreements stand,
what is deferred, and the final verification matrix (composition /
router / router-tests / cache-demo e2e all green).

FEEDBACK.md - raw extracted feedback (47 human + 29 bot inline comments,
review-level summaries, and other signals) captured for traceability so
reviewers can cross-reference disposition entries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Dependency Review

The following issues were found:

  • ❌ 2 vulnerable package(s)
  • ⚠️ 3 packages with OpenSSF Scorecard issues.

View full job summary

@github-actions

graphqlmetrics - uncommitted changes detected

Seems like you forgot to commit some code. Possible causes:

  • Generated code not part of the PR, fix with: make generate and commit the changes
  • Dependency mismatch for tools (protoc, etc). Ensure your local machine has the same versions of tools as CI does
  • Formatting drift, fix with make format graphqlmetrics / pnpm format graphqlmetrics

Dirty files
  • graphqlmetrics/gen/proto/wg/cosmo/common/common.pb.go
  • graphqlmetrics/gen/proto/wg/cosmo/graphqlmetrics/v1/graphqlmetrics.pb.go

Contributor

@coderabbitai Bot left a comment


Actionable comments posted: 11

Note

Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.

♻️ Duplicate comments (2)
composition/CLAUDE.md (1)

103-109: ⚠️ Potential issue | 🟡 Minor

Add language tags to these fenced blocks.

These unlabeled fences will keep tripping MD040/markdownlint. Use graphql for the directive example and bash for the command snippets.

Also applies to: 208-225

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@composition/CLAUDE.md` around lines 103 - 109, The fenced code blocks showing
GraphQL directives (e.g. `@openfed__entityCache`, `@openfed__queryCache`,
`@openfed__cachePopulate`, `@openfed__cacheInvalidate`, `@openfed__is`) should be
labeled with a language tag: change the directive block fences to use ```graphql
and any command/snippet blocks mentioned elsewhere (like shell examples) to use
```bash; update the other unlabeled fences referenced around lines 208–225 in
the same file to use the appropriate ```graphql or ```bash tags so
markdownlint/MD040 no longer flags them.
composition/src/v1/normalization/normalization-factory.ts (1)

3797-3829: ⚠️ Potential issue | 🟠 Major

Store root-field cache config under the canonical root type.

processRootFieldCacheDirectives() still passes the raw map key into the cache extractors, so renamed roots like schema { query: MyQuery } write cache config under MyQuery while the rest of normalization keys root config under Query/Mutation/Subscription. Those entries then miss the later root-node handling in normalize() and the router never sees the cache directives on the actual root.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@composition/src/v1/normalization/normalization-factory.ts` around lines 3797
- 3829, processRootFieldCacheDirectives() is using the raw map key
(parentTypeName) when calling extractQueryCacheConfig /
extractCacheInvalidateConfig / extractCachePopulateConfig, which causes renamed
schema roots to be stored under the wrong type; compute the canonical root type
name from the operationType node (e.g. the NamedType node on operationType, such
as operationType.type.name.value) and pass that canonicalRootTypeName to
extractQueryCacheConfig/extractCacheInvalidateConfig/extractCachePopulateConfig
instead of parentTypeName so root-field cache config is stored under the
standard Query/Mutation/Subscription types.
🟡 Minor comments (10)
demo/pkg/subgraphs/cachegraph_ext/subgraph/data.go-46-53 (1)

46-53: ⚠️ Potential issue | 🟡 Minor

Add a nil guard in toArticle to prevent panics.

Line 49–51 dereferences ext unconditionally. If a missing extension reaches this helper, it will panic.

Proposed fix
 func toArticle(id string, ext *articleExtData) *model.Article {
+	if ext == nil {
+		return &model.Article{ID: id}
+	}
 	return &model.Article{
 		ID:            id,
 		ViewCount:     ext.ViewCount,
 		Rating:        ext.Rating,
 		ReviewSummary: ext.ReviewSummary,
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@demo/pkg/subgraphs/cachegraph_ext/subgraph/data.go` around lines 46 - 53, The
toArticle function currently dereferences ext (type *articleExtData) without
checking for nil, which can panic; add a nil guard at the start of toArticle: if
ext is nil return &model.Article{ID: id} (or populate only safe defaults)
otherwise proceed to use ext.ViewCount, ext.Rating, ext.ReviewSummary; keep the
function name toArticle and the types articleExtData and model.Article unchanged
so callers remain compatible.
TODO.md-21-24 (1)

21-24: ⚠️ Potential issue | 🟡 Minor

Replace hardcoded absolute paths with portable commands.

The verification commands contain absolute paths specific to one developer's machine (/Users/jens/repos/..., /Users/jens/go/pkg/mod/...), which prevents other developers from running these verification steps. Use relative paths from the repository root or environment variables.

📝 Suggested portable rewrite
 ### Verification rerun on 2026-04-20
 
-- `cd /Users/jens/repos/wundergraph-projects/claude-space-2/graphql-go-tools/v2 && PATH=/Users/jens/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.26.0.darwin-arm64/bin:$PATH go test ./pkg/engine/resolve -run 'TestDetectMutationEntityImpact|TestMutationCacheTTLOverride' -count=1`
-- `cd router && PATH=/Users/jens/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.26.0.darwin-arm64/bin:$PATH go test ./core ./pkg/entitycache ./pkg/metric -count=1`
-- `cd router-tests && PATH=/Users/jens/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.26.0.darwin-arm64/bin:$PATH go test ./entity_caching -count=1`
+- `cd ../graphql-go-tools/v2 && go test ./pkg/engine/resolve -run 'TestDetectMutationEntityImpact|TestMutationCacheTTLOverride' -count=1`
+- `cd router && go test ./core ./pkg/entitycache ./pkg/metric -count=1`
+- `cd router-tests && go test ./entity_caching -count=1`
 - `cd playground && pnpm test && pnpm build`

(The PATH override is unnecessary if the correct Go toolchain is already in your environment.)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TODO.md` around lines 21 - 24, Replace the hardcoded absolute paths in the
verification commands (the four shell lines that start with cd
/Users/jens/repos/... and PATH=/Users/jens/go/pkg/mod/...) with portable
alternatives: use repository-relative paths (e.g., cd to the local subfolders
like graphql-go-tools/v2, router, router-tests, playground) and drop the
machine-specific PATH override or replace it with a generic toolchain lookup
(rely on the developer's PATH or use environment variables such as GOMODCACHE or
GOROOT/bin if needed); ensure the commands remain the same otherwise (go test
./pkg/engine/resolve -run
'TestDetectMutationEntityImpact|TestMutationCacheTTLOverride' -count=1, go test
./core ./pkg/entitycache ./pkg/metric -count=1, go test ./entity_caching
-count=1, pnpm test && pnpm build) so they run from repo root on any machine.
demo/pkg/injector/latency.go-22-25 (1)

22-25: ⚠️ Potential issue | 🟡 Minor

Honor request cancellation during injected delay.

If the client disconnects/cancels, the middleware still sleeps the full duration before returning. Prefer a context-aware wait.

Suggested fix
 			if ms, err := strconv.Atoi(v); err == nil && ms > 0 && ms <= 10000 {
 				log.Printf("[latency] %s %s sleep %dms prefix=%q", r.Method, r.Host, ms, r.Header.Get("X-WG-Cache-Key-Prefix"))
-				time.Sleep(time.Duration(ms) * time.Millisecond)
+				timer := time.NewTimer(time.Duration(ms) * time.Millisecond)
+				defer timer.Stop()
+				select {
+				case <-timer.C:
+				case <-r.Context().Done():
+					return
+				}
 			}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@demo/pkg/injector/latency.go` around lines 22 - 25, Replace the unconditional
time.Sleep in the latency injection middleware with a context-aware wait so the
handler honors client cancellation: after parsing ms (from strconv.Atoi) use the
request context (r.Context()) and wait with a select between
time.After(time.Duration(ms)*time.Millisecond) and ctx.Done(), returning
immediately (or skipping the sleep) if ctx.Done() fires; keep the existing
log.Printf call but consider logging that the delay was aborted when ctx.Done()
occurs.
demo/pkg/subgraphs/subgraphs.go-227-228 (1)

227-228: ⚠️ Potential issue | 🟡 Minor

Viewer is bypassing the latency injector in the shared demo path.

cache-demo wraps the viewer server with injector.Latency(...), and the benchmark router propagates X-Artificial-Latency to viewer, but both of these paths return viewer.NewHandler() directly. Requests started through subgraphs.New() or ViewerHandler() will ignore the latency header and behave differently from the dedicated cache-demo binary.

Suggested fix
 func ViewerHandler() http.Handler {
-	return viewer.NewHandler()
+	return injector.Latency(viewer.NewHandler())
 }
@@
 	if config.Ports.Viewer != 0 {
 		servers = append(servers, &http.Server{
 			Addr:    ":" + strconv.Itoa(config.Ports.Viewer),
-			Handler: viewer.NewHandler(),
+			Handler: ViewerHandler(),
 		})
 	}

Also applies to: 312-316

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@demo/pkg/subgraphs/subgraphs.go` around lines 227 - 228, ViewerHandler and
the handler returned by subgraphs.New currently return viewer.NewHandler()
directly and therefore bypass the latency injector; wrap the viewer handler with
the shared latency middleware so X-Artificial-Latency is respected. Concretely,
replace returns of viewer.NewHandler() in ViewerHandler and in the viewer route
inside subgraphs.New with injector.Latency(viewer.NewHandler()) (or the
appropriate injector.Latency(...) call used elsewhere) so the injected latency
middleware is applied to requests coming through those code paths.
benchmark/scripts/start_stack.sh-57-86 (1)

57-86: ⚠️ Potential issue | 🟡 Minor

BENCHMARK_REDIS_PORT is not fully wired through the stack.

start_redis.sh lets callers move Redis off 6399, but this script always launches the router with benchmark/router-cache.redis.yaml, which still points at redis://localhost:6399/2. Any non-default port will start Redis successfully and then leave the router talking to the wrong address.

Low-friction guard until the config is parameterized
 bash "${SCRIPT_DIR}/start_redis.sh"
+
+if [ "${BENCHMARK_REDIS_PORT:-6399}" != "6399" ]; then
+  echo "BENCHMARK_REDIS_PORT is not wired through benchmark/router-cache.redis.yaml yet" >&2
+  exit 1
+fi
 
 (
   cd "${ROOT_DIR}/demo"
   env -u GOROOT go build -o "${RUN_DIR}/cache-demo.bin" ./cmd/cache-demo
 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmark/scripts/start_stack.sh` around lines 57 - 86, The router is still
pointed at redis://localhost:6399/2 via the static
benchmark/router-cache.redis.yaml while start_redis.sh can start Redis on a
different port (BENCHMARK_REDIS_PORT); update start_stack.sh so the
spawn_detached call that launches "${RUN_DIR}/router.bin" uses the actual
BENCHMARK_REDIS_PORT — either by exporting a REDIS_URL (or similar) env var
constructed from BENCHMARK_REDIS_PORT before the spawn_detached call, or by
generating a temporary router config from router-cache.redis.yaml with the port
substituted and passing that config to "${RUN_DIR}/router.bin"; modify the code
around spawn_detached in start_stack.sh (and any place that references
router-cache.redis.yaml or RUN_DIR/router.bin) so the router always receives the
correct Redis address.
composition/AGENTS.md-11-25 (1)

11-25: ⚠️ Potential issue | 🟡 Minor

Add a language to this fenced block.

This fence currently trips MD040. Using text here is enough and keeps the docs lint-clean.

Suggested fix
-```
+```text
 `@key`(fields: "...") on entity type
     ↓
 extractKeyFieldSets() → KeyFieldSetData { documentNode, normalizedFieldSet, isUnresolvable }
     ↓
 keyFieldSetDatasByTypeName: Map<TypeName, Map<normalizedFieldSet, KeyFieldSetData>>
     ↓
 validateKeyFieldSets() → keyFieldPathsByTypeNameByFieldSet (per-key dot-notation paths)
     ↓
 extractPerKeyFieldPaths() → Array<{normalizedFieldSet, isUnresolvable, fieldPaths: Set<string>}>
     ↓
 buildArgumentKeyMappings() → EntityKeyMappingConfig[] (one per fully-satisfiable key)
     ↓
 RootFieldCacheConfig.entityKeyMappings → serialized to protobuf → router
-```
+```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@composition/AGENTS.md` around lines 11 - 25, The fenced code block in
AGENTS.md is missing a language tag which triggers MD040; update the opening
fence from ``` to ```text (and leave the closing fence as ```), so the block
becomes a fenced "text" code block; no changes needed to the block contents—only
change the opening fence to include "text".
composition/COMPOSITION_CONVENTIONS.md-16-51 (1)

16-51: ⚠️ Potential issue | 🟡 Minor

Label this fenced block to satisfy Markdown lint.

The directory tree fence is missing a language tag, so MD040 will keep firing here. text is the simplest fix.

Suggested fix
-```
+```text
 composition/src/
 ├── index.ts                      # public exports — minimal, additive
 ├── buildASTSchema/               # schema construction helpers
 ├── errors/                       # error factories + typed param interfaces
 ├── federation/                   # shared federation types (public result shapes)
 ├── normalization/                # shared normalization types
 ├── resolvability-graph/          # reachability validation (graph.ts, walker/)
 ├── router-compatibility-version/ # ROUTER_COMPATIBILITY_VERSION_ONE
 ├── router-configuration/types.ts # ConfigurationData + cache/field mapping types
 ├── schema-building/              # build router + client schemas from merged model
 ├── subgraph/types.ts             # Subgraph, SubgraphConfig, InternalSubgraph
 ├── types/                        # shared branded/alias types (SubgraphName, ...)
 ├── utils/                        # string-constants.ts, params.ts, shared helpers
 ├── v1/                           # router-compatibility-version=1 implementation
 │   ├── constants/                # split into dedicated modules (see §3)
 │   │   ├── constants.ts
 │   │   ├── directive-definitions.ts
 │   │   ├── integers.ts
 │   │   ├── non-directive-definitions.ts
 │   │   ├── strings.ts
 │   │   └── type-nodes.ts
 │   ├── federation/               # federation-factory.ts + params
 │   ├── normalization/            # normalization-factory.ts + walkers + utils
 │   ├── schema-building/
 │   ├── subgraph/
 │   ├── utils/
 │   └── warnings/
 └── warnings/                     # shared warning types

 composition/tests/v1/             # mirrors v1/ layout
 ├── directives/                   # one file per directive (authorization, entity-caching, ...)
 ├── types/                        # per-GraphQL-type tests (interfaces, unions, ...)
 ├── utils/                        # shared test helpers + SDL fragments
 └── test-data/
-```
+```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@composition/COMPOSITION_CONVENTIONS.md` around lines 16 - 51, The Markdown
fenced code block containing the directory tree in COMPOSITION_CONVENTIONS.md is
missing a language tag (triggers MD040); update the opening fence from ``` to
```text so the block is labeled as text and ensure the closing fence remains
```; locate the block by searching for the directory-tree snippet starting with
"composition/src/" and change only the opening fence to include the language
tag.
benchmark/scripts/run_suite.ts-65-87 (1)

65-87: ⚠️ Potential issue | 🟡 Minor

Reject missing or malformed flag values during parsing.

--scenario --all, --duration --all, or --vus foo are all accepted here and only fail much later, sometimes after the benchmark stack has already started. Validate that each option has a non-flag value, and reject non-numeric --vus, before returning from parseArgs().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmark/scripts/run_suite.ts` around lines 65 - 87, The argument parsing in
parseArgs currently accepts missing or flag-like values (e.g., "--scenario
--all" or non-numeric "--vus foo"); update parseArgs to validate each option's
next token before assigning to scenario, vus, duration, rampUp, and rampDown:
ensure args[i+1] exists and does not startWith("--"), and for vus (and any
numeric options) convert to Number and verify it's a finite number (and positive
integer if required); if validation fails, throw or return a descriptive error
immediately instead of continuing so invalid flags are rejected early.
composition/CLAUDE.md-165-168 (1)

165-168: ⚠️ Potential issue | 🟡 Minor

Reference the current helper name here.

buildArgumentKeyMappingsV2 does not exist in composition/src/v1/normalization/normalization-factory.ts; the implementation is buildArgumentKeyMappings() plus getIsFieldValue(). This note will send readers to the wrong symbol.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@composition/CLAUDE.md` around lines 165 - 168, The documentation incorrectly
references buildArgumentKeyMappingsV2; update the CLAUDE.md note to reference
the actual helper names used in code: buildArgumentKeyMappings() and
getIsFieldValue() (instead of buildArgumentKeyMappingsV2), and call out that the
argument name is FIELDS (plural) so ensure IS_DEFINITION and IS_DEFINITION_DATA
use the FIELDS constant and that buildArgumentKeyMappings() reads arg.name.value
=== FIELDS to extract `@openfed__is` values.
benchmark/scripts/lib.ts-189-206 (1)

189-206: ⚠️ Potential issue | 🟡 Minor

Reject half-configured multi-profile scenarios up front.

If a manifest provides responseFixtures without authProfiles (or the reverse), this falls through to the single-fixture branch. With responseFixture also present, the suite will silently benchmark the wrong matrix instead of failing fast.

🛡️ Proposed fix
 export function resolveRequestMatrix(scenario: Scenario): RequestVariant[] {
+  const hasMultiProfileFixtures = Boolean(scenario.responseFixtures);
+  const hasMultiProfileAuthProfiles = Boolean(scenario.authProfiles?.length);
+
+  if (hasMultiProfileFixtures !== hasMultiProfileAuthProfiles) {
+    throw new Error(
+      `scenario ${scenario.name} must define both responseFixtures and authProfiles together`,
+    );
+  }
+
   if (scenario.responseFixtures && scenario.authProfiles?.length) {
     return scenario.authProfiles.map((authProfile) => {
       const fixturePath = scenario.responseFixtures?.[authProfile];
       if (!fixturePath) {
         throw new Error(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmark/scripts/lib.ts` around lines 189 - 206, In resolveRequestMatrix,
detect and reject partially-specified multi-profile scenarios: if exactly one of
scenario.responseFixtures or scenario.authProfiles is present (i.e., one exists
but the other is missing/empty), throw an error indicating the manifest is
half-configured for that scenario; otherwise keep existing behavior (use
authProfiles+responseFixtures to build the matrix, or fall back to single
responseFixture/scenario.authProfile). Reference resolveRequestMatrix and the
properties scenario.responseFixtures, scenario.authProfiles,
scenario.responseFixture, and scenario.authProfile when implementing the check.
🧹 Nitpick comments (5)
TODO.md (1)

167-167: Minor word-order nitpick.

Consider rewording "user asked to not defer" → "user asked not to defer" for smoother reading.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TODO.md` at line 167, Replace the phrase "user asked to not defer" with "user
asked not to defer" in the TODO.md text to fix the word-order nitpick; search
for the exact substring "user asked to not defer" (the resolved sentence
mentioning "another PR" and metrics consolidation) and update it to "user asked
not to defer" for smoother reading.
benchmark/k6/cache_demo.js (1)

5-9: Fail cleanly on malformed BENCHMARK_PAYLOAD.

A bad env payload currently throws during module init, so you lose the explicit fail(...) message. Wrapping the parse makes this much easier to diagnose.

Suggested fix
-const payload = JSON.parse(__ENV.BENCHMARK_PAYLOAD || "{}");
+const payload = (() => {
+  try {
+    return JSON.parse(__ENV.BENCHMARK_PAYLOAD || "{}");
+  } catch {
+    fail("BENCHMARK_PAYLOAD must be valid JSON");
+  }
+})();

As per coding guidelines, "Add proper error handling with try-catch blocks".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmark/k6/cache_demo.js` around lines 5 - 9, The module currently parses
BENCHMARK_PAYLOAD at init which can throw; wrap the JSON.parse of
__ENV.BENCHMARK_PAYLOAD in a try-catch so malformed JSON triggers a clear
fail(...) instead of an uncaught exception: attempt to parse into the payload
variable (keeping the "{}" fallback), and in the catch call fail with a message
that includes the parse error and the raw __ENV.BENCHMARK_PAYLOAD value so it's
easy to diagnose; update references to payload as before.
benchmark/scripts/lib.ts (2)

13-103: Prefer interface for these exported object shapes.

Scenario, Manifest, RequestVariant, K6Stage, ComparableModeSummary, ModeComparison, ScenarioVariantSummary, and SuiteSummary are all object-shape exports. In this repo those should be interfaces, not type aliases.

As per coding guidelines, "Prefer interfaces over type aliases for object shapes in TypeScript".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmark/scripts/lib.ts` around lines 13 - 103, Replace the exported
object-shape type aliases with exported interfaces: convert Scenario, Manifest,
RequestVariant, K6Stage, ComparableModeSummary, ModeComparison,
ScenarioVariantSummary, and SuiteSummary from "export type X = { ... }" to
"export interface X { ... }", preserving all property names, optional modifiers,
nested object structures (e.g., cache, router, redis), and exported status;
ensure any places that import or reference these remain valid (no runtime
changes needed) and adjust only the declaration syntax for those specific
symbols.

115-125: Make the mode tables exhaustive over Mode.

Both MODE_ORDER and COMPARISON_PAIRS are widened to string, so adding or renaming a mode will not fail compilation and will silently fall back to the default ordering/comparison behavior.

♻️ Proposed fix
-const MODE_ORDER: Record<string, number> = {
+const MODE_ORDER: Record<Mode, number> = {
   cache_disabled: 0,
   cache_enabled: 1,
   request_scoped_l1_disabled: 2,
   request_scoped_default: 3,
 };
 
-const COMPARISON_PAIRS: Array<readonly [baselineMode: string, candidateMode: string]> = [
+const COMPARISON_PAIRS: Array<readonly [baselineMode: Mode, candidateMode: Mode]> = [
   ["cache_disabled", "cache_enabled"],
   ["request_scoped_l1_disabled", "request_scoped_default"],
 ];
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmark/scripts/lib.ts` around lines 115 - 125, MODE_ORDER and
COMPARISON_PAIRS are typed too broadly as string which lets missing/renamed
modes slip through; change their types to be explicit over the Mode union so the
compiler enforces exhaustiveness. Replace Record<string, number> with
Record<Mode, number> for MODE_ORDER and change COMPARISON_PAIRS to
Array<readonly [baselineMode: Mode, candidateMode: Mode]> (or readonly (readonly
[Mode, Mode])[]), and ensure the object/array literal actually contains an entry
for every member of the Mode union so the compiler will error if a Mode is added
or renamed; update the declarations of MODE_ORDER and COMPARISON_PAIRS (and any
const assertions) accordingly.
composition/src/errors/errors.ts (1)

1862-1871: Do not hardcode composite-key diagnostics to two arguments.

Composite keys and decomposed input objects can contribute more than two mapped argument paths. This factory only reports firstArgument and secondArgument, so 3+ field keys will emit incomplete diagnostics right when someone is debugging mapping validity.

♻️ Proposed fix
 export function explicitCompositeAdditionalNonKeyArgumentErrorMessage(
   fieldCoords: string,
-  firstArgument: string,
-  secondArgument: string,
+  mappedArguments: string[],
   compositeKey: string,
   entityType: string,
   extraArgument: string,
 ): string {
-  return `Field "${fieldCoords}" has arguments "${firstArgument}" and "${secondArgument}" with \`@openfed__is\` mappings covering composite \`@key\` "${compositeKey}" on entity "${entityType}", but also has additional argument "${extraArgument}" which is not mapped to a key field. All arguments must be key arguments — additional arguments may filter the response, making the cache key incomplete.`;
+  return `Field "${fieldCoords}" has arguments "${mappedArguments.join('", "')}" with \`@openfed__is\` mappings covering composite \`@key\` "${compositeKey}" on entity "${entityType}", but also has additional argument "${extraArgument}" which is not mapped to a key field. All arguments must be key arguments — additional arguments may filter the response, making the cache key incomplete.`;
 }

Based on learnings, input object decomposition in argument mapping must decompose into per-field mappings, with nested input objects mapping recursively.
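A list-based signature also pairs naturally with a joiner that quotes each mapped path and puts "and" before the last item. A minimal sketch (`joinWithAnd` is a hypothetical helper name, not taken from errors.ts):

```typescript
// Hypothetical helper (not part of errors.ts): quotes each mapped
// argument path and joins them as `"a", "b" and "c"` for diagnostics.
function joinWithAnd(items: string[]): string {
  const quoted = items.map((item) => `"${item}"`);
  if (quoted.length <= 1) {
    return quoted.join("");
  }
  return `${quoted.slice(0, -1).join(", ")} and ${quoted[quoted.length - 1]}`;
}
```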

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@composition/src/errors/errors.ts` around lines 1862 - 1871, The message
builder explicitCompositeAdditionalNonKeyArgumentErrorMessage currently assumes
exactly two mapped arguments (firstArgument, secondArgument) which hides
additional mapped paths; change its signature to accept a list of mapped
argument paths (e.g., mappedArguments: string[]) instead of
firstArgument/secondArgument and build the diagnostic by joining all
mappedArguments (comma-separated, with proper "and" before last item) so the
message includes every mapped key path along with the extraArgument and
compositeKey/entityType; update all call sites that pass
firstArgument/secondArgument to instead pass the array of decomposed per-field
mapping paths (recursing into nested input objects where necessary) so
diagnostics correctly report 3+ mapped arguments.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@benchmark/k6/cache_demo.js`:
- Around line 70-72: The current buildK6Payload implementation replaces the
default JSON header when payload.headers is provided, causing requests to lose
"content-type: application/json"; change the header assignment in buildK6Payload
to merge the defaults with any custom headers (e.g., combine { "content-type":
"application/json" } with payload.headers using a shallow merge so custom keys
override defaults but default content-type remains unless explicitly
overridden).

In `@benchmark/scripts/lib.ts`:
- Around line 651-678: In fetchJsonWithRetry, wrap each attempt with a
per-attempt timeout by creating an AbortController and passing its signal into
the fetch call without discarding any existing input.init.signal (merge: if
input.init.signal exists, race it with the timeout controller and forward a
composite signal or listen to input.init.signal to abort the timeout
controller); ensure the timeout controller is aborted/cleared after each attempt
and that the signal is included in the RequestInit passed to fetchImpl (do not
overwrite input.init directly), so the timeout-triggered AbortError flows into
isRetryableFetchError and the retry logic continues correctly.
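On Node 20+, where `AbortSignal.timeout` and `AbortSignal.any` are available, the merge described above can be sketched like this (`mergeSignals` and `fetchWithAttemptTimeout` are hypothetical names, not from lib.ts):

```typescript
// Hypothetical sketch (Node 20+): merge a caller-supplied signal with a
// per-attempt timeout instead of discarding either one.
function mergeSignals(
  existing: AbortSignal | null | undefined,
  timeoutMs: number,
): AbortSignal {
  const timeout = AbortSignal.timeout(timeoutMs);
  // AbortSignal.any aborts as soon as either source signal aborts.
  return existing ? AbortSignal.any([existing, timeout]) : timeout;
}

async function fetchWithAttemptTimeout(
  url: string,
  init: RequestInit,
  timeoutMs: number,
  fetchImpl: typeof fetch = fetch,
): Promise<Response> {
  // Spread keeps the rest of init intact; only the signal is replaced
  // by the merged one, so a timeout surfaces as an AbortError that the
  // existing retry classification can inspect.
  return fetchImpl(url, { ...init, signal: mergeSignals(init.signal, timeoutMs) });
}
```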

In `@benchmark/scripts/scrape_metrics.ts`:
- Around line 54-57: The fetch for metrics currently has no timeout and can
hang; wrap the fetch in an AbortController: create a controller, pass
controller.signal to fetch("http://127.0.0.1:8088/metrics"), start a timer
(e.g., 5000ms) that calls controller.abort() on expiry, clear the timer after
fetch resolves, and handle AbortError to throw a clear "metrics fetch timeout"
error; update the code around the response/response.ok checks to use this
abort-enabled fetch flow (reference AbortController, controller.signal,
response).

In `@benchmark/scripts/stop_stack.sh`:
- Around line 13-16: The current shutdown in stop_stack.sh uses wait "${pid}"
which is ineffective for detached processes started with start_new_session=True
and can race; replace the wait call for "${pid}" with a polling loop that
repeatedly checks kill -0 "${pid}" (or /proc/${pid} existence) until the process
no longer exists or a timeout elapses, sleeping briefly between checks and then
proceed to remove the pid file; ensure the loop uses the same ${pid} variable
and enforces a reasonable timeout and graceful fallback (e.g., log/warn and
continue) so stop_stack.sh reliably waits for the detached process to exit.

In `@composition/src/v1/constants/directive-definitions.ts`:
- Around line 766-789: The new REQUEST_SCOPED_DEFINITION and its name symbol
REQUEST_SCOPED are not registered in the directive registry used during
normalization; import REQUEST_SCOPED and REQUEST_SCOPED_DEFINITION into the
directive registry module (where DIRECTIVE_DEFINITION_BY_NAME is defined) and
add an entry mapping REQUEST_SCOPED to REQUEST_SCOPED_DEFINITION in
DIRECTIVE_DEFINITION_BY_NAME so the `@openfed__requestScoped` directive is
discoverable and emitted during normalization.

In `@composition/src/v1/normalization/normalization-factory.ts`:
- Around line 4333-4339: The current early return from buildExplicitMappings
when isCompositeIsSpec is true causes a composite `@openfed__is` mapping to
suppress other satisfiable keys; instead of returning the result of
buildCompositeIsMapping, call buildCompositeIsMapping with (fieldCoords,
entityTypeName, keyFieldSets, isListReturn, argInfo) and append its produced
EntityKeyMappingConfig(s) to the mappings collection used by
buildExplicitMappings (do not merge into a single composite that enforces AND),
then continue processing remaining explicit/auto-mappable `@key` directives so
each independent `@key` yields its own EntityKeyMappingConfig (OR semantics).
Ensure you reference and update the same mappings accumulator used later in
buildExplicitMappings and remove the early return path.
- Around line 4846-4863: The current mismatch branch in normalization-factory.ts
aborts all auto-mapping by returning [] when a single argument vs key field
mismatch is found; instead, remove the early return so a mismatched field only
marks the current key as not fully mapped (set keyFullyMapped = false), emit the
warning via this.warnings.push(autoMappingTypeMismatchWarning(...)) if not
already emitted, and continue evaluating the rest of the key fields so other
candidate `@key` combinations can still produce separate EntityKeyMappingConfig
entries; keep use of namedTypesMatch(argTypeNode, keyTypeNode), the
argIsList/keyIsList/isListReturn checks, and the typeMismatchWarningEmitted flag
as-is but do not clear or abort outer mappings—only skip this key by ensuring
the loop continues and that fully-satisfiable keys are still collected.
- Around line 4185-4231: The current leaf-type acceptance uses
this.namedTypesMatch(inputFieldData.type, keyTypeNode) which ignores list
wrappers and lets mismatched list shapes (e.g. [ID!]! vs ID!) through; update
the acceptance to require both the named type and list shape match by also
calling this.listStructureMatches(...) (i.e. change the if to fail when either
namedTypesMatch(...) is false OR listStructureMatches(inputFieldData.type,
keyTypeNode) is false), so the error branch runs for nested list mismatches and
prevents emitting the invalid FieldMappingConfig used when building
fieldMappings/entityKeyField/argumentPath.

In `@demo/cmd/cache-demo/main.go`:
- Around line 40-47: gqlServer currently mounts the gqlgen server directly at
"/graphql" (srv created by handler.New) which bypasses injector.HTTP; change the
registration to wrap that handler with injector.HTTP so requests go through the
same middleware chain (i.e., replace mux.Handle("/graphql", srv) with
mux.Handle("/graphql", injector.HTTP(srv)) while keeping the existing
injector.Latency(mux) return). This ensures gqlServer, the handler.New-created
srv, and the "/graphql" route use injector.HTTP for consistent
request-context/auth behavior.

In `@demo/pkg/subgraphs/cachegraph/subgraph/schema.resolvers.go`:
- Around line 17-46: The package-level slices/maps (articlesData, listingsData)
are mutated without synchronization causing races and possible "concurrent map
read and map write" panics; protect access with a sync.RWMutex (e.g., declare a
package-level var articlesMu sync.RWMutex and listingsMu sync.RWMutex) and use
articlesMu.Lock()/Unlock() for CreateArticle and UpdateArticle mutations and
articlesMu.RLock()/RUnlock() for query readers, and similarly use
listingsMu.Lock()/Unlock() around DeleteListing (and any listing reads) when
accessing listingsData; alternatively implement copy-on-write snapshots for
writes and atomically replace the store if you prefer immutability.

In `@Makefile`:
- Around line 195-196: The demo target currently backgrounds the subgraph
process started by "go run cmd/all/main.go" and never cleans it up; modify the
Makefile commands to capture the background PID (from the "go run
cmd/all/main.go &" invocation), install a shell trap for INT/EXIT that kills
that PID (and waits for any children) before exiting, and ensure the router
process ("go run cmd/router/main.go --config ../demo/router-cache.yaml") runs in
the foreground so the trap can fire; use the captured PID (or a temp file) to
reliably kill the background subgraph on normal exit, interrupt, or error.

---

Minor comments:
In `@benchmark/scripts/lib.ts`:
- Around line 189-206: In resolveRequestMatrix, detect and reject
partially-specified multi-profile scenarios: if exactly one of
scenario.responseFixtures or scenario.authProfiles is present (i.e., one exists
but the other is missing/empty), throw an error indicating the manifest is
half-configured for that scenario; otherwise keep existing behavior (use
authProfiles+responseFixtures to build the matrix, or fall back to single
responseFixture/scenario.authProfile). Reference resolveRequestMatrix and the
properties scenario.responseFixtures, scenario.authProfiles,
scenario.responseFixture, and scenario.authProfile when implementing the check.
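The XOR guard can be sketched standalone; `ScenarioLike` is an assumed subset of the real `Scenario` shape in lib.ts:

```typescript
// Sketch of the half-configuration guard; ScenarioLike is an assumed
// subset of the real Scenario interface, not imported from lib.ts.
interface ScenarioLike {
  name: string;
  responseFixtures?: string[];
  authProfiles?: string[];
}

function assertMultiProfileConsistent(scenario: ScenarioLike): void {
  const hasFixtures = (scenario.responseFixtures?.length ?? 0) > 0;
  const hasProfiles = (scenario.authProfiles?.length ?? 0) > 0;
  // Reject the XOR case: exactly one of the two multi-profile fields set.
  if (hasFixtures !== hasProfiles) {
    throw new Error(
      `scenario "${scenario.name}" is half-configured: set both responseFixtures and authProfiles, or neither`,
    );
  }
}
```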

In `@benchmark/scripts/run_suite.ts`:
- Around line 65-87: The argument parsing in parseArgs currently accepts missing
or flag-like values (e.g., "--scenario --all" or non-numeric "--vus foo");
update parseArgs to validate each option's next token before assigning to
scenario, vus, duration, rampUp, and rampDown: ensure args[i+1] exists and does
not startWith("--"), and for vus (and any numeric options) convert to Number and
verify it's a finite number (and positive integer if required); if validation
fails, throw or return a descriptive error immediately instead of continuing so
invalid flags are rejected early.
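The validation shape can be sketched with two small helpers; `requireValue` and `requirePositiveInt` are hypothetical names, not from run_suite.ts:

```typescript
// Hypothetical helpers (names not from run_suite.ts): reject missing or
// flag-like option values, and reject non-numeric values for numeric flags.
function requireValue(args: string[], index: number, flag: string): string {
  const next = args[index + 1];
  if (next === undefined || next.startsWith("--")) {
    throw new Error(`missing value for ${flag}`);
  }
  return next;
}

function requirePositiveInt(args: string[], index: number, flag: string): number {
  const raw = requireValue(args, index, flag);
  const value = Number(raw);
  // Number("foo") is NaN, which Number.isInteger rejects.
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`invalid value "${raw}" for ${flag}: expected a positive integer`);
  }
  return value;
}
```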

In `@benchmark/scripts/start_stack.sh`:
- Around line 57-86: The router is still pointed at redis://localhost:6399/2 via
the static benchmark/router-cache.redis.yaml while start_redis.sh can start
Redis on a different port (BENCHMARK_REDIS_PORT); update start_stack.sh so the
spawn_detached call that launches "${RUN_DIR}/router.bin" uses the actual
BENCHMARK_REDIS_PORT — either by exporting a REDIS_URL (or similar) env var
constructed from BENCHMARK_REDIS_PORT before the spawn_detached call, or by
generating a temporary router config from router-cache.redis.yaml with the port
substituted and passing that config to "${RUN_DIR}/router.bin"; modify the code
around spawn_detached in start_stack.sh (and any place that references
router-cache.redis.yaml or RUN_DIR/router.bin) so the router always receives the
correct Redis address.

In `@composition/AGENTS.md`:
- Around line 11-25: The fenced code block in AGENTS.md is missing a language
tag which triggers MD040; update the opening fence from ``` to ```text (and
leave the closing fence as ```), so the block becomes a fenced "text" code
block; no changes needed to the block contents—only change the opening fence to
include "text".

In `@composition/CLAUDE.md`:
- Around line 165-168: The documentation incorrectly references
buildArgumentKeyMappingsV2; update the CLAUDE.md note to reference the actual
helper names used in code: buildArgumentKeyMappings() and getIsFieldValue()
(instead of buildArgumentKeyMappingsV2), and call out that the argument name is
FIELDS (plural) so ensure IS_DEFINITION and IS_DEFINITION_DATA use the FIELDS
constant and that buildArgumentKeyMappings() reads arg.name.value === FIELDS to
extract `@openfed__is` values.

In `@composition/COMPOSITION_CONVENTIONS.md`:
- Around line 16-51: The Markdown fenced code block containing the directory
tree in COMPOSITION_CONVENTIONS.md is missing a language tag (triggers MD040);
update the opening fence from ``` to ```text so the block is labeled as text and
ensure the closing fence remains ```; locate the block by searching for the
directory-tree snippet starting with "composition/src/" and change only the
opening fence to include the language tag.

In `@demo/pkg/injector/latency.go`:
- Around line 22-25: Replace the unconditional time.Sleep in the latency
injection middleware with a context-aware wait so the handler honors client
cancellation: after parsing ms (from strconv.Atoi) use the request context
(r.Context()) and wait with a select between
time.After(time.Duration(ms)*time.Millisecond) and ctx.Done(), returning
immediately (or skipping the sleep) if ctx.Done() fires; keep the existing
log.Printf call but consider logging that the delay was aborted when ctx.Done()
occurs.

In `@demo/pkg/subgraphs/cachegraph_ext/subgraph/data.go`:
- Around line 46-53: The toArticle function currently dereferences ext (type
*articleExtData) without checking for nil, which can panic; add a nil guard at
the start of toArticle: if ext is nil return &model.Article{ID: id} (or populate
only safe defaults) otherwise proceed to use ext.ViewCount, ext.Rating,
ext.ReviewSummary; keep the function name toArticle and the types articleExtData
and model.Article unchanged so callers remain compatible.

In `@demo/pkg/subgraphs/subgraphs.go`:
- Around line 227-228: ViewerHandler and the handler returned by subgraphs.New
currently return viewer.NewHandler() directly and therefore bypass the latency
injector; wrap the viewer handler with the shared latency middleware so
X-Artificial-Latency is respected. Concretely, replace returns of
viewer.NewHandler() in ViewerHandler and in the viewer route inside
subgraphs.New with injector.Latency(viewer.NewHandler()) (or the appropriate
injector.Latency(...) call used elsewhere) so the injected latency middleware is
applied to requests coming through those code paths.

In `@TODO.md`:
- Around line 21-24: Replace the hardcoded absolute paths in the verification
commands (the four shell lines that start with cd /Users/jens/repos/... and
PATH=/Users/jens/go/pkg/mod/...) with portable alternatives: use
repository-relative paths (e.g., cd to the local subfolders like
graphql-go-tools/v2, router, router-tests, playground) and drop the
machine-specific PATH override or replace it with a generic toolchain lookup
(rely on the developer's PATH or use environment variables such as GOMODCACHE or
GOROOT/bin if needed); ensure the commands remain the same otherwise (go test
./pkg/engine/resolve -run
'TestDetectMutationEntityImpact|TestMutationCacheTTLOverride' -count=1, go test
./core ./pkg/entitycache ./pkg/metric -count=1, go test ./entity_caching
-count=1, pnpm test && pnpm build) so they run from repo root on any machine.

---

Duplicate comments:
In `@composition/CLAUDE.md`:
- Around line 103-109: The fenced code blocks showing GraphQL directives (e.g.
`@openfed__entityCache`, `@openfed__queryCache`, `@openfed__cachePopulate`,
`@openfed__cacheInvalidate`, `@openfed__is`) should be labeled with a language tag:
change the directive block fences to use ```graphql and any command/snippet
blocks mentioned elsewhere (like shell examples) to use ```bash; update the
other unlabeled fences referenced around lines 208–225 in the same file to use
the appropriate ```graphql or ```bash tags so markdownlint/MD040 no longer flags
them.

In `@composition/src/v1/normalization/normalization-factory.ts`:
- Around line 3797-3829: processRootFieldCacheDirectives() is using the raw map
key (parentTypeName) when calling extractQueryCacheConfig /
extractCacheInvalidateConfig / extractCachePopulateConfig, which causes renamed
schema roots to be stored under the wrong type; compute the canonical root type
name from the operationType node (e.g. the NamedType node on operationType, such
as operationType.type.name.value) and pass that canonicalRootTypeName to
extractQueryCacheConfig/extractCacheInvalidateConfig/extractCachePopulateConfig
instead of parentTypeName so root-field cache config is stored under the
standard Query/Mutation/Subscription types.

---

Nitpick comments:
In `@benchmark/k6/cache_demo.js`:
- Around line 5-9: The module currently parses BENCHMARK_PAYLOAD at init which
can throw; wrap the JSON.parse of __ENV.BENCHMARK_PAYLOAD in a try-catch so
malformed JSON triggers a clear fail(...) instead of an uncaught exception:
attempt to parse into the payload variable (keeping the "{}" fallback), and in
the catch call fail with a message that includes the parse error and the raw
__ENV.BENCHMARK_PAYLOAD value so it's easy to diagnose; update references to
payload as before.

In `@benchmark/scripts/lib.ts`:
- Around line 13-103: Replace the exported object-shape type aliases with
exported interfaces: convert Scenario, Manifest, RequestVariant, K6Stage,
ComparableModeSummary, ModeComparison, ScenarioVariantSummary, and SuiteSummary
from "export type X = { ... }" to "export interface X { ... }", preserving all
property names, optional modifiers, nested object structures (e.g., cache,
router, redis), and exported status; ensure any places that import or reference
these remain valid (no runtime changes needed) and adjust only the declaration
syntax for those specific symbols.
- Around line 115-125: MODE_ORDER and COMPARISON_PAIRS are typed too broadly as
string which lets missing/renamed modes slip through; change their types to be
explicit over the Mode union so the compiler enforces exhaustiveness. Replace
Record<string, number> with Record<Mode, number> for MODE_ORDER and change
COMPARISON_PAIRS to Array<readonly [baselineMode: Mode, candidateMode: Mode]>
(or readonly (readonly [Mode, Mode])[]), and ensure the object/array literal
actually contains an entry for every member of the Mode union so the compiler
will error if a Mode is added or renamed; update the declarations of MODE_ORDER
and COMPARISON_PAIRS (and any const assertions) accordingly.

In `@composition/src/errors/errors.ts`:
- Around line 1862-1871: The message builder
explicitCompositeAdditionalNonKeyArgumentErrorMessage currently assumes exactly
two mapped arguments (firstArgument, secondArgument) which hides additional
mapped paths; change its signature to accept a list of mapped argument paths
(e.g., mappedArguments: string[]) instead of firstArgument/secondArgument and
build the diagnostic by joining all mappedArguments (comma-separated, with
proper "and" before last item) so the message includes every mapped key path
along with the extraArgument and compositeKey/entityType; update all call sites
that pass firstArgument/secondArgument to instead pass the array of decomposed
per-field mapping paths (recursing into nested input objects where necessary) so
diagnostics correctly report 3+ mapped arguments.

In `@TODO.md`:
- Line 167: Replace the phrase "user asked to not defer" with "user asked not to
defer" in the TODO.md text to fix the word-order nitpick; search for the exact
substring "user asked to not defer" (the resolved sentence mentioning "another
PR" and metrics consolidation) and update it to "user asked not to defer" for
smoother reading.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: cbaaac0d-212c-48a2-bf71-4c51cd586799

📥 Commits

Reviewing files that changed from the base of the PR and between b678db7 and 18364a0.

⛔ Files ignored due to path filters (12)
  • connect-go/gen/proto/wg/cosmo/node/v1/node.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • connect-go/gen/proto/wg/cosmo/platform/v1/platform.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • demo/pkg/subgraphs/cachegraph/subgraph/generated/federation.go is excluded by !**/generated/**
  • demo/pkg/subgraphs/cachegraph/subgraph/generated/generated.go is excluded by !**/generated/**
  • demo/pkg/subgraphs/cachegraph_ext/subgraph/generated/federation.go is excluded by !**/generated/**
  • demo/pkg/subgraphs/cachegraph_ext/subgraph/generated/federation.requires.go is excluded by !**/generated/**
  • demo/pkg/subgraphs/cachegraph_ext/subgraph/generated/generated.go is excluded by !**/generated/**
  • demo/pkg/subgraphs/projects/generated/service.pb.go is excluded by !**/*.pb.go, !**/generated/**
  • demo/pkg/subgraphs/projects/generated/service.proto is excluded by !**/generated/**
  • demo/pkg/subgraphs/projects/generated/service.proto.lock.json is excluded by !**/generated/**
  • demo/pkg/subgraphs/viewer/subgraph/generated/federation.go is excluded by !**/generated/**
  • demo/pkg/subgraphs/viewer/subgraph/generated/generated.go is excluded by !**/generated/**
📒 Files selected for processing (93)
  • FEEDBACK.md
  • Makefile
  • TODO.md
  • benchmark/.gitignore
  • benchmark/README.md
  • benchmark/fixtures/article_simple.response.json
  • benchmark/fixtures/articles_by_ids_batch.response.json
  • benchmark/fixtures/catalogs_partial_load.response.json
  • benchmark/fixtures/listing_composite_key.response.json
  • benchmark/fixtures/request_scoped_viewer_articles.response.json
  • benchmark/fixtures/user_profile_header_sensitive.alice.response.json
  • benchmark/fixtures/user_profile_header_sensitive.bob.response.json
  • benchmark/fixtures/venue_nested_key.response.json
  • benchmark/fixtures/viewer_articles_deep_nested.response.json
  • benchmark/k6/cache_demo.js
  • benchmark/queries/article_simple.graphql
  • benchmark/queries/articles_by_ids_batch.graphql
  • benchmark/queries/catalogs_partial_load.graphql
  • benchmark/queries/listing_composite_key.graphql
  • benchmark/queries/request_scoped_viewer_articles.graphql
  • benchmark/queries/user_profile_header_sensitive.graphql
  • benchmark/queries/venue_nested_key.graphql
  • benchmark/queries/viewer_articles_deep_nested.graphql
  • benchmark/router-cache.redis.yaml
  • benchmark/scenarios/cache-demo.json
  • benchmark/scripts/capture_pprof.sh
  • benchmark/scripts/capture_redis_stats.sh
  • benchmark/scripts/lib.test.ts
  • benchmark/scripts/lib.ts
  • benchmark/scripts/run_suite.ts
  • benchmark/scripts/scrape_metrics.ts
  • benchmark/scripts/start_redis.sh
  • benchmark/scripts/start_stack.sh
  • benchmark/scripts/stop_redis.sh
  • benchmark/scripts/stop_stack.sh
  • benchmark/scripts/verify_equivalence.ts
  • benchmark/scripts/wait_ready.sh
  • composition-go/index.global.js
  • composition/AGENTS.md
  • composition/CLAUDE.md
  • composition/COMPOSITION_CONVENTIONS.md
  • composition/src/errors/errors.ts
  • composition/src/router-configuration/types.ts
  • composition/src/utils/string-constants.ts
  • composition/src/v1/constants/directive-definitions.ts
  • composition/src/v1/normalization/normalization-factory.ts
  • composition/src/v1/warnings/params.ts
  • composition/src/v1/warnings/warnings.ts
  • composition/tests/v1/directives/entity-cache-fuzz.test.ts
  • composition/tests/v1/directives/entity-cache-mapping-rules.test.ts
  • composition/tests/v1/directives/entity-caching.test.ts
  • connect/src/wg/cosmo/node/v1/node_pb.ts
  • demo/Makefile
  • demo/cmd/all/main.go
  • demo/cmd/cache-demo/main.go
  • demo/cmd/cachegraph/main.go
  • demo/config-cache-only.json
  • demo/go.mod
  • demo/graph-cache-only.yaml
  • demo/graph.yaml
  • demo/pkg/injector/latency.go
  • demo/pkg/subgraphs/cachegraph/cachegraph.go
  • demo/pkg/subgraphs/cachegraph/generate.go
  • demo/pkg/subgraphs/cachegraph/gqlgen.yml
  • demo/pkg/subgraphs/cachegraph/subgraph/data.go
  • demo/pkg/subgraphs/cachegraph/subgraph/data_test.go
  • demo/pkg/subgraphs/cachegraph/subgraph/entity.resolvers.go
  • demo/pkg/subgraphs/cachegraph/subgraph/model/models_gen.go
  • demo/pkg/subgraphs/cachegraph/subgraph/resolver.go
  • demo/pkg/subgraphs/cachegraph/subgraph/schema.graphqls
  • demo/pkg/subgraphs/cachegraph/subgraph/schema.resolvers.go
  • demo/pkg/subgraphs/cachegraph_ext/cachegraph_ext.go
  • demo/pkg/subgraphs/cachegraph_ext/generate.go
  • demo/pkg/subgraphs/cachegraph_ext/gqlgen.yml
  • demo/pkg/subgraphs/cachegraph_ext/subgraph/data.go
  • demo/pkg/subgraphs/cachegraph_ext/subgraph/entity.resolvers.go
  • demo/pkg/subgraphs/cachegraph_ext/subgraph/model/models_gen.go
  • demo/pkg/subgraphs/cachegraph_ext/subgraph/resolver.go
  • demo/pkg/subgraphs/cachegraph_ext/subgraph/schema.graphqls
  • demo/pkg/subgraphs/cachegraph_ext/subgraph/schema.resolvers.go
  • demo/pkg/subgraphs/cachegraph_ext/subgraph/schema.resolvers_test.go
  • demo/pkg/subgraphs/subgraphs.go
  • demo/pkg/subgraphs/viewer/generate.go
  • demo/pkg/subgraphs/viewer/gqlgen.yml
  • demo/pkg/subgraphs/viewer/subgraph/context.go
  • demo/pkg/subgraphs/viewer/subgraph/data.go
  • demo/pkg/subgraphs/viewer/subgraph/entity.resolvers.go
  • demo/pkg/subgraphs/viewer/subgraph/model/models_gen.go
  • demo/pkg/subgraphs/viewer/subgraph/resolver.go
  • demo/pkg/subgraphs/viewer/subgraph/schema.graphqls
  • demo/pkg/subgraphs/viewer/subgraph/schema.resolvers.go
  • demo/pkg/subgraphs/viewer/viewer.go
  • demo/router-cache.yaml
✅ Files skipped from review due to trivial changes (39)
  • benchmark/fixtures/venue_nested_key.response.json
  • benchmark/fixtures/article_simple.response.json
  • benchmark/.gitignore
  • benchmark/fixtures/user_profile_header_sensitive.alice.response.json
  • benchmark/fixtures/articles_by_ids_batch.response.json
  • benchmark/fixtures/catalogs_partial_load.response.json
  • benchmark/fixtures/user_profile_header_sensitive.bob.response.json
  • demo/pkg/subgraphs/cachegraph_ext/generate.go
  • demo/pkg/subgraphs/viewer/generate.go
  • benchmark/queries/article_simple.graphql
  • benchmark/fixtures/listing_composite_key.response.json
  • benchmark/queries/articles_by_ids_batch.graphql
  • demo/pkg/subgraphs/cachegraph/generate.go
  • benchmark/queries/catalogs_partial_load.graphql
  • benchmark/queries/user_profile_header_sensitive.graphql
  • demo/pkg/subgraphs/cachegraph_ext/subgraph/resolver.go
  • demo/graph.yaml
  • demo/pkg/subgraphs/cachegraph/subgraph/resolver.go
  • benchmark/fixtures/request_scoped_viewer_articles.response.json
  • benchmark/queries/venue_nested_key.graphql
  • benchmark/queries/listing_composite_key.graphql
  • benchmark/queries/request_scoped_viewer_articles.graphql
  • benchmark/router-cache.redis.yaml
  • benchmark/scripts/capture_pprof.sh
  • demo/graph-cache-only.yaml
  • benchmark/fixtures/viewer_articles_deep_nested.response.json
  • benchmark/scenarios/cache-demo.json
  • benchmark/README.md
  • demo/pkg/subgraphs/cachegraph_ext/gqlgen.yml
  • demo/pkg/subgraphs/cachegraph_ext/subgraph/entity.resolvers.go
  • demo/pkg/subgraphs/cachegraph_ext/subgraph/schema.resolvers.go
  • benchmark/queries/viewer_articles_deep_nested.graphql
  • demo/pkg/subgraphs/cachegraph/subgraph/data.go
  • demo/pkg/subgraphs/cachegraph/gqlgen.yml
  • demo/pkg/subgraphs/viewer/subgraph/data.go
  • demo/pkg/subgraphs/cachegraph_ext/subgraph/schema.graphqls
  • composition/src/v1/warnings/params.ts
  • composition/tests/v1/directives/entity-cache-mapping-rules.test.ts
  • composition/tests/v1/directives/entity-caching.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • composition/src/router-configuration/types.ts
  • composition/src/v1/warnings/warnings.ts

Comment on lines +70 to +72
{
headers: payload.headers || { "content-type": "application/json" },
},

⚠️ Potential issue | 🟠 Major

Merge custom headers with the JSON default instead of replacing it.

buildK6Payload() always includes a headers object, so this fallback never runs. That means benchmark requests drop content-type: application/json unless every caller remembers to add it manually.

Suggested fix
     {
-      headers: payload.headers || { "content-type": "application/json" },
+      headers: {
+        "content-type": "application/json",
+        ...(payload.headers || {}),
+      },
     },
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
{
headers: payload.headers || { "content-type": "application/json" },
},
{
headers: {
"content-type": "application/json",
...(payload.headers || {}),
},
},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmark/k6/cache_demo.js` around lines 70 - 72, The current buildK6Payload
implementation replaces the default JSON header when payload.headers is
provided, causing requests to lose "content-type: application/json"; change the
header assignment in buildK6Payload to merge the defaults with any custom
headers (e.g., combine { "content-type": "application/json" } with
payload.headers using a shallow merge so custom keys override defaults but
default content-type remains unless explicitly overridden).

Comment thread benchmark/scripts/lib.ts
Comment on lines +651 to +678
export async function fetchJsonWithRetry(input: {
url: string;
init: RequestInit;
fetchImpl?: typeof fetch;
retries?: number;
}): Promise<unknown> {
const fetchImpl = input.fetchImpl ?? fetch;
const retries = input.retries ?? 1;

for (let attempt = 0; ; attempt += 1) {
try {
const response = await fetchImpl(input.url, input.init);
const body = await response.text();

if (!response.ok) {
throw new Error(`request failed: ${response.status} ${body}`);
}

return JSON.parse(body);
} catch (error) {
if (attempt >= retries || !isRetryableFetchError(error)) {
throw error;
}

await delay(100);
}
}
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, check if the file exists and examine the function
fd -t f "lib.ts" | grep benchmark

Repository: wundergraph/cosmo

Length of output: 85


🏁 Script executed:

# Read the specific lines of the file to verify the code snippet
sed -n '651,678p' benchmark/scripts/lib.ts



🏁 Script executed:

# Check the broader context around the function to understand its full implementation
sed -n '640,690p' benchmark/scripts/lib.ts



🏁 Script executed:

# Search for usages of fetchJsonWithRetry to understand how it's called
rg "fetchJsonWithRetry" -A 2 -B 2



🏁 Script executed:

# Check if there are any similar timeout/retry patterns in the codebase
rg "AbortController|AbortSignal" benchmark/



🏁 Script executed:

# Check the delay function implementation to understand the retry pattern
rg "function delay|const delay" -A 3



🏁 Script executed:

# Look at RequestInit type to understand what signal property is available
rg "RequestInit" -A 5 -B 5 --max-count=5



🏁 Script executed:

# Find the isRetryableFetchError function to see if it handles AbortError
rg "isRetryableFetchError" -A 10 benchmark/scripts/lib.ts



🏁 Script executed:

# Check if AbortError is handled anywhere in the error handling logic
rg "AbortError" benchmark/scripts/



🏁 Script executed:

# Verify if any calls to fetchJsonWithRetry already pass a signal in init
rg "fetchJsonWithRetry" -A 5 benchmark/ | grep -A 5 "signal"



🏁 Script executed:

# Double-check if any test cases might be affected by timeout changes
cat -n benchmark/scripts/lib.test.ts | grep -A 20 "fetchJsonWithRetry retries"



Add a timeout to each fetch attempt.

This retry wrapper talks to the router over the network but never attaches an AbortSignal, so a hung connection can stall the entire benchmark run until the OS-level timeout kicks in. Bound each attempt with a timeout and feed that timeout into the retry path.

Consider merging signals instead of overwriting them in the proposed fix. The spread should conditionally add the signal to avoid overriding an existing signal in input.init:

⏱️ Proposed fix
 export async function fetchJsonWithRetry(input: {
   url: string;
   init: RequestInit;
   fetchImpl?: typeof fetch;
   retries?: number;
+  timeoutMs?: number;
 }): Promise<unknown> {
   const fetchImpl = input.fetchImpl ?? fetch;
   const retries = input.retries ?? 1;
+  const timeoutMs = input.timeoutMs ?? 10_000;
 
   for (let attempt = 0; ; attempt += 1) {
+    const controller = new AbortController();
+    const timeout = setTimeout(() => controller.abort(), timeoutMs);
     try {
-      const response = await fetchImpl(input.url, input.init);
+      const response = await fetchImpl(input.url, {
+        ...input.init,
+        signal: controller.signal,
+      });
       const body = await response.text();
 
       if (!response.ok) {
         throw new Error(`request failed: ${response.status} ${body}`);
       }
 
       return JSON.parse(body);
     } catch (error) {
       if (attempt >= retries || !isRetryableFetchError(error)) {
         throw error;
       }
 
       await delay(100);
+    } finally {
+      clearTimeout(timeout);
     }
   }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `benchmark/scripts/lib.ts` around lines 651-678: in fetchJsonWithRetry, wrap
each attempt with a per-attempt timeout by creating an AbortController and
passing its signal into the fetch call without discarding any existing
input.init.signal (merge: if input.init.signal exists, race it with the timeout
controller and forward a composite signal or listen to input.init.signal to
abort the timeout controller); ensure the timeout controller is aborted/cleared
after each attempt and that the signal is included in the RequestInit passed to
fetchImpl (do not overwrite input.init directly), so the timeout-triggered
AbortError flows into isRetryableFetchError and the retry logic continues
correctly.
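One way to build the composite signal the prompt describes, sketched under the assumption of a modern runtime with AbortController; `compositeSignal` is an illustrative name, not part of `lib.ts`:

```typescript
// Illustrative: derive one signal that aborts when either the caller's
// signal aborts or a per-attempt timeout elapses.
function compositeSignal(timeoutMs: number, upstream?: AbortSignal): AbortSignal {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  // Once the composite aborts (via either path), clear the pending timer.
  controller.signal.addEventListener("abort", () => clearTimeout(timer), { once: true });
  if (upstream?.aborted) {
    controller.abort();
  } else {
    upstream?.addEventListener("abort", () => controller.abort(), { once: true });
  }
  return controller.signal;
  // Note: if the request finishes first, the timer later aborts an unused
  // signal, which is harmless for fetch; a fuller implementation would
  // also return a cleanup function to cancel the timer eagerly.
}

// Hypothetical call site inside the retry loop:
// fetchImpl(input.url, { ...input.init, signal: compositeSignal(timeoutMs, input.init.signal ?? undefined) })
```

On runtimes that ship `AbortSignal.any` (Node 20+), `AbortSignal.any([upstream, AbortSignal.timeout(timeoutMs)])` achieves the same without the manual wiring.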

Comment on lines +54 to +57
const response = await fetch("http://127.0.0.1:8088/metrics");
if (!response.ok) {
throw new Error(`failed to fetch router metrics: ${response.status}`);
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Check the file exists and get size
wc -l benchmark/scripts/scrape_metrics.ts



🏁 Script executed:

# Read the full file to understand context
cat -n benchmark/scripts/scrape_metrics.ts



🏁 Script executed:

# Search for AbortController usage in the codebase to understand patterns
rg "AbortController" --type ts --type tsx -B 2 -A 5



🏁 Script executed:

# Search for fetch patterns and timeout handling
rg "fetch\(" --type ts --type tsx -B 1 -A 3 | head -100



🏁 Script executed:

# Search for AbortController usage with correct syntax
rg "AbortController" --type ts -B 2 -A 5



🏁 Script executed:

# Search for fetch patterns with timeout/abort handling
rg "fetch\(" --type ts -B 1 -A 5 | head -150



🏁 Script executed:

# Check if there are setTimeout patterns used for request timeouts
rg "setTimeout.*fetch|AbortController|signal:" --type ts -B 1 -A 3



Add timeout to metrics fetch to prevent indefinite hangs.

The fetch call has no timeout, so a wedged router will block the benchmark indefinitely. Add an explicit cancellation path using AbortController so the suite fails fast instead.

Suggested fix
-  const response = await fetch("http://127.0.0.1:8088/metrics");
+  const controller = new AbortController();
+  const timeout = setTimeout(() => controller.abort(), 5_000);
+  let response: Response;
+  try {
+    response = await fetch("http://127.0.0.1:8088/metrics", {
+      signal: controller.signal,
+    });
+  } finally {
+    clearTimeout(timeout);
+  }
📝 Committable suggestion


Suggested change
const response = await fetch("http://127.0.0.1:8088/metrics");
if (!response.ok) {
throw new Error(`failed to fetch router metrics: ${response.status}`);
}
const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), 5_000);
let response: Response;
try {
response = await fetch("http://127.0.0.1:8088/metrics", {
signal: controller.signal,
});
} finally {
clearTimeout(timeout);
}
if (!response.ok) {
throw new Error(`failed to fetch router metrics: ${response.status}`);
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `benchmark/scripts/scrape_metrics.ts` around lines 54-57: the fetch for
metrics currently has no timeout and can hang; wrap the fetch in an
AbortController: create a controller, pass controller.signal to
fetch("http://127.0.0.1:8088/metrics"), start a timer (e.g., 5000ms) that calls
controller.abort() on expiry, clear the timer after fetch resolves, and handle
AbortError to throw a clear "metrics fetch timeout" error; update the code
around the response/response.ok checks to use this abort-enabled fetch flow
(reference AbortController, controller.signal, response).
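On runtimes with `AbortSignal.timeout` (Node 17.3+, modern browsers), the controller-and-timer wiring in the fix above collapses into a one-liner. A hedged alternative sketch, with the URL taken from the script:

```typescript
// Same fail-fast behavior, using the built-in timeout signal instead of
// a manual AbortController + setTimeout pair.
async function scrapeRouterMetrics(): Promise<string> {
  const response = await fetch("http://127.0.0.1:8088/metrics", {
    signal: AbortSignal.timeout(5_000), // rejects with a TimeoutError after 5s
  });
  if (!response.ok) {
    throw new Error(`failed to fetch router metrics: ${response.status}`);
  }
  return response.text();
}
```

The tradeoff is observability: `AbortSignal.timeout` rejects with a generic `TimeoutError`, while the explicit controller lets the script throw a labeled "metrics fetch timeout" error as the AI prompt suggests.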

Comment on lines +13 to +16
if kill -0 "${pid}" >/dev/null 2>&1; then
kill "${pid}" >/dev/null 2>&1 || true
wait "${pid}" 2>/dev/null || true
fi

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the stop_stack.sh file to understand the full context
cat -n benchmark/scripts/stop_stack.sh



🏁 Script executed:

# Now let's examine the start_stack.sh file to see how processes are started
cat -n benchmark/scripts/start_stack.sh



🏁 Script executed:

# Let's also check what processes are being started and how they're managed
grep -r "\.run" benchmark/scripts/ --include="*.sh" -B2 -A2



wait "${pid}" is ineffective for these detached processes and creates a race condition.

In start_stack.sh, processes are spawned with start_new_session=True (line 30), making them independent process groups—not children of the current shell. Bash's wait builtin only works on child processes, so it returns immediately (errors suppressed), and the PID file gets deleted before the actual process finishes shutting down. This can race with the next benchmark start.

Replace with a polling loop that confirms process termination:

Suggested fix
     if kill -0 "${pid}" >/dev/null 2>&1; then
       kill "${pid}" >/dev/null 2>&1 || true
-      wait "${pid}" 2>/dev/null || true
+      for _ in {1..50}; do
+        if ! kill -0 "${pid}" >/dev/null 2>&1; then
+          break
+        fi
+        sleep 0.1
+      done
+      if kill -0 "${pid}" >/dev/null 2>&1; then
+        kill -9 "${pid}" >/dev/null 2>&1 || true
+      fi
     fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `benchmark/scripts/stop_stack.sh` around lines 13-16: the current shutdown
in stop_stack.sh uses wait "${pid}" which is ineffective for detached processes
started with start_new_session=True and can race; replace the wait call for
"${pid}" with a polling loop that repeatedly checks kill -0 "${pid}" (or
/proc/${pid} existence) until the process no longer exists or a timeout elapses,
sleeping briefly between checks and then proceed to remove the pid file; ensure
the loop uses the same ${pid} variable and enforces a reasonable timeout and
graceful fallback (e.g., log/warn and continue) so stop_stack.sh reliably waits
for the detached process to exit.

Comment on lines +766 to +789
// directive @openfed__requestScoped(key: String!) on FIELD_DEFINITION
//
// Marks a field as resolving to the same value for the entire request, shared with
// all other fields in the same subgraph that use the same `key`. Purely symmetric:
// every field that participates declares itself with the directive. There is no
// receiver/provider distinction. Whichever field is resolved first populates the
// per-request L1 cache under `{subgraphName}.{key}`; subsequent fields with the
// same key are served from L1 (and can skip the subgraph fetch when possible).
export const REQUEST_SCOPED_DEFINITION: DirectiveDefinitionNode = {
arguments: [
{
kind: Kind.INPUT_VALUE_DEFINITION,
name: stringToNameNode(KEY),
type: {
kind: Kind.NON_NULL_TYPE,
type: stringToNamedTypeNode(STRING_SCALAR),
},
},
],
kind: Kind.DIRECTIVE_DEFINITION,
locations: stringArrayToNameNodeArray([FIELD_DEFINITION_UPPER]),
name: stringToNameNode(REQUEST_SCOPED),
repeatable: false,
};

⚠️ Potential issue | 🟠 Major

Register REQUEST_SCOPED_DEFINITION in the directive registry.

This definition is added here, but composition/src/v1/constants/constants.ts:1-143 still doesn't import REQUEST_SCOPED / REQUEST_SCOPED_DEFINITION or add it to DIRECTIVE_DEFINITION_BY_NAME. That leaves @openfed__requestScoped undiscoverable during normalization, so request-scoped config never gets emitted.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `composition/src/v1/constants/directive-definitions.ts` around lines 766-789:
the new REQUEST_SCOPED_DEFINITION and its name symbol REQUEST_SCOPED are
not registered in the directive registry used during normalization; import
REQUEST_SCOPED and REQUEST_SCOPED_DEFINITION into the directive registry module
(where DIRECTIVE_DEFINITION_BY_NAME is defined) and add an entry mapping
REQUEST_SCOPED to REQUEST_SCOPED_DEFINITION in DIRECTIVE_DEFINITION_BY_NAME so
the `@openfed__requestScoped` directive is discoverable and emitted during
normalization.
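The fix amounts to a single registry entry. A minimal model follows, assuming nothing about the real file beyond the names the comment uses; the value type is simplified to keep the sketch self-contained:

```typescript
// Simplified stand-in for the real DirectiveDefinitionNode type.
interface DirectiveDefinitionLike {
  name: string;
  repeatable: boolean;
}

const REQUEST_SCOPED = "openfed__requestScoped";
const REQUEST_SCOPED_DEFINITION: DirectiveDefinitionLike = {
  name: REQUEST_SCOPED,
  repeatable: false,
};

// Registry consulted during normalization; a directive missing from this
// map is simply never discovered, which is the bug being described.
const DIRECTIVE_DEFINITION_BY_NAME = new Map<string, DirectiveDefinitionLike>([
  // ...existing directive entries...
  [REQUEST_SCOPED, REQUEST_SCOPED_DEFINITION],
]);
```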

Comment on lines +4333 to +4339
// Check if @openfed__is targets multiple fields (composite key via input object)
const isCompositeIsSpec = isFieldValue.includes(LITERAL_SPACE);

if (isCompositeIsSpec) {
return this.buildCompositeIsMapping(
fieldCoords, entityTypeName, keyFieldSets, isListReturn, argInfo,
);

⚠️ Potential issue | 🟠 Major

Don't let one composite @openfed__is suppress other satisfiable keys.

Returning from buildExplicitMappings() here means a single input-object mapping prevents any remaining explicit or auto-mappable alternative @key from being emitted for the same field. That breaks multi-key OR semantics.

As per coding guidelines, "In composition entity key mapping, each independent @key directive should produce its own EntityKeyMappingConfig with OR semantics, not be merged into a single composite mapping with AND semantics."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `composition/src/v1/normalization/normalization-factory.ts` around lines 4333-4339:
the current early return from buildExplicitMappings when
isCompositeIsSpec is true causes a composite `@openfed__is` mapping to suppress
other satisfiable keys; instead of returning the result of
buildCompositeIsMapping, call buildCompositeIsMapping with (fieldCoords,
entityTypeName, keyFieldSets, isListReturn, argInfo) and append its produced
EntityKeyMappingConfig(s) to the mappings collection used by
buildExplicitMappings (do not merge into a single composite that enforces AND),
then continue processing remaining explicit/auto-mappable `@key` directives so
each independent `@key` yields its own EntityKeyMappingConfig (OR semantics).
Ensure you reference and update the same mappings accumulator used later in
buildExplicitMappings and remove the early return path.
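The OR-versus-AND point can be shown with a toy accumulator; the names and shapes below are hypothetical, standing in for the real normalization-factory types:

```typescript
// Each independent @key yields its own config entry (OR semantics).
// A composite @openfed__is mapping is appended, never returned early,
// so the remaining keys still get processed.
interface EntityKeyMappingConfig {
  keyFieldSet: string;
  composite: boolean;
}

function buildMappings(keyFieldSets: string[], compositeSpec?: string): EntityKeyMappingConfig[] {
  const mappings: EntityKeyMappingConfig[] = [];
  if (compositeSpec) {
    // Append the composite mapping instead of returning it alone (the bug).
    mappings.push({ keyFieldSet: compositeSpec, composite: true });
  }
  for (const keyFieldSet of keyFieldSets) {
    // Every remaining @key still produces its own independent config.
    mappings.push({ keyFieldSet, composite: false });
  }
  return mappings;
}
```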

Comment thread composition/src/v1/normalization/normalization-factory.ts
Comment thread demo/cmd/cache-demo/main.go
Comment on lines +17 to +46
func (r *mutationResolver) UpdateArticle(ctx context.Context, id string, title string) (*model.Article, error) {
for _, a := range articlesData {
if a.ID == id {
a.Title = title
return a, nil
}
}
return nil, nil
}

// CreateArticle is the resolver for the createArticle field.
func (r *mutationResolver) CreateArticle(ctx context.Context, title string, body string, authorName string) (*model.Article, error) {
a := &model.Article{
ID: fmt.Sprintf("%d", len(articlesData)+1),
Title: title,
Body: body,
AuthorName: authorName,
PublishedAt: time.Now().Format(time.RFC3339),
Tags: []string{},
}
articlesData = append(articlesData, a)
return a, nil
}

// DeleteListing is the resolver for the deleteListing field.
func (r *mutationResolver) DeleteListing(ctx context.Context, key model.ListingKey) (*model.Listing, error) {
k := listingKey{SellerID: key.SellerID, SKU: key.Sku}
l := listingsData[k]
delete(listingsData, k)
return l, nil

⚠️ Potential issue | 🔴 Critical

Protect the shared demo data before mutating it.

These mutations write package-level state that the query resolvers read concurrently. DeleteListing can panic with concurrent map read and map write, and CreateArticle / UpdateArticle race with the article queries under parallel traffic. Please guard these collections with a sync.RWMutex or switch the demo store to copy-on-write snapshots.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `demo/pkg/subgraphs/cachegraph/subgraph/schema.resolvers.go` around lines 17-46:
the package-level slices/maps (articlesData, listingsData) are mutated
without synchronization causing races and possible "concurrent map read and map
write" panics; protect access with a sync.RWMutex (e.g., declare a package-level
var articlesMu sync.RWMutex and listingsMu sync.RWMutex) and use
articlesMu.Lock()/Unlock() for CreateArticle and UpdateArticle mutations and
articlesMu.RLock()/RUnlock() for query readers, and similarly use
listingsMu.Lock()/Unlock() around DeleteListing (and any listing reads) when
accessing listingsData; alternatively implement copy-on-write snapshots for
writes and atomically replace the store if you prefer immutability.

Comment thread Makefile
Comment on lines +195 to +196
cd demo && go run cmd/all/main.go & \
sleep 2 && cd router && go run cmd/router/main.go --config ../demo/router-cache.yaml

⚠️ Potential issue | 🟠 Major

Clean up the background subgraph process on exit.

go run cmd/all/main.go is backgrounded here, but nothing tears it down when the router exits or the user interrupts make demo. That leaves the demo subgraphs orphaned and the ports occupied for the next run.

Suggested fix
 demo:
 	@echo "Composing subgraph schemas..."
 	cd demo && $(MAKE) compose-cache
 	@echo "Starting subgraphs and router..."
 	@echo "Playground will be at http://localhost:3002/"
-	cd demo && go run cmd/all/main.go & \
-	sleep 2 && cd router && go run cmd/router/main.go --config ../demo/router-cache.yaml
+	@set -e; \
+	(cd demo && go run cmd/all/main.go) & \
+	subgraphs_pid=$$!; \
+	trap 'kill $$subgraphs_pid >/dev/null 2>&1 || true' EXIT INT TERM; \
+	sleep 2; \
+	cd router && go run cmd/router/main.go --config ../demo/router-cache.yaml
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `Makefile` around lines 195-196: the demo target currently backgrounds the
subgraph process started by "go run cmd/all/main.go" and never cleans it up;
modify the Makefile commands to capture the background PID (from the "go run
cmd/all/main.go &" invocation), install a shell trap for INT/EXIT that kills
that PID (and waits for any children) before exiting, and ensure the router
process ("go run cmd/router/main.go --config ../demo/router-cache.yaml") runs in
the foreground so the trap can fire; use the captured PID (or a temp file) to
reliably kill the background subgraph on normal exit, interrupt, or error.

- Regenerate Go proto bindings with protoc-gen-go v1.36.10 (matches Makefile pin)
- Reset pnpm-lock.yaml to main's resolution so rollup@4.59.0 override applies
  (was pinned to 4.46.2 with high-severity advisory)
- Run go generate ./... in demo to refresh gqlgen-generated cachegraph/viewer
- Re-run pnpm build:router to refresh embedded graphiql.html
- Format composition docs and modified files (AGENTS.md, CLAUDE.md,
  COMPOSITION_CONVENTIONS.md, errors.ts, normalization-factory.ts, warnings.ts,
  entity-cache-fuzz.test.ts, entity-cache-mapping-rules.test.ts) with prettier

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jensneuse and others added 5 commits April 20, 2026 09:11
…arnings

Adds to CLAUDE.md:
- Split between router playground (has Cache Explorer) and studio playground (does not, since it uses its own GraphiQL wrapper and does not import @wundergraph/playground).
- End-to-end bring-up recipe for running infra in docker while running controlplane, studio, and router locally — including the keycloak image rebuild workaround, the explicit pnpm migrate step before seed, and the port map.
- gqlgen.yml gotcha: `skip_runtime: true` only applies when the directive key matches the full schema name, including the `openfed__` prefix. Mismatched names silently generate a runtime handler that panics with "directive ... is not implemented" on every subgraph fetch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nfigs

The subgraph schemas declare `@openfed__entityCache`, `@openfed__queryCache`, `@openfed__cacheInvalidate`, `@openfed__cachePopulate`, `@openfed__is`, and `@openfed__requestScoped`, but gqlgen.yml listed the directives under their short names (`entityCache`, `queryCache`, etc.). Gqlgen matches `directives:` keys against the SDL directive name, so `skip_runtime: true` never applied. The generated code retained runtime handlers on `DirectiveRoot` that panic with "directive ... is not implemented" whenever a federated router fetches from the subgraph.

Fix: rename the keys to their fully prefixed names and regenerate. The regenerated `generated.go` no longer contains `Openfed__*` fields on `DirectiveRoot`.

Also:
- Add `demo/router-cache-cp.yaml` — a control-plane-connected router config for the cache demo, with entity_caching + L2 Redis enabled and no `execution_config.file`.
- Check in `cli/cachegraph-cachedemo/` — `pnpm wgc federated-graph fetch` export for the cache demo graph, useful as a reference for the composed SDL and router config when working against the control plane.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…layground

The router-embedded playground (`playground/` package) already had a Cache Explorer view, a per-request cache-mode dropdown (`X-WG-Disable-Entity-Cache*` headers), and cache-annotated request traces — but the studio playground (`studio/src/components/playground/*` + `studio/src/pages/[...]/playground.tsx`) is a separate implementation that never imported `@wundergraph/playground` and was missing all three.

Port the cache explorer infrastructure into studio so graph owners get the same experience when opening the studio playground, not only the router-served one:

- Copy `cache-explorer-{types,controller,utils,runner,view}.{ts,tsx}` and `view-cache.tsx` from `playground/src/components/playground/` into `studio/src/components/playground/`. Only one import path needed adjustment (`@/lib/use-local-storage` → `@/hooks/use-local-storage`).
- Convert two `for (const … of …)` iterations over `Set`/`Map.entries()` to `.forEach(…)` so the code compiles under studio's non-downlevelIteration tsconfig.
- Extend `studio/src/components/playground/types.ts` with `'cache-explorer'` on `PlaygroundView`, the `CacheMode` type, `cacheMode`/`setCacheMode` on the context, the `CacheTrace`/`CacheStatus` helpers, and `cacheTrace?: CacheTrace` on `ARTFetchNode`.
- In `studio/src/pages/[organizationSlug]/[namespace]/graph/[slug]/playground.tsx`:
  - Thread a `cacheMode` React state + ref through the fetcher and inject `X-WG-Disable-Entity-Cache{,-L1,-L2}` headers based on the selected mode.
  - Add a `CacheControl` select next to the existing view dropdown, wired to the context.
  - Add the Cache Explorer intercept: when `view === 'cache-explorer'` and the play button was clicked, dispatch the config to `cacheExplorerController.start(...)` and return a synthetic response so GraphiQL's response pane stays quiet while the runner takes over.
  - Register a `cache-explorer-visualization` wrapper div and portal `CacheExplorerView` into it, mirroring the existing ART / plan wrappers.
  - Add `CacheBadge` + `collectCacheSummary` so non-explorer runs still show the hits/total pill on the response toolbar.
  - Keep the user's headers JSON in sync with the selected cache mode via `applyCacheModeHeaders`, so the injected `X-WG-Disable-Entity-Cache*` headers are visible in the editor.

Also update `demo/router-cache-cp.yaml` CORS so the studio origin can send the cache-control headers and run the explorer cross-origin.

Verified end-to-end with a 10-iteration cache demo run from the studio playground: 2.73x server-side speedup, 100% cache ratio, 30 eliminated subgraph HTTP calls, 12.7KB -> 0B router-to-subgraph bytes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ce UI into studio playground

## Router CORS

The studio playground's Cache Explorer and per-request cache-mode dropdown need to send these headers: `X-WG-Disable-Entity-Cache`, `X-WG-Disable-Entity-Cache-L1`, `X-WG-Disable-Entity-Cache-L2`, and `X-WG-Cache-Key-Prefix`. When studio is on a different origin than the router (which is the norm in real deployments, and also how the local docker-compose layout works), the browser does a CORS preflight that's rejected because those headers are not in `Access-Control-Allow-Headers`.

Fix it at the router level so the feature works out of the box against any router, not only against ones whose operator happens to remember to open those headers in `router.yaml`: append the four cache-mode headers to the list of defaults that `router/core/router.go` unions into the user's `CORS.AllowHeaders`. Alongside the already-merged WunderGraph ART / feature flag / tracing headers, this list is the set of headers the router's own playground tooling sends — so it's the right place for studio's new cache-explorer tooling to live too.

With that default in place, `demo/router-cache-cp.yaml` no longer needs a full `cors:` override. It keeps only the demo-specific `X-Artificial-Latency` header, which the cachegraph demo subgraphs use to inject synthetic latency so the cache effect is visible.

## Studio playground backport

While porting the cache explorer I noticed the studio playground also had drifted from `playground/` in three files that render the Request Trace:

- `trace-view.tsx` had no `parseCacheTrace`, so `ARTFetchNode.cacheTrace` was always undefined even when the server returned `datasource_load_trace.cache_trace`. Also missing `minimumVisibleDurationNs` and the `forcedTheme` context field.
- `fetch-flow.tsx` was missing the cache-status block on each node card (Cache: L1/L2 HIT / MISS / NO LOOKUP with hit/miss counts), the `ViewCache` modal button, and the `!data.loadSkipped` guard on the status-code badge. The latter caused a stale 200 OK badge to render alongside the red skipped-fetch bolt.
- `fetch-waterfall.tsx` was missing the `BoltIcon` cache-hit indicator next to each subgraph name, plus the full block of cache attribute rows (cache name, TTL, entity count, L1/L2 enabled, L1/L2 hit/miss, cache duration, L2 get/set, and a cache-keys preview).

Copy those three files verbatim from `playground/src/components/playground/` and rewrite the `nsToTime` import from `@/lib/utils` to `@/lib/insights-helpers` (studio's location). Every other dependency was already satisfied by prior commits on this branch (types.ts has the `CacheTrace` helpers, `view-cache.tsx` is present, `@/lib/utils:cn` and `@heroicons/react/24/solid:BoltIcon` exist in studio).

End-to-end verified: running the same query twice in the studio playground now shows the 3/3 Cached badge, lightning-bolt L2-hit markers on every row of the waterfall view, and the full cache statistics on the tree-view node cards — matching the router-embedded playground.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jensneuse and others added 3 commits April 20, 2026 09:59
Studio's `pnpm lint` (and `next build`) runs stricter rules than the playground package, so the verbatim copy of `cache-explorer-view.tsx` broke CI with four errors:

- `react-hooks/rules-of-hooks` at lines 1182 and 1186: two `useMemo` calls lived after an `if (!cachedPlan && !uncachedPlan) return null;` early return, which skips them on the empty path.
  Move both hooks above the early return so hooks run in a stable order on every render (the hook results are cheap even when both inputs are undefined).
- `react/no-unescaped-entities` at line 1380: two `"` characters directly in JSX text.
  Replace them with `&quot;` entities.

No behavior change — only lint compliance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Studio's CI runs `pnpm lint:fix` and then a `git-dirty-check` step that fails if the tree changed. The verbatim copies from the router playground use slightly different whitespace (double-spaces after `;` to align inline comments, `useMemo`/`useCallback` arg layout) that studio's prettier config reformats on touch. Run `pnpm lint:fix` locally on the ported files and commit the result so CI's dirty-check sees a clean tree.

Purely formatting; no behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
COMMENTS.md, TODO.md, FEEDBACK.md, DONE.md, and docs/superpowers/
are local-only scratch for review triage and should never be
committed. TODO.md and FEEDBACK.md were previously committed by
mistake and have been stripped from branch history with git
filter-repo; add ignore rules to prevent reintroduction.
@jensneuse jensneuse closed this Apr 20, 2026
@jensneuse jensneuse force-pushed the jensneuse/entity-caching branch from be5e727 to 86c88c6 Compare April 20, 2026 09:55
jensneuse added a commit that referenced this pull request Apr 20, 2026
Squashed snapshot of jensneuse/entity-caching (e5ac2b9) onto main.
The original branch's history was rooted at unsigned "Initial commit"
4d73e22... while wundergraph/cosmo:main is rooted at the GPG-signed
equivalent — every commit SHA differs and GitHub treated the two as
unrelated histories, which is why PR #2677 was closed and could not
be reopened. This commit re-bases the entire feature as a single
diff on top of current main so it can be reviewed and merged.

Original branch is preserved at jensneuse/entity-caching for history.

Supersedes #2677.