docs: comprehensive caching and resolve package documentation#1433
docs: comprehensive caching and resolve package documentation#1433jensneuse merged 3 commits intofeat/add-caching-supportfrom
Conversation
Create three well-structured documentation files for the entity caching system and resolve package: 1. **v2/pkg/engine/resolve/CLAUDE.md** - Full resolve package reference covering the resolution pipeline (Resolver, Loader, Resolvable) and entity caching internals. Single file because caching is embedded in the fetch execution flow. 2. **ENTITY_CACHING_INTEGRATION.md** - Router integration guide with complete public APIs, configuration options, cache key formats, invalidation mechanisms, analytics, and a full end-to-end example. Another agent can fully integrate entity caching using only this file. 3. **CLAUDE.md (root)** - High-level repo overview with package map, data flow, and links to deep references. Replaces entity-caching-specific content with concise architecture documentation. Also delete execution/engine/CLAUDE.md (cache log rules merged into resolve/CLAUDE.md). Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
📝 WalkthroughWalkthroughThe PR adds comprehensive documentation for entity caching integration in the GraphQL engine, restructures the root CLAUDE.md with a high-level module map, and introduces detailed architectural documentation for the Resolve package covering L1/L2 caching strategies and runtime wiring. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
The remote branch expanded execution/engine/CLAUDE.md with E2E test conventions. Keep the remote version since it contains distinct content from what was merged into v2/pkg/engine/resolve/CLAUDE.md. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
ENTITY_CACHING_INTEGRATION.md (1)
254-257: Avoid panic-prone type assertion in the interceptor example.
ctx.Value("tenant-id").(string)can panic. Use the safevalue, ok := ...pattern (already used later in this file).Suggested doc fix
- L2CacheKeyInterceptor: func(ctx context.Context, key string, info resolve.L2CacheKeyInterceptorInfo) string { - tenantID := ctx.Value("tenant-id").(string) - return tenantID + ":" + key - }, + L2CacheKeyInterceptor: func(ctx context.Context, key string, info resolve.L2CacheKeyInterceptorInfo) string { + if tenantID, ok := ctx.Value("tenant-id").(string); ok { + return tenantID + ":" + key + } + return key + },🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ENTITY_CACHING_INTEGRATION.md` around lines 254 - 257, The example for L2CacheKeyInterceptor is using a panic-prone assertion ctx.Value("tenant-id").(string); change it to use the safe form value, ok := ctx.Value("tenant-id").(string) and handle the false case (e.g., fall back to key or use an empty tenant prefix) before constructing and returning the cache key; update the L2CacheKeyInterceptor function to perform the type-checked extraction and a clear fallback to avoid panics.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CLAUDE.md`:
- Around line 9-11: Add a language tag to the fenced data-flow block so the code
fence reads with a `text` specifier: replace the triple-backtick fence around
the line "parse → normalize → validate → plan → resolve → response" with a
```text fenced block (i.e., open with ```text and close with ```) to satisfy
MD040; look for the fenced block containing the data-flow string in CLAUDE.md.
In `@ENTITY_CACHING_INTEGRATION.md`:
- Around line 296-298: Add a language label "text" to the fenced code blocks
that show key-format and flow snippets (e.g., the blocks containing
{headerHash}:{"__typename":"User","key":{"id":"123"}},
tenant-X:{headerHash}:{"__typename":"User","key":{"id":"123"}}, the L1/L2 flow
block, and the interceptor-prefix example) so they become ```text ... ```, and
update all similar unlabeled fences in the same sections noted (also at the
other ranges) to avoid MD040 warnings; locate these by searching for the literal
snippet lines and replace their surrounding triple-backtick fences with
triple-backtick + text.
In `@v2/pkg/engine/resolve/CLAUDE.md`:
- Line 16: Several fenced code blocks in CLAUDE.md are missing language
identifiers (causing MD040); update each triple-backtick fence around examples
like the Resolver.ResolveGraphQLResponse sequence, the JSON navigation snippet
("Navigate to object in JSON: value = parent.Get..."), the cache flow lines
("prepareCacheKeys() → tryL1CacheLoad() → tryL2CacheLoad() → fetch..."), and the
"Phase 1 (main): prepareCacheKeys + tryL1CacheLoad" section to include a
language tag (e.g., ```text) on the opening fence so markdownlint no longer
flags them; ensure you apply the same fix at the other occurrences noted in the
comment (around lines with the same snippets).
---
Nitpick comments:
In `@ENTITY_CACHING_INTEGRATION.md`:
- Around line 254-257: The example for L2CacheKeyInterceptor is using a
panic-prone assertion ctx.Value("tenant-id").(string); change it to use the safe
form value, ok := ctx.Value("tenant-id").(string) and handle the false case
(e.g., fall back to key or use an empty tenant prefix) before constructing and
returning the cache key; update the L2CacheKeyInterceptor function to perform
the type-checked extraction and a clear fallback to avoid panics.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9af4f5ae-1e32-44c1-a329-99dac96bde1c
📒 Files selected for processing (3)
CLAUDE.mdENTITY_CACHING_INTEGRATION.mdv2/pkg/engine/resolve/CLAUDE.md
| ``` | ||
|
|
||
| ### Configuration Types | ||
| ```go | ||
| // Per-subgraph caching config (explicit opt-in) | ||
| type SubgraphCachingConfig struct { | ||
| SubgraphName string | ||
| EntityCaching plan.EntityCacheConfigurations // For _entities queries | ||
| RootFieldCaching plan.RootFieldCacheConfigurations // For root queries | ||
| } | ||
|
|
||
| type EntityCacheConfiguration struct { | ||
| TypeName string // e.g., "User" | ||
| CacheName string | ||
| TTL time.Duration | ||
| IncludeSubgraphHeaderPrefix bool | ||
| } | ||
|
|
||
| type RootFieldCacheConfiguration struct { | ||
| TypeName string // e.g., "Query" | ||
| FieldName string // e.g., "topProducts" | ||
| CacheName string | ||
| TTL time.Duration | ||
| IncludeSubgraphHeaderPrefix bool | ||
| } | ||
| ``` | ||
|
|
||
| ### Cache Stats (Thread Safety) | ||
| ```go | ||
| type CacheStats struct { | ||
| L1Hits int64 // Main thread only (non-atomic) | ||
| L1Misses int64 // Main thread only (non-atomic) | ||
| L2Hits *atomic.Int64 // Goroutine-safe (atomic) | ||
| L2Misses *atomic.Int64 // Goroutine-safe (atomic) | ||
| } | ||
| ``` | ||
|
|
||
| ## Enabling Caching | ||
|
|
||
| ### Runtime Options | ||
| ```go | ||
| ctx.ExecutionOptions.Caching = CachingOptions{ | ||
| EnableL1Cache: true, // Per-request entity cache | ||
| EnableL2Cache: true, // External cache | ||
| } | ||
| ``` | ||
|
|
||
| ### Per-Subgraph Configuration (L2 only) | ||
| ```go | ||
| subgraphCachingConfigs := engine.SubgraphCachingConfigs{ | ||
| { | ||
| SubgraphName: "products", | ||
| RootFieldCaching: plan.RootFieldCacheConfigurations{ | ||
| {TypeName: "Query", FieldName: "topProducts", CacheName: "default", TTL: 30 * time.Second}, | ||
| }, | ||
| }, | ||
| { | ||
| SubgraphName: "accounts", | ||
| EntityCaching: plan.EntityCacheConfigurations{ | ||
| {TypeName: "User", CacheName: "default", TTL: 30 * time.Second}, | ||
| }, | ||
| }, | ||
| } | ||
|
|
||
| opts := []engine.FederationEngineConfigFactoryOption{ | ||
| engine.WithSubgraphEntityCachingConfigs(subgraphCachingConfigs), | ||
| } | ||
| ``` | ||
|
|
||
| ## Cache Flow | ||
|
|
||
| ### Sequential Execution (`tryCacheLoad`) | ||
| 1. `prepareCacheKeys()` - Generate L1 and L2 cache keys | ||
| 2. `tryL1CacheLoad()` - Check L1 (main thread) | ||
| 3. `tryL2CacheLoad()` - Check L2 (main thread) | ||
| 4. Fetch if needed, then `populateL1Cache()` and `updateL2Cache()` | ||
|
|
||
| ### Parallel Execution (`resolveParallel`) | ||
| 1. **Main thread**: `prepareCacheKeys()` + `tryL1CacheLoad()` for all nodes | ||
| 2. **Goroutines**: `tryL2CacheLoad()` + fetch via `loadFetchL2Only()` | ||
| 3. **Main thread**: Merge results, populate L1 cache | ||
|
|
||
| **Rationale**: L1 is cheap (in-memory), check on main thread to skip goroutine work early. L2/fetch are expensive, run in parallel. | ||
|
|
||
| ## Self-Referential Entity Fix | ||
|
|
||
| **Problem**: When `User.friends` returns the same `User` entity, L1 cache causes pointer aliasing → stack overflow on merge. | ||
|
|
||
| **Solution**: `shallowCopyProvidedFields()` in `loader_json_copy.go` creates copies based on `ProvidesData` schema. | ||
|
|
||
| ```go | ||
| // In tryL1CacheLoad: | ||
| ck.FromCache = l.shallowCopyProvidedFields(cachedValue, info.ProvidesData) | ||
| ``` | ||
|
|
||
| ## ProvidesData and Validation | ||
|
|
||
| `FetchInfo.ProvidesData` describes what fields a fetch provides. Used by: | ||
| - `validateItemHasRequiredData()` - Check if cached entity is complete | ||
| - `shallowCopyProvidedFields()` - Copy only required fields | ||
|
|
||
| **Critical**: For nested entity fetches, `ProvidesData` must contain entity fields (`id`, `username`), NOT the parent field (`author`). | ||
|
|
||
| ## configureFetchCaching Logic | ||
|
|
||
| ```go | ||
| func configureFetchCaching(internal, external) FetchCacheConfiguration { | ||
| // 1. Always preserve CacheKeyTemplate for L1 | ||
| result := FetchCacheConfiguration{CacheKeyTemplate: external.Caching.CacheKeyTemplate} | ||
|
|
||
| // 2. Check global disable | ||
| if v.Config.DisableEntityCaching { return result } | ||
|
|
||
| // 3. Determine fetch type FIRST | ||
| if external.RequiresEntityFetch || external.RequiresEntityBatchFetch { | ||
| // Entity fetch: all rootFields same type, use first | ||
| entityTypeName := internal.rootFields[0].TypeName | ||
| cacheConfig := fedConfig.EntityCacheConfig(entityTypeName) | ||
| } else { | ||
| // Root field fetch: need exactly 1 rootField | ||
| if len(internal.rootFields) != 1 { return result } | ||
| cacheConfig := fedConfig.RootFieldCacheConfig(rootField.TypeName, rootField.FieldName) | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| ## Unit Testing | ||
|
|
||
| ```go | ||
| // Standard test setup | ||
| ctrl := gomock.NewController(t) | ||
| defer ctrl.Finish() | ||
|
|
||
| ds := NewMockDataSource(ctrl) | ||
| ds.EXPECT().Load(gomock.Any(), gomock.Any(), gomock.Any()). | ||
| DoAndReturn(func(ctx context.Context, headers any, input []byte) ([]byte, error) { | ||
| return []byte(`{"data":{...}}`), nil | ||
| }).Times(1) | ||
|
|
||
| loader := &Loader{caches: map[string]LoaderCache{"default": cache}} | ||
|
|
||
| // REQUIRED: Disable singleFlight for unit tests | ||
| ctx := NewContext(context.Background()) | ||
| ctx.ExecutionOptions.DisableSubgraphRequestDeduplication = true | ||
| ctx.ExecutionOptions.Caching = CachingOptions{EnableL1Cache: true, EnableL2Cache: true} | ||
|
|
||
| // REQUIRED: Always use arena | ||
| ar := arena.NewMonotonicArena(arena.WithMinBufferSize(1024)) | ||
| resolvable := NewResolvable(ar, ResolvableOptions{}) | ||
| resolvable.Init(ctx, nil, ast.OperationTypeQuery) | ||
|
|
||
| err := loader.LoadGraphQLResponseData(ctx, response, resolvable) | ||
| out := fastjsonext.PrintGraphQLResponse(resolvable.data, resolvable.errors) | ||
| ``` | ||
|
|
||
| ### FakeLoaderCache | ||
| Test mock in `cache_load_test.go` with TTL support and operation logging. | ||
|
|
||
| ### Assertions | ||
|
|
||
| **IMPORTANT**: Always use exact assertions in cache tests. Never use vague comparisons. | ||
|
|
||
| ```go | ||
| // GOOD: Exact values - always preferred | ||
| assert.Equal(t, 3, hitCount, "should have exactly 3 L1 hits") | ||
| assert.Equal(t, int64(12), l1HitsInt, "should have exactly 12 L1 hits") | ||
| assert.Equal(t, 2, accountsCalls, "should call accounts subgraph exactly twice") | ||
|
|
||
| // BAD: Never use vague comparisons | ||
| assert.GreaterOrEqual(t, hitCount, 1) // DON'T DO THIS | ||
| assert.Greater(t, l1HitsInt, int64(0)) // DON'T DO THIS | ||
| assert.LessOrEqual(t, calls, 5) // DON'T DO THIS | ||
| ``` | ||
|
|
||
| Exact assertions catch regressions that vague assertions miss. If the expected value changes, update the test to reflect the new exact value. | ||
|
|
||
| ### Snapshot Comments | ||
|
|
||
| **IMPORTANT**: Every event line in a `CacheAnalyticsSnapshot` assertion MUST have a brief comment explaining **why** that event occurred. Focus on causation, not field values. | ||
|
|
||
| ```go | ||
| // GOOD: explains the "why" | ||
| L2Reads: []resolve.CacheKeyEvent{ | ||
| {CacheKey: keyUser, Kind: resolve.CacheKeyMiss, ...}, // First request, L2 empty | ||
| {CacheKey: keyUser, Kind: resolve.CacheKeyHit, ...}, // Populated by Request 1 | ||
| }, | ||
|
|
||
| // BAD: restates the field value | ||
| {CacheKey: keyUser, Kind: resolve.CacheKeyMiss, ...}, // this is a miss | ||
| ``` | ||
|
|
||
| ## Federation Test Setup | ||
|
|
||
| Test services: `accounts`, `products`, `reviews` in `execution/federationtesting/` | ||
|
|
||
| ### Testing Entity Caching vs @provides | ||
| ```graphql | ||
| type Review { | ||
| # @provides - gateway trusts subgraph, NO entity resolution | ||
| author: User! @provides(fields: "username") | ||
|
|
||
| # No @provides - gateway MUST resolve via _entities | ||
| # Use for testing L1/L2 caching | ||
| authorWithoutProvides: User! | ||
| } | ||
| ``` | ||
|
|
||
| ### Run Tests | ||
| ```bash | ||
| go test -run "TestL1Cache" ./v2/pkg/engine/resolve/... -v | ||
| go test -run "TestFederationCaching" ./execution/engine/... -v | ||
| go test -race ./execution/engine/... -v # Race detector | ||
| ``` | ||
|
|
||
| ## astjson API Reference | ||
|
|
||
| ```go | ||
| // Create values on arena | ||
| astjson.ObjectValue(arena) | ||
| astjson.ArrayValue(arena) | ||
| astjson.StringValue(arena, string) | ||
| astjson.StringValueBytes(arena, []byte) | ||
| astjson.NumberValue(arena, string) | ||
| astjson.TrueValue(arena) | ||
| astjson.FalseValue(arena) | ||
| astjson.NullValue // Global constant (not a function) | ||
|
|
||
| // Manipulate | ||
| value.Set(arena, key, val) | ||
| value.SetArrayItem(arena, idx, val) | ||
| value.Get(keys...) | ||
| value.GetArray() | ||
| value.GetStringBytes() | ||
| value.MarshalTo([]byte) | ||
| value.Type() // TypeNull, TypeTrue, TypeObject, etc. | ||
| ``` | ||
|
|
||
| ## LoaderCache Interface | ||
|
|
||
| ```go | ||
| type LoaderCache interface { | ||
| Get(ctx context.Context, keys []string) ([]*CacheEntry, error) | ||
| Set(ctx context.Context, entries []*CacheEntry, ttl time.Duration) error | ||
| Delete(ctx context.Context, keys []string) error | ||
| } | ||
|
|
||
| type CacheEntry struct { | ||
| Key string | ||
| Value []byte // JSON-encoded entity | ||
| } | ||
| ``` | ||
|
|
||
| ## Always use exact assertions | ||
|
|
||
| Use `assert.Equal` with exact expected values. Never use `Contains`, `GreaterOrEqual`, `LessOrEqual`, or any vague comparison. | ||
| For objects or slices, always compare against a fully defined expected value, not just a subset. | ||
|
|
||
| ```go | ||
| // CORRECT | ||
| assert.Equal(t, 3, len(log), "should have exactly 3 cache operations") | ||
| assert.Equal(t, 1, tracker.GetCount(host), "should call subgraph exactly once") | ||
| assert.Equal(t, int64(12), stats.L1Hits, "should have exactly 12 L1 hits") | ||
|
|
||
| // WRONG — hides regressions | ||
| assert.GreaterOrEqual(t, len(log), 1) | ||
| assert.Greater(t, stats.L1Hits, int64(0)) | ||
| assert.Contains(t, log[0].Keys, expectedKey) | ||
| parse → normalize → validate → plan → resolve → response | ||
| ``` |
There was a problem hiding this comment.
Specify a fence language for the data-flow block.
The fenced block should include a language tag (text) to pass MD040.
Suggested doc fix
-```
+```text
parse → normalize → validate → plan → resolve → response</details>
<details>
<summary>🧰 Tools</summary>
<details>
<summary>🪛 markdownlint-cli2 (0.21.0)</summary>
[warning] 9-9: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
</details>
</details>
<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against the current code and only fix it if needed.
In @CLAUDE.md around lines 9 - 11, Add a language tag to the fenced data-flow
block so the code fence reads with a text specifier: replace the
triple-backtick fence around the line "parse → normalize → validate → plan →
resolve → response" with a text fenced block (i.e., open with text and
close with ```) to satisfy MD040; look for the fenced block containing the
data-flow string in CLAUDE.md.
</details>
<!-- fingerprinting:phantom:triton:hawk -->
<!-- This is an auto-generated comment by CodeRabbit -->
| ``` | ||
| {headerHash}:{"__typename":"User","key":{"id":"123"}} | ||
| ``` |
There was a problem hiding this comment.
Label these fenced blocks with a language to clear MD040 warnings.
Use text for the key-format and flow snippets.
Suggested doc fix
- ```
+ ```text
{headerHash}:{"__typename":"User","key":{"id":"123"}}
```
- ```
+ ```text
tenant-X:{headerHash}:{"__typename":"User","key":{"id":"123"}}
```
-```
+```text
L1 check (main thread, entity fetches only)
↓ miss
...
Populate L1 + L2 (main thread for L1, goroutine for L2)- +text
[optional-interceptor-prefix:][optional-header-hash:]{"__typename":"TypeName","key":{...}}
Also applies to: 301-303, 319-327, 386-388
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)
[warning] 296-296: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ENTITY_CACHING_INTEGRATION.md` around lines 296 - 298, Add a language label
"text" to the fenced code blocks that show key-format and flow snippets (e.g.,
the blocks containing {headerHash}:{"__typename":"User","key":{"id":"123"}},
tenant-X:{headerHash}:{"__typename":"User","key":{"id":"123"}}, the L1/L2 flow
block, and the interceptor-prefix example) so they become ```text ... ```, and
update all similar unlabeled fences in the same sections noted (also at the
other ranges) to avoid MD040 warnings; locate these by searching for the literal
snippet lines and replace their surrounding triple-backtick fences with
triple-backtick + text.
| | **Resolvable** | `resolvable.go` | Response data, two-pass rendering, error handling | | ||
|
|
||
| **End-to-end flow:** | ||
| ``` |
There was a problem hiding this comment.
Add language tags to fenced code blocks to satisfy markdownlint.
These fences are missing a language identifier (MD040), which will keep docs lint noisy.
Suggested doc fix
-```
+```text
Resolver.ResolveGraphQLResponse(ctx, response, data, writer)
1. Acquire concurrency semaphore
...
-```
+```
-```
+```text
1. Navigate to object in JSON: value = parent.Get(obj.Path...)
...
-```
+```
-```
+```text
prepareCacheKeys() → tryL1CacheLoad() → tryL2CacheLoad() → fetch → populateL1Cache() + updateL2Cache
-```
+```
-```
+```text
Phase 1 (main): prepareCacheKeys + tryL1CacheLoad for all fetches
...
-```
+```Also applies to: 209-209, 355-355, 360-360
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)
[warning] 16-16: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@v2/pkg/engine/resolve/CLAUDE.md` at line 16, Several fenced code blocks in
CLAUDE.md are missing language identifiers (causing MD040); update each
triple-backtick fence around examples like the Resolver.ResolveGraphQLResponse
sequence, the JSON navigation snippet ("Navigate to object in JSON: value =
parent.Get..."), the cache flow lines ("prepareCacheKeys() → tryL1CacheLoad() →
tryL2CacheLoad() → fetch..."), and the "Phase 1 (main): prepareCacheKeys +
tryL1CacheLoad" section to include a language tag (e.g., ```text) on the opening
fence so markdownlint no longer flags them; ensure you apply the same fix at the
other occurrences noted in the comment (around lines with the same snippets).
…ertion - Add `text` language tag to all unlabeled fenced code blocks (MD040) - Fix panic-prone ctx.Value type assertion in interceptor example Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Comprehensive documentation for the entity caching system and resolve package execution engine.
Changes
Verification
Tests pass:
go test ./v2/pkg/engine/resolve/... -count=1🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Documentation