Skip to content

docs: comprehensive caching and resolve package documentation#1433

Merged
jensneuse merged 3 commits intofeat/add-caching-supportfrom
jensneuse/caching-docs
Mar 6, 2026
Merged

docs: comprehensive caching and resolve package documentation#1433
jensneuse merged 3 commits intofeat/add-caching-supportfrom
jensneuse/caching-docs

Conversation

@jensneuse
Copy link
Copy Markdown
Member

@jensneuse jensneuse commented Mar 6, 2026

Summary

Comprehensive documentation for the entity caching system and resolve package execution engine.

Changes

  • v2/pkg/engine/resolve/CLAUDE.md (589 lines) - Full resolve package reference covering the resolution pipeline (Resolver, Loader, Resolvable) and entity caching internals. Single file because caching is deeply embedded in the fetch execution flow.
  • ENTITY_CACHING_INTEGRATION.md (680 lines) - Complete router integration guide with all public APIs, configuration options, cache key formats, invalidation mechanisms, analytics, and working examples. Another agent can fully integrate entity caching using only this file.
  • CLAUDE.md (rewritten) - High-level repo overview with package map, data flow, and links to deep references. Replaced entity-caching-specific content with repo-wide architecture documentation.
  • execution/engine/CLAUDE.md - Deleted. Cache log test rules merged into resolve/CLAUDE.md testing section.

Verification

Tests pass: go test ./v2/pkg/engine/resolve/... -count=1

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Implemented two-level entity caching system supporting per-request in-memory cache and cross-request external cache with configurable TTL settings.
    • Added cache analytics and monitoring capabilities to track cache performance.
  • Documentation

    • Added comprehensive entity caching integration guide with setup instructions and usage examples.
    • Updated project documentation structure for improved navigation and reference.

Create three well-structured documentation files for the entity caching system and resolve package:

1. **v2/pkg/engine/resolve/CLAUDE.md** - Full resolve package reference covering the resolution pipeline (Resolver, Loader, Resolvable) and entity caching internals. Single file because caching is embedded in the fetch execution flow.

2. **ENTITY_CACHING_INTEGRATION.md** - Router integration guide with complete public APIs, configuration options, cache key formats, invalidation mechanisms, analytics, and a full end-to-end example. Another agent can fully integrate entity caching using only this file.

3. **CLAUDE.md (root)** - High-level repo overview with package map, data flow, and links to deep references. Replaces entity-caching-specific content with concise architecture documentation.

Also delete execution/engine/CLAUDE.md (cache log rules merged into resolve/CLAUDE.md).

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 6, 2026

📝 Walkthrough

Walkthrough

The PR adds comprehensive documentation for entity caching integration in the GraphQL engine, restructures the root CLAUDE.md with a high-level module map, and introduces detailed architectural documentation for the Resolve package covering L1/L2 caching strategies and runtime wiring.

Changes

Cohort / File(s) Summary
Root Documentation Restructuring
CLAUDE.md
Replaced original architecture content with reorganized high-level module map covering Core, Engine, and Execution packages. Removed detailed per-request and external cache emphasis, reframing around GraphQL routing, federation-first design, and architectural decisions.
Caching Integration Guide
ENTITY_CACHING_INTEGRATION.md
New comprehensive guide detailing two-level caching system (L1 in-memory, L2 external). Introduces LoaderCache interface, CacheEntry struct, and L2CacheKeyInterceptorInfo type. Documents configuration surfaces, runtime wiring, cache key formats, behavior rules by operation type (queries, mutations, subscriptions), analytics structures, and end-to-end integration examples.
Resolve Package Architecture
v2/pkg/engine/resolve/CLAUDE.md
New detailed documentation covering Resolve package purpose, architecture (Resolver, Loader, Resolvable components), end-to-end resolution flow, configuration structures, fetch tree execution model, two-pass rendering, cache infrastructure (L1/L2), invalidation mechanisms, analytics, thread-safety model, arena allocation, and testing patterns.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and specifically describes the main change: adding comprehensive documentation for caching and the resolve package.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch jensneuse/caching-docs

Comment @coderabbitai help to get the list of available commands and usage tips.

The remote branch expanded execution/engine/CLAUDE.md with E2E test
conventions. Keep the remote version since it contains distinct content
from what was merged into v2/pkg/engine/resolve/CLAUDE.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
ENTITY_CACHING_INTEGRATION.md (1)

254-257: Avoid panic-prone type assertion in the interceptor example.

ctx.Value("tenant-id").(string) can panic. Use the safe value, ok := ... pattern (already used later in this file).

Suggested doc fix
-    L2CacheKeyInterceptor: func(ctx context.Context, key string, info resolve.L2CacheKeyInterceptorInfo) string {
-        tenantID := ctx.Value("tenant-id").(string)
-        return tenantID + ":" + key
-    },
+    L2CacheKeyInterceptor: func(ctx context.Context, key string, info resolve.L2CacheKeyInterceptorInfo) string {
+        if tenantID, ok := ctx.Value("tenant-id").(string); ok {
+            return tenantID + ":" + key
+        }
+        return key
+    },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ENTITY_CACHING_INTEGRATION.md` around lines 254 - 257, The example for
L2CacheKeyInterceptor is using a panic-prone assertion
ctx.Value("tenant-id").(string); change it to use the safe form value, ok :=
ctx.Value("tenant-id").(string) and handle the false case (e.g., fall back to
key or use an empty tenant prefix) before constructing and returning the cache
key; update the L2CacheKeyInterceptor function to perform the type-checked
extraction and a clear fallback to avoid panics.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CLAUDE.md`:
- Around line 9-11: Add a language tag to the fenced data-flow block so the code
fence reads with a `text` specifier: replace the triple-backtick fence around
the line "parse → normalize → validate → plan → resolve → response" with a
```text fenced block (i.e., open with ```text and close with ```) to satisfy
MD040; look for the fenced block containing the data-flow string in CLAUDE.md.

In `@ENTITY_CACHING_INTEGRATION.md`:
- Around line 296-298: Add a language label "text" to the fenced code blocks
that show key-format and flow snippets (e.g., the blocks containing
{headerHash}:{"__typename":"User","key":{"id":"123"}},
tenant-X:{headerHash}:{"__typename":"User","key":{"id":"123"}}, the L1/L2 flow
block, and the interceptor-prefix example) so they become ```text ... ```, and
update all similar unlabeled fences in the same sections noted (also at the
other ranges) to avoid MD040 warnings; locate these by searching for the literal
snippet lines and replace their surrounding triple-backtick fences with
triple-backtick + text.

In `@v2/pkg/engine/resolve/CLAUDE.md`:
- Line 16: Several fenced code blocks in CLAUDE.md are missing language
identifiers (causing MD040); update each triple-backtick fence around examples
like the Resolver.ResolveGraphQLResponse sequence, the JSON navigation snippet
("Navigate to object in JSON: value = parent.Get..."), the cache flow lines
("prepareCacheKeys() → tryL1CacheLoad() → tryL2CacheLoad() → fetch..."), and the
"Phase 1 (main): prepareCacheKeys + tryL1CacheLoad" section to include a
language tag (e.g., ```text) on the opening fence so markdownlint no longer
flags them; ensure you apply the same fix at the other occurrences noted in the
comment (around lines with the same snippets).

---

Nitpick comments:
In `@ENTITY_CACHING_INTEGRATION.md`:
- Around line 254-257: The example for L2CacheKeyInterceptor is using a
panic-prone assertion ctx.Value("tenant-id").(string); change it to use the safe
form value, ok := ctx.Value("tenant-id").(string) and handle the false case
(e.g., fall back to key or use an empty tenant prefix) before constructing and
returning the cache key; update the L2CacheKeyInterceptor function to perform
the type-checked extraction and a clear fallback to avoid panics.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9af4f5ae-1e32-44c1-a329-99dac96bde1c

📥 Commits

Reviewing files that changed from the base of the PR and between 9a8682e and 894480c.

📒 Files selected for processing (3)
  • CLAUDE.md
  • ENTITY_CACHING_INTEGRATION.md
  • v2/pkg/engine/resolve/CLAUDE.md

Comment thread CLAUDE.md Outdated
Comment on lines 9 to 11
```

### Configuration Types
```go
// Per-subgraph caching config (explicit opt-in)
type SubgraphCachingConfig struct {
SubgraphName string
EntityCaching plan.EntityCacheConfigurations // For _entities queries
RootFieldCaching plan.RootFieldCacheConfigurations // For root queries
}

type EntityCacheConfiguration struct {
TypeName string // e.g., "User"
CacheName string
TTL time.Duration
IncludeSubgraphHeaderPrefix bool
}

type RootFieldCacheConfiguration struct {
TypeName string // e.g., "Query"
FieldName string // e.g., "topProducts"
CacheName string
TTL time.Duration
IncludeSubgraphHeaderPrefix bool
}
```

### Cache Stats (Thread Safety)
```go
type CacheStats struct {
L1Hits int64 // Main thread only (non-atomic)
L1Misses int64 // Main thread only (non-atomic)
L2Hits *atomic.Int64 // Goroutine-safe (atomic)
L2Misses *atomic.Int64 // Goroutine-safe (atomic)
}
```

## Enabling Caching

### Runtime Options
```go
ctx.ExecutionOptions.Caching = CachingOptions{
EnableL1Cache: true, // Per-request entity cache
EnableL2Cache: true, // External cache
}
```

### Per-Subgraph Configuration (L2 only)
```go
subgraphCachingConfigs := engine.SubgraphCachingConfigs{
{
SubgraphName: "products",
RootFieldCaching: plan.RootFieldCacheConfigurations{
{TypeName: "Query", FieldName: "topProducts", CacheName: "default", TTL: 30 * time.Second},
},
},
{
SubgraphName: "accounts",
EntityCaching: plan.EntityCacheConfigurations{
{TypeName: "User", CacheName: "default", TTL: 30 * time.Second},
},
},
}

opts := []engine.FederationEngineConfigFactoryOption{
engine.WithSubgraphEntityCachingConfigs(subgraphCachingConfigs),
}
```

## Cache Flow

### Sequential Execution (`tryCacheLoad`)
1. `prepareCacheKeys()` - Generate L1 and L2 cache keys
2. `tryL1CacheLoad()` - Check L1 (main thread)
3. `tryL2CacheLoad()` - Check L2 (main thread)
4. Fetch if needed, then `populateL1Cache()` and `updateL2Cache()`

### Parallel Execution (`resolveParallel`)
1. **Main thread**: `prepareCacheKeys()` + `tryL1CacheLoad()` for all nodes
2. **Goroutines**: `tryL2CacheLoad()` + fetch via `loadFetchL2Only()`
3. **Main thread**: Merge results, populate L1 cache

**Rationale**: L1 is cheap (in-memory), check on main thread to skip goroutine work early. L2/fetch are expensive, run in parallel.

## Self-Referential Entity Fix

**Problem**: When `User.friends` returns the same `User` entity, L1 cache causes pointer aliasing → stack overflow on merge.

**Solution**: `shallowCopyProvidedFields()` in `loader_json_copy.go` creates copies based on `ProvidesData` schema.

```go
// In tryL1CacheLoad:
ck.FromCache = l.shallowCopyProvidedFields(cachedValue, info.ProvidesData)
```

## ProvidesData and Validation

`FetchInfo.ProvidesData` describes what fields a fetch provides. Used by:
- `validateItemHasRequiredData()` - Check if cached entity is complete
- `shallowCopyProvidedFields()` - Copy only required fields

**Critical**: For nested entity fetches, `ProvidesData` must contain entity fields (`id`, `username`), NOT the parent field (`author`).

## configureFetchCaching Logic

```go
func configureFetchCaching(internal, external) FetchCacheConfiguration {
// 1. Always preserve CacheKeyTemplate for L1
result := FetchCacheConfiguration{CacheKeyTemplate: external.Caching.CacheKeyTemplate}

// 2. Check global disable
if v.Config.DisableEntityCaching { return result }

// 3. Determine fetch type FIRST
if external.RequiresEntityFetch || external.RequiresEntityBatchFetch {
// Entity fetch: all rootFields same type, use first
entityTypeName := internal.rootFields[0].TypeName
cacheConfig := fedConfig.EntityCacheConfig(entityTypeName)
} else {
// Root field fetch: need exactly 1 rootField
if len(internal.rootFields) != 1 { return result }
cacheConfig := fedConfig.RootFieldCacheConfig(rootField.TypeName, rootField.FieldName)
}
}
```

## Unit Testing

```go
// Standard test setup
ctrl := gomock.NewController(t)
defer ctrl.Finish()

ds := NewMockDataSource(ctrl)
ds.EXPECT().Load(gomock.Any(), gomock.Any(), gomock.Any()).
DoAndReturn(func(ctx context.Context, headers any, input []byte) ([]byte, error) {
return []byte(`{"data":{...}}`), nil
}).Times(1)

loader := &Loader{caches: map[string]LoaderCache{"default": cache}}

// REQUIRED: Disable singleFlight for unit tests
ctx := NewContext(context.Background())
ctx.ExecutionOptions.DisableSubgraphRequestDeduplication = true
ctx.ExecutionOptions.Caching = CachingOptions{EnableL1Cache: true, EnableL2Cache: true}

// REQUIRED: Always use arena
ar := arena.NewMonotonicArena(arena.WithMinBufferSize(1024))
resolvable := NewResolvable(ar, ResolvableOptions{})
resolvable.Init(ctx, nil, ast.OperationTypeQuery)

err := loader.LoadGraphQLResponseData(ctx, response, resolvable)
out := fastjsonext.PrintGraphQLResponse(resolvable.data, resolvable.errors)
```

### FakeLoaderCache
Test mock in `cache_load_test.go` with TTL support and operation logging.

### Assertions

**IMPORTANT**: Always use exact assertions in cache tests. Never use vague comparisons.

```go
// GOOD: Exact values - always preferred
assert.Equal(t, 3, hitCount, "should have exactly 3 L1 hits")
assert.Equal(t, int64(12), l1HitsInt, "should have exactly 12 L1 hits")
assert.Equal(t, 2, accountsCalls, "should call accounts subgraph exactly twice")

// BAD: Never use vague comparisons
assert.GreaterOrEqual(t, hitCount, 1) // DON'T DO THIS
assert.Greater(t, l1HitsInt, int64(0)) // DON'T DO THIS
assert.LessOrEqual(t, calls, 5) // DON'T DO THIS
```

Exact assertions catch regressions that vague assertions miss. If the expected value changes, update the test to reflect the new exact value.

### Snapshot Comments

**IMPORTANT**: Every event line in a `CacheAnalyticsSnapshot` assertion MUST have a brief comment explaining **why** that event occurred. Focus on causation, not field values.

```go
// GOOD: explains the "why"
L2Reads: []resolve.CacheKeyEvent{
{CacheKey: keyUser, Kind: resolve.CacheKeyMiss, ...}, // First request, L2 empty
{CacheKey: keyUser, Kind: resolve.CacheKeyHit, ...}, // Populated by Request 1
},

// BAD: restates the field value
{CacheKey: keyUser, Kind: resolve.CacheKeyMiss, ...}, // this is a miss
```

## Federation Test Setup

Test services: `accounts`, `products`, `reviews` in `execution/federationtesting/`

### Testing Entity Caching vs @provides
```graphql
type Review {
# @provides - gateway trusts subgraph, NO entity resolution
author: User! @provides(fields: "username")

# No @provides - gateway MUST resolve via _entities
# Use for testing L1/L2 caching
authorWithoutProvides: User!
}
```

### Run Tests
```bash
go test -run "TestL1Cache" ./v2/pkg/engine/resolve/... -v
go test -run "TestFederationCaching" ./execution/engine/... -v
go test -race ./execution/engine/... -v # Race detector
```

## astjson API Reference

```go
// Create values on arena
astjson.ObjectValue(arena)
astjson.ArrayValue(arena)
astjson.StringValue(arena, string)
astjson.StringValueBytes(arena, []byte)
astjson.NumberValue(arena, string)
astjson.TrueValue(arena)
astjson.FalseValue(arena)
astjson.NullValue // Global constant (not a function)

// Manipulate
value.Set(arena, key, val)
value.SetArrayItem(arena, idx, val)
value.Get(keys...)
value.GetArray()
value.GetStringBytes()
value.MarshalTo([]byte)
value.Type() // TypeNull, TypeTrue, TypeObject, etc.
```

## LoaderCache Interface

```go
type LoaderCache interface {
Get(ctx context.Context, keys []string) ([]*CacheEntry, error)
Set(ctx context.Context, entries []*CacheEntry, ttl time.Duration) error
Delete(ctx context.Context, keys []string) error
}

type CacheEntry struct {
Key string
Value []byte // JSON-encoded entity
}
```

## Always use exact assertions

Use `assert.Equal` with exact expected values. Never use `Contains`, `GreaterOrEqual`, `LessOrEqual`, or any vague comparison.
For objects or slices, always compare against a fully defined expected value, not just a subset.

```go
// CORRECT
assert.Equal(t, 3, len(log), "should have exactly 3 cache operations")
assert.Equal(t, 1, tracker.GetCount(host), "should call subgraph exactly once")
assert.Equal(t, int64(12), stats.L1Hits, "should have exactly 12 L1 hits")

// WRONG — hides regressions
assert.GreaterOrEqual(t, len(log), 1)
assert.Greater(t, stats.L1Hits, int64(0))
assert.Contains(t, log[0].Keys, expectedKey)
parse → normalize → validate → plan → resolve → response
```
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Specify a fence language for the data-flow block.

The fenced block should include a language tag (text) to pass MD040.

Suggested doc fix
-```
+```text
 parse → normalize → validate → plan → resolve → response
</details>

<details>
<summary>🧰 Tools</summary>

<details>
<summary>🪛 markdownlint-cli2 (0.21.0)</summary>

[warning] 9-9: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

</details>

</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @CLAUDE.md around lines 9 - 11, Add a language tag to the fenced data-flow
block so the code fence reads with a text specifier: replace the
triple-backtick fence around the line "parse → normalize → validate → plan →
resolve → response" with a text fenced block (i.e., open with text and
close with ```) to satisfy MD040; look for the fenced block containing the
data-flow string in CLAUDE.md.


</details>

<!-- fingerprinting:phantom:triton:hawk -->

<!-- This is an auto-generated comment by CodeRabbit -->

Comment thread ENTITY_CACHING_INTEGRATION.md Outdated
Comment on lines +296 to +298
```
{headerHash}:{"__typename":"User","key":{"id":"123"}}
```
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Label these fenced blocks with a language to clear MD040 warnings.

Use text for the key-format and flow snippets.

Suggested doc fix
-   ```
+   ```text
    {headerHash}:{"__typename":"User","key":{"id":"123"}}
    ```

-   ```
+   ```text
    tenant-X:{headerHash}:{"__typename":"User","key":{"id":"123"}}
    ```

-```
+```text
 L1 check (main thread, entity fetches only)
   ↓ miss
 ...
 Populate L1 + L2 (main thread for L1, goroutine for L2)

- +text
[optional-interceptor-prefix:][optional-header-hash:]{"__typename":"TypeName","key":{...}}

Also applies to: 301-303, 319-327, 386-388

🧰 Tools
🪛 markdownlint-cli2 (0.21.0)

[warning] 296-296: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ENTITY_CACHING_INTEGRATION.md` around lines 296 - 298, Add a language label
"text" to the fenced code blocks that show key-format and flow snippets (e.g.,
the blocks containing {headerHash}:{"__typename":"User","key":{"id":"123"}},
tenant-X:{headerHash}:{"__typename":"User","key":{"id":"123"}}, the L1/L2 flow
block, and the interceptor-prefix example) so they become ```text ... ```, and
update all similar unlabeled fences in the same sections noted (also at the
other ranges) to avoid MD040 warnings; locate these by searching for the literal
snippet lines and replace their surrounding triple-backtick fences with
triple-backtick + text.

Comment thread v2/pkg/engine/resolve/CLAUDE.md Outdated
| **Resolvable** | `resolvable.go` | Response data, two-pass rendering, error handling |

**End-to-end flow:**
```
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add language tags to fenced code blocks to satisfy markdownlint.

These fences are missing a language identifier (MD040), which will keep docs lint noisy.

Suggested doc fix
-```
+```text
 Resolver.ResolveGraphQLResponse(ctx, response, data, writer)
   1. Acquire concurrency semaphore
   ...
-```
+```

-```
+```text
 1. Navigate to object in JSON: value = parent.Get(obj.Path...)
 ...
-```
+```

-```
+```text
 prepareCacheKeys() → tryL1CacheLoad() → tryL2CacheLoad() → fetch → populateL1Cache() + updateL2Cache
-```
+```

-```
+```text
 Phase 1 (main): prepareCacheKeys + tryL1CacheLoad for all fetches
 ...
-```
+```

Also applies to: 209-209, 355-355, 360-360

🧰 Tools
🪛 markdownlint-cli2 (0.21.0)

[warning] 16-16: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@v2/pkg/engine/resolve/CLAUDE.md` at line 16, Several fenced code blocks in
CLAUDE.md are missing language identifiers (causing MD040); update each
triple-backtick fence around examples like the Resolver.ResolveGraphQLResponse
sequence, the JSON navigation snippet ("Navigate to object in JSON: value =
parent.Get..."), the cache flow lines ("prepareCacheKeys() → tryL1CacheLoad() →
tryL2CacheLoad() → fetch..."), and the "Phase 1 (main): prepareCacheKeys +
tryL1CacheLoad" section to include a language tag (e.g., ```text) on the opening
fence so markdownlint no longer flags them; ensure you apply the same fix at the
other occurrences noted in the comment (around lines with the same snippets).

…ertion

- Add `text` language tag to all unlabeled fenced code blocks (MD040)
- Fix panic-prone ctx.Value type assertion in interceptor example

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jensneuse jensneuse merged commit bdd700b into feat/add-caching-support Mar 6, 2026
7 checks passed
@jensneuse jensneuse deleted the jensneuse/caching-docs branch March 6, 2026 08:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant