**New file:** `docs/plans/2026-03-01-context-1m-auto-escalation-design.md` (98 additions)
# Design: Session-Aware Auto-Escalation for 1M Context

## Problem

The branch adds `context-1m-2025-08-07` to the Anthropic `anthropic-beta` header unconditionally (`provider.ts` line 126). This causes HTTP 400 errors for accounts below Tier 4 (`"The long context beta is not yet available for this subscription."`) and opts every session into 2× input / 1.5× output pricing once a request's input exceeds 200K tokens, even when the conversation never needed the extra context.

## Solution

Session-aware auto-escalation: only send the `context-1m-2025-08-07` beta header when the model supports 1M context AND the session actually needs it.

## Config

Add `context1m` to provider options:

```typescript
context1m: z.union([z.literal("auto"), z.boolean()]).optional()
```

```jsonc
// opencode.json
{ "provider": { "anthropic": { "options": { "context1m": "auto" } } } }
```

- `"auto"` (default): send the header only when the model supports 1M context AND the session's input tokens exceed 150K
- `true`: always send the header for models that support 1M context
- `false`: never send the header

## Decision Logic

Three conditions determine whether the header is sent (in `"auto"` mode, all must be true):

1. **Model supports it**: `model.limit.context > 200_000`
2. **Session needs it**: accumulated input tokens > 150K (75% of 200K threshold)
3. **Config allows it**: `context1m !== false`

For `true` mode: only condition 1 is checked.
For `false` mode: never send.
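The three modes can be sketched as a single predicate; the function and parameter names here are illustrative, not the actual implementation:

```typescript
// Sketch of the mode handling above. `context1m` comes from provider
// options; `contextLimit` and `inputTokens` stand in for the model's
// declared limit and the session's accumulated input tokens.
type Context1mMode = "auto" | boolean

function shouldSend1mHeader(context1m: Context1mMode, contextLimit: number, inputTokens: number): boolean {
  const supports1m = contextLimit > 200_000
  if (context1m === false) return false
  if (context1m === true) return supports1m
  return supports1m && inputTokens > 150_000
}
```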

The model's declared `limit.context` is the capability signal. Users who set `limit.context: 1000000` on a model in their config (e.g., `claude-opus-4-6`) are opting in to 1M support for that model. Models with 200K limits (Haiku, older models) never get the header.
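For example, a user config opting a specific model into 1M support might look like this (the exact nesting is an assumed shape of opencode's per-model config; the model id is illustrative):

```jsonc
// opencode.json — declaring a 1M context limit for one model (assumed shape)
{
  "provider": {
    "anthropic": {
      "models": {
        "claude-opus-4-6": { "limit": { "context": 1000000 } }
      }
    }
  }
}
```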

## Implementation

### Touch Points

1. **`provider.ts` — Anthropic loader** (CUSTOM_LOADERS, line 126): Remove `context-1m-2025-08-07` from the static beta header string. Keep `claude-code-20250219`, `interleaved-thinking-2025-05-14`, `fine-grained-tool-streaming-2025-05-14`, and `adaptive-thinking-2026-01-28`.

2. **`provider.ts` — Module-level state**: Add a boolean flag and setter for the session layer to communicate with the fetch wrapper.

```typescript
let _context1m = false
export function setContext1m(enabled: boolean) {
  _context1m = enabled
}
```

3. **`provider.ts` — Fetch wrapper** (in `getSDK()`, ~line 1073): For Anthropic requests (check `model.providerID === "anthropic"` or `model.api.npm === "@ai-sdk/anthropic"`), if `_context1m` is true, append `,context-1m-2025-08-07` to the `anthropic-beta` request header.

4. **`session/llm.ts`** — Before each LLM call: Read the provider config, check the model's context limit, check accumulated session tokens, and call `Provider.setContext1m()`.

```typescript
const config = provider.options?.context1m ?? "auto"
const supports1m = model.limit.context > 200_000
const needs1m = lastUsage.tokens.input > 150_000
const enabled = config === true ? supports1m : config === false ? false : supports1m && needs1m
Provider.setContext1m(enabled)
```

5. **`config.ts`** — Provider options schema: Add `context1m` to the options object with the union type.

### Console (`packages/console`)

The console's `anthropic.ts` already conditionally applies the header based on model name (`supports1m = reqModel.includes("sonnet") || reqModel.includes("opus-4-6")`). This is a separate package and can be updated independently to also respect a config option if desired.
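A minimal sketch of that check as quoted (the standalone helper name is hypothetical):

```typescript
// Mirrors the console's model-name predicate quoted above.
function supports1m(reqModel: string): boolean {
  return reqModel.includes("sonnet") || reqModel.includes("opus-4-6")
}
```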

## Edge Cases

| Scenario | Behavior |
| ---------------------------------- | ------------------------------------------------------------------------------------------------------- |
| New session, any model | No header — safe for all tiers |
| Opus 4.6 at 180K tokens, auto mode | Header enabled — can grow to 1M |
| Haiku at any token count | Never gets header (200K context limit) |
| Sub-Tier-4, small conversation | No header — works fine |
| Sub-Tier-4, Opus 4.6 at 180K | Header enabled, API returns Tier error. Separate fallback work (see below) handles graceful degradation |
| `context1m: false`, any model | Never sends header, hard 200K limit |
| `context1m: true`, Opus 4.6 at 10K | Header sent. No cost impact — premium pricing only triggers when total input >200K |
| `context1m: true`, Haiku | No header — model doesn't support 1M (context limit ≤200K) |

## Related Work

A separate agent is working on runtime fallback for auth/billing errors (`~/.agent-mail/long-context`). That work makes the error recoverable (fall back to another model). Our work prevents the error from occurring in the first place. Both are complementary.

## Pricing Reference

The `context-1m` header alone doesn't change pricing. Premium rates only apply when total input tokens (including cache) exceed 200K:

- Input: 2× standard rate
- Output: 1.5× standard rate
- Cache read/write: proportional increase

This is why auto-escalation saves money — the header is only present when you'd hit the premium tier anyway.
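As a sketch (multipliers taken from the list above; the threshold check and function name are illustrative):

```typescript
// Rate multipliers that apply to a request, given total input tokens
// including cache. Only crossing the 200K premium tier changes them.
function rateMultipliers(totalInputTokens: number): { input: number; output: number } {
  const premium = totalInputTokens > 200_000
  return { input: premium ? 2 : 1, output: premium ? 1.5 : 1 }
}
```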
**New file:** `docs/plans/2026-03-01-context-1m-auto-escalation.md` (225 additions)
# 1M Context Error-Retry Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** Gracefully handle Anthropic's "long context beta not available" error by retrying without the `context-1m` header, then remembering to skip it for the process lifetime. Zero config needed.

**Architecture:** Keep the `context-1m-2025-08-07` beta header in the static Anthropic loader (already present on this branch). In the fetch wrapper inside `getSDK()`, detect the specific Tier error from the response, retry the request without the header, and set a process-level flag to skip it on future requests.

**Tech Stack:** TypeScript, Vercel AI SDK

**Design doc:** `docs/plans/2026-03-01-context-1m-auto-escalation-design.md`

---

### Task 1: Add Error-Retry Logic to the Fetch Wrapper

**Files:**

- Modify: `packages/opencode/src/provider/provider.ts`

**Context:** The fetch wrapper is at line 1073 inside `getSDK()`. It's a closure that captures `model` from the outer scope. The `anthropic-beta` header including `context-1m-2025-08-07` is set statically in `CUSTOM_LOADERS["anthropic"]` at line 126.

**Step 1: Add process-level disabled flag**

At the top of the `Provider` namespace (after the `log` declaration, around line 49), add:

```typescript
let _context1mDisabled = false
```

**Step 2: Add retry logic in the fetch wrapper**

In the fetch wrapper (`options["fetch"] = async (input, init) => {`, line 1073), replace the final return statement. Currently (line 1106-1110):

```typescript
return fetchFn(input, {
  ...opts,
  // @ts-ignore see here: https://github.com/oven-sh/bun/issues/16682
  timeout: false,
})
```

Replace with:

```typescript
const response = await fetchFn(input, {
  ...opts,
  // @ts-ignore see here: https://github.com/oven-sh/bun/issues/16682
  timeout: false,
})

// Detect Anthropic "long context beta not available" error and retry without the header
if (!_context1mDisabled && model.api.npm === "@ai-sdk/anthropic" && response.status === 400) {
  const cloned = response.clone()
  const body = await cloned.json().catch(() => null)
  if (
    body?.error?.type === "invalid_request_error" &&
    typeof body?.error?.message === "string" &&
    body.error.message.toLowerCase().includes("long context")
  ) {
    log.info("context-1m beta not available, retrying without it")
    _context1mDisabled = true
    const headers = new Headers(opts.headers as HeadersInit)
    const beta = headers.get("anthropic-beta") ?? ""
    headers.set(
      "anthropic-beta",
      beta
        .split(",")
        .filter((h) => !h.includes("context-1m"))
        .join(","),
    )
    return fetchFn(input, {
      ...opts,
      headers,
      // @ts-ignore
      timeout: false,
    })
  }
}

return response
```

**Step 3: Strip `context-1m` from future requests when disabled**

At the top of the fetch wrapper (after `const opts = init ?? {}`, line 1076), add:

```typescript
// Skip context-1m header if previously detected as unavailable
if (_context1mDisabled && model.api.npm === "@ai-sdk/anthropic") {
  const headers = new Headers(opts.headers as HeadersInit)
  const beta = headers.get("anthropic-beta") ?? ""
  if (beta.includes("context-1m")) {
    headers.set(
      "anthropic-beta",
      beta
        .split(",")
        .filter((h) => !h.includes("context-1m"))
        .join(","),
    )
    opts.headers = headers
  }
}
```
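Since Steps 2 and 3 filter the header identically, the stripping could optionally be hoisted into a small module-level helper (name hypothetical; the tasks do not require this refactor):

```typescript
// Removes any context-1m-* entry from a comma-separated anthropic-beta value.
function stripContext1m(beta: string): string {
  return beta
    .split(",")
    .filter((h) => !h.includes("context-1m"))
    .join(",")
}
```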

**Step 4: Verify no type errors**

Run: `cd packages/opencode && npx tsc --noEmit`
Expected: No new errors

**Step 5: Describe and advance**

```bash
jj describe -m "feat(provider): auto-retry without context-1m header when account lacks access"
jj new
```

---

### Task 2: Tests

**Files:**

- Create: `packages/opencode/test/provider/context1m.test.ts`

**Step 1: Write tests for the retry behavior**

The retry logic is embedded in the fetch wrapper, which is hard to unit test in isolation. Instead, test the header-stripping logic and the flag behavior:

```typescript
import { describe, test, expect } from "bun:test"

describe("context-1m header stripping", () => {
  function strip(beta: string) {
    return beta
      .split(",")
      .filter((h) => !h.includes("context-1m"))
      .join(",")
  }

  test("strips context-1m from beta header", () => {
    const header =
      "claude-code-20250219,interleaved-thinking-2025-05-14,fine-grained-tool-streaming-2025-05-14,adaptive-thinking-2026-01-28,context-1m-2025-08-07"
    expect(strip(header)).toBe(
      "claude-code-20250219,interleaved-thinking-2025-05-14,fine-grained-tool-streaming-2025-05-14,adaptive-thinking-2026-01-28",
    )
  })

  test("preserves other headers when context-1m is not present", () => {
    const header = "claude-code-20250219,interleaved-thinking-2025-05-14"
    expect(strip(header)).toBe("claude-code-20250219,interleaved-thinking-2025-05-14")
  })

  test("handles context-1m as only header", () => {
    expect(strip("context-1m-2025-08-07")).toBe("")
  })
})

describe("error detection", () => {
  test("matches the known Anthropic tier error", () => {
    const body = {
      error: {
        type: "invalid_request_error",
        message: "The long context beta is not yet available for this subscription.",
      },
    }
    const matches =
      body.error.type === "invalid_request_error" &&
      typeof body.error.message === "string" &&
      body.error.message.toLowerCase().includes("long context")
    expect(matches).toBe(true)
  })

  test("does not match unrelated errors", () => {
    const body = {
      error: {
        type: "invalid_request_error",
        message: "max_tokens must be less than 8192",
      },
    }
    const matches =
      body.error.type === "invalid_request_error" &&
      typeof body.error.message === "string" &&
      body.error.message.toLowerCase().includes("long context")
    expect(matches).toBe(false)
  })
})
```

**Step 2: Run the tests**

Run: `cd packages/opencode && bun test test/provider/context1m.test.ts`
Expected: All tests pass

**Step 3: Run existing tests for regressions**

Run: `cd packages/opencode && bun test test/session/compaction.test.ts`
Expected: All tests pass

**Step 4: Describe and advance**

```bash
jj describe -m "test(provider): add context-1m retry logic tests"
jj new
```

---

### Task 3: Verify End-to-End

**Step 1: Type check the full package**

Run: `cd packages/opencode && npx tsc --noEmit`
Expected: No errors

**Step 2: Run the full test suite**

Run: `cd packages/opencode && bun test`
Expected: All tests pass

**Step 3: Final describe**

```bash
jj describe -m "feat(provider): graceful context-1m fallback for sub-Tier-4 accounts"
```
**Modified:** `packages/opencode/src/bun/index.ts` (29 additions, 4 deletions)

```diff
@@ -54,11 +54,33 @@ export namespace BunProc {
     }),
   )

+  // For github: dependencies, bun installs under the package's actual name
+  // (from its package.json "name" field), not under the github: specifier.
+  // Resolve the real module path by reading the installed package name from
+  // the cache lockfile.
+  async function resolveModulePath(pkg: string): Promise<string> {
+    const nodeModules = path.join(Global.Path.cache, "node_modules")
+    if (!pkg.startsWith("github:")) return path.join(nodeModules, pkg)
+    const lockPath = path.join(Global.Path.cache, "bun.lock")
+    const lock = await Filesystem.readText(lockPath).catch(() => "")
+    // lockfile maps "actual-name": "github:owner/repo#ref"
+    for (const line of lock.split("\n")) {
+      if (line.includes(pkg)) {
+        const match = line.match(/^\s*"([^"]+)":\s*"/)
+        if (match && match[1] !== pkg) return path.join(nodeModules, match[1])
+      }
+    }
+    // Fallback: strip github: prefix and use repo name
+    const repoName = pkg.replace(/^github:/, "").split("#")[0].split("/").pop()
+    if (repoName) return path.join(nodeModules, repoName)
+    return path.join(nodeModules, pkg)
+  }
+
   export async function install(pkg: string, version = "latest") {
     // Use lock to ensure only one install at a time
     using _ = await Lock.write("bun-install")

-    const mod = path.join(Global.Path.cache, "node_modules", pkg)
+    const mod = await resolveModulePath(pkg)
     const pkgjsonPath = path.join(Global.Path.cache, "package.json")
     const parsed = await Filesystem.readJson<{ dependencies: Record<string, string> }>(pkgjsonPath).catch(async () => {
       const result = { dependencies: {} as Record<string, string> }
@@ -89,7 +111,7 @@ export namespace BunProc {
       ...(proxied() || process.env.CI ? ["--no-cache"] : []),
       "--cwd",
       Global.Path.cache,
-      pkg + "@" + version,
+      pkg.includes("#") ? pkg : pkg + "@" + version,
     ]

     // Let Bun handle registry resolution:
@@ -112,11 +134,14 @@ export namespace BunProc {
       )
     })

+    // Re-resolve after install in case lockfile changed
+    const installedMod = await resolveModulePath(pkg)
+
     // Resolve actual version from installed package when using "latest"
     // This ensures subsequent starts use the cached version until explicitly updated
     let resolvedVersion = version
     if (version === "latest") {
-      const installedPkg = await Filesystem.readJson<{ version?: string }>(path.join(mod, "package.json")).catch(
+      const installedPkg = await Filesystem.readJson<{ version?: string }>(path.join(installedMod, "package.json")).catch(
        () => null,
      )
      if (installedPkg?.version) {
@@ -126,6 +151,6 @@ export namespace BunProc {

     parsed.dependencies[pkg] = resolvedVersion
     await Filesystem.writeJson(pkgjsonPath, parsed)
-    return mod
+    return installedMod
   }
 }
```