Merged
32 commits
eea2a1f
feat: implement sandcastle refinement loop with critic-based convergence
jerome-benoit May 4, 2026
6a5e967
fix: address review findings (shell injection, false convergence, nes…
jerome-benoit May 4, 2026
d7cd5e2
fix: guard retry calls, split rebase logic, remove dead critic retry
jerome-benoit May 4, 2026
bfad1d9
fix: handle nullable issue body and guard JSON parse
jerome-benoit May 4, 2026
e6b613a
fix: distinguish stalled from converged (re-reported findings → draft…
jerome-benoit May 5, 2026
5901fbf
fix: LOW-only findings should not prevent convergence
jerome-benoit May 5, 2026
ee43546
fix: log validation errors, conditional checklist, derive PR title fr…
jerome-benoit May 5, 2026
d70bc29
refactor: extract sandcastle into modular architecture
jerome-benoit May 5, 2026
3215748
fix: centralize constants, fix PR type-of-change, sanitize titles
jerome-benoit May 5, 2026
25e85de
fix: full validation post-rebase, execFileSync for gh issue list
jerome-benoit May 5, 2026
4f83a6c
perf: skip critic when implementer produces 0 commits on round 2+
jerome-benoit May 5, 2026
301c391
fix: remove unsound type guard, filter unknown plan IDs, guard Concur…
jerome-benoit May 5, 2026
8e05a15
feat: state-of-the-art algorithmic improvements
jerome-benoit May 5, 2026
100838c
fix(sandcastle): resolve all algorithmic audit findings
jerome-benoit May 5, 2026
badfc92
fix(sandcastle): harden subprocess calls and reduce cyclomatic comple…
jerome-benoit May 5, 2026
3d2a46f
fix(sandcastle): address review findings — convergence, planner failu…
jerome-benoit May 5, 2026
bf8ca04
fix(.sandcastle): address all algorithmic audit findings
jerome-benoit May 5, 2026
d1db4e4
fix: unref timeout timer to prevent process hang, catch critic throws
jerome-benoit May 5, 2026
7f0a41a
fix: count commits before break on critic failure (prevents work loss)
jerome-benoit May 5, 2026
5872f70
fix: suppress unhandled rejection from timeout promise on task success
jerome-benoit May 5, 2026
5881b92
fix: force process exit after completion (prevents hang from timed-ou…
jerome-benoit May 5, 2026
29c2ce6
fix: check findings null before commits zero (correct status on imple…
jerome-benoit May 5, 2026
a4f323d
fix: report known findings in PR body even when converged (prevents s…
jerome-benoit May 5, 2026
41d09b4
feat: state-of-the-art convergence improvements (ARCS, SWE-Agent, Ope…
jerome-benoit May 5, 2026
2f8cbf7
fix: move bestSha after ratchet, add validation timeout, fix severity…
jerome-benoit May 5, 2026
5ef3725
fix: recount totalCommits from git after best-state reset (semantic c…
jerome-benoit May 5, 2026
177ecbd
refactor(.sandcastle): address all 45 quality audit findings
jerome-benoit May 5, 2026
69ca4fa
fix: use constants from constants.ts + add planner timeout (multi-age…
jerome-benoit May 5, 2026
d8b3454
refactor(.sandcastle): harden prompts — cap findings, add known decis…
jerome-benoit May 5, 2026
189f655
perf(.sandcastle): convert execFileSync to async execFileAsync (unblo…
jerome-benoit May 5, 2026
96bbd80
fix: catch planner timeout rejection (retry instead of crash) + add c…
jerome-benoit May 5, 2026
7954fec
Merge branch 'main' into feat/sandcastle-refinement-loop
jerome-benoit May 5, 2026
69 changes: 69 additions & 0 deletions .sandcastle/concurrency-pool.ts
@@ -0,0 +1,69 @@
/** Internal node for the O(1) FIFO waiting queue. Not exported. */
interface QueueNode {
  next: null | QueueNode;
  resolve: () => void;
}

/**
 * A concurrency limiter that restricts parallel execution to a maximum number of tasks.
 * Queue operations are O(1) amortized (singly-linked list).
 */
export class ConcurrencyPool {
  private head: null | QueueNode = null;
  private running = 0;
  private tail: null | QueueNode = null;

  /**
   * @param max - Maximum number of concurrent tasks. Must be a positive integer >= 1.
   */
  constructor(private readonly max: number) {
    if (!Number.isInteger(max) || max < 1) {
      throw new RangeError("ConcurrencyPool max must be a positive integer >= 1");
    }
  }

  /**
   * Executes the given async function, waiting if the pool is at capacity.
   * @param fn - Async function to execute within the pool.
   * @returns The result of the function.
   * @remarks Re-entrant calls using the same pool instance may deadlock when all slots are occupied.
   */
  async run<T>(fn: () => Promise<T>): Promise<T> {
    await this.acquire();
    try {
      return await fn();
    } finally {
      this.release();
    }
  }

  private acquire(): Promise<void> {
    if (this.running < this.max) {
      this.running++;
      return Promise.resolve();
    }
    return new Promise<void>((resolve) => {
      const node: QueueNode = { next: null, resolve };
      if (this.tail === null) {
        this.head = node;
        this.tail = node;
      } else {
        this.tail.next = node;
        this.tail = node;
      }
    });
  }

  private release(): void {
    this.running--;
    const next = this.head;
    if (next !== null) {
      this.head = next.next;
      if (this.head === null) {
        this.tail = null;
      }
      this.running++;
      next.resolve();
    }
  }
}
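
The `run()` contract above (take a slot, run the task, hand the slot to the next waiter) can be exercised with a small experiment. The sketch below is only an illustration: `TinyPool`, `sleep`, and `measurePeak` are hypothetical names, and `TinyPool` is a shortened stand-in that uses an array queue rather than the linked list in the diff. It records the peak number of tasks in flight, which should never exceed the configured maximum.

```typescript
// Hypothetical stand-in with the same run() contract as ConcurrencyPool;
// it uses a plain array queue to keep the sketch short.
class TinyPool {
  private running = 0;
  private readonly waiting: Array<() => void> = [];
  private readonly max: number;

  constructor(max: number) {
    this.max = max;
  }

  async run<T>(fn: () => Promise<T>): Promise<T> {
    if (this.running < this.max) {
      this.running++;
    } else {
      // Park until a finishing task hands its slot over.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    }
    try {
      return await fn();
    } finally {
      const next = this.waiting.shift();
      if (next) {
        next(); // the slot is handed over, so `running` stays unchanged
      } else {
        this.running--;
      }
    }
  }
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `count` short tasks through a pool of size `max` and returns the peak
// number of tasks that were in flight at the same time.
async function measurePeak(max: number, count: number): Promise<number> {
  const pool = new TinyPool(max);
  let active = 0;
  let peak = 0;
  const tasks = Array.from({ length: count }, () =>
    pool.run(async () => {
      active++;
      peak = Math.max(peak, active);
      await sleep(5);
      active--;
    }),
  );
  await Promise.all(tasks);
  return peak;
}

measurePeak(2, 6).then((peak) => console.log(`peak concurrency: ${peak}`));
```

Handing the slot directly to the next waiter, instead of decrementing and letting the waiter re-increment, mirrors what `release()` does in the diff and avoids a window in which a newly arriving caller could overtake the queue.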
64 changes: 64 additions & 0 deletions .sandcastle/constants.ts
@@ -0,0 +1,64 @@
import { execFile } from "node:child_process";
import util from "node:util";

/** Model identifier used for implementation and critic agents. */
export const AGENT_MODEL = "github-copilot/claude-sonnet-4.6";

/** Number of context lines around a diff hunk used for hash computation. */
export const CONTEXT_HASH_RADIUS = 3;

/** Async execFile — does not block the event loop. Same error shape as execFileSync. */
export const execFileAsync = util.promisify(execFile);

/** Timeout in milliseconds for git operations. */
export const GIT_TIMEOUT_MS = 30_000;

/** Number of characters to retain from a SHA for display purposes. */
export const HASH_PREFIX_LENGTH = 16;

/** Maximum number of characters captured from stderr before truncation. */
export const MAX_STDERR_CHARS = 500;

/** Maximum number of characters allowed in a PR or commit title. */
export const MAX_TITLE_LENGTH = 200;

/** Model identifier used for planning and orchestration agents. */
export const PLANNER_MODEL = "github-copilot/claude-opus-4.6";

/** Timeout in milliseconds for git push operations. */
export const PUSH_TIMEOUT_MS = 60_000;

/** Timeout in milliseconds for a single sandcastle task execution. */
export const TASK_TIMEOUT_MS = 15 * 60 * 1000;

/** Full validation command run after each implementation round. */
export const VALIDATION_COMMAND =
  "npm run type-check && npm run test && npm run test:node && npm run test:edge && npm run prettier-check && npm run lint && npm run build && npm run check-build && npm run build:v2 && npm run check-build:v2";

/** Timeout in milliseconds for the validation command. */
export const VALIDATION_TIMEOUT_MS = 120_000;

/**
 * Returns the current HEAD commit SHA for the given working directory.
 * @param cwd - Absolute path to the git repository root.
 * @returns The full SHA string, or `null` if the command fails.
 */
export async function getHeadSha(cwd: string): Promise<null | string> {
  try {
    const { stdout } = await execFileAsync("git", ["rev-parse", "HEAD"], {
      cwd,
    });
    return stdout.trim();
  } catch {
    return null;
  }
}

/**
 * Converts an unknown thrown value to a human-readable error message.
 * @param err - The caught value (may be an `Error` or any other type).
 * @returns The `message` property if `err` is an `Error`, otherwise `String(err)`.
 */
export function toErrorMessage(err: unknown): string {
  return err instanceof Error ? err.message : String(err);
}
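
The timeout constants in this file only declare durations; how they are enforced lives elsewhere in the PR and is not shown here. As a rough illustration of one common way such a limit can be applied to a promise, the sketch below races a task against a timer. `withTimeout` is a hypothetical helper, not a name from the PR. The commit list mentions unref'ing the timer; this sketch clears it instead, which is an assumption made to keep the example short.

```typescript
// Hypothetical helper; `withTimeout` is not a name taken from the PR.
function withTimeout<T>(task: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; the timer is cleared either way, so a
  // finished task leaves no pending timeout (and no late rejection) behind.
  return Promise.race([task, timeout]).finally(() => clearTimeout(timer));
}

// Same value as TASK_TIMEOUT_MS in constants.ts.
const TASK_TIMEOUT_MS = 15 * 60 * 1000;

withTimeout(Promise.resolve("done"), TASK_TIMEOUT_MS).then((value) =>
  console.log(value),
);
```

Note that racing against a timer only stops waiting for the task; it does not cancel the underlying work.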
64 changes: 64 additions & 0 deletions .sandcastle/critic-prompt.md
@@ -0,0 +1,64 @@
# Critic Agent

Analyze the implementation on branch `{{BRANCH}}` and produce structured findings.

## Task

Run `git diff main...{{BRANCH}}` to see all changes. Examine the diff carefully. For each issue found, produce a structured finding.

Read `AGENTS.md` and `CONTRIBUTING.md` for the project's coding standards.

## Output Format

Output your findings as JSON wrapped in nonce-tagged delimiters. Use EXACTLY this tag format:

```text
<findings-{{NONCE}}>[...]</findings-{{NONCE}}>
```

Each finding must have this structure:

```json
{
  "file": "path/to/file.ts",
  "line": 42,
  "title": "short description of the issue",
  "severity": "CRITICAL|HIGH|MEDIUM|LOW",
  "category": "security|logic|performance|architecture|style",
  "confidence": "HIGH|MEDIUM|LOW",
  "description": "detailed explanation of why this is a problem",
  "suggestion": "how to fix it"
}
```
```

If no issues are found, output:

```text
<findings-{{NONCE}}>[]</findings-{{NONCE}}>
```
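
On the consuming side, the nonce-tagged block could be extracted roughly as follows. This is a minimal sketch under stated assumptions: `extractFindings` is a hypothetical name, the parser in the PR is not shown in this diff, and the sketch only checks that the payload is a JSON array without validating each finding's fields.

```typescript
// Hypothetical parser for the nonce-tagged output format described above.
// Returns the parsed array, or null when the block is missing or malformed.
function extractFindings(output: string, nonce: string): unknown[] | null {
  const open = `<findings-${nonce}>`;
  const close = `</findings-${nonce}>`;

  const start = output.indexOf(open);
  if (start === -1) return null;

  const bodyStart = start + open.length;
  const end = output.indexOf(close, bodyStart);
  if (end === -1) return null;

  try {
    // Guard the parse: agent output is untrusted text.
    const parsed: unknown = JSON.parse(output.slice(bodyStart, end));
    return Array.isArray(parsed) ? parsed : null;
  } catch {
    return null;
  }
}

const sample = 'notes <findings-abc123>[{"file":"a.ts","line":1}]</findings-abc123>';
console.log(extractFindings(sample, "abc123"));
```

Matching on the exact nonce means that tags quoted from the diff under review, or tags carrying a different nonce, are ignored rather than mistaken for the critic's answer.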

## Rules

- Report at most 5 findings. Report HIGH and CRITICAL findings only; include LOW/MEDIUM findings only when no higher-severity issues exist.
- If more than 5 HIGH/CRITICAL issues exist, report the top 5 and add a summary note in the last finding's description.
- Do NOT modify any files. Do NOT commit. Do NOT push.
- Only report issues in the CHANGED code (not pre-existing issues).
- Use HIGH confidence only when you've verified the issue by reading the relevant code.
- Use MEDIUM confidence for pattern-based detection.
- Use LOW confidence for style preferences or uncertain issues.
- Focus on: logic errors, missing edge cases, security issues, type safety violations, test gaps.
- Do NOT report formatting issues (prettier handles those).
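
The first two rules above can be restated as a small selection function. This is a sketch only, and it assumes the orchestrator might re-apply the cap after parsing rather than trusting the agent to follow it; `selectReportable` and `MAX_FINDINGS` are hypothetical names that do not appear in the PR.

```typescript
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";

interface Finding {
  title: string;
  severity: Severity;
}

// Hypothetical constant mirroring the five-finding cap in the rules.
const MAX_FINDINGS = 5;

const SEVERITY_RANK: Record<Severity, number> = {
  CRITICAL: 0,
  HIGH: 1,
  MEDIUM: 2,
  LOW: 3,
};

// Keeps HIGH/CRITICAL findings when any exist, otherwise falls back to the
// lower severities, and caps the result at MAX_FINDINGS, most severe first.
function selectReportable(findings: Finding[]): Finding[] {
  const sorted = [...findings].sort(
    (a, b) => SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity],
  );
  const severe = sorted.filter(
    (f) => SEVERITY_RANK[f.severity] <= SEVERITY_RANK.HIGH,
  );
  return (severe.length > 0 ? severe : sorted).slice(0, MAX_FINDINGS);
}
```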

## Known Design Decisions (do not flag)

- Mid-loop validation convergence bypasses critic (ARCS pattern — deterministic tests > subjective review).
- `process.exit()` at script end kills timed-out sandboxes (no cooperative cancellation available in sandcastle).
- Content-addressed dedup hash includes line number (collision reduction tradeoff, bounded by hard cap).
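
To make the last design decision concrete, the sketch below shows what a content-addressed key that includes the line number might look like. It is an assumption-heavy simplification: `findingKey` and `KEY_LENGTH` are hypothetical names, and the choice of SHA-256 and of the hashed fields is an assumption (the PR's actual hash also appears to involve diff context, per `CONTEXT_HASH_RADIUS` in `constants.ts`).

```typescript
import { createHash } from "node:crypto";

// Hypothetical prefix length for the dedup key.
const KEY_LENGTH = 16;

interface FindingLocation {
  file: string;
  line: number;
  title: string;
}

// Derives a stable key from the finding's location and title. Including
// `line` keeps distinct findings in the same file apart, at the cost of
// treating a finding as new when surrounding edits shift its line number.
function findingKey(finding: FindingLocation): string {
  const payload = JSON.stringify([finding.file, finding.line, finding.title]);
  return createHash("sha256").update(payload).digest("hex").slice(0, KEY_LENGTH);
}

const seen = new Set<string>();
const finding = { file: "src/a.ts", line: 42, title: "unchecked null" };
seen.add(findingKey(finding));
console.log(seen.has(findingKey({ ...finding })));
```

The tradeoff named in the bullet shows up directly here: two reports of the same issue at lines 42 and 43 produce different keys, so a hard cap on rounds or findings is still needed to bound re-reporting.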

## Completion

After outputting the findings, output:

```text
<promise>COMPLETE</promise>
```