Closed
8 changes: 8 additions & 0 deletions .beads/issues.jsonl
@@ -0,0 +1,8 @@
{"id":"kilo-3uu","title":"Audit all 19 completed reviews for hallucinations and errors","status":"closed","priority":2,"issue_type":"task","owner":"jeremylongshore@users.noreply.github.com","created_at":"2026-02-14T14:52:39.7712831-06:00","created_by":"jeremylongshore","updated_at":"2026-02-14T14:58:12.368059079-06:00","closed_at":"2026-02-14T14:58:12.368059079-06:00","close_reason":"Two audit agents completed: code-reviewer found 62 wrong methodology links (all fixed via kilo-kxj), slop-detector found no hallucinations/slop. PR-5817 confidence flagged as warning."}
{"id":"kilo-5ki","title":"Sync fork main with upstream main (diverged)","status":"closed","priority":2,"issue_type":"task","owner":"jeremylongshore@users.noreply.github.com","created_at":"2026-02-14T14:52:40.68462421-06:00","created_by":"jeremylongshore","updated_at":"2026-02-14T15:01:00.984967751-06:00","closed_at":"2026-02-14T15:01:00.984967751-06:00","close_reason":"Fork main synced to upstream fa13626. Cherry-picked 5 CI/infra commits (Qodo, CodeQL, Dependabot, Greptile, SSHD). Excluded .reviews/ docs that caused divergence. Backup on fork-infra-backup branch."}
{"id":"kilo-5nr","title":"Compose email to Emilie explaining the review system","status":"open","priority":2,"issue_type":"task","owner":"jeremylongshore@users.noreply.github.com","created_at":"2026-02-14T14:52:40.537816451-06:00","created_by":"jeremylongshore","updated_at":"2026-02-14T14:52:40.537816451-06:00"}
{"id":"kilo-by8","title":"Review PR #5867 - Add banner and pre-release extension info","status":"closed","priority":2,"issue_type":"task","owner":"jeremylongshore@users.noreply.github.com","created_at":"2026-02-14T14:52:40.782762644-06:00","created_by":"jeremylongshore","updated_at":"2026-02-14T15:05:11.917540058-06:00","closed_at":"2026-02-14T15:05:11.917540058-06:00","close_reason":"PR #5867 was merged upstream on Feb 14. No review needed — already in main."}
{"id":"kilo-gpe","title":"Review PR #5818 - docs autocomplete transplant (3389 lines)","status":"open","priority":2,"issue_type":"task","owner":"jeremylongshore@users.noreply.github.com","created_at":"2026-02-14T14:52:40.884803024-06:00","created_by":"jeremylongshore","updated_at":"2026-02-14T14:52:40.884803024-06:00"}
{"id":"kilo-jqt","title":"Set up second subagent QA gate for all future reviews","status":"open","priority":2,"issue_type":"task","owner":"jeremylongshore@users.noreply.github.com","created_at":"2026-02-14T14:52:41.033666661-06:00","created_by":"jeremylongshore","updated_at":"2026-02-14T14:52:41.033666661-06:00"}
{"id":"kilo-kxj","title":"Fix any issues found by validation audit agents","status":"closed","priority":2,"issue_type":"task","owner":"jeremylongshore@users.noreply.github.com","created_at":"2026-02-14T14:52:40.225972131-06:00","created_by":"jeremylongshore","updated_at":"2026-02-14T14:58:07.306236193-06:00","closed_at":"2026-02-14T14:58:07.306236193-06:00","close_reason":"Fixed 62 wrong methodology links across all journal files. Changed Kilo-Org/kilocode/tree/main/.reviews to jeremylongshore/kilocode/tree/main/.reviews. Zero remaining instances."}
{"id":"kilo-xkw","title":"Run GWI AI slop detection agent on all review artifacts","status":"open","priority":2,"issue_type":"task","owner":"jeremylongshore@users.noreply.github.com","created_at":"2026-02-14T14:52:40.37489834-06:00","created_by":"jeremylongshore","updated_at":"2026-02-14T14:52:40.37489834-06:00"}
88 changes: 88 additions & 0 deletions .reviews/DASHBOARD.md
@@ -0,0 +1,88 @@
# AI PR Review Dashboard

> **Reviewer**: [@jeremylongshore](https://github.com/jeremylongshore) | **Repo**: [Kilo-Org/kilocode](https://github.com/Kilo-Org/kilocode) | **Method**: [AI PR Review Methodology](https://github.com/jeremylongshore/kilocode/blob/main/.reviews/METHODOLOGY.md)

## Summary

| Metric | Value |
|--------|-------|
| Total PRs Reviewed | 17 / 75 |
| Approved | 8 |
| Comments | 7 |
| Changes Requested | 2 |
| Avg Confidence | 4.5 / 5 |
| Lines Analyzed | 837 |
| Files Touched | 39 |

## Verdicts

```
APPROVE ████████░░░░░░░░░ 8 (47%)
COMMENT ███████░░░░░░░░░░ 7 (41%)
REQUEST_CHANGES ██░░░░░░░░░░░░░░░ 2 (12%)
```
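
As a rough illustration of how the bars and percentages above could be derived from the verdict column, here is a minimal sketch. It assumes the bar width equals the number of reviews (17) and that percentages are rounded to whole numbers; the function name `render_verdict_bars` is hypothetical and not part of the review tooling.

```python
from collections import Counter


def render_verdict_bars(verdicts: list[str]) -> list[str]:
    """Render one text bar per verdict; bar width equals the number of reviews."""
    total = len(verdicts)
    counts = Counter(verdicts)
    lines = []
    # most_common() orders verdicts from most to least frequent.
    for verdict, count in counts.most_common():
        bar = "█" * count + "░" * (total - count)
        percent = round(100 * count / total)
        lines.append(f"{verdict:<16} {bar} {count} ({percent}%)")
    return lines


# Verdict counts taken from the Summary table above.
verdicts = ["APPROVE"] * 8 + ["COMMENT"] * 7 + ["REQUEST_CHANGES"] * 2
for line in render_verdict_bars(verdicts):
    print(line)
```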

## All Reviews

| # | PR | Title | Category | Lines | Verdict | Confidence | Links |
|---|-----|-------|----------|-------|---------|------------|-------|
| 1 | [#5667](https://github.com/Kilo-Org/kilocode/pull/5667) | docs: clarify memory bank status indicators | docs | 2 | APPROVE | 5/5 | [Review](https://github.com/Kilo-Org/kilocode/pull/5667#pullrequestreview-3902290385) [Journal](https://github.com/Kilo-Org/kilocode/pull/5667#pullrequestreview-3902290683) [Bots](https://github.com/jeremylongshore/kilocode/pull/3) |
| 2 | [#5869](https://github.com/Kilo-Org/kilocode/pull/5869) | docs: clarify slash commands (/newtask vs /smol) | docs | 20 | COMMENT | 4/5 | [Review](https://github.com/Kilo-Org/kilocode/pull/5869#pullrequestreview-3902313405) [Journal](https://github.com/Kilo-Org/kilocode/pull/5869#pullrequestreview-3902313534) [Bots](https://github.com/jeremylongshore/kilocode/pull/5) |
| 3 | [#5807](https://github.com/Kilo-Org/kilocode/pull/5807) | docs: remove Enterprise pricing | docs | 71 | COMMENT | 5/5 | [Review](https://github.com/Kilo-Org/kilocode/pull/5807#pullrequestreview-3902322435) [Journal](https://github.com/Kilo-Org/kilocode/pull/5807#pullrequestreview-3902322473) [Bots](https://github.com/jeremylongshore/kilocode/pull/6) |
| 4 | [#5865](https://github.com/Kilo-Org/kilocode/pull/5865) | Add troubleshooting with console capture | docs | 58 | COMMENT | 4/5 | [Review](https://github.com/Kilo-Org/kilocode/pull/5865#pullrequestreview-3902330555) [Journal](https://github.com/Kilo-Org/kilocode/pull/5865#pullrequestreview-3902330605) [Bots](https://github.com/jeremylongshore/kilocode/pull/7) |
| 5 | [#5728](https://github.com/Kilo-Org/kilocode/pull/5728) | feat(docs): add dynamic sitemap.xml generation | docs | 279 | COMMENT | 4/5 | [Review](https://github.com/Kilo-Org/kilocode/pull/5728#pullrequestreview-3902353669) [Journal](https://github.com/Kilo-Org/kilocode/pull/5728#pullrequestreview-3902353728) [Bots](https://github.com/jeremylongshore/kilocode/pull/8) |
| 6 | [#5568](https://github.com/Kilo-Org/kilocode/pull/5568) | fix: override context window for MiniMax/Kimi free models | fix | 6 | COMMENT | 4/5 | [Bots](https://github.com/jeremylongshore/kilocode/pull/9) |
| 7 | [#5331](https://github.com/Kilo-Org/kilocode/pull/5331) | feat(mcp): re-enable oauth resource parameter | feature | 4 | APPROVE | 5/5 | [Bots](https://github.com/jeremylongshore/kilocode/pull/10) |
| 8 | [#5817](https://github.com/Kilo-Org/kilocode/pull/5817) | fix: prevent MCP servers from restarting repeatedly | fix | 88 | APPROVE | 5/5 | |
| 9 | [#5760](https://github.com/Kilo-Org/kilocode/pull/5760) | fix: improve user message visibility | fix | 8 | REQUEST_CHANGES | 5/5 | |
| 10 | [#5575](https://github.com/Kilo-Org/kilocode/pull/5575) | fix: treat maxReadFileLine=0 as unlimited | fix | 22 | COMMENT | 4/5 | |
| 11 | [#5569](https://github.com/Kilo-Org/kilocode/pull/5569) | fix: retry Amazon Bedrock network connection lost errors | fix | 22 | REQUEST_CHANGES | 4/5 | |
| 12 | [#5701](https://github.com/Kilo-Org/kilocode/pull/5701) | fix(api): add type field to messages in Responses API | fix | 26 | APPROVE | 5/5 | |
| 13 | [#5634](https://github.com/Kilo-Org/kilocode/pull/5634) | fix: context condensing prompt not saving properly | fix | 33 | APPROVE | 4/5 | |
| 14 | [#5864](https://github.com/Kilo-Org/kilocode/pull/5864) | fix: organization selector overlapping | fix | 35 | APPROVE | 4/5 | |
| 15 | [#5826](https://github.com/Kilo-Org/kilocode/pull/5826) | fix: prevent Create New Mode form fields from resetting | fix | 39 | APPROVE | 5/5 | |
| 16 | [#5838](https://github.com/Kilo-Org/kilocode/pull/5838) | fix: prevent false unsaved changes dialogs | fix | 49 | COMMENT | 4/5 | |
| 17 | [#5466](https://github.com/Kilo-Org/kilocode/pull/5466) | feat: display generated session names in task history UI | feature | 75 | APPROVE | 5/5 | |

## Tier Progress

| Tier | Description | Reviewed | Total | Status |
|------|-------------|----------|-------|--------|
| 1 | Docs | 5 | 7 | 71% |
| 2 | Tiny fixes + Approved | 11 | 11 | 100% |
| 3 | Small fixes/features | 1 | 13 | 8% |
| 4 | Medium fixes | 0 | 4 | 0% |
| 5 | Providers + medium features | 0 | 27 | 0% |
| 6 | Large features | 0 | 12 | 0% |

## Key Findings

| # | PR | Finding | Impact |
|---|-----|---------|--------|
| 1 | [#5807](https://github.com/Kilo-Org/kilocode/pull/5807) | File deletions need cross-reference checks; bots miss what's NOT in the diff | High |
| 2 | [#5817](https://github.com/Kilo-Org/kilocode/pull/5817) | Race conditions in debounced callbacks need re-check of guards after await | High |
| 3 | [#5760](https://github.com/Kilo-Org/kilocode/pull/5760) | Contributor agreed to implement designer's alternative — don't approve pending revision | Medium |
| 4 | [#5569](https://github.com/Kilo-Org/kilocode/pull/5569) | Maintainer says retrying won't help — hold for investigation | Medium |
| 5 | [#5826](https://github.com/Kilo-Org/kilocode/pull/5826) | VSCode web components cause controlled input issues in React | Medium |
| 6 | [#5634](https://github.com/Kilo-Org/kilocode/pull/5634) | Local state pattern prevents controlled input flickering | Low |
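
Finding 2 can be illustrated with a small sketch. This is not the code from PR #5817; `ServerRestarter` and its fields are hypothetical, and the example uses Python's `asyncio` only to show the general shape of the problem: a guard checked before an `await` may no longer hold once the debounce delay has elapsed, so it has to be checked again.

```python
import asyncio


class ServerRestarter:
    """Hypothetical debounced restart handler, for illustration only."""

    def __init__(self, delay: float = 0.05) -> None:
        self.delay = delay
        self.generation = 0    # bumped on every new restart request
        self.disposed = False  # guard that may flip while we are waiting
        self.restarts = 0

    async def request_restart(self) -> None:
        self.generation += 1
        my_generation = self.generation
        if self.disposed:  # guard checked before the await
            return
        await asyncio.sleep(self.delay)  # debounce window; state can change here
        # Re-check after the await: a newer request or a dispose may have arrived.
        if self.disposed or my_generation != self.generation:
            return
        self.restarts += 1


async def demo() -> int:
    restarter = ServerRestarter()
    # Three rapid requests should collapse into a single restart.
    await asyncio.gather(*(restarter.request_restart() for _ in range(3)))
    return restarter.restarts


print(asyncio.run(demo()))  # prints 1
```

Without the second check, all three requests would pass the initial guard and each would trigger a restart.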

## Methodology

Each PR goes through a 10-step pipeline:

1. **Triage** — Score by complexity, risk, and category
2. **Fork Mirror** — Cherry-pick to [review fork](https://github.com/jeremylongshore/kilocode) for multi-AI analysis
3. **Bot Analysis** — 5+ AI reviewers (CodeRabbit, Gemini, Greptile, CodeQL, Qodo) auto-review
4. **Metadata Fetch** — Pull upstream PR data, CI status, existing comments
5. **Context Read** — Read touched files, surrounding code, tests
6. **Deep Analysis** — Line-by-line diff review with checklist
7. **Verification** — CI checks, type safety, targeted tests
8. **Compose** — Write structured review + narrative journal
9. **Quality Gate** — Tone lint, link verification, human approval
10. **Submit** — Post review + journal to upstream PR

Full methodology: [METHODOLOGY.md](https://github.com/jeremylongshore/kilocode/blob/main/.reviews/METHODOLOGY.md) | Progress: [PROGRESS.md](https://github.com/jeremylongshore/kilocode/blob/main/.reviews/PROGRESS.md)

---

*Generated from review database. Last updated: 2026-02-15.*
98 changes: 98 additions & 0 deletions .reviews/METHODOLOGY.md
@@ -0,0 +1,98 @@
# AI PR Review Methodology

Built from evidence. Each section added after patterns emerge from actual reviews.

---

## Stack

| Layer | Tool | Role | Cost |
|-------|------|------|------|
| Primary | Claude Code | Deep analysis, review composition, journal writing | - |
| Bot | CodeRabbit | Line-by-line review, summaries | Free (public) |
| Bot | Gemini Code Assist | Google model perspective, /gemini commands | Free |
| Bot | Greptile | Codebase-graph-aware review, architecture context | $20/mo |
| Bot | CodeQL | SAST security scanning | Free |
| Bot | Qodo PR-Agent | Open-source auto-describe/review | Free |
| Search | Sourcegraph | Blast radius queries, cross-repo references | Free (public) |
| Gate | Human (Jeremy) | Final approval before submit | - |

## Workflow

1. Pick PR from priority queue
2. **Read ALL existing comments/reviews on upstream PR** — understand maintainer feedback, contributor discussion, and any pending requests before writing our review
3. Mirror PR on fork → bots auto-review (2-5 min)
4. Fetch upstream metadata, diff, CI status
5. Read codebase context + synthesize bot findings
6. Analyze diff, run checklist, create artifacts
7. Verify (CI + local testing scaled by tier)
8. Compose review (Comment 1) + journal (Comment 2)
9. Quality gate (tone lint, metadata check, link check)
10. **Human (Jeremy) approves** — reviews are NOT posted until explicitly approved
11. Submit to upstream with links to fork evidence

## Verification Strategy

| Tier | What We Check |
|------|--------------|
| All | Upstream CI, bot consensus on fork PR |
| 3+ | Targeted tests, type checking, Sourcegraph blast radius |
| 5+ | Full build, manual testing for UI changes |
| Providers | Pattern compliance, security audit, streaming support |
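
The table above reads as a cumulative rule: higher tiers inherit the checks of lower tiers, and provider PRs add their own. A minimal sketch of that reading follows; the function name `checks_for` and the `is_provider` flag are assumptions for illustration, not part of the review tooling.

```python
def checks_for(tier: int, is_provider: bool = False) -> list[str]:
    """Return the verification checks for a PR, following the table above."""
    checks = ["upstream CI", "bot consensus on fork PR"]  # applies to all tiers
    if tier >= 3:
        checks += ["targeted tests", "type checking", "Sourcegraph blast radius"]
    if tier >= 5:
        checks += ["full build", "manual testing for UI changes"]
    if is_provider:
        checks += ["pattern compliance", "security audit", "streaming support"]
    return checks


print(checks_for(1))
print(checks_for(5, is_provider=True))
```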

## Evidence

All reviews link to fork PRs where 5-6 independent AI tools analyzed the same change. Bot agreement/disagreement is documented in each journal's "Bot Review Synthesis" section.

---

## Patterns (Emerging)

### Docs PRs (from review #1: PR #5667)
- Changesets not required for docs-only changes in `apps/kilocode-docs/`
- Only `Build Markdoc Site` and `check-translations` CI checks are directly relevant
- Acknowledge contributor resilience (adapting to upstream file removals)
- "Is this true?" is a high-value review question

### Infrastructure (from review #1: PR #5667)
- GitHub Codespaces on fork for build/test/push (devcontainer + SSHD feature)
- Local VM for analysis, review composition, journal writing only
- Cherry-pick upstream PR commits (not API file replacement) for accurate bot diffs
- Codespaces free tier (60 core-hours/mo) covers ~60 PRs/month

### Fork PR Methodology (from review #1: PR #5667)
- API file replacement via GitHub Contents API creates wrong diffs (full file swap)
- Must use `git am` with patches from `gh pr diff --patch` for accurate cherry-picks
- Bot reviews are only as good as the diff they see
- Track bot false positives in status.json `bot_findings` field
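
The cherry-pick approach above could be scripted roughly as follows. This is a sketch, not the script used for these reviews: the helper names and the `fork_checkout` parameter are hypothetical, and it assumes `gh` and `git` are installed and authenticated. It relies on `gh pr diff --patch` emitting the PR as a patch series and on `git am` reading that series from standard input.

```python
import subprocess


def build_mirror_commands(pr_number: int, upstream: str) -> tuple[list[str], list[str]]:
    """Build the two commands: fetch the PR as patches, then apply them with git am."""
    fetch_patch = ["gh", "pr", "diff", str(pr_number), "--patch", "--repo", upstream]
    apply_patch = ["git", "am"]  # reads the patch series from stdin
    return fetch_patch, apply_patch


def mirror_pr(pr_number: int, upstream: str, fork_checkout: str) -> None:
    """Apply an upstream PR's commits onto the branch checked out in fork_checkout."""
    fetch_patch, apply_patch = build_mirror_commands(pr_number, upstream)
    patch = subprocess.run(fetch_patch, check=True, capture_output=True).stdout
    subprocess.run(apply_patch, check=True, input=patch, cwd=fork_checkout)


fetch, apply = build_mirror_commands(5667, "Kilo-Org/kilocode")
print(" ".join(fetch), "|", " ".join(apply))
```

Because `git am` replays the original commits, the fork PR shows the same diff the upstream PR does, which is what the bots need to see.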

### Bot Consensus (from review #2: PR #5869)
- When 2+ bots independently flag the same issue with different framing, the finding is almost certainly real
- CodeRabbit: "orphaned bullet point" + Gemini: "breaks grammatical flow" = same structural issue
- Bot agreement directly validates manual analysis and increases confidence score
- Greptile still not responding on docs PRs — investigate trigger conditions
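
One way to surface this kind of agreement mechanically is to group bot findings by location and keep the locations flagged by at least two different bots. The sketch below assumes findings can be matched by file and line, which is a simplification: bots often frame the same issue differently, so a human still has to confirm the match. The `Finding` type, file path, and line number are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Finding(NamedTuple):
    bot: str
    path: str
    line: int
    message: str


def consensus(findings: list[Finding], min_bots: int = 2) -> dict[tuple[str, int], list[Finding]]:
    """Return locations flagged by at least `min_bots` distinct bots."""
    by_location: dict[tuple[str, int], list[Finding]] = defaultdict(list)
    for finding in findings:
        by_location[(finding.path, finding.line)].append(finding)
    return {
        location: group
        for location, group in by_location.items()
        if len({f.bot for f in group}) >= min_bots
    }


# Messages quoted from the notes above; path and line are made up for the example.
findings = [
    Finding("CodeRabbit", "docs/slash-commands.md", 42, "orphaned bullet point"),
    Finding("Gemini", "docs/slash-commands.md", 42, "breaks grammatical flow"),
    Finding("CodeRabbit", "docs/slash-commands.md", 90, "consider rewording"),
]
for location, group in consensus(findings).items():
    print(location, [f.bot for f in group])
```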

### Document Structure (from review #2: PR #5869)
- Cross-cutting docs changes must check for in-progress syntactic structures (lists, tables, code blocks)
- Inserting a new section mid-list is a classic "insert in the wrong spot" issue
- All CI green doesn't mean content is correct — Markdoc validates syntax, not document coherence
- Source code verification prevents docs drift (check actual command definitions)

### File Deletions (from review #3: PR #5807)
- Always search codebase for references to deleted files (nav configs, feature tables, imports)
- Bots only analyze the diff — they can't flag what's missing from the PR
- Markdoc build passes despite broken internal links — needs link checker CI
- Bot-generated PRs (kiloconnect) may have gaps in cross-reference cleanup
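
A cross-reference check for deleted files can be sketched as a plain text search over the checkout. The function below is an illustration under stated assumptions: it matches on the deleted file's name without extension, which is crude and can produce false positives, and the suffix list is a guess at which files hold nav configs and links. The function name `find_references` is hypothetical.

```python
from pathlib import Path


def find_references(
    repo_root: Path,
    deleted_paths: list[str],
    suffixes: tuple[str, ...] = (".md", ".ts", ".tsx", ".json"),
) -> dict[str, list[str]]:
    """Map each deleted path to the remaining files that still mention its stem."""
    hits: dict[str, list[str]] = {deleted: [] for deleted in deleted_paths}
    for file in sorted(repo_root.rglob("*")):
        if not file.is_file() or file.suffix not in suffixes:
            continue
        text = file.read_text(encoding="utf-8", errors="ignore")
        for deleted in deleted_paths:
            # Match on the file name without extension, e.g. a docs slug.
            if Path(deleted).stem in text:
                hits[deleted].append(str(file.relative_to(repo_root)))
    return hits
```

Running this against the fork checkout after applying the PR lists every remaining file that may still link to a removed page, which is exactly the information missing from the diff.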

### Links & References (from review #7+)
- Methodology link in journals MUST point to fork: `https://github.com/jeremylongshore/kilocode/tree/main/.reviews`
- NEVER link to `Kilo-Org/kilocode/.reviews` — that path doesn't exist upstream
- All fork PR links must be verified before posting
- No 404s in anything we post — test every link
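
Part of this rule can be checked offline before posting. The sketch below extracts Markdown links from a draft and flags any that point at a `.reviews` path under `Kilo-Org/kilocode`. It does not detect 404s in general (that needs an HTTP request per link), and the regular expression only handles simple inline links; both the pattern and the function name are assumptions for illustration.

```python
import re

# Matches the URL part of simple inline Markdown links: [text](https://...)
LINK_RE = re.compile(r"\]\((https?://[^)\s]+)\)")


def forbidden_links(markdown: str) -> list[str]:
    """Return links that point at the upstream repo's non-existent .reviews path."""
    return [
        url
        for url in LINK_RE.findall(markdown)
        if "github.com/Kilo-Org/kilocode/" in url and "/.reviews" in url
    ]


draft = (
    "[Methodology](https://github.com/Kilo-Org/kilocode/tree/main/.reviews) and "
    "[fork copy](https://github.com/jeremylongshore/kilocode/tree/main/.reviews)"
)
print(forbidden_links(draft))
```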

### Maintainer Context (from reviews #9, #11)
- Always read existing comments — contributor may have agreed to revisions (#5760)
- Maintainer feedback can invalidate the PR approach entirely (#5569)
- Don't approve PRs where the contributor themselves plans to change the implementation

<!-- More patterns added as reviews accumulate -->
88 changes: 88 additions & 0 deletions .reviews/NOTES-autonomous-transfer.md
@@ -0,0 +1,88 @@
# Notes: Autonomous Agentic Transfer

## Current Stack (v2)

### Active
- **Claude Code** - main driver (interactive, human-gated on submit)
- **CodeRabbit** - auto-reviews on fork PRs (free, public repos)
- **Gemini Code Assist** - auto-reviews on fork PRs (free)
- **Greptile** - codebase-graph-aware reviews on fork PRs ($20/mo)
- **CodeQL** - SAST security scanning via GitHub Action (free)
- **Qodo PR-Agent** - open-source auto-review via GitHub Action (free)
- **Dependabot** - dependency vulnerability scanning (free)
- **Sourcegraph** - public code search for blast radius (free)

### Not Yet Wired In
- **GWI** - triage scoring, slop detection, codebase-aware drafts
- **Bounty tone lint** - AI slop detection gate before posting
- **Sourcegraph Cody Pro** - unlimited AI codebase chat ($9/mo, pending signup)

## Fork-Based Testing Pattern

The fork (jeremylongshore/kilocode) serves as a test lab:
1. Mirror each upstream PR as a fork PR
2. All bots auto-review the fork PR (5-6 independent AI analyses)
3. Synthesize bot findings into human review
4. Post to upstream with links back to fork
5. Fork becomes public evidence of the methodology

This follows a common pattern for PR verification:
- Cherry-pick/mirror the change
- Run independent analysis in isolated environment
- Document findings with links to evidence
- Submit with full audit trail

## Transfer Path: Interactive → Autonomous

### Phase 1 (Current): Human-driven, bot-assisted
- Human triggers each step
- Bots run automatically on fork
- Human synthesizes and approves
- Human submits to upstream

### Phase 2: Scripted pipeline
- Script creates fork PR automatically
- Script waits for bot reviews
- Script drafts review + journal from bot synthesis
- Human approves and submits

### Phase 3: Agent loop
- Agent processes queue from priority-queue.json
- Agent creates fork PRs, waits for bots, drafts reviews
- Human gate on submit only, and only for tier 3+
- Confidence calibration: tier 1-2 auto-submit, tier 3+ human review before submit

### Phase 4: Full autonomous
- Human audit on sample (every 5th PR)
- GWI triage score drives confidence thresholds
- Bounty tone lint gates all output
- Failure mode monitoring: track post-submit feedback
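
The gating rules across the phases can be summarized in one small function. This is a sketch of one possible reading of the notes above, not an implemented policy: it treats the Phase 4 sample audit as a gate on every 5th PR, and the function and parameter names are hypothetical.

```python
def requires_human(phase: int, tier: int, pr_index: int) -> bool:
    """Return True when a human should look at the review for this PR."""
    if phase <= 2:
        return True  # Phases 1-2: a human approves every submission
    if phase == 3:
        return tier >= 3  # Phase 3: tier 1-2 auto-submit, tier 3+ human review
    return pr_index % 5 == 0  # Phase 4: audit a sample, every 5th PR


print(requires_human(phase=3, tier=2, pr_index=1))
print(requires_human(phase=4, tier=6, pr_index=10))
```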

## Key Questions for Later
- Can GWI's triage score predict which PRs need human review?
- What's the false positive rate per bot? (track in Bot Review Synthesis)
- What's the minimum viable confidence threshold for auto-submit?
- How does Devin's auto-review API endpoint model compare?

## Infrastructure

### Build/Test Environment: GitHub Codespaces
- **Decision**: Use Codespaces on the fork for all build, test, and push operations
- **Rationale**: On the local dev VM (4GB RAM), `pnpm install` for the kilocode monorepo (~2GB node_modules) gets OOM-killed. Codespaces provide 4-core/32GB machines with the project's devcontainer pre-configured.
- **Setup**: Added `ghcr.io/devcontainers/features/sshd:1` to fork's devcontainer.json for CLI access via `gh codespace ssh`
- **Cost**: Free tier = 60 core-hours/month. A ~15 min PR review session on the 4-core machine uses about 1 core-hour (0.25 h × 4 cores), so the free tier covers roughly 60 PRs/month.
- **Machine**: `basicLinux32gb` (4-core, 32GB RAM)
- **Workflow**: SSH into Codespace → cherry-pick upstream PR → push branch → create PR → bots auto-review
- **Why not GCP VM**: Codespaces are already integrated with the fork repo, have the devcontainer, and need zero infrastructure management. GCP VM would require SSH setup, git auth, Node/pnpm install, and ongoing maintenance.
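
The free-tier estimate is simple arithmetic, spelled out here as a sketch. All three inputs are the figures stated in these notes (60 core-hours, a 4-core machine, ~15 minutes per session); they are assumptions of the notes, not verified quotas.

```python
def prs_per_month(free_core_hours: float = 60, cores: int = 4, minutes_per_pr: float = 15) -> int:
    """Estimate how many PR review sessions fit in the monthly free tier."""
    core_hours_per_pr = cores * minutes_per_pr / 60  # 4 cores * 0.25 h = 1 core-hour
    return int(free_core_hours // core_hours_per_pr)


print(prs_per_month())  # prints 60
```

Longer sessions shrink the budget quickly: at 30 minutes per PR the same quota covers 30 PRs.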

### Local Environment (this VM)
- Used for: analysis, review composition, journal writing, methodology docs
- NOT used for: building, testing, or pushing kilocode changes
- Reason: 4GB RAM cannot handle the monorepo's dependency tree

## Cost Analysis
- Current: $35/mo (Greptile $20 + Sourcegraph Cody Pro $9, pending signup, + $6 buffer)
- Codespaces: Free tier (60 core-hours/month)
- Per PR: $35/75 = $0.47/PR
- Devin: $500/mo, roughly $6.67/PR at similar volume
- Delta: roughly 14x cheaper ($500 / $35 ≈ 14.3), with full transparency
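
The per-PR figures above follow directly from the monthly costs and the 75-PR queue. A short worked check, using only the numbers stated in this section:

```python
monthly_cost = 35.0  # current stack, USD per month
devin_cost = 500.0   # Devin, USD per month (figure from the notes above)
pr_count = 75        # PRs in the review queue

per_pr = monthly_cost / pr_count
devin_per_pr = devin_cost / pr_count
ratio = devin_cost / monthly_cost

print(f"Per PR: ${per_pr:.2f}")              # prints Per PR: $0.47
print(f"Devin per PR: ${devin_per_pr:.2f}")  # prints Devin per PR: $6.67
print(f"Ratio: {ratio:.1f}x")                # prints Ratio: 14.3x
```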