fix(eval): disable ingest rate limit in M6 seeder to unblock baseline (#58 followup) by silongtan · Pull Request #305 · BicameralAI/bicameral-mcp

silongtan · 2026-05-11T01:59:05Z

Summary

Tail-end fix on top of PR #304. The first M6 baseline reading on dev surfaced overall recall 0.000 with 14/25 cases erroring — root cause is the LLM-08 ingest rate limiter (#216, burst=10 / refill=1.0/s) refusing cases 12+ during seeding. The rate limiter is for production agent-loop safety, not eval throughput; there's already a documented env var to bypass it (BICAMERAL_INGEST_RATE_LIMIT_DISABLE).

Math behind the brokenness

The seeder runs 25 cases back-to-back in one process. Bucket starts at 10 tokens, refills at 1/s. Seeding is fast, so:

10 cases burst-through  +  ~1 refill while seeding   =  11 cases pass
14 cases (U4-U8 + all 9 T*) hit empty bucket         →  _IngestRefused
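The arithmetic above can be reproduced with a small token-bucket simulation. This is a sketch, not the project's actual limiter: the `TokenBucket` class, the `V1..V8` case labels, and the per-case start times are all hypothetical. The timings are chosen only so that roughly one second elapses during seeding, matching the "~1 refill" in the math. Only burst=10 and refill=1/s come from this PR. Tokens are tracked as integer milli-tokens to keep the result deterministic.

```python
class TokenBucket:
    """Illustrative token bucket (hypothetical, not the real LLM-08 limiter)."""

    def __init__(self, burst: int = 10, refill_per_s: int = 1) -> None:
        self.capacity = burst * 1000          # capacity in milli-tokens
        self.refill_per_s = refill_per_s      # 1 token/s == 1 milli-token/ms
        self.milli_tokens = self.capacity     # bucket starts full
        self.last_ms = 0

    def try_acquire(self, now_ms: int) -> bool:
        """Refill for the elapsed time, then spend one token if available."""
        elapsed = now_ms - self.last_ms
        self.last_ms = now_ms
        self.milli_tokens = min(
            self.capacity, self.milli_tokens + elapsed * self.refill_per_s
        )
        if self.milli_tokens >= 1000:
            self.milli_tokens -= 1000
            return True
        return False


# Case order as reported by the eval: 8 vocabulary_mismatch, 8 unbound_decision,
# 9 transitive_relevance. The "V" prefix is an assumed label.
case_ids = (
    [f"V{i}" for i in range(1, 9)]
    + [f"U{i}" for i in range(1, 9)]
    + [f"T{i}" for i in range(1, 10)]
)

# Assumed timings: a slow first case (1000 ms), then one case every 20 ms.
start_times_ms = [0] + [1000 + 20 * i for i in range(24)]

bucket = TokenBucket()
passed, refused = [], []
for case_id, start_ms in zip(case_ids, start_times_ms):
    (passed if bucket.try_acquire(start_ms) else refused).append(case_id)

print(f"{len(passed)} passed, {len(refused)} refused; first refused: {refused[0]}")
```

Under these assumed timings the first 11 cases pass and the remaining 14 are refused, starting at U4. Different timings would shift which case picks up the refilled token, but the bucket can never admit many more than burst + elapsed seconds.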

Post-#304 CI on dev confirms the pattern exactly:

M6 preflight retrieval recall eval — 25 cases
  overall recall : 0.000   errors: 14
  transitive_relevance   : 0/9 surfaced, 9 errors  ← all rate-limited
  unbound_decision       : 0/8 surfaced, 5 errors  ← last 5 rate-limited
  vocabulary_mismatch    : 0/8 surfaced, 0 errors  ← first 8, ran clean

(The vocabulary_mismatch zero is the honest baseline: BM25 can't bridge the vocabulary gap, and the category was designed to surface exactly that miss mode. The eval is working; it just can't measure the other two categories because seeding is refused before it reaches them.)

Fix

One-line env var in the seeder's per-case setup, saved + restored like REPO_PATH and SURREAL_URL:

prev_ingest_rate = os.environ.get("BICAMERAL_INGEST_RATE_LIMIT_DISABLE")
os.environ["BICAMERAL_INGEST_RATE_LIMIT_DISABLE"] = "1"

Plus the matching restore block in the finally: clause. Total diff: 15 lines.
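For reference, the save + restore pattern can be sketched as a context manager. This is an illustration under assumptions, not the code in this diff: the helper name `ingest_rate_limit_disabled` is hypothetical, and the actual seeder uses an inline `try`/`finally` rather than a context manager. Only the env var name and the `"1"` value come from the snippet above.

```python
import os
from contextlib import contextmanager

ENV_KEY = "BICAMERAL_INGEST_RATE_LIMIT_DISABLE"


@contextmanager
def ingest_rate_limit_disabled():
    """Set the bypass env var for the duration of the block, then restore it.

    Hypothetical helper: mirrors the save/set/restore steps described above.
    """
    prev = os.environ.get(ENV_KEY)   # None means the variable was unset
    os.environ[ENV_KEY] = "1"
    try:
        yield
    finally:
        if prev is None:
            # The variable did not exist before, so remove it again.
            os.environ.pop(ENV_KEY, None)
        else:
            os.environ[ENV_KEY] = prev


# Usage: anything run inside the block sees the bypass enabled.
with ingest_rate_limit_disabled():
    print("inside:", os.environ[ENV_KEY])
print("after:", os.environ.get(ENV_KEY))
```

Restoring `None` by deleting the key (instead of writing an empty string) matters if the limiter treats any set value as meaningful; how it parses the value is not shown in this PR.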

Alternatives considered + rejected

| Alternative | Why rejected |
| --- | --- |
| Clear `_RATE_LIMIT_REGISTRY` between cases | Reaches into private module state; skips the documented env-var contract |
| Sleep between cases to allow refill | Slow; hides the design intent ("rate limiter doesn't apply to evals") |
| Lower burst/refill via `.bicameral/config.yaml` per fixture | Every Phase B eval surface would need to re-author the same config |

The env-var path is the documented API and one line.

Expected after this fix

vocabulary_mismatch stays at 0/8 surfaced (that's the honest BM25 baseline — the whole point of the category)
transitive_relevance + unbound_decision produce non-zero recall once the seeder doesn't trip the limiter
Phase B picks its direction from the now-readable per-category breakdown

Local verification

✅ 16/16 sociable unit tests pass on the classifier + aggregator
✅ ruff check + format + mypy all green on the touched file
✅ bicameral.link_commit clean — 0 drift, 0 pending checks

Refs

Refs #58 (Phase A baseline). Followup to PR #304.

🤖 Generated with Claude Code

…#58 followup)

PR #304's first CI baseline produced overall recall 0.000 with 14/25 cases erroring. Root cause: the M6 seeder runs 25 cases back-to-back in a single process, and the LLM-08 ingest rate limiter (#216, burst=10 / refill=1.0/s) refuses cases 12+ with `_IngestRefused("rate_limit_exceeded")`.

Math: 10 initial tokens + ~1 refill while seeding the first 11 cases = 11 cases through, then 14 cases (U4-U8 + all 9 T*) erred.

The rate limiter is for production agent-loop safety, not eval throughput. There's already a documented env var to disable it (see `handlers.ingest._check_rate_limit` docstring): ``BICAMERAL_INGEST_RATE_LIMIT_DISABLE`` truthy → bucket check is short-circuited. Setting it in the seeder's per-case env setup (saved + restored like `REPO_PATH` and `SURREAL_URL`) is the documented path.

Symptom before this fix (post-#304 CI on dev):

  M6 preflight retrieval recall eval — 25 cases
    overall recall : 0.000   errors: 14
    transitive_relevance   : 0/9 surfaced, 9 errors  ← all rate-limited
    unbound_decision       : 0/8 surfaced, 5 errors  ← last 5 rate-limited
    vocabulary_mismatch    : 0/8 surfaced, 0 errors  ← first 8, ran clean

Expected after this fix: vocabulary_mismatch stays 0/8 surfaced (that's the honest BM25-can't-bridge-vocab baseline the eval was designed to surface). transitive_relevance + unbound_decision should produce non-zero recall once the seeder doesn't trip the rate limiter.

Belt-and-suspenders alternatives considered:
- clear the `_RATE_LIMIT_REGISTRY` dict between cases: works but reaches into private state and skips the env-var contract
- sleep between cases to allow refill: works but slow + hides the fact that the rate limiter isn't appropriate for evals
- lower burst/refill via `.bicameral/config.yaml` in the synthetic repo: works but requires every Phase B eval surface to re-author the same config

The env-var path is the documented API and one line.

Smoke verification
------------------
- 16/16 sociable unit tests pass on the classifier + aggregator
- ruff check + format + mypy all green on the touched file

Refs #58 (Phase A baseline). Followup to PR #304.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-11T01:59:14Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.


silongtan temporarily deployed to ci-test May 11, 2026 01:59 — with GitHub Actions Inactive

jinhongkuan merged commit 14188f8 into dev May 11, 2026
7 checks passed

silongtan deleted the 58-m6-seeder-fix branch May 11, 2026 02:06

silongtan mentioned this pull request May 11, 2026

M6 preflight handler retrieval: by-design split (handler structural, skill-layer covered by #306) #58

Closed
