fix(eval): disable ingest rate limit in M6 seeder to unblock baseline (#58 followup) #305

Merged
jinhongkuan merged 1 commit into dev from 58-m6-seeder-fix
May 11, 2026

@silongtan

Collaborator

Summary

Tail-end fix on top of PR #304. The first M6 baseline reading on dev surfaced overall recall 0.000 with 14/25 cases erroring — root cause is the LLM-08 ingest rate limiter (#216, burst=10 / refill=1.0/s) refusing cases 12+ during seeding. The rate limiter is for production agent-loop safety, not eval throughput; there's already a documented env var to bypass it (BICAMERAL_INGEST_RATE_LIMIT_DISABLE).

Math behind the brokenness

The seeder runs 25 cases back-to-back in one process. Bucket starts at 10 tokens, refills at 1/s. Seeding is fast, so:

10 cases burst-through  +  ~1 refill while seeding   =  11 cases pass
14 cases (U4-U8 + all 9 T*) hit empty bucket         →  _IngestRefused
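The arithmetic above can be reproduced with a toy token bucket. This is a minimal sketch, not the project's limiter: the `simulate` function, the evenly spaced 60 ms per-case timing, and the integer milli-token bookkeeping are all assumptions made for illustration. Only burst=10, refill=1.0/s, and the 25-case count come from the PR.

```python
def simulate(cases=25, burst=10, refill_per_s=1, case_ms=60):
    """Toy token bucket. Times are in ms and tokens in milli-tokens (ints)
    so the result does not depend on floating-point rounding."""
    capacity = burst * 1000            # 10 tokens == 10_000 milli-tokens
    tokens = capacity                  # bucket starts full
    last_ms = 0
    admitted, refused = [], []
    for i in range(cases):
        now_ms = i * case_ms           # assumed: cases arrive every case_ms
        # 1 token/s is exactly 1 milli-token per ms
        tokens = min(capacity, tokens + (now_ms - last_ms) * refill_per_s)
        last_ms = now_ms
        if tokens >= 1000:             # one whole token is needed per ingest
            tokens -= 1000
            admitted.append(i + 1)
        else:
            refused.append(i + 1)
    return admitted, refused


admitted, refused = simulate()
print(len(admitted), "admitted,", len(refused), "refused")
```

With these assumed timings the run takes about 1.4 s, so the bucket earns exactly one extra token and the counts come out as 11 admitted and 14 refused. Which specific case receives the refilled token depends on the real per-case timing, which this sketch does not model.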

Post-#304 CI on dev confirms the pattern exactly:

M6 preflight retrieval recall eval — 25 cases
  overall recall : 0.000   errors: 14
  transitive_relevance   : 0/9 surfaced, 9 errors  ← all rate-limited
  unbound_decision       : 0/8 surfaced, 5 errors  ← last 5 rate-limited
  vocabulary_mismatch    : 0/8 surfaced, 0 errors  ← first 8, ran clean

(The vocabulary_mismatch zero is the honest BM25-can't-bridge-vocab baseline — designed to surface that miss mode. The eval is working; it just can't measure the other two categories because the seeder doesn't reach them.)
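To make the breakdown concrete, the per-category counts can be rolled up as follows. This is a hedged sketch: `CategoryResult` and `overall` are hypothetical names, and treating errored cases as part of the recall denominator is an assumption inferred from the "0/9 surfaced, 9 errors" line, not from the aggregator's source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CategoryResult:
    name: str
    total: int      # cases in the category
    surfaced: int   # cases whose expected item was retrieved
    errors: int     # cases that raised before retrieval could be measured


def overall(results):
    """Return (recall, errors, total) across all categories."""
    total = sum(r.total for r in results)
    surfaced = sum(r.surfaced for r in results)
    errors = sum(r.errors for r in results)
    recall = surfaced / total if total else 0.0
    return recall, errors, total


# Counts copied from the post-#304 CI output quoted above.
baseline = [
    CategoryResult("transitive_relevance", total=9, surfaced=0, errors=9),
    CategoryResult("unbound_decision", total=8, surfaced=0, errors=5),
    CategoryResult("vocabulary_mismatch", total=8, surfaced=0, errors=0),
]

recall, errors, total = overall(baseline)
print(f"overall recall : {recall:.3f}   errors: {errors}")
print(f"cases actually measured: {total - errors}")
```

Only 25 - 14 = 11 cases were measured at all, which matches the 11 cases that got through the limiter.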

Fix

One-line env var in the seeder's per-case setup, saved + restored like REPO_PATH and SURREAL_URL:

prev_ingest_rate = os.environ.get("BICAMERAL_INGEST_RATE_LIMIT_DISABLE")
os.environ["BICAMERAL_INGEST_RATE_LIMIT_DISABLE"] = "1"

Plus the matching restore block in the finally: clause. Total diff: 15 lines.

Alternatives considered + rejected

| Alternative | Why rejected |
| --- | --- |
| Clear `_RATE_LIMIT_REGISTRY` between cases | Reaches into private module state; skips the documented env-var contract |
| Sleep between cases to allow refill | Slow; hides the design intent ("rate limiter doesn't apply to evals") |
| Lower burst/refill via `.bicameral/config.yaml` per fixture | Every Phase B eval surface would need to re-author the same config |

The env-var path is the documented API and one line.

Expected after this fix

  • vocabulary_mismatch stays at 0/8 surfaced (that's the honest BM25 baseline — the whole point of the category)
  • transitive_relevance + unbound_decision produce non-zero recall once the seeder doesn't trip the limiter
  • Phase B picks its direction from the now-readable per-category breakdown

Local verification

  • ✅ 16/16 sociable unit tests pass on the classifier + aggregator
  • ✅ ruff check + format + mypy all green on the touched file
  • bicameral.link_commit clean — 0 drift, 0 pending checks

Refs

Refs #58 (Phase A baseline). Followup to PR #304.

🤖 Generated with Claude Code

…#58 followup)

PR #304's first CI baseline produced overall recall 0.000 with 14/25
cases erroring — root cause: the M6 seeder runs 25 cases back-to-back
in a single process, and the LLM-08 ingest rate limiter (#216, burst=10
/ refill=1.0/s) refuses cases 12+ with
`_IngestRefused("rate_limit_exceeded")`. Math: 10 initial tokens + ~1
refill while seeding the first 11 cases = 11 cases through, then 14
cases (U4-U8 + all 9 T*) erred.

The rate limiter is for production agent-loop safety, not eval
throughput. There's already a documented env var to disable it
(see `handlers.ingest._check_rate_limit` docstring):
``BICAMERAL_INGEST_RATE_LIMIT_DISABLE`` truthy → bucket check is
short-circuited. Setting it in the seeder's per-case env setup (saved
+ restored like `REPO_PATH` and `SURREAL_URL`) is the documented path.

Symptom before this fix (post-#304 CI on dev):
  M6 preflight retrieval recall eval — 25 cases
    overall recall : 0.000   errors: 14
    transitive_relevance   : 0/9 surfaced, 9 errors  ← all rate-limited
    unbound_decision       : 0/8 surfaced, 5 errors  ← last 5 rate-limited
    vocabulary_mismatch    : 0/8 surfaced, 0 errors  ← first 8, ran clean

Expected after this fix: vocabulary_mismatch stays 0/8 surfaced (that's
the honest BM25-can't-bridge-vocab baseline the eval was designed to
surface). transitive_relevance + unbound_decision should produce
non-zero recall once the seeder doesn't trip the rate limiter.

Belt-and-suspenders alternatives considered:
  - clear the `_RATE_LIMIT_REGISTRY` dict between cases — works but
    reaches into private state and skips the env-var contract
  - sleep between cases to allow refill — works but slow + hides the
    fact that the rate limiter isn't appropriate for evals
  - lower burst/refill via `.bicameral/config.yaml` in the synthetic
    repo — works but requires every Phase B eval surface to re-author
    the same config

The env-var path is the documented API and one line.

Smoke verification
------------------

  - 16/16 sociable unit tests pass on the classifier + aggregator
  - ruff check + format + mypy all green on the touched file

Refs #58 (Phase A baseline). Followup to PR #304.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 11, 2026


Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: eaeb49ce-9ec2-4ef1-a5ba-dde2b5383a0b


@jinhongkuan jinhongkuan merged commit 14188f8 into dev May 11, 2026
7 checks passed
@silongtan silongtan deleted the 58-m6-seeder-fix branch May 11, 2026 02:06