From 2c96dca9abe28373cf2e314d344804433684a760 Mon Sep 17 00:00:00 2001
From: Silong Tan
Date: Sun, 10 May 2026 21:58:40 -0400
Subject: [PATCH] fix(eval): disable ingest rate limit in M6 seeder to unblock baseline (#58 followup)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PR #304's first CI baseline produced overall recall 0.000, with 14/25 cases
erroring. Root cause: the M6 seeder runs 25 cases back-to-back in a single
process, and the LLM-08 ingest rate limiter (#216, burst=10 /
refill=1.0/s) refuses cases 12+ with
`_IngestRefused("rate_limit_exceeded")`.

Math: 10 initial tokens + ~1 refill while seeding the first 11 cases = 11
cases through; the remaining 14 cases (U4-U8 + all 9 T*) errored.

The rate limiter is for production agent-loop safety, not eval throughput.
There's already a documented env var to disable it (see the
`handlers.ingest._check_rate_limit` docstring): when
``BICAMERAL_INGEST_RATE_LIMIT_DISABLE`` is truthy, the bucket check is
short-circuited. Setting it in the seeder's per-case env setup (saved and
restored like `REPO_PATH` and `SURREAL_URL`) follows that documented path.

Symptom before this fix (post-#304 CI on dev):

    M6 preflight retrieval recall eval — 25 cases
      overall recall       : 0.000
      errors               : 14
      transitive_relevance : 0/9 surfaced, 9 errors  ← all rate-limited
      unbound_decision     : 0/8 surfaced, 5 errors  ← last 5 rate-limited
      vocabulary_mismatch  : 0/8 surfaced, 0 errors  ← first 8, ran clean

Expected after this fix: vocabulary_mismatch stays at 0/8 surfaced (that's
the honest BM25-can't-bridge-vocab baseline the eval was designed to
surface). transitive_relevance and unbound_decision should produce non-zero
recall once the seeder no longer trips the rate limiter.
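The token arithmetic above (10 initial tokens + ~1 refill = 11 cases
through, 14 refused) can be reproduced with a small token-bucket
simulation. This is a hypothetical sketch, not the `handlers.ingest`
implementation: `TokenBucket`, `check_rate_limit`, and `seed_cases` are
illustrative names. It assumes a bucket that starts full, refills
continuously, and costs one token per seeded case; that any non-empty value
of the env var counts as truthy; and that a refused case fails fast, so the
simulated clock only advances on successful seeds. `Fraction` keeps the
arithmetic exact so float rounding can't blur the 1-token boundary.

```python
import os
from dataclasses import dataclass, field
from fractions import Fraction


@dataclass
class TokenBucket:
    """Hypothetical token bucket: starts full, refills continuously, one token per ingest."""

    burst: Fraction = Fraction(10)  # capacity; a fresh bucket holds this many tokens
    refill_per_s: Fraction = Fraction(1)  # tokens regained per second of elapsed time
    last: Fraction = Fraction(0)  # simulated timestamp of the previous check
    tokens: Fraction = field(init=False)

    def __post_init__(self) -> None:
        self.tokens = self.burst

    def try_take(self, now: Fraction) -> bool:
        # Credit the refill earned since the last check, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.refill_per_s)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # refused: the real code path raises _IngestRefused instead


def check_rate_limit(bucket: TokenBucket, now: Fraction) -> bool:
    # Assumed contract: any non-empty value short-circuits the bucket check.
    # The real truthiness parsing in handlers.ingest may differ.
    if os.environ.get("BICAMERAL_INGEST_RATE_LIMIT_DISABLE"):
        return True
    return bucket.try_take(now)


def seed_cases(n_cases: int, seconds_per_case: Fraction) -> tuple[int, int]:
    """Seed cases back-to-back in one process; return (seeded, refused)."""
    bucket = TokenBucket()  # one bucket shared by every case, as in a single process
    now = Fraction(0)
    seeded = refused = 0
    for _ in range(n_cases):
        if check_rate_limit(bucket, now):
            seeded += 1
            now += seconds_per_case  # assumption: only a successful seed takes time
        else:
            refused += 1
    return seeded, refused


if __name__ == "__main__":
    os.environ.pop("BICAMERAL_INGEST_RATE_LIMIT_DISABLE", None)
    print(seed_cases(25, Fraction(1, 10)))  # (11, 14)
    os.environ["BICAMERAL_INGEST_RATE_LIMIT_DISABLE"] = "1"
    print(seed_cases(25, Fraction(1, 10)))  # (25, 0)
    os.environ.pop("BICAMERAL_INGEST_RATE_LIMIT_DISABLE", None)
```

With an assumed 0.1 s per seeded case, the first 11 cases drain the bucket
to zero and every later case sees only 0.1 token, so all 14 remaining cases
are refused; with the env var set, the bucket is never consulted.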
Alternatives considered:

- clear the `_RATE_LIMIT_REGISTRY` dict between cases: works, but reaches
  into private state and bypasses the env-var contract
- sleep between cases to allow refill: works, but is slow and hides the
  fact that the rate limiter isn't appropriate for evals
- override burst/refill via `.bicameral/config.yaml` in the synthetic
  repo: works, but requires every Phase B eval surface to re-author the
  same config

The env-var path is the documented API and takes one line to set.

Smoke verification
------------------

- 16/16 sociable unit tests pass on the classifier + aggregator
- ruff check + format + mypy all green on the touched file

Refs #58 (Phase A baseline). Followup to PR #304.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 tests/eval/_preflight_m6_seeder.py | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/tests/eval/_preflight_m6_seeder.py b/tests/eval/_preflight_m6_seeder.py
index 4bba45ef..68141945 100644
--- a/tests/eval/_preflight_m6_seeder.py
+++ b/tests/eval/_preflight_m6_seeder.py
@@ -128,8 +128,19 @@ async def seed_m6_case_into_fresh_ctx(
     # fresh path / fresh ledger.
     prev_repo = os.environ.get("REPO_PATH")
     prev_surreal = os.environ.get("SURREAL_URL")
+    # #216 LLM-08 — the ingest rate limiter has burst=10 / refill=1/s by
+    # default. The eval runs 25 cases back-to-back in the same process;
+    # the first ~11 cases consume the burst + refills, and cases 12+
+    # raise `_IngestRefused("rate_limit_exceeded")` during seeding,
+    # corrupting the recall measurement (seeder errors aren't agent
+    # misses, but they DO eat the cases' slots). The rate limiter is
+    # for production agent-loop safety, not eval throughput. Disable for
+    # this run via the documented env var (see `handlers.ingest.
+    # _check_rate_limit` docstring).
+    prev_ingest_rate = os.environ.get("BICAMERAL_INGEST_RATE_LIMIT_DISABLE")
     os.environ["REPO_PATH"] = str(repo_root)
     os.environ["SURREAL_URL"] = "memory://"
+    os.environ["BICAMERAL_INGEST_RATE_LIMIT_DISABLE"] = "1"
     reset_ledger_singleton()
     reset_code_locator_cache()
 
@@ -227,5 +238,9 @@ async def seed_m6_case_into_fresh_ctx(
             os.environ.pop("SURREAL_URL", None)
         else:
             os.environ["SURREAL_URL"] = prev_surreal
+        if prev_ingest_rate is None:
+            os.environ.pop("BICAMERAL_INGEST_RATE_LIMIT_DISABLE", None)
+        else:
+            os.environ["BICAMERAL_INGEST_RATE_LIMIT_DISABLE"] = prev_ingest_rate
         reset_ledger_singleton()
         reset_code_locator_cache()
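The patch repeats the same save, override, and restore steps for each
environment variable (`REPO_PATH`, `SURREAL_URL`, and now
`BICAMERAL_INGEST_RATE_LIMIT_DISABLE`). If more variables accumulate, that
pattern could be factored into a context manager. The sketch below is
hypothetical: `patched_environ` is an illustrative name, not part of the
codebase. It mirrors the patch's semantics: a variable that was unset
beforehand is popped afterwards, and one that was set gets its previous
value back, even if the block raises.

```python
import os
from collections.abc import Iterator
from contextlib import contextmanager


@contextmanager
def patched_environ(**overrides: str) -> Iterator[None]:
    """Hypothetical helper: set env vars for a block, then restore the prior state."""
    # Snapshot only the variables we are about to touch (None == was unset).
    saved = {name: os.environ.get(name) for name in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        for name, prev in saved.items():
            if prev is None:
                os.environ.pop(name, None)  # was unset before: remove it again
            else:
                os.environ[name] = prev  # was set before: put the old value back


if __name__ == "__main__":
    with patched_environ(
        SURREAL_URL="memory://",
        BICAMERAL_INGEST_RATE_LIMIT_DISABLE="1",
    ):
        print(os.environ["BICAMERAL_INGEST_RATE_LIMIT_DISABLE"])  # 1
```

For synchronous test code, the standard library's
`unittest.mock.patch.dict(os.environ, {...})` offers similar
restore-on-exit behavior without a custom helper.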