test+fix: vitest/route tests, acceptance gates, starter rules, @devvit SDK alignment, CI#1
Conversation
- test/setup.ts: in-memory Redis + Devvit SDK mocks for vitest - 6 test files (95 tests): rule-schema, evaluator, executor, fact-bag, system-prompt, starter-rules — all passing - src/shared/starter-rules.ts: 5 conservative seed rules (SAFE actions only, shadow:true); wired into onAppInstall trigger as a draft bundle - scripts/acceptance.ts: `npm run acceptance` — Day1-Day4 exit gates encoded as config<->code consistency + schema + unit-suite checks (4/4 passing) - devvit.json: remove orphaned `activateRuleForm` form entry (no route in index.ts; activation flows through dashboardForm) — caught by acceptance G1 - HANDOFF.md: update artifact inventory Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
WalkthroughPR은 vibe-mod Devvit 모드 도구에 Day 1 수락 게이트 검증 인프라와 5개 기본 규칙 시드 번들을 추가합니다. 새로운 acceptance.ts 스크립트가 4개 게이트 검증(설정↔코드 일관성, OpenAI 정렬, 운영 배선, 강화)을 자동으로 실행하고, 테스트 이중화와 종합 테스트 스위트(evaluator, executor, fact-bag, rule-schema, starter-rules, system-prompt)를 포함하며, 앱 설치 시 기본 규칙을 초안으로 분배합니다. ChangesDay 1 Acceptance Gates and Starter Rules
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive testing suite and an automated acceptance script to verify the application's state against defined 'exit gates'. Key additions include unit tests for the evaluator, executor, and fact-bag components, along with a global Vitest setup that mocks the Devvit SDK and Redis. Additionally, the PR implements a starter rule seeding mechanism triggered on app installation. Feedback was provided regarding the Redis mock's zAdd implementation, which currently allows duplicate members, potentially leading to inconsistent behavior in future tests.
| zAdd: async (k: string, entry: { member: string; score: number }) => { | ||
| const arr = zsets.get(k) ?? []; | ||
| arr.push(entry); | ||
| zsets.set(k, arr); | ||
| }, |
There was a problem hiding this comment.
The current zAdd implementation in the Redis mock allows duplicate members in the set, which deviates from standard Redis behavior where ZADD updates the score of an existing member. While this doesn't affect the current tests (which use unique IDs), it could lead to incorrect results in future tests that attempt to update a member's score or rely on member uniqueness.
| zAdd: async (k: string, entry: { member: string; score: number }) => { | |
| const arr = zsets.get(k) ?? []; | |
| arr.push(entry); | |
| zsets.set(k, arr); | |
| }, | |
| zAdd: async (k: string, entry: { member: string; score: number }) => { | |
| const arr = (zsets.get(k) ?? []).filter((e) => e.member !== entry.member); | |
| arr.push(entry); | |
| zsets.set(k, arr); | |
| }, |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1deaa1fba7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| on: ['onPostSubmit'], | ||
| when: { | ||
| all: [ | ||
| { fact: 'content.upperCaseRatio', op: 'gt', value: 0.7 }, |
There was a problem hiding this comment.
Match all-caps titles using title content, not body ratio
Update this predicate to evaluate the title itself instead of content.upperCaseRatio. In buildPostFactBag() (src/server/fact-bag.ts), content.upperCaseRatio is derived from post.selftext (body), so this starter rule misses the common case of an ALL-CAPS title on a link post or a post with empty/normal body text. That makes the seeded “shouting title” rule materially less effective than its name and sourceNL promise.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/server/index.ts (1)
527-538:⚠️ Potential issue | 🟠 Major | ⚡ Quick win재설치 시 ACTIVE 번들을 무조건 초기화해 운영 규칙이 사라질 수 있습니다.
Line 537에서 항상
:rules:active를 빈 번들로 덮어써서, 설치 트리거가 다시 실행되면 기존 활성 규칙이 즉시 비활성화됩니다. draft는 보호하면서 active는 보호하지 않아 데이터 손실 위험이 큽니다.🔧 제안 수정
const emptyActive: RuleBundleType = { schemaVersion: '1.0.0', bundleVersion: 1, compiledAt: Date.now(), llmModel: 'seed', llmTokensIn: 0, llmTokensOut: 0, rules: [], }; - await redis.set(`${subredditName}:rules:active`, JSON.stringify(emptyActive)); + const activeKey = `${subredditName}:rules:active`; + if (!(await redis.get(activeKey))) { + await redis.set(activeKey, JSON.stringify(emptyActive)); + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/server/index.ts` around lines 527 - 538, The current code always overwrites the active bundle by calling redis.set(`${subredditName}:rules:active`, JSON.stringify(emptyActive)), which erases existing runtime rules on reinstall; change this to preserve an existing active bundle by checking for the key first (e.g., use redis.exists/GET or a SETNX/SET with NX option) and only write the emptyActive when no active bundle exists, referring to the symbols emptyActive, redis.set (replace with SETNX or conditional set), and the `${subredditName}:rules:active` key so that draft protection remains but active bundles are not unintentionally cleared.
🧹 Nitpick comments (3)
scripts/acceptance.ts (3)
101-101: ⚡ Quick winAction verb 부분 문자열 매칭이 false positive를 유발할 수 있습니다.
VIBE_MOD_SYSTEM_PROMPT.includes(a)는 action verb가 다른 단어의 일부로 나타나도 통과합니다. 예를 들어,"lock"이"unlock"에 포함되어도 통과합니다.♻️ 단어 경계를 사용한 개선 제안
- for (const a of [...SAFE_ACTIONS, ...GUARDED_ACTIONS]) assert(VIBE_MOD_SYSTEM_PROMPT.includes(a), `system prompt is missing action verb ${a}`); + for (const a of [...SAFE_ACTIONS, ...GUARDED_ACTIONS]) { + const pattern = new RegExp(`\\b${a.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')}\\b`); + assert(pattern.test(VIBE_MOD_SYSTEM_PROMPT), `system prompt is missing action verb ${a}`); + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/acceptance.ts` at line 101, The current check uses VIBE_MOD_SYSTEM_PROMPT.includes(a) which can false-positive match substrings (e.g., "lock" in "unlock"); change the assertion to verify whole-word matches by either testing a word-boundary regex for each action (e.g., new RegExp(`\\b${escapeRegex(a)}\\b`) against VIBE_MOD_SYSTEM_PROMPT) or by tokenizing VIBE_MOD_SYSTEM_PROMPT with a non-word split (e.g., split on /\W+/) and checking the resulting array contains the action; update the loop that iterates SAFE_ACTIONS and GUARDED_ACTIONS to use that exact-word matching and ensure you escape special regex characters when building the RegExp (symbols: SAFE_ACTIONS, GUARDED_ACTIONS, VIBE_MOD_SYSTEM_PROMPT).
95-95: ⚡ Quick winFact path 부분 문자열 매칭이 false positive를 유발할 수 있습니다.
VIBE_MOD_SYSTEM_PROMPT.includes(f)는 fact path가 다른 단어의 일부로 나타나도 통과합니다. 예를 들어, fact path가"age"이고 프롬프트에"message"만 있어도 통과합니다.♻️ 단어 경계를 사용한 개선 제안
- for (const f of FactPaths) assert(VIBE_MOD_SYSTEM_PROMPT.includes(f), `system prompt is missing fact path ${f}`); + for (const f of FactPaths) { + const pattern = new RegExp(`\\b${f.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')}\\b`); + assert(pattern.test(VIBE_MOD_SYSTEM_PROMPT), `system prompt is missing fact path ${f}`); + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/acceptance.ts` at line 95, 현재 for-loop check using VIBE_MOD_SYSTEM_PROMPT.includes(f) causes false positives because substring matches (e.g., "age" in "message"); update the check to assert whole-word matches instead: for each entry in FactPaths validate presence in VIBE_MOD_SYSTEM_PROMPT using a word-boundary match (e.g., construct a regex with \b around the fact path) or split the prompt into tokens and test exact equality; update the assertion message to remain `system prompt is missing fact path ${f}` and keep the loop over FactPaths and the VIBE_MOD_SYSTEM_PROMPT identifier.
176-176: ⚡ Quick winonAppInstall 체크가 주석 내 참조도 통과시킬 수 있습니다.
indexTs.includes('seedStarterRules(')는 주석이나 문자열 리터럴에 포함된 경우도 통과합니다. 실제로 함수가 호출되는지 확인하려면 더 정확한 검증이 필요합니다.♻️ AST 기반 검증 제안
실제 함수 호출을 확인하려면:
- run: () => assert(indexTs.includes('seedStarterRules('), 'on-app-install does not call seedStarterRules()'), + run: () => { + const callPattern = /(?:^|[^/\*])\s*seedStarterRules\s*\(/m; + assert(callPattern.test(indexTs), 'on-app-install does not call seedStarterRules()'); + },또는 주석을 제외한 검증 스크립트 추가를 고려하세요.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/acceptance.ts` at line 176, The current check uses indexTs.includes('seedStarterRules(') which can be fooled by comments or string literals; instead parse indexTs into an AST and verify there is an actual CallExpression invoking seedStarterRules (look for a CallExpression with callee Identifier named "seedStarterRules" or equivalent MemberExpression) before asserting in the run function; update the validation in the run closure to parse indexTs (e.g., with TypeScript or Babel parser) and search AST nodes for a real function call to seedStarterRules rather than a plain substring match.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/shared/system-prompt.test.ts`:
- Around line 64-69: The test for clarification examples currently only asserts
Array.isArray(ex.assistant.suggestedAnswers) which allows an empty array; update
the assertion in the test that iterates FEW_SHOT_EXAMPLES (the block using
FEW_SHOT_EXAMPLES and ex.assistant) to also assert that suggestedAnswers has at
least one entry (e.g., check ex.assistant.suggestedAnswers.length > 0) while
keeping the existing non-empty question check; ensure you reference
ex.assistant.suggestedAnswers (and handle undefined defensively if needed) so
the test enforces both a question and one or more suggested answers.
In `@test/setup.ts`:
- Around line 159-170: The beforeEach currently calls vi.clearAllMocks(), which
only clears call history but preserves mock implementations and can leak
behavior between tests; replace (or add) a call to vi.resetAllMocks() so
implementations are reset, and then re-establish the default mocks
(fakeReddit.getCurrentSubredditName, fakeReddit.getCurrentUser,
fakeReddit.getUserByUsername, fakeReddit.getUserKarmaFromCurrentSubreddit,
fakeReddit.getModerators, fakeSettings.get) after the reset; ensure you still
clear fakeRedis.store/hashes/zsets and reapply those fakeReddit/fakeSettings
mockResolvedValue assignments inside beforeEach so each test starts with a
clean, consistent mock state.
---
Outside diff comments:
In `@src/server/index.ts`:
- Around line 527-538: The current code always overwrites the active bundle by
calling redis.set(`${subredditName}:rules:active`, JSON.stringify(emptyActive)),
which erases existing runtime rules on reinstall; change this to preserve an
existing active bundle by checking for the key first (e.g., use redis.exists/GET
or a SETNX/SET with NX option) and only write the emptyActive when no active
bundle exists, referring to the symbols emptyActive, redis.set (replace with
SETNX or conditional set), and the `${subredditName}:rules:active` key so that
draft protection remains but active bundles are not unintentionally cleared.
---
Nitpick comments:
In `@scripts/acceptance.ts`:
- Line 101: The current check uses VIBE_MOD_SYSTEM_PROMPT.includes(a) which can
false-positive match substrings (e.g., "lock" in "unlock"); change the assertion
to verify whole-word matches by either testing a word-boundary regex for each
action (e.g., new RegExp(`\\b${escapeRegex(a)}\\b`) against
VIBE_MOD_SYSTEM_PROMPT) or by tokenizing VIBE_MOD_SYSTEM_PROMPT with a non-word
split (e.g., split on /\W+/) and checking the resulting array contains the
action; update the loop that iterates SAFE_ACTIONS and GUARDED_ACTIONS to use
that exact-word matching and ensure you escape special regex characters when
building the RegExp (symbols: SAFE_ACTIONS, GUARDED_ACTIONS,
VIBE_MOD_SYSTEM_PROMPT).
- Line 95: 현재 for-loop check using VIBE_MOD_SYSTEM_PROMPT.includes(f) causes
false positives because substring matches (e.g., "age" in "message"); update the
check to assert whole-word matches instead: for each entry in FactPaths validate
presence in VIBE_MOD_SYSTEM_PROMPT using a word-boundary match (e.g., construct
a regex with \b around the fact path) or split the prompt into tokens and test
exact equality; update the assertion message to remain `system prompt is missing
fact path ${f}` and keep the loop over FactPaths and the VIBE_MOD_SYSTEM_PROMPT
identifier.
- Line 176: The current check uses indexTs.includes('seedStarterRules(') which
can be fooled by comments or string literals; instead parse indexTs into an AST
and verify there is an actual CallExpression invoking seedStarterRules (look for
a CallExpression with callee Identifier named "seedStarterRules" or equivalent
MemberExpression) before asserting in the run function; update the validation in
the run closure to parse indexTs (e.g., with TypeScript or Babel parser) and
search AST nodes for a real function call to seedStarterRules rather than a
plain substring match.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 608bf06e-46c2-45bb-b45d-eb956af19946
⛔ Files ignored due to path filters (1)
package-lock.jsonis excluded by!**/package-lock.json
📒 Files selected for processing (12)
HANDOFF.mddevvit.jsonscripts/acceptance.tssrc/server/evaluator.test.tssrc/server/executor.test.tssrc/server/fact-bag.test.tssrc/server/index.tssrc/shared/rule-schema.test.tssrc/shared/starter-rules.test.tssrc/shared/starter-rules.tssrc/shared/system-prompt.test.tstest/setup.ts
💤 Files with no reviewable changes (1)
- devvit.json
| it('clarification examples carry a question and suggested answers', () => { | ||
| for (const ex of FEW_SHOT_EXAMPLES) { | ||
| if (!('needsClarification' in ex.assistant)) continue; | ||
| expect((ex.assistant.question ?? '').length).toBeGreaterThan(0); | ||
| expect(Array.isArray(ex.assistant.suggestedAnswers)).toBe(true); | ||
| } |
There was a problem hiding this comment.
clarification 예시에서 빈 suggestedAnswers가 통과됩니다.
Line 68은 배열 여부만 검증해서 []도 통과합니다. 테스트 의도(질문 + 제안 답변 제공)를 보장하려면 최소 1개 이상도 함께 확인해야 합니다.
🔧 제안 수정
it('clarification examples carry a question and suggested answers', () => {
for (const ex of FEW_SHOT_EXAMPLES) {
if (!('needsClarification' in ex.assistant)) continue;
expect((ex.assistant.question ?? '').length).toBeGreaterThan(0);
expect(Array.isArray(ex.assistant.suggestedAnswers)).toBe(true);
+ expect((ex.assistant.suggestedAnswers ?? []).length).toBeGreaterThan(0);
}
});📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| it('clarification examples carry a question and suggested answers', () => { | |
| for (const ex of FEW_SHOT_EXAMPLES) { | |
| if (!('needsClarification' in ex.assistant)) continue; | |
| expect((ex.assistant.question ?? '').length).toBeGreaterThan(0); | |
| expect(Array.isArray(ex.assistant.suggestedAnswers)).toBe(true); | |
| } | |
| it('clarification examples carry a question and suggested answers', () => { | |
| for (const ex of FEW_SHOT_EXAMPLES) { | |
| if (!('needsClarification' in ex.assistant)) continue; | |
| expect((ex.assistant.question ?? '').length).toBeGreaterThan(0); | |
| expect(Array.isArray(ex.assistant.suggestedAnswers)).toBe(true); | |
| expect((ex.assistant.suggestedAnswers ?? []).length).toBeGreaterThan(0); | |
| } | |
| }); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/shared/system-prompt.test.ts` around lines 64 - 69, The test for
clarification examples currently only asserts
Array.isArray(ex.assistant.suggestedAnswers) which allows an empty array; update
the assertion in the test that iterates FEW_SHOT_EXAMPLES (the block using
FEW_SHOT_EXAMPLES and ex.assistant) to also assert that suggestedAnswers has at
least one entry (e.g., check ex.assistant.suggestedAnswers.length > 0) while
keeping the existing non-empty question check; ensure you reference
ex.assistant.suggestedAnswers (and handle undefined defensively if needed) so
the test enforces both a question and one or more suggested answers.
| beforeEach(() => { | ||
| fakeRedis.store.clear(); | ||
| fakeRedis.hashes.clear(); | ||
| fakeRedis.zsets.clear(); | ||
| vi.clearAllMocks(); | ||
| fakeReddit.getCurrentSubredditName.mockResolvedValue('testsub'); | ||
| fakeReddit.getCurrentUser.mockResolvedValue({ id: 't2_caller', username: 'caller' }); | ||
| fakeReddit.getUserByUsername.mockResolvedValue(null); | ||
| fakeReddit.getUserKarmaFromCurrentSubreddit.mockResolvedValue(0); | ||
| fakeReddit.getModerators.mockResolvedValue([]); | ||
| fakeSettings.get.mockResolvedValue(undefined); | ||
| }); |
There was a problem hiding this comment.
beforeEach 격리 의도와 다르게 mock 구현이 누수될 수 있습니다.
Line 159에서 vi.clearAllMocks()만 사용하면 호출 이력만 초기화되고 구현은 남습니다. 테스트에서 특정 mock 구현을 바꾼 경우 다음 케이스로 전파될 수 있어 flaky 원인이 됩니다.
🔧 제안 수정
beforeEach(() => {
fakeRedis.store.clear();
fakeRedis.hashes.clear();
fakeRedis.zsets.clear();
- vi.clearAllMocks();
+ vi.resetAllMocks();
fakeReddit.getCurrentSubredditName.mockResolvedValue('testsub');
fakeReddit.getCurrentUser.mockResolvedValue({ id: 't2_caller', username: 'caller' });
fakeReddit.getUserByUsername.mockResolvedValue(null);
fakeReddit.getUserKarmaFromCurrentSubreddit.mockResolvedValue(0);
fakeReddit.getModerators.mockResolvedValue([]);
fakeSettings.get.mockResolvedValue(undefined);
});📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| beforeEach(() => { | |
| fakeRedis.store.clear(); | |
| fakeRedis.hashes.clear(); | |
| fakeRedis.zsets.clear(); | |
| vi.clearAllMocks(); | |
| fakeReddit.getCurrentSubredditName.mockResolvedValue('testsub'); | |
| fakeReddit.getCurrentUser.mockResolvedValue({ id: 't2_caller', username: 'caller' }); | |
| fakeReddit.getUserByUsername.mockResolvedValue(null); | |
| fakeReddit.getUserKarmaFromCurrentSubreddit.mockResolvedValue(0); | |
| fakeReddit.getModerators.mockResolvedValue([]); | |
| fakeSettings.get.mockResolvedValue(undefined); | |
| }); | |
| beforeEach(() => { | |
| fakeRedis.store.clear(); | |
| fakeRedis.hashes.clear(); | |
| fakeRedis.zsets.clear(); | |
| vi.resetAllMocks(); | |
| fakeReddit.getCurrentSubredditName.mockResolvedValue('testsub'); | |
| fakeReddit.getCurrentUser.mockResolvedValue({ id: 't2_caller', username: 'caller' }); | |
| fakeReddit.getUserByUsername.mockResolvedValue(null); | |
| fakeReddit.getUserKarmaFromCurrentSubreddit.mockResolvedValue(0); | |
| fakeReddit.getModerators.mockResolvedValue([]); | |
| fakeSettings.get.mockResolvedValue(undefined); | |
| }); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/setup.ts` around lines 159 - 170, The beforeEach currently calls
vi.clearAllMocks(), which only clears call history but preserves mock
implementations and can leak behavior between tests; replace (or add) a call to
vi.resetAllMocks() so implementations are reset, and then re-establish the
default mocks (fakeReddit.getCurrentSubredditName, fakeReddit.getCurrentUser,
fakeReddit.getUserByUsername, fakeReddit.getUserKarmaFromCurrentSubreddit,
fakeReddit.getModerators, fakeSettings.get) after the reset; ensure you still
clear fakeRedis.store/hashes/zsets and reapply those fakeReddit/fakeSettings
mockResolvedValue assignments inside beforeEach so each test starts with a
clean, consistent mock state.
SDK adaptation (npx tsc --noEmit now passes — was ~45 errors):
- src/server/devvit-helpers.ts: getCurrentSubreddit{Name,Ref}, asT1/asT3 casts
- reddit.getCurrentSubredditName() -> reddit.getCurrentSubreddit().name
- getPostById/getCommentById take t3_/t1_ branded ids
- thing.report() -> reddit.report(thing, { reason })
- getModerators()/Listing -> .all()
- getUserKarmaFromCurrentSubreddit(name) -> { fromComments, fromPosts }
- modMail.create -> modMail.createModNotification({ subject, bodyMarkdown, subredditId })
- redis has no zCount -> count a last-hour score window via zRange
- @devvit/web/shared has no TaskRequest/TaskResponse -> local TaskBody/TaskAck
Bug fix: executor wrote audit/rollback/breaker keys un-scoped while the
Dashboard, Undo and circuit-breaker handlers read `${sub}:audit` etc — so
Dashboard/Undo never found real audit entries, and the per-sub breaker was
checked against the wrong key. All audit/rollback/breaker keys are now
sub-scoped consistently.
Tests: 148 passing (was 95). Added 6 feature-segmented route test files that
drive the real Hono app via app.fetch() with Devvit + OpenAI mocked:
routes-compose, routes-dashboard, routes-undo, routes-triggers,
routes-scheduler, routes-settings. test/setup.ts gains an in-memory Redis,
Listing/Reddit doubles matching the real SDK, and an OpenAI fetch double.
CI: .github/workflows/ci.yml runs npm ci -> tsc --noEmit -> npm test ->
npm run acceptance on push/PR. Acceptance G4 now includes the typecheck gate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
`npm ci` errors with EBADPLATFORM on esbuild's per-platform optional deps (@esbuild/aix-ppc64 etc.) against the committed lockfile on linux runners. `npm install` honours an in-sync lockfile and skips non-matching optional packages. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
External code-review pointed out 5 valid issues. All addressed in this commit on the same branch — PR #44 picks them up automatically. ## Gemini code-assist ### #1 — Manage menu: parallelize per-rule dry-run reads The previous serial `for` loop multiplied Redis latency by the draft count. With up to 50 drafts and ~10ms per round-trip the difference is ~500ms vs ~10ms wall time. Switched to `Promise.allSettled([...])` so a single failed fetch doesn't stop the others. ### #2 — applyManageActions: enforce the 50-rule cap on apply pause / activate / unpause moves can push a bundle over the 50-rule cap even when no compose-rule call is made. Added an explicit check before the dual-write txn — the moderator gets a "Rule cap exceeded (active=N, draft=N, max=50). Delete a rule first" toast and the bundles stay untouched. New test in routes-manage.test.ts. ### #3 — Use the configured openaiModel as the fallback (not the literal "manage") The previous `'manage'` fallback for `bundle.llmModel` broke `estimateTokenCost()` because 'manage' isn't a key in `OPENAI_PRICING_USD_PER_TOKEN`. The dashboard shows "(~$0 on manage)" until a real compile lands. Now we read the configured `openaiModel` setting through `readOpenaiModel()` (already SELECTION-array-safe per PR #39). ### #4 — Atomic dual-write via WATCH/MULTI/EXEC (the previous PR description's claim was wrong) The previous `Promise.all([redis.set(activeKey, ...), redis.set(draftKey, ...)])` was NOT atomic — if the second `set` failed (e.g. plugin RPC blip), pause could leave a rule absent from active without showing up in draft. Switched to Devvit's `redis.watch(...).multi().set(...).set(...).exec()` pattern (same shape executor.ts already uses for audit writes), and check `exec()`'s null return for the WATCH-aborted case so two moderators managing rules at the same time get an actionable "Another moderator changed the rules at the same time — re-open Manage rules" toast. ## CodeRabbit ### #5 — humanizeRule: cap value-stringification at 100 chars A rule that uses `op: in` against a long allowlist (e.g. 50 banned domains) would otherwise render thousands of characters into the confirm form's `compiledSummary` field, blowing past Devvit's modal description budget and obscuring the actual rule structure. Truncate with `...` after 100 chars in the predicate-tree leaf renderer. ## Test infrastructure test/devvit-testkit.ts: fakeRedis.watch().exec() now returns `[] | null` (matching real Devvit Redis MULTI/EXEC semantics) instead of `void`. Without this, the new WATCH-abort detection in applyManageActions would fire on every test. Updated `FakeTxn.exec` type accordingly. ## Verification - `npm run check` 4/4 gates green - 210 tests pass (1 skipped) — added 1 cap-overflow test in routes-manage.test.ts
5 review issues from Gemini code-assist on PR #45 (refactor/split-server-routes). ## HIGH severity ### #1 Non-ASCII regex made unambiguous helpers/openai.ts: the `replace(/[<U+0080>-<U+FFFF>]/g, ...)` literal contained the U+0080 control character invisibly, which made review tools render it as `/[-<box>]/` and flag it as "matches only `-`". Behaviour was always correct, but the explicit `/[�-ï¿¿]/g` form removes any doubt. ### #2 WATCH ordering in applyManageActions routes/manage.ts: the previous shape did GET → modify → WATCH → MULTI → EXEC, so the optimistic lock didn't actually cover the data we were about to mutate. Restructured into a CAS retry loop: 1. WATCH first 2. GET (under WATCH — uses redis.get because Devvit's `txn.get` queues into the transaction and returns a chainable, not the value) 3. modify in memory 4. MULTI + SET + EXEC (null exec → another mod's call landed first between WATCH and now → loop and retry up to MAX_CAS_ATTEMPTS = 3). After 3 contended attempts the moderator gets a clear "another mod changed the rules — re-open and try again" toast. ### #3 Sequential audit-entry hGetAll in undo handler routes/undo.ts: 100 sequential hGetAll calls could blow the menu handler's deadline. Switched to `Promise.allSettled([...])` and scan the resolved array in order so the "most recent first" break-on-found behaviour is preserved. ## MEDIUM severity ### #4 Sequential audit-entry hGetAll in dashboard routes/dashboard.ts: same parallelization for the 20-entry recent-actions fetch and the per-draft-rule dry-run fetch. ### #5 Outdated comment on /internal/trigger/on-app-install routes/triggers.ts: the comment claimed the submit triggers seed in-band, but they actually fail-safe by returning ok when no bundle exists. Updated to describe the real flow: starter rules are seeded by the deferred /internal/scheduler/seed-on-install task, registered in devvit.json's scheduler.tasks block. ## Verification - `npm run check` 4/4 gates green: typecheck + lint + Prettier + 211 tests (1 skipped) + 3 @devvit/test cases + acceptance G1-G4 - All PR #45 tests still pass with no test-side modifications
5 review issues from Gemini code-assist + CodeRabbit on PR #49. ## Major ### CodeRabbit #4 — pending consumption now atomic (WATCH/MULTI/DEL/EXEC) Two concurrent submits with the same `pendingId` (back-button + re-submit, double-click on Save, multi-tab moderator) could both read the entry before either deleted it, doubling the bundle write + dry-run schedule. Replaced the plain GET-then-DEL with a WATCH/MULTI/DEL/EXEC round-trip. Whichever caller commits first wins; the loser sees `exec() == null` and gets an actionable "Another submission consumed this confirmation" toast. Updated FakeTxn to match (added `del()` method on the txn object — the production Devvit Redis client supports it, the fake didn't). ### CodeRabbit #5 — daily compile counter now bumps at compile time Quota was previously incremented inside `persistRuleAndStartDryRun()`, which only runs on Save. A moderator could repeatedly hit Compile + Cancel and burn through OpenAI tokens without the per-day counter ever moving. Moved the increment to immediately after a successful OpenAI return in compose-rule-submit. The token cost is real either way, so the quota now reflects it either way. ## Medium ### Gemini #1 — dashboard cancelLabel no longer says "Don't show intro again" Devvit's Cancel button does NOT trigger the form submit handler, so clicking the misnamed button could not actually persist `dismissOnboarding` — it just closed the modal. Reverted the label to "Cancel"; the dismissOnboarding boolean toggle + Close (acceptLabel) is the real opt-out path that submits the form values. ## Minor ### CodeRabbit #2 — verify script tolerates non-int SLOW_MO `int(os.environ.get("SLOW_MO", "0"))` crashed if a user passed e.g. `SLOW_MO=fast`. Added `_safe_int` helper that warns + falls back to the default rather than ending the recording session before it starts. ### CodeRabbit #3 — atomic SET + TTL via the `expiration` option Previous shape was `redis.set(key, value)` then `redis.expire(key, ttl)`, which could leak a TTL-less pending key if expire failed after set succeeded. Switched to `redis.set(key, value, { expiration: new Date(...) })` which is the same single round-trip the rollback writer in executor.ts already uses. ## Verification - `npm run check` 4/4 gates green (211 unit/integration tests) - All compose tests still pass with the new pending-id flow - Race condition smoke: a second compose-confirm-submit call against the same pendingId now returns the "another submission consumed" toast instead of double-persisting (previously: double bundle write).
…er-rules test+fix: vitest/route tests, acceptance gates, starter rules, @devvit SDK alignment, CI
…Text + toast 1-line summary Audit-driven UX quick wins for the compose flow. Full audit + scenario rationale: - claudedocs/2026-05-14-compose-flow-audit.md (10 findings vs README promise matrix) - docs/demo-scenario.md (60s demo, README-aligned, Hypothesis 3) ## What changes (3 audit findings landed) ### Finding #1 (CRITICAL for demo) — Clarify modal renders LLM suggestedAnswers as a dropdown The LLM has *always* returned `{ needsClarification, question, suggestedAnswers: [...] }` (see system-prompt.ts:80-85), but the server discarded `suggestedAnswers` and showed only a free-text textarea. Moderators had to paraphrase the question. Now: when `suggestedAnswers` is present, render a `select` field with one click per option PLUS a `clarificationAnswerOther` paragraph as an override / escape hatch. When the LLM returns no suggestions, fall back to the original free-text path. `compose-rule-submit` defensively unwraps SELECTION-array values (PR #39 pattern) and prefers the "Other" override when non-empty. ### Finding #4 — ban/mute toggle gets explanatory helpText on both forms Mod mental model = "checkbox grants ban permission". Real behaviour = "checkbox lets compile succeed when the LLM emits ban/mute". Added a single shared `ALLOW_GUARDED_HELP` string used by the compose form and the clarify modal so both forms explain the toggle the same way. ### Finding #6 — Success toast carries a 1-line summary + an explicit menu pointer Previously: `Compiled rule "X". Dry-run started — check Dashboard in 30s.` The dashboard menu name is "vibe-mod: View rules + log" and Devvit toasts have no buttons, so "check Dashboard" is unreachable for a first-time mod. Now: `Compiled rule "X". → post: modqueue. Dry-run started — open the subreddit ⋯ menu → "vibe-mod: View rules + log" to see preview.` The 1-line `summarizeRule()` makes the deterministic JSON contract visible without leaving the toast (audit finding #2 alternative — the full preview form stays a post-publish item). ## Verification - `npm run check` 4/4 gates green (typecheck + lint + Prettier + 25 compose tests + 3 @devvit/test + acceptance G1–G4) - 6 new tests in routes-compose.test.ts cover: select rendering, free-text fallback, SELECTION-array unwrap, "Other" override precedence, helpText presence, summary in success toast. ## Why this PR (not bundled with module split) Demo video records this UI. UX changes must land before the README screenshot captures (Phase 3) and the Playwright video automation (Phase 4). Module split (Phase 2) is a separate, independent refactor. Refs: claudedocs/2026-05-14-compose-flow-audit.md (Phase 1.5/1.5b), docs/demo-scenario.md (Phase 2.5b)
External code-review pointed out 5 valid issues. All addressed in this commit on the same branch — PR #44 picks them up automatically. ## Gemini code-assist ### #1 — Manage menu: parallelize per-rule dry-run reads The previous serial `for` loop multiplied Redis latency by the draft count. With up to 50 drafts and ~10ms per round-trip the difference is ~500ms vs ~10ms wall time. Switched to `Promise.allSettled([...])` so a single failed fetch doesn't stop the others. ### #2 — applyManageActions: enforce the 50-rule cap on apply pause / activate / unpause moves can push a bundle over the 50-rule cap even when no compose-rule call is made. Added an explicit check before the dual-write txn — the moderator gets a "Rule cap exceeded (active=N, draft=N, max=50). Delete a rule first" toast and the bundles stay untouched. New test in routes-manage.test.ts. ### #3 — Use the configured openaiModel as the fallback (not the literal "manage") The previous `'manage'` fallback for `bundle.llmModel` broke `estimateTokenCost()` because 'manage' isn't a key in `OPENAI_PRICING_USD_PER_TOKEN`. The dashboard shows "(~$0 on manage)" until a real compile lands. Now we read the configured `openaiModel` setting through `readOpenaiModel()` (already SELECTION-array-safe per PR #39). ### #4 — Atomic dual-write via WATCH/MULTI/EXEC (the previous PR description's claim was wrong) The previous `Promise.all([redis.set(activeKey, ...), redis.set(draftKey, ...)])` was NOT atomic — if the second `set` failed (e.g. plugin RPC blip), pause could leave a rule absent from active without showing up in draft. Switched to Devvit's `redis.watch(...).multi().set(...).set(...).exec()` pattern (same shape executor.ts already uses for audit writes), and check `exec()`'s null return for the WATCH-aborted case so two moderators managing rules at the same time get an actionable "Another moderator changed the rules at the same time — re-open Manage rules" toast. ## CodeRabbit ### #5 — humanizeRule: cap value-stringification at 100 chars A rule that uses `op: in` against a long allowlist (e.g. 50 banned domains) would otherwise render thousands of characters into the confirm form's `compiledSummary` field, blowing past Devvit's modal description budget and obscuring the actual rule structure. Truncate with `...` after 100 chars in the predicate-tree leaf renderer. ## Test infrastructure test/devvit-testkit.ts: fakeRedis.watch().exec() now returns `[] | null` (matching real Devvit Redis MULTI/EXEC semantics) instead of `void`. Without this, the new WATCH-abort detection in applyManageActions would fire on every test. Updated `FakeTxn.exec` type accordingly. ## Verification - `npm run check` 4/4 gates green - 210 tests pass (1 skipped) — added 1 cap-overflow test in routes-manage.test.ts
5 review issues from Gemini code-assist on PR #45 (refactor/split-server-routes). ## HIGH severity ### #1 Non-ASCII regex made unambiguous helpers/openai.ts: the `replace(/[<U+0080>-<U+FFFF>]/g, ...)` literal contained the U+0080 control character invisibly, which made review tools render it as `/[-<box>]/` and flag it as "matches only `-`". Behaviour was always correct, but the explicit `/[�-ï¿¿]/g` form removes any doubt. ### #2 WATCH ordering in applyManageActions routes/manage.ts: the previous shape did GET → modify → WATCH → MULTI → EXEC, so the optimistic lock didn't actually cover the data we were about to mutate. Restructured into a CAS retry loop: 1. WATCH first 2. GET (under WATCH — uses redis.get because Devvit's `txn.get` queues into the transaction and returns a chainable, not the value) 3. modify in memory 4. MULTI + SET + EXEC (null exec → another mod's call landed first between WATCH and now → loop and retry up to MAX_CAS_ATTEMPTS = 3). After 3 contended attempts the moderator gets a clear "another mod changed the rules — re-open and try again" toast. ### #3 Sequential audit-entry hGetAll in undo handler routes/undo.ts: 100 sequential hGetAll calls could blow the menu handler's deadline. Switched to `Promise.allSettled([...])` and scan the resolved array in order so the "most recent first" break-on-found behaviour is preserved. ## MEDIUM severity ### #4 Sequential audit-entry hGetAll in dashboard routes/dashboard.ts: same parallelization for the 20-entry recent-actions fetch and the per-draft-rule dry-run fetch. ### #5 Outdated comment on /internal/trigger/on-app-install routes/triggers.ts: the comment claimed the submit triggers seed in-band, but they actually fail-safe by returning ok when no bundle exists. Updated to describe the real flow: starter rules are seeded by the deferred /internal/scheduler/seed-on-install task, registered in devvit.json's scheduler.tasks block. ## Verification - `npm run check` 4/4 gates green: typecheck + lint + Prettier + 211 tests (1 skipped) + 3 @devvit/test cases + acceptance G1-G4 - All PR #45 tests still pass with no test-side modifications
5 review issues from Gemini code-assist + CodeRabbit on PR #49. ## Major ### CodeRabbit #4 — pending consumption now atomic (WATCH/MULTI/DEL/EXEC) Two concurrent submits with the same `pendingId` (back-button + re-submit, double-click on Save, multi-tab moderator) could both read the entry before either deleted it, doubling the bundle write + dry-run schedule. Replaced the plain GET-then-DEL with a WATCH/MULTI/DEL/EXEC round-trip. Whichever caller commits first wins; the loser sees `exec() == null` and gets an actionable "Another submission consumed this confirmation" toast. Updated FakeTxn to match (added `del()` method on the txn object — the production Devvit Redis client supports it, the fake didn't). ### CodeRabbit #5 — daily compile counter now bumps at compile time Quota was previously incremented inside `persistRuleAndStartDryRun()`, which only runs on Save. A moderator could repeatedly hit Compile + Cancel and burn through OpenAI tokens without the per-day counter ever moving. Moved the increment to immediately after a successful OpenAI return in compose-rule-submit. The token cost is real either way, so the quota now reflects it either way. ## Medium ### Gemini #1 — dashboard cancelLabel no longer says "Don't show intro again" Devvit's Cancel button does NOT trigger the form submit handler, so clicking the misnamed button could not actually persist `dismissOnboarding` — it just closed the modal. Reverted the label to "Cancel"; the dismissOnboarding boolean toggle + Close (acceptLabel) is the real opt-out path that submits the form values. ## Minor ### CodeRabbit #2 — verify script tolerates non-int SLOW_MO `int(os.environ.get("SLOW_MO", "0"))` crashed if a user passed e.g. `SLOW_MO=fast`. Added `_safe_int` helper that warns + falls back to the default rather than ending the recording session before it starts. ### CodeRabbit #3 — atomic SET + TTL via the `expiration` option Previous shape was `redis.set(key, value)` then `redis.expire(key, ttl)`, which could leak a TTL-less pending key if expire failed after set succeeded. Switched to `redis.set(key, value, { expiration: new Date(...) })` which is the same single round-trip the rollback writer in executor.ts already uses. ## Verification - `npm run check` 4/4 gates green (211 unit/integration tests) - All compose tests still pass with the new pending-id flow - Race condition smoke: a second compose-confirm-submit call against the same pendingId now returns the "another submission consumed" toast instead of double-persisting (previously: double bundle write).
Summary
Brings vibe-mod from "no tests, doesn't typecheck" to "148 tests passing,
tsc --noEmitclean, CI green" — everything that can be done before the manual Devvit-wizard step.What's in here
Testing layer (was 0 tests → 148)
test/setup.ts— vitest global setup: in-memory Redis double; Devvit SDK doubles whose shapes track@devvit/web@0.12.x(getCurrentSubreddit() → {id,name},getModerators() → Listing, karma lookup→ {fromComments,fromPosts},reddit.report(thing,opts),modMail.createModNotification); an OpenAIfetchdouble.rule-schema,evaluator,executor,fact-bag,system-prompt,starter-rules.app.fetch()with Devvit + OpenAI mocked:routes-compose,routes-dashboard,routes-undo,routes-triggers,routes-scheduler,routes-settings.@devvit/webSDK alignment (npx tsc --noEmit: ~45 errors → 0)src/server/devvit-helpers.ts—getCurrentSubreddit{Name,Ref},asT1/asT3branded-id casts.reddit.getCurrentSubredditName()→reddit.getCurrentSubreddit().name;getPostById/getCommentByIdtaket3_/t1_ids;thing.report()→reddit.report(thing,{reason});getModerators()/Listing→.all();getUserKarmaFromCurrentSubreddit(name)→{fromComments,fromPosts};modMail.create→modMail.createModNotification(...); noredis.zCount→ count a last-hour score window viazRange; noTaskRequest/TaskResponsein@devvit/web/shared→ localTaskBody/TaskAck.Bug fixes (found while writing tests)
executor.tswroteaudit,rollback:…,circuit:openun-scoped while the Dashboard, Undo and rate-limit-breaker handlers read${sub}:audit,${sub}:audit:${id},${sub}:circuit:open. Dashboard/Undo therefore never found real audit entries and the per-sub breaker checked the wrong key. All audit/rollback/breaker keys are now sub-scoped consistently (circuit:beta_freezestays global by design).devvit.json'sactivateRuleForm → /internal/form/activate-rule(no such route; activation goes throughdashboardForm). Caught by acceptance G1.Starter rules
src/shared/starter-rules.ts— 5 conservative seed rules (SAFE actions only,shadow:true).onAppInstallseeds them as a draft bundle; the mod activates from the Dashboard.Tooling
scripts/acceptance.ts—npm run acceptance: G1 app wiring · G2 compiler↔schema · G3 rollback/dry-run/log · G4 hardening (starter rules +tsc --noEmit+ full test suite). 4/4..github/workflows/ci.yml— push/PR: install →tsc --noEmit→npm test→npm run acceptance.Not in scope (needs the Devvit wizard /
npm run dev)npm run acceptanceoutput..devvit-app-id(wizard) +devvit build.Verify
npx tsc --noEmit— clean ·npx vitest run— 148 passed (12 files) ·npm run acceptance— 4/4 · CI green.🤖 Generated with Claude Code