feat(facts): v0.2 fact-layer increment — wordCount / nonAsciiRatio / isLinkPost / real imageCount#17
…LinkPost, real imageCount
Adds four content facts derivable entirely from the existing trigger payload (no
new Reddit API calls, no cache changes, no new failure modes):
- content.wordCount — whitespace-delimited body word count
- content.nonAsciiRatio — 0..1 fraction of non-ASCII chars in the body; a crude
"non-Latin / likely non-English" signal without shipping
a language model into the runtime
- content.isLinkPost — true for a link/image/video submission (empty selftext);
always false for comments
- content.imageCount — was hardcoded 0; now a best-effort count of image URLs in
the body (+1 if the post itself links an image: i.redd.it /
i.imgur.com / preview.redd.it / *.png|jpg|gif|webp|... )
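As a rough illustration of how these four facts can be computed from the payload text alone, here is a minimal TypeScript sketch. The helper names match those listed under Wiring, and the printable-ASCII range check mirrors the review diff further down. The function bodies, the `imageCountForPost` wrapper, its `postUrl` parameter, and the exact host and extension patterns are assumptions, not the PR's actual code.

```typescript
// Sketch only: helper names follow the PR; the bodies are assumed.
const URL_RE = /https?:\/\/[^\s)]+/gi;
// Assumed patterns. The PR's extension list is longer ("...") and is not reproduced here.
const IMAGE_HOST_RE = /^https?:\/\/(i\.redd\.it|i\.imgur\.com|preview\.redd\.it)\//i;
const IMAGE_EXT_RE = /\.(png|jpe?g|gif|webp)(\?\S*)?$/i;

/** Whitespace-delimited word count; an empty or blank body counts as 0. */
function wordCountOf(body: string): number {
  const trimmed = body.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

/** Fraction (0..1) of code points outside printable ASCII (0x20..0x7e). */
function nonAsciiRatioOf(s: string): number {
  let total = 0;
  let nonAscii = 0;
  for (const ch of s) {
    const cp = ch.codePointAt(0) ?? 0;
    // Control characters such as newlines also fall outside this range.
    if (cp < 0x20 || cp > 0x7e) nonAscii++;
    total++;
  }
  return total === 0 ? 0 : nonAscii / total;
}

/** True when the URL points at a known image host or ends in an image extension. */
function looksLikeImageUrl(url: string): boolean {
  return IMAGE_HOST_RE.test(url) || IMAGE_EXT_RE.test(url);
}

/** Best-effort count of image URLs found in free text. */
function imageUrlCountIn(body: string): number {
  const links: string[] = body.match(URL_RE) ?? [];
  return links.filter(looksLikeImageUrl).length;
}

// Hypothetical wrapper: body images, plus one if the post's own link is an image.
function imageCountForPost(body: string, postUrl?: string): number {
  const own = postUrl !== undefined && looksLikeImageUrl(postUrl) ? 1 : 0;
  return imageUrlCountIn(body) + own;
}
```

Because every helper is a pure function of its string arguments, each can be unit-tested without mocking the Reddit API or a cache.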
Wiring:
- rule-schema.ts: new entries in the closed FactPaths set (FactBag type + system
prompt's FACTS list update automatically).
- fact-bag.ts: pure helpers (nonAsciiRatioOf, wordCountOf, looksLikeImageUrl,
imageUrlCountIn); populated in both buildPostFactBag and buildCommentFactBag.
- system-prompt.ts: "NOTES ON A FEW FACTS" hint block + a few-shot example using
content.isLinkPost; refreshed the stale "gpt-4o-mini" header to gpt-5.4-mini.
- evaluator/executor: no change — they're generic over fact paths and ops.
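To illustrate why adding entries to a closed path set needs no evaluator changes, here is a small sketch in plain TypeScript. Only the names `FactPaths` and `FactBag` come from this PR; the `Condition` shape, the op names, and the `evaluate` function are hypothetical, and the real schema may be defined quite differently.

```typescript
// Only the names FactPaths and FactBag come from the PR; every shape below is assumed.
const FactPaths = [
  "content.wordCount",
  "content.nonAsciiRatio",
  "content.isLinkPost",
  "content.imageCount",
] as const;

type FactPath = (typeof FactPaths)[number];
type FactValue = number | boolean;
// Derived from FactPaths, so adding a path there widens FactBag automatically.
type FactBag = Record<FactPath, FactValue>;

type Op = "eq" | "lt" | "gt"; // hypothetical op set

interface Condition {
  fact: FactPath;
  op: Op;
  value: FactValue;
}

// Generic over paths: the evaluator never names a specific fact.
function evaluate(cond: Condition, bag: FactBag): boolean {
  const actual = bag[cond.fact];
  if (cond.op === "eq") return actual === cond.value;
  // Ordering comparisons only make sense for numeric facts.
  if (typeof actual !== "number" || typeof cond.value !== "number") return false;
  return cond.op === "lt" ? actual < cond.value : actual > cond.value;
}

// "report comments under 5 words", expressed as a single condition.
const commentBag: FactBag = {
  "content.wordCount": 3,
  "content.nonAsciiRatio": 0,
  "content.isLinkPost": false,
  "content.imageCount": 0,
};
const underFiveWords: Condition = { fact: "content.wordCount", op: "lt", value: 5 };
const matched = evaluate(underFiveWords, commentBag); // true
```

In this sketch, a bag literal that omits one of the four keys fails to type-check, which is one way a closed key set can be enforced at compile time.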
Tests: +2 fact-bag tests (post & comment new-fact values); starter-rules.test.ts
FactBag literal extended. Also fixed a pre-existing flake in
rule-schema.property.test.ts — the ".strict rejects unknown top-level field"
property could generate "__proto__", which an object literal doesn't make an own
enumerable property and Zod strips anyway, so it's excluded (not a smuggleable
field). 170 tests pass (stable), tsc/lint clean, acceptance 4/4, vite build OK.
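The `"__proto__"` behavior behind that flake can be reproduced in isolation. The sketch below shows only the JavaScript object-literal semantics; it does not involve Zod or the project's actual property test, and the `isMeaningfulUnknownKey` filter is a hypothetical stand-in for whatever exclusion the generator now applies.

```typescript
// In an object literal, a "__proto__" key sets the prototype instead of
// creating a data property, so it never shows up among the own keys.
const fromLiteral: Record<string, unknown> = {
  "__proto__": { injected: true },
  title: "x",
};

// JSON.parse, by contrast, creates a real own property named "__proto__".
const fromJson: Record<string, unknown> = JSON.parse(
  '{"__proto__": {"injected": true}, "title": "x"}',
);

const literalKeys = Object.keys(fromLiteral); // ["title"]
const jsonKeys = Object.keys(fromJson); // ["__proto__", "title"]

// Hypothetical generator filter: skip keys that cannot act as an "unknown field".
function isMeaningfulUnknownKey(key: string): boolean {
  return key !== "__proto__";
}
```

A test asserting that "an object with this extra key is rejected" has nothing to reject in the literal case, because the extra key is not there.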
This is a deliberately small, low-risk increment toward AutoMod-parity fact
coverage; stateful facts (repost detection, cross-sub spam) and API-backed facts
(per-sub recent-activity counts) are intentionally left for a follow-up — they
need Redis state, new API calls, dry-run-replay support, and their own failure
handling.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Walkthrough
This PR enriches post and comment moderation fact bags with new content-derived features: word count, non-ASCII character ratio, and image detection. It expands the rule schema to accept these predicates, adds test coverage, updates fixtures, and documents the new fields in the system prompt with a practical moderation example.
Changes: Content fact enrichment
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
Pre-merge checks: ✅ 5 passed
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/server/fact-bag.ts (2)
62-62: ⚡ Quick win: Consider deduplicating the URL regex
The newly added IMAGE_URL_RE constant is identical to the existing linkRegex at line 83 and line 130. For maintainability, consider consolidating all three sites onto a single module-level constant.
♻️ Suggested refactor
Remove the linkRegex definitions at line 83 and line 130 and reuse IMAGE_URL_RE:

    // In buildPostFactBag (line 83-84):
    - const linkRegex = /https?:\/\/[^\s)]+/gi;
    - const links = body.match(linkRegex) ?? [];
    + const links = body.match(IMAGE_URL_RE) ?? [];
    // In buildCommentFactBag (line 130-131):
    - const linkRegex = /https?:\/\/[^\s)]+/gi;
    - const links = c.body.match(linkRegex) ?? [];
    + const links = c.body.match(IMAGE_URL_RE) ?? [];

Note: reusing a regex object can carry lastIndex state over between uses, so you may need to create a new RegExp instance before each use or copy the pattern. Alternatively, keep a local constant in function scope and extract only the pattern itself into a module-level string.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/server/fact-bag.ts` at line 62, IMAGE_URL_RE duplicates the /https?:\/\/[^\s)]+/gi pattern used as linkRegex in buildPostFactBag and buildCommentFactBag; remove the local linkRegex declarations and reuse IMAGE_URL_RE by replacing body.match(linkRegex) / c.body.match(linkRegex) with body.match(IMAGE_URL_RE) / c.body.match(IMAGE_URL_RE). To avoid RegExp state bugs from the global flag, either instantiate a fresh RegExp before each match (new RegExp(IMAGE_URL_RE)) or store the pattern as a string and create a RegExp per use; update references in buildPostFactBag and buildCommentFactBag accordingly.
42-51: 💤 Low value: nonAsciiRatioOf can be optimized
The current implementation traverses the string twice: once in the for...of loop and once in [...s].length. Counting the total inside the same loop reduces this to a single traversal.
♻️ Suggested optimization

      function nonAsciiRatioOf(s: string): number {
        if (s.length === 0) return 0;
        let nonAscii = 0;
    +   let total = 0;
        for (const ch of s) {
          const cp = ch.codePointAt(0) ?? 0;
          if (cp < 0x20 || cp > 0x7e) nonAscii++;
    +     total++;
        }
    -   // Iterating with for..of counts code points, so divide by code-point length.
    -   return nonAscii / [...s].length;
    +   return nonAscii / total;
      }

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/server/fact-bag.ts` around lines 42 - 51, The function nonAsciiRatioOf currently walks the string twice (once via for...of and once via [...s].length); change it to a single code-point iteration inside nonAsciiRatioOf that increments both total and nonAscii counters in the same loop and then returns nonAscii/total (handle empty string by returning 0 early). Locate the nonAsciiRatioOf function and replace the separate [...s].length usage with the single-loop total counter to avoid the second traversal.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/shared/system-prompt.ts`:
- Line 44: The docstring for content.isLinkPost is too narrow (the “(no text
body)” clause); update the comment/description where content.isLinkPost is
defined so it matches the contract: set content.isLinkPost to true for
link/image/video submissions and false for comments, removing the “no text body”
restriction and any wording that could cause under-classification; update any
adjacent explanatory text or examples that reference content.isLinkPost to
reflect this broader rule.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1abb6730-71df-4311-b45f-fcb9e1018058
📒 Files selected for processing (6)
- src/server/fact-bag.test.ts
- src/server/fact-bag.ts
- src/shared/rule-schema.property.test.ts
- src/shared/rule-schema.ts
- src/shared/starter-rules.test.ts
- src/shared/system-prompt.ts
…rding (review)
Addresses CodeRabbit feedback on PR #17:
- factor the duplicated /https?:\/\/[^\s)]+/gi out of imageUrlCountIn and both fact-bag builders into one module-level URL_RE (safe to share: String#match with a /g regex resets lastIndex to 0 on every call, so no match state carries over between calls);
- drop the "(no text body)" parenthetical from the content.isLinkPost line in the system prompt: the implementation detail belongs in rule-schema.ts, and the prompt only needs the contract ("true for a link/image/video submission; false for comments") so the model doesn't under-classify.
No behavior change; 170 tests pass, tsc/lint/prettier clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
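The point about sharing one `/g` regex can be checked with a short sketch. The constant and function names here are illustrative rather than taken from the codebase. It contrasts `String.prototype.match`, which starts from index 0 on every call with a global regex, with `RegExp.prototype.test`, which resumes from `lastIndex`.

```typescript
// Illustrative names; only the pattern itself is taken from the PR.
const SHARED_URL_RE = /https?:\/\/[^\s)]+/gi;

function countLinks(text: string): number {
  const links: string[] = text.match(SHARED_URL_RE) ?? [];
  return links.length;
}

const sample = "a http://a.io b https://b.io";
const firstCount = countLinks(sample); // 2
SHARED_URL_RE.lastIndex = 5; // simulate leftover state from some other caller
const secondCount = countLinks(sample); // still 2: match() restarts from index 0
const lastIndexAfterMatch = SHARED_URL_RE.lastIndex; // 0

// Contrast: test() on a /g regex resumes from lastIndex, so repeated calls differ.
const STATEFUL_RE = /https?:\/\/[^\s)]+/g;
const firstTest = STATEFUL_RE.test("http://a.io"); // true, lastIndex is now 11
const secondTest = STATEFUL_RE.test("http://a.io"); // false, lastIndex resets to 0
```

So a shared module-level constant is safe as long as it is used only through `match`. If `test` or `exec` were ever called on the same object, a non-global copy or a per-call `new RegExp(...)` would avoid the stateful behavior.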
feat(facts): v0.2 fact-layer increment — wordCount / nonAsciiRatio / isLinkPost / real imageCount
What
First, deliberately small increment toward wider fact coverage (the "v0.2 fact layer" item from `HANDOFF.md`). Adds four content facts that are pure functions of the existing trigger payload (no new Reddit API calls, no cache changes, no new failure modes):
- `content.wordCount`: whitespace-delimited body word count
- `content.nonAsciiRatio`: 0..1 fraction of non-ASCII characters in the body
- `content.isLinkPost`: true for a link/image/video submission; always false for comments
- `content.imageCount`: previously hardcoded `0`, now a best-effort count of image URLs in the body (+1 if the post itself links an image: `i.redd.it` / `i.imgur.com` / `preview.redd.it` / `*.png|jpg|gif|webp|...`)

Now expressible in one English sentence: "send to modqueue any link post from an account < 7 days old", "report comments under 5 words", "flag posts that are mostly non-Latin text", "remove posts with more than 3 images from low-karma accounts".
Wiring
- `rule-schema.ts`: new entries in the closed `FactPaths` set. `FactBag` type and the system prompt's `FACTS` list update automatically (both derive from `FactPaths`).
- `fact-bag.ts`: pure helpers (`nonAsciiRatioOf`, `wordCountOf`, `looksLikeImageUrl`, `imageUrlCountIn`); populated in both `buildPostFactBag` and `buildCommentFactBag` (comment `isLinkPost` is always `false`).
- `system-prompt.ts`: added a short "NOTES ON A FEW FACTS" hint block + a few-shot example using `content.isLinkPost`; refreshed the stale `gpt-4o-mini` header comment to `gpt-5.4-mini` (the actual default).
- `evaluator.ts` / `executor.ts`: no change; both are generic over fact paths and ops.

Tests
- `fact-bag.test.ts`: +2 tests covering the new facts for post and comment bodies (word/image counts, non-ASCII ratio, link-post detection).
- `starter-rules.test.ts`: extended the `FactBag` literal with the three new keys (the "emits exactly the closed key set" tests already cover presence).
- `rule-schema.property.test.ts`: fixed a pre-existing flake. The `.strict()`-rejects-unknown-top-level-field property could randomly generate `"__proto__"`, which an object literal doesn't turn into an own enumerable property and which Zod strips anyway (prototype-pollution guard), so it can't be "smuggled" past validation and isn't a meaningful "unknown field" for that test. Now excluded from the generator. This flake was intermittently failing the pre-push hook / CI.

Verification:
170 tests pass (stable across repeated runs), `npm run test:devvit` 3/3, `tsc --noEmit` clean, ESLint 0 warnings, `npm run acceptance` 4/4, `vite build` → `dist/server/index.cjs` OK.

Not in scope (follow-ups)
Stateful facts (repost detection, cross-subreddit spam patterns) and API-backed facts (per-sub recent-activity counts, true `subJoinAgeHours`, `hasVerifiedEmail`) need Redis state, new Reddit API calls, dry-run-replay support, and their own failure handling, so they're left for a separate PR. `HANDOFF.md`'s "fact layer is narrow" note still stands; this just chips at it safely.

🤖 Generated with Claude Code
Summary by CodeRabbit
New features and improvements
New features
Documentation