
feat(facts): v0.2 fact-layer increment — wordCount / nonAsciiRatio / isLinkPost / real imageCount#17

Merged
ComBba merged 2 commits into main from feat/fact-layer-v0.2
May 13, 2026

@ComBba (Contributor) commented May 13, 2026

What

A first, deliberately small increment toward wider fact coverage (the "v0.2 fact layer" item from HANDOFF.md). It adds four content facts that are pure functions of the existing trigger payload (no new Reddit API calls, no cache changes, no new failure modes):

| Fact | Type | Meaning |
|---|---|---|
| content.wordCount | number | Whitespace-delimited word count of the body |
| content.nonAsciiRatio | number | 0..1 fraction of non-ASCII characters in the body; a crude "non-Latin / likely non-English" signal that avoids shipping a language-detection model into the runtime |
| content.isLinkPost | boolean | true for a link/image/video submission (empty selftext); always false for comments |
| content.imageCount | number | Was hardcoded to 0; now a best-effort count of image URLs in the body (+1 if the post itself links an image: i.redd.it / i.imgur.com / preview.redd.it / a URL ending in .png, .jpg, .gif, .webp, ...) |
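The definitions in the table can be read as small pure functions. The sketch below is a hypothetical reconstruction for illustration, not the code in fact-bag.ts: the three image hosts come from the table, while the extension regex, the `PostLike` shape, and `contentFactsForPost` are assumptions. nonAsciiRatio is left out here.

```typescript
// Hypothetical reconstruction for illustration; the real helpers in fact-bag.ts may differ.
const URL_RE = /https?:\/\/[^\s)]+/gi; // the link pattern quoted later in the review

// Hosts come from the PR table; the extension list is an assumption (the table ends in "...").
const IMAGE_HOST_RE = /^https?:\/\/(?:i\.redd\.it|i\.imgur\.com|preview\.redd\.it)\//i;
const IMAGE_EXT_RE = /\.(?:png|jpe?g|gif|webp)(?:[?#]|$)/i;

/** Whitespace-delimited word count; an empty or all-whitespace body counts as 0. */
function wordCountOf(body: string): number {
  const trimmed = body.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

/** Best-effort: a known image host, or a URL whose path ends in an image extension. */
function looksLikeImageUrl(url: string): boolean {
  // Neither regex has the /g flag, so RegExp#test keeps no lastIndex state between calls.
  return IMAGE_HOST_RE.test(url) || IMAGE_EXT_RE.test(url);
}

/** Counts the image-looking URLs that appear in free text. */
function imageUrlCountIn(body: string): number {
  return (body.match(URL_RE) ?? []).filter(looksLikeImageUrl).length;
}

// Hypothetical minimal post shape: only the fields these facts need.
interface PostLike {
  body: string; // selftext; empty for link/image/video submissions
  url?: string; // the submission's own URL
}

function contentFactsForPost(p: PostLike) {
  const ownImage = p.url !== undefined && looksLikeImageUrl(p.url) ? 1 : 0;
  return {
    "content.wordCount": wordCountOf(p.body),
    // Assumption: whitespace-only selftext is treated the same as empty selftext.
    "content.isLinkPost": p.body.trim() === "",
    "content.imageCount": imageUrlCountIn(p.body) + ownImage,
  };
}
```

For comments, the same helpers would apply with content.isLinkPost fixed to false, as the table specifies.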

Rules like these are now expressible in a single English sentence: "send to modqueue any link post from an account < 7 days old", "report comments under 5 words", "flag posts that are mostly non-Latin text", "remove posts with more than 3 images from low-karma accounts".
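To make the mapping from those sentences to facts concrete, here is a hedged illustration that writes each condition as a plain TypeScript predicate. It is not the rule format: actual rules are validated against rule-schema.ts and run by evaluator.ts. The `author.*` fact names and the 0.5 and 100 thresholds are hypothetical stand-ins.

```typescript
// Illustration only; not the rule format used by rule-schema.ts / evaluator.ts.
interface Facts {
  "content.wordCount": number;
  "content.nonAsciiRatio": number;
  "content.isLinkPost": boolean;
  "content.imageCount": number;
  "author.accountAgeDays": number; // HYPOTHETICAL fact name
  "author.karma": number; // HYPOTHETICAL fact name
}

type Condition = (f: Facts) => boolean;

// "send to modqueue any link post from an account < 7 days old"
const newAccountLinkPost: Condition = (f) =>
  f["content.isLinkPost"] && f["author.accountAgeDays"] < 7;

// "report comments under 5 words"
const shortComment: Condition = (f) => f["content.wordCount"] < 5;

// "flag posts that are mostly non-Latin text" (assumes "mostly" means a ratio above 0.5)
const mostlyNonLatin: Condition = (f) => f["content.nonAsciiRatio"] > 0.5;

// "remove posts with more than 3 images from low-karma accounts" (the 100 cutoff is assumed)
const manyImagesLowKarma: Condition = (f) =>
  f["content.imageCount"] > 3 && f["author.karma"] < 100;
```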

Wiring

  • rule-schema.ts — new entries in the closed FactPaths set. FactBag type and the system prompt's FACTS list update automatically (both derive from FactPaths).
  • fact-bag.ts — pure helpers (nonAsciiRatioOf, wordCountOf, looksLikeImageUrl, imageUrlCountIn); populated in both buildPostFactBag and buildCommentFactBag (comment isLinkPost is always false).
  • system-prompt.ts — added a short "NOTES ON A FEW FACTS" hint block + a few-shot example using content.isLinkPost; refreshed the stale gpt-4o-mini header comment to gpt-5.4-mini (the actual default).
  • evaluator.ts / executor.ts: no change; both are generic over fact paths and ops.
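The claim that the FactBag type and the prompt's FACTS list "update automatically" follows from deriving both from one constant. A minimal sketch of that pattern, assuming FactPaths maps each path to a type tag (the real rule-schema.ts may use an array or a Zod enum instead); `TypeOfTag` and `factsListForPrompt` are hypothetical names.

```typescript
// Hypothetical shape of the closed set; only the derivation pattern is the point here.
const FactPaths = {
  "content.wordCount": "number",
  "content.nonAsciiRatio": "number",
  "content.isLinkPost": "boolean",
  "content.imageCount": "number",
} as const;

type FactPath = keyof typeof FactPaths;

// Map each string tag to the TypeScript type it stands for.
type TypeOfTag<T> = T extends "number" ? number : T extends "boolean" ? boolean : never;

// Adding an entry to FactPaths extends FactBag with no further edits...
type FactBag = { [K in FactPath]: TypeOfTag<(typeof FactPaths)[K]> };

// ...and the FACTS section rendered into the system prompt.
function factsListForPrompt(): string {
  return (Object.keys(FactPaths) as FactPath[])
    .map((p) => `- ${p}: ${FactPaths[p]}`)
    .join("\n");
}

// Type-checks only if every path is present with the right value type.
const example: FactBag = {
  "content.wordCount": 12,
  "content.nonAsciiRatio": 0,
  "content.isLinkPost": false,
  "content.imageCount": 0,
};
```

Because the object is declared `as const`, a missing or mistyped key in any FactBag literal (such as the fixture in starter-rules.test.ts) becomes a compile error rather than a silent gap.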

Tests

  • fact-bag.test.ts — +2 tests covering the new facts for post and comment bodies (word/image counts, non-ASCII ratio, link-post detection).
  • starter-rules.test.ts — extended the FactBag literal with the three new keys (the "emits exactly the closed key set" tests already cover presence).
  • rule-schema.property.test.ts: fixed a pre-existing flake. The .strict()-rejects-unknown-top-level-field property could randomly generate "__proto__". An object literal doesn't turn that key into an own enumerable property, and Zod strips it anyway (prototype-pollution guard), so it can't be "smuggled" past validation and isn't a meaningful "unknown field" for that test. It is now excluded from the generator. This flake was intermittently failing the pre-push hook and CI.
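The object-literal behavior behind that flake is easy to reproduce. A minimal sketch in plain TypeScript (no Zod or property-testing library involved); `isUsableUnknownKey` is a hypothetical stand-in for the generator filter, not the test's actual code.

```typescript
// A non-computed "__proto__" key in an object literal is the legacy prototype setter,
// not a data property; with a non-object value such as 1 it is ignored entirely.
const fromLiteral = { "__proto__": 1 };

// JSON.parse, by contrast, creates an ordinary own property named "__proto__",
// which is what an untrusted payload arriving as JSON would look like.
const fromJson: object = JSON.parse('{"__proto__": 1}');

const literalKeys = Object.keys(fromLiteral); // no own keys
const jsonKeys = Object.keys(fromJson); // one own key: "__proto__"

// Hypothetical generator filter for the "unknown top-level field" property:
// skip the one key that cannot reliably reach the validator as an own field.
const isUsableUnknownKey = (k: string): boolean => k !== "__proto__";
const candidates = ["extra", "__proto__", "notes"].filter(isUsableUnknownKey);
```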

Verification: 170 tests pass (stable across repeated runs), npm run test:devvit 3/3, tsc --noEmit clean, ESLint 0 warnings, npm run acceptance 4/4, vite build: dist/server/index.cjs OK.

Not in scope (follow-ups)

Stateful facts (repost detection, cross-subreddit spam patterns) and API-backed facts (per-sub recent-activity counts, true subJoinAgeHours, hasVerifiedEmail) — those need Redis state, new Reddit API calls, dry-run-replay support, and their own failure handling, so they're left for a separate PR. HANDOFF.md's "fact layer is narrow" note still stands; this just chips at it safely.

🤖 Generated with Claude Code

Summary by CodeRabbit

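New Features and Improvements

  • New Features

    • Introduced additional content metrics for analyzing posts and comments: word count, non-ASCII character ratio, image count, and link-post detection
    • The new metrics can be used in moderation rules
  • Documentation

    • Added descriptions of the content metrics' definitions and value ranges
    • Added an example rule that uses the new metrics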


…LinkPost, real imageCount

Adds four content facts derivable entirely from the existing trigger payload (no
new Reddit API calls, no cache changes, no new failure modes):

- content.wordCount      — whitespace-delimited body word count
- content.nonAsciiRatio  — 0..1 fraction of non-ASCII chars in the body; a crude
                           "non-Latin / likely non-English" signal without shipping
                           a language model into the runtime
- content.isLinkPost     — true for a link/image/video submission (empty selftext);
                           always false for comments
- content.imageCount     — was hardcoded 0; now a best-effort count of image URLs in
                           the body (+1 if the post itself links an image: i.redd.it /
                           i.imgur.com / preview.redd.it / *.png|jpg|gif|webp|... )

Wiring:
- rule-schema.ts: new entries in the closed FactPaths set (FactBag type + system
  prompt's FACTS list update automatically).
- fact-bag.ts: pure helpers (nonAsciiRatioOf, wordCountOf, looksLikeImageUrl,
  imageUrlCountIn); populated in both buildPostFactBag and buildCommentFactBag.
- system-prompt.ts: "NOTES ON A FEW FACTS" hint block + a few-shot example using
  content.isLinkPost; refreshed the stale "gpt-4o-mini" header to gpt-5.4-mini.
- evaluator/executor: no change — they're generic over fact paths and ops.

Tests: +2 fact-bag tests (post & comment new-fact values); starter-rules.test.ts
FactBag literal extended. Also fixed a pre-existing flake in
rule-schema.property.test.ts — the ".strict rejects unknown top-level field"
property could generate "__proto__", which an object literal doesn't make an own
enumerable property and Zod strips anyway, so it's excluded (not a smuggleable
field). 170 tests pass (stable), tsc/lint clean, acceptance 4/4, vite build OK.

This is a deliberately small, low-risk increment toward AutoMod-parity fact
coverage; stateful facts (repost detection, cross-sub spam) and API-backed facts
(per-sub recent-activity counts) are intentionally left for a follow-up — they
need Redis state, new API calls, dry-run-replay support, and their own failure
handling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@gemini-code-assist

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@coderabbitai

coderabbitai Bot commented May 13, 2026

Warning

Rate limit exceeded

@ComBba has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 54 minutes and 23 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 44002d82-6ce6-4ee2-834b-258145fbcdc9

📥 Commits

Reviewing files that changed from the base of the PR and between 3e08ff0 and a916b63.

📒 Files selected for processing (2)
  • src/server/fact-bag.ts
  • src/shared/system-prompt.ts

Walkthrough

This PR enriches post and comment moderation fact bags with new content-derived features: word count, non-ASCII character ratio, and image detection. It expands the rule schema to accept these predicates, adds test coverage, updates fixtures, and documents the new fields in the system prompt with a practical moderation example.

Changes

Content fact enrichment

| Layer | File(s) | Summary |
|---|---|---|
| Schema expansion for new content facts | src/shared/rule-schema.ts, src/shared/rule-schema.property.test.ts | FactPaths constant expanded to include content.wordCount, content.nonAsciiRatio, and content.isLinkPost. Property test updated to exclude `__proto__` from the generated unknown keys, since Zod strips it as a prototype-pollution guard. |
| Content scoring helper functions | src/server/fact-bag.ts | New helpers compute the non-ASCII character ratio, a whitespace-tokenized word count, and heuristic image URL detection (regex plus host/extension checks). The helpers are shared by the post and comment fact builders. |
| Post and comment fact enrichment | src/server/fact-bag.ts | buildPostFactBag derives imageCount from body URLs and the post URL, computes wordCount and nonAsciiRatio from the normalized body, and updates containsRegex to use the normalized body. buildCommentFactBag similarly computes all four new fields and sets isLinkPost=false. |
| Test coverage for fact computation | src/server/fact-bag.test.ts | New test cases validate buildPostFactBag across mixed-link, image-post, and non-ASCII scenarios. Extended buildCommentFactBag tests verify word count, image detection, the non-ASCII ratio, and the isLinkPost=false assertion. |
| Test fixture updates | src/shared/starter-rules.test.ts | Base FactBag fixture extended with the new content fields so rule-matching tests evaluate rules against the complete fact set. |
| System prompt documentation and examples | src/shared/system-prompt.ts | Added a "NOTES ON A FEW FACTS" section documenting word count, image count, ratio semantics, non-ASCII detection, and link-post heuristics. A new few-shot rule example routes link posts from accounts under 7 days old to modqueue. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • Two-Weeks-Team/vibe-mod#1: Introduced the original Vitest suites (src/server/fact-bag.test.ts, src/shared/starter-rules.test.ts, src/shared/system-prompt.test.ts) that validate fact paths; this PR extends those test suites with the new content fact computations.

Poem

🐰 New facts unfold with every post and quip,
Word counts dance, non-ASCII takes a dip,
Images emerge from links both near and far,
Our rules now see each moderation star!
Content shines bright in the logic's gleam. ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely summarizes the main changes: four new content facts (wordCount, nonAsciiRatio, isLinkPost, and real imageCount) added to the fact-layer v0.2 increment. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
src/server/fact-bag.ts (2)

62-62: ⚡ Quick win

Consider removing the duplicated URL regex

The newly added IMAGE_URL_RE constant is identical to the existing linkRegex at Line 83 and Line 130. To improve maintainability, consider consolidating all three locations to use the same module-level constant.

♻️ Suggested refactoring

Remove the linkRegex definitions at Line 83 and Line 130 and reuse IMAGE_URL_RE:

// In buildPostFactBag (line 83-84):
- const linkRegex = /https?:\/\/[^\s)]+/gi;
- const links = body.match(linkRegex) ?? [];
+ const links = body.match(IMAGE_URL_RE) ?? [];

// In buildCommentFactBag (line 130-131):
- const linkRegex = /https?:\/\/[^\s)]+/gi;
- const links = c.body.match(linkRegex) ?? [];
+ const links = c.body.match(IMAGE_URL_RE) ?? [];

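Note: reusing a regex object can carry over its lastIndex state, so you may need to create a new RegExp instance or copy the pattern before each use. Alternatively, keep a local constant in function scope but extract the pattern itself into a module-level string.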
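Whether sharing one /g regex is safe depends on which method uses it. A small sketch of the distinction, using the same link pattern: String#match with a global regex resets lastIndex to 0 before matching and leaves it at 0 afterwards, so repeated calls agree, whereas RegExp#test (like RegExp#exec) on a global regex resumes from lastIndex. The `URL_RE` name mirrors the review; the sample strings are made up.

```typescript
const URL_RE = /https?:\/\/[^\s)]+/gi;
const text = "see https://a.example/x and https://b.example/y";

// String#match with /g: returns every match and leaves lastIndex at 0, so calls agree.
const first = text.match(URL_RE) ?? [];
const second = text.match(URL_RE) ?? [];

// RegExp#test with /g is stateful: a successful match moves lastIndex to the match end,
// so the next call on the same string starts there, fails, and resets lastIndex to 0.
const single = "https://a.example/x";
const t1 = URL_RE.test(single);
const t2 = URL_RE.test(single);

// The review's alternative: new RegExp(regex) copies source and flags with lastIndex 0.
const fresh = new RegExp(URL_RE);
```

So a shared module-level constant is fine for `.match()`, but any future `.test()` or `.exec()` caller would need a fresh copy or a non-global pattern.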

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/server/fact-bag.ts` at line 62, IMAGE_URL_RE duplicates the
/https?:\/\/[^\s)]+/gi pattern used as linkRegex in buildPostFactBag and
buildCommentFactBag; remove the local linkRegex declarations and reuse
IMAGE_URL_RE by replacing body.match(linkRegex) / c.body.match(linkRegex) with
body.match(IMAGE_URL_RE) / c.body.match(IMAGE_URL_RE). To avoid RegExp state
bugs from the global flag, either instantiate a fresh RegExp before each match
(new RegExp(IMAGE_URL_RE)) or store the pattern as a string and create a RegExp
per use; update references in buildPostFactBag and buildCommentFactBag
accordingly.

42-51: 💤 Low value

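Possible performance optimization for nonAsciiRatioOf

The current implementation traverses the string twice: once in the for...of loop and once in [...s].length. It can be reduced to a single pass by counting the total in the same loop.

♻️ Suggested optimization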
 function nonAsciiRatioOf(s: string): number {
   if (s.length === 0) return 0;
   let nonAscii = 0;
+  let total = 0;
   for (const ch of s) {
     const cp = ch.codePointAt(0) ?? 0;
     if (cp < 0x20 || cp > 0x7e) nonAscii++;
+    total++;
   }
-  // Iterating with for..of counts code points, so divide by code-point length.
-  return nonAscii / [...s].length;
+  return nonAscii / total;
 }
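Dividing by code points rather than `s.length` matters for emoji and other characters outside the Basic Multilingual Plane, which occupy two UTF-16 units. A minimal single-pass sketch follows. Unlike the diff above, which also counts control characters below 0x20 (so newlines and tabs raise the ratio), this variant counts only code points above 0x7F; which definition better fits "non-Latin text" is a judgment call.

```typescript
// Single pass over code points; counts only code points above 0x7F as non-ASCII.
function nonAsciiRatioOf(s: string): number {
  let total = 0;
  let nonAscii = 0;
  // for...of over a string yields code points, not UTF-16 code units.
  for (const ch of s) {
    total++;
    if ((ch.codePointAt(0) ?? 0) > 0x7f) nonAscii++;
  }
  return total === 0 ? 0 : nonAscii / total;
}
```

For "ab😀" this yields 1/3 (three code points), whereas dividing by `"ab😀".length` would give 1/4.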
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/server/fact-bag.ts` around lines 42 - 51, The function nonAsciiRatioOf
currently walks the string twice (once via for...of and once via [...s].length);
change it to a single code-point iteration inside nonAsciiRatioOf that
increments both total and nonAscii counters in the same loop and then returns
nonAscii/total (handle empty string by returning 0 early). Locate the
nonAsciiRatioOf function and replace the separate [...s].length usage with the
single-loop total counter to avoid the second traversal.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/shared/system-prompt.ts`:
- Line 44: The docstring for content.isLinkPost is too narrow (the “(no text
body)” clause); update the comment/description where content.isLinkPost is
defined so it matches the contract: set content.isLinkPost to true for
link/image/video submissions and false for comments, removing the “no text body”
restriction and any wording that could cause under-classification; update any
adjacent explanatory text or examples that reference content.isLinkPost to
reflect this broader rule.

---

Nitpick comments:
In `@src/server/fact-bag.ts`:
- Line 62: IMAGE_URL_RE duplicates the /https?:\/\/[^\s)]+/gi pattern used as
linkRegex in buildPostFactBag and buildCommentFactBag; remove the local
linkRegex declarations and reuse IMAGE_URL_RE by replacing body.match(linkRegex)
/ c.body.match(linkRegex) with body.match(IMAGE_URL_RE) /
c.body.match(IMAGE_URL_RE). To avoid RegExp state bugs from the global flag,
either instantiate a fresh RegExp before each match (new RegExp(IMAGE_URL_RE))
or store the pattern as a string and create a RegExp per use; update references
in buildPostFactBag and buildCommentFactBag accordingly.
- Around line 42-51: The function nonAsciiRatioOf currently walks the string
twice (once via for...of and once via [...s].length); change it to a single
code-point iteration inside nonAsciiRatioOf that increments both total and
nonAscii counters in the same loop and then returns nonAscii/total (handle empty
string by returning 0 early). Locate the nonAsciiRatioOf function and replace
the separate [...s].length usage with the single-loop total counter to avoid the
second traversal.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1abb6730-71df-4311-b45f-fcb9e1018058

📥 Commits

Reviewing files that changed from the base of the PR and between 1444c93 and 3e08ff0.

📒 Files selected for processing (6)
  • src/server/fact-bag.test.ts
  • src/server/fact-bag.ts
  • src/shared/rule-schema.property.test.ts
  • src/shared/rule-schema.ts
  • src/shared/starter-rules.test.ts
  • src/shared/system-prompt.ts

Comment thread src/shared/system-prompt.ts Outdated
…rding (review)

Addresses CodeRabbit feedback on PR #17:
- factor the duplicated /https?:\/\/[^\s)]+/gi out of imageUrlCountIn and both
  fact-bag builders into one module-level URL_RE (safe to share: String#match
  with a /g regex resets lastIndex to 0 before matching and leaves it at 0);
- drop the "(no text body)" parenthetical from the content.isLinkPost line in the
  system prompt — the implementation-detail belongs in rule-schema.ts, the prompt
  just needs the contract ("true for a link/image/video submission; false for
  comments") so the model doesn't under-classify.

No behavior change; 170 tests pass, tsc/lint/prettier clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ComBba merged commit a08dfa4 into main on May 13, 2026
2 checks passed
@ComBba deleted the feat/fact-layer-v0.2 branch on May 14, 2026 06:37
ComBba pushed a commit that referenced this pull request May 15, 2026
ComBba added a commit that referenced this pull request May 15, 2026
feat(facts): v0.2 fact-layer increment — wordCount / nonAsciiRatio / isLinkPost / real imageCount