feat: fixed link unfurling worker #49

Merged: BuckyMcYolo merged 1 commit into main from dev on Apr 6, 2026

@BuckyMcYolo (Owner) commented Apr 6, 2026

Summary

This PR introduces refactored link unfurling functionality with improved security and cleaner architecture:

Changes Made:

Link Unfurling Worker (apps/worker/src/jobs/link-unfurl.ts) - Completely new implementation

  • Replaces previous redirect-handling approach with delegation to open-graph-scraper
  • Implements SSRF protection via isSafeUrl() checks on DNS lookups, private IP ranges, and post-redirect URLs
  • Includes proxy rules for Twitter/X (routes through fxtwitter.com for OG data)
  • Re-validates the final URL with isSafeUrl() after redirects when it differs from the initial fetch URL
  • Comprehensive error handling and structured logging
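The PR text describes an isSafeUrl() guard but does not show its code. Below is a minimal sketch of that kind of check, assuming Node.js 18+ with the built-in node:dns/promises lookup and net.BlockList. The blocked ranges, the helper name isBlockedAddress, and the exact rules are illustrative assumptions, not the worker's actual implementation.

```typescript
import { lookup } from "node:dns/promises";
import { BlockList, isIP } from "node:net";

// Hypothetical sketch of an isSafeUrl()-style SSRF guard; the ranges are assumptions.
const blocked = new BlockList();
blocked.addSubnet("0.0.0.0", 8, "ipv4"); // "this network"
blocked.addSubnet("10.0.0.0", 8, "ipv4"); // private
blocked.addSubnet("100.64.0.0", 10, "ipv4"); // carrier-grade NAT
blocked.addSubnet("127.0.0.0", 8, "ipv4"); // loopback
blocked.addSubnet("169.254.0.0", 16, "ipv4"); // link-local (cloud metadata lives here)
blocked.addSubnet("172.16.0.0", 12, "ipv4"); // private
blocked.addSubnet("192.168.0.0", 16, "ipv4"); // private
blocked.addSubnet("224.0.0.0", 3, "ipv4"); // multicast, reserved, broadcast
blocked.addAddress("::", "ipv6"); // unspecified
blocked.addAddress("::1", "ipv6"); // loopback
blocked.addSubnet("fc00::", 7, "ipv6"); // unique local
blocked.addSubnet("fe80::", 10, "ipv6"); // link-local
blocked.addSubnet("ff00::", 8, "ipv6"); // multicast

// True if the string is not an IP literal at all, or falls in a blocked range.
function isBlockedAddress(ip: string): boolean {
  const family = isIP(ip);
  if (family === 0) return true;
  return blocked.check(ip, family === 4 ? "ipv4" : "ipv6");
}

async function isSafeUrl(raw: string): Promise<boolean> {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // unparseable input
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  // URL.hostname keeps the brackets around IPv6 literals; strip them for lookup.
  const host = url.hostname.replace(/^\[|\]$/g, "");
  try {
    // Check every address the name resolves to, not just the first one.
    const addresses = await lookup(host, { all: true });
    return addresses.length > 0 && addresses.every((a) => !isBlockedAddress(a.address));
  } catch {
    return false; // resolution failed: refuse rather than guess
  }
}
```

Resolving the hostname here and then letting the HTTP client resolve it again leaves a DNS-rebinding window between the check and the fetch; pinning the connection to the already-checked address closes it, at the cost of more plumbing.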

URL Extraction (apps/realtime/src/index.ts)

  • Updated regex pattern to exclude bracket characters: /https?:\/\/[^\s<>"[\]]+/g
  • Deduplicates URLs via Set conversion before enqueueing link-unfurl jobs
  • Strips trailing punctuation from extracted URLs
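A sketch of the extraction steps above, reusing the regex quoted in the PR. The trailing-punctuation set and the name extractUrls are assumptions; the realtime app's actual code may differ.

```typescript
// Hypothetical sketch of the realtime extraction step; names are illustrative.
const URL_PATTERN = /https?:\/\/[^\s<>"[\]]+/g; // regex quoted in the PR
// Punctuation that prose tends to glue onto the end of a URL (assumed set).
const TRAILING_PUNCTUATION = /[.,;:!?)'"]+$/;

function extractUrls(text: string): string[] {
  const matches: string[] = text.match(URL_PATTERN) ?? [];
  const cleaned = matches.map((match) => match.replace(TRAILING_PUNCTUATION, ""));
  // A Set drops duplicates while keeping first-seen order.
  return Array.from(new Set(cleaned));
}
```

With the bracket exclusion, [https://a.io](https://a.io) yields a single https://a.io; without it, the first match would run on to the closing parenthesis and swallow the whole markdown link, which is presumably what the regex change addresses. Stripping a trailing ")" handles URLs wrapped in parentheses but truncates URLs that legitimately end in one, so that part of the set is a judgment call.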

Markdown Normalization (apps/web/src/lib/editor-utils.ts)

  • Adds a regex that converts bare markdown links, where the link text equals the href, to plain text: \[([^\]]+)\]\(\1\)
  • Handles autolinked URLs without TipTap's ++ wrappers
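A sketch of the normalization rule using the regex quoted above with a global flag added. The function name is hypothetical, and TipTap's ++ handling is not modeled here.

```typescript
// Hypothetical sketch of the editor-utils normalization; the name is illustrative.
// The \1 backreference forces the href to repeat the link text exactly.
const BARE_MARKDOWN_LINK = /\[([^\]]+)\]\(\1\)/g;

function stripBareMarkdownLinks(markdown: string): string {
  // Replace each [text](text) with just text; other links are left alone.
  return markdown.replace(BARE_MARKDOWN_LINK, "$1");
}
```

Because of the backreference, a link whose text differs from its href, such as [docs](https://a.io/docs), is left untouched.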

Configuration (turbo.json)

  • Updated Turborepo UI mode from stream-with-experimental-timestamps to stream

Concerns:

  1. Critical Bug: OG_FETCH_TIMEOUT_MS is set to 5, i.e. 5 milliseconds, which is far too short for an HTTP request. Virtually every link unfurl attempt will time out. It should likely be 5000 (5 seconds).

  2. No Test Coverage: The PR introduces substantial new functionality without any unit or integration tests for the link unfurling logic.

  3. Incomplete Context: The summary indicates code was removed (-61 lines in link-unfurl.ts), but the git history shows this as a new file, making it unclear what functionality was replaced.
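For concern 1, one way to make the unit mismatch harder to miss is to keep the value consistent with the _MS suffix and enforce it with a generic promise timeout. This is a hypothetical sketch: withTimeout is not from the PR, and it does not use open-graph-scraper's own timeout option, whose documented unit is worth checking separately against the _MS suffix.

```typescript
// Hypothetical sketch: the unit lives in the name, so the value must match it.
const OG_FETCH_TIMEOUT_MS = 5000; // 5 seconds; a value of 5 would mean 5 ms

// Rejects if `work` has not settled within `timeoutMs` milliseconds.
function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`timed out after ${timeoutMs} ms`)),
      timeoutMs,
    );
  });
  // Clear the timer on either outcome so it does not keep the process alive.
  return Promise.race([work, timeout]).then(
    (value) => {
      clearTimeout(timer);
      return value;
    },
    (err) => {
      clearTimeout(timer);
      throw err;
    },
  );
}
```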

Strengths:

  • Strong SSRF protection with multiple validation layers
  • Clean, well-typed code with comprehensive error handling
  • Proper separation of concerns using job queue pattern
  • Good security hardening

Confidence Score: 2/5

The PR shows solid engineering practices in security and architecture, but the timeout configuration appears to be a critical bug that would render the feature non-functional. The lack of test coverage for core functionality and the unexplained reference to removed code in the summary add uncertainty. This requires fixes before merging.

@BuckyMcYolo merged commit 92a0e12 into main on Apr 6, 2026
2 checks passed
@coderabbitai (Bot) commented Apr 6, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 98b0cf6b-da0e-4334-84ce-17581db811f6

📥 Commits

Reviewing files that changed from the base of the PR and between 54f1273 and c791907.

📒 Files selected for processing (4)
  • apps/realtime/src/index.ts
  • apps/web/src/lib/editor-utils.ts
  • apps/worker/src/jobs/link-unfurl.ts
  • turbo.json

📝 Walkthrough

This PR refines URL and link processing across multiple applications. Changes include URL extraction with Set-based deduplication in the realtime app, Markdown link normalization in the web editor, delegation of redirect handling to open-graph-scraper in the worker (removing manual redirect following), and a Turborepo UI configuration update.

Changes

  • Link Extraction & Normalization (apps/realtime/src/index.ts, apps/web/src/lib/editor-utils.ts): URL extraction now excludes bracket characters from the regex and deduplicates normalized URLs via a Set. Markdown link normalization converts bare links matching the [...](same_text) pattern to plain text.
  • Redirect Handling Delegation (apps/worker/src/jobs/link-unfurl.ts): Removed the manual redirect-following logic (resolveRedirects); redirect handling is now delegated to the open-graph-scraper library. Added post-scrape validation that verifies the final URL with isSafeUrl when it differs from the initial fetch URL. Reduced the timeout from 5000ms to 5ms.
  • Build Configuration (turbo.json): Updated the Turborepo UI mode from stream-with-experimental-timestamps to stream.
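The redirect-handling change described above (re-check the final URL with isSafeUrl only when it differs from the initial fetch URL), combined with the Twitter/X proxy rule from the PR summary, might look roughly like this. The host list, the function names, and the injected safety-check parameter are assumptions made so the sketch is self-contained.

```typescript
// Hypothetical sketch; host list, names, and the injected check are assumptions.
type UrlCheck = (url: string) => Promise<boolean>;

const TWITTER_HOSTS = new Set(["twitter.com", "www.twitter.com", "x.com", "www.x.com"]);

// Proxy rule: fetch Twitter/X links through fxtwitter.com to get usable OG tags.
function toFetchUrl(original: string): string {
  const url = new URL(original);
  if (TWITTER_HOSTS.has(url.hostname)) url.hostname = "fxtwitter.com";
  return url.href;
}

// After the scraper has followed redirects itself, re-check where it ended up.
async function isFinalUrlAllowed(
  fetchUrl: string,
  finalUrl: string,
  isSafeUrl: UrlCheck,
): Promise<boolean> {
  let finalHref: string;
  try {
    finalHref = new URL(finalUrl).href;
  } catch {
    return false; // an unparseable final URL is treated as unsafe
  }
  // Same URL as the one already vetted before the fetch: nothing new to check.
  if (finalHref === new URL(fetchUrl).href) return true;
  return isSafeUrl(finalHref);
}
```

Comparing normalized href values keeps https://a.io and https://a.io/ from being treated as a redirect. Injecting the check keeps the function testable; in the worker it would presumably be the same isSafeUrl() used before the fetch.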

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

🐰 Through URLs long, this rabbit did bound,
Deduping and normalizing what could be found,
Scrapers now handle the redirects with grace,
While safety checks keep bad URLs at bay—what a place!
Our links flow pure through the digital space! ✨
