feat: added logging to worker for link unfurling #48

Merged: BuckyMcYolo merged 4 commits into main from dev on Apr 6, 2026

Conversation

BuckyMcYolo (Owner) commented Apr 5, 2026

Summary

This PR adds comprehensive structured logging and safer redirect handling to the link unfurling worker job, improving observability and protecting against SSRF/private-IP targets during OG metadata scraping.

Changes

apps/worker/src/jobs/link-unfurl.ts

  • Added structured logging via imported logger across the file (info/warn/error) with contextual objects.

  • URL safety validation (isSafeUrl)

    • Validates protocol and hostname against private/localhost patterns.
    • Performs a DNS lookup; when a resolved IP matches PRIVATE_IP_REGEX, logs a warning with { hostname, address } and returns false.
    • Catches and logs DNS/URL parse errors with { err, url } and returns false.
  • Redirect resolution (resolveRedirects)

    • Introduced MAX_REDIRECTS = 5.
    • New helper issues manual HEAD requests (redirect: "manual", with a user-agent) and follows up to MAX_REDIRECTS hops. It resolves relative Location headers against the current URL and validates each hop with isSafeUrl. It logs and returns null for unsafe targets, a missing Location header, fetch failures, or too many redirects.
    • Logs "Too many redirects" when the limit is exceeded and warns on fetch failures or unsafe redirect targets.
  • OG scraping (fetchOgEmbed)

    • Applies proxy rules (unchanged mapping logic) to rewrite certain hostnames (e.g., X/Twitter -> fxtwitter.com) before safety checks and scraping.
    • Skips and logs unsafe fetch targets.
    • Resolves redirects before scraping; logs and returns null when redirect resolution fails.
    • Calls open-graph-scraper (OGS) using the resolved URL with a timeout and fetchOptions (user-agent, redirect: "error").
    • Logs warnings when OGS returns an error or unsuccessful result; logs info on success (includes title and whether an image is present); logs error with thrown exception on failures.
    • Builds Embed using resolved OG fields and proxy siteName when applicable.
  • Job processing (createLinkUnfurlProcessor)

    • Logs job start with { jobId, messageId, urlCount }.
    • Runs OG fetches in parallel and logs when no embeds are produced (still updates DB to clear embeds).
    • Updates message embeds in DB; logs a warning when no row is updated (message not found).
    • Emits message:embeds:updated via the provided emitter and logs completion with embedCount.
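
The isSafeUrl checks listed above can be sketched roughly as follows. This is a minimal illustration, not the code from link-unfurl.ts: the regex below is a hypothetical stand-in for the real PRIVATE_IP_REGEX (which the PR does not show), the DNS resolver is passed in as a parameter so the sketch runs without network access, and the logger calls are only indicated in comments.

```typescript
// Hypothetical pattern: the PR does not show the real PRIVATE_IP_REGEX.
const PRIVATE_IP_REGEX =
  /^(?:0\.|10\.|127\.|169\.254\.|192\.168\.|172\.(?:1[6-9]|2\d|3[01])\.|::1$|f[cd][0-9a-f]{2}:|fe80:)/i;

// Injected DNS resolver (an assumption) so the check can run without network access.
type Resolver = (hostname: string) => Promise<string>;

async function isSafeUrl(urlString: string, resolve: Resolver): Promise<boolean> {
  try {
    const url = new URL(urlString);
    if (url.protocol !== "http:" && url.protocol !== "https:") return false;

    // The URL API keeps brackets around IPv6 literals; strip them before matching.
    const hostname = url.hostname.replace(/^\[|\]$/g, "");
    if (hostname === "localhost" || PRIVATE_IP_REGEX.test(hostname)) return false;

    const address = await resolve(hostname);
    // The worker would log a warning with { hostname, address } when this matches.
    return !PRIVATE_IP_REGEX.test(address);
  } catch {
    // URL parse and DNS errors are treated as unsafe (the worker logs { err, url }).
    return false;
  }
}
```

In the worker itself the resolver would presumably wrap a real DNS lookup; injecting it here only keeps the example deterministic.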

Code Quality / Notes

  • Logging is structured and consistently applied, improving observability for failures and success cases.
  • Redirect hops are validated to mitigate SSRF/private-IP exposure.
  • open-graph-scraper is invoked with redirect handling unchanged (fetchOptions.redirect: "error"); redirects are proactively resolved beforehand with resolveRedirects.
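
The redirect pre-resolution described in these notes could look something like the sketch below. It is an approximation under stated assumptions, not the merged implementation: the HEAD request and the safety check are injected as the hypothetical parameters `head` and `isSafe`, and logging is reduced to comments. In the worker, `head` would presumably wrap `fetch(url, { method: "HEAD", redirect: "manual" })` and read the `location` response header.

```typescript
const MAX_REDIRECTS = 5;

// Hypothetical shapes used only for this sketch.
type HeadResult = { status: number; location: string | null };
type HeadFn = (url: string) => Promise<HeadResult>;
type SafeFn = (url: string) => Promise<boolean>;

async function resolveRedirects(
  startUrl: string,
  head: HeadFn,
  isSafe: SafeFn,
): Promise<string | null> {
  let current = startUrl;
  for (let hop = 0; hop <= MAX_REDIRECTS; hop++) {
    let res: HeadResult;
    try {
      res = await head(current);
    } catch {
      return null; // fetch failure: the worker logs a warning here
    }

    // Anything that is not a 3xx response ends the chain.
    if (res.status < 300 || res.status >= 400) return current;

    // Still redirecting after MAX_REDIRECTS followed hops.
    if (hop === MAX_REDIRECTS) return null; // "Too many redirects"
    if (!res.location) return null; // missing Location header

    let next: string;
    try {
      // Relative Location values resolve against the current URL.
      next = new URL(res.location, current).toString();
    } catch {
      return null; // malformed Location header
    }

    if (!(await isSafe(next))) return null; // unsafe redirect target
    current = next;
  }
  return null;
}
```

Returning null for every failure mode lets the caller skip the URL without a try/catch, and the scraper can then keep `redirect: "error"` because the URL it receives has already been followed to its final hop.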

Confidence Score: 4/5

Well-scoped and improves safety and observability. Minor concerns: no tests added for redirect resolution, DNS edge cases, or proxy-rule behavior; consider adding tests and documenting redirect-resolution behavior.

coderabbitai (Bot) commented Apr 5, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6c54c96a-05c5-434d-97c0-064ef19082e7

📥 Commits

Reviewing files that changed from the base of the PR and between e35d011 and cb0ac05.

📒 Files selected for processing (1)
  • apps/worker/src/jobs/link-unfurl.ts
📝 Walkthrough

Walkthrough

Adds structured logging and a redirect-resolution helper to the link-unfurl worker; URL safety checks now log DNS/private-IP failures and errors; redirects are manually followed (limit 5) and validated; OG fetching uses resolved targets; job lifecycle and DB/update events are logged.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Link Unfurl Worker: apps/worker/src/jobs/link-unfurl.ts | Adds logger throughout URL safety checks, redirect resolution, OG scraping, and job processing. Introduces MAX_REDIRECTS and resolveRedirects(startUrl) to follow and validate redirects (HEAD + manual redirects). Updates fetchOgEmbed to skip unsafe URLs, resolve redirects before scraping, and log warnings/errors/successes. Logs job start, no-embed cases, DB update row checks, and post-update emits. |

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Job as Worker Job
    participant Safe as isSafeUrl
    participant Redirect as resolveRedirects
    participant Fetch as OG Scraper (ogs)
    participant DB as Database
    participant Emit as EventEmitter

    Job->>Safe: Validate initial URL
    alt unsafe
        Safe-->>Job: unsafe -> log warning
    else safe
        Safe-->>Job: safe
        Job->>Redirect: HEAD (manual) follow up to 5
        alt redirect invalid / exceeds limit
            Redirect-->>Job: null -> log & abort
        else resolvedUrl
            Redirect-->>Job: resolvedUrl
            Job->>Fetch: scrape OG from resolvedUrl
            alt scrape error / !success
                Fetch-->>Job: error -> log warning/error
            else success
                Fetch-->>Job: embeds
                Job->>DB: update message embeds
                DB-->>Job: rowsAffected
                Job->>Emit: emit message:embeds:updated
                Emit-->>Job: ack
            end
        end
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through links both near and far,
Checked each IP and chased each redirect star,
I logged the warns, the errors, the cheer,
Resolved the path so images appear—
Now unfurls bloom, neat and clear. 🌷🔎

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the primary change: adding logging throughout the link unfurling worker for improved observability and debugging. |

coderabbitai (Bot) left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/worker/src/jobs/link-unfurl.ts`:
- Around line 46-47: The logs currently emit raw URLs (e.g., urlString,
fetchUrl) which can leak PII/tokens; create a small helper (e.g., sanitizeUrl or
redactQueryParams) that strips or masks the query string and use it wherever
URLs are logged (replace uses in the catch block logging logger.warn({ err, url:
urlString }, ...), and the other logging sites referencing fetchUrl and
urlString around lines 84-85, 101, 108-110, 123). Update all logger calls to
pass the sanitized URL value instead of the raw URL and keep the helper shared
across the file so every URL-bearing field is consistently redacted before
logging.
- Line 96: The fetch call currently uses redirect: "follow", which allows an
attacker to bypass isSafeUrl(fetchUrl) via redirect chains; change the fetch
option to redirect: "error" and then handle redirect responses explicitly: when
the fetch returns a 3xx status, read the Location header and run isSafeUrl() on
that target (and any subsequent candidate locations) before following or
rejecting the request. Update the code around the fetch invocation that uses
fetchUrl (and the surrounding logic that references isSafeUrl) to either reject
on redirects or validate Location headers and only perform a new validated fetch
when isSafeUrl returns true.
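
The first inline comment above asks for a shared helper that redacts URLs before they are logged. A minimal sketch of such a helper follows; the name `sanitizeUrl` comes from the comment, but the body is an assumption. Stripping embedded credentials and the fragment in addition to the query string goes slightly beyond what the comment asks for and is a choice made for this example.

```typescript
// Sketch of a log-safe URL: drops the query string, fragment, and any
// embedded credentials so tokens do not end up in structured logs.
function sanitizeUrl(raw: string): string {
  try {
    const url = new URL(raw);
    url.username = "";
    url.password = "";
    url.search = "";
    url.hash = "";
    return url.toString();
  } catch {
    // Never echo an unparseable string back into the logs.
    return "[invalid url]";
  }
}
```

A call site would then pass the redacted value, for example `logger.warn({ err, url: sanitizeUrl(urlString) }, ...)`, instead of the raw URL.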

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 245563e7-f2ad-4018-aa2e-03f43dc9c79d

📥 Commits

Reviewing files that changed from the base of the PR and between fc84f57 and 4ef69c0.

📒 Files selected for processing (1)
  • apps/worker/src/jobs/link-unfurl.ts

coderabbitai (Bot) left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/worker/src/jobs/link-unfurl.ts`:
- Around line 183-186: The early return when embeds.length === 0 leaves stale
data and skips emitting the required RealtimeMessageEmbedsUpdated event; instead
remove the early return in link-unfurl job logic and ensure you always update
the message.embeds column (set to [] when no embeds) and emit the
"message:embeds:updated" event with embeds: [] (matching
RealtimeMessageEmbedsUpdated) even when no embeds are produced; keep the logger
call (e.g., logger.info with job.id and messageId) but change it to note "No
embeds produced, clearing stored embeds" before performing the DB update and
event emission so clients (like use-message-sending.ts) receive the update to
clear their cache.
- Around line 80-111: resolveRedirects currently lets fetch() exceptions bubble
up; wrap the per-iteration fetch call in a try/catch inside resolveRedirects
(around the fetch in the for loop) so any network/DNS/timeout errors (including
AbortError) are caught, log a warning via logger.warn including the
startUrl/current URL and the caught error, and return null instead of throwing;
keep the existing redirect handling (location resolution, isSafeUrl check,
MAX_REDIRECTS logic) unchanged so callers like fetchOgEmbed receive null on
failure rather than an exception.
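
The first comment above (always write and emit, even when no embeds were produced) might be sketched as follows. Every type here is hypothetical: `EmbedStore`, `Emitter`, and the payload shape stand in for the real database layer and for RealtimeMessageEmbedsUpdated, which the PR does not show. Skipping the emit when no row matched is also an assumption of this sketch.

```typescript
// Hypothetical types standing in for the worker's real DB and realtime layers.
type Embed = { url: string; title?: string };

interface EmbedStore {
  // Resolves to the number of rows updated.
  updateEmbeds(messageId: string, embeds: Embed[]): Promise<number>;
}

interface Emitter {
  emit(event: string, payload: { messageId: string; embeds: Embed[] }): void;
}

async function persistEmbeds(
  messageId: string,
  embeds: Embed[],
  store: EmbedStore,
  emitter: Emitter,
  log: (msg: string) => void = () => {},
): Promise<boolean> {
  if (embeds.length === 0) {
    // No early return: an empty array must still overwrite stale embeds.
    log("No embeds produced, clearing stored embeds");
  }

  const rowsUpdated = await store.updateEmbeds(messageId, embeds);
  if (rowsUpdated === 0) {
    log("Message not found, skipping emit"); // assumption: nothing to notify
    return false;
  }

  // Emitted for empty results too, so clients can clear cached embeds.
  emitter.emit("message:embeds:updated", { messageId, embeds });
  return true;
}
```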

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0a034836-47f3-48ee-8612-f6c71bf308ef

📥 Commits

Reviewing files that changed from the base of the PR and between 4ef69c0 and 9d7df8c.

📒 Files selected for processing (1)
  • apps/worker/src/jobs/link-unfurl.ts

coderabbitai (Bot) left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/worker/src/jobs/link-unfurl.ts`:
- Around line 101-104: The redirect handling can throw when constructing `next`
with `new URL(location, current)`, causing `resolveRedirects()` (called from
`fetchOgEmbed()`) to reject the whole `Promise.all()`; update
`resolveRedirects()` to guard against malformed Location headers by validating
or wrapping the `new URL(location, current)` creation in a try/catch and
returning null (or skipping that redirect) when it fails, so a single bad
redirect does not propagate an exception and instead yields a null/ignored
result for that URL.
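
The guard this comment asks for can be isolated into a small helper. The name `resolveLocation` is hypothetical, and this is a sketch of the idea rather than the change that was merged: `new URL` throws a TypeError on malformed input, so the helper converts that into null.

```typescript
// Resolve a Location header against the current URL, returning null
// instead of throwing when the header is malformed.
function resolveLocation(location: string, current: string): string | null {
  try {
    return new URL(location, current).toString();
  } catch {
    return null;
  }
}
```

With this in place, a bad redirect yields null for that one URL, and the surrounding `Promise.all()` over the other URLs is not rejected.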

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2a4f5324-98bb-4527-8075-4265a2602388

📥 Commits

Reviewing files that changed from the base of the PR and between 9d7df8c and e35d011.

📒 Files selected for processing (1)
  • apps/worker/src/jobs/link-unfurl.ts

@BuckyMcYolo BuckyMcYolo merged commit 54f1273 into main Apr 6, 2026
3 checks passed