feat: add link previews with OG embeds and background worker #14
service Introduce apps/worker BullMQ service for async link unfurling — extracts URLs from sent messages, fetches Open Graph metadata, persists embeds to DB, and broadcasts updates via @socket.io/redis-emitter. Add Discord-style embed cards to chat UI with full-width thumbnails, linked titles, and site name footer. Proxy X/Twitter URLs through fxtwitter.com for OG fetching. Add @tiptap/extension-link for auto-linking pasted URLs in the composer, with ++markdown wrapper stripping for clean stored content.
📝 Walkthrough
Adds link unfurling: messages gain embeds, the realtime server enqueues unfurl jobs to BullMQ, a new worker fetches Open Graph data and updates message embeds in the DB, and the frontend renders embeds and listens for realtime embed updates.
Sequence Diagram
sequenceDiagram
participant Client as Web Client
participant Realtime as Realtime Server
participant Queue as BullMQ Queue
participant Worker as Worker Service
participant DB as Database
participant OG as OG Scraper
Client->>Realtime: Send message with URL
Realtime->>Client: Acknowledge message (includes embeds: [])
Realtime->>Queue: Enqueue LinkUnfurlJob (messageId, channelId, urls)
Queue->>Worker: Deliver job
Worker->>OG: Fetch Open Graph metadata (proxy rules, timeout)
OG-->>Worker: Return metadata (embed)
Worker->>DB: Update message.embeds
Worker->>Realtime: Emit message:embeds:updated (channelId, messageId, embeds)
Realtime->>Client: Broadcast embed update
Client->>Client: Update UI and render EmbedCard(s)
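The last two steps of the diagram (broadcast, then client-side update) can be sketched as a pure function that applies a `message:embeds:updated` payload to a message list. This is only a sketch: the embed field names are taken from the review comments in this PR, while `ChatMessage` and `applyEmbedsUpdate` are hypothetical names and the real client state shape may differ.

```typescript
// Field names follow the review comments in this PR; exact shapes are assumptions.
interface MessageEmbed {
  url: string;
  title?: string;
  description?: string;
  siteName?: string;
  thumbnail?: string;
}

interface RealtimeMessageEmbedsUpdated {
  channelId: string;
  messageId: string;
  embeds: MessageEmbed[];
}

// Hypothetical client-side message shape.
interface ChatMessage {
  id: string;
  channelId: string;
  content: string;
  embeds: MessageEmbed[];
}

// Return a new list with the matching message's embeds replaced.
// Unknown message ids (deleted or not yet loaded) leave every entry untouched.
function applyEmbedsUpdate(
  messages: readonly ChatMessage[],
  event: RealtimeMessageEmbedsUpdated,
): ChatMessage[] {
  return messages.map((m) =>
    m.id === event.messageId && m.channelId === event.channelId
      ? { ...m, embeds: event.embeds }
      : m,
  );
}
```

Because the message is acknowledged with `embeds: []` first, the UI can render immediately and swap in the cards when this update arrives.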
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 10
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/api/src/lib/helpers/openapi/message-schemas.ts`:
- Around line 19-25: The url and thumbnail fields on the messageEmbedSchema
currently accept any string; update their validators in messageEmbedSchema so
they only allow http or https URLs (e.g., replace z.string() for url and
thumbnail with a validator that enforces the scheme, such as
z.string().url().refine(u => /^https?:\/\//i.test(u)) or
z.string().regex(/^https?:\/\//i), keeping the same optionality for thumbnail
and preserving the messageEmbedSchema export).
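One way to express the scheme check suggested above is a small predicate that could be passed to zod's `.refine`. The helper name `isHttpUrl` is hypothetical, and this is a sketch rather than the schema code from the PR.

```typescript
// Returns true only for absolute URLs whose scheme is http or https.
function isHttpUrl(value: string): boolean {
  try {
    const { protocol } = new URL(value);
    return protocol === "http:" || protocol === "https:";
  } catch {
    return false; // not parseable as an absolute URL
  }
}

// With zod this could be wired in as, for example:
//   url: z.string().refine(isHttpUrl)
//   thumbnail: z.string().refine(isHttpUrl).optional()
```

Parsing with `URL` rather than a bare regex also rejects strings such as `javascript:alert(1)`, which parse as URLs but carry a non-HTTP scheme.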
In `@apps/realtime/src/index.ts`:
- Around line 419-434: The current parseRedisUrl function discards TLS scheme
and DB index; instead provide the raw REDIS_URL string to BullMQ so URL
semantics are preserved—replace the connection object used when creating
linkUnfurlQueue (the Queue<LinkUnfurlJobData> instance for LINK_UNFURL_QUEUE) to
pass env.REDIS_URL (or { url: env.REDIS_URL }) directly rather than
parseRedisUrl(...) and make the equivalent change for the Worker construction in
the worker setup so both Queue and Worker use the full REDIS_URL string.
In `@apps/web/src/components/chat/composer/message-input.tsx`:
- Around line 105-121: In toStoredMarkdown, the second replace uses a
character-class that breaks on '+' so wrappers around text containing '+' (like
URLs) won't be captured; update the replace call in toStoredMarkdown that
currently targets the "++...++" wrapper to instead capture any characters
(including '+' and newlines) non-greedily between the opening and closing "++"
so it matches the full wrapped content (i.e., match "++", capture any chars
until the next "++" non-greedily, then return the capture group).
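A minimal sketch of the non-greedy match described above. The function name `stripPlusWrappers` is hypothetical and the rest of `toStoredMarkdown` is not shown in this PR page, so only the `++...++` step is illustrated.

```typescript
// Remove "++...++" wrappers while keeping the wrapped text.
// [\s\S]+? matches any character (including '+' and newlines) lazily,
// so the capture stops at the first closing "++".
function stripPlusWrappers(markdown: string): string {
  return markdown.replace(/\+\+([\s\S]+?)\+\+/g, "$1");
}
```

A character class such as `[^+]+` would stop at the first `+` inside a URL like `https://example.com/a+b`, which is the failure mode the comment describes.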
In `@apps/web/src/components/chat/embed-card.tsx`:
- Around line 13-52: hasMeta currently includes embed.siteName causing the
fallback clickable URL to be suppressed when only siteName is present; change
hasMeta to only reflect embed.title or embed.description (e.g., const hasMeta =
Boolean(embed.title || embed.description)) so the fallback link block (the
anchor that renders when !hasMeta) will appear for site-name-only embeds, and
update the thumbnail rendering check (the condition using embed.thumbnail &&
hasMeta) to instead allow thumbnail when siteName exists as well (e.g.,
embed.thumbnail && (embed.title || embed.description || embed.siteName)) so
thumbnails still render correctly.
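The two conditions in this comment can be isolated from the React component as plain booleans. This is a sketch under assumptions: `embedCardFlags` and `EmbedLike` are hypothetical names, and the actual component may compute these inline.

```typescript
interface EmbedLike {
  url: string;
  title?: string;
  description?: string;
  siteName?: string;
  thumbnail?: string;
}

// Derive the rendering decisions discussed in the review comment.
function embedCardFlags(embed: EmbedLike) {
  // siteName alone no longer counts as metadata.
  const hasMeta = Boolean(embed.title || embed.description);
  // The plain clickable URL is shown whenever there is no title or description.
  const showFallbackLink = !hasMeta;
  // Thumbnails still render for site-name-only embeds.
  const showThumbnail = Boolean(
    embed.thumbnail && (embed.title || embed.description || embed.siteName),
  );
  return { hasMeta, showFallbackLink, showThumbnail };
}
```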
In `@apps/worker/src/index.ts`:
- Around line 32-46: Add a worker-level "error" event handler for
linkUnfurlWorker to catch connection and runtime errors that aren't tied to
individual jobs; locate where linkUnfurlWorker is instantiated (new Worker with
LINK_UNFURL_QUEUE and createLinkUnfurlProcessor(emitter)) and register
linkUnfurlWorker.on("error", handler) that logs the error (include error
stack/message and contextual info such as LINK_UNFURL_QUEUE and worker id) and
triggers any necessary alerts or shutdown logic consistent with existing failure
handling.
- Around line 9-17: The parseRedisUrl function currently only returns
host/port/username/password; update it to also detect TLS and the selected DB:
inspect parsed.protocol and set a boolean like tls (true when parsed.protocol
=== 'rediss:'), extract the database index from parsed.pathname (strip leading
'/' and parseInt, default 0 when empty/invalid), and ensure port falls back to
6379 when parsed.port is empty; update the returned object from parseRedisUrl to
include tls and db fields so callers can configure secure connections and select
the correct Redis database.
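A sketch of a `parseRedisUrl` that keeps the TLS flag and DB index, following the description above. The return shape (`RedisConnection`) is an assumption; a caller would still need to translate `tls` into whatever option its Redis client expects.

```typescript
// Assumed return shape; not taken from the PR's source.
interface RedisConnection {
  host: string;
  port: number;
  username: string | undefined;
  password: string | undefined;
  db: number;
  tls: boolean;
}

function parseRedisUrl(redisUrl: string): RedisConnection {
  const parsed = new URL(redisUrl);
  // "/2" -> 2; an empty or non-numeric path falls back to database 0.
  const dbIndex = Number.parseInt(parsed.pathname.slice(1), 10);
  return {
    host: parsed.hostname,
    port: parsed.port ? Number(parsed.port) : 6379,
    // URL keeps credentials percent-encoded, so decode them.
    username: parsed.username ? decodeURIComponent(parsed.username) : undefined,
    password: parsed.password ? decodeURIComponent(parsed.password) : undefined,
    db: Number.isInteger(dbIndex) && dbIndex >= 0 ? dbIndex : 0,
    tls: parsed.protocol === "rediss:",
  };
}
```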
In `@apps/worker/src/jobs/link-unfurl.ts`:
- Around line 95-106: The DB update may affect zero rows if the message was
deleted; modify the logic around the update of schema.message (the call using
db.update(...).set(...).where(eq(schema.message.id, messageId))) to verify the
update affected a row before emitting the realtime payload
(RealtimeMessageEmbedsUpdated) via
emitter.to(channelRoom(channelId)).emit("message:embeds:updated", payload).
Concretely, check the result/rowCount from the update (or select the message by
messageId first) and only construct/emit the payload when the message exists
(i.e., update affected >0); otherwise skip emitting.
- Around line 34-43: The URL constructor in matchProxyRule can throw for
malformed strings; wrap the parsing/hostname substitution in a try-catch around
new URL(originalUrl) inside the loop (where OG_PROXY_RULES is iterated) and on
error return null (or continue to next rule) so the job skips invalid matches
instead of throwing; ensure you still return the { fetchUrl, siteName } on
success and leave the function returning null when parsing fails.
- Line 13: The URL_REGEX (/https?:\/\/[^\s<>")\]]+/) currently captures trailing
punctuation (.,!?:;') which breaks OG fetches; update the logic in
extractFirstUrl (or the regex constant URL_REGEX) to avoid or strip those
characters: either tighten URL_REGEX to stop before common trailing punctuation
or, after matching in extractFirstUrl, trim trailing characters like . , ! ? : ;
) ] ' " and unmatched closing punctuation before returning the URL so only the
real URL is fetched.
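The `matchProxyRule` hardening above could look roughly like this. The rule shape, the host list, and the `siteName` value are assumptions for illustration; the PR only states that X/Twitter URLs are proxied through fxtwitter.com.

```typescript
// Hypothetical rule shape.
interface ProxyRule {
  hosts: string[];   // hostnames to rewrite
  proxyHost: string; // hostname to fetch from instead
  siteName: string;  // label shown on the embed
}

// Assumed rule table for illustration only.
const OG_PROXY_RULES: ProxyRule[] = [
  {
    hosts: ["twitter.com", "www.twitter.com", "x.com", "www.x.com"],
    proxyHost: "fxtwitter.com",
    siteName: "X",
  },
];

function matchProxyRule(
  originalUrl: string,
): { fetchUrl: string; siteName: string } | null {
  let parsed: URL;
  try {
    parsed = new URL(originalUrl);
  } catch {
    return null; // malformed input: skip proxying instead of failing the job
  }
  for (const rule of OG_PROXY_RULES) {
    if (rule.hosts.includes(parsed.hostname)) {
      parsed.hostname = rule.proxyHost;
      return { fetchUrl: parsed.toString(), siteName: rule.siteName };
    }
  }
  return null;
}
```

Parsing once before the loop keeps the error handling in a single place, which is a small variation on wrapping the parse inside the loop as the comment suggests.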
In `@packages/realtime-types/src/queues.ts`:
- Around line 3-6: Change the LinkUnfurlJobData type to carry only extracted
URL(s) instead of raw message text (e.g. replace content: string with urls:
string[] or firstUrl?: string) in packages/realtime-types/src/queues.ts; then
update the enqueue call in apps/realtime/src/index.ts to extract URLs from the
message before creating the job and populate the new urls field; finally update
the worker in apps/worker/src/jobs/link-unfurl.ts to read the new urls property
(use the first URL or iterate as needed) and remove any usage of message content
so raw message text is no longer stored in the queue payload.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: e844f52c-92b2-4299-9a52-a94c36322965
⛔ Files ignored due to path filters (1)
- pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (21)
- apps/api/src/lib/helpers/openapi/message-schemas.ts
- apps/api/src/lib/queries/messages.ts
- apps/realtime/package.json
- apps/realtime/src/index.ts
- apps/realtime/src/services/messages.ts
- apps/web/package.json
- apps/web/src/components/chat/composer/message-input.tsx
- apps/web/src/components/chat/embed-card.tsx
- apps/web/src/components/chat/message-item.tsx
- apps/web/src/hooks/use-message-sending.ts
- apps/web/src/lib/realtime-adapter.ts
- apps/worker/package.json
- apps/worker/src/index.ts
- apps/worker/src/jobs/link-unfurl.ts
- apps/worker/tsconfig.json
- apps/worker/tsup.config.ts
- packages/db/src/schemas/messages.ts
- packages/realtime-types/package.json
- packages/realtime-types/src/events.ts
- packages/realtime-types/src/index.ts
- packages/realtime-types/src/queues.ts
- Validate embed URL fields with https?:// regex in messageEmbedSchema
- Fix parseRedisUrl in realtime and worker to preserve TLS and DB index
- Fix ++ regex in toStoredMarkdown to handle URLs containing '+' chars
- Fix hasMeta in EmbedCard so fallback link renders for siteName-only embeds
- Add worker-level error handler for linkUnfurlWorker
- Strip trailing punctuation from URL regex to avoid broken OG fetches
- Wrap matchProxyRule URL parsing in try-catch for malformed strings
- Check DB update result before emitting realtime embed event
- Change LinkUnfurlJobData to carry extracted URLs instead of raw content
- Add retention policy (removeOnComplete/removeOnFail) to unfurl queue
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/realtime/src/index.ts`:
- Around line 335-337: The current regex used in parsed.content.match (assigned
to urlMatches) prevents `)` and `]` inside URLs which truncates valid links;
update the matcher to allow `)` and `]` within the main URL match and instead
strip only trailing punctuation (like . , ! ? ; : ' " ) ]) via a trailing lookahead or
post-match trim so parentheses/brackets inside paths are preserved; modify the
regex used in parsed.content.match (or add a small post-processing step on
urlMatches) to permit internal `)` and `]` while still excluding final
punctuation from the matched URL.
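The post-match trim option could be sketched as follows: match permissively, then strip trailing punctuation and only those closing brackets that have no opener inside the URL. `extractUrls`, `trimUrl`, and the exact character sets are assumptions, not the code in `apps/realtime/src/index.ts`.

```typescript
// Permissive candidate match: ')' and ']' are allowed inside the URL.
const URL_CANDIDATE = /https?:\/\/[^\s<>"]+/g;
const TRAILING_PUNCT = /[.,!?;:'"]+$/;

function countChar(s: string, ch: string): number {
  return s.split(ch).length - 1;
}

// Strip sentence punctuation and unmatched closing brackets from the end.
function trimUrl(candidate: string): string {
  let url = candidate;
  for (;;) {
    const before = url;
    url = url.replace(TRAILING_PUNCT, "");
    // Drop a closing bracket only when it has no matching opener in the URL.
    if (url.endsWith(")") && countChar(url, ")") > countChar(url, "(")) {
      url = url.slice(0, -1);
    } else if (url.endsWith("]") && countChar(url, "]") > countChar(url, "[")) {
      url = url.slice(0, -1);
    }
    if (url === before) return url;
  }
}

function extractUrls(content: string): string[] {
  return (content.match(URL_CANDIDATE) ?? []).map(trimUrl);
}
```

Counting brackets keeps links such as Wikipedia's `Rust_(programming_language)` intact while still dropping the `)` that closes a parenthetical around a link.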
In `@apps/worker/src/jobs/link-unfurl.ts`:
- Around line 42-55: The fetchOgEmbed function currently fetches user-controlled
URLs (and proxy.fetchUrl from matchProxyRule) without filtering, enabling SSRF;
update fetchOgEmbed (and the similar code at the other fetch site around lines
82-88) to validate and sanitize the target before calling ogs: parse the URL,
resolve its hostname to IP(s) (e.g., dns.lookup/dns.resolve), and reject
requests whose IPs or hostnames map to localhost, private RFC1918 ranges,
link-local, or other internal addresses; alternatively enforce that only allowed
proxy.fetchUrl values are used for untrusted inputs. Ensure validation runs on
both the original url and proxy.fetchUrl (if present) and throw/return null when
the check fails so ogs is never called with an internal address.
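The address classification at the heart of this check can be sketched as a pure function over an already-resolved IP string. `isPrivateAddress` is a hypothetical name, and the range list is deliberately non-exhaustive (for example, IPv4-mapped addresses written in hex form are not handled).

```typescript
// Parse a dotted-quad IPv4 string into its first two octets, or null.
function parseIPv4(ip: string): { a: number; b: number } | null {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(ip);
  if (!m) return null;
  if (m.slice(1).map(Number).some((octet) => octet > 255)) return null;
  return { a: Number(m[1]), b: Number(m[2]) };
}

// True for loopback, RFC 1918, link-local, and similar internal ranges.
function isPrivateAddress(ip: string): boolean {
  const v4 = parseIPv4(ip);
  if (v4) {
    const { a, b } = v4;
    return (
      a === 0 ||                          // "this network"
      a === 10 ||                         // 10.0.0.0/8
      a === 127 ||                        // loopback
      (a === 100 && b >= 64 && b <= 127) || // 100.64.0.0/10 shared address space
      (a === 169 && b === 254) ||         // link-local, incl. cloud metadata
      (a === 172 && b >= 16 && b <= 31) ||  // 172.16.0.0/12
      (a === 192 && b === 168)            // 192.168.0.0/16
    );
  }
  const v6 = ip.toLowerCase();
  if (v6.startsWith("::ffff:")) return isPrivateAddress(v6.slice(7)); // IPv4-mapped, dotted form
  return (
    v6 === "::" ||
    v6 === "::1" ||          // loopback
    /^f[cd]/.test(v6) ||     // fc00::/7 unique local
    /^fe[89ab]/.test(v6)     // fe80::/10 link-local
  );
}
```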
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: c1b0251d-1937-4d6b-aede-528b259f63e9
📒 Files selected for processing (7)
- apps/api/src/lib/helpers/openapi/message-schemas.ts
- apps/realtime/src/index.ts
- apps/web/src/components/chat/composer/message-input.tsx
- apps/web/src/components/chat/embed-card.tsx
- apps/worker/src/index.ts
- apps/worker/src/jobs/link-unfurl.ts
- packages/realtime-types/src/queues.ts
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/worker/src/jobs/link-unfurl.ts`:
- Around line 19-40: The isSafeUrl function currently uses lookup(hostname)
which returns only the first resolved address and can miss private IPs; update
isSafeUrl to resolve and inspect all addresses for the hostname (either by using
lookup(hostname, { all: true }) and iterating the returned records or by calling
dns.resolve4 and dns.resolve6), check every returned address against
PRIVATE_IP_REGEX, and reject if any resolved address is private; ensure the
hostname sanity checks remain (protocol, localhost, [::1]) and that errors still
cause a false result.
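A sketch of `isSafeUrl` that inspects every resolved address, using `lookup(hostname, { all: true })` from Node's `dns/promises`. The body of `PRIVATE_IP_REGEX` is not shown in this PR page, so the pattern below is a simplified stand-in, and the injectable `resolve` parameter is an addition made here so the check can be exercised without real DNS.

```typescript
import { lookup } from "node:dns/promises";

type Resolver = (hostname: string) => Promise<string[]>;

// Default resolver: return every address the OS resolver reports, not just the first.
const resolveAll: Resolver = async (hostname) =>
  (await lookup(hostname, { all: true })).map((record) => record.address);

// Simplified stand-in; the real pattern in link-unfurl.ts may differ.
const PRIVATE_IP_REGEX =
  /^(0\.|10\.|127\.|169\.254\.|192\.168\.|172\.(1[6-9]|2\d|3[01])\.|::1$|::$|f[cd]|fe[89ab])/i;

async function isSafeUrl(
  url: string,
  resolve: Resolver = resolveAll,
): Promise<boolean> {
  try {
    const { protocol, hostname } = new URL(url);
    if (protocol !== "http:" && protocol !== "https:") return false;
    if (hostname === "localhost" || hostname === "[::1]") return false;
    const addresses = await resolve(hostname);
    // Reject when nothing resolves or when ANY address is private.
    return (
      addresses.length > 0 &&
      !addresses.some((address) => PRIVATE_IP_REGEX.test(address))
    );
  } catch {
    return false; // parse or DNS failure is treated as unsafe
  }
}
```

Note that a check like this validates the name at lookup time only; the later fetch performs its own resolution, so it narrows the SSRF surface rather than closing it completely.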
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: c38284d8-81cc-4471-8d3e-662c52e8cfe8
📒 Files selected for processing (2)
- apps/realtime/src/index.ts
- apps/worker/src/jobs/link-unfurl.ts
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/worker/src/jobs/link-unfurl.ts`:
- Around line 115-121: The handler currently only processes the first URL (using
urls[0]) and drops the rest; update the LinkUnfurlJobData handling to iterate
over all entries in the urls array by replacing the single-url logic with a loop
or Promise.all that calls fetchOgEmbed for each url, collects non-null results
into the embeds array, and preserves the existing early-return behavior when
embeds is empty; specifically change usages of the url variable and the embeds
assignment in link-unfurl.ts (where fetchOgEmbed and embeds are used) so that
all urls are fetched (e.g., await Promise.all(urls.map(...)), filter out nulls)
and then continue using the resulting embeds array.
- Around line 75-86: The isSafeUrl(fetchUrl) check validates only the initial
URL but ogs(...) follows redirects, allowing SSRF via post-check redirection;
update the ogs call (in the link-unfurl job) to control redirects by adding
fetchOptions.redirect = "manual" (or "error") alongside the existing headers and
timeout, or alternatively resolve and re-run isSafeUrl against the final
redirected URL before proceeding; modify the ogs(...) invocation that uses
fetchUrl and OG_FETCH_TIMEOUT_MS to include this redirect behavior to ensure
redirects are rejected or re-validated.
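The "re-validate the final URL" alternative could be sketched by following redirects manually and running the safety check on every hop. `resolveSafeFinalUrl`, `FetchLike`, and the hop limit are hypothetical; in the worker, the global `fetch` could be passed as `fetchImpl`, and the returned URL would then be handed to the scraper with redirects disabled.

```typescript
// Minimal response surface needed here; the global fetch Response satisfies it.
interface MinimalResponse {
  status: number;
  headers: { get(name: string): string | null };
}
type FetchLike = (
  url: string,
  init: { redirect: "manual" },
) => Promise<MinimalResponse>;
type SafetyCheck = (url: string) => Promise<boolean>;

// Follow redirects by hand, validating each hop.
// Returns the final URL, or null if any hop is unsafe or the limit is hit.
async function resolveSafeFinalUrl(
  startUrl: string,
  isSafeUrl: SafetyCheck,
  fetchImpl: FetchLike,
  maxRedirects = 3,
): Promise<string | null> {
  let current = startUrl;
  for (let hop = 0; hop <= maxRedirects; hop++) {
    if (!(await isSafeUrl(current))) return null;
    const res = await fetchImpl(current, { redirect: "manual" });
    const location = res.headers.get("location");
    const isRedirect = res.status >= 300 && res.status < 400;
    if (!isRedirect || !location) return current;
    // Location may be relative, so resolve it against the current URL.
    current = new URL(location, current).toString();
  }
  return null; // too many redirects
}
```

The simpler option from the comment, rejecting redirects outright, avoids the extra requests at the cost of failing on sites that redirect legitimately (for example from `http` to `https`).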
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 806c6bf1-fb89-4d37-bf4d-f351a7496065
📒 Files selected for processing (1)
apps/worker/src/jobs/link-unfurl.ts
Pull Request Summary
Overview
Adds asynchronous link unfurling with Open Graph embeds: a new worker using BullMQ to fetch OG metadata, realtime embed broadcasting via a Redis-backed Socket.IO emitter, API/schema/type updates to surface embeds, web UI embed cards, and composer autolinking/sanitization. X/Twitter URLs are proxied through fxtwitter.com when fetching OG metadata. Redis URL parsing preserves TLS and DB index where used.
Key Changes
New worker app
Realtime server
Types & contracts
API / schemas
Web client / UI
Packaging & build
Implementation notes & safety
Concerns & Recommendations
Files & Areas Affected (high level)
Confidence Score: 4/5
Rationale: Implementation is well-scoped and broadly integrated across worker, realtime, API, types, and web. Key safety measures (DNS-based private IP filtering, scheme checks, proxy rules) are present and message embeds are stored on a jsonb column with a default. Remaining items are operational and edge-case hardening (caching/deduping, metrics, retry policies, and URL extraction edge cases).