fix(linkedin): home_feed author body-fallback + drop promoted/suggested/noise rows #1156
📝 Walkthrough

The PR extends the LinkedIn connector with home feed author extraction and noise filtering heuristics. It adds two new exported utilities, integrates them into event construction, hardens extension dispatcher validation and Voyager API parsing, and updates test coverage accordingly.

Changes: LinkedIn Connector Home Feed and Parsing
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
Branch updated from ee5ae77 to 03d38e5
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/connectors/src/linkedin.ts`:
- Around lines 153-158: The isHomeFeedNoise function matches too broadly because
it uses case-insensitive regexes and thus filters out posts that mention lowercase
"promoted" or "suggested"; update the two regexes in isHomeFeedNoise to match
the literal capitalized labels by removing the /i flag (change /\bPromoted\b/i
to /\bPromoted\b/ and /\bSuggested\b/i to /\bSuggested\b/) while keeping the
same slice ranges and early-return logic.
- Around lines 260-272: The code uses Object.keys(feedRoot) without guarding
against feedRoot being null/undefined; update the logic around feedRoot and the
elements extraction (the feedRoot variable and the elements array assignment
loop) to first ensure feedRoot is a non-null object (e.g., coerce to {} or
return an empty elements array) before calling Object.keys, and skip the
for-loop when feedRoot is null/not an object so the rest of the function remains
defensive and won't throw on malformed/empty json.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: f27ff9d3-9b75-41a9-b44f-dca9a90c249d
📒 Files selected for processing (2)
packages/connectors/src/__tests__/linkedin.test.ts
packages/connectors/src/linkedin.ts
```ts
export function isHomeFeedNoise(body: string): boolean {
  if (!body || body.trim().length < 30) return true;
  if (/\bPromoted\b/i.test(body.slice(0, 130))) return true;
  if (/\bSuggested\b/i.test(body.slice(0, 30))) return true;
  return false;
}
```
Use a case-sensitive match for the Promoted label to avoid dropping genuine posts.
The LinkedIn ad label is rendered exactly as Promoted (and Suggested). With /i over the first 130 chars, a legitimate post whose body mentions "promoted" early (e.g. a "just got promoted" update) is silently filtered out — and for a feed connector, dropping real posts is worse than letting an occasional ad through.
🛡️ Match the literal capitalized labels
```diff
-  if (/\bPromoted\b/i.test(body.slice(0, 130))) return true;
-  if (/\bSuggested\b/i.test(body.slice(0, 30))) return true;
+  if (/\bPromoted\b/.test(body.slice(0, 130))) return true;
+  if (/\bSuggested\b/.test(body.slice(0, 30))) return true;
```

📝 Committable suggestion
```ts
export function isHomeFeedNoise(body: string): boolean {
  if (!body || body.trim().length < 30) return true;
  if (/\bPromoted\b/.test(body.slice(0, 130))) return true;
  if (/\bSuggested\b/.test(body.slice(0, 30))) return true;
  return false;
}
```
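To see what dropping the `/i` flag does (and doesn't) change, it helps to run both variants over a few rows. This is a hypothetical harness, not connector code: `isNoiseWithCase` and the sample strings are made up for illustration, while the length threshold and slice windows are copied from the PR's `isHomeFeedNoise`.

```typescript
// Hypothetical harness comparing the PR's case-insensitive label check with
// the reviewer's case-sensitive suggestion. Threshold (30) and slice windows
// (130 / 30) are taken from the PR; everything else is illustrative.
function isNoiseWithCase(body: string, caseSensitive: boolean): boolean {
  // Empty or very short rows ("Load more comments" etc.) are noise either way.
  if (!body || body.trim().length < 30) return true;
  const flags = caseSensitive ? "" : "i";
  // Ad label anywhere in the first 130 chars.
  if (new RegExp("\\bPromoted\\b", flags).test(body.slice(0, 130))) return true;
  // Suggestion label only in the first 30 chars.
  if (new RegExp("\\bSuggested\\b", flags).test(body.slice(0, 30))) return true;
  return false;
}

// Made-up rows shaped like the examples in the PR description.
const samples = {
  ad: "Feed post Attio 52,728 followers Promoted Build the CRM your team wants",
  suggested: "Feed post Suggested Jane Doe • 3rd+ Five lessons from a decade of shipping",
  shortRow: "Load more comments",
  lowercaseNews: "Feed post Jane Doe • 2nd Thrilled to share that I just got promoted to Staff Engineer!",
  capitalizedNews: "Feed post Jane Doe • 2nd Promoted to Staff Engineer today, thanks to an amazing team!",
};

for (const [name, body] of Object.entries(samples)) {
  const before = isNoiseWithCase(body, false); // PR as written (/i)
  const after = isNoiseWithCase(body, true); // review suggestion (no /i)
  console.log(`${name}: dropped before=${before}, after=${after}`);
}
```

With the flag dropped, the lowercase "just got promoted" update is kept, but a post whose text opens with a capitalized "Promoted to" inside the first 130 chars is still filtered. The change narrows the false positives rather than eliminating them.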
```ts
const feedRoot = data?.data?.data ?? data?.data ?? data;
let elements: any[] = [];
for (const key of Object.keys(feedRoot)) {
  const val = feedRoot[key];
  if (val?.["*elements"] && Array.isArray(val["*elements"])) {
    elements = val["*elements"];
    break;
  }
  if (val?.elements && Array.isArray(val.elements)) {
    elements = val.elements;
    break;
  }
}
```
Object.keys(feedRoot) can throw on a null/empty response.
feedRoot falls back to data, so when json is null/undefined (an empty or malformed intercepted body), feedRoot is null or undefined and Object.keys(feedRoot) throws a TypeError. That aborts this parse even though the rest of the function is carefully defensive (?? {}, ?? []).
🛡️ Guard the feed root before enumerating keys
```diff
   const feedRoot = data?.data?.data ?? data?.data ?? data;
+  if (!feedRoot || typeof feedRoot !== "object") return posts;
   let elements: any[] = [];
   for (const key of Object.keys(feedRoot)) {
```
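As a standalone sketch of the guard, the element-extraction step can be pulled into a small function that returns an empty array instead of throwing. `extractFeedElements` is a hypothetical name and the `someFeed` keys in the usage lines are made up; in the PR itself the guard would sit inline and return `posts` early.

```typescript
// Hypothetical standalone version of the elements-extraction step with the
// null/non-object guard from the review. `any` mirrors the original code.
function extractFeedElements(data: any): any[] {
  // The payload may sit under data.data.data, data.data, or at the top level.
  const feedRoot = data?.data?.data ?? data?.data ?? data;
  // Guard: Object.keys(null) and Object.keys(undefined) throw a TypeError.
  // typeof null === "object", so the truthiness check is needed as well.
  if (!feedRoot || typeof feedRoot !== "object") return [];
  for (const key of Object.keys(feedRoot)) {
    const val = feedRoot[key];
    // Same precedence as the original loop: "*elements" first, then "elements".
    if (Array.isArray(val?.["*elements"])) return val["*elements"];
    if (Array.isArray(val?.elements)) return val.elements;
  }
  return [];
}

// Malformed or empty bodies now yield [] instead of throwing.
console.log(extractFeedElements(null).length);
console.log(extractFeedElements({ data: { data: { someFeed: { "*elements": ["urn:a", "urn:b"] } } } }).length);
```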
bug_free 88, simplicity 88, slop 0, bugs 0, 0 blockers

Script logs: typecheck/unit/integration all exit 0. Ran bun test packages/connectors/src/__tests__/linkedin.test.ts and git diff --check; both passed. Review focused on LinkedIn home_feed author fallback/noise filtering; no concrete defects found.

Full verdict JSON:

```json
{
  "bug_free_confidence": 88,
  "bugs": 0,
  "slop": 0,
  "simplicity": 88,
  "blockers": [],
  "change_type": "fix",
  "behavior_change_risk": "low",
  "tests_adequate": true,
  "suggested_fixes": [],
  "notes": "Script logs: typecheck/unit/integration all exit 0. Ran bun test packages/connectors/src/__tests__/linkedin.test.ts and git diff --check; both passed. Review focused on LinkedIn home_feed author fallback/noise filtering; no concrete defects found.",
  "categories": {
    "src": 61,
    "tests": 183,
    "docs": 0,
    "config": 0,
    "deps": 0,
    "migrations": 0,
    "ci": 0,
    "generated": 0
  }
}
```

Local review gate: branch protection can require the
Codecov Report: ✅ All modified and coverable lines are covered by tests.
What
Two quality fixes for the LinkedIn `home_feed` connector path (`packages/connectors/src/linkedin.ts`). The home feed is the one LinkedIn feed that can't use network capture (attaching the CDP debugger stops it from rendering), so it relies on the extension's content-script `genericScrape` against a declarative selector config. That makes everything here inherently heuristic: there's no structured Voyager response to read author/ad fields from, only the scraped row `{ id, body, author }`.

Bug 1: author came back empty
The old author selector `.update-components-actor__title, .update-components-actor__name` no longer matches: LinkedIn obfuscates the actor class names in the live feed DOM. I probed the live feed: the actor classes don't match, but the author name is reliably present in the row's `body` text.

- The `author` field selector in `HOME_FEED_SCRAPE_CONFIG` is now a best-effort selector that catches the actor link's visible name span when present (`a[href*="/in/"] span[aria-hidden]`, `a[href*="/company/"] span[aria-hidden]`).
- A new `parseHomeFeedAuthor(body)` helper recovers the author from body text when the selector misses: it strips a leading "Feed post ", follows "reposted this" to the original poster, then takes the name before the " • " connection-degree marker (capped at 60 chars; it also strips a trailing relative-time token like "17h" that appears in repost segments).
- `buildHomeFeedEvents` now uses `row.author` when the DOM selector matched, and otherwise falls back to the body parse.

Bug 2: promoted/suggested/noise rows became events
The feed mixes in ads, suggestions, and non-post noise (e.g. "Load more comments"). Added a pure `isHomeFeedNoise(body)` helper; rows it flags are skipped before emitting:

- body empty or `< 30` chars after trimming (drops "Load more comments" etc.)
- `Promoted` in the first 130 chars (drops ads like "Feed post Attio 52,728 followers Promoted …")
- `Suggested` in the first 30 chars (drops "Feed post Suggested …")

Existing id/body dedupe is unchanged.
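The body-text author fallback from Bug 1 can be sketched roughly as follows. This is a hypothetical re-creation from the description, not the PR's `parseHomeFeedAuthor`; the relative-time pattern (e.g. `17h`, `2d`, `3mo`), the sample bodies, and returning `""` when no ` • ` marker is found are all assumptions.

```typescript
// Hypothetical re-creation of the body-text author fallback described in the
// PR. Marker strings come from the description; the time-token regex is assumed.
const FEED_POST_PREFIX = "Feed post ";
const REPOST_MARKER = "reposted this";
const DEGREE_MARKER = " • ";
// Assumed shape of a trailing relative-time token, e.g. "17h", "2d", "3mo".
const TRAILING_TIME = /\s+\d+(?:s|m|h|d|w|mo|yr)$/;

function parseHomeFeedAuthorSketch(body: string): string {
  let text = body.trim();
  // 1. Strip a leading "Feed post ".
  if (text.startsWith(FEED_POST_PREFIX)) text = text.slice(FEED_POST_PREFIX.length);
  // 2. For reposts, skip past "reposted this" to the original poster.
  const repostAt = text.indexOf(REPOST_MARKER);
  if (repostAt !== -1) text = text.slice(repostAt + REPOST_MARKER.length).trim();
  // 3. The name is whatever precedes the " • " connection-degree marker.
  const degreeAt = text.indexOf(DEGREE_MARKER);
  if (degreeAt === -1) return ""; // assumption: no marker means no author
  // 4. Drop a trailing relative-time token, then cap at 60 chars.
  return text.slice(0, degreeAt).replace(TRAILING_TIME, "").trim().slice(0, 60);
}

console.log(parseHomeFeedAuthorSketch("Feed post Jane Doe • 2nd Staff Engineer at Acme"));
console.log(parseHomeFeedAuthorSketch("Feed post John Smith reposted this Jane Doe 17h • 3rd+ Great thread"));
```

Because `indexOf(REPOST_MARKER)` scans the whole body, a post that merely quotes "reposted this" later in its text would confuse this sketch; restricting the search to the header region is one possible refinement.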
Why heuristic
This path can't use the network-capture primitive the other LinkedIn feeds use, so there's no structured author field or ad flag — body-text parsing and substring filtering are the only signals available over a content-script scrape.
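Putting the two heuristics together, an event builder in the spirit of `buildHomeFeedEvents` might look like the sketch below. The row and event shapes, the order of filtering versus dedupe, and the exact id/body dedupe rule are assumptions (the PR only says rows look like `{ id, body, author }` and that the existing dedupe is unchanged). The helpers are trimmed-down stand-ins, and the noise check uses the case-sensitive labels suggested in review.

```typescript
// Hypothetical shapes; the PR only specifies scraped rows as { id, body, author }.
interface ScrapedRow { id: string; body: string; author?: string }
interface HomeFeedEventSketch { id: string; author: string; body: string }

// Trimmed-down noise check (case-sensitive labels, per the review suggestion).
function isNoiseSketch(body: string): boolean {
  if (!body || body.trim().length < 30) return true;
  return /\bPromoted\b/.test(body.slice(0, 130)) || /\bSuggested\b/.test(body.slice(0, 30));
}

// Trimmed-down body fallback: strip "Feed post ", take the text before " • ".
function authorFromBody(body: string): string {
  const text = body.replace(/^Feed post /, "");
  const at = text.indexOf(" • ");
  return at === -1 ? "" : text.slice(0, at).trim().slice(0, 60);
}

function buildHomeFeedEventsSketch(rows: ScrapedRow[]): HomeFeedEventSketch[] {
  const seenIds = new Set<string>();
  const seenBodies = new Set<string>();
  const events: HomeFeedEventSketch[] = [];
  for (const row of rows) {
    // Drop ads, suggestions, and short non-post rows before emitting.
    if (isNoiseSketch(row.body)) continue;
    // Assumed id/body dedupe: skip a row if either was already emitted.
    if (seenIds.has(row.id) || seenBodies.has(row.body)) continue;
    seenIds.add(row.id);
    seenBodies.add(row.body);
    // Prefer the DOM-scraped author; otherwise parse it from the body text.
    const author = row.author?.trim() || authorFromBody(row.body);
    events.push({ id: row.id, author, body: row.body });
  }
  return events;
}

// Made-up rows: one normal post, one ad, one noise row, one post with a
// DOM-scraped author, and a duplicate of the first post.
const demoRows: ScrapedRow[] = [
  { id: "1", body: "Feed post Jane Doe • 2nd Shipping our new search index this week", author: "" },
  { id: "2", body: "Feed post Acme Corp 1,234 followers Promoted Try Acme free for 30 days", author: "" },
  { id: "3", body: "Load more comments", author: "" },
  { id: "4", body: "Feed post Sam Lee • 1st Notes from our incident review on Friday", author: "Samuel Lee" },
  { id: "1", body: "Feed post Jane Doe • 2nd Shipping our new search index this week", author: "" },
];
const demoEvents = buildHomeFeedEventsSketch(demoRows);
console.log(demoEvents.map((e) => `${e.id}: ${e.author}`).join("\n"));
```

Here only rows 1 and 4 survive: the ad and the "Load more comments" row are filtered, the duplicate is deduped, and row 4 keeps its DOM-scraped author over the body parse.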
Tests

- Extended `packages/connectors/src/__tests__/linkedin.test.ts` with unit coverage for `parseHomeFeedAuthor` (incl. the repost case), `isHomeFeedNoise` (the 3 drop cases + a normal keep), and `buildHomeFeedEvents` end-to-end (keep+drop mix, correct authors, `row.author` preferred over the body parse).
- `bunx tsc --noEmit -p packages/connectors/tsconfig.json` reports only pre-existing errors (connector-sdk stale-dist "no exported member" + implicit-any in the untouched company-updates/jobs sync code); no new errors in `linkedin.ts`. Root `tsc --noEmit` (pre-commit) passes clean.

Summary by CodeRabbit
Release Notes
New Features
Bug Fixes