feat(linkedin): home_feed via content-script scrape #1151
…inter The personalized LinkedIn home feed is the one feed that can't be read via network capture: attaching the CDP debugger stops the feed rendering so the Voyager responses never arrive. Add a home_feed feed that reads it via the extension's cs_scrape content-script op (no debugger) instead. company_updates + jobs are unchanged (still extensionNetworkSync). Bumps the owletto submodule pointer to d8f4535, which provides the cs_scrape op.
Caution: Review failed
The pull request is closed.
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID:
📒 Files selected for processing (1)
📝 Walkthrough
Adds a LinkedIn
Changes
LinkedIn Home Feed Scraping via Extension
Sequence Diagram(s)
sequenceDiagram
participant LinkedInConnector
participant PairedExtensionDispatcher
participant OwlettoExtensionContentScript
participant buildHomeFeedEvents
LinkedInConnector->>PairedExtensionDispatcher: dispatch({ type: "cs_scrape", url: "https://www.linkedin.com/feed/", scrape_config })
PairedExtensionDispatcher->>OwlettoExtensionContentScript: run cs_scrape with config
OwlettoExtensionContentScript-->>PairedExtensionDispatcher: return { loggedIn, rows }
PairedExtensionDispatcher-->>LinkedInConnector: return scrape result
LinkedInConnector->>buildHomeFeedEvents: buildHomeFeedEvents(rows, new Date())
buildHomeFeedEvents-->>LinkedInConnector: return EventEnvelope[]
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Codecov Report
✅ All modified and coverable lines are covered by tests.
Actionable comments posted: 2
🧹 Nitpick comments (1)
packages/connectors/src/__tests__/linkedin.test.ts (1)
21-24: ⚡ Quick win
Move the dynamic-import exception note to the import itself.
This lazy import is justified here, but the repo rule requires the rationale at the call site. Add the explanation directly above Line 22 so the exception stays obvious if the surrounding comments move.
As per coding guidelines,
**/*.ts: No new dynamic imports outside the allow-list; default to static import; new await import(...) needs cost justification in this file + rationale comment at call site
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/connectors/src/__tests__/linkedin.test.ts` around lines 21 - 24, The dynamic import of the LinkedIn module (await import('../linkedin')) is allowed here but the rationale must sit directly at the call site; add a brief comment immediately above the await import explaining why a lazy/dynamic import is justified (e.g., test-only heavy initialization, circular dependency avoidance, or to prevent side effects) so the exception stays with the import; keep LinkedInConnector and buildHomeFeedEvents assignments unchanged and ensure the comment references that this is the reason for using await import('../linkedin').
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/connectors/src/linkedin.ts`:
- Around line 511-515: The current return passes the incoming checkpoint through
which causes repeated emission of the same home-feed posts; modify the logic in
the routine that calls buildHomeFeedEvents() (the block returning { events,
checkpoint }) to persist a bounded set of seen home-feed ids in the checkpoint,
filter out events whose post id is already in that set before assigning
occurred_at, and update the checkpoint with the new bounded set (e.g.,
fixed-size queue or LRU) so subsequent runs skip previously seen ids; reference
the variables events, checkpoint and the buildHomeFeedEvents() output when
implementing the filtering and checkpoint mutation.
In `@packages/owletto`:
- Line 1: The submodule pointer currently references commit
d8f45358f2e1b2b82b8136fbfd66ca95d275ef1a which is not reachable from
owletto/main; do not update the submodule here — instead merge the required
owletto changes (the cs_scrape op) into the owletto main branch, confirm the
merged commit exists on owletto/main, then update the submodule reference in
this repo to that merged commit SHA (replace the current pinned SHA with the
reachable commit) so clones and CI can resolve the submodule reliably.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 0d220cd2-e423-4c0f-bacd-881dc259ccd8
📒 Files selected for processing (4)
packages/connectors/src/__tests__/browser-scraper-utils.test.ts
packages/connectors/src/__tests__/linkedin.test.ts
packages/connectors/src/linkedin.ts
packages/owletto
    return {
      events,
      // The home feed exposes no stable per-post cursor (opaque token ids, no
      // timestamps), so there is nothing new to checkpoint — pass it through.
      checkpoint: checkpoint as unknown as Record<string, unknown>,
Persist a home-feed cursor instead of passing the old checkpoint through.
Line 515 returns the incoming checkpoint unchanged even though buildHomeFeedEvents() only dedupes within one scrape and every emitted event gets a fresh sync-time occurred_at. If LinkedIn keeps the same posts visible across runs, this path will emit them again on every sync. Store a bounded set of seen home-feed ids (or another stable cursor) and filter before returning events.
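The suggestion above can be sketched as a small filter that keeps a bounded list of seen ids in the checkpoint. This is only an illustration under stated assumptions: the `seen_home_feed_ids` checkpoint field, the `filterUnseen` helper, the `MAX_SEEN_IDS` bound, and the minimal event shape are all hypothetical and do not come from the connector. It also filters already-built events, whereas the review asks for filtering before occurred_at is assigned; the real change would move the check earlier.

```typescript
// Hypothetical minimal shapes; the real EventEnvelope and checkpoint types
// live in the connector and are not reproduced here.
interface HomeFeedEvent {
  origin_id: string;
}

interface HomeFeedCheckpoint {
  // Assumed field name: ids already emitted, oldest first.
  seen_home_feed_ids?: string[];
}

// Assumed bound; any fixed size works as long as it exceeds one scrape.
const MAX_SEEN_IDS = 500;

function filterUnseen<E extends HomeFeedEvent>(
  events: E[],
  checkpoint: HomeFeedCheckpoint,
  maxSeen: number = MAX_SEEN_IDS,
): { events: E[]; checkpoint: HomeFeedCheckpoint } {
  const previous = checkpoint.seen_home_feed_ids ?? [];
  const seen = new Set(previous);
  const fresh: E[] = [];

  for (const event of events) {
    // Skip ids emitted by an earlier run (or earlier in this run).
    if (seen.has(event.origin_id)) continue;
    seen.add(event.origin_id);
    fresh.push(event);
  }

  // Append new ids and evict the oldest ones (FIFO) to stay within the bound.
  const merged = [...previous, ...fresh.map((e) => e.origin_id)];
  const bounded = maxSeen > 0 ? merged.slice(-maxSeen) : [];

  return {
    events: fresh,
    checkpoint: { ...checkpoint, seen_home_feed_ids: bounded },
  };
}
```

With this shape, a second sync that sees the same posts returns an empty events array instead of re-emitting them, and the checkpoint never grows past the chosen bound.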
@@ -1 +1 @@
-Subproject commit ab506452478a0220a9f45a833ff5b8ed62a25648
+Subproject commit d8f45358f2e1b2b82b8136fbfd66ca95d275ef1a
The pinned submodule commit is not reachable from the main branch.
The pipeline failure confirms that commit d8f45358f2e1b2b82b8136fbfd66ca95d275ef1a is not reachable from owletto/main. This will cause problems:
- Other developers cloning this repository will fail to resolve the submodule.
- The commit may disappear if the source branch is rebased or deleted.
- CI/CD pipelines may break when checking out the submodule.
Before updating the submodule pointer in this PR, first merge the required owletto changes (the cs_scrape op) into the owletto main branch, then update this submodule reference to point to that merged commit.
🧰 Tools
🪛 GitHub Actions: Submodule Drift / 0_check-drift.txt
[error] 1-6: Pinned SHA validation failed. Pinned (parent) SHA $PINNED is not reachable from owletto/main. Error: "Pinned SHA $PINNED is not reachable from owletto/main."
bug_free 55, simplicity 78, slop 4, bugs 2, 0 blockers
Typecheck/unit/integration logs all passed. Re-ran bun test packages/connectors/src/__tests__/linkedin.test.ts and packages/owletto apps/chrome/tools.test.js; both passed. The browser cs_scrape path is not covered by tests; it has incorrect firstLine handling and a persistent-tab same-host navigation risk.
Suggested fixes
Full verdict JSON
{
"bug_free_confidence": 55,
"bugs": 2,
"slop": 4,
"simplicity": 78,
"blockers": [],
"change_type": "feat",
"behavior_change_risk": "medium",
"tests_adequate": false,
"suggested_fixes": [
{
"file": "packages/owletto/apps/chrome/tools.js",
"line": 318,
"change": "Preserve raw innerText for firstLine: split the raw text on newlines before whitespace normalization, then clean the selected line."
},
{
"file": "packages/owletto/apps/chrome/tools.js",
"line": 492,
"change": "Do not skip navigation just because the reused persistent tab is on the same host; navigate when the current URL differs from the requested URL, or gate route preservation behind an explicit option."
}
],
"notes": "Typecheck/unit/integration logs all passed. Re-ran bun test packages/connectors/src/__tests__/linkedin.test.ts and packages/owletto apps/chrome/tools.test.js; both passed. Browser cs_scrape path is not covered and has wrong firstLine handling plus persistent-tab same-host navigation risk.",
"categories": {
"src": 175,
"tests": 137,
"docs": 0,
"config": 0,
"deps": 2,
"migrations": 0,
"ci": 0,
"generated": 0
}
}
Local review gate: branch protection can require the
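The two suggested fixes in the verdict can be illustrated with a pair of small helpers. This is a sketch written in TypeScript for consistency, not the extension's actual tools.js code: the names `firstLine`, `hostOf`, and `shouldNavigate`, the `preserveRoute` option, and the choice to return the first non-empty line are all assumptions made for illustration.

```typescript
// Fix 1 (assumed shape): split the raw innerText on newlines BEFORE collapsing
// whitespace, then clean only the selected line. Collapsing first would turn
// the newlines into spaces and make every text a single "line".
function firstLine(rawText: string): string {
  for (const line of rawText.split(/\r?\n/)) {
    const cleaned = line.replace(/\s+/g, ' ').trim();
    if (cleaned.length > 0) return cleaned; // assumption: skip blank lines
  }
  return '';
}

// Extract the lowercased host from an absolute URL, or '' if it has none.
function hostOf(url: string): string {
  const match = /^[a-z][a-z0-9+.-]*:\/\/([^/?#]+)/i.exec(url);
  return match?.[1]?.toLowerCase() ?? '';
}

// Fix 2 (assumed shape): a reused persistent tab navigates whenever its URL
// differs from the requested one. Staying on a same-host route is allowed
// only when the caller opts in explicitly via preserveRoute.
function shouldNavigate(
  currentUrl: string | undefined,
  requestedUrl: string,
  preserveRoute: boolean = false,
): boolean {
  if (currentUrl === undefined) return true;
  const currentHost = hostOf(currentUrl);
  if (preserveRoute && currentHost !== '' && currentHost === hostOf(requestedUrl)) {
    return false;
  }
  return currentUrl !== requestedUrl;
}
```

Under these assumptions, a persistent tab left on a LinkedIn jobs page would be navigated to https://www.linkedin.com/feed/ before scraping, rather than being scraped in place because the host matches.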
What
Adds a home_feed feed to the LinkedIn connector that reads the user's personalized LinkedIn home feed (linkedin.com/feed/) via the extension's content-script scrape (cs_scrape) op. company_updates and jobs are unchanged: they still use extensionNetworkSync (passive CDP network capture of Voyager JSON).
Why home_feed is different
The personalized home feed is the one feed that cannot be read via network capture. Attaching the CDP debugger stops the feed from rendering, so the Voyager responses never arrive and capture returns nothing. The home feed must therefore be read with a content script (no debugger) via the extension's cs_scrape op.
How
- home_feed feed entry (no company_url config; it always reads linkedin.com/feed/); optionsSchema-style max_scrolls (default 8).
- sync() branches to a new syncHomeFeed() when feedKey === 'home_feed', which dispatches the cs_scrape navigate (persistent, focus, allowed_origins, the home-feed scrape_config) through the same requireExtensionDispatcher(ctx) the other feeds use.
- Raises the not-logged-in error when result.loggedIn === false.
- buildHomeFeedEvents() maps result.rows ({ id, body, author }) to EventEnvelope[]:
  - origin_id = li_home_<token> (the componentkey token; NOT a numeric activity id)
  - source_url = https://www.linkedin.com/feed/ (no urn:li:activity permalink; token ids aren't activity ids)
  - occurred_at = sync time (home-feed posts expose no reliable timestamp)
The home-feed selectors (HOME_FEED_SCRAPE_CONFIG) live in the connector, not the extension.
Owletto pointer
Bumps the packages/owletto submodule pointer to d8f4535 ("persistent agent window + generic content-script scrape (#247)"), which provides the cs_scrape op.
Tests / typecheck
- bun test packages/connectors/src/__tests__/: 28 pass / 0 fail (6 new for home_feed: row→event mapping, dedupe/empty-author, dispatch flow, not-logged-in error, feed declaration).
- bunx tsc --noEmit -p packages/connectors/tsconfig.json: no new linkedin.ts errors vs. main baseline (the 8 reported are pre-existing: unbuilt @lobu/connector-sdk dist + implicit-any in the existing jobs/auth code).
🤖 Generated with Claude Code
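The row-to-event mapping described under "How" could look roughly like the sketch below. It is not the PR's buildHomeFeedEvents(): the `EventEnvelopeSketch` type, the `content` and `author_name` fields, the ISO-string timestamp, and the decisions to skip rows with an empty id or body and to omit an empty author are all assumptions. Only origin_id, source_url, occurred_at, and the { id, body, author } row shape come from the description.

```typescript
// Row shape as described in the PR text: { id, body, author }.
interface HomeFeedRow {
  id: string;
  body: string;
  author: string;
}

// Hypothetical stand-in for the SDK's EventEnvelope. Only origin_id,
// source_url and occurred_at are named in the PR; the rest is assumed.
interface EventEnvelopeSketch {
  origin_id: string;
  source_url: string;
  occurred_at: string;
  content: string;
  author_name?: string;
}

const HOME_FEED_URL = 'https://www.linkedin.com/feed/';

function buildHomeFeedEventsSketch(
  rows: HomeFeedRow[],
  now: Date,
): EventEnvelopeSketch[] {
  const seen = new Set<string>();
  const events: EventEnvelopeSketch[] = [];

  for (const row of rows) {
    const token = row.id.trim();
    const body = row.body.trim();
    // Assumption: rows without a token or body are dropped, and a token is
    // emitted at most once per scrape (dedupe within a single run only).
    if (token === '' || body === '' || seen.has(token)) continue;
    seen.add(token);

    const author = row.author.trim();
    events.push({
      // The token is an opaque component key, not a numeric activity id.
      origin_id: `li_home_${token}`,
      // No per-post permalink is available, so every event points at the feed.
      source_url: HOME_FEED_URL,
      // Posts expose no reliable timestamp, so the sync time is used.
      occurred_at: now.toISOString(),
      content: body,
      // Assumption: an empty author is omitted rather than stored as ''.
      ...(author !== '' ? { author_name: author } : {}),
    });
  }
  return events;
}
```

Because occurred_at is the sync time and dedupe covers only one scrape, the same post seen in two runs would yield two events with different timestamps, which is the gap the review comment on the checkpoint raises.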