
feat(linkedin): home_feed via content-script scrape #1151

Merged
buremba merged 2 commits into main from feat/linkedin-home-feed on May 29, 2026

@buremba (Member) commented May 29, 2026

What

Adds a home_feed feed to the LinkedIn connector that reads the user's personalized LinkedIn home feed (linkedin.com/feed/) via the extension's content-script scrape (cs_scrape) op.

company_updates and jobs are unchanged — they still use extensionNetworkSync (passive CDP network capture of Voyager JSON).

Why home_feed is different

The personalized home feed is the one feed that cannot be read via network capture. Attaching the CDP debugger stops the feed from rendering, so the Voyager responses never arrive and capture returns nothing. The home feed therefore must be read with a content script (no debugger) via the extension's cs_scrape op.

How

  • New home_feed feed entry with no company_url config (it always reads linkedin.com/feed/) and an optionsSchema-style max_scrolls option (1-30, default 8).
  • sync() branches to a new syncHomeFeed() when feedKey === 'home_feed'. syncHomeFeed() dispatches the cs_scrape navigate (persistent, focus, allowed_origins, and the home-feed scrape_config) through the same requireExtensionDispatcher(ctx) that the other feeds use.
  • Throws a clear "Not logged into LinkedIn" error when result.loggedIn === false.
  • buildHomeFeedEvents() maps result.rows ({ id, body, author }) to EventEnvelope[]:
    • origin_id = li_home_<token> (the componentkey token; NOT a numeric activity id)
    • source_url = https://www.linkedin.com/feed/ (no urn:li:activity permalink — token ids aren't activity ids)
    • occurred_at = sync time (home-feed posts expose no reliable timestamp)
    • dedupes by id; drops rows missing id/body.
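As a rough illustration of the row-to-event mapping in the last bullet, here is a minimal TypeScript sketch. The origin_id / source_url / occurred_at fields and the li_home_ prefix come from the description above; HomeFeedRow, SketchEnvelope, the content and author_name fields, and the null default for an empty author are hypothetical stand-ins, not the connector SDK's actual EventEnvelope.

```typescript
// Hypothetical row shape returned by cs_scrape (field names from the PR text).
interface HomeFeedRow {
  id?: string | null; // componentkey token, NOT a numeric activity id
  body?: string | null;
  author?: string | null;
}

// Simplified stand-in for the SDK's EventEnvelope; this field set is assumed.
interface SketchEnvelope {
  origin_id: string;
  source_url: string;
  occurred_at: string;
  content: string;
  author_name: string | null;
}

const HOME_FEED_URL = "https://www.linkedin.com/feed/";

function buildHomeFeedEventsSketch(rows: HomeFeedRow[], syncTime: Date): SketchEnvelope[] {
  const seen = new Set<string>();
  const events: SketchEnvelope[] = [];
  for (const row of rows) {
    const id = row.id?.trim();
    const body = row.body?.trim();
    // Drop rows missing an id or a body (whitespace-only counts as missing here).
    if (!id || !body) continue;
    // Dedupe by id, but only within this one scrape.
    if (seen.has(id)) continue;
    seen.add(id);
    events.push({
      origin_id: `li_home_${id}`,
      // Token ids are not activity ids, so there is no per-post permalink.
      source_url: HOME_FEED_URL,
      // Home-feed posts expose no reliable timestamp; use the sync time.
      occurred_at: syncTime.toISOString(),
      content: body,
      // Missing or empty author becomes null (assumed default).
      author_name: row.author?.trim() || null,
    });
  }
  return events;
}
```

Because occurred_at is the sync time, the same post scraped in two different syncs would produce two events with different timestamps; the dedupe here only covers a single scrape.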

The home-feed selectors (HOME_FEED_SCRAPE_CONFIG) live in the connector, not the extension.
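Keeping the selectors in the connector means the connector ships the whole scrape description to the extension on each dispatch. The sketch below shows one way that request could be assembled. Only the request fields named in this PR (type cs_scrape, the /feed/ URL, persistent, focus, allowed_origins, scrape_config) come from the source; the ScrapeField/ScrapeConfig shapes, the allowed-origin value, placing max_scrolls inside scrape_config, and every selector string are hypothetical placeholders, not LinkedIn's DOM or the real HOME_FEED_SCRAPE_CONFIG.

```typescript
// Hypothetical field-mapping shape; the real config format is not shown in the PR.
interface ScrapeField {
  selector: string;
  attr?: string; // read this attribute instead of the element's text
  mode?: "text" | "firstLine";
}

interface ScrapeConfig {
  item_selector: string;
  fields: Record<string, ScrapeField>;
  max_scrolls: number; // assumed location for the scroll depth
}

interface CsScrapeRequest {
  type: "cs_scrape";
  url: string;
  persistent: boolean;
  focus: boolean;
  allowed_origins: string[];
  scrape_config: ScrapeConfig;
}

const HOME_FEED_URL = "https://www.linkedin.com/feed/";
const HOME_FEED_ALLOWED_ORIGINS = ["https://www.linkedin.com"]; // assumed value

// Placeholder selectors for illustration only.
const HOME_FEED_SCRAPE_CONFIG_SKETCH: Omit<ScrapeConfig, "max_scrolls"> = {
  item_selector: "[data-example-post]",
  fields: {
    // Read the componentkey attribute from the post element itself (assumed).
    id: { selector: ":scope", attr: "componentkey" },
    body: { selector: "[data-example-body]", mode: "text" },
    author: { selector: "[data-example-author]", mode: "firstLine" },
  },
};

// Builds the request the connector hands to the paired-extension dispatcher.
function buildHomeFeedRequest(maxScrolls: number): CsScrapeRequest {
  return {
    type: "cs_scrape",
    url: HOME_FEED_URL,
    persistent: true, // reuse the persistent agent window/tab
    focus: true,
    allowed_origins: [...HOME_FEED_ALLOWED_ORIGINS],
    // Spread into a new object so adding max_scrolls never mutates the shared constant.
    scrape_config: { ...HOME_FEED_SCRAPE_CONFIG_SKETCH, max_scrolls: maxScrolls },
  };
}
```

The design choice this reflects is presumably that the extension's cs_scrape op stays generic and only interprets a config, so a LinkedIn markup change needs a connector update rather than an extension update.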

Owletto pointer

Bumps the packages/owletto submodule pointer to d8f4535 ("persistent agent window + generic content-script scrape (#247)"), which provides the cs_scrape op.

Tests / typecheck

  • bun test packages/connectors/src/__tests__/ — 28 pass / 0 fail (6 new for home_feed: row→event mapping, dedupe/empty-author, dispatch flow, not-logged-in error, feed declaration).
  • bunx tsc --noEmit -p packages/connectors/tsconfig.json — no new linkedin.ts errors vs. main baseline (the 8 reported are pre-existing: unbuilt @lobu/connector-sdk dist + implicit-any in the existing jobs/auth code).

🤖 Generated with Claude Code


View with Codesmith Autofix with Codesmith
Need help on this PR? Tag @codesmith with what you need. Autofix is disabled.

Summary by CodeRabbit

  • New Features

    • Added LinkedIn home-feed scraping via the browser extension, producing post events (with author metadata) and configurable scroll depth.
  • Tests

    • Added and improved tests for LinkedIn connector and browser-scrape behaviors to ensure correct event mapping, deduplication, and sync handling.
  • Chores

    • Updated Owletto extension reference.


…inter

The personalized LinkedIn home feed is the one feed that can't be read via
network capture: attaching the CDP debugger stops the feed rendering so the
Voyager responses never arrive. Add a home_feed feed that reads it via the
extension's cs_scrape content-script op (no debugger) instead.

company_updates + jobs are unchanged (still extensionNetworkSync). Bumps the
owletto submodule pointer to d8f4535, which provides the cs_scrape op.
@coderabbitai (bot) commented May 29, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 2cb8728e-08fd-426b-a0df-3ccdcc839a2b

📥 Commits

Reviewing files that changed from the base of the PR and between e6e5339 and 10b8b1a.

📒 Files selected for processing (1)
  • packages/owletto

📝 Walkthrough

Walkthrough

Adds a LinkedIn home_feed scraped via the Owletto extension content-script: types and scrape config, buildHomeFeedEvents() exporter, home_feed feed registration with config schema, syncHomeFeed() orchestration using paired-extension dispatch, and unit tests plus a small subproject pointer bump.

Changes

LinkedIn Home Feed Scraping via Extension

  • Test Mocking Infrastructure (packages/connectors/src/__tests__/browser-scraper-utils.test.ts): Extended the mocked SDK module to expose ConnectorRuntime, calculateEngagementScore, and extensionNetworkSync so downstream tests can import the stubbed module and resolve required symbols.
  • Home-Feed Types & Event Builder (packages/connectors/src/linkedin.ts, packages/connectors/src/__tests__/linkedin.test.ts): Defines home-feed scrape result types, allowed origins, and HOME_FEED_SCRAPE_CONFIG (DOM selectors and field mapping). Adds an exported buildHomeFeedEvents() that dedupes rows and converts them into post event envelopes using the sync time and the /feed/ source URL. Tests verify token-id mapping to li_home_<token>, field extraction, author defaulting, filtering, and deduplication.
  • Feed Config & Definition (packages/connectors/src/linkedin.ts): Adds homeFeedConfigSchema with a configurable max_scrolls (1-30, default 8) and registers home_feed in the connector's feed definition with a post event kind that includes author metadata.
  • Sync Orchestration & Tests (packages/connectors/src/linkedin.ts, packages/connectors/src/__tests__/linkedin.test.ts, packages/owletto): Routes feedKey === 'home_feed' to syncHomeFeed(), which dispatches a persistent, focused cs_scrape against https://www.linkedin.com/feed/ with the configured scroll depth, throws when loggedIn === false, builds events via buildHomeFeedEvents(), returns the checkpoint unchanged, and reports scrape counts and the extension-cs-scrape backend. Tests validate dispatcher parameters, event mapping, metadata, and login error handling; the packages/owletto pointer is bumped.
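The max_scrolls bounds (1-30, default 8) amount to the resolution rule below. This is a plain-TypeScript sketch, not the schema library homeFeedConfigSchema actually uses, and resolveMaxScrolls is a hypothetical name. Note the swapped-in behavior: it clamps out-of-range numbers and falls back to the default for non-numeric input, whereas a schema validator would more likely reject both.

```typescript
const MAX_SCROLLS_MIN = 1;
const MAX_SCROLLS_MAX = 30;
const MAX_SCROLLS_DEFAULT = 8;

function resolveMaxScrolls(config: { max_scrolls?: unknown } | undefined): number {
  const raw = config?.max_scrolls;
  // Missing, non-numeric, NaN or infinite values fall back to the default (assumption).
  if (typeof raw !== "number" || !Number.isFinite(raw)) return MAX_SCROLLS_DEFAULT;
  // Drop any fractional part, then clamp into [1, 30].
  const n = Math.trunc(raw);
  return Math.min(MAX_SCROLLS_MAX, Math.max(MAX_SCROLLS_MIN, n));
}
```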

Sequence Diagram(s)

sequenceDiagram
  participant LinkedInConnector
  participant PairedExtensionDispatcher
  participant OwlettoExtensionContentScript
  participant buildHomeFeedEvents
  LinkedInConnector->>PairedExtensionDispatcher: dispatch({ type: "cs_scrape", url: "https://www.linkedin.com/feed/", scrape_config })
  PairedExtensionDispatcher->>OwlettoExtensionContentScript: run cs_scrape with config
  OwlettoExtensionContentScript-->>PairedExtensionDispatcher: return { loggedIn, rows }
  PairedExtensionDispatcher-->>LinkedInConnector: return scrape result
  LinkedInConnector->>buildHomeFeedEvents: buildHomeFeedEvents(rows, new Date())
  buildHomeFeedEvents-->>LinkedInConnector: return EventEnvelope[]
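The flow in the diagram (dispatch, login check, event building, pass-through checkpoint) can be sketched as an async function with the dispatcher and the event builder injected, which also keeps it easy to unit-test. Dispatch, BuildEvents, syncHomeFeedSketch and the metadata key names are hypothetical; the "Not logged into LinkedIn" message, the strict loggedIn === false check, the unchanged checkpoint, and the extension-cs-scrape backend label come from this PR.

```typescript
interface HomeFeedRow { id?: string; body?: string; author?: string }
interface HomeFeedScrapeResult { loggedIn?: boolean; rows?: HomeFeedRow[] }
interface CsScrapeRequest { type: "cs_scrape"; url: string; scrape_config: Record<string, unknown> }

// Hypothetical stand-in for the paired-extension dispatcher.
type Dispatch = (req: CsScrapeRequest) => Promise<HomeFeedScrapeResult>;
// The role buildHomeFeedEvents() plays: rows in, events out.
type BuildEvents<E> = (rows: HomeFeedRow[], syncTime: Date) => E[];

interface SyncResult<E> {
  events: E[];
  checkpoint: Record<string, unknown> | null;
  metadata: { backend: string; rows_scraped: number; events_built: number };
}

const HOME_FEED_URL = "https://www.linkedin.com/feed/";

async function syncHomeFeedSketch<E>(
  dispatch: Dispatch,
  buildEvents: BuildEvents<E>,
  scrapeConfig: Record<string, unknown>,
  checkpoint: Record<string, unknown> | null,
  now: Date = new Date(),
): Promise<SyncResult<E>> {
  // 1. Ask the extension to run the content-script scrape on /feed/.
  const result = await dispatch({ type: "cs_scrape", url: HOME_FEED_URL, scrape_config: scrapeConfig });
  // 2. Only an explicit loggedIn === false counts as logged out.
  if (result.loggedIn === false) {
    throw new Error("Not logged into LinkedIn");
  }
  // 3. Map rows to events, stamped with the sync time.
  const rows = result.rows ?? [];
  const events = buildEvents(rows, now);
  // 4. Pass the checkpoint through unchanged, as the merged PR does.
  return {
    events,
    checkpoint,
    metadata: { backend: "extension-cs-scrape", rows_scraped: rows.length, events_built: events.length },
  };
}
```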

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • lobu-ai/lobu#1132: Earlier migration of the LinkedIn connector onto the Owletto extension infrastructure and chrome dispatch support.

Poem

🐰 I nibble DOM and nibble code,
Owletto hops to fetch the load,
Posts deduped, authors set just so,
Home feeds hop in tidy rows,
Tests clap paws—our scraper's ready to go.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
  • Title check (✅ Passed): The title 'feat(linkedin): home_feed via content-script scrape' clearly and concisely summarizes the main change: adding a new home_feed feature to the LinkedIn connector that uses content-script scraping.
  • Description check (✅ Passed): The pull request description includes a comprehensive 'What' section explaining the feature, a detailed 'Why' section justifying the approach, a thorough 'How' section with implementation details, and 'Tests / typecheck' validation results.
  • Linked Issues check (✅ Passed): Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check (✅ Passed): Check skipped because no linked issues were found for this pull request.


@codecov-commenter
⚠️ Please install the Codecov app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
packages/connectors/src/__tests__/linkedin.test.ts (1)

21-24: ⚡ Quick win

Move the dynamic-import exception note to the import itself.

This lazy import is justified here, but the repo rule requires the rationale at the call site. Add the explanation directly above Line 22 so the exception stays obvious if the surrounding comments move.

As per coding guidelines, **/*.ts: No new dynamic imports outside the allow-list; default to static import; new await import(...) needs cost justification in this file + rationale comment at call site

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/connectors/src/__tests__/linkedin.test.ts` around lines 21 - 24, The
dynamic import of the LinkedIn module (await import('../linkedin')) is allowed
here but the rationale must sit directly at the call site; add a brief comment
immediately above the await import explaining why a lazy/dynamic import is
justified (e.g., test-only heavy initialization, circular dependency avoidance,
or to prevent side effects) so the exception stays with the import; keep
LinkedInConnector and buildHomeFeedEvents assignments unchanged and ensure the
comment references that this is the reason for using await
import('../linkedin').
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 0d220cd2-e423-4c0f-bacd-881dc259ccd8

📥 Commits

Reviewing files that changed from the base of the PR and between 019ad82 and e6e5339.

📒 Files selected for processing (4)
  • packages/connectors/src/__tests__/browser-scraper-utils.test.ts
  • packages/connectors/src/__tests__/linkedin.test.ts
  • packages/connectors/src/linkedin.ts
  • packages/owletto

Comment on lines +511 to +515
  return {
    events,
    // The home feed exposes no stable per-post cursor (opaque token ids, no
    // timestamps), so there is nothing new to checkpoint — pass it through.
    checkpoint: checkpoint as unknown as Record<string, unknown>,

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Persist a home-feed cursor instead of passing the old checkpoint through.

Line 515 returns the incoming checkpoint unchanged even though buildHomeFeedEvents() only dedupes within one scrape and every emitted event gets a fresh sync-time occurred_at. If LinkedIn keeps the same posts visible across runs, this path will emit them again on every sync. Store a bounded set of seen home-feed ids (or another stable cursor) and filter before returning events.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/connectors/src/linkedin.ts` around lines 511 - 515, The current
return passes the incoming checkpoint through which causes repeated emission of
the same home-feed posts; modify the logic in the routine that calls
buildHomeFeedEvents() (the block returning { events, checkpoint }) to persist a
bounded set of seen home-feed ids in the checkpoint, filter out events whose
post id is already in that set before assigning occurred_at, and update the
checkpoint with the new bounded set (e.g., fixed-size queue or LRU) so
subsequent runs skip previously seen ids; reference the variables events,
checkpoint and the buildHomeFeedEvents() output when implementing the filtering
and checkpoint mutation.
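One possible shape for the reviewer's suggestion, assuming the emitted events carry their li_home_<token> id in origin_id: keep a bounded, insertion-ordered list of seen ids in the checkpoint and drop events already in it. The seen_ids field, the 500-id default bound, and filterUnseenHomeFeed are hypothetical; this is not what the merged PR does (it passes the checkpoint through).

```typescript
// Hypothetical checkpoint shape; other keys are preserved untouched.
interface HomeFeedCheckpoint {
  seen_ids?: string[];
  [key: string]: unknown;
}

const DEFAULT_MAX_SEEN = 500; // assumed bound

function filterUnseenHomeFeed<E extends { origin_id: string }>(
  events: E[],
  checkpoint: HomeFeedCheckpoint | null | undefined,
  maxSeen: number = DEFAULT_MAX_SEEN,
): { events: E[]; checkpoint: HomeFeedCheckpoint } {
  const previous = checkpoint?.seen_ids ?? [];
  const seen = new Set(previous);
  // Keep only events whose id was not emitted by an earlier sync.
  const fresh = events.filter((e) => !seen.has(e.origin_id));
  // Append new ids after the old ones, then keep the newest maxSeen (FIFO eviction).
  const merged = [...previous, ...fresh.map((e) => e.origin_id)];
  const bounded = merged.slice(Math.max(0, merged.length - maxSeen));
  return { events: fresh, checkpoint: { ...(checkpoint ?? {}), seen_ids: bounded } };
}
```

FIFO eviction keeps the checkpoint small; the trade-off is that a post evicted from the list but still visible in the feed would be emitted again.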

Comment thread packages/owletto Outdated
@@ -1 +1 @@
-Subproject commit ab506452478a0220a9f45a833ff5b8ed62a25648
+Subproject commit d8f45358f2e1b2b82b8136fbfd66ca95d275ef1a

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

The pinned submodule commit is not reachable from the main branch.

The pipeline failure confirms that commit d8f45358f2e1b2b82b8136fbfd66ca95d275ef1a is not reachable from owletto/main. This will cause problems:

  • Other developers cloning this repository will fail to resolve the submodule.
  • The commit may disappear if the source branch is rebased or deleted.
  • CI/CD pipelines may break when checking out the submodule.

Before updating the submodule pointer in this PR, first merge the required owletto changes (the cs_scrape op) into the owletto main branch, then update this submodule reference to point to that merged commit.

🧰 Tools
🪛 GitHub Actions: Submodule Drift / 0_check-drift.txt

[error] 1-6: Pinned SHA validation failed. Pinned (parent) SHA $PINNED is not reachable from owletto/main. Error: "Pinned SHA $PINNED is not reachable from owletto/main."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/owletto` at line 1, The submodule pointer currently references
commit d8f45358f2e1b2b82b8136fbfd66ca95d275ef1a which is not reachable from
owletto/main; do not update the submodule here — instead merge the required
owletto changes (the cs_scrape op) into the owletto main branch, confirm the
merged commit exists on owletto/main, then update the submodule reference in
this repo to that merged commit SHA (replace the current pinned SHA with the
reachable commit) so clones and CI can resolve the submodule reliably.

@buremba (Member, Author) commented May 29, 2026

bug_free 55, simplicity 78, slop 4, bugs 2, 0 blockers

Typecheck/unit/integration logs all passed. Re-ran bun test packages/connectors/src/__tests__/linkedin.test.ts and packages/owletto apps/chrome/tools.test.js; both passed. The browser cs_scrape path is not covered by tests and has wrong firstLine handling, plus a persistent-tab same-host navigation risk.

Suggested fixes

  • packages/owletto/apps/chrome/tools.js, line 318: Preserve raw innerText for firstLine: split the raw text on newlines before whitespace normalization, then clean the selected line.
  • packages/owletto/apps/chrome/tools.js, line 492: Do not skip navigation just because the reused persistent tab is on the same host; navigate when the current URL differs from the requested URL, or gate route preservation behind an explicit option.
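Both suggested fixes are small enough to sketch. The code is TypeScript for consistency with the other examples here, although tools.js is JavaScript; firstLine, firstLineBuggy, shouldNavigate and the preserveSameHostRoute option are hypothetical names, not the extension's actual functions. The first function splits raw innerText on newlines before cleaning the chosen line (normalizing first would turn every newline into a space, so the "first line" becomes the whole text). The second navigates whenever the reused tab's URL, ignoring the fragment, differs from the requested one, and keeps a same-host route only when explicitly asked.

```typescript
// Fix 1: choose the line from the raw innerText, then normalize only that line.
function firstLine(rawInnerText: string): string {
  for (const line of rawInnerText.split(/\r?\n/)) {
    // Collapse runs of whitespace and trim the selected line.
    const cleaned = line.replace(/\s+/g, " ").trim();
    // Assumption: leading blank lines are skipped.
    if (cleaned.length > 0) return cleaned;
  }
  return "";
}

// The buggy order, for comparison: normalizing first erases the newlines.
function firstLineBuggy(rawInnerText: string): string {
  return rawInnerText.replace(/\s+/g, " ").trim().split("\n")[0] ?? "";
}

function stripHash(url: string): string {
  const i = url.indexOf("#");
  return i === -1 ? url : url.slice(0, i);
}

function hostOf(url: string): string {
  const m = /^[a-z][a-z0-9+.-]*:\/\/([^/?#]+)/i.exec(url);
  return m?.[1]?.toLowerCase() ?? "";
}

// Fix 2: same host alone is not a reason to skip navigation.
function shouldNavigate(
  currentUrl: string | undefined,
  requestedUrl: string,
  preserveSameHostRoute = false, // explicit opt-in for route preservation
): boolean {
  if (!currentUrl) return true;
  if (preserveSameHostRoute) {
    const host = hostOf(currentUrl);
    if (host !== "" && host === hostOf(requestedUrl)) return false;
  }
  // Ignore only the fragment; path or query changes still trigger navigation.
  return stripHash(currentUrl) !== stripHash(requestedUrl);
}
```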
Full verdict JSON
{
  "bug_free_confidence": 55,
  "bugs": 2,
  "slop": 4,
  "simplicity": 78,
  "blockers": [],
  "change_type": "feat",
  "behavior_change_risk": "medium",
  "tests_adequate": false,
  "suggested_fixes": [
    {
      "file": "packages/owletto/apps/chrome/tools.js",
      "line": 318,
      "change": "Preserve raw innerText for firstLine: split the raw text on newlines before whitespace normalization, then clean the selected line."
    },
    {
      "file": "packages/owletto/apps/chrome/tools.js",
      "line": 492,
      "change": "Do not skip navigation just because the reused persistent tab is on the same host; navigate when the current URL differs from the requested URL, or gate route preservation behind an explicit option."
    }
  ],
  "notes": "Typecheck/unit/integration logs all passed. Re-ran bun test packages/connectors/src/__tests__/linkedin.test.ts and packages/owletto apps/chrome/tools.test.js; both passed. Browser cs_scrape path is not covered and has wrong firstLine handling plus persistent-tab same-host navigation risk.",
  "categories": {
    "src": 175,
    "tests": 137,
    "docs": 0,
    "config": 0,
    "deps": 2,
    "migrations": 0,
    "ci": 0,
    "generated": 0
  }
}

Local review gate — branch protection can require the pi-review commit status. See docs/REVIEW_SCHEMA.md.

@buremba buremba merged commit c9baa1a into main May 29, 2026
19 of 20 checks passed
@buremba buremba deleted the feat/linkedin-home-feed branch May 29, 2026 04:23

Labels

None yet

Projects

None yet


2 participants