feat(connector-sdk): extensionDomScrape helper; use it in LinkedIn home_feed#1155

Merged
buremba merged 3 commits into main from feat/extension-dom-scrape on May 29, 2026

@buremba buremba commented May 29, 2026

What

Adds the missing SDK helper for the content-script DOM path so the LinkedIn home feed goes through the SDK like the network feeds, instead of a raw inline dispatch.

The connector SDK already had extensionNetworkSync for passive network capture (company updates, jobs, Revolut, X). But the LinkedIn home feed can't use network capture — attaching the CDP debugger that the Network-domain intercept needs stops the personalized feed from rendering, so the Voyager XHRs never fire. Instead it reads the live DOM via the extension's content-script op cs_scrape. That path was hand-wired inline in linkedin.ts (dispatcher.dispatch('navigate', { cs_scrape: true, ... }) + reading observation.result.{rows,loggedIn}), leaving the SDK asymmetric: a helper for the network path, a raw dispatch for the DOM path.

This PR closes that gap.

Changes

  • packages/connector-sdk/src/extension-dom-scrape.ts (new): extensionDomScrape<TItem>({ dispatcher, url, config, parseRows, allowedOrigins, persistent?, focus? }). Does the single navigate+cs_scrape dispatch (defaulting persistent/focus to true), reads observation.result, returns { items, loggedIn, count, host?, landedUrl? } (loggedIn is true unless the extension reports an explicit false). Reuses the existing ChromeActionDispatcher type from extension-network.ts — no second definition. Exports extensionDomScrape + ExtensionScrapeConfig/ExtensionScrapeResult/ExtensionScrapeObservation/ExtensionDomScrapeResult from the package index.
  • packages/connectors/src/linkedin.ts: syncHomeFeed now calls extensionDomScrape instead of the inline dispatch. Deletes the now-dead local CsScrapeObservation / HomeFeedScrapeResult types (keeps HomeFeedRow, HOME_FEED_SCRAPE_CONFIG, buildHomeFeedEvents). company_updates/jobs stay on extensionNetworkSync.
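The helper described above can be pictured roughly as follows. This is a minimal sketch, not the merged code: the name `extensionDomScrapeSketch`, the stand-in `ChromeActionDispatcher` and `ChromeActionOutput` types, the assumption that `host` and `landedUrl` come from `observation.result`, and the fallback of `count` to the parsed item count are all assumptions made for illustration. Only the option names, the `navigate` + `cs_scrape` dispatch, the `scrape_config`/`allowed_origins` keys, and the `loggedIn` rule come from the description.

```typescript
// Hypothetical stand-ins for the SDK types; the real definitions may differ.
type ChromeActionOutput = Record<string, unknown>;

interface ChromeActionDispatcher {
  dispatch(action: string, input: Record<string, unknown>): Promise<ChromeActionOutput>;
}

interface ExtensionScrapeResult {
  rows?: unknown[];
  loggedIn?: boolean;
  count?: number;
  host?: string;
  landedUrl?: string;
}

interface ExtensionDomScrapeOptions<TItem> {
  dispatcher: ChromeActionDispatcher;
  url: string;
  config: Record<string, unknown>;
  parseRows: (rows: unknown[]) => TItem[];
  allowedOrigins: string[];
  persistent?: boolean;
  focus?: boolean;
}

interface ExtensionDomScrapeResult<TItem> {
  items: TItem[];
  loggedIn: boolean;
  count: number;
  host?: string;
  landedUrl?: string;
}

async function extensionDomScrapeSketch<TItem>(
  opts: ExtensionDomScrapeOptions<TItem>,
): Promise<ExtensionDomScrapeResult<TItem>> {
  // One navigate dispatch with content-script scraping enabled.
  const observation = await opts.dispatcher.dispatch('navigate', {
    url: opts.url,
    cs_scrape: true,
    scrape_config: opts.config,
    allowed_origins: opts.allowedOrigins,
    persistent: opts.persistent ?? true, // defaults to true
    focus: opts.focus ?? true, // defaults to true
  });

  // Tolerate a missing result envelope by falling back to an empty object.
  const result = (observation.result ?? {}) as ExtensionScrapeResult;
  const items = opts.parseRows(Array.isArray(result.rows) ? result.rows : []);

  return {
    items,
    // Logged in unless the extension reports an explicit false.
    loggedIn: result.loggedIn !== false,
    // Assumption: fall back to the parsed item count when none is reported.
    count: typeof result.count === 'number' ? result.count : items.length,
    host: result.host,
    landedUrl: result.landedUrl,
  };
}
```

In this sketch the connector would supply its own selectors and origins (for LinkedIn, `HOME_FEED_SCRAPE_CONFIG` and its allowed origins) and a `parseRows` that maps raw rows to `HomeFeedRow`, so the helper itself never names a site.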

The SDK helper stays generic and names no site — the selectors (HOME_FEED_SCRAPE_CONFIG) and allowed origins stay in the connector, where they belong (CSP/site-specific constraint).

No behavior change

Same single navigate + cs_scrape dispatch with the same scrape_config/allowed_origins, same logged-out error message — just wrapped behind the SDK.

Tests / typecheck

  • bun test packages/connectors/src/__tests__/ packages/connector-sdk/src/__tests__/extension-dom-scrape.test.ts packages/connector-sdk/src/__tests__/extension-network.test.ts → 38 pass / 0 fail. Existing LinkedIn home-feed tests still pass; added extension-dom-scrape.test.ts (stub dispatcher → asserts dispatch shape, row parsing, and loggedIn true/false/absent handling).
  • bunx tsc --noEmit on packages/connector-sdk → clean. On packages/connectors, no new errors in touched files (linkedin.ts clean); remaining errors are the known pre-existing baseline in other connectors + .ts-import-extension config noise.
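The stub-dispatcher approach and the loggedIn cases mentioned above could look something like the sketch below. The names `makeStubDispatcher`, `isLoggedIn`, and `loggedInCases` are hypothetical and the observation shape is simplified; the real test file may be organised differently.

```typescript
// Simplified, hypothetical observation shape for illustration only.
type Observation = { result?: { rows?: unknown[]; loggedIn?: boolean } };

interface RecordedCall {
  action: string;
  input: Record<string, unknown>;
}

// Builds a stub dispatcher that records every call and returns a canned observation.
function makeStubDispatcher(observation: Observation) {
  const calls: RecordedCall[] = [];
  return {
    calls,
    async dispatch(action: string, input: Record<string, unknown>): Promise<Observation> {
      calls.push({ action, input });
      return observation;
    },
  };
}

// The rule from the description: logged in unless the extension reports an explicit false.
function isLoggedIn(observation: Observation): boolean {
  return observation.result?.loggedIn !== false;
}

// Each pair is [observation, expected loggedIn].
const loggedInCases: Array<[Observation, boolean]> = [
  [{ result: { loggedIn: true } }, true], // explicit true
  [{ result: { loggedIn: false } }, false], // explicit false
  [{ result: { rows: [] } }, true], // field absent
  [{}, true], // result envelope missing entirely
];
```

A test can then await the code under test with the stub, inspect `calls` to assert the dispatch shape, and loop over `loggedInCases` to check each outcome.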


Summary by CodeRabbit

  • New Features

    • Added extension DOM scraping capabilities to the connector SDK, enabling improved web content extraction with enhanced configuration and result handling for browser extension-based scraping operations.
  • Improvements

    • Updated LinkedIn connector to leverage improved content extraction mechanisms for more reliable data retrieval.
  • Tests

    • Refactored test infrastructure to use shared mock implementations, improving consistency and maintainability across connector unit tests.

…me_feed

The SDK had extensionNetworkSync for the passive network-capture path but
the content-script DOM path (cs_scrape) was hand-wired inline in the
LinkedIn connector. Add extensionDomScrape as the DOM-path companion so the
home feed goes through the SDK like the network feeds. The helper stays
generic (names no site); the selectors and allowed origins are supplied by
the caller. No behavior change: same single navigate+cs_scrape dispatch,
just wrapped.

coderabbitai Bot commented May 29, 2026

Walkthrough

This PR introduces extensionDomScrape, a reusable SDK helper that encapsulates the pattern of dispatching a cs_scrape navigate action and normalizing the resulting observation. The LinkedIn connector is refactored to use this helper, eliminating manual dispatch logic. Shared test mock infrastructure is created to reduce boilerplate across connector unit tests.

Changes

extensionDomScrape Helper and LinkedIn Integration

| Layer / File(s) | Summary |
| --- | --- |
| SDK Helper Contract and Implementation: packages/connector-sdk/src/extension-dom-scrape.ts | Defines configuration, result, and observation types for cs_scrape operations; implements extensionDomScrape to dispatch navigate with content-script scraping enabled, parse rows via caller-provided function, and return normalized result with safe defaults for loggedIn and count. |
| SDK Helper Tests: packages/connector-sdk/src/__tests__/extension-dom-scrape.test.ts | Introduces makeDispatcher test helper and verifies extensionDomScrape dispatches with correct flags, handles missing/false loggedIn values, applies persistent/focus overrides, and gracefully handles missing result envelope. |
| SDK Barrel Exports: packages/connector-sdk/src/index.ts | Re-exports new extension DOM scraping types and the extensionDomScrape function. |
| Shared Test Mock Infrastructure: packages/connectors/src/__tests__/connector-sdk.mock.ts | Provides connectorSdkMock() factory returning stubbed runtime symbols and a functional extensionDomScrape for test isolation. |
| LinkedIn Connector Home-Feed Integration: packages/connectors/src/linkedin.ts | Replaces manual dispatcher.dispatch logic with extensionDomScrape call; removes internal HomeFeedScrapeResult and CsScrapeObservation types; preserves auth-wall error behavior. |
| Connector Test Refactoring: packages/connectors/src/__tests__/linkedin.test.ts, packages/connectors/src/__tests__/browser-scraper-utils.test.ts | Updated to import and use shared connectorSdkMock instead of inline stub implementations. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • lobu-ai/lobu#1151: Overlaps on packages/connectors/src/linkedin.ts home-feed scrape implementation via cs_scrape/navigate and enforces the same loggedIn-driven auth behavior.

Poem

🐰 A scraper born from dispatch dispatch dreams,
Now wrapped in helper's elegant seams.
LinkedIn feeds flow through cleaner code,
Tests mock together on a shared road,
Observation and result find harmony at last! 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'feat(connector-sdk): extensionDomScrape helper; use it in LinkedIn home_feed' clearly and concisely summarizes the main change: adding a new SDK helper and applying it to the LinkedIn connector. |
| Description check | ✅ Passed | The description comprehensively covers the what/why, detailed changes, testing, and notes as required by the template, with all key sections substantially completed. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.



@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.


buremba commented May 29, 2026

bug_free 87, simplicity 82, slop 12, bugs 0, 0 blockers

Deterministic typecheck/unit/integration logs all exited 0; narrow rerun of extension-dom-scrape + linkedin tests passed (10/10). Skipped server boot because diff is connector SDK/connectors only. Main unverified path is a live Owletto/LinkedIn browser scrape.

Suggested fixes

| File | Line | Change |
| --- | --- | --- |
| packages/connectors/src/__tests__/linkedin.test.ts | 17 | Replace the inline extensionDomScrape reimplementation with the real helper or a shared test mock helper so this test does not duplicate production logic. |
| packages/connectors/src/__tests__/browser-scraper-utils.test.ts | 26 | Reuse the same extensionDomScrape test mock/helper used by linkedin.test.ts instead of copying the full implementation here. |

Full verdict JSON
{
  "bug_free_confidence": 87,
  "bugs": 0,
  "slop": 12,
  "simplicity": 82,
  "blockers": [],
  "change_type": "feat",
  "behavior_change_risk": "low",
  "tests_adequate": true,
  "suggested_fixes": [
    {
      "file": "packages/connectors/src/__tests__/linkedin.test.ts",
      "line": 17,
      "change": "Replace the inline extensionDomScrape reimplementation with the real helper or a shared test mock helper so this test does not duplicate production logic."
    },
    {
      "file": "packages/connectors/src/__tests__/browser-scraper-utils.test.ts",
      "line": 26,
      "change": "Reuse the same extensionDomScrape test mock/helper used by linkedin.test.ts instead of copying the full implementation here."
    }
  ],
  "notes": "Deterministic typecheck/unit/integration logs all exited 0; narrow rerun of extension-dom-scrape + linkedin tests passed (10/10). Skipped server boot because diff is connector SDK/connectors only. Main unverified path is a live Owletto/LinkedIn browser scrape.",
  "categories": {
    "src": 127,
    "tests": 180,
    "docs": 0,
    "config": 0,
    "deps": 0,
    "migrations": 0,
    "ci": 0,
    "generated": 0
  }
}

Local review gate — branch protection can require the pi-review commit status. See docs/REVIEW_SCHEMA.md.

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
packages/connector-sdk/src/extension-dom-scrape.ts (1)

36-42: 💤 Low value

Prefer interface for the observation shape.

ExtensionScrapeObservation defines an object shape as a type alias. An interface with an index signature is equivalent here and keeps it consistent with the other declarations in this file:

♻️ Proposed refactor
-/** Dispatcher observation envelope; the index signature satisfies `ChromeActionOutput`. */
-export type ExtensionScrapeObservation = Record<string, unknown> & {
-  tab_id?: number;
-  cs_scrape?: boolean;
-  persistent_reused?: boolean;
-  result?: ExtensionScrapeResult;
-};
+/** Dispatcher observation envelope; the index signature satisfies `ChromeActionOutput`. */
+export interface ExtensionScrapeObservation {
+  tab_id?: number;
+  cs_scrape?: boolean;
+  persistent_reused?: boolean;
+  result?: ExtensionScrapeResult;
+  [k: string]: unknown;
+}

As per coding guidelines: "Use interface for defining object shapes in TypeScript files".
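To see why the two forms are interchangeable for this purpose, here is a small sketch. It assumes `ChromeActionOutput` is (or is compatible with) `Record<string, unknown>`, which the review comment implies but does not state; all names ending in `Sketch`, `Alias`, or `Interface` are hypothetical.

```typescript
// Hypothetical stand-in: the real ChromeActionOutput lives in the SDK and may differ.
type ChromeActionOutput = Record<string, unknown>;

interface ScrapeResultSketch {
  rows?: unknown[];
  loggedIn?: boolean;
}

// Current form: a type alias intersecting a string-keyed record.
type ObservationAlias = Record<string, unknown> & {
  tab_id?: number;
  result?: ScrapeResultSketch;
};

// Proposed form: an interface with an explicit index signature.
// Without the index signature, an interface would not be assignable
// to Record<string, unknown>.
interface ObservationInterface {
  tab_id?: number;
  result?: ScrapeResultSketch;
  [k: string]: unknown;
}

// Accepts anything assignable to the dispatcher output type.
function acceptOutput(output: ChromeActionOutput): string[] {
  return Object.keys(output).sort();
}

const viaAlias: ObservationAlias = { tab_id: 1, result: { loggedIn: true } };
const viaInterface: ObservationInterface = { tab_id: 1, result: { loggedIn: true } };

// Both values type-check as ChromeActionOutput arguments.
const aliasKeys = acceptOutput(viaAlias);
const interfaceKeys = acceptOutput(viaInterface);
```

Both declarations keep the named optional fields typed while still allowing arbitrary extra keys, which is why the change is stylistic rather than behavioral.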

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/connector-sdk/src/extension-dom-scrape.ts` around lines 36 - 42,
Replace the type alias ExtensionScrapeObservation with an equivalent interface
to match the file's conventions: declare interface ExtensionScrapeObservation
that includes the index signature (to satisfy ChromeActionOutput) plus the
optional properties tab_id, cs_scrape, persistent_reused, and result
(referencing ExtensionScrapeResult); keep the shape identical but use the
interface keyword so it aligns with other declarations in this module.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 43e55991-a293-471a-bdb0-96aed9d897d1

📥 Commits

Reviewing files that changed from the base of the PR and between 794cfa5 and 2c40174.

📒 Files selected for processing (7)
  • packages/connector-sdk/src/__tests__/extension-dom-scrape.test.ts
  • packages/connector-sdk/src/extension-dom-scrape.ts
  • packages/connector-sdk/src/index.ts
  • packages/connectors/src/__tests__/browser-scraper-utils.test.ts
  • packages/connectors/src/__tests__/connector-sdk.mock.ts
  • packages/connectors/src/__tests__/linkedin.test.ts
  • packages/connectors/src/linkedin.ts

@buremba buremba merged commit ef359c0 into main May 29, 2026
23 checks passed
@buremba buremba deleted the feat/extension-dom-scrape branch May 29, 2026 13:43