design: connector-worker placement (#615) #842

Closed
buremba wants to merge 1 commit into main from design/connector-placement-615

@buremba buremba commented May 18, 2026

Summary

Draft design proposal for issue #615 — dynamic connector-worker placement (device vs. cloud) and scale-from-zero pod pool. No code changes.

Doc: docs/proposals/connector-placement.md

Key proposals:

  • Explicit connector_definitions.placement (cloud_only/device_only/auto) replaces the implicit "no required_capability == cloud" rule.
  • Denormalised runs.run_target + runs.target_device_worker_id makes the dispatcher's claim filter a single equality check and gives operators a queryable view of pending work per lane.
  • "Org default device" surfaced as organization.default_placement rather than a synthetic device row, so /api/me/devices stays clean.
  • Scale-from-zero sketch (KEDA PG scaler, 15s queue-age trigger, in-flight heartbeat + advisory-locked reaper) — deferred until load justifies it; this PR documents the design so the placement work lands compatible with it.
  • Migration is additive only; no run-row backfill, with a one-release COALESCE bridge in the claim WHERE.
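To make the claim filter and the COALESCE bridge concrete, here is a minimal sketch using an in-memory SQLite table in place of Postgres. Only `run_target` and `target_device_worker_id` come from the proposal; the `status` column, the lane values `'cloud'` and `'device'`, and the choice to treat a NULL `run_target` (a row created before the migration) as `'cloud'` are assumptions made for illustration, not the doc's actual schema or query.

```python
import sqlite3

# Hypothetical, simplified schema: run_target is NULL on pre-migration rows,
# target_device_worker_id is set only for device-targeted runs.
SCHEMA = """
CREATE TABLE runs (
    id INTEGER PRIMARY KEY,
    status TEXT NOT NULL,
    run_target TEXT,
    target_device_worker_id TEXT
)
"""

# Assumption: legacy rows with no run_target fall into the cloud lane.
CLAIM_CLOUD = """
SELECT id FROM runs
WHERE status = 'pending'
  AND COALESCE(run_target, 'cloud') = 'cloud'
ORDER BY id
"""

# Device workers match on plain equality, with no capability lookup.
CLAIM_DEVICE = """
SELECT id FROM runs
WHERE status = 'pending'
  AND run_target = 'device'
  AND target_device_worker_id = ?
ORDER BY id
"""

# Operator view: pending work per lane.
PENDING_PER_LANE = """
SELECT COALESCE(run_target, 'cloud') AS lane, COUNT(*) AS pending
FROM runs
WHERE status = 'pending'
GROUP BY lane
ORDER BY lane
"""


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO runs (id, status, run_target, target_device_worker_id) "
        "VALUES (?, ?, ?, ?)",
        [
            (1, "pending", None, None),        # legacy row, never backfilled
            (2, "pending", "cloud", None),
            (3, "pending", "device", "dev-a"),
            (4, "pending", "device", "dev-b"),
            (5, "running", "cloud", None),     # already claimed, ignored
        ],
    )
    cloud = [row[0] for row in conn.execute(CLAIM_CLOUD)]
    dev_a = [row[0] for row in conn.execute(CLAIM_DEVICE, ("dev-a",))]
    lanes = dict(conn.execute(PENDING_PER_LANE).fetchall())
    conn.close()
    return cloud, dev_a, lanes
```

In this sketch the cloud lane picks up runs 1 and 2, including the legacy row, while the worker `dev-a` sees only run 3. Once every pending row carries a `run_target`, the `COALESCE` could presumably be dropped in favour of a bare equality check.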

Findings worth flagging during review:

  • Prod's summaries-app-lobu-worker is currently at 1/1, not 0 — the issue body's "scaled to 0" is stale.
  • Today's runs table has no in-flight heartbeat / lease column; last_heartbeat_at lives on the legacy workers table the new dispatcher does not write. A crashed worker leaves running rows orphaned. The lease + reaper in §4.3 covers that even before any autoscaling.
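The orphaned-row problem and the lease + reaper idea can be sketched as follows. This is an illustration under stated assumptions, not the code from §4.3 or from any follow-up PR: the `last_heartbeat_at` column on `runs`, the 60-second timeout, integer timestamps, and the policy of requeueing stale runs as `'pending'` are all hypothetical. SQLite stands in for Postgres, so the advisory lock is omitted; in Postgres the reaper body could be guarded with `pg_try_advisory_lock` so that only one instance reaps at a time.

```python
import sqlite3

# Hypothetical lease timeout; the proposal does not fix a value.
STALE_AFTER_SECONDS = 60


def heartbeat(conn, run_id, now):
    """Worker side: refresh the lease on a run it is still executing."""
    conn.execute(
        "UPDATE runs SET last_heartbeat_at = ? "
        "WHERE id = ? AND status = 'running'",
        (now, run_id),
    )


def reap_stale_runs(conn, now, stale_after=STALE_AFTER_SECONDS):
    """Reaper side: requeue running rows whose heartbeat is older than the lease.

    Assumption: orphaned runs go back to 'pending'; marking them failed
    would be an equally plausible policy.
    """
    cur = conn.execute(
        "UPDATE runs SET status = 'pending', last_heartbeat_at = NULL "
        "WHERE status = 'running' AND last_heartbeat_at < ?",
        (now - stale_after,),
    )
    return cur.rowcount  # number of rows requeued


def demo_reaper():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE runs ("
        "id INTEGER PRIMARY KEY, status TEXT NOT NULL, last_heartbeat_at INTEGER)"
    )
    conn.executemany(
        "INSERT INTO runs (id, status, last_heartbeat_at) VALUES (?, ?, ?)",
        [(1, "running", 50), (2, "running", 10), (3, "pending", None)],
    )
    heartbeat(conn, 1, now=100)   # worker for run 1 is alive
    reaped = reap_stale_runs(conn, now=120)  # worker for run 2 crashed
    statuses = dict(conn.execute("SELECT id, status FROM runs ORDER BY id"))
    conn.close()
    return reaped, statuses
```

Here run 1 keeps its lease because its worker heartbeated at t=100, while run 2, last seen at t=10, falls behind the cutoff of t=60 and is requeued. Without the heartbeat call, run 1 would be requeued as well.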

Open questions are listed in §8 of the doc.

Leaving #615 open until the design is reviewed and an implementation issue is broken out.

Test plan

Draft proposal for dynamic device-vs-cloud connector placement, explicit
run_target tagging, and a scale-from-zero pod pool with lease + reaper.
No code changes — design phase only.

coderabbitai Bot commented May 18, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

Codecov Report

✅ All modified and coverable lines are covered by tests.

buremba commented May 18, 2026

Closing this design draft.

After discussion with @buremba, the design was over-built for current scale:

  • No real broken thing today: implicit capability-based routing works (connector-worker is 1/1, not 0/1 as the issue body claimed) and no user has hit "run waits forever".
  • Scale-from-zero is premature — one idle pod is cheaper than the ops cost of KEDA + cold-start + lease/reaper plumbing until tenant count jumps.
  • The one real bug the design surfaced — no stale-run reaper — has shipped as a small focused PR: fix(runs): add heartbeat + stale-run reaper #849.

Leaving #615 open with a pointer to #849 as the chosen direction; revisiting the rest when there's actual demand.
