feat(host-service): project.setup endpoint for local repo path assignment#3345
feat(host-service): project.setup endpoint for local repo path assignment#3345
Conversation
Outlines the approach for allowing users to import existing local repos or clone to a chosen location instead of auto-cloning to a fixed path. Covers local-only storage decision, throw-on-create pattern, and phased implementation plan.
…repos Adds a `project.setup` mutation that lets users either import an existing local repo or clone to a chosen directory. Validates git remotes against the cloud project's GitHub repo (handles SSH + HTTPS). Upserts the local projects table so re-running handles re-pointing.
📝 WalkthroughWalkthroughAdds a design doc and host-service implementation for per-host project path mapping: a new protected Changes
Sequence DiagramsequenceDiagram
participant Client
participant HostRouter as Host Project Router
participant Cloud as Cloud API/DB
participant Git as Git / FileSystem
participant HostDB as Host SQLite
Client->>HostRouter: project.setup(projectId, mode, localPath)
HostRouter->>Cloud: fetch project metadata (repoCloneUrl, provider, slug)
alt mode == "import"
HostRouter->>Git: assert localPath exists & is directory
HostRouter->>Git: run git rev-parse --show-toplevel
HostRouter->>Git: getAllRemoteUrls / getGitHubRemotes
HostRouter->>Git: findMatchingRemote(expectedSlug)
else mode == "clone"
HostRouter->>Git: validate parent directory exists
HostRouter->>Git: clone repoCloneUrl -> targetDir
HostRouter->>Git: getAllRemoteUrls / getGitHubRemotes on clone
HostRouter->>Git: findMatchingRemote(expectedSlug)
end
HostRouter->>HostDB: upsert projects row (repoPath, repoOwner, repoName, repoUrl, remoteName)
HostRouter->>Client: return { repoPath }
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Greptile SummaryThis PR adds a Key changes:
Two P1 issues in Confidence Score: 3/5Safe to merge after fixing the two P1 bugs in The core import-mode logic is sound and well-validated. Clone mode has two concrete gaps: it skips the
Important Files Changed
Sequence DiagramsequenceDiagram
participant Desktop
participant HostService as host-service (tRPC)
participant Cloud as Cloud API
participant LocalSQLite as Local SQLite
participant FS as Filesystem / Git
Desktop->>HostService: project.setup({ projectId, mode, localPath })
HostService->>Cloud: v2Project.get({ organizationId, id })
Cloud-->>HostService: { repoCloneUrl }
HostService->>HostService: extractGitHubSlug(repoCloneUrl) → expectedSlug
alt mode = import
HostService->>FS: existsSync + statSync(localPath)
HostService->>FS: git rev-parse --show-toplevel
FS-->>HostService: gitRoot
HostService->>FS: git remote -v (on gitRoot)
FS-->>HostService: remotes map
HostService->>HostService: findMatchingRemote(remotes, expectedSlug)
note over HostService: throws BAD_REQUEST if no match
else mode = clone
HostService->>FS: existsSync(parentDir)
HostService->>FS: simpleGit().clone(repoCloneUrl, targetPath)
FS-->>HostService: clone complete
end
HostService->>FS: git remote -v (on repoPath)
FS-->>HostService: remotes + matchingRemote
HostService->>LocalSQLite: INSERT projects ON CONFLICT DO UPDATE
LocalSQLite-->>HostService: ok
HostService-->>Desktop: { repoPath }
Reviews (1): Last reviewed commit: "feat(host-service): add project.setup en..." | Re-trigger Greptile |
| ): Promise<string> { | ||
| if (!existsSync(parentDir)) { | ||
| throw new TRPCError({ | ||
| code: "BAD_REQUEST", | ||
| message: `Parent directory does not exist: ${parentDir}`, | ||
| }); | ||
| } | ||
|
|
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
|
|
||
| await simpleGit().clone(repoCloneUrl, targetPath); | ||
|
|
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
| export function extractGitHubSlug(remoteUrl: string): string | null { | ||
| // SSH format: git@github.com:owner/repo.git | ||
| const sshMatch = remoteUrl.match( | ||
| /^[\w.-]+@github\.com:([^/]+\/[^/]+?)(?:\.git)?$/, | ||
| ); | ||
| if (sshMatch?.[1]) return sshMatch[1]; | ||
|
|
||
| // HTTPS format: https://github.com/owner/repo.git | ||
| const httpsMatch = remoteUrl.match( | ||
| /^https?:\/\/github\.com\/([^/]+\/[^/]+?)(?:\.git)?$/, | ||
| ); | ||
| if (httpsMatch?.[1]) return httpsMatch[1]; | ||
|
|
||
| return null; | ||
| } |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
| // Extract repo metadata from the resolved path | ||
| const git = simpleGit(repoPath); | ||
| const remotes = await getAllRemoteUrls(git); | ||
| const matchingRemote = findMatchingRemote(remotes, expectedSlug); | ||
| const remoteUrl = matchingRemote | ||
| ? remotes.get(matchingRemote) | ||
| : undefined; | ||
| const repoFullName = remoteUrl | ||
| ? extractGitHubSlug(remoteUrl) | ||
| : expectedSlug; | ||
| const [repoOwner, repoName] = repoFullName?.split("/") ?? []; |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
🧹 Preview Cleanup CompleteThe following preview resources have been cleaned up:
Thank you for your contribution! 🎉 |
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/design/v2-host-project-paths.md`:
- Around line 49-56: The fenced code blocks containing the ASCII tree/flow
diagrams (e.g., the block showing "host-service local SQLite" and other similar
blocks) are missing language identifiers and trigger markdownlint MD040; edit
each fenced block (including the ones referenced later in the file) to add the
language tag "text" after the opening ``` so they read ```text and save—this
will satisfy markdownlint without changing content.
- Around line 173-177: Update the "Validation for 'Use existing directory'"
wording to not require a literal .git directory; instead state that validation
should use git rev-parse --show-toplevel (as used by the host-service import
path validation and project.setup) so linked worktrees (a .git file) and paths
inside a repo are accepted; replace the bullet "Contains a `.git` folder (is a
git repo)" with a line that explains checking repo root via git rev-parse and
matching remote URL, and add a note that either a .git folder or a .git file
(worktree) is valid.
In `@packages/host-service/src/trpc/router/project/project.ts`:
- Line 2: importExistingRepo() already stores an absolute gitRoot but
cloneRepo() persists the client's raw localPath, making projects.repoPath
cwd-dependent and fragile; change cloneRepo() to resolve and persist an absolute
path (use path.resolve/localPath -> absolute) and before cloning validate that
the parent directory is an actual directory (use
fs.statSync(parentDir).isDirectory() or fs.lstatSync(...).isDirectory()) rather
than existsSync(parentDir). Also update any code that constructs parentDir
(references to basename/join) to compute parent = path.dirname(resolvedPath) so
validation and persisted repoPath are consistent; ensure projects.repoPath is
always the resolved absolute path.
- Around line 64-81: The code currently persists and returns raw git remotes
(repoUrl) which may contain embedded credentials; update the logic around
repoFullName/remoteUrl (the remotes Map, matchingRemote variable,
extractGitHubSlug usage, and the ctx.db.insert into projects where repoUrl is
set) to sanitize remoteUrl before any use: strip userinfo/credentials from the
URL (or replace with a credentialless form like only scheme+host+path) and store
only the sanitized version in projects.repoUrl, and update any error/mismatch
messages (the mismatch handling around lines referenced also) to use the
sanitizedRemote instead of the raw remote; ensure extractGitHubSlug still
receives a safe value (or derive slug from sanitized URL) so no raw credentials
are logged or returned.
In `@packages/host-service/src/trpc/router/project/utils/git-remote.ts`:
- Around line 14-18: The current loop that splits output and uses
/^(\S+)\s+(\S+)\s+\(fetch\)$/ is too narrow and brittle; update the parsing in
the block that iterates over output (the variable named output, iterating with
for (const line of ...), populating remotes) to split lines with a CRLF-safe
split (e.g. split by /\r?\n/) and accept additional remote URL forms (ssh://...,
https://..., optional trailing slash, optional .git) by broadening the regex or,
better, reuse the existing parseGitHubRemote logic: call or import
parseGitHubRemote (from parse-github-remote) to normalize match[2] values before
inserting into remotes, ensuring both fetch and push lines are handled and
normalized so project.setup has all valid GitHub remote forms.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 84d44628-5072-464a-9f9e-7a4c729b26df
📒 Files selected for processing (3)
docs/design/v2-host-project-paths.mdpackages/host-service/src/trpc/router/project/project.tspackages/host-service/src/trpc/router/project/utils/git-remote.ts
| ``` | ||
| host-service local SQLite | ||
| └── projects | ||
| ├── id text PK (matches cloud v2_projects.id) | ||
| ├── repoPath text NOT NULL ← this is the path mapping | ||
| ├── repoProvider, repoOwner, repoName, repoUrl, remoteName | ||
| └── createdAt | ||
| ``` |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
There was a problem hiding this comment.
3 issues found across 3 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/host-service/src/trpc/router/project/utils/git-remote.ts">
<violation number="1" location="packages/host-service/src/trpc/router/project/utils/git-remote.ts:44">
P2: `ssh://` URI-format remotes are not handled. GitHub supports both SCP-style (`git@github.com:org/repo.git`) and full SSH URI (`ssh://git@github.com/org/repo.git`). The latter won't match either regex, causing valid repos to fail import validation with a misleading "no remote matches" error.
Add a third pattern between the SSH and HTTPS checks:
```typescript
const sshUriMatch = remoteUrl.match(
/^ssh:\/\/[\w.-]+@github\.com\/([^/]+\/[^/]+?)(?:\.git)?$/,
);
if (sshUriMatch?.[1]) return sshUriMatch[1];
```</violation>
</file>
<file name="packages/host-service/src/trpc/router/project/project.ts">
<violation number="1" location="packages/host-service/src/trpc/router/project/project.ts:197">
P2: Validate that `parentDir` is a directory (not just existing) before cloning, so invalid file paths return a controlled `BAD_REQUEST` error.</violation>
<violation number="2" location="packages/host-service/src/trpc/router/project/project.ts:214">
P2: Wrap the `clone()` call in a try/catch to throw a `TRPCError` with `BAD_REQUEST`. Currently a network timeout, auth failure, or disk-full error propagates as a raw `GitError`, which tRPC surfaces as a generic `INTERNAL_SERVER_ERROR` with no actionable message for the client.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
- Reuse existing parseGitHubRemote helper instead of custom regex, adds ssh:// URI support and credential-safe URL storage - Validate parentDir is a directory in cloneRepo, not just exists - Wrap simpleGit().clone() in try/catch for actionable error messages - Return remotes from importExistingRepo to avoid double git remote -v - Use path.resolve for absolute clone paths - Fix design doc: git rev-parse instead of .git folder check
There was a problem hiding this comment.
🧹 Nitpick comments (1)
packages/host-service/src/trpc/router/project/project.ts (1)
238-243: Consider cleanup on post-clone verification failure.If the cloned repository doesn't match the expected GitHub remote (which would indicate a bug or misconfiguration), the cloned directory at
targetPathremains on disk. This is a rare edge case, but could leave orphaned directories.💡 Optional: Clean up on failure
if (!matchingRemote) { + // Clean up the mismatched clone + try { + rmSync(targetPath, { recursive: true, force: true }); + } catch { + // Best-effort cleanup + } throw new TRPCError({ code: "INTERNAL_SERVER_ERROR", message: "Cloned repo does not match expected GitHub remote", }); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/host-service/src/trpc/router/project/project.ts` around lines 238 - 243, When the post-clone verification fails (the existing check using matchingRemote) ensure the cloned directory at targetPath is cleaned up before throwing the TRPCError: call the appropriate async remove/rm method (e.g., fs.promises.rm or a project-level util) to delete targetPath (recursively) and handle/log any removal errors, then throw the TRPCError as before; update the block around matchingRemote to perform cleanup first and only throw after attempted removal to avoid leaving orphaned clone directories.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@packages/host-service/src/trpc/router/project/project.ts`:
- Around line 238-243: When the post-clone verification fails (the existing
check using matchingRemote) ensure the cloned directory at targetPath is cleaned
up before throwing the TRPCError: call the appropriate async remove/rm method
(e.g., fs.promises.rm or a project-level util) to delete targetPath
(recursively) and handle/log any removal errors, then throw the TRPCError as
before; update the block around matchingRemote to perform cleanup first and only
throw after attempted removal to avoid leaving orphaned clone directories.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f21e442b-084e-43ee-b34d-9de235101173
📒 Files selected for processing (3)
docs/design/v2-host-project-paths.mdpackages/host-service/src/trpc/router/project/project.tspackages/host-service/src/trpc/router/project/utils/git-remote.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/host-service/src/trpc/router/project/utils/git-remote.ts
There was a problem hiding this comment.
1 issue found across 3 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/host-service/src/trpc/router/project/project.ts">
<violation number="1" location="packages/host-service/src/trpc/router/project/project.ts:238">
P2: When clone validation fails, the newly cloned directory is left behind, causing retries to fail with "Directory already exists".</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
e5735eb to
9eb7dc7
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/host-service/src/trpc/router/project/project.ts`:
- Around line 255-261: The code throws a TRPCError when
remotes.get(matchingRemote) returns undefined but does not remove the previously
created targetPath directory; update the error path to mirror the cleanup at
line 248 by deleting the targetPath before throwing. Locate the block around
remotes.get(matchingRemote) and the variables matchingRemote, parsed, targetPath
and ensure you call the same cleanup routine (or fs.rm/fs.rmdir used elsewhere)
to remove targetPath (and handle any errors silently) immediately prior to
throwing the TRPCError so no orphaned directories remain if
findMatchingRemote/parsed logic fails.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 87e68cef-0a09-4b40-8387-6b405d109a99
📒 Files selected for processing (1)
packages/host-service/src/trpc/router/project/project.ts
There was a problem hiding this comment.
🧹 Nitpick comments (1)
packages/host-service/src/trpc/router/project/project.ts (1)
26-26: Consider adding UUID validation forprojectId.The cloud API (
ctx.api.v2Project.get.query) expects a UUID format for the project ID (per context snippet 4:id: z.string().uuid()). Adding.uuid()here would provide earlier, clearer validation errors.- projectId: z.string(), + projectId: z.string().uuid(),🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/host-service/src/trpc/router/project/project.ts` at line 26, The projectId schema currently uses z.string() but the downstream API expects a UUID; update the validation where projectId is defined (the Zod input schema that contains projectId) to use z.string().uuid() so invalid IDs are rejected earlier and match ctx.api.v2Project.get.query's id: z.string().uuid() expectation; ensure any related types or usages (e.g., the router/method that consumes this schema) still compile after the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@packages/host-service/src/trpc/router/project/project.ts`:
- Line 26: The projectId schema currently uses z.string() but the downstream API
expects a UUID; update the validation where projectId is defined (the Zod input
schema that contains projectId) to use z.string().uuid() so invalid IDs are
rejected earlier and match ctx.api.v2Project.get.query's id: z.string().uuid()
expectation; ensure any related types or usages (e.g., the router/method that
consumes this schema) still compile after the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a25f9eca-c0e2-4998-b7a9-b71e48c118c8
📒 Files selected for processing (1)
packages/host-service/src/trpc/router/project/project.ts
…ment (superset-sh#3345) * docs: add design doc for v2 host project paths Outlines the approach for allowing users to import existing local repos or clone to a chosen location instead of auto-cloning to a fixed path. Covers local-only storage decision, throw-on-create pattern, and phased implementation plan. * feat(host-service): add project.setup endpoint for importing/cloning repos Adds a `project.setup` mutation that lets users either import an existing local repo or clone to a chosen directory. Validates git remotes against the cloud project's GitHub repo (handles SSH + HTTPS). Upserts the local projects table so re-running handles re-pointing. * fix(host-service): address PR review feedback on project.setup - Reuse existing parseGitHubRemote helper instead of custom regex, adds ssh:// URI support and credential-safe URL storage - Validate parentDir is a directory in cloneRepo, not just exists - Wrap simpleGit().clone() in try/catch for actionable error messages - Return remotes from importExistingRepo to avoid double git remote -v - Use path.resolve for absolute clone paths - Fix design doc: git rev-parse instead of .git folder check * fix(host-service): replace non-null assertions with explicit guards * fix(host-service): clean up cloned directory on post-clone validation failure * fix(host-service): add cleanup on all cloneRepo error paths
Summary
project.setupmutation to host-service that lets users import an existing local repo or clone to a chosen directory, instead of auto-cloning to~/.superset/repos/{id}Design decisions
projects.repoPath, no new cloud table neededworkspace.createwill throwPROJECT_NOT_SETUPinstead of auto-cloning, letting the client prompt for import/cloneproject.setupupserts, so re-running with import mode handles re-pointing (no separateupdatePathneeded)Test plan
project.setupwithmode: "import"pointing at an existing local repo — verify it validates git remotes and upserts the local project rowproject.setupwithmode: "import"on a repo with wrong remotes — verify it throws with mismatch detailsproject.setupwithmode: "clone"— verify it clones and upsertsproject.setuptwice on same project — verify upsert overwrites cleanlybun run typecheck --filter=@superset/host-service)Summary by cubic
Adds a
project.setupendpoint inhost-serviceto import an existing local repo or clone to a chosen directory, preventing duplicate checkouts. Validates GitHub remotes against the cloud project and stores the repo path locally.New Features
project.setupmutation withmode: "import" | "clone"; returns{ repoPath }.utils/git-remote.ts; uses sharedparseGitHubRemote(SSH/HTTPS andssh://support, credential-safe URL).projects.repoPathin local SQLite, allowing re-runs to re-point paths.workspace.createthrow pattern is deferred (auto-clone remains for now).Bug Fixes
path.resolve.simple-gitclone in try/catch for clearer error messages.git rev-parsefor repo detection.Written for commit c2b92ba. Summary will update on new commits.
Summary by CodeRabbit
Documentation
New Features