Skip to content

feat(host-service): host attachment store (v2 PR 2)#3916

Merged
Kitenite merged 4 commits intomainfrom
spangle-satellite
Apr 30, 2026
Merged

feat(host-service): host attachment store (v2 PR 2)#3916
Kitenite merged 4 commits intomainfrom
spangle-satellite

Conversation

@Kitenite
Copy link
Copy Markdown
Collaborator

@Kitenite Kitenite commented Apr 30, 2026

Summary

PR 2 of the canonical workspace.create() refactor. Adds host-scoped attachment storage so the renderer can upload files once (keyed by an opaque attachmentId) and reference them in later agent launches without re-uploading or shuttling bytes through workspace creation.

Plan: plans/20260425-host-attachments-pr2.md — added in this PR.
Umbrella design: #3893.

Backend

  • New attachments.upload and attachments.delete tRPC procedures.
  • Storage is per-org under <HOST_MANIFEST_DIR>/attachments/<id>/, matching where host.db lives. Clean GC when an org is removed; one isolation boundary across the host service.
  • Each attachment stored as <id>.<ext> + metadata.json sidecar (id, mediaType, originalFilename, sizeBytes, createdAt).
  • Type validation uses the mime-types lib to map MIME → extension — any MIME the lib recognizes is accepted (no curated allowlist; covers JSON/CSV/SVG/source files for free).
  • Hard-coded 25 MB per-file cap.
  • attachmentId is a UUID; delete enforces UUID input shape, which also blocks path-traversal via the id field.
  • delete is intentionally idempotent (silent success on missing) so cleanup callers don't need try/catch — different verb semantics than agentConfigs.remove.
  • Renderer never sees the on-disk path; attachmentId is the only stable identifier that crosses the wire.

Tests

9 tests against a temp dir injected via HOST_MANIFEST_DIR:

  • upload writes bytes + metadata to disk
  • correct extension chosen per MIME (txt, pdf, jpg, json)
  • unrecognized MIME rejected
  • empty payload rejected
  • oversized payload rejected
  • unique id per upload
  • delete removes the directory
  • delete is idempotent for unknown id
  • non-UUID id rejected (path traversal guard)

Out of scope (deferred)

  • Renderer state slice for tracking uploaded ids. Spec calls for it in PR2, but the only consumer is the new workspace modal in PR5. Building it now means it sits unused for two PRs — moving to PR5 to ship together with the modal that uses it.
  • Streaming/direct upload endpoint — base64-over-tRPC is the v1 transport per the plan.
  • Resolving attachmentId → host-readable path inside prompt assembly. Lives in PR4 (workspace.create()).

Test plan

  • bun test packages/host-service/src/trpc/router/attachments/ — 9/9 pass
  • bun run typecheck — clean
  • bun run lint — clean
  • Manual: upload a PNG via curl/dev console once renderer wires up in PR5

Adds dep

mime-types@3.x + @types/mime-types.

Summary by CodeRabbit

  • New Features

    • Added attachment upload and delete endpoints with UUID-based IDs, secure on-disk storage and metadata tracking.
    • File type validation for common image/document formats; 25 MB per-file upload limit enforced.
    • Uploads return media info and size; consecutive uploads yield distinct IDs. Delete is idempotent and rejects invalid IDs.
  • Tests

    • End-to-end tests covering upload/delete behavior and storage side effects.
  • Documentation

    • Planning doc describing interfaces, storage layout, and validation rules.

PR 2 of the canonical workspace.create() refactor — see
plans/20260425-canonical-workspace-create-flow.md.

Adds host-scoped attachment storage so the renderer can upload files
once (keyed only by an opaque attachmentId) and reference them in
later agent launches without re-uploading or shuttling bytes through
workspace creation.

Backend
- New `attachments.upload` and `attachments.delete` tRPC procedures.
- Storage is per-org under `<HOST_MANIFEST_DIR>/attachments/<id>/`,
  matching where `host.db` lives. This gives clean GC when an org is
  removed and one isolation boundary across the host service.
- Each attachment is stored as `<id>.<ext>` plus a `metadata.json`
  sidecar (id, mediaType, originalFilename, sizeBytes, createdAt).
- Type validation uses the `mime-types` lib to map MIME → extension —
  any MIME the lib recognizes is accepted (no curated allowlist).
- Hard-coded 25 MB per-file cap.
- `attachmentId` is a UUID; `delete` enforces UUID input shape, which
  also blocks path-traversal via the id field.
- `delete` is intentionally idempotent (silent success on missing) so
  cleanup callers don't need try/catch.
- Renderer never sees the on-disk path; `attachmentId` is the only
  stable identifier that crosses the wire.

Tests (9 tests against a temp dir as HOST_MANIFEST_DIR):
- upload writes bytes + metadata to disk
- correct extension chosen per MIME (txt, pdf, jpg, json)
- unrecognized MIME rejected
- empty payload rejected
- oversized payload rejected
- unique id per upload
- delete removes the directory
- delete is idempotent for unknown id
- non-UUID id rejected (path traversal guard)

Out of scope (deferred to later PRs):
- Renderer state slice for tracking uploaded ids (PR 2 follow-up; not
  needed until the new workspace modal in PR 5).
- Direct/streaming upload endpoint — base64-over-tRPC is the v1
  transport per the plan.
- Resolving attachmentId → host-readable path inside `workspace.create()`
  prompt assembly. That lives in PR 4.

Adds dep: `mime-types@3.x` + `@types/mime-types`.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 30, 2026

Caution

Review failed

Pull request was closed or merged during review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ac2b32e0-0788-4255-9e4a-c1649280b545

📥 Commits

Reviewing files that changed from the base of the PR and between d5db5d6 and 7779723.

📒 Files selected for processing (1)
  • packages/host-service/src/trpc/router/attachments/attachments.test.ts

📝 Walkthrough

Walkthrough

Adds a host-scoped attachments feature: new tRPC attachments router (upload/delete), filesystem-backed storage and metadata, MIME/type and size validation, UUID-based IDs, package export for attachments, and corresponding tests and docs.

Changes

Cohort / File(s) Summary
Package config
packages/host-service/package.json
Adds public export subpath ./attachments and runtime/dev deps for mime-types and @types/mime-types.
Router & API surface
packages/host-service/src/trpc/router/attachments/attachments.ts, packages/host-service/src/trpc/router/attachments/index.ts, packages/host-service/src/trpc/router/attachments/constants.ts
New attachmentsRouter with protected upload and delete mutations; adds AttachmentUploadResult type and MAX_ATTACHMENT_BYTES constant (25 MiB).
Storage utilities
packages/host-service/src/trpc/router/attachments/storage.ts
New persistence helpers and types: attachment root resolution (respects HOST_MANIFEST_DIR with fallback), MIME→extension mapping, path helpers, writeAttachment, and deleteAttachment; writes files and metadata.json with secure modes.
Router integration
packages/host-service/src/trpc/router/router.ts
Mounts the new attachments router into the root appRouter.
Tests
packages/host-service/src/trpc/router/attachments/attachments.test.ts
Adds Bun tests covering upload/delete flows, MIME handling, size/base64 validation, filesystem assertions, idempotency, and path-traversal rejection.
Design doc
plans/20260425-host-attachments-pr2.md
New PR plan describing API, on-disk layout, validation rules, testing strategy, and out-of-scope follow-ups.

Sequence Diagrams

sequenceDiagram
    participant Client
    participant TRPC_Server as "tRPC Server"
    participant Validator
    participant Storage
    participant FS as "Filesystem"

    Client->>TRPC_Server: upload(mediaType, base64Data, originalFilename)
    TRPC_Server->>Validator: resolve MIME → extension
    Validator-->>TRPC_Server: extension / error
    TRPC_Server->>Validator: pre-decode size check (<= MAX_ATTACHMENT_BYTES)
    Validator-->>TRPC_Server: size ok / error
    TRPC_Server->>Validator: decode base64 → bytes
    Validator-->>TRPC_Server: bytes / error
    TRPC_Server->>Storage: generate UUID, build metadata
    TRPC_Server->>Storage: writeAttachment(bytes, metadata)
    Storage->>FS: mkdir (mode 0700), write file & metadata (mode 0600)
    FS-->>Storage: success
    Storage-->>TRPC_Server: stored
    TRPC_Server-->>Client: AttachmentUploadResult{attachmentId,...}
Loading
sequenceDiagram
    participant Client
    participant TRPC_Server as "tRPC Server"
    participant Validator
    participant Storage
    participant FS as "Filesystem"

    Client->>TRPC_Server: delete(attachmentId)
    TRPC_Server->>Validator: validate UUID format
    Validator-->>TRPC_Server: ok / error
    TRPC_Server->>Storage: deleteAttachment(attachmentId)
    Storage->>FS: rimraf attachment directory (force)
    FS-->>Storage: removed / not found
    Storage-->>TRPC_Server: complete (idempotent)
    TRPC_Server-->>Client: Success response
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇 I hopped through code with careful paws,
New folders, bytes, and tiny laws.
UUIDs tucked in burrows neat,
Metadata snug, permissions meet,
Uploads, deletes — the warren hums.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat(host-service): host attachment store (v2 PR 2)' clearly describes the main feature being added—host attachment storage—and is specific and actionable.
Description check ✅ Passed The PR description is comprehensive and well-structured with clear sections on Summary, Backend implementation, Tests, deferred scope, and a test plan, addressing the template requirements.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch spangle-satellite

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 6/8 reviews remaining, refill in 12 minutes and 27 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

Captures the scope, public API, on-disk layout, validation, and
explicit out-of-scope list for this PR. Notes why storage is per-org
under HOST_MANIFEST_DIR and why there's no hand-curated MIME allowlist.
Renderer state slice is deferred to PR 5 (the modal that consumes it).
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 30, 2026

🚀 Preview Deployment

🔗 Preview Links

Service Status Link
Neon Database (Neon) View Branch
Vercel API (Vercel) Open Preview
Vercel Web (Vercel) Open Preview
Vercel Marketing (Vercel) Open Preview
Vercel Admin (Vercel) Open Preview
Vercel Docs (Vercel) Open Preview

Preview updates automatically with new commits

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented Apr 30, 2026

Greptile Summary

This PR adds host-scoped attachment storage to the host-service: two tRPC procedures (attachments.upload and attachments.delete) that accept base64-encoded files, persist them to <HOST_MANIFEST_DIR>/attachments/<uuid>/ with a metadata sidecar, and return an opaque attachmentId for use in later agent launches. The design is clean — UUID-keyed directories, narrow file permissions (0o700/0o600), MIME→extension mapping via mime-types, and idempotent delete — with all 9 unit tests passing.

Confidence Score: 5/5

Safe to merge; all findings are P2 style/improvement suggestions with no blocking correctness issues.

The three comments are all P2: a dead try/catch (misleading but not a security or crash risk), sync I/O (anti-pattern but not broken), and a missing aggregate quota (a hardening concern, not an immediate bug). The core logic — UUID generation, MIME validation, path construction, delete idempotency, and per-org isolation — is correct and well-tested.

packages/host-service/src/trpc/router/attachments/attachments.ts and storage.ts have the minor issues worth addressing in a follow-up.

Important Files Changed

Filename Overview
packages/host-service/src/trpc/router/attachments/attachments.ts Core tRPC router for upload/delete; dead try/catch on base64 decode and no aggregate storage quota are the two notable gaps.
packages/host-service/src/trpc/router/attachments/storage.ts Disk I/O helpers with correct permissions (0o700 dir, 0o600 files); synchronous writeFileSync/mkdirSync block the event loop for uploads up to 25 MB.
packages/host-service/src/trpc/router/attachments/attachments.test.ts 9 well-structured tests covering the happy path, MIME validation, size limits, uniqueness, delete idempotency, and UUID path-traversal guard; good temp-dir isolation.
packages/host-service/src/trpc/router/attachments/constants.ts Single constant for the 25 MB per-file cap; straightforward.
packages/host-service/src/trpc/router/attachments/index.ts Clean barrel re-export of public types and the router.
packages/host-service/src/trpc/router/router.ts Registers the new attachmentsRouter alphabetically in the app router; no issues.

Sequence Diagram

sequenceDiagram
    participant Renderer
    participant tRPC as tRPC (attachmentsRouter)
    participant FS as File System (HOST_MANIFEST_DIR)

    Renderer->>tRPC: attachments.upload({ data, mediaType, originalFilename })
    tRPC->>tRPC: Validate MIME via mime-types.extension()
    tRPC->>tRPC: Decode base64 → Buffer
    tRPC->>tRPC: Check size ≤ 25 MB
    tRPC->>tRPC: Generate attachmentId (UUID)
    tRPC->>FS: mkdirSync attachments/uuid/ (0o700)
    tRPC->>FS: writeFileSync uuid.ext (0o600)
    tRPC->>FS: writeFileSync metadata.json (0o600)
    tRPC-->>Renderer: { attachmentId, mediaType, originalFilename, sizeBytes }

    Renderer->>tRPC: attachments.delete({ attachmentId: UUID })
    tRPC->>tRPC: Validate UUID format (path-traversal guard)
    tRPC->>FS: rmSync attachments/uuid/ { force: true }
    tRPC-->>Renderer: { success: true }
Loading
Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 3
packages/host-service/src/trpc/router/attachments/attachments.ts:36-44
**Dead try/catch — `Buffer.from` never throws on invalid base64**

In Node.js and Bun, `Buffer.from(str, 'base64')` does not throw for malformed input — it silently skips unrecognized characters and returns whatever valid bytes it can extract. The `catch` block (and the `"Attachment data is not valid base64"` error message) are therefore dead code that can never execute. Invalid base64 sent by a client will pass silently and be stored as corrupt bytes, with no error surfaced.

A lightweight check like verifying the string matches a base64 pattern before decoding would make the validation real, or at minimum the misleading `try/catch` should be removed so future readers don't assume it provides a validation guarantee.

### Issue 2 of 3
packages/host-service/src/trpc/router/attachments/storage.ts:60-81
**Synchronous file I/O in a mutation handler blocks the event loop**

`mkdirSync` and `writeFileSync` are synchronous, blocking calls. Writing up to 25 MB synchronously in a tRPC mutation handler will stall the entire event loop for the duration of the disk write, preventing any other requests from being handled in the meantime. Consider using the async `fs/promises` equivalents (`mkdir`, `writeFile`) and making `writeAttachment` an `async` function.

### Issue 3 of 3
packages/host-service/src/trpc/router/attachments/attachments.ts:51-56
**No per-org storage quota across uploads**

Each individual upload is capped at 25 MB, but there is no limit on how many uploads an authenticated user can make. An authenticated renderer could exhaust available disk space by making many sequential upload calls. A simple guard — counting existing attachment directories or accumulating `sizeBytes` from stored metadata — would bound the total org footprint.

Reviews (1): Last reviewed commit: "docs(plans): add PR 2 plan for host atta..." | Re-trigger Greptile

Comment on lines +36 to +44
let bytes: Buffer;
try {
bytes = Buffer.from(input.data.data, "base64");
} catch {
throw new TRPCError({
code: "BAD_REQUEST",
message: "Attachment data is not valid base64",
});
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Dead try/catch — Buffer.from never throws on invalid base64

In Node.js and Bun, Buffer.from(str, 'base64') does not throw for malformed input — it silently skips unrecognized characters and returns whatever valid bytes it can extract. The catch block (and the "Attachment data is not valid base64" error message) are therefore dead code that can never execute. Invalid base64 sent by a client will pass silently and be stored as corrupt bytes, with no error surfaced.

A lightweight check like verifying the string matches a base64 pattern before decoding would make the validation real, or at minimum the misleading try/catch should be removed so future readers don't assume it provides a validation guarantee.

Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/host-service/src/trpc/router/attachments/attachments.ts
Line: 36-44

Comment:
**Dead try/catch — `Buffer.from` never throws on invalid base64**

In Node.js and Bun, `Buffer.from(str, 'base64')` does not throw for malformed input — it silently skips unrecognized characters and returns whatever valid bytes it can extract. The `catch` block (and the `"Attachment data is not valid base64"` error message) are therefore dead code that can never execute. Invalid base64 sent by a client will pass silently and be stored as corrupt bytes, with no error surfaced.

A lightweight check like verifying the string matches a base64 pattern before decoding would make the validation real, or at minimum the misleading `try/catch` should be removed so future readers don't assume it provides a validation guarantee.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +60 to +81
export function writeAttachment(
bytes: Uint8Array,
metadata: AttachmentMetadata,
baseDirOverride?: string,
): void {
const dir = getAttachmentDir(metadata.attachmentId, baseDirOverride);
mkdirSync(dir, { recursive: true, mode: 0o700 });
writeFileSync(
getAttachmentFilePath(
metadata.attachmentId,
metadata.mediaType,
baseDirOverride,
),
bytes,
{ mode: 0o600 },
);
writeFileSync(
getAttachmentMetadataPath(metadata.attachmentId, baseDirOverride),
JSON.stringify(metadata, null, 2),
{ mode: 0o600 },
);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Synchronous file I/O in a mutation handler blocks the event loop

mkdirSync and writeFileSync are synchronous, blocking calls. Writing up to 25 MB synchronously in a tRPC mutation handler will stall the entire event loop for the duration of the disk write, preventing any other requests from being handled in the meantime. Consider using the async fs/promises equivalents (mkdir, writeFile) and making writeAttachment an async function.

Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/host-service/src/trpc/router/attachments/storage.ts
Line: 60-81

Comment:
**Synchronous file I/O in a mutation handler blocks the event loop**

`mkdirSync` and `writeFileSync` are synchronous, blocking calls. Writing up to 25 MB synchronously in a tRPC mutation handler will stall the entire event loop for the duration of the disk write, preventing any other requests from being handled in the meantime. Consider using the async `fs/promises` equivalents (`mkdir`, `writeFile`) and making `writeAttachment` an `async` function.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +51 to +56
if (bytes.length > MAX_ATTACHMENT_BYTES) {
throw new TRPCError({
code: "PAYLOAD_TOO_LARGE",
message: `Attachment exceeds ${MAX_ATTACHMENT_BYTES} bytes`,
});
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 No per-org storage quota across uploads

Each individual upload is capped at 25 MB, but there is no limit on how many uploads an authenticated user can make. An authenticated renderer could exhaust available disk space by making many sequential upload calls. A simple guard — counting existing attachment directories or accumulating sizeBytes from stored metadata — would bound the total org footprint.

Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/host-service/src/trpc/router/attachments/attachments.ts
Line: 51-56

Comment:
**No per-org storage quota across uploads**

Each individual upload is capped at 25 MB, but there is no limit on how many uploads an authenticated user can make. An authenticated renderer could exhaust available disk space by making many sequential upload calls. A simple guard — counting existing attachment directories or accumulating `sizeBytes` from stored metadata — would bound the total org footprint.

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/host-service/src/trpc/router/attachments/attachments.test.ts`:
- Around line 92-107: The test never triggers the post-decode empty branch
(bytes.length === 0) in attachments.ts — either rename the test to reflect it
only checks that non-empty base64 ("AA==") is accepted and that Zod rejects an
empty raw string, or add a new assertion that actually exercises the
decoded-empty path by calling caller.upload with a non-empty-but-invalid-base64
payload that passes the Zod min(1) check and decodes to zero bytes (e.g., data:
{ kind: "base64", data: " " } or similar, casting to any if needed) so the code
path that checks bytes.length === 0 in attachments.ts runs and throws.

In `@packages/host-service/src/trpc/router/attachments/attachments.ts`:
- Around line 36-38: Reject the base64 payload by estimating decoded size from
input.data.data.length before calling Buffer.from to prevent decoding oversized
uploads (enforce the 25 MB cap early), then decode only after the size check;
after decoding, avoid the extra copy by passing the Buffer directly to
writeAttachment (do not call new Uint8Array(bytes)). Apply the same change
pattern to the other instances that call Buffer.from and new Uint8Array in this
file (the additional spots around the calls to writeAttachment and the other
Buffer.from usages).

In `@packages/host-service/src/trpc/router/attachments/storage.ts`:
- Around line 23-29: The getAttachmentsRoot function currently uses
process.env.HOST_MANIFEST_DIR?.trim() which still treats an empty string as set;
update getAttachmentsRoot to treat a blank HOST_MANIFEST_DIR as unset by first
capturing and trimming the env var (e.g., const manifest =
process.env.HOST_MANIFEST_DIR?.trim()) and then using baseDirOverride ??
(manifest && manifest.length ? manifest : join(homedir(), ".superset", "host",
"standalone")); ensure you reference the same function name getAttachmentsRoot
and the baseDirOverride parameter so blank env values fall back to the
standalone host path.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7f0c3202-c482-4020-a951-dda0f9f79f60

📥 Commits

Reviewing files that changed from the base of the PR and between 51c4035 and adfb4d4.

⛔ Files ignored due to path filters (1)
  • bun.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
  • packages/host-service/package.json
  • packages/host-service/src/trpc/router/attachments/attachments.test.ts
  • packages/host-service/src/trpc/router/attachments/attachments.ts
  • packages/host-service/src/trpc/router/attachments/constants.ts
  • packages/host-service/src/trpc/router/attachments/index.ts
  • packages/host-service/src/trpc/router/attachments/storage.ts
  • packages/host-service/src/trpc/router/router.ts
  • plans/20260425-host-attachments-pr2.md

Comment thread packages/host-service/src/trpc/router/attachments/attachments.test.ts Outdated
Comment thread packages/host-service/src/trpc/router/attachments/attachments.ts Outdated
Comment thread packages/host-service/src/trpc/router/attachments/storage.ts
attachments.ts:
- Reject oversized payloads BEFORE Buffer.from allocates the decoded
  buffer. Adds estimateDecodedBase64Bytes(); a 1GB base64 string no
  longer spikes ~750MB of host memory before being rejected.
- Drop the dead try/catch around Buffer.from. Node/Bun's base64 decode
  silently skips invalid characters and never throws — the catch was
  misleading dead code. Decoded-empty bytes are still caught by the
  bytes.length === 0 branch.
- Stop wrapping `bytes` in `new Uint8Array(...)` before writeAttachment.
  Buffer already extends Uint8Array; the wrap was a redundant copy.

storage.ts:
- Treat blank/whitespace-only HOST_MANIFEST_DIR as unset. The previous
  `?.trim() ?? fallback` preserved empty strings, which would have
  resolved to `attachments/` (relative to CWD).

Tests:
- Split the conflated empty-payload test into three distinct cases:
  single-byte accept, schema-layer reject of empty input string, and
  the previously-uncovered decoded-empty branch (input `"="` passes
  zod min(1) but Buffer.from returns 0 bytes).
- New oversized test exercises the pre-decode estimator instead of
  allocating a real 25MB buffer in test memory.
- New regression tests for blank and whitespace-only HOST_MANIFEST_DIR
  fallback to ~/.superset/host/standalone/.

Plan doc: noted per-org storage quota as a v1 follow-up (matches the
review feedback we're explicitly deferring).
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
packages/host-service/src/trpc/router/attachments/attachments.test.ts (1)

53-61: ⚡ Quick win

Assert the stored bytes too.

This happy-path test only proves that the file exists and that metadata.json was written. A truncated or corrupted payload would still pass. Read filePath back and compare it to the decoded upload bytes.

🧪 Suggested assertion
 		expect(existsSync(filePath)).toBe(true);
 		expect(existsSync(metaPath)).toBe(true);
+		expect(readFileSync(filePath)).toEqual(Buffer.from(PNG_BASE64, "base64"));
 
 		const metadata = JSON.parse(readFileSync(metaPath, "utf8"));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/host-service/src/trpc/router/attachments/attachments.test.ts` around
lines 53 - 61, The test currently only checks that filePath and metaPath exist
and metadata fields; extend it to verify the stored file contents match the
original upload bytes by reading filePath (e.g., using readFileSync) and
comparing the resulting Buffer/byte array to the decoded upload payload used
earlier in the test (the variable containing the upload bytes or Buffer you sent
when calling the attachment upload helper and the result object). Update
attachments.test.ts to read filePath, decode the original upload bytes the same
way they were encoded for upload, and assert strict equality (byte-for-byte)
between the two buffers to catch truncated or corrupted writes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@packages/host-service/src/trpc/router/attachments/attachments.test.ts`:
- Around line 53-61: The test currently only checks that filePath and metaPath
exist and metadata fields; extend it to verify the stored file contents match
the original upload bytes by reading filePath (e.g., using readFileSync) and
comparing the resulting Buffer/byte array to the decoded upload payload used
earlier in the test (the variable containing the upload bytes or Buffer you sent
when calling the attachment upload helper and the result object). Update
attachments.test.ts to read filePath, decode the original upload bytes the same
way they were encoded for upload, and assert strict equality (byte-for-byte)
between the two buffers to catch truncated or corrupted writes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 33dfa99a-894d-4108-b93f-07bc935e7eb3

📥 Commits

Reviewing files that changed from the base of the PR and between adfb4d4 and d5db5d6.

📒 Files selected for processing (4)
  • packages/host-service/src/trpc/router/attachments/attachments.test.ts
  • packages/host-service/src/trpc/router/attachments/attachments.ts
  • packages/host-service/src/trpc/router/attachments/storage.ts
  • plans/20260425-host-attachments-pr2.md
✅ Files skipped from review due to trivial changes (1)
  • packages/host-service/src/trpc/router/attachments/storage.ts

Read the persisted file back and compare to the decoded upload bytes
so a truncated or corrupted write would fail the happy-path test.
@Kitenite Kitenite merged commit 2f44f08 into main Apr 30, 2026
13 of 14 checks passed
@Kitenite Kitenite deleted the spangle-satellite branch April 30, 2026 21:52
MocA-Love pushed a commit to MocA-Love/superset that referenced this pull request May 8, 2026
* feat(host-service): host attachment store (v2 PR 2)

PR 2 of the canonical workspace.create() refactor — see
plans/20260425-canonical-workspace-create-flow.md.

Adds host-scoped attachment storage so the renderer can upload files
once (keyed only by an opaque attachmentId) and reference them in
later agent launches without re-uploading or shuttling bytes through
workspace creation.

Backend
- New `attachments.upload` and `attachments.delete` tRPC procedures.
- Storage is per-org under `<HOST_MANIFEST_DIR>/attachments/<id>/`,
  matching where `host.db` lives. This gives clean GC when an org is
  removed and one isolation boundary across the host service.
- Each attachment is stored as `<id>.<ext>` plus a `metadata.json`
  sidecar (id, mediaType, originalFilename, sizeBytes, createdAt).
- Type validation uses the `mime-types` lib to map MIME → extension —
  any MIME the lib recognizes is accepted (no curated allowlist).
- Hard-coded 25 MB per-file cap.
- `attachmentId` is a UUID; `delete` enforces UUID input shape, which
  also blocks path-traversal via the id field.
- `delete` is intentionally idempotent (silent success on missing) so
  cleanup callers don't need try/catch.
- Renderer never sees the on-disk path; `attachmentId` is the only
  stable identifier that crosses the wire.

Tests (9 tests against a temp dir as HOST_MANIFEST_DIR):
- upload writes bytes + metadata to disk
- correct extension chosen per MIME (txt, pdf, jpg, json)
- unrecognized MIME rejected
- empty payload rejected
- oversized payload rejected
- unique id per upload
- delete removes the directory
- delete is idempotent for unknown id
- non-UUID id rejected (path traversal guard)

Out of scope (deferred to later PRs):
- Renderer state slice for tracking uploaded ids (PR 2 follow-up; not
  needed until the new workspace modal in PR 5).
- Direct/streaming upload endpoint — base64-over-tRPC is the v1
  transport per the plan.
- Resolving attachmentId → host-readable path inside `workspace.create()`
  prompt assembly. That lives in PR 4.

Adds dep: `mime-types@3.x` + `@types/mime-types`.

* docs(plans): add PR 2 plan for host attachment store

Captures the scope, public API, on-disk layout, validation, and
explicit out-of-scope list for this PR. Notes why storage is per-org
under HOST_MANIFEST_DIR and why there's no hand-curated MIME allowlist.
Renderer state slice is deferred to PR 5 (the modal that consumes it).

* fix(host-service): address review tier 1 on attachments

attachments.ts:
- Reject oversized payloads BEFORE Buffer.from allocates the decoded
  buffer. Adds estimateDecodedBase64Bytes(); a 1GB base64 string no
  longer spikes ~750MB of host memory before being rejected.
- Drop the dead try/catch around Buffer.from. Node/Bun's base64 decode
  silently skips invalid characters and never throws — the catch was
  misleading dead code. Decoded-empty bytes are still caught by the
  bytes.length === 0 branch.
- Stop wrapping `bytes` in `new Uint8Array(...)` before writeAttachment.
  Buffer already extends Uint8Array; the wrap was a redundant copy.

storage.ts:
- Treat blank/whitespace-only HOST_MANIFEST_DIR as unset. The previous
  `?.trim() ?? fallback` preserved empty strings, which would have
  resolved to `attachments/` (relative to CWD).

Tests:
- Split the conflated empty-payload test into three distinct cases:
  single-byte accept, schema-layer reject of empty input string, and
  the previously-uncovered decoded-empty branch (input `"="` passes
  zod min(1) but Buffer.from returns 0 bytes).
- New oversized test exercises the pre-decode estimator instead of
  allocating a real 25MB buffer in test memory.
- New regression tests for blank and whitespace-only HOST_MANIFEST_DIR
  fallback to ~/.superset/host/standalone/.

Plan doc: noted per-org storage quota as a v1 follow-up (matches the
review feedback we're explicitly deferring).

* test(host-service): assert stored attachment bytes match upload

Read the persisted file back and compare to the decoded upload bytes
so a truncated or corrupted write would fail the happy-path test.
MocA-Love pushed a commit to MocA-Love/superset that referenced this pull request May 8, 2026
Recorded as integrated via -s ours after batch PRs #455-#464.

Taken via individual PRs:
- PR  1 (#455): v2 polish 前半 safe set (9 commits)
- PR  2 (#456): v2/host-service polish 中盤 (12 commits)
- PR  3 (#457): sidebar polish + jwt API (5 commits)
- PR  4 (#458): host-service tRPC retry/cache/timeout (3 commits)
- PR  5 (#459): v2 diff pane / file pane polish (2 commits)
- PR  7 (#462): host-service v2 canonical workspace.create + attachment store (PR1 superset-sh#3893 + PR2 superset-sh#3916)
- PR 11 (#463): agents API + onboarding (7 commits + 1 cleanup)
- PR 12 (#464): v1→v2 import flow rewrite (11 commits + 2 follow-ups)
- PR 13 (#460): host-service shell env probe + typo (2 commits)
- PR 16 (#461): marketplace 19 themes (1 commit)

Skipped / deferred (recorded as integrated for behind=0):
- PR  6: CLI v1 launch (superset-sh#3898 + 30+ CLI/SDK followups) — defer to dedicated migration
- PR  9: v2 PR3 (superset-sh#3940) + revert (superset-sh#4017) — net-zero pair
- PR 10: pty-daemon (superset-sh#3896, superset-sh#3971, superset-sh#4054) — fork keeps its terminal-host
- PR 14: Slack MCP-v2 (superset-sh#4197, superset-sh#4208) — depends on mcp-v2/sdk divergence
- PR 15: onboarding remaining (superset-sh#4115, superset-sh#4125, superset-sh#4214, superset-sh#4213, superset-sh#4222, superset-sh#4225) — depends on fork's deleted setup pages

Behind: 0 after this merge.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant