Skip to content

feat: gateway logs#3994

Merged
chronark merged 4 commits intomainfrom
feat/gw-logs
Sep 19, 2025
Merged

feat: gateway logs#3994
chronark merged 4 commits intomainfrom
feat/gw-logs

Conversation

@Flo4604
Copy link
Member

@Flo4604 Flo4604 commented Sep 18, 2025

What does this PR do?

This writes Gateway logs into the new v2 schema, and lets the api etc still write into v1 schema.
Not a breaking change in any way, just renamed the function.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Chore (refactoring code, technical debt, workflow improvements)
  • Enhancement (small improvements)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How should this be tested?

  • Test A
  • Test B

Checklist

Required

  • Filled out the "How to test" section in this PR
  • Read Contributing Guide
  • Self-reviewed my own code
  • Commented on my code in hard-to-understand areas
  • Ran pnpm build
  • Ran pnpm fmt
  • Checked for warnings, there are none
  • Removed all console.logs
  • Merged the latest changes from main onto my branch with git pull origin main
  • My changes don't cause any responsiveness issues

Appreciated

  • If a UI change was made: Added a screen recording or screenshots to this PR
  • Updated the Unkey Docs if changes were necessary

Summary by CodeRabbit

  • New Features

    • Metrics pipeline upgraded to a new event version (adds region and refines status/error fields).
    • Gateway now captures response bodies and statuses for richer analytics.
    • Deployment API adds a typed SourceType to classify upload sources.
  • Bug Fixes

    • Improved error capture and propagation for more accurate observability.
  • Chores

    • Standardized buffering APIs for the updated metrics flow.
    • Dev environment trigger behavior inverted: debug → manual, non-debug → auto.

@vercel
Copy link

vercel bot commented Sep 18, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

2 Skipped Deployments
Project Deployment Preview Comments Updated (UTC)
dashboard Ignored Ignored Preview Sep 19, 2025 10:16am
engineering Ignored Ignored Preview Sep 19, 2025 10:16am

@changeset-bot
Copy link

changeset-bot bot commented Sep 18, 2025

⚠️ No Changeset found

Latest commit: a219058

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 18, 2025

📝 Walkthrough

Walkthrough

Invert Tilt trigger_mode for dev vs prod; add ApiRequestV2 buffering while keeping V1 path; add request-body caching, response capture, and error recording to gateway Session; change routing GetConfig to return ConfigWithWorkspace (includes WorkspaceID); DB query now returns workspace_id; various proto formatting and minor refactors.

Changes

Cohort / File(s) Summary
Tilt trigger modes
go/Tiltfile
Invert trigger_mode: TRIGGER_MODE_MANUAL if debug, else TRIGGER_MODE_AUTO for api/gateway/ctrl (dashboard/metald unchanged).
ClickHouse client & API (V1/V2)
go/pkg/clickhouse/client.go, go/pkg/clickhouse/interface.go, go/pkg/clickhouse/noop.go, go/pkg/clickhouse/doc.go
Add ApiRequestV2 batcher and BufferApiRequest(ApiRequestV2); rename existing V1 public method to BufferRequest(ApiRequestV1); separate batch targets/configs.
Event buffer / Zen & API routes
go/pkg/zen/middleware_metrics.go, go/apps/api/routes/chproxy_metrics/handler.go, go/apps/api/routes/services.go
Rename EventBuffer API: BufferApiRequest(V1)BufferRequest(V1); update call sites; removed local EventBuffer interface declaration.
Gateway Session: body caching, response capture, error tracking
go/apps/gw/server/session.go, go/apps/gw/server/middleware_errors.go, go/apps/gw/server/middleware_metrics.go, go/apps/gw/server/server.go
Session.init now returns error and caches request body; add Session.error field, SetError(err), and CaptureResponseWriter() that captures status/body; middleware calls SetError; metrics middleware now buffers V2 payload and buffers whenever eventBuffer != nil; WrapHandler propagates init errors through middleware chain.
Gateway proxy: response capture & workspace propagation
go/apps/gw/router/gateway_proxy/handler.go
Use sess.CaptureResponseWriter() for forwarding and call capture func after Forward; obtain configWithWorkspace, set sess.WorkspaceID, and use its Config locally.
Routing: ConfigWithWorkspace & cache types
go/apps/gw/services/routing/types.go, go/apps/gw/services/routing/interface.go, go/apps/gw/services/routing/service.go, go/apps/gw/services/caches/caches.go
Introduce ConfigWithWorkspace{ Config *GatewayConfig; WorkspaceID string }; change GetConfig to return *ConfigWithWorkspace; update cache types and SWR loader to return composite type; unify not-found handling.
Partition DB: include workspace_id
go/pkg/partition/db/queries/gateway_find_config_by_hostname.sql, go/pkg/partition/db/gateway_find_config_by_hostname.sql_generated.go, go/pkg/partition/db/querier_generated.go
SQL now selects workspace_id; generated row struct adds WorkspaceID and scanning updated; doc comment updated.
Middleware metrics payload & behavior
go/apps/gw/server/middleware_metrics.go
Switch buffered payload to ApiRequestV2 (adds Region, ResponseStatus as int32, error sourced from s.error); remove X-Unkey-Metrics gating and unused fault import; buffer when eventBuffer != nil.
Protos: formatting & small API add
go/proto/** (multiple files)
Formatting/whitespace edits across many proto files; add enum SourceType to go/proto/ctrl/v1/deployment.proto.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant C as Client
  participant GW as Gateway Handler
  participant S as Session
  participant P as Proxy
  participant U as Upstream
  participant MM as Metrics Middleware
  participant CH as ClickHouse

  C->>GW: HTTP request
  GW->>S: init(r) — read & cache request body
  GW->>S: CaptureResponseWriter() → (captureWriter, captureFunc)
  GW->>P: Proxy.Forward(ctx, targetURL, captureWriter, req)
  P->>U: forward
  U-->>P: response
  P-->>GW: return
  GW->>S: captureFunc() — commit status/body into session
  alt handler error occurred
    GW->>S: SetError(err)
  end
  GW->>MM: WithMetrics (eventBuffer != nil)
  MM->>CH: BufferApiRequest(ApiRequestV2)
  CH-->>MM: buffered
  GW-->>C: Response (already written by capture wrapper)
Loading
sequenceDiagram
  autonumber
  participant Lib as Zen Middleware
  participant EB as EventBuffer
  participant CH as ClickHouse

  Lib->>EB: BufferRequest(ApiRequestV1)
  EB->>CH: BufferRequest(V1) -> batcher/flush
  CH-->>EB: ack
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

No Rabbit

Suggested reviewers

  • imeyer
  • chronark
  • perkinsjr

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The PR description contains a brief summary that it writes gateway logs to a v2 schema, but it does not follow the repository template: it omits a "Fixes #" issue reference, does not mark the appropriate "Type of change" checkbox, provides only placeholder test steps ("Test A"/"Test B") with no concrete verification instructions, and leaves the required checklist items unchecked, so reviewers lack necessary testing and compliance details. Because the template requires explicit testing instructions and checklist confirmations, the description is largely incomplete for a merge review. The summary is accurate but insufficient for validating correctness, risk, and release notes. Update the PR to follow the provided template: add the "Fixes #" or explain why there is no issue, check the correct Type of change, and replace "Test A/B" with concrete step-by-step test instructions (include commands, sample requests, and expected ClickHouse outcomes to verify v2 writes and that v1 behavior remains unchanged). Run and report build/format/tests (e.g., pnpm build, pnpm fmt, go test or equivalent), tick the required checklist items, and include any needed docs or migration notes so reviewers can validate behavior and risk. After these updates the description will be suitable for merge review.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The title "feat: gateway logs" is concise and directly related to the primary change in this PR (adding gateway logging into a new v2 schema and related buffering renames), so it accurately signals the main intent and is easy to scan in history. It is short and avoids noisy details while remaining relevant to the changeset. Consider adding a small clarifier like "v2" or "ClickHouse" if you want more specificity for future readers.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/gw-logs

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Contributor

github-actions bot commented Sep 18, 2025

Thank you for following the naming conventions for pull request titles! 🙏

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
go/pkg/clickhouse/client.go (1)

192-198: Shutdown closes only V1 batch; V2/other processors may drop data and leak goroutines.

Close all batch processors before closing the DB connection to avoid lost flushes.

Apply:

 func (c *clickhouse) Shutdown(ctx context.Context) error {
-	c.requests.Close()
-	err := c.conn.Close()
+	// Close all processors first so they can drain while the connection is alive.
+	c.requests.Close()
+	if c.apiRequests != nil {
+		c.apiRequests.Close()
+	}
+	if c.keyVerifications != nil {
+		c.keyVerifications.Close()
+	}
+	if c.ratelimits != nil {
+		c.ratelimits.Close()
+	}
+	// TODO: batch.Wait(ctx) would be preferable if available.
+	err := c.conn.Close()
 	if err != nil {
 		return fault.Wrap(err, fault.Internal("clickhouse couldn't shut down"))
 	}
 	return nil
 }
go/apps/gw/server/middleware_metrics.go (1)

33-39: Broaden header redaction (Authorization only is insufficient).

Also redact Cookie, Set-Cookie, Proxy-Authorization, X-API-Key, X-Client-Secret, etc.

- for k, vv := range s.r.Header {
-   if strings.ToLower(k) == "authorization" {
+ sensitiveReq := map[string]struct{}{
+   "authorization": {}, "proxy-authorization": {}, "cookie": {}, "x-api-key": {},
+   "x-client-secret": {}, "x-secret-key": {}, "x-unkey": {},
+ }
+ for k, vv := range s.r.Header {
+   if _, ok := sensitiveReq[strings.ToLower(k)]; ok {
      requestHeaders = append(requestHeaders, fmt.Sprintf("%s: %s", k, "[REDACTED]"))
    } else {
      requestHeaders = append(requestHeaders, fmt.Sprintf("%s: %s", k, strings.Join(vv, ",")))
    }
  }

Do the same for response headers below (at least Set-Cookie).

- for k, vv := range s.w.Header() {
-   responseHeaders = append(responseHeaders, fmt.Sprintf("%s: %s", k, strings.Join(vv, ",")))
- }
+ for k, vv := range s.w.Header() {
+   if strings.ToLower(k) == "set-cookie" {
+     responseHeaders = append(responseHeaders, fmt.Sprintf("%s: %s", k, "[REDACTED]"))
+   } else {
+     responseHeaders = append(responseHeaders, fmt.Sprintf("%s: %s", k, strings.Join(vv, ",")))
+   }
+ }
🧹 Nitpick comments (16)
go/Tiltfile (2)

268-269: Dashboard trigger mode inconsistent with the others.

Dashboard still uses AUTO in debug while api/gw/ctrl switched to MANUAL. If unintentional, align it.

Apply if desired:

-        trigger_mode=TRIGGER_MODE_AUTO if debug_mode else TRIGGER_MODE_MANUAL
+        trigger_mode=TRIGGER_MODE_MANUAL if debug_mode else TRIGGER_MODE_AUTO

291-299: Gateway port forward in help text is wrong (6060 vs 8080/8443).

Fix the message to match k8s_resource port_forwards.

Apply:

-API: http://localhost:7070
-Gateway: http://localhost:6060
+API: http://localhost:7070
+Gateway (HTTP):  http://localhost:8080
+Gateway (HTTPS): https://localhost:8443
 Ctrl: http://localhost:7091
 Prometheus: http://localhost:9090
 S3 Console: http://localhost:9000
 ClickHouse: http://localhost:8123
go/apps/gw/server/middleware_errors.go (1)

41-43: Good: capture original error for metrics.

Storing the raw error on session is useful for V2 metrics.

Consider logging only sanitized/public messages: current logger.Error includes err.Error(), which may leak sensitive upstream details. Prefer the user‑facing string plus a hashed/typed code.

go/pkg/zen/middleware_metrics.go (2)

90-96: Trim first X-Forwarded-For entry.

Sometimes entries are space‑prefixed; trim to avoid leading spaces in stored IPs.

Apply:

-                ips := strings.Split(s.r.Header.Get("X-Forwarded-For"), ",")
+                ips := strings.Split(s.r.Header.Get("X-Forwarded-For"), ",")
                 ipAddress := ""
                 if len(ips) > 0 {
-                    ipAddress = ips[0]
+                    ipAddress = strings.TrimSpace(ips[0])
                 }

96-116: V1 buffering event build: LGTM.

Redaction and label metrics intact; matches schema.ApiRequestV1.

Consider extending redact() to catch common fields like "token", "secret", "password".

go/pkg/clickhouse/noop.go (1)

15-16: Duplicate interface satisfaction line.

Two identical “var _ Bufferer = (*noop)(nil)” lines.

Apply:

 var _ Bufferer = (*noop)(nil)
-var _ Bufferer = (*noop)(nil)
go/apps/gw/router/gateway_proxy/handler.go (1)

71-77: Switch to capture writer: good; ensure capture always runs.

Defer captureFunc so it executes even if Forward returns early/panics.

Apply:

-    captureWriter, captureFunc := sess.CaptureResponseWriter()
-    err = h.Proxy.Forward(ctx, *targetURL, captureWriter, req)
-    // Capture the response data back to session after forwarding
-    captureFunc()
+    captureWriter, captureFunc := sess.CaptureResponseWriter()
+    defer captureFunc()
+    err = h.Proxy.Forward(ctx, *targetURL, captureWriter, req)

Note: consider truncation limits in the capture layer to avoid large body allocations (see session.go suggestion).

go/apps/gw/server/session.go (2)

156-166: Log message content is misleading.

Logs “gateway error” for all sends and labels request ID as “urn”. Fix label and only log on non‑2xx/3xx if that was intended.

Apply:

-    s.w.WriteHeader(status)
-    log.Printf("gateway error: %d urn: %s", status, s.RequestID())
+    s.w.WriteHeader(status)
+    lvl := "info"
+    if status >= 400 {
+        lvl = "error"
+    }
+    log.Printf("gateway %s: status=%d requestId=%s", lvl, status, s.RequestID())

185-209: Remove unused wrapResponseWriter

Type and methods are defined in go/apps/gw/server/session.go (lines 185–204) but never referenced; captureResponseWriter (lines 210–216) provides the needed behavior — remove wrapResponseWriter to avoid dead/confusing code.

go/pkg/clickhouse/interface.go (1)

17-24: Interface split V1/V2 — implementors verified; optional rename recommended.

  • Verified: interface defines BufferRequest(schema.ApiRequestV1) and BufferApiRequest(schema.ApiRequestV2); implementors updated in go/pkg/clickhouse/client.go and go/pkg/clickhouse/noop.go — no stale implementors.
  • Optional: rename for symmetry (e.g. BufferApiRequestV1/BufferApiRequestV2 or BufferRequestV1/BufferRequestV2); change go/pkg/clickhouse/interface.go and all implementors if applied.
go/pkg/clickhouse/client.go (3)

66-74: Unconditional ClickHouse driver debug logs risk leaking sensitive data in prod.

Gate opts.Debug behind config/env.

-	opts.Debug = true
-	opts.Debugf = func(format string, v ...any) {
-		config.Logger.Debug(fmt.Sprintf(format, v...))
-	}
+	if config.Logger != nil && shouldEnableCHDebug() { // e.g., env UNKEY_CH_DEBUG=true
+		opts.Debug = true
+		opts.Debugf = func(format string, v ...any) {
+			config.Logger.Debug(fmt.Sprintf(format, v...))
+		}
+	}

You can add:

func shouldEnableCHDebug() bool { return strings.ToLower(os.Getenv("UNKEY_CH_DEBUG")) == "true" }

99-116: Normalize log field keys for errors.

"err" vs "error" is inconsistent across flushers; pick one for log/metrics filters.

- "err", err.Error(),
+ "error", err.Error(),

Also applies to: 117-134, 143-151, 156-172


60-65: Validate DSN early.

Guard against empty/invalid URL and give a clearer error.

 func New(config Config) (*clickhouse, error) {
+	if strings.TrimSpace(config.URL) == "" {
+		return nil, fault.Internal("clickhouse DSN is empty")
+	}
 	opts, err := ch.ParseDSN(config.URL)
go/apps/gw/server/middleware_metrics.go (3)

55-63: X-Forwarded-For parsing is naïve.

Trim empty entries and ignore “unknown”. Consider falling back to s.r.RemoteAddr if s.Location() is empty.

 ips := strings.Split(s.r.Header.Get("X-Forwarded-For"), ",")
 ipAddress := ""
 if len(ips) > 0 {
-	ipAddress = strings.TrimSpace(ips[0])
+	for _, ip := range ips {
+		ip = strings.TrimSpace(ip)
+		if ip != "" && strings.ToLower(ip) != "unknown" {
+			ipAddress = ip
+			break
+		}
+	}
 }
 if ipAddress == "" {
 	ipAddress = s.Location()
+	if ipAddress == "" {
+		ipAddress = s.r.RemoteAddr
+	}
 }

90-98: getErrorMessage may leak internal details.

Restrict content/length; prefer stable error classes for analytics.

 func getErrorMessage(err error) string {
 	if err == nil {
 		return ""
 	}
-	// Fallback for non-fault errors
-	return err.Error()
+	msg := err.Error()
+	if len(msg) > 512 {
+		msg = msg[:512] + "…"
+	}
+	return msg
 }

Optional: map known errors to codes and store codes in an extra field later.


54-54: Always-on buffering can spike CH load; add a feature flag or sampler.

Gate by env/workspace config or sample by path/status to control costs.

- if eventBuffer != nil {
+ if eventBuffer != nil && metricsEnabledFor(s.WorkspaceID) {
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1e3a016 and 3530ec3.

📒 Files selected for processing (10)
  • go/Tiltfile (3 hunks)
  • go/apps/api/routes/chproxy_metrics/handler.go (1 hunks)
  • go/apps/gw/router/gateway_proxy/handler.go (1 hunks)
  • go/apps/gw/server/middleware_errors.go (1 hunks)
  • go/apps/gw/server/middleware_metrics.go (2 hunks)
  • go/apps/gw/server/session.go (4 hunks)
  • go/pkg/clickhouse/client.go (6 hunks)
  • go/pkg/clickhouse/interface.go (1 hunks)
  • go/pkg/clickhouse/noop.go (1 hunks)
  • go/pkg/zen/middleware_metrics.go (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (7)
go/pkg/zen/middleware_metrics.go (1)
go/pkg/clickhouse/schema/requests.go (1)
  • ApiRequestV1 (8-58)
go/apps/gw/router/gateway_proxy/handler.go (1)
go/apps/gw/services/proxy/interface.go (1)
  • Proxy (12-15)
go/pkg/clickhouse/noop.go (2)
go/pkg/clickhouse/schema/requests.go (1)
  • ApiRequestV1 (8-58)
go/pkg/clickhouse/schema/types.go (1)
  • ApiRequestV2 (36-53)
go/apps/gw/server/middleware_metrics.go (1)
go/pkg/clickhouse/schema/types.go (1)
  • ApiRequestV2 (36-53)
go/pkg/clickhouse/interface.go (2)
go/pkg/clickhouse/schema/requests.go (1)
  • ApiRequestV1 (8-58)
go/pkg/clickhouse/schema/types.go (1)
  • ApiRequestV2 (36-53)
go/apps/api/routes/chproxy_metrics/handler.go (1)
go/pkg/clickhouse/interface.go (1)
  • ClickHouse (42-45)
go/pkg/clickhouse/client.go (4)
go/pkg/batch/process.go (3)
  • BatchProcessor (21-27)
  • New (84-120)
  • Config (32-60)
go/pkg/clickhouse/schema/types.go (1)
  • ApiRequestV2 (36-53)
go/pkg/clickhouse/schema/requests.go (1)
  • ApiRequestV1 (8-58)
go/pkg/buffer/buffer.go (1)
  • Buffer (14-23)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build / Build
  • GitHub Check: Docs
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (9)
go/Tiltfile (1)

172-173: Trigger mode inversion in debug: confirm intent across services.

api/gw/ctrl now use TRIGGER_MODE_MANUAL when debug_mode is true (was AUTO). Verify team workflow/docs expect manual rebuilds in debug and that auto_init doesn’t surprise folks.

Also applies to: 205-206, 239-240

go/apps/api/routes/chproxy_metrics/handler.go (1)

60-63: Rename to BufferRequest: looks correct.

The call site matches the updated ClickHouse interface for ApiRequestV1 buffering.

Please confirm all Bufferer implementations (client/noop/mocks) provide BufferRequest(ApiRequestV1).

go/pkg/zen/middleware_metrics.go (1)

16-18: Interface rename acknowledged.

EventBuffer now exposes BufferRequest(ApiRequestV1); consistent with CH V1 path.

go/pkg/clickhouse/noop.go (2)

18-22: No-op BufferRequest(V1): good.

Matches new interface and safely discards events.


24-26: No-op BufferApiRequest(V2): good.

Signature updated to V2; consistent with interface.

go/apps/gw/server/session.go (3)

31-32: Session error field addition: OK.

Enables original-error propagation into metrics.


80-86: SetError stores first error only: OK.

Prevents overwrite; aligns with “original error” semantics.


173-183: Reset clears captured error as well: good.

Consistent reuse semantics.

go/pkg/clickhouse/client.go (1)

117-134: Confirm V2 table name and retention.

Ensure default.api_requests_raw_v2 exists with expected schema/TTL and aligns with infra conventions (other tables use metrics/verifications/ratelimits DBs).

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
go/proto/ctrl/v1/deployment.proto (1)

96-103: Reserve removed field number to prevent wire-format conflicts.

Field 19 appears intentionally removed (PII). Reserve it.

   // Extended git information
   string git_commit_message = 17;
   string git_commit_author_name = 18;
   // Removed: email is PII and not stored
+  // Reserve the old field number to avoid reuse/wire conflicts.
+  reserved 19;
   string git_commit_author_username = 20;
   string git_commit_author_avatar_url = 21;
   int64 git_commit_timestamp = 22; // Unix epoch milliseconds
♻️ Duplicate comments (1)
go/apps/gw/server/session.go (1)

238-266: Cap captured response body to avoid unbounded memory use; keep streaming intact

This duplicates earlier feedback: write path currently accumulates entire bodies.

Apply within this range:

 type captureResponseWriter struct {
     http.ResponseWriter
     statusCode int
     body       []byte
     written    bool
 }
 
 func (w *captureResponseWriter) WriteHeader(code int) {
@@
 }
 
 func (w *captureResponseWriter) Write(b []byte) (int, error) {
     if !w.written {
         w.WriteHeader(http.StatusOK)
     }
 
-    // Capture the body
-    w.body = append(w.body, b...)
+    // Capture the body with truncation
+    const maxCaptureBytes = 64 * 1024 // 64KiB
+    if len(w.body) < maxCaptureBytes {
+        remaining := maxCaptureBytes - len(w.body)
+        if remaining > 0 {
+            if len(b) > remaining {
+                w.body = append(w.body, b[:remaining]...)
+            } else {
+                w.body = append(w.body, b...)
+            }
+        }
+    }
 
     return w.ResponseWriter.Write(b)
 }

Add optional interface support (outside this range):

@@
-import (
+import (
     "bytes"
+    "bufio"
+    "errors"
     "encoding/json"
     "io"
     "log"
     "net"
     "net/http"
     "time"
 )
@@
 func (w *captureResponseWriter) Write(b []byte) (int, error) {
@@
 }
+
+// Optional interfaces passthrough
+func (w *captureResponseWriter) Flush() {
+    if f, ok := w.ResponseWriter.(http.Flusher); ok {
+        f.Flush()
+    }
+}
+
+func (w *captureResponseWriter) Hijack() (net.Conn, *bufio.ReadWriter, error) {
+    if h, ok := w.ResponseWriter.(http.Hijacker); ok {
+        return h.Hijack()
+    }
+    return nil, nil, errors.New("hijacker not supported")
+}
+
+func (w *captureResponseWriter) Push(target string, opts *http.PushOptions) error {
+    if p, ok := w.ResponseWriter.(http.Pusher); ok {
+        return p.Push(target, opts)
+    }
+    return http.ErrNotSupported
+}
🧹 Nitpick comments (9)
go/proto/metald/v1/vm.proto (1)

72-74: Endpoint formatting-only change — no semantic impact

Types remain string and uint32; no action needed. Consider documenting valid port range (0–65535) in a follow-up.

go/proto/ctrl/v1/acme.proto (1)

10-16: Nit: clarify the "token" naming.

Consider renaming to challenge_token or acme_token for readability across request/response.

go/proto/ctrl/v1/deployment.proto (1)

25-47: Model mutually exclusive source details with oneof (avoids invalid combinations).

Current schema allows docker_image + git_* together. Consider oneof for source details.

message CreateDeploymentRequest {
  string workspace_id = 1;
  string project_id   = 2;
  string branch       = 3;
  string environment_slug = 4;

  // Either Git or CLI upload details.
  oneof source {
    GitSource git = 30;
    CLISource cli = 31;
  }

  // ... keep keyspace_id, etc.
}

message GitSource {
  string commit_sha = 1;
  string commit_message = 2;
  string author_name = 3;
  string author_username = 4;
  string author_avatar_url = 5;
  int64  commit_timestamp = 6;
}

message CLISource {
  string docker_image = 1;
}
go/apps/gw/server/server.go (2)

205-220: Capture init error on the session before invoking middleware.

Ensures logging/metrics see the original error even if middleware forgets to call SetError.

-		err := sess.init(w, r)
-		if err != nil {
+		err := sess.init(w, r)
+		if err != nil {
+			// Ensure error is recorded on the session for downstream logging/metrics.
+			sess.SetError(err)
 			// Apply default middleware chain for session initialization errors
 			handleFn := func(ctx context.Context, session *Session) error {
 				return err // Return the session init error
 			}

223-227: Avoid panicking on unhandled errors; log and return 500 as a safety net.

Found a gateway panic-recovery middleware at go/apps/gw/server/middleware_panic_recovery.go; panics are caught, but prefer explicit handling instead of relying on global recovery.

-		if err != nil {
-			// Error should have been handled by error middleware
-			// If we get here, something went wrong
-			panic(err)
-		}
+		if err != nil {
+			// Error should have been handled by error middleware; fail safe.
+			s.logger.Error("unhandled error in gateway handler", "error", err.Error(), "request_id", sess.requestID)
+			http.Error(w, http.StatusText(http.StatusInternalServerError), http.StatusInternalServerError)
+			return
+		}
go/apps/gw/services/caches/caches.go (2)

22-23: Field type updated; align comment and callers

Switch to cache.Cache[string, routing.ConfigWithWorkspace] looks good. Please update the field comment to reflect the new composite type.

- // HostName -> Config
+ // HostName -> ConfigWithWorkspace

75-83: Gateway config cache: OK; consider metric/resource naming

Construction with the new generic is correct. Optionally, rename Resource to something like "gateway_config_with_workspace" to distinguish metrics from the prior value type.

go/apps/gw/server/session.go (1)

91-106: CaptureResponseWriter lifecycle: defer capture

Ensure capture always runs, even on early returns.

-    return wrapper, capture
+    return wrapper, capture

Usage already updated in handler with a defer (see suggested change there).

go/apps/gw/services/routing/service.go (1)

60-63: Be tolerant to forward-compatible fields during unmarshal.

Discard unknown fields to avoid loader failures when the stored JSON has newer fields.

Apply:

- if err := protojson.Unmarshal(gatewayRow.Config, &gatewayConfig); err != nil {
+ if err := (protojson.UnmarshalOptions{DiscardUnknown: true}).Unmarshal(gatewayRow.Config, &gatewayConfig); err != nil {
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3530ec3 and 03ed8f0.

⛔ Files ignored due to path filters (1)
  • go/gen/proto/metald/v1/metald.pb.go is excluded by !**/*.pb.go, !**/gen/**
📒 Files selected for processing (21)
  • go/apps/gw/router/gateway_proxy/handler.go (3 hunks)
  • go/apps/gw/server/server.go (1 hunks)
  • go/apps/gw/server/session.go (5 hunks)
  • go/apps/gw/services/caches/caches.go (3 hunks)
  • go/apps/gw/services/routing/interface.go (2 hunks)
  • go/apps/gw/services/routing/service.go (2 hunks)
  • go/apps/gw/services/routing/types.go (1 hunks)
  • go/pkg/partition/db/gateway_find_config_by_hostname.sql_generated.go (1 hunks)
  • go/pkg/partition/db/querier_generated.go (1 hunks)
  • go/pkg/partition/db/queries/gateway_find_config_by_hostname.sql (1 hunks)
  • go/proto/assetmanagerd/v1/asset.proto (4 hunks)
  • go/proto/builderd/v1/builder.proto (0 hunks)
  • go/proto/ctrl/v1/acme.proto (1 hunks)
  • go/proto/ctrl/v1/build.proto (2 hunks)
  • go/proto/ctrl/v1/deployment.proto (1 hunks)
  • go/proto/ctrl/v1/service.proto (1 hunks)
  • go/proto/metald/v1/deployment.proto (1 hunks)
  • go/proto/metald/v1/metald.proto (1 hunks)
  • go/proto/metald/v1/network.proto (1 hunks)
  • go/proto/metald/v1/vm.proto (5 hunks)
  • go/proto/partition/v1/gateway.proto (1 hunks)
💤 Files with no reviewable changes (1)
  • go/proto/builderd/v1/builder.proto
✅ Files skipped from review due to trivial changes (8)
  • go/proto/ctrl/v1/service.proto
  • go/proto/metald/v1/metald.proto
  • go/pkg/partition/db/querier_generated.go
  • go/proto/metald/v1/deployment.proto
  • go/proto/assetmanagerd/v1/asset.proto
  • go/proto/metald/v1/network.proto
  • go/proto/ctrl/v1/build.proto
  • go/proto/partition/v1/gateway.proto
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-15T20:45:05.696Z
Learnt from: Flo4604
PR: unkeyed/unkey#3952
File: go/apps/ctrl/services/routing/service.go:69-91
Timestamp: 2025-09-15T20:45:05.696Z
Learning: In Unkey's routing service, gateway lookups should be workspace-scoped using FindGatewayByHostnameAndWorkspace instead of hostname-only queries to prevent cross-tenant access issues.

Applied to files:

  • go/pkg/partition/db/gateway_find_config_by_hostname.sql_generated.go
  • go/pkg/partition/db/queries/gateway_find_config_by_hostname.sql
📚 Learning: 2025-09-01T01:57:42.227Z
Learnt from: imeyer
PR: unkeyed/unkey#3899
File: go/proto/metald/v1/metald.proto:5-9
Timestamp: 2025-09-01T01:57:42.227Z
Learning: In the unkeyed/unkey repository, buf is configured to properly resolve metald proto imports like "metald/v1/vm.proto" without needing the full "go/proto/" prefix. The buf lint command `buf lint --path proto/metald` passes successfully with these relative import paths.

Applied to files:

  • go/proto/metald/v1/vm.proto
🧬 Code graph analysis (8)
go/apps/gw/services/routing/types.go (1)
go/apps/gw/services/routing/interface.go (1)
  • Config (25-32)
go/pkg/partition/db/gateway_find_config_by_hostname.sql_generated.go (1)
go/apps/gw/services/routing/interface.go (1)
  • Config (25-32)
go/apps/gw/services/caches/caches.go (2)
go/apps/gw/services/routing/types.go (1)
  • ConfigWithWorkspace (6-9)
go/apps/gw/services/routing/interface.go (1)
  • Config (25-32)
go/apps/gw/services/routing/interface.go (1)
go/apps/gw/services/routing/types.go (1)
  • ConfigWithWorkspace (6-9)
go/apps/gw/server/server.go (1)
go/apps/gw/server/session.go (1)
  • Session (18-33)
go/apps/gw/services/routing/service.go (5)
go/apps/gw/services/routing/types.go (1)
  • ConfigWithWorkspace (6-9)
go/apps/gw/services/routing/interface.go (1)
  • Config (25-32)
go/internal/services/caches/op.go (1)
  • DefaultFindFirstOp (9-22)
go/pkg/db/handle_err_no_rows.go (1)
  • IsNotFound (8-10)
go/pkg/cache/interface.go (1)
  • Null (42-42)
go/apps/gw/router/gateway_proxy/handler.go (2)
go/apps/gw/services/routing/interface.go (1)
  • Config (25-32)
go/apps/gw/services/proxy/interface.go (1)
  • Proxy (12-15)
go/apps/gw/server/session.go (2)
go/apps/gw/server/server.go (1)
  • New (56-105)
go/pkg/fault/wrap.go (3)
  • Wrap (25-67)
  • Internal (75-89)
  • Public (97-111)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build / Build
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (18)
go/proto/metald/v1/vm.proto (6)

91-94: DeleteVmResponse block expansion — OK

Pure formatting; wire format unaffected.


95-98: BootVmRequest block expansion — OK

No semantic changes.


117-120: PauseVmRequest block expansion — OK

No semantic changes.


125-128: ResumeVmRequest block expansion — OK

No semantic changes.


144-147: GetVmInfoRequest block expansion — OK

No semantic changes.


9-10: go_package option repositioned — OK; buf failed from repo root, re-run from go/ to verify

Order change is valid in proto3. The supplied run failed with: go/proto/metald/v1/deployment.proto:5:8: import "metald/v1/vm.proto": file does not exist — run buf from the go/ directory (where buf.yaml lives) and re-run buf lint/generate to confirm codegen still targets github.com/unkeyed/unkey/go/gen/proto/metald/v1;metaldv1.

go/proto/ctrl/v1/deployment.proto (2)

45-47: Optional presence: verify generator versions for proto3 'optional'.
Sandbox verification failed: protoc and protoc-gen-go not found; go.mod missing.

  • Confirm CI/local toolchain uses a protoc + protoc-gen-go that support proto3 "optional" for scalars; if not, change the field to a wrapper type (google.protobuf.StringValue) or remove the optional qualifier.
  • Document/pin required generator versions in the repo/CI to avoid build surprises.

18-23: Require callers to set SourceType explicitly (do not rely on UNSPECIFIED).
Default numeric 0 = SOURCE_TYPE_UNSPECIFIED; confirm every CreateDeploymentRequest and other deployment-creation callers set SOURCE_TYPE_GIT or SOURCE_TYPE_CLI_UPLOAD. The provided ripgrep produced no matches — manual verification required.

go/proto/ctrl/v1/acme.proto (1)

7-7: Regenerate protos and commit generated diffs.

Formatting-only change to go_package looks good; no semantic impact. Sandbox verification produced no output — run the following locally and commit any generated diffs:

#!/bin/bash
set -euo pipefail
if command -v make >/dev/null 2>&1 && rg -n "proto" Makefile >/dev/null 2>&1; then
  make proto || true
fi
git status --porcelain
go/apps/gw/services/caches/caches.go (1)

9-9: Routing import change: OK

Import swap to routing is correct given the new ConfigWithWorkspace type.

go/apps/gw/services/routing/types.go (1)

5-9: New wrapper type: LGTM

Encapsulating WorkspaceID alongside GatewayConfig is clean and minimally invasive.

go/apps/gw/router/gateway_proxy/handler.go (2)

38-45: Config lookup error mapping: LGTM

Error wrapping and codes look consistent with existing fault patterns.


47-50: Session workspace propagation: LGTM

Assigning WorkspaceID before validation/auth is appropriate.

go/apps/gw/services/routing/interface.go (2)

17-19: GetConfig signature change: LGTM

Returning *ConfigWithWorkspace aligns with the new cache/type flow.


30-31: Cache value type switch: verified — callers updated
GatewayConfigCache now uses routing.ConfigWithWorkspace; SWR in go/apps/gw/services/routing/service.go returns ConfigWithWorkspace, and caches are wired in go/apps/gw/services/caches/caches.go and go/apps/gw/run.go. No remaining cache store/load sites referencing *partitionv1.GatewayConfig were found.

go/apps/gw/server/session.go (1)

108-114: SetError first-write wins: LGTM

Captures the originating error without clobbering.

go/apps/gw/services/routing/service.go (2)

26-28: Confirm GatewayConfigCache -> ConfigWithWorkspace refactor across call-sites

Interface and service files already use cache.Cache[string, ConfigWithWorkspace]; run.go and routing handlers still call RoutingService.GetConfig — ensure GetConfig's signature and any cache.Get/Set consumers accept ConfigWithWorkspace (check: go/apps/gw/services/routing/interface.go, go/apps/gw/services/routing/service.go, go/apps/gw/run.go, go/apps/gw/router/gateway_proxy/handler.go, go/apps/gw/router/acme_challenge/handler.go).


51-52: Signature change: verify downstream usage and docs.

GetConfig (go/apps/gw/services/routing/service.go:51–52) now returns *routing.ConfigWithWorkspace — confirm all callers/docs updated. The gateway handler passes config into RoutingService.SelectVM(ctx, config) (go/apps/gw/router/gateway_proxy/handler.go:66); verify SelectVM's parameter type accepts the new return (or update callers/docs).

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
go/apps/gw/services/routing/service.go (1)

71-83: Nice fix: non-NotFound errors now propagate instead of being silently dropped.

This addresses the prior review concern. Minor polish: include the hostname in internal messages to aid ops triage.

Apply:

-    return nil, fault.Wrap(err,
-      fault.Code(codes.App.Internal.ServiceUnavailable.URN()),
-      fault.Internal("error loading gateway configuration"),
-      fault.Public("Failed to load gateway configuration"),
-    )
+    return nil, fault.Wrap(err,
+      fault.Code(codes.App.Internal.ServiceUnavailable.URN()),
+      fault.Internal(fmt.Sprintf("error loading gateway configuration for hostname %q", host)),
+      fault.Public("Failed to load gateway configuration"),
+    )
@@
-  return nil, fault.Wrap(err,
+  return nil, fault.Wrap(err,
     fault.Code(codes.Gateway.Routing.ConfigNotFound.URN()),
-    fault.Internal("no gateway configuration found for hostname"),
+    fault.Internal(fmt.Sprintf("no gateway configuration found for hostname %q", host)),
     fault.Public("No configuration found for this domain"),
   )

Run to sanity-check call sites were updated to the new signature and composite type:

#!/bin/bash
set -euo pipefail
echo "Interface declaration(s) of GetConfig:"
rg -nP -C2 '(interface|type)\s+\w*Service\b.*?\{(?s).*?\bGetConfig\s*\(' --type go

echo -e "\nCall sites of GetConfig(ctx, host):"
rg -nP -C2 '\bGetConfig\s*\(\s*ctx\s*,\s*[^)]*\)' --type go

echo -e "\nUsages of ConfigWithWorkspace:"
rg -nP -C2 '\bConfigWithWorkspace\b' --type go
🧹 Nitpick comments (1)
go/apps/gw/services/routing/service.go (1)

51-69: GetConfig signature + SWR loader migration is sound.

Good: returns composite, keeps loader returning zero-value on error and defers decision to caller; uses DefaultFindFirstOp for proper cache writes.

Please add tests covering:

  • not-found (db.IsNotFound → wrapped ConfigNotFound fault),
  • non-not-found loader error (e.g., bad JSON → ServiceUnavailable fault),
  • happy path (WorkspaceID plumbed).

I can draft table-driven tests hitting these three paths if helpful.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 03ed8f0 and 09422df.

📒 Files selected for processing (3)
  • go/apps/api/routes/services.go (0 hunks)
  • go/apps/gw/services/routing/service.go (2 hunks)
  • go/pkg/clickhouse/doc.go (1 hunks)
💤 Files with no reviewable changes (1)
  • go/apps/api/routes/services.go
✅ Files skipped from review due to trivial changes (1)
  • go/pkg/clickhouse/doc.go
🧰 Additional context used
🧬 Code graph analysis (1)
go/apps/gw/services/routing/service.go (5)
go/apps/gw/services/routing/types.go (1)
  • ConfigWithWorkspace (6-9)
go/apps/gw/services/routing/interface.go (1)
  • Config (25-32)
go/internal/services/caches/op.go (1)
  • DefaultFindFirstOp (9-22)
go/pkg/db/handle_err_no_rows.go (1)
  • IsNotFound (8-10)
go/pkg/cache/interface.go (1)
  • Null (42-42)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build / Build
🔇 Additional comments (2)
go/apps/gw/services/routing/service.go (2)

26-27: Cache value type migration to ConfigWithWorkspace looks correct.

Matches interface/config wiring and enables carrying WorkspaceID alongside the proto config. LGTM.


87-88: Returning pointer to local composite is fine (escapes to heap).

LGTM.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
go/Tiltfile (1)

174-175: Trigger mode inversion looks accidental; align with other resources.

Dashboard/Metald still use AUTO in debug, MANUAL otherwise. API/GW/Ctrl were flipped here, which is likely to degrade DX in debug mode.

Apply to restore consistency:

-        trigger_mode=TRIGGER_MODE_MANUAL if debug_mode else TRIGGER_MODE_AUTO
+        trigger_mode=TRIGGER_MODE_AUTO if debug_mode else TRIGGER_MODE_MANUAL

Repeat the same replacement for the GW and Ctrl resources at Lines 207 and 241.

Also applies to: 207-208, 241-242

go/apps/gw/services/routing/service.go (3)

72-79: Include hostname in the internal error for diagnosability.

Apply:

-        return nil, fault.Wrap(err,
+        return nil, fault.Wrap(err,
             fault.Code(codes.App.Internal.ServiceUnavailable.URN()),
-            fault.Internal("error loading gateway configuration"),
+            fault.Internal(fmt.Sprintf("error loading gateway configuration for hostname %q", host)),
             fault.Public("Failed to load gateway configuration"),
         )

80-86: Include hostname in NotFound fault to aid ops triage.

Apply:

-        return nil, fault.Wrap(err,
+        return nil, fault.Wrap(err,
             fault.Code(codes.Gateway.Routing.ConfigNotFound.URN()),
-            fault.Internal("no gateway configuration found for hostname"),
+            fault.Internal(fmt.Sprintf("no gateway configuration found for hostname %q", host)),
             fault.Public("No configuration found for this domain"),
         )

60-69: Update misleading comment to reflect JSON storage (protojson used)

Gateway configs are stored/read with protojson.Marshal/Unmarshal (see deploy_workflow.go); update the comment in go/apps/gw/services/routing/service.go (GetConfig).

-        // Unmarshal the protobuf blob from the database
+        // Unmarshal GatewayConfig from JSON stored in the database
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 09422df and a219058.

📒 Files selected for processing (2)
  • go/Tiltfile (3 hunks)
  • go/apps/gw/services/routing/service.go (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
go/apps/gw/services/routing/service.go (6)
go/apps/gw/services/routing/types.go (1)
  • ConfigWithWorkspace (6-9)
go/pkg/partition/db/queries.go (1)
  • Query (29-29)
go/apps/gw/services/routing/interface.go (1)
  • Config (25-32)
go/internal/services/caches/op.go (1)
  • DefaultFindFirstOp (9-22)
go/pkg/db/handle_err_no_rows.go (1)
  • IsNotFound (8-10)
go/pkg/cache/interface.go (1)
  • Null (42-42)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Test API / API Test Local
  • GitHub Check: Test Go API Local / Test
  • GitHub Check: Build / Build
  • GitHub Check: Vade Review
🔇 Additional comments (3)
go/apps/gw/services/routing/service.go (3)

27-29: Cache type change LGTM.

Switch to Cache[string, ConfigWithWorkspace] matches the new API surface.


88-89: Return of &config is fine.

Value escapes and is safe; interface remains clear.


52-55: Host canonicalization verified — callers strip port before GetConfig.
Calls to GetConfig use routing.ExtractHostname and pass the stripped host (go/apps/gw/router/gateway_proxy/handler.go lines 35–38; go/apps/gw/router/acme_challenge/handler.go lines 30–33).

@chronark chronark enabled auto-merge September 19, 2025 13:01
@chronark
Copy link
Collaborator

@ogzhanolguncu

@ogzhanolguncu
Copy link
Contributor

:ack: I'll check in 10

Copy link
Contributor

@ogzhanolguncu ogzhanolguncu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requests are reflecting to CH correctly. LGTM.

@chronark chronark added this pull request to the merge queue Sep 19, 2025
@graphite-app
Copy link

graphite-app bot commented Sep 19, 2025

Video gif. An elderly man in suspenders smiles as he holds up two thumbs as if saying good luck.  (Added via Giphy)

@graphite-app
Copy link

graphite-app bot commented Sep 19, 2025

Graphite Automations

"Post a GIF when PR approved" took an action on this PR • (09/19/25)

1 gif was posted to this PR based on Andreas Thomas's automation.

@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 19, 2025
@chronark chronark merged commit 86134f4 into main Sep 19, 2025
19 checks passed
@chronark chronark deleted the feat/gw-logs branch September 19, 2025 14:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants