
feat/gossip #5015

Merged
Flo4604 merged 30 commits into main from feat/gossip
Feb 16, 2026

@Flo4604
Member

@Flo4604 Flo4604 commented Feb 12, 2026

What does this PR do?

Adds a gossip implementation tailored to our setup, which should work for us, at least in theory.

We have 2 separate gossip memberlists: one for intra-cluster messages and one for cross-region messages.
The idea is to:

Have a single node act as the broadcaster that talks to other clusters. This means:

When we publish a message in us-east-1, one of our 3 nodes sends it to eu-central-1, which then distributes the message to its local members.

That way, not every node needs to know about every other node, and only a single request per message pays the latency of crossing the globe.
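
A minimal sketch of how that single broadcaster could be chosen, assuming the rule shown in the startup diagram further down ("smallest member by name"): every node sorts the same LAN member list and the first name wins, so all nodes agree without an extra coordination round. `electAmbassador` and the node names are hypothetical, not code from this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// electAmbassador returns the member that should bridge this region to the
// WAN pool: the one whose name sorts first. Because every node applies the
// same deterministic rule to the same member list, they all pick the same
// node. Returns false if there are no members. (Hypothetical helper.)
func electAmbassador(members []string) (string, bool) {
	if len(members) == 0 {
		return "", false
	}
	// Copy before sorting so the caller's slice is not reordered.
	sorted := append([]string(nil), members...)
	sort.Strings(sorted)
	return sorted[0], true
}

func main() {
	usEast := []string{"api-c", "api-a", "api-b"}
	amb, _ := electAmbassador(usEast)
	fmt.Println("us-east-1 ambassador:", amb) // us-east-1 ambassador: api-a
}
```

When membership changes (the current ambassador leaves), re-running the same rule on the new list yields the next ambassador, which is what makes a periodic re-evaluation loop sufficient.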

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Chore (refactoring code, technical debt, workflow improvements)
  • Enhancement (small improvements)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How should this be tested?

  • Test A
  • Test B

Checklist

Required

  • Filled out the "How to test" section in this PR
  • Read Contributing Guide
  • Self-reviewed my own code
  • Commented on my code in hard-to-understand areas
  • Ran pnpm build
  • Ran pnpm fmt
  • Ran make fmt on /go directory
  • Checked for warnings, there are none
  • Removed all console.logs
  • Merged the latest changes from main onto my branch with git pull origin main
  • My changes don't cause any responsiveness issues

Appreciated

  • If a UI change was made: Added a screen recording or screenshots to this PR
  • Updated the Unkey Docs if changes were necessary

@vercel

vercel bot commented Feb 12, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
dashboard | Ready | Preview, Comment | Feb 16, 2026 7:03pm

1 skipped deployment:
engineering | Ignored | Preview | Feb 16, 2026 7:03pm


@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 12, 2026

Caution

Review failed

The pull request is closed.

📝 Walkthrough

This pull request replaces Kafka-based distributed cache invalidation with a gossip-based cluster membership system using HashiCorp memberlist. Changes include introducing a new cluster package implementing two-tier LAN/WAN gossip with automatic ambassador election, updating cache clustering to use a Broadcaster interface, removing the eventstream infrastructure, and wiring gossip configuration across API, Frontline, and Sentinel services with corresponding CLI flags and Kubernetes manifests.

Changes

Cohort / File(s) Summary
Infrastructure & Configuration Removal
.github/workflows/job_bazel.yaml, dev/docker-compose.yaml, Makefile, dev/Tiltfile
Removed Kafka from Docker Compose, CI workflows, and Makefile targets. Updated development environment to exclude Kafka container and dependencies.
Go Dependencies
go.mod, MODULE.bazel, tools/exportoneof/...
Replaced kafka-go with hashicorp/memberlist dependency. Added new exportoneof tool for proto code generation. Updated Bazel modules configuration.
Cluster Package Implementation
pkg/cluster/...
New 10-file cluster package implementing gossip-based two-tier membership (LAN/WAN) with SWIM protocol via memberlist. Includes bridge/ambassador election, DNS seed resolution, message multiplexing, and comprehensive tests.
Cache Clustering Updates
pkg/cache/clustering/broadcaster.go, broadcaster_gossip.go, broadcaster_noop.go, cluster_cache.go, dispatcher.go, gossip_e2e_test.go, BUILD.bazel
Introduced Broadcaster interface replacing eventstream-based invalidation. Added GossipBroadcaster for cluster-based propagation and NoopBroadcaster for disabled mode. Removed Kafka-backed tests and added gossip E2E tests.
Eventstream Package Removal
pkg/events/*, pkg/eventstream/*
Completely removed pub/sub Topic infrastructure, Producer/Consumer interfaces, and Kafka integration code. Deleted integration tests and no-op implementations.
API Service Integration
cmd/api/main.go, svc/api/config.go, svc/api/run.go, svc/api/BUILD.bazel
Replaced Kafka broker configuration with Gossip cluster flags (gossip-enabled, gossip-bind-addr, LAN/WAN ports and seeds, secret-key). Updated config struct and wiring logic.
Frontline Service Integration
cmd/frontline/main.go, svc/frontline/config.go, svc/frontline/run.go, svc/frontline/services/caches/...
Added Gossip configuration to CLI and service config. Updated cache service to use Broadcaster for distributed invalidation and NodeID for cluster identity.
Sentinel Service Integration
cmd/sentinel/main.go, svc/sentinel/config.go, svc/sentinel/run.go, svc/sentinel/services/router/...
Added Gossip cluster configuration and wiring. Updated router service with Broadcaster and NodeID fields for cache invalidation propagation.
Kubernetes Manifests
dev/k8s/manifests/api.yaml, dev/k8s/manifests/frontline.yaml, dev/k8s/manifests/cilium-policies.yaml
Added gossip LAN ports (7946 TCP/UDP) and environment variables to API and Frontline deployments. Created headless Services for gossip endpoints. Added CiliumNetworkPolicy rules for inter-pod gossip communication.
Proto Definitions
proto/cache/v1/invalidation.proto, proto/cluster/v1/envelope.proto
Refactored CacheInvalidationEvent with oneof action field supporting cache_key or clear_all. Created new ClusterMessage envelope with Direction enum and payload routing.
Build Configuration Updates
internal/services/caches/BUILD.bazel, svc/api/integration/cluster/cache/BUILD.bazel, svc/api/BUILD.bazel, svc/frontline/BUILD.bazel, svc/sentinel/BUILD.bazel, svc/sentinel/services/router/BUILD.bazel, pkg/cluster/BUILD.bazel, tools/exportoneof/BUILD.bazel
Added clustering and cluster dependencies across services. Removed eventstream and kafka-go dependencies. Narrowed test targets to new gossip-based implementations.
Kubernetes Controller Updates
svc/krane/internal/sentinel/apply.go, svc/krane/internal/sentinel/delete.go, svc/krane/internal/sentinel/controller.go, svc/krane/internal/sentinel/consts.go, svc/krane/pkg/labels/labels.go, svc/krane/run.go
Extended Sentinel K8s controller to manage gossip headless Services and CiliumNetworkPolicy resources. Added dynamic client integration and gossip LAN port constant. Added ComponentGossipLAN label method.
Test Removals & Refactoring
pkg/cache/clustering/consume_events_test.go, pkg/cache/clustering/e2e_test.go, pkg/cache/clustering/produce_events_test.go, svc/api/integration/cluster/cache/consume_events_test.go, svc/api/integration/cluster/cache/produce_events_test.go, pkg/eventstream/eventstream_integration_test.go
Removed all Kafka-backed integration tests. Replaced with new gossip E2E tests validating cross-node invalidation (Remove and Clear operations).
Integration Harness Updates
svc/api/integration/harness.go, svc/api/internal/testutil/http.go
Removed Docker-based Kafka orchestration. Updated caches config to use Broadcaster instead of CacheInvalidationTopic.
Documentation & Tooling
web/apps/engineering/content/docs/architecture/services/cluster-service.mdx, tools/exportoneof/main.go
Added comprehensive cluster architecture documentation. Introduced exportoneof code generation tool for proto oneof interface export.

Sequence Diagram(s)

sequenceDiagram
    participant Node1 as Node 1<br/>(API Instance)
    participant LAN1 as LAN Pool<br/>(memberlist)
    participant Node2 as Node 2<br/>(API Instance)
    participant LAN2 as LAN Pool<br/>(memberlist)
    participant WAN as WAN Pool<br/>(Ambassador)

    rect rgba(100, 150, 200, 0.5)
        Note over Node1,Node2: Same Region (LAN) Invalidation
        Node1->>LAN1: Broadcast(CacheInvalidation)
        LAN1->>Node2: NotifyMsg(ClusterMessage)
        Note over Node2: Deserialize & Apply<br/>Cache Invalidation
    end

    rect rgba(150, 100, 200, 0.5)
        Note over Node1,WAN: Inter-Region (WAN) Invalidation
        Node1->>LAN1: Broadcast(CacheInvalidation)
        LAN1->>WAN: Bridge relays to WAN<br/>(direction=DIRECTION_WAN)
        WAN->>LAN2: Ambassador notifies<br/>remote LAN pool
        LAN2->>Node2: NotifyMsg(ClusterMessage)
        Note over Node2: Deserialize & Apply<br/>Cache Invalidation
    end
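
The relay path in the diagram above can be simulated in-process as a toy model. Assumptions: `Direction`, `Message`, `Region` and `publish` are hypothetical names, there is no real network or memberlist involved, and each cross-region hop is simply counted. It shows why a WAN-tagged message crosses regions once per remote region and is then re-published locally as LAN, so it is not relayed again.

```go
package main

import "fmt"

type Direction int

const (
	DirectionLAN Direction = iota // stay inside the region
	DirectionWAN                  // forward to other regions via ambassadors
)

type Message struct {
	Direction Direction
	Origin    string // region that published the message
	Payload   string
}

type Region struct {
	Name  string
	Nodes []string
	Inbox map[string][]string // node -> payloads received
}

func newRegion(name string, nodes ...string) *Region {
	return &Region{Name: name, Nodes: nodes, Inbox: map[string][]string{}}
}

// publish delivers msg to every node in src (including the publisher). If it
// is a WAN message, src's ambassador sends it once to each remote region,
// whose ambassador re-publishes it as LAN. Returns the cross-region sends.
func publish(regions []*Region, src *Region, msg Message) int {
	for _, n := range src.Nodes {
		src.Inbox[n] = append(src.Inbox[n], msg.Payload)
	}
	if msg.Direction != DirectionWAN {
		return 0
	}
	wanSends := 0
	for _, r := range regions {
		if r == src {
			continue
		}
		wanSends++ // one hop: local ambassador -> remote ambassador
		// Switching to LAN stops the remote region from relaying it again.
		publish(regions, r, Message{Direction: DirectionLAN, Origin: msg.Origin, Payload: msg.Payload})
	}
	return wanSends
}

func main() {
	us := newRegion("us-east-1", "us-1", "us-2", "us-3")
	eu := newRegion("eu-central-1", "eu-1", "eu-2", "eu-3")
	sends := publish([]*Region{us, eu}, us, Message{Direction: DirectionWAN, Origin: us.Name, Payload: "invalidate key_123"})
	fmt.Println("cross-region sends:", sends)       // cross-region sends: 1
	fmt.Println("eu-2 received:", eu.Inbox["eu-2"]) // eu-2 received: [invalidate key_123]
}
```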
sequenceDiagram
    participant App as Service Start
    participant Cluster as cluster.New()
    participant LAN as LAN Memberlist
    participant Seeds as LAN Seeds
    participant Bridge as Bridge Eval Loop
    participant WAN as WAN Memberlist
    participant WanSeeds as WAN Seeds

    App->>Cluster: New(cfg Config)
    activate Cluster
    Cluster->>LAN: Create with DefaultLANConfig
    Cluster->>LAN: Add Delegate & EventDelegate
    Cluster->>LAN: Create TransmitLimitedQueue
    Cluster->>Bridge: Start bridgeEvalLoop goroutine
    Cluster->>Seeds: joinSeeds(LANSeeds)
    activate Seeds
    Seeds->>LAN: Join with backoff/retry
    Seeds-->>Cluster: Success callback
    deactivate Seeds
    
    Note over Bridge: Periodic evaluation
    Bridge->>LAN: Get smallest member by name
    alt Is this node smallest?
        Bridge->>WAN: promoteToBridge
        activate WAN
        WAN->>WAN: Create with DefaultWANConfig
        WAN->>WAN: Add WAN delegate
        WAN->>WanSeeds: joinSeeds(WANSeeds)
        WanSeeds->>WAN: Join with backoff
        WAN-->>Bridge: Success
        deactivate WAN
    else Is not smallest
        Bridge->>WAN: demoteFromBridge (if currently bridge)
    end
    
    Cluster-->>App: Return Cluster instance
    deactivate Cluster

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 3

❌ Failed checks (2 warnings, 1 inconclusive)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description provides context on the gossip implementation and architectural goal but lacks critical information: testing steps, checklist items, and issue references are all missing or unchecked, failing to meet template requirements. Complete the PR template: reference a tracking issue, provide concrete testing steps, run all required checks (fmt, build, etc.), and check off template items before merging.
Docstring Coverage ⚠️ Warning Docstring coverage is 45.65% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'feat/gossip' is vague and generic. While it indicates a feature related to gossip, it does not clearly convey the primary change (replacing Kafka-based cache invalidation with a two-tier gossip cluster for distributed cache invalidation). Use a more descriptive title such as 'Replace Kafka-based cache invalidation with gossip cluster' to clearly summarize the main architectural change.
✅ Passed checks (1 passed)
Check name Status Explanation
Merge Conflict Detection ✅ Passed ✅ No merge conflicts detected when merging into main


@Flo4604 Flo4604 changed the title from "feat/gossip: draft wip please ignore" to "feat/gossip" Feb 12, 2026
@Flo4604 Flo4604 changed the base branch from main to chore/remove-agent February 12, 2026 17:19
Base automatically changed from chore/remove-agent to main February 12, 2026 18:06
@Flo4604 Flo4604 requested a review from chronark February 12, 2026 18:07
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@internal/services/caches/caches.go`:
- Around line 168-249: The dispatcher created in New() must be closed when any
subsequent cache creation fails to avoid leaking resources: after the dispatcher
is successfully created (variable name dispatcher in New), ensure you call
dispatcher.Close() on every early return that follows (e.g., every "return
Caches{}, err" that occurs after calls to createCache such as when building
ratelimitNamespace, verificationKeyByHash, liveApiByID, clickhouseSetting,
keyAuthToApiRow, apiToKeyAuthRow, etc.), or preferably add a deferred cleanup
like "defer func(){ if !initialized { dispatcher.Close() } }()" immediately
after creating dispatcher and set initialized=true only on the final successful
return; update all error paths accordingly so dispatcher.Close() runs on
failure.

In `@pkg/cache/clustering/broadcaster_gossip.go`:
- Around line 60-63: GossipBroadcaster.Close currently forwards to
b.cluster.Close but ownership is ambiguous and can result in double-close;
modify GossipBroadcaster to make Close idempotent by adding a sync.Once (or
equivalent boolean + mutex) on the GossipBroadcaster struct and invoke
b.cluster.Close inside that Once, or clearly transfer/document ownership so only
one caller closes the cluster (e.g., remove cluster.Close from
GossipBroadcaster.Close if run.go defers closing); update the Close method on
GossipBroadcaster to use the Once/guard and ensure subsequent Close calls return
nil (or the original error) without calling cluster.Close again.
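
A minimal sketch of the `sync.Once` option, with simplified, hypothetical types (the real `GossipBroadcaster` fields are not shown in this comment). The first `Close` closes the cluster; later calls return the original result without touching it again.

```go
package main

import (
	"fmt"
	"sync"
)

// closer stands in for the cluster handle that must be closed only once.
type closer interface{ Close() error }

// GossipBroadcaster (simplified) makes Close idempotent via sync.Once.
type GossipBroadcaster struct {
	cluster   closer
	closeOnce sync.Once
	closeErr  error
}

func (b *GossipBroadcaster) Close() error {
	b.closeOnce.Do(func() {
		b.closeErr = b.cluster.Close()
	})
	return b.closeErr // same result on every call
}

// countingCluster records how often Close is called.
type countingCluster struct{ closes int }

func (c *countingCluster) Close() error {
	c.closes++
	return nil
}

func main() {
	c := &countingCluster{}
	b := &GossipBroadcaster{cluster: c}
	_ = b.Close()
	_ = b.Close() // e.g. a deferred close in run.go plus an explicit one
	fmt.Println("cluster closed", c.closes, "time(s)") // cluster closed 1 time(s)
}
```

`sync.Once` is also safe when two goroutines call `Close` concurrently, which a plain boolean flag without a mutex would not be.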

In `@svc/frontline/services/caches/caches.go`:
- Around line 104-160: When
clustering.NewInvalidationDispatcher(config.Broadcaster) succeeds but a
subsequent createCache call fails, the dispatcher is leaked; update the New()
path to call dispatcher.Close() (or dispatcher.Close(context?) depending on its
API) before each early return after dispatcher initialization (i.e., before each
fmt.Errorf return after createCache for frontlineRoute, sentinelsByEnvironment,
tlsCertificate). Guard the Close call with a nil check on dispatcher and ensure
you preserve the original returned error; do the same for any other early
returns in this function after dispatcher was set.
🧹 Nitpick comments (6)
svc/krane/internal/sentinel/apply.go (2)

392-446: Multiple gossip services with identical selectors per environment.

Each sentinel creates its own gossip service (<k8sName>-gossip-lan) but the selector matches ALL sentinels in the environment via EnvironmentID + ComponentSentinel. This means multiple headless services will resolve to the same set of pods.

While this works (DNS will resolve any of them to the same pod IPs), it creates redundant services. Consider either:

  1. Use a single environment-scoped gossip service name (idempotent across sentinels)
  2. Keep per-sentinel services but scope the selector to that sentinel

This isn't blocking since it functions correctly, but adds unnecessary resources.


448-524: Same redundancy applies to CiliumNetworkPolicy.

Similar to the gossip service, each sentinel creates its own policy with the same environment-scoped selector. Multiple policies with identical selectors are functionally equivalent but redundant.

pkg/cache/clustering/gossip_e2e_test.go (1)

54-55: Magic sleep may be fragile.

The 50ms sleep before node 2 creation appears to be a timing workaround. Consider documenting why this is needed or using a more deterministic approach (e.g., waiting for node 1 to be ready to accept connections).

dev/k8s/manifests/api.yaml (1)

78-84: Consider adding UNKEY_GOSSIP_BIND_ADDR.

Gossip is enabled, but the bind address is not specified. If the default (likely 0.0.0.0 or the pod IP) is intentional, this is fine, but explicit config aids clarity.

svc/sentinel/services/router/service.go (1)

45-82: Consider extracting clusterOpts and createCache to a shared package.

This pattern is duplicated in svc/frontline/services/caches/caches.go. Could be a shared helper in pkg/cache/clustering.

pkg/cache/clustering/broadcaster_gossip.go (1)

31-39: Handler invocation uses context.Background() instead of propagating context.

The handler signature accepts a context, but HandleCacheInvalidation always passes context.Background(). Consider storing the subscription context or accepting context as a parameter if cancellation/deadline propagation is needed.

Collaborator

@chronark chronark left a comment

maybe reorder the proto fields, but it's not super important

@Flo4604 Flo4604 merged commit c5bef06 into main Feb 16, 2026
11 of 12 checks passed
@Flo4604 Flo4604 deleted the feat/gossip branch February 16, 2026 19:08
MichaelUnkey pushed a commit that referenced this pull request Feb 26, 2026
* add a gossip implementation

* add gossip to sentinel/frontline

* add message muxing

* sentinel fun

* cleansings

* cleansings

* cleansings

* cleansings

* use oneof

* fix bazel happiness

* do some changies

* exportoneof

* more cool fancy thingx

* change gateway choosing

* add label

* adjjust some more

* adjjust some more

* fixa test

* goodbye kafka

* fix: bazel

* rename gateway -> ambassador

* add docs

* fix: rabbit comments

* [autofix.ci] apply automated fixes

* idfk

* more changes

* more changes

* fix ordering

* fix missing files

* fix test

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
github-merge-queue bot pushed a commit that referenced this pull request Mar 17, 2026
* test keys table

* re org and exports

* error fix

* Apos

* chore: remove deployment breadcrumbs (#5019)

* fix: cleanup project side nav

* feat: simplify deployment overview page

only show build logs until it's built, then show domains and network

* chore: clean up nav

* fix(clickhouse): improve latest keys used queries for high volume (150M +)  (#4959)

* fix(clickhouse): improve clickhouse query for key logs and add  new table and mv for latest keys used

* fix valid/error count = 0 scenario

* remove identity_id from order by

* wrap identity_id with aggregating function since its removed from the order key

---------

Co-authored-by: Flo <53355483+Flo4604@users.noreply.github.com>

* fix: domain refetch and promotion disable rule (#5013)

* fix: domain refetch and promotion disable rule

* fix: regression

---------

Co-authored-by: Andreas Thomas <dev@chronark.com>

* refactor: move custom domains to tanstack db (#5017)

* refactor: move custom domains to tanstack db

* fix: comment

* fix: delete mutation

* remove: unnecessary query

* remove agent (#5021)

* remove agent

* remove agent

* chore: vault in dashboard (#5023)

* remove agent

* remove agent

* use vault in dashboard

* remove

* project domain (#5022)

* fix: cleanup project side nav

* feat: simplify deployment overview page

only show build logs until it's built, then show domains and network

* chore: clean up nav

* feat: add per-project sticky domain and only display that

* chore: use vault in api (#5024)

* chore: use vault in api

* chore: use vault in api

* fix harness

* use memory test

* vault container go start

* [autofix.ci] apply automated fixes

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

* fix: Make GH callback dynamic (#5029)

* dunno

* nextjs should allow a setting that says dynamic

* [autofix.ci] apply automated fixes

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

* allow longer timeouts (#5032)

* docs: add ratelimit.unkey.com benchmark links to ratelimiting docs

Add references to real-time performance benchmarks in:
- introduction.mdx: new 'Performance at scale' accordion
- modes.mdx: link after latency claim

Presents benchmarks as capability demonstration rather than comparison.

* docs: add description to cache store interface page (#5037)

Add missing SEO description to frontmatter

Generated-By: mintlify-agent

Co-authored-by: mintlify[bot] <109931778+mintlify[bot]@users.noreply.github.com>
Co-authored-by: Andreas Thomas <dev@chronark.com>

* docs: remove orphaned SDK documentation (#5033)

Remove Spring Boot Java, Rust, and Elixir SDK docs that are not linked in navigation and appear to be outdated/unmaintained.

Generated-By: mintlify-agent

Co-authored-by: mintlify[bot] <109931778+mintlify[bot]@users.noreply.github.com>
Co-authored-by: Andreas Thomas <dev@chronark.com>

* No data

* add bg

* rework release (#5044)

* rework release

* rework release

* feat: generate rpc wrappers (#5028)

* feat: generate rpc wrappers

* bazel happyier

* more changes

* more changes

* move path

* delete old files (#5043)

* fix: rabbit comments

---------

Co-authored-by: Oz <21091016+ogzhanolguncu@users.noreply.github.com>

* feat/gossip (#5015)

* fix: retry hubble ui (#5056)

* fix: wait for cillium policy until CRDs are ready (#5059)

* fix: retry cillium policy until CRDs are ready

* fix: blocks until all system pods are ready

* deployment build screen v1 (#5042)

* fix: cleanup project side nav

* feat: simplify deployment overview page

only show build logs until it's built, then show domains and network

* feat: new build screen for ongoing deployments

* fix: table column typo

* fix: update copy to remove mention of analytics deletion (#5067)

* fix typo (#5039)

* rfc: sentinel middlewares (#5041)

* fix: cleanup project side nav

* feat: simplify deployment overview page

only show build logs until it's built, then show domains and network

* feat: middleware rfc

* Update svc/sentinel/proto/buf.gen.ts.yaml

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Flo <53355483+Flo4604@users.noreply.github.com>

* feat: config files (#5045)

* fix: cleanup project side nav

* feat: simplify deployment overview page

only show build logs until it's built, then show domains and network

* feat: add pkg/config for struct-tag-driven TOML/YAML/JSON configuration

Introduces a new configuration package that replaces environment variable
based configuration with file-based config. Features:

- Load and validate config from TOML, YAML, or JSON files
- Struct tag driven: required, default, min/max, oneof, nonempty
- Environment variable expansion (${VAR} and ${VAR:-default})
- JSON Schema generation for editor autocompletion
- Collects all validation errors instead of failing on first
- Custom Validator interface for cross-field checks

Also adds cmd/generate-config-docs for generating MDX documentation
from Go struct tags, and a Makefile target 'config-docs'.

Amp-Thread-ID: https://ampcode.com/threads/T-019c672a-0e8e-7138-b0ab-27cdbeaca7ba
Co-authored-by: Amp <amp@ampcode.com>

* remove gen

* clean up

* feat(api): migrate API service to file-based TOML config (#5046)

* feat(api): migrate API service to file-based config

Migrate the API service from environment variables to TOML file-based
configuration using pkg/config. Replaces all UNKEY_* env vars with a
structured api.toml config file.

Changes:
- Rewrite svc/api/config.go with tagged Config struct
- Update svc/api/run.go to use new config fields
- Update cmd/api/main.go to accept --config flag
- Add dev/config/api.toml for docker-compose
- Update dev/k8s/manifests/api.yaml with ConfigMap
- Regenerate config docs from struct tags

Amp-Thread-ID: https://ampcode.com/threads/T-019c672a-0e8e-7138-b0ab-27cdbeaca7ba
Co-authored-by: Amp <amp@ampcode.com>

* feat(vault): migrate Vault service to file-based TOML config (#5047)

* feat(vault): migrate Vault service to file-based config

Migrate the Vault service from environment variables to TOML file-based
configuration using pkg/config.

Changes:
- Rewrite svc/vault/config.go with tagged Config struct
- Update svc/vault/run.go to use new config fields
- Update cmd/vault/main.go to accept --config flag
- Add dev/config/vault.toml for docker-compose
- Update dev/k8s/manifests/vault.yaml with ConfigMap
- Remove UNKEY_* env vars from docker-compose and k8s

Amp-Thread-ID: https://ampcode.com/threads/T-019c672a-0e8e-7138-b0ab-27cdbeaca7ba
Co-authored-by: Amp <amp@ampcode.com>

* feat(ctrl): migrate Ctrl API and Worker to file-based TOML config (#5048)

* feat(ctrl): migrate Ctrl API and Worker services to file-based config

Migrate both ctrl-api and ctrl-worker from environment variables to TOML
file-based configuration using pkg/config.

Changes:
- Rewrite svc/ctrl/api/config.go and svc/ctrl/worker/config.go
- Update run.go files to use new config fields
- Update cmd/ctrl/api.go and worker.go to accept --config flag
- Add dev/config/ctrl-api.toml and ctrl-worker.toml
- Update dev/k8s/manifests/ctrl-api.yaml and ctrl-worker.yaml with ConfigMaps
- Remove UNKEY_* env vars from docker-compose and k8s manifests

* feat(krane): migrate Krane service to file-based TOML config (#5049)

* feat(krane): migrate Krane service to file-based config

Migrate the Krane container orchestrator from environment variables to
TOML file-based configuration using pkg/config.

Changes:
- Rewrite svc/krane/config.go with tagged Config struct
- Update svc/krane/run.go to use new config fields
- Update cmd/krane/main.go to accept --config flag
- Add dev/config/krane.toml for docker-compose
- Update dev/k8s/manifests/krane.yaml with ConfigMap
- Remove UNKEY_* env vars from docker-compose and k8s

Amp-Thread-ID: https://ampcode.com/threads/T-019c672a-0e8e-7138-b0ab-27cdbeaca7ba
Co-authored-by: Amp <amp@ampcode.com>

* feat(frontline): migrate Frontline service to file-based TOML config (#5050)

* feat(frontline): migrate Frontline service to file-based config

Migrate the Frontline reverse proxy from environment variables to TOML
file-based configuration using pkg/config.

Changes:
- Rewrite svc/frontline/config.go with tagged Config struct
- Update svc/frontline/run.go to use new config fields
- Update cmd/frontline/main.go to accept --config flag
- Update dev/k8s/manifests/frontline.yaml with ConfigMap
- Remove UNKEY_* env vars from k8s manifest

Amp-Thread-ID: https://ampcode.com/threads/T-019c672a-0e8e-7138-b0ab-27cdbeaca7ba
Co-authored-by: Amp <amp@ampcode.com>

* feat(preflight): migrate Preflight service to file-based TOML config (#5051)

* feat(preflight): migrate Preflight service to file-based config

Migrate the Preflight webhook admission controller from environment
variables to TOML file-based configuration using pkg/config.

Changes:
- Rewrite svc/preflight/config.go with tagged Config struct
- Update svc/preflight/run.go to use new config fields
- Update cmd/preflight/main.go to accept --config flag
- Update dev/k8s/manifests/preflight.yaml with ConfigMap
- Remove UNKEY_* env vars from k8s manifest

Amp-Thread-ID: https://ampcode.com/threads/T-019c672a-0e8e-7138-b0ab-27cdbeaca7ba
Co-authored-by: Amp <amp@ampcode.com>

* feat(sentinel): migrate Sentinel service to file-based config (#5052)

Migrate the Sentinel sidecar from environment variables to TOML
file-based configuration using pkg/config. This is the final service
migration in the config stack.

Changes:
- Rewrite svc/sentinel/config.go with tagged Config struct
- Update svc/sentinel/run.go to use new config fields
- Update cmd/sentinel/main.go to accept --config flag
- Update dev/docker-compose.yaml: replace env vars with TOML volume
  mounts for all migrated services (api, vault, krane, ctrl-api,
  ctrl-worker)
- Minor formatting fix in pkg/db generated code

---------

Co-authored-by: Amp <amp@ampcode.com>

---------

Co-authored-by: Amp <amp@ampcode.com>

---------

Co-authored-by: Amp <amp@ampcode.com>

---------

Co-authored-by: Amp <amp@ampcode.com>

---------

Co-authored-by: Amp <amp@ampcode.com>

---------

Co-authored-by: Amp <amp@ampcode.com>

* fix: bad config

* remove unnecessary tls config for ctrl api

* fix: error

* fix: do not log config content

* ix: remove kafka

* fix: replica

* fix: return err

* fix: only overwrite frontline id if missing

* fix: observability

* fix: otel

* fix: redundant config

* fix: reuse tls

* fix: consolidate

* fix: use shared configs

* fix: config

* fix: something

* Update pkg/config/common.go

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* fix: vault startup

* fix: instanceid

* fix: vault config

* fix: make configs required

* fix: everything works again

---------

Co-authored-by: Amp <amp@ampcode.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* clean deployment url label (#4976)

* clean deployment url

* fix conversion error and maintain single source of truth

---------

Co-authored-by: Andreas Thomas <dev@chronark.com>
Co-authored-by: Flo <53355483+Flo4604@users.noreply.github.com>

* feat: New deploy settings (#5073)

* feat: add github section

* feat: Add icons

* feat: add new sections

* feat: add settingsgroup

* feat: add region selection

* feat: add instances

* feat: add memory and cpu section

* feat: add sections

* feat: add health check

* feat: add scaling

* fix: get rid of redundant prop

* refactor: Add toasts to mutations

* refactor: rename component

* feat: add port section

* feat: fix overlapping borders

* refactor: fix healthcheck tRPC

* feat: add command section

* feat: add env section

* fix: finalize env-vars

* refactor: finalize

* feat: Add custom domains

* fix: overflow

* feat: make tRPC route for each mutation

* fix: displayValue styles

* refactor: tidy

* fix: revert accidental changes

* feat: add cname table

* fix: github styling issues

* refactor: tidy

* refactor: rename

* fix: linter

* fix: dynamic form issue

* feat: allow env selection

* chore: tidy

* fix: use same chevron

* fix: use certmanager if available otherwise certfile (#5076)

* fix: use certmanager if available otherwise certfile

* feat: make tls enabled by default

Now you need to explicitly pass tls.disabled=true to run without TLS;
if TLS is neither configured nor explicitly disabled, we fail during startup.

Also renamed some port variables to make it obvious what they are used for.

* chore: log candidates for easier debugging

* fix: use static certs first

---------

Co-authored-by: chronark <dev@chronark.com>
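The TLS-by-default rule from #5076 (startup fails unless certificates are available or `tls.disabled=true` is passed explicitly), combined with the later "use static certs first" ordering, can be illustrated with a minimal sketch. The `TLSConfig` struct, its field names, and `resolveTLS` are hypothetical and are not the repository's actual config types; the sketch only shows the decision order.

```go
package main

import (
	"errors"
	"fmt"
)

// TLSConfig is a hypothetical shape for a service's TLS section.
// Field names are illustrative assumptions, not the real config struct.
type TLSConfig struct {
	Disabled    bool   // must be set explicitly (tls.disabled=true) to opt out
	CertFile    string // static certificate on disk
	KeyFile     string // static private key on disk
	CertManager bool   // whether a certificate manager is available
}

type certSource string

const (
	sourceNone        certSource = "none"
	sourceStatic      certSource = "static"
	sourceCertManager certSource = "certmanager"
)

// ErrTLSNotConfigured is returned when TLS is neither configured nor disabled.
var ErrTLSNotConfigured = errors.New("tls is enabled by default: configure certificates or set tls.disabled=true")

// resolveTLS prefers static certs, falls back to a certificate manager,
// and fails instead of silently serving plaintext.
func resolveTLS(cfg TLSConfig) (certSource, error) {
	if cfg.Disabled {
		return sourceNone, nil
	}
	if cfg.CertFile != "" && cfg.KeyFile != "" {
		return sourceStatic, nil
	}
	if cfg.CertManager {
		return sourceCertManager, nil
	}
	return sourceNone, ErrTLSNotConfigured
}

func main() {
	for _, cfg := range []TLSConfig{
		{Disabled: true},
		{CertFile: "cert.pem", KeyFile: "key.pem", CertManager: true},
		{CertManager: true},
		{},
	} {
		src, err := resolveTLS(cfg)
		fmt.Println(src, err)
	}
}
```

Failing at startup, rather than defaulting to plaintext, makes a missing certificate a deploy-time error instead of a silent security regression.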

* feat: sentinel key verification middleware (#5079)

* feat: key-sentinel-middleware

* fix error pages (#5083)

* fix error pages

* remove test

* move some files

* Update svc/frontline/internal/errorpage/error.go.tmpl

Co-authored-by: Andreas Thomas <dev@chronark.com>

* [autofix.ci] apply automated fixes

---------

Co-authored-by: Andreas Thomas <dev@chronark.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

* add rl headers.

* feat: new ui and fixed a bunch of stuff

* Update svc/sentinel/engine/match.go

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* fix: coderabbit

---------

Co-authored-by: Andreas Thomas <dev@chronark.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* clean up after sentinel middleware (#5088)

* feat: key-sentinel-middleware

* fix error pages (#5083)

* fix error pages

* remove test

* move some files

* Update svc/frontline/internal/errorpage/error.go.tmpl

Co-authored-by: Andreas Thomas <dev@chronark.com>

* [autofix.ci] apply automated fixes

---------

Co-authored-by: Andreas Thomas <dev@chronark.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

* add rl headers.

* feat: new ui and fixed a bunch of stuff

* Update svc/sentinel/engine/match.go

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* fix: coderabbit

* chore: clean up old columns

* fix: db

---------

Co-authored-by: Flo <flo@unkey.com>
Co-authored-by: Flo <53355483+Flo4604@users.noreply.github.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* fix proto type (#5093)

* fix: cleanup project side nav

* feat: simplify deployment overview page

only show build logs until it's built, then show domains and network

* fix: runtime exception due to gaslighting type

* fix: Modals with combo box work again (#5002)

* chore: remove chproxy routes (#5101)

* chore: remove chproxy routes

* refactor: move prometheus metrics to scoped packages (#5102)

* remove the hand holding (#5108)

* feat: gossip metrics (#5107)

* fix: Make identity slugs copyable (#5100)

* fix: make me copy

* Update web/apps/dashboard/app/(app)/[workspaceSlug]/authorization/permissions/components/table/components/assigned-items-cell.tsx

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* refactor: useDeployment hook usage (#5086)

* refactor: useDeployment hook usage

* fix: remove redundant check

* feat: i need more metrics (#5115)

* docs: Python and Go examples (#5058)

* Add Python and Go SDK documentation

- Add Go quickstart guide with stdlib, Gin, and Echo examples
- Add Python quickstart guide with FastAPI, Flask, Django
- Add Go cookbook: stdlib, Gin, Echo middleware recipes
- Add Python cookbook: FastAPI rate limiting recipe

All content uses Unkey v2 API only.

* Add final Python cookbook recipe and 5-minutes guide

* Update docs.json sidebar navigation

- Add Go and Python quickstart guides to Framework Guides
- Add Go and Python cookbook recipes to Recipes section
- Remove duplicate 5-minutes quickstart file

* Add Go examples to quickstart and reorganize cookbook by language

- Add Go code examples to /quickstart/quickstart.mdx for key creation and verification
- Reorganize cookbook recipes into subsections: TypeScript, Go, Python, General
- Keep existing TypeScript and Python examples in quickstart

* Update cookbook index with new Go and Python recipes

* Fix code issues in Go and Python documentation

- Fix int to string conversion in go-gin-middleware (use strconv)
- Fix middleware composition in go-stdlib-middleware
- Fix wait calculation in python-fastapi-ratelimit (use total_seconds)
- Fix headers attachment in python-fastapi-ratelimit (use JSONResponse)
- Fix nil pointer dereference in quickstart/go
- Fix unsafe type assertion in quickstart/go

* Fix async/sync issue and nil pointer in quickstart docs

- Use verify_key_async in Python async route
- Add nil check for result.Code in Go quickstart

* Fix more code issues in documentation

- Fix GetUnkeyResult type assertion in go-gin-middleware
- Fix imports in python-fastapi-ratelimit (add JSONResponse, remove unused timedelta)
- Update basic rate limit example to use async API with context manager
- Add missing os import in Django settings snippet

* Fix missing os import in python-flask-auth.mdx

* Fix unsafe type assertions in Go middleware docs

- Fix RequirePermission in go-echo-middleware with safe type assertion
- Fix GetUnkeyResult in go-echo-middleware with safe type assertion
- Fix RequirePermission in go-gin-middleware with safe type assertion

* Fix error handling in Python docs - replace ApiError with UnkeyError

* Update legacy analytics documentation

- Replace outdated /apis/features/analytics.mdx with minimal reference page
- Remove analytics from API Keys sidebar in docs.json
- Add redirect from /apis/features/analytics to /analytics/overview

* fix

* Update to mint

* Fix critical type assertion issues in go-gin-middleware

- Store pointer to struct in context (not value) for type assertion compatibility
- Add checked type assertion in RequireRole with proper error handling

* Add it back

* fix the comma

* revert

* Update go examples

* cookbook update

* update quickstart

* remove analytics page that is redirected

* Update web/apps/docs/cookbook/go-echo-middleware.mdx

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* Update web/apps/docs/cookbook/go-echo-middleware.mdx

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* fix

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* fix: deploy quick fixes (#5085)

* fix: fetch correct deployment+sentinel

* fix: add missing team switcher hover indicator

* refactor: use the same empty text

* fix: lock network view and fix generate dummy network

* fix: safari rendering issue of network

* chore: fmt

* fix: build

* [autofix.ci] apply automated fixes

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: Flo <53355483+Flo4604@users.noreply.github.com>
Co-authored-by: James P <james@unkey.com>

* docs: remove duplicate onboarding page (#5035)

Remove quickstart/onboarding/onboarding-api.mdx which duplicates content from the new quickstart. Redirects already exist in docs.json.

Generated-By: mintlify-agent

Co-authored-by: mintlify[bot] <109931778+mintlify[bot]@users.noreply.github.com>
Co-authored-by: Andreas Thomas <dev@chronark.com>
Co-authored-by: James P <james@unkey.com>

* docs: remove deprecated Vercel integration page (#5034)

The Vercel integration is currently not supported. Remove the page to avoid confusing users.

Generated-By: mintlify-agent

Co-authored-by: mintlify[bot] <109931778+mintlify[bot]@users.noreply.github.com>
Co-authored-by: James P <james@unkey.com>

* fix: schema cache (#5116)

* fix: do background gossip connect (#5119)

* fix: do background gossip connect

* bazel happy

* chore: debug wan failures (#5124)

* chore: debug wan failures

* add log writer

* bazel ..........

* bazel ..........

* fix: a user cannot click outside of the org selection modal (#5031)

* fix: a user cannot click outside of the org selection modal

* use errorMessage instead of hard coding messages

* restore x closing functionality

* fix rabbit, fix flash of empty state

* clear last used workspace when auto-selection fails

* remove unused conditional

---------

Co-authored-by: James P <james@unkey.com>

* sentinel prewarm cache (#5071)

* fix: cleanup project side nav

* feat: simplify deployment overview page

only show build logs until it's built, then show domains and network

* feat: sentinels prewarm their cache

it's not optimized, but pretty neat

---------

Co-authored-by: Flo <53355483+Flo4604@users.noreply.github.com>

* fix: ignore empty wan (#5122)

Co-authored-by: Andreas Thomas <dev@chronark.com>
Co-authored-by: James P <james@unkey.com>

* docs: move docs (#5125)

* refactor: deploy settings tanstack (#5104)

* refactor: move them to tanstack

* refactor: tidy up

* feat: add env provider to decide what env we are on

* refactor: tidy

* feat: add scroll into view for settingcard

* fix: bg

* refactor: remove toasts from env-vars

* chore: tidy

* fix: build

* feat: vault bulk en/decrypt (#5127)

* feat: vault bulk en/decrypt

* oops wrong file

* cleanup proto

* [autofix.ci] apply automated fixes

* cleanup

* cleanup

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

* feat: trace generated rpc clients (#5128)

* feat: trace generated rpc clients

* ignore not found

* fix: docs generator paths (#5136)

* fix: retry memberlist creation (#5134)

* fix: retry memberlist creation

* remove comments

* move to const

* fix: Restore filtering on logs (#5138)

* Restore filtering on logs

Restores filtering on the logs.

* [autofix.ci] apply automated fixes

* fmt

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

* fix: resolve dns to ip (#5139)

* fix: resolve dns to ip

* rabbit comments

* Fix/issue 5132 billing section widths (#5140)

* fix: Billing page has inconsistent section widths (#5132)

Standardized all SettingCard components to use consistent width classes:
- Updated Usage component: contentWidth changed from 'w-full lg:w-[320px]' to 'w-full'
- Updated CancelAlert component: contentWidth changed from 'w-full lg:w-[320px]' to 'w-full'
- Updated Billing Portal in client.tsx: contentWidth changed from 'w-full lg:w-[320px]' to 'w-full'
- Updated CurrentPlanCard component: removed min-w-[200px] from className for consistency

All billing sections now use contentWidth='w-full' for consistent layout.

Fixes #5132

* Fix billing and setting cards

* [autofix.ci] apply automated fixes

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

* pnpm i

* No virtualization

* pagination footer

* sorting and pagination changes

* refactor

* exports

* move data-table to ui

* install

* fmt

* Footer style changes

* sorting and pagination changes

* sorting and footer loading fix

* cleanup

* prefetch pages

* changes for review comments from rabbit and meg

* Ref fix and removed not needed import

* [autofix.ci] apply automated fixes

* sorting fix and key navigation

* review changes mess

* minor rabbit changes

* Update loading-indicator animation delay

* style change on pagination footer

* ref type change

---------

Co-authored-by: Andreas Thomas <dev@chronark.com>
Co-authored-by: Meg Stepp <mcstepp@users.noreply.github.com>
Co-authored-by: Flo <53355483+Flo4604@users.noreply.github.com>
Co-authored-by: Oz <21091016+ogzhanolguncu@users.noreply.github.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: James P <james@unkey.com>
Co-authored-by: mintlify[bot] <109931778+mintlify[bot]@users.noreply.github.com>
Co-authored-by: gui martins <guilhermev2huehue@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Amp <amp@ampcode.com>
Co-authored-by: Vansh Malhotra <vansh.malhotra439@gmail.com>
Co-authored-by: Flo <flo@unkey.com>

3 participants