Platform Test Matrix: Linux (Ubuntu) + macOS
- Linux: Full workspace (including Control Plane DB tests, migrations, schema drift, coverage, performance)
- macOS: Full workspace including Control Plane (PostgreSQL 15 via Homebrew service)

This ensures cross-platform parity for the CLI, operator, and control plane.
AetherEngine is an internal Platform-as-a-Service (PaaS) designed to minimize application deployment latency and systematically improve Developer Experience (DX) by moving the entire build pipeline (dependency resolution, compilation, packaging) to the client edge through a high-performance Rust CLI. Rather than executing non-deterministic server-side builds, developers upload a pre-assembled, production-ready artifact. This model is intended to reduce end-to-end deployment latency from minutes to seconds while decreasing infrastructure consumption and variance.
AetherEngine provides an opinionated, artifact‑centric deployment paradigm initially constrained to Node.js Long-Term Support (LTS) runtimes (e.g. 18.x, 20.x). The Minimum Viable Product (MVP) aims to empirically demonstrate an ≥80% reduction in mean deployment time for existing internal Node.js services relative to the incumbent pipeline.
- Business: ≥80% reduction in p95 deploy duration versus baseline.
- Product: A cohesive workflow—from local source to production runtime—achieved through 3–5 intuitive CLI commands.
- Technical: A stable, horizontally extensible control plane (Rust + Axum + SQLx + PostgreSQL) and operational data plane (Kubernetes) supporting artifact‑based rollouts.
Historical Note: Early exploratory drafts referenced TiDB. The authoritative MVP datastore is PostgreSQL (SQLx). TiDB may re‑enter the roadmap for multi‑region or HTAP scenarios.
The platform decomposes into four bounded components:
Component | Role | Core Technologies |
---|---|---|
Aether CLI | Local build, packaging, artifact upload, deployment orchestration | Rust, clap, reqwest, tokio |
Control Plane | API surface, deployment metadata, orchestration, Kubernetes integration | Rust, Axum, SQLx, PostgreSQL, kube-rs |
Artifact Registry | Immutable storage for packaged application artifacts | S3-compatible object storage (e.g. MinIO) |
Data Plane | Deterministic execution environment for runtime containers | Kubernetes, optimized Node.js base images |
1. Developer executes `aether deploy` at the project root.
2. CLI detects a Node.js project (presence of `package.json`).
3. CLI runs `npm install --production` (or a deterministic equivalent) locally.
4. CLI packages source + `node_modules` into a compressed artifact (`app.tar.gz`).
5. CLI requests a pre-signed upload URL from the Control Plane.
6. CLI uploads the artifact to the Artifact Registry.
7. CLI issues a deployment request (`POST /deployments`) containing the artifact digest + runtime metadata.
8. Control Plane persists the deployment record and synthesizes a Kubernetes workload specification.
9. Data Plane init container downloads & decompresses the artifact.
10. Main container (base image: `aether-nodejs:20-slim`) executes the defined start command (default: `npm start`).
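Steps 3–4 are where determinism and content addressing come from. A minimal sketch of the packaging + streaming-hash step, assuming the `tar`, `flate2`, `sha2`, and `hex` crates (the function name is illustrative, not the CLI's actual code):

```rust
use flate2::{write::GzEncoder, Compression};
use sha2::{Digest, Sha256};
use std::fs::File;
use std::io::{self, Read};

/// Package `src_dir` (source + node_modules) into a gzipped tarball and
/// return the hex SHA-256 digest used for content addressing.
fn package_artifact(src_dir: &str, out_path: &str) -> io::Result<String> {
    let file = File::create(out_path)?;
    let enc = GzEncoder::new(file, Compression::default());
    let mut builder = tar::Builder::new(enc);
    builder.append_dir_all(".", src_dir)?;
    builder.into_inner()?.finish()?;

    // Hash the finished archive in a streaming fashion (constant memory).
    let mut hasher = Sha256::new();
    let mut f = File::open(out_path)?;
    let mut buf = [0u8; 64 * 1024];
    loop {
        let n = f.read(&mut buf)?;
        if n == 0 { break; }
        hasher.update(&buf[..n]);
    }
    Ok(hex::encode(hasher.finalize()))
}
```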
- Artifact Immutability & Addressability (content hash).
- Deterministic Local Build (eliminating CI variability).
- Minimal Server Trust Surface (no remote build execution).
- Observability‑first (deployment UUID + digest propagation across logs / traces).
Command | Purpose | Notes |
---|---|---|
`aether login` | Authenticate user; store credentials securely | Supports token rotation |
`aether deploy` | Package & publish artifact; trigger deployment | Auto runtime detection, hash computation |
`aether logs` | Stream live or historical logs | Pod label selectors |
`aether list` | Enumerate applications & recent deployments | Future: filtering & pagination |
Planned Enhancements:
- Parallel compression + hashing for large dependency graphs
- Local SBOM generation (supply chain visibility)
- Integrity verification before runtime entrypoint execution
```text
$ aether --help
Global Flags:
  --log-level <trace|debug|info|warn|error>    (default: info)
  --log-format <auto|text|json>                (default: auto)
Subcommands:
  login [--username <name>]            Authenticate (mock)
  deploy [--dry-run]                   Package and (mock) deploy current project
  logs [--app <name>]                  Show recent logs (mock)
  list                                 List applications (mock)
  completions --shell <bash|zsh|fish>  Generate shell completion script (hidden)
Examples:
  aether login
  aether deploy --dry-run
  aether deploy
  aether --log-format json list
  aether completions --shell bash > aether.bash
  aether deploy --format json --no-sbom --pack-only
```
Configuration:
- Config file: `${XDG_CONFIG_HOME:-~/.config}/aether/config.toml`
- Session file: `${XDG_CACHE_HOME:-~/.cache}/aether/session.json`
- Env override: `AETHER_DEFAULT_NAMESPACE`
- Ignore file: `.aetherignore` (glob patterns, one per line, `#` comments)
Exit Codes:
Code | Meaning |
---|---|
0 | Success |
2 | Usage / argument error (clap) |
10 | Config error |
20 | Runtime internal |
30 | I/O error |
40 | Network error (reserved) |
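A sketch of how these codes might map from a top-level error type; the `CliError` enum here is hypothetical, not the CLI's actual type (code 2 is emitted by clap itself on usage errors):

```rust
/// Hypothetical top-level CLI error classification.
enum CliError {
    Config(String),
    Runtime(String),
    Io(std::io::Error),
    Network(String),
}

/// Map a failure to the documented process exit codes.
fn exit_code(err: &CliError) -> i32 {
    match err {
        CliError::Config(_) => 10,
        CliError::Runtime(_) => 20,
        CliError::Io(_) => 30,
        CliError::Network(_) => 40, // reserved
    }
}
```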
Performance: Target cold start <150ms (local); CI threshold set to <800ms for noise tolerance.
When invoking `aether deploy --format json`, the CLI prints a single JSON object to stdout (logs remain on stderr) with the following stable fields:
Field | Type | Description |
---|---|---|
`artifact` | string | Path to generated `.tar.gz` artifact |
`digest` | string (hex sha256) | Content hash of packaged files (streaming computed) |
`size_bytes` | number | Size of artifact on disk |
`manifest` | string | Path to manifest file listing per-file hashes |
`sbom` | string \| null | Path to generated SBOM; null when generation is skipped |
`signature` | string \| null | Artifact signature; null when unsigned |
Error Behavior (JSON mode): currently, non-zero failures may still emit human-readable text before the JSON; future work will standardize an error envelope `{ "error": { code, message } }` (tracked in the Issue 01 follow-up; now resolved in this branch by suppressing SBOM generation when skipped).
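For reference, the table above corresponds to a serde shape along these lines (a sketch; the struct name is illustrative):

```rust
use serde::Serialize;

/// JSON object printed to stdout by `aether deploy --format json`.
#[derive(Serialize)]
struct DeployOutput {
    artifact: String,          // path to the generated .tar.gz
    digest: String,            // hex-encoded SHA-256 of packaged files
    size_bytes: u64,           // artifact size on disk
    manifest: String,          // path to the per-file hash manifest
    sbom: Option<String>,      // serializes to null when SBOM generation is skipped
    signature: Option<String>, // serializes to null when unsigned
}
```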
Responsibilities:
- REST API (Axum) for authentication, deployment management, log access
- Artifact metadata tracking (digest, runtime, size, provenance timestamps)
- Kubernetes workload synthesis via `kube-rs`
- Enforcement of environment, secret, and resource policies
Representative Endpoints (MVP subset):
- `POST /deployments` – Register new deployment (idempotent via artifact digest)
- `GET /apps/{app}/logs` – Stream or tail logs (upgrade: WebSocket or chunked HTTP)
- `GET /apps/{app}/deployments` – List historical deployments
- `POST /artifacts` – Upload artifact (headers: `X-Aether-Artifact-Digest`, optional `X-Aether-Signature`)
- `GET /artifacts` – List recent artifacts (metadata only)
- `GET /healthz`, `GET /readyz` – Liveness / readiness probes
All API errors return a stable JSON envelope and appropriate HTTP status code:
```http
HTTP/1.1 409 Conflict
Content-Type: application/json

{
  "code": "conflict",
  "message": "application name exists"
}
```
Canonical error codes (subject to extension):
Code | HTTP | Semantics |
---|---|---|
`bad_request` | 400 | Payload / validation failure |
`not_found` | 404 | Entity does not exist |
`conflict` | 409 | Uniqueness or state conflict |
`service_unavailable` | 503 | Dependency (DB, downstream) not ready |
`internal` | 500 | Unclassified unexpected error |
Design Notes:
- Machine-friendly `code` enables future localization / client mapping.
- `message` is intentionally human-oriented; avoid leaking internal stack traces.
- Additional diagnostic fields (e.g. `details`, `trace_id`) may be added when tracing is wired.
- Non-error (2xx) responses never include this envelope.
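A sketch of how this envelope could be wired into Axum (an assumed shape, not the actual handler code):

```rust
use axum::http::StatusCode;
use axum::response::{IntoResponse, Response};
use axum::Json;
use serde::Serialize;

/// Stable error envelope: `code` is machine-friendly, `message` human-oriented.
#[derive(Serialize)]
struct ApiError {
    code: &'static str,
    message: String,
}

impl IntoResponse for ApiError {
    fn into_response(self) -> Response {
        let status = match self.code {
            "bad_request" => StatusCode::BAD_REQUEST,
            "not_found" => StatusCode::NOT_FOUND,
            "conflict" => StatusCode::CONFLICT,
            "service_unavailable" => StatusCode::SERVICE_UNAVAILABLE,
            _ => StatusCode::INTERNAL_SERVER_ERROR, // "internal"
        };
        (status, Json(self)).into_response()
    }
}
```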
On success (`200 OK`) the control plane returns:
Field | Type | Meaning |
---|---|---|
`artifact_url` | string | Location reference (currently file URI mock) |
`digest` | string | SHA-256 hex digest (server recomputed) |
`duplicate` | bool | True if digest already existed (idempotent; file not re-written) |
`app_linked` | bool | True if `app_name` matched an existing application and was linked |
`verified` | bool | True if an attached Ed25519 signature matched a registered application public key |
Additional error codes related to artifact upload:
Code | HTTP | Semantics |
---|---|---|
`missing_digest` | 400 | Header `X-Aether-Artifact-Digest` absent |
`invalid_digest` | 400 | Malformed digest (length/hex) |
`digest_mismatch` | 400 | Provided digest did not match recomputed |
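The `invalid_digest` check reduces to a simple shape test on the header value, sketched here:

```rust
/// Validate an X-Aether-Artifact-Digest header: exactly 64 hex characters.
fn validate_digest(value: &str) -> Result<(), &'static str> {
    if value.len() != 64 || !value.chars().all(|c| c.is_ascii_hexdigit()) {
        return Err("invalid_digest");
    }
    Ok(())
}
```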
Register an Ed25519 public key for an application so that subsequent uploads carrying an `X-Aether-Signature` header over the digest value set `verified=true`.

Endpoint: `POST /apps/{app_name}/public-keys` with body `{ "public_key_hex": "<64 hex chars>" }`

Response `201 Created`: `{ "app_id": "<uuid>", "public_key_hex": "<hex>", "active": true }`

Multiple keys per app are allowed (all `active=true` by default). A deactivation endpoint is TBD.
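Verification plausibly amounts to checking the signature over the digest bytes against each registered key; a sketch assuming the `ed25519-dalek` (v2) and `hex` crates:

```rust
use ed25519_dalek::{Signature, Verifier, VerifyingKey};

/// Return true if `sig_hex` is a valid Ed25519 signature over the digest
/// string's bytes under the registered `public_key_hex`.
fn signature_verifies(public_key_hex: &str, sig_hex: &str, digest: &str) -> bool {
    let Ok(pk_bytes) = hex::decode(public_key_hex) else { return false };
    let Ok(pk_arr) = <[u8; 32]>::try_from(pk_bytes.as_slice()) else { return false };
    let Ok(key) = VerifyingKey::from_bytes(&pk_arr) else { return false };
    let Ok(sig_bytes) = hex::decode(sig_hex) else { return false };
    let Ok(sig) = Signature::from_slice(&sig_bytes) else { return false };
    key.verify(digest.as_bytes(), &sig).is_ok()
}
```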
`HEAD /artifacts/{digest}` returns `200` if present, `404` if absent (no body). This enables the CLI to skip re-uploads.
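On the CLI side this becomes a single `reqwest` HEAD call before any upload attempt (a sketch; base URL handling simplified):

```rust
use reqwest::{Client, StatusCode};

/// True if the registry already stores this digest, so the upload can be skipped.
async fn artifact_exists(client: &Client, base: &str, digest: &str) -> reqwest::Result<bool> {
    let resp = client
        .head(format!("{base}/artifacts/{digest}"))
        .send()
        .await?;
    Ok(resp.status() == StatusCode::OK)
}
```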
Uploads are limited by a semaphore (env: `AETHER_MAX_CONCURRENT_UPLOADS`, default `32`). Excess uploads await a permit, preventing resource exhaustion.
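A sketch of the permit gate, assuming `tokio::sync::Semaphore` (the actual wiring inside the handler may differ):

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

/// Build the shared limiter from AETHER_MAX_CONCURRENT_UPLOADS (default 32).
fn upload_limiter() -> Arc<Semaphore> {
    let permits = std::env::var("AETHER_MAX_CONCURRENT_UPLOADS")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(32);
    Arc::new(Semaphore::new(permits))
}

async fn handle_upload(limiter: Arc<Semaphore>) {
    // Excess uploads queue here instead of exhausting memory or file handles.
    let _permit = limiter.acquire_owned().await.expect("semaphore never closed");
    // ... persist the artifact; the permit is released when `_permit` drops.
}
```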
Metric | Type | Description |
---|---|---|
`artifact_upload_bytes_total` | Counter | Bytes successfully persisted (new uploads) |
`artifact_upload_duration_seconds` | Histogram | End-to-end upload + verify duration |
`artifact_uploads_in_progress` | Gauge | Concurrent in-flight uploads |
`artifacts_total` | Gauge | Total stored artifacts (initial load + increment on insert) |
Bearer token auth (`Authorization: Bearer <token>`) is configured via `AETHER_API_TOKENS` (CSV) or the fallback `AETHER_API_TOKEN`. The OpenAPI spec exposes a `bearer_auth` security scheme applied globally.
Two-phase single-part flow:
1. `POST /artifacts/presign` – obtain presigned PUT (or method `NONE` if a duplicate is already stored)
2. Client performs PUT directly to object storage (S3 / MinIO) using the returned headers
3. `POST /artifacts/complete` – finalize (size & optional remote hash verification, quota + retention enforcement, idempotency)

Multipart flow (large artifacts) adds:
1. `POST /artifacts/multipart/init` – returns `upload_id` + `storage_key`
2. Loop: `POST /artifacts/multipart/presign-part` – presign each part (client uploads via PUT)
3. `POST /artifacts/multipart/complete` – supply list of `(part_number, etag)` pairs, finalize record
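The client-side path selection is a simple threshold comparison (a sketch; env names come from the configuration reference below):

```rust
/// Choose multipart when the artifact size reaches the configured threshold.
/// In this sketch, an absent or unparsable threshold keeps the single-part path.
fn use_multipart(artifact_size: u64) -> bool {
    std::env::var("AETHER_MULTIPART_THRESHOLD_BYTES")
        .ok()
        .and_then(|v| v.parse::<u64>().ok())
        .map(|threshold| artifact_size >= threshold)
        .unwrap_or(false)
}
```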
Idempotency: supply an `idempotency_key` on complete endpoints; conflicting reuse across different digests is rejected with `409 idempotency_conflict`.

Quota Enforcement: configurable per-app limits on artifact count and cumulative bytes. Rejections return `403 quota_exceeded`.

Retention Policy: keep the latest N stored artifacts per app; older rows are deleted post-store (retention events emitted).

Server-Side Encryption (S3): set `AETHER_S3_SSE` to `AES256` or `aws:kms` (optionally `AETHER_S3_SSE_KMS_KEY`).
Remote Verification Toggles:
- Size: `AETHER_VERIFY_REMOTE_SIZE` (default on)
- Metadata digest: `AETHER_VERIFY_REMOTE_DIGEST` (default on)
- Full hash (small objects only): `AETHER_VERIFY_REMOTE_HASH` + `AETHER_REMOTE_HASH_MAX_BYTES`
Existing (core) metrics plus newly added artifact lifecycle instrumentation:
Counters:
- `artifact_presign_requests_total` – presign attempts
- `artifact_presign_failures_total` – presign errors (backend/head failures)
- `artifact_complete_failures_total` – completion DB / logic errors
- `artifact_upload_bytes_total` – bytes of legacy (deprecated) direct uploads
- `artifact_digest_mismatch_total` – remote metadata/hash mismatches
- `artifact_size_exceeded_total` – rejected for per-object size limit
- `artifact_pending_gc_runs_total` / `artifact_pending_gc_deleted_total` – stale pending cleanup
- `artifact_events_total` – audit events written
- `artifact_legacy_upload_requests_total` – deprecated `/artifacts` hits
- `artifact_multipart_inits_total` – multipart session starts
- `artifact_multipart_part_presigns_total` – part presign calls
- `artifact_multipart_completes_total` – successful multipart completes
- `artifact_multipart_complete_failures_total` – multipart completion failures
- `artifact_quota_exceeded_total` – quota rejections
Gauges:
- `artifact_uploads_in_progress` – active legacy direct uploads
- `artifacts_total` – stored artifact rows (adjusted on insert/init)

Histograms:
- `artifact_upload_duration_seconds` – legacy direct upload wall time
- `artifact_put_duration_seconds` – client-reported PUT transfer duration (two-phase + multipart)
- `artifact_complete_duration_seconds` – server-side complete handler time
- `artifact_multipart_part_size_bytes` – distribution of part sizes (approximate; estimated at completion)
- `artifact_multipart_parts_per_artifact` – distribution of part counts per multipart artifact
Cardinality Guidance: all metrics intentionally have zero or minimal label cardinality (no per-app labels) to remain low cost at scale; future segmentation (e.g. per-app) would use dynamic metric families + allow lists.
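As one illustration, a zero-label counter could be registered along these lines (a sketch assuming the `prometheus` and `once_cell` crates; the service's actual metrics plumbing is not specified here):

```rust
use once_cell::sync::Lazy;
use prometheus::{register_int_counter, IntCounter};

// Zero labels by design: cost stays flat regardless of app count.
static PRESIGN_REQUESTS: Lazy<IntCounter> = Lazy::new(|| {
    register_int_counter!("artifact_presign_requests_total", "Presign attempts")
        .expect("metric registers once")
});

fn record_presign_request() {
    PRESIGN_REQUESTS.inc();
}
```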
Core limits & behavior:
- `AETHER_MAX_ARTIFACT_SIZE_BYTES` – reject complete if reported size exceeds (0 = disabled)
- `AETHER_MAX_CONCURRENT_UPLOADS` – semaphore permits for legacy endpoint (default 32)
- `AETHER_PRESIGN_EXPIRE_SECS` – expiry for presigned URLs (default 900)
- `AETHER_REQUIRE_PRESIGN` – force presign before complete (`true|1`)
Quotas & retention:
- `AETHER_MAX_ARTIFACTS_PER_APP` – limit count per app (0/absent disables)
- `AETHER_MAX_TOTAL_BYTES_PER_APP` – cumulative byte quota per app
- `AETHER_RETAIN_LATEST_PER_APP` – keep N newest stored artifacts; delete older
Remote verification:
- `AETHER_VERIFY_REMOTE_SIZE` – enable size HEAD check (default true)
- `AETHER_VERIFY_REMOTE_DIGEST` – validate remote metadata sha256
- `AETHER_VERIFY_REMOTE_HASH` – fetch full object (<= `AETHER_REMOTE_HASH_MAX_BYTES`) and hash
- `AETHER_REMOTE_HASH_MAX_BYTES` – cap for remote hash download (default 8,000,000)
Multipart thresholds:
- `AETHER_MULTIPART_THRESHOLD_BYTES` – client selects multipart if artifact size >= threshold
- `AETHER_MULTIPART_PART_SIZE_BYTES` – desired part size (client buffer; default 8 MiB)
Storage/S3:
- `AETHER_STORAGE_MODE` – `mock` or `s3`
- `AETHER_ARTIFACT_BUCKET` – S3 bucket name (default `artifacts`)
- `AETHER_S3_BASE_URL` – mock base URL (for mock backend only)
- `AETHER_S3_ENDPOINT_URL` – custom S3 endpoint (MinIO / alternative)
- `AETHER_S3_SSE` – `AES256` | `aws:kms` (enables SSE)
- `AETHER_S3_SSE_KMS_KEY` – KMS key id/arn when using `aws:kms`
Pending GC:
- `AETHER_PENDING_TTL_SECS` – external GC driver: delete pending older than TTL (used by helper `run_pending_gc`)
- `AETHER_PENDING_GC_INTERVAL_SECS` – operator-side scheduling hint (not yet wired)
Client / CLI related:
- `AETHER_MAX_CONCURRENT_UPLOADS` – legacy path concurrency limit
- `AETHER_API_BASE` – base URL used by the CLI for API calls
All boolean-style env vars treat `true|1` (case-insensitive) as enabled.
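That convention reduces to a single helper (sketch):

```rust
/// Boolean env convention: `true` or `1` (case-insensitive) enables the flag.
fn env_flag(name: &str, default: bool) -> bool {
    match std::env::var(name) {
        Ok(v) => matches!(v.to_ascii_lowercase().as_str(), "true" | "1"),
        Err(_) => default,
    }
}
```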
Initial Target: Self‑hosted MinIO (S3-compatible API).
Requirements:
- Pre-signed URL issuance (time-boxed; ideally single-use)
- Content-addressed hierarchy (e.g. `artifacts/<app>/<sha256>/app.tar.gz`)
- Optional server-side encryption (future)
- Lifecycle policies: age + unreferenced digest reclamation
Kubernetes Design:
- Init Container: Fetch + decompress artifact into ephemeral volume (EmptyDir or ephemeral CSI)
- Main Container: Execute Node.js process (non-root user) with env injection
- Rollout Strategy (MVP): Replace; roadmap includes canary + blue/green
- Observability: standardized labels `app=aether`, `app_name=<name>`, `deployment_id=<uuid>` (see the sketch below)
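A sketch of assembling those labels before attaching them to the synthesized workload (names taken from the list above; the function is illustrative):

```rust
use std::collections::BTreeMap;

/// Standardized labels attached to every synthesized workload object.
fn workload_labels(app_name: &str, deployment_id: &str) -> BTreeMap<String, String> {
    BTreeMap::from([
        ("app".to_string(), "aether".to_string()),
        ("app_name".to_string(), app_name.to_string()),
        ("deployment_id".to_string(), deployment_id.to_string()),
    ])
}
```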
Base Image Objectives:
- Slim, reproducible, frequently patched
- Non-root (USER 1000)
- Reduced attack surface (no build toolchain in runtime layer)
Tables (PostgreSQL via SQLx):
- `applications(id, name, owner, created_at)`
- `artifacts(id, app_id, digest, runtime, size_bytes, created_at)`
- `deployments(id, app_id, artifact_id, status, rollout_started_at, rollout_completed_at)`
- `deployment_events(id, deployment_id, phase, message, timestamp)`
- `users(id, email, auth_provider, created_at)`
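A sketch of the idempotent digest lookup this schema enables, using SQLx (query shape assumed from the `artifacts` columns above):

```rust
use sqlx::PgPool;
use uuid::Uuid;

/// Find an existing artifact row by content digest; `Some(id)` means the
/// deployment request can reuse the stored artifact (idempotent path).
async fn find_artifact_by_digest(pool: &PgPool, digest: &str) -> sqlx::Result<Option<Uuid>> {
    let row: Option<(Uuid,)> =
        sqlx::query_as("SELECT id FROM artifacts WHERE digest = $1")
            .bind(digest)
            .fetch_optional(pool)
            .await?;
    Ok(row.map(|(id,)| id))
}
```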
- Transport Security: HTTPS/TLS enforced for Control Plane + artifact operations.
- Authentication: Short-lived bearer tokens acquired via `aether login`.
- Authorization: Ownership / role validation on mutating endpoints.
- Integrity: SHA‑256 digest validation at deploy time + (optionally) runtime recheck.
- Isolation: Single shared namespace initially; namespace-per-application in roadmap.
Theme | Enhancement |
---|---|
Multi-Runtime | Python, Go, JVM adapters |
Progressive Delivery | Canary, automated rollback on SLO breach |
Observability | Structured event streaming, OpenTelemetry tracing |
Policy | OPA integration, admission control gates |
Supply Chain | Artifact signing (Cosign), provenance attestations |
Scalability | Sharded registry, multi-cluster scheduling |
Detailed procedures are specified in `DEVELOPMENT.md`. A provisioning and verification script, `dev.sh`, automates environment bootstrap (Rust toolchain, Docker, MicroK8s, PostgreSQL container) and readiness checks.
Quick Start:
1. Ensure a Linux host with Docker (and optionally Snap if using MicroK8s).
2. Option A (script): `./dev.sh bootstrap`
Integration & migration tests now use a Docker-ephemeral Postgres (via `testcontainers`) by default when `DATABASE_URL` is not set. This replaces the previous `pg-embed` binary-extraction approach (which was fragile in CI with cached/corrupt archives). Behavior:
- If `DATABASE_URL` is defined, tests connect directly (database auto-created if absent).
- Otherwise a `postgres:15-alpine` container is started once per test process; a database `aether_test` is created and migrations applied.
- Environment variables:
  - `AETHER_TEST_SHARED_POOL=1` – reuse a single connection pool across tests.
  - `AETHER_DISABLE_TESTCONTAINERS=1` – force failure if no `DATABASE_URL` (debug / hard-fail mode).
  - `AETHER_TEST_PG_IMAGE=postgres:16-alpine` – override image.
- The harness exports `DATABASE_URL` after container startup so tests that expect it (e.g. schema checks) work transparently.
- First run pays the image pull cost; subsequent runs are typically <10s for schema tests (previously ~56s with fallback retries).
Rationale: deterministic startup, less custom retry logic, smaller maintenance surface vs. embedded binaries.
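The resolution order reduces to the following sketch (container startup via `testcontainers` is elided; variable names match the list above):

```rust
/// Decide how integration tests obtain a database, per the rules above.
fn resolve_test_database_url() -> String {
    if let Ok(url) = std::env::var("DATABASE_URL") {
        return url; // direct connection; database auto-created if absent
    }
    if std::env::var("AETHER_DISABLE_TESTCONTAINERS").as_deref() == Ok("1") {
        panic!("DATABASE_URL must be set when testcontainers are disabled");
    }
    let image = std::env::var("AETHER_TEST_PG_IMAGE")
        .unwrap_or_else(|_| "postgres:15-alpine".to_string());
    // Start `image` once per test process, create `aether_test`, run
    // migrations, then export DATABASE_URL for downstream tests.
    todo!("container startup via testcontainers elided in this sketch: {image}")
}
```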
3. Option B (docker-compose):

   ```bash
   docker compose up -d postgres minio
   export DATABASE_URL=postgres://aether:postgres@localhost:5432/aether_dev
   make test
   ```

4. To just start the DB for tests: `make db-start` or `./dev.sh db-start`
5. Run the full suite: `make test`
The docker-compose services (Postgres + MinIO) are defined in `docker-compose.yml` to enable S3 mode (`AETHER_STORAGE_MODE=s3`).
6. Run `./dev.sh verify` to confirm environment readiness.
- Adopt conventional commits (`feat:`, `fix:`, `docs:`, `refactor:`, etc.).
- Run `cargo fmt` and `cargo clippy --all-targets --all-features -- -D warnings` before opening a PR.
- Provide architectural rationale in PR descriptions for substantial changes.
- Maintain backward compatibility for any published CLI flags until a deprecation path is documented.
Internal proprietary platform (license designation TBD). All code, documentation, and artifacts are confidential. External distribution prohibited without explicit authorization.
- Architecture Lead: (TBD)
- Platform Engineering Channel: (internal) `#aether-engine`
- Incident Escalation: On-call rotation (TBD)
Version | Date | Author | Notes |
---|---|---|---|
1.0 (MVP Draft) | 2025-09-19 | Initial Compilation | First canonical English architecture document |