fix(docker): add dedicated health endpoint for container healthcheck (#1147)#1355
Conversation
…bhigyanpatwari#1147) PR abhigyanpatwari#1182 switched the probe to HEAD (-I) so the SSE body no longer blocks curl, but the underlying handler still never calls res.end(), leaving the connection open until Docker's 5s timeout races with the TCP half-close. Add a lightweight GET /api/health that returns {"status":"ok"} immediately and point the compose healthcheck at it, fully decoupling the probe from the SSE stream.
|
@azizur100389 is attempting to deploy a commit to the NexusCore Team on Vercel. A member of the Team first needs to authorize it. |
CI Report✅ All checks passed Pipeline Status
Test Results
✅ All 8022 tests passed 1 test(s) skipped — expand for details
Code CoverageTests
📋 View full run · Generated by CI |
|
Claude finished @magyargergo's task in 3m 51s —— View job PR #1355 Review —
|
| Property | Value |
|---|---|
| Head SHA | a43a060 |
| Status | Open, not draft |
| Commits | 1 |
| Changed files | 2 (docker-compose.yaml +1/-1, gitnexus/src/server/api.ts +7/-0) |
| Typecheck | ✅ tsc --noEmit clean |
| Tests | ✅ 8022 passed / 0 failed / 1 skipped, 3 platforms |
| E2E | ✅ (web-only, not directly relevant) |
| Docker manual validation | PR body marks as reviewer-pending; CI does not automate docker-compose up |
| Issue linkage | Closes #1147 ✅ |
| package-lock / deps | No dependency changes ✅ |
| Hidden Unicode (changed lines) | Pre-existing em-dashes in comments only; no new non-ASCII in executable code or config values ✅ |
| Vercel status | Irrelevant to this Docker/server healthcheck change |
| Code matches PR body | ✅ |
Understanding the Change
/api/heartbeat is a Server-Sent Events (SSE) endpoint: it sets Content-Type: text/event-stream, calls res.flushHeaders(), writes :ok\n\n, then keeps the connection open — emitting :ping\n\n every 15s indefinitely and never calling res.end() during normal operation. A Docker healthcheck curl against it would never receive a response termination under normal operation; the probe depends on curl interpreting the half-open connection under Connection: keep-alive with no Content-Length — fragile at best.
PR #1182 changed the probe to curl -fsSI (HEAD) as a mitigation: curl with -I sends HEAD, reads response headers only, and exits. This works but relies on Express's automatic response to HEAD on a GET handler — semantically awkward for a long-lived stream, and potentially sensitive to future middleware changes.
This PR adds a dedicated GET /api/health endpoint at api.ts:783-785:
app.get('/api/health', (_req, res) => {
res.json({ status: 'ok' });
});res.json() calls res.end() internally. The response is guaranteed to terminate. The probe becomes: send GET, get {"status":"ok"}, done.
What it proves: the Express process is alive and HTTP routing is functional.
What it intentionally does not prove: repo availability, embedding DB readiness, Git availability, indexing readiness — correctly out of scope for a liveness probe.
Findings
[minor] No dedicated route test for GET /api/health
- Category: Tests (DoD §2.7)
- Files:
gitnexus/src/server/api.ts,gitnexus/test/unit/ - Issue: There is no unit or integration test asserting
GET /api/health → 200 { status: 'ok' }. Therate-limit.test.tssource-grep suite checks that specific routes carrycreateRouteLimiter, but does not assert health endpoint presence or shape. No existing test would fail if the health route were accidentally removed or returned 500. - Why it matters: DoD §6 states "The runtime path is not actually exercised by the tests" is a "Not Done" signal. For this specific endpoint the risk is low (3-line synchronous handler, no deps), but future refactors of
createServer's route-registration block could silently drop it without a test catching the regression before Docker healthchecks in prod. - Recommended fix: Add a small test — Fix this →
- Blocks merge: No — the endpoint is trivially correct, CI is green, and the structural risk is low for a 3-line synchronous handler. But this is the strongest argument against "fully Done" per DoD.
[cosmetic] Landing page HTML does not list /api/health
- Category: Observability / Correctness (DoD §2.8)
- Files:
gitnexus/src/server/api.ts:186(landingPageHtml()) - Issue: The landing page (shown when no web UI is built) lists
/api/heartbeatas an endpoint but does not list/api/health. The new endpoint is invisible to developers navigating to the server root. - Why it matters: Minor discoverability gap only. The endpoint is not user-facing in normal operation.
- Recommended fix: Add
<p class="endpoint"><code>/api/health</code> <span style="color:#5a5a70">— Docker/orchestrator healthcheck</span></p>after the heartbeat line. - Blocks merge: No.
Server Endpoint Assessment ✅
app.get('/api/health', ...) at api.ts:783 is registered at the start of the API route block, approximately 989 lines before registerWebUI(app, staticDir) is called at line 1781. Route order is correct.
SPA_FALLBACK_REGEX at line 131 is:
/^(?!\/api(?:\/|$))(?!.*\.\w{1,10}$).*/
This regex explicitly excludes all /api/* paths via the negative lookahead (?!\/api(?:\/|$)). Even if /api/health were registered after the SPA fallback (it isn't), the regex would still not match it. Double protection.
app.get('/api/health', (_req, res) => {
res.json({ status: 'ok' });
});- ✅ Returns
res.json(...)— Express setsContent-Type: application/jsonand callsres.end()automatically - ✅ Synchronous handler — no async work, no await, no I/O
- ✅ No filesystem, database, Git, or network access
- ✅ No rate limiter — Docker probes at 30s intervals (2 rpm) against a default 60 rpm budget; even if a limiter were present it wouldn't fire. But absence is correct and intentional.
- ✅ Does not shadow
/api/heartbeat(different path literal, registered two lines later) - ✅ Does not rely on any repo state or initialization
- ✅ HEAD requests will work via Express's automatic HEAD-from-GET handling, though Docker now uses GET
Docker Healthcheck Assessment ✅
healthcheck:
test: ['CMD', 'curl', '-f', 'http://localhost:4747/api/health']
interval: 30s
timeout: 5s
retries: 3
start_period: 15s- ✅ Target is
http://localhost:4747/api/health— correct in-container address - ✅
curl -fexits 0 for 2xx, non-zero for 4xx/5xx or connection failure — correct semantics for a finite response - ✅ No
-Ineeded — response terminates naturally - ✅
curlis explicitly installed inDockerfile.cliline 35:RUN apt-get update && apt-get install -y --no-install-recommends curl git - ✅
interval: 30s,timeout: 5s,retries: 3,start_period: 15s— all appropriate - ✅
gitnexus-webdepends_on: condition: service_healthyis preserved — web service still waits for server healthcheck before starting - ✅ Cannot hang on SSE connection —
/api/healthterminates immediately - ✅ No internet access or workspace mount required
- ℹ️
curl -fwithout-swill emit curl progress/error output to stdout on failure. The existinggitnexus-webhealthcheck at line 38 uses the samecurl -fpattern without-s, so this is consistent with repo style. Not a blocker.
Heartbeat / SSE Regression Assessment ✅
/api/heartbeat at api.ts:787-805 is completely unchanged:
app.get('/api/heartbeat', (_req, res) => {
res.set({ 'Content-Type': 'text/event-stream', 'Cache-Control': 'no-cache', Connection: 'keep-alive' });
res.flushHeaders();
res.write(':ok\n\n');
const interval = setInterval(() => res.write(':ping\n\n'), 15_000);
_req.on('close', () => clearInterval(interval));
});- ✅ SSE headers, initial ping, keep-alive interval, and close cleanup are all unchanged
- ✅ EventSource clients will continue to work identically
- ✅ Docker healthcheck no longer touches this endpoint
- ✅ No accidental refactor or movement of the heartbeat route
Test Assessment
- ✅ CI reports 8022 tests passed, 0 failed, 3 platforms
- ✅
rate-limit.test.tssource-grep tests check routes that carrycreateRouteLimiter—/api/healthis intentionally un-limited and is not asserted; these tests remain valid and unaffected ⚠️ No dedicated route test forGET /api/health → 200 { "status": "ok" }(see Findings above)- ✅ No test relies on
/api/heartbeatbeing the healthcheck target - ✅ The PR body documents manual Docker validation is pending reviewer verification — acceptable for a 3-line synchronous endpoint if CI is trusted
Hidden Unicode / Hygiene Assessment ✅
Running character analysis on the changed files confirms:
docker-compose.yamlline 22 (the changed healthcheck test command): pure ASCII —['CMD', 'curl', '-f', 'http://localhost:4747/api/health']docker-compose.yamlline 15: pre-existing em-dash—in a comment. Not introduced by this PR.gitnexus/src/server/api.tslines 780–785 (the new health endpoint + its comment): pure ASCII- Em-dashes and arrows in other api.ts lines are pre-existing in comments and HTML string literals throughout the file
No bidi controls, no hidden characters, no invisible whitespace in executable code, route strings, or YAML structure. Clean.
- ✅ No unrelated formatting churn
- ✅ No dependency or package-lock changes
- ✅ Diff is minimal and scoped exactly to the stated change
Elegance Assessment ✅
This is an exemplary small fix:
- 7 lines added, 1 line changed — nothing more
- Handler is 2 lines (comment block + one-liner body) — immediately obvious to any contributor
- Comment explains the why (
#1147, "container managers do not confuse a long-lived SSE stream") without over-explaining - No readiness framework, no DB ping, no filesystem check — correctly scoped as a liveness-only probe
- Named
/api/health— the universal convention for HTTP liveness endpoints - No duplication of existing health logic (none existed)
- Does not break any existing SSE, API, or routing behaviour
Final Verdict
production-ready with minor follow-ups
The core change is correct, minimal, and well-scoped. /api/health returns a finite, synchronous {"status":"ok"} with no rate limiter, no SSE dependency, registered well before the SPA fallback (which also explicitly excludes /api/* paths), backed by a Dockerfile that already installs curl. The SSE heartbeat is untouched. CI is green across all three platforms with 8022 tests. The only gap against the full DoD bar is the absence of a dedicated route test — the DoD §6 flags this as a "Not Done" signal, but for a 3-line synchronous handler with no dependencies, the structural risk is low enough that it warrants a follow-up rather than a block. The landing page HTML cosmetic miss (not listing /api/health) is strictly minor. Both can be addressed in a fast follow-up commit without re-opening the healthcheck risk.
- Source-grep test asserts GET /api/health is registered in api.ts - Landing page HTML now lists /api/health for discoverability
Summary
GET /api/healthendpoint that returns{"status":"ok"}immediately./api/heartbeat(which never callsres.end()) to/api/health, fully decoupling the probe from the long-lived stream.Context
PR #1182 mitigated the issue by adding
-I(HEAD) to the curl probe, but the SSE handler still never terminates the response -- the HEAD request relies on curl interpreting the lack of body correctly despiteConnection: keep-aliveand noContent-Length. A dedicated health endpoint eliminates the ambiguity.Test plan
npx tsc --noEmit-- compiles cleanlynpx vitest run test/unit/rate-limit.test.ts-- api.ts source-grep assertions still pass (15/15)docker compose up -dmarksgitnexus-serverhealthy (reviewer verification)Closes #1147