feat(telemetry): anonymous PostHog workflow-invocation tracking #1262

Merged
coleam00 merged 4 commits into dev from feat/posthog-telemetry
Apr 16, 2026

Conversation

@coleam00
Owner

@coleam00 coleam00 commented Apr 16, 2026

Summary

  • Adds anonymous PostHog telemetry: one workflow_invoked event per workflow run with workflow_name, workflow_description, platform, and archon_version.
  • Anonymous by design — random install UUID persisted to $ARCHON_HOME/telemetry-id, $process_person_profile: false keeps events in PostHog's anonymous tier (no person profile created, ~4x cheaper).
  • Opt-out via ARCHON_TELEMETRY_DISABLED=1 or DO_NOT_TRACK=1; self-host via POSTHOG_API_KEY / POSTHOG_HOST.
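The opt-out rule in the last bullet can be sketched as a small pure function. This is an illustration only: the function body, the injected `env` parameter, and the choice to treat `0` and `false` as "not set" are assumptions, not the contents of packages/paths/src/telemetry.ts.

```typescript
// Hypothetical sketch of an opt-out check; the real isTelemetryDisabled() may differ.
type Env = Record<string, string | undefined>;

// Assumption: any non-empty value other than "0" or "false" counts as set.
function isTruthyFlag(value: string | undefined): boolean {
  if (value === undefined) return false;
  const normalized = value.trim().toLowerCase();
  return normalized !== '' && normalized !== '0' && normalized !== 'false';
}

// Either variable is enough to disable telemetry.
function isTelemetryDisabled(env: Env): boolean {
  return isTruthyFlag(env.ARCHON_TELEMETRY_DISABLED) || isTruthyFlag(env.DO_NOT_TRACK);
}

console.log(isTelemetryDisabled({ DO_NOT_TRACK: '1' }));
```

Passing the environment in as a parameter keeps the check testable without mutating `process.env`.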

Closes #1261.

How it works

  • Capture (packages/workflows/src/executor.ts): fire-and-forget call to captureWorkflowInvoked() right after the workflow_started event emits. Never blocks or throws — PostHog errors are logged at debug and swallowed.
  • Core module (packages/paths/src/telemetry.ts): lazy singleton PostHog client, embedded write-only project key (phc_… keys can only ingest events — safe to ship in source), 500-char cap on description length.
  • Shutdown (packages/server/src/index.ts, packages/cli/src/cli.ts): shutdownTelemetry() flushes buffered events on SIGTERM / end-of-command so short-lived CLI runs don't drop events.
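The capture, lazy-singleton, and shutdown steps above fit together roughly as follows. This is a minimal sketch under stated assumptions: `TelemetryClient` and `ClientFactory` are hypothetical stand-ins for the posthog-node client and its construction, and the function signatures are simplified for illustration rather than copied from the PR.

```typescript
// Hypothetical stand-in for the real client; only the two methods used here.
interface TelemetryClient {
  capture(event: { distinctId: string; event: string; properties: Record<string, unknown> }): void;
  shutdown(): Promise<void>;
}

type ClientFactory = () => Promise<TelemetryClient>;

// Lazy singleton: created on the first capture, reused afterwards.
let clientPromise: Promise<TelemetryClient | null> | undefined;

function getClient(create: ClientFactory): Promise<TelemetryClient | null> {
  // A failed init resolves to null so callers can treat it as "telemetry off".
  clientPromise ??= create().catch(() => null);
  return clientPromise;
}

// Fire-and-forget: returns immediately, never throws to the caller.
function captureWorkflowInvoked(create: ClientFactory, distinctId: string, workflowName: string): void {
  void (async (): Promise<void> => {
    try {
      const client = await getClient(create);
      client?.capture({
        distinctId,
        event: 'workflow_invoked',
        properties: { $process_person_profile: false, workflow_name: workflowName },
      });
    } catch {
      // Swallowed on purpose: telemetry must never break a workflow run.
    }
  })();
}

// Called on SIGTERM or at end-of-command so queued events are flushed.
async function shutdownTelemetry(): Promise<void> {
  if (!clientPromise) return; // nothing was captured, so nothing to flush
  const client = await clientPromise;
  clientPromise = undefined;
  await client?.shutdown().catch(() => undefined);
}
```

Because `shutdownTelemetry()` awaits the same promise the capture path awaits, a capture started just before shutdown still reaches the client before the flush.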

What is sent

  • workflow_name (e.g. archon-plan)
  • workflow_description (from the workflow YAML, capped at 500 chars)
  • platform (cli | web | slack | telegram | github | discord)
  • archon_version (from BUNDLED_VERSION)
  • distinct_id = random UUID from $ARCHON_HOME/telemetry-id

Not sent: prompts, file paths, code, user identity, env vars, conversation content, repo names, git info.
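For illustration, the payload described above could be assembled like this. `buildEventProperties` is a hypothetical helper introduced only for this sketch; the property names, the 500-character cap, and the conditional spreads follow the code quoted later in this review thread.

```typescript
const DESCRIPTION_MAX_LENGTH = 500;

interface WorkflowInvokedProperties {
  workflowName: string;
  workflowDescription?: string;
  platform?: string;
  archonVersion?: string;
}

// Hypothetical helper: builds only the documented fields and omits any that are unset.
function buildEventProperties(props: WorkflowInvokedProperties): Record<string, unknown> {
  const description = props.workflowDescription?.slice(0, DESCRIPTION_MAX_LENGTH);
  return {
    $process_person_profile: false, // keeps the event in PostHog's anonymous tier
    workflow_name: props.workflowName,
    ...(description ? { workflow_description: description } : {}),
    ...(props.platform ? { platform: props.platform } : {}),
    ...(props.archonVersion ? { archon_version: props.archonVersion } : {}),
  };
}

console.log(buildEventProperties({ workflowName: 'archon-plan', platform: 'cli' }));
```

Building the object from an explicit allow-list of fields, rather than spreading the caller's input, is what keeps anything outside the "What is sent" list from leaking into the event.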

Test plan

  • bun run validate — type-check + lint + format + all tests (green)
  • 9 new tests in packages/paths/src/telemetry.test.ts cover opt-out detection, disabled-path no-op, and telemetry-id persistence
  • Manual: run a workflow with telemetry enabled, confirm event lands in PostHog project dashboard
  • Manual: ARCHON_TELEMETRY_DISABLED=1 bun run cli workflow run … → no network call, no telemetry-id file written
  • Manual: DO_NOT_TRACK=1 bun run cli workflow run … → same as above
  • Manual: confirm events appear under PostHog's "anonymous events" (not under persons)

Follow-ups (deliberately out of scope)

  • Include workflow_source (bundled vs. project vs. global) — requires threading source through executeWorkflow() signatures; worth a separate PR.
  • Add docs-site "Telemetry" page (README section is in this PR).

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added anonymous telemetry that emits a single non-identifying "workflow_invoked" event on workflow start, persists a local install ID, supports opt-out (ARCHON_TELEMETRY_DISABLED, DO_NOT_TRACK), can be redirected/self-hosted (POSTHOG_API_KEY, POSTHOG_HOST), and flushes pending events during shutdown.
  • Documentation

    • Added Telemetry section detailing collected fields, explicit exclusions of sensitive data, opt-out, and configuration.
  • Tests

    • Added tests for opt-out rules, ID persistence, disabled-mode behavior, and shutdown/flush.

Emits one `workflow_invoked` event per run with workflow name/description,
platform, and Archon version. Uses a stable random UUID persisted to
`$ARCHON_HOME/telemetry-id` for distinct-install counting, with
`$process_person_profile: false` to stay in PostHog's anonymous tier.

Opt out with `ARCHON_TELEMETRY_DISABLED=1` or `DO_NOT_TRACK=1`. Self-host
via `POSTHOG_API_KEY` / `POSTHOG_HOST`.

Closes #1261

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Apr 16, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 921c8c3c-00e8-4651-b4ef-85feff59c2d7

📥 Commits

Reviewing files that changed from the base of the PR and between 73cbe3b and 60322b8.

📒 Files selected for processing (2)
  • packages/paths/src/telemetry.test.ts
  • packages/paths/src/telemetry.ts

📝 Walkthrough

Added an anonymous PostHog telemetry subsystem that emits a single workflow_invoked event on workflow start, with env-var opt-outs, a persistent per-install anonymous ID, tests and docs, lazy client init, and shutdown flushes wired into CLI and server.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation: ./.env.example, ./README.md | New telemetry docs and env var examples: opt-outs (ARCHON_TELEMETRY_DISABLED, DO_NOT_TRACK), PostHog overrides (POSTHOG_API_KEY, POSTHOG_HOST), collected/excluded fields, and behavior notes. |
| Telemetry Core: packages/paths/src/telemetry.ts | New module: lazy posthog-node client, opt-out detection, stable install UUID persisted at ${ARCHON_HOME}/telemetry-id, captureWorkflowInvoked(), isTelemetryDisabled(), getOrCreateTelemetryId(), shutdownTelemetry(), and resetTelemetryForTests(). Network failures are suppressed and logged at debug level. |
| Tests: packages/paths/src/telemetry.test.ts | New Bun tests covering opt-out rules, telemetry-id persistence, disabled-mode robustness, and shutdown/no-init behavior. |
| Exports & Dependency: packages/paths/src/index.ts, packages/paths/package.json | Re-export telemetry APIs/types; added posthog-node runtime dependency. |
| Executor Integration: packages/workflows/src/executor.ts | Fire-and-forget captureWorkflowInvoked() call after workflow_started, passing workflow name, truncated description, platform, and bundled version. |
| Shutdown Integration: packages/cli/src/cli.ts, packages/server/src/index.ts | Await shutdownTelemetry() during CLI and server graceful shutdown to flush queued telemetry. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Executor as Workflow Executor
    participant Telemetry as Telemetry Module
    participant FS as File System
    participant PostHog as PostHog API
    participant Shutdown as Shutdown Handler

    User->>Executor: invoke workflow
    Executor->>Executor: emit workflow_started
    Executor->>Telemetry: captureWorkflowInvoked(props)

    Note over Telemetry: check opt-out (ARCHON_TELEMETRY_DISABLED, DO_NOT_TRACK, POSTHOG_API_KEY)

    alt enabled
        Telemetry->>FS: read/create telemetry-id
        FS-->>Telemetry: telemetry-id (UUID)
        Telemetry->>PostHog: init client (lazy, singleton)
        Telemetry->>PostHog: queue workflow_invoked event (fire-and-forget)
    else disabled
        Telemetry-->>Executor: no-op
    end

    User->>Shutdown: process shutdown
    Shutdown->>Telemetry: shutdownTelemetry()
    alt client initialized
        Telemetry->>PostHog: flush and close
        PostHog-->>Telemetry: flushed
    end
    Telemetry-->>Shutdown: resolved

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A small hop when workflows run,
I whisper "invoked" and then I'm done.
My ID stays hidden, safe in a file,
Opt-out if you wish — I'll rest for a while.
🥕✨

🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (2 warnings, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Linked Issues check | ⚠️ Warning | PR addresses most issue requirements: event properties, anonymous UUID, opt-out support, fire-and-forget design, shutdown hooks, tests, and README docs. However, workflow_source threading is incomplete (not passed through executeWorkflow call sites as required), and config-file opt-out precedence (telemetry.enabled in config.yaml) is unimplemented. | Implement workflow_source threading through executeWorkflow() call sites in CLI, orchestrator foreground, and background; add config-file opt-out support (telemetry.enabled in ~/.archon/config.yaml and repo .archon/config.yaml). |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 43.75% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ❓ Inconclusive | PR description covers the core implementation (capture, module design, shutdown), what is sent, and test evidence, but is missing formal sections matching the template (UX Journey, Architecture Diagram, Label Snapshot, detailed validation commands output). | Provide missing template sections: UX Journey (before/after flows), Architecture Diagram with module connections, Label Snapshot (risk/size/scope labels), and complete Validation Evidence output from bun run validate. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title concisely and accurately captures the main change: adding anonymous PostHog telemetry for workflow invocation tracking. |
| Out of Scope Changes check | ✅ Passed | All changes directly support telemetry implementation: environment variables, README docs, core telemetry module, tests, CLI/server shutdown hooks, and executor event capture. No unrelated modifications detected. |


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
packages/paths/src/telemetry.test.ts (1)

135-162: Consider increasing the timing buffer for CI stability.

The 20ms setTimeout on line 155 could be flaky on slow CI runners. While the test logic is correct (verifying read-only preservation of an existing file), consider a slightly larger buffer or polling approach.

That said, this is a minor concern — the test is validating a read operation that should complete well within 20ms under normal conditions.

♻️ Optional: Slightly larger timeout for CI stability
     // Give the async capture a moment to run its file read.
-    await new Promise(resolve => setTimeout(resolve, 20));
+    await new Promise(resolve => setTimeout(resolve, 50));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/paths/src/telemetry.test.ts` around lines 135 - 162, Increase the
async wait after calling captureWorkflowInvoked to avoid flaky CI failures:
replace the single short fixed delay (the current setTimeout 20ms used after
captureWorkflowInvoked in the test) with a larger buffer or a small polling loop
that checks the telemetry-id file until it exists/contains data (referencing
captureWorkflowInvoked, resetTelemetryForTests, shutdownTelemetry and the
telemetry-id file read via join(tmpHome, 'telemetry-id')). Ensure the test still
cleans up by awaiting shutdownTelemetry and keep the assertion that the stored
value equals the existingId.
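The "small polling loop" alternative mentioned in this nitpick could look roughly like the sketch below. `waitFor` is a hypothetical helper, not something in the PR or in Bun's test API; the timeout and interval defaults are arbitrary.

```typescript
// Hypothetical polling helper: resolves as soon as the condition holds,
// rejects if it still does not hold once the deadline has passed.
async function waitFor(
  condition: () => boolean,
  timeoutMs = 1000,
  intervalMs = 10,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (!condition()) {
    if (Date.now() >= deadline) {
      throw new Error(`condition not met within ${timeoutMs}ms`);
    }
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
}
```

In the test under discussion, the condition would be something like "the telemetry-id file exists and is non-empty". Compared with a fixed sleep, this returns early on fast machines and tolerates slow CI runners up to the timeout.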
packages/paths/src/telemetry.ts (1)

79-80: Persist telemetry-id with restrictive file permissions.

Line 80 writes a stable pseudonymous identifier without an explicit mode. Prefer user-only permissions to reduce local privacy exposure on shared systems.

🔐 Proposed fix
-    writeFileSync(idPath, id, 'utf8');
+    writeFileSync(idPath, id, { encoding: 'utf8', mode: 0o600 });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/paths/src/telemetry.ts` around lines 79 - 80, Persisted telemetry id
is written without restrictive permissions; update the write to use file mode
0o600 so only the owner can read/write (and optionally set the directory created
by mkdirSync(getArchonHome()) to 0o700). Locate the mkdirSync(getArchonHome(), {
recursive: true }) and writeFileSync(idPath, id, 'utf8') calls and change the
mkdirSync options to include mode: 0o700 and change the writeFileSync call to
pass an options object with encoding 'utf8' and mode 0o600 (ensuring the idPath
file is created with owner-only permissions).
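A self-contained sketch of the id persistence with the suggested owner-only modes follows. It is written under assumptions: the `archonHome` parameter replaces the module's `getArchonHome()` call, and the read-back logic is a guess at the behavior described in this thread, not the PR's actual code. The mode bits only take effect when the file or directory is created, and they are POSIX permissions, so they have little effect on Windows.

```typescript
import { randomUUID } from 'node:crypto';
import { existsSync, mkdirSync, mkdtempSync, readFileSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical variant that takes the home directory as a parameter.
function getOrCreateTelemetryId(archonHome: string): string {
  const idPath = join(archonHome, 'telemetry-id');
  if (existsSync(idPath)) {
    const existing = readFileSync(idPath, 'utf8').trim();
    if (existing) return existing; // reuse the id from a prior run
  }
  const id = randomUUID();
  // Owner-only directory and file, as the review comment suggests.
  mkdirSync(archonHome, { recursive: true, mode: 0o700 });
  writeFileSync(idPath, id, { encoding: 'utf8', mode: 0o600 });
  return id;
}

// Usage: the second call reads back the id written by the first.
const home = mkdtempSync(join(tmpdir(), 'archon-home-'));
const firstId = getOrCreateTelemetryId(home);
const secondId = getOrCreateTelemetryId(home);
console.log(firstId === secondId);
```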
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/paths/src/telemetry.ts`:
- Around line 127-154: The event is missing the required workflow_source field:
update the WorkflowInvokedProperties interface to include a workflowSource
property (union type 'bundled' | 'project' | 'global', make required to enforce
presence) and include it in captureWorkflowInvoked's client.capture payload as
workflow_source: props.workflowSource (alongside
workflow_name/workflow_description), ensuring the schema emitted by
captureWorkflowInvoked contains the workflow_source value.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7a024931-94f6-4561-8249-1fd70ffa058a

📥 Commits

Reviewing files that changed from the base of the PR and between f1c5dcb and ce9e24e.

⛔ Files ignored due to path filters (1)
  • bun.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
  • .env.example
  • README.md
  • packages/cli/src/cli.ts
  • packages/paths/package.json
  • packages/paths/src/index.ts
  • packages/paths/src/telemetry.test.ts
  • packages/paths/src/telemetry.ts
  • packages/server/src/index.ts
  • packages/workflows/src/executor.ts

Comment on lines +127 to +154
export interface WorkflowInvokedProperties {
  workflowName: string;
  workflowDescription?: string;
  platform?: string;
  archonVersion?: string;
}

/**
 * Fire-and-forget capture of a `workflow_invoked` event. Never throws, never
 * awaits — safe to call from hot paths.
 */
export function captureWorkflowInvoked(props: WorkflowInvokedProperties): void {
  if (isTelemetryDisabled()) return;
  void (async (): Promise<void> => {
    try {
      const client = await getClient();
      if (!client) return;
      const description = props.workflowDescription?.slice(0, DESCRIPTION_MAX_LENGTH);
      client.capture({
        distinctId: getTelemetryId(),
        event: 'workflow_invoked',
        properties: {
          $process_person_profile: false,
          workflow_name: props.workflowName,
          ...(description ? { workflow_description: description } : {}),
          ...(props.platform ? { platform: props.platform } : {}),
          ...(props.archonVersion ? { archon_version: props.archonVersion } : {}),
        },

⚠️ Potential issue | 🟠 Major

workflow_source is missing from the emitted event schema.

The event payload does not include the required workflow_source property (bundled | project | global), so telemetry records are incomplete.

💡 Proposed fix
 export interface WorkflowInvokedProperties {
   workflowName: string;
   workflowDescription?: string;
+  workflowSource: 'bundled' | 'project' | 'global';
   platform?: string;
   archonVersion?: string;
 }
@@
         properties: {
           $process_person_profile: false,
           workflow_name: props.workflowName,
           ...(description ? { workflow_description: description } : {}),
+          workflow_source: props.workflowSource,
           ...(props.platform ? { platform: props.platform } : {}),
           ...(props.archonVersion ? { archon_version: props.archonVersion } : {}),
         },
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 export interface WorkflowInvokedProperties {
   workflowName: string;
   workflowDescription?: string;
+  workflowSource: 'bundled' | 'project' | 'global';
   platform?: string;
   archonVersion?: string;
 }
 /**
  * Fire-and-forget capture of a `workflow_invoked` event. Never throws, never
  * awaits — safe to call from hot paths.
  */
 export function captureWorkflowInvoked(props: WorkflowInvokedProperties): void {
   if (isTelemetryDisabled()) return;
   void (async (): Promise<void> => {
     try {
       const client = await getClient();
       if (!client) return;
       const description = props.workflowDescription?.slice(0, DESCRIPTION_MAX_LENGTH);
       client.capture({
         distinctId: getTelemetryId(),
         event: 'workflow_invoked',
         properties: {
           $process_person_profile: false,
           workflow_name: props.workflowName,
+          workflow_source: props.workflowSource,
           ...(description ? { workflow_description: description } : {}),
           ...(props.platform ? { platform: props.platform } : {}),
           ...(props.archonVersion ? { archon_version: props.archonVersion } : {}),
         },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/paths/src/telemetry.ts` around lines 127 - 154, The event is missing
the required workflow_source field: update the WorkflowInvokedProperties
interface to include a workflowSource property (union type 'bundled' | 'project'
| 'global', make required to enforce presence) and include it in
captureWorkflowInvoked's client.capture payload as workflow_source:
props.workflowSource (alongside workflow_name/workflow_description), ensuring
the schema emitted by captureWorkflowInvoked contains the workflow_source value.

coleam00 and others added 2 commits April 16, 2026 13:17
test(telemetry): stop leaking test events to production PostHog

The `telemetry-id preservation` test exercised the real capture path with
the embedded production key, so every `bun run validate` published a
tombstone `workflow_name: "w"` event. Redirect POSTHOG_HOST to loopback
so the flush fails silently; bump test timeout to accommodate the
retry-then-give-up window.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

fix(telemetry): silence posthog-node stderr leak on network failure

The PostHog SDK's internal logFlushError() writes 'Error while flushing
PostHog' directly to stderr via console.error on any network or HTTP
error, bypassing logger config. For a fire-and-forget telemetry path
this leaked stack traces to users' terminals whenever PostHog was
unreachable (offline, firewalled, DNS broken, rate-limited).

Pass a silentFetch wrapper to the PostHog client that masks failures as
fake 200 responses. The SDK never sees an error, so it never logs.
Original failure is still recorded at debug level for diagnostics.

Side benefit: shutdown is now fast on network failure (no retry loop),
so offline CLI commands no longer hang ~10s on exit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
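
The wrapper described in this commit message can be sketched as below. The sketch is an assumption-laden illustration, not the PR's code: `MinimalResponse`, `FetchLike`, and `makeSilentFetch` are hypothetical names, and the response shape is only a guess at what an SDK with a pluggable fetch would need.

```typescript
// Hypothetical minimal response shape; the real SDK type may differ.
interface MinimalResponse {
  status: number;
  text(): Promise<string>;
  json(): Promise<unknown>;
}

type FetchLike = (url: string, options?: Record<string, unknown>) => Promise<MinimalResponse>;

// Canned success returned whenever the real request fails.
const FAKE_OK: MinimalResponse = {
  status: 200,
  text: async () => '{}',
  json: async () => ({}),
};

function makeSilentFetch(realFetch: FetchLike, logDebug: (err: unknown) => void): FetchLike {
  return async (url, options) => {
    try {
      const res = await realFetch(url, options);
      if (res.status >= 200 && res.status < 300) return res;
      logDebug(new Error(`telemetry request failed with HTTP ${res.status}`));
    } catch (err) {
      // Network-level failure (offline, DNS, firewall): record it quietly.
      logDebug(err);
    }
    // The caller sees a success, so it never reaches its own error logging or retry path.
    return FAKE_OK;
  };
}
```

The trade-off is deliberate: failed events are dropped rather than retried, which is what makes shutdown return quickly when the network is down.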

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/paths/src/telemetry.test.ts`:
- Around line 136-162: The test currently instantiates a real PostHog client and
relies on a timer to wait for async init; instead mock the PostHog client at the
import seam and make the test synchronous: stub out the module/function that
creates the external client (the code path exercised by captureWorkflowInvoked /
telemetry init) so no network call occurs, remove the setTimeout-based wait,
call resetTelemetryForTests(), write the existingId to telemetry-id in tmpHome,
call captureWorkflowInvoked({ workflowName: 'w' }), then immediately read
telemetry-id and assert it equals existingId; use your test framework's mocking
API (e.g., jest.mock or sinon.stub) to replace the PostHog/client initializer so
shutdownTelemetry() can run deterministically without hitting POSTHOG_HOST.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c0c47a33-02cb-49e1-8bc2-6c5db4484197

📥 Commits

Reviewing files that changed from the base of the PR and between ce9e24e and 107a270.

📒 Files selected for processing (1)
  • packages/paths/src/telemetry.test.ts

Comment thread packages/paths/src/telemetry.test.ts Outdated
Comment on lines +136 to +162
    // Longer timeout: the bogus-host fallback below triggers posthog-node's
    // shutdown-flush retry loop, which can take ~5–10s to give up.
    // Simulate a prior run by writing a UUID, then enable the capture path so
    // lazy init exercises the id-read. We redirect POSTHOG_HOST to a
    // guaranteed-unreachable loopback port so the client's flush fails
    // silently (swallowed by our error hook) instead of leaking a test event
    // to production PostHog.
    const { writeFileSync, mkdirSync } = await import('fs');
    const existingId = '11111111-1111-4111-8111-111111111111';
    mkdirSync(tmpHome, { recursive: true });
    writeFileSync(join(tmpHome, 'telemetry-id'), existingId, 'utf8');

    delete process.env.ARCHON_TELEMETRY_DISABLED;
    process.env.POSTHOG_HOST = 'http://127.0.0.1:1';
    resetTelemetryForTests();

    captureWorkflowInvoked({ workflowName: 'w' });

    // Give the async capture a moment to run its file read.
    await new Promise(resolve => setTimeout(resolve, 20));

    const stored = readFileSync(join(tmpHome, 'telemetry-id'), 'utf8').trim();
    expect(stored).toBe(existingId);

    // Clean up any in-flight client (swallows the unreachable-host error).
    await shutdownTelemetry();
  }, 20000);

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify timing/network coupling in this test and locate telemetry client creation seam for spying.
rg -n -C2 'POSTHOG_HOST|127\.0\.0\.1:1|setTimeout\(|20000' packages/paths/src/telemetry.test.ts
rg -n -C3 'new PostHog|posthog|captureWorkflowInvoked|resetTelemetryForTests' packages/paths/src/telemetry.ts

Repository: coleam00/Archon

Length of output: 1778


🏁 Script executed:

#!/bin/bash
# Search for the test file and telemetry implementation
find . -name "telemetry.test.ts" -o -name "telemetry.ts" 2>/dev/null | head -20

Repository: coleam00/Archon

Length of output: 131


🏁 Script executed:

#!/bin/bash
# Read test file around lines 136-162 to verify network and timing concerns
wc -l packages/paths/src/telemetry.test.ts

Repository: coleam00/Archon

Length of output: 99


🏁 Script executed:

#!/bin/bash
# Read the complete test file and relevant telemetry implementation
echo "=== telemetry.test.ts ===" && cat -n packages/paths/src/telemetry.test.ts

Repository: coleam00/Archon

Length of output: 6268


🏁 Script executed:

#!/bin/bash
# Read the telemetry implementation to understand how it uses PostHog and whether the network behavior is actually exercised
cat -n packages/paths/src/telemetry.ts

Repository: coleam00/Archon

Length of output: 7528


🏁 Script executed:

#!/bin/bash
# Check if there are other ID-related tests or if this is the only coverage
grep -n "telemetry-id\|getTelemetryId\|getOrCreateTelemetryId" packages/paths/src/telemetry.test.ts

Repository: coleam00/Archon

Length of output: 464


Replace loopback-network + timer waiting with a fully mocked, deterministic unit test

This test creates a real PostHog client (line 152 triggers async init that imports and instantiates the external SDK) and depends on fixed-delay timing (setTimeout on line 155, 20s timeout on line 162). This violates the guideline: "Unit tests must test pure functions with mocked external dependencies (database, AI SDKs, platform APIs)".

The core logic being tested—ID persistence in getOrCreateTelemetryId()—can be verified without network calls. Mock the PostHog client initialization at the import seam, remove the setTimeout, and assert synchronously.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/paths/src/telemetry.test.ts` around lines 136 - 162, The test
currently instantiates a real PostHog client and relies on a timer to wait for
async init; instead mock the PostHog client at the import seam and make the test
synchronous: stub out the module/function that creates the external client (the
code path exercised by captureWorkflowInvoked / telemetry init) so no network
call occurs, remove the setTimeout-based wait, call resetTelemetryForTests(),
write the existingId to telemetry-id in tmpHome, call captureWorkflowInvoked({
workflowName: 'w' }), then immediately read telemetry-id and assert it equals
existingId; use your test framework's mocking API (e.g., jest.mock or
sinon.stub) to replace the PostHog/client initializer so shutdownTelemetry() can
run deterministically without hitting POSTHOG_HOST.


@coderabbitai coderabbitai Bot left a comment

♻️ Duplicate comments (2)
packages/paths/src/telemetry.test.ts (1)

135-160: ⚠️ Potential issue | 🟠 Major

Make this persistence test deterministic (remove real network + timer wait)

Line 147 and Line 153 rely on a real unreachable host and fixed sleep; this keeps the test timing/network-coupled and flaky. Please mock the external telemetry transport (e.g., via spyOn() seam) and synchronize via awaited telemetry lifecycle (not setTimeout).

As per coding guidelines: “Tests: Unit tests must test pure functions with mocked external dependencies (database, AI SDKs, platform APIs)” and “Use spyOn() for internal modules in test files instead of mock.module()…”.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/paths/src/telemetry.test.ts` around lines 135 - 160, The test should
avoid real network and timers: remove setting POSTHOG_HOST and the setTimeout;
instead spyOn the telemetry transport (e.g., spyOn(moduleContainingSilentFetch,
'silentFetch') or spyOn(posthogClient, 'capture') to return a resolved Promise)
before calling resetTelemetryForTests() and captureWorkflowInvoked({
workflowName: 'w' }); then await shutdownTelemetry() to ensure the async capture
completes deterministically, and finally read and assert the telemetry-id file;
keep references to resetTelemetryForTests, captureWorkflowInvoked,
shutdownTelemetry, and silentFetch (or the actual transport function) so the
mock targets are clear.
packages/paths/src/telemetry.ts (1)

181-208: ⚠️ Potential issue | 🟠 Major

workflow_source is still missing from the emitted workflow_invoked payload

Line 181 and Line 202-208 define and emit event properties without workflow_source, so telemetry records are incomplete versus the required schema.

💡 Proposed fix
 export interface WorkflowInvokedProperties {
   workflowName: string;
   workflowDescription?: string;
+  workflowSource: 'bundled' | 'project' | 'global';
   platform?: string;
   archonVersion?: string;
 }
@@
         properties: {
           $process_person_profile: false,
           workflow_name: props.workflowName,
           ...(description ? { workflow_description: description } : {}),
+          workflow_source: props.workflowSource,
           ...(props.platform ? { platform: props.platform } : {}),
           ...(props.archonVersion ? { archon_version: props.archonVersion } : {}),
         },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/paths/src/telemetry.ts` around lines 181 - 208, The emitted
telemetry for workflow_invoked is missing the workflow_source field; update the
WorkflowInvokedProperties interface to include an optional workflowSource (or
workflow_source) property and then include it when building the capture
properties inside captureWorkflowInvoked (e.g., add ...(props.workflowSource ? {
workflow_source: props.workflowSource } : {}) so the emitted payload conforms to
the required schema).
ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 107a270 and 73cbe3b.

📒 Files selected for processing (2)
  • packages/paths/src/telemetry.test.ts
  • packages/paths/src/telemetry.ts

Replace the fire-and-forget capture + setTimeout + POSTHOG_HOST-loopback
dance with a direct synchronous call to getOrCreateTelemetryId(). Export
the function with an @internal marker so tests can exercise the id path
without spinning up the PostHog client. No network, no timer, no flake.

Addresses CodeRabbit feedback on #1262.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
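
The commit above tests the id path through a direct synchronous call. A rough sketch of what such a function and test could look like follows. `getOrCreateTelemetryIdSketch` is a hypothetical stand-in that takes the home directory as a parameter; the real `getOrCreateTelemetryId()` signature and its handling of `$ARCHON_HOME` are not shown in this thread.

```typescript
import { randomUUID } from 'node:crypto';
import { existsSync, mkdirSync, mkdtempSync, readFileSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical stand-in: returns the persisted install id, creating it on first use.
// Everything is synchronous, so a test needs no network, timers, or PostHog client.
function getOrCreateTelemetryIdSketch(archonHome: string): string {
  const idPath = join(archonHome, 'telemetry-id');
  if (existsSync(idPath)) {
    const existing = readFileSync(idPath, 'utf8').trim();
    if (existing.length > 0) return existing; // preserve the existing id
  }
  const id = randomUUID();
  mkdirSync(archonHome, { recursive: true });
  writeFileSync(idPath, id, 'utf8');
  return id;
}

// Deterministic check: two calls against the same directory yield the same id.
const home = mkdtempSync(join(tmpdir(), 'archon-telemetry-'));
const first = getOrCreateTelemetryIdSketch(home);
const second = getOrCreateTelemetryIdSketch(home);
console.log(first === second);
```

Passing the directory in explicitly keeps the sketch independent of environment variables; the real module may instead resolve the path internally.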
@coleam00 coleam00 merged commit d535c83 into dev Apr 16, 2026
3 of 4 checks passed
@coleam00 coleam00 deleted the feat/posthog-telemetry branch April 16, 2026 18:46
ztech-gthb pushed a commit to ztech-gthb/Archon that referenced this pull request Apr 18, 2026
…am00#1262)

* feat(telemetry): add anonymous PostHog workflow-invocation tracking

Emits one `workflow_invoked` event per run with workflow name/description,
platform, and Archon version. Uses a stable random UUID persisted to
`$ARCHON_HOME/telemetry-id` for distinct-install counting, with
`$process_person_profile: false` to stay in PostHog's anonymous tier.

Opt out with `ARCHON_TELEMETRY_DISABLED=1` or `DO_NOT_TRACK=1`. Self-host
via `POSTHOG_API_KEY` / `POSTHOG_HOST`.

Closes coleam00#1261

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(telemetry): stop leaking test events to production PostHog

The `telemetry-id preservation` test exercised the real capture path with
the embedded production key, so every `bun run validate` published a
tombstone `workflow_name: "w"` event. Redirect POSTHOG_HOST to loopback
so the flush fails silently; bump test timeout to accommodate the
retry-then-give-up window.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(telemetry): silence posthog-node stderr leak on network failure

The PostHog SDK's internal logFlushError() writes 'Error while flushing
PostHog' directly to stderr via console.error on any network or HTTP
error, bypassing logger config. For a fire-and-forget telemetry path
this leaked stack traces to users' terminals whenever PostHog was
unreachable (offline, firewalled, DNS broken, rate-limited).

Pass a silentFetch wrapper to the PostHog client that masks failures as
fake 200 responses. The SDK never sees an error, so it never logs.
Original failure is still recorded at debug level for diagnostics.

Side benefit: shutdown is now fast on network failure (no retry loop),
so offline CLI commands no longer hang ~10s on exit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(telemetry): make id-preservation test deterministic

Replace the fire-and-forget capture + setTimeout + POSTHOG_HOST-loopback
dance with a direct synchronous call to getOrCreateTelemetryId(). Export
the function with an @internal marker so tests can exercise the id path
without spinning up the PostHog client. No network, no timer, no flake.

Addresses CodeRabbit feedback on coleam00#1262.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
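
The `silentFetch` idea from the third commit above can be sketched roughly as follows. This is an illustration under assumptions, not the code in `telemetry.ts`: `makeSilentFetch`, `MinimalResponse`, and `FetchLike` are hypothetical names, the response shape is a minimal guess at what an SDK fetch hook needs, and masking non-2xx statuses in addition to thrown errors is an assumption based on the commit's "network or HTTP error" wording.

```typescript
// Hypothetical minimal response shape for a pluggable fetch hook.
interface MinimalResponse {
  status: number;
  text(): Promise<string>;
  json(): Promise<unknown>;
}

type FetchLike = (url: string, options?: unknown) => Promise<MinimalResponse>;

// Wraps a fetch-like function so the caller never observes a failure.
// Thrown errors and non-2xx responses are reported to onFailure (for example
// a debug-level logger) and replaced with a fake 200 response.
function makeSilentFetch(inner: FetchLike, onFailure: (reason: unknown) => void): FetchLike {
  const fakeOk = (): MinimalResponse => ({
    status: 200,
    text: async () => '{}',
    json: async () => ({}),
  });

  return async (url, options) => {
    try {
      const res = await inner(url, options);
      if (res.status < 200 || res.status >= 300) {
        onFailure(new Error(`HTTP ${res.status}`));
        return fakeOk();
      }
      return res;
    } catch (err) {
      onFailure(err);
      return fakeOk();
    }
  };
}

// Usage: an always-failing transport still resolves with status 200.
const recorded: unknown[] = [];
const silent = makeSilentFetch(
  async () => {
    throw new Error('offline');
  },
  (reason) => recorded.push(reason)
);
silent('http://127.0.0.1:1/batch').then((res) => console.log(res.status, recorded.length));
```

The trade-off is that the SDK cannot retry what it believes succeeded, which matches the commit's note that shutdown becomes fast on network failure.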
joaobmonteiro pushed a commit to joaobmonteiro/Archon that referenced this pull request Apr 26, 2026