feat(telemetry): anonymous PostHog workflow-invocation tracking #1262
Emits one `workflow_invoked` event per run with workflow name/description, platform, and Archon version. Uses a stable random UUID persisted to `$ARCHON_HOME/telemetry-id` for distinct-install counting, with `$process_person_profile: false` to stay in PostHog's anonymous tier. Opt out with `ARCHON_TELEMETRY_DISABLED=1` or `DO_NOT_TRACK=1`. Self-host via `POSTHOG_API_KEY` / `POSTHOG_HOST`. Closes #1261 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
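The opt-out rule in the description can be sketched as a small pure function. This is a minimal illustration, not the PR's code: the diff references an `isTelemetryDisabled()` helper, but its parsing rules are not shown here, so the accepted values (`1`, `true`, `yes`) and the explicit `env` parameter are assumptions.

```typescript
// Hypothetical env shape so the sketch does not depend on Node typings.
type Env = Record<string, string | undefined>;

// Assumption: "1", "true" and "yes" (any case) count as set.
// The PR description only documents the "=1" form.
function isTruthyFlag(value: string | undefined): boolean {
  if (value === undefined) return false;
  const normalized = value.trim().toLowerCase();
  return normalized === '1' || normalized === 'true' || normalized === 'yes';
}

// Telemetry is off when either opt-out variable is set.
function isTelemetryDisabled(env: Env): boolean {
  return isTruthyFlag(env['ARCHON_TELEMETRY_DISABLED']) || isTruthyFlag(env['DO_NOT_TRACK']);
}
```

In a real process the caller would pass `process.env`; taking the environment as an argument keeps the check easy to unit-test.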
📝 Walkthrough
Added an anonymous PostHog telemetry subsystem that emits a single `workflow_invoked` event per workflow run.
Sequence Diagram(s)
sequenceDiagram
participant User
participant Executor as Workflow Executor
participant Telemetry as Telemetry Module
participant FS as File System
participant PostHog as PostHog API
participant Shutdown as Shutdown Handler
User->>Executor: invoke workflow
Executor->>Executor: emit workflow_started
Executor->>Telemetry: captureWorkflowInvoked(props)
Note over Telemetry: check opt-out (ARCHON_TELEMETRY_DISABLED, DO_NOT_TRACK, POSTHOG_API_KEY)
alt enabled
Telemetry->>FS: read/create telemetry-id
FS-->>Telemetry: telemetry-id (UUID)
Telemetry->>PostHog: init client (lazy, singleton)
Telemetry->>PostHog: queue workflow_invoked event (fire-and-forget)
else disabled
Telemetry-->>Executor: no-op
end
User->>Shutdown: process shutdown
Shutdown->>Telemetry: shutdownTelemetry()
alt client initialized
Telemetry->>PostHog: flush and close
PostHog-->>Telemetry: flushed
end
Telemetry-->>Shutdown: resolved
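The flow in the diagram (lazy singleton client, fire-and-forget capture, flush on shutdown) can be sketched without the SDK. This is only an illustration under stated assumptions: `createTelemetry`, `TelemetryClient`, and the injected `createClient` factory are hypothetical names introduced so the sketch runs without a network; the real module talks to the PostHog client directly.

```typescript
// Hypothetical minimal client shape; the real PostHog client has a richer API.
interface TelemetryEvent {
  distinctId: string;
  event: string;
  properties: Record<string, unknown>;
}
interface TelemetryClient {
  capture(event: TelemetryEvent): void;
  shutdown(): Promise<void>;
}

// Assumption: the client factory and the opt-out flag are injected for testability.
function createTelemetry(createClient: () => Promise<TelemetryClient>, disabled: boolean) {
  let clientPromise: Promise<TelemetryClient> | null = null;

  // Lazy singleton: the factory runs at most once, on first use.
  function getClient(): Promise<TelemetryClient> {
    if (clientPromise === null) clientPromise = createClient();
    return clientPromise;
  }

  // Fire-and-forget: returns immediately and never throws to the caller.
  function captureWorkflowInvoked(distinctId: string, workflowName: string): void {
    if (disabled) return;
    void (async (): Promise<void> => {
      try {
        const client = await getClient();
        client.capture({
          distinctId,
          event: 'workflow_invoked',
          properties: { $process_person_profile: false, workflow_name: workflowName },
        });
      } catch {
        // Swallowed: telemetry must not affect the workflow.
      }
    })();
  }

  // Flush and close only if a client was ever created.
  async function shutdownTelemetry(): Promise<void> {
    if (clientPromise === null) return;
    try {
      const client = await clientPromise;
      await client.shutdown();
    } catch {
      // Swallowed for the same reason as above.
    }
  }

  return { captureWorkflowInvoked, shutdownTelemetry };
}
```

Because the disabled branch returns before `getClient()` is called, an opted-out process never constructs a client at all, which matches the `else disabled` branch of the diagram.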
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Actionable comments posted: 1
🧹 Nitpick comments (2)
packages/paths/src/telemetry.test.ts (1)
135-162: Consider increasing the timing buffer for CI stability.
The 20ms `setTimeout` on line 155 could be flaky on slow CI runners. While the test logic is correct (verifying read-only preservation of an existing file), consider a slightly larger buffer or a polling approach.
That said, this is a minor concern: the test is validating a read operation that should complete well within 20ms under normal conditions.
♻️ Optional: Slightly larger timeout for CI stability
// Give the async capture a moment to run its file read.
- await new Promise(resolve => setTimeout(resolve, 20));
+ await new Promise(resolve => setTimeout(resolve, 50));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/paths/src/telemetry.test.ts` around lines 135 - 162, Increase the async wait after calling captureWorkflowInvoked to avoid flaky CI failures: replace the single short fixed delay (the current setTimeout 20ms used after captureWorkflowInvoked in the test) with a larger buffer or a small polling loop that checks the telemetry-id file until it exists/contains data (referencing captureWorkflowInvoked, resetTelemetryForTests, shutdownTelemetry and the telemetry-id file read via join(tmpHome, 'telemetry-id')). Ensure the test still cleans up by awaiting shutdownTelemetry and keep the assertion that the stored value equals the existingId.
packages/paths/src/telemetry.ts (1)
79-80: Persist `telemetry-id` with restrictive file permissions.
Line 80 writes a stable pseudonymous identifier without an explicit mode. Prefer user-only permissions to reduce local privacy exposure on shared systems.
🔐 Proposed fix
- writeFileSync(idPath, id, 'utf8');
+ writeFileSync(idPath, id, { encoding: 'utf8', mode: 0o600 });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/paths/src/telemetry.ts` around lines 79 - 80, Persisted telemetry id is written without restrictive permissions; update the write to use file mode 0o600 so only the owner can read/write (and optionally set the directory created by mkdirSync(getArchonHome()) to 0o700). Locate the mkdirSync(getArchonHome(), { recursive: true }) and writeFileSync(idPath, id, 'utf8') calls and change the mkdirSync options to include mode: 0o700 and change the writeFileSync call to pass an options object with encoding 'utf8' and mode 0o600 (ensuring the idPath file is created with owner-only permissions).
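The owner-only write suggested above can be sketched as follows. This is a hedged illustration rather than the merged code: `persistTelemetryId` and the throwaway demo directory are hypothetical, while the `mode` options on `mkdirSync` and `writeFileSync` are standard Node `fs` options. Note that `mode` only applies when the file or directory is created, is further restricted by the process umask, and has limited effect on Windows.

```typescript
import { mkdirSync, mkdtempSync, statSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical helper: writes the id so that only the owner can read or write it.
function persistTelemetryId(home: string, id: string): string {
  // 0o700: owner-only access to the directory (applies to directories created here).
  mkdirSync(home, { recursive: true, mode: 0o700 });
  const idPath = join(home, 'telemetry-id');
  // 0o600: owner read/write only (applies when the file is first created).
  writeFileSync(idPath, id, { encoding: 'utf8', mode: 0o600 });
  return idPath;
}

// Usage: write into a throwaway directory and inspect the resulting permission bits.
const demoHome = join(mkdtempSync(join(tmpdir(), 'archon-demo-')), 'home');
const demoPath = persistTelemetryId(demoHome, '11111111-1111-4111-8111-111111111111');
const demoMode = statSync(demoPath).mode & 0o777;
```

If the file may already exist with looser permissions, a follow-up `chmodSync(idPath, 0o600)` would be needed, since the `mode` option does not change existing files.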
⛔ Files ignored due to path filters (1)
- bun.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
- .env.example
- README.md
- packages/cli/src/cli.ts
- packages/paths/package.json
- packages/paths/src/index.ts
- packages/paths/src/telemetry.test.ts
- packages/paths/src/telemetry.ts
- packages/server/src/index.ts
- packages/workflows/src/executor.ts
export interface WorkflowInvokedProperties {
  workflowName: string;
  workflowDescription?: string;
  platform?: string;
  archonVersion?: string;
}

/**
 * Fire-and-forget capture of a `workflow_invoked` event. Never throws, never
 * awaits; safe to call from hot paths.
 */
export function captureWorkflowInvoked(props: WorkflowInvokedProperties): void {
  if (isTelemetryDisabled()) return;
  void (async (): Promise<void> => {
    try {
      const client = await getClient();
      if (!client) return;
      const description = props.workflowDescription?.slice(0, DESCRIPTION_MAX_LENGTH);
      client.capture({
        distinctId: getTelemetryId(),
        event: 'workflow_invoked',
        properties: {
          $process_person_profile: false,
          workflow_name: props.workflowName,
          ...(description ? { workflow_description: description } : {}),
          ...(props.platform ? { platform: props.platform } : {}),
          ...(props.archonVersion ? { archon_version: props.archonVersion } : {}),
        },
workflow_source is missing from the emitted event schema.
The event payload does not include the required workflow_source property (bundled | project | global), so telemetry records are incomplete.
💡 Proposed fix
 export interface WorkflowInvokedProperties {
   workflowName: string;
   workflowDescription?: string;
+  workflowSource: 'bundled' | 'project' | 'global';
   platform?: string;
   archonVersion?: string;
 }
@@
       properties: {
         $process_person_profile: false,
         workflow_name: props.workflowName,
         ...(description ? { workflow_description: description } : {}),
+        workflow_source: props.workflowSource,
         ...(props.platform ? { platform: props.platform } : {}),
         ...(props.archonVersion ? { archon_version: props.archonVersion } : {}),
       },
📝 Committable suggestion
export interface WorkflowInvokedProperties {
  workflowName: string;
  workflowDescription?: string;
  workflowSource: 'bundled' | 'project' | 'global';
  platform?: string;
  archonVersion?: string;
}
/**
 * Fire-and-forget capture of a `workflow_invoked` event. Never throws, never
 * awaits; safe to call from hot paths.
 */
export function captureWorkflowInvoked(props: WorkflowInvokedProperties): void {
  if (isTelemetryDisabled()) return;
  void (async (): Promise<void> => {
    try {
      const client = await getClient();
      if (!client) return;
      const description = props.workflowDescription?.slice(0, DESCRIPTION_MAX_LENGTH);
      client.capture({
        distinctId: getTelemetryId(),
        event: 'workflow_invoked',
        properties: {
          $process_person_profile: false,
          workflow_name: props.workflowName,
          workflow_source: props.workflowSource,
          ...(description ? { workflow_description: description } : {}),
          ...(props.platform ? { platform: props.platform } : {}),
          ...(props.archonVersion ? { archon_version: props.archonVersion } : {}),
        },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/paths/src/telemetry.ts` around lines 127 - 154, The event is missing
the required workflow_source field: update the WorkflowInvokedProperties
interface to include a workflowSource property (union type 'bundled' | 'project'
| 'global', make required to enforce presence) and include it in
captureWorkflowInvoked's client.capture payload as workflow_source:
props.workflowSource (alongside workflow_name/workflow_description), ensuring
the schema emitted by captureWorkflowInvoked contains the workflow_source value.
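To make the proposed schema concrete, here is a minimal sketch of the payload construction as a pure function. It follows the reviewer's suggestion, not merged code: `buildWorkflowInvokedProperties` is a hypothetical helper introduced for illustration, and the 500-character value for `DESCRIPTION_MAX_LENGTH` is taken from the PR summary.

```typescript
type WorkflowSource = 'bundled' | 'project' | 'global';

interface WorkflowInvokedProperties {
  workflowName: string;
  workflowDescription?: string;
  workflowSource: WorkflowSource; // required, per the review suggestion
  platform?: string;
  archonVersion?: string;
}

// The PR summary mentions a 500-character cap on descriptions.
const DESCRIPTION_MAX_LENGTH = 500;

// Hypothetical helper: builds the event properties without touching any client.
function buildWorkflowInvokedProperties(props: WorkflowInvokedProperties): Record<string, unknown> {
  const description = props.workflowDescription?.slice(0, DESCRIPTION_MAX_LENGTH);
  return {
    $process_person_profile: false,
    workflow_name: props.workflowName,
    workflow_source: props.workflowSource,
    // Optional fields are only included when they carry a value.
    ...(description ? { workflow_description: description } : {}),
    ...(props.platform ? { platform: props.platform } : {}),
    ...(props.archonVersion ? { archon_version: props.archonVersion } : {}),
  };
}
```

Making `workflowSource` required means every call site has to supply it at compile time, which is the cost the PR author cites for deferring this change.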
The `telemetry-id preservation` test exercised the real capture path with the embedded production key, so every `bun run validate` published a tombstone `workflow_name: "w"` event. Redirect POSTHOG_HOST to loopback so the flush fails silently; bump test timeout to accommodate the retry-then-give-up window. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The PostHog SDK's internal logFlushError() writes 'Error while flushing PostHog' directly to stderr via console.error on any network or HTTP error, bypassing logger config. For a fire-and-forget telemetry path this leaked stack traces to users' terminals whenever PostHog was unreachable (offline, firewalled, DNS broken, rate-limited). Pass a silentFetch wrapper to the PostHog client that masks failures as fake 200 responses. The SDK never sees an error, so it never logs. Original failure is still recorded at debug level for diagnostics. Side benefit: shutdown is now fast on network failure (no retry loop), so offline CLI commands no longer hang ~10s on exit. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
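The wrapper idea in this commit message can be sketched independently of the SDK. Everything named here is an assumption made for illustration: `makeSilentFetch`, `FetchLike`, and `FetchLikeResponse` are hypothetical, the inner fetch is injected rather than taken from the global, and the `'{}'` body of the fake response is a guess at what a client would tolerate.

```typescript
// Minimal response shape the sketch relies on (assumed, not taken from the SDK).
interface FetchLikeResponse {
  status: number;
  text(): Promise<string>;
  json(): Promise<unknown>;
}
type FetchLike = (url: string, options: unknown) => Promise<FetchLikeResponse>;

// A fake success response handed back whenever the real request fails.
function fakeOk(): FetchLikeResponse {
  return { status: 200, text: async () => '{}', json: async () => ({}) };
}

// Wraps a fetch so that network errors and HTTP error statuses are reported to
// onFailure (for example a debug logger) and then masked as a 200 response.
function makeSilentFetch(inner: FetchLike, onFailure: (reason: unknown) => void = () => {}): FetchLike {
  return async (url, options) => {
    try {
      const response = await inner(url, options);
      if (response.status >= 400) {
        onFailure(new Error(`HTTP ${response.status}`));
        return fakeOk();
      }
      return response;
    } catch (error) {
      onFailure(error);
      return fakeOk();
    }
  };
}
```

The trade-off is that the caller can no longer distinguish delivered events from dropped ones, which is acceptable for best-effort telemetry but would not be for data that must arrive.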
Actionable comments posted: 1
📒 Files selected for processing (1)
packages/paths/src/telemetry.test.ts
  // Longer timeout: the bogus-host fallback below triggers posthog-node's
  // shutdown-flush retry loop, which can take ~5–10s to give up.
  // Simulate a prior run by writing a UUID, then enable the capture path so
  // lazy init exercises the id-read. We redirect POSTHOG_HOST to a
  // guaranteed-unreachable loopback port so the client's flush fails
  // silently (swallowed by our error hook) instead of leaking a test event
  // to production PostHog.
  const { writeFileSync, mkdirSync } = await import('fs');
  const existingId = '11111111-1111-4111-8111-111111111111';
  mkdirSync(tmpHome, { recursive: true });
  writeFileSync(join(tmpHome, 'telemetry-id'), existingId, 'utf8');

  delete process.env.ARCHON_TELEMETRY_DISABLED;
  process.env.POSTHOG_HOST = 'http://127.0.0.1:1';
  resetTelemetryForTests();

  captureWorkflowInvoked({ workflowName: 'w' });

  // Give the async capture a moment to run its file read.
  await new Promise(resolve => setTimeout(resolve, 20));

  const stored = readFileSync(join(tmpHome, 'telemetry-id'), 'utf8').trim();
  expect(stored).toBe(existingId);

  // Clean up any in-flight client (swallows the unreachable-host error).
  await shutdownTelemetry();
}, 20000);
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify timing/network coupling in this test and locate telemetry client creation seam for spying.
rg -n -C2 'POSTHOG_HOST|127\.0\.0\.1:1|setTimeout\(|20000' packages/paths/src/telemetry.test.ts
rg -n -C3 'new PostHog|posthog|captureWorkflowInvoked|resetTelemetryForTests' packages/paths/src/telemetry.ts
Repository: coleam00/Archon
Length of output: 1778
🏁 Script executed:
#!/bin/bash
# Search for the test file and telemetry implementation
find . -name "telemetry.test.ts" -o -name "telemetry.ts" 2>/dev/null | head -20
Repository: coleam00/Archon
Length of output: 131
🏁 Script executed:
#!/bin/bash
# Read test file around lines 136-162 to verify network and timing concerns
wc -l packages/paths/src/telemetry.test.ts
Repository: coleam00/Archon
Length of output: 99
🏁 Script executed:
#!/bin/bash
# Read the complete test file and relevant telemetry implementation
echo "=== telemetry.test.ts ===" && cat -n packages/paths/src/telemetry.test.ts
Repository: coleam00/Archon
Length of output: 6268
🏁 Script executed:
#!/bin/bash
# Read the telemetry implementation to understand how it uses PostHog and whether the network behavior is actually exercised
cat -n packages/paths/src/telemetry.ts
Repository: coleam00/Archon
Length of output: 7528
🏁 Script executed:
#!/bin/bash
# Check if there are other ID-related tests or if this is the only coverage
grep -n "telemetry-id\|getTelemetryId\|getOrCreateTelemetryId" packages/paths/src/telemetry.test.ts
Repository: coleam00/Archon
Length of output: 464
Replace loopback-network + timer waiting with a fully mocked, deterministic unit test
This test creates a real PostHog client (line 152 triggers async init that imports and instantiates the external SDK) and depends on fixed-delay timing (setTimeout on line 155, 20s timeout on line 162). This violates the guideline: "Unit tests must test pure functions with mocked external dependencies (database, AI SDKs, platform APIs)".
The core logic being tested—ID persistence in getOrCreateTelemetryId()—can be verified without network calls. Mock the PostHog client initialization at the import seam, remove the setTimeout, and assert synchronously.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/paths/src/telemetry.test.ts` around lines 136 - 162, The test
currently instantiates a real PostHog client and relies on a timer to wait for
async init; instead mock the PostHog client at the import seam and make the test
synchronous: stub out the module/function that creates the external client (the
code path exercised by captureWorkflowInvoked / telemetry init) so no network
call occurs, remove the setTimeout-based wait, call resetTelemetryForTests(),
write the existingId to telemetry-id in tmpHome, call captureWorkflowInvoked({
workflowName: 'w' }), then immediately read telemetry-id and assert it equals
existingId; use your test framework's mocking API (e.g., jest.mock or
sinon.stub) to replace the PostHog/client initializer so shutdownTelemetry() can
run deterministically without hitting POSTHOG_HOST.
♻️ Duplicate comments (2)
packages/paths/src/telemetry.test.ts (1)
135-160: ⚠️ Potential issue | 🟠 Major
Make this persistence test deterministic (remove real network + timer wait)
Line 147 and Line 153 rely on a real unreachable host and fixed sleep; this keeps the test timing/network-coupled and flaky. Please mock the external telemetry transport (e.g., via `spyOn()` seam) and synchronize via awaited telemetry lifecycle (not `setTimeout`).
As per coding guidelines: “Tests: Unit tests must test pure functions with mocked external dependencies (database, AI SDKs, platform APIs)” and “Use `spyOn()` for internal modules in test files instead of `mock.module()`…”.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/paths/src/telemetry.test.ts` around lines 135 - 160, The test should avoid real network and timers: remove setting POSTHOG_HOST and the setTimeout; instead spyOn the telemetry transport (e.g., spyOn(moduleContainingSilentFetch, 'silentFetch') or spyOn(posthogClient, 'capture') to return a resolved Promise) before calling resetTelemetryForTests() and captureWorkflowInvoked({ workflowName: 'w' }); then await shutdownTelemetry() to ensure the async capture completes deterministically, and finally read and assert the telemetry-id file; keep references to resetTelemetryForTests, captureWorkflowInvoked, shutdownTelemetry, and silentFetch (or the actual transport function) so the mock targets are clear.
packages/paths/src/telemetry.ts (1)
181-208: ⚠️ Potential issue | 🟠 Major
`workflow_source` is still missing from the emitted `workflow_invoked` payload
Line 181 and Line 202-208 define and emit event properties without `workflow_source`, so telemetry records are incomplete versus the required schema.
💡 Proposed fix
 export interface WorkflowInvokedProperties {
   workflowName: string;
   workflowDescription?: string;
+  workflowSource: 'bundled' | 'project' | 'global';
   platform?: string;
   archonVersion?: string;
 }
@@
       properties: {
         $process_person_profile: false,
         workflow_name: props.workflowName,
         ...(description ? { workflow_description: description } : {}),
+        workflow_source: props.workflowSource,
         ...(props.platform ? { platform: props.platform } : {}),
         ...(props.archonVersion ? { archon_version: props.archonVersion } : {}),
       },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/paths/src/telemetry.ts` around lines 181 - 208, The emitted telemetry for workflow_invoked is missing the workflow_source field; update the WorkflowInvokedProperties interface to include an optional workflowSource (or workflow_source) property and then include it when building the capture properties inside captureWorkflowInvoked (e.g., add ...(props.workflowSource ? { workflow_source: props.workflowSource } : {}) so the emitted payload conforms to the required schema).
📒 Files selected for processing (2)
- packages/paths/src/telemetry.test.ts
- packages/paths/src/telemetry.ts
Replace the fire-and-forget capture + setTimeout + POSTHOG_HOST-loopback dance with a direct synchronous call to getOrCreateTelemetryId(). Export the function with an @internal marker so tests can exercise the id path without spinning up the PostHog client. No network, no timer, no flake. Addresses CodeRabbit feedback on #1262. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
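A deterministic version of the id-preservation check might look like the sketch below. It is an assumption-laden stand-in, not the repository's code: the real `getOrCreateTelemetryId()` resolves its directory through `getArchonHome()`, whereas this hypothetical variant takes the directory as a parameter so the example is self-contained.

```typescript
import { randomUUID } from 'node:crypto';
import { existsSync, mkdirSync, mkdtempSync, readFileSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical stand-in for the exported helper: read the id if present, else create one.
function getOrCreateTelemetryId(home: string): string {
  const idPath = join(home, 'telemetry-id');
  if (existsSync(idPath)) {
    const existing = readFileSync(idPath, 'utf8').trim();
    if (existing) return existing;
  }
  const id = randomUUID();
  mkdirSync(home, { recursive: true });
  writeFileSync(idPath, id, 'utf8');
  return id;
}

// Deterministic check: seed an id, call synchronously, confirm nothing was overwritten.
const tmpHome = mkdtempSync(join(tmpdir(), 'archon-telemetry-'));
const existingId = '11111111-1111-4111-8111-111111111111';
writeFileSync(join(tmpHome, 'telemetry-id'), existingId, 'utf8');

const returnedId = getOrCreateTelemetryId(tmpHome);
const storedId = readFileSync(join(tmpHome, 'telemetry-id'), 'utf8').trim();
```

Because the call is synchronous and touches only the local file system, the assertion can run immediately, with no timer, no PostHog client, and no extended test timeout.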
…am00#1262)
* feat(telemetry): add anonymous PostHog workflow-invocation tracking
Emits one `workflow_invoked` event per run with workflow name/description, platform, and Archon version. Uses a stable random UUID persisted to `$ARCHON_HOME/telemetry-id` for distinct-install counting, with `$process_person_profile: false` to stay in PostHog's anonymous tier. Opt out with `ARCHON_TELEMETRY_DISABLED=1` or `DO_NOT_TRACK=1`. Self-host via `POSTHOG_API_KEY` / `POSTHOG_HOST`. Closes coleam00#1261
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* test(telemetry): stop leaking test events to production PostHog
The `telemetry-id preservation` test exercised the real capture path with the embedded production key, so every `bun run validate` published a tombstone `workflow_name: "w"` event. Redirect POSTHOG_HOST to loopback so the flush fails silently; bump test timeout to accommodate the retry-then-give-up window.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(telemetry): silence posthog-node stderr leak on network failure
The PostHog SDK's internal logFlushError() writes 'Error while flushing PostHog' directly to stderr via console.error on any network or HTTP error, bypassing logger config. For a fire-and-forget telemetry path this leaked stack traces to users' terminals whenever PostHog was unreachable (offline, firewalled, DNS broken, rate-limited). Pass a silentFetch wrapper to the PostHog client that masks failures as fake 200 responses. The SDK never sees an error, so it never logs. Original failure is still recorded at debug level for diagnostics. Side benefit: shutdown is now fast on network failure (no retry loop), so offline CLI commands no longer hang ~10s on exit.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* test(telemetry): make id-preservation test deterministic
Replace the fire-and-forget capture + setTimeout + POSTHOG_HOST-loopback dance with a direct synchronous call to getOrCreateTelemetryId(). Export the function with an @internal marker so tests can exercise the id path without spinning up the PostHog client. No network, no timer, no flake. Addresses CodeRabbit feedback on coleam00#1262.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary
- Emits one `workflow_invoked` event per workflow run with `workflow_name`, `workflow_description`, `platform`, and `archon_version`.
- Uses a stable random UUID persisted to `$ARCHON_HOME/telemetry-id`; `$process_person_profile: false` keeps events in PostHog's anonymous tier (no person profile created, ~4x cheaper).
- Opt out with `ARCHON_TELEMETRY_DISABLED=1` or `DO_NOT_TRACK=1`; self-host via `POSTHOG_API_KEY` / `POSTHOG_HOST`.
Closes #1261.
How it works
- `packages/workflows/src/executor.ts`: fire-and-forget call to `captureWorkflowInvoked()` right after the `workflow_started` event emits. Never blocks or throws; PostHog errors are logged at `debug` and swallowed.
- `packages/paths/src/telemetry.ts`: lazy singleton PostHog client, embedded write-only project key (`phc_…` keys can only ingest events, so they are safe to ship in source), 500-char cap on description length.
- `packages/server/src/index.ts`, `packages/cli/src/cli.ts`: `shutdownTelemetry()` flushes buffered events on SIGTERM / end-of-command so short-lived CLI runs don't drop events.
What is sent
- `workflow_name` (e.g. `archon-plan`)
- `workflow_description` (from the workflow YAML, capped at 500 chars)
- `platform` (`cli` | `web` | `slack` | `telegram` | `github` | `discord`)
- `archon_version` (from `BUNDLED_VERSION`)
- `distinct_id` = random UUID from `$ARCHON_HOME/telemetry-id`
Not sent: prompts, file paths, code, user identity, env vars, conversation content, repo names, git info.
Test plan
- `bun run validate`: type-check + lint + format + all tests (green)
- Tests in `packages/paths/src/telemetry.test.ts` cover opt-out detection, disabled-path no-op, and telemetry-id persistence
- `ARCHON_TELEMETRY_DISABLED=1 bun run cli workflow run …` → no network call, no `telemetry-id` file written
- `DO_NOT_TRACK=1 bun run cli workflow run …` → same as above
Follow-ups (deliberately out of scope)
- `workflow_source` (bundled vs. project vs. global): requires threading source through `executeWorkflow()` signatures; worth a separate PR.
🤖 Generated with Claude Code