feat(observability): vendor-neutral OTEL tracing + opt-in Sentry #172

Merged: buremba merged 3 commits into main from feat/otel-sentry-improvements on Apr 10, 2026

@buremba commented Apr 9, 2026

Summary

  • Rename TEMPO_ENDPOINT → OTEL_EXPORTER_OTLP_ENDPOINT so traces ship to any OTLP collector (Tempo, Jaeger, Datadog, Honeycomb, etc.)
  • Make Sentry opt-in only: remove hardcoded fallback DSN, add consent prompt during lobu init
  • Add OTEL collector endpoint prompt to lobu init
  • Fix Chat SDK tracing gap: platform messages (Slack, Telegram, Discord) now create root spans with traceparent propagation
  • Add response_delivery span to thread consumer for full round-trip tracing
  • Update observability docs with complete trace flow and vendor-neutral language
  • Fix pre-existing typecheck/biome issues (exclude CLI from root typecheck, exclude CSS from biome)
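
The traceparent propagation mentioned in the summary follows the W3C Trace Context header layout (`version-traceid-parentid-flags`). The PR does not show its own helper, so the following is only a minimal sketch of that format; the names `TraceContext`, `formatTraceparent`, and `parseTraceparent` are hypothetical, and only version `00` is handled.

```typescript
// Sketch of the W3C "traceparent" header, version 00 only.
// All names here are illustrative and are not taken from the PR.
interface TraceContext {
  traceId: string; // 32 lowercase hex characters, not all zeros
  spanId: string; // 16 lowercase hex characters, not all zeros
  sampled: boolean; // mirrors the "sampled" bit in the flags byte
}

const TRACEPARENT_RE = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function formatTraceparent(ctx: TraceContext): string {
  const flags = ctx.sampled ? "01" : "00";
  return `00-${ctx.traceId}-${ctx.spanId}-${flags}`;
}

function parseTraceparent(header: string): TraceContext | null {
  const match = TRACEPARENT_RE.exec(header.trim());
  if (!match) return null;
  const traceId = match[1];
  const spanId = match[2];
  const flags = match[3];
  // The spec treats all-zero trace and span IDs as invalid.
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}
```

A gateway that creates the root span would attach the formatted value to the queued message, and the worker would parse it to start its child spans under the same trace ID.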

Test plan

  • Core OTEL module unit tests (disabled/enabled, root/child spans, traceparent format)
  • Sentry opt-in unit tests (disabled without DSN, enabled with DSN)
  • End-to-end bot test via Telegram (message sent, response received)
  • Full trace verified in Tempo: 6 spans across gateway + worker in single trace
  • All pre-commit checks pass (format, lint, typecheck)

buremba added 2 commits April 10, 2026 00:44
npm publish doesn't resolve the workspace:* protocol, so switch to pnpm publish
so @lobu/worker gets @lobu/core resolved to ^3.0.5 on npm.

- Rename TEMPO_ENDPOINT to OTEL_EXPORTER_OTLP_ENDPOINT across
  gateway, worker, Helm charts, and docs so traces can ship to any
  OTLP-compatible collector

- Make Sentry opt-in: remove hardcoded fallback DSN, add opt-in
  prompt during lobu init

- Add OTEL collector endpoint prompt to lobu init

- Fix tracing gap: Chat SDK messages now create root spans with
  traceparent propagation; add response_delivery span to thread
  consumer for full round-trip tracing

- Update observability docs with complete trace flow

- Exclude CLI from root typecheck (has own tsconfig, uses DOM types
  incompatible with Bun-only root config)

- Fix biome config: exclude CSS files (Tailwind directives unsupported)

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8bf59c4a8b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread on packages/worker/package.json (outdated)
  "dependencies": {
    "@hono/node-server": "^1.19.9",
-   "@lobu/core": "workspace:*",
+   "@lobu/core": "^3.0.5",

[P2] Keep worker on matching @lobu/core workspace version

This change widens the worker dependency to ^3.0.5 while the same commit updates worker startup to use the renamed otlpEndpoint tracing config from core; older core versions in that range can still satisfy this constraint and won’t necessarily understand the new option shape. In environments with a pre-existing lockfile, that can pair a new worker with an older core and silently disable the new tracing path, so the dependency should stay aligned to the current workspace/release version.
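
To make the range concern concrete, here is a minimal sketch of caret-range matching for plain `x.y.z` versions with a major version of 1 or higher. It is not the npm `semver` package; `parseSemVer` and `satisfiesCaret` are hypothetical names, and pre-release tags are deliberately not handled.

```typescript
// Simplified caret matching: "^3.0.5" accepts any 3.x.y that is >= 3.0.5.
// Illustrative only; real resolvers also handle pre-releases and 0.x ranges.
function parseSemVer(v: string): [number, number, number] {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v);
  if (!m) throw new Error(`not a plain x.y.z version: ${v}`);
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

function satisfiesCaret(version: string, base: string): boolean {
  const [vMaj, vMin, vPat] = parseSemVer(version);
  const [bMaj, bMin, bPat] = parseSemVer(base);
  if (bMaj === 0) throw new Error("this sketch only covers major >= 1");
  if (vMaj !== bMaj) return false; // a different major is never accepted
  if (vMin !== bMin) return vMin > bMin; // a newer minor is accepted
  return vPat >= bPat; // same minor: patch must not be older
}
```

Because the floor of the range itself satisfies it, a lockfile that already pins a core release at the floor remains valid after the worker is upgraded, which is the pairing the review comment warns about.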

Comment on lines +352 to +354
await queueProducer.enqueueMessage(payload);
rootSpan?.end();
void flushTracing();

[P2] End root tracing span on enqueue failures

The new message_received root span is only ended on the success path after enqueue; if resolveAgentOptions(...) or enqueueMessage(...) throws (for example during Redis/queue outages), the function exits without ending the span. That leaves unfinished spans unexported and can accumulate in memory during repeated failures, so span termination should be handled in a finally block.
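
The suggested `finally` handling can be sketched as follows. This is an assumption-laden illustration, not the project's code: `SpanLike` and `withRootSpan` are hypothetical names, and the real handler also resolves agent options and flushes tracing.

```typescript
// Hypothetical minimal span shape; the project's tracing module is not shown in the PR.
interface SpanLike {
  end(): void;
}

// Runs `work` under a root span and guarantees the span is ended exactly once,
// whether `work` resolves or throws (for example during a queue outage).
async function withRootSpan<T>(
  startSpan: () => SpanLike | undefined, // may return undefined when tracing is disabled
  work: () => Promise<T>,
): Promise<T> {
  const rootSpan = startSpan();
  try {
    return await work();
  } finally {
    // Executes on both the success and the failure path.
    rootSpan?.end();
  }
}
```

The error still propagates to the caller; the only behavioral change is that the span is closed before it does.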

- Restore @lobu/core to workspace:* in worker package.json
- Wrap message-handler-bridge tracing in try/finally so root span
  is always ended, even on enqueue failures
@buremba buremba merged commit f3345d3 into main Apr 10, 2026
9 checks passed
@buremba buremba deleted the feat/otel-sentry-improvements branch April 10, 2026 00:09
buremba pushed a commit that referenced this pull request Apr 23, 2026
Follow-ups from an aggregated teammate review of 128 PRs merged between
2026-04-09 and 2026-04-23. Five concrete gaps patched:

- worker: constrain UploadUserFile to the workspace root (#203 follow-up).
  path.join allowed `../` and absolute paths to escape the workspace. Now
  resolves and rejects anything outside workspaceDir when one is set.
- core: flip Sentry sendDefaultPii to false (#172 follow-up). User content
  and identifiers flow through this stack; the schema has no scrubbing so
  PII-by-default was unsafe.
- gateway: make SlackInstructionProvider extend BaseInstructionProvider
  (#269 follow-up). Sibling Skills/Network providers are wrapped in a
  try/catch that returns "" on error; Slack was bypassing it and would
  crash session-context assembly if listConnections threw.
- owletto-backend: rate-limit the join_organization MCP tool to match the
  REST endpoint (#296 follow-up). Keyed on userId since MCP calls don't
  carry a client IP.

Skipped one reviewer finding: removing the process.env fallback for API
keys at worker.ts:1099/1109 (the inconsistency with #225 base-URL code).
Embedded/dev workers depend on that fallback since credentialStore is
only populated from gateway-supplied session context.
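
The workspace-root constraint described for UploadUserFile can be sketched like this. The function name `resolveInsideWorkspace` is hypothetical, and the sketch does not follow symlinks, so it is a starting point under those assumptions and not the project's implementation.

```typescript
import * as path from "node:path";

// Resolves a user-supplied path against the workspace root and rejects
// anything that lands outside it ("../" segments or absolute paths).
// Hypothetical helper; symlink targets are not checked here.
function resolveInsideWorkspace(workspaceDir: string, userPath: string): string {
  const root = path.resolve(workspaceDir);
  const target = path.resolve(root, userPath);
  const rel = path.relative(root, target);
  const escapes =
    rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  if (escapes) {
    throw new Error(`path escapes workspace: ${userPath}`);
  }
  return target;
}
```

Comparing the relative path instead of using a string prefix check avoids accepting a sibling directory such as `/tmp/ws-other` when the root is `/tmp/ws`.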
buremba added a commit that referenced this pull request Apr 23, 2026
…instr guard, MCP join rate-limit) (#325)

* fix: address gaps found in post-merge review of last 2 weeks of PRs

* fix: address remaining gaps from post-merge review

Second pass on the 2-week PR review. Five more gaps closed:

- gateway: unit tests for verifyOwnedAgentAccess covering owner, cross-tenant,
  cross-platform, agent-scoped, admin-bypass, unknown-agent, and external
  OAuth mismatches (#285 follow-up). Closes the test hole in the cross-tenant
  ownership guard.
- owletto-backend: validate each CSP frame-ancestor entry against a strict
  host-source / scheme-source grammar before joining (#246 follow-up).
  Malformed env entries like `https:// lobu.ai` are now dropped instead of
  silently rendered into the directive.
- owletto-backend: introduce normalizeHost() in utils/public-origin and use
  it from getSubdomainZone, extractSubdomainOrg, getCanonicalRedirectUrl, and
  the BetterAuth trustedOrigins wiring (#234/#224/#214 follow-up). Unifies
  the ad-hoc .toLowerCase()/.replace() patterns and adds IDN→punycode so
  `müller.lobu.ai` matches the ASCII zone configured in env.
- owletto-backend: redact member emails that surface via template_data and
  tab template_data in resolve_path, not only on the single resolved entity
  (#309 follow-up). A dashboard data source that enumerates $member entities
  no longer leaks emails to non-admin callers. New utils/member-redaction
  helper plus unit coverage.
- owletto-backend: treat #311 as already closed — ToolContext.memberRole is
  `string | null` (required, not optional), so TypeScript already catches
  future literal omissions at construction.
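
As an illustration of the host normalization item above, a helper along these lines would lowercase the host and convert IDN labels to punycode. This is a sketch built on Node's `domainToASCII`; the port and trailing-dot handling are assumptions, and the real `normalizeHost()` in utils/public-origin may behave differently.

```typescript
import { domainToASCII } from "node:url";

// Sketch of host normalization: trim, lowercase, drop an optional ":port"
// and trailing dot (both assumptions), then convert IDN labels to punycode.
// Returns "" when the input is not a valid domain. IPv6 literals are not handled.
function normalizeHost(input: string): string {
  const lowered = input.trim().toLowerCase();
  const withoutPort = lowered.replace(/:\d+$/, "");
  const withoutTrailingDot = withoutPort.replace(/\.$/, "");
  return domainToASCII(withoutTrailingDot);
}
```

With a single normalization step, a request for a Unicode host and the ASCII zone configured in the environment can be compared as plain strings.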

---------

Co-authored-by: Claude <noreply@anthropic.com>
buremba added a commit that referenced this pull request May 18, 2026
Picks up:
- PR #170 (lobu-ai/owletto): inline pairing + permissions into the
  chrome extension sidepanel (no more separate pairing.html tab).
- PR #172 (lobu-ai/owletto): pin chrome extension ID via manifest key
  (`amnnhclgmbldmfcfamonoggjhfidemmm`) and hardcode it into the Mac
  bridge's allowed_origins; env var override now validates against
  the chrome ID regex.
buremba added a commit that referenced this pull request May 18, 2026
#889)

Picks up owletto-ai/owletto#185 (mac menubar context picker +
per-context Keychain + Start/Stop semantics) and adds the CLI/
server hooks the menubar needs.

CLI / server changes:

- getServerConfig() in packages/cli/src/internal/context.ts now
  honors LOBU_CONTEXT env in addition to the explicit arg and the
  config's currentContext. Without this, the Mac menubar's
  spawn-of-`lobu run` (which passes LOBU_CONTEXT=<context>)
  couldn't make the server resolve the right context's server
  block — every run fell back to the operator's CLI default and
  picked PGlite even when a databaseUrl was configured for the
  intended context.

- LobuServerConfig + UserServerConfig gain a `lifecycle?:
  "managed" | "external"` field. The Mac menubar uses this to
  decide whether to own the runner's process (Start/Stop verbs)
  or just connect (Sign in/Sign out). Today only the menubar
  reads it; the CLI's `lobu run` ignores it.
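
The context lookup described above can be sketched as a simple fallback chain. The commit message names the three sources but does not state their order, so the precedence used here (explicit argument, then LOBU_CONTEXT, then currentContext) is an assumption, as are the `CliConfigSketch` type and the `resolveContextName` function.

```typescript
// Hypothetical config shape; only the fields named in the commit message are modeled.
interface CliConfigSketch {
  currentContext?: string;
  contexts: Record<
    string,
    { databaseUrl?: string; lifecycle?: "managed" | "external" }
  >;
}

// Assumed precedence: explicit argument > LOBU_CONTEXT env > config.currentContext.
function resolveContextName(
  explicit: string | undefined,
  env: Record<string, string | undefined>,
  config: CliConfigSketch,
): string | undefined {
  return explicit || env.LOBU_CONTEXT || config.currentContext;
}
```

Passing the environment as a parameter keeps the function testable; a caller in the CLI would hand in `process.env`.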

Submodule:

- packages/owletto bumped from fb84cf15 (#174 merge) → 970eb500.
  In between: #170 chrome-ext inline pairing, #172 manifest key,
  #173 sidepanel iframe fix, plus #185.