perf: performance sweep #674
Merged
Implements the low/medium-risk "TOP 3" items from the parallel performance analyses. Behavior-preserving caching (bounded / immutable data), batched DB + embedding work, parallelized independent awaits, and a few hot-path micro-opts.

Areas:
- core: memoize getEncryptionKey() (per-worker-RPC / per-config-decrypt hot path); pass log `info` to Sentry by reference instead of spreading; tighten sanitizeForLogging (single compiled regex + maxDepth cap).
- agent-worker: parallelize output/ dir clearing; drop per-delta JSON.stringify-for-logging on the stream-delta POST path.
- connector-worker / embeddings: real array batching in batchGenerateLocalEmbeddings (single vectorized ONNX pass); batch embed_backfill embeddings instead of one round-trip per event.
- server gateway: reorder API auth middleware so the local worker-token check runs before the remote OIDC userinfo fetch; stop calling flushTracing() per inbound message (SimpleSpanProcessor already exports on span end); pre-lowercase the env domain allow/deny lists once; memoize getConfiguredPublicOrigin().
- server auth: bearer (PAT/OAuth) MCP auth now trusts the shared 60s membership-role cache (was hard-bypassing it every request, same as the cookie path).
- server stores: hoist `decrypt` to a static import + fast-path decryptLegacyEncryptedConfig (no clone when nothing is enc:v1:).
- server tools/sandbox: memoize enumerateSDKManifest, getAllTools, and getTool over the static tool registry / method metadata.
- server index: memoize generateOpenAPISpec() by origin; defer assertExternalDepsResolvable until after listen(); assign the env snapshot to c.env by reference instead of Object.assign per request.
- server identity: batch revokeDerivationsForEvent into one ANY(...) UPDATE inside the ingest transaction.
- server watchers: short-circuit + time-bound the unbounded `runs` seq-scan in reconcileWatcherRuns; skip it entirely when no active watcher run is awaiting a dispatched message.
- server notifications: dedup the connections/targets fetch + service token mint and run sends via Promise.allSettled; fan bot delivery out once per createNotificationForUsers call instead of once per user.
- server connectors: detach the repair-agent triggers from the worker-completion ACK (fire-and-forget; atomic claims make it safe); fetch loadRecentRuns lazily so it isn't paid on every rejected failure.
- openclaw-plugin: memoize the last (query → recall block) so the second recall hook for a turn is a free cache hit.
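The "memoize the last (query → recall block)" idea from the openclaw-plugin item can be sketched as a single-entry cache around an async lookup. This is an illustrative sketch only: `memoizeLastCall` and the `fetcher` parameter are hypothetical names, not taken from the PR, and the real plugin may key or expire its cache differently.

```typescript
// Hypothetical helper (not from the PR): remembers only the most recent
// query and its in-flight or settled result.
function memoizeLastCall<R>(
  fetcher: (query: string) => Promise<R>,
): (query: string) => Promise<R> {
  let last: { query: string; result: Promise<R> } | undefined;

  return (query: string): Promise<R> => {
    // Same query as the previous call: reuse the stored promise.
    if (last !== undefined && last.query === query) {
      return last.result;
    }
    const result = fetcher(query);
    const entry = { query, result };
    last = entry;
    // Never keep a failed lookup cached; the next call should retry.
    result.catch(() => {
      if (last === entry) {
        last = undefined;
      }
    });
    return result;
  };
}
```

Because the promise itself is stored, two hooks that fire for the same turn share one lookup even if the first has not resolved yet. A different query simply replaces the entry, so memory stays bounded at one result.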
Rebased onto current
All other perf fixes from the PR body are unchanged.
Behavior-preserving performance fixes from the parallel performance-analysis sweep. Caching is bounded (immutable-per-process or short TTL) and credential-affecting paths invalidate correctly.
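The two caching shapes named above (immutable-per-process and short TTL, each with an explicit invalidation path) can be illustrated with one small helper. This is a minimal sketch under stated assumptions: `memoizeWithTtl`, its `invalidate()` hook, and the injected clock are hypothetical and do not reflect the actual helpers in the repository.

```typescript
// Hypothetical helper (not from the PR): caches one computed value for
// ttlMs milliseconds and exposes a reset hook for tests and invalidation.
interface Memoized<T> {
  get(): T;
  invalidate(): void;
}

function memoizeWithTtl<T>(
  compute: () => T,
  ttlMs: number,
  now: () => number = Date.now,
): Memoized<T> {
  let cached: { value: T; expiresAt: number } | undefined;

  return {
    get(): T {
      const t = now();
      // Recompute on first use or once the entry has expired.
      if (cached === undefined || t >= cached.expiresAt) {
        cached = { value: compute(), expiresAt: t + ttlMs };
      }
      return cached.value;
    },
    invalidate(): void {
      cached = undefined;
    },
  };
}

// Usage: an immutable-per-process value uses an infinite TTL.
let computeCount = 0;
const normalizedOrigin = memoizeWithTtl(() => {
  computeCount += 1;
  return " HTTPS://Example.COM ".trim().toLowerCase();
}, Number.POSITIVE_INFINITY);

normalizedOrigin.get();
normalizedOrigin.get();
console.log(computeCount); // 1: the second call is served from the cache
```

A finite `ttlMs` (for example 60 000 for a role cache) gives the short-TTL variant, and calling `invalidate()` from whatever code mutates the underlying value keeps credential-affecting paths from serving stale data.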
What landed (per area, with expected win)
- `core/encryption.ts`: memoize the `ENCRYPTION_KEY` buffer (was base64-decode + regex on every call). Expected win: hot path for `verifyWorkerToken` / `decrypt` (per worker→gateway RPC, per config-value load).
- `core/logger.ts`: pass `info` to Sentry by reference instead of `{...info, message}`.
- `core/sanitize.ts`: single compiled regex + `maxDepth` cap; drop dead `env` branch.
- `agent-worker/worker.ts`: `Promise.all` the `output/` dir clearing (was serial awaited `unlink`s).
- `agent-worker/gateway-integration.ts`: drop the `JSON.stringify(payload).substring(0,500)` log on the stream-delta POST; log identifying fields only.
- `embeddings/embeddings.ts`: `batchGenerateLocalEmbeddings` passes the whole batch to transformers.js (single padded vectorized ONNX pass). Expected win: real batching (the previous `Promise.all` over per-text calls ran serially).
- `connector-worker/executor.ts`: `embed_backfill` now batches embeddings instead of one round-trip per event.
- `server/gateway/auth/api-auth-middleware.ts`: run the local worker-token check before the remote OIDC `userinfo` fetch. Affects `/api/v1/agents/*`.
- `server/gateway/.../message-handler-bridge.ts`, `unified-thread-consumer.ts`: stop calling `flushTracing()` per inbound message (`SimpleSpanProcessor` already exports on span end).
- `server/gateway/proxy/http-proxy.ts`: pre-lowercase the env domain allow/deny lists once. Expected win: removes `.toLowerCase()` from the per-request domain matcher.
- `server/utils/public-origin.ts`: memoize `getConfiguredPublicOrigin()` (mirrors `hasLocalFrontend`). Expected win: avoids a `new URL()` per auth resolution.
- `server/workspace/multi-tenant.ts`: bearer (PAT/OAuth) MCP auth now trusts the shared 60s membership-role cache (was hard-bypassing it on every request).
- `server/lobu/stores/postgres-stores.ts`: hoist `decrypt` to a static import; fast-path `decryptLegacyEncryptedConfig` (no clone unless a value is `enc:v1:`). Expected win: removes a `require` + needless object clone from a per-row connection mapper.
- `server/sandbox/sdk-manifest.ts`, `tools/registry.ts`: memoize `enumerateSDKManifest`, `getAllTools`, `getTool` over the static tool registry / method metadata. Expected win: the `tools/list` schema-flatten becomes Map lookups.
- `server/index.ts`: memoize `generateOpenAPISpec()` by origin. Expected win: `/openapi.json` becomes a Map lookup instead of an O(tools × schema) walk.
- `server/server.ts`: defer `assertExternalDepsResolvable` until after `listen()`; assign the env snapshot to `c.env` by reference instead of `Object.assign` per request.
- `server/identity/engine.ts`: batch `revokeDerivationsForEvent` into one `= ANY(...)` UPDATE.
- `server/watchers/automation.ts`: `reconcileWatcherRuns` short-circuits when no active watcher run awaits a dispatched message; otherwise drives the containment join from the small side and bounds the `chat_message` scan to recent completions. Expected win: removes the unbounded `runs` seq-scan that ran every 60s, forever.
- `server/notifications/service.ts`: dedup the connections/targets fetch + service token mint, run sends via `Promise.allSettled`, and fan bot delivery out once per `createNotificationForUsers` call (was once per user → N duplicate messages + N re-fetches). Expected win: `O(users × connections)` serial HTTP → `O(connections)` parallel.
- `server/worker-api.ts`, `connectors/repair-agent.ts`: detach the repair-agent triggers from the worker-completion ACK (fire-and-forget; atomic claims make it safe); fetch `loadRecentRuns` lazily so the heaviest query isn't paid on every rejected failure.
- `openclaw-plugin/index.ts`: memoize the last `(query → recall block)` so the second recall hook for a turn is a free cache hit. Expected win: cuts `search_memory` calls per turn in half on the latency-sensitive hook path.

Validation:
`make build-packages`, `bun run typecheck`, and the affected packages' unit suites (core, agent-worker, connector-worker, embeddings, server `__tests__/unit`) all pass. A few tests were updated to reflect the new behavior (encryption-key cache reset hook; `loadRecentRuns` is now lazy; the embeddings test mock now mirrors the array-batch contract). Integration suites (Postgres-backed) were not run.

Deferred (architectural / higher-risk; needs a dedicated PR)
- `packages/server/src/db/migrations/` + `embedded-schema-patches.ts`: add a `current_events` view (no `event_embeddings` join) and migrate the ~65 call sites; med-risk view change across many readers, M–L effort. (analyst 10 #1)
- `packages/server/src/db/migrations/`: `idx_entities_metadata_gin` (GIN over `entities.metadata`) + rewrite `findEntitiesByMetadataField` to `@>`; `@>` semantics differ for non-string JSON + write-amplification; needs its own test. (analyst 10 #3)
- `packages/server/src/db/migrations/`: dynamic-metadata GIN index on `events`; same write-amplification / `@>` concerns. (analyst 10 #3)
- `packages/server/src/gateway/proxy/secret-proxy.ts`: stream request/response bodies (`duplex: 'half'`) instead of buffering into strings; needs proxy integration tests across Node 22–24. (analyst 08 #3)
- `packages/server/src/gateway/proxy/egress-judge/cache.ts`: coarsen the verdict cache key (drop/normalize the path) so plain-HTTP traffic actually hits the cache; widens the trust window, so it needs a per-policy opt-out. (analyst 08 #4)
- `packages/server/src/gateway/permissions/grant-store.ts` + `http-proxy.ts`: in-process grant cache for `isDenied`/`hasGrant`; invalidation wiring on grant mutations. (analyst 08 #1)
- `packages/server/src/mcp-proxy/credential-resolver.ts`: short-TTL cache for resolved MCP credentials, invalidate on auth failure; correctness-sensitive (token rotation). (analyst 11 #1)
- `packages/server/src/utils/compiler-core.ts` + `sandbox/run-script.ts`: LRU compile cache + `esbuild.transform` over `build` for sandbox scripts; needs care around the `npm:` rewrite path. (analyst 11 #2)
- `packages/server/src/sandbox/run-script.ts`: pre-snapshot `GUEST_PREAMBLE` (and the isolate-pool variant, explicitly out of scope). (analyst 11 #3)
- `packages/server/src/gateway/connections/conversation-state-store.ts`: drop the `history_index` write + DB lock from `appendHistory`, bulk-insert backfill rows; touches the Chat-SDK list contract. (analyst 07 #1)
- `packages/server/src/gateway/connections/slack-connection-coordinator.ts`: in-memory `teamId → connectionId` index for Slack webhook routing; derived-index invalidation. (analyst 07 #4)
- `packages/server/src/lobu/stores/postgres-secret-store.ts`: TTL cache for `PostgresSecretStore.get`; secret-rotation staleness window. (analyst 14 #4)
- `packages/server/src/worker-api/device-reconcile.ts`: gate `reconcileDeviceCapabilities` behind a change-detector + memoize the connector catalog walk; must keep the self-heal path. (analyst 14 #1)
- `packages/server/src/auth/schema-validation.ts` + `ajv-singleton.ts`: cache compiled AJV validators (+ optional entity-type-schema DB lookup cache). (analyst 13 #2)
- `packages/server/src/workspace/multi-tenant.ts`: full bearer-token `AuthInfo` cache + drop the redundant 3rd `SELECT "user"` by propagating `email`/`name`/`emailVerified`; touches the authz gate, needs careful invalidation. (analyst 13 #1)
- `packages/server/src/connectors/repair-agent.ts` + `worker-api.ts`: fold `maybeCloseRepairThread`'s UPDATE into the success-path feeds UPDATE; touches the worker-completion transaction's race semantics. (analyst 09 #1)
- `packages/cli/src/commands/_lib/apply/apply-cmd.ts`: parallelize `fetchRemoteSnapshot` into dependency waves with a concurrency cap; well-tested but reorders network calls. (analyst 03 #1)
- `packages/cli/src/commands/eval.ts` + `eval/runner.ts`: bounded-concurrency eval/trial runner; could surface gateway resource pressure. (analyst 03 #2)
- `packages/cli/package.json`: split server-only deps out of `@lobu/cli`'s `dependencies` (sharp/jimp/playwright as optional); easy to break a lazy runtime path. (analyst 03 #5)
- `packages/agent-worker/src/embedded/just-bash-bootstrap.ts`: memoize `discoverBinaries()` + bound the per-binary shebang read (the lazy-import hoist is also pending). (analyst 02 #1)
- `packages/agent-worker/src/openclaw/worker.ts`: cache the `SessionManager`/`session` per `sessionFile`; correctness-sensitive (provider-change / session-reset invalidation). (analyst 02 #2)
- `packages/agent-worker/src/openclaw/worker.ts`: skip the `.skills/` rewrite when `skillsConfig` is unchanged (hash gate). (analyst 02 #5)
- `packages/agent-worker/**` + `gateway/sse-client.ts` + `embedded/mcp-cli-commands.ts`: hoist `await import(...)` of hot worker modules to static imports (also fixes a documented repo-rule violation). (analyst 02 #3)
- `packages/connector-worker/src/daemon/worker.ts`: honor `next_poll_seconds` + claim back-to-back when work is queued (instead of a fixed 10s sleep). (analyst 04 #3)
- `packages/connector-sdk/src/browser/cdp.ts`: cache the resolved CDP ws:// URL + `Promise.all` the port probes; stale-endpoint risk. (analyst 04 #4)
- `packages/connector-worker/src/executor/subprocess.ts`: warm-pool of pre-forked child processes; defeats part of the per-run isolation guarantee. (analyst 04 #6)
- `packages/connectors/**`: shared `createRateLimiter` / bounded-concurrency util + convert the per-item-sleep connectors (Reddit, Gmail, website); parallelize RSS fetches. (analyst 05 #2/#4/#5)
- `packages/openclaw-plugin/src/index.ts`: replace `fetchMcpBootstrapSync`/`refreshStoredTokenSync` `node -e` subprocesses with async fire-and-forget; touches the "tools must exist before prompt build" invariant. (analyst 06 #3)
- `packages/landing/**`: re-architect `client:load` islands (`client:visible`/`client:idle`, pass selected use-case data as props); re-encode `demo.mp4` + optimize `public/images`. (analyst 06 #5/#6)
- `packages/server/src/__tests__/setup/test-db.ts` + `vitest.config.ts`: per-fork test database (drop `singleFork`); test-isolation rewrite needing a full integration run. (analyst 15 #1)
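Several deferred items (the shared bounded-concurrency util for connectors, the eval/trial runner, the `fetchRemoteSnapshot` dependency waves) depend on capping how much work is in flight at once. A minimal sketch of that primitive follows, for orientation only: `mapWithConcurrency` is a hypothetical name and signature, not the `createRateLimiter` util mentioned above, and a real version would likely add cancellation and rate limiting.

```typescript
// Hypothetical helper (not from the PR): maps items through an async
// worker while running at most `limit` workers at any moment.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array<R>(items.length);
  let next = 0;

  // Each lane pulls the next unclaimed index until none remain.
  // JavaScript runs this single-threaded, so `next++` cannot race.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index] as T, index);
    }
  }

  const laneCount = Math.min(limit, items.length);
  const lanes = Array.from({ length: laneCount }, () => lane());
  // Rejects on the first worker failure; other lanes finish their
  // current item before stopping on their own.
  await Promise.all(lanes);
  return results;
}
```

Results come back in input order regardless of completion order, which keeps the call site a drop-in replacement for a serial `for` loop with `await`. The trade-off noted in the list still applies: network calls are reordered, and a higher `limit` shifts pressure onto whatever service the workers call.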