fix: bug-fix sweep #673
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dfdd3e00e7
// Must keep paginating until pageToken is exhausted, otherwise the sync
// token is never obtained and every subsequent sync re-runs the full
// window from scratch. maxResults is only a soft cap on stored events.
if (!pageToken) break;
Enforce max_results while paginating calendar sync
For calendars with more than max_results events, this loop now keeps paginating until Google has no nextPageToken and still pushes every returned event, so the documented cap (“Maximum events to fetch per sync”) is no longer enforced. Once events.length >= maxResults, the request size drops to 1 and the connector can crawl the rest of a busy calendar one event at a time, ingesting far more rows than configured; keep paging for the sync token, but stop appending/returning events after the cap.
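One way to reconcile the two goals in this comment, sketched under assumptions: keep requesting pages until `nextPageToken` is gone so the trailing `nextSyncToken` is captured, but stop appending once `maxResults` events are stored. `Page`, `FetchPage`, `syncWithCap`, and `fakeFetch` are hypothetical names, and the fetcher is synchronous only to keep the sketch short; the connector's actual code is not shown in this thread.

```typescript
// Hypothetical page shape, loosely modelled on a paginated list response.
interface Page {
  items: string[];
  nextPageToken?: string;
  nextSyncToken?: string;
}

// Hypothetical fetcher: returns one page for the given token.
type FetchPage = (pageToken: string | undefined) => Page;

function syncWithCap(fetchPage: FetchPage, maxResults: number) {
  const events: string[] = [];
  let pageToken: string | undefined = undefined;
  let syncToken: string | undefined = undefined;

  while (true) {
    const page: Page = fetchPage(pageToken);
    // Append only while there is room under the cap.
    const room = maxResults - events.length;
    if (room > 0) events.push(...page.items.slice(0, room));
    // The sync token only appears on the final page.
    if (page.nextSyncToken) syncToken = page.nextSyncToken;
    pageToken = page.nextPageToken;
    if (!pageToken) break;
  }
  return { events, syncToken };
}

// Three fake pages of two events each; only the last carries a sync token.
const fakeFetch: FetchPage = (token) => {
  if (token === undefined) return { items: ["a", "b"], nextPageToken: "p2" };
  if (token === "p2") return { items: ["c", "d"], nextPageToken: "p3" };
  return { items: ["e", "f"], nextSyncToken: "sync-1" };
};

const result = syncWithCap(fakeFetch, 3);
console.log(result.events.length, result.syncToken); // logs: 3 sync-1
```

The request size is left untouched after the cap is reached, so the remaining pages are walked at the normal page size rather than one event at a time.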
  const xreal = request.headers.get('X-Real-IP');
  if (xreal) return xreal.trim();
}
return 'unknown';
Preserve per-client rate-limit keys when proxy trust is unset
When TRUSTED_PROXY is not explicitly set, this now always returns the literal unknown, so every caller of the unauthenticated rate-limited endpoints shares the same bucket (for example invitation preview, public-org join, and OAuth registration, which this commit also enables by default). In deployments that have not added this new env var, 10 OAuth client registrations or 5 invitation previews from any users globally can throttle everyone else; either retain a request-specific fallback or make the trusted-proxy requirement part of the deployment config before switching these endpoints on.
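A minimal sketch of the "request-specific fallback" option, assuming the server can see the socket's remote address. `rateLimitKey`, `HeaderMap`, and the `remoteAddress` parameter are hypothetical; the real `getClientIP` signature is not shown here. Forwarded headers are read only when the proxy is trusted, and the rightmost `X-Forwarded-For` entry is taken because that is the hop the trusted proxy itself appended.

```typescript
// Header names are assumed to be lower-cased by the caller.
type HeaderMap = Record<string, string | undefined>;

function rateLimitKey(
  headers: HeaderMap,
  remoteAddress: string | undefined,
  trustedProxy: boolean,
): string {
  if (trustedProxy) {
    const forwarded = headers["x-forwarded-for"];
    if (forwarded) {
      const hops = forwarded
        .split(",")
        .map((h) => h.trim())
        .filter((h) => h.length > 0);
      // Rightmost entry: the address seen by the trusted proxy.
      const last = hops[hops.length - 1];
      if (last) return last;
    }
    const xreal = headers["x-real-ip"];
    if (xreal) return xreal.trim();
  }
  // Without a trusted proxy, fall back to the socket peer so clients
  // still get separate buckets; "unknown" is the last resort.
  return remoteAddress ?? "unknown";
}
```

With this shape, a spoofed `X-Forwarded-For` is ignored when no proxy is trusted, yet callers are not all collapsed into one shared bucket.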
Addressed both Codex P2s in
Rebased onto current
07cb728 to 5b97b5b
…ak/timeout/race fixes

Security:
- CRITICAL: sandbox action-key smuggling. createActionCaller spread caller input after setting `action`, so a read-only query_sdk script could set `action: "delete"` and reach write/delete admin handlers. Now spreads input first, then forces the discriminator (and strips any caller `action`); added a defensive read-mode check in __sdk_dispatch and a regression test.
- HIGH: MCP-proxy SSRF. assertSafeUrl only ran in probeMcpServer; discoverTools/callTool/sendRequest fetched config.upstream_url unchecked. Now validated in sendRequest on every outbound fetch.
- HIGH: OAuth profile:read auto-approval gated to first-party (canonical-origin) redirect URIs only; DCR registration rate-limiter on by default + fails closed.
- secret-proxy: reject (don't warn) unauthenticated requests that name an agent.
- cross-org tool-list bleed: MCP tool-discovery cache keyed by orgId:connectorKey.
- getClientIP only trusts X-Forwarded-For / CF-Connecting-IP / X-Real-IP behind TRUSTED_PROXY; takes the rightmost trusted hop.
- getLobuServiceToken fails closed without an organizationId.
- AgentConnectionStore.deleteConnection / channel-binding list+deleteAll org-scoped.
- insertEvent onConflict probe now org-scoped (defense against cross-tenant supersede).

Other fixes:
- core: sanitizeForLogging cycle guard (WeakSet → "[Circular]"); verifyWorkerToken rejects far-future timestamps; retryWithBackoff isolates throwing shouldRetry/onRetry.
- agent-worker: custom-command execFile gets a 120s timeout + correct signal-kill exit code; heartbeat-failure abort resolves the in-flight turn before dispose(); worker exits when SSE reconnects are exhausted (no zombie); 60–120s timeouts on gateway/MCP fetches.
- embeddings: getExtractor no longer caches a rejected model-load promise.
- connectors: Google Calendar paginates to the trailing nextSyncToken (incremental sync now engages); Gmail uses epoch-second `after:` (no per-day re-emit); Google Play parseDate computed numerically.
- connector-sdk: close the launched browser / CDP socket on partial-acquire failure; remove the goto timeout listener leak.
- connector-worker: never auto-complete a watcher run as success in the daemon.
- server: watcher next_run_at advances on terminal failure / materialize error (no per-minute retry storm); deliverToBotConnections called once per createNotificationForUsers (no per-admin duplicate sends); scaleDeployment(name,1) throws when the worker is gone so the consumer re-creates it; insert-event upsertEmbedding uses the txn handle; dotenv loaded before ./instrument so Sentry isn't silently disabled; test-db fixSchemaConstraints includes the 'task' run type.
- openclaw-plugin: autoCapture moved to the agent_end hook (the worker drops before_prompt_build) and pairs the answer with the question that prompted it.
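The sandbox action-key fix in the commit message above comes down to object-spread ordering: whichever `action` is written last wins. The sketch below illustrates that idea plus a read-mode guard; `buildActionPayload`, `dispatch`, and the `requiredAccess` table are hypothetical stand-ins, not the repository's `createActionCaller` or `__sdk_dispatch`.

```typescript
type ActionInput = Record<string, unknown>;
type Access = "read" | "write";

// Build the payload so the forced discriminator always wins.
function buildActionPayload(action: string, input: ActionInput): ActionInput {
  const rest: ActionInput = { ...input };
  delete rest["action"]; // strip any caller-supplied discriminator
  return { ...rest, action }; // forced value is written last
}

// Hypothetical table of the access level each action needs.
const requiredAccess: Record<string, Access> = {
  list: "read",
  delete: "write",
};

// Defensive second layer: a read-only script may only reach read actions.
function dispatch(scriptAccess: Access, payload: ActionInput): string {
  const action = String(payload["action"]);
  const needed = requiredAccess[action];
  if (needed === undefined) throw new Error(`unknown action: ${action}`);
  if (scriptAccess === "read" && needed !== "read") {
    throw new Error(`read-only script cannot call ${action}`);
  }
  return action;
}

// A read-only caller tries to smuggle a different action key.
const payload = buildActionPayload("list", { action: "delete", id: 7 });
console.log(payload["action"]); // logs: list
```

The guard in `dispatch` is redundant when the payload builder is correct, which is the point: either layer alone stops the smuggled `delete`.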
… keep per-client rate-limit key without TRUSTED_PROXY
…lone typecheck (drop dead InvalidationEvent.resource sites + unused import left by #672)
Bug-fix sweep from 15 parallel hunter reports. Security fixes first, then high-confidence contained correctness fixes. Not for auto-merge — needs careful human review of the auth / sandbox changes.
Security
- Sandbox action-key smuggling: `createActionCaller` spread caller input after setting `action`, letting a read-only `query_sdk` script set `action: "delete"` and reach write/delete admin handlers. Now spreads input first, strips any caller `action`, forces the discriminator; plus a defensive read-mode `access !== "read"` reject in `__sdk_dispatch`. Regression test added. Files: `packages/server/src/sandbox/namespaces/action-call.ts`, `packages/server/src/sandbox/run-script.ts`
- MCP-proxy SSRF: `assertSafeUrl` only ran in `probeMcpServer`; `discoverTools`/`callTool`/`sendRequest` fetched `config.upstream_url` unchecked. Now validated in `sendRequest` on every outbound fetch. File: `packages/server/src/mcp-proxy/client.ts`
- OAuth: `profile:read` silent auto-approval now gated to redirect URIs on the canonical origin (`getConfiguredPublicOrigin()`); DCR-registered third-party clients fall through to the consent page. DCR registration rate-limiter is on by default (`RATE_LIMIT_ENABLED !== 'false'`) and fails closed (503) instead of fail-open. File: `packages/server/src/auth/oauth/routes.ts`
- secret-proxy: rejects (instead of warning on) unauthenticated requests that name an agent. File: `packages/server/src/gateway/proxy/secret-proxy.ts`
- Cross-org tool-list bleed: the MCP tool-discovery cache was keyed by `connectorKey` → org B could get org A's cached catalog. Now keyed `${orgId}:${connectorKey}`. File: `packages/server/src/mcp-proxy/client.ts`
- `getClientIP` trusted `X-Forwarded-For` unconditionally → IP rate-limit bypass. Now only trusts forwarded headers behind `TRUSTED_PROXY`, taking the rightmost trusted hop; otherwise `'unknown'` (coarse bucket). File: `packages/server/src/utils/rate-limiter.ts`
- `getLobuServiceToken()` with no org argument minted a token acting as a random tenant's admin. Now fails closed without an `organizationId`. File: `packages/server/src/lobu/service-token.ts`
- `AgentConnectionStore.deleteConnection`/`listChannelBindings`/`deleteAllChannelBindings` now org-scoped. `insertEvent` on-conflict origin probe now filters on `organization_id` (prevents cross-tenant supersede on append-only `events`). Files: `packages/server/src/lobu/stores/postgres-stores.ts`, `packages/server/src/utils/insert-event.ts`

Other fixes
core
- `sanitizeForLogging` cycle guard (`WeakSet` → `"[Circular]"`): was a stack overflow on any circular log object. Regression test added.
- `verifyWorkerToken` rejects far-future timestamps (was a one-directional skew check).
- `retryWithBackoff` isolates throwing `shouldRetry`/`onRetry` callbacks so they can't mask the real error.

agent-worker
- Custom-command `execFile` gets a 120s timeout + `killSignal: SIGKILL`; signal-killed children now report a non-zero exit code (was reported as `0`/success).
- Heartbeat-failure abort resolves the in-flight turn before `session.dispose()`: was a permanent hang exactly when the gateway is already in trouble.
- 60–120s `AbortSignal.timeout` on gateway and MCP-tool `fetch` calls.

embeddings
- `getExtractor()` no longer caches a rejected model-load promise (one transient failure bricked the backend until restart).

connectors
- Google Calendar paginates until `nextPageToken` is exhausted so the trailing `nextSyncToken` is captured; incremental sync now actually engages (was re-emitting the full ±1y window every run on busy calendars).
- Gmail uses `after:<unix-seconds>` instead of `after:YYYY/MM/DD`: no more re-emitting the whole current day on every sync.
- Google Play `parseDate` computed numerically (`s*1000 + ms`): string concat produced 1970 / year-7340 dates.

connector-sdk
- Closes the launched browser / CDP `WebSocket` (+ `Target.closeTarget`) on a partial-acquire failure (`newContext`/`addCookies`/`newPage`/CDP-setup throw): was leaking Chrome processes / sockets on long-lived worker hosts.
- `CdpPage.goto` removes its `message` listener on navigation timeout (was accumulating handlers on long-lived sessions).

connector-worker
- The daemon never auto-completes a `run_type='watcher'` run as `success`: it logs and skips instead, so a stray watcher run can't be stomped (the server-side poll allowlist already excludes it; this is the defensive backstop).

server
- Watcher `next_run_at` advances on terminal failure / materialize error (mirrors the feeds model): stops a permanently-broken watcher re-dispatching a fresh agent run every 60s forever.
- `deliverToBotConnections` called once per `createNotificationForUsers` instead of once per admin: was posting the same Slack/Telegram notification N times in a multi-admin org.
- `scaleDeployment(name, 1)` throws when the worker process is gone (instead of silent no-op) so `MessageConsumer`/`createWorkerDeployment` re-create it and drain the already-queued message.
- `insertEvent` → `upsertEmbedding` uses the transaction-bound `sql` handle (was grabbing the singleton pool, breaking atomicity / FK for any transactional caller with an embedding).
- `dotenv.config()` loaded at the top of `./instrument` so Sentry isn't silently disabled when `SENTRY_DSN` lives in `.env`.
- `test-db.ts` `fixSchemaConstraints` includes the `'task'` run type (was re-narrowing `runs_run_type_check` and dropping it between tests).

openclaw-plugin
- `autoCapture` moved from `before_prompt_build` (silently dropped by the worker's plugin-loader) to `agent_end`, and now pairs the answer with the question that actually prompted it (last assistant message → preceding user message) instead of a mismatched Q/A pair.

Deferred (higher-risk / low-confidence; needs a dedicated PR)
- `packages/core/src/utils/lock.ts`: `AsyncLock.acquire` releases the new lock on acquisition timeout → concurrent critical sections. Fix needs a careful restructure of the lock-chain wiring.
- `packages/server/src/server.ts` + `start-local.ts`: graceful shutdown doesn't `await httpServer.close()` and tears down DB/gateway before the listener stops; no `unhandledRejection`/`uncaughtException` handlers. Shutdown rework, out of scope here.
- `packages/server/src/gateway/orchestration/impl/embedded-deployment.ts`: crashed worker still silently drops the in-flight message (no re-queue / no user-facing error). The `scaleDeployment` fix here covers recovery on the next message, but immediate re-spawn / job-fail-on-unexpected-exit needs plumbing the deployment manager → message consumer.
- `packages/server/src/gateway/orchestration/base-deployment-manager.ts`: reconcile can scale a worker to 0 right after a message was queued for it (`updateDeploymentActivity` bumps `lastActivity` last). Needs activity bump at enqueue time.
- `packages/server/src/connect/routes.ts`: `/connect/:token/validate` activates feeds (`status='active', next_run_at=NOW()`) before validation completes → cron can run an unvalidated connection in parallel. Also, concurrent `/oauth/start` clobbers the PKCE verifier; userinfo/actor failures fall through to a `'connect-flow'` sentinel owner.
- `packages/server/src/watchers/automation.ts`: `reconcileWatcherRuns` marks a windowed run `completed` even if the agent turn is still in flight (`complete_window` mid-turn). Also, `dispatchWatcherRun` does claim→HTTP→update with no transaction.
- `packages/server/src/lobu/stores/postgres-stores.ts` + `gateway/channels/binding-service.ts`: Telegram-DM channel bindings collide across orgs (key is `(platform, channel_id, team_id)` with `team_id` NULL and `channel_id` = user id); `/lobu try` can clobber the user's own bound agent and `resolveAgent` doesn't check the binding's agent belongs to the connection's org. Needs a keyspace/schema change.
- `packages/server/src/db/embedded-schema-patches.ts`: omits the `device_worker_connection_binding` backfill `UPDATE`s present in the dbmate migration; embedded installs with pre-existing `device_workers` rows diverge.
- `packages/connector-sdk/src/retry.ts`: `withHttpRetry` mis-classifies errors via substring matching (`'500'` matches ids, `'invalid'` matches transient 5xx bodies). Wants structured `status` on the thrown error.
- `packages/cli/src/eval/client.ts`, `commands/chat.ts`: the stream parsers assume `\n` framing; they break on CRLF / multi-line `data:`. Robustness, deferred.
- `google_gmail.ts`, `youtube.ts`: bare `catch {}` in per-item sync loops swallows 401/429 → "success" with `items_found: 0`. YouTube also persists an expiring search `pageToken` as its checkpoint.
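For the deferred `withHttpRetry` item, classifying on a structured `status` rather than on message substrings could look roughly like the sketch below. `HttpStatusError` and `isRetryable` are hypothetical names, and treating 429 and 5xx as retryable is an assumed policy, not something this PR specifies.

```typescript
// Hypothetical error type carrying the HTTP status as data, not as message text.
class HttpStatusError extends Error {
  readonly status: number;
  constructor(status: number, message: string) {
    super(message);
    this.name = "HttpStatusError";
    this.status = status;
  }
}

// Decide on the numeric status only; the message text is never inspected.
function isRetryable(err: unknown): boolean {
  if (typeof err !== "object" || err === null || !("status" in err)) {
    return false; // no structured status: not retried in this sketch
  }
  const status = (err as { status: unknown }).status;
  if (typeof status !== "number") return false;
  // Assumed policy: rate limiting and server errors are transient.
  return status === 429 || (status >= 500 && status <= 599);
}

// A 404 whose message happens to contain "500" and "invalid".
const notFound = new HttpStatusError(404, "item 500 is invalid");
// A 503 whose body happens to contain "invalid".
const unavailable = new HttpStatusError(503, "invalid upstream response");

console.log(isRetryable(notFound), isRetryable(unavailable)); // logs: false true
```

Both sample messages would confuse a substring matcher in opposite directions; keyed on `status`, they classify as intended.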