fix(evolution): 8 mitigations for Baileys 7 / Evolution v2.3.7 (#2437/#2491/#2495/#2497/#2498) #23
adm01-debug wants to merge 16 commits into
…#2498) 30 s window after a 515 (Connection Replaced) during QR scanning in the Baileys multi-device protocol. The 401/loggedOut that follows is only cleanup of the old slot, not a real logout.
- markStream515 / hadRecentStream515 / isConnectionReplaced515 in evolution-helpers.ts (in-memory, plus a fallback persisted in the audit log)
- handleConnectionUpdate records the 515 and suppresses the critical alert
- handleLogoutInstance ignores reasonCode=401 inside the window
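The window logic described above can be sketched roughly as follows. This is a simplified, in-memory-only illustration: the real helpers take a Supabase client and have a persisted fallback, while the signatures, the `STREAM_515_WINDOW_MS` constant, and `shouldIgnoreLogout` here are assumptions made for the example.

```typescript
// Hypothetical sketch of the 30 s post-515 window; not the PR's actual code.
const STREAM_515_WINDOW_MS = 30_000;

// instance name -> timestamp (ms) of the most recent stream:error 515
const recent515 = new Map<string, number>();

/** Records that `instance` just saw a 515 (Connection Replaced). */
function markStream515(instance: string, now: number = Date.now()): void {
  recent515.set(instance, now);
}

/** True while `instance` is still inside the window opened by its last 515. */
function hadRecentStream515(instance: string, now: number = Date.now()): boolean {
  const seenAt = recent515.get(instance);
  if (seenAt === undefined) return false;
  if (now - seenAt > STREAM_515_WINDOW_MS) {
    recent515.delete(instance); // expired: drop the entry so the map stays small
    return false;
  }
  return true;
}

/** A 401 inside the window is treated as old-slot cleanup, not a real logout. */
function shouldIgnoreLogout(instance: string, reasonCode: number, now: number = Date.now()): boolean {
  return reasonCode === 401 && hadRecentStream515(instance, now);
}
```

Passing `now` explicitly keeps the helpers deterministic in tests. An in-memory map alone does not survive an edge-function cold start, which is why the commit also mentions a persisted fallback.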
When the health check detects an instance that is 'connected' but has had no messages in the last 30 min, it fires PUT /instance/restart/{instance} (rate-limited to 1/h via system_logs.category='auto_restart_deaf_session') to recreate the internal socket without invalidating the session.
Automatic recovery from the Baileys 7.0 'session deaf' bug, where the WS stays open but messages.upsert stops arriving.
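As a rough illustration of the decision above, the check can be written as a pure function over a snapshot of the instance. The `InstanceSnapshot` shape and the function name are hypothetical, and treating "no message ever recorded" as deaf is an assumption of this sketch rather than something the commit states.

```typescript
const DEAF_THRESHOLD_MS = 30 * 60 * 1000;   // 30 min without messages
const RESTART_COOLDOWN_MS = 60 * 60 * 1000; // at most one auto-restart per hour

// Hypothetical snapshot shape; the real code reads these values from the database.
interface InstanceSnapshot {
  status: string;                   // e.g. 'connected'
  lastMessageAt: number | null;     // ms epoch of the last messages.upsert, if any
  lastAutoRestartAt: number | null; // ms epoch of the last logged auto-restart, if any
}

/** Decides whether the health check should call the restart endpoint. */
function shouldAutoRestart(s: InstanceSnapshot, now: number): boolean {
  if (s.status !== 'connected') return false;
  // Assumption: an instance with no recorded message counts as silent.
  const silentFor = s.lastMessageAt === null ? Infinity : now - s.lastMessageAt;
  if (silentFor < DEAF_THRESHOLD_MS) return false;
  // Rate limit: skip if a restart was already logged within the last hour.
  const sinceRestart = s.lastAutoRestartAt === null ? Infinity : now - s.lastAutoRestartAt;
  return sinceRestart >= RESTART_COOLDOWN_MS;
}
```

Keeping the decision separate from the HTTP call makes the rate limit easy to test without touching the Evolution API.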
…37/#2497)
- Default 2.3000.1033773198 (version validated by the community)
- Override via env CONFIG_SESSION_PHONE_VERSION or body.sessionPhoneVersion
- Reduces the risk of a ban when pairing new numbers (issue EvolutionAPI#2497) and of QR cycling every 1 min instead of the default 3 min (issue EvolutionAPI#2437)
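The precedence described above (request body, then environment, then pinned default) could look roughly like this. The function name and the idea of passing the env value in as a parameter are assumptions for the sake of a self-contained example; the edge function itself reads the variable via `Deno.env.get`.

```typescript
// Pinned default quoted in the commit message.
const DEFAULT_SESSION_PHONE_VERSION = '2.3000.1033773198';

/**
 * Resolves the WhatsApp Web version to send to Evolution.
 * `bodyValue` is body.sessionPhoneVersion; `envValue` is the value of
 * CONFIG_SESSION_PHONE_VERSION (undefined when the variable is not set).
 */
function resolveSessionPhoneVersion(bodyValue: unknown, envValue: string | undefined): string {
  // 1. An explicit, non-blank string in the request body wins.
  if (typeof bodyValue === 'string' && bodyValue.trim() !== '') {
    return bodyValue.trim();
  }
  // 2. Otherwise fall back to the env override, then to the pinned default.
  return envValue || DEFAULT_SESSION_PHONE_VERSION;
}
```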
The combination of syncFullHistory=true and Baileys 7.0 pre-key generation saturates the Evolution CPU/RAM and triggers cyclic QR codes. The toggle now appears only for the 'admin' role, and the default stays OFF even for admins. As an additional defense, onSave forces false for non-admins.
…_DOWN The /message/archiveChat endpoint is broken in Evolution v2.3.7 (PrismaClientValidationError, issue EvolutionAPI/#2495). Previously the call landed in the DLQ as a transient failure with no visibility. It now returns an explicit envelope with code='ARCHIVE_CHAT_UPSTREAM_DOWN'. Remove the branch once upstream publishes a fix.
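To make the envelope idea concrete, here is a minimal sketch of a short-circuit result and of a caller that uses the code to skip retries. The `envelopeVersion` field name, the placeholder version value, the message text, and `shouldRetry` are all hypothetical; only the `error`, `status`, `code`, and `message` fields and the code string come from the PR discussion.

```typescript
// Placeholder: the real EVOLUTION_ENVELOPE_VERSION value is not shown in the PR text.
const EVOLUTION_ENVELOPE_VERSION = 1;

interface ErrorEnvelope {
  envelopeVersion: number; // hypothetical field name
  error: true;
  status: number;
  code: string;
  message: string;
}

/** Deterministic result for archive-chat while the upstream endpoint is broken. */
function archiveChatUpstreamDown(): ErrorEnvelope {
  return {
    envelopeVersion: EVOLUTION_ENVELOPE_VERSION,
    error: true,
    status: 503,
    code: 'ARCHIVE_CHAT_UPSTREAM_DOWN',
    message: 'archiveChat is unavailable upstream in Evolution v2.3.7', // illustrative text
  };
}

// Codes for known, deterministic upstream breakages: retrying cannot help.
const NON_RETRYABLE_CODES = new Set<string>(['ARCHIVE_CHAT_UPSTREAM_DOWN']);

/** Caller-side check: only 5xx failures without a known code are retried. */
function shouldRetry(envelope: ErrorEnvelope): boolean {
  return envelope.status >= 500 && !NON_RETRYABLE_CODES.has(envelope.code);
}
```

Inspecting `code` rather than the numeric status is what keeps the call out of the DLQ, since a bare 503 would otherwise look transient.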
…EN/MESSAGING_HISTORY_SET
- set-webhook default events now include 4 new signals for Baileys 7 observability (intermediate states, distinguishing a real logout, token renewal, history sync v2)
- evolution-health checks STATUS_INSTANCE and LOGOUT_INSTANCE as critical
- webhook router handles status.instance and messaging.history.set (log only; not processed inline, to avoid blowing the edge function's 60 s timeout)
10 s per call (3 calls + auto-restart fit within the edge function's 60 s limit). Previously, with Evolution saturated (#2437), the health check hung for 30 s+ on each fetch and hit the timeout without reporting anything. It now distinguishes 'unreachable' from 'timeout' in the alerts.
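One way to get the 'timeout' versus 'unreachable' distinction is to own the abort timer and remember whether it fired. The sketch below assumes a hypothetical `probe` helper and a narrowed `Fetcher` type so it stays self-contained; it is not the PR's implementation.

```typescript
type ProbeResult =
  | { kind: 'ok'; status: number }
  | { kind: 'timeout' }
  | { kind: 'unreachable'; reason: string };

// Minimal fetch-like signature so the sketch does not depend on a runtime's fetch types.
type Fetcher = (url: string, init: { signal: AbortSignal }) => Promise<{ status: number }>;

/** Calls `fetcher` with a hard deadline and classifies the failure mode. */
async function probe(url: string, fetcher: Fetcher, timeoutMs: number = 10_000): Promise<ProbeResult> {
  const controller = new AbortController();
  let timedOut = false;
  const timer = setTimeout(() => {
    timedOut = true;      // remember that the abort came from our own deadline
    controller.abort();
  }, timeoutMs);
  try {
    const res = await fetcher(url, { signal: controller.signal });
    return { kind: 'ok', status: res.status };
  } catch (err) {
    if (timedOut) return { kind: 'timeout' };
    // Any other rejection (DNS failure, connection refused, ...) is reported as unreachable.
    return { kind: 'unreachable', reason: err instanceof Error ? err.message : String(err) };
  } finally {
    clearTimeout(timer);  // avoid a stray timer keeping the function alive
  }
}
```

With a 10 s budget per call, three probes plus one restart attempt stay under the 60 s edge-function limit mentioned above.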
- evolution-webhook persists last_token_renewed_at in whatsapp_connections
- evolution-health alerts if the last renewal is more than 24 h old while the instance is 'connected' (socket silently stuck)
- Migration 20260426180846 adds the column + index
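The freshness rule above reduces to a small predicate. The name `isTokenStale` and the choice to pass the timestamp as an ISO string are assumptions for illustration; like the behaviour described in the review, this sketch stays silent when no renewal has been recorded yet.

```typescript
const ONE_DAY_MS = 24 * 60 * 60 * 1000;

/**
 * True when a connected instance has not renewed its token for more than 24 h.
 * `lastTokenRenewedAt` is the last_token_renewed_at column value (ISO timestamp) or null.
 */
function isTokenStale(instanceConnected: boolean, lastTokenRenewedAt: string | null, now: number): boolean {
  if (!instanceConnected || lastTokenRenewedAt === null) return false;
  const renewedAt = Date.parse(lastTokenRenewedAt);
  if (Number.isNaN(renewedAt)) return false; // unparseable value: do not alert on bad data
  return now - renewedAt > ONE_DAY_MS;
}
```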
📝 Walkthrough
The pull request adds WhatsApp Web version configuration support, implements mitigation for false logout events triggered by stream:error 515 during QR scanning, expands webhook event types and health monitoring capabilities including token freshness checks, enforces admin-only access for sync history settings, and updates related test fixtures and validations.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
…essaging.history.set) The 2 new Baileys 7 events introduced in commit d44b8f8 broke the contract tests (canonical list pinned at 27 + the 'orphan event' assertion). Updates WEBHOOK_EVENTS_29 and WEBHOOK_EVENTS together. Marked as critical:false: they are observability signals and do not block the main pipeline.
Before: the CI "Unit Tests" job ended up "cancelled" (timeout), and Build, E2E and Smoke cascaded the cancellation. The root cause was one test that hung the runner, plus 8 more files broken at collection or assertion time.

## Hang (cause of the cancelled CI job)
- WhatsAppStatusSection: clicking "Ver Status" opens StoryViewer (framer-motion AnimatePresence + Radix Dialog) and hangs jsdom. Skipped with a TODO until it is refactored for testability.

## Intra-file pollution
- useEvolutionApi: the `await expect(act(...)).rejects.toThrow()` pattern in "callApi throws and logs on supabase error" leaves an unhandled rejection that nulls out `result.current` in the 71 tests that follow. Replaced with try/catch + an explicit assertion.

## Collection: supabaseUrl is required
- vitest.config.ts: `define` injects fallback VITE_SUPABASE_URL/PUBLISHABLE_KEY values (test.supabase.co) for modules that build the client at the top level. Unblocks 7 test files at once.

## One-off failures
- ChatPanelHeader: mock of SLAIndicatorForContact (it pulls in useQuery).
- MessageDetailsDialog: 2 tab-switch tests skipped (Radix Tabs + Dialog does not switch tabs in jsdom; TODO: use userEvent).
- useMessageReactions: mock of logger.getLogger + supabase.channel.
- useIdempotencyMissAlerts.toastDedupe: the hook uses `isDev`, not `isAdmin`; mock fixed.
- EditContactDialog: mock of useExternalCargos with 'Dev' in the list.
- realtimeFanout: useRetryResolutionAlerts added to the TRILHA_MENSAGENS_NAVEGAVEL diagram and to the validator allowlist.

Local result: `npm test` → 240 files, 3434 pass, 38 skip, 0 fail.
CI linted the modified files and caught 2 inherited errors:
- scripts/regen-trilha-mensagens.ts:193: `no-regex-spaces` in ` %% Links navegaveis` / ` click `. Replaced the literal " " with `{2}` in the regex.
- toastDedupe.test.tsx:1: `@ts-nocheck` is forbidden by `@typescript-eslint/ban-ts-comment`. Removed; the file's typing was already fine (tsc --noEmit clean).
The rest are warnings (no-console / no-explicit-any) that already existed.
Adds .mcp.json with:
- portainer: https://portainer-mcp.atomicabr.com.br/mcp
- evolution: https://evolution-mcp.adm01.workers.dev/mcp
And .claude/settings.json with enableAllProjectMcpServers + an explicit allowlist, so that future sessions already have these tools available without a confirmation prompt. This lets Claude (in future sessions) read/update environment variables and restart the Evolution API container directly via Portainer, without depending on manual SSH. Note: the endpoints handle auth on their side; this file only lists URLs and does not embed secrets.
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
supabase/functions/_shared/evolution-webhook-handlers.ts (1)
131-156: ⚠️ Potential issue | 🟠 Major
Critical-disconnect alert is suppressed for 515, but the "🟢 restaurada" alert will still fire ~5s later, so operators will be paged anyway.
Flow on a 515 transient:
1. connection.update with status === 'close' arrives → DB row goes to 'disconnected'. The critical alert is correctly suppressed by !replaced515.
2. connection.update with status === 'open' arrives ~5s later. At that point prevConn.status === 'disconnected' (written by step 1), so the branch at lines 149–156 fires and inserts an info "🟢 Conexão … restaurada" alert.
Net effect: every 515 produces an info alert and pollutes the warroom feed for an event the user explicitly wants hidden. Two ways to fix:
- Preferred: also gate the 'restored' alert on 515 history. Check hadRecentStream515(supabase, instance) before inserting the restored alert and skip it within the 30 s window.
- Or: do not write status='disconnected' at all when replaced515 is true. Keeps DB at 'connected' through the bounce, so neither alert fires. Slightly riskier (a real disconnect immediately after a 515 would be masked for 30s).
♻️ Proposed gating on the restored alert
    -    if (status === 'connected' && prevConn?.status !== 'connected') {
    -      await supabase.from('warroom_alerts').insert({
    -        alert_type: 'info',
    -        title: `🟢 Conexão ${instance} restaurada`,
    -        message: `A instância ${instance} reconectou com sucesso ao WhatsApp.`,
    -        source: 'evolution-webhook',
    -      });
    -    }
    +    if (status === 'connected' && prevConn?.status !== 'connected') {
    +      // Suppress "restaurada" if the previous close was a 515; otherwise every
    +      // Connection Replaced bounce generates noise in the warroom.
    +      const post515Bounce = await hadRecentStream515(supabase, instance);
    +      if (!post515Bounce) {
    +        await supabase.from('warroom_alerts').insert({
    +          alert_type: 'info',
    +          title: `🟢 Conexão ${instance} restaurada`,
    +          message: `A instância ${instance} reconectou com sucesso ao WhatsApp.`,
    +          source: 'evolution-webhook',
    +        });
    +      }
    +    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@supabase/functions/_shared/evolution-webhook-handlers.ts` around lines 131 - 156, The restored-alert branch currently fires regardless of a recent 515 bounce; change it to skip creating the "🟢 Conexão ... restaurada" warroom alert when a recent 515 stream was seen for this instance. Concretely, inside the block that checks if (status === 'connected' && prevConn?.status !== 'connected') call the helper hadRecentStream515(supabase, instance) and only insert the info alert when that returns false; alternatively (less preferred) avoid writing status='disconnected' earlier when replaced515 is true, but implement the hadRecentStream515 gating around the restored-alert insertion to match the review request.
🧹 Nitpick comments (8)
src/hooks/__tests__/useMessageReactions.test.tsx (2)
8-25: LGTM: mocks align with hook's realtime usage.
The added channel/removeChannel mocks correctly mirror the chainable .on().subscribe() pattern used by useMessageReactions (see src/hooks/useMessageReactions.ts:22-34), so existing tests no longer crash when the realtime effect runs.
One optional follow-up: there's no test that exercises the realtime path (e.g., asserting supabase.channel is called with reactions:${messageId} and removeChannel is invoked on unmount, or that disableRealtime skips subscription). Worth adding to lock in the behavior these mocks were introduced for.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/hooks/__tests__/useMessageReactions.test.tsx` around lines 8 - 25, Add unit tests covering the realtime branch of useMessageReactions: create a test that mounts the hook with a sample messageId and asserts supabase.channel was called with `reactions:${messageId}`, that the channel's on()/subscribe() chain was invoked, and that supabase.removeChannel is called on unmount; also add a test that sets disableRealtime to true and asserts no supabase.channel/subscription calls occur. Use the existing mock channelObj (on/subscribe/unsubscribe) and the supabase mock functions to spy/assert these interactions in the tests for useMessageReactions.
40-40: Nit: getLogger() returns fresh spies on every call.
Each invocation of getLogger() returns a new object with new vi.fn() instances, so any module that captures the logger at import time will hold a different reference than what a test could re-create later, and assertions on logger calls would be impossible to wire up. Not an issue for the current tests (none assert on the logger), but if you later need to verify log output, consider hoisting a single shared mock object:
♻️ Suggested tweak
     vi.mock('@/lib/logger', () => ({
    -  log: { error: vi.fn(), debug: vi.fn(), info: vi.fn(), warn: vi.fn() },
    -  getLogger: () => ({ error: vi.fn(), debug: vi.fn(), info: vi.fn(), warn: vi.fn() }),
    +  log: { error: vi.fn(), debug: vi.fn(), info: vi.fn(), warn: vi.fn() },
    +  getLogger: vi.fn(() => ({ error: vi.fn(), debug: vi.fn(), info: vi.fn(), warn: vi.fn() })),
     }));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/hooks/__tests__/useMessageReactions.test.tsx` at line 40, Tests currently call getLogger() which creates new vi.fn() spies on each invocation; replace this with a single shared mock logger object so every getLogger() call returns the same reference. Create a const sharedLogger = { error: vi.fn(), debug: vi.fn(), info: vi.fn(), warn: vi.fn() } and change getLogger to return sharedLogger (so modules that import the logger can be asserted against later); ensure existing tests import or reference getLogger/get the sharedLogger where they need to inspect calls.
src/components/inbox/contact-details/__tests__/WhatsAppStatusSection.test.tsx (1)
207-220: Optional: unblock skipped test by mocking framer-motion / disabling pointer-events check.
The TODO is well-documented, but two lightweight options often unstick this kind of test without a refactor:
- Mock framer-motion so motion.div and AnimatePresence resolve to plain divs (this is the most common cause of jsdom hangs here, not Radix Dialog itself).
- Use userEvent.setup({ pointerEventsCheck: 0 }) to bypass Radix Dialog's pointer-events-on-body lock.
Example top-of-file mock that has worked elsewhere in this file's neighborhood:
    vi.mock('framer-motion', async () => {
      const actual = await vi.importActual<typeof import('framer-motion')>('framer-motion');
      return {
        ...actual,
        AnimatePresence: ({ children }: { children: React.ReactNode }) => <>{children}</>,
        motion: new Proxy({}, { get: () => (props: any) => <div {...props} /> }),
      };
    });
Not blocking; happy to leave as it.skip for now.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/components/inbox/contact-details/__tests__/WhatsAppStatusSection.test.tsx` around lines 207 - 220, The skipped test for WhatsAppStatusSection can be unblocked by mocking framer-motion or disabling Radix pointer-event checks: add a top-of-file mock for 'framer-motion' that replaces AnimatePresence with a passthrough component and replaces motion.* with simple div-wrapping functions, or alternatively initialize userEvent with userEvent.setup({ pointerEventsCheck: 0 }) in the test and remove the it.skip so the test runs; target the test that references WhatsAppStatusSection and the "Ver Status" button and ensure the mock or userEvent setup is applied before rendering in that test file.
src/components/inbox/chat/__tests__/MessageDetailsDialog.test.tsx (1)
48-59: Optional: switch to userEvent with pointerEventsCheck: 0 to unskip.
The Radix Dialog body-pointer-events lock is exactly what @testing-library/user-event's pointerEventsCheck option is designed to bypass:
    import userEvent from '@testing-library/user-event';
    const user = userEvent.setup({ pointerEventsCheck: 0 });
    await user.click(screen.getByRole('tab', { name: 'Payload' }));
That typically resolves Radix Tabs-inside-Dialog assertions in jsdom without needing a component refactor. Not blocking.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/components/inbox/chat/__tests__/MessageDetailsDialog.test.tsx` around lines 48 - 59, Unskip the test for MessageDetailsDialog and replace direct element.click() calls with `@testing-library/user-event` using pointerEventsCheck: 0: import userEvent from '@testing-library/user-event', create const user = userEvent.setup({ pointerEventsCheck: 0 }) inside the test, then await user.click(screen.getByRole('tab', { name: 'Payload' })) and await user.click(screen.getByRole('tab', { name: 'Raw Data' })); keep the existing assertions that await findByTestId('copy-payload') and findByTestId('copy-raw') and retain the timedRpcMock.mockResolvedValueOnce({ data: FULL, error: null }) and wrap(<MessageDetailsDialog ... />) setup.
supabase/migrations/20260426180846_add_baileys_health_columns.sql (1)
10-11: Index may be unused by current query patterns.
The health-check reads last_token_renewed_at after filtering by instance_id (see supabase/functions/evolution-health/index.ts:217-230), and the webhook updates by instance_id. Neither query benefits from an index on last_token_renewed_at alone: the existing key on instance_id already covers point lookups, and there's no scan for "stale tokens across all connections" in the PR.
If you anticipate future "find all connections with no renewal in N hours" queries, keep it; otherwise consider dropping it to save write overhead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@supabase/migrations/20260426180846_add_baileys_health_columns.sql` around lines 10 - 11, The new index idx_whatsapp_connections_last_token_renewed_at on public.whatsapp_connections is likely unused because queries filter by instance_id first (see evolution-health logic) and updates use instance_id, so remove this single-column index to avoid write overhead; alternatively, if you expect queries like "find all connections with no renewal in N hours", replace it with a composite index that includes instance_id (e.g., on (instance_id, last_token_renewed_at)) or keep it only if such global scans are planned—update the migration to drop the CREATE INDEX or create a composite index accordingly.
supabase/functions/evolution-webhook/index.ts (1)
157-160: Verbose casts on already-typed baseData.
baseData is already typed Record<string, unknown> (line 90), so the (baseData as Record<string, unknown>) casts on each line are redundant and obscure the intent. Optional cleanup:
♻️ Suggested simplification
    -      const chats = Array.isArray((baseData as Record<string, unknown>).chats) ? ((baseData as Record<string, unknown>).chats as unknown[]).length : 0;
    -      const messages = Array.isArray((baseData as Record<string, unknown>).messages) ? ((baseData as Record<string, unknown>).messages as unknown[]).length : 0;
    -      const contacts = Array.isArray((baseData as Record<string, unknown>).contacts) ? ((baseData as Record<string, unknown>).contacts as unknown[]).length : 0;
    +      const chats = Array.isArray(baseData.chats) ? baseData.chats.length : 0;
    +      const messages = Array.isArray(baseData.messages) ? baseData.messages.length : 0;
    +      const contacts = Array.isArray(baseData.contacts) ? baseData.contacts.length : 0;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@supabase/functions/evolution-webhook/index.ts` around lines 157 - 160, Remove the redundant "(baseData as Record<string, unknown>)" casts and use the typed baseData directly when checking properties: for each variable (chats, messages, contacts) call Array.isArray(baseData.chats) / baseData.messages / baseData.contacts and then cast the property value to unknown[] only when computing .length (e.g., Array.isArray(baseData.chats) ? (baseData.chats as unknown[]).length : 0); keep the console.log(`[MESSAGING_HISTORY_SET] instance=${instance} chats=${chats} messages=${messages} contacts=${contacts}`) as-is.
supabase/functions/evolution-health/index.ts (1)
175-183: Use the shared UPSTREAM_TIMEOUT_MS (or a named restart constant) instead of a literal 15000.
The other three upstream calls in this file all use UPSTREAM_TIMEOUT_MS = 10_000. Hardcoding 15000 here drifts from that policy and makes future tuning ad-hoc. If the restart genuinely needs more headroom (Baileys session warm-up), promote it to a named constant (e.g., RESTART_TIMEOUT_MS) right next to UPSTREAM_TIMEOUT_MS so the rationale is local and greppable.
♻️ Proposed change
    -  const UPSTREAM_TIMEOUT_MS = 10_000
    +  const UPSTREAM_TIMEOUT_MS = 10_000
    +  const RESTART_TIMEOUT_MS = 15_000 // restart needs to open a socket + handshake; extra headroom vs. regular upstream calls

    -        signal: AbortSignal.timeout(15000),
    +        signal: AbortSignal.timeout(RESTART_TIMEOUT_MS),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@supabase/functions/evolution-health/index.ts` around lines 175 - 183, The restart fetch uses a hardcoded 15000ms timeout; replace that literal with the shared UPSTREAM_TIMEOUT_MS constant (or if restart needs more headroom, introduce a named RESTART_TIMEOUT_MS constant declared next to UPSTREAM_TIMEOUT_MS) and use that constant for the AbortSignal timeout in the fetch call to `${EVOLUTION_API_URL}/instance/restart/${INSTANCE_NAME}`; ensure the new RESTART_TIMEOUT_MS has a short comment explaining why it differs from UPSTREAM_TIMEOUT_MS so the rationale is local and discoverable.
supabase/functions/evolution-api/index.ts (1)
211-213: Minor: redundant String(...) cast and stale-default risk.
The typeof === 'string' guard already proves the value is a string, so String(body.sessionPhoneVersion).trim() can be simplified to body.sessionPhoneVersion.trim(). Also, the hardcoded fallback '2.3000.1033773198' will silently rot once WhatsApp Web rotates the version that triggers ban-on-pair; the comment notes it is overridable via CONFIG_SESSION_PHONE_VERSION, but consider exporting it as a named constant near the file top so it is greppable when the next pin is needed.
♻️ Proposed simplification
    -    const sessionPhoneVersion = (typeof body.sessionPhoneVersion === 'string' && body.sessionPhoneVersion.trim())
    -      ? String(body.sessionPhoneVersion).trim()
    -      : (Deno.env.get('CONFIG_SESSION_PHONE_VERSION') || '2.3000.1033773198');
    +    const sessionPhoneVersion =
    +      (typeof body.sessionPhoneVersion === 'string' && body.sessionPhoneVersion.trim())
    +        ? body.sessionPhoneVersion.trim()
    +        : (Deno.env.get('CONFIG_SESSION_PHONE_VERSION') || DEFAULT_SESSION_PHONE_VERSION);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@supabase/functions/evolution-api/index.ts` around lines 211 - 213, The sessionPhoneVersion assignment uses a redundant String(...) cast and an inline hardcoded fallback; replace String(body.sessionPhoneVersion).trim() with body.sessionPhoneVersion.trim(), and move the literal '2.3000.1033773198' into a named exported constant (e.g., DEFAULT_SESSION_PHONE_VERSION) declared near the top of the file so it’s easy to find and update; then use Deno.env.get('CONFIG_SESSION_PHONE_VERSION') || DEFAULT_SESSION_PHONE_VERSION as the fallback when setting sessionPhoneVersion.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/components/connections/InstanceSettingsDialog.tsx`:
- Around line 101-113: The current onSave handler in InstanceSettingsDialog
forcefully sets syncFullHistory to false for non-admins which silently flips a
previously admin-enabled true value; change the payload construction in onSave
so that for non-admins you omit the syncFullHistory key entirely (i.e., build
payload = { instanceName, ...settingsData } and delete or avoid adding
syncFullHistory when isAdmin is false) instead of forcing false, referencing the
onSave handler, settingsData, setSettings, isAdmin and loadSettings to ensure
non-admin saves don't overwrite admin-set syncFullHistory; alternatively, if you
prefer the stricter posture, enforce syncFullHistory:false during loadSettings
for non-admins so UI state and persisted value remain consistent.
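A minimal sketch of the "omit the key" option for the settings payload might look like the following. The helper name `buildSettingsPayload` and the loose `Record<string, unknown>` typing are assumptions; the real onSave handler works with the component's own state and types.

```typescript
/**
 * Builds the save payload. For non-admins the syncFullHistory key is left out
 * entirely, so a value an admin enabled earlier is not overwritten with false.
 */
function buildSettingsPayload(
  instanceName: string,
  settingsData: Record<string, unknown>,
  isAdmin: boolean,
): Record<string, unknown> {
  const payload: Record<string, unknown> = { instanceName, ...settingsData };
  if (!isAdmin) {
    delete payload['syncFullHistory']; // omit rather than force false
  }
  return payload;
}
```

Whether omitting a key leaves the stored value untouched depends on how the backend merges settings, which this sketch does not verify.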
In `@src/components/inbox/chat/__tests__/MessageDetailsDialog.test.tsx`:
- Around line 70-77: The skipped test "shows copy buttons for supervisor" leaves
the supervisor RBAC path untested; unskip and make it reliably exercise the
supervisor branch in MessageDetailsDialog by either (a) removing the tab click
and asserting the copy buttons appear on the initially rendered tab (if copy
buttons are in default TabsContent), or (b) keep the tab interaction but use
userEvent with pointerEventsCheck: 0 to bypass pointer gating so the tab switch
actually mounts the TabsContent and the test can find 'copy-payload'; ensure
profileRef.current = { role: 'supervisor' } and
timedRpcMock.mockResolvedValueOnce({ data: FULL, error: null }) remain in the
test setup so the component renders the supervisor UI.
In `@supabase/functions/_shared/evolution-helpers.ts`:
- Around line 358-378: The isConnectionReplaced515 predicate is too permissive
because the /\b515\b/ check on free-form strings can match unrelated numeric
fields; update isConnectionReplaced515 to only accept a plain 515 match for
structured statusReason (the existing statusReason === 515 || '515' branch) but
for string candidates require co-occurrence with a 515-specific token—i.e.,
change the candidate checks so that instead of raw /\b515\b/ you only return
true when 515 appears near stream/connection/error/replaced tokens (for example
require a regex that matches 515 within the same token group or within N chars
of /(stream|connection|replaced|error)/i), keeping the other
/connection[\s_-]?replaced/i and /stream[\s:_-]?error/i checks intact; update
references in the loop over candidates (variables candidates, candidate, text)
so only tightened-pattern matches trigger true.
- Around line 338-353: The DB fallback in hadRecentStream515 queries
webhook_audit_log for error_message like '%stream%515%' but no code ever writes
such entries—isConnectionReplaced515 only calls markStream515 (in-memory), so
the fallback is ineffective after cold starts; fix by either (A) writing a
stable audit row when markStream515 is invoked (call auditWebhookEvent or a new
helper with a consistent error_message format that includes "stream 515" so
hadRecentStream515 can match it), or (B) create and write to a dedicated
persistent table for 515 events and update hadRecentStream515 to query that
table, or (C) remove the misleading DB fallback from hadRecentStream515 and rely
solely on the in-memory map; choose one approach and implement the corresponding
changes in markStream515, auditWebhookEvent, and hadRecentStream515 (or add the
new table-access functions) so the fallback reflects actual persisted data.
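One possible shape for the tightened predicate is sketched below. The function name `looksLike515`, the 20-character proximity limit, and the exact combined regex are illustrative assumptions; the two keyword patterns are the ones quoted in the comment.

```typescript
// 515 must sit within 20 non-digit characters of a related keyword, on either side.
const NEAR_515 =
  /(stream|connection|replaced|error)\D{0,20}515\b|\b515\D{0,20}(stream|connection|replaced|error)/i;

/**
 * Hypothetical tightened check: a bare 515 only counts when it is the
 * structured statusReason; free-form strings need keyword context.
 */
function looksLike515(statusReason: unknown, candidates: string[]): boolean {
  if (statusReason === 515 || statusReason === '515') return true;
  return candidates.some(
    (text) =>
      /connection[\s_-]?replaced/i.test(text) ||
      /stream[\s:_-]?error/i.test(text) ||
      NEAR_515.test(text),
  );
}
```

With this form, a string such as "processed 515 messages" no longer matches, while "code 515 connection lost" still does.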
In `@supabase/functions/_shared/evolution-webhook-handlers.ts`:
- Around line 17-27: handleConnectionUpdate() marks 515 events in-memory via
isConnectionReplaced515() and markStream515(), but the subsequent webhook audit
is saved without an error_message so hadRecentStream515() cannot find a
persistent record after cold-start; update the code path where
isConnectionReplaced515() is true to set the audit payload's error_message
(e.g., "stream:error 515" or the raw statusReason) before calling the audit/save
routine so the persistent fallback query in hadRecentStream515() (which searches
error_message ilike '%stream%515%') will succeed across invocations. Ensure you
still call markStream515(instance) and include the same reason text in both the
in-memory mark and the persisted audit.
In `@supabase/functions/evolution-api/index.ts`:
- Around line 288-301: The deterministic short-circuit for the 'archive-chat'
branch currently returns HTTP 503 which triggers transient-retry logic; change
the Response status to 200 while keeping the existing envelope
(EVOLUTION_ENVELOPE_VERSION, error: true, status: 503, code:
'ARCHIVE_CHAT_UPSTREAM_DOWN', message: ...) and the same headers (corsHeaders,
'Content-Type': 'application/json') so the client inspects the envelope/code and
avoids retries/DLQ; update the Response construction in the action ===
'archive-chat' block accordingly.
In `@supabase/functions/evolution-health/index.ts`:
- Around line 215-230: The current check in the try block skips alerting when
last_token_renewed_at is null; update the logic in the supabase query handling
(whatsapp_connections select of last_token_renewed_at) so that if
conn.last_token_renewed_at is null you also evaluate the connection age (use
conn.connected_at or conn.updated_at timestamp from the same record) and push
the same alert when instanceConnected and the connection has been connected
longer than oneDayMs; additionally replace the empty catch with a catch that
calls a logger (e.g., log.warn or processLogger.warn) and includes the caught
error so only schema-missing noise stays silent while RLS/network errors are
surfaced for debugging.
In `@supabase/functions/evolution-webhook/index.ts`:
- Around line 140-149: The supabase update call using
supabase.from('whatsapp_connections').update(...) does not throw on
PostgREST/RLS/missing-column errors, so the current try/catch won't catch them;
change the code to await the returned result and explicitly check the returned
error field (e.g., const { data, error } = await
supabase.from('whatsapp_connections').update(...).eq('instance_id', instance)),
and when error is present log it with process/console.warn (including
error.message and context like instance and mention missing column) and skip the
token-renewal tracking path; keep network exceptions handled as before but rely
on the explicit error check to detect migration/RLS issues.
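The pattern described here, checking the returned `error` field instead of relying on try/catch, can be sketched as follows. To stay self-contained the example takes a hypothetical `UpdateFn` wrapper in place of the real `supabase.from('whatsapp_connections').update(...).eq('instance_id', instance)` call; the helper name and log wording are also assumptions.

```typescript
// Shape of the part of the result this sketch cares about.
interface UpdateResult {
  error: { message: string } | null;
}

// Hypothetical wrapper around the update query; injected so the sketch runs without a client.
type UpdateFn = (instance: string, renewedAtIso: string) => Promise<UpdateResult>;

/** Persists the renewal timestamp and reports whether it was actually stored. */
async function recordTokenRenewal(
  update: UpdateFn,
  instance: string,
  warn: (msg: string) => void,
): Promise<boolean> {
  try {
    const { error } = await update(instance, new Date().toISOString());
    if (error) {
      // Query-level failures (RLS, missing column) arrive here, not in the catch.
      warn(`last_token_renewed_at update failed for ${instance}: ${error.message}`);
      return false;
    }
    return true;
  } catch (err) {
    // Only network-level failures throw.
    warn(`last_token_renewed_at update threw for ${instance}: ${String(err)}`);
    return false;
  }
}
```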
---
Outside diff comments:
In `@supabase/functions/_shared/evolution-webhook-handlers.ts`:
- Around line 131-156: The restored-alert branch currently fires regardless of a
recent 515 bounce; change it to skip creating the "🟢 Conexão ... restaurada"
warroom alert when a recent 515 stream was seen for this instance. Concretely,
inside the block that checks if (status === 'connected' && prevConn?.status !==
'connected') call the helper hadRecentStream515(supabase, instance) and only
insert the info alert when that returns false; alternatively (less preferred)
avoid writing status='disconnected' earlier when replaced515 is true, but
implement the hadRecentStream515 gating around the restored-alert insertion to
match the review request.
---
Nitpick comments:
In `@src/components/inbox/chat/__tests__/MessageDetailsDialog.test.tsx`:
- Around line 48-59: Unskip the test for MessageDetailsDialog and replace direct
element.click() calls with `@testing-library/user-event` using pointerEventsCheck:
0: import userEvent from '@testing-library/user-event', create const user =
userEvent.setup({ pointerEventsCheck: 0 }) inside the test, then await
user.click(screen.getByRole('tab', { name: 'Payload' })) and await
user.click(screen.getByRole('tab', { name: 'Raw Data' })); keep the existing
assertions that await findByTestId('copy-payload') and findByTestId('copy-raw')
and retain the timedRpcMock.mockResolvedValueOnce({ data: FULL, error: null })
and wrap(<MessageDetailsDialog ... />) setup.
In
`@src/components/inbox/contact-details/__tests__/WhatsAppStatusSection.test.tsx`:
- Around line 207-220: The skipped test for WhatsAppStatusSection can be
unblocked by mocking framer-motion or disabling Radix pointer-event checks: add
a top-of-file mock for 'framer-motion' that replaces AnimatePresence with a
passthrough component and replaces motion.* with simple div-wrapping functions,
or alternatively initialize userEvent with userEvent.setup({ pointerEventsCheck:
0 }) in the test and remove the it.skip so the test runs; target the test that
references WhatsAppStatusSection and the "Ver Status" button and ensure the mock
or userEvent setup is applied before rendering in that test file.
In `@src/hooks/__tests__/useMessageReactions.test.tsx`:
- Around line 8-25: Add unit tests covering the realtime branch of
useMessageReactions: create a test that mounts the hook with a sample messageId
and asserts supabase.channel was called with `reactions:${messageId}`, that the
channel's on()/subscribe() chain was invoked, and that supabase.removeChannel is
called on unmount; also add a test that sets disableRealtime to true and asserts
no supabase.channel/subscription calls occur. Use the existing mock channelObj
(on/subscribe/unsubscribe) and the supabase mock functions to spy/assert these
interactions in the tests for useMessageReactions.
- Line 40: Tests currently call getLogger() which creates new vi.fn() spies on
each invocation; replace this with a single shared mock logger object so every
getLogger() call returns the same reference. Create a const sharedLogger = {
error: vi.fn(), debug: vi.fn(), info: vi.fn(), warn: vi.fn() } and change
getLogger to return sharedLogger (so modules that import the logger can be
asserted against later); ensure existing tests import or reference getLogger/get
the sharedLogger where they need to inspect calls.
In `@supabase/functions/evolution-api/index.ts`:
- Around line 211-213: The sessionPhoneVersion assignment uses a redundant
String(...) cast and an inline hardcoded fallback; replace
String(body.sessionPhoneVersion).trim() with body.sessionPhoneVersion.trim(),
and move the literal '2.3000.1033773198' into a named exported constant (e.g.,
DEFAULT_SESSION_PHONE_VERSION) declared near the top of the file so it’s easy to
find and update; then use Deno.env.get('CONFIG_SESSION_PHONE_VERSION') ||
DEFAULT_SESSION_PHONE_VERSION as the fallback when setting sessionPhoneVersion.
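A minimal sketch of the constant extraction described above. The helper name `resolveSessionPhoneVersion` and the precedence order (body, then env, then default) are assumptions for illustration; `envValue` stands in for `Deno.env.get('CONFIG_SESSION_PHONE_VERSION')` so the snippet runs outside Deno. The `typeof` guard is one way to drop the `String(...)` cast without calling `.trim()` on a non-string.

```typescript
// Constant name taken from the suggestion above; in the real file it would be exported.
const DEFAULT_SESSION_PHONE_VERSION = '2.3000.1033773198';

/**
 * Hypothetical helper: resolve the version with precedence body > env > default.
 * `envValue` stands in for Deno.env.get('CONFIG_SESSION_PHONE_VERSION').
 */
function resolveSessionPhoneVersion(
  bodyValue: unknown,
  envValue: string | undefined,
): string {
  // Only trust the body when it is a non-blank string.
  if (typeof bodyValue === 'string' && bodyValue.trim() !== '') {
    return bodyValue.trim();
  }
  const fromEnv = envValue?.trim();
  return fromEnv ? fromEnv : DEFAULT_SESSION_PHONE_VERSION;
}
```

Keeping the literal in one named constant means a future version bump touches a single line.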
In `@supabase/functions/evolution-health/index.ts`:
- Around line 175-183: The restart fetch uses a hardcoded 15000ms timeout;
replace that literal with the shared UPSTREAM_TIMEOUT_MS constant (or if restart
needs more headroom, introduce a named RESTART_TIMEOUT_MS constant declared next
to UPSTREAM_TIMEOUT_MS) and use that constant for the AbortSignal timeout in the
fetch call to `${EVOLUTION_API_URL}/instance/restart/${INSTANCE_NAME}`; ensure
the new RESTART_TIMEOUT_MS has a short comment explaining why it differs from
UPSTREAM_TIMEOUT_MS so the rationale is local and discoverable.
In `@supabase/functions/evolution-webhook/index.ts`:
- Around line 157-160: Remove the redundant "(baseData as Record<string,
unknown>)" casts and use the typed baseData directly when checking properties:
for each variable (chats, messages, contacts) call Array.isArray(baseData.chats)
/ baseData.messages / baseData.contacts and then cast the property value to
unknown[] only when computing .length (e.g., Array.isArray(baseData.chats) ?
(baseData.chats as unknown[]).length : 0); keep the
console.log(`[MESSAGING_HISTORY_SET] instance=${instance} chats=${chats}
messages=${messages} contacts=${contacts}`) as-is.
In `@supabase/migrations/20260426180846_add_baileys_health_columns.sql`:
- Around line 10-11: The new index
idx_whatsapp_connections_last_token_renewed_at on public.whatsapp_connections is
likely unused because queries filter by instance_id first (see evolution-health
logic) and updates use instance_id, so remove this single-column index to avoid
write overhead; alternatively, if you expect queries like "find all connections
with no renewal in N hours", replace it with a composite index that includes
instance_id (e.g., on (instance_id, last_token_renewed_at)) or keep it only if
such global scans are planned—update the migration to drop the CREATE INDEX or
create a composite index accordingly.
📒 Files selected for processing (21)
- .env.example
- scripts/regen-trilha-mensagens.ts
- src/components/connections/InstanceSettingsDialog.tsx
- src/components/inbox/chat/__tests__/ChatPanelHeader.test.tsx
- src/components/inbox/chat/__tests__/MessageDetailsDialog.test.tsx
- src/components/inbox/contact-details/__tests__/EditContactDialog.test.tsx
- src/components/inbox/contact-details/__tests__/WhatsAppStatusSection.test.tsx
- src/hooks/__tests__/useEvolutionApi.test.ts
- src/hooks/__tests__/useMessageReactions.test.tsx
- src/hooks/monitoring/__tests__/useIdempotencyMissAlerts.toastDedupe.test.tsx
- src/test/fixtures/TRILHA_MENSAGENS_NAVEGAVEL.mmd
- src/test/realtimeFanout.test.ts
- supabase/functions/_shared/evolution-helpers.ts
- supabase/functions/_shared/evolution-sync-actions.ts
- supabase/functions/_shared/evolution-webhook-handlers.ts
- supabase/functions/evolution-api/index.ts
- supabase/functions/evolution-health/index.ts
- supabase/functions/evolution-webhook/__tests__/contract.test.ts
- supabase/functions/evolution-webhook/index.ts
- supabase/migrations/20260426180846_add_baileys_health_columns.sql
- vitest.config.ts
onSave={async () => {
  try {
    // Defesa: força syncFullHistory=false para não-admin mesmo
    // se o backend tiver retornado true em loadSettings.
    const payload = isAdmin
      ? { instanceName, ...settingsData }
      : { instanceName, ...settingsData, syncFullHistory: false };
    await setSettings(payload);
    toast.success('Configurações salvas!');
  } catch {
    toast.error('Erro ao salvar');
  }
}} isLoading={isLoading} />
Side-effect: non-admin save will silently flip syncFullHistory from true→false.
If an admin previously enabled syncFullHistory and a non-admin later opens this dialog and saves any unrelated change (e.g. toggling rejectCall), loadSettings will have hydrated settingsData.syncFullHistory = true from the backend (line 67), and this defense will overwrite it back to false in the same setSettings call — without surfacing anything in the UI.
This matches the PR objective ("default OFF; defesa em onSave"), so flagging only as a heads-up. If you want to avoid the silent override, you could either (a) skip the field entirely from the payload for non-admins instead of forcing false, or (b) also force syncFullHistory: false in loadSettings for non-admins so state and persisted value stay consistent.
Optional: omit instead of overwriting
- const payload = isAdmin
- ? { instanceName, ...settingsData }
- : { instanceName, ...settingsData, syncFullHistory: false };
+ const { syncFullHistory: _sfh, ...rest } = settingsData;
+ const payload = isAdmin
+ ? { instanceName, ...settingsData }
+ : { instanceName, ...rest };
(Functionally similar; the difference is whether false is re-asserted on every non-admin save or the stored value is left untouched. The current code re-asserts, which is the stricter posture.)
📝 Committable suggestion
onSave={async () => {
  try {
    // Defense: non-admins never send syncFullHistory, so a value
    // set by an admin is left untouched on save.
    const { syncFullHistory: _sfh, ...rest } = settingsData;
    const payload = isAdmin
      ? { instanceName, ...settingsData }
      : { instanceName, ...rest };
    await setSettings(payload);
    toast.success('Configurações salvas!');
  } catch {
    toast.error('Erro ao salvar');
  }
}} isLoading={isLoading} />
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/components/connections/InstanceSettingsDialog.tsx` around lines 101 -
113, The current onSave handler in InstanceSettingsDialog forcefully sets
syncFullHistory to false for non-admins which silently flips a previously
admin-enabled true value; change the payload construction in onSave so that for
non-admins you omit the syncFullHistory key entirely (i.e., build payload = {
instanceName, ...settingsData } and delete or avoid adding syncFullHistory when
isAdmin is false) instead of forcing false, referencing the onSave handler,
settingsData, setSettings, isAdmin and loadSettings to ensure non-admin saves
don't overwrite admin-set syncFullHistory; alternatively, if you prefer the
stricter posture, enforce syncFullHistory:false during loadSettings for
non-admins so UI state and persisted value remain consistent.
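The two postures discussed in this thread can be compared side by side. This is a sketch under assumptions: the `InstanceSettings` shape and both helper names are hypothetical, and only `syncFullHistory`, `rejectCall`, `instanceName` and `isAdmin` come from the code under review.

```typescript
// Hypothetical settings shape; the real dialog state may carry more fields.
type InstanceSettings = { rejectCall?: boolean; syncFullHistory?: boolean };
type SavePayload = InstanceSettings & { instanceName: string };

// Option (a): non-admins never send syncFullHistory, so the stored value survives.
function buildPayloadOmit(
  isAdmin: boolean,
  instanceName: string,
  settings: InstanceSettings,
): SavePayload {
  if (isAdmin) return { instanceName, ...settings };
  const { syncFullHistory: _ignored, ...rest } = settings;
  return { instanceName, ...rest };
}

// Current behaviour: every non-admin save re-asserts syncFullHistory=false.
function buildPayloadForceOff(
  isAdmin: boolean,
  instanceName: string,
  settings: InstanceSettings,
): SavePayload {
  return isAdmin
    ? { instanceName, ...settings }
    : { instanceName, ...settings, syncFullHistory: false };
}
```

With option (a) the backend must treat a missing key as "leave unchanged"; whether the Evolution settings endpoint does so is not confirmed here and should be checked before switching.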
it.skip('shows copy buttons for supervisor', async () => {
  profileRef.current = { role: 'supervisor' };
  timedRpcMock.mockResolvedValueOnce({ data: FULL, error: null });
  wrap(<MessageDetailsDialog messageId="m1" open onOpenChange={() => {}} />);
  await waitFor(() => expect(screen.getByText(/wamid\.x/)).toBeInTheDocument());
  expect(screen.getByTestId('copy-payload')).toBeInTheDocument();
  (screen.getByRole('tab', { name: 'Payload' }) as HTMLElement).click();
  expect(await screen.findByTestId('copy-payload')).toBeInTheDocument();
});
Supervisor RBAC path is now uncovered.
The "shows copy buttons for supervisor" case is the only test that actually exercises the supervisor branch of the role check. With both this and the admin tab test skipped, the role gating for copy-payload / copy-raw is only verified for the negative case (agent at line 61). If copy-payload is rendered inside a TabsContent that's only mounted when its tab is active, consider one of:
- Add a non-skipped supervisor assertion that doesn't depend on switching tabs (e.g. assert visibility on the default tab if any copy button lives there), or
- Apply the `userEvent.setup({ pointerEventsCheck: 0 })` approach so this stays as real coverage rather than a TODO.
Otherwise a regression that hides copy buttons for supervisors would slip through CI.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/components/inbox/chat/__tests__/MessageDetailsDialog.test.tsx` around
lines 70 - 77, The skipped test "shows copy buttons for supervisor" leaves the
supervisor RBAC path untested; unskip and make it reliably exercise the
supervisor branch in MessageDetailsDialog by either (a) removing the tab click
and asserting the copy buttons appear on the initially rendered tab (if copy
buttons are in default TabsContent), or (b) keep the tab interaction but use
userEvent with pointerEventsCheck: 0 to bypass pointer gating so the tab switch
actually mounts the TabsContent and the test can find 'copy-payload'; ensure
profileRef.current = { role: 'supervisor' } and
timedRpcMock.mockResolvedValueOnce({ data: FULL, error: null }) remain in the
test setup so the component renders the supervisor UI.
  try {
    const cutoff = new Date(Date.now() - STREAM_515_TTL_MS).toISOString();
    const { data } = await supabase
      .from('webhook_audit_log')
      .select('id')
      .eq('instance', instance)
      .eq('event_type', 'connection.update')
      .gte('created_at', cutoff)
      .ilike('error_message', '%stream%515%')
      .limit(1)
      .maybeSingle();
    return !!data;
  } catch {
    return false;
  }
}
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify whether any code path writes webhook_audit_log rows with error_message
# matching the 'stream 515' pattern that hadRecentStream515 looks for.
rg -nP -C3 "webhook_audit_log" --type=ts
echo "---"
rg -nP -C3 "stream.*515|Connection Replaced" --type=ts
Repository: adm01-debug/zapp-web
Length of output: 7577
DB fallback query expects undocumented webhook_audit_log entries that are never written.
The fallback in hadRecentStream515 (lines 340–348) queries webhook_audit_log for rows with error_message matching %stream%515%, but the codebase never writes such rows. When isConnectionReplaced515() detects a 515 error, only markStream515() is called to update the in-memory map; no audit entry is created. Since the in-memory map is ephemeral and lost on edge-function cold starts, the DB fallback should catch the signal across invocations—but it cannot, because no one populates it. This means 515→401 suppression silently breaks after a cold start.
Either (a) write an audit row when markStream515() is called (e.g., via auditWebhookEvent() with a stable error_message format), or (b) store 515 events in a dedicated table, or (c) accept that this fallback is inoperative and remove it to avoid false expectations.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@supabase/functions/_shared/evolution-helpers.ts` around lines 338 - 353, The
DB fallback in hadRecentStream515 queries webhook_audit_log for error_message
like '%stream%515%' but no code ever writes such entries—isConnectionReplaced515
only calls markStream515 (in-memory), so the fallback is ineffective after cold
starts; fix by either (A) writing a stable audit row when markStream515 is
invoked (call auditWebhookEvent or a new helper with a consistent error_message
format that includes "stream 515" so hadRecentStream515 can match it), or (B)
create and write to a dedicated persistent table for 515 events and update
hadRecentStream515 to query that table, or (C) remove the misleading DB fallback
from hadRecentStream515 and rely solely on the in-memory map; choose one
approach and implement the corresponding changes in markStream515,
auditWebhookEvent, and hadRecentStream515 (or add the new table-access
functions) so the fallback reflects actual persisted data.
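Option (A) amounts to a write-through marker: whatever records the 515 in memory also persists it, so a cold-started invocation can still see it. The sketch below illustrates that idea only. `Stream515Store` and `createStream515Tracker` are hypothetical names, the store is an injected stand-in for a Supabase table, and the calls are synchronous for brevity where the real helpers would be async.

```typescript
const STREAM_515_TTL_MS = 30_000; // 30s window, as stated in the PR description

// Hypothetical persistence port; in the PR this would be backed by a database table.
interface Stream515Store {
  save(instance: string, atMs: number): void;
  lastSeen(instance: string): number | undefined;
}

function createStream515Tracker(store: Stream515Store, now: () => number = Date.now) {
  // Per-invocation cache; lost whenever the edge function cold-starts.
  const memory = new Map<string, number>();
  return {
    mark(instance: string): void {
      const at = now();
      memory.set(instance, at);
      store.save(instance, at); // write-through so the signal survives a cold start
    },
    hadRecent(instance: string): boolean {
      // Prefer memory, fall back to the persisted marker.
      const at = memory.get(instance) ?? store.lastSeen(instance);
      return at !== undefined && now() - at <= STREAM_515_TTL_MS;
    },
  };
}
```

The key property is that `mark` and `hadRecent` agree on what is persisted, which is exactly what the current `error_message` query lacks.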
export function isConnectionReplaced515(data: Record<string, unknown> | null | undefined): boolean {
  if (!isRecord(data)) return false;
  const statusReason = (data.statusReason ?? data.statusCode ?? data.code) as unknown;
  if (statusReason === 515 || statusReason === '515') return true;

  const candidates: unknown[] = [
    data.reason, data.message,
    (data.lastDisconnect as Record<string, unknown> | undefined)?.error,
    (data.lastDisconnect as Record<string, unknown> | undefined)?.reason,
  ];
  for (const candidate of candidates) {
    const text = typeof candidate === 'string'
      ? candidate
      : isRecord(candidate) ? JSON.stringify(candidate) : '';
    if (!text) continue;
    if (/\b515\b/.test(text)) return true;
    if (/connection[\s_-]?replaced/i.test(text)) return true;
    if (/stream[\s:_-]?error/i.test(text) && /515/.test(text)) return true;
  }
  return false;
}
\b515\b may false-positive on unrelated payload fields.
The predicate scans data.reason, data.message, and stringified lastDisconnect.error/reason for \b515\b. Any unrelated numeric (timestamps, ports, message IDs) containing 515 would match and cause isConnectionReplaced515 to return true, triggering markStream515 and a 30s window where genuine 401 logouts get suppressed. Consider tightening to require co-occurrence with a 515-specific token (e.g., stream:error or Connection Replaced) when matching string payloads — the statusReason === 515 check at line 361 already covers the structured case.
♻️ Suggested tightening
for (const candidate of candidates) {
const text = typeof candidate === 'string'
? candidate
: isRecord(candidate) ? JSON.stringify(candidate) : '';
if (!text) continue;
- if (/\b515\b/.test(text)) return true;
if (/connection[\s_-]?replaced/i.test(text)) return true;
if (/stream[\s:_-]?error/i.test(text) && /515/.test(text)) return true;
}
return false;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@supabase/functions/_shared/evolution-helpers.ts` around lines 358 - 378, The
isConnectionReplaced515 predicate is too permissive because the /\b515\b/ check
on free-form strings can match unrelated numeric fields; update
isConnectionReplaced515 to only accept a plain 515 match for structured
statusReason (the existing statusReason === 515 || '515' branch) but for string
candidates require co-occurrence with a 515-specific token—i.e., change the
candidate checks so that instead of raw /\b515\b/ you only return true when 515
appears near stream/connection/error/replaced tokens (for example require a
regex that matches 515 within the same token group or within N chars of
/(stream|connection|replaced|error)/i), keeping the other
/connection[\s_-]?replaced/i and /stream[\s:_-]?error/i checks intact; update
references in the loop over candidates (variables candidates, candidate, text)
so only tightened-pattern matches trigger true.
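A sketch of the tightened string check, isolated so it can be tested on its own. The helper name `textIndicates515` is hypothetical. The two phrase regexes are copied from the function under review; requiring a word boundary around 515 in the co-occurrence branch is an extra tightening assumed here, not part of the original suggestion.

```typescript
// "Connection Replaced" in its common spellings (regex from the code under review).
const CONNECTION_REPLACED = /connection[\s_-]?replaced/i;
// "stream:error", "stream error", "stream_error" (regex from the code under review).
const STREAM_ERROR = /stream[\s:_-]?error/i;
// Assumption: 515 must stand alone, so "1515" or "5150" do not count.
const CODE_515 = /\b515\b/;

// Hypothetical helper: decides whether a free-form string signals a 515.
function textIndicates515(text: string): boolean {
  if (CONNECTION_REPLACED.test(text)) return true;
  // A bare 515 is no longer enough; it must co-occur with a stream-error token.
  return STREAM_ERROR.test(text) && CODE_515.test(text);
}
```

The structured `statusReason === 515` branch stays outside this helper, so payloads that carry the code as a number are unaffected.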
// Evolution API issue #2498: stream:error 515 (Connection Replaced) durante
// scan de QR é seguido por um 401/loggedOut interno do Baileys que apenas
// limpa o slot antigo no protocolo multi-device. Não é um logout real.
// Janela de 30s pós-515 + reasonCode 401 → tratar como falso positivo.
if (reasonCode === 401 && await hadRecentStream515(supabase, instance)) {
  console.log(
    `[LOGOUT_INSTANCE] instance=${instance} ignored: false-positive 401 within ` +
    `30s of stream:error 515 (Baileys multi-device reconnect, EvolutionAPI#2498).`
  );
  return;
}
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# 1) Find where webhook_audit_log is inserted and what fields are set.
ast-grep --pattern $"supabase.from('webhook_audit_log').insert($_)"
ast-grep --pattern $'supabase.from("webhook_audit_log").insert($_)'
rg -nP -C6 "from\(['\"]webhook_audit_log['\"]\)\.insert"
# 2) See what event_type value is recorded for connection updates and whether
# the 515 string lands in error_message.
rg -nP -C4 "event_type" --type=ts | rg -nP -C2 '515|stream|connection\.update|CONNECTION_UPDATE'
Repository: adm01-debug/zapp-web
Length of output: 1855
🏁 Script executed:
#!/bin/bash
# Find all calls to auditWebhookEvent to see what event_type and error_message values are passed
rg -nP -B3 -A3 "auditWebhookEvent" --type=ts | head -100
# Search for connection.update handling and 515 error scenarios
rg -nP -B5 -A5 "connection\.update|stream.*515|515.*stream" --type=ts | head -150
# Look for where error_message might be set with 515/stream info
rg -nP -B4 -A4 "error_message" --type=ts | grep -B4 -A4 "515\|stream" | head -80
Repository: adm01-debug/zapp-web
Length of output: 22221
Persistent fallback for stream 515 suppression will fail after edge-function cold-start.
The handleConnectionUpdate() function detects 515 errors via isConnectionReplaced515() and marks them in-memory with markStream515(instance). However, when the webhook audit is recorded at line 257-260 of evolution-webhook/index.ts, the event is audited with status: 'processed' and no error_message field is populated.
The persistent fallback query in hadRecentStream515() (evolution-helpers.ts:344-346) expects to find records where event_type='connection.update' and error_message ilike '%stream%515%'. Since the audit record has no error_message, this query will return no results after a function cold-start when the in-memory map is cleared, and the 515 → 401 false-positive suppression will silently fail cross-invocation.
To fix: When isConnectionReplaced515() returns true, populate error_message with the detected reason (e.g., "stream:error 515" or the raw statusReason value) before auditing, so the persistent fallback query can find it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@supabase/functions/_shared/evolution-webhook-handlers.ts` around lines 17 -
27, handleConnectionUpdate() marks 515 events in-memory via
isConnectionReplaced515() and markStream515(), but the subsequent webhook audit
is saved without an error_message so hadRecentStream515() cannot find a
persistent record after cold-start; update the code path where
isConnectionReplaced515() is true to set the audit payload's error_message
(e.g., "stream:error 515" or the raw statusReason) before calling the audit/save
routine so the persistent fallback query in hadRecentStream515() (which searches
error_message ilike '%stream%515%') will succeed across invocations. Ensure you
still call markStream515(instance) and include the same reason text in both the
in-memory mark and the persisted audit.
if (action === 'archive-chat') {
  // Upstream broken on Evolution v2.3.7 — `/message/archiveChat/`
  // retorna PrismaClientValidationError (issue EvolutionAPI/#2495).
  // Curto-circuitamos com um envelope explícito para não poluir o DLQ
  // com falhas determinísticas e dar feedback acionável ao cliente.
  // Remover este branch quando a Evolution publicar fix.
  return new Response(JSON.stringify({
    version: EVOLUTION_ENVELOPE_VERSION,
    error: true,
    status: 503,
    code: 'ARCHIVE_CHAT_UPSTREAM_DOWN',
    message: 'archive-chat indisponível: endpoint /message/archiveChat está quebrado na Evolution v2.3.7 (issue #2495). Aguardar fix upstream.',
  }), { status: 503, headers: { ...corsHeaders, 'Content-Type': 'application/json' } });
}
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Inspect how the frontend retry/DLQ layer reacts to 503 vs envelope-error responses
# from this edge function.
rg -nP -C5 'invokeEvolutionWithRetry|archive[-_]chat|ARCHIVE_CHAT_UPSTREAM_DOWN'
rg -nP -C5 '\b503\b' --type=ts -g '!**/node_modules/**' -g '!**/dist/**'
Repository: adm01-debug/zapp-web
Length of output: 50376
🏁 Script executed:
# Find invokeEvolutionWithRetry implementation and check how it handles HTTP status codes
rg -n -A 30 'export async function invokeEvolutionWithRetry' src/lib/evolutionSendRetry.ts | head -80
Repository: adm01-debug/zapp-web
Length of output: 1629
🏁 Script executed:
# Get more of the invokeEvolutionWithRetry function to see the retry logic
rg -n -A 100 'export async function invokeEvolutionWithRetry' src/lib/evolutionSendRetry.ts | head -150
Repository: adm01-debug/zapp-web
Length of output: 4819
🏁 Script executed:
# Look at the isTransient function in evolutionSendRetry.ts to see if 503 is treated as transient
rg -n -B 5 -A 15 'function isTransient' src/lib/evolutionSendRetry.ts
Repository: adm01-debug/zapp-web
Length of output: 947
🏁 Script executed:
# Check if the frontend has any special handling for ARCHIVE_CHAT_UPSTREAM_DOWN or envelope codes
rg -n 'ARCHIVE_CHAT_UPSTREAM_DOWN|code.*ARCHIVE|code.*==' src/lib/evolutionSendRetry.ts src/hooks/realtime/messageSender.ts src/hooks/evolution/ -B 2 -A 2
Repository: adm01-debug/zapp-web
Length of output: 46
🏁 Script executed:
# Check how the frontend consumes responses from evolution-api invoke calls
# Look for where the response data is processed
rg -n -A 10 'const result = await invokeEvolutionWithRetry' src/hooks/realtime/messageSender.ts
Repository: adm01-debug/zapp-web
Length of output: 46
🏁 Script executed:
# Find where invokeEvolutionWithRetry is called and how its result is handled
rg -n 'invokeEvolutionWithRetry' src/ -A 5 | head -100
Repository: adm01-debug/zapp-web
Length of output: 2001
🏁 Script executed:
# Get more context around the invokeEvolutionWithRetry call in messageSender.ts
rg -n -A 20 'const { data: apiResult, error: apiError } = await invokeEvolutionWithRetry' src/hooks/realtime/messageSender.ts
Repository: adm01-debug/zapp-web
Length of output: 230
🏁 Script executed:
# Get the context around where invokeEvolutionWithRetry is called in messageSender
sed -n '195,230p' src/hooks/realtime/messageSender.ts
Repository: adm01-debug/zapp-web
Length of output: 1651
🏁 Script executed:
# Verify the complete isTransient check for the ARCHIVE_CHAT scenario
# Check if the TRANSIENT_PATTERNS would match the message text
python3 << 'EOF'
TRANSIENT_PATTERNS = [
    'fetch', 'network', 'timeout', 'aborted', 'econnreset',
    'enotfound', '502', '503', '504', '429', 'unavailable',
    'temporarily', 'gateway',
]
message = 'archive-chat indisponível: endpoint /message/archiveChat está quebrado na Evolution v2.3.7 (issue #2495). Aguardar fix upstream.'
msg_lower = message.lower()
print(f"Message: {message}\n")
print(f"Message (lowercased): {msg_lower}\n")
print("Checking TRANSIENT_PATTERNS matches:")
for pattern in TRANSIENT_PATTERNS:
    if pattern in msg_lower:
        print(f" ✓ MATCHES: '{pattern}'")
# Check what isTransient would return for 503 status
print("\n--- isTransient logic ---")
print("With status: 503")
print(" status >= 500? YES → return True (TRANSIENT)")
print("\nWith status: 200")
print(" status >= 500? NO")
print(" status === 429? NO")
print(" message matches patterns? NO (indisponível ≠ unavailable)")
print(" → return False (NOT TRANSIENT) → NO RETRY, NO DLQ")
EOF
Repository: adm01-debug/zapp-web
Length of output: 654
HTTP 503 causes DLQ pollution — contradicts the comment's intent. Use HTTP 200 with error envelope instead.
The code returns HTTP 503, which signals "transient unavailable" to the frontend's invokeEvolutionWithRetry. That function's isTransient() helper explicitly treats any status ≥ 500 as retryable (lines 47, 52 in evolutionSendRetry.ts), so every archive-chat attempt will retry for maxRetries before being enqueued to the DLQ — the opposite of the comment's stated goal ("para não poluir o DLQ com falhas determinísticas").
Compare with other deterministic short-circuits in this file:
- INSTANCE_PAUSED (line 79): returns `status: 200` + `error: true` envelope + `Retry-After: 60`.
- archive-chat (line 300): returns `status: 503` + envelope but no Retry-After hint.
The frontend client checks the error flag and code field (not HTTP status) to determine whether to retry/DLQ. Returning status 200 keeps the message out of the retry loop entirely, as intended.
Change the response status from 503 to 200 to align with the project's deterministic-error pattern. The envelope fields remain unchanged, so the client can still branch on code === 'ARCHIVE_CHAT_UPSTREAM_DOWN' from the response body.
Fix
- }), { status: 503, headers: { ...corsHeaders, 'Content-Type': 'application/json' } });
+ }), { status: 200, headers: { ...corsHeaders, 'Content-Type': 'application/json' } });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@supabase/functions/evolution-api/index.ts` around lines 288 - 301, The
deterministic short-circuit for the 'archive-chat' branch currently returns HTTP
503 which triggers transient-retry logic; change the Response status to 200
while keeping the existing envelope (EVOLUTION_ENVELOPE_VERSION, error: true,
status: 503, code: 'ARCHIVE_CHAT_UPSTREAM_DOWN', message: ...) and the same
headers (corsHeaders, 'Content-Type': 'application/json') so the client inspects
the envelope/code and avoids retries/DLQ; update the Response construction in
the action === 'archive-chat' block accordingly.
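To make the retry consequence concrete, here is a small model of the classification described in the analysis above. It is reconstructed from the analysis script output, not copied from `evolutionSendRetry.ts`, so the `EdgeResult` shape and the exact rule order are assumptions.

```typescript
// Pattern list as printed by the analysis script above.
const TRANSIENT_PATTERNS = [
  'fetch', 'network', 'timeout', 'aborted', 'econnreset',
  'enotfound', '502', '503', '504', '429', 'unavailable',
  'temporarily', 'gateway',
];

// Hypothetical shape: HTTP status of the edge-function response plus its message.
interface EdgeResult {
  httpStatus: number;
  message: string;
}

// Assumed model of isTransient: 5xx and 429 always retry, otherwise match on text.
function isTransient(result: EdgeResult): boolean {
  if (result.httpStatus >= 500 || result.httpStatus === 429) return true;
  const msg = result.message.toLowerCase();
  return TRANSIENT_PATTERNS.some((pattern) => msg.includes(pattern));
}
```

Under this model the same envelope message is retried when sent with HTTP 503 and left alone when sent with HTTP 200, which is the behaviour the comment in the archive-chat branch asks for.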
// Token freshness (Baileys 7 NEW_JWT_TOKEN). Ausência por >24h enquanto
// a instância está conectada é sinal de socket "preso" silenciosamente.
try {
  const { data: conn } = await supabase
    .from('whatsapp_connections')
    .select('last_token_renewed_at')
    .eq('instance_id', INSTANCE_NAME)
    .maybeSingle()
  const lastRenew = conn?.last_token_renewed_at ? new Date(conn.last_token_renewed_at).getTime() : null
  const oneDayMs = 24 * 60 * 60 * 1000
  if (instanceConnected && lastRenew && Date.now() - lastRenew > oneDayMs) {
    alerts.push(`No JWT token renewal in >24h (sessão potencialmente presa)`)
  }
} catch {
  // Coluna pode ainda não existir — ignore.
}
Edge case: never-renewed token (last_token_renewed_at IS NULL) is silently OK forever.
Two small gaps:
- The condition `instanceConnected && lastRenew && Date.now() - lastRenew > oneDayMs` skips the alert when `last_token_renewed_at` is `null`. For a connection that came online before the migration landed (or for which `NEW_JWT_TOKEN` simply never fired) the column stays `null` indefinitely while the socket may be the very "presa" case the check should flag. Consider also alerting when `lastRenew` is null and the connection has been `connected` long enough to have been expected to renew (e.g., compare against a `connected_at` / `updated_at` field, or backfill the column on connect).
- The bare `catch {}` swallows all errors, not only "column does not exist". A `log.warn` with the error keeps the silence on schema drift but surfaces RLS / network anomalies during incident debugging.
🛡️ Proposed adjustment
- try {
- const { data: conn } = await supabase
- .from('whatsapp_connections')
- .select('last_token_renewed_at')
- .eq('instance_id', INSTANCE_NAME)
- .maybeSingle()
- const lastRenew = conn?.last_token_renewed_at ? new Date(conn.last_token_renewed_at).getTime() : null
- const oneDayMs = 24 * 60 * 60 * 1000
- if (instanceConnected && lastRenew && Date.now() - lastRenew > oneDayMs) {
- alerts.push(`No JWT token renewal in >24h (sessão potencialmente presa)`)
- }
- } catch {
- // Coluna pode ainda não existir — ignore.
- }
+ try {
+ const { data: conn } = await supabase
+ .from('whatsapp_connections')
+ .select('last_token_renewed_at, updated_at')
+ .eq('instance_id', INSTANCE_NAME)
+ .maybeSingle()
+ const lastRenew = conn?.last_token_renewed_at ? new Date(conn.last_token_renewed_at).getTime() : null
+ const connectedSince = conn?.updated_at ? new Date(conn.updated_at).getTime() : null
+ const oneDayMs = 24 * 60 * 60 * 1000
+ if (instanceConnected) {
+ if (lastRenew && Date.now() - lastRenew > oneDayMs) {
+ alerts.push('No JWT token renewal in >24h (sessão potencialmente presa)')
+ } else if (!lastRenew && connectedSince && Date.now() - connectedSince > oneDayMs) {
+ alerts.push('No NEW_JWT_TOKEN ever received in >24h connected (sessão possivelmente presa)')
+ }
+ }
+ } catch (e) {
+ log.warn('token freshness check failed', { error: e instanceof Error ? e.message : String(e) })
+ }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@supabase/functions/evolution-health/index.ts` around lines 215 - 230, The
current check in the try block skips alerting when last_token_renewed_at is
null; update the logic in the supabase query handling (whatsapp_connections
select of last_token_renewed_at) so that if conn.last_token_renewed_at is null
you also evaluate the connection age (use conn.connected_at or conn.updated_at
timestamp from the same record) and push the same alert when instanceConnected
and the connection has been connected longer than oneDayMs; additionally replace
the empty catch with a catch that calls a logger (e.g., log.warn or
processLogger.warn) and includes the caught error so only schema-missing noise
stays silent while RLS/network errors are surfaced for debugging.
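The null-handling rule can be pulled into a pure function so it is testable without Supabase. This is a sketch: `tokenFreshnessAlert` and `ConnectionRow` are hypothetical names, and using `updated_at` as a proxy for "connected since" is the same assumption the proposed adjustment makes.

```typescript
const ONE_DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical row shape; only last_token_renewed_at is confirmed by the migration.
interface ConnectionRow {
  last_token_renewed_at: string | null;
  updated_at: string | null; // assumption: approximates when the connection came up
}

// Returns an alert message, or null when the token looks fresh or cannot be judged.
function tokenFreshnessAlert(
  connected: boolean,
  row: ConnectionRow | null,
  nowMs: number,
): string | null {
  if (!connected || !row) return null;
  const lastRenew = row.last_token_renewed_at ? Date.parse(row.last_token_renewed_at) : null;
  const connectedSince = row.updated_at ? Date.parse(row.updated_at) : null;
  if (lastRenew !== null) {
    return nowMs - lastRenew > ONE_DAY_MS ? 'No JWT token renewal in >24h' : null;
  }
  // Never renewed: alert only once the connection is old enough to expect a renewal.
  if (connectedSince !== null && nowMs - connectedSince > ONE_DAY_MS) {
    return 'No NEW_JWT_TOKEN ever received in >24h connected';
  }
  return null;
}
```

Passing `nowMs` in, rather than calling `Date.now()` inside, keeps the boundary cases deterministic in tests.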
  try {
    await supabase.from('whatsapp_connections')
      .update({ last_token_renewed_at: new Date().toISOString() })
      .eq('instance_id', instance);
  } catch (e) {
    // Coluna pode não existir ainda; log e segue. Migration cuida disso.
    console.warn(`[NEW_JWT_TOKEN] update skipped: ${(e as Error).message}`);
  }
  console.log(`[NEW_JWT_TOKEN] instance=${instance} token renewed`);
}
try/catch won't catch missing-column or RLS errors from supabase-js.
supabase.from(...).update(...) resolves with { data, error } on most failures (PostgREST errors, RLS denial, missing column) instead of throwing. Only network/abort failures reject the promise. So if the migration hasn't run yet, the update silently fails, the console.warn you intend never fires, and you have no signal that token freshness isn't being tracked — which is exactly the failure mode the comment claims to handle.
Capture and check error explicitly:
🛡️ Suggested fix
- try {
- await supabase.from('whatsapp_connections')
- .update({ last_token_renewed_at: new Date().toISOString() })
- .eq('instance_id', instance);
- } catch (e) {
- // Coluna pode não existir ainda; log e segue. Migration cuida disso.
- console.warn(`[NEW_JWT_TOKEN] update skipped: ${(e as Error).message}`);
- }
+ try {
+ const { error: updErr } = await supabase.from('whatsapp_connections')
+ .update({ last_token_renewed_at: new Date().toISOString() })
+ .eq('instance_id', instance);
+ if (updErr) {
+ // Coluna pode não existir ainda (42703) ou RLS — log e segue.
+ console.warn(`[NEW_JWT_TOKEN] update skipped: ${updErr.message ?? updErr.code}`);
+ }
+ } catch (e) {
+ console.warn(`[NEW_JWT_TOKEN] update threw: ${(e as Error).message}`);
+ }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@supabase/functions/evolution-webhook/index.ts` around lines 140 - 149, The
supabase update call using supabase.from('whatsapp_connections').update(...)
does not throw on PostgREST/RLS/missing-column errors, so the current try/catch
won't catch them; change the code to await the returned result and explicitly
check the returned error field (e.g., const { data, error } = await
supabase.from('whatsapp_connections').update(...).eq('instance_id', instance)),
and when error is present log it with process/console.warn (including
error.message and context like instance and mention missing column) and skip the
token-renewal tracking path; keep network exceptions handled as before but rely
on the explicit error check to detect migration/RLS issues.
7 actionable fixes to the commits from the previous chat, all with
production implications:
1. evolution-webhook-handlers.ts (handleConnectionUpdate):
the "🟢 restored" alert fired on the echo of the 515 bounce
(open ~5s after close), undoing the silencing from #1b5b7e7.
It now fires only if hadRecentStream515(...) returns false.
2. evolution-helpers.ts (isConnectionReplaced515): the bare regex
`\b515\b` matched random timestamps/IDs that happened to contain
"515" and triggered the 30s window, suppressing real logouts.
It now requires co-occurrence with stream:error.
3. evolution-webhook-handlers.ts: persists an audit row with
error_message="stream:error 515 ..." when markStream515 is
called, so that the DB fallback in hadRecentStream515 works
after an edge function cold start.
4. InstanceSettingsDialog.tsx (onSave): a non-admin save forced
syncFullHistory=false, silently overwriting a true value that
an admin had set. It now omits the key from the payload for
non-admins.
5. evolution-api/index.ts (archive-chat): returned HTTP 503,
which `invokeEvolutionWithRetry.isTransient` treats as retriable,
producing a retry storm + DLQ entries. That is exactly the opposite
of the goal ("don't pollute the DLQ"). It now returns HTTP 200 with
an error+code envelope, and the client reads the body to tell the
cases apart.
6. evolution-webhook/index.ts (NEW_JWT_TOKEN): supabase-js returns
{data,error} on RLS/missing-column failures without rejecting the
promise; the original try/catch caught none of that. It now
checks `error` explicitly.
7. evolution-health/index.ts (token freshness): skipped the alert
when last_token_renewed_at was NULL (pre-migration scenario, or
Baileys not emitting NEW_JWT_TOKEN). It now also alerts if the
connection has been up >24h without any NEW_JWT_TOKEN. The bare
catch was replaced by a catch that logs (RLS/network errors no
longer pass silently).
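Items 1 to 3 above hinge on two small pieces: a detector that only accepts "515" when it appears together with stream:error, and a 30s window keyed by instance. The following is a minimal in-memory sketch of that idea, not the code from the PR. The function names come from the commit message, but their signatures, the explicit `nowMs` parameter, and the `Map` storage are assumptions made for illustration, and the persisted audit fallback is left out.

```typescript
// Sketch only: signatures and storage are assumptions, not the PR's code.
const WINDOW_MS = 30_000; // 30s window after a 515, as described in the commit

// instance name -> timestamp (ms) of the last observed stream:error 515
const last515 = new Map<string, number>();

// A bare /\b515\b/ would also match unrelated text such as "msg 515 delivered".
// Requiring "stream:error" in the same payload avoids that false positive.
function isConnectionReplaced515(text: string): boolean {
  return /stream:error/i.test(text) && /\b515\b/.test(text);
}

function markStream515(instance: string, nowMs: number): void {
  last515.set(instance, nowMs);
}

function hadRecentStream515(instance: string, nowMs: number): boolean {
  const at = last515.get(instance);
  return at !== undefined && nowMs - at <= WINDOW_MS;
}
```

Passing the clock in as `nowMs` keeps the window logic deterministic under test; a real handler would presumably use `Date.now()` at the call site.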
Real cause of the "Unit Tests: failure" in CI: the workflow defines
`VITE_SUPABASE_URL: ${{ secrets.VITE_SUPABASE_URL }}` globally. When
the secret is not configured in the repo, the environment variable
becomes an empty string (not undefined). The previous `??` only fell
back on null/undefined; given "" it passed the empty string along and
`createClient(SUPABASE_URL, ...)` rejected with "supabaseUrl is
required" in 8 test files that build the client at module top level.
Replaced with `||` (which also replaces ""), validated locally with
`VITE_SUPABASE_URL='' VITE_SUPABASE_PUBLISHABLE_KEY='' CI=true npm test`:
240/240 green, previously 232/240.
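The difference between the two operators can be reproduced in isolation. This is a small sketch, assuming a hypothetical `FALLBACK_URL` placeholder; the real fallback value used in the test setup is not shown in this thread.

```typescript
// Hypothetical placeholder; the actual fallback in the repo is not shown here.
const FALLBACK_URL = "http://localhost:54321";

// `??` falls back only on null or undefined, so "" passes through unchanged.
function resolveWithNullish(value: string | undefined): string {
  return value ?? FALLBACK_URL;
}

// `||` falls back on any falsy value, including the empty string.
function resolveWithOr(value: string | undefined): string {
  return value || FALLBACK_URL;
}

console.log(JSON.stringify(resolveWithNullish(""))); // the empty string survives
console.log(JSON.stringify(resolveWithOr("")));      // the fallback is used
```

The trade-off is that `||` would also replace any other falsy value, which is harmless for a URL string but worth keeping in mind for numeric or boolean settings.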
dlq-idempotency.spec.ts imports two `test`s: the one from `@playwright/test` (default, no custom fixtures) and `authTest` from `./fixtures/auth` (with `authenticatedPage`). Test #3 destructured `authenticatedPage` but called the default `test()`, which made Playwright abort collection for the entire shard with:
Test has unknown parameter "authenticatedPage" at dlq-idempotency.spec.ts:217
Switched to `authTest(...)`. The other files in the directory import `test` directly from `./fixtures/auth` (which is already authTest) and do not have the problem.
GitHub runners have 2 cores + ~7GB RAM. Vitest's default fork pool with parallelism caused intermittent flakes in "Unit Tests" on CI: 3434 tests + jsdom + react-testing-library == memory spikes.
In CI:
- pool=forks with singleFork=true: everything runs in a single process, with no heap contention between parallel forks.
- retry=2: tolerates residual race conditions (timers, in-memory realtime pubsub) without needing individual fixes.
Local runs keep the fast default (parallelism, no retry), so the dev loop is unchanged.
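A configuration along these lines would express the CI-only settings described above. It is a sketch, not the repo's actual `vitest.config.ts`: the `isCI` check and the rest of the config are assumptions, and `poolOptions.forks.singleFork` is the option name used by Vitest 1.x to 3.x, so it may differ in other versions.

```typescript
// vitest.config.ts (sketch; the project's real config may differ)
import { defineConfig } from "vitest/config";

// Assumption: CI is detected through the conventional CI=true variable.
const isCI = process.env.CI === "true";

export default defineConfig({
  test: {
    environment: "jsdom",
    // CI only: one forked process and up to 2 retries per failing test.
    // Locally these keys are omitted, so Vitest keeps its parallel defaults.
    ...(isCI
      ? {
          pool: "forks" as const,
          poolOptions: { forks: { singleFork: true } },
          retry: 2,
        }
      : {}),
  },
});
```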
Folded into umbrella PR #32 with conflicts resolved on Generated by Claude Code |
Context
The Evolution API we consume (evolution.atomicabr.com.br) runs v2.3.7 + Baileys 7.0.0-rc.9 (release candidate). Five confirmed upstream issues explain the symptoms: bans on scan, cyclic QR, "deaf" sessions, 515→401 false logout, and a broken archiveChat.
This PR applies the 8 mitigations that depend only on our side (frontend + edge functions). Environment variables on the Evolution server (CONFIG_SESSION_PHONE_VERSION, CACHE_REDIS_ENABLED=false, etc.) are left to the operator of evolution.atomicabr.com.br.
Summary of the 8 improvements
1. 1b5b7e7: Suppress false LOGOUT after stream:error 515
2. aca8278: Auto-restart on "deaf session"
3. 4eeb7ec: sessionPhoneVersion in set-settings
4. 7b0d8d8: syncFullHistory admin-only
5. 325f1db: archive-chat explicit 503
6. d44b8f8: 4 extra webhook events (STATUS_INSTANCE / history v2)
7. 29efe1f: AbortSignal.timeout in health
8. 693b4b3: NEW_JWT_TOKEN handler + last_token_renewed_at column
Details per commit
1. False LOGOUT_INSTANCE after 515 (_shared/evolution-helpers.ts + evolution-webhook-handlers.ts)
- markStream515 / hadRecentStream515 / isConnectionReplaced515
- handleLogoutInstance ignores reasonCode=401 inside the window
- handleConnectionUpdate records the 515 and suppresses the critical "disconnected" alert
2. Auto-restart on "deaf session" (evolution-health/index.ts)
- PUT /instance/restart/{instance} (rate limit 1/h via system_logs.category='auto_restart_deaf_session')
- Covers the case where messages.upsert stops arriving
3. sessionPhoneVersion (evolution-api/index.ts:set-settings)
- Default 2.3000.1033773198 (validated by the community)
- Override via CONFIG_SESSION_PHONE_VERSION or body.sessionPhoneVersion
- .env.example updated
4. syncFullHistory admin-only (InstanceSettingsDialog.tsx)
- Toggle shown only to role admin
- onSave forces false for non-admins
5. archive-chat short-circuits (evolution-api/index.ts)
- 503 ARCHIVE_CHAT_UPSTREAM_DOWN with a versioned envelope
6. Extra events (evolution-api/index.ts:set-webhook + evolution-webhook/index.ts)
- STATUS_INSTANCE, LOGOUT_INSTANCE, NEW_JWT_TOKEN, MESSAGING_HISTORY_SET
- status.instance and messaging.history.set (log only; not processed inline, to avoid exceeding the 60s timeout)
- STATUS_INSTANCE / LOGOUT_INSTANCE in criticalEvents
7. Timeouts (evolution-health/index.ts)
- AbortSignal.timeout(10_000) on all 3 upstream calls
- unreachable distinguished from timeout in the alerts
8. NEW_JWT_TOKEN health signal (evolution-webhook/index.ts + new migration)
- last_token_renewed_at in whatsapp_connections
- connected
- 20260426180846_add_baileys_health_columns.sql
Test plan
- auto_restart_deaf_session in system_logs (1× per hour)
- set-settings with a payload lacking sessionPhoneVersion; confirm via Evolution logs that the server received the default
- As agent, validate that the "Sincronizar histórico" toggle does not appear in InstanceSettingsDialog
- As admin, see the toggle, leave it OFF (default), save, and confirm syncFullHistory:false in the request
- archive-chat: confirm a 503 response with code:"ARCHIVE_CHAT_UPSTREAM_DOWN" (no DLQ entry)
- evolution-health GET with a slow Evolution (>10s): check for *timeout* alerts instead of a hang
- NEW_JWT_TOKEN in webhook_audit_log and an update of whatsapp_connections.last_token_renewed_at
- 20260426180846_add_baileys_health_columns.sql in staging
Pending (Evolution operator, outside this PR)
:latest).
https://claude.ai/code/session_01UCHM93gZ9vcZBfcVUkwT4T
Generated by Claude Code
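For item 7 of the description (timeouts in evolution-health), the distinction between a timed-out call and an unreachable upstream can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `classifyFetchError` and `probe` are hypothetical names, and the sketch relies on `fetch` rejecting with the signal's `TimeoutError` reason when `AbortSignal.timeout` fires, which is the documented behavior in current Deno and Node runtimes.

```typescript
type UpstreamFailure = "timeout" | "unreachable";

// Hypothetical helper: maps a rejected fetch to an alert category.
// AbortSignal.timeout aborts with a DOMException named "TimeoutError";
// anything else (DNS failure, refused connection) is treated as unreachable.
function classifyFetchError(err: unknown): UpstreamFailure {
  const name =
    typeof err === "object" && err !== null && "name" in err
      ? String((err as { name: unknown }).name)
      : "";
  return name === "TimeoutError" ? "timeout" : "unreachable";
}

// Hypothetical probe with the 10s budget mentioned in the description.
async function probe(
  url: string,
  timeoutMs = 10_000,
): Promise<"ok" | "http_error" | UpstreamFailure> {
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
    return res.ok ? "ok" : "http_error";
  } catch (err) {
    return classifyFetchError(err);
  }
}
```

Keeping the classification in a separate pure function makes it testable without a network, and lets the alert text say "timeout" or "unreachable" from a single place.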
Summary by CodeRabbit
New Features
Bug Fixes
Configuration