Skip to content

fix(core): accept URL-safe base64 in ENCRYPTION_KEY validator (prod hotfix)#735

Merged
buremba merged 1 commit into
mainfrom
fix/encryption-key-urlsafe-base64
May 15, 2026
Merged

fix(core): accept URL-safe base64 in ENCRYPTION_KEY validator (prod hotfix)#735
buremba merged 1 commit into
mainfrom
fix/encryption-key-urlsafe-base64

Conversation

@buremba
Copy link
Copy Markdown
Member

@buremba buremba commented May 15, 2026

Why — prod is broken right now

PR #692 ("sec: hardening sweep — encryption-key", merged 2026-05-13) tightened getEncryptionKey() in packages/core/src/utils/encryption.ts to require strict canonical base64 ([A-Za-z0-9+/]).

Prod's existing ENCRYPTION_KEY is URL-safe base64 (uses _/- instead of +//) — generated historically with openssl rand -base64 32 | tr +/ -_. Decodes to the same 32 bytes; the alphabet is the only difference. The new regex rejects it.

Effect: every call to encrypt() / decrypt() 500s in prod. App-pod logs show:

[worker-auth] Error verifying token: ENCRYPTION_KEY must be a canonical base64...
  at getEncryptionKey
  at encrypt
  at generateWorkerToken
  at /lobu/api/v1/agents

lobu chat against any agent on app.lobu.ai returns Internal server error. The food-ordering watchers (lunch-open / lunch-finalize, scheduled Mon-Fri 11:00 / 11:35 UTC in lobu-team) will fail when they fire on Monday — same code path.

Fix

Add a parallel branch that accepts URL-safe base64 ([A-Za-z0-9_-], no padding), with the same round-trip + length check so typo'd keys are still rejected.

No key rotation needed — URL-safe and standard base64 decode to identical 32 bytes.

Test plan

  • bun test packages/core/src/__tests__/encryption-key-validation.test.ts — 7 pass (added one for URL-safe).
  • bun run typecheck clean.
  • Post-deploy: lobu chat -c lobu -a food-ordering "ping" against lobu-team returns a real reply (currently 500s).
  • Post-deploy: [worker-auth] Error verifying token lines stop appearing in summaries-app-lobu-app pod logs.

Summary by CodeRabbit

  • New Features

    • Encryption keys now support URL-safe base64 format in addition to existing formats.
  • Tests

    • Added regression test to validate URL-safe base64 encryption key handling with round-trip encryption and decryption.

Review Change Stack

PR #692 ("sec: hardening sweep") tightened `getEncryptionKey()` to require
strict canonical base64 (`[A-Za-z0-9+/]`). Historical keys generated as
`openssl rand -base64 32 | tr +/ -_` use the URL-safe alphabet (`-_`) and
the validator started rejecting them — even though they decode to the same
32 bytes.

Effect: every code path that calls `encrypt()` / `decrypt()` 500s on prod,
including agent session creation. `lobu chat` against any agent in prod
returns "Internal server error" with this in app-pod logs:

  [worker-auth] Error verifying token: ENCRYPTION_KEY must be a canonical
  base64 or hex encoded 32-byte key.
    at getEncryptionKey
    at encrypt
    at generateWorkerToken
    at /lobu/api/v1/agents

Fix: add a parallel branch that accepts URL-safe base64 (alphabet
`[A-Za-z0-9_-]`, no padding), with the same round-trip + length check so
typo'd keys are still rejected. No key rotation needed.

Test added: 32-byte URL-safe base64 key round-trips encrypt/decrypt.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 15, 2026

📝 Walkthrough

Walkthrough

Added support for URL-safe base64-encoded AES-256 encryption keys to the encryption utility, with round-trip validation in the key decoding logic. A regression test verifies the feature works end-to-end through encrypt/decrypt cycles.

Changes

Base64url encryption key support

Layer / File(s) Summary
URL-safe base64 key validation and testing
packages/core/src/utils/encryption.ts, packages/core/src/__tests__/encryption-key-validation.test.ts
getEncryptionKey() now accepts base64url-encoded 32-byte keys with canonical round-trip validation and caching. A new test case verifies that keys created with Buffer.toString("base64url") successfully encrypt and decrypt plaintext.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

A rabbit hops with base64 glee,
URL-safe keys, validated with care,
Thirty-two bytes in base64url cheer,
Round-trip they dance through cipher's snare,
Secure and tested, ready to share! 🐰🔐

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The PR title clearly and specifically identifies the main fix: accepting URL-safe base64 in the ENCRYPTION_KEY validator as a production hotfix.
Description check ✅ Passed The PR description provides comprehensive context (why, what, effect, fix), includes a detailed test plan with checkboxes, and lists follow-up validation steps. Most required sections are well-documented; post-deploy validation is appropriately deferred.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/encryption-key-urlsafe-base64

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7183404bd8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

// some keys were generated as `openssl rand -base64 32 | tr +/ -_` and stored
// in this form; same 32 bytes, just a different alphabet. Apply the same
// round-trip check so typos still get rejected.
if (/^[A-Za-z0-9_-]+$/.test(key)) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Accept the padded URL-safe keys this hotfix targets

For URL-safe keys generated by the documented historical command openssl rand -base64 32 | tr +/ -_, the trailing base64 padding is preserved (=), so 32-byte keys are 44 chars and end in =. This regex rejects those keys before decoding, and the later round-trip would also compare Node's unpadded base64url output to the padded input, so deployments with the stated production key format will continue throwing from getEncryptionKey() and 500ing on encrypt/decrypt paths.

Useful? React with 👍 / 👎.

@codecov-commenter
Copy link
Copy Markdown

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 70.00000% with 3 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
packages/core/src/utils/encryption.ts 70.00% 3 Missing ⚠️

📢 Thoughts on this report? Let us know!

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/core/src/__tests__/encryption-key-validation.test.ts`:
- Around line 45-52: The test is intermittently passing because
getEncryptionKey() caches the first valid key, so set process.env.ENCRYPTION_KEY
after clearing that cache; either call jest.resetModules() before
requiring/importing encrypt/decrypt or add and call a small exported reset
function (e.g., resetEncryptionKeyCache) that clears the module-level cached key
used by getEncryptionKey() so the URL-safe key is actually parsed during this
test.

In `@packages/core/src/utils/encryption.ts`:
- Around line 37-50: The validation error message that currently claims only
"canonical base64 or hex" are accepted must be updated to mention URL-safe
base64 as well; locate the key validation/throw site in the same function where
cachedKey is set (the URL-safe check using /^[A-Za-z0-9_-]+$/ and
Buffer.from(key, "base64url") is performed) and change the thrown/returned error
text to include "URL-safe base64" (or "base64url") alongside canonical base64
and hex so the error accurately reflects all accepted formats.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 7e6a1262-f422-4d59-84a5-9c924f4ee1d4

📥 Commits

Reviewing files that changed from the base of the PR and between 73eba79 and 7183404.

📒 Files selected for processing (2)
  • packages/core/src/__tests__/encryption-key-validation.test.ts
  • packages/core/src/utils/encryption.ts

Comment on lines +45 to +52
test("valid 32-byte URL-safe base64 key round-trips encrypt/decrypt", () => {
// Historical keys were sometimes generated with `openssl rand -base64 32 |
// tr +/ -_` and stored in URL-safe form (alphabet [A-Za-z0-9_-], no
// padding). Same 32 bytes — must be accepted.
process.env.ENCRYPTION_KEY = Buffer.alloc(32, 99).toString("base64url");
const enc = encrypt("urlsafe secret");
expect(decrypt(enc)).toBe("urlsafe secret");
});
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

This regression can pass without exercising the URL-safe parsing path.

Because getEncryptionKey() caches the first valid key, this test may reuse the previous test’s base64 key instead of the URL-safe key set on Line 49.

Proposed patch
-import { decrypt, encrypt } from "../utils/encryption";
+import {
+  __resetEncryptionKeyCacheForTests,
+  decrypt,
+  encrypt,
+} from "../utils/encryption";
@@
   beforeEach(() => {
     originalKey = process.env.ENCRYPTION_KEY;
+    __resetEncryptionKeyCacheForTests();
   });
@@
   afterEach(() => {
     if (originalKey !== undefined) {
       process.env.ENCRYPTION_KEY = originalKey;
     } else {
       delete process.env.ENCRYPTION_KEY;
     }
+    __resetEncryptionKeyCacheForTests();
   });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/core/src/__tests__/encryption-key-validation.test.ts` around lines
45 - 52, The test is intermittently passing because getEncryptionKey() caches
the first valid key, so set process.env.ENCRYPTION_KEY after clearing that
cache; either call jest.resetModules() before requiring/importing
encrypt/decrypt or add and call a small exported reset function (e.g.,
resetEncryptionKeyCache) that clears the module-level cached key used by
getEncryptionKey() so the URL-safe key is actually parsed during this test.

Comment on lines +37 to +50
// Try as URL-safe base64 (alphabet [A-Za-z0-9_-], no padding). Historically
// some keys were generated as `openssl rand -base64 32 | tr +/ -_` and stored
// in this form; same 32 bytes, just a different alphabet. Apply the same
// round-trip check so typos still get rejected.
if (/^[A-Za-z0-9_-]+$/.test(key)) {
const urlsafeBuffer = Buffer.from(key, "base64url");
if (
urlsafeBuffer.length === 32 &&
urlsafeBuffer.toString("base64url") === key
) {
cachedKey = urlsafeBuffer;
return urlsafeBuffer;
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update the validation error text to mention URL-safe base64.

Line 66 still says only canonical base64 or hex are accepted, which is now outdated and can mislead incident triage.

Proposed patch
-  throw new Error(
-    "ENCRYPTION_KEY must be a canonical base64 or hex encoded 32-byte key. " +
-      "Generate a valid key with: openssl rand -base64 32 (or openssl rand -hex 32)"
-  );
+  throw new Error(
+    "ENCRYPTION_KEY must be a canonical base64, URL-safe base64 (unpadded), or hex encoded 32-byte key. " +
+      "Generate a valid key with: openssl rand -base64 32, openssl rand -base64 32 | tr +/ -_, or openssl rand -hex 32"
+  );

Also applies to: 65-68

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/core/src/utils/encryption.ts` around lines 37 - 50, The validation
error message that currently claims only "canonical base64 or hex" are accepted
must be updated to mention URL-safe base64 as well; locate the key
validation/throw site in the same function where cachedKey is set (the URL-safe
check using /^[A-Za-z0-9_-]+$/ and Buffer.from(key, "base64url") is performed)
and change the thrown/returned error text to include "URL-safe base64" (or
"base64url") alongside canonical base64 and hex so the error accurately reflects
all accepted formats.

@buremba buremba merged commit df759d7 into main May 15, 2026
28 of 29 checks passed
@buremba buremba deleted the fix/encryption-key-urlsafe-base64 branch May 15, 2026 00:36
buremba added a commit that referenced this pull request May 18, 2026
…fix Slack (#881)

Two Slack connections in prod (`slack-lobu-preview`, `1b91933131464c95`) were
stuck in `status=error` since 2026-05-13 14:22 UTC with `Failed to resolve
secret ref for connection X field "botToken"`. The underlying secret rows
were intact in the right org; the failure was a transient boot-time issue
caused by the encryption-key parser regression in #692 (a 43-char unpadded
URL-safe base64 key was rejected for ~2 days until #735 added the urlsafe
branch). But once #735 deployed, the connections did not self-heal — the
boot loop only retried `status=active` rows and there was no other path
that would flip them back.

Three changes to the boot path:

1. `initialize()` now retries `status=error` rows alongside `active` ones.
   Transient deploy-time failures self-heal on the next boot; on success
   the error marker is cleared (under the connection's own org context).
   `stopped` stays operator-driven.

2. The catch block wraps `connectionStore.updateConnection` in
   `orgContext.run(connection.organizationId, ...)`. Boot has no ALS org
   id and the postgres store's `saveConnection()` calls `getOrgId()`
   strict — without the wrap the error-marker write itself throws and the
   row stays `active`, hiding the failure.

3. `startInstance()` always rebinds to the connection's `organizationId`
   instead of only when the caller has no org bound. A caller's org that
   happens to differ from the connection's used to silently win and the
   secret lookup would miss the right tenant bucket. Webhooks resolved
   by team_id reach this path with whatever ALS context the public route
   leaves in place; rebinding unconditionally is the safe default.

Tests: `chat-instance-manager-boot.test.ts` drives three red→green cases
against PGlite — recovery from a previous boot's `error` state, the
error-marking branch persisting the status under the right org, and the
cross-tenant rebind in `startInstance()`. All three failed before the fix
and pass after.

Followup manual SQL flips the existing errored rows back to `active` on
prod once the new image rolls (no migration — this is a data fix tied
to an already-fixed transient regression, not a schema change).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants