fix(core): accept URL-safe base64 in ENCRYPTION_KEY validator (prod hotfix)#735
Conversation
PR #692 ("sec: hardening sweep") tightened `getEncryptionKey()` to require strict canonical base64 (`[A-Za-z0-9+/]`). Historical keys generated as `openssl rand -base64 32 | tr +/ -_` use the URL-safe alphabet (`-_`) and the validator started rejecting them — even though they decode to the same 32 bytes. Effect: every code path that calls `encrypt()` / `decrypt()` 500s on prod, including agent session creation. `lobu chat` against any agent in prod returns "Internal server error" with this in app-pod logs: [worker-auth] Error verifying token: ENCRYPTION_KEY must be a canonical base64 or hex encoded 32-byte key. at getEncryptionKey at encrypt at generateWorkerToken at /lobu/api/v1/agents Fix: add a parallel branch that accepts URL-safe base64 (alphabet `[A-Za-z0-9_-]`, no padding), with the same round-trip + length check so typo'd keys are still rejected. No key rotation needed. Test added: 32-byte URL-safe base64 key round-trips encrypt/decrypt.
📝 WalkthroughWalkthroughAdded support for URL-safe base64-encoded AES-256 encryption keys to the encryption utility, with round-trip validation in the key decoding logic. A regression test verifies the feature works end-to-end through encrypt/decrypt cycles. ChangesBase64url encryption key support
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7183404bd8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| // some keys were generated as `openssl rand -base64 32 | tr +/ -_` and stored | ||
| // in this form; same 32 bytes, just a different alphabet. Apply the same | ||
| // round-trip check so typos still get rejected. | ||
| if (/^[A-Za-z0-9_-]+$/.test(key)) { |
There was a problem hiding this comment.
Accept the padded URL-safe keys this hotfix targets
For URL-safe keys generated by the documented historical command openssl rand -base64 32 | tr +/ -_, the trailing base64 padding is preserved (=), so 32-byte keys are 44 chars and end in =. This regex rejects those keys before decoding, and the later round-trip would also compare Node's unpadded base64url output to the padded input, so deployments with the stated production key format will continue throwing from getEncryptionKey() and 500ing on encrypt/decrypt paths.
Useful? React with 👍 / 👎.
|
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/core/src/__tests__/encryption-key-validation.test.ts`:
- Around line 45-52: The test is intermittently passing because
getEncryptionKey() caches the first valid key, so set process.env.ENCRYPTION_KEY
after clearing that cache; either call jest.resetModules() before
requiring/importing encrypt/decrypt or add and call a small exported reset
function (e.g., resetEncryptionKeyCache) that clears the module-level cached key
used by getEncryptionKey() so the URL-safe key is actually parsed during this
test.
In `@packages/core/src/utils/encryption.ts`:
- Around line 37-50: The validation error message that currently claims only
"canonical base64 or hex" are accepted must be updated to mention URL-safe
base64 as well; locate the key validation/throw site in the same function where
cachedKey is set (the URL-safe check using /^[A-Za-z0-9_-]+$/ and
Buffer.from(key, "base64url") is performed) and change the thrown/returned error
text to include "URL-safe base64" (or "base64url") alongside canonical base64
and hex so the error accurately reflects all accepted formats.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 7e6a1262-f422-4d59-84a5-9c924f4ee1d4
📒 Files selected for processing (2)
packages/core/src/__tests__/encryption-key-validation.test.tspackages/core/src/utils/encryption.ts
| test("valid 32-byte URL-safe base64 key round-trips encrypt/decrypt", () => { | ||
| // Historical keys were sometimes generated with `openssl rand -base64 32 | | ||
| // tr +/ -_` and stored in URL-safe form (alphabet [A-Za-z0-9_-], no | ||
| // padding). Same 32 bytes — must be accepted. | ||
| process.env.ENCRYPTION_KEY = Buffer.alloc(32, 99).toString("base64url"); | ||
| const enc = encrypt("urlsafe secret"); | ||
| expect(decrypt(enc)).toBe("urlsafe secret"); | ||
| }); |
There was a problem hiding this comment.
This regression can pass without exercising the URL-safe parsing path.
Because getEncryptionKey() caches the first valid key, this test may reuse the previous test’s base64 key instead of the URL-safe key set on Line 49.
Proposed patch
-import { decrypt, encrypt } from "../utils/encryption";
+import {
+ __resetEncryptionKeyCacheForTests,
+ decrypt,
+ encrypt,
+} from "../utils/encryption";
@@
beforeEach(() => {
originalKey = process.env.ENCRYPTION_KEY;
+ __resetEncryptionKeyCacheForTests();
});
@@
afterEach(() => {
if (originalKey !== undefined) {
process.env.ENCRYPTION_KEY = originalKey;
} else {
delete process.env.ENCRYPTION_KEY;
}
+ __resetEncryptionKeyCacheForTests();
});🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/core/src/__tests__/encryption-key-validation.test.ts` around lines
45 - 52, The test is intermittently passing because getEncryptionKey() caches
the first valid key, so set process.env.ENCRYPTION_KEY after clearing that
cache; either call jest.resetModules() before requiring/importing
encrypt/decrypt or add and call a small exported reset function (e.g.,
resetEncryptionKeyCache) that clears the module-level cached key used by
getEncryptionKey() so the URL-safe key is actually parsed during this test.
| // Try as URL-safe base64 (alphabet [A-Za-z0-9_-], no padding). Historically | ||
| // some keys were generated as `openssl rand -base64 32 | tr +/ -_` and stored | ||
| // in this form; same 32 bytes, just a different alphabet. Apply the same | ||
| // round-trip check so typos still get rejected. | ||
| if (/^[A-Za-z0-9_-]+$/.test(key)) { | ||
| const urlsafeBuffer = Buffer.from(key, "base64url"); | ||
| if ( | ||
| urlsafeBuffer.length === 32 && | ||
| urlsafeBuffer.toString("base64url") === key | ||
| ) { | ||
| cachedKey = urlsafeBuffer; | ||
| return urlsafeBuffer; | ||
| } | ||
| } |
There was a problem hiding this comment.
Update the validation error text to mention URL-safe base64.
Line 66 still says only canonical base64 or hex are accepted, which is now outdated and can mislead incident triage.
Proposed patch
- throw new Error(
- "ENCRYPTION_KEY must be a canonical base64 or hex encoded 32-byte key. " +
- "Generate a valid key with: openssl rand -base64 32 (or openssl rand -hex 32)"
- );
+ throw new Error(
+ "ENCRYPTION_KEY must be a canonical base64, URL-safe base64 (unpadded), or hex encoded 32-byte key. " +
+ "Generate a valid key with: openssl rand -base64 32, openssl rand -base64 32 | tr +/ -_, or openssl rand -hex 32"
+ );Also applies to: 65-68
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/core/src/utils/encryption.ts` around lines 37 - 50, The validation
error message that currently claims only "canonical base64 or hex" are accepted
must be updated to mention URL-safe base64 as well; locate the key
validation/throw site in the same function where cachedKey is set (the URL-safe
check using /^[A-Za-z0-9_-]+$/ and Buffer.from(key, "base64url") is performed)
and change the thrown/returned error text to include "URL-safe base64" (or
"base64url") alongside canonical base64 and hex so the error accurately reflects
all accepted formats.
…fix Slack (#881) Two Slack connections in prod (`slack-lobu-preview`, `1b91933131464c95`) were stuck in `status=error` since 2026-05-13 14:22 UTC with `Failed to resolve secret ref for connection X field "botToken"`. The underlying secret rows were intact in the right org; the failure was a transient boot-time issue caused by the encryption-key parser regression in #692 (a 43-char unpadded URL-safe base64 key was rejected for ~2 days until #735 added the urlsafe branch). But once #735 deployed, the connections did not self-heal — the boot loop only retried `status=active` rows and there was no other path that would flip them back. Three changes to the boot path: 1. `initialize()` now retries `status=error` rows alongside `active` ones. Transient deploy-time failures self-heal on the next boot; on success the error marker is cleared (under the connection's own org context). `stopped` stays operator-driven. 2. The catch block wraps `connectionStore.updateConnection` in `orgContext.run(connection.organizationId, ...)`. Boot has no ALS org id and the postgres store's `saveConnection()` calls `getOrgId()` strict — without the wrap the error-marker write itself throws and the row stays `active`, hiding the failure. 3. `startInstance()` always rebinds to the connection's `organizationId` instead of only when the caller has no org bound. A caller's org that happens to differ from the connection's used to silently win and the secret lookup would miss the right tenant bucket. Webhooks resolved by team_id reach this path with whatever ALS context the public route leaves in place; rebinding unconditionally is the safe default. Tests: `chat-instance-manager-boot.test.ts` drives three red→green cases against PGlite — recovery from a previous boot's `error` state, the error-marking branch persisting the status under the right org, and the cross-tenant rebind in `startInstance()`. All three failed before the fix and pass after. Followup manual SQL flips the existing errored rows back to `active` on prod once the new image rolls (no migration — this is a data fix tied to an already-fixed transient regression, not a schema change).
Why — prod is broken right now
PR #692 ("sec: hardening sweep — encryption-key", merged 2026-05-13) tightened
getEncryptionKey()inpackages/core/src/utils/encryption.tsto require strict canonical base64 ([A-Za-z0-9+/]).Prod's existing
ENCRYPTION_KEYis URL-safe base64 (uses_/-instead of+//) — generated historically withopenssl rand -base64 32 | tr +/ -_. Decodes to the same 32 bytes; the alphabet is the only difference. The new regex rejects it.Effect: every call to
encrypt()/decrypt()500s in prod. App-pod logs show:lobu chatagainst any agent onapp.lobu.aireturnsInternal server error. The food-ordering watchers (lunch-open/lunch-finalize, scheduled Mon-Fri 11:00 / 11:35 UTC inlobu-team) will fail when they fire on Monday — same code path.Fix
Add a parallel branch that accepts URL-safe base64 (
[A-Za-z0-9_-], no padding), with the same round-trip + length check so typo'd keys are still rejected.No key rotation needed — URL-safe and standard base64 decode to identical 32 bytes.
Test plan
bun test packages/core/src/__tests__/encryption-key-validation.test.ts— 7 pass (added one for URL-safe).bun run typecheckclean.lobu chat -c lobu -a food-ordering "ping"againstlobu-teamreturns a real reply (currently 500s).[worker-auth] Error verifying tokenlines stop appearing insummaries-app-lobu-apppod logs.Summary by CodeRabbit
New Features
Tests