fix: enforce UTF-8 encoding on all database creation paths by cmeans-claude-dev[bot] · Pull Request #175 · cmeans/mcp-awareness

cmeans-claude-dev · 2026-04-08T17:09:43Z

Summary

- Root cause: the session database inherited SQL_ASCII encoding from template1, causing UnicodeEncodeError on non-ASCII SQL comments (the underlying cause of PR #172, "fix: replace em dash with ASCII in session SQL comment")
- All database creation paths now explicitly specify ENCODING 'UTF8' LC_COLLATE 'C' LC_CTYPE 'C' TEMPLATE template0
- Externalized the CREATE DATABASE SQL to sql/session_create_database.sql
- Runtime guard validates that the SQL file contains exactly one {} placeholder; a stray {} in a comment had silently broken psycopg.sql.SQL().format() via an IndexError swallowed by the exception handler
- Docker Compose files (main, demo, oauth) set POSTGRES_INITDB_ARGS for UTF-8 cluster init
- Updated deployment docs: pg_hba rules use all databases, and CREATE DATABASE commands include encoding
- Benchmarks updated to match

QA

Prerequisites

pip install -e ".[dev]"
Deploy to test instance on alternate port (AWARENESS_PORT=8421)

Manual tests (via MCP tools)

- Auto-created session DB is UTF-8 — verified via test_creates_database_if_missing (checks encoding=6/UTF8, collate=C)
- Externalized SQL loads correctly — session_create_database.sql exists, 1 placeholder, tests pass
- Test passes — pytest tests/test_session_registry.py — 53/53 pass
- Guard test passes — TestCreateDatabaseSqlIntegrity — both placeholder validation tests pass
- pg_hba docs updated — deployment plan shows all databases with explanation
- CHANGELOG not needed — infrastructure hardening, no user-facing change
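
For reference, the encoding check in the first item can be sketched roughly as below. This is a minimal illustration, not the test from the repository: `VERIFY_QUERY` and `is_utf8_with_c_collation` are hypothetical names, and the sketch assumes the test reads the `encoding` and `datcollate` columns of `pg_database`, where encoding ID 6 corresponds to UTF8.

```python
from typing import Tuple

# PostgreSQL stores the database encoding as an integer ID; 6 is UTF8.
PG_UTF8_ENCODING_ID = 6

# Hypothetical query a test could run against the "postgres" database.
VERIFY_QUERY = "SELECT encoding, datcollate FROM pg_database WHERE datname = %s"


def is_utf8_with_c_collation(row: Tuple[int, str]) -> bool:
    """Return True if a (encoding, datcollate) row is UTF8 with 'C' collation."""
    encoding_id, collate = row
    return encoding_id == PG_UTF8_ENCODING_ID and collate == "C"


# Example rows: a UTF8/C database and an SQL_ASCII/C database (encoding ID 0).
utf8_ok = is_utf8_with_c_collation((6, "C"))
ascii_ok = is_utf8_with_c_collation((0, "C"))
```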

🤖 Generated with Claude Code

The session database inherited SQL_ASCII from template1, causing UnicodeEncodeError on non-ASCII SQL comments. Fix all creation paths:

- session_registry: externalize CREATE DATABASE SQL with explicit ENCODING/LC_COLLATE/LC_CTYPE and TEMPLATE template0
- docker-compose (main, demo, oauth): add POSTGRES_INITDB_ARGS
- benchmarks: add encoding to bench DB creation
- docs: update manual CREATE DATABASE commands in deployment plan and session persistence spec
- tests: verify auto-created database has UTF-8 encoding

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
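
The failure mode described above can be reproduced without a database. The sketch below assumes the driver encodes query text with a codec derived from the connection's encoding (ASCII for SQL_ASCII); the SQL text and the `try_encode` helper are illustrative only and are not taken from the repository.

```python
from typing import Optional

# Illustrative SQL whose comment contains an em dash (U+2014).
sql_with_comment = "-- create the session database \u2014 once\nSELECT 1"


def try_encode(text: str, codec: str) -> Optional[bytes]:
    """Return the encoded bytes, or None if the codec cannot represent the text."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


# ASCII cannot represent U+2014, so encoding fails; UTF-8 succeeds.
ascii_result = try_encode(sql_with_comment, "ascii")
utf8_result = try_encode(sql_with_comment, "utf-8")
```

This is why replacing the em dash (PR #172) only hid the problem: any later non-ASCII character in the SQL would have failed the same way until the database encoding itself was fixed.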

The awareness user needs access to awareness, awareness_sessions, and postgres (for _ensure_database auto-create). Using per-database pg_hba rules caused connection failures when adding new databases. Updated deployment plan, design spec, and session persistence spec. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

en_US.UTF-8 is not available on all Postgres containers (CI testcontainers failed). C.UTF-8 is universally available and still provides UTF-8 encoding. Updated SQL, docker-compose files, benchmarks, docs, and tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

C.UTF-8 locale is not available on all Postgres environments (CI testcontainers failed). ENCODING 'UTF8' TEMPLATE template0 is sufficient — locale inherits from template0 (C). Manual provisioning docs keep C.UTF-8 since holodeck has it installed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… SQL The right fix is matching the test environment to production, not removing locale from CREATE DATABASE. Configure the testcontainers Postgres with POSTGRES_INITDB_ARGS to initialize the cluster with C.UTF-8, then restore the full CREATE DATABASE with locale. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

C.UTF-8 locale name varies by OS (C.UTF-8 vs C.utf8) causing CREATE DATABASE to fail silently on some Postgres environments. The C locale is universally available and still allows UTF-8 encoding. Session DB stores IDs and timestamps — locale-sensitive collation isn't needed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
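
As a rough sketch of where the locale decision ends up, the statement can be rendered as below. The template text is assembled from the clauses listed in the PR summary, not copied from sql/session_create_database.sql, and `quote_ident` is a hypothetical stand-in for the identifier quoting that the real code delegates to psycopg.sql.

```python
# Hypothetical template built from the clauses named in the PR summary.
CREATE_DATABASE_TEMPLATE = (
    "CREATE DATABASE {} "
    "ENCODING 'UTF8' LC_COLLATE 'C' LC_CTYPE 'C' TEMPLATE template0"
)


def quote_ident(name: str) -> str:
    """Double-quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def render_create_database(db_name: str) -> str:
    """Fill the single placeholder with the quoted database name."""
    return CREATE_DATABASE_TEMPLATE.format(quote_ident(db_name))


statement = render_create_database("awareness_sessions")
```

TEMPLATE template0 matters here because PostgreSQL only allows a new database to use an encoding or locale different from its template when that template is template0.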

The comment '-- The {} placeholder is formatted...' contained a literal {} which psycopg.sql.SQL().format() treated as a second placeholder, causing IndexError silently swallowed by _ensure_database. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
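
The same class of failure can be shown with Python's built-in `str.format`, used here as a stand-in for psycopg.sql.SQL().format() (the PR reports the same IndexError from psycopg). The comment text in `sql_text` is illustrative and is not the original comment from the SQL file.

```python
# Two {} fields: one in the comment, one in the statement.
sql_text = (
    "-- Note: {} is replaced with the database name\n"
    "CREATE DATABASE {} ENCODING 'UTF8' TEMPLATE template0"
)


def format_or_error(template: str, *args: str) -> str:
    """Format the template, or return a description of the IndexError raised."""
    try:
        return template.format(*args)
    except IndexError as exc:
        return "IndexError: {}".format(exc)


# Only one argument is supplied, so the second field has nothing to consume.
result = format_or_error(sql_text, '"awareness_sessions"')
```

If the caller catches broad exceptions around this call, as the PR describes for _ensure_database, the error never surfaces and the database is simply not created.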

psycopg.sql.SQL().format() treats ALL {} as placeholders, including those in SQL comments. A stray {} in a comment silently broke database creation via IndexError swallowed by the exception handler. Added:

- Runtime guard in _ensure_database: validates exactly 1 placeholder
- Test: asserts session_create_database.sql has exactly 1 {}

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
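
A guard of this kind might look roughly like the sketch below. It is not the code from session_registry.py: `count_placeholders` and `ensure_database` are hypothetical names, the `execute` callable stands in for the database cursor, and returning False on a bad template is an assumption about what failing safely means here.

```python
from string import Formatter
from typing import Callable


def count_placeholders(sql_text: str) -> int:
    """Count replacement fields ({} and named/numbered variants) in the text."""
    return sum(
        1 for _, field, _, _ in Formatter().parse(sql_text) if field is not None
    )


def ensure_database(
    sql_text: str, db_name: str, execute: Callable[[str], None]
) -> bool:
    """Run the CREATE DATABASE statement only if the template is well-formed."""
    if count_placeholders(sql_text) != 1:
        # Refuse to format: a stray {} would otherwise raise IndexError later.
        return False
    quoted = '"' + db_name.replace('"', '""') + '"'
    execute(sql_text.format(quoted))
    return True


# Usage: a list's append method acts as a fake executor that records statements.
executed = []
good = ensure_database("CREATE DATABASE {}", "sessions", executed.append)
bad = ensure_database("-- {} note\nCREATE DATABASE {}", "sessions", executed.append)
```

Checking the count before formatting turns a swallowed IndexError into an explicit, testable condition, which is what the multi-placeholder injection test exercises.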

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

codecov · 2026-04-08T18:54:16Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Injects SQL with two {} placeholders into the cache and verifies _ensure_database fails safely without creating the database. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cmeans · 2026-04-08T19:15:56Z

Adding QA Active — reviewing UTF-8 enforcement.

cmeans

QA Review — Round 1

Thorough root-cause fix for the SQL_ASCII encoding issue. The {} placeholder guard is smart — catches the exact class of bug that caused the original problem (stray {} in comments silently breaks psycopg.sql.SQL().format()).

Code review

| Area | Verdict |
| --- | --- |
| `session_create_database.sql`: UTF-8, template0, portable `'C'` locale | ✅ |
| Placeholder guard (exactly 1 `{}`) in `_ensure_database` | ✅ |
| SQL file comment explains `'C'` vs `'C.UTF-8'` portability choice | ✅ |
| Docker Compose: `POSTGRES_INITDB_ARGS` for UTF-8 cluster init (3 files) | ✅ |
| Deployment plan: `pg_hba.conf` widened to `all` databases, CREATE DATABASE has encoding | ✅ |
| Design spec: updated pg_hba explanation, deployment steps re-numbered | ✅ |
| Session persistence spec: CREATE DATABASE has encoding, pg_hba prerequisite added | ✅ |
| Benchmarks: CREATE DATABASE has encoding | ✅ |
| Test: `test_creates_database_if_missing` now verifies encoding=UTF8 and collate=C | ✅ |
| Test: `TestCreateDatabaseSqlIntegrity`: placeholder count guard + multi-placeholder rejection | ✅ |
| 53/53 session tests pass | ✅ |
| 721/721 full suite pass | ✅ |
| CI: all green | ✅ |

Findings

1. [Substantive] README test count is 719, should be 721

2 new tests (TestCreateDatabaseSqlIntegrity) bring the total to 721.

2. [Observation] Locale: code uses 'C', docs use 'C.UTF-8'

The SQL file correctly uses 'C' for portability (explained in comment). The deployment plan and session persistence spec use 'C.UTF-8' for the holodeck (Debian). This means the auto-created session DB gets 'C' while manually created databases get 'C.UTF-8'. Acceptable for session data (IDs/timestamps), but noting the difference.

PR checkboxes

| # | Test | Result |
| --- | --- | --- |
| 1 | Auto-created session DB is UTF-8 | ✅ Verified via `test_creates_database_if_missing` (checks encoding=6, collate=C) |
| 2 | Externalized SQL loads correctly | ✅ File exists, 1 placeholder, tests pass |
| 3 | UTF-8 test passes | ✅ `53 passed` |
| 4 | Guard test passes | ✅ Both integrity tests pass |
| 5 | pg_hba docs updated | ✅ `all` databases, with explanation |
| 6 | CHANGELOG not needed | ✅ Infrastructure hardening |

Verdict

Finding #1 (test count) needs a fix. Finding #2 is an observation, non-blocking.

cmeans · 2026-04-08T19:17:57Z

Applying QA Failed — README test count 719→721. All checkboxes checked. One observation (locale C vs C.UTF-8), non-blocking.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cmeans-claude-dev · 2026-04-08T19:18:45Z

Dev response

QA finding: test count 719 → 721

Fixed in c39cc24 — updated both occurrences in README

Ready for re-review.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cmeans-claude-dev · 2026-04-08T19:20:13Z

Dev response (Round 2)

QA finding 1: README test count 719 → 721

Fixed in c39cc24

QA finding 2: locale mismatch — docs say C.UTF-8, code uses C

Fixed in 62ba869 — deployment plan and session persistence spec now use LC_COLLATE 'C' to match the code

Ready for re-review.

cmeans

QA Review — Round 2

Both findings fixed:

| # | Finding | Fix |
| --- | --- | --- |
| 1 | README test count 719→721 | ✅ |
| 2 | Locale: docs used `'C.UTF-8'`, code used `'C'` | ✅ All aligned to `'C'` (deployment plan + session persistence spec) |

CI green. Zero new findings. Verdict: Pass — ready for signoff.

cmeans · 2026-04-08T19:25:24Z

Applying Ready for QA Signoff — test count and locale drift both fixed, CI green, zero findings.

cmeans

LGTM

cmeans-claude-dev Bot added the "Dev Active" label (Developer is actively working on this PR; QA should not start) on Apr 8, 2026

github-actions Bot added the "Awaiting CI" label (Dev complete, waiting for CI/Codecov to pass before QA) on Apr 8, 2026

cmeans-claude-dev[bot] and others added 3 commits April 8, 2026 13:20

cmeans force-pushed the fix/database-utf8-encoding branch from d9fa71d to e376e66 Compare April 8, 2026 18:20

cmeans-claude-dev[bot] and others added 6 commits April 8, 2026 13:27

style: ruff format session_registry.py

62af823

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

test: cover placeholder guard with multi-placeholder injection test

b2926e2

Injects SQL with two {} placeholders into the cache and verifies _ensure_database fails safely without creating the database. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cmeans-claude-dev Bot added the "Ready for QA" label (Dev work complete; QA can begin review) and removed the "Awaiting CI" and "Dev Active" labels on Apr 8, 2026

cmeans added the "QA Active" label (QA is actively reviewing; Dev should not push changes) and removed the "Ready for QA" label on Apr 8, 2026

cmeans reviewed Apr 8, 2026

cmeans added the "QA Failed" label (QA found issues; needs dev attention) and removed the "QA Active" label on Apr 8, 2026

docs: update test count 719 → 721

c39cc24

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cmeans-claude-dev Bot added the "Awaiting CI" label (Dev complete, waiting for CI/Codecov to pass before QA) and removed the "QA Failed" label on Apr 8, 2026

docs: align manual CREATE DATABASE locale with code (C, not C.UTF-8)

62ba869

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions Bot added the "Ready for QA" label (Dev work complete; QA can begin review) and removed the "Awaiting CI" label on Apr 8, 2026

cmeans added the "QA Active" label (QA is actively reviewing; Dev should not push changes) and removed the "Ready for QA" label on Apr 8, 2026

cmeans reviewed Apr 8, 2026

cmeans added the "Ready for QA Signoff" label (QA passed; ready for maintainer final review and merge) and removed the "QA Active" label on Apr 8, 2026

cmeans approved these changes Apr 8, 2026

cmeans added the "QA Approved" label (Manual QA testing completed and passed) and removed the "Ready for QA Signoff" label on Apr 8, 2026

cmeans-claude-dev Bot merged commit d74ce97 into main Apr 8, 2026
35 checks passed

cmeans-claude-dev Bot deleted the fix/database-utf8-encoding branch April 8, 2026 19:29

cmeans mentioned this pull request Apr 9, 2026

release: v0.16.1 #181

Closed

cmeans-claude-dev Bot mentioned this pull request Apr 9, 2026

release: v0.16.1 #182

Merged

cmeans-claude-dev[bot] merged 12 commits into main from fix/database-utf8-encoding