Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
38 commits
Select commit Hold shift + click to select a range
d8c51bc
docs(logging): design doc — split AuthLog into durable audit + centra…
windischb Jun 3, 2026
1e17c76
docs(logging): register the redesign doc in the dev-docs navigation
windischb Jun 3, 2026
3d2764e
docs(logging): rewrite redesign to the converged projection-over-stre…
windischb Jun 3, 2026
1bf1770
docs(logging): lock the erase-scrub mechanism — DeleteWhere, not re-p…
windischb Jun 3, 2026
1ca2084
docs(logging): make Track B explicitly opt-in / off by default
windischb Jun 3, 2026
cf868b9
feat(audit): Phase 0 — AuditEvents taxonomy + AuthAuditView EventProj…
windischb Jun 3, 2026
6c70595
docs(logging): correct projection type to EventProjection; mark Phase…
windischb Jun 3, 2026
dec92c1
feat(audit): Phase 1 — login-success markers + aggregated known-user …
windischb Jun 3, 2026
ca11501
docs(logging): mark Phase 1 shipped + note magic-link/external streak…
windischb Jun 3, 2026
f601f1f
feat(audit): Phase 2 (GDPR slice) — mask-and-keep erased audit, durab…
windischb Jun 3, 2026
e574dd0
docs(logging): §A.4.2 erase = mask-and-keep, durable via IncludeArchi…
windischb Jun 3, 2026
fdd3c11
feat(audit): Phase 2 — tenant audit read endpoint (GET /api/admin/audit)
windischb Jun 3, 2026
d4d6d0a
docs(logging): Phase 2 read endpoint shipped; AuditSettings/fan-out/U…
windischb Jun 3, 2026
a99ac36
feat(audit): Phase 2 — per-realm audit visibility window (AuditSettings)
windischb Jun 3, 2026
b3977bc
docs(logging): Phase 2 visibility window shipped; only SPA UI + defer…
windischb Jun 3, 2026
8a0f32e
feat(audit): Phase 2 — SPA AuditLogView (tenant audit grid + category…
windischb Jun 3, 2026
4092a6b
docs(logging): Phase 2 SPA AuditLogView shipped + DevTools-verified
windischb Jun 3, 2026
939e1bd
feat(audit): show the actor — resolve user identity at read time (era…
windischb Jun 3, 2026
627b9c8
docs(logging): read-time actor resolution from erasure-masked Applica…
windischb Jun 3, 2026
f4aefbf
feat(audit): Phase 3a — streamless security/ops store scaffold
windischb Jun 3, 2026
33758ca
feat(audit): Phase 3c — migrate streamless "Auth:" sites to typed emit
windischb Jun 3, 2026
1728615
feat(audit): Phase 3b — Security-log read surface + audit-of-the-audi…
windischb Jun 3, 2026
dadda49
feat(audit): Phase 3d — Quartz prune job for the streamless security …
windischb Jun 3, 2026
355b140
feat(audit): Phase 3e — repurpose AuthLogView as the tenant Security …
windischb Jun 3, 2026
db86b81
feat(audit): Phase 3f — delete legacy AuthLog, LIA, tests, docs
windischb Jun 3, 2026
09a12a1
feat(audit): Phase 4a — Serilog → OTLP log export sink (off by default)
windischb Jun 3, 2026
e20504d
feat(audit): Phase 4b/4c — collector redaction-ruleset v1 + end-to-en…
windischb Jun 3, 2026
e3eea2b
feat(audit): Phase 4d — log export/redaction docs + local observabili…
windischb Jun 3, 2026
c1adde7
docs(audit): Phase 4e — mark Phase 4 shipped in the redesign design doc
windischb Jun 3, 2026
352d1ee
feat(audit): Phase 4f — adversarial-review fixes (redaction-ruleset v…
windischb Jun 3, 2026
abd73ce
feat(audit): username source-belt — log user.Id, mask attempted ident…
windischb Jun 3, 2026
bff9c2e
fix(observability): append /v1/<signal> for HttpProtobuf OTLP metrics…
windischb Jun 4, 2026
991b39d
fix(observability): make plaintext OTLP metrics/traces export actuall…
windischb Jun 4, 2026
4790db0
feat(observability): Phase 5 — in-app per-realm live error feed
windischb Jun 4, 2026
5a5582a
refactor(admin): combine Audit + Security into one tabbed /admin/logs…
windischb Jun 4, 2026
b06cf1a
docs(audit): Phase 2 — AuditSettings visibility window is shipped, no…
windischb Jun 4, 2026
3b4c41c
docs(architecture): ADR — persistence model, event sourcing vs. flat …
windischb Jun 4, 2026
cc32ea7
docs(codeql): triage Bucket 3 — masked-PII exposure FPs (PR #51)
windischb Jun 4, 2026
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions dev-docs/.vitepress/config.ts
Original file line number Diff line number Diff line change
Expand Up @@ -80,6 +80,7 @@ export default withMermaid(defineConfig({
{ text: '⭐ Federation v1 — Implementation Spec', link: '/future-features/federation-v1-design' },
{ text: '⭐ Account Lifecycle — Email + Deletion Plan', link: '/future-features/lifecycle-email-deletion-plan' },
{ text: '⭐ Production-Readiness Audit 2026-05-13', link: '/future-features/production-readiness-audit-2026-05-13' },
{ text: '⭐ Logging & Audit Redesign', link: '/future-features/logging-audit-redesign' },
{ text: 'HA / Multi-Instance Readiness', link: '/future-features/ha-multi-instance' },
{ text: 'Realm Backup / Restore / DR', link: '/future-features/realm-backup-restore' },
{ text: 'Enterprise SSO — SAML + LDAP', link: '/future-features/enterprise-sso-saml-ldap' },
Expand Down Expand Up @@ -113,6 +114,7 @@ export default withMermaid(defineConfig({
{ text: 'Overview', link: '/architecture/' },
{ text: 'Authentication slice', link: '/architecture/authentication' },
{ text: 'Authorization slice', link: '/architecture/authorization' },
{ text: 'Persistence model (ES vs. flat docs)', link: '/architecture/persistence-model' },
],
},
{
Expand Down
53 changes: 53 additions & 0 deletions dev-docs/architecture/persistence-model.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
# Persistence model — event sourcing vs. flat documents

> **Status: accepted (2026-06-04).** Modgud is a *hybrid* persistence system by design — event-sourced aggregates **and** flat Marten documents in the same store. This page records when to reach for which on a **new** feature, and the discipline that keeps the hybrid safe as the app grows. **Decision: keep the existing model as-is (no rebuild); choose best-fit per feature going forward.**

## Why this exists

The IAM core was rebuilt on an event-sourced foundation (Marten), and event sourcing (ES) carries a real, recurring tax: projection-vs-aggregation modelling, no cheap targeted per-stream rebuild, source-gen for every `Apply`/`Create`, subclass registration in two places, event evolution via tolerant JSON, and — the big one — the tension between an append-only log and GDPR erasure (mask-bytes-in-place + archive + scrub-the-projection, instead of a plain `DELETE`). That tax is worth paying where a feature genuinely needs ES, and pure overhead where it does not. This doc stops "we event-source here because we event-source everywhere" from becoming the default.

## The decision

**Default to a flat Marten document.** Reach for an event-sourced aggregate only when the feature needs at least one of:

| Trigger | Example |
|---|---|
| **History / "who changed what, when" is a first-class requirement** | the user-aggregate audit trail; OAuth client config changes |
| **Non-trivial invariants / a real state machine** | account lifecycle (active → locked → deactivated → deleted), OAuth grant state |
| **Temporal queries / rebuildable read models** | "what did this look like at time X"; projecting one stream into several views |

If none of these apply — settings, lookups, associations, caches, ephemeral challenges, the streamless security log — use a **flat document**. The friction of ES buys nothing there.

This is *already* how Modgud is built: the user and OAuth aggregates are event-sourced; `ApplicationUser`, `UserSecurityData`, sessions, external links, passkeys, `RealmSettings`, and the streamless `SecurityAuditEntry` store are flat documents. "Best fit per feature" is not a new direction — it is the existing de-facto architecture, written down.

## The safety rules (what keeps a hybrid from breaking)

The recurring worry with mixing ES and plain CRUD is: *a stored event references an entity that lives in another store and was hard-deleted — does replay/projection then dangle?* This is a **design-discipline** problem, not an ES-vs-CRUD problem (any system with references and deletes must answer it — even pure CRUD chooses cascade vs. set-null per foreign key). ES only forces you to confront it earlier, because replay asks "and if the target is gone?". The rules:

1. **Cross-boundary references are IDs, resolved at read time, and must tolerate absence.** A projection or read endpoint that joins to another store shows a tombstone / "unknown" when the target is gone — it never assumes existence and never crashes. There are no hard cross-aggregate foreign keys.
2. **Tombstone — don't hard-delete — anything referenced as a resolution source.** Hard-delete only leaf/secondary data that nothing resolves against. Keep a tombstone for anything other streams or projections point at.
3. **Events are self-contained.** A projection reads the **event**, not a live lookup that can vanish. Either the event carries what the projection needs, or the projection joins at read time (rule 1) — never a hard dependency on a mutable row in another store.
4. **One aggregate = one consistency boundary.** Don't span a single invariant transactionally across an ES aggregate and a flat document and expect referential integrity — across that line it is eventually consistent. What must be valid *together* belongs in the same aggregate.
5. **Don't bake mutable cross-store data into a projection.** A projection should be self-sufficient from its own stream, or join at read time. Baking in another store's mutable field makes the projection silently stale until a rebuild.

## Worked example — the GDPR erase (this discipline, in production)

`GdprService.PerformPermanentEraseAsync` is the hybrid under maximum stress (an entity *must* disappear, but events reference it):

- The user's **event stream** is PII-masked in place (`ApplyEventDataMasking`) then **archived** — kept, hidden from live queries.
- `ApplicationUser` (a flat doc) becomes a **tombstone** (`deleted-{guid}`) — *deliberately not deleted*, so references still resolve (rule 2).
- Seven **streamless secondary docs are hard-deleted** (`UserSession`, `UserSecurityData`, `ExternalClaimsStore`, `StoredPasskeyCredential`, `UserChangeRequest`, `EmailOtpChallenge`, `ExternalIdentityLink`) — nothing resolves against them at read time.
- The `AuthAuditView` projection holds only the pseudonymous `UserId`; at read time it joins to `ApplicationUser` and an erased user surfaces as `deleted-{guid}` (rules 1 + 3). The "external identity linked" audit row survives even though the link doc is gone, because the row was projected from the **event**, not the deleted doc.

No dangling reference, because the discipline is applied: tombstone what is referenced, hard-delete only leaves, resolve tolerantly, keep events self-sufficient.

## What we will NOT do

- **No rebuild of the existing ES aggregates.** The tax is already paid (the machinery exists, the gotchas are documented in `engineering-gotchas/`); ripping it out is high-cost, high-risk, low-ROI. "Was ES the right call?" and "should we change it?" have different answers — the second is *no*.
- **No new event-sourced store just for symmetry.** (The first audit-redesign draft proposed exactly that and it was correctly discarded — see `future-features/logging-audit-redesign.md`.)

## Consequences

- Marten supports event streams **and** documents in one store, session, transaction, and Wolverine outbox — so the hybrid is first-class, not bolted-on, and a single feature can even use both atomically. This is a reason to *stay* on the current foundation rather than migrate.
- New features get the simpler model by default, so the ES tax stops spreading by inertia.
- The five rules above are the price of the hybrid: they must be checked whenever a new reference crosses a store boundary or a new delete path is added.
18 changes: 18 additions & 0 deletions dev-docs/codeql-triage.md
Original file line number Diff line number Diff line change
Expand Up @@ -293,3 +293,21 @@ a valid reason — each finding gets its own audit.
document if the exact invocation matters.
- The post-flip `codeql.yml` change that triggered this:
commit `ebcdb2c`.

## Addendum 2026-06-04 — Bucket 3: `cs/exposure-of-sensitive-information` (masked-PII logging)

The logging/audit-redesign PR (#51) surfaced three new `cs/exposure-of-sensitive-information` alerts (medium). All false positives. **Resolved by dismissal**, not config — see the "why not a config fix" finding below, which is the load-bearing part of this entry (it stops the next maintainer re-investigating).

- **#42 `PendingAdminInviteService.cs:217` (prod):** `logger.LogInformation("… Email={MaskedEmail}", …, LogPiiMasking.MaskEmail(invite.Email))`. The email *is* masked before logging — but `MaskEmail` keeps the first local-part char + the full domain (deliberate, for ops triage), so CodeQL follows the method body, sees the return derived from the input, and flags it. Dismissed **false positive** (the value is masked per policy).
- **#40/#41 `OtelLogsRedactionTests.cs:122/131` (test):** the OTel-redaction E2E test logs a *synthetic* email/IP/JWT through the real collector and then asserts they are absent from the export — it deliberately handles fake PII to prove redaction works. Dismissed **used in test** (the "Tests" bucket this triage already normalises).

### Why not a config fix (verified locally with `gh codeql`, 2026-06-04)

Both "durable" routes were built and tested against a local C# DB before being rejected — record this so it isn't re-attempted:

1. **Models-as-Data (neutral/barrier/summary) does NOT work for this query.** `ExposureOfPrivateInformationQuery.qll` defines its barrier as `isBarrier(node) { node instanceof Sanitizer }` with `Sanitizer` a hardcoded abstract QL class — it never consults MaD. A `neutralModel` for `LogPiiMasking.MaskEmail`/`MaskUsername` loaded as **"unused"** and changed nothing. So you cannot teach this query a sanitizer via a data-extension pack.
2. **A custom query (thin wrapper adding `MaskEmail`/`MaskUsername` as a `Sanitizer` subclass) works but is operationally worse.** It correctly cleared #42 + the other masked sites locally. But it needs a new rule id (`cs/exposure-of-sensitive-information-modgud`) and excluding the built-in to avoid double-reporting — and code-scanning dismissals are keyed to the rule id, so the **~19 already-dismissed** findings (the local DB shows 26 total; CI shows 3 because 23 are dismissed) would re-appear under the new id. The one-time re-dismissal churn + maintaining a forked security query outweighs the benefit for an advisory FP.

**Consequence / standing guidance:** a *future* masked-PII log site (`LogPiiMasking.Mask*` whose result is logged) will flag `cs/exposure-of-sensitive-information` again — dismiss it **false positive** with a one-line "masked via LogPiiMasking; query can't model the sanitizer (Bucket 3)". Only revisit the custom-query route if these become frequent enough that re-dismissing each one is more pain than a one-time 19-alert re-dismissal + a forked query.

If `MaskEmail` is ever changed to leak more (e.g. keep the whole local-part), revisit the model: the point of the neutral model is that the masking is *sufficient*, not that the method name is magic.
Loading
Loading