Skip to content

Security hardening: package bumps, multi-tenant validator, cert load, lib split#1

Merged
mghabin merged 1 commit intomainfrom
security-hardening
Apr 24, 2026
Merged

Security hardening: package bumps, multi-tenant validator, cert load, lib split#1
mghabin merged 1 commit intomainfrom
security-hardening

Conversation

@mghabin
Copy link
Copy Markdown
Owner

@mghabin mghabin commented Apr 24, 2026

Closes the bucket-1 backlog from the senior-engineer review.

What

  • Package bumps (closes GHSA-rpq8-q44m-2rpg + restores audit gate)
    • Microsoft.Identity.Web 3.5.0 → 4.8.0
    • Microsoft.Identity.Abstractions 12.0.0 (newly pinned)
    • Microsoft.Identity.Client 4.66.2 → 4.83.3
    • OpenTelemetry.* → 1.15.x
    • Azure.Identity 1.13.1 → 1.21.0
    • System.Security.Cryptography.Xml 10.0.7 (newly pinned)
    • Microsoft.AspNetCore.DataProtection 10.0.7 (closes GHSA-9mv3-2cwr-p262 transitively)
    • Drop NU1902;NU1903 from WarningsNotAsErrors so future advisories break the build.
  • Multi-tenant IssuerValidator: replaced the substring Contains + non-JsonWebToken bypass with AadIssuerValidator + exact-match allow-list. Throws at startup if MultiTenant has empty AllowedTenantIds.
  • AccountingService cert load: IHostedService for async cert download (out of DI ctor) + ManagedIdentityCredential (no fallback chain) + X509KeyStorageFlags.EphemeralKeySet.
  • DownstreamProbeService: widen catch so MSAL/KV failures surface in logs and exit code.
  • Ftgo.Auth split into Ftgo.Auth.Client (no AspNetCore dep — referenced by workers) and Ftgo.Auth (web shell with FrameworkReference). Workers no longer pull Microsoft.AspNetCore.App transitively.
  • ApiGateway double-MIW reg: AddEntraAuth now takes an optional Action<MicrosoftIdentityWebApiAuthenticationBuilder> for downstream-API wiring; gateway uses one path.
  • RequireClientAppAttribute: derive AuthorizeAttribute, hardcode Policy, drop public setter, fix doc.
  • KitchenService: replace obsolete ManagedIdentityCredential() ctor with ManagedIdentityId.SystemAssigned.
  • Tests: add EntraAuthJwtPostConfigure audience pinning, scheme filter, multi-tenant install/skip, empty-allow-list guard, and RequireClientAppAttribute shape. 8 → 15 passing.

Verify

dotnet build EntraAuthPatterns.slnx        # 0 warnings 0 errors
dotnet test  EntraAuthPatterns.slnx        # 15/15 pass
dotnet format --verify-no-changes          # clean

… cert load, split Auth lib

- Bump Microsoft.Identity.Web 3.5.0 -> 4.8.0 + Microsoft.Identity.Abstractions 12.0.0
  (closes GHSA-rpq8-q44m-2rpg). Bump MSAL, OpenTelemetry, Azure.Identity, M.E.* to
  latest patches. Drop NU1902/NU1903 from WarningsNotAsErrors so future advisories
  break the build again.
- Replace custom multi-tenant IssuerValidator (substring Contains + non-JsonWebToken
  bypass) with Microsoft.IdentityModel.Validators.AadIssuerValidator, then enforce
  AllowedTenantIds with exact-match equality. Throw at startup if MultiTenant has
  empty allow-list.
- Ftgo.AccountingService: move sync KV cert download out of DI ctor into
  IHostedService (StartAsync), switch DefaultAzureCredential -> ManagedIdentityCredential,
  load with X509KeyStorageFlags.EphemeralKeySet.
- DownstreamProbeService: widen catch beyond HttpRequestException so MSAL/KV failures
  surface in logs and exit code instead of being swallowed.
- Split Ftgo.Auth into Ftgo.Auth.Client (no AspNetCore dep, used by workers) and
  Ftgo.Auth (web-host shell with FrameworkReference + JwtBearer/MIW/PostConfigure/
  ProblemDetails/RequireClientApp). Workers no longer pull Microsoft.AspNetCore.App
  transitively.
- Ftgo.ApiGateway: stop double-registering MIW. AddEntraAuth now accepts an optional
  Action<MicrosoftIdentityWebApiAuthenticationBuilder> for downstream-API wiring.
- RequireClientAppAttribute: derive AuthorizeAttribute, hardcode policy in ctor,
  drop mutable Policy setter, fix lying XML doc.
- KitchenService: replace obsolete ManagedIdentityCredential() ctor with
  ManagedIdentityCredential(ManagedIdentityId.SystemAssigned).
- Tests: add EntraAuthJwtPostConfigure coverage (audience pinning, scheme filter,
  IssuerValidator install/skip, multi-tenant empty-allow-list guard) and a
  RequireClientAppAttribute shape test. 8 -> 15 passing tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@mghabin mghabin merged commit b7318ef into main Apr 24, 2026
4 checks passed
@mghabin mghabin deleted the security-hardening branch April 24, 2026 18:08
mghabin added a commit that referenced this pull request May 1, 2026
…st-class tier (#103)

## Why

The auto-deployed first cloud tier was named **`dev`**, which conflated
two things:

1. The tier where developer-machine work happens (a laptop, no
   Azure resources, `dotnet run`).
2. The tier where CI auto-deploys every push to \`main\` without a
   human gate (an Azure RG, real cloud APIs, public ingress).

Calling them both \"dev\" hid what was actually running where. The
honest
name for #2 is **\`ci\`** (built and deployed by CI, no human gate),
and #1 is its own first-class tier called **\`local\`** (no Azure, no
GH Environment, configured via \`appsettings.Development.json\` +
\`dotnet user-secrets\`).

Final ladder: **\`local → ci → ppe → prod\`**. \`ppe\` (Microsoft jargon
for \"pre-production environment\", analogous to \"staging\" elsewhere)
is unchanged.

## What this PR does

**Phase A — codebase rename** (commit \`dbd6537\`):

- \`infra/bicep/{azure,main,bootstrap}.dev.bicepparam\` →
\`.ci.bicepparam\`
- \`@allowed([ 'dev', 'ppe', 'prod' ])\` → \`@allowed([ 'ci', 'ppe',
'prod' ])\` in main/azure/bootstrap.bicep
- \`devPublicClients\` var → \`ciPublicClients\` (gated on
\`environmentName == 'ci'\`)
- \`scripts/{bootstrap-env,provision-apps}.sh\`: default ENV is now
\`ci\`; \`ENV=dev\` still accepted with a deprecation warning
- \`.github/workflows/{cd,cd-cleanup,deploy-env,ci}.yml\`:
\`deploy-dev\` job → \`deploy-ci\`, matrix env list, default
workflow_dispatch input

**Phase B — docs sweep** (commit \`cb6021e\`):

- \`docs/environments.md\` rewritten as a **4-tier table** (local / ci /
ppe / prod) with explicit \`ASPNETCORE_ENVIRONMENT\` mapping per tier
and idle cost
- \`docs/run-locally.md\` opening promotes \`local\` to the documented
first tier
- \`glossary.md\` gains entries for \`ci\`, \`local\`, \`ppe\` with
primary-source citations
- Mechanical rename across operations.md, cost-zero.md, deploy-cloud.md,
sample-setup.md, matrix.md, best-practices.md, README.md, DOCTRINE.md,
coverage-map.md

**Phase C — live cutover runbook** (commit \`6b39dfe\`):

- \`docs/operations.md\` gains a \"Renaming a deployment tier (one-shot
live cutover)\" runbook with the **OIDC-safe step order** (provision new
tier → migrate \`ENTRA_CONFIG_JSON\` → smoke test → tear down old). The
non-obvious bit is FIC subjects: the GitHub OIDC token's \`sub\` claim
is matched **verbatim** against the FIC subject, so renaming the GH
environment before the FIC fails with \`AADSTS70021\`.

**Plus** (commit \`f35c5d2\`): README's stale references to the deleted
credential-patterns demo (\`orderservice\` / \`restaurantservice\`) are
updated to the live app reg names (\`orders-api\` /
\`restaurants-api\`).

## Live cutover not yet executed

The PR is **codebase-only**. Live Azure / GitHub changes (provisioning
\`rg-ftgo-ci-eastus\`, migrating \`ENTRA_CONFIG_JSON\` to a new \`ci\`
GH
env, updating FIC subjects, tearing down \`rg-ftgo-dev-eastus\`) are
deferred to a one-shot operation following the runbook in this PR
(\`docs/operations.md\` → \"Renaming a deployment tier\" section). Until
that happens, the \`dev\` tier still works because:

- \`scripts/{bootstrap-env,provision-apps}.sh\` accept \`ENV=dev\` as a
deprecated alias and translate it to \`ENV=ci\` (with a stderr warning).
- The existing \`dev\` GitHub Environment is unchanged. The auto-CD
  push trigger will fail to find the \`ci\` env after this PR merges
  until the cutover runs — that's intentional, so the cutover is
  done deliberately, not as a side-effect of merge.

## Verification

- ✅ \`dotnet test EntraAuthPatterns.slnx\` → 66/66
- ✅ \`az bicep build\` on every \`.bicep\` and every \`.bicepparam\`
(BCP427 on the params is environment-variable interpolation in CI,
expected)
- ✅ \`actionlint .github/workflows/*.yml\` clean
- ✅ \`markdownlint-cli2\` (CI's pinned version) clean
- ✅ The positive-path Restaurants 200 test
(\`RestaurantsApp_Allows_AppTokenWithRoleAndAllowListedAzp\` in PR #102)
still passes — that addresses the \"we never tested 200\" gap I
previously surfaced
- 🟡 Live cloud probe deferred to the cutover (see runbook)

## Out of scope (deliberate)

- Renaming \`ppe\` → \`staging\` (PPE is Microsoft jargon, kept by
preference)
- Adding ephemeral PR-preview environments (separate doctrine decision)
- Cosmetic rename of the live app reg display names
\`ftgo-dev-{apigateway,orders-api,restaurants-api}\` → \`ftgo-ci-…\`.
The appIds (the only thing \`ENTRA_CONFIG_JSON\` references) are
unchanged. Optional follow-up.

---------

Co-authored-by: mghabin <81494213+MohammadGhabin@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
mghabin added a commit that referenced this pull request May 2, 2026
…rdened defaults) (#110)

Audit-driven cleanup wave addressing the user's three complaints: (1)
too much narrative commentary, (2) not prod-ready, (3) not following
Microsoft prod-code conventions / .NET design patterns / clean code.

## What changed (3 axes)

### 1. Comment cull (user's #1 complaint)

Centralized doctrine, removed duplicated narrative essays, kept only the
comments that prevent real bugs (security rationale, non-obvious gotchas
— max 1-2 lines each).

- **Bicep**: `cd-bootstrap.bicep` 16-line role-assignment recovery essay
→ 2-line pointer; `aca-stack.bicep` 11-line BCP138 paragraph → succinct
doc + structured `@metadata`; `app-registrations.bicep` 12-line FIC
essay → 5-line block.
- **Bash scripts**: `bootstrap-env.sh` 25-line + `provision-apps.sh`
38-line headers → 5-9 line orientation + pointers to `docs/`.
- **Authorization doctrine**: 5+ duplicated narrative blocks across
`Orders.Api` and `Restaurants.Api` source files → centralized in
**DOCTRINE.md** new section *Authorization-policy doctrine (canonical)*
covering the two-named-policy rule, the `[Authorize(Roles=...)]` trap
with `MapInboundClaims=false`, and the `azp` allow-list rationale.
Source files now reference DOCTRINE.md in 1-line summaries.

### 2. DevOps fixes

- **`pinact` job rename**: `verify` → `pinact` to match the
required-status-check name in the ruleset. **Real bug** — currently
dormant because all merges use `--admin` bypass.
- **`secrets: inherit` → explicit pass-through**: declared `secrets:`
block in `deploy-env.yml`'s `workflow_call`, pass `AZURE_TENANT_ID:`
explicitly from `cd.yml`. Defense in depth.
- **timeout-minutes** on every job (5-45 min depending on workload).
- **Bash hardening**: `IFS=$'\\n\\t'`, `trap ERR`, split SC2155
declare/assign in `provision-apps.sh`.
- **Bicep API bumps** (3 of 14 resources were stale):
`Microsoft.App/{containerApps,managedEnvironments}` preview →
`2025-07-01` GA; `Microsoft.KeyVault/vaults` preview → `2024-11-01` GA.
The other 11 resources were already on the latest GA.

### 3. Prod-readiness

- **Kestrel hardening**: `ConfigureEntraAuthKestrel()` pins
`MaxRequestBodySize=10 MB`, `RequestHeadersTimeout=30s`,
`KeepAliveTimeout=65s`, `AddServerHeader=false`. Wired into all 3 web
services. Closes a DoS gap where defaults are effectively unlimited.
- **HTTPS redirect at app level**: `AddEntraAuthHttpsRedirection()`
registers a 308 permanent redirect; `UseHttpsRedirection()` in
middleware. Defense in depth alongside ACA's `allowInsecure: false` edge
enforcement.
- **`EntraAuthRateLimiterOptions` validator**: bounds checks on
`PermitLimit`/`AppPermitLimit`/`Window`/`SegmentsPerWindow`.
`AddEntraAuthRateLimiter` now fails fast on misconfig instead of
silently breaking.
- **Test coverage gap closed**: `DownstreamProbeServiceTests`
(success-path log + exception-path `ExitCode=1` + `StopApplication`) —
the worker startup probe was untested.
`EntraAuthRateLimiterOptionsValidatorTests` covers fail-fast behaviour.
- **`nuget.config`** with `packageSourceMapping` pinning all packages to
`nuget.org` (defense against transitive package-source injection).

## What I deliberately did NOT change

- **Comments that prevent real bugs**: security rationale on
`MixedClaimsPolicies`, `RequireClientApp`,
`EntraAuthSecurityHeadersExtensions`. The audit's first-pass agent
argued these are "all justified" — I disagreed for the doctrine
duplication but agreed for the security-policy rationale (1-2 lines
each, prevents subtle bugs).
- **CAF resource naming** (cosmetic).
- **Diagnostic settings on every PaaS** (KV + LAW already have them; App
Insights doesn't really need them for this sample).
- **TimeProvider injection** (premature — no time-sensitive business
logic).

## Verification

- `dotnet test EntraAuthPatterns.slnx` → **72/72** (was 66; +6 from this
PR).
- `az bicep build` on every `.bicep` + `.bicepparam` — no errors.
- `actionlint .github/workflows/*.yml` — clean.
- `shellcheck scripts/*.sh` — clean.

## Out of scope (still tracked separately)

- #80 — M.T.P runner migration (upstream blocker, re-checked this
session).
- #83 — `WebApplicationFactory<Program>` migration for
`RestaurantsAppPolicyEndToEndTests`.
- #85 — kitchen-worker → ACA Job conversion.

Co-authored-by: mghabin <81494213+MohammadGhabin@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant