Security hardening: package bumps, multi-tenant validator, cert load, lib split#1
Merged
Security hardening: package bumps, multi-tenant validator, cert load, lib split#1
Conversation
… cert load, split Auth lib - Bump Microsoft.Identity.Web 3.5.0 -> 4.8.0 + Microsoft.Identity.Abstractions 12.0.0 (closes GHSA-rpq8-q44m-2rpg). Bump MSAL, OpenTelemetry, Azure.Identity, M.E.* to latest patches. Drop NU1902/NU1903 from WarningsNotAsErrors so future advisories break the build again. - Replace custom multi-tenant IssuerValidator (substring Contains + non-JsonWebToken bypass) with Microsoft.IdentityModel.Validators.AadIssuerValidator, then enforce AllowedTenantIds with exact-match equality. Throw at startup if MultiTenant has empty allow-list. - Ftgo.AccountingService: move sync KV cert download out of DI ctor into IHostedService (StartAsync), switch DefaultAzureCredential -> ManagedIdentityCredential, load with X509KeyStorageFlags.EphemeralKeySet. - DownstreamProbeService: widen catch beyond HttpRequestException so MSAL/KV failures surface in logs and exit code instead of being swallowed. - Split Ftgo.Auth into Ftgo.Auth.Client (no AspNetCore dep, used by workers) and Ftgo.Auth (web-host shell with FrameworkReference + JwtBearer/MIW/PostConfigure/ ProblemDetails/RequireClientApp). Workers no longer pull Microsoft.AspNetCore.App transitively. - Ftgo.ApiGateway: stop double-registering MIW. AddEntraAuth now accepts an optional Action<MicrosoftIdentityWebApiAuthenticationBuilder> for downstream-API wiring. - RequireClientAppAttribute: derive AuthorizeAttribute, hardcode policy in ctor, drop mutable Policy setter, fix lying XML doc. - KitchenService: replace obsolete ManagedIdentityCredential() ctor with ManagedIdentityCredential(ManagedIdentityId.SystemAssigned). - Tests: add EntraAuthJwtPostConfigure coverage (audience pinning, scheme filter, IssuerValidator install/skip, multi-tenant empty-allow-list guard) and a RequireClientAppAttribute shape test. 8 -> 15 passing tests. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
mghabin
added a commit
that referenced
this pull request
May 1, 2026
…st-class tier (#103) ## Why The auto-deployed first cloud tier was named **`dev`**, which conflated two things: 1. The tier where developer-machine work happens (a laptop, no Azure resources, `dotnet run`). 2. The tier where CI auto-deploys every push to \`main\` without a human gate (an Azure RG, real cloud APIs, public ingress). Calling them both \"dev\" hid what was actually running where. The honest name for #2 is **\`ci\`** (built and deployed by CI, no human gate), and #1 is its own first-class tier called **\`local\`** (no Azure, no GH Environment, configured via \`appsettings.Development.json\` + \`dotnet user-secrets\`). Final ladder: **\`local → ci → ppe → prod\`**. \`ppe\` (Microsoft jargon for \"pre-production environment\", analogous to \"staging\" elsewhere) is unchanged. ## What this PR does **Phase A — codebase rename** (commit \`dbd6537\`): - \`infra/bicep/{azure,main,bootstrap}.dev.bicepparam\` → \`.ci.bicepparam\` - \`@allowed([ 'dev', 'ppe', 'prod' ])\` → \`@allowed([ 'ci', 'ppe', 'prod' ])\` in main/azure/bootstrap.bicep - \`devPublicClients\` var → \`ciPublicClients\` (gated on \`environmentName == 'ci'\`) - \`scripts/{bootstrap-env,provision-apps}.sh\`: default ENV is now \`ci\`; \`ENV=dev\` still accepted with a deprecation warning - \`.github/workflows/{cd,cd-cleanup,deploy-env,ci}.yml\`: \`deploy-dev\` job → \`deploy-ci\`, matrix env list, default workflow_dispatch input **Phase B — docs sweep** (commit \`cb6021e\`): - \`docs/environments.md\` rewritten as a **4-tier table** (local / ci / ppe / prod) with explicit \`ASPNETCORE_ENVIRONMENT\` mapping per tier and idle cost - \`docs/run-locally.md\` opening promotes \`local\` to the documented first tier - \`glossary.md\` gains entries for \`ci\`, \`local\`, \`ppe\` with primary-source citations - Mechanical rename across operations.md, cost-zero.md, deploy-cloud.md, sample-setup.md, matrix.md, best-practices.md, README.md, DOCTRINE.md, coverage-map.md **Phase C — live cutover runbook** (commit \`6b39dfe\`): - \`docs/operations.md\` gains a \"Renaming a deployment tier (one-shot live cutover)\" runbook with the **OIDC-safe step order** (provision new tier → migrate \`ENTRA_CONFIG_JSON\` → smoke test → tear down old). The non-obvious bit is FIC subjects: the GitHub OIDC token's \`sub\` claim is matched **verbatim** against the FIC subject, so renaming the GH environment before the FIC fails with \`AADSTS70021\`. **Plus** (commit \`f35c5d2\`): README's stale references to the deleted credential-patterns demo (\`orderservice\` / \`restaurantservice\`) are updated to the live app reg names (\`orders-api\` / \`restaurants-api\`). ## Live cutover not yet executed The PR is **codebase-only**. Live Azure / GitHub changes (provisioning \`rg-ftgo-ci-eastus\`, migrating \`ENTRA_CONFIG_JSON\` to a new \`ci\` GH env, updating FIC subjects, tearing down \`rg-ftgo-dev-eastus\`) are deferred to a one-shot operation following the runbook in this PR (\`docs/operations.md\` → \"Renaming a deployment tier\" section). Until that happens, the \`dev\` tier still works because: - \`scripts/{bootstrap-env,provision-apps}.sh\` accept \`ENV=dev\` as a deprecated alias and translate it to \`ENV=ci\` (with a stderr warning). - The existing \`dev\` GitHub Environment is unchanged. The auto-CD push trigger will fail to find the \`ci\` env after this PR merges until the cutover runs — that's intentional, so the cutover is done deliberately, not as a side-effect of merge. ## Verification - ✅ \`dotnet test EntraAuthPatterns.slnx\` → 66/66 - ✅ \`az bicep build\` on every \`.bicep\` and every \`.bicepparam\` (BCP427 on the params is environment-variable interpolation in CI, expected) - ✅ \`actionlint .github/workflows/*.yml\` clean - ✅ \`markdownlint-cli2\` (CI's pinned version) clean - ✅ The positive-path Restaurants 200 test (\`RestaurantsApp_Allows_AppTokenWithRoleAndAllowListedAzp\` in PR #102) still passes — that addresses the \"we never tested 200\" gap I previously surfaced - 🟡 Live cloud probe deferred to the cutover (see runbook) ## Out of scope (deliberate) - Renaming \`ppe\` → \`staging\` (PPE is Microsoft jargon, kept by preference) - Adding ephemeral PR-preview environments (separate doctrine decision) - Cosmetic rename of the live app reg display names \`ftgo-dev-{apigateway,orders-api,restaurants-api}\` → \`ftgo-ci-…\`. The appIds (the only thing \`ENTRA_CONFIG_JSON\` references) are unchanged. Optional follow-up. --------- Co-authored-by: mghabin <81494213+MohammadGhabin@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
mghabin
added a commit
that referenced
this pull request
May 2, 2026
…rdened defaults) (#110) Audit-driven cleanup wave addressing the user's three complaints: (1) too much narrative commentary, (2) not prod-ready, (3) not following Microsoft prod-code conventions / .NET design patterns / clean code. ## What changed (3 axes) ### 1. Comment cull (user's #1 complaint) Centralized doctrine, removed duplicated narrative essays, kept only the comments that prevent real bugs (security rationale, non-obvious gotchas — max 1-2 lines each). - **Bicep**: `cd-bootstrap.bicep` 16-line role-assignment recovery essay → 2-line pointer; `aca-stack.bicep` 11-line BCP138 paragraph → succinct doc + structured `@metadata`; `app-registrations.bicep` 12-line FIC essay → 5-line block. - **Bash scripts**: `bootstrap-env.sh` 25-line + `provision-apps.sh` 38-line headers → 5-9 line orientation + pointers to `docs/`. - **Authorization doctrine**: 5+ duplicated narrative blocks across `Orders.Api` and `Restaurants.Api` source files → centralized in **DOCTRINE.md** new section *Authorization-policy doctrine (canonical)* covering the two-named-policy rule, the `[Authorize(Roles=...)]` trap with `MapInboundClaims=false`, and the `azp` allow-list rationale. Source files now reference DOCTRINE.md in 1-line summaries. ### 2. DevOps fixes - **`pinact` job rename**: `verify` → `pinact` to match the required-status-check name in the ruleset. **Real bug** — currently dormant because all merges use `--admin` bypass. - **`secrets: inherit` → explicit pass-through**: declared `secrets:` block in `deploy-env.yml`'s `workflow_call`, pass `AZURE_TENANT_ID:` explicitly from `cd.yml`. Defense in depth. - **timeout-minutes** on every job (5-45 min depending on workload). - **Bash hardening**: `IFS=$'\\n\\t'`, `trap ERR`, split SC2155 declare/assign in `provision-apps.sh`. - **Bicep API bumps** (3 of 14 resources were stale): `Microsoft.App/{containerApps,managedEnvironments}` preview → `2025-07-01` GA; `Microsoft.KeyVault/vaults` preview → `2024-11-01` GA. The other 11 resources were already on the latest GA. ### 3. Prod-readiness - **Kestrel hardening**: `ConfigureEntraAuthKestrel()` pins `MaxRequestBodySize=10 MB`, `RequestHeadersTimeout=30s`, `KeepAliveTimeout=65s`, `AddServerHeader=false`. Wired into all 3 web services. Closes a DoS gap where defaults are effectively unlimited. - **HTTPS redirect at app level**: `AddEntraAuthHttpsRedirection()` registers a 308 permanent redirect; `UseHttpsRedirection()` in middleware. Defense in depth alongside ACA's `allowInsecure: false` edge enforcement. - **`EntraAuthRateLimiterOptions` validator**: bounds checks on `PermitLimit`/`AppPermitLimit`/`Window`/`SegmentsPerWindow`. `AddEntraAuthRateLimiter` now fails fast on misconfig instead of silently breaking. - **Test coverage gap closed**: `DownstreamProbeServiceTests` (success-path log + exception-path `ExitCode=1` + `StopApplication`) — the worker startup probe was untested. `EntraAuthRateLimiterOptionsValidatorTests` covers fail-fast behaviour. - **`nuget.config`** with `packageSourceMapping` pinning all packages to `nuget.org` (defense against transitive package-source injection). ## What I deliberately did NOT change - **Comments that prevent real bugs**: security rationale on `MixedClaimsPolicies`, `RequireClientApp`, `EntraAuthSecurityHeadersExtensions`. The audit's first-pass agent argued these are "all justified" — I disagreed for the doctrine duplication but agreed for the security-policy rationale (1-2 lines each, prevents subtle bugs). - **CAF resource naming** (cosmetic). - **Diagnostic settings on every PaaS** (KV + LAW already have them; App Insights doesn't really need them for this sample). - **TimeProvider injection** (premature — no time-sensitive business logic). ## Verification - `dotnet test EntraAuthPatterns.slnx` → **72/72** (was 66; +6 from this PR). - `az bicep build` on every `.bicep` + `.bicepparam` — no errors. - `actionlint .github/workflows/*.yml` — clean. - `shellcheck scripts/*.sh` — clean. ## Out of scope (still tracked separately) - #80 — M.T.P runner migration (upstream blocker, re-checked this session). - #83 — `WebApplicationFactory<Program>` migration for `RestaurantsAppPolicyEndToEndTests`. - #85 — kitchen-worker → ACA Job conversion. Co-authored-by: mghabin <81494213+MohammadGhabin@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes the bucket-1 backlog from the senior-engineer review.
What
NU1902;NU1903fromWarningsNotAsErrorsso future advisories break the build.Contains+ non-JsonWebTokenbypass withAadIssuerValidator+ exact-match allow-list. Throws at startup if MultiTenant has emptyAllowedTenantIds.IHostedServicefor async cert download (out of DI ctor) +ManagedIdentityCredential(no fallback chain) +X509KeyStorageFlags.EphemeralKeySet.catchso MSAL/KV failures surface in logs and exit code.Ftgo.Authsplit intoFtgo.Auth.Client(no AspNetCore dep — referenced by workers) andFtgo.Auth(web shell with FrameworkReference). Workers no longer pullMicrosoft.AspNetCore.Apptransitively.AddEntraAuthnow takes an optionalAction<MicrosoftIdentityWebApiAuthenticationBuilder>for downstream-API wiring; gateway uses one path.RequireClientAppAttribute: deriveAuthorizeAttribute, hardcodePolicy, drop public setter, fix doc.ManagedIdentityCredential()ctor withManagedIdentityId.SystemAssigned.EntraAuthJwtPostConfigureaudience pinning, scheme filter, multi-tenant install/skip, empty-allow-list guard, andRequireClientAppAttributeshape. 8 → 15 passing.Verify