Merged
76 changes: 76 additions & 0 deletions .githooks/pre-commit
@@ -321,6 +321,82 @@ if [ -n "$ENCODING_ISSUES" ]; then
echo ""
fi

# DIRECT DATABASE ACCESS GUARD
#
# Python services outside core-api must not write directly to the database.
# All mutations must go through core-api internal HTTP endpoints.

DB_WRITE_VIOLATIONS=""

while IFS= read -r staged_file; do
  [ -z "$staged_file" ] && continue

  case "$staged_file" in
    services/billing-service/app/*.py|services/billing-service/app/**/*.py|\
    services/admin-service/app/*.py|services/admin-service/app/**/*.py)
      # Skip test files
      case "$staged_file" in
        */tests/*) continue ;;
      esac

      if [ -f "$staged_file" ]; then
        if grep -Eq 'session\.(add|commit)\(' "$staged_file"; then
          DB_WRITE_VIOLATIONS="$DB_WRITE_VIOLATIONS $staged_file"
        fi
      fi
      ;;
  esac
done <<EOF
$STAGED_FILES
EOF

if [ -n "$DB_WRITE_VIOLATIONS" ]; then
  echo "Direct database write detected in service layer:"
  echo "$DB_WRITE_VIOLATIONS" | tr ' ' '\n' | sed '/^$/d; s/^/ - /'
  echo ""
  echo "billing-service and admin-service must not call session.add() or"
  echo "session.commit() directly. Route all writes through the core-api"
  echo "internal HTTP endpoints (see services/core-api/src/Curvit.Api/Controllers/Internal*)."
  FAILED_CHECKS="$FAILED_CHECKS db-write-violation"
  echo ""
fi

# Core-api controllers (non-Internal) must not inject CurvitDbContext directly.
# Only Internal* controllers are permitted to access the DB context.

CORE_API_CONTEXT_VIOLATIONS=""

while IFS= read -r staged_file; do
  [ -z "$staged_file" ] && continue

  case "$staged_file" in
    services/core-api/src/Curvit.Api/Controllers/*.cs)
      # Allow Internal* controllers: they are the intentional DB adapters
      case "$staged_file" in
        */Internal*.cs) continue ;;
      esac

      if [ -f "$staged_file" ]; then
        if grep -q 'CurvitDbContext' "$staged_file"; then
          CORE_API_CONTEXT_VIOLATIONS="$CORE_API_CONTEXT_VIOLATIONS $staged_file"
        fi
      fi
      ;;
  esac
done <<EOF
$STAGED_FILES
EOF

if [ -n "$CORE_API_CONTEXT_VIOLATIONS" ]; then
  echo "CurvitDbContext injected in a non-Internal controller:"
  echo "$CORE_API_CONTEXT_VIOLATIONS" | tr ' ' '\n' | sed '/^$/d; s/^/ - /'
  echo ""
  echo "Controllers other than Internal* must not depend on CurvitDbContext."
  echo "Introduce a repository interface instead (see Curvit.Application.Interfaces)."
  FAILED_CHECKS="$FAILED_CHECKS db-context-in-controller"
  echo ""
fi

# RESULTS

if [ -n "$FAILED_CHECKS" ]; then
295 changes: 295 additions & 0 deletions docs/content-platform.md
@@ -0,0 +1,295 @@
# Content Platform

This document describes the unified content platform introduced to replace the blog-only model.

Public URL patterns:

- `/content/blog/{slug}` → generic content endpoint for `ContentItems` (`ContentType=Blog`)
- `/blog/{slug}` → legacy `BlogPosts` endpoint (backward-compatible; reads `BlogPosts`)
- `/content/guides/{slug}` → `/guides/{slug}`
- `/content/articles/{slug}` → `/articles/{slug}`
- `/content/cv-for/{slug}` → `/cv-for/{slug}`

**Supported content types**

| Content Type | URL pattern | Description |
|---|---|---|
| `Blog` | `/blog/{slug}` | Time-sensitive blog posts |
| `Guide` | `/guides/{slug}` | Evergreen how-to guides |
| `Article` | `/articles/{slug}` | Long-form articles |
| `CvFor` | `/cv-for/{slug}` | Role-specific CV pages |

---

## Data model

The canonical model is the `ContentItem` entity (C# domain) / `ContentItems` table (PostgreSQL).

### Core fields

| Field | Type | Notes |
|---|---|---|
| `Id` | UUID | Primary key |
| `ContentType` | string(50) | `Blog`, `Guide`, `Article`, `CvFor` |
| `Category` | string(100) | Optional grouping |
| `Title` | string(250) | Required |
| `Slug` | string(250) | Required; unique **per content type** |
| `Excerpt` | text | Short summary |
| `ContentMarkdown` | text | Full body in Markdown |
| `AuthorName` | string(200) | Required |
| `Status` | string(20) | See [Publishing lifecycle](#publishing-lifecycle) |

### SEO fields

| Field | Type |
|---|---|
| `SeoTitle` | string(250) |
| `SeoDescription` | string(500) |
| `CanonicalUrl` | string(500) |
| `TargetKeyword` | string(200) |
| `SearchIntent` | string(100) |

### Publication window

| Field | Type | Notes |
|---|---|---|
| `PublishedAt` | timestamptz | Set when first published |
| `StartDate` | timestamptz | Not visible before this date |
| `ExpiryDate` | timestamptz | Not visible after this date |
| `ReadingTimeMinutes` | int | Optional |

### Content provenance

| Field | Type | Values |
|---|---|---|
| `ContentSource` | string(50) | `Manual`, `AiGenerated`, `AiAssisted`, `Imported` |
| `CreatedBy` | string(200) | User identifier of creator |
| `LastEditedBy` | string(200) | User identifier of last editor |
| `CreatedFromContentId` | UUID | Source item ID when duplicated |
| `AiGenerated` | bool | True if fully AI-generated |
| `AiPromptVersion` | string(100) | Prompt version used (if AI) |

### Audit / workflow

| Field | Type | Notes |
|---|---|---|
| `AuthorId` | string(200) | Identity of the author |
| `ApprovedBy` | string(200) | Reviewer who approved |
| `PublishedBy` | string(200) | Who published it |
| `LastWorkflowActionBy` | string(200) | Who performed the last lifecycle action |
| `LastWorkflowActionAt` | timestamptz | When the last lifecycle action occurred |

### Content clusters

| Field | Type | Notes |
|---|---|---|
| `ClusterId` | UUID | Groups content into a topic cluster |
| `HubPageId` | UUID | Links to a hub/pillar page |
| `PrimaryTopic` | string(200) | E.g. `ATS`, `CV Writing`, `Career Change` |

### Editorial

| Field | Type | Notes |
|---|---|---|
| `EditorialNotes` | text | Internal notes, **never publicly exposed** |

### Timestamps

| Field | Type |
|---|---|
| `CreatedAt` | timestamptz |
| `UpdatedAt` | timestamptz |

---

## Slug uniqueness

Slugs are unique **within a content type**. The same slug value can exist for different content types.

```
/guides/writing-a-cv → Guide with slug "writing-a-cv"
/articles/writing-a-cv → Article with slug "writing-a-cv" (allowed)
```

The database enforces this via a unique index on `(ContentType, Slug)`.
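
A minimal sketch of how a composite unique index produces this behaviour. It uses Python's built-in `sqlite3` with an in-memory database purely as a stand-in for PostgreSQL; the two-column table and the index name `ix_content_type_slug` are illustrative assumptions, not the real schema or migration.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "ContentItems" ("ContentType" TEXT NOT NULL, "Slug" TEXT NOT NULL)'
)
# Hypothetical index name; the real migration may name it differently.
conn.execute(
    'CREATE UNIQUE INDEX ix_content_type_slug ON "ContentItems" ("ContentType", "Slug")'
)


def try_insert(content_type: str, slug: str) -> bool:
    """Insert a row; return False if the (ContentType, Slug) pair already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                'INSERT INTO "ContentItems" ("ContentType", "Slug") VALUES (?, ?)',
                (content_type, slug),
            )
        return True
    except sqlite3.IntegrityError:
        return False


guide_ok = try_insert("Guide", "writing-a-cv")
article_ok = try_insert("Article", "writing-a-cv")  # same slug, different type
duplicate_ok = try_insert("Guide", "writing-a-cv")  # same slug, same type
```

In this sketch the first two inserts succeed and the third is rejected by the index, which matches the rule stated above.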

---

## URL conventions

| Content type | Public URL | List URL |
|---|---|---|
| Blog | `/blog/{slug}` | `/blog` |
| Guide | `/guides/{slug}` | `/guides` |
| Article | `/articles/{slug}` | `/articles` |
| CvFor | `/cv-for/{slug}` | `/cv-for` |

A generic route also exists: `/content/{type-slug}/{slug}` where type-slug is `blog`, `guides`, `articles`, or `cv-for`.
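
The type-slug lookup behind the generic route could look roughly like the sketch below. The names `TYPE_SLUG_TO_CONTENT_TYPE`, `resolve_content_type` and `public_url` are hypothetical, and exact-match (case-sensitive) lookup is an assumption; the service's own mapping may differ.

```python
from typing import Optional

# Hypothetical mapping from URL segment to ContentType value, mirroring the
# URL conventions table; the real lookup in the service may be shaped differently.
TYPE_SLUG_TO_CONTENT_TYPE = {
    "blog": "Blog",
    "guides": "Guide",
    "articles": "Article",
    "cv-for": "CvFor",
}


def resolve_content_type(type_slug: str) -> Optional[str]:
    """Return the ContentType for a URL segment, or None if it is unknown."""
    return TYPE_SLUG_TO_CONTENT_TYPE.get(type_slug)


def public_url(content_type: str, slug: str) -> str:
    """Build the dedicated public URL for an item, e.g. /guides/{slug}."""
    for segment, mapped_type in TYPE_SLUG_TO_CONTENT_TYPE.items():
        if mapped_type == content_type:
            return f"/{segment}/{slug}"
    raise ValueError(f"unknown content type: {content_type}")
```

An unknown segment resolves to `None` here, which a route handler would typically turn into a 404 response.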

---

## Publishing lifecycle

**Status values**

| Status | Description |
|---|---|
| `Draft` | Work in progress, not visible publicly |
| `Review` | Awaiting editorial review, not visible publicly |
| `Published` | Live and publicly visible (subject to date window) |
| `Archived` | Removed from public view |

**Transitions**

```
Draft → Review → Published → Archived
                    ↑___________↓   (re-publish from archived)
```
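
The diagram can be read as a small transition table. The sketch below encodes only the arrows shown; `ALLOWED_TRANSITIONS` and `can_transition` are hypothetical names, and the real service may permit additional moves (for example, sending an item back from `Review` to `Draft`) that the diagram does not show.

```python
# Allowed transitions, read directly off the diagram. Anything not listed
# (for example Review -> Draft) is treated as disallowed in this sketch.
ALLOWED_TRANSITIONS = {
    "Draft": {"Review"},
    "Review": {"Published"},
    "Published": {"Archived"},
    "Archived": {"Published"},  # re-publish from archived
}


def can_transition(current: str, target: str) -> bool:
    """Return True if moving from `current` to `target` follows an arrow."""
    return target in ALLOWED_TRANSITIONS.get(current, set())
```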

A `Published` item is only visible to the public when:
- `Status == "Published"`
- `StartDate` is null **or** `StartDate <= now`
- `ExpiryDate` is null **or** `ExpiryDate >= now`
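
The three conditions above combine into a single predicate. This is a sketch only: `is_publicly_visible` is a hypothetical helper, and it assumes timezone-aware datetimes, with `now` passed in explicitly so the check is easy to test.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_publicly_visible(
    status: str,
    start_date: Optional[datetime],
    expiry_date: Optional[datetime],
    now: datetime,
) -> bool:
    """Apply the three visibility conditions; all of them must hold."""
    if status != "Published":
        return False
    if start_date is not None and start_date > now:
        return False  # scheduled for the future
    if expiry_date is not None and expiry_date < now:
        return False  # already expired
    return True


now = datetime(2026, 6, 1, 12, 0, tzinfo=timezone.utc)
scheduled = is_publicly_visible("Published", now + timedelta(days=1), None, now)
live = is_publicly_visible("Published", None, now + timedelta(days=1), now)
```

Both bounds are inclusive, so an item whose `ExpiryDate` equals `now` is still visible at that instant.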

---

## Content roles

Four roles are available for content lifecycle management:

| Role | Can do |
|---|---|
| `ContentAuthor` | Create drafts, edit own drafts, submit for review |
| `ContentReviewer` | View review queue, approve or request changes |
| `ContentPublisher` | Publish approved content, schedule, unpublish, archive |
| `ContentAdministrator` | Full access to all content and queues |

---

## API endpoints

All endpoints are served by the **cms-service**.

### Public endpoints

| Method | Path | Description |
|---|---|---|
| `GET` | `/blog` | List published blogs |
| `GET` | `/blog/{slug}` | Get published blog by slug |
| `GET` | `/guides` | List published guides |
| `GET` | `/guides/{slug}` | Get published guide by slug |
| `GET` | `/articles` | List published articles |
| `GET` | `/articles/{slug}` | Get published article by slug |
| `GET` | `/cv-for` | List published CV-For pages |
| `GET` | `/cv-for/{slug}` | Get published CV-For page by slug |
| `GET` | `/content/{type}` | Generic list (type = blog/guides/articles/cv-for) |
| `GET` | `/content/{type}/{slug}` | Generic get |

**Query parameters for list endpoints**

| Param | Type | Description |
|---|---|---|
| `category` | string | Filter by category |
| `page` | int (≥1) | Page number (default 1) |
| `page_size` | int (1–100) | Items per page (default 20) |

Public endpoints return only `Published` items within the active date window. Draft, Review and Archived items are never returned.
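
For the `page` and `page_size` parameters, a handler typically converts them into a limit and offset. The sketch below assumes out-of-range values are rejected; the document does not say whether the service rejects or clamps them, and `page_to_limit_offset` is a hypothetical helper.

```python
from typing import Tuple

DEFAULT_PAGE = 1
DEFAULT_PAGE_SIZE = 20
MAX_PAGE_SIZE = 100


def page_to_limit_offset(
    page: int = DEFAULT_PAGE, page_size: int = DEFAULT_PAGE_SIZE
) -> Tuple[int, int]:
    """Validate the documented ranges and return (limit, offset)."""
    if page < 1:
        raise ValueError("page must be >= 1")
    if not 1 <= page_size <= MAX_PAGE_SIZE:
        raise ValueError("page_size must be between 1 and 100")
    return page_size, (page - 1) * page_size
```

With the defaults this yields a limit of 20 and an offset of 0; `page=3, page_size=50` yields a limit of 50 and an offset of 100.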

### Admin endpoints (authentication required)

| Method | Path | Description |
|---|---|---|
| `GET` | `/admin/content` | List all content (all statuses) |
| `GET` | `/admin/content/{id}` | Get item including internal fields |
| `POST` | `/admin/content` | Create a new content item |
| `PUT` | `/admin/content/{id}` | Update a content item |
| `POST` | `/admin/content/{id}/publish` | Publish a content item |
| `POST` | `/admin/content/{id}/archive` | Archive a content item |

**Admin list query parameters**

| Param | Alias | Description |
|---|---|---|
| `content_type` | | Filter by content type |
| `category` | | Filter by category |
| `status` | | Filter by status |
| `search` | | Full-text search on title/excerpt |
| `target_keyword` | | Filter by target keyword |
| `page` | | Page number |
| `page_size` | | Items per page |

### Legacy blog admin endpoints (backward-compatible)

The original blog admin endpoints remain functional and serve the `BlogPosts` table unchanged:

| Method | Path |
|---|---|
| `GET` | `/admin/blog` |
| `GET` | `/admin/blog/{slug}` |
| `POST` | `/admin/blog` |
| `PUT` | `/admin/blog/{id}` |
| `DELETE` | `/admin/blog/{id}` |

---

## Migration approach

The migration `20260530000000_AddContentPlatform` performs the following steps:

1. Creates the `ContentItems` table with all new fields.
2. Copies all rows from `BlogPosts` into `ContentItems` with:
   - `ContentType = 'Blog'`
   - `ContentMarkdown` ← `Content`
   - Status values mapped: `published` → `Published`, `archived` → `Archived`, `review` → `Review`, anything else → `Draft`
   - `ContentSource = 'Manual'`
   - `AiGenerated = false`
   - All SEO, date and slug fields preserved exactly.

The original `BlogPosts` and `BlogSources` tables are **not dropped**. The legacy blog endpoints continue to read from `BlogPosts` to ensure zero-downtime backward compatibility.
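
The status mapping in step 2 amounts to a lookup with a `Draft` fallback. The Python sketch below only illustrates that rule; the migration itself presumably does this in SQL. `LEGACY_STATUS_MAP` and `map_legacy_status` are hypothetical names, and exact lowercase matching is an assumption.

```python
from typing import Optional

# Legacy BlogPosts status -> ContentItems status, as listed in step 2.
LEGACY_STATUS_MAP = {
    "published": "Published",
    "archived": "Archived",
    "review": "Review",
}


def map_legacy_status(legacy_status: Optional[str]) -> str:
    """Map a legacy status; anything unrecognised (including None) becomes Draft."""
    return LEGACY_STATUS_MAP.get(legacy_status, "Draft")
```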

### Verification

After running the migration, confirm:

```sql
SELECT
  (SELECT COUNT(*) FROM "BlogPosts") AS blog_posts_before,
  (SELECT COUNT(*) FROM "ContentItems" WHERE "ContentType" = 'Blog') AS blog_posts_after;
```

Both counts should be equal.

---

## Sitemap

All published content items across all content types should be included in the sitemap. Draft, Review and Archived items must be excluded.

Sitemap paths follow the [URL conventions](#url-conventions) table. The `PublishedAt` date should be used as `<lastmod>` where available, falling back to `UpdatedAt`.
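
The `<lastmod>` fallback can be sketched as follows. `sitemap_lastmod` is a hypothetical helper; it assumes timezone-aware datetimes and emits a date-only value (`YYYY-MM-DD`), which is one of the formats the sitemap protocol accepts.

```python
from datetime import datetime, timezone
from typing import Optional


def sitemap_lastmod(published_at: Optional[datetime], updated_at: datetime) -> str:
    """Prefer PublishedAt, fall back to UpdatedAt; return a YYYY-MM-DD string."""
    chosen = published_at if published_at is not None else updated_at
    # Normalise to UTC so the date does not depend on the server's local zone.
    return chosen.astimezone(timezone.utc).strftime("%Y-%m-%d")
```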

---

## SEO guidance

- `SeoTitle`: 50–60 characters recommended.
- `SeoDescription`: 150–160 characters recommended.
- `CanonicalUrl`: Set explicitly when syndicating content.
- Each `(ContentType, Slug)` combination is unique — canonical URL collisions are prevented at the data layer.
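
The length recommendations above could be checked with a small lint helper in an editor or CI step. This is a sketch under assumptions: `seo_length_warnings` is hypothetical, and it treats the ranges as advisory warnings rather than validation errors.

```python
from typing import List, Optional

# Recommended character ranges from the guidance above.
RECOMMENDED_LENGTHS = {
    "SeoTitle": (50, 60),
    "SeoDescription": (150, 160),
}


def seo_length_warnings(
    seo_title: Optional[str], seo_description: Optional[str]
) -> List[str]:
    """Return one warning per field whose length is outside the recommended range."""
    warnings = []
    for field, value in (("SeoTitle", seo_title), ("SeoDescription", seo_description)):
        if value is None:
            continue  # unset fields are not checked in this sketch
        low, high = RECOMMENDED_LENGTHS[field]
        if not low <= len(value) <= high:
            warnings.append(
                f"{field} is {len(value)} characters; {low}-{high} recommended"
            )
    return warnings
```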

---

## Adding a new content type

1. Add the new type name to `VALID_CONTENT_TYPES` in `services/cms-service/app/models/schemas.py`.
2. Add a URL-segment → type mapping entry to `_SLUG_TO_TYPE` in `services/cms-service/app/routers/content.py`.
3. Optionally add dedicated route functions (e.g. `list_published_new_type`) following the existing pattern.
4. No schema or database changes are required — the `ContentItems` table accommodates any content type string.
5. Update this document.

---

## Security

- Public endpoints never expose `Draft`, `Review` or `Archived` content.
- `EditorialNotes`, `AiPromptVersion` and other internal fields are excluded from public response schemas (`ContentItemSchema`). They are only present in `ContentItemAdminSchema`.
- Admin endpoints require authentication via the existing Curvit auth infrastructure.
- Markdown rendered on the frontend must be sanitised to prevent stored XSS. Use the existing content sanitiser pattern.
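
To illustrate the public/admin split, the sketch below strips internal fields from a plain dict before it is returned publicly. It is a stand-in only: the service uses the response schemas `ContentItemSchema` and `ContentItemAdminSchema`, and `INTERNAL_FIELDS` and `to_public_payload` are hypothetical names. Only the two fields named above are listed, because the full set of internal fields is not enumerated in this document.

```python
# Fields this document names as internal. "Other internal fields" are not
# enumerated in the source, so the set is deliberately limited to these two.
INTERNAL_FIELDS = {"EditorialNotes", "AiPromptVersion"}


def to_public_payload(item: dict) -> dict:
    """Return a copy of `item` without any internal-only fields."""
    return {key: value for key, value in item.items() if key not in INTERNAL_FIELDS}
```

A deny-list like this is easy to forget to update when a new internal field is added; an allow-list response schema, as the public schema is described above, is the safer design.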
@@ -213,7 +213,7 @@ PY
echo "[8/8] Validate compose rendering for dev/staging/prod"
(
cd "${ROOT_DIR}"
-IMAGE_TAG=sha-local docker compose -f docker-compose.yml config >/dev/null
+IMAGE_TAG=sha-local docker compose --env-file .env.example -f docker-compose.yml config >/dev/null
IMAGE_TAG=sha-local STAGING_ENV_FILE=environments/staging/.env.example docker compose \
--env-file environments/staging/.env.example \
-f docker-compose.yml -f docker-compose.staging.yml config >/dev/null
1 change: 1 addition & 0 deletions services/admin-service/app/models/db_models.py
@@ -79,6 +79,7 @@ class DataSubjectRequest(Base):
    received_at = Column("ReceivedAt", DateTime(timezone=True), nullable=False)
    due_at = Column("DueAt", DateTime(timezone=True), nullable=False)
    completed_at = Column("CompletedAt", DateTime(timezone=True), nullable=True)
    notes = Column("Notes", String(2000), nullable=True)


class RetentionSettings(Base):