diff --git a/CHANGELOG.md b/CHANGELOG.md index 759180e6b..976b66f84 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -6,6 +6,24 @@ All notable changes to this project will be documented in this file. ### Changed +- **@overeng/notion-md**: Compacted the VRS decision log by removing six superseded `.decisions/` records and redirecting every citation to the surviving decision. The reconciler/placeholder design — `0005` inline id-carrying placeholders, `0010` visible-placeholder deletion, `0011` block-level reconciliation, `0014` reconciliation-as-universal-push-engine, `0015` renderer-symmetric Markdown↔block converter — is wholly superseded by **decision 0016** (refuse lossy pages), and the `0013` stateless in-buffer schema fingerprint by **decision 0017** (ephemeral file-engine session, drift detected from the engine's base snapshot). The six records were deleted rather than tombstoned (their rationale lives in `experiments.md`), and all references — the `schema-snapshot.ts` doc-comment, the `01-editor`/`04-fidelity`/`06-data-source` spec/requirements, `experiments.md`, `README.md`, and the surviving decision records `0003`/`0008`/`0009`/`0016`/`0017`/`0019` — were redirected to the superseding decision (as a live link where one is cited, or as historical prose naming the superseded concept). `0016`/`0017` were tightened to state the supersession once, and the duplicated "Refined by 0017" blockquotes on `0003`/`0008`/`0009` trimmed to one line. No decision IDs were renumbered; the six deleted IDs simply disappear (decision log is now 13 records: `0001`–`0004`, `0006`–`0009`, `0012`, `0016`–`0019`). Zero links resolve to a deleted record. Docs-only — no library surface change. + +- **@overeng/notion-md / @overeng/notion-effect-client**: Consolidated the body/Markdown pipeline onto one canonical body form applied at BOTH Notion wire boundaries (decision 0019). 
The block-tree renderer (`treeToMarkdown`) joined every sibling block — including consecutive list items — with `\n\n`, so a tight Notion list pulled as a _loose_ CommonMark list (and a stray indented blank line appeared inside nested lists), while push canonicalized through remark: a two-oracle pull-loose / push-tight divergence that the whitespace-insensitive `semanticEquivalent` gate masked rather than caught. Fix: (1) `canonicalizeBlockMarkdown` now forces lists tight (`spread = false`) and folds line-ending normalization into its input; (2) **pull** is routed through it so pull and push agree by construction — the body a surface reads (`cat`/`edit`/file sync), the body hashed/compared, and the body pushed are the same canonical bytes; (3) `canonicalizeBlockMarkdown` + its remark/unified/unist deps **moved down into `@overeng/notion-effect-client`** beside `treeToMarkdown`/`media-url.ts`, and `observeFromSnapshots` canonicalizes the rendered body once at the source so the evidence fingerprint, the fidelity classifier, pull, hash, and push all see the same bytes (`semanticEquivalent` stays in notion-md as sync policy); (4) the dead client-side `markdownToBlocks` converter (`parseInlineMarkdown`/`parseMarkdownTable`/`parseTableRow`/`isTableSeparator`, plus its `mod.ts` re-exports) was deleted — under refuse-lossy/one-engine push goes through Notion's server-side parse, so it was on no path (no version bump; sole consumer); (5) the divergent duplicate `stripChildAnchors` (sync.ts filter-only vs tree.ts filter+normalize+collapse) deduped onto one definition; (6) the dead `renderedMarkdown ?? endpoint` pull fallback removed (rendered Markdown is total on the pull path; a missing value is now a defect, closing the latent "headings run together" symptom-2 trap). The renderer deliberately stays parseable-not-canonical (its `\n\n` joins must NOT be made block-type-aware). 
Behavior: an already-synced list-bearing page re-canonicalizes loose → tight once on the next pull (benign, base-hash restated); the known Case B residual (paragraph-after-list, #756) is unchanged. `demo/showcase.nmd` re-baselined to the canonical shape. + +- **@overeng/notion-md**: Lifted leaked implementation detail out of four VRS requirements into the owning subsystem specs, where the mechanism already lives. **R45** (01-editor) dropped the `ProgressReporter`/no-op-default-`Layer`/`TaskList`-Layer/lazy-`createTuiApp`/#787-TDZ mechanism and is now a high-level testable constraint: progress instrumentation must be behavior-neutral and zero-cost when non-interactive (a write path is byte-identical with the indicator on or off) and must not reintroduce the umbrella's startup-crash class (#787) — mechanism cited in `01-editor/spec.md` §Sync Progress Indicator, decision 0018. **R31** (04-fidelity) dropped the internal `replaceRemoteBodyVerified` symbol (keeps the guarded-verified-replace + typed-title two-write-body-first constraint), cites `04-fidelity/spec.md` §Push Strategy. **R40** (04-fidelity) dropped the internal `markdownToBlocks`/`parseInlineMarkdown` names (keeps the "no client-side Markdown→block reconstruction; push through Notion's server-side parse" constraint + the live-proven loss list), cites decision 0015 and experiments.md. **R14** (06-data-source) dropped the `schema_snapshot`-comparison mechanism (keeps the refuse/explicit-accept-on-schema-drift, distinct non-`--force`-able exit code, resolve-by-re-pull constraint), cites `06-data-source/spec.md` §Data-Source Binding and Schema Drift. Notion-platform vocabulary (`replace_content`), block-type names, and observable behaviors (stderr/`isTTY`/exit codes) were intentionally kept as testable WHAT. Global requirement IDs unchanged (R01–R46, R42 retired); all internal links resolve. Docs-only — no library surface change. 
+ +- **@overeng/notion-md**: Rebalanced the VRS subsystem boundaries so `03-sync-engine/spec.md` owns the guarded-push engine mechanics that had leaked into its consumers. Moved into `03`: the `update_content` vs `replace_content` write-verb selection, the canonical-base rule (base = the value the first pull emitted, never recomputed locally), the post-push `semanticEquivalent` gate, and settle/re-observe/re-pull — previously narrated in `04-fidelity/spec.md`'s "Push Strategy and Canonical Base" section and `02-file-sync/spec.md`'s push-flow steps 10–11. `02` keeps the file-surface pull/status/push orchestration and now delegates engine internals to `03` via cites; `04` keeps refuse-lossy / classifier / hosted-media and cites `03` where fidelity intersects the push (the `put`-specific two-write title order stays in `01-editor`). Root `spec.md` notes that observability (OTEL), verification, and the Effect-services overview stay cross-cutting at root by design (a future `07-observability` is a possible follow-up). Global requirement IDs unchanged (R01–R46, R42 retired); all internal links resolve. Docs-only — no library surface change. + +- **@overeng/notion-md**: VRS design docs restructured into six numeric layered subsystem dirs (`docs/vrs/01-editor/` … `06-data-source/`). `docs/vrs/spec.md` is now a thin architecture index (Status + Scope + the system-shape/dependency diagram + the subsystem-map table + the cross-cutting OpenTelemetry / Verification / residual long-term-decision lists); the §-section bodies moved into each subsystem's `spec.md`, and the requirement bullets distributed into each subsystem's `requirements.md` PRESERVING the global IDs (R01–R41 / A01–A05 / T01–T08 — one ID per subsystem, never renumbered, so every cross-reference still resolves; R42 stays retired). 
Root `requirements.md` keeps only the cross-cutting surface-boundary / Effect-native / observability / verification requirements + the global Assumptions/Tradeoffs; `vision.md` + `glossary.md` stay whole at root (inherited downward). `docs/vrs/decisions/` renamed to `docs/vrs/.decisions/` (dot-prefixed per `/sk-vrs`; `git mv` preserves history, all 0001–0017 records unchanged). External path references updated (`docs/vrs/README.md`, `impl-delta.md` section pointers). Docs-only — no library surface change. + +- **@overeng/notion-md**: Added the write-path **sync-progress indicator** design (R43–R45, decision 0018). When a write-path command (`edit` save, `put`, file `sync`) runs its multi-round-trip sync — which currently reads as a hang — it surfaces a discrete-stage `@overeng/tui-react` `TaskList` (observe → write-body → write-title → settle; per-row pending/active-spinner/done/failed + `X/total · elapsed`), **not** a fake `%` bar (there is no per-block progress data: `replace_content` is one opaque call and the block-tree pull discovers children by recursive crawl). Mechanism: a `ProgressReporter` Effect service (`Context.Tag`, no-op default Layer) the engine emits purpose-tagged stage events to, so the four near-identical pulls read as distinct human stages; the CLI `TaskList` Layer renders through the TUI seam to stderr, gated on `process.stderr.isTTY`, so `cat`'s stdout stays pure and `… | put > file` degrades to static. The TUI app is constructed lazily inside the handler to avoid re-entering the #787 concurrent-module-load TDZ; scope is the write path only (`cat` excluded). The redundant-4-pulls collapse (#788) is the complementary perf lever. Design-only (decision record + 01-editor requirements/spec + impl-delta Group H); not yet implemented. + +- **@overeng/notion-md**: Add Group G observability + coverage for the editor surface (R21–R24, R29). 
Span-assertion tests drive the real instrumented `cat`/`put`/`edit` paths against a fake gateway through the in-process otelite capture bridge and assert the actually-emitted span shape: `notion-md.cat` (`notion_md.page_id`, `notion_md.editor.mode`, `span.label`), `notion-md.put` (plus `notion_md.put.force`/`body_written`/`title_written`), and `notion-md.edit` (`notion_md.editor.mode`, `notion_md.edit.outcome`) wrapping the engine's `notion-md.sync-page`/`status-page`/`push-page` child spans (decision 0017). An R24 leak guard pushes a sentinel body through the gateway and asserts no captured span attribute carries the body, a signed-URL marker (`X-Amz-`/`Signature=`/`Expires=`), or a `Bearer` token. Adds the previously-untested `edit` conflict-relocation path: a concurrent same-line remote change produces an unmergeable `NmdConflictError`, which `edit` relocates out of `$TMPDIR` to a durable `.conflict.md` carrying base/local/remote bodies. (The spec span table's `nmd.*` attribute shorthand was never implemented; tests assert the real `notion_md.*` keys, divergence noted for separate reconciliation.) + +- **@overeng/notion-cli**: Wire the top-level `notion edit ` marquee alias (R18, decision 0004) — it delegates to the same engine-backed editor session as `notion md edit` via a shared command factory, and is the only first-level command outside the `md`/`schema`/`db` namespaces. `notion md cat|put|edit` remain composed through the existing dispatch. Verified at the type/dispatch level and live through the standalone `notion-md` binary (`cat`/`edit` against a real scratch page). The umbrella _runtime_ is currently blocked by a pre-existing tui-react TDZ at startup (#787); the standalone `notion-md` binary is unaffected. + +- **@overeng/notion-md**: Detect data-source schema drift before a property write via an engine `schema_snapshot` (R14, decision 0017, impl-delta Group F). 
For a data-source-backed page, `pullPage` now retrieves the parent data source (`GET /v1/data_sources/{id}` via `page.parent.data_source_id`) and captures the **writable** property schema into the sidecar `data_source` binding: a canonical projection of `{ name, type, sorted option names }` sorted by property name, options only for `select`/`multi_select`/`status`, hashing **names not ids** (a rename is id-preserving), excluding ids/colors/descriptions/status-groups/timestamps/`created_by`/`last_edited_by`/`request_id`/computed properties. Before any property write the engine re-retrieves the live schema, recomputes the hash, and on drift refuses with `NmdSchemaDriftError` (exit 6) rather than risk Notion silently auto-creating a `select` option for an unknown value name — distinct from the exit-7 value/body conflict and **not** `--force`-able; resolve by re-pulling. This is the file-engine path `edit --frontmatter` reuses (no stateless in-buffer fingerprint, no `put --frontmatter`; the decision-0013 fingerprint subsystem is gone). Standalone (non-data-source) pages have no snapshot and skip the check. A benign color-only schema change does not trip; the five structural mutations (add/remove/rename/retype property, add option) do. Verified with deterministic projection/drift unit tests + fake-gateway engine tests, and live E2E on real Notion: a row's `page.parent` decodes as `data_source_id`, the sidecar binding is captured non-null, a benign property push round-trips, and after a live structural schema mutation a property push refuses with exit 6 without writing. + - **@overeng/utils-dev/otelite**: Resolve the `otelite` binary from `OTELITE_BIN` before falling back to `PATH`, and document the plain-shell Nix workflow for focused wrapper tests. 
- **@overeng/otel-contract**: Add branded/refined OTEL name schemas (`OtelAttributeKey`, `OtelSpanName`, `OtelMetricName`, `OtelServiceName`), validate contract names/keys at definition time, add an Effect `Metric` runtime bridge for schema-first metric contracts, and extend the raw-OTEL lint rule to ban raw Effect `Metric.*` APIs outside approved contract/test boundaries. @@ -25,11 +43,25 @@ All notable changes to this project will be documented in this file. ### Fixed +- **@overeng/notion-effect-client, @overeng/notion-md**: Canonicalize hosted-media URLs so media-bearing bodies are idempotent and pushable (R36, decision 0007). Notion-hosted media (`type: "file"`) renders with an expiring signed S3 URL whose `X-Amz-*`/signature/`Expires` query params rotate on every pull, making the rendered body hash volatile (breaking `cat`→`put` idempotence and staling base hashes with zero edits) and causing `update_content`/`replace_content` pushes on media pages to be rejected by the post-push `semanticEquivalent` gate. A shared `canonicalizeMediaUrl` (and `canonicalizeMediaUrlsInMarkdown`) in `notion-effect-client` strips only the volatile signature/expiry query-param family by name (origin + path + any benign params kept). The renderer applies it in `getBlockUrl`'s Notion-hosted `file.url` branch so pull/`cat` output is deterministic; `canonical-markdown.ts` applies the identical function inside `canonicalizeBlockMarkdown`/`semanticEquivalent` so the gate compares the same canonical form. External (stable) URLs — including ones carrying a benign query param — are left untouched. Verified with deterministic unit tests (signed→canonical, rotated-signature equality, external-with-query untouched) and live E2E on real Notion: a hosted+external-media page's body hash is stable across two no-op pulls and a no-op push is not rejected by the post-push gate. 
+- **@overeng/notion-core, @overeng/notion-md**: Fix silent body-data loss on push for renderable-but-not-round-trip-safe blocks (R38, #785). The body-fidelity classifier (`classifyBodyCompleteness`) previously flagged only API-`unsupported` blocks, so `child_database` (`[embedded db]()`), `table_of_contents` (`[TOC]`), `synced_block`, `child_page`-in-body, and degraded `bookmark`/`embed`/`link_preview`/`breadcrumb`/`link_to_page` classified `complete`. They render to Markdown that Notion re-parses as a plain paragraph on push, so editing an _unrelated_ paragraph and pushing silently destroyed the untouched block (live-proven on real Notion; affected file `sync`, not just the planned editor). The classifier now also flags a curated set of known not-round-trip-safe types (criterion: body-Markdown rendering does not reparse to the same block — a type set because notion-core is pure and the endpoint/independent renderings agree, so a suffix heuristic can't catch them), surfaces the offending types in the lossy verdict, and the shared refusal gate (`assertRemoteMarkdownComplete`) refuses such pages at the pull on every surface (`cat`/`put`/`edit`/file `sync`) with a message naming the block class and pointing to the Notion UI (`NmdRemoteBodyLossyError`, exit 3). `child_page` has a dual role: a child page that is a tree node (its own `.nmd` file) is tolerated via `tolerateTreeChildPages` on the tree path while still being refused as a single page's body block, and any _other_ lossy block on the same tree node is still refused. Verified with deterministic classifier/gate unit + fake-gateway tests and live E2E (lossy page refused at pull, representable page still round-trips). Hosted/external media stays representable (decision 0007). +- **@overeng/notion-cli**: Fix the umbrella `notion` binary crashing at startup for every command with `ReferenceError: Cannot access 'createTuiApp' before initialization` (#787). 
`runRootCli` (`cli.ts`) imports the three command trees concurrently via `Promise.all`, and each schema/db renderer's `app.ts` built its `*App` by calling `createTuiApp(...)` as a module-load side-effect. Under Bun's concurrent async module evaluation that top-level call reached the shared `@overeng/tui-react` module graph while it was still mid-initialization, so a re-exported binding (`createTuiApp`, then its body's `createInterruptedAction`, …) was in the temporal dead zone. Root cause is the module-load side-effect, not a barrel export-order/TDZ bug and not a circular import (verified: the only barrel self-imports are JSDoc; converting `createTuiApp` to a hoisted function only relocated the crash to the next top-level const). Fix: all five renderer `app.ts` modules (`Diff`/`GenerateConfig`/`Generate`/`Introspect`/`Info`) now construct their `*App` lazily via a memoized `get*App()` accessor instead of at module top level, so no `createTuiApp(...)` runs during import; the shared `tokenOption`/`resolveNotionToken` helpers also moved out of `schema/mod.ts` so the concurrently imported `db` tree no longer reads a schema export while that module is still evaluating. The memoization preserves the previous single-instance (one registry/atom set) semantics. Confined to `notion-cli` — no change to shared `@overeng/tui-react`. Regression test (`src/concurrent-import.unit.test.ts`) spawns the umbrella's concurrent `Promise.all` import path under Bun (the binary's runtime) and asserts no TDZ crash; verified RED on the pre-fix code and GREEN after. +- **@overeng/notion-effect-client**: Restrict `canonicalizeMediaUrlsInMarkdown` to known Notion-media hosts so an external signed URL is preserved (PR #786 review, P1). 
The Markdown-string canonicalization path — run by `canonicalizeBlockMarkdown` on both pull and push — previously stripped the volatile `X-Amz-*`/signature/`Expires` params from _any_ signed-looking URL by param name, so an external private-S3 image embedded by URL would lose its load-bearing credentials and, because canonicalization now runs on both wire boundaries, could be surfaced and then persisted broken. Unlike the renderer (`getBlockUrl`, which only canonicalizes `type: "file"` URLs), the string path cannot see `file` vs `external`, so it now gates on a Notion-media host allowlist (`prod-files-secure.s3.us-west-2.amazonaws.com`, `file.notion.so`, `*.notion.so`, `*.notion-static.com`, and the older `s3.../secure.notion-static.com/...` bucket-in-path form); non-Notion hosts are left untouched. `canonicalizeMediaUrl` itself stays host-agnostic (the renderer feeds it only known-hosted `file` URLs, so its idempotence guarantee is unchanged). Regression test: an external signed S3 URL on a different bucket host inside Markdown is preserved verbatim. +- **@overeng/notion-effect-client**: Declare `@overeng/utils` as a runtime dependency (PR #786 review, P2). `config.ts` imports `sha256Hex` (`notionTokenFingerprint`) from `@overeng/utils` in production code, but the package listed it only as a dev/peer dependency, so a standalone install could fail to resolve it at runtime. Moved `utilsPkg` into the generated runtime `dependencies` (`package.json.genie.ts` + `dt genie:run`). The regeneration also reconciled a pre-existing stale generated `package.json` — a leftover external `peerDependencies` block the current genie source no longer emits — which had drifted the lockfile and broke the frozen-lockfile Nix build. +- **@overeng/notion-cli**: Apply the editor exit-code teardown to the umbrella `notion edit` alias (PR #786 review, P2). 
The standalone `notion-md` binary maps tagged editor failures to the scriptable exit codes (3 lossy / 6 schema-drift / 8 abort, …) via a `runMain` teardown, but the umbrella `notion` root used the default teardown, so `notion edit` collapsed those to the generic exit 1 — diverging from `notion-md edit` for scripts. The umbrella now wires the same `editorExitCode` teardown; safe for the non-editor commands (`editorExitCode` falls back to 1 for any unmapped failure and 0 on success, matching the previous default), with Ctrl+C now mapping to 130 consistent with the standalone binary. Runtime e2e of the umbrella binary remains gated on #787. - **@overeng/otelite**: Honor durability-before-ack — flush each export to the kernel before the 200/OK. `tokio::fs::File` buffers writes, so `write_all` alone did NOT guarantee the bytes reached the kernel before the sink acked; an independent reader (or a crash) before the next flush could miss them, contradicting R05 ("flush … before acking") and the `append_line` doc's own "durably reaching the kernel before returning" promise. This surfaced as a CI flake in the `durable_before_ack` gate (a read immediately after the 200 occasionally saw an empty file under thread contention — reproduced ~1/60 at 16 test threads). Fix: `SignalFile::append_line` / `append_json` now `flush()` after `write_all`, before returning. This is a flush, not an fsync — `sync_all` (physical-disk durability) stays deferred to shutdown, so the M2 "no per-export fsync under the lock" throughput decision is preserved. Verified: 0 failures over 200 × 16-thread runs (was ~1/60). - **@overeng/otelite**: Make the HTTP-JSON metrics receive path lossless, fixing two silent data-loss bugs a stress test surfaced. 
The upstream `opentelemetry-proto` `with-serde` deserialize — which the receiver used to BUILD the proto value the sink then re-serialized — silently drops several metric JSON shapes: a `sum`/`gauge` `NumberDataPoint` whose int64 value is the default string form (`"asInt":"7"`) lost its value entirely (captured null), and a regular `histogram` metric was dropped down to `{name,description,unit,metadata}` (its data oneof gone). Both returned HTTP 200 + bumped `counts.metrics` → a silent mis-capture that violates the lossless + "loud, never silent" contracts (decisions/0011). Fix: on the JSON metrics path, `with-serde` still runs purely as the dialect VALIDATOR (Err → 400 + `note_rejected`, gate unchanged), but on success the receiver now persists the VALIDATED RAW JSON body verbatim (re-emitted through `serde_json::Value` via the new `Sink::write_metrics_json`, counting metrics from the JSON structure) instead of the lossy proto re-serialization. Since the body is already canonical OTLP/JSON and `inspect` walks raw JSON, the JSON metrics path is now lossless for string-int64 sums/gauges, regular histograms, AND exponential histograms — the last also RESOLVING the previously-documented exp-histogram-on-JSON limitation for the receive path. Traces/logs JSON paths and all protobuf/gRPC paths are unchanged (already lossless). New gates (real receiver, no mocks): an HTTP-JSON round-trip of a string-int64 sum + histogram + exponential histogram all survive receive → capture → `inspect`; cross-transport equivalence extended to metrics (the same logical string-int64-sum + histogram over HTTP-JSON vs HTTP-protobuf vs gRPC flattens to equivalent `inspect` rows, the proto/gRPC fixture built natively to avoid the lossy `with-serde` source); and a loud-rejection guard that a malformed metrics JSON body still 400s + is captured nowhere. 
KNOWN RESIDUAL: the upstream metrics `with-serde` is more lenient than the trace one, so for metrics the JSON dialect gate is effectively structural (malformed JSON / hard field-type mismatches), tolerating some non-default dialect shapes (numeric int64 nanos, string enums) rather than rejecting them loudly — a stricter metrics dialect gate is a follow-up (#769, #772). ### Added +- **@overeng/notion-md**: Staged sync-progress + drift notes for `notion-md edit` (R43–R45, decisions [0018](packages/@overeng/notion-md/docs/vrs/.decisions/0018-staged-task-list-sync-progress.md)/[0020](packages/@overeng/notion-md/docs/vrs/.decisions/0020-no-live-watch-edit-stays-single-shot.md)). After the editor exits, the push made ~8–10 Notion round-trips emitting only OTEL spans — the terminal was silent and read as a hang. A `ProgressReporter` Effect service (`Context.Tag`, `progress.ts`) now receives purpose-tagged stage transitions the engine emits at the existing push call sites (observe → write-body → write-title → settle, with `write-title` shown as skipped when the title is unchanged), and `edit` wires a **stderr line** renderer on the write path, gated on `process.stderr.isTTY`. The static-line rung is deliberate (decision 0018 sanctions it): `edit` returns from a full-screen editor that owned the TTY, so a mounting `TaskList` TUI would fight the terminal, and lines sidestep the #787 module-load TDZ entirely; the animated `TaskList` Layer is the same Tag's later drop-in. The seam is **behavior-neutral and zero-cost** (R45): the emit helpers use `Effect.serviceOption` (no engine `R`-churn) and swallow all failures and defects (`Effect`), so a write path is byte-identical with the indicator absent, present, or even hostile — pinned by a test running a changed-buffer push with no / capturing / failing reporters and asserting an identical `EditResult`. 
The "+ warn" half reuses the guarded push's **existing** drift outcomes (no background poller, no extra `last_edited_time` pull): when the push auto-merges against a moved remote, or conflicts (exit 7, relocated `.conflict.md`), `edit` emits a visible stderr `note:` line. Live two-way watch mode (push-on-save + live upstream reflection in the editor) was investigated and **rejected** — the editor owning the TTY makes live feedback impossible and live buffer-reload unreachable plugin-free (decision 0020). Scope this pass is the `edit` path; `put`/file-`sync` emit through the engine but do not yet wire a render Layer. +- **@overeng/notion-md**: Test coverage for the canonical-body consolidation (decision 0019, `canonical-markdown.test.ts`). (1) **Two-oracle agreement on canonical inputs**: the system has two oracles for "did the body change" — the raw-hash oracle (`sha256Digest`, used by `classifyPlan` in `tree.ts`; byte-exact, not canonicalization-invariant) and the canon-invariant `semanticEquivalent`. Post-consolidation both wire boundaries route through `canonicalizeBlockMarkdown`, so the oracles must agree for canonical inputs: raw-hash-equal ⟺ semanticEquivalent-true. The test pins this with the **referee being neither oracle** (`canonicalize-then-byte-equal`), over a curated seed set of raw-spelling variants that must converge to identical canonical bytes (emphasis fold, loose/tight list, soft-wrap, bullet-marker style, heading spacing) plus a seeded `fast-check` property over arbitrary bodies, and asserts `canonicalizeBlockMarkdown` idempotence as the "already-canonical" premise. This guards the failure mode a future canonicalization change could silently introduce (e.g. `semanticEquivalent` newly calling two distinct canonical bodies equal would surface as a referee mismatch instead of hiding behind the corrupted oracle). 
(2) **Golden-file fixpoint over `demo/showcase.nmd`**: asserts the committed demo body is already canonical (`canonicalizeBlockMarkdown(demoBody) === demoBody` — a real `sync` would not rewrite it) and that canonicalization is idempotent over it, locking the re-baseline the consolidation performed and catching future drift. Test-only — no library surface change. +- **@overeng/notion-md**: Add `edit --read-only` — open a Notion page in `$VISUAL`/`$EDITOR`/`vi` for inspection only (the terminal analogue of `vim -R` / `git show`). It reuses the exact `cat` presentation (default `# title` + body, or the full `.nmd` envelope with `--frontmatter`) via a shared `projectPageBuffer` projection, writes it to a scoped `$TMPDIR` temp file, opens the editor, then on exit **never** calls any push/write path (no `syncPage`, `replaceRemoteBodyVerified`, or metadata/property writes): every edit is discarded, the temp tree is scope-cleaned, a `read-only: changes were not synced` note is printed to **stderr**, and the session exits 0. This is a deliberately lighter path than `edit` — a single observe/pull, no engine round-trip and no `NmdStateStore` — and ignores the editor's exit code entirely (a non-zero exit is just a clean no-op, never an exit-8 abort, since nothing is pushed). It keeps the SAFE default: the same lossy refusal as `cat`/`edit` (exit 3) at observe time. There is no base-hash/guard machinery (nothing is written), and no `--force` interaction to resolve because `edit` exposes no `--force` (force lives on `put`/`sync`). Wired through the shared `edit` command factory so `notion-md edit`, `notion md edit`, and the top-level `notion edit` alias all gain `--read-only`; emits the `notion_md.edit.outcome=read-only` OTEL attribute on the existing `notion-md.edit` span. 
Covered by fake-gateway e2e tests (no write method invoked — proven by `dieMessage` write paths — edits discarded, temp tree reaped, `--read-only --frontmatter`, non-zero editor exit, lossy refusal) and a span-shape test. OPEN QUESTION: read-only currently refuses a not-round-trip-safe page (exit 3) like `edit`; since it never pushes, it could relax this to let you _inspect_ a lossy page — deferred pending confirmation. +- **@overeng/notion-md**: Add the `cat` / `put` / `edit` editor surfaces (VRS "Editor Surfaces", impl-delta Group A; R30–R39). 
`cat <page> [--frontmatter]` prints a Notion page as default-mode `# <title>` + body to stdout with the title+body base hash on **stderr** (`base-hash: sha256:…`, decisions 0001/0002/0006), or dumps the read-only `.nmd` envelope; `put <page> (--base-hash <h> | --force)` writes the body+title from stdin as two writes body-first (decision 0012) — guarded by default (exit 7 on drift), `--force` concurrency-only (decision 0009); `edit <page> [--frontmatter]` is an ephemeral file-engine session (decision 0017) that pulls into a `$TMPDIR` `.nmd`, opens `$VISUAL`/`$EDITOR`/`vi`, splices the edit back, pushes through the engine with a full-body `replace_content`, relocates any `.conflict.roughdraft.md` to a durable `<page>.conflict.md`, and scope-cleans the temp tree. All three refuse a not-round-trip-safe page at the pull (exit 3, Group C/R38) and share a title↔H1 splice helper with exact-byte round-trip (idempotent `cat`→`put` fixpoint). Distinct, scriptable exit codes (0/1/3/4/5/7/8/9/10) are mapped at the `runMain` teardown after finalizers close. Commands are wired under `notion-md cat|put|edit` and `notion md cat|put|edit`. Hosted-media URL canonicalization (Group B) and `--frontmatter` schema-drift exit 6 (Group F) are deferred; the editor serves the representable non-media majority. Covered by splice/command unit tests, a fake-gateway `edit` e2e, and live Notion E2E (cat→put fixpoint, guarded conflict, edit round-trip, lossy refusal). +- **@overeng/notion-effect-client**: Add `notionTokenFingerprint(token)`, a log-safe identifier of the active Notion integration token, formatted `` `<scheme>…#<8hex>` `` where `<scheme>` is the public token-type prefix (everything up to and including the first `_`, e.g. `ntn_` / `secret_`) and `<8hex>` is the first 8 hex chars of `sha256(token)`. An empty token yields `<none>`; a token with no `_` yields `…#<8hex>` with an empty scheme. 
It emits zero secret bytes beyond the scheme prefix (the body is hashed, never echoed), so it is safe to surface in errors and logs while still distinguishing one credential from another. + +- **@overeng/notion-md**: `NmdGatewayError` now carries a `token_fingerprint` field and appends `` ` [integration token <fingerprint>]` `` to its `message`, computed once per gateway layer via `notionTokenFingerprint(config.authToken)`. This lets a user tell _which_ integration token is active when a Notion API call fails (e.g. a `secrets-run` token resolving to a different integration than expected), without leaking the secret. + - **@overeng/otel-contract**: Add the schema-first OTEL operation and metric-contract DSL (`OtelOperation`, `OtelMetric`, `OtelSpan.withStream`, attribute builders, compiled metadata, `encodeSync`, and checked dynamic span-map annotations) and migrate product instrumentation across the repo off raw `Effect.withSpan` / `Stream.withSpan` / `Effect.annotateCurrentSpan` and normal `unsafe*` contract calls. The contract remains runtime-light: it owns schema-backed names, labels, attributes, cardinality metadata, and encoders, while package-local code keeps exporter/provider setup, service identity, Restate replay gates, and runtime-specific bridges. `@overeng/oxc-config` now ships `overeng/no-raw-otel-primitives` with generated rollout config, `@overeng/utils-dev/otelite` gains reusable metric/log expectation helpers, and `restate-effect` adopts the same idiom for internal spans while preserving hook-owned Restate spans and replay-aware metrics. - **@overeng/utils/node/otel-attrs**: Add schema-first OTEL attribute and span contracts (`OtelAttr`, `OtelAttrs`, `OtelSpan`) plus otelite expectation helpers that derive span assertions from the same compiled attribute encoders used by runtime instrumentation. 
Ambiguous encodings fail closed unless explicitly annotated, redacted values only support redacted/drop policies, and span definitions require the dedicated `OtelAttr.spanLabel()` contract. diff --git a/nix/oxc-config-plugin.nix b/nix/oxc-config-plugin.nix index 1c16a4b58..313de62c3 100644 --- a/nix/oxc-config-plugin.nix +++ b/nix/oxc-config-plugin.nix @@ -28,7 +28,7 @@ let pnpm = pinnedPnpm; }; packageDir = "packages/@overeng/oxc-config"; - pnpmDepsHash = "sha256-0MeOm3vZjJiGpmVAyt6fOavjhYfehVswkXvN6DGLsjQ="; + pnpmDepsHash = "sha256-rdI4O8FDKX3N095b/fBFJQxtzUYPM62cl3mqKROmMMw="; srcPath = if builtins.isAttrs src && builtins.hasAttr "outPath" src then diff --git a/packages/@overeng/genie/nix/build.nix b/packages/@overeng/genie/nix/build.nix index e3f25c565..b9bc7fc3c 100644 --- a/packages/@overeng/genie/nix/build.nix +++ b/packages/@overeng/genie/nix/build.nix @@ -25,7 +25,7 @@ let # Managed by the repo FOD refresh workflow — do not edit manually. depsBuilds = { "." = { - hash = "sha256-yV0ONh4haXUHi9isWdVnsuKfEXjGO8ESqsDrKALbVuU="; + hash = "sha256-6L+2U0Nssegn2SW0CiuNxPLRMSNCx5aGjVh6TaSlWNw="; }; }; nativeNodePackages = [ opentuiCoreNative ]; diff --git a/packages/@overeng/megarepo/nix/build.nix b/packages/@overeng/megarepo/nix/build.nix index a82a0d95e..356bc6523 100644 --- a/packages/@overeng/megarepo/nix/build.nix +++ b/packages/@overeng/megarepo/nix/build.nix @@ -24,7 +24,7 @@ let # Managed by the repo FOD refresh workflow — do not edit manually. depsBuilds = { "." 
= { - hash = "sha256-1f7bldN6rGvybyvQZ00pQKp24zCL9ceoxpP8dvfU2Kg="; + hash = "sha256-lebuXoDP5ihwV9xwyz8fiBDpXMd1+biCeHGvTpit07c="; }; }; nativeNodePackages = [ opentuiCoreNative ]; diff --git a/packages/@overeng/notion-cli/docs/glossary.md b/packages/@overeng/notion-cli/docs/glossary.md index b758fb703..cd2aa1797 100644 --- a/packages/@overeng/notion-cli/docs/glossary.md +++ b/packages/@overeng/notion-cli/docs/glossary.md @@ -16,6 +16,19 @@ _Avoid_: alias, mode A command implemented directly inside the Bun-compatible root CLI, such as `notion db info`. _Avoid_: local command +**Editor alias**: +The top-level `notion edit <page>` command, an intentional marquee verb that +delegates to `notion md edit`. The only first-level command outside the +`md`/`schema`/`db` namespaces. Distinct from a retired legacy alias. +_Avoid_: shortcut, legacy alias + +**Editor command**: +A `notion md` command for editor-based page editing, owned by +`@overeng/notion-md`: the stateless `cat`/`put` body pipes (stdin/stdout, no local +file) and `edit` (an ephemeral file-engine session over a `$TMPDIR` temp tree). +`--frontmatter` is read-only on `cat` and read/write on `edit`. +_Avoid_: streaming command (it spans the engine-backed `edit`), pipe command + **Node-backed Leaf**: A `notion db` command that must execute in the packaged Node runtime because datasource-sync imports `node:sqlite`. _Avoid_: sqlite command, replica namespace diff --git a/packages/@overeng/notion-cli/docs/requirements.md b/packages/@overeng/notion-cli/docs/requirements.md index cd7b33f81..a831aedcf 100644 --- a/packages/@overeng/notion-cli/docs/requirements.md +++ b/packages/@overeng/notion-cli/docs/requirements.md @@ -26,6 +26,8 @@ This document defines package-level requirements for `@overeng/notion-cli`. It i - **R04 Database namespace:** Database metadata, replica sync, status, conflict, diagnostics, and export workflows must live under `notion db`. 
- **R05 Markdown namespace:** Markdown page workflows must live under `notion md` and be composed from `@overeng/notion-md`. - **R06 Schema namespace:** Schema generation, introspection, config generation, and drift detection must live under `notion schema`. +- **R17 Markdown editor surface:** `notion md` must expose the `@overeng/notion-md` editor commands `cat`, `put`, and `edit` for editor-based two-way page editing (the stateless `cat`/`put` pipes and the engine-backed `edit`). +- **R18 Editor alias:** The root must expose a top-level `notion edit <page>` alias that delegates to `notion md edit`. This is an intentional marquee verb, not a legacy compatibility alias under R03; it is the only first-level command outside the `md`/`schema`/`db` namespaces. ### Must Preserve Runtime Boundaries diff --git a/packages/@overeng/notion-cli/docs/spec.md b/packages/@overeng/notion-cli/docs/spec.md index 2e26d3f1b..8db1732fa 100644 --- a/packages/@overeng/notion-cli/docs/spec.md +++ b/packages/@overeng/notion-cli/docs/spec.md @@ -24,11 +24,14 @@ This spec does not define: ## Command Surface -Trace: R01-R06, R11-R13. +Trace: R01-R06, R11-R13, R17-R18. ```text notion +├── edit <page> # top-level alias → md edit (marquee verb, R18) ├── md ... # @overeng/notion-md command tree +│ ├── cat / put / edit # editor surface: cat/put pipes + engine-backed edit (R17) +│ └── sync / status / plan # file-based surface ├── schema │ ├── generate │ ├── introspect diff --git a/packages/@overeng/notion-cli/nix/build.nix b/packages/@overeng/notion-cli/nix/build.nix index d02f522aa..b49a7ef3f 100644 --- a/packages/@overeng/notion-cli/nix/build.nix +++ b/packages/@overeng/notion-cli/nix/build.nix @@ -33,7 +33,7 @@ let # Managed by the repo FOD refresh workflow — do not edit manually. depsBuilds = { "." 
= { - hash = "sha256-CuFkj+1ti/aKBhqG8ZnJmJLHq64CKujgwVgxVneOnHo="; + hash = "sha256-ooTKSmikoUsrTRHe0Okjvv2607KVxojksmznKCebk2s="; }; }; nativeNodePackages = [ opentuiCoreNative ]; diff --git a/packages/@overeng/notion-cli/src/cli-command.unit.test.ts b/packages/@overeng/notion-cli/src/cli-command.unit.test.ts index d1d0c49f4..9cca3c75f 100644 --- a/packages/@overeng/notion-cli/src/cli-command.unit.test.ts +++ b/packages/@overeng/notion-cli/src/cli-command.unit.test.ts @@ -9,11 +9,12 @@ const placeholderCommand = (name: string) => Command.make(name, {}, () => Effect.void).pipe(Command.withDescription(`${name} command`)) describe('notion root command composition', () => { - it('does not expose the removed root sqlite command', async () => { + it('exposes md/schema/db plus the top-level edit alias, not the removed sqlite command', async () => { const command = makeNotionRootCommand({ schemaCommand: placeholderCommand('schema'), dbCommand: placeholderCommand('db'), notionMdDispatchCommand: placeholderCommand('md'), + notionEditAliasCommand: placeholderCommand('edit'), }) const completions = await Effect.runPromise(Command.getBashCompletions(command, 'notion')) @@ -22,6 +23,8 @@ describe('notion root command composition', () => { expect(completionText).toContain('schema') expect(completionText).toContain('db') expect(completionText).toContain('md') + // R18: the top-level `notion edit` marquee alias is a first-level command. 
+ expect(completionText).toContain('edit') expect(completionText).not.toContain('sqlite') }) }) diff --git a/packages/@overeng/notion-cli/src/cli.ts b/packages/@overeng/notion-cli/src/cli.ts index 3fd40ecee..f59e123b0 100644 --- a/packages/@overeng/notion-cli/src/cli.ts +++ b/packages/@overeng/notion-cli/src/cli.ts @@ -2,8 +2,9 @@ import { Command } from '@effect/cli' import { NodeContext, NodeRuntime } from '@effect/platform-node' -import { Cause, Effect, Layer, Option } from 'effect' +import { Cause, Effect, type Exit, Layer, Option } from 'effect' +import { editorExitCode } from '@overeng/notion-md' import { CurrentWorkingDirectory } from '@overeng/utils/node' import { rewriteHelpSubcommand } from '@overeng/utils/node/cli-help-rewrite' import { CliVersion, resolveCliVersion } from '@overeng/utils/node/cli-version' @@ -41,32 +42,77 @@ const makeNotionRootCommand = < MdRequirements, MdError, MdConfig, + EditName extends string, + EditRequirements, + EditError, + EditConfig, >({ schemaCommand, dbCommand, notionMdDispatchCommand, + notionEditAliasCommand, }: { readonly schemaCommand: Command.Command<SchemaName, SchemaRequirements, SchemaError, SchemaConfig> readonly dbCommand: Command.Command<DbName, DbRequirements, DbError, DbConfig> readonly notionMdDispatchCommand: Command.Command<MdName, MdRequirements, MdError, MdConfig> + readonly notionEditAliasCommand: Command.Command< + EditName, + EditRequirements, + EditError, + EditConfig + > }) => Command.make('notion').pipe( - Command.withSubcommands([schemaCommand, dbCommand, notionMdDispatchCommand]), + // `edit` is the top-level marquee alias for `md edit` (R18); it is the only + // first-level command outside the md/schema/db namespaces. 
+ Command.withSubcommands([ + notionEditAliasCommand, + schemaCommand, + dbCommand, + notionMdDispatchCommand, + ]), Command.withDescription( 'Notion CLI - database operations, schema generation, and markdown sync', ), ) +/** + * Map the program `Exit` to the editor-surface exit-code contract + * (notion-md `exit-codes.ts`), mirroring the standalone `notion-md` binary. + * + * Without this the umbrella `notion edit` alias would collapse every tagged + * editor failure (e.g. 3 lossy / 6 schema-drift / 8 abort) to the framework's + * default exit 1, so `notion edit` and `notion-md edit` would disagree for + * scripts. Safe for non-editor commands (schema/db/md): `editorExitCode` falls + * back to 1 for any unmapped failure and 0 on success, matching the previous + * default teardown (Ctrl+C now maps to 130, consistent with `notion-md`). + */ +const editorTeardown = <E, A>(exit: Exit.Exit<E, A>, onExit: (code: number) => void): void => { + onExit(editorExitCode(exit)) +} + const runRootCli = async (argv: ReadonlyArray<string>) => { - const [{ notionMdDispatchCommand }, { dbCommand }, { schemaCommand }] = await Promise.all([ - import('@overeng/notion-md/cli-program'), - import('./commands/db/mod.ts'), - import('./commands/schema/mod.ts'), - ]) + /* + * These trees are imported CONCURRENTLY. That concurrency is what triggers the + * upstream Bun bug oven-sh/bun#30634 (TDZ on a re-exported `const` read during + * parallel dynamic `import()`, Node-fine) — which is why every renderer's TUI + * app is built lazily via `get*App()` instead of at module top level (#787). + * TODO(bun#30634): once the Bun fix (PR oven-sh/bun#30656) ships and we pin a + * Bun version that includes it, the lazy `get*App()` workaround can be reverted + * to plain top-level `const *App = createTuiApp(...)`. See + * `concurrent-import.unit.test.ts` (the regression guard). 
+ */ + const [{ notionMdDispatchCommand, notionEditAliasCommand }, { dbCommand }, { schemaCommand }] = + await Promise.all([ + import('@overeng/notion-md/cli-program'), + import('./commands/db/mod.ts'), + import('./commands/schema/mod.ts'), + ]) const command = makeNotionRootCommand({ schemaCommand, dbCommand, notionMdDispatchCommand, + notionEditAliasCommand, }) const cli = Command.run(command, { name: 'notion', @@ -98,7 +144,7 @@ const runRootCli = async (argv: ReadonlyArray<string>) => { makeOtelCliLayer({ serviceName: 'notion-cli' }), ), ), - NodeRuntime.runMain({ disableErrorReporting: true }), + NodeRuntime.runMain({ disableErrorReporting: true, teardown: editorTeardown }), ) } diff --git a/packages/@overeng/notion-cli/src/commands/db/mod.ts b/packages/@overeng/notion-cli/src/commands/db/mod.ts index 8be18b4a8..e3f624049 100644 --- a/packages/@overeng/notion-cli/src/commands/db/mod.ts +++ b/packages/@overeng/notion-cli/src/commands/db/mod.ts @@ -13,7 +13,7 @@ import { } from '@overeng/notion-datasource-sync/cli/effect-command' import { outputOption as tuiOutputOption, outputModeLayer } from '@overeng/tui-react/node' -import { InfoApp } from '../../renderers/InfoOutput/app.ts' +import { getInfoApp } from '../../renderers/InfoOutput/app.ts' import { InfoView } from '../../renderers/InfoOutput/view.tsx' /** Re-export internal types for TypeScript declaration emit */ @@ -24,7 +24,7 @@ export type { PlatformError } from '@effect/platform/Error' import { NotionConfig, NotionDatabases, NotionDataSources } from '@overeng/notion-effect-client' import { run } from '@overeng/tui-react' -import { resolveNotionToken, tokenOption } from '../schema/mod.ts' +import { resolveNotionToken, tokenOption } from '../shared.ts' const databaseIdArg = Args.text({ name: 'database-id' }).pipe( Args.withDescription('The Notion database ID to operate on'), @@ -49,8 +49,10 @@ const infoCommand = Command.make( authToken: resolvedToken, }) + const infoApp = getInfoApp() + yield* run( - 
InfoApp, + infoApp, (tui) => Effect.gen(function* () { const program = Effect.gen(function* () { @@ -90,7 +92,7 @@ const infoCommand = Command.make( Effect.provide(Layer.merge(configLayer, FetchHttpClient.layer)), ) }), - { view: React.createElement(InfoView, { stateAtom: InfoApp.stateAtom }) }, + { view: React.createElement(InfoView, { stateAtom: infoApp.stateAtom }) }, ).pipe(Effect.provide(outputModeLayer(output))) }), ).pipe(Command.withDescription('Display information about a Notion database')) diff --git a/packages/@overeng/notion-cli/src/commands/schema/mod.ts b/packages/@overeng/notion-cli/src/commands/schema/mod.ts index 33a8e42a1..2eaf86377 100644 --- a/packages/@overeng/notion-cli/src/commands/schema/mod.ts +++ b/packages/@overeng/notion-cli/src/commands/schema/mod.ts @@ -7,27 +7,23 @@ import { fileURLToPath } from 'node:url' import { Args, Command, Options } from '@effect/cli' import { FetchHttpClient, FileSystem } from '@effect/platform' -import { Effect, Layer, Option, Redacted, Schema } from 'effect' +import { Effect, Layer, Option, Schema } from 'effect' import React from 'react' import { EffectPath } from '@overeng/effect-path' -import { - NotionConfig, - NotionDatabases, - NotionDataSources, - resolveNotionToken as resolveNotionTokenFromEnv, -} from '@overeng/notion-effect-client' +import { NotionConfig, NotionDatabases, NotionDataSources } from '@overeng/notion-effect-client' import { run } from '@overeng/tui-react' import { outputOption as tuiOutputOption, outputModeLayer } from '@overeng/tui-react/node' -import { DiffApp } from '../../renderers/DiffOutput/app.ts' +import { getDiffApp } from '../../renderers/DiffOutput/app.ts' import { DiffView } from '../../renderers/DiffOutput/view.tsx' -import { GenerateConfigApp } from '../../renderers/GenerateConfigOutput/app.ts' +import { getGenerateConfigApp } from '../../renderers/GenerateConfigOutput/app.ts' import { GenerateConfigView } from '../../renderers/GenerateConfigOutput/view.tsx' -import { 
GenerateApp } from '../../renderers/GenerateOutput/app.ts' +import { getGenerateApp } from '../../renderers/GenerateOutput/app.ts' import { GenerateView } from '../../renderers/GenerateOutput/view.tsx' -import { IntrospectApp } from '../../renderers/IntrospectOutput/app.ts' +import { getIntrospectApp } from '../../renderers/IntrospectOutput/app.ts' import { IntrospectView } from '../../renderers/IntrospectOutput/view.tsx' +import { resolveNotionToken, tokenOption } from '../shared.ts' /** Re-export internal types for TypeScript declaration emit */ export type { PlatformError } from '@effect/platform/Error' @@ -38,6 +34,8 @@ import { computeDiff, hasDifferences, parseGeneratedFile } from '../../diff.ts' import { introspectDatabase, type PropertyTransformConfig } from '../../introspect.ts' import { formatCode, writeSchemaToFile } from '../../output.ts' +export { resolveNotionToken, tokenOption } from '../shared.ts' + // ----------------------------------------------------------------------------- // Exported Errors // ----------------------------------------------------------------------------- @@ -75,19 +73,6 @@ const getGeneratorVersion = Effect.gen(function* () { return pkg.version }).pipe(Effect.orElseSucceed(() => 'unknown')) -/** Resolve the Notion API token as a `Redacted` value from the CLI option or the environment. */ -export const resolveNotionToken = (token: Option.Option<string>) => - Option.isSome(token) === true - ? Effect.succeed(Redacted.make(token.value)) - : resolveNotionTokenFromEnv() - -/** CLI option for providing a Notion API token (defaults to `NOTION_API_TOKEN`). 
*/ -export const tokenOption = Options.text('token').pipe( - Options.withAlias('t'), - Options.withDescription('Notion API token (defaults to NOTION_API_TOKEN env var)'), - Options.optional, -) - // ----------------------------------------------------------------------------- // Generate Command // ----------------------------------------------------------------------------- @@ -215,8 +200,10 @@ const generateCommand = Command.make( authToken: resolvedToken, }) + const generateApp = getGenerateApp() + yield* run( - GenerateApp, + generateApp, (tui) => Effect.gen(function* () { const program = Effect.gen(function* () { @@ -305,7 +292,7 @@ const generateCommand = Command.make( Effect.provide(Layer.merge(configLayer, FetchHttpClient.layer)), ) }), - { view: React.createElement(GenerateView, { stateAtom: GenerateApp.stateAtom }) }, + { view: React.createElement(GenerateView, { stateAtom: generateApp.stateAtom }) }, ).pipe(Effect.provide(outputModeLayer(tuiOutput))) }), ).pipe(Command.withDescription('Generate Effect schema from a Notion database')) @@ -329,8 +316,10 @@ const introspectCommand = Command.make( authToken: resolvedToken, }) + const introspectApp = getIntrospectApp() + yield* run( - IntrospectApp, + introspectApp, (tui) => Effect.gen(function* () { const program = Effect.gen(function* () { @@ -400,7 +389,7 @@ const introspectCommand = Command.make( Effect.provide(Layer.merge(configLayer, FetchHttpClient.layer)), ) }), - { view: React.createElement(IntrospectView, { stateAtom: IntrospectApp.stateAtom }) }, + { view: React.createElement(IntrospectView, { stateAtom: introspectApp.stateAtom }) }, ).pipe(Effect.provide(outputModeLayer(output))) }), ).pipe(Command.withDescription('Introspect a Notion database and display its schema')) @@ -439,8 +428,10 @@ const generateFromConfigCommand = Command.make( authToken: resolvedToken, }) + const generateConfigApp = getGenerateConfigApp() + yield* run( - GenerateConfigApp, + generateConfigApp, (tui) => 
Effect.gen(function* () { const program = Effect.gen(function* () { @@ -529,7 +520,7 @@ const generateFromConfigCommand = Command.make( ) }), { - view: React.createElement(GenerateConfigView, { stateAtom: GenerateConfigApp.stateAtom }), + view: React.createElement(GenerateConfigView, { stateAtom: generateConfigApp.stateAtom }), }, ).pipe(Effect.provide(outputModeLayer(output))) }), @@ -571,8 +562,10 @@ const diffCommand = Command.make( authToken: resolvedToken, }) + const diffApp = getDiffApp() + yield* run( - DiffApp, + diffApp, (tui) => Effect.gen(function* () { const program = Effect.gen(function* () { @@ -634,7 +627,7 @@ const diffCommand = Command.make( Effect.provide(Layer.merge(configLayer, FetchHttpClient.layer)), ) }), - { view: React.createElement(DiffView, { stateAtom: DiffApp.stateAtom }) }, + { view: React.createElement(DiffView, { stateAtom: diffApp.stateAtom }) }, ).pipe(Effect.provide(outputModeLayer(output))) }), ).pipe( diff --git a/packages/@overeng/notion-cli/src/commands/shared.ts b/packages/@overeng/notion-cli/src/commands/shared.ts new file mode 100644 index 000000000..58d3e3d86 --- /dev/null +++ b/packages/@overeng/notion-cli/src/commands/shared.ts @@ -0,0 +1,17 @@ +import { Options } from '@effect/cli' +import { Effect, Option, Redacted } from 'effect' + +import { resolveNotionToken as resolveNotionTokenFromEnv } from '@overeng/notion-effect-client' + +/** Resolve the Notion API token as a `Redacted` value from the CLI option or the environment. */ +export const resolveNotionToken = (token: Option.Option<string>) => + Option.isSome(token) === true + ? Effect.succeed(Redacted.make(token.value)) + : resolveNotionTokenFromEnv() + +/** CLI option for providing a Notion API token (defaults to `NOTION_API_TOKEN`). 
*/ +export const tokenOption = Options.text('token').pipe( + Options.withAlias('t'), + Options.withDescription('Notion API token (defaults to NOTION_API_TOKEN env var)'), + Options.optional, +) diff --git a/packages/@overeng/notion-cli/src/concurrent-import.fixture.ts b/packages/@overeng/notion-cli/src/concurrent-import.fixture.ts new file mode 100644 index 000000000..2b9833a8b --- /dev/null +++ b/packages/@overeng/notion-cli/src/concurrent-import.fixture.ts @@ -0,0 +1,25 @@ +/** + * Reproduction fixture for #787: the umbrella `notion` binary loads its three + * command trees concurrently via `Promise.all` (see `cli.ts`'s `runRootCli`). + * + * Under Bun's concurrent async module evaluation this used to reach a renderer + * `app.ts`'s top-level `createTuiApp(...)` side-effect while the shared + * `@overeng/tui-react` module graph was still mid-initialization, producing a + * TDZ `ReferenceError: Cannot access '…' before initialization`. + * + * Run with Bun (the umbrella binary's runtime). Exits non-zero on the crash. + */ +const main = async () => { + await Promise.all([ + import('@overeng/notion-md/cli-program'), + import('./commands/db/mod.ts'), + import('./commands/schema/mod.ts'), + ]) + process.stdout.write('CONCURRENT_IMPORT_OK\n') +} + +main().catch((error: unknown) => { + const message = error instanceof Error ? error.message : String(error) + process.stderr.write(`CONCURRENT_IMPORT_CRASH: ${message}\n`) + process.exit(1) +}) diff --git a/packages/@overeng/notion-cli/src/concurrent-import.unit.test.ts b/packages/@overeng/notion-cli/src/concurrent-import.unit.test.ts new file mode 100644 index 000000000..e30ce99f7 --- /dev/null +++ b/packages/@overeng/notion-cli/src/concurrent-import.unit.test.ts @@ -0,0 +1,34 @@ +import { spawnSync } from 'node:child_process' +import { fileURLToPath } from 'node:url' + +import { describe, expect, it } from 'vitest' + +/** + * Regression test for #787. 
+ * + * The umbrella `notion` binary runs on Bun and imports its three command trees + * concurrently. A renderer `app.ts` calling `createTuiApp(...)` as a module-load + * side-effect crashed with a TDZ error under Bun's concurrent module evaluation. + * + * This test spawns the fixture with Bun — the binary's real runtime — because the + * crash is specific to Bun's async-evaluation interleaving and does not reproduce + * under vitest's Node runner. + */ +describe('concurrent command-tree import (#787)', () => { + it('loads all three trees concurrently under Bun without a TDZ crash', () => { + const fixture = fileURLToPath(new URL('./concurrent-import.fixture.ts', import.meta.url)) + + const proc = spawnSync('bun', ['run', fixture], { + encoding: 'utf8', + timeout: 25_000, + }) + + const stdout = proc.stdout ?? '' + const stderr = proc.stderr ?? '' + + expect(proc.error, `${stderr}\n${String(proc.error)}`).toBeUndefined() + expect(stderr, stderr).not.toContain('before initialization') + expect(stdout).toContain('CONCURRENT_IMPORT_OK') + expect(proc.status).toBe(0) + }, 30_000) +}) diff --git a/packages/@overeng/notion-cli/src/exit-codes.unit.test.ts b/packages/@overeng/notion-cli/src/exit-codes.unit.test.ts new file mode 100644 index 000000000..90836aa2b --- /dev/null +++ b/packages/@overeng/notion-cli/src/exit-codes.unit.test.ts @@ -0,0 +1,26 @@ +import { Exit } from 'effect' +import { describe, expect, it } from 'vitest' + +import { editorExitCode } from '@overeng/notion-md' + +/** + * The umbrella `notion` binary wires `editorExitCode` into its `runMain` + * teardown (see `cli.ts`) so that `notion edit` honors the same scriptable + * exit-code contract as the standalone `notion-md edit`. These assertions pin + * that shared mapping: editor-tagged failures get their distinct codes and any + * other failure falls back to 1 (the framework default), so non-editor umbrella + * commands are unaffected. 
+ */ +describe('umbrella editor teardown exit codes', () => { + it('maps editor-tagged failures to their distinct codes', () => { + expect(editorExitCode(Exit.fail({ _tag: 'NmdRemoteBodyLossyError' }))).toBe(3) + expect(editorExitCode(Exit.fail({ _tag: 'NmdSchemaDriftError' }))).toBe(6) + expect(editorExitCode(Exit.fail({ _tag: 'NmdEditorAbortedError' }))).toBe(8) + }) + + it('falls back to 1 for non-editor failures and 0 on success', () => { + expect(editorExitCode(Exit.fail({ _tag: 'SomeUnmappedError' }))).toBe(1) + expect(editorExitCode(Exit.fail(new Error('boom')))).toBe(1) + expect(editorExitCode(Exit.succeed(undefined))).toBe(0) + }) +}) diff --git a/packages/@overeng/notion-cli/src/renderers/DiffOutput.stories.tsx b/packages/@overeng/notion-cli/src/renderers/DiffOutput.stories.tsx index 338abdf17..c3b1a6831 100644 --- a/packages/@overeng/notion-cli/src/renderers/DiffOutput.stories.tsx +++ b/packages/@overeng/notion-cli/src/renderers/DiffOutput.stories.tsx @@ -3,10 +3,12 @@ import React from 'react' import { TuiStoryPreview } from '@overeng/tui-react/storybook' -import { DiffApp } from './DiffOutput/mod.ts' +import { getDiffApp } from './DiffOutput/mod.ts' import type { DiffState } from './DiffOutput/schema.ts' import { DiffView } from './DiffOutput/view.tsx' +const DiffApp = getDiffApp() + export default { title: 'NotionCLI/Diff Output', component: DiffView, diff --git a/packages/@overeng/notion-cli/src/renderers/DiffOutput/app.ts b/packages/@overeng/notion-cli/src/renderers/DiffOutput/app.ts index da99d92e8..07a438bf4 100644 --- a/packages/@overeng/notion-cli/src/renderers/DiffOutput/app.ts +++ b/packages/@overeng/notion-cli/src/renderers/DiffOutput/app.ts @@ -1,12 +1,23 @@ -import { createTuiApp } from '@overeng/tui-react' +import { type TuiApp, createTuiApp } from '@overeng/tui-react' import { DiffState, DiffAction, diffReducer } from './schema.ts' -/** TUI app definition for the schema diff command. 
*/ -export const DiffApp = createTuiApp({ - stateSchema: DiffState, - actionSchema: DiffAction, - initial: { _tag: 'Loading' } as DiffState, - reducer: diffReducer, - exitCode: (state) => (state._tag === 'Error' ? 1 : 0), -}) +let cached: TuiApp<DiffState, DiffAction> | undefined + +/** + * TUI app definition for the schema diff command. + * + * Constructed lazily (and memoized) rather than at module top level: building it + * eagerly is a module-load side-effect that crashes the umbrella `notion` binary + * under Bun's concurrent command-tree import (#787, upstream oven-sh/bun#30634). + * The five renderer `get*App()` accessors share this workaround; see `cli.ts` for + * the trigger + the TODO to drop it once the Bun fix lands. + */ +export const getDiffApp = (): TuiApp<DiffState, DiffAction> => + (cached ??= createTuiApp({ + stateSchema: DiffState, + actionSchema: DiffAction, + initial: { _tag: 'Loading' } as DiffState, + reducer: diffReducer, + exitCode: (state) => (state._tag === 'Error' ? 
1 : 0), + })) diff --git a/packages/@overeng/notion-cli/src/renderers/DiffOutput/mod.ts b/packages/@overeng/notion-cli/src/renderers/DiffOutput/mod.ts index 8e41551b9..45ef7fa0f 100644 --- a/packages/@overeng/notion-cli/src/renderers/DiffOutput/mod.ts +++ b/packages/@overeng/notion-cli/src/renderers/DiffOutput/mod.ts @@ -3,7 +3,7 @@ export { DiffState, DiffAction, diffReducer } from './schema.ts' export type { DiffState as DiffStateType, DiffAction as DiffActionType } from './schema.ts' // App -export { DiffApp } from './app.ts' +export { getDiffApp } from './app.ts' // Views export { DiffView, type DiffViewProps } from './view.tsx' diff --git a/packages/@overeng/notion-cli/src/renderers/GenerateConfigOutput.stories.tsx b/packages/@overeng/notion-cli/src/renderers/GenerateConfigOutput.stories.tsx index 7c27d1d54..f0054b897 100644 --- a/packages/@overeng/notion-cli/src/renderers/GenerateConfigOutput.stories.tsx +++ b/packages/@overeng/notion-cli/src/renderers/GenerateConfigOutput.stories.tsx @@ -3,10 +3,12 @@ import React from 'react' import { TuiStoryPreview } from '@overeng/tui-react/storybook' -import { GenerateConfigApp } from './GenerateConfigOutput/mod.ts' +import { getGenerateConfigApp } from './GenerateConfigOutput/mod.ts' import type { GenerateConfigAction, GenerateConfigState } from './GenerateConfigOutput/schema.ts' import { GenerateConfigView } from './GenerateConfigOutput/view.tsx' +const GenerateConfigApp = getGenerateConfigApp() + export default { title: 'NotionCLI/Generate Config Output', component: GenerateConfigView, diff --git a/packages/@overeng/notion-cli/src/renderers/GenerateConfigOutput/app.ts b/packages/@overeng/notion-cli/src/renderers/GenerateConfigOutput/app.ts index f45d1af78..813015b98 100644 --- a/packages/@overeng/notion-cli/src/renderers/GenerateConfigOutput/app.ts +++ b/packages/@overeng/notion-cli/src/renderers/GenerateConfigOutput/app.ts @@ -1,12 +1,21 @@ -import { createTuiApp } from '@overeng/tui-react' +import { type TuiApp, 
createTuiApp } from '@overeng/tui-react' import { GenerateConfigState, GenerateConfigAction, generateConfigReducer } from './schema.ts' -/** TUI app definition for the config-based schema generation command. */ -export const GenerateConfigApp = createTuiApp({ - stateSchema: GenerateConfigState, - actionSchema: GenerateConfigAction, - initial: { _tag: 'Loading', configPath: '' } as GenerateConfigState, - reducer: generateConfigReducer, - exitCode: (state) => (state._tag === 'Error' ? 1 : 0), -}) +let cached: TuiApp<GenerateConfigState, GenerateConfigAction> | undefined + +/** + * TUI app definition for the config-based schema generation command. + * + * Constructed lazily (and memoized) rather than at module top level: building it + * eagerly is a module-load side-effect that crashes the umbrella `notion` binary + * under Bun's concurrent command-tree import (#787). + */ +export const getGenerateConfigApp = (): TuiApp<GenerateConfigState, GenerateConfigAction> => + (cached ??= createTuiApp({ + stateSchema: GenerateConfigState, + actionSchema: GenerateConfigAction, + initial: { _tag: 'Loading', configPath: '' } as GenerateConfigState, + reducer: generateConfigReducer, + exitCode: (state) => (state._tag === 'Error' ? 
1 : 0), + })) diff --git a/packages/@overeng/notion-cli/src/renderers/GenerateConfigOutput/mod.ts b/packages/@overeng/notion-cli/src/renderers/GenerateConfigOutput/mod.ts index 37a874ca6..aa8a5553f 100644 --- a/packages/@overeng/notion-cli/src/renderers/GenerateConfigOutput/mod.ts +++ b/packages/@overeng/notion-cli/src/renderers/GenerateConfigOutput/mod.ts @@ -6,7 +6,7 @@ export type { } from './schema.ts' // App -export { GenerateConfigApp } from './app.ts' +export { getGenerateConfigApp } from './app.ts' // Views export { GenerateConfigView, type GenerateConfigViewProps } from './view.tsx' diff --git a/packages/@overeng/notion-cli/src/renderers/GenerateOutput.stories.tsx b/packages/@overeng/notion-cli/src/renderers/GenerateOutput.stories.tsx index da797805f..16e17a29c 100644 --- a/packages/@overeng/notion-cli/src/renderers/GenerateOutput.stories.tsx +++ b/packages/@overeng/notion-cli/src/renderers/GenerateOutput.stories.tsx @@ -3,10 +3,12 @@ import React from 'react' import { TuiStoryPreview } from '@overeng/tui-react/storybook' -import { GenerateApp } from './GenerateOutput/mod.ts' +import { getGenerateApp } from './GenerateOutput/mod.ts' import type { GenerateAction, GenerateState } from './GenerateOutput/schema.ts' import { GenerateView } from './GenerateOutput/view.tsx' +const GenerateApp = getGenerateApp() + export default { title: 'NotionCLI/Generate Output', component: GenerateView, diff --git a/packages/@overeng/notion-cli/src/renderers/GenerateOutput/app.ts b/packages/@overeng/notion-cli/src/renderers/GenerateOutput/app.ts index 2c3d6eff6..99a569170 100644 --- a/packages/@overeng/notion-cli/src/renderers/GenerateOutput/app.ts +++ b/packages/@overeng/notion-cli/src/renderers/GenerateOutput/app.ts @@ -1,12 +1,21 @@ -import { createTuiApp } from '@overeng/tui-react' +import { type TuiApp, createTuiApp } from '@overeng/tui-react' import { GenerateState, GenerateAction, generateReducer } from './schema.ts' -/** TUI app definition for the single-database 
schema generation command. */ -export const GenerateApp = createTuiApp({ - stateSchema: GenerateState, - actionSchema: GenerateAction, - initial: { _tag: 'Introspecting', databaseId: '' } as GenerateState, - reducer: generateReducer, - exitCode: (state) => (state._tag === 'Error' ? 1 : 0), -}) +let cached: TuiApp<GenerateState, GenerateAction> | undefined + +/** + * TUI app definition for the single-database schema generation command. + * + * Constructed lazily (and memoized) rather than at module top level: building it + * eagerly is a module-load side-effect that crashes the umbrella `notion` binary + * under Bun's concurrent command-tree import (#787). + */ +export const getGenerateApp = (): TuiApp<GenerateState, GenerateAction> => + (cached ??= createTuiApp({ + stateSchema: GenerateState, + actionSchema: GenerateAction, + initial: { _tag: 'Introspecting', databaseId: '' } as GenerateState, + reducer: generateReducer, + exitCode: (state) => (state._tag === 'Error' ? 1 : 0), + })) diff --git a/packages/@overeng/notion-cli/src/renderers/GenerateOutput/mod.ts b/packages/@overeng/notion-cli/src/renderers/GenerateOutput/mod.ts index 3b38827f5..f41977539 100644 --- a/packages/@overeng/notion-cli/src/renderers/GenerateOutput/mod.ts +++ b/packages/@overeng/notion-cli/src/renderers/GenerateOutput/mod.ts @@ -6,7 +6,7 @@ export type { } from './schema.ts' // App -export { GenerateApp } from './app.ts' +export { getGenerateApp } from './app.ts' // Views export { GenerateView, type GenerateViewProps } from './view.tsx' diff --git a/packages/@overeng/notion-cli/src/renderers/InfoOutput.stories.tsx b/packages/@overeng/notion-cli/src/renderers/InfoOutput.stories.tsx index 325d2872f..ae213e038 100644 --- a/packages/@overeng/notion-cli/src/renderers/InfoOutput.stories.tsx +++ b/packages/@overeng/notion-cli/src/renderers/InfoOutput.stories.tsx @@ -7,10 +7,12 @@ import React from 'react' import { TuiStoryPreview } from '@overeng/tui-react/storybook' -import { InfoApp } from 
'./InfoOutput/mod.ts' +import { getInfoApp } from './InfoOutput/mod.ts' import type { InfoState } from './InfoOutput/schema.ts' import { InfoView } from './InfoOutput/view.tsx' +const InfoApp = getInfoApp() + // ============================================================================= // State Factories // ============================================================================= diff --git a/packages/@overeng/notion-cli/src/renderers/InfoOutput/app.ts b/packages/@overeng/notion-cli/src/renderers/InfoOutput/app.ts index 30f314b06..efacc6132 100644 --- a/packages/@overeng/notion-cli/src/renderers/InfoOutput/app.ts +++ b/packages/@overeng/notion-cli/src/renderers/InfoOutput/app.ts @@ -1,12 +1,21 @@ -import { createTuiApp } from '@overeng/tui-react' +import { type TuiApp, createTuiApp } from '@overeng/tui-react' import { InfoState, InfoAction, infoReducer } from './schema.ts' -/** TUI app definition for the database info command. */ -export const InfoApp = createTuiApp({ - stateSchema: InfoState, - actionSchema: InfoAction, - initial: { _tag: 'Loading' } as InfoState, - reducer: infoReducer, - exitCode: (state) => (state._tag === 'Error' ? 1 : 0), -}) +let cached: TuiApp<InfoState, InfoAction> | undefined + +/** + * TUI app definition for the database info command. + * + * Constructed lazily (and memoized) rather than at module top level: building it + * eagerly is a module-load side-effect that crashes the umbrella `notion` binary + * under Bun's concurrent command-tree import (#787). + */ +export const getInfoApp = (): TuiApp<InfoState, InfoAction> => + (cached ??= createTuiApp({ + stateSchema: InfoState, + actionSchema: InfoAction, + initial: { _tag: 'Loading' } as InfoState, + reducer: infoReducer, + exitCode: (state) => (state._tag === 'Error' ? 
1 : 0), + })) diff --git a/packages/@overeng/notion-cli/src/renderers/InfoOutput/mod.ts b/packages/@overeng/notion-cli/src/renderers/InfoOutput/mod.ts index 5725e4564..d608097a0 100644 --- a/packages/@overeng/notion-cli/src/renderers/InfoOutput/mod.ts +++ b/packages/@overeng/notion-cli/src/renderers/InfoOutput/mod.ts @@ -3,7 +3,7 @@ export { InfoState, InfoAction, infoReducer } from './schema.ts' export type { InfoState as InfoStateType, InfoAction as InfoActionType } from './schema.ts' // App -export { InfoApp } from './app.ts' +export { getInfoApp } from './app.ts' // Views export { InfoView, type InfoViewProps } from './view.tsx' diff --git a/packages/@overeng/notion-cli/src/renderers/IntrospectOutput.stories.tsx b/packages/@overeng/notion-cli/src/renderers/IntrospectOutput.stories.tsx index 9a05854d4..4f3e9e655 100644 --- a/packages/@overeng/notion-cli/src/renderers/IntrospectOutput.stories.tsx +++ b/packages/@overeng/notion-cli/src/renderers/IntrospectOutput.stories.tsx @@ -3,10 +3,12 @@ import React from 'react' import { TuiStoryPreview } from '@overeng/tui-react/storybook' -import { IntrospectApp } from './IntrospectOutput/mod.ts' +import { getIntrospectApp } from './IntrospectOutput/mod.ts' import type { IntrospectState } from './IntrospectOutput/schema.ts' import { IntrospectView } from './IntrospectOutput/view.tsx' +const IntrospectApp = getIntrospectApp() + export default { title: 'NotionCLI/Introspect Output', component: IntrospectView, diff --git a/packages/@overeng/notion-cli/src/renderers/IntrospectOutput/app.ts b/packages/@overeng/notion-cli/src/renderers/IntrospectOutput/app.ts index 6c5e97697..a87dd7eb0 100644 --- a/packages/@overeng/notion-cli/src/renderers/IntrospectOutput/app.ts +++ b/packages/@overeng/notion-cli/src/renderers/IntrospectOutput/app.ts @@ -1,12 +1,21 @@ -import { createTuiApp } from '@overeng/tui-react' +import { type TuiApp, createTuiApp } from '@overeng/tui-react' import { IntrospectState, IntrospectAction, introspectReducer } 
from './schema.ts' -/** TUI app definition for the database introspection command. */ -export const IntrospectApp = createTuiApp({ - stateSchema: IntrospectState, - actionSchema: IntrospectAction, - initial: { _tag: 'Loading' } as IntrospectState, - reducer: introspectReducer, - exitCode: (state) => (state._tag === 'Error' ? 1 : 0), -}) +let cached: TuiApp<IntrospectState, IntrospectAction> | undefined + +/** + * TUI app definition for the database introspection command. + * + * Constructed lazily (and memoized) rather than at module top level: building it + * eagerly is a module-load side-effect that crashes the umbrella `notion` binary + * under Bun's concurrent command-tree import (#787). + */ +export const getIntrospectApp = (): TuiApp<IntrospectState, IntrospectAction> => + (cached ??= createTuiApp({ + stateSchema: IntrospectState, + actionSchema: IntrospectAction, + initial: { _tag: 'Loading' } as IntrospectState, + reducer: introspectReducer, + exitCode: (state) => (state._tag === 'Error' ? 
1 : 0), + })) diff --git a/packages/@overeng/notion-cli/src/renderers/IntrospectOutput/mod.ts b/packages/@overeng/notion-cli/src/renderers/IntrospectOutput/mod.ts index 323037db0..973510afc 100644 --- a/packages/@overeng/notion-cli/src/renderers/IntrospectOutput/mod.ts +++ b/packages/@overeng/notion-cli/src/renderers/IntrospectOutput/mod.ts @@ -6,7 +6,7 @@ export type { } from './schema.ts' // App -export { IntrospectApp } from './app.ts' +export { getIntrospectApp } from './app.ts' // Views export { IntrospectView, type IntrospectViewProps } from './view.tsx' diff --git a/packages/@overeng/notion-core/src/body-fidelity.ts b/packages/@overeng/notion-core/src/body-fidelity.ts index 9adcd7ff1..ba7ba47ad 100644 --- a/packages/@overeng/notion-core/src/body-fidelity.ts +++ b/packages/@overeng/notion-core/src/body-fidelity.ts @@ -2,6 +2,7 @@ export type BodyLossyReason = | 'endpoint_truncated' | 'unknown_blocks' | 'unsupported_blocks' + | 'not_round_trip_safe_blocks' | 'rendered_markdown_unavailable' | 'rendered_markdown_has_unobserved_suffix' @@ -12,6 +13,17 @@ export type BodyCompleteness = | { readonly _tag: 'lossy' readonly reasons: readonly BodyLossyReason[] + /** + * The distinct block types that triggered a block-level lossy reason + * (`unsupported_blocks` / `not_round_trip_safe_blocks`), sorted and + * deduplicated. Empty/absent when the verdict is purely Markdown-level + * (truncation, suffix, …). Surfaced so the refusal gate can name the + * offending block class in its message and so a tree node can tolerate + * its own `child_page` blocks without losing the other reasons. Optional + * so older inline verdicts (e.g. `rendered_markdown_unavailable`) need not + * restate it. 
+ */ + readonly lossyBlockTypes?: readonly string[] } export interface MarkdownBodySnapshot { @@ -44,6 +56,48 @@ export interface BodyFidelityObservation { const unsupportedBlockTypes = new Set(['unsupported']) +/** + * Notion block types whose body-Markdown rendering does **not** reparse to the + * same block on push (R38, decisions 0016/0017). They render to Markdown that + * Notion's parser re-creates as a plain paragraph (or drops), so a + * `replace_content` over them silently destroys the original block — the live + * data-loss defect in #785. A page containing any of these is refused at the + * pull on every surface (`cat`/`put`/`edit`/file `sync`). + * + * Criterion is a curated type set rather than a live reparse check on purpose: + * `@overeng/notion-core` is pure and carries no Markdown parser, and for these + * blocks the endpoint Markdown and the independently rendered Markdown already + * *agree* (both emit `[TOC]`, `[embedded db]()`, …), so the existing + * `hasUnobservedRenderedSuffix` heuristic cannot catch them. The set is the + * principled encoding of "known not round-trip-safe in the body" the platform + * permits (impl-delta Group C; R38 explicitly sanctions a known-lossy type set + * where a reparse check isn't practical). + * + * `child_page` is included here as a *body* block (R30: a child-page block in + * the body is refused). It has a dual role: a child page that is a tree node + * (its own `.nmd` file) is legitimately managed by the file tree engine, so the + * refusal gate tolerates `child_page` for tree nodes via + * `tolerateTreeChildPages` while still refusing it for single-page surfaces and + * still refusing any *other* lossy block on the same page. + * + * Hosted/external media (`image`/`video`/`audio`/`file`/`pdf`) is deliberately + * absent: media is representable and stays editable; only its URL is volatile + * (decision 0007, Group B). 
+ */ +const notRoundTripSafeBlockTypes = new Set([ + 'child_database', + 'table_of_contents', + 'synced_block', + 'child_page', + 'breadcrumb', + 'bookmark', + 'embed', + 'link_preview', + 'link_to_page', +]) + +const TREE_TOLERATED_BLOCK_TYPE = 'child_page' + const normalizeLines = (value: string): string => value.replace(/\r\n?/gu, '\n').trimEnd() const normalizeComparableMarkdown = (value: string): string => @@ -76,11 +130,22 @@ export const classifyBodyCompleteness = (opts: { readonly inventory: BlockInventory }): BodyCompleteness => { const reasons: BodyLossyReason[] = [] + const lossyBlockTypes: string[] = [] if (opts.markdown.truncated === true) reasons.push('endpoint_truncated') if (opts.markdown.unknownBlockIds.length > 0) reasons.push('unknown_blocks') - if (opts.inventory.entries.some((entry) => unsupportedBlockTypes.has(entry.type)) === true) { + + const blockTypes = unique(opts.inventory.entries.map((entry) => entry.type)) + const unsupportedTypes = blockTypes.filter((type) => unsupportedBlockTypes.has(type)) + const notRoundTripSafeTypes = blockTypes.filter((type) => notRoundTripSafeBlockTypes.has(type)) + + if (unsupportedTypes.length > 0) { reasons.push('unsupported_blocks') + lossyBlockTypes.push(...unsupportedTypes) + } + if (notRoundTripSafeTypes.length > 0) { + reasons.push('not_round_trip_safe_blocks') + lossyBlockTypes.push(...notRoundTripSafeTypes) } if ( hasUnobservedRenderedSuffix({ @@ -91,7 +156,73 @@ export const classifyBodyCompleteness = (opts: { reasons.push('rendered_markdown_has_unobserved_suffix') } - return reasons.length === 0 ? { _tag: 'complete' } : { _tag: 'lossy', reasons: unique(reasons) } + return reasons.length === 0 + ? 
{ _tag: 'complete' } + : { + _tag: 'lossy', + reasons: unique(reasons), + lossyBlockTypes: unique(lossyBlockTypes).toSorted(), + } +} + +/** + * Re-evaluate a completeness verdict for a **tree node** (a child page that is + * its own `.nmd` file): the node's own `child_page` blocks are managed by the + * file tree engine (re-emitted as `<page>` anchors and stripped before + * comparison, R12/R30), so they must not refuse the parent. Every *other* lossy + * reason — including a real `table_of_contents`/`synced_block`/… on the same + * page, truncation, unknown blocks, or a suffix mismatch — is preserved, so + * this never reopens #785 on the tree path. + * + * Tolerating means: drop `child_page` from `lossyBlockTypes`, and drop the + * `not_round_trip_safe_blocks` reason only if that was the *sole* not-round- + * trip-safe block. A verdict that stays lossy keeps its remaining reasons. + */ +export const tolerateTreeChildPages = (completeness: BodyCompleteness): BodyCompleteness => { + if (completeness._tag === 'complete') return completeness + + const remainingTypes = (completeness.lossyBlockTypes ?? []).filter( + (type) => type !== TREE_TOLERATED_BLOCK_TYPE, + ) + + // If `child_page` was the only block-level lossy type, the + // `not_round_trip_safe_blocks` reason no longer applies (it could only have + // been raised by a not-round-trip-safe type; `unsupported` keeps its own + // reason). Keep it whenever another not-round-trip-safe type remains. + const droppedNotRoundTripSafe = + completeness.reasons.includes('not_round_trip_safe_blocks') === true && + remainingTypes.some((type) => notRoundTripSafeBlockTypes.has(type)) === false + + const reasons = droppedNotRoundTripSafe + ? completeness.reasons.filter((reason) => reason !== 'not_round_trip_safe_blocks') + : completeness.reasons + + return reasons.length === 0 + ? 
{ _tag: 'complete' } + : { _tag: 'lossy', reasons, lossyBlockTypes: remainingTypes } +} + +/** + * Human-facing refusal message for a lossy verdict (R30/R38). Names the + * offending block class(es) when known and points the user to the Notion UI — + * shared so every surface (`cat`/`put`/`edit`/file `sync`/tree) refuses with the + * same wording. `context` distinguishes the call site (clean base, push, …) + * without changing the user-facing meaning. + */ +export const describeBodyLossyRefusal = (opts: { + readonly pageId: string + readonly completeness: Extract<BodyCompleteness, { readonly _tag: 'lossy' }> + readonly context: string +}): string => { + const lossyBlockTypes = opts.completeness.lossyBlockTypes ?? [] + const blocks = + lossyBlockTypes.length > 0 + ? ` containing not-losslessly-representable block(s): ${lossyBlockTypes.join(', ')}` + : '' + return ( + `Remote Markdown body for page ${opts.pageId} is lossy (${opts.completeness.reasons.join(', ')})${blocks}; ` + + `${opts.context}. 
Edit such blocks in the Notion UI.` + ) } export const stableBodyFidelityStringify = (value: unknown): string => { diff --git a/packages/@overeng/notion-core/src/body-fidelity.unit.test.ts b/packages/@overeng/notion-core/src/body-fidelity.unit.test.ts index 215f284b9..ef84fd5ee 100644 --- a/packages/@overeng/notion-core/src/body-fidelity.unit.test.ts +++ b/packages/@overeng/notion-core/src/body-fidelity.unit.test.ts @@ -1,6 +1,12 @@ import { describe, expect, it } from 'vitest' -import { classifyBodyCompleteness, stableBodyFidelityStringify } from './body-fidelity.ts' +import { + classifyBodyCompleteness, + describeBodyLossyRefusal, + stableBodyFidelityStringify, + tolerateTreeChildPages, + type BodyCompleteness, +} from './body-fidelity.ts' describe('classifyBodyCompleteness', () => { const inventory = { @@ -36,6 +42,7 @@ describe('classifyBodyCompleteness', () => { ).toEqual({ _tag: 'lossy', reasons: ['endpoint_truncated', 'unknown_blocks'], + lossyBlockTypes: [], }) }) @@ -51,6 +58,7 @@ describe('classifyBodyCompleteness', () => { ).toEqual({ _tag: 'lossy', reasons: ['rendered_markdown_has_unobserved_suffix'], + lossyBlockTypes: [], }) }) @@ -66,6 +74,7 @@ describe('classifyBodyCompleteness', () => { ).toEqual({ _tag: 'lossy', reasons: ['rendered_markdown_has_unobserved_suffix'], + lossyBlockTypes: [], }) }) @@ -112,8 +121,149 @@ describe('classifyBodyCompleteness', () => { ).toEqual({ _tag: 'lossy', reasons: ['unsupported_blocks'], + lossyBlockTypes: ['unsupported'], }) }) + + const blockInventory = (type: string) => ({ + entries: [{ id: `id-${type}`, type, hasChildren: false, inTrash: false }], + }) + + // R38: a block whose body-Markdown rendering does not reparse to the same + // block (round-trip-safety) must classify lossy so the pull gate refuses it. 
+ for (const type of [ + 'child_database', + 'table_of_contents', + 'synced_block', + 'child_page', + 'breadcrumb', + 'bookmark', + 'embed', + 'link_preview', + 'link_to_page', + ]) { + it(`classifies not-round-trip-safe block "${type}" as lossy`, () => { + expect( + classifyBodyCompleteness({ + markdown: { markdown: 'Prose', truncated: false, unknownBlockIds: [] }, + inventory: { ...blockInventory(type), renderedMarkdown: 'Prose' }, + }), + ).toEqual({ + _tag: 'lossy', + reasons: ['not_round_trip_safe_blocks'], + lossyBlockTypes: [type], + }) + }) + } + + it('keeps representable media (image) complete', () => { + expect( + classifyBodyCompleteness({ + markdown: { markdown: 'Prose', truncated: false, unknownBlockIds: [] }, + inventory: { ...blockInventory('image'), renderedMarkdown: 'Prose' }, + }), + ).toEqual({ _tag: 'complete' }) + }) + + it('reports every distinct offending block type, sorted and deduplicated', () => { + expect( + classifyBodyCompleteness({ + markdown: { markdown: 'Prose', truncated: false, unknownBlockIds: [] }, + inventory: { + entries: [ + { id: '1', type: 'synced_block', hasChildren: false, inTrash: false }, + { id: '2', type: 'table_of_contents', hasChildren: false, inTrash: false }, + { id: '3', type: 'synced_block', hasChildren: false, inTrash: false }, + { id: '4', type: 'unsupported', hasChildren: false, inTrash: false }, + ], + renderedMarkdown: 'Prose', + }, + }), + ).toEqual({ + _tag: 'lossy', + reasons: ['unsupported_blocks', 'not_round_trip_safe_blocks'], + lossyBlockTypes: ['synced_block', 'table_of_contents', 'unsupported'], + }) + }) +}) + +describe('tolerateTreeChildPages', () => { + it('clears a child_page-only verdict to complete (tree node owns its sub-pages)', () => { + const verdict = classifyBodyCompleteness({ + markdown: { markdown: 'Prose', truncated: false, unknownBlockIds: [] }, + inventory: { + entries: [{ id: '1', type: 'child_page', hasChildren: false, inTrash: false }], + renderedMarkdown: 'Prose', + }, + }) + 
expect(verdict).toEqual({ + _tag: 'lossy', + reasons: ['not_round_trip_safe_blocks'], + lossyBlockTypes: ['child_page'], + }) + expect(tolerateTreeChildPages(verdict)).toEqual({ _tag: 'complete' }) + }) + + it('still refuses a tree node that ALSO has a real lossy block (#785 stays fixed)', () => { + const verdict = classifyBodyCompleteness({ + markdown: { markdown: 'Prose', truncated: false, unknownBlockIds: [] }, + inventory: { + entries: [ + { id: '1', type: 'child_page', hasChildren: false, inTrash: false }, + { id: '2', type: 'table_of_contents', hasChildren: false, inTrash: false }, + ], + renderedMarkdown: 'Prose', + }, + }) + expect(tolerateTreeChildPages(verdict)).toEqual({ + _tag: 'lossy', + reasons: ['not_round_trip_safe_blocks'], + lossyBlockTypes: ['table_of_contents'], + }) + }) + + it('preserves Markdown-level reasons even when child_page is tolerated', () => { + const verdict: BodyCompleteness = { + _tag: 'lossy', + reasons: ['endpoint_truncated', 'not_round_trip_safe_blocks'], + lossyBlockTypes: ['child_page'], + } + expect(tolerateTreeChildPages(verdict)).toEqual({ + _tag: 'lossy', + reasons: ['endpoint_truncated'], + lossyBlockTypes: [], + }) + }) + + it('is a no-op for a complete verdict', () => { + expect(tolerateTreeChildPages({ _tag: 'complete' })).toEqual({ _tag: 'complete' }) + }) +}) + +describe('describeBodyLossyRefusal', () => { + it('names the offending block class and points to the Notion UI', () => { + const message = describeBodyLossyRefusal({ + pageId: 'page-1', + completeness: { + _tag: 'lossy', + reasons: ['not_round_trip_safe_blocks'], + lossyBlockTypes: ['table_of_contents'], + }, + context: 'refusing to treat it as a clean notion-md base', + }) + expect(message).toContain('table_of_contents') + expect(message).toContain('Notion UI') + }) + + it('omits the block clause when no block types are known', () => { + const message = describeBodyLossyRefusal({ + pageId: 'page-1', + completeness: { _tag: 'lossy', reasons: 
['endpoint_truncated'] }, + context: 'refusing verified body operation', + }) + expect(message).not.toContain('block(s):') + expect(message).toContain('endpoint_truncated') + }) }) describe('stableBodyFidelityStringify', () => { diff --git a/packages/@overeng/notion-core/src/mod.ts b/packages/@overeng/notion-core/src/mod.ts index 490dbca3a..ec6b6c99b 100644 --- a/packages/@overeng/notion-core/src/mod.ts +++ b/packages/@overeng/notion-core/src/mod.ts @@ -45,6 +45,7 @@ export { richTextPlainText } from './rich-text.ts' export { classifyBodyCompleteness, + describeBodyLossyRefusal, type BlockInventory, type BlockInventoryEntry, type BodyCompleteness, @@ -52,4 +53,5 @@ export { type BodyLossyReason, type MarkdownBodySnapshot, stableBodyFidelityStringify, + tolerateTreeChildPages, } from './body-fidelity.ts' diff --git a/packages/@overeng/notion-datasource-sync/src/body/adapter.unit.test.ts b/packages/@overeng/notion-datasource-sync/src/body/adapter.unit.test.ts index d180a5fd2..0bacd44ba 100644 --- a/packages/@overeng/notion-datasource-sync/src/body/adapter.unit.test.ts +++ b/packages/@overeng/notion-datasource-sync/src/body/adapter.unit.test.ts @@ -152,6 +152,7 @@ const fakeNotionMdGateway = ( (() => Effect.die('updateMarkdown should not be called by these tests')), updatePageProperties: () => Effect.die('updatePageProperties should not be called by these tests'), + retrieveDataSource: () => Effect.die('retrieveDataSource should not be called by these tests'), updatePageMetadata: () => Effect.die('updatePageMetadata should not be called by these tests'), listChildPages: () => Effect.succeed([]), createPage: () => Effect.die('createPage should not be called by these tests'), diff --git a/packages/@overeng/notion-datasource-sync/src/body/notion-md.ts b/packages/@overeng/notion-datasource-sync/src/body/notion-md.ts index 64136b2d2..3e7ca6323 100644 --- a/packages/@overeng/notion-datasource-sync/src/body/notion-md.ts +++ 
b/packages/@overeng/notion-datasource-sync/src/body/notion-md.ts @@ -95,6 +95,7 @@ const unknownBlockCauseFromLossyReasons = ( if (reasons.includes('rendered_markdown_has_unobserved_suffix') === true) return 'truncation' if (reasons.includes('unknown_blocks') === true) return 'unknown' if (reasons.includes('unsupported_blocks') === true) return 'unsupported' + if (reasons.includes('not_round_trip_safe_blocks') === true) return 'unsupported' if (reasons.includes('rendered_markdown_unavailable') === true) return 'unsupported' return undefined } diff --git a/packages/@overeng/notion-datasource-sync/src/e2e/body-adapter.e2e.test.ts b/packages/@overeng/notion-datasource-sync/src/e2e/body-adapter.e2e.test.ts index 144a1636d..ec292feae 100644 --- a/packages/@overeng/notion-datasource-sync/src/e2e/body-adapter.e2e.test.ts +++ b/packages/@overeng/notion-datasource-sync/src/e2e/body-adapter.e2e.test.ts @@ -683,6 +683,7 @@ describe('body adapter E2E boundary', () => { }), updatePageProperties: () => Effect.die('updatePageProperties should not be called by this test'), + retrieveDataSource: () => Effect.die('retrieveDataSource should not be called by this test'), updatePageMetadata: () => Effect.die('updatePageMetadata should not be called by this test'), listChildPages: () => Effect.succeed([]), createPage: () => Effect.die('createPage should not be called by this test'), diff --git a/packages/@overeng/notion-datasource-sync/src/e2e/cli.e2e.test.ts b/packages/@overeng/notion-datasource-sync/src/e2e/cli.e2e.test.ts index 167383b05..75a404634 100644 --- a/packages/@overeng/notion-datasource-sync/src/e2e/cli.e2e.test.ts +++ b/packages/@overeng/notion-datasource-sync/src/e2e/cli.e2e.test.ts @@ -1302,6 +1302,7 @@ describe('CLI command surface', () => { updateMarkdown: () => Effect.die('updateMarkdown should not be called by this test'), updatePageProperties: () => Effect.die('updatePageProperties should not be called by this test'), + retrieveDataSource: () => 
Effect.die('retrieveDataSource should not be called by this test'), updatePageMetadata: () => Effect.die('updatePageMetadata should not be called by this test'), listChildPages: () => Effect.succeed([]), createPage: () => Effect.die('createPage should not be called by this test'), diff --git a/packages/@overeng/notion-effect-client/package.json b/packages/@overeng/notion-effect-client/package.json index 6857838c4..f918a15b4 100644 --- a/packages/@overeng/notion-effect-client/package.json +++ b/packages/@overeng/notion-effect-client/package.json @@ -14,32 +14,37 @@ } }, "dependencies": { + "@effect/cluster": "0.58.2", + "@effect/experimental": "0.60.0", + "@effect/opentelemetry": "0.63.0", + "@effect/platform": "0.96.1", + "@effect/platform-node": "0.106.0", + "@effect/rpc": "0.75.1", + "@effect/vitest": "0.29.0", + "@effect/workflow": "0.18.0", "@overeng/content-address": "workspace:^", "@overeng/notion-core": "workspace:^", "@overeng/notion-effect-schema": "workspace:^", - "@overeng/otel-contract": "workspace:^" + "@overeng/otel-contract": "workspace:^", + "@overeng/utils": "workspace:^", + "@playwright/test": "1.59.1", + "effect": "3.21.2", + "remark-gfm": "4.0.1", + "remark-parse": "11.0.0", + "remark-stringify": "11.0.0", + "unified": "11.0.5", + "unist-util-visit": "5.1.0", + "vitest": "3.2.4" }, "devDependencies": { "@effect/platform": "0.96.1", "@effect/vitest": "0.29.0", - "@overeng/utils": "workspace:^", "@overeng/utils-dev": "workspace:^", "@types/node": "25.3.3", "effect": "3.21.2", "typescript": "5.9.3", "vitest": "3.2.4" }, - "peerDependencies": { - "@effect/cluster": "^0.58.2", - "@effect/experimental": "^0.60.0", - "@effect/opentelemetry": "^0.63.0", - "@effect/platform": "^0.96.1", - "@effect/platform-node": "^0.106.0", - "@effect/rpc": "^0.75.1", - "@effect/workflow": "^0.18.0", - "@playwright/test": "^1.59.1", - "effect": "^3.21.2" - }, "$genie": { "source": "package.json.genie.ts", "warning": "DO NOT EDIT - changes will be overwritten", diff 
--git a/packages/@overeng/notion-effect-client/package.json.genie.ts b/packages/@overeng/notion-effect-client/package.json.genie.ts index 6318968a6..39c60c094 100644 --- a/packages/@overeng/notion-effect-client/package.json.genie.ts +++ b/packages/@overeng/notion-effect-client/package.json.genie.ts @@ -14,10 +14,20 @@ import utilsPkg from '../utils/package.json.genie.ts' const runtimeDeps = catalog.compose({ workspace: workspaceMember({ memberPath: 'packages/@overeng/notion-effect-client' }), dependencies: { - workspace: [contentAddressPkg, notionCorePkg, notionEffectSchemaPkg, otelContractPkg], + // `@overeng/utils` is a runtime import (`sha256Hex` in `config.ts`), so it + // must be a real runtime dependency — not a dev/peer dep that a standalone + // consumer could fail to provide. + workspace: [contentAddressPkg, notionCorePkg, notionEffectSchemaPkg, otelContractPkg, utilsPkg], + external: catalog.pick( + 'remark-gfm', + 'remark-parse', + 'remark-stringify', + 'unified', + 'unist-util-visit', + ), }, devDependencies: { - workspace: [utilsDevPkg, utilsPkg], + workspace: [utilsDevPkg], external: { ...catalog.pick( '@effect/platform', @@ -29,9 +39,10 @@ const runtimeDeps = catalog.compose({ ), }, }, - peerDependencies: { - workspace: [utilsPkg], - }, + // `@overeng/utils` is a runtime workspace dep that carries peer dependencies + // (the @effect/* cluster + @playwright/test). `mode: 'install'` makes genie + // install those inherited peers explicitly so a standalone consumer resolves. 
+ mode: 'install', }) export default packageJson( diff --git a/packages/@overeng/notion-effect-client/src/body-observation.ts b/packages/@overeng/notion-effect-client/src/body-observation.ts index d53a34b0e..974ef95cc 100644 --- a/packages/@overeng/notion-effect-client/src/body-observation.ts +++ b/packages/@overeng/notion-effect-client/src/body-observation.ts @@ -16,6 +16,7 @@ import { type BodyEvidenceFingerprint, type RemoteBodyObservationEvidence, } from './body-evidence.ts' +import { canonicalizeBlockMarkdown } from './canonical-markdown.ts' import type { NotionConfig } from './config.ts' import type { NotionApiError } from './error.ts' import { NotionMarkdown } from './markdown.ts' @@ -73,7 +74,17 @@ export const observeFromSnapshots = Effect.fn('NotionBody.observeFromSnapshots') readonly beforeLastEditedTime?: string readonly afterLastEditedTime?: string }) { - const renderedMarkdown = yield* NotionMarkdown.treeToMarkdown({ tree: opts.tree }) + /* + * Canonicalize the rendered body once, at the source, before it flows into + * the inventory, the fidelity classifier, and the evidence fingerprint — so + * the evidence, the classifier, pull, hash, and push all see the same + * canonical bytes (decision 0019, "agree by construction"). The renderer + * emits parseable-not-canonical Markdown (it joins sibling blocks with `\n\n`); + * this is the single place that turns that into the one canonical form. 
+ */ + const renderedMarkdown = canonicalizeBlockMarkdown( + yield* NotionMarkdown.treeToMarkdown({ tree: opts.tree }), + ) const markdown = { markdown: opts.markdown.markdown, truncated: opts.markdown.truncated, diff --git a/packages/@overeng/notion-effect-client/src/body-observation.unit.test.ts b/packages/@overeng/notion-effect-client/src/body-observation.unit.test.ts index 6d47f4191..667c015e7 100644 --- a/packages/@overeng/notion-effect-client/src/body-observation.unit.test.ts +++ b/packages/@overeng/notion-effect-client/src/body-observation.unit.test.ts @@ -198,6 +198,7 @@ describe('NotionBody.observeFromSnapshots', () => { expect(observed.completeness).toEqual({ _tag: 'lossy', reasons: ['rendered_markdown_has_unobserved_suffix'], + lossyBlockTypes: [], }) expect(observed.inventory.entries.map((entry) => entry.type)).toEqual([ 'paragraph', @@ -348,10 +349,13 @@ describe('NotionBody.observeFromSnapshots', () => { }), ) - expect(observed.inventory.renderedMarkdown).toBe('# Heading\n\nNested child') + // Canonical body ends in a single trailing newline (decision 0019); + // the lossy verdict (endpoint missing the rendered suffix) is unchanged. + expect(observed.inventory.renderedMarkdown).toBe('# Heading\n\nNested child\n') expect(observed.completeness).toEqual({ _tag: 'lossy', reasons: ['rendered_markdown_has_unobserved_suffix'], + lossyBlockTypes: [], }) }) }) @@ -373,7 +377,8 @@ describe('NotionBody.observe', () => { expect(observed.pageId).toBe(pageId) expect(observed.markdown.markdown).toBe('Stable body') - expect(observed.inventory.renderedMarkdown).toBe('Stable body') + // Canonical rendered body ends in a trailing newline (decision 0019). 
+ expect(observed.inventory.renderedMarkdown).toBe('Stable body\n') expect(observed.completeness).toEqual({ _tag: 'complete' }) expect(test.requests).toEqual([ { method: 'GET', path: `/v1/pages/${pageId}` }, @@ -404,7 +409,7 @@ describe('NotionBody.observe', () => { ) expect(observed.markdown.markdown).toBe('Retry attempt body') - expect(observed.inventory.renderedMarkdown).toBe('Retry attempt body') + expect(observed.inventory.renderedMarkdown).toBe('Retry attempt body\n') expect(test.counts()).toEqual({ pageCalls: 4, markdownCalls: 2, diff --git a/packages/@overeng/notion-effect-client/src/canonical-markdown.ts b/packages/@overeng/notion-effect-client/src/canonical-markdown.ts new file mode 100644 index 000000000..077db6447 --- /dev/null +++ b/packages/@overeng/notion-effect-client/src/canonical-markdown.ts @@ -0,0 +1,111 @@ +import remarkGfm from 'remark-gfm' +import remarkParse from 'remark-parse' +import remarkStringify from 'remark-stringify' +import { unified } from 'unified' +import { visit } from 'unist-util-visit' + +import { canonicalizeMediaUrlsInMarkdown } from './media-url.ts' + +/* + * Canonical Markdown serialization used as the wire and on-disk form. + * + * Why a canonical form: Notion's enhanced-Markdown endpoint reserializes any + * pushed body into its own block model, so byte-equal roundtrips are not + * achievable. We define one canonical shape (CommonMark + GFM, paragraphs + * unwrapped onto a single logical line, ATX headings, hyphen list bullets, + * tight lists) and normalize both push input and pull output to it. The + * push-side guard then checks canonical equality instead of byte equality, and + * the visible Notion page no longer shows hard breaks from soft-wrapped source + * paragraphs. 
+ * + * This lives beside the renderer (`treeToMarkdown`) and the media-URL + * canonicalizer (`media-url.ts`) it calls, so the canonical body is produced + * where the bytes originate: `body-observation` emits an already-canonical + * `renderedMarkdown`, and the evidence fingerprint, the fidelity classifier, + * pull, hash, and push all see the same canonical string (decision 0019). + */ + +/* + * Soft line breaks inside a paragraph (a literal `\n` in source) render as + * hard line breaks on Notion. Collapse them to single spaces so a logical + * paragraph survives as one Notion block. Authors who want a hard break must + * use the explicit `break` node (two trailing spaces or a backslash). + */ +const unwrapSoftBreaks: () => (tree: unknown) => void = () => (tree) => { + visit(tree as never, 'text', (node: { value: string }) => { + if (node.value.includes('\n') === true) { + node.value = node.value.replace(/[ \t]*\n[ \t]*/g, ' ') + } + }) +} + +/* + * Force every list and list item tight (`spread = false`), so remark-stringify + * emits a single `\n` (not a blank line) between consecutive items. + * + * The block-tree renderer (`treeToMarkdown`) joins every sibling block — + * including consecutive list items — with `\n\n`, producing a *loose* + * CommonMark list (a blank line between every bullet) plus a stray indented + * blank line inside nested lists. `remark-stringify` preserves list tightness + * from its input, so re-stringifying a loose list stays loose unless we flip + * `spread` off here. This is the single place that owns list-tightness policy: + * pull and push both route through `canonicalizeBlockMarkdown`, so the canonical + * body is tight regardless of how the renderer joined the siblings. + * + * It only flips `spread`; it never removes the blank line *before* a following + * non-list block (that boundary is structural, not list-internal), so a + * paragraph after a list keeps its separating blank line. 
+ */ +const forceTightLists: () => (tree: unknown) => void = () => (tree) => { + visit(tree as never, (node: { type: string; spread?: boolean }) => { + if (node.type === 'list' || node.type === 'listItem') { + node.spread = false + } + }) +} + +const processor = unified() + .use(remarkParse) + .use(remarkGfm) + .use(unwrapSoftBreaks) + .use(forceTightLists) + .use(remarkStringify, { + bullet: '-', + emphasis: '_', + strong: '*', + fence: '`', + fences: true, + listItemIndent: 'one', + rule: '-', + setext: false, + tightDefinitions: true, + }) + +/** + * Reduce arbitrary Markdown to the single canonical body form, applied at BOTH + * Notion wire boundaries — pull receive and push send — so the body a consumer + * reads (`cat` / `edit` / file sync), the body hashed, and the body pushed are + * the same bytes (decision 0019). The steps, in order: + * + * 1. line-ending normalize (CRLF/CR → LF) + * 2. hosted-media URL canonicalize (volatile signature/expiry query params + * stripped, decision 0007 / R36) via the same shared function the renderer + * uses, so a rotated signed URL compares equal across pulls + * 3. remark parse + GFM + * 4. `unwrapSoftBreaks` — collapse intra-paragraph soft breaks + * 5. `forceTightLists` — `spread = false` on every list / list item + * 6. remark-stringify (the config above) + * 7. ensure a single trailing newline + * + * Spacing/tightness policy lives only here: the renderer emits parseable-not- + * canonical Markdown (it joins blocks with `\n\n` so they stay distinct), and + * this layer decides the canonical shape. The renderer joins must not be made + * block-type-aware — that would re-split the policy across two serializers. + */ +export const canonicalizeBlockMarkdown = (markdown: string): string => { + const normalized = canonicalizeMediaUrlsInMarkdown( + markdown.replace(/\r\n/g, '\n').replace(/\r/g, '\n'), + ) + const rendered = processor.processSync(normalized).toString() + return rendered.endsWith('\n') === true ? 
rendered : `${rendered}\n` +} diff --git a/packages/@overeng/notion-effect-client/src/canonical-markdown.unit.test.ts b/packages/@overeng/notion-effect-client/src/canonical-markdown.unit.test.ts new file mode 100644 index 000000000..7a036a913 --- /dev/null +++ b/packages/@overeng/notion-effect-client/src/canonical-markdown.unit.test.ts @@ -0,0 +1,85 @@ +import { describe, expect, it } from '@effect/vitest' + +import { canonicalizeBlockMarkdown } from './canonical-markdown.ts' + +describe('canonicalizeBlockMarkdown', () => { + it('unwraps soft-wrapped paragraph lines into one logical line', () => { + const wrapped = [ + 'Use this skill when designing software and you need a', + 'principled read on whether a code-level solution makes the system', + 'simpler.', + ].join('\n') + + expect(canonicalizeBlockMarkdown(wrapped)).toBe( + 'Use this skill when designing software and you need a principled read on whether a code-level solution makes the system simpler.\n', + ) + }) + + it('preserves paragraph boundaries on blank lines', () => { + const input = 'First paragraph.\n\nSecond paragraph.' + expect(canonicalizeBlockMarkdown(input)).toBe('First paragraph.\n\nSecond paragraph.\n') + }) + + it('preserves explicit hard breaks', () => { + const input = 'Line one.\\\nLine two.' + expect(canonicalizeBlockMarkdown(input)).toBe('Line one.\\\nLine two.\n') + }) + + it('keeps list structure with unwrapped continuations', () => { + const input = ['- first item that wraps across', ' two lines', '- second item'].join('\n') + expect(canonicalizeBlockMarkdown(input)).toBe( + '- first item that wraps across two lines\n- second item\n', + ) + }) + + it('leaves fenced code blocks untouched', () => { + const input = '```ts\nconst x = 1\nconst y = 2\n```' + expect(canonicalizeBlockMarkdown(input)).toBe('```ts\nconst x = 1\nconst y = 2\n```\n') + }) + + it('is idempotent', () => { + const input = 'Paragraph one wraps\nacross lines.\n\nParagraph two.' 
+ const once = canonicalizeBlockMarkdown(input) + expect(canonicalizeBlockMarkdown(once)).toBe(once) + }) + + it('normalizes CRLF line endings to LF', () => { + const input = 'Line one\r\nstill line one.\r\n\r\nLine two.' + expect(canonicalizeBlockMarkdown(input)).toBe('Line one still line one.\n\nLine two.\n') + }) + + /* + * List-tightness locking tests (decision 0019). The renderer joins sibling + * blocks with `\n\n`, producing a *loose* list; the canonical layer is the + * single place that forces lists tight (`spread = false`) while leaving the + * blank line before a following non-list block intact. + */ + it('forces a loose bulleted list tight', () => { + expect(canonicalizeBlockMarkdown('- a\n\n- b\n\n- c\n')).toBe('- a\n- b\n- c\n') + }) + + it('keeps the blank line before a paragraph following a list', () => { + expect(canonicalizeBlockMarkdown('- a\n\n- b\n\nA paragraph after.\n')).toBe( + '- a\n- b\n\nA paragraph after.\n', + ) + }) + + it('keeps consecutive headings blank-separated even from tight input', () => { + expect(canonicalizeBlockMarkdown('# H1\n## H2\n### H3\n')).toBe('# H1\n\n## H2\n\n### H3\n') + }) + + it('removes the stray indented blank line inside a nested list', () => { + // The shape `treeToMarkdown` produces for a nested list: loose siblings + // plus a ` ` whitespace-only line between nested items. 
+ expect(canonicalizeBlockMarkdown('- A\n\n- B\n - B1\n \n - B2\n\n- C\n')).toBe( + '- A\n- B\n - B1\n - B2\n- C\n', + ) + }) + + it('is idempotent over a list + paragraph body', () => { + const input = '- a\n\n- b\n\nAfter.\n' + const once = canonicalizeBlockMarkdown(input) + expect(canonicalizeBlockMarkdown(once)).toBe(once) + expect(once).toBe('- a\n- b\n\nAfter.\n') + }) +}) diff --git a/packages/@overeng/notion-effect-client/src/config.ts b/packages/@overeng/notion-effect-client/src/config.ts index df84472cb..34f9a6f71 100644 --- a/packages/@overeng/notion-effect-client/src/config.ts +++ b/packages/@overeng/notion-effect-client/src/config.ts @@ -1,5 +1,7 @@ import { Config, Context, Effect, Option, Redacted, Schema } from 'effect' +import { sha256Hex } from '@overeng/utils' + export { NOTION_API_BASE_URL, NOTION_API_VERSION } from '@overeng/notion-core' /** Configuration for the Notion client */ @@ -54,3 +56,25 @@ export const resolveNotionToken = Effect.fn('resolveNotionToken')(function* () { envVars: NOTION_TOKEN_ENV_VARS, }) }) + +/** + * Produce a log-safe fingerprint of a Notion integration token so a user can + * tell *which* credential is active without leaking the secret. + * + * Format: `` `<scheme>…#<8hex>` `` where `<scheme>` is the public token-type + * prefix (everything up to and including the first `_`, e.g. `ntn_` / `secret_`) + * and `<8hex>` is the first 8 hex chars of `sha256(token)`. An empty token + * yields `<none>`. A token with no `_` yields `…#<8hex>` with an empty scheme. + * + * Security: never emits any secret bytes beyond the scheme prefix. The first-`_` + * split is done via `indexOf` (not `split('_')[0]`, which would return the whole + * token when no `_` is present). + */ +export const notionTokenFingerprint = (token: Redacted.Redacted<string>): string => { + const raw = Redacted.value(token) + if (raw.length === 0) return '<none>' + const sep = raw.indexOf('_') + const scheme = sep === -1 ? 
'' : raw.slice(0, sep + 1) + const digest = sha256Hex(raw).slice(0, 8) + return `${scheme}…#${digest}` +} diff --git a/packages/@overeng/notion-effect-client/src/config.unit.test.ts b/packages/@overeng/notion-effect-client/src/config.unit.test.ts new file mode 100644 index 000000000..ed2c6edb0 --- /dev/null +++ b/packages/@overeng/notion-effect-client/src/config.unit.test.ts @@ -0,0 +1,44 @@ +import { Redacted } from 'effect' +import { describe, expect, it } from 'vitest' + +import { sha256Hex } from '@overeng/utils' + +import { notionTokenFingerprint } from './config.ts' + +describe('notionTokenFingerprint', () => { + it('formats as `<scheme>…#<8hex>` for a scheme-prefixed token', () => { + const fp = notionTokenFingerprint(Redacted.make('ntn_SECRETBODYxyz')) + expect(fp).toMatch(/^ntn_…#[0-9a-f]{8}$/u) + /* The 8 hex chars are the first 8 of sha256(token). */ + expect(fp).toBe(`ntn_…#${sha256Hex('ntn_SECRETBODYxyz').slice(0, 8)}`) + }) + + it('never leaks token bytes beyond the scheme prefix', () => { + const fp = notionTokenFingerprint(Redacted.make('ntn_SECRETBODYxyz')) + expect(fp).not.toContain('SECRETBODY') + expect(fp).not.toContain('xyz') + }) + + it('is stable across calls for the same token', () => { + const token = Redacted.make('secret_abc123') + expect(notionTokenFingerprint(token)).toBe(notionTokenFingerprint(token)) + }) + + it('distinguishes two different tokens', () => { + const a = notionTokenFingerprint(Redacted.make('ntn_aaaaaaaa')) + const b = notionTokenFingerprint(Redacted.make('ntn_bbbbbbbb')) + expect(a).not.toBe(b) + }) + + it('returns `<none>` for an empty token', () => { + expect(notionTokenFingerprint(Redacted.make(''))).toBe('<none>') + }) + + it('emits an empty scheme (no leaked bytes) for a token with no underscore', () => { + const fp = notionTokenFingerprint(Redacted.make('nounderscoreSECRET')) + expect(fp).toMatch(/^…#[0-9a-f]{8}$/u) + expect(fp).not.toContain('nounderscore') + expect(fp).not.toContain('SECRET') + 
expect(fp).toBe(`…#${sha256Hex('nounderscoreSECRET').slice(0, 8)}`) + }) +}) diff --git a/packages/@overeng/notion-effect-client/src/markdown.ts b/packages/@overeng/notion-effect-client/src/markdown.ts index b13cbe642..e5779ef49 100644 --- a/packages/@overeng/notion-effect-client/src/markdown.ts +++ b/packages/@overeng/notion-effect-client/src/markdown.ts @@ -20,6 +20,7 @@ import { NotionBlocks, type RetrieveNestedOptions, } from './blocks.ts' +import { canonicalizeMediaUrl } from './media-url.ts' // ----------------------------------------------------------------------------- // Types @@ -120,7 +121,10 @@ export const getBlockCaption = (block: BlockWithData): RichTextArray => { /** * Get URL from various block types (image, video, audio, file, pdf, bookmark, embed, link_preview). * - * Handles both external URLs and Notion-hosted file URLs. + * Handles both external URLs and Notion-hosted file URLs. Notion-hosted file + * URLs (`file.url`) are canonicalized — their volatile signature/expiry query + * params are stripped (decision 0007 / R36) — so the rendered body is + * deterministic across pulls. External URLs are returned untouched. 
* * @example * ```ts @@ -141,7 +145,7 @@ export const getBlockUrl = (block: BlockWithData): string | undefined => { if (typeData?.url !== undefined) return typeData.url if (typeData?.external?.url !== undefined) return typeData.external.url - if (typeData?.file?.url !== undefined) return typeData.file.url + if (typeData?.file?.url !== undefined) return canonicalizeMediaUrl(typeData.file.url) return undefined } @@ -670,196 +674,9 @@ export const BlockHelpers = { * }) * ``` */ -// ----------------------------------------------------------------------------- -// Markdown → Notion Blocks -// ----------------------------------------------------------------------------- - -const RICH_TEXT_CHUNK_SIZE = 2000 - -/** Parse inline markdown (**bold**, *italic*) into Notion rich_text elements */ -export const parseInlineMarkdown = (text: string): Array<Record<string, unknown>> => { - const elements: Array<Record<string, unknown>> = [] - const regex = /\*\*(.+?)\*\*|\*(.+?)\*/g - let lastIndex = 0 - let match: RegExpExecArray | null - - while ((match = regex.exec(text)) !== null) { - if (match.index > lastIndex) { - const before = text.slice(lastIndex, match.index) - if (before.length > 0) elements.push({ type: 'text', text: { content: before } }) - } - if (match[1] !== undefined) { - elements.push({ - type: 'text', - text: { content: match[1] }, - annotations: { bold: true }, - }) - } else if (match[2] !== undefined) { - elements.push({ - type: 'text', - text: { content: match[2] }, - annotations: { italic: true }, - }) - } - lastIndex = match.index + match[0].length - } - - const remaining = text.slice(lastIndex) - if (remaining.length > 0) elements.push({ type: 'text', text: { content: remaining } }) - - if (elements.length === 0) elements.push({ type: 'text', text: { content: text } }) - - return elements.flatMap((el) => { - const content = (el as { text: { content: string } }).text.content - if (content.length <= RICH_TEXT_CHUNK_SIZE) return [el] - const chunks: 
Array<Record<string, unknown>> = [] - for (let i = 0; i < content.length; i += RICH_TEXT_CHUNK_SIZE) { - chunks.push({ - ...el, - text: { content: content.slice(i, i + RICH_TEXT_CHUNK_SIZE) }, - }) - } - return chunks - }) -} - -/** Parse a single table row (|col1|col2|) into cell strings */ -const parseTableRow = (line: string): string[] => - line - .trim() - .replace(/^\||\|$/g, '') - .split('|') - .map((cell) => cell.trim()) - -/** Check if a line is a markdown table separator (|---|---| or |:---:|) */ -const isTableSeparator = (line: string): boolean => - /^\|[\s:?-]+(\|[\s:?-]+)+\|?\s*$/.test(line.trim()) - -/** Parse markdown table lines into a Notion table block */ -const parseMarkdownTable = (lines: string[]): Record<string, unknown> | undefined => { - if (lines.length < 2) return undefined - if (lines.every((l) => l.trim().startsWith('|')) === false) return undefined - - const sepIdx = lines.findIndex((l) => isTableSeparator(l)) - if (sepIdx < 0) return undefined - - const headerLines = lines.slice(0, sepIdx) - const dataLines = lines.slice(sepIdx + 1) - - const headerCells = headerLines.length > 0 ? parseTableRow(headerLines[0]!) : [] - const tableWidth = headerCells.length - if (tableWidth === 0) return undefined - - const toRow = (cells: string[]) => ({ - type: 'table_row' as const, - table_row: { - cells: Array.from({ length: tableWidth }, (_, i) => parseInlineMarkdown(cells[i] ?? 
'')), - }, - }) - - const children = [ - toRow(headerCells), - ...dataLines.filter((l) => l.trim().length > 0).map((l) => toRow(parseTableRow(l))), - ] - - return { - type: 'table', - table: { - table_width: tableWidth, - has_column_header: true, - has_row_header: false, - children, - }, - } -} - -/** Convert markdown text to Notion blocks (headings, lists, dividers, paragraphs, tables) */ -export const markdownToBlocks = (markdown: string): Array<Record<string, unknown>> => { - const blocks: Array<Record<string, unknown>> = [] - const normalized = markdown.replace(/<br\s*\/?>\s*/gi, '\n') - const rawParagraphs = normalized.split(/\n\n+/) - const paragraphs: string[] = [] - for (const raw of rawParagraphs) { - const lines = raw.split('\n') - let current: string[] = [] - for (const line of lines) { - const isBlockStart = /^#{1,4}\s|^-{3,}$|^- |^\d+\.\s/.test(line.trim()) - if (isBlockStart === true && current.length > 0) { - paragraphs.push(current.join('\n')) - current = [line] - } else { - current.push(line) - } - } - if (current.length > 0) paragraphs.push(current.join('\n')) - } - - for (const para of paragraphs) { - const trimmed = para.trim() - if (trimmed.length === 0) continue - - if (/^-{3,}$/.test(trimmed) === true) { - blocks.push({ type: 'divider', divider: {} }) - continue - } - - const headingMatch = trimmed.match(/^(#{1,4})\s+(.+)$/) - if (headingMatch?.[1] !== undefined && headingMatch[2] !== undefined) { - const level = headingMatch[1].length as 1 | 2 | 3 | 4 - const type = `heading_${level}` as const - blocks.push({ - type, - [type]: { rich_text: parseInlineMarkdown(headingMatch[2]) }, - }) - continue - } - - const lines = trimmed.split('\n') - const allBullets = lines.every((l) => l.trim().startsWith('- ')) - const allNumbered = lines.every((l) => /^\d+\.\s/.test(l.trim())) - - if (allBullets === true) { - for (const line of lines) { - const content = line.trim().replace(/^- /, '') - blocks.push({ - type: 'bulleted_list_item', - 
bulleted_list_item: { rich_text: parseInlineMarkdown(content) }, - }) - } - continue - } - - if (allNumbered === true) { - for (const line of lines) { - const content = line.trim().replace(/^\d+\.\s/, '') - blocks.push({ - type: 'numbered_list_item', - numbered_list_item: { rich_text: parseInlineMarkdown(content) }, - }) - } - continue - } - - const tableBlock = parseMarkdownTable(lines) - if (tableBlock !== undefined) { - blocks.push(tableBlock) - continue - } - - const text = trimmed.replace(/ \n/g, '\n') - blocks.push({ - type: 'paragraph', - paragraph: { rich_text: parseInlineMarkdown(text) }, - }) - } - - return blocks -} - /** Namespace for Notion markdown conversion utilities */ export const NotionMarkdown = { pageToMarkdown, treeToMarkdown, blocksToMarkdown, - markdownToBlocks, } as const diff --git a/packages/@overeng/notion-effect-client/src/markdown.unit.test.ts b/packages/@overeng/notion-effect-client/src/markdown.unit.test.ts index 85fe4dc40..9e43d1bd7 100644 --- a/packages/@overeng/notion-effect-client/src/markdown.unit.test.ts +++ b/packages/@overeng/notion-effect-client/src/markdown.unit.test.ts @@ -14,8 +14,6 @@ import { getEquationExpression, getTableRowCells, isTodoChecked, - markdownToBlocks, - parseInlineMarkdown, treeToMarkdown, } from './markdown.ts' @@ -128,6 +126,28 @@ describe('Block Helpers', () => { expect(getBlockUrl(block)).toBe('https://s3.notion.so/file.pdf') }) + it('canonicalizes a signed Notion-hosted file URL (strips signature params)', () => { + const block = mockBlock('image', { + type: 'file', + file: { + url: + 'https://prod-files-secure.s3.us-west-2.amazonaws.com/abc/photo.png' + + '?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Expires=3600&X-Amz-Signature=deadbeef', + }, + }) + expect(getBlockUrl(block)).toBe( + 'https://prod-files-secure.s3.us-west-2.amazonaws.com/abc/photo.png', + ) + }) + + it('leaves a benign external query param on an external URL untouched', () => { + const block = mockBlock('image', { + type: 'external', + 
external: { url: 'https://example.com/img.png?v=2' }, + }) + expect(getBlockUrl(block)).toBe('https://example.com/img.png?v=2') + }) + it('extracts URL from bookmark', () => { const block = mockBlock('bookmark', { url: 'https://google.com' }) expect(getBlockUrl(block)).toBe('https://google.com') @@ -281,392 +301,6 @@ describe('Block Helpers', () => { }) }) -describe('parseInlineMarkdown', () => { - it('handles bold', () => { - const result = parseInlineMarkdown('hello **world**') - expect(result).toMatchInlineSnapshot(` - [ - { - "text": { - "content": "hello ", - }, - "type": "text", - }, - { - "annotations": { - "bold": true, - }, - "text": { - "content": "world", - }, - "type": "text", - }, - ] - `) - }) - - it('handles italic', () => { - const result = parseInlineMarkdown('hello *world*') - expect(result).toMatchInlineSnapshot(` - [ - { - "text": { - "content": "hello ", - }, - "type": "text", - }, - { - "annotations": { - "italic": true, - }, - "text": { - "content": "world", - }, - "type": "text", - }, - ] - `) - }) - - it('handles mixed bold and italic', () => { - const result = parseInlineMarkdown('**bold** and *italic*') - expect(result).toMatchInlineSnapshot(` - [ - { - "annotations": { - "bold": true, - }, - "text": { - "content": "bold", - }, - "type": "text", - }, - { - "text": { - "content": " and ", - }, - "type": "text", - }, - { - "annotations": { - "italic": true, - }, - "text": { - "content": "italic", - }, - "type": "text", - }, - ] - `) - }) - - it('returns plain text when no markdown', () => { - const result = parseInlineMarkdown('plain text') - expect(result).toMatchInlineSnapshot(` - [ - { - "text": { - "content": "plain text", - }, - "type": "text", - }, - ] - `) - }) -}) - -describe('markdownToBlocks', () => { - it('converts headings', () => { - const blocks = markdownToBlocks('# H1\n\n## H2\n\n### H3') - expect(blocks).toMatchInlineSnapshot(` - [ - { - "heading_1": { - "rich_text": [ - { - "text": { - "content": "H1", - }, - "type": 
"text", - }, - ], - }, - "type": "heading_1", - }, - { - "heading_2": { - "rich_text": [ - { - "text": { - "content": "H2", - }, - "type": "text", - }, - ], - }, - "type": "heading_2", - }, - { - "heading_3": { - "rich_text": [ - { - "text": { - "content": "H3", - }, - "type": "text", - }, - ], - }, - "type": "heading_3", - }, - ] - `) - }) - - it('converts bullet lists', () => { - const blocks = markdownToBlocks('- item one\n- item two') - expect(blocks).toMatchInlineSnapshot(` - [ - { - "bulleted_list_item": { - "rich_text": [ - { - "text": { - "content": "item one", - }, - "type": "text", - }, - ], - }, - "type": "bulleted_list_item", - }, - { - "bulleted_list_item": { - "rich_text": [ - { - "text": { - "content": "item two", - }, - "type": "text", - }, - ], - }, - "type": "bulleted_list_item", - }, - ] - `) - }) - - it('converts numbered lists', () => { - const blocks = markdownToBlocks('1. first\n2. second') - expect(blocks).toMatchInlineSnapshot(` - [ - { - "numbered_list_item": { - "rich_text": [ - { - "text": { - "content": "first", - }, - "type": "text", - }, - ], - }, - "type": "numbered_list_item", - }, - { - "numbered_list_item": { - "rich_text": [ - { - "text": { - "content": "second", - }, - "type": "text", - }, - ], - }, - "type": "numbered_list_item", - }, - ] - `) - }) - - it('converts dividers', () => { - const blocks = markdownToBlocks('---') - expect(blocks).toMatchInlineSnapshot(` - [ - { - "divider": {}, - "type": "divider", - }, - ] - `) - }) - - it('converts simple tables', () => { - const blocks = markdownToBlocks('| A | B |\n|---|---|\n| 1 | 2 |') - expect(blocks).toMatchInlineSnapshot(` - [ - { - "table": { - "children": [ - { - "table_row": { - "cells": [ - [ - { - "text": { - "content": "A", - }, - "type": "text", - }, - ], - [ - { - "text": { - "content": "B", - }, - "type": "text", - }, - ], - ], - }, - "type": "table_row", - }, - { - "table_row": { - "cells": [ - [ - { - "text": { - "content": "1", - }, - "type": "text", - }, - ], - 
[ - { - "text": { - "content": "2", - }, - "type": "text", - }, - ], - ], - }, - "type": "table_row", - }, - ], - "has_column_header": true, - "has_row_header": false, - "table_width": 2, - }, - "type": "table", - }, - ] - `) - }) - - it('converts tables with bold cells', () => { - const blocks = markdownToBlocks('| **Name** | Value |\n|---|---|\n| **A** | 1 |') - const table = blocks[0] as { table: { children: Array<{ table_row: { cells: unknown[][] } }> } } - const headerCells = table.table.children[0]!.table_row.cells - expect(headerCells[0]).toMatchInlineSnapshot(` - [ - { - "annotations": { - "bold": true, - }, - "text": { - "content": "Name", - }, - "type": "text", - }, - ] - `) - }) - - it('handles large table with multiple rows', () => { - const md = [ - '| Projekt | Leistung | Standort |', - '|---|---|---|', - '| Solar Park A | 10 MW | Bayern |', - '| Solar Park B | 25 MW | NRW |', - '| Solar Park C | 15 MW | Sachsen |', - '| Solar Park D | 30 MW | Brandenburg |', - '| Solar Park E | 20 MW | Hessen |', - ].join('\n') - const blocks = markdownToBlocks(md) - expect(blocks).toHaveLength(1) - const table = blocks[0] as { table: { children: unknown[]; table_width: number } } - expect(table.table.table_width).toBe(3) - expect(table.table.children).toHaveLength(6) // 1 header + 5 data rows - }) - - it('handles alignment markers in tables', () => { - const md = '| Left | Center | Right |\n|:---|:---:|---:|\n| a | b | c |' - const blocks = markdownToBlocks(md) - expect(blocks).toHaveLength(1) - expect(blocks[0]).toHaveProperty('type', 'table') - }) - - it('falls back to paragraph for non-table pipe content', () => { - const blocks = markdownToBlocks('this | is | not a table') - expect(blocks).toHaveLength(1) - expect(blocks[0]).toHaveProperty('type', 'paragraph') - }) - - it('pads missing cells to table width', () => { - const md = '| A | B | C |\n|---|---|---|\n| 1 |' - const blocks = markdownToBlocks(md) - const table = blocks[0] as { table: { children: Array<{ 
table_row: { cells: unknown[][] } }> } } - const dataRow = table.table.children[1]!.table_row.cells - expect(dataRow).toHaveLength(3) - // Missing cells should get empty string content - expect(dataRow[1]).toMatchInlineSnapshot(` - [ - { - "text": { - "content": "", - }, - "type": "text", - }, - ] - `) - }) - - it('handles mixed headings + tables + paragraphs', () => { - const md = '# Title\n\nSome text here.\n\n| A | B |\n|---|---|\n| 1 | 2 |\n\n## Section' - const blocks = markdownToBlocks(md) - expect(blocks.map((b) => b.type)).toMatchInlineSnapshot(` - [ - "heading_1", - "paragraph", - "table", - "heading_2", - ] - `) - }) - - it('normalizes <br> before parsing', () => { - const blocks = markdownToBlocks('line1<br>line2<br/>line3') - expect(blocks).toHaveLength(1) - expect(blocks[0]).toHaveProperty('type', 'paragraph') - const para = blocks[0] as { paragraph: { rich_text: Array<{ text: { content: string } }> } } - expect(para.paragraph.rich_text[0]!.text.content).toBe('line1\nline2\nline3') - }) - - it('splits inline block markers within paragraphs', () => { - const blocks = markdownToBlocks('some text\n# Heading') - expect(blocks.map((b) => b.type)).toMatchInlineSnapshot(` - [ - "paragraph", - "heading_1", - ] - `) - }) -}) - describe('treeToMarkdown', () => { it('renders heading children', async () => { const markdown = await Effect.runPromise( diff --git a/packages/@overeng/notion-effect-client/src/media-url.ts b/packages/@overeng/notion-effect-client/src/media-url.ts new file mode 100644 index 000000000..b292f9d79 --- /dev/null +++ b/packages/@overeng/notion-effect-client/src/media-url.ts @@ -0,0 +1,112 @@ +/* + * Hosted-media URL canonicalization (decision 0007 / R36). + * + * Notion-hosted media (image/file/video/pdf with `type: "file"`) renders with + * an expiring signed S3 URL whose signature query params (`X-Amz-*`, + * `Signature`, `Expires`, …) rotate on every pull. 
Left raw, the rendered body + * hash is volatile — breaking `cat`→`put` idempotence, staling base hashes with + * zero edits, and causing media-page pushes to be rejected by the post-push + * `semanticEquivalent` gate. + * + * Canonicalization strips only the volatile signature/expiry family of query + * params (by name) and keeps everything else, so a benign external query param + * (`?v=2`, `?w=800`) survives untouched. This single function is the shared + * source of truth: the renderer applies it so pull output is deterministic, and + * `canonical-markdown.ts` applies it inside `semanticEquivalent` / + * `canonicalizeBlockMarkdown` so the gate compares the same canonical form. + */ + +/* + * Volatile query-parameter names stripped from a signed URL. Covers the AWS + * SigV4 family Notion's S3 presigner emits plus the legacy SigV2 names, matched + * case-insensitively. Any other param (including benign external ones) is kept. + */ +const VOLATILE_QUERY_PARAMS: ReadonlySet<string> = new Set([ + 'x-amz-algorithm', + 'x-amz-credential', + 'x-amz-date', + 'x-amz-expires', + 'x-amz-signature', + 'x-amz-signedheaders', + 'x-amz-security-token', + 'x-amz-content-sha256', + // Legacy SigV2 / generic presign params. + 'awsaccesskeyid', + 'signature', + 'expires', +]) + +/** + * Strip the volatile signature/expiry query params from a single URL while + * keeping its origin, path, and any non-volatile query params. Non-URL or + * unparseable input is returned unchanged. + */ +export const canonicalizeMediaUrl = (url: string): string => { + let parsed: URL + try { + parsed = new URL(url) + } catch { + return url + } + + const volatileKeys = Array.from(parsed.searchParams.keys()).filter((key) => + VOLATILE_QUERY_PARAMS.has(key.toLowerCase()), + ) + if (volatileKeys.length === 0) return url + for (const key of volatileKeys) parsed.searchParams.delete(key) + + /* + * Rebuild from origin + pathname + remaining query so the result is the bare + * canonical URL. 
`URL.toString()` would re-encode and append a trailing `?` + * edge case; assemble explicitly instead. + */ + const remaining = parsed.searchParams.toString() + const query = remaining === '' ? '' : `?${remaining}` + return `${parsed.origin}${parsed.pathname}${query}${parsed.hash}` +} + +/* + * Matches a Markdown URL occurrence inside `(...)` of an image/link, i.e. the + * `](<url>)` form the renderer emits for media blocks. We only rewrite URLs in + * that position so prose text that happens to contain `X-Amz-` is left alone. + */ +const MARKDOWN_URL_RE = /\]\(([^)\s]+)\)/g + +/* + * Hosts that serve Notion-hosted media (the signed S3/CDN URLs whose signature + * params rotate per pull). An EXTERNAL signed URL (a user's own private S3 + * bucket embedded by URL) must NOT be touched — stripping its signature breaks + * the credentials. So the markdown-string path (which, unlike the renderer, + * cannot see `file` vs `external`) only canonicalizes URLs on these hosts. + */ +const isNotionMediaHost = (parsed: URL): boolean => + parsed.hostname === 'prod-files-secure.s3.us-west-2.amazonaws.com' || + parsed.hostname === 'file.notion.so' || + parsed.hostname.endsWith('.notion.so') || + parsed.hostname.endsWith('.notion-static.com') || + // Older bucket-in-path form: s3*.amazonaws.com/secure.notion-static.com/... + (parsed.hostname.endsWith('.amazonaws.com') && + parsed.pathname.startsWith('/secure.notion-static.com/')) + +/** + * Canonicalize every hosted-media URL embedded in a Markdown string. Used by + * the hashing/gating path where only the rendered text (not the source block) + * is available. Unlike the renderer it cannot see `file` vs. `external`, so it + * restricts signature stripping to known Notion-media hosts (see + * `isNotionMediaHost`). External signed URLs — e.g. a user's own private S3 + * bucket embedded by URL, whose `X-Amz-*` params are load-bearing credentials — + * are preserved untouched. 
+ */ +export const canonicalizeMediaUrlsInMarkdown = (markdown: string): string => + markdown.replace(MARKDOWN_URL_RE, (match, url: string) => { + let parsed: URL + try { + parsed = new URL(url) + } catch { + return match + } + if (isNotionMediaHost(parsed) === false) return match + + const canonical = canonicalizeMediaUrl(url) + return canonical === url ? match : `](${canonical})` + }) diff --git a/packages/@overeng/notion-effect-client/src/media-url.unit.test.ts b/packages/@overeng/notion-effect-client/src/media-url.unit.test.ts new file mode 100644 index 000000000..ec0307bbe --- /dev/null +++ b/packages/@overeng/notion-effect-client/src/media-url.unit.test.ts @@ -0,0 +1,108 @@ +import { describe, expect, it } from '@effect/vitest' + +import { canonicalizeMediaUrl, canonicalizeMediaUrlsInMarkdown } from './media-url.ts' + +/* + * A Notion-hosted signed S3 URL shaped like the live ones (synthetic values — + * no real signature). The volatile `X-Amz-*` family rotates on every pull. + */ +const signedUrl = + 'https://prod-files-secure.s3.us-west-2.amazonaws.com/abc/def/photo.png' + + '?X-Amz-Algorithm=AWS4-HMAC-SHA256' + + '&X-Amz-Credential=ASIAEXAMPLE%2F20260615%2Fus-west-2%2Fs3%2Faws4_request' + + '&X-Amz-Date=20260615T120000Z' + + '&X-Amz-Expires=3600' + + '&X-Amz-Signature=deadbeef' + + '&X-Amz-SignedHeaders=host' + + '&X-Amz-Security-Token=tok' + +const canonicalSignedUrl = 'https://prod-files-secure.s3.us-west-2.amazonaws.com/abc/def/photo.png' + +describe('canonicalizeMediaUrl', () => { + it('strips the volatile signature/expiry params, keeping origin + path', () => { + expect(canonicalizeMediaUrl(signedUrl)).toBe(canonicalSignedUrl) + }) + + it('is idempotent', () => { + const once = canonicalizeMediaUrl(signedUrl) + expect(canonicalizeMediaUrl(once)).toBe(once) + }) + + it('treats two rotated-signature variants as the same canonical form', () => { + const rotated = signedUrl.replace('X-Amz-Signature=deadbeef', 'X-Amz-Signature=cafef00d') + 
expect(canonicalizeMediaUrl(signedUrl)).toBe(canonicalizeMediaUrl(rotated)) + }) + + it('leaves an external URL with a benign query param untouched', () => { + const external = 'https://example.com/img.png?v=2&w=800' + expect(canonicalizeMediaUrl(external)).toBe(external) + }) + + it('keeps non-volatile params while stripping volatile ones', () => { + const mixed = `${signedUrl}&v=2` + expect(canonicalizeMediaUrl(mixed)).toBe(`${canonicalSignedUrl}?v=2`) + }) + + it('leaves a bare external URL untouched', () => { + const external = 'https://example.com/img.png' + expect(canonicalizeMediaUrl(external)).toBe(external) + }) + + it('returns non-URL input unchanged', () => { + expect(canonicalizeMediaUrl('not a url')).toBe('not a url') + }) +}) + +describe('canonicalizeMediaUrlsInMarkdown', () => { + it('canonicalizes a hosted-media image URL inside markdown', () => { + const md = `![caption](${signedUrl})\n` + expect(canonicalizeMediaUrlsInMarkdown(md)).toBe(`![caption](${canonicalSignedUrl})\n`) + }) + + it('canonicalizes a hosted-media link URL inside markdown', () => { + const md = `[a file](${signedUrl})\n` + expect(canonicalizeMediaUrlsInMarkdown(md)).toBe(`[a file](${canonicalSignedUrl})\n`) + }) + + it('leaves external-URL media untouched', () => { + const md = '![caption](https://example.com/img.png?v=2)\n' + expect(canonicalizeMediaUrlsInMarkdown(md)).toBe(md) + }) + + it('does not rewrite a signed-looking URL that is plain prose text', () => { + const md = 'See https://example.com/x?X-Amz-Signature=abc in prose.\n' + expect(canonicalizeMediaUrlsInMarkdown(md)).toBe(md) + }) + + /* + * Host-gate regression (PR #786 P1): the markdown-string path cannot see + * `file` vs. `external`, so it must only strip signature params on known + * Notion-media hosts. An external signed URL on a different bucket carries + * load-bearing credentials and must survive untouched. 
+ */ + it('preserves an external signed S3 URL on a non-Notion bucket host', () => { + const externalSigned = + 'https://my-private-bucket.s3.eu-central-1.amazonaws.com/photo.png' + + '?X-Amz-Algorithm=AWS4-HMAC-SHA256' + + '&X-Amz-Signature=deadbeef' + + '&X-Amz-Expires=3600' + const md = `![x](${externalSigned})\n` + expect(canonicalizeMediaUrlsInMarkdown(md)).toBe(md) + }) + + it('still canonicalizes a Notion-hosted signed URL on the gated host', () => { + const md = `![caption](${signedUrl})\n` + expect(canonicalizeMediaUrlsInMarkdown(md)).toBe(`![caption](${canonicalSignedUrl})\n`) + }) + + it('canonicalizes a signed file.notion.so URL', () => { + const notionSigned = + 'https://file.notion.so/f/f/abc/def/photo.png' + + '?X-Amz-Signature=deadbeef' + + '&X-Amz-Expires=3600' + const md = `![caption](${notionSigned})\n` + expect(canonicalizeMediaUrlsInMarkdown(md)).toBe( + '![caption](https://file.notion.so/f/f/abc/def/photo.png)\n', + ) + }) +}) diff --git a/packages/@overeng/notion-effect-client/src/mod.ts b/packages/@overeng/notion-effect-client/src/mod.ts index 2fcf372c5..7d3b6e8c3 100644 --- a/packages/@overeng/notion-effect-client/src/mod.ts +++ b/packages/@overeng/notion-effect-client/src/mod.ts @@ -51,6 +51,7 @@ export { NOTION_TOKEN_ENV_VARS, type NotionClientConfig, NotionConfig, + notionTokenFingerprint, NotionTokenMissing, resolveNotionToken, } from './config.ts' @@ -140,10 +141,10 @@ export { getEquationExpression, getTableRowCells, isTodoChecked, - markdownToBlocks, NotionMarkdown, - parseInlineMarkdown, } from './markdown.ts' +export { canonicalizeBlockMarkdown } from './canonical-markdown.ts' +export { canonicalizeMediaUrl, canonicalizeMediaUrlsInMarkdown } from './media-url.ts' // Notion Markdown local format export type { ClassifyNmdFrontmatterPayloadOptions, diff --git a/packages/@overeng/notion-md/demo/showcase.nmd b/packages/@overeng/notion-md/demo/showcase.nmd index 473c5e4ad..85c5328df 100644 --- 
a/packages/@overeng/notion-md/demo/showcase.nmd +++ b/packages/@overeng/notion-md/demo/showcase.nmd @@ -73,30 +73,45 @@ --- # notion-md automated demo + This page is the durable live showcase for `@overeng/notion-md`: a local `.nmd` file synced 1:1 with a Notion page through the same CLI used by the E2E tests. + ## Body surface + - Stock Notion enhanced Markdown stays in the body. - Strict sync metadata stays in frontmatter. - Content-addressed merge evidence stays in `.notion-md/objects`. + ## Editable Markdown + ### Lists + 1. Pull the Notion page into `demo/showcase.nmd`. 2. Edit the Markdown body locally. 3. Run guarded `push` or `sync`. + - Bulleted content round-trips through the Markdown endpoint. - Local-only metadata is stripped before writing to Notion. + ### Tasks + - [x] Pull demo page - [x] Push local Markdown edits - [x] Verify clean status - [ ] Extend once file bytes and comment projection are implemented + ### Callout + > The demo intentionally avoids local frontmatter prose in the Notion body. If you can see this page in Notion, you are seeing only the synced Markdown body. + ### Code + ```typescript const command = 'notion-md sync packages/@overeng/notion-md/demo/showcase.nmd' ``` + ### Table + <table header-row="true"> <tr> <td>Surface</td> diff --git a/packages/@overeng/notion-md/docs/vrs/.decisions/0001-title-as-h1-presentation.md b/packages/@overeng/notion-md/docs/vrs/.decisions/0001-title-as-h1-presentation.md new file mode 100644 index 000000000..dc39bae20 --- /dev/null +++ b/packages/@overeng/notion-md/docs/vrs/.decisions/0001-title-as-h1-presentation.md @@ -0,0 +1,25 @@ +# Title is presented as a leading H1, but always transported through the typed title API + +Default streaming mode renders the page title as a leading `# <title>` line so a +human can edit it as normal Markdown. 
This looks like it violates R04 (properties +must sync through typed APIs, not body Markdown) — but it does not, because the +H1 is a **presentation** affordance only. + +The hard rule: on `put`, the leading title H1 is parsed out, the title is written +through the typed page-metadata API (`updatePageMetadata`), and the H1 is +**stripped from the body** before the body is pushed. The body sent to Notion +stays stock enhanced Markdown (R01), and the title never travels as a body block +(R04). R01/R03/R04 govern the _transport_ surface; presentation is unconstrained. + +## Status + +accepted + +## Consequences + +- The base-hash guard must cover title + body together (a title change is a + document change), not the body alone. +- `cat`/`put` default mode must canonicalize the title-H1 boundary identically + so the round-trip is idempotent. +- Edge rules (line 1 not an H1, body's own leading H1, untitled page) are + specified in spec.md and refined in later decisions. diff --git a/packages/@overeng/notion-md/docs/vrs/.decisions/0002-base-hash-on-stderr.md b/packages/@overeng/notion-md/docs/vrs/.decisions/0002-base-hash-on-stderr.md new file mode 100644 index 000000000..059bcbdd1 --- /dev/null +++ b/packages/@overeng/notion-md/docs/vrs/.decisions/0002-base-hash-on-stderr.md @@ -0,0 +1,22 @@ +# `cat` emits the base hash on stderr; standalone `put` needs `--base-hash` or `--force` + +`cat` writes the [[Base hash]] to **stderr** (`base-hash: sha256:…`) and keeps +**stdout pure Markdown**, so `notion-md cat X | pandoc` and similar pipes work +unpolluted. The alternative (hash on stdout line 1) was rejected because it +would force every consumer to strip a header. + +Consequence: a bare pipe can't carry the out-of-band token, so guarded `put` +needs the hash supplied explicitly. `put` with neither `--base-hash` nor +`--force` refuses and points to both. `edit` threads the hash internally so the +common path never sees it. 
+ +## Status + +accepted + +## Considered Options + +- Hash on stdout line 1 — single capture gets body+token, but pollutes every + pipe and breaks "stdout is pure Markdown." +- Hash on stderr (chosen) — pipe-clean stdout; power users capture stderr or + use `--force`; `edit` hides it. diff --git a/packages/@overeng/notion-md/docs/vrs/.decisions/0003-edit-is-a-session-not-live-sync.md b/packages/@overeng/notion-md/docs/vrs/.decisions/0003-edit-is-a-session-not-live-sync.md new file mode 100644 index 000000000..f4452bb3c --- /dev/null +++ b/packages/@overeng/notion-md/docs/vrs/.decisions/0003-edit-is-a-session-not-live-sync.md @@ -0,0 +1,27 @@ +# `edit` is a discrete pull-edit-push session, not live/continuous sync + +> **Refined by [0017](./0017-edit-is-an-ephemeral-file-engine-session.md):** the +> session shape below is unchanged, but `edit` is now an ephemeral file-engine +> session whose concurrency safety comes from the engine's base-snapshot guard. + +`edit` pulls once, opens `$VISUAL`/`$EDITOR` on a `$TMPDIR` temp file, and on +editor exit does one guarded `put`. It is not character-level or continuous +two-way sync, and remote changes do not stream into an open editor. + +This is deliberate. Live sync fights a modal editor (cursor jumps, mid-edit +merges), the Notion API is not built for it, and it would require an editor +plugin — which the design explicitly avoids in favor of the canonical +`$EDITOR` integration (the `git commit` / `kubectl edit` / `sudoedit` pattern). +The session model is the accepted interpretation of "two-way editing." + +## Status + +accepted + +## Consequences + +- Concurrency safety comes from the guarded `put` (decision 0002), not from + live reconciliation. A remote change during an editor session surfaces as a + conflict on save, not as a live buffer update. +- A truly zero-temp-file, in-buffer experience would require an editor plugin; + that is out of scope and documented as such, not shipped. 
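The session relies on the guarded `put` from decision 0002: a remote change made while the editor is open shows up as a conflict on save. The sketch below is one way that guard could be modelled as a pure function. It is an illustration under assumptions, not the package's implementation: `decidePut`, `PutOptions`, and `PutDecision` are hypothetical names, and only the `--base-hash` / `--force` flags and the refuse-when-neither-is-given rule come from the records above.

```typescript
// Hypothetical sketch of the decision 0002 guard; all names here are illustrative.
type PutOptions = { readonly baseHash?: string; readonly force?: boolean }

type PutDecision =
  | { readonly kind: 'write' }
  | { readonly kind: 'refuse-missing-guard'; readonly hint: string }
  | { readonly kind: 'conflict'; readonly expected: string; readonly actual: string }

const decidePut = (options: PutOptions, currentRemoteHash: string): PutDecision => {
  // `--force` skips the concurrency check entirely (last writer wins).
  if (options.force === true) return { kind: 'write' }

  // A bare pipe cannot carry the hash, so neither flag means refuse and point to both.
  if (options.baseHash === undefined) {
    return { kind: 'refuse-missing-guard', hint: 'pass --base-hash <hash> or --force' }
  }

  // The write proceeds only when the remote still matches what the user pulled.
  return options.baseHash === currentRemoteHash
    ? { kind: 'write' }
    : { kind: 'conflict', expected: options.baseHash, actual: currentRemoteHash }
}
```

In this model `edit` would call the same function with the hash it captured at pull time, which is why the common path never has to show the hash to the user.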
diff --git a/packages/@overeng/notion-md/docs/vrs/.decisions/0004-umbrella-surfacing.md b/packages/@overeng/notion-md/docs/vrs/.decisions/0004-umbrella-surfacing.md new file mode 100644 index 000000000..557b659c8 --- /dev/null +++ b/packages/@overeng/notion-md/docs/vrs/.decisions/0004-umbrella-surfacing.md @@ -0,0 +1,25 @@ +# `edit` is promoted to the `notion` umbrella root; `cat`/`put` stay under `notion md` + +`cat`, `put`, and `edit` live in the notion-md Effect command tree, so they +appear as `notion md cat|put|edit` for free via the existing umbrella dispatch +(and as `notion-md …` standalone). On top of that, `edit` is re-exposed as a +top-level alias **`notion edit <page>`**. + +Rationale: "open my Notion page in `$EDITOR`" is the marquee cross-cutting verb +a user reaches for (cf. `kubectl edit`, not `kubectl resource edit`); burying it +under `notion md edit` hurts discoverability. The `cat`/`put` primitives are +body/page operations and stay namespaced under the markdown surface. + +The `edit` command itself lives in notion-md (the package stays self-contained +and `notion-md edit` works alone); notion-cli only re-exposes it. + +## Status + +accepted + +## Consequences + +- notion-cli docs (`docs/glossary.md`, `requirements.md`, `spec.md`) record the + dispatched `cat`/`put`/`edit` surface and the `notion edit` alias. +- `<page>` accepts a page id or full Notion URL everywhere, resolved through + `parseNotionUuid` from `@overeng/notion-core`. 
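To make "a page id or full Notion URL" concrete, here is a rough sketch of how such an argument might be normalized. It is explicitly not the real `parseNotionUuid` from `@overeng/notion-core`, whose behavior is not shown in this record. `extractPageId` is a hypothetical helper, and the assumption that the id is the trailing 32 hex characters of the URL path is ours.

```typescript
// Hypothetical helper for illustration; NOT the actual `parseNotionUuid`.
const toDashedUuid = (hex: string): string =>
  `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20)}`

const extractPageId = (input: string): string | undefined => {
  let candidate = input.trim()
  try {
    // For a full URL only the path is considered; query and hash are ignored.
    candidate = new URL(candidate).pathname
  } catch {
    // Not a URL: treat the input as a bare id (dashed or compact).
  }

  // Drop dashes and trailing slashes, then take the trailing 32 hex characters.
  const compact = candidate.replace(/\/+$/, '').replace(/-/g, '')
  const match = compact.match(/[0-9a-f]{32}$/i)
  return match === null ? undefined : toDashedUuid(match[0].toLowerCase())
}
```

Returning `undefined` rather than throwing leaves the choice of error message and exit code to the calling command.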
diff --git a/packages/@overeng/notion-md/docs/vrs/.decisions/0006-writable-projection-guard.md b/packages/@overeng/notion-md/docs/vrs/.decisions/0006-writable-projection-guard.md new file mode 100644 index 000000000..967b8a49e --- /dev/null +++ b/packages/@overeng/notion-md/docs/vrs/.decisions/0006-writable-projection-guard.md @@ -0,0 +1,51 @@ +# The guard hash and `put` write unit are the writable projection, not the literal rendered document + +The base hash that guards `put` is computed over a canonical serialization of +the **writable projection** of a page, identical across both streaming modes: + +- title, +- writable metadata (icon, external cover, `in_trash`, `is_locked`), +- writable properties, +- body. + +Read-only / computed fields — formula/rollup values, `created_time`, +`last_edited_by`, `unique_id`, expiring cover URLs — are **excluded** from the +hash and **ignored on write** (with a stderr note if the user changed one). The +projection body must be **URL-canonicalized** (decision 0007). + +## Why + +Two reasons, and live validation (experiments.md) corrected which one matters: + +1. **Semantic:** the guard should cover exactly what `put` can write. A + concurrent change to a field the user cannot write (a rollup, a computed + value) is not a conflict for the user's intended write, so it must not + manufacture one. +2. **Idempotence — and this is the operative one:** the volatility that breaks + `cat`→`put` is **not** in the metadata. Live testing showed + `last_edited_time` is minute-rounded and only advances on a real edit, so two + no-op pulls never differ on it. The real per-pull volatility is the + **hosted-media signed URL inside the body** — a writable projection that + embeds the body _verbatim_ is non-idempotent for media pages, while a + URL-canonicalized body is stable. 
So the projection's body must be + URL-canonicalized (decision 0007); metadata exclusion is correct but + secondary, and is justified by computed/formula/rollup values (which _can_ + change underneath a no-op pull), not by `last_edited_time`. + +One guard rule serves both modes (default mode is the projection where title is +the only non-body writable field). Property **schema drift** since the pull is +still refused separately (R14); the projection guard covers value/body drift, +not schema changes. + +## Status + +accepted (refines decision 0002; idempotence rationale corrected by live +validation — see experiments.md) + +## Consequences + +- The canonical writable-projection serialization must be deterministic and + stable across pulls: defined field order, volatile/computed fields omitted, + and the body URL-canonicalized (decision 0007). +- Streaming scope (including the object-store-overflow boundary) is governed by + decision 0008. diff --git a/packages/@overeng/notion-md/docs/vrs/.decisions/0007-canonicalize-hosted-media-urls.md b/packages/@overeng/notion-md/docs/vrs/.decisions/0007-canonicalize-hosted-media-urls.md new file mode 100644 index 000000000..b5792d9b9 --- /dev/null +++ b/packages/@overeng/notion-md/docs/vrs/.decisions/0007-canonicalize-hosted-media-urls.md @@ -0,0 +1,42 @@ +# Hosted-media URLs are canonicalized (signature stripped) everywhere bodies are hashed, diffed, or gated + +Notion-hosted media blocks (image/file/video/pdf with `type: "file"`) render as +`![caption](<signed-url>)`, and the signed S3 URL (`X-Amz-Signature/Credential/ +Expires/Security-Token…`) **rotates on every pull**. External-URL media is +stable. 
Live testing (experiments.md, items 3/6c/7) showed this single fact +breaks the streaming surface three ways: + +- raw body hash differs between two no-op pulls → `cat`→`put` non-idempotent and + the base hash goes stale within the URL TTL with zero edits; +- a stored base hash is unusable for media pages; +- `update_content` pushes on a media page are **rejected** by the gateway's + post-push gate, because `semanticEquivalent` does whitespace-only + normalization and the re-observed rotated URL ≠ the expected body. + +Decision: **canonicalize hosted-media URLs** — strip the volatile +`X-Amz-*` / signature / `Expires` query params, keep `origin + pathname` — at +every point a body is hashed, diffed, base-tracked, or gated, **including inside +`semanticEquivalent`**. External (stable) URLs are left untouched. This is the +chosen fix over an opaque `notion-file:<id>` placeholder reference: it is +simpler, keeps Markdown image syntax, was validated to make the body hash stable +(`b02e7f27` across both pulls), and reuses the existing renderer/diff path. + +## Status + +accepted + +## Consequences + +- The renderer (or a post-render canonicalization step) must emit the + signature-stripped URL for hosted media so `cat` output is deterministic. +- `canonical-markdown.ts` `canonicalizeBlockMarkdown` / `semanticEquivalent` + must URL-canonicalize, not just normalize whitespace — otherwise every + `update_content` push on a media page fails closed (item 6c). +- The canonicalized URL is stable but not directly fetchable; that is accepted + for an editing surface (the user edits text, not media URLs). Canonicalization + is for **hashing / diffing / gating** only. Whether a canonicalized URL + survives a full `replace_content` round-trip is an implementation-verification + item (the live file stays authoritative on the remote); it does not change the + hashing contract. 
+- The hashed, human-visible body never carries volatile data — the same + no-volatile-data-in-the-hash principle the design applies throughout. diff --git a/packages/@overeng/notion-md/docs/vrs/.decisions/0008-streaming-scope-boundary.md b/packages/@overeng/notion-md/docs/vrs/.decisions/0008-streaming-scope-boundary.md new file mode 100644 index 000000000..327c6f6e4 --- /dev/null +++ b/packages/@overeng/notion-md/docs/vrs/.decisions/0008-streaming-scope-boundary.md @@ -0,0 +1,36 @@ +# Streaming surface scope: body + writable projection only; other surfaces stay file-based + +> **Refined by [0017](./0017-edit-is-an-ephemeral-file-engine-session.md):** this +> scope boundary now applies to the **stateless pipes `cat`/`put`** only; `edit` +> is an ephemeral file-engine session that widens reach on representable pages. + +The streaming surface (`cat`/`put`/`edit`) is stateless — no `.notion-md/`, no +sidecar, no object store. It therefore operates on exactly the **writable +projection** (decision 0006): body + title + writable metadata + writable +properties. Everything else stays on the file-based path (`sync`/`status`/ +`plan`): + +- **File bytes / object store:** no download or upload; hosted files stay + remote-authoritative and are referenced by canonicalized URL (decision 0007). + (Originally this also covered an inline placeholder for non-representable media; + that placeholder approach was abandoned for refuse-lossy — decision 0016.) +- **Comments, data-source schema, unsupported-block payload snapshots, + base-snapshot three-way merge, tree / child-page operations:** file-only. + +This is a **scope boundary, not a refusal**: those surfaces simply are not +represented in the stream, so editing a page's body/title through streaming +leaves them untouched on the remote. A user who needs to edit one of those +surfaces uses the file-based path. 
+ +## Status + +accepted + +## Consequences + +- Streaming never triggers object-store overflow (it carries no storage + payload), so there is no overflow case to refuse — the earlier "overflow out + of scope" note in decisions 0006/0007 is subsumed here. +- The file-based and streaming surfaces are complementary views of the same + page, not competing ones; guidance should point users to `sync` when they need + a surface outside the projection. diff --git a/packages/@overeng/notion-md/docs/vrs/.decisions/0009-force-is-concurrency-only.md b/packages/@overeng/notion-md/docs/vrs/.decisions/0009-force-is-concurrency-only.md new file mode 100644 index 000000000..6ce03a58f --- /dev/null +++ b/packages/@overeng/notion-md/docs/vrs/.decisions/0009-force-is-concurrency-only.md @@ -0,0 +1,29 @@ +# `--force` bypasses only the concurrency guard, not correctness guards + +> **Refined by [0017](./0017-edit-is-an-ephemeral-file-engine-session.md):** the +> exit-6 schema-drift guard is now the engine's `schema_snapshot` comparison; the +> principle — `--force` is concurrency-only — stands. + +`put --force` (and `edit --force`) bypasses **only** the exit-7 base-hash guard +(`NotionMdBodyConflictError`) — the last-writer-wins concurrency escape. It does +**not** override the exit-3 lossy refusal or the exit-6 schema-drift refusal. + +Those are correctness guards, not concurrency: pushing a lossy body can delete +content the user never saw (R12), and writing properties against a drifted +schema can corrupt typed data (R14). Neither is about "someone edited +concurrently," so a concurrency override must not silently disable them. Per R15 +a destructive mode must report exactly what it bypasses, and `--force` reports +only the guard. + +## Status + +accepted + +## Consequences + +- Bypassing lossy/schema-drift requires their own explicit modes, not `--force`. 
+- Under refuse-lossy (decision 0016) there is no inline placeholder to delete; + the historical "deleting a visible placeholder is normal editing" carve-out is + moot, so this is unrelated to `--force`. +- Keeps the streaming surface aligned with the vision's "not a last-writer-wins + backup tool" while still offering a deliberate single-purpose override. diff --git a/packages/@overeng/notion-md/docs/vrs/.decisions/0012-non-atomic-title-body-write-order.md b/packages/@overeng/notion-md/docs/vrs/.decisions/0012-non-atomic-title-body-write-order.md new file mode 100644 index 000000000..004d9a1bc --- /dev/null +++ b/packages/@overeng/notion-md/docs/vrs/.decisions/0012-non-atomic-title-body-write-order.md @@ -0,0 +1,45 @@ +# A default-mode `put` is two non-atomic writes: body first, title last, partial failure reported + +> **Scoped by [0017](./0017-edit-is-an-ephemeral-file-engine-session.md) to the +> stateless `put`.** `edit` is an ephemeral file-engine session and inherits the +> engine's settle-and-re-pull, so the exit-10 partial-write model below applies to +> `put` (body + title, two writes), not to `edit`. + +A default-mode `put` performs **two** remote writes — the body +(`replaceRemoteBodyVerified` → `replace_content`) and the title (typed page API). +Notion has no transaction across them, so either can fail independently. + +Decision: + +- **Order:** body first, title last. The title write is cheap and idempotent, so + doing it last narrows the partial-failure window and a retry re-applies it + harmlessly. +- **Partial failure:** if one write lands and the other fails, report exactly + which landed, fail with **exit 10** (`NmdPartialWriteError`), and state the page + is in a mixed state with a stale base hash. Never silent exit 0. +- **Precedence:** a known partial write (exit 10) **dominates** the post-push + semantic-equivalence gate (exit 9) — once one write is known to have landed and + the other failed, report exit 10 and skip the gate. 
+- **Recovery:** re-`cat` (the page is authoritative), then re-edit. + +`--frontmatter` mode can additionally write writable properties; the same +ordering principle applies (body, then page-level title/metadata/properties), +and a mid-sequence failure is reported the same way. + +## Why + +A body replace and a typed title write are genuinely separate API calls with no +shared transaction. Modelling `put` as an ordered pair with body-first ordering +and a clear exit-10-dominates-exit-9 rule is the honest, recoverable model. + +## Status + +accepted + +## Consequences + +- Exit 10 (`NmdPartialWriteError`) carries which write landed; exit 9 only fires + when both writes completed but the result is not semantically equivalent. +- The body is replaced in a single `replace_content` call (decision 0016), so + there is no intra-body op sequence to partially apply — the only partial state + is body-applied / title-not (or the reverse). diff --git a/packages/@overeng/notion-md/docs/vrs/.decisions/0016-refuse-lossy-pages.md b/packages/@overeng/notion-md/docs/vrs/.decisions/0016-refuse-lossy-pages.md new file mode 100644 index 000000000..4515c627d --- /dev/null +++ b/packages/@overeng/notion-md/docs/vrs/.decisions/0016-refuse-lossy-pages.md @@ -0,0 +1,78 @@ +# The editor refuses pages with lossy blocks instead of reconciling them + +> **Broadened by [0017](./0017-edit-is-an-ephemeral-file-engine-session.md)** to +> uniform refusal across `cat`/`put`/`edit` and the file-based `sync`. Read +> "streaming editor" below as "the editor surface." + +The editor (`cat`/`put`/`edit`) targets the **representable-Markdown majority**. A page whose body contains any **not-losslessly-representable block** +— `child_database`, `synced_block`, `table_of_contents`, `child_page`, the API +`unsupported` type, and similar — is **refused** (exit 3, +`NmdRemoteBodyLossyError`) at read time, with a clear message pointing the user +to the Notion UI or the file-based `.nmd` sync path. 
The editor never presents +or pushes a body it cannot round-trip. + +## Why + +Live experimentation mapped a hard platform ceiling that bars the parts of the +"edit everything as Markdown" ideal which motivated the heavy reconciler: + +- **No backlink / inbound-reference endpoint.** Repositioning a `synced_block` + original or any referenced block must mint a new id, silently breaking inbound + references the API cannot enumerate. +- **`child_database` is uncreatable via the block API**, so recreate-move is + impossible for it by platform limitation. +- **The Markdown endpoint is lossy and non-injective** (a callout and a quote + both render `> 💡 hello`), so a reconstruct-from-Markdown push corrupts. + +Given that ceiling, the elegant design serves the representable majority cleanly +and refuses the rest **honestly** rather than building a reconciler / converter / +recreate-move edifice that is partly impossible and fragile where possible. + +Crucially, **refuse-lossy is the pre-existing posture, not a new invention.** The +file-based path already refuses lossy observations for clean-base adoption +(`assertSnapshotComplete`) and already pushes representable bodies through the +Markdown endpoint with live E2E coverage. Abandoning the reconciler returns the +streaming path to the posture the file-based path always had; the +renderer-symmetric converter was only ever needed for client-side block +reconstruction inside the reconciler, which is now gone — so the +representable-body push path needs no new verification. + +This honors the user's explicit choice ("refuse lossy pages, clean ideal") and is +**honest scope, not an MVP cop-out**: it is the smallest, most elegant design the +platform actually permits. + +## Status + +accepted + +## Consequences + +- **`put` is a guarded body replace** (`replaceRemoteBodyVerified` → + `replace_content`) plus a typed title / property write — **two writes, not a + block-op sequence** (decision 0012). 
Because the body contains no opaque blocks, + `replace_content` can never destroy one. +- **Supersedes the reconciler edifice** — inline id-carrying placeholders, + visible-placeholder deletion as normal editing, block-level reconciliation by + id, reconciliation as the universal push engine, and the renderer-symmetric + Markdown↔block converter are all abandoned (their decision records are removed; + rationale in [experiments.md](../experiments.md)). No `<notion-block id>` token, + no recreate-move. +- **Exit 3 flips from a rare fallback to the defining refusal.** It names the + refused block classes and points to the Notion UI / file-based sync. The + reconciler-only **exit 11** (opaque-move-unsupported) is deleted. +- **R38 survives, repurposed.** The body-fidelity classifier must still flag + every not-losslessly-representable block — but now to drive the **refusal + gate** (refuse the page up front) rather than to id-anchor a placeholder. This + is a correctness prerequisite: today `child_database`/`toc` classify + `complete`, so without R38 a `replace_content` would silently destroy them. +- **Refusing all synced-original / inbound-referenced moves is subsumed**: such + blocks are simply refused with the page. +- **Hosted media stays in scope** (representable; only its URL is volatile — + decision 0007). Media pages are **not** refused. Whether a canonicalized + (non-fetchable) hosted-media URL survives a full `replace_content` round-trip + is an implementation-verification item, not a spec refusal. +- The file-based path keeps its existing **Markdown** three-way merge and + guarded `replace_content`; it is not rewritten onto a block engine. +- Decision 0007 (media canonicalization) is unaffected; 0006 (writable-projection + guard) and 0008 (scope boundary) stand. (The stateless schema fingerprint this + record once called "unaffected" was later superseded by 0017.) 
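The refusal gate this record describes could look roughly like the following. This is a minimal sketch under assumptions, not the shipped classifier: `Block`, `LOSSY_BLOCK_TYPES`, and `gateBody` are hypothetical names. The block-type set lists only the classes this record names; the record says "and similar", so the real set is presumably larger.

```typescript
// Hypothetical sketch of a refuse-lossy gate; names and shapes are illustrative.
type Block = { readonly type: string; readonly children?: ReadonlyArray<Block> }

// Only the classes named in decision 0016; assumed to be incomplete.
const LOSSY_BLOCK_TYPES: ReadonlySet<string> = new Set([
  'child_database',
  'synced_block',
  'table_of_contents',
  'child_page',
  'unsupported',
])

type GateResult =
  | { readonly kind: 'accept' }
  | { readonly kind: 'refuse'; readonly exitCode: 3; readonly lossyTypes: ReadonlyArray<string> }

const gateBody = (blocks: ReadonlyArray<Block>): GateResult => {
  const found = new Set<string>()
  const visit = (block: Block): void => {
    if (LOSSY_BLOCK_TYPES.has(block.type)) found.add(block.type)
    // Nested blocks count too: one lossy descendant makes the whole page lossy.
    for (const child of block.children ?? []) visit(child)
  }
  for (const block of blocks) visit(block)

  return found.size === 0
    ? { kind: 'accept' }
    : { kind: 'refuse', exitCode: 3, lossyTypes: Array.from(found).sort() }
}
```

Collecting every offending type before refusing, instead of stopping at the first, is what would let the exit-3 message name all refused block classes at once.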
diff --git a/packages/@overeng/notion-md/docs/vrs/.decisions/0017-edit-is-an-ephemeral-file-engine-session.md b/packages/@overeng/notion-md/docs/vrs/.decisions/0017-edit-is-an-ephemeral-file-engine-session.md new file mode 100644 index 000000000..230118bdd --- /dev/null +++ b/packages/@overeng/notion-md/docs/vrs/.decisions/0017-edit-is-an-ephemeral-file-engine-session.md @@ -0,0 +1,78 @@ +# `edit` is an ephemeral file-engine session; `cat`/`put` are the only stateless pipes + +`notion edit` is **not** a second push engine. It is sugar over the existing +file-based `sync` engine: pull the page into an ephemeral `.nmd` + `.notion-md/` +under `$TMPDIR`, present the body (default mode: `# title` + body) in `$EDITOR`, +splice the edit back into the envelope, push through the existing guarded +`syncPage`, then delete the temp dir. Only `cat` and `put` stay gateway-only +(stateless stdin/stdout body pipes). + +This collapses `edit` from a bespoke streaming push path into a thin wrapper over +`pullPage` + body splice + `syncPage` + temp-dir cleanup, and lets it inherit the +mature engine's guarded push, 3-way Markdown merge, settle-and-re-pull, and +out-of-band preservation — none of which needs reimplementing. + +## Why + +The file engine is fully location-relative (state paths derive from the `.nmd` +path argument; no `process.cwd()`), and the live test suite already drives it +inside a `mkdtemp` dir, so an ephemeral `$TMPDIR` session is a supported +configuration, not new surface. The engine detects both remote-body conflict and +schema drift from the **base snapshot** it captures at pull — exactly what a +pull-at-start / push-at-end session has — so `edit` needs no separate guard +machinery. + +## Consequences + +- **Refuse-lossy is uniform across `cat`/`put`/`edit` (and `sync`).** The engine + refuses a lossy body at the **pull** gate (`assertRemoteMarkdownComplete`), and + `edit` materializes through that same `pullPage`. 
So `edit` refuses the same + opaque-block pages `cat`/`put` refuse — it does **not** preserve/edit them (the + earlier "edit handles lossy pages" framing was a latent-bug artifact, see R38). + Decision 0016's refusal is therefore a property of the **shared core**, not a + streaming-only carve-out. `edit` gains reach over `cat`/`put` only on + _representable_ pages (object store, 3-way merge, `unsupported_blocks` + preservation of resolvable blocks). +- **R38 is a blocking prerequisite and a real file-path bug fix.** Today the + classifier flags only API-`unsupported` blocks, so `child_database` / + `table_of_contents` / `synced_block` mis-classify `complete` and a + `replace_content` push silently destroys them — in the existing file path, not + just streaming. The uniform refusal above is only sound once R38 lands. +- **The stateless in-buffer schema-drift fingerprint is superseded and deleted** + (removes R42; impl-delta Group F is repurposed from the fingerprint to a + small engine `schema_snapshot` addition). It existed solely because a stateless + pipe has no base snapshot. `edit --frontmatter` runs over the engine's base + snapshot, so drift is detected by snapshot comparison (a `schema_snapshot` + sidecar role the file engine must capture at pull and compare at push — a small + engine addition, strictly simpler than a parallel in-buffer fingerprint + re-derived by an independent implementation). +- **`cat`/`put` are body-only pipes; structured property editing moves to `edit`.** + `cat --frontmatter` (read) survives — a stateless envelope dump is safe and + useful in pipes. **Stateless `put --frontmatter` (property _write_) is dropped**: + a safe property write needs drift detection, which needs a base snapshot, which + means `edit --frontmatter` (interactive) or the file-based `sync` (scripted). + This is the one capability cut — non-interactive property writes use `sync`. 
+- **The bespoke two-write partial model (0012) applies only to the stateless + pipe.** `edit` inherits the engine's settle-and-re-pull, so the exit-10 + partial-write model is not part of `edit`; `put` (body + title, two writes) + keeps exit 10. **Schema drift is surfaced as exit 6**, redefined from the + deleted stateless fingerprint to the engine's `schema_snapshot` comparison for + `edit --frontmatter` / `sync` (R14) — distinct from the exit-7 conflict so it is + not `--force`-able. +- **`edit` forces a full-body `replace_content`.** Post-R38 every page `edit` + accepts is representable (opaque pages refused at the pull), so a full replace is + safe and closes the engine's targeted-`update_content` silent-partial-apply + window for the single-session case. +- **Temp-dir lifecycle and conflict relocation are the new edges.** `edit` must + scope-clean the `$TMPDIR` dir on success / conflict / editor-abort / interrupt, + and copy any `.conflict.roughdraft.md` out of `$TMPDIR` (which is reaped) to a + durable sibling so a conflicted edit is recoverable. +- **Statelessness is preserved where it is intrinsic (the pipe), incidental + elsewhere.** `cat`/`put` write nothing; `edit` writes a `$TMPDIR` temp tree + (never the cwd) — consistent with decision 0003 / T07, which already conceded + `edit` is a temp-file session, not an in-memory buffer. 
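The session shape and temp-dir lifecycle above can be sketched as a single function with injected dependencies. This is an illustration under assumptions: `SessionDeps` and `runEditSession` are hypothetical, and the real `pullPage` / `syncPage` signatures are not shown in this record, so the shapes below are guesses. The sketch is synchronous for brevity, and it omits copying a conflict draft to a durable location before cleanup.

```typescript
// Hypothetical sketch of the ephemeral edit session; dependency shapes are assumptions.
type SessionDeps = {
  readonly makeTempDir: () => string
  readonly removeDir: (dir: string) => void
  readonly pull: (pageId: string, dir: string) => string // returns the editable body
  readonly openEditor: (body: string) => string // returns the edited body; may throw on abort
  readonly push: (pageId: string, dir: string, editedBody: string) => void // may throw on conflict
}

const runEditSession = (deps: SessionDeps, pageId: string): void => {
  const dir = deps.makeTempDir()
  try {
    const original = deps.pull(pageId, dir)
    const edited = deps.openEditor(original)
    deps.push(pageId, dir, edited)
  } finally {
    // Runs on success, on a push conflict, and on an editor abort alike.
    deps.removeDir(dir)
  }
}
```

Putting cleanup in `finally` keeps the success, conflict, and editor-abort paths on one code path. Interrupt handling (signals) would need separate treatment that this sketch does not attempt.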
+ +## Status + +accepted (supersedes the stateless in-buffer schema fingerprint; redefines 0003, +0008, 0012; broadens 0016 to uniform) diff --git a/packages/@overeng/notion-md/docs/vrs/.decisions/0018-staged-task-list-sync-progress.md b/packages/@overeng/notion-md/docs/vrs/.decisions/0018-staged-task-list-sync-progress.md new file mode 100644 index 000000000..7a657cb2d --- /dev/null +++ b/packages/@overeng/notion-md/docs/vrs/.decisions/0018-staged-task-list-sync-progress.md @@ -0,0 +1,82 @@ +# Sync progress is a staged `TaskList`, not a fake percentage + +When a write-path command (`edit` save, `put`, file `sync`) runs, the sync makes +several remote round-trips with no UI in between, so it reads as a hang. The +remedy is a **discrete-stage progress indicator** rendered as +`@overeng/tui-react`'s `TaskList` — one checklist row per named sync stage +(observe → write-body → write-title → settle), each row carrying a status +(`pending` / `active` spinner / `success` / `error` / `skipped`) plus an +`X/total · elapsed` summary — driven by stage events the engine emits to a +`ProgressReporter` service. + +It is **not** a smooth `%` / data progress bar. + +## Why + +There is no per-block progress data to drive a percentage. The body write +(`replace_content`) is one opaque server-side call, and the block-tree pull +discovers children by a recursive crawl whose total is unknown until it finishes. +A `%` bar would have to invent a denominator — a fake, jittery number that erodes +trust the moment a stage stalls. Discrete named stages are the honest shape: the +user sees _which_ step is running and that it is still moving (the spinner), which +is exactly the "is it hung?" question being answered. + +The four near-identical remote pulls the engine already performs (status pull, +pre-push re-read, post-write re-observe, …) read as one indistinguishable hang +unless they are labelled. 
Emitting **purpose-tagged** stage events lets the same +mechanical pull surface as a distinct human stage ("observe", "settle", …) +without the engine knowing anything about rendering. The complementary perf lever +— collapsing those redundant pulls — is tracked separately (#788); this decision +makes the existing passes legible, it does not change how many there are. + +## Mechanism + +- A `ProgressReporter` service (`Context.Tag`) with a **no-op default `Layer`**. + The engine emits purpose-tagged stage events to it; in every non-interactive + context the events fall on the floor with zero rendering cost and no behavior + change. The CLI provides a `TaskList`-backed `Layer` only on the write path. +- Rendered through the TUI render seam to **stderr**, gated on + `process.stderr.isTTY`. `cat`'s **stdout stays pure Markdown** (decision 0002) + so `notion-md cat … | put` and `… | put > file` keep working; when stderr is not + a TTY the reporter degrades to the no-op (or a terse static line), never animated + control sequences in a pipe or log. +- The CLI `TaskList` app is constructed **lazily inside the command handler**, not + at module top level. notion-md has no runtime TUI import today; a top-level + `createTuiApp(...)` would re-enter the same concurrent-module-load TDZ that + crashed the umbrella in #787 (`createTuiApp` reached while + `@overeng/tui-react` was mid-initialization). A lazy, memoized accessor inside + the handler keeps the TUI graph out of import-time evaluation. +- **Scope: the write path only** (`edit` save, `put`, file `sync`). `cat` is a + read pipe and is excluded — it must stay a clean stdout producer with nothing on + stderr but its existing `base-hash:` line. + +## Consequences + +- The engine depends on `ProgressReporter` but stays render-agnostic: it emits + tagged stage transitions, the Layer decides whether/how to render. Pure planning + code is unaffected. 
+- Because the default Layer is a no-op, fake-gateway and live E2E suites that drive + the engine without a TUI need no change; only the CLI write path wires the + `TaskList` Layer. +- The stage vocabulary (observe / write-body / write-title / settle) is a CLI-facing + presentation contract, distinct from the OTEL span names (which stay on the + `notion_md.*` namespace). A stage may map to several spans or none. +- `put`'s two non-atomic writes (body, then title — decision 0012) surface as two + rows, so a partial write (exit 10) is visibly the title row failing after the + body row succeeded, rather than an opaque failure. + +## Implementation state + +The `ProgressReporter` `Context.Tag` seam and the engine stage emits are +implemented; `edit`'s write path ships the **static-line rung** (sequential +stderr lines via `ProgressReporterStderrLines`), deliberately _not_ the animated +`TaskList` — `edit` returns from a full-screen editor that owned the TTY, so a +mounting TUI would fight the terminal, and lines sidestep the #787 module-load +TDZ entirely. The animated `TaskList` Layer is the same Tag's later drop-in (zero +engine re-touch). `put` and file-`sync` emit through the engine but do not yet +wire a render Layer (they stay no-op until wired). + +## Status + +accepted (composes with 0002 stderr/stdout split, 0012 two-write order, 0017 +engine reuse; complementary to redundant-pull collapse #788) diff --git a/packages/@overeng/notion-md/docs/vrs/.decisions/0019-one-canonical-body-at-both-wire-boundaries.md b/packages/@overeng/notion-md/docs/vrs/.decisions/0019-one-canonical-body-at-both-wire-boundaries.md new file mode 100644 index 000000000..b41c0fbd7 --- /dev/null +++ b/packages/@overeng/notion-md/docs/vrs/.decisions/0019-one-canonical-body-at-both-wire-boundaries.md @@ -0,0 +1,74 @@ +# Canonical body is one function applied at BOTH wire boundaries + +The body has **one renderer** (`treeToMarkdown`) and **one canonicalizer** +(`canonicalizeBlockMarkdown`). 
They are wired so that **both** wire boundaries — +pull receive and push send — route the body through the single canonicalizer, so +the body a consumer reads (`cat` / `edit` / file sync / baseline), the body +hashed/compared, and the body pushed are the _same canonical bytes_. + +- The **renderer emits parseable-not-canonical Markdown**: it joins sibling + blocks (including consecutive list items) with `\n\n` so they stay distinct + after a parse. It carries **no spacing/tightness policy** and its joins must + **not** be made block-type-aware — that would re-split spacing policy across + two serializers and reintroduce the divergence this decision removes. +- **Spacing / list-tightness policy lives only in the canonical layer.** + `canonicalizeBlockMarkdown` is: line-ending normalize → media-URL canonicalize + → remark parse + GFM → `unwrapSoftBreaks` → `forceTightLists` (`spread = false` + on every list / list item) → remark-stringify → single trailing `\n`. +- The canonical function lives in `@overeng/notion-effect-client`, beside the + renderer and the media-URL canonicalizer it calls. `observeFromSnapshots` + canonicalizes the rendered body **once, at the source**, before it flows into + the inventory, the fidelity classifier, and the evidence fingerprint, so all of + them agree by construction. `semanticEquivalent` (the push integrity gate) + stays in `@overeng/notion-md` — it is sync _policy_, not the wire form. + +## Why + +Before this, pull emitted the raw renderer output (always-loose lists, headings +correct) while push canonicalized through remark (tightness follows input). The +two serializers never reconciled — a two-oracle divergence. It surfaced as the +loose-bullet-list line-break bug and a stray indented blank line inside nested +lists, and was only _masked_ (not caught) by `semanticEquivalent`'s +whitespace-collapse. 
Routing both boundaries through one function makes pull and +push agree by construction: the line-break bug cannot recur because exactly one +place decides spacing and both boundaries read from it. + +## Consequences + +- **`semanticEquivalent` is unaffected.** It canonicalizes both sides then + collapses whitespace outside fenced code; loose-vs-tight differs only in + inter-item blank lines (whitespace, non-code), so the gate is invariant across + this change — it neither newly-fails nor newly-passes anything. +- **`update_content` is not a third un-canonical push.** `planMarkdownUpdate` + diffs an already-canonical base/remote (both from pull) against a possibly-raw + desired buffer and emits `oldStr`/`newStr` as **raw substrings Notion matches + verbatim**. `desired` must **not** be canonicalized; the + `remote.replace(oldStr, newStr) === desired` guard plus the canonicalizing + `replace_content` fallback and the `semanticEquivalent` post-push gate cover + correctness. The merge / 3-way normalization stays line-level. +- **The title↔H1 frame consumes the canonical body verbatim.** `editor-surface` + frames an already-canonical body and must **not** re-canonicalize the + title-framed buffer — the `# <title>` line is presentation, not body Markdown, + and re-parsing it would break the load-bearing H1 round-trip. +- **The vestigial client `markdownToBlocks` converter is deleted.** Under + refuse-lossy / one-engine (decision 0016), push sends a raw Markdown + string to Notion's server-side `/markdown` endpoint; the client-side converter + was on no path. +- **Body bytes change → body hashes change, once.** Any already-synced page with + a list re-canonicalizes loose → tight on the next pull; the recorded base hash + goes stale and shows as a benign one-time body change. 
For list-tightness the + through-Notion fixpoint converges; the known **Case B** residual + (paragraph-after-list, #756) is non-idempotent on Notion's server reparse and + stays _masked_ by `semanticEquivalent` — pre-existing, not changed by this + consolidation. + +## Deliberate non-changes + +- **`normalizeMarkdownLineEndings` stays** as the line-level sub-step (on-disk / + title-frame / hash-prep). It is a step _of_ the canonical function, not a + sibling, and is intentionally separate from block-level canonicalization. +- **`normalizeComparableMarkdown` / `normalizeLines` stay in `notion-core`.** + Core is pure (zero deps, no Markdown parser); it cannot import the canonical + function — and Option 2 puts that function in `notion-effect-client`, which + _depends on_ core, so the constraint is permanent. The per-line `trimEnd` used + by the fidelity suffix compare is a legitimately separate pure concern. diff --git a/packages/@overeng/notion-md/docs/vrs/.decisions/0020-no-live-watch-edit-stays-single-shot.md b/packages/@overeng/notion-md/docs/vrs/.decisions/0020-no-live-watch-edit-stays-single-shot.md new file mode 100644 index 000000000..276ebae08 --- /dev/null +++ b/packages/@overeng/notion-md/docs/vrs/.decisions/0020-no-live-watch-edit-stays-single-shot.md @@ -0,0 +1,58 @@ +# `edit` stays a single-shot session — no live watch / bidirectional sync + +A live, two-way watch mode was investigated — push each `:w` save to Notion as +you type (local→remote), and reflect upstream Notion edits into the open editor +buffer (remote→local) — and **rejected**. `edit` remains the single-shot +`$EDITOR` session of decision [0003](./0003-edit-is-a-session-not-live-sync.md) / +[0017](./0017-edit-is-an-ephemeral-file-engine-session.md): pull once → edit → push +once on clean exit. 
The "live" feeling is instead approximated by making the +**existing** post-exit push legible (staged progress, decision +[0018](./0018-staged-task-list-sync-progress.md)) and surfacing remote drift the +guarded push already detects as a visible stderr note. + +## Why (two walls, both empirically spiked) + +The reuse story is fine — push-on-save is the existing engine in a loop, and the +guarded push already self-advances its base snapshot per write, so the +"moving base" is free. The walls are not engineering gaps; they are inherent to +driving a black-box `$EDITOR` that owns its buffer **and** the terminal: + +- **No live feedback (local→remote).** The editor is spawned with the real TTY + inherited (`editor-commands.ts` `defaultRunEditor`, `stdin/stdout/stderr: +'inherit'`) and the CLI blocks on its exit. Nothing can render to that terminal + while the editor is up without corrupting its screen. So per-save sync status + could only ever appear _after_ the editor exits — which defeats watch mode. +- **No live reflection (remote→local).** Even if the CLI detects an upstream + change (cheap via `last_edited_time`) and rewrites the temp file, it **cannot + make the editor reload it**. Spiked in vim/nvim: `:set autoread` alone never + reloads an idle buffer (needs `:checktime`, which only the user's own + `CursorHold` autocmd fires, clean-buffer only); the CLI has no channel to send + it because the editor owns the TTY. VS Code auto-reloads clean buffers; + emacs/nano do not by default. A remote change against an _unsaved_ buffer is a + destructive 3-way merge (vim throws a blocking `W12` modal). + +Watch mode also forfeits the clean-abort guarantee (once a save lands, content is +on Notion — `:cq` can no longer mean "nothing synced") and spams Notion version +history (each `:w` is a full-body `replace_content`). 
+ +True live bidirectional editing requires a **custom TUI editor the CLI fully +owns** (OT/CRDT, drop `$EDITOR`) — a different product that abandons the +`git commit` / `kubectl edit` covenant this tool is built on. That is out of +scope here; if it is ever wanted it is its own VRS pass, not a tweak to `edit`. + +## What we ship instead ("fix the hang + warn") + +- **Staged progress** on the post-exit push (decision 0018, R43–R45): the silent + multi-round-trip push surfaces as sequential stderr stage lines + (observe → write-body → write-title → settle), so it no longer reads as a hang. +- **Drift notes**: when the guarded push auto-merges against a moved remote, or + conflicts (exit 7, `.conflict.md`), `edit` emits a visible stderr note. This + reuses the engine's **existing** drift outcomes — no background poller, no + extra `last_edited_time` round-trip (which would swim against the + redundant-pull collapse, #788, and the single-source-of-truth consolidation, + decision [0019](./0019-one-canonical-body-at-both-wire-boundaries.md)). + +## Status + +accepted (reaffirms 0003 / 0017 with empirical spikes; composes with 0018 +progress. Supersedes nothing — it records the rejection of an alternative.) diff --git a/packages/@overeng/notion-md/docs/vrs/01-editor/requirements.md b/packages/@overeng/notion-md/docs/vrs/01-editor/requirements.md new file mode 100644 index 000000000..a4b2f0ea2 --- /dev/null +++ b/packages/@overeng/notion-md/docs/vrs/01-editor/requirements.md @@ -0,0 +1,37 @@ +# Requirements: 01-editor + +**Role.** The `$EDITOR`-based editing surface: the stateless stdin/stdout body +pipes (`cat` / `put`) that write nothing anywhere, the canonical-editor +convenience (`edit`) that is an ephemeral file-engine session, the title↔H1 +presentation boundary, the per-command guard plumbing and exit-code model, and +the write-path sync-progress indicator. 
The pipes project body + title only; the +engine's extras are reached through `edit` or the file path, not the pipes. + +Builds on the cross-cutting [../requirements.md](../requirements.md) (global +A/T) and [../glossary.md](../glossary.md). IDs are GLOBAL and preserved. The +uniform lossy-page refusal these verbs enforce is **owned by** +[04-fidelity](../04-fidelity/requirements.md) (R30/R38) — it is a property of the +shared classifier gate, not a streaming-only behavior; this subsystem cites it. +The base-snapshot guard `edit` reuses is owned by +[03-sync-engine](../03-sync-engine/requirements.md) (R09/R11). Hosted-media URL +canonicalization (R36) is owned by [04-fidelity](../04-fidelity/requirements.md); +the pipe base hash depends on it. + +## Requirements + +### Must Support Editor-Based Editing + +- **R32 Editor surfaces:** The tool must provide stateless stdin/stdout body pipes (`cat`/`put`) that write nothing anywhere, and a canonical-editor convenience (`edit`) that is an ephemeral file-engine session — it may materialize a `.nmd` + `.notion-md/` under `$TMPDIR` but must write nothing to the working directory and clean the temp tree up (decision [0017](../.decisions/0017-edit-is-an-ephemeral-file-engine-session.md)). +- **R33 Title presentation boundary:** Default mode may present the page title as a leading H1, but the title must transport through the typed page API and never as a body block ([R01](../requirements.md#must-preserve-surface-boundaries-cross-cutting)/[R04](../06-data-source/requirements.md)); a missing title line is refused, not guessed. +- **R34 Editor guard:** The stateless `put` must be guarded by default against a caller-supplied base hash (title + body), refuse on remote drift, and bypass the guard only under an explicit `--force` ([R11](../03-sync-engine/requirements.md)/[R15](../03-sync-engine/requirements.md)). 
`edit` must be guarded by the file engine's base snapshot captured at the ephemeral pull (decision [0017](../.decisions/0017-edit-is-an-ephemeral-file-engine-session.md)). +- **R35 Editor neutrality:** `edit` must work with any `$VISUAL`/`$EDITOR` and ship no editor plugin. +- **R37 Pipe scope boundary:** The stateless pipes (`cat`/`put`) must operate only on body + title and leave every other surface untouched on the remote; structured property editing and the engine's extras (object store, three-way merge, `unsupported_blocks` preservation) are reached through `edit` or the file-based path, not the pipes. Stateless property _writes_ (`put --frontmatter`) are not provided (decision [0017](../.decisions/0017-edit-is-an-ephemeral-file-engine-session.md)). +- **R39 Partial-write honesty (`put`):** The stateless `put` is two non-atomic writes (body, then title). On a partial failure it must report which write landed and fail with a distinct code (partial-write dominating the post-push gate), never silently succeed. `edit` inherits the engine's settle-and-re-pull instead. +- **R46 Read-only edit session:** `edit --read-only` must pull and present the page in `$VISUAL`/`$EDITOR` exactly like `edit`, but **never push or write anything to the remote** — any edits are discarded and the `$TMPDIR` temp tree is cleaned up — and it must say so (a stderr note that changes were not synced). It is the editor analogue of a read pipe (`vim -R` / `git show`): no base-hash/guard machinery is needed (nothing is written), and a non-zero editor exit is a clean no-op. `--read-only` composes with `--frontmatter` (inspect the full envelope read-only); `--read-only --force` is contradictory (force concerns the push) and must be rejected, not silently ignored. By default a read-only session still refuses a not-round-trip-safe page at the pull (R30/R38, same as `edit`); relaxing that for read-only — since it never pushes — is a deliberate open question, not assumed. 
+ +### Must Make Write-Path Sync Legible + +- **R43 Write-path progress visibility:** A write-path command (`edit` save, `put`, and the file-based `sync`/`put` write) must surface its multi-round-trip sync as a **discrete-stage** progress indicator — a per-stage checklist (e.g. observe → write-body → write-title → settle), each stage with a pending/active/done/failed status and an `X/total · elapsed` summary — so the operation does not read as a hang. It must not present a smooth `%`/data progress bar: there is no per-block progress data (`replace_content` is one opaque call; the block-tree pull discovers children by recursive crawl), so a percentage would be fabricated (decision [0018](../.decisions/0018-staged-task-list-sync-progress.md)). The four near-identical engine pulls must read as **distinct human stages** via purpose-tagged stage events. (`cat` is a read pipe and is excluded.) +- **R44 Pipe-safe, TTY-gated rendering:** The progress indicator must render to **stderr** and only when `process.stderr.isTTY`, so `cat`'s stdout stays pure Markdown ([R01](../requirements.md#must-preserve-surface-boundaries-cross-cutting), decision [0002](../.decisions/0002-base-hash-on-stderr.md)) and a piped/redirected write (`… | put > file`) degrades to a static line or nothing — never animated control sequences in a pipe or log (decision [0018](../.decisions/0018-staged-task-list-sync-progress.md)). +- **R45 Zero-cost and crash-neutral when non-interactive:** Progress instrumentation must be **behavior-neutral and zero-cost when non-interactive** (tests, fake/live E2E, non-TTY runs): no output, and no change to any result, error, or exit code, so a write path is byte-identical with the indicator on or off. It is a cross-cutting Effect service ([R16](../requirements.md#must-be-effect-native-cross-cutting)) so the engine stays render-agnostic. 
Enabling progress must **not reintroduce the umbrella's startup-crash class** (#787) — the indicator's wiring must not run at module load. Mechanism in [01-editor spec](./spec.md#sync-progress-indicator-write-path), decision [0018](../.decisions/0018-staged-task-list-sync-progress.md). +- **R47 Visible remote drift on the write path:** When a write push encounters remote drift the guarded engine already detects — an auto-merge of the local edit against a concurrently-changed remote, or a body conflict (exit 7, relocated `<page>.conflict.md`) — it must surface that outcome as a **visible note** (stderr), not handle it silently. This reuses the engine's existing drift detection: it must **not** add a background poller or an extra remote round-trip (e.g. a `last_edited_time` probe) for the notice, consistent with the redundant-pull collapse (#788) and the single-source-of-truth body (decision [0019](../.decisions/0019-one-canonical-body-at-both-wire-boundaries.md)). This is the surfacing half of the deliberately single-shot `edit` model — live two-way reflection of upstream changes into the open editor is **rejected** as infeasible plugin-free (decision [0020](../.decisions/0020-no-live-watch-edit-stays-single-shot.md)). diff --git a/packages/@overeng/notion-md/docs/vrs/01-editor/spec.md b/packages/@overeng/notion-md/docs/vrs/01-editor/spec.md new file mode 100644 index 000000000..685cd87d6 --- /dev/null +++ b/packages/@overeng/notion-md/docs/vrs/01-editor/spec.md @@ -0,0 +1,303 @@ +# Spec: 01-editor + +Specifies the `$EDITOR`-based editing surface — the stateless `cat`/`put` body +pipes, the ephemeral-file-engine `edit` session, the title↔H1 boundary, the +guard plumbing, the exit-code model, the umbrella surface, and the write-path +sync-progress indicator. Builds on [../requirements.md](../requirements.md) + +[./requirements.md](./requirements.md); terms in [../glossary.md](../glossary.md); +rationale in [../.decisions/](../.decisions/). 
See [../spec.md](../spec.md) for the +architecture index. + +Traces: R32–R35, R37, R39, R43–R45. The uniform lossy-page refusal (exit 3) these +verbs enforce is owned by [04-fidelity](../04-fidelity/spec.md) (R30/R38); the +base-snapshot guard `edit` reuses is owned by +[03-sync-engine](../03-sync-engine/spec.md) (R09/R11); hosted-media +canonicalization the base hash depends on is owned by +[04-fidelity](../04-fidelity/spec.md) (R36). + +## Editor Surfaces (`cat` / `put` / `edit`) + +Requirement trace: R01, R03, R04, R11, R15. These commands let a human (or pipe) +edit a Notion page as Markdown with the canonical editor instead of a persistent +local file. They are two different shapes (decision [0017](../.decisions/0017-edit-is-an-ephemeral-file-engine-session.md)): + +- **`cat` / `put` — stateless body pipes.** Gateway-only (body facade + `observeRemoteBody` / `replaceRemoteBodyVerified`): no `.nmd` file, no + `.notion-md/` store, nothing written anywhere. Pure stdin/stdout. `cat` + additionally supports a read-only `--frontmatter` envelope dump. +- **`edit` — ephemeral file-engine session.** Sugar over the file-based `sync` + engine: pull the page into a `.nmd` + `.notion-md/` under `$TMPDIR`, present the + body in `$EDITOR`, push through `syncPage`, then delete the temp tree. Not a + second push engine. + +### Scope boundary + +The **stateless pipes (`cat`/`put`)** operate only on the body + title (decision +[0008](../.decisions/0008-streaming-scope-boundary.md)). Surfaces they do not +represent are left untouched on the remote; a user who needs them uses `edit` or +the file-based path. **`edit`**, being engine-backed, additionally reaches the +engine's extras on _representable_ pages. 
+ +| Surface | `cat`/`put` | `edit` | Notes | +| -------------------------------------------------------------- | ----------------------------- | -------------------------- | --------------------------------------------------------- | +| body, title | yes | yes | the pipe projection | +| writable properties / metadata | `cat --frontmatter` read only | yes (`edit --frontmatter`) | stateless property _write_ dropped (decision 0017) | +| file bytes / object store | no | yes (in `$TMPDIR`) | hosted media canonicalized either way (0007) | +| comments, data-source schema | no | via engine | file/engine only | +| base-snapshot three-way merge | no | yes | `edit` inherits the engine's 3-way Markdown merge | +| lossy/opaque blocks (`child_database`, `synced_block`, toc, …) | refused | refused | uniform refusal at the pull (exit 3, decisions 0016/0017) | +| tree / child-page / move / trash | no | no | file-based `sync`/tree only | + +### Representation modes + +| Mode | Selector | Shape | Available on | +| ----------- | --------------- | ----------------------------------------------------- | --------------------------------- | +| Default | (none) | `# <title>` then a blank line then the body Markdown | `cat`, `put`, `edit` | +| Frontmatter | `--frontmatter` | full `.nmd` envelope (strict JSON frontmatter + body) | `cat` (read), `edit` (read/write) | + +Default mode presents the title as a leading H1 (decision [0001](../.decisions/0001-title-as-h1-presentation.md)); the title is +transport-routed through the typed page API on write, never as a body block. +`--frontmatter` carries the writable projection (title + writable metadata + +writable properties + body). 
**Stateless `put --frontmatter` is not provided** +(decision [0017](../.decisions/0017-edit-is-an-ephemeral-file-engine-session.md)): a safe property write needs drift detection, which needs a base +snapshot — so structured property editing is `edit --frontmatter` (interactive, +engine-backed) or the file-based `sync` (scripted). `cat --frontmatter` is a +read-only envelope dump and is always safe in a pipe. The writable projection and +the `schema_snapshot` drift check are owned by [06-data-source](../06-data-source/spec.md). + +### Title boundary contract (default mode) + +| Situation | `cat` emits | `put` behavior | +| -------------------------------- | ------------------ | ----------------------------------------------------------- | +| Page has a title | `# <title>` line 1 | line 1 → typed title API; remainder → body | +| Untitled page | empty `# ` line 1 | empty title round-trips as untitled | +| Body has its own leading H1 | title H1 then body | unambiguous: line 1 is the title, the rest is body verbatim | +| Line 1 is **not** a `# ` heading | n/a | **refuse, fail-loud** — no silent title mutation (T03) | + +The body sent to Notion always has the title H1 stripped (R01). A missing title +line is refused rather than guessed because silently emptying a title is +property-level data loss. + +### Guard plumbing + +**`cat` / `put` (stateless pipes):** + +- `cat` writes the base hash to **stderr** (`base-hash: sha256:…`); **stdout is + pure Markdown** for clean piping (decision [0002](../.decisions/0002-base-hash-on-stderr.md)). +- The base hash covers the pipe's writable surface (decision [0006](../.decisions/0006-writable-projection-guard.md)): title + body, + hosted-media URLs canonicalized (decision [0007](../.decisions/0007-canonicalize-hosted-media-urls.md), [04-fidelity](../04-fidelity/spec.md)), with read-only / computed / + volatile fields excluded so the hash is stable across pulls and `cat`→`put` is + idempotent. 
+- **Canonical serialization (the base hash must be reproducible by an independent + implementation):** the title+body is serialized as the default-mode `# title` + line plus the body text, with `\n` line endings and hosted-media URLs canonicalized + (decision 0007), then hashed with `sha256` and emitted with a `sha256:` prefix. The exact byte form is + load-bearing for the cross-machine optimistic-concurrency token. +- `put` is guarded by default: it re-reads remote, recomputes the title+body + hash, and refuses with exit `7` (`NotionMdBodyConflictError`) if it differs + from `--base-hash`. +- `put` with neither `--base-hash` nor `--force` refuses with guidance. `--force` + is the explicit destructive mode (R15) and reports that it bypassed the guard. + +**`edit` (engine session):** carries **no stderr base hash**. Concurrency safety +comes from the engine's **base snapshot** captured at the ephemeral pull and +compared at `syncPage` push (the same guard the file path uses, decision [0017](../.decisions/0017-edit-is-an-ephemeral-file-engine-session.md), +[03-sync-engine](../03-sync-engine/spec.md)). This is stronger than the pipe's 2-way +hash, since the engine can auto-merge non-overlapping concurrent edits. In +`--frontmatter` mode the engine also detects data-source **schema drift** from a +`schema_snapshot` captured at pull ([06-data-source](../06-data-source/spec.md); +the earlier stateless in-buffer fingerprint is gone, superseded by decision +[0017](../.decisions/0017-edit-is-an-ephemeral-file-engine-session.md)). The writable vs +read-only/computed property split is defined by `propertyWriteClassFromType` / +`PROPERTY_WRITE_CLASSES` (`@overeng/notion-core`), the single source of truth. 
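The pipe guard above can be sketched as follows, using only Node's built-in `node:crypto`. This is an illustration under stated assumptions, not the shipped byte form: `canonicalizeMediaUrls` is a hypothetical identity stand-in for the decision-0007 canonicalizer, the exact separator between the title line and the body is assumed from the default-mode shape (`# <title>`, blank line, body), and `GuardOutcome` is an invented name for the example.

```typescript
import { createHash } from "node:crypto";

// Hypothetical stand-in: the real hosted-media canonicalizer (decision 0007)
// is not reproduced here, so this sketch passes the body through unchanged.
const canonicalizeMediaUrls = (markdown: string): string => markdown;

// Normalize CRLF and lone CR to LF so the hash is stable across platforms.
const normalizeLineEndings = (text: string): string => text.replace(/\r\n?/g, "\n");

// Assumed default-mode shape: "# <title>\n", then a blank line and the body.
// An untitled, empty-body page serializes as "# \n" only.
const serializeDefaultMode = (title: string, body: string): string => {
  const canonicalBody = canonicalizeMediaUrls(normalizeLineEndings(body));
  const titleLine = `# ${title}\n`;
  return canonicalBody.length === 0 ? titleLine : `${titleLine}\n${canonicalBody}`;
};

const computeBaseHash = (title: string, body: string): string =>
  `sha256:${createHash("sha256").update(serializeDefaultMode(title, body), "utf8").digest("hex")}`;

type GuardOutcome = "proceed" | "proceed-forced" | "conflict" | "missing-guard";

// `put` re-reads the remote, recomputes the hash, and compares it to --base-hash.
const checkPutGuard = (
  remote: { title: string; body: string },
  options: { baseHash?: string; force?: boolean },
): GuardOutcome => {
  if (options.force === true) return "proceed-forced"; // explicit destructive mode
  if (options.baseHash === undefined) return "missing-guard"; // refuse with guidance
  return computeBaseHash(remote.title, remote.body) === options.baseHash
    ? "proceed"
    : "conflict"; // maps to exit 7 in the spec
};
```

In this sketch, a caller would pass the `base-hash:` value that `cat` printed on stderr as `options.baseHash`; a `"conflict"` outcome corresponds to the exit-7 refusal.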
+ +### Exit codes and error model + +Each expected failure is a tagged error (R18) mapped to a distinct exit code for +scriptability: + +| Exit | Tagged error | When | +| ---- | --------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| 0 | — | success | +| 1 | `NmdGatewayError` | gateway/API failure (network, 5xx, rate-limit) — remote unreachable, distinct from a bad-input failure | +| 2 | (CLI framework) | bad flags/args | +| 3 | `NmdRemoteBodyLossyError` | **the defining refusal (uniform across `cat`/`put`/`edit`)** — the page contains a not-losslessly-representable block (`child_database`, `synced_block`, `table_of_contents`, `child_page`, API `unsupported`, …) and cannot be edited as Markdown; the message names the block and points to the Notion UI or the file-based `.nmd` sync path (decisions 0016/0017) | +| 4 | `NmdUnresolvablePageError` | `<page>` not a valid id/URL, or page not found | +| 5 | `NmdInvalidDocumentError` | default mode missing leading title H1; malformed `--frontmatter` envelope | +| 6 | `NmdSchemaDriftError` | `edit --frontmatter` (and file `sync`): the data-source schema changed since the pull, detected by the engine's `schema_snapshot` comparison (R14); resolve by re-pulling, not by `--force` | +| 7 | `NotionMdBodyConflictError` | guard conflict — remote moved since `--base-hash` (`put`), or the engine's base snapshot diverged and 3-way merge failed (`edit`) | +| 8 | `NmdEditorAbortedError` | `edit`: `$EDITOR` exited non-zero → no push | +| 9 | `NmdPostPushGateError` | post-push `semanticEquivalent` gate rejected the result (the page may be mutated — re-`cat`) | +| 10 | `NmdPartialWriteError` | `put` only: one of 
the two writes (body, title) landed and the other failed (decision 0012); page is in a mixed state — re-`cat` | + +The error family uses the `Nmd*` prefix; the one exception is the pre-existing +`NotionMdBodyConflictError` (exit 7), which keeps its legacy `NotionMd*` name. +Exit 3 is checked **at the pull** on all three verbs (for `edit`, at the +ephemeral file-engine pull) — none of them presents a body it cannot round-trip. +Exit 6 is **redefined**: it is no longer the deleted stateless in-buffer +fingerprint but the engine's `schema_snapshot`-based schema-drift +refusal for `edit --frontmatter` / `sync` (decision 0017, R14) — kept on its own +axis from the exit-7 value/body conflict so it is not `--force`-able. Exit 11 +(opaque-move) is **removed** (no opaque move, decision 0016). Exit 10 applies +only to the stateless `put`; `edit` inherits the engine's settle-and-re-pull. + +### Edge behavior + +- **Empty body** on `put` is a valid edit (Notion allows an empty body); it is + applied. `edit` prints a stderr note if the buffer became empty but still + pushes. +- **`put`/`edit` operate on an existing bound page only.** Page creation stays on + the file-based / tree path; an unresolvable or nonexistent page is exit 4. +- **`edit` aborts on a non-zero editor exit** (exit 8, nothing pushed). +- **Two writes, body first (decision 0012):** a `put` is body (`replace_content`) + then title (typed API). If one lands and the other fails it reports which landed + and exits 10 (which dominates the exit-9 post-push gate); never silent exit 0. + Recovery is re-`cat`. +- **Title / body H1 sharp edge:** `put` takes line 1 as the title verbatim even + when line 2 is also `# …`; a page whose first body block is a `heading_1` + therefore shows two leading `# ` lines in `cat`. This is parser-deterministic + but visually duplicated; the only protection against an accidental title wipe + is the missing-title-H1 refusal (exit 5). 
`replace_content` preserves a body + that starts with `# Heading` (verified), so the `put` transport does not eat + it; only Notion's create-from-Markdown absorbs a leading `#` (not used here). +- **Untitled / empty body:** `cat` emits exactly `# ` (hash + single space) on + line 1 for an untitled page; an untitled, empty-body page is `# \n` only. `put` + parses the line-1 remainder after `# ` as the title (empty → untitled). The + exact bytes are load-bearing for the missing-title-H1 check and idempotence. +- **`--frontmatter` mode never treats a body H1 as the title** — the title lives + in the frontmatter block; a leading `# …` in the body is ordinary content. The + title-H1 contract is default-mode only. +- **`--force` is concurrency-only** (decision [0009](../.decisions/0009-force-is-concurrency-only.md)): it bypasses the exit-7 guard + and nothing else. It does not override the exit-3 lossy refusal, which is + correctness, not concurrency. Per R15 it reports exactly that it bypassed the + guard. + +### `<page>` resolution + +`<page>` accepts a raw page id, a dashed id, or a full Notion URL, resolved +through `parseNotionUuid` from `@overeng/notion-core` (the same contract as +`sync <page-id-or-url>`). An unresolvable value fails fast with guidance (R17). + +### `edit` session + +`edit` is the canonical-editor convenience and an **ephemeral file-engine +session** (decisions [0003](../.decisions/0003-edit-is-a-session-not-live-sync.md), [0017](../.decisions/0017-edit-is-an-ephemeral-file-engine-session.md)) — sugar over the `sync` engine, not a separate +push path: + +1. `mktemp -d` a session dir under `$TMPDIR` (never the cwd) and `pullPage` the + page into `<dir>/page.nmd` + `<dir>/.notion-md/`. The pull's + `assertRemoteMarkdownComplete` gate refuses a lossy page here (exit 3), + exactly like `cat`/`put`. +2. 
Present the body to `$VISUAL` → `$EDITOR` → `vi` — default mode strips the + frontmatter and shows `# <title>` + body; `--frontmatter` shows the full + envelope. Wait for exit. +3. On a non-zero editor exit, abort (exit 8), nothing pushed. +4. On an unchanged buffer, no-op. +5. Otherwise splice the edit back into the envelope (parse line-1 title H1 into + the frontmatter title in default mode) and `syncPage` the temp `.nmd`. Because + every accepted page is representable, `edit` uses a full-body `replace_content` + (decision 0017), closing the targeted-update silent-partial-apply window. +6. On a conflict the engine writes a `.conflict.roughdraft.md`; `edit` **copies + it out of `$TMPDIR`** to a durable sibling (`<page>.conflict.md`) and prints + recovery guidance. +7. Always scope-clean the session dir (success / conflict / abort / interrupt). + +No editor plugin is shipped; `edit` works with any `$EDITOR` (decision 0003). + +### `edit --read-only` (inspection, never sync) + +`edit --read-only` (R46) is the read pipe analogue of the editor: pull and present +the page in `$VISUAL`/`$EDITOR` exactly like `edit`, but on exit **never push and +never write anything to the remote** — discard any edits, scope-clean the temp +tree, and print a stderr note that changes were not synced. Steps 1–2 above run; +steps 5–6 (splice + `syncPage` + conflict relocation) do **not**. No base-snapshot +guard is needed (nothing is written), so a non-zero editor exit is just a clean +no-op, not an abort. Because it never writes, the heavy engine push path is not +required — a read-only session may use the lighter `observeRemoteEditorPage` read +(like `cat`) into a temp file rather than a full engine pull. + +- `--read-only` composes with `--frontmatter` (inspect the full envelope + read-only). +- `--read-only --force` is contradictory (`--force` concerns the push) and is + **rejected** with a bad-usage error, not silently ignored. 
+- **Default lossy behavior:** read-only still refuses a not-round-trip-safe page + at the pull (exit 3, R30/R38), same as `edit`. Relaxing this for read-only — + since it never pushes, viewing a lossy page is harmless — is a deliberate **open + design question**, not assumed here. + +### Umbrella surface + +The commands appear as `notion-md cat|put|edit` standalone and `notion md +cat|put|edit` through the umbrella dispatch. `edit` is additionally promoted to +the top-level alias `notion edit <page>` (decision [0004](../.decisions/0004-umbrella-surfacing.md)). + +## Sync Progress Indicator (write path) + +Requirement trace: R43–R45. Decision [0018](../.decisions/0018-staged-task-list-sync-progress.md). Scope: the **write path only** — `edit` +save, `put`, and the file-based `sync`/`put` write. `cat` is a read pipe and is +**excluded** (its only stderr output stays the `base-hash:` line). + +A write-path sync makes several remote round-trips (status pull, pre-push +re-read, the `replace_content` body write, the typed title write, the post-write +re-observe/settle) with no UI in between, so it reads as a hang. The remedy is a +**discrete-stage** progress indicator, not a smooth `%` bar. + +### Why staged, not a percentage + +There is no per-block progress data to drive a percentage: `replace_content` is +one opaque server-side call, and the block-tree pull discovers children by a +recursive crawl whose total is unknown until it completes. A `%` would have to +invent a denominator — a fake number that erodes trust the moment a stage stalls. +Discrete named stages answer the real question ("is it hung, and on what?") +honestly: the user sees which step is running and that it is still moving. + +### Stages and rendering + +The indicator is `@overeng/tui-react`'s **`TaskList`** — one checklist row per +sync stage, each row a `{ id, label, status, message? 
}` with `status` in +`pending` / `active` (spinner) / `success` / `error` / `skipped`, plus an +`X/total · elapsed` summary line: + +``` +notion-md edit · 3/4 · 2.1s + ✓ observe remote body pulled + ✓ write-body replace_content + ⠋ write-title … + · settle pending +``` + +The four near-identical engine pulls must read as **distinct human stages** +(observe / write-body / write-title / settle, …) so the same mechanical +round-trip surfaces as a legible step rather than an indistinguishable repeat. +The stage vocabulary is a CLI-facing presentation contract, distinct from the +OTEL span names (which stay on `notion_md.*`); a stage may map to several spans or +none. + +### Mechanism + +- A `ProgressReporter` Effect service (`Context.Tag`) with a **no-op default + Layer**. The engine emits **purpose-tagged stage events** (stage id + label + + status transition) to it; in every non-interactive context the events fall on + the floor with zero rendering cost and no behavior change. The engine stays + render-agnostic. +- The CLI provides a `TaskList`-backed Layer **only on the write path**, rendered + through the TUI render seam to **stderr**, gated on `process.stderr.isTTY`. So + `cat`'s stdout stays pure Markdown (R01, decision 0002) and a piped/redirected + write (`… | put > file`) degrades to a static line or nothing — never animated + control sequences in a pipe or log (R44). +- The CLI `TaskList` app is constructed **lazily inside the command handler**, via + a memoized accessor — never at module top level. notion-md has no runtime TUI + import today; a top-level `createTuiApp(...)` would re-enter the same + concurrent-module-load TDZ that crashed the umbrella in #787 (`createTuiApp` + reached while `@overeng/tui-react` was mid-initialization), so the construction + is deferred to call time (R45). +- `put`'s two non-atomic writes (decision 0012) surface as two rows, so a partial + write (exit 10) is visibly the title row failing after the body row succeeded. 
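The mechanism bullets above can be sketched in plain TypeScript. This is a minimal illustration only: it uses ordinary interfaces rather than the Effect `Context.Tag` service, and `memoize`, `makeRecordingReporter`, and `selectReporter` are hypothetical names that do not come from the codebase. It also assumes a non-TTY stderr gets the no-op reporter (the "nothing" end of the degrade rule).

```typescript
// Sketch only: plain TypeScript stand-ins, not the Effect service or the real TaskList.
type StageStatus = "pending" | "active" | "success" | "error" | "skipped";

interface StageEvent {
  readonly id: string;
  readonly label: string;
  readonly status: StageStatus;
  readonly message?: string;
}

interface ProgressReporter {
  readonly emit: (event: StageEvent) => void;
}

// Default reporter: every event is dropped, so non-interactive runs pay no rendering cost.
const noopReporter: ProgressReporter = { emit: () => {} };

// Memoized accessor: `make` runs on the first call only, never at module load.
function memoize<A>(make: () => A): () => A {
  let cached: { readonly value: A } | undefined;
  return () => {
    if (cached === undefined) {
      cached = { value: make() };
    }
    return cached.value;
  };
}

// Hypothetical stand-in for the TaskList-backed reporter: it records rows instead of drawing them.
function makeRecordingReporter(rows: StageEvent[]): ProgressReporter {
  return {
    emit: (event) => {
      rows.push(event);
    },
  };
}

// Assumption: a non-TTY stderr falls back to the no-op reporter.
function selectReporter(
  stderrIsTTY: boolean,
  interactive: () => ProgressReporter,
): ProgressReporter {
  return stderrIsTTY ? interactive() : noopReporter;
}
```

In a command handler this would be wired roughly as `selectReporter(Boolean(process.stderr.isTTY), lazyReporter)`, so the interactive reporter is built only when a write-path command actually runs on a terminal.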
+ +### Complementary perf lever + +The redundant-4-pulls collapse (folding the near-identical engine pulls into +fewer round-trips) is the complementary performance lever, tracked separately in +**#788**. This indicator makes the existing passes legible; it does not change how +many there are. The two compose: fewer pulls means fewer stages, but the staged UI +stays correct either way. diff --git a/packages/@overeng/notion-md/docs/vrs/02-file-sync/requirements.md b/packages/@overeng/notion-md/docs/vrs/02-file-sync/requirements.md new file mode 100644 index 000000000..49a79cf27 --- /dev/null +++ b/packages/@overeng/notion-md/docs/vrs/02-file-sync/requirements.md @@ -0,0 +1,24 @@ +# Requirements: 02-file-sync + +**Role.** The file-based sync surface and its orchestration: the pull / status / +push flows over a persistent `.nmd` file, watch-mode lifecycle, and the +batch/tree orchestrator (target discovery, duplicate page-id preflight, bounded +concurrency, per-file results). One `.nmd` file maps to one Notion page; every +mutation passes through the shared page-local guards in +[03-sync-engine](../03-sync-engine/requirements.md). + +Builds on the cross-cutting [../requirements.md](../requirements.md) (global +A/T) and [../glossary.md](../glossary.md). IDs are GLOBAL and preserved. The +guarded-push / three-way-merge / settle engine this surface calls is owned by +[03-sync-engine](../03-sync-engine/requirements.md) (R09, R11, R13, R15); the +file/property-surface bits of R11/R13 are exercised here but defined there. + +## Requirements + +### Must coordinate file-based sync safely + +- **R20 Bounded concurrency:** Watch mode must serialize or intentionally coordinate sync passes so local writes, remote writes, and state-store updates cannot overlap unsafely. + +### Must verify watch behavior + +- **R28 Watch coverage:** Watch mode must be tested for debounce, coalescing, cancellation, overlapping events, remote polling, and shutdown. 
diff --git a/packages/@overeng/notion-md/docs/vrs/02-file-sync/spec.md b/packages/@overeng/notion-md/docs/vrs/02-file-sync/spec.md new file mode 100644 index 000000000..f7616d411 --- /dev/null +++ b/packages/@overeng/notion-md/docs/vrs/02-file-sync/spec.md @@ -0,0 +1,180 @@ +# Spec: 02-file-sync + +Specifies the file-based sync surface: the pull / status / push flows over a +persistent `.nmd` file, the CLI and batch/tree orchestration, and the watch-mode +lifecycle. Builds on [../requirements.md](../requirements.md) + +[./requirements.md](./requirements.md); terms in [../glossary.md](../glossary.md); +rationale in [../.decisions/](../.decisions/). See [../spec.md](../spec.md) for the +architecture index. + +Traces: R20, R28. The guarded-push / three-way-merge / settle decisions invoked +by these flows are owned by [03-sync-engine](../03-sync-engine/spec.md) (R09, R11, +R13, R15); the lossy-page refusal at the pull is owned by +[04-fidelity](../04-fidelity/spec.md) (R30/R38); the `.nmd` envelope and object +store written by these flows are owned by [05-local-state](../05-local-state/spec.md); +the schema-drift check before a property write is owned by +[06-data-source](../06-data-source/spec.md) (R14). + +## Pull Flow + +1. Decode CLI options. +2. Retrieve Notion page metadata. +3. Observe the remote body through the Notion body observation service. +4. Reject clean-base adoption if the observation is lossy. **Under R38 "lossy" + means any block whose body-Markdown rendering does not reparse to the same + block** (round-trip-safety), so a page with a `child_database` / + `table_of_contents` / `synced_block` / `child_page`-in-body / API `unsupported` + block is refused here — uniformly with the editor verbs (decisions 0016, 0017; + [04-fidelity](../04-fidelity/spec.md)). +5. Adopt the block-tree-rendered Markdown as the local body and base snapshot; + keep endpoint Markdown only as diagnostic evidence. +6. 
Retrieve block-API payloads only for **round-trip-safe** captures (files, media, + resolvable unknowns) to enrich storage; a not-round-trip-safe block makes the + observation lossy (step 4) rather than something to preserve and edit around. +7. Compute the body hash over the adopted rendered body. +8. Build a strict frontmatter envelope ([05-local-state](../05-local-state/spec.md)). +9. Write base snapshot and storage objects. +10. Write the `.nmd` file. +11. Emit a pull result with storage mode and object refs. + +Future selected surfaces add data-source schema, comments, and files before the write commit. + +## Status Flow + +1. Read and decode `.nmd` once. +2. Validate all referenced objects. +3. Retrieve the current remote page and Markdown. +4. Compute local body hash, remote body hash, property edit state, metadata drift, and unresolved unknown block IDs. +5. Return a typed status result. + +Status distinguishes `remoteBodyChanged` from `remotePageMetadataChanged`. The current implementation still exposes a combined `remoteChanged` convenience field. + +## Push Flow + +1. Read and decode `.nmd` once. +2. Pull remote state once for status. +3. Reject clean-base use of any lossy remote body observation ([04-fidelity](../04-fidelity/spec.md)). +4. Reject unresolved Roughdraft review markup unless explicitly allowed ([03-sync-engine](../03-sync-engine/spec.md), R13). +5. Reject body pushes that could delete resolvable unknown blocks unless destructive intent is explicit. (Not-round-trip-safe blocks never reach a push: the page was refused at pull, step 4 / R38 — this push guard is the secondary defense for resolvable captures only.) +6. If only page metadata or properties changed and the remote body changed, patch those surfaces and refresh local body from remote only when the refreshed body is complete. +7. If the remote body changed and local body changed, attempt a conservative three-way merge ([03-sync-engine](../03-sync-engine/spec.md)). +8. 
If merge succeeds, update Markdown and then properties. Before a property write, compare the data-source schema against the pull-time `schema_snapshot`; on drift, refuse with exit 6 (`NmdSchemaDriftError`, R14, [06-data-source](../06-data-source/spec.md)) rather than risk silently auto-creating options — resolve by re-pulling. +9. If merge fails, write a Roughdraft conflict artifact and leave remote unchanged. +10. Land the merged (or still-at-base) body through the engine's write-verb + selection — targeted `update_content` when safe, guarded `replace_content` + otherwise ([03-sync-engine](../03-sync-engine/spec.md#update_content-vs-replace_content)). +11. Settle: re-observe the remote body after writes and rewrite `.nmd` with fresh + body, base, page metadata, storage, and completeness evidence. The post-push + `semanticEquivalent` gate and the trusted-base refresh are owned by the engine + ([03-sync-engine](../03-sync-engine/spec.md#settle-and-post-push-verification)). + +The local file is read once for a push decision to avoid local snapshot drift. + +Clean-base writes are allowed only from complete body observations with +block-tree-rendered Markdown available. Endpoint truncation, unknown block IDs, +unsupported inventory entries, missing rendered evidence, or a rendered +block-tree suffix not present in the endpoint Markdown all block establishment, +tree materialization, facade settlement, and post-write clean-base refresh. The +engine governs when a write is considered settled (an incomplete refreshed +observation leaves the local `.nmd` base untrusted; [03-sync-engine](../03-sync-engine/spec.md#settle-and-post-push-verification)). + +Pull adoption is block-aware. Notion's Markdown endpoint may omit blank block +boundaries around heading/paragraph/divider sequences; reparsing that endpoint +Markdown through CommonMark can promote prose paragraphs to Setext/ATX headings. 
+`notion-md` therefore treats endpoint Markdown as evidence and adopts the +client block-tree renderer output as the clean body. + +## CLI + +Current commands: + +```bash +notion-md sync <page-id-or-url> page.nmd +notion-md sync docs --from-remote --root <page-id-or-url> +notion-md plan docs +notion-md status page.nmd +notion-md sync page.nmd [--watch] [--poll-interval-ms 30000] +notion-md sync docs +``` + +Environment: + +| Variable | Meaning | +| ------------------ | ---------------- | +| `NOTION_API_TOKEN` | Notion API token | + +Output: + +- One-shot commands emit pretty JSON results by default. +- Watch emits compact NDJSON event lines by default. +- Watch `sync_error` events include structured typed error fields. +- The long-term stable contract is explicit `--output human|json|ndjson`, with `auto` allowed only as a convenience alias after envelope schemas are versioned. + +The write-path commands also surface the staged sync-progress indicator on a TTY +stderr ([01-editor](../01-editor/spec.md#sync-progress-indicator-write-path), R43–R45). + +Future CLI contract: + +```bash +notion-md diff <file.nmd> [--surface body|properties|comments|files] +notion-md comments pull|push <file.nmd> +notion-md doctor <page-id-or-url|file.nmd> +notion-md store verify|gc|export <file.nmd> +``` + +Batch commands: + +```bash +notion-md status <target...> [--recursive] [--concurrency 4] +notion-md sync <target> [--recursive] [--concurrency 4] [--watch] +``` + +Rules: + +- A single file target emits a single-page JSON result. +- Multiple status targets or flat recursive directory targets emit a batch envelope. +- Directory tree targets read `.notion-md/workspace.json` as an internal tree + index when present. `plan` reports tree operations without writing files, and + `sync` applies the local tree unless `--from-remote` is explicit. +- Recursive discovery includes existing `*.nmd` files and skips `.notion-md`, + `.git`, and `node_modules`. 
+- Duplicate `page_id` values in the same batch are rejected before any Notion + mutation. +- Missing or malformed files are reported as per-file errors when other valid + targets can still run. +- Local file deletion, local rename, and remote page moves are not destructive + intent. Remote archive/delete remains explicit future behavior. + +Batch and folder support do not change the ownership unit: one `.nmd` file maps +to one Notion page, and every mutation still passes through the same page-local +guards ([03-sync-engine](../03-sync-engine/spec.md)). The batch layer only owns +target discovery, duplicate page-id preflight, bounded concurrency, per-file +result reporting, and multi-file watch scheduling. + +## Watch Lifecycle + +Requirement trace: R19, R20, R28. + +``` +initial event ----\ +file event --------> sliding queue -> debounce -> sync pass -> JSON event +remote poll ------/ +``` + +Rules: + +- One sync pass runs at a time per process. +- File events and poll events are coalesced. +- Each pass emits `sync` or `sync_error`. +- Sync-pass spans observe failures before the watch loop recovers. +- Interruption closes the watcher, stops polling, and cancels queued work. +- File events come from the Effect Platform `FileSystem.watch` stream. Production + adapters are thin stream producers; coalescing policy stays in the watch loop. +- Multi-file watch resolves the target set at startup, watches the containing + directories for those files, coalesces by path, and runs batch sync passes with + bounded concurrency. New files discovered after startup require restarting the + watcher until a tree manifest/daemon owns dynamic discovery. + +The watch core uses a sliding queue and debounce window. Future tests may inject +source streams and `TestClock`, but production code must stay on Effect Platform +watch primitives instead of raw runtime callbacks. 
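As a rough model of the coalescing rule in the watch lifecycle above, the following pure function groups timestamped events into sync passes. It is a simplified sketch, not the Effect-based watch loop: `WatchEvent`, `coalesceIntoPasses`, and the timestamp representation are assumptions made for illustration, and the sliding queue's drop-oldest behavior is not modeled.

```typescript
// Sketch only: a pure model of debounce + coalesce-by-path, not the production watch loop.
interface WatchEvent {
  readonly path: string;
  readonly atMs: number;
  readonly source: "initial" | "file" | "poll";
}

// A pass collects every path touched until the stream stays quiet for longer than `debounceMs`.
function coalesceIntoPasses(
  events: ReadonlyArray<WatchEvent>,
  debounceMs: number,
): string[][] {
  const sorted = [...events].sort((a, b) => a.atMs - b.atMs);
  const passes: string[][] = [];
  let current: Set<string> | undefined;
  let lastAtMs = 0;
  for (const event of sorted) {
    if (current === undefined || event.atMs - lastAtMs > debounceMs) {
      // The quiet window elapsed (or this is the first event): close the previous pass.
      if (current !== undefined) {
        passes.push(Array.from(current));
      }
      current = new Set<string>();
    }
    current.add(event.path); // duplicate paths within a pass collapse to one entry
    lastAtMs = event.atMs;
  }
  if (current !== undefined) {
    passes.push(Array.from(current));
  }
  return passes;
}
```

Keeping the policy in a pure function like this is one way to make debounce and coalescing testable without real timers, in line with the note about injecting source streams and a test clock.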
diff --git a/packages/@overeng/notion-md/docs/vrs/03-sync-engine/requirements.md b/packages/@overeng/notion-md/docs/vrs/03-sync-engine/requirements.md new file mode 100644 index 000000000..3c611dec3 --- /dev/null +++ b/packages/@overeng/notion-md/docs/vrs/03-sync-engine/requirements.md @@ -0,0 +1,27 @@ +# Requirements: 03-sync-engine + +**Role.** The shared correctness engine both surfaces call: the guarded push +(re-read remote, refuse stale last-writer-wins overwrites), the conservative +three-way Markdown merge from a base snapshot, the settle-and-re-pull, the +review-safety guard, and the explicit-force escape hatch. The editor's `edit` +verb ([01-editor](../01-editor/requirements.md)) and the file path +([02-file-sync](../02-file-sync/requirements.md)) both push through this engine; +the stateless `cat`/`put` pipes use its 2-way verified-replace facade. + +Builds on the cross-cutting [../requirements.md](../requirements.md) (global +A/T) and [../glossary.md](../glossary.md). IDs are GLOBAL and preserved. The +clean-base requirement R09 depends on the last-clean base snapshots maintained by +[05-local-state](../05-local-state/requirements.md). The lossy-page refusal that +prevents a `replace_content` over an opaque block is owned by +[04-fidelity](../04-fidelity/requirements.md) (R30/R38); schema-drift refusal on a +property write is owned by [06-data-source](../06-data-source/requirements.md) +(R14). + +## Requirements + +### Must Prevent Data Loss + +- **R09 Base snapshots:** The local state store must preserve last-clean bases needed for guarded push and three-way merge. +- **R11 Guarded push:** Default push must re-read remote state and refuse last-writer-wins overwrites when the stored base is stale. +- **R13 Review safety:** Unresolved local review/suggestion markup must not be sent to Notion body content by default. +- **R15 Force clarity:** Destructive modes must be separate from normal push and report exactly which protections they bypass. 
diff --git a/packages/@overeng/notion-md/docs/vrs/03-sync-engine/spec.md b/packages/@overeng/notion-md/docs/vrs/03-sync-engine/spec.md new file mode 100644 index 000000000..c6bcef4b5 --- /dev/null +++ b/packages/@overeng/notion-md/docs/vrs/03-sync-engine/spec.md @@ -0,0 +1,133 @@ +# Spec: 03-sync-engine + +Specifies the shared correctness engine both surfaces call: the guarded push, the +conservative three-way Markdown merge from a base snapshot, the `update_content` +vs `replace_content` write-verb selection, the canonical base, the post-push +`semanticEquivalent` gate, the settle-and-re-pull, the review-safety guard, and the +explicit-force escape hatch. This is the authoritative home for those push-engine +mechanics; `edit`, the file path, and the `cat`/`put` 2-way facade all consume them +and cite here rather than restating them. Builds on +[../requirements.md](../requirements.md) + [./requirements.md](./requirements.md); +terms in [../glossary.md](../glossary.md); rationale in +[../.decisions/](../.decisions/). See [../spec.md](../spec.md) for the architecture +index. + +Traces: R09, R11, R13, R15. The pull/status/push flows that drive this engine are +specified in [02-file-sync](../02-file-sync/spec.md); the editor's `edit` verb +reuses this engine ([01-editor](../01-editor/spec.md), decision 0017); the stateless +`cat`/`put` pipes use the gateway-only 2-way verified-replace facade. The base +snapshots compared here are stored by [05-local-state](../05-local-state/spec.md); +the lossy-page refusal that prevents a `replace_content` over an opaque block is +owned by [04-fidelity](../04-fidelity/spec.md) (R30/R38). 
+ +## Guarded push model + +Both surfaces push the body through a **guarded Markdown surface** — there is no +block-reconciliation engine ([04-fidelity](../04-fidelity/spec.md), R40/R41): + +- **Stateless `cat`/`put`** — a 2-way guarded verified replace: re-read remote, + compare against the caller's `--base-hash`, refuse on drift (exit 7) unless + `--force`, then `replaceRemoteBodyVerified` → `replace_content` ([01-editor](../01-editor/spec.md)). +- **File path and `edit`** — a 3-way Markdown merge from the engine's base + snapshot, then a guarded `replace_content`. `edit` reuses this wholesale + (decision [0017](../.decisions/0017-edit-is-an-ephemeral-file-engine-session.md)): pull-at-start captures the base snapshot, push-at-end compares it + and auto-merges non-overlapping concurrent edits. + +The base snapshot (R09) is the engine's optimistic-concurrency token; a stale base +is refused (R11), and a destructive override (R15) is separate from normal push and +reports exactly which protection it bypassed (`--force` is concurrency-only, +decision [0009](../.decisions/0009-force-is-concurrency-only.md)). + +## Merge And Conflict Policy + +Requirement trace: R11–R15. + +Body merge operates on canonical Markdown: + +| Case | Result | +| ----------------------------- | ----------------------------------------- | +| local equals remote | clean | +| local equals base | accept remote | +| remote equals base | accept local | +| non-overlapping ranges | merge | +| same-range same edit | accept merged edit | +| overlapping different edit | conflict | +| protected placeholder removal | conflict unless explicit destructive mode | + +The write verb a merged body lands through (`update_content` vs `replace_content`) +is selected per the rules below. 
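The whole-body rows of the merge table can be sketched as a small decision function. This is an illustration under stated assumptions: `decideBody` and `BodyDecision` are hypothetical names, the inputs are assumed to be already-canonical Markdown strings, and the range-level rows (non-overlapping, same-range, overlapping, placeholder removal) are deliberately left to a separate merge step that is not shown.

```typescript
// Sketch only: covers the three whole-body rows; range-level merging is out of scope here.
type BodyDecision =
  | { readonly kind: "clean" }
  | { readonly kind: "accept-remote" }
  | { readonly kind: "accept-local" }
  | { readonly kind: "range-merge-required" };

function decideBody(base: string, local: string, remote: string): BodyDecision {
  if (local === remote) {
    return { kind: "clean" }; // both sides agree, nothing to reconcile
  }
  if (local === base) {
    return { kind: "accept-remote" }; // only the remote moved
  }
  if (remote === base) {
    return { kind: "accept-local" }; // only the local body moved
  }
  // Both sides changed differently: a range-level three-way merge (or a conflict) follows.
  return { kind: "range-merge-required" };
}
```

Checking `local === remote` first means two identical edits made on both sides resolve as clean without ever consulting the base.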
+ +Unresolved conflicts are written beside the `.nmd` file as Roughdraft Markdown: + +```markdown +# notion-md body conflict + +{==Body conflict==}{>>Remote and local body content both changed since the last clean pull.<<}{id="body-conflict"} + +## Base body + +... + +## Local body + +... + +## Remote body + +... +``` + +Normal push refuses unresolved Roughdraft review markup (R13). Explicit modes may +later apply, render, strip, or bridge review annotations. (`edit` relocates a +conflict artifact out of `$TMPDIR` to a durable `<page>.conflict.md`; see +[01-editor](../01-editor/spec.md#edit-session).) + +## update_content vs replace_content + +Both write verbs land through Notion's server-side Markdown parser, so the engine +never reconstructs blocks client-side ([04-fidelity](../04-fidelity/spec.md)): + +- **`replace_content`** is the guarded default — the whole canonical body is sent + and Notion re-parses it. Because the page was refused at the pull unless its body + is fully representable (decision [0016](../.decisions/0016-refuse-lossy-pages.md), [04-fidelity](../04-fidelity/spec.md)), `replace_content` can never + destroy an opaque block. +- **`update_content`** is an optimization. It may be used only when the base hunk + is unique in the current remote body and the returned Markdown equals the + expected body. Ambiguous or deletion-heavy edits fall back to guarded + `replace_content`. + +## Canonical Base + +The engine's base snapshot (R09) is **the canonical body, and only ever the value +the first pull emitted.** Notion canonicalizes lists, ordered-list counters, +code-fence language, and blank lines at write time, so the engine adopts the +canonical body returned by the first pull as the base. 
A client must **never** +recompute the base locally over the editable buffer (which is pre-canonical until +the next pull); for `cat`/`put` the base hash is exactly the value `cat` printed to +stderr ([01-editor](../01-editor/spec.md#guard-plumbing), decision [0002](../.decisions/0002-base-hash-on-stderr.md)). The base +snapshots themselves are stored by [05-local-state](../05-local-state/spec.md). + +Base tracking depends on hosted-media URL canonicalization for idempotence: media +URLs rotate on every pull, so the engine compares bodies only after the +canonicalization owned by [04-fidelity](../04-fidelity/spec.md#hosted-media-references) (R36, decision [0007](../.decisions/0007-canonicalize-hosted-media-urls.md)). + +## Settle and Post-Push Verification + +A remote write is not trusted until the engine settles it (R09, R11): + +- **Post-push `semanticEquivalent` gate.** After a write, the engine re-reads the + remote body and asserts it is semantically equivalent to the intended body + (exit 9 on mismatch). The gate runs with hosted-media URL canonicalization so a + rotated signed URL is not mistaken for a content change ([04-fidelity](../04-fidelity/spec.md#hosted-media-references), decision [0007](../.decisions/0007-canonicalize-hosted-media-urls.md)). +- **Settle and re-pull.** A successful write is settled by re-observing the remote + body and refreshing the local base from that fresh, complete observation. If the + refreshed observation is not complete, the local base stays untrusted and the + caller receives a typed lossy-remote-body error rather than a stale clean base + ([04-fidelity](../04-fidelity/spec.md), R38). Remote body is re-read immediately + before a guarded Markdown write to catch races between status and write. 
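To illustrate why the gate needs hosted-media canonicalization, here is a minimal sketch that strips volatile signing parameters before two bodies' URLs are compared. It is not the implementation in `media-url.ts`: the function name is hypothetical, and treating AWS-style `X-Amz-*` query parameters as the volatile set is an assumption about how the signed URLs are shaped.

```typescript
// Sketch only: assumes the volatile parameters are AWS-style `X-Amz-*` query keys.
function canonicalizeHostedMediaUrl(raw: string): string {
  const url = new URL(raw);
  const volatileKeys = Array.from(url.searchParams.keys()).filter((key) =>
    key.toLowerCase().startsWith("x-amz-"),
  );
  if (volatileKeys.length === 0) {
    return raw; // stable (external) URL: leave the original string untouched
  }
  for (const key of volatileKeys) {
    url.searchParams.delete(key); // origin, path, and any other parameters are kept
  }
  return url.toString();
}
```

With this in place, two pulls of the same image that differ only in signature and expiry compare equal, so the post-push comparison sees no content change.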
+ +The surface-specific framing of these mechanics — `put`'s two-write body-first / +title-last order and its partial-write reporting (decision [0012](../.decisions/0012-non-atomic-title-body-write-order.md), exit 10) — is +specified where each surface drives the engine: [01-editor](../01-editor/spec.md) +for the `cat`/`put`/`edit` verbs and [02-file-sync](../02-file-sync/spec.md) for the +file path. diff --git a/packages/@overeng/notion-md/docs/vrs/04-fidelity/requirements.md b/packages/@overeng/notion-md/docs/vrs/04-fidelity/requirements.md new file mode 100644 index 000000000..6f2b7fd07 --- /dev/null +++ b/packages/@overeng/notion-md/docs/vrs/04-fidelity/requirements.md @@ -0,0 +1,31 @@ +# Requirements: 04-fidelity + +**Role.** The body-fidelity layer that decides what can round-trip as Markdown: +the sound round-trip-safety classifier, the uniform lossy-page refusal at the +pull, the guarded server-side replace (no lossy client-side reconstruction), and +hosted-media URL canonicalization. This is the deliberately **shared** layer — +both the editor pipes ([01-editor](../01-editor/requirements.md)) and the file +path ([02-file-sync](../02-file-sync/requirements.md)), and the engine +([03-sync-engine](../03-sync-engine/requirements.md)) that both call, depend on +it. The refusal is a property of the shared core (the classifier gate), enforced +at the pull on **every** surface. + +Builds on the cross-cutting [../requirements.md](../requirements.md) (global +A/T) and [../glossary.md](../glossary.md). IDs are GLOBAL and preserved. R30/R38 +are enforced on both the editor and file surfaces but placed by **primary owner** +(the classifier) here and cross-referenced from those subsystems. 
+ +## Requirements + +### Must Prevent Data Loss + +- **R12 No silent loss of non-body content:** A not-round-trip-safe **body** block (`child_database`, `synced_block`, `table_of_contents`, `child_page`-in-body, degraded bookmark/embed, API `unsupported`) must be **refused at the pull** (R30/R38) rather than silently dropped on push — superseding the earlier "preserve + explicit-delete override" model, which live testing proved silently corrupts. A child page **as a tree node** (its own `.nmd` file) is preserved by the file-based tree engine ([02-file-sync](../02-file-sync/requirements.md)), distinct from a child-page block in the body. Resolvable captures (files, media) are preserved out of band in the object store ([05-local-state](../05-local-state/requirements.md)). + +### Must Not Silently Drop Or Corrupt Content + +- **R30 Lossy-page refusal (uniform):** All editor verbs (`cat`, `put`, `edit`) and the file-based `sync` must refuse a page whose body contains a not-losslessly-representable block (`child_database`, `synced_block`, `table_of_contents`, `child_page`, API `unsupported`, …) at the **pull**, with a message that names the block class and points to the Notion UI. The refusal is a property of the shared core (the classifier gate), not a streaming-only behavior; nothing must ever present or push a body it cannot round-trip. See decisions [0016](../.decisions/0016-refuse-lossy-pages.md), [0017](../.decisions/0017-edit-is-an-ephemeral-file-engine-session.md). +- **R31 Guarded body-replace push (`cat`/`put`):** The stateless `put` must push the body through a guarded verified replace plus a typed title write — two writes, body first (decision [0012](../.decisions/0012-non-atomic-title-body-write-order.md)) — not block-level reconciliation. `edit` instead reuses the file engine's guarded push (decision [0017](../.decisions/0017-edit-is-an-ephemeral-file-engine-session.md), [03-sync-engine](../03-sync-engine/requirements.md)). 
Because lossy pages are refused (R30), the body replace never runs over a body containing an opaque block. Mechanism in [04-fidelity spec](./spec.md#push-strategy-fidelity-intersection). +- **R36 Hosted-media URL canonicalization:** Hosted-media (signed-URL) blocks must be canonicalized — volatile signature/expiry query params stripped, origin and path kept — at every point a body is hashed, diffed, base-tracked, or gated, including the post-push semantic-equivalence gate, so media-bearing bodies are idempotent and pushable. External (stable) URLs are left untouched. The editor pipe's base hash ([01-editor](../01-editor/requirements.md), R34) depends on this canonicalization. +- **R38 Sound fidelity classification:** The body-fidelity classifier must flag every block whose **body-Markdown rendering does not reparse to the same block** (round-trip-safety) — not only `unsupported`-typed ones, but known-but-lossy blocks (`child_database` → `[embedded db]()`, `table_of_contents` → `[TOC]`, `synced_block`, `child_page`-in-body, degraded bookmark/embed, …) — so the refusal gate (R30) fires at the pull, on the **file path as well as the editor**. This is a correctness prerequisite proven by live testing (experiments.md): today these classify `complete`, so editing an _unrelated_ paragraph silently re-creates them as paragraphs on push (file `sync` and `edit` alike) — a current data-loss defect, not a hypothetical. +- **R40 No lossy client-side reconstruction:** A representable-body push must go through Notion's own `replace_content` server-side parse, never a client-side Markdown→block converter (live-proven to drop code/quote/to-do/image/nesting/inline marks). No client-side Markdown→block converter is in scope. See decision [0016](../.decisions/0016-refuse-lossy-pages.md) (which abandoned the renderer-symmetric converter); the live evidence is in [experiments.md](../experiments.md). 
+- **R41 Guarded Markdown push model:** Both paths must push the body through a guarded Markdown surface — the stateless `cat`/`put` through a 2-way guarded verified replace, the file-based path (and `edit`, which reuses it) through a 3-way Markdown merge from its base snapshot with a guarded `replace_content`. Neither uses a block-reconciliation engine; pages with opaque blocks are refused uniformly (R30). diff --git a/packages/@overeng/notion-md/docs/vrs/04-fidelity/spec.md b/packages/@overeng/notion-md/docs/vrs/04-fidelity/spec.md new file mode 100644 index 000000000..a7ba7e50d --- /dev/null +++ b/packages/@overeng/notion-md/docs/vrs/04-fidelity/spec.md @@ -0,0 +1,173 @@ +# Spec: 04-fidelity + +Specifies the body-fidelity layer: the sound round-trip-safety classifier, the +uniform lossy-page refusal at the pull, the feature-mapping fidelity table, the +guarded server-side push strategy (no lossy client-side reconstruction), and +hosted-media URL canonicalization. Builds on +[../requirements.md](../requirements.md) and [./requirements.md](./requirements.md); +terms in [../glossary.md](../glossary.md); rationale in +[../.decisions/](../.decisions/). See [../spec.md](../spec.md) for the architecture +index. + +Traces: R12, R30, R31, R36, R38, R40, R41. This is the deliberately **shared** +layer: the editor pipes ([01-editor](../01-editor/spec.md)), the file path +([02-file-sync](../02-file-sync/spec.md)), and the engine +([03-sync-engine](../03-sync-engine/spec.md)) that both of them call all depend on it. The +refusal is enforced at the pull on every surface. + +## Refusing Lossy Pages (uniform) + +Requirement trace: R12, R38, Success Criterion 4. Decisions [0016](../.decisions/0016-refuse-lossy-pages.md), [0017](../.decisions/0017-edit-is-an-ephemeral-file-engine-session.md). + +The editor serves the **representable-Markdown majority** and refuses the rest.
+A page whose body contains any **not-losslessly-representable block** is refused +(exit 3) **at the pull** — uniformly across `cat`, `put`, and `edit` (and the +file-based `sync`, which refuses at the same gate). Refusal is a property of the +shared core, not a streaming-only carve-out: `edit` materializes through the same +`pullPage` whose `assertRemoteMarkdownComplete` gate fires the refusal (decision +0017), so it refuses the same pages the pipes do. + +The refusal criterion is **"not losslessly round-trippable"** — broader than the +API `unsupported` type. It covers `unsupported` plus known-but-lossy blocks: +`child_database` (renders `[embedded db]()`), `synced_block`, +`table_of_contents`, `breadcrumb` (renders `''`), `child_page`, and similar. The +body-fidelity classifier (`@overeng/notion-core`), which today flags only +`unsupported`, must be extended to flag every such block (R38, impl-delta Group +C). This is a **correctness prerequisite for the file path too**: today +`child_database`/`toc` classify `complete`, so without the extension a +`replace_content` push (file `sync` or `edit`) would silently destroy them. + +- **Refusal, not placeholdering.** The earlier reconciler/placeholder approach + was abandoned: Notion's platform bars the parts of it that + matter — no backlink endpoint (a moved `synced_block` original silently breaks + inbound references), `child_database` is uncreatable via the block API, and the + Markdown endpoint is non-injective. Refusing is the honest, elegant scope the + platform permits (decision 0016). +- **Message.** The exit-3 error names the offending block class and points the + user to the **Notion UI** to edit that block. The refusal is shared with the + file-based `sync` (same pull gate), so it is not a workaround to switch to + `sync` for these blocks. 
+- **Representable majority.** A page of paragraphs, headings, lists, to-dos, + quotes, code, callouts, toggles, tables, columns, equations, and **hosted or + external media** (media is representable — only its URL is volatile, decision 0007) round-trips cleanly and is fully editable. +- **Out-of-band preservation is for _round-trip-safe_ captures, not a lossy + escape hatch.** The file path's `unsupported_blocks` + object-store machinery + (Feature Mapping) captures files, media, and resolvable payloads on pages that + classify _complete_. Post-R38 **no page containing a not-round-trip-safe block + classifies `complete`** — such pages are refused at the pull on every surface — + so that machinery never applies to a not-round-trip-safe body block. The + pre-R38 "preserve any unsupported body block + `allow_deleting_content` + override" behavior is retired: live testing proved it silently corrupts + ([../experiments.md](../experiments.md)). Lossy pages are edited in Notion. + +## Hosted-Media References + +Requirement trace: R10, R36. Decision [0007](../.decisions/0007-canonicalize-hosted-media-urls.md). Live-validated in [../experiments.md](../experiments.md). + +Notion-hosted media (image/file/video/pdf with `type: "file"`) renders with an +expiring signed S3 URL (`X-Amz-*`) that **rotates on every pull**. Left raw, it +makes the body hash volatile (breaking `cat`→`put` idempotence and staling base +hashes with zero edits) and causes `update_content` pushes on media pages to be +rejected by the post-push gate. + +- Hosted-media URLs are **canonicalized** — strip the `X-Amz-*` / signature / + `Expires` query params, keep `origin + pathname` — at **every** point a body + is hashed, diffed, base-tracked, or gated, **including inside + `semanticEquivalent` / `canonicalizeBlockMarkdown`**. +- External (stable) URLs are left untouched and pushed as external media. 
+- The canonicalized URL is deterministic but not directly fetchable; acceptable + for an editing surface (the user edits text, not media URLs). Canonicalization + governs hashing/diffing/gating only; the live file stays authoritative on the + remote. + +The pipe base hash ([01-editor](../01-editor/spec.md#guard-plumbing)) and the +engine's base snapshot ([03-sync-engine](../03-sync-engine/spec.md)) both depend on +this canonicalization for idempotence. + +## Canonical Body Form (one function, both boundaries) + +Decision [0019](../.decisions/0019-one-canonical-body-at-both-wire-boundaries.md). + +There is **one renderer** (`treeToMarkdown`) and **one canonicalizer** +(`canonicalizeBlockMarkdown`, in `@overeng/notion-effect-client` beside the +renderer). Both Notion wire boundaries route the body through the canonicalizer, +so the body a surface reads (`cat`/`edit`/file sync), the body hashed/compared, +and the body pushed are the **same canonical bytes**: + +- **Pull receive** canonicalizes at the source: `observeFromSnapshots` + canonicalizes the rendered body once, before it feeds the inventory, the + fidelity classifier, and the evidence fingerprint — so all of them agree by + construction. +- **Push send** (`replace_content`) canonicalizes the same way. + +The renderer emits _parseable-not-canonical_ Markdown (it joins sibling blocks +with `\n\n` so they survive a reparse) and carries no spacing policy; the +canonical layer owns spacing/list-tightness (it forces `spread = false` on lists, +so a tight Notion list does not pull as a loose CommonMark list, and the stray +indented blank line inside nested lists is removed). Hosted-media URL +canonicalization (above) is a sub-step. `semanticEquivalent` (the push gate) is +whitespace-insensitive outside fenced code and is invariant across this — it +already masked the prior pull-loose / push-tight divergence. 
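The hosted-media sub-step mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `media-url.ts`: the `MediaRef` type and the `canonicalMediaUrl` name are hypothetical, and the sketch assumes the caller already knows whether a reference is Notion-hosted (`type: "file"`) or external.

```ts
// Hypothetical shape: mirrors the file/external distinction described in the spec.
type MediaRef = { readonly type: 'file' | 'external'; readonly url: string }

// Returns the URL form to use for hashing, diffing, base-tracking, and gating.
function canonicalMediaUrl(ref: MediaRef): string {
  // External (stable) URLs are left untouched.
  if (ref.type === 'external') return ref.url
  // Hosted (signed) URLs keep origin + pathname only, which drops the
  // rotating X-Amz-* / signature / Expires query params.
  const parsed = new URL(ref.url)
  return parsed.origin + parsed.pathname
}
```

Two pulls of the same hosted image then canonicalize to one string even though the signature differs, which is the property the base hash and the post-push gate rely on.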
+ +## Push Strategy (fidelity intersection) + +Refuse-lossy is what makes the **server-side `replace_content` push** safe: because +the page is refused unless its body is fully representable (decision 0016), the body +goes through Notion's own `replace_content` parser server-side +(`replaceRemoteBodyVerified`) with no block-level reconciliation and no client-side +Markdown→block converter — and since the body contains no opaque blocks, +`replace_content` can never destroy one (decision 0017). + +The guarded-push engine that owns `replace_content` vs `update_content` selection, +the canonical base, the post-push `semanticEquivalent` gate, and settle/re-pull is +[03-sync-engine](../03-sync-engine/spec.md#update_content-vs-replace_content); that +gate runs with the hosted-media URL canonicalization owned here (decision 0007). The +surface framing — `put` as a guarded body replace plus a typed title write, two +writes body-first, partial-write reporting (decision 0017, decision 0012) — is in +[01-editor](../01-editor/spec.md#edge-behavior) and the file path in +[02-file-sync](../02-file-sync/spec.md). + +## Feature Mapping + +Requirement trace: R01-R05. 
+ +| Notion feature | Local body representation | Non-body state | Fidelity / policy | +| ------------------------------------- | -------------------------------- | ------------------------------- | -------------------------------------------------------------------------- | +| Page title/icon/cover | not body | frontmatter page fields | title preserved; icon/cover modeled | +| Page lock/trash state | not body | frontmatter page fields | field-level page API patch | +| Paragraphs, headings, lists | stock Markdown/enhanced Markdown | none | supported with Notion normalization | +| To-dos, quotes, dividers | stock Markdown/enhanced Markdown | none | supported | +| Code blocks | fenced blocks | language normalization | supported; aliases may normalize | +| Equations | Markdown/enhanced math syntax | raw rich-text fallback if lossy | block supported; inline conservative | +| Callouts, toggles, tables | enhanced Markdown tags | color/attribute normalization | supported with normalization caveats | +| Columns | enhanced column tags | none | supported by endpoint, needs coverage | +| Images/files/media | Markdown/enhanced media tags | future file payloads | not fully implemented | +| Bookmark/embed/link preview | not round-trip-safe in the body | — | **refused at pull** (R38) — edit in Notion | +| Child page/database **block in body** | not round-trip-safe in the body | — | **refused at pull** (R30/R38) | +| Child page **as a tree node** | own `.nmd` file (tree) | tree membership | preserved by the file-based tree engine (not a body block) | +| Data-source row properties | not body | typed property map | modeled writable properties ([06-data-source](../06-data-source/spec.md)) | +| Data-source schema/views | not body | future schema snapshot | not implemented | +| Comments | not body | future comment bridge | not implemented | +| Suggestions/review | Roughdraft local layer | review state | reject unresolved by default ([03-sync-engine](../03-sync-engine/spec.md)) | + 
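The "refused at pull" rows in the table correspond to the classifier gate described under Refusing Lossy Pages. A rough sketch of such a gate is given here; the names `LOSSY_BLOCK_TYPES`, `LossyPageRefusal`, and `assertRoundTripSafe` are hypothetical, the block-type set is illustrative rather than authoritative, and the real gate (`assertRemoteMarkdownComplete`) may be shaped differently.

```ts
// Illustrative set drawn from the block classes named in this spec.
const LOSSY_BLOCK_TYPES: ReadonlySet<string> = new Set([
  'unsupported',
  'child_database',
  'child_page',
  'synced_block',
  'table_of_contents',
  'breadcrumb',
  'bookmark',
  'embed',
  'link_preview',
])

type BlockSummary = { readonly id: string; readonly type: string }

// Hypothetical error type carrying the exit code the spec assigns to refusal.
class LossyPageRefusal extends Error {
  readonly exitCode = 3
  readonly blockTypes: ReadonlyArray<string>
  constructor(blockTypes: ReadonlyArray<string>) {
    super(
      `Page body contains blocks that do not round-trip through Markdown ` +
        `(${blockTypes.join(', ')}); edit them in the Notion UI.`,
    )
    this.blockTypes = blockTypes
  }
}

// Throws at pull time if any body block is not round-trip-safe.
function assertRoundTripSafe(blocks: ReadonlyArray<BlockSummary>): void {
  const offending = Array.from(
    new Set(blocks.filter((b) => LOSSY_BLOCK_TYPES.has(b.type)).map((b) => b.type)),
  )
  if (offending.length > 0) throw new LossyPageRefusal(offending)
}
```

Running the check before any body is presented or pushed is what keeps the refusal uniform across `cat`, `put`, `edit`, and file `sync`.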
+Known Notion enhanced Markdown limitations: + +- Notion normalizes valid Markdown on pull. +- Page title and properties are not included in Markdown body output. +- Some blocks pull as `<unknown>` with `unknown_block_ids`. +- The Markdown endpoint can return a prefix of the rendered block tree, such as + content before a divider; that response is lossy and cannot become a clean + `.nmd` base. +- The Markdown endpoint can omit separators around block boundaries; the clean + pull body is rendered from the block tree so paragraphs adjacent to headings + and dividers keep their block type. +- Signed file URLs expire and are not durable identity. +- Comments support inline Markdown-like content but are separate from body Markdown. +- A block whose body-Markdown rendering does not reparse to the same block + (`[TOC]`, `[embedded db]()`, degraded bookmark, …) is **refused at pull** (R38), + because a push would silently re-create it as a paragraph ([../experiments.md](../experiments.md)). +- `allow_deleting_content` can delete resolvable unknown blocks and tree child + pages/databases; the default is non-destructive. It is not an escape hatch for + not-round-trip-safe body blocks, which are refused before any push. + +Evidence for these limitations lives in [../experiments.md](../experiments.md). diff --git a/packages/@overeng/notion-md/docs/vrs/05-local-state/requirements.md b/packages/@overeng/notion-md/docs/vrs/05-local-state/requirements.md new file mode 100644 index 000000000..83709f41d --- /dev/null +++ b/packages/@overeng/notion-md/docs/vrs/05-local-state/requirements.md @@ -0,0 +1,25 @@ +# Requirements: 05-local-state + +**Role.** The durable local state layer: the strict versioned `.nmd` envelope, +the page-id-keyed sync sidecar, the content-addressed `.notion-md/` object store, +and the volatile-URL exclusion that keeps local identity stable across pulls and +repository moves. 
The base snapshots the engine +([03-sync-engine](../03-sync-engine/requirements.md)) reads for guarded push and +three-way merge live here. + +Builds on the cross-cutting [../requirements.md](../requirements.md) (global +A/T) and [../glossary.md](../glossary.md). IDs are GLOBAL and preserved. The +writable projection captured into the envelope (page metadata, properties, +`schema_snapshot`) is owned by +[06-data-source](../06-data-source/requirements.md); the hosted-media +canonicalization that keeps stored URLs stable is owned by +[04-fidelity](../04-fidelity/requirements.md) (R36). + +## Requirements + +### Must Maintain Durable Local State + +- **R06 Versioned state:** Local sync state must use explicit schema versions and reject unknown fields unless an extension models them. +- **R07 Content addressing:** Large or immutable artifacts must be stored by content hash rather than by transient Notion retrieval URL. +- **R08 Stable references:** Object-store refs must use relative paths plus content addresses that survive repository moves. +- **R10 Volatile URL exclusion:** Expiring Notion file URLs must not be durable local identifiers. diff --git a/packages/@overeng/notion-md/docs/vrs/05-local-state/spec.md b/packages/@overeng/notion-md/docs/vrs/05-local-state/spec.md new file mode 100644 index 000000000..7a22545d7 --- /dev/null +++ b/packages/@overeng/notion-md/docs/vrs/05-local-state/spec.md @@ -0,0 +1,146 @@ +# Spec: 05-local-state + +Specifies the durable local state layer: the strict versioned `.nmd` envelope, +the page-id-keyed sync sidecar, the frontmatter schema shape, and the +content-addressed `.notion-md/` object store. Builds on +[../requirements.md](../requirements.md) + [./requirements.md](./requirements.md); +terms in [../glossary.md](../glossary.md); rationale in +[../.decisions/](../.decisions/). See [../spec.md](../spec.md) for the architecture +index. + +Traces: R06, R07, R08, R10. 
The writable projection carried in the envelope +(properties, page metadata, `schema_snapshot`) is owned by +[06-data-source](../06-data-source/spec.md); the base snapshots stored here are +consumed by [03-sync-engine](../03-sync-engine/spec.md) (R09); hosted-media URL +canonicalization that keeps stored URLs stable is owned by +[04-fidelity](../04-fidelity/spec.md) (R36). + +## Local Format + +``` +doc.nmd + frontmatter: strict local sync envelope + body: stock Notion enhanced Markdown + +.notion-md/ + objects/sha256/<2>/<62>.json + sync/<page-id>.json +``` + +### `.nmd` Envelope + +The `.nmd` file is a versioned local wrapper around a Notion enhanced Markdown body. +Version 2 keeps human-editable state in the file and moves derived sync +bookkeeping into a page-id keyed sidecar: + +```markdown +--- +{ + 'notion_md': + { + 'version': 2, + 'api_version': '2026-03-11', + 'object': 'page', + 'page_id': '00000000-0000-4000-8000-000000000001', + 'parent': { '_tag': 'page', 'id': '00000000-0000-4000-8000-000000000000' }, + 'page': + { + 'title': 'Page title', + 'icon': null, + 'cover': null, + 'in_trash': false, + 'is_locked': false, + }, + 'properties': {}, + }, +} +--- + +Enhanced Markdown body starts here. +``` + +Rules: + +| Rule | Specification | +| ------------------- | -------------------------------------------------------------------------------------- | +| Body boundary | Only bytes after frontmatter are sent to Notion Markdown endpoints. | +| Strict schema | Unknown frontmatter keys are errors. | +| Body hash | Hash canonical stripped body bytes, never frontmatter. | +| API version | `api_version` records the Notion API version used for the last clean pull. | +| Local version | `notion_md.version` is the local human-editable envelope version. | +| Sync sidecar | Derived state lives in `.notion-md/sync/{page_id}.json`, keyed by immutable page id. | +| Visible frontmatter | A page whose visible body starts with `---` must escape or precede that text. 
| +| Review markup | Roughdraft markers are local review state unless an explicit push mode says otherwise. | + +Local experiments confirmed that frontmatter sent through the Markdown endpoint becomes literal body content. Push must strip it. + +### Frontmatter Schema + +The Effect Schema in `@overeng/notion-effect-client` is the source of truth. The +current local shape is split between human-editable V2 frontmatter and +machine-managed V1 sync state: + +```ts +type NmdFrontmatterV2 = { + readonly notion_md: { + readonly version: 2 + readonly api_version: '2026-03-11' + readonly object: 'page' + readonly page_id: NotionId + readonly url?: string + readonly parent: ParentRef + readonly page: PageState + readonly properties: Record<string, WritablePropertyValue> + } +} + +type NmdSyncStateV1 = { + readonly version: 1 + readonly page_id: NotionId + readonly body: BodyState + readonly storage: SelfContainedStorage | ObjectStoreStorage + readonly read_only_properties: Record<string, ReadOnlyPropertyValue> + readonly data_source: DataSourceBinding | null +} +``` + +Schemas use tagged unions for polymorphic values, branded strings for Notion IDs +and hashes, and exact decoding with excess-property rejection. The +`WritablePropertyValue` / `PageState` / `DataSourceBinding` shapes carried here +are specified by [06-data-source](../06-data-source/spec.md). + +## Object Store + +Requirement trace: R07-R10, R16. 
+ +Objects are immutable JSON payloads addressed by exact stored bytes: + +``` +.notion-md/objects/sha256/ab/cdef....json +``` + +| Role | Payload | Required validation | +| ----------------- | ----------------------------- | ------------------------------------------------------- | +| `base_snapshot` | last clean body snapshot | page id, body hash, object hash, schema version | +| `storage_payload` | overflow storage payload | page id, inventory equality with frontmatter, hash | +| `file_payload` | future file bytes or metadata | content hash, media type, local path or upload identity | +| `comment_payload` | future comment bridge state | comment IDs, discussion IDs, anchor metadata | +| `schema_snapshot` | data-source schema state | schema hash, property IDs, data-source id | + +The `base_snapshot` role is the engine's optimistic-concurrency token +([03-sync-engine](../03-sync-engine/spec.md), R09); the `schema_snapshot` role +backs the schema-drift refusal ([06-data-source](../06-data-source/spec.md), R14). + +Write order is object first, `.nmd` last. A failed `.nmd` write may leave orphan objects; a future `store gc` removes unreachable objects. Object paths in frontmatter are logical POSIX-style paths; the state store normalizes both expected and stored paths through the platform `Path` service before reading. + +Storage policy: + +| Case | Storage form | +| ------------------------------------------- | ---------------------------------------- | +| Small stable unsupported/file/comment units | inline `storage._tag = "self_contained"` | +| Large storage payload | `storage._tag = "object_store"` | +| Volatile signed Notion URLs | `object_store` | +| File bytes | future content-addressed file payload | +| Raw unsanitized API snapshots | object store only | + +The implementation currently supports self-contained storage and content-addressed `storage_payload` objects. It rejects legacy sidecar-shaped frontmatter instead of migrating it. 
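The `objects/sha256/<2>/<62>.json` layout above can be illustrated with a short sketch. The `objectPathFor` helper is a hypothetical name, and the sketch assumes the address is the lowercase hex SHA-256 of the exact stored bytes, split after the first two characters.

```ts
import { createHash } from 'node:crypto'

// Derives the logical POSIX-style object path from the exact bytes to be stored.
function objectPathFor(storedBytes: Uint8Array): string {
  const hex = createHash('sha256').update(storedBytes).digest('hex') // 64 hex chars
  const shard = hex.slice(0, 2) // the <2> directory
  const rest = hex.slice(2) // the <62> file stem
  return `.notion-md/objects/sha256/${shard}/${rest}.json`
}
```

Because the path depends only on the bytes, writing the same payload twice lands on the same object, which is what makes the object-first, `.nmd`-last write order safe to retry.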
diff --git a/packages/@overeng/notion-md/docs/vrs/06-data-source/requirements.md b/packages/@overeng/notion-md/docs/vrs/06-data-source/requirements.md new file mode 100644 index 000000000..398b9f5b2 --- /dev/null +++ b/packages/@overeng/notion-md/docs/vrs/06-data-source/requirements.md @@ -0,0 +1,27 @@ +# Requirements: 06-data-source + +**Role.** The typed property / page-metadata surface and its data-source binding: +the writable property-value forms, writable page metadata (title/icon/cover/ +lock/trash), the `data_source` binding, and the `schema_snapshot`-based +schema-drift refusal that guards a property write. Properties and page metadata +sync through the typed page/data-source APIs, never through body Markdown. + +Builds on the cross-cutting [../requirements.md](../requirements.md) (global +A/T) and [../glossary.md](../glossary.md). IDs are GLOBAL and preserved. The +`schema_snapshot` object role and the envelope these values are projected into +are stored by [05-local-state](../05-local-state/requirements.md); the drift +refusal is exercised by the engine +([03-sync-engine](../03-sync-engine/requirements.md)) before a property write and +reached interactively through `edit --frontmatter` +([01-editor](../01-editor/requirements.md)). Full data-source schema/view sync is +owned by the standalone [Notion datasource sync spec](../../../../notion-datasource-sync/docs/vrs/spec.md). + +## Requirements + +### Must Preserve Surface Boundaries + +- **R04 Property boundary:** Page and row properties must sync through typed page/data-source APIs, not through body Markdown. + +### Must Prevent Data Loss + +- **R14 Schema drift safety:** Property writes must refuse or require explicit acceptance when the data-source schema has changed since the last clean pull, with a distinct exit code that is **not** `--force`-able; the user resolves by re-pulling. 
Mechanism (the pull-time schema capture the live schema is compared against) in [06-data-source spec](./spec.md#data-source-binding-and-schema-drift), decision [0017](../.decisions/0017-edit-is-an-ephemeral-file-engine-session.md) (which superseded the earlier stateless in-buffer fingerprint). diff --git a/packages/@overeng/notion-md/docs/vrs/06-data-source/spec.md b/packages/@overeng/notion-md/docs/vrs/06-data-source/spec.md new file mode 100644 index 000000000..a7dc9f51a --- /dev/null +++ b/packages/@overeng/notion-md/docs/vrs/06-data-source/spec.md @@ -0,0 +1,99 @@ +# Spec: 06-data-source + +Specifies the typed property / page-metadata surface and its data-source binding: +the writable property-value forms, writable page metadata, the `data_source` +binding, and the `schema_snapshot`-based schema-drift refusal that guards a +property write. Builds on [../requirements.md](../requirements.md) + +[./requirements.md](./requirements.md); terms in [../glossary.md](../glossary.md); +rationale in [../.decisions/](../.decisions/). See [../spec.md](../spec.md) for the +architecture index. + +Traces: R04, R14. These values are projected into the `.nmd` envelope stored by +[05-local-state](../05-local-state/spec.md); the drift refusal is exercised by the +engine ([03-sync-engine](../03-sync-engine/spec.md)) before a property write and +reached interactively through `edit --frontmatter` ([01-editor](../01-editor/spec.md)). +Full data-source schema/view sync is owned by the standalone +[Notion datasource sync spec](../../../../notion-datasource-sync/docs/vrs/spec.md). + +## Writable Property Values + +Property frontmatter is human-editable only for modeled writable forms. Unknown or generated properties remain visible as read-only values. 
+ +| Notion property type | Local form | Push encoding | +| -------------------- | -------------------------- | ----------------------------- | +| `title` | string | rich-text title from string | +| `rich_text` | string or null | rich text from string | +| `number` | number or null | number | +| `select` | option name or null | select by name | +| `multi_select` | option names | multi-select by names | +| `status` | option name or null | status by name | +| `date` | Notion date object or null | date object | +| `people` | user IDs | people IDs | +| `checkbox` | boolean | checkbox | +| `url` | string or null | url | +| `email` | string or null | email | +| `phone_number` | string or null | phone number | +| `relation` | page IDs | relation IDs | +| `files` | file refs | future file-upload resolution | +| `place` | place object or null | place object | +| `verification` | verification state object | verification object | +| generated properties | read-only wrapper | not pushed | + +Property IDs must be preserved when available. Display names are for readability; IDs win on rename or schema drift. + +The writable vs read-only/computed split is `propertyWriteClassFromType` / +`PROPERTY_WRITE_CLASSES` (`@overeng/notion-core`), the single source of truth, the +same predicate the editor's `--frontmatter` projection uses +([01-editor](../01-editor/spec.md#guard-plumbing)). + +## Writable Page Metadata + +The page metadata surface covers page state that is not part of the Markdown +body and is not a data-source property. 
+ +| Field | Local form | Push encoding | +| ----------- | --------------------------------------- | ------------------- | +| `title` | string | page title property | +| `icon` | null, emoji, native icon, external file | page `icon` | +| `cover` | null, external or Notion-hosted file | external/null cover | +| `in_trash` | boolean | page `in_trash` | +| `is_locked` | boolean | page `is_locked` | + +Strict frontmatter accepts the read shapes Notion can return. The write planner +only emits page metadata patches for shapes Notion's page update API accepts: +page titles, null/external covers, null/emoji/native/external icons, +`in_trash`, and `is_locked`. Notion-hosted file URLs and custom emojis are +preserved as pulled state until their write behavior is verified. + +Both properties and page metadata sync through the typed `PATCH /pages/{id}` page +API (R04), never through body Markdown; a body conflict does not block a +property-only push ([02-file-sync](../02-file-sync/spec.md#push-flow)). + +## Data-Source Binding and Schema Drift + +Requirement trace: R14. The earlier stateless in-buffer fingerprint is superseded +by decision [0017](../.decisions/0017-edit-is-an-ephemeral-file-engine-session.md): drift is detected from a base snapshot, not a +re-derived stateless fingerprint. + +For a data-source-backed page, `pullPage` retrieves the parent data source +(`GET /v1/data_sources/{id}` via `page.parent.data_source_id`) and captures the +**writable** property schema into the sidecar `data_source` binding as a +`schema_snapshot` object ([05-local-state](../05-local-state/spec.md)): a canonical +projection of `{ name, type, sorted option names }` sorted by property name, +options only for `select`/`multi_select`/`status`, **hashing names not ids** (a +rename is id-preserving), excluding ids/colors/descriptions/status-groups/ +timestamps/computed properties. 
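The canonical projection described above can be sketched as follows. This is an illustration only: `RemoteProperty` is a hypothetical, flattened input shape (the Notion API nests options under the property's type key), `schemaFingerprint` is a hypothetical name, the input is assumed to be already filtered to writable properties, and SHA-256 over `JSON.stringify` is an assumed hash choice.

```ts
import { createHash } from 'node:crypto'

// Hypothetical flattened view of one writable data-source property.
type RemoteProperty = {
  readonly id: string
  readonly name: string
  readonly type: string
  readonly options?: ReadonlyArray<{ readonly id: string; readonly name: string; readonly color?: string }>
}

const OPTION_TYPES: ReadonlySet<string> = new Set(['select', 'multi_select', 'status'])

// Hashes { name, type, sorted option names }, sorted by property name.
// Ids, colors, and descriptions never enter the projection.
function schemaFingerprint(props: ReadonlyArray<RemoteProperty>): string {
  const projection = props
    .map((p) => ({
      name: p.name,
      type: p.type,
      options: OPTION_TYPES.has(p.type) ? (p.options ?? []).map((o) => o.name).sort() : [],
    }))
    .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0))
  return createHash('sha256').update(JSON.stringify(projection)).digest('hex')
}
```

Under this projection a color-only change leaves the fingerprint unchanged, while adding an option or renaming a property changes it.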
+ +Before any property write the engine re-retrieves the live schema, recomputes the +hash, and on drift refuses with `NmdSchemaDriftError` (exit 6, +[01-editor](../01-editor/spec.md#exit-codes-and-error-model)) rather than risk +Notion silently auto-creating a `select` option for an unknown value name. This +refusal is on its own axis from the exit-7 value/body conflict and is **not** +`--force`-able; resolve by re-pulling. A benign color-only schema change does not +trip; the five structural mutations (add/remove/rename/retype property, add option) +do. + +This is the file-engine path that `edit --frontmatter` reuses — there is no +stateless in-buffer fingerprint and no `put --frontmatter` +([01-editor](../01-editor/spec.md), decision 0017). Standalone (non-data-source) +pages have no snapshot and skip the check. diff --git a/packages/@overeng/notion-md/docs/vrs/README.md b/packages/@overeng/notion-md/docs/vrs/README.md index be55e5d14..ba6c1bd96 100644 --- a/packages/@overeng/notion-md/docs/vrs/README.md +++ b/packages/@overeng/notion-md/docs/vrs/README.md @@ -3,9 +3,23 @@ These documents are the design source of truth for `@overeng/notion-md`. 
- [Vision](./vision.md) -- [Requirements](./requirements.md) -- [Spec](./spec.md) +- [Requirements](./requirements.md) — cross-cutting; per-subsystem requirements live in the numeric dirs +- [Spec](./spec.md) — thin architecture index + subsystem map +- [Glossary](./glossary.md) +- [Decisions](./.decisions/) — `0001`–`0019` (some early ids superseded and removed; rationale in [experiments.md](./experiments.md)) +- [Implementation Delta](./impl-delta.md) - [Experiments](./experiments.md) +The design is decomposed into layered subsystems, each with its own +`requirements.md` + `spec.md` (global requirement IDs preserved, never +renumbered): + +- [01-editor](./01-editor/spec.md) — `cat`/`put`/`edit` surface + sync-progress indicator +- [02-file-sync](./02-file-sync/spec.md) — pull/status/push flows, CLI, watch, batch/tree +- [03-sync-engine](./03-sync-engine/spec.md) — shared guarded push, 3-way merge, settle +- [04-fidelity](./04-fidelity/spec.md) — round-trip classifier, uniform lossy refusal, media canonicalization +- [05-local-state](./05-local-state/spec.md) — `.nmd` envelope + content-addressed object store +- [06-data-source](./06-data-source/spec.md) — typed property/metadata surface + schema-drift guard + The package docs explain usage. The VRS documents define the product shape, constraints, implementation model, evidence, and long-term design decisions. diff --git a/packages/@overeng/notion-md/docs/vrs/experiments.md b/packages/@overeng/notion-md/docs/vrs/experiments.md index 8e8dd77dc..84db57b8e 100644 --- a/packages/@overeng/notion-md/docs/vrs/experiments.md +++ b/packages/@overeng/notion-md/docs/vrs/experiments.md @@ -99,3 +99,105 @@ Artifacts: `tmp/notion-md-sidecar-files/`. - Tiny bytes were only acceptable because the fixture was artificially small. **Conclusion:** the launch format should use strict frontmatter for compact metadata plus a content-addressed object store for base snapshots and bulky or volatile payloads. 
A separate per-page state file is not needed until property/comment/file surfaces outgrow frontmatter. + +## Streaming Editor Surface (`cat`/`put`/`edit`) + +**Hypothesis:** A stateless stdin/stdout body surface over the body facade can drive editor-based two-way editing with a guarded push, idempotent round-trips, and lossless preservation of unknown blocks. + +**Method:** Live Notion (API `2026-03-11`), scratch pages under the shared test parent, all trashed at teardown. Exercised `observeRemoteBody` / `replaceRemoteBodyVerified` / `updatePageMetadata` / `updatePageProperties` / `updateMarkdown(update_content)` directly. Scripts under `tmp/notion-vim/` (gitignored). + +**Results:** + +- **Body round-trip is an immediate fixpoint.** A rich body hashed identically across two full pull→push-unchanged→pull rounds. Notion canonicalizes lists, ordered-list counters, code-fence language, and blank lines at _create_ time, so the canonical form must be adopted as the base (the author's pre-canonical source is not the fixpoint). +- **Title is transported out-of-band.** A leading `# H1` was absorbed as the page title and did not appear in the body; `updatePageMetadata` set the title with the body hash unchanged. Validates decision 0001. +- **Guard works.** A stale base-hash `put` was refused with `NotionMdBodyConflictError` (exit 7); remote preserved. +- **Property writability.** Writable properties round-trip via `updatePageProperties`; writing a computed field (`last_edited_time`) is rejected by Notion and leaves the field unaffected. +- **`last_edited_time` is not a sub-minute change signal.** It is minute-rounded and only advances on a real edit; two no-op pulls never differ on it. Change detection must use body/property hashes, not the timestamp. +- **Hosted-media signed URLs break naive idempotence.** A Notion-hosted image renders the raw signed S3 URL (`X-Amz-*`), which rotates every pull → raw body hash differs between two no-op pulls; external-URL media is stable.
URL-canonicalizing the body (strip signature, keep origin+path) makes the hash stable across pulls. Drives decisions 0006 and 0007. +- **`update_content` preserves untouched blocks — but only that.** A targeted block-level patch edited one paragraph while leaving an untouched lossy `child_database` intact. This is _preservation of untouched blocks only_; a follow-up adversarial review proved `update_content` **cannot move or delete** an opaque block from the rendered surface (its rendered token is absent from Notion's endpoint Markdown), and a multi-update batch silently partial-applies. Soundly editing _around_ opaque blocks would have required block-level reconciliation by id — which is why the streaming editor instead **refuses** pages containing opaque blocks (decision 0016) and pushes a representable body through a single guarded `replace_content`. Exception: `replace_content`/`update_content` on a hosted-media page is rejected by the post-push `semanticEquivalent` gate until media URLs are canonicalized there too (decision 0007). + +**Conclusion:** The streaming surface is sound on the existing primitives. The one cross-cutting requirement surfaced by live testing is **hosted-media URL canonicalization** at every hash/diff/gate point (including `semanticEquivalent`); it simultaneously fixes media idempotence and unblocks `update_content` on media pages. Decision 0006's idempotence rationale was corrected by this evidence (the volatile axis is the body URL, not metadata). + +Artifacts: `tmp/notion-vim/vrs-e2e-results.md`. + +## Ephemeral `edit` over the file engine + the silent-loss bug (live) + +**Hypothesis:** `edit` can be implemented as an ephemeral file-engine session +(pull into a `$TMPDIR` `.nmd`, edit, `syncPage`, cleanup), and the +classifier/pull gate refuses lossy pages uniformly so no surface corrupts them. 
+ +**Method:** Live Notion (API `2026-03-11`), scratch pages under the provided test +parent, all archived by tracked id at teardown (no parent sweep). Ran the real +`notion-md sync` CLI into a `mktemp -d` dir for each block type, edited an +_unrelated_ paragraph, pushed, and compared block-API ground truth. Report: +`tmp/notion-vim/option2-ephemeral-edit-e2e.md`. + +**Results:** + +- **Transport is sound.** Two-arg establish-pull into a temp dir creates + `page.nmd` + `.notion-md/` + sidecar; a body edit pushes cleanly; the + concurrent-remote-edit guard fires (`NmdConflictError`, conflict roughdraft + written, remote preserved). `edit` needs nothing new from the engine. +- **The fidelity classifier protects _none_ of the renderable-but-lossy blocks + today — editing an unrelated paragraph silently destroys the untouched block:** + - `table_of_contents` → re-created as a `[TOC]` paragraph (total loss), + - `synced_block` → plain paragraph (text survives, sync identity lost), + - `bookmark` → link paragraph (URL survives, block degraded), + - `child_database` → survived **only** because Notion's _server_ refused the + delete (`This operation would delete 1 child page(s) or database(s)`), not + because the notion-md guard fired. + + Push returned exit 0 with `unresolvedUnknownBlocks: []` and no placeholder — the + guard never fired, because these blocks classify `complete` (not API + `unsupported`). Block-API ground truth confirmed the block id was replaced by a + new paragraph id. + +- **Mechanism = the non-injective endpoint, on _any_ edit.** These blocks render + to body Markdown (`[TOC]`, `[embedded db]()`, plain text) that Notion's parser + re-creates as a **paragraph** on push. The loss is not specific to `--force`; it + fires on a normal targeted edit of an unrelated line. + +**Conclusion:** This is a **pre-existing file-`sync` data-loss defect**, not an +editor-only concern, and it confirms refuse-lossy is the right call. 
It sharpens +the R38 criterion: a block is lossy (→ refuse at the pull) iff **its body-Markdown +rendering does not reparse to the same block type** (round-trip-safety), not +merely "type is API `unsupported`". Until the classifier is extended to that +criterion and the pull gate refuses these pages, neither `edit` nor file `sync` +may ship over them. The earlier offline claim that the engine "refuses lossy at +the pull" holds only for API-`unsupported`/`unknown_block_ids`/truncation today; +the renderable-but-lossy class is exactly the R38 gap. + +Artifacts: `tmp/notion-vim/option2-ephemeral-edit-e2e.md`. + +## Block-Level Reconciliation Feasibility + +**Hypothesis:** the body can be pushed by reconciling a desired block tree against the live remote tree by id, using Notion's block REST API. + +**Method:** raw REST against the pinned `2026-03-11` API; scratch pages exercising positional insert, delete/update by id, recreate-move per block type, and granular edits around opaque blocks. Scripts under `tmp/notion-vim/`. + +**Results:** + +- **All four primitives work** on the shipped API. Positional insert uses `position:{type:'after_block',after_block:{id}}` (the `after` param is rejected on `2026-03-11` — renamed, already wrapped in `blocks.ts`); insert _above_ a `child_database` works. Delete-by-id, update-by-id (id retained), and editing one block between opaque blocks while leaving them untouched all succeed. +- **Recreate-move** round-trips paragraph/callout/toggle/column_list/synced_block (original and reference) losslessly — but requires recursively fetching `/children` (never inline), **stripping read-only `null` fields** before re-append, and paginating (>100 children) / respecting nesting depth. It mints a **new id** (breaking inbound references), and **`child_database` is impossible** (not an append-able type). 
+- **The binding constraint is markdown→block fidelity, not the API.** The existing `markdownToBlocks`/`parseInlineMarkdown` is lossy (drops code fences, quotes, to-dos, images, nesting, inline `code`/`[link]`/`~~strike~~`), so reconstructing edited content through it silently corrupts. A sound client reconciler would have needed a renderer-symmetric converter to avoid this. + +**Conclusion:** per-block-by-id reconciliation is _feasible_ on the shipped API but its soundness hinges on a client renderer-symmetric converter, and recreate-move is impossible for `child_database` and unsafe for inbound-referenced originals (no backlink endpoint). Weighed against that cost and those hard platform limits, the design **refuses** pages containing opaque blocks (decision 0016) rather than build the reconciler/converter/recreate-move edifice; this evidence is the rationale for that refusal. The representable-body push needs none of it — Notion's `replace_content` parses the edited Markdown server-side. + +Artifacts: `tmp/notion-vim/reconciler-feasibility.md`. + +## Stateless Schema-Drift Fingerprint + +**Hypothesis:** a schema fingerprint carried in the `--frontmatter` envelope can detect data-source schema drift statelessly. + +**Method:** live database/data-source mutations on `2026-03-11`; compute and compare fingerprints across benign and structural changes. Script under `tmp/notion-vim/`. + +**Results:** + +- **Stateless recovery holds:** `page.parent` carries `data_source_id`, so `put` recovers the exact data source; the schema lives on `GET /v1/data_sources/{id}` (the 2025-09-03+ split). +- **Hashable subset:** `{name, type, sorted option names}` sorted by name, options only for select/multi-select/status; hash _names_ not ids (rename is id-preserving); exclude ids, colors, descriptions, timestamps, `request_id`. Stable across identical reads and a benign color-only change; all five structural mutations produced distinct fingerprints. 
+- **Not redundant:** writing an unknown select-option _name_ returns **HTTP 200 and silently auto-creates the option**, corrupting the schema — the fingerprint is the only precise pre-write guard. +- `PROPERTY_WRITE_CLASSES` (`@overeng/notion-core`) matches live writable/computed behavior. + +**Conclusion:** the in-buffer schema fingerprint was sound and implementable, and adds real value over Notion's own (silent) handling — but the stateless fingerprint was later superseded by decision 0017 (drift is detected from the engine's base snapshot instead). The underlying value here (a precise pre-write schema-drift guard) carries over. + +Artifacts: `tmp/notion-vim/schema-fingerprint-verify.md`. diff --git a/packages/@overeng/notion-md/docs/vrs/glossary.md b/packages/@overeng/notion-md/docs/vrs/glossary.md new file mode 100644 index 000000000..8640ed546 --- /dev/null +++ b/packages/@overeng/notion-md/docs/vrs/glossary.md @@ -0,0 +1,106 @@ +# Notion Markdown Sync — Glossary + +Domain language for `@overeng/notion-md`, with emphasis on the editor surfaces +(`cat`/`put`/`edit`) added for editor-based two-way editing. + +## Language + +**Body pipe** (`cat` / `put`): +The stateless stdin/stdout body commands that move a page body between Notion and +an editor or pipe with **no** `.nmd` file and **no** `.notion-md/` directory. +Gateway-only. The genuinely stateless surface. Distinct from [[Editor session]] +(`edit`, engine-backed) and the file surfaces (`sync`, `status`, `plan`). +_Avoid_: streaming surface (it now spans the engine-backed `edit` too), pipe mode. + +**Default mode**: +The representation that is plain Notion enhanced Markdown plus the page title +rendered as a leading [[Title H1]]. The everyday human-editing shape. Available +on `cat`, `put`, `edit`. +_Avoid_: body mode, simple mode. + +**Frontmatter mode**: +The representation selected by `--frontmatter` that emits/accepts the full `.nmd` +envelope (strict JSON frontmatter with properties + body). 
Editable title and +writable properties; read-only properties pass through. Available on `cat` +(read) and `edit` (read/write); **not** on `put` (no stateless property write, +decision 0017). +_Avoid_: envelope mode, full mode, nmd mode. + +**Title H1**: +The leading `# <title>` line in default mode. A **presentation** of the page +title, never body content. On write it is extracted to the typed page-title API +and stripped from the body. See decision 0001. +_Avoid_: heading title, body title. + +**Presentation surface** vs **transport surface**: +Presentation is how a value is rendered for a human to edit (e.g. title as an +H1). Transport is the API/storage path a value actually syncs through (e.g. +title via the typed page-metadata API). A value may be presented one way and +transported another; R01/R03/R04 constrain transport, not presentation. + +**Base hash**: +The content hash of the title + body a `cat` caller read, used as the optimistic- +concurrency token for a [[Guarded put]]. A [[Body pipe]] concept only; `edit` +uses the engine's base _snapshot_ instead. Trips on a concurrent change to title +or body. +_Avoid_: body hash (the base hash spans title too), etag. + +**Guarded put**: +The default `put` behavior: re-read remote, compare against the caller's +[[Base hash]], and refuse (exit 7) if the remote moved. The pipe analogue of R11 +guarded push. + +**Force put**: +`put --force`: skip the [[Base hash]] guard (last-writer-wins). An explicit +destructive mode under R15; must report what it bypasses. + +**Editor session** (`edit`): +One `edit` invocation as an **ephemeral file-engine session**: `mktemp -d` under +`$TMPDIR` → `pullPage` into a temp `.nmd` + `.notion-md/` → `$VISUAL`/`$EDITOR` on +the body → `syncPage` push → scope-clean the temp tree. Sugar over the file +engine, not a separate push path; guarded by the engine's base snapshot. Not +live/continuous sync. See decisions 0003, 0017. 
+_Avoid_: streaming edit, gateway-only edit (it is engine-backed). + +**Opaque block** (unknown / not-losslessly-representable block): +A Notion block the body Markdown cannot fully represent — the API `unsupported` +type plus known-but-lossy blocks (`child_database`, `table_of_contents`, +`synced_block`, `child_page`, …). In code the unsupported-block snapshot schema +is `NmdUnsupportedBlockUnit` and the frontmatter field is `unsupported_blocks` +(there is no `NmdnUnit` / `n_blocks`). A page whose body contains one triggers a +[[Lossy-page refusal]] on every surface (`cat`/`put`/`edit`/`sync`). See +decisions 0016, 0017. +_Avoid_: unsupported block (the API's `unsupported` type is one source of these, +not the whole class); `n` block. + +**Lossy-page refusal**: +The **uniform** behavior — across `cat`/`put`/`edit` and the file-based `sync` — +when a page's body contains an [[Opaque block]]: refuse the page (exit 3) at the +**pull** with a message naming the block and pointing to the Notion UI, rather +than risk silently dropping or corrupting content. Enforced by the shared +classifier gate (`assertRemoteMarkdownComplete`), not a streaming-only behavior. +See decisions 0016, 0017. +_Avoid_: lossy error, unsupported error, streaming refusal. + +**Hosted media** vs **external media**: +Hosted media is an image/file/video/pdf block whose bytes Notion stores +(`type: "file"`), served via an **expiring signed URL** that rotates every pull. +External media references a stable third-party URL. Only hosted media needs +[[URL canonicalization]]. + +**URL canonicalization**: +Stripping the volatile signature/expiry query params (`X-Amz-*`, `Expires`) from +a [[Hosted media]] URL, keeping origin + path, wherever a body is hashed, +diffed, base-tracked, or gated. Makes media bodies idempotent and pushable. See +decision 0007. +_Avoid_: url stripping, url normalization (it is specifically the signed-param +strip). 
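The signed-parameter strip described under URL canonicalization can be sketched as follows. This is a minimal illustration, not the package's implementation: the function name `canonicalizeHostedMediaUrl`, the `VOLATILE_PARAM` pattern, and the choice to detect hosted media by the presence of a signature-style query key are all assumptions made for the example.

```typescript
// Hypothetical helper for illustration; not the package's actual API.
// Query keys treated as volatile: any `X-Amz-*` key, plus `Expires` and
// `Signature` (the exact key list is an assumption).
const VOLATILE_PARAM = /^(x-amz-|expires$|signature$)/i;

function canonicalizeHostedMediaUrl(raw: string): string {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    // Not an absolute URL: leave it untouched.
    return raw;
  }

  // Collect the query keys without relying on iterator support.
  const keys: string[] = [];
  url.searchParams.forEach((_value, key) => {
    keys.push(key);
  });

  const signed = keys.some((key) => VOLATILE_PARAM.test(key));

  // Signed (hosted) URLs keep only origin + path; unsigned (external)
  // URLs are assumed stable and pass through byte-for-byte.
  return signed ? url.origin + url.pathname : raw;
}
```

Under these assumptions, two pulls of the same hosted image that differ only in their rotated signature map to the same string, so a body hash computed after this step would not change between no-op pulls, while an external URL such as `https://example.com/a.png?w=100` is returned as-is.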
+ +**Guarded body replace**: +The **`cat`/`put`** push engine: write the edited body through +`replaceRemoteBodyVerified` (Notion's `replace_content`, guarded by the +[[Base hash]]) then the title through the typed page API — two writes, body +first. Not block-level reconciliation; lossy pages are refused first +([[Lossy-page refusal]]). `edit` does not use this — it reuses the file engine's +guarded push. See decisions 0012, 0016, 0017. +_Avoid_: reconciliation, block patch. diff --git a/packages/@overeng/notion-md/docs/vrs/impl-delta.md b/packages/@overeng/notion-md/docs/vrs/impl-delta.md new file mode 100644 index 000000000..2346c4279 --- /dev/null +++ b/packages/@overeng/notion-md/docs/vrs/impl-delta.md @@ -0,0 +1,146 @@ +# Implementation Delta + +What the spec (the full long-term target) describes vs. what +`@overeng/notion-md` and the `notion` umbrella currently implement. The design is +**refuse-lossy, one engine** (decisions 0016, 0017): the editor serves the +representable-Markdown majority and refuses pages with opaque blocks uniformly; +`cat`/`put` are stateless body pipes and `edit` is sugar over the existing file +`sync` engine. The work groups below are dependency-ordered. + +## Already in place (reuse, don't rebuild) + +- **File `sync` engine** — `pullPage` / `syncPage` / `statusPage` (`sync.ts`) are + fully location-relative (state paths derive from the `.nmd` path arg, no + `process.cwd()`), already exercised in `mkdtemp` dirs by the live suite. `edit` + reuses this wholesale — pull to `$TMPDIR`, splice, `syncPage`, cleanup. +- **Body facade** — `observeRemoteBody` / `replaceRemoteBodyVerified` are + gateway-only and need no file/store. This is the whole `cat`/`put` push engine — + no block-level reconciliation. +- **Page id / URL resolution** — `parseNotionUuid` (`@overeng/notion-core`) + accepts raw ids, dashed ids, and full Notion URLs. 
+- **Frontmatter envelope** — `renderNmdFile` / `parseNmdFile` round-trip the + `.nmd` envelope purely (the body-only splice for `edit`). +- **Unsupported-block accounting** — `NmdUnsupportedBlockUnit` / + `unsupported_blocks` (no `NmdnUnit` / `n_blocks`); the body-fidelity classifier + flags only `unsupported` (`body-fidelity.ts:45`) — **the latent bug Group C + fixes**; the gateway has `updateMarkdown`, `updatePageMetadata`, + `updatePageProperties`. + +## Group A — editor surfaces `cat` / `put` / `edit` + +Spec: [01-editor](./01-editor/spec.md) "Editor Surfaces". Requirements: R32–R35, R37, R39. + +- [ ] `cat <page> [--frontmatter]` — default `# <title>` + body; base hash to + stderr (decision 0002); reuse `observeRemoteBody`; `--frontmatter` is a + read-only envelope dump; **refuse a lossy page (exit 3) at read time** (Group C). +- [ ] `put <page> (--base-hash <h> | --force)` — body + title only (no + `--frontmatter` write, decision 0017); title H1 → typed title API + stripped + from body (decision 0001); guarded by default; `--force` concurrency-only + (decision 0009); **two writes, body (`replaceRemoteBodyVerified`) first, + title last, partial-failure reported** (decision 0012, exit 10). +- [ ] `edit <page> [--frontmatter]` — **thin wrapper over the engine** (decision + 0017): `mktemp -d` under `$TMPDIR` → `pullPage` → body-only splice → `$EDITOR` + → reattach → `syncPage` (force full `replace_content`) → relocate any + `.conflict.roughdraft.md` out of `$TMPDIR` → scope-clean. No bespoke push + path, no base-hash threading, no partial-write model. +- [ ] `edit <page> --read-only` (R46) — pull + present in `$EDITOR`, **never push** + (discard edits, clean up, stderr note); composes with `--frontmatter`, rejects + `--read-only --force`. May use the lighter `observeRemoteEditorPage` read. 
+- [ ] Shared `<page>` resolution (`parseNotionUuid`) and the title↔H1 splice + helper (used by `cat`/`put` and `edit`); fail-loud on missing title H1; + exact untitled/empty-body bytes (spec edge behavior). + +Prototype (validated, not production): `tmp/notion-vim/` — `pagemd-live.ts`, +`notion-md-edit.sh`. + +## Group B — hosted-media URL canonicalization + +Spec: [04-fidelity](./04-fidelity/spec.md) "Hosted-Media References". Decision 0007. Requirement: R36. Shared by both +surfaces. Live testing (experiments.md) showed media-bearing pages are otherwise +non-idempotent and their pushes are rejected by the post-push gate. + +- [ ] Canonicalize hosted-media URLs (strip `X-Amz-*`/signature/`Expires`, keep + origin+path) everywhere a body is hashed/diffed/base-tracked. +- [ ] Apply the same canonicalization **inside** `semanticEquivalent` / + `canonicalizeBlockMarkdown` (`canonical-markdown.ts`) — currently + whitespace-only (`:95`), so any media-page push is rejected today. +- [ ] Leave external (stable) URLs untouched. + +## Group C — sound fidelity classification (the shared refusal gate) + +Spec: [04-fidelity](./04-fidelity/spec.md) "Refusing Lossy Pages (uniform)". Decisions 0016, 0017. Requirement: R38. +**Blocking prerequisite** — and a correctness fix for the existing file path, not +just streaming. + +- [ ] Extend the classifier beyond `unsupported` to flag every + not-losslessly-round-trippable block (`child_database`, `table_of_contents`, + `synced_block`, `child_page`, …). Today these classify `complete` with empty + `unknown_block_ids`, so a `replace_content` (file `sync` or `edit`) silently + destroys them (`body-fidelity.ts:45`, `assertRemoteMarkdownComplete` + `sync.ts:567`). +- [ ] Because the gate is at the pull (`assertRemoteMarkdownComplete`), the + refusal then covers `cat`/`put`/`edit`/`sync` uniformly with the same code — + exit 3, message naming the block class, pointing to the Notion UI. 
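As a rough sketch of the Group C gate, assuming a flat list of block summaries: the type `BlockSummary`, the set `LOSSY_BLOCK_TYPES`, the class `LossyPageRefusal`, and the function `assertPageRepresentable` are hypothetical names for illustration only. The real gate is `assertRemoteMarkdownComplete`, whose signature is not shown in this document, and the real list of lossy types is open-ended.

```typescript
// Hypothetical shapes for illustration; not the package's real types.
type BlockSummary = { readonly id: string; readonly type: string };

// Block types this document names as not losslessly round-trippable.
// The source list ends in "…", so this set is illustrative, not complete.
const LOSSY_BLOCK_TYPES: ReadonlySet<string> = new Set([
  "unsupported",
  "child_database",
  "table_of_contents",
  "synced_block",
  "child_page",
]);

// Carries the offending blocks and the exit code the document assigns
// to a lossy-page refusal (3).
class LossyPageRefusal extends Error {
  readonly exitCode = 3;
  readonly blocks: ReadonlyArray<BlockSummary>;

  constructor(blocks: ReadonlyArray<BlockSummary>) {
    const names = blocks.map((b) => `${b.type} (${b.id})`).join(", ");
    super(`Page contains blocks Markdown cannot represent: ${names}. Edit them in the Notion UI.`);
    this.name = "LossyPageRefusal";
    this.blocks = blocks;
  }
}

function findLossyBlocks(blocks: ReadonlyArray<BlockSummary>): BlockSummary[] {
  return blocks.filter((b) => LOSSY_BLOCK_TYPES.has(b.type));
}

// Refuse at the pull: throw before any surface can read or push the body.
function assertPageRepresentable(blocks: ReadonlyArray<BlockSummary>): void {
  const lossy = findLossyBlocks(blocks);
  if (lossy.length > 0) {
    throw new LossyPageRefusal(lossy);
  }
}
```

A type allow/deny list is the simplest reading of the checklist item. The experiments section states the sharper criterion (a block is lossy when its Markdown rendering does not reparse to the same block type), which a production classifier might test directly instead of enumerating types.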
+ +## Group F — `--frontmatter` schema-drift, via the engine (not a fingerprint) + +Spec: [06-data-source](./06-data-source/spec.md) "Data-Source Binding and Schema Drift" (+ [01-editor](./01-editor/spec.md) "Guard plumbing"). Decision 0017 (supersedes 0013). Requirement R14. The +stateless in-buffer fingerprint is **deleted**; `edit --frontmatter` detects drift +from a base snapshot, the same way the engine detects body conflict. + +- [ ] Capture the writable data-source schema into the engine sidecar as a + `schema_snapshot` (an already-designed object role) at `pullPage`, and + compare it at `syncPage` push, refusing a property write on drift (R14). + This is a small file-engine addition, not a parallel streaming subsystem. + +## Group G — error model + observability + tests + +- [ ] Tagged errors → exit codes (spec table): gateway failure (1), **lossy-page + refusal (3)**, schema drift (6, `edit --frontmatter`/`sync`, engine + `schema_snapshot`), conflict (7), editor abort (8), post-push gate (9), + partial write (10, `put` only). No exit 11. +- [ ] OTEL: `notion-md.cat|put` spans (mode, result, page id, `body_written`/ + `title_written`); `notion-md.edit` wraps the engine's `sync-page`/`push-page`/ + `status-page` spans (R21–R24; no tokens/bodies/signed URLs). +- [ ] Unit (title↔H1 split, base-hash, lossy-classifier verdicts), integration + (fake gateway incl. refusal path), live E2E (round-trip, conflict, media, + lossy-page refusal, ephemeral `edit` over the engine) — R25–R29. + +## Group H — write-path sync-progress indicator + +Spec: [01-editor](./01-editor/spec.md) "Sync Progress Indicator (write path)". +Decision 0018. Requirements: R43–R45. Designed, not yet implemented. 
+ +- [ ] `ProgressReporter` Effect service (`Context.Tag`) with a **no-op default + Layer** — the engine emits purpose-tagged stage events to it (observe → + write-body → write-title → settle); non-interactive contexts (tests, fake/live + E2E, non-TTY) pay zero rendering cost and see no behavior change. +- [ ] CLI `TaskList`-backed Layer on the **write path only** (`edit`/`put`/file + `sync`), rendered through the TUI seam to **stderr, gated on + `process.stderr.isTTY`** so `cat`'s stdout stays pure and `… | put > file` + degrades to static. `cat` excluded. +- [ ] Construct the TUI app **lazily inside the command handler** (memoized + accessor), never at module top level — a top-level `createTuiApp` re-enters the + #787 concurrent-module-load TDZ. +- [ ] Map `put`'s two-write order (decision 0012) to two rows so a partial write + (exit 10) is visibly the title row failing after the body row. +- [ ] Complementary perf lever: collapse the redundant 4 pulls (#788) — fewer + stages, same staged UI. + +## notion-cli (umbrella) + +Decision 0004. Spec: [01-editor](./01-editor/spec.md) "Umbrella surface". Requirements R17–R18. + +- [ ] `notion md cat|put|edit` via existing dispatch — verify. +- [ ] Promote top-level alias `notion edit <page>`. +- [ ] Update notion-cli docs (already reflect the surface; keep in sync). + +## Dependency order + +C (the shared refusal gate) is the blocking prerequisite — it gates the pull on +every surface and fixes the latent file-path bug. B (media) makes representable +bodies idempotent. A is the surface: `cat`/`put` over the body facade, `edit` a +thin wrapper over the `sync` engine. F is a small engine addition for +`--frontmatter` drift. G spans everything. H (the staged sync-progress indicator) +is independent UI polish on top of A's write path. The reconciler/converter groups +and the stateless schema-fingerprint group are gone (decisions 0016, 0017). 
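The Group G mapping from tagged errors to exit codes could look roughly like the sketch below. The numeric codes are the ones listed under Group G; the tag names (`GatewayFailure`, `LossyPageRefusal`, and so on) and the `exitCodeFor` helper are hypothetical and do not claim to match the package's error classes.

```typescript
// Hypothetical tag names; only the numeric codes come from the Group G list.
type EditorFailureTag =
  | "GatewayFailure"
  | "LossyPageRefusal"
  | "SchemaDrift"
  | "Conflict"
  | "EditorAbort"
  | "PostPushGate"
  | "PartialWrite";

const EXIT_CODES: Readonly<Record<EditorFailureTag, number>> = {
  GatewayFailure: 1,
  LossyPageRefusal: 3,
  SchemaDrift: 6, // `edit --frontmatter` / `sync` only
  Conflict: 7,
  EditorAbort: 8,
  PostPushGate: 9,
  PartialWrite: 10, // `put` only
};

// Resolve a tagged failure to its process exit code. The Record type makes
// the table exhaustive: adding a tag without a code is a compile error.
function exitCodeFor(failure: { readonly _tag: EditorFailureTag }): number {
  return EXIT_CODES[failure._tag];
}
```

Keeping the table in one typed record is one way to make the "No exit 11" rule checkable: a unit test can assert that the set of values is exactly the documented one.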
diff --git a/packages/@overeng/notion-md/docs/vrs/requirements.md b/packages/@overeng/notion-md/docs/vrs/requirements.md index b9a37d445..726b2c851 100644 --- a/packages/@overeng/notion-md/docs/vrs/requirements.md +++ b/packages/@overeng/notion-md/docs/vrs/requirements.md @@ -1,8 +1,17 @@ # Notion Markdown Sync Requirements -## Context - -These requirements serve [vision.md](./vision.md). They define the production constraints for a Notion <> Markdown sync tool built on Notion enhanced Markdown and local versioned state. +These requirements serve [vision.md](./vision.md). They define the production +constraints for a Notion <> Markdown sync tool built on Notion enhanced Markdown +and local versioned state. Terms are defined in [glossary.md](./glossary.md); +rationale for the hard-to-reverse choices lives in [.decisions/](./.decisions/) +and is cross-referenced by ID. + +The per-subsystem requirements (preserving the GLOBAL R-IDs) live under the +numeric subsystem dirs — see [spec.md](./spec.md) for the architecture index and +the map of which subsystem owns which requirement. The global Assumptions +(`A01…`) and Tradeoffs (`T01…`) below, plus the cross-cutting surface-boundary, +Effect-native, observability, and verification requirements, are inherited +downward by every subsystem. ## Assumptions @@ -19,52 +28,65 @@ These requirements serve [vision.md](./vision.md). They define the production co - **T03 Conservative push defaults:** The tool may block pushes that are probably safe if it cannot prove they preserve remote and out-of-band state. - **T04 Eventual watch refresh:** Watch mode may use polling or webhooks as triggers, but push correctness must still come from fresh pre-push reads. - **T05 Partial feature support:** Features without proven E2E fidelity may be preserved as unsupported blocks instead of being editable as first-class Markdown. 
+- **T06 Refuse rather than reconcile lossy pages:** The tool refuses a page whose body contains a not-losslessly-representable block (`child_database`, `synced_block`, table of contents, child page, …) instead of editing it — uniformly across the editor verbs and the file-based `sync` (decision [0017](./.decisions/0017-edit-is-an-ephemeral-file-engine-session.md)) — because Notion's platform bars a sound edit of such blocks (no backlink endpoint, `child_database` uncreatable via the block API, non-injective Markdown endpoint). Losing the ability to edit those pages as Markdown is accepted in exchange for a small, correct, plugin-free design; such blocks are edited in the Notion UI. See decisions [0016](./.decisions/0016-refuse-lossy-pages.md), [0017](./.decisions/0017-edit-is-an-ephemeral-file-engine-session.md). (Owned by [04-fidelity](./04-fidelity/requirements.md).) +- **T07 Editor session, not live sync:** `edit` is a discrete pull-edit-push session over an ephemeral `$TMPDIR` `.nmd` + `.notion-md/` tree (decision [0017](./.decisions/0017-edit-is-an-ephemeral-file-engine-session.md)), not character-level live sync and not a zero-file in-memory buffer. `edit` is therefore not strictly stateless (only `cat`/`put` are); statelessness is preserved where it is intrinsic — the pipes — and traded for engine reuse in `edit`. The simpler, plugin-free, one-engine model is accepted in exchange. (Owned by [01-editor](./01-editor/requirements.md).) +- **T08 No stateless property write:** Structured property editing is available through `edit --frontmatter` (interactive) and the file-based `sync` (scripted), but not as a stateless pipe (`put --frontmatter`). A safe property write needs schema-drift detection, which needs a base snapshot; rather than carry a parallel stateless schema-fingerprint subsystem, that one niche (non-interactive property writes with no temp dir) is dropped in favor of `sync`. 
See decision [0017](./.decisions/0017-edit-is-an-ephemeral-file-engine-session.md). (Owned by [01-editor](./01-editor/requirements.md) / [06-data-source](./06-data-source/requirements.md).) ## Requirements -### Must Preserve Surface Boundaries +The data-loss, surface-boundary, durable-state, editor, and fidelity requirements +are distributed across the subsystem dirs, each keeping its GLOBAL ID (see the +[spec.md](./spec.md) index). The cross-cutting surface-boundary, Effect-native, +observability, and verification requirements below stay at root and are inherited +by every subsystem. + +### Must Preserve Surface Boundaries (cross-cutting) - **R01 Body boundary:** The body sent to Notion must be stock Notion enhanced Markdown with all local metadata stripped. - **R02 Multi-surface model:** Body, page metadata, properties, data-source schema, comments, files, unsupported blocks, and review state must be represented as distinct sync surfaces. - **R03 Frontmatter boundary:** Local frontmatter must never be interpreted as Notion-native metadata. -- **R04 Property boundary:** Page and row properties must sync through typed page/data-source APIs, not through body Markdown. - **R05 Comment boundary:** Notion comments must sync through the comments API or local review metadata, not through the body hash. -### Must Maintain Durable Local State - -- **R06 Versioned state:** Local sync state must use explicit schema versions and reject unknown fields unless an extension models them. -- **R07 Content addressing:** Large or immutable artifacts must be stored by content hash rather than by transient Notion retrieval URL. -- **R08 Stable references:** Object-store refs must use relative paths plus content addresses that survive repository moves. -- **R09 Base snapshots:** The local state store must preserve last-clean bases needed for guarded push and three-way merge. -- **R10 Volatile URL exclusion:** Expiring Notion file URLs must not be durable local identifiers. 
+(R04 Property boundary → [06-data-source](./06-data-source/requirements.md).) -### Must Prevent Data Loss +### Must Be Effect-Native (cross-cutting) -- **R11 Guarded push:** Default push must re-read remote state and refuse last-writer-wins overwrites when the stored base is stale. -- **R12 Unknown preservation:** Push must refuse to drop unsupported blocks, unknown placeholders, child pages, child databases, or synced block identity unless the user chooses an explicit destructive mode. -- **R13 Review safety:** Unresolved local review/suggestion markup must not be sent to Notion body content by default. -- **R14 Schema drift safety:** Property writes must refuse or require explicit acceptance when the data-source schema has changed since the last clean pull. -- **R15 Force clarity:** Destructive modes must be separate from normal push and report exactly which protections they bypass. - -### Must Be Effect-Native - -- **R16 Typed services:** Notion API access, local state, merge, file cache, comments, watch, and telemetry must be modeled as Effect services with explicit dependencies. +- **R16 Typed services:** Notion API access, local state, merge, file cache, comments, watch, telemetry, and progress reporting must be modeled as Effect services with explicit dependencies. - **R17 Schema validation:** Every untrusted boundary must decode through Effect Schema: CLI options, frontmatter, object-store payloads, Notion responses, and webhook payloads. - **R18 Typed errors:** Expected failures must use tagged errors with actionable context; unexpected defects must remain defects. - **R19 Scoped lifecycle:** Long-lived resources such as watchers, pollers, webhooks, caches, and HTTP clients must be scoped and interruptible. -- **R20 Bounded concurrency:** Watch mode must serialize or intentionally coordinate sync passes so local writes, remote writes, and state-store updates cannot overlap unsafely. 
-### Must Be Observable +(R20 Bounded concurrency → [02-file-sync](./02-file-sync/requirements.md).) + +### Must Be Observable (cross-cutting) - **R21 Service identity:** CLI, watch/daemon, and webhook receiver processes must use distinct OpenTelemetry service names. - **R22 Span coverage:** Every command, watch pass, Notion API request, local state transaction, merge decision, file upload, and destructive decision must emit a meaningful span. - **R23 Queryable attributes:** Spans must include concise `span.label` plus page, file, surface, operation, result, and Notion request identifiers when available. - **R24 Safe telemetry:** Trace attributes must not include tokens, full document bodies, private file contents, or expiring signed URLs. -### Must Be Verifiable +### Must Be Verifiable (cross-cutting) - **R25 Unit coverage:** Pure parsing, canonicalization, hashing, object-store validation, merge, and storage classification behavior must have deterministic unit tests. - **R26 Integration coverage:** Effect service boundaries must have integration tests with fake Notion and fake local state services. - **R27 Notion E2E coverage:** Supported Notion body features and destructive-guard behavior must be verified against real temporary Notion pages with cleanup verification. -- **R28 Watch coverage:** Watch mode must be tested for debounce, coalescing, cancellation, overlapping events, remote polling, and shutdown. - **R29 Trace coverage:** E2E or integration tests must assert the presence of required spans and key non-secret attributes. + +(R28 Watch coverage → [02-file-sync](./02-file-sync/requirements.md).) + +### Distributed to subsystems + +Each global ID lands in exactly one subsystem; cross-references resolve to the +owning subsystem. 
+ +| Subsystem | Requirements | +| -------------------------------------------------- | ------------------------------------------- | +| [01-editor](./01-editor/requirements.md) | R32, R33, R34, R35, R37, R39, R43, R44, R45 | +| [02-file-sync](./02-file-sync/requirements.md) | R20, R28 | +| [03-sync-engine](./03-sync-engine/requirements.md) | R09, R11, R13, R15 | +| [04-fidelity](./04-fidelity/requirements.md) | R12, R30, R31, R36, R38, R40, R41 | +| [05-local-state](./05-local-state/requirements.md) | R06, R07, R08, R10 | +| [06-data-source](./06-data-source/requirements.md) | R04, R14 | + +R42 was removed (the stateless in-buffer schema fingerprint, decision 0017 +superseding 0013); it is not reintroduced. diff --git a/packages/@overeng/notion-md/docs/vrs/spec.md b/packages/@overeng/notion-md/docs/vrs/spec.md index c9e60b28d..8fb338e09 100644 --- a/packages/@overeng/notion-md/docs/vrs/spec.md +++ b/packages/@overeng/notion-md/docs/vrs/spec.md @@ -1,538 +1,155 @@ -# Notion Markdown Sync Spec - -This document specifies the Notion Markdown sync system. It builds on [requirements.md](./requirements.md). +# Spec: notion-md — architecture index + +This is the top-level architecture index for `@overeng/notion-md`. It builds on +[requirements.md](./requirements.md); terms are in [glossary.md](./glossary.md) +(inherited downward by every subsystem); the hard-to-reverse rationale is in +[.decisions/](./.decisions/), cited by relative path. The per-subsystem `spec.md` +files carry the detailed design; this page holds only Status, Scope, the system +shape + dependency diagram, the subsystem index, and the cross-cutting +OpenTelemetry, Verification, and residual long-term-decision lists. Evidence lives +in [experiments.md](./experiments.md); the spec-vs-implementation gap in +[impl-delta.md](./impl-delta.md). 
## Status -Draft -- the implemented `@overeng/notion-md` package covers the core body/property sync path, strict `.nmd` frontmatter, content-addressed local state, guarded push/sync/watch behavior, batch multi-file and recursive folder orchestration, Effect Platform file watching, and live Notion E2E coverage. File bytes, comment projection, and webhook delivery are designed surfaces that remain outside the implemented core. Full data-source sync is owned by the standalone [Notion datasource sync spec](../../../notion-datasource-sync/docs/vrs/spec.md). +Draft -- the implemented `@overeng/notion-md` package covers the core body/property +sync path, strict `.nmd` frontmatter, content-addressed local state, guarded +push/sync/watch behavior, batch multi-file and recursive folder orchestration, +Effect Platform file watching, and live Notion E2E coverage. The `$EDITOR`-based +editor surface (`cat`/`put`/`edit`), the uniform lossy-page refusal, hosted-media +canonicalization, and the schema-drift guard are designed and partly landed (see +[impl-delta.md](./impl-delta.md)). The staged write-path sync-progress indicator +([01-editor](./01-editor/spec.md#sync-progress-indicator-write-path)) is designed, +not yet implemented. File bytes, comment projection, and webhook delivery are +designed surfaces that remain outside the implemented core. Full data-source sync +is owned by the standalone [Notion datasource sync spec](../../../notion-datasource-sync/docs/vrs/spec.md). ## Scope -This spec defines: - -- the `.nmd` local file contract, -- the `.notion-md` content-addressed local state store, -- sync surfaces and guarded conflict policy, -- CLI, batch, and watch behavior, -- Effect service boundaries, -- OpenTelemetry conventions, -- verification expectations and known limitations. 
+Defines (across the subsystem specs): the `.nmd` local file contract and the +`.notion-md` content-addressed local state store; the `$EDITOR`-based editor +surface and the write-path progress indicator; sync surfaces and guarded conflict +policy; the shared fidelity classifier and uniform lossy-page refusal; CLI, batch, +and watch behavior; the typed property / page-metadata surface and schema-drift +guard; Effect service boundaries; OpenTelemetry conventions; verification +expectations and known limitations. -This spec does not define: +Does not define: - a generic Notion renderer, - a rich text editor, - a full offline Notion clone, -- a replacement syntax for Notion enhanced Markdown. +- a replacement syntax for Notion enhanced Markdown, +- full data-source schema/view sync (see the [Notion datasource sync spec](../../../notion-datasource-sync/docs/vrs/spec.md)). ## System Shape ``` -notion-md CLI +notion-md CLI notion edit <page> (umbrella alias) + | | + | pull/status/push/sync/watch/batch | cat/put/edit (editor surface) + v v +Batch/tree orchestrator ────────► Editor surfaces (01-editor) + | discovery, dup page-id, concurrency | cat/put: gateway-only body pipes + v | edit: ephemeral $TMPDIR file-engine session +Sync coordinator (02-file-sync) ◄───────────┘ | - | pull/status/push/sync/watch/batch v -Batch/tree orchestrator +Sync engine (03-sync-engine): guarded push · 3-way merge · update/replace selection · canonical base · post-push gate · settle · review guard | - |-- target discovery, duplicate page-id preflight, bounded concurrency - v -Sync coordinator + +── depends on ──► Fidelity (04-fidelity): classifier · uniform lossy refusal · media canonicalization + | + +── reads/writes ─► Local state (05-local-state): .nmd envelope · object store · base snapshots | - |-- Local .nmd file - |-- .notion-md/objects/sha256/<hash>.json - |-- Notion Markdown endpoint - |-- Notion page/property APIs - |-- Notion block API for unsupported blocks - |-- Future: comments, 
files, data-source schema, webhooks + +── projects ─────► Data source (06-data-source): writable props/metadata · schema_snapshot drift + | + +── Local .nmd file + +── .notion-md/objects/sha256/<hash>.json + +── Notion Markdown endpoint · page/property APIs · block API + +── Future: comments, files, data-source schema, webhooks ``` Requirement trace: R01-R05, R16-R24. -The system treats Notion enhanced Markdown as one sync surface, not the whole page. The body surface is stock Notion enhanced Markdown. Local metadata, page properties, unsupported block preservation, files, comments, and review state are modeled outside the body so they are never silently sent as Notion Markdown. +The system treats Notion enhanced Markdown as one sync surface, not the whole page. +The body surface is stock Notion enhanced Markdown. Local metadata, page +properties, unsupported block preservation, files, comments, and review state are +modeled outside the body so they are never silently sent as Notion Markdown. + +**Effect service boundaries** (R16-R20). The CLI program provides the command +tree, option schemas, and output renderers. The sync coordinator (depends on +`NotionGateway` + `NmdStateStore`) owns pull/status/push/sync decisions. The +`NotionGateway` (depends on `NotionConfig` + `HttpClient`) owns typed Notion API +calls and response adaptation. The `NmdStateStore` (depends on `FileSystem` + +`Path`) owns `.nmd` IO, object refs, object validation, and atomic local writes. +The merge planner is a pure module. The watch service owns the event queue, +debounce, polling, and scoped cancellation. The `ProgressReporter` service +(`Context.Tag`, no-op default Layer) carries write-path stage events +([01-editor](./01-editor/spec.md#sync-progress-indicator-write-path), R45). 
+Untrusted payloads decode through Effect Schema at the boundary; expected failures +use tagged errors with page/file/surface context; long-lived watch resources are +scoped and interruptible; pure planning logic stays outside services with focused +unit tests. The public body facade exposes body-only observe, local read, materialize, verified remote replace, and clean-base settlement operations for adapters that -compose with `.nmd` files. The facade depends on `NotionMdGateway` and -`NmdStateStore`; it does not expose sync coordinator decisions or page metadata -mutation as an adapter surface. - -Remote body observations carry `@overeng/notion-core` body-completeness -evidence produced by `@overeng/notion-effect-client` live observation. -`notion-md` is the package that turns that evidence into clean-base policy: -single-page establishment, tree materialization, clean-base refresh, and the -body facade must refuse to treat a lossy Markdown observation as a clean `.nmd` -base. - -Batch and folder support do not change the ownership unit: one `.nmd` file maps -to one Notion page, and every mutation still passes through the same page-local -guards. The batch layer only owns target discovery, duplicate page-id preflight, -bounded concurrency, per-file result reporting, and multi-file watch scheduling. - -## Local Format - -``` -doc.nmd - frontmatter: strict local sync envelope - body: stock Notion enhanced Markdown - -.notion-md/ - objects/sha256/<2>/<62>.json - sync/<page-id>.json -``` - -Requirement trace: R06-R10. - -### `.nmd` Envelope - -The `.nmd` file is a versioned local wrapper around a Notion enhanced Markdown body. 
-Version 2 keeps human-editable state in the file and moves derived sync -bookkeeping into a page-id keyed sidecar: - -```markdown ---- -{ - 'notion_md': - { - 'version': 2, - 'api_version': '2026-03-11', - 'object': 'page', - 'page_id': '00000000-0000-4000-8000-000000000001', - 'parent': { '_tag': 'page', 'id': '00000000-0000-4000-8000-000000000000' }, - 'page': - { - 'title': 'Page title', - 'icon': null, - 'cover': null, - 'in_trash': false, - 'is_locked': false, - }, - 'properties': {}, - }, -} ---- - -Enhanced Markdown body starts here. -``` - -Rules: - -| Rule | Specification | -| ------------------- | -------------------------------------------------------------------------------------- | -| Body boundary | Only bytes after frontmatter are sent to Notion Markdown endpoints. | -| Strict schema | Unknown frontmatter keys are errors. | -| Body hash | Hash canonical stripped body bytes, never frontmatter. | -| API version | `api_version` records the Notion API version used for the last clean pull. | -| Local version | `notion_md.version` is the local human-editable envelope version. | -| Sync sidecar | Derived state lives in `.notion-md/sync/{page_id}.json`, keyed by immutable page id. | -| Visible frontmatter | A page whose visible body starts with `---` must escape or precede that text. | -| Review markup | Roughdraft markers are local review state unless an explicit push mode says otherwise. | - -Local experiments confirmed that frontmatter sent through the Markdown endpoint becomes literal body content. Push must strip it. - -### Frontmatter Schema - -The Effect Schema in `@overeng/notion-effect-client` is the source of truth. 
The -current local shape is split between human-editable V2 frontmatter and -machine-managed V1 sync state: - -```ts -type NmdFrontmatterV2 = { - readonly notion_md: { - readonly version: 2 - readonly api_version: '2026-03-11' - readonly object: 'page' - readonly page_id: NotionId - readonly url?: string - readonly parent: ParentRef - readonly page: PageState - readonly properties: Record<string, WritablePropertyValue> - } -} - -type NmdSyncStateV1 = { - readonly version: 1 - readonly page_id: NotionId - readonly body: BodyState - readonly storage: SelfContainedStorage | ObjectStoreStorage - readonly read_only_properties: Record<string, ReadOnlyPropertyValue> - readonly data_source: DataSourceBinding | null -} -``` - -Schemas use tagged unions for polymorphic values, branded strings for Notion IDs and hashes, and exact decoding with excess-property rejection. - -### Writable Property Values - -Property frontmatter is human-editable only for modeled writable forms. Unknown or generated properties remain visible as read-only values. 
- -| Notion property type | Local form | Push encoding | -| -------------------- | -------------------------- | ----------------------------- | -| `title` | string | rich-text title from string | -| `rich_text` | string or null | rich text from string | -| `number` | number or null | number | -| `select` | option name or null | select by name | -| `multi_select` | option names | multi-select by names | -| `status` | option name or null | status by name | -| `date` | Notion date object or null | date object | -| `people` | user IDs | people IDs | -| `checkbox` | boolean | checkbox | -| `url` | string or null | url | -| `email` | string or null | email | -| `phone_number` | string or null | phone number | -| `relation` | page IDs | relation IDs | -| `files` | file refs | future file-upload resolution | -| `place` | place object or null | place object | -| `verification` | verification state object | verification object | -| generated properties | read-only wrapper | not pushed | - -Property IDs must be preserved when available. Display names are for readability; IDs win on rename or schema drift. - -### Writable Page Metadata - -The page metadata surface covers page state that is not part of the Markdown -body and is not a data-source property. - -| Field | Local form | Push encoding | -| ----------- | --------------------------------------- | ------------------- | -| `title` | string | page title property | -| `icon` | null, emoji, native icon, external file | page `icon` | -| `cover` | null, external or Notion-hosted file | external/null cover | -| `in_trash` | boolean | page `in_trash` | -| `is_locked` | boolean | page `is_locked` | - -Strict frontmatter accepts the read shapes Notion can return. The write planner -only emits page metadata patches for shapes Notion's page update API accepts: -page titles, null/external covers, null/emoji/native/external icons, -`in_trash`, and `is_locked`. 
Notion-hosted file URLs and custom emojis are -preserved as pulled state until their write behavior is verified. +compose with `.nmd` files (the `cat`/`put` engine). The facade depends on +`NotionMdGateway` and `NmdStateStore`; it does not expose sync coordinator +decisions or page-metadata mutation as an adapter surface. Remote body +observations carry `@overeng/notion-core` body-completeness evidence produced by +`@overeng/notion-effect-client` live observation; `notion-md` turns that evidence +into clean-base policy and refuses to treat a lossy observation as a clean `.nmd` +base ([04-fidelity](./04-fidelity/spec.md)). -## Object Store - -Requirement trace: R07-R10, R16. - -Objects are immutable JSON payloads addressed by exact stored bytes: - -``` -.notion-md/objects/sha256/ab/cdef....json -``` - -| Role | Payload | Required validation | -| ----------------- | ------------------------------- | ------------------------------------------------------- | -| `base_snapshot` | last clean body snapshot | page id, body hash, object hash, schema version | -| `storage_payload` | overflow storage payload | page id, inventory equality with frontmatter, hash | -| `file_payload` | future file bytes or metadata | content hash, media type, local path or upload identity | -| `comment_payload` | future comment bridge state | comment IDs, discussion IDs, anchor metadata | -| `schema_snapshot` | future data-source schema state | schema hash, property IDs, data-source id | - -Write order is object first, `.nmd` last. A failed `.nmd` write may leave orphan objects; a future `store gc` removes unreachable objects. Object paths in frontmatter are logical POSIX-style paths; the state store normalizes both expected and stored paths through the platform `Path` service before reading. 
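As a rough illustration of the content-addressed layout described above (`objects/sha256/<2>/<62>.json`, addressed by exact stored bytes), the sketch below derives an object path from a payload. The names `objectPathFor` and `matchesObjectPath`, and the use of Node's `node:crypto`, are assumptions made for this example; they are not the package's actual state-store API.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper (not part of @overeng/notion-md): derive the logical
// POSIX-style object path for a payload from the SHA-256 of its exact bytes.
function objectPathFor(storedBytes: Uint8Array | string): string {
  const hex = createHash("sha256").update(storedBytes).digest("hex"); // 64 hex chars
  const shard = hex.slice(0, 2); // first 2 chars become the shard directory
  const rest = hex.slice(2); // remaining 62 chars become the file name
  return `.notion-md/objects/sha256/${shard}/${rest}.json`;
}

// Hypothetical read-side check: the bytes must hash back to the path they
// were stored under, otherwise the object is treated as corrupt.
function matchesObjectPath(storedBytes: Uint8Array | string, path: string): boolean {
  return objectPathFor(storedBytes) === path;
}
```

Because the path is a pure function of the bytes, any change to the payload, even whitespace, yields a different address; this is what lets the write order "object first, `.nmd` last" leave at worst an unreachable orphan rather than a corrupted reference.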
- -Storage policy: - -| Case | Storage form | -| ------------------------------------------- | ---------------------------------------- | -| Small stable unsupported/file/comment units | inline `storage._tag = "self_contained"` | -| Large storage payload | `storage._tag = "object_store"` | -| Volatile signed Notion URLs | `object_store` | -| File bytes | future content-addressed file payload | -| Raw unsanitized API snapshots | object store only | - -The implementation currently supports self-contained storage and content-addressed `storage_payload` objects. It rejects legacy sidecar-shaped frontmatter instead of migrating it. - -## Sync Surfaces +### Sync surfaces map Requirement trace: R01-R05, R11-R15. -| Surface | Local state | Pull API | Push API | Conflict unit | Current status | -| ------------------ | ------------------------------ | ------------------------------------- | --------------------------- | ------------------ | --------------------------- | -| Body | `.nmd` body + `base_snapshot` | block-tree render + endpoint evidence | Markdown update endpoint | canonical Markdown | implemented | -| Page metadata | frontmatter page fields | `GET /pages/{id}` | `PATCH /pages/{id}` | field | title/lock/trash/icon/cover | -| Properties | frontmatter property map | `GET /pages/{id}` | `PATCH /pages/{id}` | property | modeled writable forms | -| Unsupported blocks | frontmatter/object storage | Markdown + block API | preserve or explicit delete | block id | guard + preserve metadata | -| Data-source schema | external datasource-sync state | datasource-sync package | datasource-sync package | schema hash | owned by datasource sync | -| Comments | future comment payload | comments API | comments API | discussion/comment | designed, not implemented | -| Files | future file payload | block/file APIs | file upload APIs | content hash | modeled, not implemented | -| Review | Roughdraft local markup | local only or comments API | explicit bridge only | review id | guard 
implemented | - -Body conflicts do not block property-only pushes. Property-only pushes across a concurrent remote body edit patch properties, then refresh the local `.nmd` body and base from the current remote state. - -## Pull Flow - -1. Decode CLI options. -2. Retrieve Notion page metadata. -3. Observe the remote body through the Notion body observation service. -4. Reject clean-base adoption if the observation is lossy. -5. Adopt the block-tree-rendered Markdown as the local body and base snapshot; - keep endpoint Markdown only as diagnostic evidence. -6. Retrieve unknown block payloads through the block API when Markdown reports unknown/truncated blocks. -7. Compute the body hash over the adopted rendered body. -8. Build a strict frontmatter envelope. -9. Write base snapshot and storage objects. -10. Write the `.nmd` file. -11. Emit a pull result with storage mode and object refs. - -Future selected surfaces add data-source schema, comments, and files before the write commit. - -## Status Flow - -1. Read and decode `.nmd` once. -2. Validate all referenced objects. -3. Retrieve the current remote page and Markdown. -4. Compute local body hash, remote body hash, property edit state, metadata drift, and unresolved unknown block IDs. -5. Return a typed status result. - -Status distinguishes `remoteBodyChanged` from `remotePageMetadataChanged`. The current implementation still exposes a combined `remoteChanged` convenience field. - -## Push Flow - -1. Read and decode `.nmd` once. -2. Pull remote state once for status. -3. Reject clean-base use of any lossy remote body observation. -4. Reject unresolved Roughdraft review markup unless explicitly allowed. -5. Reject body pushes that could delete unknown blocks unless destructive intent is explicit. -6. If only page metadata or properties changed and the remote body changed, patch those surfaces and refresh local body from remote only when the refreshed body is complete. -7. 
If the remote body changed and local body changed, attempt a conservative three-way merge. -8. If merge succeeds, update Markdown and then properties. -9. If merge fails, write a Roughdraft conflict artifact and leave remote unchanged. -10. If remote body is still at base, use a targeted Markdown update when safe or guarded replace when necessary. -11. Re-observe the remote body after writes and rewrite `.nmd` with fresh body, base, page metadata, storage, and completeness evidence. - -The local file is read once for a push decision to avoid local snapshot drift. Remote body is re-read immediately before guarded Markdown updates to catch races between status and write. - -Clean-base writes are allowed only from complete body observations with -block-tree-rendered Markdown available. Endpoint truncation, unknown block IDs, -unsupported inventory entries, missing rendered evidence, or a rendered -block-tree suffix not present in the endpoint Markdown all block establishment, -tree materialization, facade settlement, and post-write clean-base refresh. A -successful remote write is not considered settled until the refreshed -observation is complete; otherwise the local `.nmd` base remains untrusted and -the caller receives a typed lossy-remote-body error. - -Pull adoption is block-aware. Notion's Markdown endpoint may omit blank block -boundaries around heading/paragraph/divider sequences; reparsing that endpoint -Markdown through CommonMark can promote prose paragraphs to Setext/ATX headings. -`notion-md` therefore treats endpoint Markdown as evidence and adopts the -client block-tree renderer output as the clean body. - -## Merge And Conflict Policy - -Requirement trace: R11-R15. 
- -Body merge operates on canonical Markdown: - -| Case | Result | -| ----------------------------- | ----------------------------------------- | -| local equals remote | clean | -| local equals base | accept remote | -| remote equals base | accept local | -| non-overlapping ranges | merge | -| same-range same edit | accept merged edit | -| overlapping different edit | conflict | -| protected placeholder removal | conflict unless explicit destructive mode | - -`update_content` is an optimization. It may be used only when the base hunk is unique in the current remote body and the returned Markdown equals the expected body. Ambiguous or deletion-heavy edits fall back to guarded `replace_content`. - -Unresolved conflicts are written beside the `.nmd` file as Roughdraft Markdown: - -```markdown -# notion-md body conflict - -{==Body conflict==}{>>Remote and local body content both changed since the last clean pull.<<}{id="body-conflict"} - -## Base body - -... - -## Local body - -... - -## Remote body - -... -``` - -Normal push refuses unresolved Roughdraft review markup. Explicit modes may later apply, render, strip, or bridge review annotations. - -## Feature Mapping - -Requirement trace: R01-R05. 
- -| Notion feature | Local body representation | Non-body state | Fidelity / policy | -| --------------------------- | --------------------------------------- | ------------------------------- | ------------------------------------- | -| Page title/icon/cover | not body | frontmatter page fields | title preserved; icon/cover modeled | -| Page lock/trash state | not body | frontmatter page fields | field-level page API patch | -| Paragraphs, headings, lists | stock Markdown/enhanced Markdown | none | supported with Notion normalization | -| To-dos, quotes, dividers | stock Markdown/enhanced Markdown | none | supported | -| Code blocks | fenced blocks | language normalization | supported; aliases may normalize | -| Equations | Markdown/enhanced math syntax | raw rich-text fallback if lossy | block supported; inline conservative | -| Callouts, toggles, tables | enhanced Markdown tags | color/attribute normalization | supported with normalization caveats | -| Columns | enhanced column tags | none | supported by endpoint, needs coverage | -| Images/files/media | Markdown/enhanced media tags | future file payloads | not fully implemented | -| Bookmark/embed/link preview | `<unknown ...>` placeholder | unsupported block unit/object | preserve or explicit delete | -| Child page/database | enhanced reference tags or placeholders | future ownership records | preserve by default | -| Data-source row properties | not body | typed property map | modeled writable properties | -| Data-source schema/views | not body | future schema snapshot | not implemented | -| Comments | not body | future comment bridge | not implemented | -| Suggestions/review | Roughdraft local layer | review state | reject unresolved by default | - -Known Notion enhanced Markdown limitations: - -- Notion normalizes valid Markdown on pull. -- Page title and properties are not included in Markdown body output. -- Some blocks pull as `<unknown>` with `unknown_block_ids`. 
-- The Markdown endpoint can return a prefix of the rendered block tree, such as - content before a divider; that response is lossy and cannot become a clean - `.nmd` base. -- The Markdown endpoint can omit separators around block boundaries; the clean - pull body is rendered from the block tree so paragraphs adjacent to headings - and dividers keep their block type. -- Signed file URLs expire and are not durable identity. -- Comments support inline Markdown-like content but are separate from body Markdown. -- `allow_deleting_content` can delete child pages/databases and unsupported blocks; the default is non-destructive. - -Evidence for these limitations lives in [experiments.md](./experiments.md). - -## Effect Services - -Requirement trace: R16-R20. - -``` -CLI program - provides command tree, option schemas, output renderers - -Sync coordinator - depends on NotionGateway and NmdStateStore - owns pull/status/push/sync decisions - -NotionGateway - depends on NotionConfig and HttpClient - owns typed Notion API calls and response adaptation - -NmdStateStore - depends on FileSystem and Path - owns .nmd IO, object refs, object validation, atomic local writes - -Merge planner - pure module for body merge and Markdown update planning - -Watch service - owns event queue, debounce, polling, scoped cancellation -``` - -Implementation rules: - -- Decode untrusted payloads with Effect Schema at the boundary. -- Expected failures use tagged errors with page/file/surface context. -- State-store object reads verify hash, role, schema version, page id, and inventory. -- Layers are composed at process boundaries. -- Long-lived watch resources are scoped and interruptible. -- Pure planning logic stays outside Effect services and has focused unit tests. 
- -## CLI - -Current commands: - -```bash -notion-md sync <page-id-or-url> page.nmd -notion-md sync docs --from-remote --root <page-id-or-url> -notion-md plan docs -notion-md status page.nmd -notion-md sync page.nmd [--watch] [--poll-interval-ms 30000] -notion-md sync docs -``` - -Environment: - -| Variable | Meaning | -| ------------------ | ---------------- | -| `NOTION_API_TOKEN` | Notion API token | - -Output: - -- One-shot commands emit pretty JSON results by default. -- Watch emits compact NDJSON event lines by default. -- Watch `sync_error` events include structured typed error fields. -- The long-term stable contract is explicit `--output human|json|ndjson`, with `auto` allowed only as a convenience alias after envelope schemas are versioned. - -Future CLI contract: - -```bash -notion-md diff <file.nmd> [--surface body|properties|comments|files] -notion-md comments pull|push <file.nmd> -notion-md doctor <page-id-or-url|file.nmd> -notion-md store verify|gc|export <file.nmd> -``` - -Batch commands: - -```bash -notion-md status <target...> [--recursive] [--concurrency 4] -notion-md sync <target> [--recursive] [--concurrency 4] [--watch] -``` - -Rules: - -- A single file target emits a single-page JSON result. -- Multiple status targets or flat recursive directory targets emit a batch envelope. -- Directory tree targets read `.notion-md/workspace.json` as an internal tree - index when present. `plan` reports tree operations without writing files, and - `sync` applies the local tree unless `--from-remote` is explicit. -- Recursive discovery includes existing `*.nmd` files and skips `.notion-md`, - `.git`, and `node_modules`. -- Duplicate `page_id` values in the same batch are rejected before any Notion - mutation. -- Missing or malformed files are reported as per-file errors when other valid - targets can still run. -- Local file deletion, local rename, and remote page moves are not destructive - intent. Remote archive/delete remains explicit future behavior. 
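The duplicate `page_id` preflight in the rules above can be sketched as a pure function that runs over the discovered targets before any Notion call is made. The `BatchTarget` shape and the `findDuplicatePageIds` name are hypothetical, chosen for illustration; the real batch orchestrator may structure this differently.

```typescript
// Hypothetical shape of a discovered batch target (illustration only).
type BatchTarget = { readonly path: string; readonly pageId: string };

// Group targets by page id and return only the ids claimed by more than one
// file, so the whole batch can be rejected before any Notion mutation.
function findDuplicatePageIds(
  targets: ReadonlyArray<BatchTarget>,
): Map<string, string[]> {
  const pathsByPageId = new Map<string, string[]>();
  for (const target of targets) {
    const paths = pathsByPageId.get(target.pageId) ?? [];
    paths.push(target.path);
    pathsByPageId.set(target.pageId, paths);
  }
  const duplicates = new Map<string, string[]>();
  pathsByPageId.forEach((paths, pageId) => {
    if (paths.length > 1) duplicates.set(pageId, paths);
  });
  return duplicates;
}

// Example: two files claim the same page, so the preflight reports that id.
const duplicates = findDuplicatePageIds([
  { path: "docs/a.nmd", pageId: "00000000-0000-4000-8000-000000000001" },
  { path: "docs/b.nmd", pageId: "00000000-0000-4000-8000-000000000002" },
  { path: "docs/copy-of-a.nmd", pageId: "00000000-0000-4000-8000-000000000001" },
]);
```

Keeping the check pure means it can run once per batch, ahead of the per-file guards, and a non-empty result can be reported with every conflicting path rather than failing on the first collision.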
- -## Watch Lifecycle - -Requirement trace: R19-R20, R28. - -``` -initial event ----\ -file event --------> sliding queue -> debounce -> sync pass -> JSON event -remote poll ------/ -``` - -Rules: - -- One sync pass runs at a time per process. -- File events and poll events are coalesced. -- Each pass emits `sync` or `sync_error`. -- Sync-pass spans observe failures before the watch loop recovers. -- Interruption closes the watcher, stops polling, and cancels queued work. -- File events come from the Effect Platform `FileSystem.watch` stream. Production - adapters are thin stream producers; coalescing policy stays in the watch loop. -- Multi-file watch resolves the target set at startup, watches the containing - directories for those files, coalesces by path, and runs batch sync passes with - bounded concurrency. New files discovered after startup require restarting the - watcher until a tree manifest/daemon owns dynamic discovery. - -The watch core uses a sliding queue and debounce window. Future tests may inject -source streams and `TestClock`, but production code must stay on Effect Platform -watch primitives instead of raw runtime callbacks. - -## Long-Term Decisions - -Requirement trace: R01-R24. - -| Area | Decision | -| --------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| Inline equations | Treat inline equations conservatively until raw rich-text evidence proves Notion's Markdown endpoint preserves equation semantics. If not, preserve spans outside the body. | -| Page/data-source references | Use stock enhanced Markdown where Notion round-trips references. Preserve unsupported references with block API snapshots and object refs. | -| Property merge bases | Keep compact bases inline; move large or volatile bases into content-addressed objects by policy. 
| -| Comment anchoring | Bridge Roughdraft comments only when exact selected text is unique in a known block; otherwise fall back to page-level comments. | -| Store index | Derive reachability from `.nmd` frontmatter and object refs. Add a JSON index only when repo-scale GC or multi-page watch needs it. | -| Batch sync | Keep the page/file sync engine as the correctness boundary. Batch and folder modes are orchestration only, with duplicate page-id preflight and per-file results. | -| Body completeness | Keep pure vocabulary in `@overeng/notion-core`, live observation in `@overeng/notion-effect-client`, and clean-base adoption/write policy in `@overeng/notion-md`. | -| Pull body authority | Adopt block-tree-rendered Markdown as the clean `.nmd` body; retain endpoint Markdown as diagnostic evidence for truncation, unknown blocks, and endpoint/block-tree comparison. | -| Webhooks | Polling remains the correctness baseline. A local daemon/tunnel may accelerate refresh; hosted relay is a separate product/security decision. | -| CLI output | Use explicit output modes with versioned envelopes. Watch mode uses NDJSON events. | -| Watch events | Use Effect Platform streams plus a deterministic reducer/queue policy. Avoid raw `fs.watch` ownership in package code. 
| -## OpenTelemetry +| Surface | Local state | Pull API | Push API | Owner | +| ---------------------------------------- | ------------------------------ | ------------------------------------- | -------------------------------------------------------- | ---------------------------------------------------------- | +| Body | `.nmd` body + `base_snapshot` | block-tree render + endpoint evidence | Markdown update endpoint | [03](./03-sync-engine/spec.md)/[04](./04-fidelity/spec.md) | +| Page metadata | frontmatter page fields | `GET /pages/{id}` | `PATCH /pages/{id}` | [06-data-source](./06-data-source/spec.md) | +| Properties | frontmatter property map | `GET /pages/{id}` | `PATCH /pages/{id}` | [06-data-source](./06-data-source/spec.md) | +| Unsupported / not-round-trip-safe blocks | frontmatter/object storage | Markdown + block API | refuse at pull (R38); round-trip-safe captures preserved | [04-fidelity](./04-fidelity/spec.md) | +| Data-source schema | external datasource-sync state | datasource-sync package | datasource-sync package | datasource-sync | +| Comments | future comment payload | comments API | comments API | (designed) | +| Files | future file payload | block/file APIs | file upload APIs | (designed) | +| Review | Roughdraft local markup | local only or comments API | explicit bridge only | [03-sync-engine](./03-sync-engine/spec.md) | + +## Subsystem index + +The `0N` prefix encodes reading + dependency order (`0N` may depend on higher +numbers, not the reverse). The decomposition is **layered**: the two surfaces +(01-editor, 02-file-sync) sit on a shared engine (03-sync-engine) that depends on +the shared fidelity layer (04-fidelity), durable local state (05-local-state), and +the typed property surface (06-data-source). Each subsystem `spec.md` opens with a +link up to [../requirements.md](./requirements.md) + its own `requirements.md`.
+ +| # | Subsystem | Spec covers | Requirements | Decisions | +| --- | --------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------- | -------------------------------------------------------- | +| 01 | [editor](./01-editor/spec.md) | `cat`/`put`/`edit` surfaces, representation modes, title↔H1 boundary, guard plumbing, exit codes, `edit` session, umbrella, sync-progress indicator | R32, R33, R34, R35, R37, R39, R43–R45 | 0001, 0002, 0003, 0004, 0008, 0009, 0012, 0017, **0018** | +| 02 | [file-sync](./02-file-sync/spec.md) | pull/status/push flows, CLI + batch/tree orchestration, watch lifecycle | R20, R28 | (traces 0017) | +| 03 | [sync-engine](./03-sync-engine/spec.md) | guarded push, 3-way Markdown merge, `update_content`/`replace_content` selection, canonical base, post-push `semanticEquivalent` gate, settle-and-re-pull, review-safety guard, force escape hatch | R09, R11, R13, R15 | 0002, 0007, 0009, 0012, 0016, 0017 | +| 04 | [fidelity](./04-fidelity/spec.md) | sound round-trip classifier, uniform lossy-page refusal, feature mapping, server-side push strategy, hosted-media canonicalization | R12, R30, R31, R36, R38, R40, R41 | 0005, 0007, 0010, 0011, 0014, 0015, 0016 | +| 05 | [local-state](./05-local-state/spec.md) | `.nmd` envelope, frontmatter schema, `.notion-md/` content-addressed object store, base snapshots | R06, R07, R08, R10 | 0006 | +| 06 | [data-source](./06-data-source/spec.md) | writable property values, writable page metadata, `data_source` binding, `schema_snapshot` schema-drift guard | R04, R14 | 0013 | + +The `.decisions/` directory (0001–0018) is the authoritative decision log; the +Decisions column above is the citation map. 
Decision **0016** (refuse lossy pages) +supersedes the reconciler/converter records (0005, 0010, 0011, 0014, 0015); +decision **0017** (edit = ephemeral file-engine session) supersedes 0013 (the +stateless schema fingerprint) and broadens the refusal to uniform; decision +**0018** adds the staged write-path sync-progress indicator. + +**Cross-cutting at root by design.** Observability (the OpenTelemetry conventions), +verification expectations, and the Effect service-boundary overview deliberately +stay at this root index rather than living in a subsystem: the 6-way split is by +sync surface and correctness layer and has no observability/testing subsystem to own +them, and each spans all six. A future `07-observability` subsystem could own the +OTEL + verification surface if it grows enough to warrant its own requirements; that +is a possible follow-up, not done now. + +## OpenTelemetry (cross-cutting) Requirement trace: R21-R24, R29. @@ -547,20 +164,29 @@ Current implementation uses `notion-md-cli` for both modes and distinguishes wat Span conventions: -| Span | Required attributes | -| ----------------------------------- | ------------------------------------------------------------- | -| `notion-md.cli.<command>` | `span.label`, `notion_md.command` | -| `notion-md.sync-page` | `span.label`, `notion_md.sync.result`, `notion_md.page_id` | -| `notion-md.status-page` | local/remote changed booleans, unknown-block count | -| `notion-md.push-page` | force flag, destructive flag, push decision, markdown command | -| `notion-md.watch.sync-pass` | watch reason, command, path basename, error tag when failed | -| `notion-md.gateway.update-markdown` | page id, update type, content-update count, destructive flag | -| `notion-md.state.read-object` | object role, hash prefix | -| `notion-md.state.write-object` | object role, hash prefix | - -Attributes must not include tokens, full Markdown bodies, file bytes, or signed URLs. 
- -## Verification +| Span | Required attributes | +| ----------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `notion-md.cli.<command>` | `span.label`, `notion_md.command` | +| `notion-md.cat` | `span.label`, `notion_md.page_id`, `notion_md.editor.mode` | +| `notion-md.put` | `span.label`, `notion_md.page_id`, `notion_md.editor.mode`, `notion_md.put.force`, `notion_md.put.body_written`, `notion_md.put.title_written` | +| `notion-md.edit` | `span.label`, `notion_md.page_id`, `notion_md.editor.mode`, `notion_md.edit.outcome`; wraps the engine's `notion-md.sync-page` / `push-page` / `status-page` spans as children (decision 0017) | +| `notion-md.sync-page` | `span.label`, `notion_md.sync.result`, `notion_md.page_id` | +| `notion-md.status-page` | local/remote changed booleans, unknown-block count | +| `notion-md.push-page` | force flag, destructive flag, push decision, markdown command | +| `notion-md.watch.sync-pass` | watch reason, command, path basename, error tag when failed | +| `notion-md.gateway.update-markdown` | page id, update type, content-update count, destructive flag | +| `notion-md.state.read-object` | object role, hash prefix | +| `notion-md.state.write-object` | object role, hash prefix | + +Attributes must not include tokens, full Markdown bodies, file bytes, or signed +URLs — asserted by a span leak-guard test (R24, Group G). All attribute keys use +the `notion_md.*` namespace (not a `nmd.*` shorthand). The write-path stage +vocabulary ([01-editor](./01-editor/spec.md#sync-progress-indicator-write-path)) is +a CLI-facing presentation contract, distinct from these span names. A +`result`/`changed`/`partial_write` attribute per command is desirable hardening not +yet emitted (impl-delta Group G follow-up). 
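
As a rough illustration of the attribute rules above, the following sketch shows one way a span leak guard could be checked. It is not the project's actual leak-guard test: `findAttributeLeaks`, `ALLOWED_BARE_KEYS`, `MAX_VALUE_LENGTH`, and the secret patterns are all hypothetical names and heuristics chosen for this example, and `span.label` is assumed to be the only key allowed outside the `notion_md.*` namespace because the conventions table lists it as a required attribute.

```typescript
// Hypothetical leak guard: none of these names come from the notion-md codebase.
type SpanAttributes = Readonly<Record<string, string | number | boolean>>

// Assumption: `span.label` is the only key permitted outside `notion_md.*`.
const ALLOWED_BARE_KEYS: ReadonlySet<string> = new Set(['span.label'])

// Assumed bound: longer strings are treated as a possible Markdown body or file payload.
const MAX_VALUE_LENGTH = 256

// Assumed heuristics; a real guard would match the token formats the project actually handles.
const FORBIDDEN_VALUE_PATTERNS: ReadonlyArray<{
  readonly reason: string
  readonly pattern: RegExp
}> = [
  { reason: 'signed URL', pattern: /[?&]X-Amz-Signature=/i },
  { reason: 'bearer token', pattern: /\bBearer\s+\S+/i },
  { reason: 'Notion-style secret', pattern: /\b(?:secret|ntn)_[A-Za-z0-9]{16,}/ },
]

/** Return one human-readable problem per violated rule; an empty array means no leak was found. */
const findAttributeLeaks = (attributes: SpanAttributes): ReadonlyArray<string> => {
  const problems: string[] = []
  for (const [key, value] of Object.entries(attributes)) {
    if (!key.startsWith('notion_md.') && !ALLOWED_BARE_KEYS.has(key)) {
      problems.push(`${key}: key is outside the notion_md.* namespace`)
    }
    // Only string values can carry bodies, tokens, or URLs.
    if (typeof value !== 'string') continue
    if (value.length > MAX_VALUE_LENGTH) {
      problems.push(`${key}: value is long enough to be a body or file payload`)
    }
    for (const { reason, pattern } of FORBIDDEN_VALUE_PATTERNS) {
      if (pattern.test(value)) problems.push(`${key}: value looks like a ${reason}`)
    }
  }
  return problems
}

// Attributes shaped like the `notion-md.put` row of the table: expected to pass.
const cleanProblems = findAttributeLeaks({
  'span.label': 'showcase.nmd',
  'notion_md.command': 'put',
  'notion_md.put.force': false,
})

// A shorthand key and a signed media URL: each should be reported once.
const leakyProblems = findAttributeLeaks({
  'nmd.page_id': 'abc',
  'notion_md.media_url': 'https://example.com/photo.png?X-Amz-Signature=deadbeef',
})
```

A test built on a helper like this would collect the attributes of every recorded span and assert that the returned list is empty. The length cutoff and the patterns are deliberately conservative guesses and would need tuning against real span data.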
+ +## Verification (cross-cutting) | Layer | Required coverage | | --------------- | --------------------------------------------------------------------------------- | @@ -595,7 +221,29 @@ Checked-in examples use `.nmd.example` so recursive commands only operate after user has pulled distinct real Notion pages into `.nmd` files. Follow-up hardening remains for required live-lane policy, OTEL span assertions, -versioned CLI output schemas, and broader storage/comment coverage. Watch -coverage already includes polling, structured errors, and batch coalescing in -the fake/live E2E suite; additional watch work should target uncovered lifecycle -or timing edges rather than restating the basic watch-core scenarios. +versioned CLI output schemas, broader storage/comment coverage, and the staged +sync-progress indicator (decision 0018). Watch coverage already includes polling, +structured errors, and batch coalescing in the fake/live E2E suite; additional +watch work should target uncovered lifecycle or timing edges rather than restating +the basic watch-core scenarios. + +## Residual long-term decisions (cross-cutting) + +The editor decisions are recorded as individual records in +[`.decisions/`](./.decisions/) (0001–0018) — that directory is the authoritative +decision log. The residual file-based-engine areas below have no individual record +and are summarized here; they must not silently diverge from the records. + +| Area | Decision | +| --------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| Inline equations | Treat inline equations conservatively until raw rich-text evidence proves Notion's Markdown endpoint preserves equation semantics. If not, preserve spans outside the body. | +| Page/data-source references | Use stock enhanced Markdown where Notion round-trips references. 
Preserve unsupported references with block API snapshots and object refs. | +| Property merge bases | Keep compact bases inline; move large or volatile bases into content-addressed objects by policy. | +| Comment anchoring | Bridge Roughdraft comments only when exact selected text is unique in a known block; otherwise fall back to page-level comments. | +| Store index | Derive reachability from `.nmd` frontmatter and object refs. Add a JSON index only when repo-scale GC or multi-page watch needs it. | +| Batch sync | Keep the page/file sync engine as the correctness boundary. Batch and folder modes are orchestration only, with duplicate page-id preflight and per-file results. | +| Body completeness | Keep pure vocabulary in `@overeng/notion-core`, live observation in `@overeng/notion-effect-client`, and clean-base adoption/write policy in `@overeng/notion-md`. | +| Pull body authority | Adopt block-tree-rendered Markdown as the clean `.nmd` body; retain endpoint Markdown as diagnostic evidence for truncation, unknown blocks, and comparison. | +| Webhooks | Polling remains the correctness baseline. A local daemon/tunnel may accelerate refresh; hosted relay is a separate product/security decision. | +| CLI output | Use explicit output modes with versioned envelopes. Watch mode uses NDJSON events. | +| Watch events | Use Effect Platform streams plus a deterministic reducer/queue policy. Avoid raw `fs.watch` ownership in package code. | diff --git a/packages/@overeng/notion-md/docs/vrs/vision.md b/packages/@overeng/notion-md/docs/vrs/vision.md index 1da7fb97f..205b05fd4 100644 --- a/packages/@overeng/notion-md/docs/vrs/vision.md +++ b/packages/@overeng/notion-md/docs/vrs/vision.md @@ -35,7 +35,8 @@ 1. A user can pull a Notion page to local state, inspect and edit the body as Notion enhanced Markdown, and push changes without sending local metadata as page content. 2. 
A normal push refuses to overwrite changed remote body content or delete unsupported/child content unless the user chooses an explicit destructive mode. 3. Page properties and data-source rows round-trip through typed schemas instead of being flattened into body Markdown. -4. Unsupported blocks and file artifacts are preserved through stable placeholders and content-addressed objects. +4. Unsupported blocks and file artifacts are preserved through stable placeholders and content-addressed objects where they are resolvable; a page the tool cannot round-trip as Markdown is refused (uniformly across the editor and file surfaces) rather than risk silently dropping such content. 5. Watch mode can run continuously, coalesce local and remote changes, and shut down without orphaned work. 6. Every CLI command and watch pass produces traceable Effect spans with enough attributes to diagnose Notion API, filesystem, merge, and validation failures. 7. Supported body features are backed by E2E fixtures that create, pull, compare, and clean up real Notion pages. +8. A user can edit a Notion page as Markdown through their canonical `$EDITOR` with a single command, no local file in the working directory, and a guarded push — with no editor plugin required. diff --git a/packages/@overeng/notion-md/nix/build.nix b/packages/@overeng/notion-md/nix/build.nix index cba993b79..905885c0c 100644 --- a/packages/@overeng/notion-md/nix/build.nix +++ b/packages/@overeng/notion-md/nix/build.nix @@ -20,7 +20,7 @@ let # Managed by the repo FOD refresh workflow — do not edit manually. depsBuilds = { "." 
= { - hash = "sha256-2V8S6/AKbZ1bG32UbmAkcrgmNZDJq2+BNh17fLCWkRk="; + hash = "sha256-2L8UshlPGyIo7y0aeM4BgumEYRiSGkclu3UF1xW7Bw4="; }; }; smokeTestArgs = [ "--help" ]; diff --git a/packages/@overeng/notion-md/package.json b/packages/@overeng/notion-md/package.json index 5da9e02c6..e1973f266 100644 --- a/packages/@overeng/notion-md/package.json +++ b/packages/@overeng/notion-md/package.json @@ -30,12 +30,7 @@ "@overeng/notion-effect-client": "workspace:^", "@overeng/notion-effect-schema": "workspace:^", "@overeng/otel-contract": "workspace:^", - "@overeng/utils": "workspace:^", - "remark-gfm": "4.0.1", - "remark-parse": "11.0.0", - "remark-stringify": "11.0.0", - "unified": "11.0.5", - "unist-util-visit": "5.1.0" + "@overeng/utils": "workspace:^" }, "devDependencies": { "@effect-atom/atom": "0.5.3", diff --git a/packages/@overeng/notion-md/package.json.genie.ts b/packages/@overeng/notion-md/package.json.genie.ts index c26082d47..268578adf 100644 --- a/packages/@overeng/notion-md/package.json.genie.ts +++ b/packages/@overeng/notion-md/package.json.genie.ts @@ -38,13 +38,6 @@ const workspaceDeps = catalog.compose({ otelContractPkg, utilsPkg, ], - external: catalog.pick( - 'remark-gfm', - 'remark-parse', - 'remark-stringify', - 'unified', - 'unist-util-visit', - ), }, devDependencies: { workspace: [tuiReactPkg, utilsDevPkg], diff --git a/packages/@overeng/notion-md/src/body-facade.ts b/packages/@overeng/notion-md/src/body-facade.ts index 11d09bc06..ccdd498ad 100644 --- a/packages/@overeng/notion-md/src/body-facade.ts +++ b/packages/@overeng/notion-md/src/body-facade.ts @@ -1,13 +1,14 @@ import { Effect, Schema } from 'effect' import { descriptorForUtf8, type ContentDescriptor } from '@overeng/content-address' -import type { BodyCompleteness } from '@overeng/notion-core' +import { describeBodyLossyRefusal, type BodyCompleteness } from '@overeng/notion-core' import type { BodyEvidenceFingerprint, RemoteBodyObservationEvidence, Sha256Digest, } from 
'@overeng/notion-effect-client' +import { editorBaseHash } from './editor-surface.ts' import { NmdFrontmatterError, NmdRemoteBodyLossyError, type NmdError } from './errors.ts' import { parseNmdFile } from './frontmatter.ts' import { normalizeMarkdownLineEndings, sha256Digest } from './hash.ts' @@ -104,7 +105,11 @@ const assertSnapshotComplete = (opts: { operation: opts.operation, page_id: opts.snapshot.pageId, reasons: [...completeness.reasons], - message: `Remote Markdown body for page ${opts.snapshot.pageId} is lossy (${completeness.reasons.join(', ')}); refusing verified body operation`, + message: describeBodyLossyRefusal({ + pageId: opts.snapshot.pageId, + completeness, + context: 'refusing verified body operation', + }), }), ) } @@ -119,6 +124,43 @@ export const observeRemoteBody = (opts: { return remoteBodySnapshot(pulled) }) +/** + * Editor-surface projection of a Notion page: title + body together, with the + * default-mode editor base hash over title+body (decisions 0001/0006). Unlike + * `observeRemoteBody` (body-only), this carries the title and its property key + * so `cat`/`put` can present and route the title through the typed page API. + * Refuses a lossy page (exit 3) at observe time, exactly like the file path. + */ +export interface NotionMdEditorSnapshot { + readonly pageId: string + readonly title: string + readonly titlePropertyKey: string + readonly body: string + readonly baseHash: Sha256Digest + readonly completeness?: BodyCompleteness +} + +/** Observe the current remote title + body for a Notion page, refusing lossy pages. 
*/ +export const observeRemoteEditorPage = (opts: { + readonly pageId: string +}): Effect.Effect<NotionMdEditorSnapshot, NmdError, NotionMdGateway> => + Effect.gen(function* () { + const gateway = yield* NotionMdGateway + const pulled = yield* gateway.pullPage({ pageId: opts.pageId }) + const snapshot = remoteBodySnapshot(pulled) + yield* assertSnapshotComplete({ operation: 'observe_editor_page', snapshot }) + const title = pulled.page.title + const body = snapshot.markdown + return { + pageId: pulled.page.id, + title, + titlePropertyKey: pulled.page.title_property_key, + body, + baseHash: editorBaseHash({ title, body }), + ...(snapshot.completeness === undefined ? {} : { completeness: snapshot.completeness }), + } + }) + /** Read and hash only the parsed body from a local `.nmd` file. */ export const readLocalBody = (opts: { readonly path: string @@ -211,6 +253,44 @@ export const replaceRemoteBodyVerified = (opts: { } }) +/** + * Replace the remote Markdown body **unconditionally** (last-writer-wins), + * skipping the pre-write base-hash compare — the concurrency-only `--force` + * escape (decision 0009). It still asserts body completeness before and after + * the write (the lossy refusal is correctness, not concurrency, and `--force` + * never bypasses it) and returns the re-pulled body so the caller's post-push + * `semanticEquivalent` gate (exit 9) can run. 
+ */ +export const replaceRemoteBodyForced = (opts: { + readonly pageId: string + readonly markdown: string +}): Effect.Effect<NotionMdVerifiedRemoteReplaceResult, NmdError, NotionMdGateway> => + Effect.gen(function* () { + const gateway = yield* NotionMdGateway + const current = remoteBodySnapshot(yield* gateway.pullPage({ pageId: opts.pageId })) + yield* assertSnapshotComplete({ operation: 'replace_remote_body_forced', snapshot: current }) + + yield* gateway.updateMarkdown({ + pageId: opts.pageId, + command: { _tag: 'replace_content', markdown: opts.markdown }, + allowDeletingContent: false, + }) + const updated = remoteBodySnapshot(yield* gateway.pullPage({ pageId: opts.pageId })) + yield* assertSnapshotComplete({ operation: 'replace_remote_body_forced', snapshot: updated }) + return { + pageId: opts.pageId, + previousBodyHash: current.bodyHash, + bodyHash: updated.bodyHash, + bodyDescriptor: updated.bodyDescriptor, + ...(updated.bodyEvidence === undefined ? {} : { bodyEvidence: updated.bodyEvidence }), + ...(updated.bodyEvidenceFingerprint === undefined + ? {} + : { bodyEvidenceFingerprint: updated.bodyEvidenceFingerprint }), + markdown: updated.markdown, + ...(updated.completeness === undefined ? {} : { completeness: updated.completeness }), + } + }) + /** Re-check local body stability, then refresh the local materialization after a verified push. 
*/ export const settleVerifiedBodyPush = (opts: { readonly pageId: string diff --git a/packages/@overeng/notion-md/src/body-facade.unit.test.ts b/packages/@overeng/notion-md/src/body-facade.unit.test.ts index e97778d7a..215d2987d 100644 --- a/packages/@overeng/notion-md/src/body-facade.unit.test.ts +++ b/packages/@overeng/notion-md/src/body-facade.unit.test.ts @@ -169,6 +169,7 @@ class FakeGateway { this.metadataUpdateCalls.push('properties') throw new Error('unexpected metadata update') }), + retrieveDataSource: () => Effect.dieMessage('unexpected retrieveDataSource'), updatePageMetadata: () => Effect.sync(() => { this.metadataUpdateCalls.push('metadata') diff --git a/packages/@overeng/notion-md/src/canonical-markdown.test.ts b/packages/@overeng/notion-md/src/canonical-markdown.test.ts index 790925668..0c0ba812b 100644 --- a/packages/@overeng/notion-md/src/canonical-markdown.test.ts +++ b/packages/@overeng/notion-md/src/canonical-markdown.test.ts @@ -1,51 +1,18 @@ +import { readFileSync } from 'node:fs' +import { fileURLToPath } from 'node:url' + import { describe, expect, it } from '@effect/vitest' +import * as fc from 'effect/FastCheck' import { canonicalizeBlockMarkdown, semanticEquivalent } from './canonical-markdown.ts' +import { sha256Digest } from './hash.ts' -describe('canonicalizeBlockMarkdown', () => { - it('unwraps soft-wrapped paragraph lines into one logical line', () => { - const wrapped = [ - 'Use this skill when designing software and you need a', - 'principled read on whether a code-level solution makes the system', - 'simpler.', - ].join('\n') - - expect(canonicalizeBlockMarkdown(wrapped)).toBe( - 'Use this skill when designing software and you need a principled read on whether a code-level solution makes the system simpler.\n', - ) - }) - - it('preserves paragraph boundaries on blank lines', () => { - const input = 'First paragraph.\n\nSecond paragraph.' 
- expect(canonicalizeBlockMarkdown(input)).toBe('First paragraph.\n\nSecond paragraph.\n') - }) - - it('preserves explicit hard breaks', () => { - const input = 'Line one.\\\nLine two.' - expect(canonicalizeBlockMarkdown(input)).toBe('Line one.\\\nLine two.\n') - }) - - it('keeps list structure with unwrapped continuations', () => { - const input = ['- first item that wraps across', ' two lines', '- second item'].join('\n') - expect(canonicalizeBlockMarkdown(input)).toBe( - '- first item that wraps across two lines\n- second item\n', - ) - }) - - it('leaves fenced code blocks untouched', () => { - const input = '```ts\nconst x = 1\nconst y = 2\n```' - expect(canonicalizeBlockMarkdown(input)).toBe('```ts\nconst x = 1\nconst y = 2\n```\n') - }) - - it('is idempotent', () => { - const input = 'Paragraph one wraps\nacross lines.\n\nParagraph two.' - const once = canonicalizeBlockMarkdown(input) - expect(canonicalizeBlockMarkdown(once)).toBe(once) - }) - - it('normalizes CRLF line endings to LF', () => { - const input = 'Line one\r\nstill line one.\r\n\r\nLine two.' - expect(canonicalizeBlockMarkdown(input)).toBe('Line one still line one.\n\nLine two.\n') +describe('canonicalizeBlockMarkdown re-export', () => { + // The canonical function now lives in `@overeng/notion-effect-client` (its + // own unit tests cover behavior); notion-md re-exports it. Smoke-test that + // the re-export resolves and produces the canonical (tight-list) form. 
+ it('re-exports the canonical body function', () => { + expect(canonicalizeBlockMarkdown('- a\n\n- b\n')).toBe('- a\n- b\n') }) }) @@ -85,4 +52,151 @@ describe('semanticEquivalent', () => { const same = 'Intro.\n```ts\nconst x = 1\nconst y = 2\n```\n' expect(semanticEquivalent({ a: sent, b: same })).toBe(true) }) + + it('treats two rotated hosted-media signature variants as equivalent', () => { + const host = 'https://prod-files-secure.s3.us-west-2.amazonaws.com/abc/photo.png' + const params = + '?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20260615T120000Z&X-Amz-Expires=3600' + const pullOne = `![caption](${host}${params}&X-Amz-Signature=deadbeef)\n` + const pullTwo = `![caption](${host}${params}&X-Amz-Signature=cafef00d)\n` + expect(semanticEquivalent({ a: pullOne, b: pullTwo })).toBe(true) + }) + + it('still flags a real change to a hosted-media caption', () => { + const host = 'https://prod-files-secure.s3.us-west-2.amazonaws.com/abc/photo.png' + const a = `![before](${host}?X-Amz-Signature=deadbeef)\n` + const b = `![after](${host}?X-Amz-Signature=cafef00d)\n` + expect(semanticEquivalent({ a, b })).toBe(false) + }) +}) + +/* + * Two-oracle agreement on canonical inputs (decision 0019). + * + * The system has two oracles for "did the body change": + * - the RAW-HASH oracle — `sha256Digest` over the body (`classifyPlan` in + * `tree.ts` compares `prevState.body.hash === sha256Digest(composed)`); a + * byte-exact digest, NOT canonicalization-invariant. + * - the CANON-INVARIANT oracle — `semanticEquivalent`, which canonicalizes + * both sides before comparing (the push integrity gate). + * + * Post-consolidation, decision 0019 routes BOTH wire boundaries through + * `canonicalizeBlockMarkdown`, so every body the oracles ever see is already + * canonical. For canonical inputs the two oracles MUST agree: raw-hash-equal + * ⟺ semanticEquivalent-true. 
They may legitimately disagree only on + * non-canonical inputs (where the raw hash sees spelling/spacing the canonical + * compare folds away). + * + * The referee is NEITHER oracle: it is `canonicalize-then-byte-equal`. Asserting + * each oracle against this independent judge is what guards the exact failure + * mode a future canonicalization change could silently introduce — e.g. a change + * that makes `semanticEquivalent` call two distinct canonical bodies equal would + * surface as `semEq !== referee`, not hide behind the oracle it would corrupt. + */ +describe('two-oracle agreement on canonical inputs (decision 0019)', () => { + /** + * Assert both oracles match the referee for a pair of ALREADY-canonical + * bodies (the now-guaranteed pull form). The premise — inputs are canonical — + * is itself asserted (idempotence), so a regression that breaks canonical + * fixpoint fails here too. + */ + const assertOraclesAgree = (rawA: string, rawB: string): void => { + const ca = canonicalizeBlockMarkdown(rawA) + const cb = canonicalizeBlockMarkdown(rawB) + // Premise: the canonical form is a fixpoint (already-canonical inputs). + expect(canonicalizeBlockMarkdown(ca)).toBe(ca) + expect(canonicalizeBlockMarkdown(cb)).toBe(cb) + + const referee = ca === cb + const rawHashEqual = sha256Digest(ca) === sha256Digest(cb) + const semanticEqual = semanticEquivalent({ a: ca, b: cb }) + + expect(rawHashEqual).toBe(referee) + expect(semanticEqual).toBe(referee) + } + + // Seed pairs the system is known to (or could) diverge on: each pair is two + // raw spellings of the SAME content that canonicalization must converge to + // identical bytes (referee true → both oracles true), plus distinct-content + // pairs (referee false → both oracles false). Random generation alone rarely + // hits the convergent cases, so these seeds carry the discriminating power. 
+ const seedPairs: ReadonlyArray<{ + readonly name: string + readonly a: string + readonly b: string + }> = [ + { name: 'emphasis fold (*x* vs _x_)', a: '*x*\n', b: '_x_\n' }, + { name: 'loose vs tight list', a: '- a\n\n- b\n', b: '- a\n- b\n' }, + { name: 'soft-wrapped vs unwrapped paragraph', a: 'one\ntwo\n', b: 'one two\n' }, + { name: 'bullet marker style (* vs -)', a: '* a\n* b\n', b: '- a\n- b\n' }, + { name: 'heading-to-text blank-line spacing', a: '# H\ntext\n', b: '# H\n\ntext\n' }, + { name: 'distinct content', a: 'apple\n', b: 'banana\n' }, + { name: 'distinct lists', a: '- a\n- b\n', b: '- a\n- c\n' }, + ] + + for (const seed of seedPairs) { + it(`oracles agree with the referee for: ${seed.name}`, () => { + assertOraclesAgree(seed.a, seed.b) + }) + } + + // Property: for arbitrary bodies, canonicalizing both ALWAYS makes the two + // oracles agree with the referee. Broadens the seeds; the seeds guarantee the + // convergent (referee-true) branch is exercised, this covers the long tail. + it.prop( + 'oracles agree with the referee for arbitrary canonical bodies', + [fc.string({ maxLength: 120 }), fc.string({ maxLength: 120 })], + ([a, b]) => { + assertOraclesAgree(a, b) + }, + { fastCheck: { numRuns: 200 } }, + ) + + // Premise standalone: canonicalization is idempotent (the pull form is a + // fixpoint), so "already canonical" is a meaningful, achievable state. + it.prop( + 'canonicalizeBlockMarkdown is idempotent', + [fc.string({ maxLength: 160 })], + ([raw]) => { + const once = canonicalizeBlockMarkdown(raw) + expect(canonicalizeBlockMarkdown(once)).toBe(once) + }, + { fastCheck: { numRuns: 200 } }, + ) +}) + +/* + * Golden-file fixpoint over the durable demo (`demo/showcase.nmd`). + * + * The demo is the committed live showcase, re-baselined by the body/Markdown + * consolidation (decision 0019). Its body must already be in canonical form — + * a real `sync` would not rewrite it — and canonicalization must be idempotent + * over it. 
This locks the re-baseline and catches future drift (a canon change + * that would make the demo body non-canonical fails here). + * + * We extract the body via the raw frontmatter boundary (NOT `parseNmdFile`) so + * the assertion targets exactly the body bytes, independent of frontmatter + * schema version. + */ +describe('demo/showcase.nmd golden-file fixpoint (decision 0019)', () => { + const demoPath = fileURLToPath(new URL('../demo/showcase.nmd', import.meta.url)) + + /** Split the `.nmd` envelope at the closing `---` and return the raw body. */ + const demoBody = (): string => { + const content = readFileSync(demoPath, 'utf8').replace(/\r\n/g, '\n') + const marker = '\n---\n' + const end = content.indexOf(marker, 4) + expect(end).toBeGreaterThan(0) + return content.slice(end + marker.length).replace(/^\n/u, '') + } + + it('committed demo body is already canonical (sync would not rewrite it)', () => { + const body = demoBody() + expect(canonicalizeBlockMarkdown(body)).toBe(body) + }) + + it('canonicalization is idempotent over the demo body', () => { + const canon = canonicalizeBlockMarkdown(demoBody()) + expect(canonicalizeBlockMarkdown(canon)).toBe(canon) + }) }) diff --git a/packages/@overeng/notion-md/src/canonical-markdown.ts b/packages/@overeng/notion-md/src/canonical-markdown.ts index 816a05ca4..0d58fa8c4 100644 --- a/packages/@overeng/notion-md/src/canonical-markdown.ts +++ b/packages/@overeng/notion-md/src/canonical-markdown.ts @@ -1,57 +1,13 @@ -import remarkGfm from 'remark-gfm' -import remarkParse from 'remark-parse' -import remarkStringify from 'remark-stringify' -import { unified } from 'unified' -import { visit } from 'unist-util-visit' +import { canonicalizeBlockMarkdown } from '@overeng/notion-effect-client' /* - * Canonical Markdown serialization used as the wire and on-disk form. 
- * - * Why a canonical form: Notion's enhanced-Markdown endpoint reserializes any - * pushed body into its own block model, so byte-equal roundtrips are not - * achievable. We define one canonical shape (CommonMark + GFM, paragraphs - * unwrapped onto a single logical line, ATX headings, hyphen list bullets) and - * normalize both push input and pull output to it. The push-side guard then - * checks canonical equality instead of byte equality, and the visible Notion - * page no longer shows hard breaks from soft-wrapped source paragraphs. + * `canonicalizeBlockMarkdown` — the single canonical body form — now lives in + * `@overeng/notion-effect-client`, beside the renderer (`treeToMarkdown`) and + * the media-URL canonicalizer it calls, so the canonical body is produced where + * the bytes originate (decision 0019). This module keeps only `semanticEquivalent`, + * which is sync *policy* (the push integrity gate), not the wire form itself. */ - -/* - * Soft line breaks inside a paragraph (a literal `\n` in source) render as - * hard line breaks on Notion. Collapse them to single spaces so a logical - * paragraph survives as one Notion block. Authors who want a hard break must - * use the explicit `break` node (two trailing spaces or a backslash). - */ -const unwrapSoftBreaks: () => (tree: unknown) => void = () => (tree) => { - visit(tree as never, 'text', (node: { value: string }) => { - if (node.value.includes('\n') === true) { - node.value = node.value.replace(/[ \t]*\n[ \t]*/g, ' ') - } - }) -} - -const processor = unified() - .use(remarkParse) - .use(remarkGfm) - .use(unwrapSoftBreaks) - .use(remarkStringify, { - bullet: '-', - emphasis: '_', - strong: '*', - fence: '`', - fences: true, - listItemIndent: 'one', - rule: '-', - setext: false, - tightDefinitions: true, - }) - -/** Reduce arbitrary Markdown to the canonical form used for hashing and wire transfer. 
*/ -export const canonicalizeBlockMarkdown = (markdown: string): string => { - const normalized = markdown.replace(/\r\n/g, '\n').replace(/\r/g, '\n') - const rendered = processor.processSync(normalized).toString() - return rendered.endsWith('\n') === true ? rendered : `${rendered}\n` -} +export { canonicalizeBlockMarkdown } from '@overeng/notion-effect-client' /* * Split markdown into alternating non-code and fenced-code segments. Lets diff --git a/packages/@overeng/notion-md/src/cli-program.ts b/packages/@overeng/notion-md/src/cli-program.ts index c292982b6..c9a020d7b 100644 --- a/packages/@overeng/notion-md/src/cli-program.ts +++ b/packages/@overeng/notion-md/src/cli-program.ts @@ -16,11 +16,19 @@ import { statusMany, syncMany, } from './batch.ts' -import { NmdCliError, NmdTokenMissingError } from './errors.ts' +import { + catEditorPage, + editEditorPage, + editReadOnlyPage, + putEditorPage, + type EditorMode, +} from './editor-commands.ts' +import { NmdCliError, NmdTokenMissingError, NmdUnresolvablePageError } from './errors.ts' import { NotionMdGatewayLive } from './live.ts' import type { NotionMdGateway } from './model.ts' import { annotateAttrs, withOperation } from './observability.ts' import { planPath, statusPath, syncPath, targetKind } from './path.ts' +import { ProgressReporterStderrLines } from './progress.ts' import { NmdStateStoreLive, type NmdStateStore } from './state-store.ts' import { pullPage, syncPage, type SyncOptions } from './sync.ts' import { NOTION_MD_VERSION } from './version.ts' @@ -662,9 +670,184 @@ const syncCommand = Command.make( ), ) +// --------------------------------------------------------------------------- +// Editor surfaces: cat / put / edit (VRS "Editor Surfaces") +// --------------------------------------------------------------------------- + +const pageArg = Args.text({ name: 'page' }).pipe( + Args.withDescription('Notion page id, dashed id, or URL'), + Args.withSchema(NonEmptyCliText), +) + +const frontmatterOption = 
Options.boolean('frontmatter').pipe( + Options.withDescription( + 'Use the full strict `.nmd` envelope instead of the default `# title` + body', + ), + Options.withDefault(false), +) + +const baseHashOption = Options.text('base-hash').pipe( + Options.withDescription('Optimistic-concurrency token from a prior `cat` (guards the write)'), + Options.optional, +) + +const readOnlyOption = Options.boolean('read-only').pipe( + Options.withDescription( + 'Open the page in $EDITOR for inspection only; discard edits and never push (like `vim -R`)', + ), + Options.withDefault(false), +) + +/** Resolve a `<page>` token to a Notion page id, failing with exit 4 when unresolvable. */ +const resolvePageArg = (page: string): Effect.Effect<string, NmdUnresolvablePageError> => { + const parsed = parseNotionPageRef(page) + return parsed === undefined + ? Effect.fail( + new NmdUnresolvablePageError({ + page, + message: `\`${page}\` is not a valid Notion page id, dashed id, or URL.`, + }), + ) + : Effect.succeed(parsed) +} + +/** Read all of stdin as a UTF-8 string (the `put` body buffer). */ +const readStdin = (): Effect.Effect<string> => + Effect.async<string>((resume) => { + const chunks: Buffer[] = [] + process.stdin.on('data', (chunk: Buffer) => chunks.push(chunk)) + process.stdin.on('end', () => resume(Effect.succeed(Buffer.concat(chunks).toString('utf8')))) + process.stdin.on('error', () => resume(Effect.succeed(Buffer.concat(chunks).toString('utf8')))) + process.stdin.resume() + }) + +const catCommand = Command.make( + 'cat', + { page: pageArg, frontmatter: frontmatterOption }, + ({ page, frontmatter }) => { + const mode: EditorMode = frontmatter === true ? 
'frontmatter' : 'default' + return commandSpan({ + command: 'cat', + label: basename(page), + effect: resolvePageArg(page).pipe( + Effect.flatMap((pageId) => withNotion(catEditorPage({ pageId, mode }))), + Effect.asVoid, + ), + }) + }, +).pipe( + Command.withDescription( + 'Print a Notion page as editor Markdown (`# title` + body) with the base hash on stderr; `--frontmatter` dumps the full `.nmd` envelope', + ), +) + +const putCommand = Command.make( + 'put', + { page: pageArg, baseHash: baseHashOption, force: forceOption }, + ({ page, baseHash, force }) => + commandSpan({ + command: 'put', + label: basename(page), + effect: resolvePageArg(page).pipe( + Effect.flatMap((pageId) => + force === false && Option.isNone(baseHash) === true + ? Effect.fail( + new NmdCliError({ + message: + 'put requires either --base-hash <hash> (guarded; capture it from `cat`) or --force (concurrency override).', + }), + ) + : readStdin().pipe( + Effect.flatMap((buffer) => + withNotion( + putEditorPage({ + pageId, + buffer, + force, + ...(Option.isSome(baseHash) === true + ? { baseHash: baseHash.value as never } + : {}), + }), + ), + ), + ), + ), + Effect.flatMap(logJson), + ), + }), +).pipe( + Command.withDescription( + 'Write editor Markdown from stdin (`# title` + body) back to a Notion page; guarded by --base-hash, or --force to override concurrency', + ), +) + +/** + * Build the `edit` command. Shared by the `notion-md`/`notion md` subcommand and + * the top-level `notion edit <page>` alias (R18) so both delegate to the exact + * same engine-backed session. + */ +const makeEditCommand = (name: string) => + Command.make( + name, + { page: pageArg, frontmatter: frontmatterOption, readOnly: readOnlyOption }, + ({ page, frontmatter, readOnly }) => { + const mode: EditorMode = frontmatter === true ? 
'frontmatter' : 'default' + return commandSpan({ + command: 'edit', + label: basename(page), + // `edit` exposes no `--force` (force lives on `put`/`sync`), so the + // documented `--read-only`/`--force` contradiction cannot be expressed + // here — there is nothing to reject. + effect: resolvePageArg(page).pipe( + Effect.flatMap((pageId) => + readOnly === true + ? withNotion(editReadOnlyPage({ pageId, mode })).pipe( + Effect.map((result): unknown => result), + ) + : withNotion(editEditorPage({ pageId, mode, pageRef: page })).pipe( + Effect.map((result): unknown => result), + /* + * Staged write-path progress (R43–R45, decision 0018): wire the + * live stderr-line reporter ONLY on the `edit` push path, and + * only when stderr is a TTY — a piped/redirected write provides + * nothing (Layer.empty → serviceOption None → silent), keeping + * the path byte-identical and pipe-safe (R44/R45). Constructed + * lazily inside the handler (no TUI graph, no #787 TDZ risk). + */ + Effect.provide( + process.stderr.isTTY === true ? ProgressReporterStderrLines : Layer.empty, + ), + ), + ), + Effect.flatMap(logJson), + ), + }) + }, + ).pipe( + Command.withDescription( + 'Edit a Notion page in $EDITOR via an ephemeral .nmd session, then push the change through the sync engine; `--read-only` inspects without pushing', + ), + ) + +const editCommand = makeEditCommand('edit') + +/** + * Top-level `notion edit <page>` alias (R18) — the marquee editor verb, wired + * into the umbrella root alongside `md`/`schema`/`db`. Delegates to the same + * `editEditorPage` session as `notion md edit`. 
+ */ +export const notionEditAliasCommand = makeEditCommand('edit') + const makeNotionMdCommand = (name: 'md' | 'notion-md') => Command.make(name).pipe( - Command.withSubcommands([statusCommand, planCommand, syncCommand]), + Command.withSubcommands([ + statusCommand, + planCommand, + syncCommand, + catCommand, + putCommand, + editCommand, + ]), Command.withDescription('Two-way Notion enhanced Markdown sync'), ) diff --git a/packages/@overeng/notion-md/src/cli.ts b/packages/@overeng/notion-md/src/cli.ts index 9ff772fe9..046dacd54 100644 --- a/packages/@overeng/notion-md/src/cli.ts +++ b/packages/@overeng/notion-md/src/cli.ts @@ -1,11 +1,20 @@ #!/usr/bin/env bun import { NodeContext, NodeRuntime } from '@effect/platform-node' -import { Effect, Layer } from 'effect' +import { Effect, type Exit, Layer } from 'effect' import { makeOtelCliLayer } from '@overeng/utils/node/otel' import { cli, renderCliError } from './cli-program.ts' +import { editorExitCode } from './exit-codes.ts' + +/** + * Map the program `Exit` to the editor-surface exit-code contract (exit-codes.ts). + * Runs after every scope/finalizer closes, so `edit`'s temp-dir cleanup is safe. 
+ */ +const editorTeardown = <E, A>(exit: Exit.Exit<E, A>, onExit: (code: number) => void): void => { + onExit(editorExitCode(exit)) +} const toEffectCliArgv = ({ binaryName, @@ -30,5 +39,5 @@ export const runCliMain = ({ ) if (import.meta.main) { - runCliMain().pipe(NodeRuntime.runMain({ disableErrorReporting: true })) + runCliMain().pipe(NodeRuntime.runMain({ disableErrorReporting: true, teardown: editorTeardown })) } diff --git a/packages/@overeng/notion-md/src/editor-commands.ts b/packages/@overeng/notion-md/src/editor-commands.ts new file mode 100644 index 000000000..fd31968cf --- /dev/null +++ b/packages/@overeng/notion-md/src/editor-commands.ts @@ -0,0 +1,633 @@ +import { join } from 'node:path' + +import { Command as PlatformCommand, type CommandExecutor, FileSystem } from '@effect/platform' +import { Console, Effect } from 'effect' + +import { describeBodyLossyRefusal } from '@overeng/notion-core' +import type { Sha256Digest } from '@overeng/notion-effect-client' + +import { + observeRemoteEditorPage, + replaceRemoteBodyForced, + replaceRemoteBodyVerified, +} from './body-facade.ts' +import { semanticEquivalent } from './canonical-markdown.ts' +import { editorBaseHash, parseTitleBody, serializeTitleBody } from './editor-surface.ts' +import { + NmdConflictError, + NmdEditorAbortedError, + NmdGatewayError, + NmdInvalidDocumentError, + NmdPartialWriteError, + NmdPostPushGateError, + NmdRemoteBodyLossyError, + type NmdError, +} from './errors.ts' +import { parseNmdFile, renderNmdFile } from './frontmatter.ts' +import { normalizeMarkdownLineEndings, sha256Digest } from './hash.ts' +import { NotionMdGateway } from './model.ts' +import { + annotateAttrs, + CatSpan, + EditSpan, + editResultAttrs, + PutSpan, + putResultAttrs, + withOperation, +} from './observability.ts' +import { reportNote } from './progress.ts' +import type { NmdStateStore } from './state-store.ts' +import { buildFrontmatterV2, pullPage, syncPageReplacingBody } from './sync.ts' + +/** Editor 
representation mode for `cat` / `put` / `edit`. */ +export type EditorMode = 'default' | 'frontmatter' + +// --------------------------------------------------------------------------- +// cat +// --------------------------------------------------------------------------- + +/** Inputs for the `cat` editor projection of a Notion page. */ +export interface CatOptions { + readonly pageId: string + readonly mode: EditorMode + /** Sink for the base-hash stderr line (decision 0002). Overridable for tests. */ + readonly writeStderr?: (line: string) => Effect.Effect<void> + /** Sink for the Markdown / envelope stdout. Overridable for tests. */ + readonly writeStdout?: (value: string) => Effect.Effect<void> +} + +/** Outcome of a `cat` invocation (base hash present only in default mode). */ +export interface CatResult { + readonly pageId: string + readonly mode: EditorMode + readonly baseHash?: Sha256Digest +} + +/** A projected editor buffer for a page (base hash present only in default mode). */ +interface ProjectedPageBuffer { + readonly pageId: string + readonly buffer: string + readonly baseHash?: Sha256Digest +} + +/** + * Project a Notion page into the editor buffer for the active mode, refusing a + * lossy page (exit 3) at observe/pull time. The single source of truth for the + * `cat` / read-only `edit` presentation: default mode is `# title` + body (via + * `observeRemoteEditorPage`), `--frontmatter` is the full strict `.nmd` envelope + * (via the engine pull). Pure projection — no stdout/stderr side effects — so + * each caller owns its own sinks (`cat`'s `base-hash:` line, the byte-exact pipe + * output) without duplicating the lossy-refusal logic. 
+ */ +const projectPageBuffer = (opts: { + readonly pageId: string + readonly mode: EditorMode +}): Effect.Effect<ProjectedPageBuffer, NmdError, NotionMdGateway> => + Effect.gen(function* () { + if (opts.mode === 'frontmatter') { + const gateway = yield* NotionMdGateway + const pulled = yield* gateway.pullPage({ pageId: opts.pageId }) + const completeness = pulled.markdown.completeness + if (completeness !== undefined && completeness._tag !== 'complete') { + return yield* new NmdRemoteBodyLossyError({ + operation: 'cat_frontmatter', + page_id: pulled.page.id, + reasons: [...completeness.reasons], + message: describeBodyLossyRefusal({ + pageId: pulled.page.id, + completeness, + context: 'refusing to dump a lossy page envelope', + }), + }) + } + return { + pageId: pulled.page.id, + buffer: renderNmdFile({ + frontmatter: buildFrontmatterV2({ page: pulled.page }), + body: pulled.markdown.markdown, + }), + } + } + + const snapshot = yield* observeRemoteEditorPage({ pageId: opts.pageId }) + return { + pageId: snapshot.pageId, + buffer: serializeTitleBody({ title: snapshot.title, body: snapshot.body }), + baseHash: snapshot.baseHash, + } + }) + +/** + * `cat <page> [--frontmatter]` — emit the editor projection of a Notion page. + * + * Default mode prints `# <title>` + body to stdout and the title+body base hash + * (decisions 0001/0002/0006) to **stderr** (`base-hash: sha256:…`), keeping + * stdout pure Markdown for clean piping. `--frontmatter` is a read-only dump of + * the full strict `.nmd` envelope (no store written, decision 0017). Both modes + * refuse a lossy page (exit 3) at observe time. + */ +export const catEditorPage = ( + opts: CatOptions, +): Effect.Effect<CatResult, NmdError, NotionMdGateway> => + Effect.gen(function* () { + // Byte-exact stdout (no trailing newline added): `serializeTitleBody` / + // `renderNmdFile` already end in `\n`, and the exact bytes are load-bearing + // for the cross-machine base hash an independent implementation reproduces. 
+ const writeStdout = + opts.writeStdout ?? ((value: string) => Effect.sync(() => void process.stdout.write(value))) + const writeStderr = opts.writeStderr ?? Console.error + + const projected = yield* projectPageBuffer({ pageId: opts.pageId, mode: opts.mode }) + yield* writeStdout(projected.buffer) + if (projected.baseHash !== undefined) { + yield* writeStderr(`base-hash: ${projected.baseHash}`) + } + return { + pageId: projected.pageId, + mode: opts.mode, + ...(projected.baseHash === undefined ? {} : { baseHash: projected.baseHash }), + } + }).pipe(withOperation(CatSpan, { pageId: opts.pageId, mode: opts.mode })) + +// --------------------------------------------------------------------------- +// put +// --------------------------------------------------------------------------- + +/** Inputs for a guarded `put` title+body write. */ +export interface PutOptions { + readonly pageId: string + /** Default-mode editor buffer: `# <title>` + body. */ + readonly buffer: string + /** Optimistic-concurrency token from a prior `cat` (decision 0002). */ + readonly baseHash?: Sha256Digest + /** Concurrency-only override (decision 0009): bypass the exit-7 guard, nothing else. */ + readonly force: boolean +} + +/** Outcome of a completed `put`: which surfaces were written and the new base hash. */ +export interface PutResult { + readonly pageId: string + readonly bodyWritten: boolean + readonly titleWritten: boolean + readonly forced: boolean + readonly baseHash: Sha256Digest +} + +/** + * `put <page> (--base-hash <h> | --force)` — guarded title+body write. + * + * Two writes, body first (decision 0012): the body via `replaceRemoteBodyVerified` + * (a `replace_content` that can never destroy an opaque block, since the page is + * representable), then the title via the typed page API. A partial failure (one + * write landed, the other failed) reports which landed and exits 10. 
The default + * guard re-reads the remote, recomputes the title+body hash, and refuses with + * exit 7 on drift; `--force` bypasses only that guard (decision 0009). There is + * no `--frontmatter` write (decision 0017). + */ +export const putEditorPage = ( + opts: PutOptions, +): Effect.Effect< + PutResult, + | NmdError + | NmdInvalidDocumentError + | NmdConflictError + | NmdPartialWriteError + | NmdPostPushGateError, + NotionMdGateway +> => + Effect.gen(function* () { + const gateway = yield* NotionMdGateway + const doc = yield* parseTitleBody({ buffer: opts.buffer, pageId: opts.pageId }) + + // Observe the current remote title+body (refuses a lossy page, exit 3) and + // form the guard base. + const current = yield* observeRemoteEditorPage({ pageId: opts.pageId }) + + if (opts.force === false) { + if (opts.baseHash === undefined) { + // Defensive: the CLI enforces --base-hash | --force, but keep the engine honest. + return yield* new NmdInvalidDocumentError({ + page_id: opts.pageId, + message: + 'put requires either --base-hash <hash> (guarded) or --force (concurrency override).', + }) + } + if (current.baseHash !== opts.baseHash) { + return yield* new NmdConflictError({ + path: opts.pageId, + page_id: opts.pageId, + local_changed: true, + remote_changed: true, + message: + `Remote page ${opts.pageId} moved since the supplied --base-hash ` + + `(expected ${opts.baseHash}, current ${current.baseHash}). Re-cat and retry, or --force.`, + }) + } + } + + // --- Write 1: body (replace_content). --- + // Guarded: `current.body` is the facade-normalized remote body, so its digest + // is exactly the body-only base hash `replaceRemoteBodyVerified` expects; a + // remote change between this observe and the facade's internal re-pull trips + // the inner guard and maps to exit 7 (a correct bonus TOCTOU catch). Force: + // skip the pre-write compare entirely (last-writer-wins, decision 0009) — the + // verified path would *die* on a concurrent change rather than overwrite. 
+ const replaced = + opts.force === true + ? yield* replaceRemoteBodyForced({ pageId: opts.pageId, markdown: doc.body }) + : yield* replaceRemoteBodyVerified({ + pageId: opts.pageId, + baseBodyHash: sha256Digest(normalizeMarkdownLineEndings(current.body)), + markdown: doc.body, + }).pipe( + Effect.catchTag( + 'NotionMdBodyConflictError', + () => + new NmdConflictError({ + path: opts.pageId, + page_id: opts.pageId, + local_changed: true, + remote_changed: true, + message: `Remote body for page ${opts.pageId} changed before verified replace; re-cat and retry, or --force.`, + }), + ), + ) + + // --- Write 2: title (typed page API). A failure here is a partial write. --- + if (doc.title !== current.title) { + yield* gateway + .updatePageMetadata({ + pageId: opts.pageId, + metadata: { title: { key: current.titlePropertyKey, value: doc.title } }, + }) + .pipe( + Effect.mapError( + (cause) => + new NmdPartialWriteError({ + page_id: opts.pageId, + body_written: true, + title_written: false, + message: + `put landed the body write but the title write failed for page ${opts.pageId}; ` + + 'the page is in a mixed state with a stale base hash. Re-cat to recover.', + cause, + }), + ), + ) + } + + // --- Post-push semantic-equivalence gate (decision 0012; exit 9). --- + if (semanticEquivalent({ a: replaced.markdown, b: doc.body }) === false) { + return yield* new NmdPostPushGateError({ + page_id: opts.pageId, + message: + `Post-push gate rejected the result for page ${opts.pageId}: the stored body is not ` + + 'semantically equivalent to what was sent. 
The page may be mutated; re-cat to inspect.', + }) + } + + const finalBaseHash = editorBaseHash({ title: doc.title, body: replaced.markdown }) + return { + pageId: opts.pageId, + bodyWritten: true, + titleWritten: doc.title !== current.title, + forced: opts.force, + baseHash: finalBaseHash, + } + }).pipe( + Effect.tap((result) => + annotateAttrs(putResultAttrs, { + bodyWritten: result.bodyWritten, + titleWritten: result.titleWritten, + }), + ), + withOperation(PutSpan, { pageId: opts.pageId, force: opts.force }), + ) + +// --------------------------------------------------------------------------- +// edit +// --------------------------------------------------------------------------- + +/** Inputs for an ephemeral `edit` editor session. */ +export interface EditOptions { + readonly pageId: string + readonly mode: EditorMode + /** Original `<page>` token, used to name the durable conflict sibling. */ + readonly pageRef: string + /** + * Launch the editor on the buffer file. Defaults to spawning + * `$VISUAL`→`$EDITOR`→`vi` and returning its exit code. Overridable for tests + * (a scripted non-interactive editor). + */ + readonly runEditor?: (opts: { + readonly filePath: string + }) => Effect.Effect< + number, + NmdGatewayError, + CommandExecutor.CommandExecutor | FileSystem.FileSystem + > +} + +/** Outcome of an `edit` session: pushed, no-op, or relocated-conflict. */ +export interface EditResult { + readonly pageId: string + readonly outcome: 'pushed' | 'noop' | 'conflict' + readonly conflictPath?: string +} + +const resolveEditorCommand = (): string => process.env['VISUAL'] ?? process.env['EDITOR'] ?? 'vi' + +const defaultRunEditor = (opts: { + readonly filePath: string +}): Effect.Effect<number, NmdGatewayError, CommandExecutor.CommandExecutor> => + Effect.gen(function* () { + const editor = resolveEditorCommand() + // Split a possibly-flagged editor command (e.g. `code --wait`) on whitespace. 
+ const [bin, ...args] = editor.split(/\s+/u).filter((part) => part.length > 0) + const command = PlatformCommand.make(bin ?? 'vi', ...args, opts.filePath).pipe( + PlatformCommand.stdin('inherit'), + PlatformCommand.stdout('inherit'), + PlatformCommand.stderr('inherit'), + ) + return yield* PlatformCommand.exitCode(command).pipe( + Effect.mapError( + (cause) => + new NmdGatewayError({ + operation: 'edit_spawn_editor', + message: `Failed to launch editor \`${editor}\`: ${String(cause)}`, + cause, + }), + ), + ) + }) + +/** + * `edit <page> [--frontmatter]` — ephemeral file-engine editor session + * (decision 0017). Not a second push engine: pull the page into a `.nmd` + + * `.notion-md/` under `$TMPDIR`, present the body in `$EDITOR`, splice the edit + * back, push through the engine's guarded `syncPage` (forcing a full-body + * `replace_content`), relocate any `.conflict.roughdraft.md` out of `$TMPDIR` to + * a durable `<page>.conflict.md`, and scope-clean the temp tree on every path + * (success / conflict / abort / interrupt). A non-zero editor exit aborts with + * exit 8 and nothing is pushed; an unchanged buffer is a no-op. + */ +export const editEditorPage = ( + opts: EditOptions, +): Effect.Effect< + EditResult, + NmdError | NmdInvalidDocumentError | NmdEditorAbortedError, + FileSystem.FileSystem | NotionMdGateway | NmdStateStore | CommandExecutor.CommandExecutor +> => + Effect.scoped( + Effect.gen(function* () { + const fs = yield* FileSystem.FileSystem + const runEditor = opts.runEditor ?? defaultRunEditor + const dir = yield* fs.makeTempDirectoryScoped({ prefix: 'notion-md-edit-' }).pipe( + Effect.mapError( + (cause) => + new NmdGatewayError({ + operation: 'edit_mktemp', + page_id: opts.pageId, + message: `Failed to create editor session temp dir: ${String(cause)}`, + cause, + }), + ), + ) + const nmdPath = join(dir, 'page.nmd') + + // 1. Pull into the ephemeral session (refuses a lossy page here, exit 3). 
+ yield* pullPage({ pageId: opts.pageId, outPath: nmdPath }) + + // 2. Project the editor buffer (default: # title + body; frontmatter: envelope). + const original = yield* fs + .readFileString(nmdPath) + .pipe(Effect.mapError(editorIoError({ pageId: opts.pageId, path: nmdPath }))) + const buffer = yield* projectEditorBuffer({ mode: opts.mode, envelope: original }) + const bufferPath = join(dir, opts.mode === 'frontmatter' ? 'page.nmd' : 'page.md') + yield* fs + .writeFileString(bufferPath, buffer) + .pipe(Effect.mapError(editorIoError({ pageId: opts.pageId, path: bufferPath }))) + + // 3. Launch the editor. + const exitCode = yield* runEditor({ filePath: bufferPath }) + if (exitCode !== 0) { + return yield* new NmdEditorAbortedError({ + page_id: opts.pageId, + editor: resolveEditorCommand(), + exit_code: exitCode, + message: `Editor exited with code ${exitCode}; nothing was pushed for page ${opts.pageId}.`, + }) + } + + const edited = yield* fs + .readFileString(bufferPath) + .pipe(Effect.mapError(editorIoError({ pageId: opts.pageId, path: bufferPath }))) + + // 4. Unchanged buffer → no-op. + if (edited === buffer) { + return { pageId: opts.pageId, outcome: 'noop' as const } + } + + // 5. Splice the edit back into the envelope and write the temp .nmd. + const reattached = yield* reattachEditorBuffer({ + mode: opts.mode, + envelope: original, + edited, + pageId: opts.pageId, + path: nmdPath, + }) + yield* fs + .writeFileString(nmdPath, reattached) + .pipe(Effect.mapError(editorIoError({ pageId: opts.pageId, path: nmdPath }))) + + // 6. Push through the engine (full-body replace_content, decision 0017). 
+ return yield* syncPageReplacingBody({ path: nmdPath }).pipe( + Effect.as<EditResult>({ pageId: opts.pageId, outcome: 'pushed' }), + Effect.catchTag('NmdConflictError', (error) => + relocateConflict({ fs, error, pageRef: opts.pageRef, pageId: opts.pageId }), + ), + ) + }), + ).pipe( + Effect.tap((result) => annotateAttrs(editResultAttrs, { outcome: result.outcome })), + Effect.catchTag('NmdEditorAbortedError', (error) => + annotateAttrs(editResultAttrs, { outcome: 'aborted' }).pipe( + Effect.zipRight(Effect.fail(error)), + ), + ), + withOperation(EditSpan, { pageId: opts.pageId, mode: opts.mode }), + ) + +// --------------------------------------------------------------------------- +// edit --read-only +// --------------------------------------------------------------------------- + +/** Inputs for a read-only `edit --read-only` inspection session. */ +export interface ReadOnlyEditOptions { + readonly pageId: string + readonly mode: EditorMode + /** + * Sink for the `read-only: changes were not synced` note. Defaults to + * `Console.error` (stderr); overridable for tests. + */ + readonly writeStderr?: (line: string) => Effect.Effect<void> + /** + * Launch the editor on the buffer file. Same default as `edit` + * (`$VISUAL`→`$EDITOR`→`vi`); overridable for tests. + */ + readonly runEditor?: (opts: { + readonly filePath: string + }) => Effect.Effect< + number, + NmdGatewayError, + CommandExecutor.CommandExecutor | FileSystem.FileSystem + > +} + +/** Outcome of a read-only `edit --read-only` session (always a discarding no-op). */ +export interface ReadOnlyEditResult { + readonly pageId: string + readonly outcome: 'read-only' +} + +/** + * `edit <page> --read-only [--frontmatter]` — open the page in `$EDITOR` for + * inspection only (the terminal analogue of `vim -R` / `git show`). 
+ * + * Reuses the exact `cat` presentation via `projectPageBuffer` (default `# title` + * + body, or the full `.nmd` envelope with `--frontmatter`), refusing a lossy + * page (exit 3) at observe/pull time just like `edit`/`cat`. Unlike `edit`, this + * is a deliberately lighter path: a single observe/pull into a `$TMPDIR` temp + * file (no engine round-trip, no `NmdStateStore`). On editor exit — **regardless + * of the exit code** — nothing is ever pushed or written remotely (no + * `syncPage`, no `replaceRemoteBodyVerified`, no metadata/property writes); every + * edit is discarded, the scoped temp tree is reaped, a `read-only: changes were + * not synced` note is printed to stderr, and the session exits 0. There is no + * base-hash/guard machinery because nothing is written. + */ +export const editReadOnlyPage = ( + opts: ReadOnlyEditOptions, +): Effect.Effect< + ReadOnlyEditResult, + NmdError, + FileSystem.FileSystem | NotionMdGateway | CommandExecutor.CommandExecutor +> => + Effect.scoped( + Effect.gen(function* () { + const fs = yield* FileSystem.FileSystem + const runEditor = opts.runEditor ?? defaultRunEditor + const writeStderr = opts.writeStderr ?? Console.error + + // 1. Observe/pull the page projection (refuses a lossy page here, exit 3). + const projected = yield* projectPageBuffer({ pageId: opts.pageId, mode: opts.mode }) + + // 2. Materialize the buffer in a scoped temp tree (reaped on every path). + const dir = yield* fs.makeTempDirectoryScoped({ prefix: 'notion-md-view-' }).pipe( + Effect.mapError( + (cause) => + new NmdGatewayError({ + operation: 'edit_read_only_mktemp', + page_id: opts.pageId, + message: `Failed to create read-only session temp dir: ${String(cause)}`, + cause, + }), + ), + ) + const bufferPath = join(dir, opts.mode === 'frontmatter' ? 'page.nmd' : 'page.md') + yield* fs + .writeFileString(bufferPath, projected.buffer) + .pipe(Effect.mapError(editorIoError({ pageId: opts.pageId, path: bufferPath }))) + + // 3. 
Launch the editor for inspection. The exit code is irrelevant: there + // is nothing to push, so a non-zero exit is just a clean no-op too. + yield* runEditor({ filePath: bufferPath }) + + // 4. Always discard. Never push, never write anything remote. + yield* writeStderr('read-only: changes were not synced') + return { pageId: opts.pageId, outcome: 'read-only' as const } + }), + ).pipe( + Effect.tap((result) => annotateAttrs(editResultAttrs, { outcome: result.outcome })), + withOperation(EditSpan, { pageId: opts.pageId, mode: opts.mode }), + ) + +const editorIoError = + (ctx: { readonly pageId: string; readonly path: string }) => + (cause: unknown): NmdGatewayError => + new NmdGatewayError({ + operation: 'edit_session_io', + page_id: ctx.pageId, + message: `Editor session IO failed for ${ctx.path}: ${String(cause)}`, + cause, + }) + +/** Project the temp `.nmd` envelope into the editor buffer for the active mode. */ +const projectEditorBuffer = (opts: { + readonly mode: EditorMode + readonly envelope: string +}): Effect.Effect<string, NmdError | NmdInvalidDocumentError> => + Effect.gen(function* () { + if (opts.mode === 'frontmatter') return opts.envelope + const parsed = yield* parseNmdFile({ path: 'edit-session', content: opts.envelope }) + return serializeTitleBody({ title: parsed.frontmatter.notion_md.page.title, body: parsed.body }) + }) + +/** Splice the edited buffer back into the temp `.nmd` envelope for the active mode. 
*/ +const reattachEditorBuffer = (opts: { + readonly mode: EditorMode + readonly envelope: string + readonly edited: string + readonly pageId: string + readonly path: string +}): Effect.Effect<string, NmdError | NmdInvalidDocumentError> => + Effect.gen(function* () { + if (opts.mode === 'frontmatter') return opts.edited + const parsed = yield* parseNmdFile({ path: opts.path, content: opts.envelope }) + const doc = yield* parseTitleBody({ buffer: opts.edited, pageId: opts.pageId }) + return renderNmdFile({ + frontmatter: { + notion_md: { + ...parsed.frontmatter.notion_md, + page: { ...parsed.frontmatter.notion_md.page, title: doc.title }, + }, + }, + body: doc.body, + }) + }) + +/** + * Copy the engine's `.conflict.roughdraft.md` (written inside `$TMPDIR`, which is + * reaped on scope close) to a durable `<page>.conflict.md` sibling in the cwd so + * a conflicted edit is recoverable, and return a conflict outcome. + */ +const relocateConflict = (opts: { + readonly fs: FileSystem.FileSystem + readonly error: NmdConflictError + readonly pageRef: string + readonly pageId: string +}): Effect.Effect<EditResult, NmdError> => + Effect.gen(function* () { + const source = opts.error.conflict_path + const durable = `${sanitizePageRef(opts.pageRef)}.conflict.md` + if (source !== undefined) { + const contents = yield* opts.fs + .readFileString(source) + .pipe(Effect.mapError(editorIoError({ pageId: opts.pageId, path: source }))) + yield* opts.fs + .writeFileString(durable, contents) + .pipe(Effect.mapError(editorIoError({ pageId: opts.pageId, path: durable }))) + } + /* + * Surface the conflict on stderr with the DURABLE path (not the engine's + * $TMPDIR roughdraft, which is reaped when the session scope closes). The + * note is emitted here, after relocation, so the path it names outlives the + * session. 
+ */ + yield* reportNote( + `remote changed and overlaps your edit — wrote conflict draft to ${durable}; nothing pushed`, + ) + return { pageId: opts.pageId, outcome: 'conflict', conflictPath: durable } + }) + +/** Turn an arbitrary `<page>` token into a filesystem-safe basename. */ +const sanitizePageRef = (pageRef: string): string => + pageRef.replace(/^https?:\/\//u, '').replace(/[^A-Za-z0-9._-]/gu, '_') || 'page' diff --git a/packages/@overeng/notion-md/src/editor-commands.unit.test.ts b/packages/@overeng/notion-md/src/editor-commands.unit.test.ts new file mode 100644 index 000000000..bcf304dab --- /dev/null +++ b/packages/@overeng/notion-md/src/editor-commands.unit.test.ts @@ -0,0 +1,291 @@ +import { Effect, Layer } from 'effect' +import { describe, expect, it } from 'vitest' + +import type { BodyCompleteness } from '@overeng/notion-core' + +import { catEditorPage, putEditorPage } from './editor-commands.ts' +import { editorBaseHash } from './editor-surface.ts' +import { NmdGatewayError } from './errors.ts' +import { normalizeMarkdownLineEndings } from './hash.ts' +import { + NotionMdGateway, + type NotionMdGatewayShape, + type PullPageResult, + type RemotePageSnapshot, +} from './model.ts' + +const pageId = '00000000-0000-4000-8000-000000000001' + +const lossyCompleteness: BodyCompleteness = { + _tag: 'lossy', + reasons: ['not_round_trip_safe_blocks'], + lossyBlockTypes: ['child_database'], +} + +interface FakeState { + title: string + body: string + completeness?: BodyCompleteness +} + +const pageSnapshot = (state: FakeState): RemotePageSnapshot => ({ + id: pageId, + title: state.title, + title_property_key: 'title', + url: 'https://notion.so/page', + parent: { type: 'workspace', workspace: true }, + icon: null, + cover: null, + in_trash: false, + is_locked: false, + last_edited_time: '2026-06-15T12:00:00.000Z', + properties: {}, +}) + +const pullResult = (state: FakeState): PullPageResult => ({ + page: pageSnapshot(state), + markdown: { + markdown: 
normalizeMarkdownLineEndings(state.body), + truncated: false, + unknown_block_ids: [], + ...(state.completeness === undefined ? {} : { completeness: state.completeness }), + }, + storage: { _tag: 'self_contained', unsupported_blocks: [], files: [], comments: [] }, +}) + +/** A mutable fake gateway whose body/title can be edited and read back. */ +class FakeGateway { + readonly state: FakeState + readonly metadataCalls: Array<{ readonly title?: { readonly value: string } }> = [] + readonly markdownCalls: Array<{ readonly markdown: string }> = [] + /** Body returned by the next pull only, simulating a concurrent remote writer. */ + private injectNextPullBody: string | undefined = undefined + pullCount = 0 + /** When true, the title metadata write fails (to exercise the exit-10 partial write). */ + failTitleWrite = false + + constructor(initial: FakeState) { + this.state = { ...initial, body: normalizeMarkdownLineEndings(initial.body) } + } + + /** Simulate a concurrent writer: the next `pullPage` reports a different body. */ + concurrentlyChangeBodyOnce(body: string): void { + this.injectNextPullBody = normalizeMarkdownLineEndings(body) + } + + readonly layer = Layer.succeed(NotionMdGateway, { + pullPage: () => + Effect.sync(() => { + this.pullCount += 1 + if (this.injectNextPullBody !== undefined) { + this.state.body = this.injectNextPullBody + this.injectNextPullBody = undefined + } + return pullResult(this.state) + }), + updateMarkdown: ({ command }) => + Effect.sync(() => { + if (command._tag === 'replace_content') { + this.state.body = normalizeMarkdownLineEndings(command.markdown) + this.markdownCalls.push({ markdown: command.markdown }) + } + return { markdown: pullResult(this.state).markdown } + }), + updatePageProperties: () => Effect.dieMessage('unexpected updatePageProperties'), + retrieveDataSource: () => Effect.dieMessage('unexpected retrieveDataSource'), + updatePageMetadata: ({ metadata }) => + this.failTitleWrite === true + ? 
Effect.fail( + new NmdGatewayError({ + operation: 'update_page_metadata', + page_id: pageId, + message: 'forced title write failure', + }), + ) + : Effect.sync(() => { + if (metadata.title !== undefined) { + this.state.title = metadata.title.value + this.metadataCalls.push({ title: { value: metadata.title.value } }) + } + return pageSnapshot(this.state) + }), + listChildPages: () => Effect.succeed([]), + createPage: () => Effect.dieMessage('unexpected createPage'), + movePage: () => Effect.dieMessage('unexpected movePage'), + archivePage: () => Effect.dieMessage('unexpected archivePage'), + } satisfies NotionMdGatewayShape) +} + +const runWith = <A, E>(effect: Effect.Effect<A, E, NotionMdGateway>, gateway: FakeGateway) => + Effect.either(effect).pipe(Effect.provide(gateway.layer), Effect.runPromise) + +describe('cat', () => { + it('emits `# title` + body to stdout and the base hash to stderr', async () => { + const gateway = new FakeGateway({ title: 'Hello', body: 'world' }) + const stdout: string[] = [] + const stderr: string[] = [] + const result = await runWith( + catEditorPage({ + pageId, + mode: 'default', + writeStdout: (v) => Effect.sync(() => void stdout.push(v)), + writeStderr: (v) => Effect.sync(() => void stderr.push(v)), + }), + gateway, + ) + expect(result._tag).toBe('Right') + expect(stdout).toEqual(['# Hello\n\nworld\n']) + expect(stderr[0]).toBe(`base-hash: ${editorBaseHash({ title: 'Hello', body: 'world\n' })}`) + }) + + it('refuses a lossy page (NmdRemoteBodyLossyError → exit 3)', async () => { + const gateway = new FakeGateway({ title: 'X', body: 'y', completeness: lossyCompleteness }) + const result = await runWith( + catEditorPage({ + pageId, + mode: 'default', + writeStdout: () => Effect.void, + writeStderr: () => Effect.void, + }), + gateway, + ) + expect(result._tag).toBe('Left') + if (result._tag === 'Left') expect(result.left._tag).toBe('NmdRemoteBodyLossyError') + }) + + it('--frontmatter dumps the full envelope', async () => { + const 
gateway = new FakeGateway({ title: 'Env', body: 'body text' }) + const stdout: string[] = [] + const result = await runWith( + catEditorPage({ + pageId, + mode: 'frontmatter', + writeStdout: (v) => Effect.sync(() => void stdout.push(v)), + writeStderr: () => Effect.void, + }), + gateway, + ) + expect(result._tag).toBe('Right') + expect(stdout[0]).toContain('---\n') + expect(stdout[0]).toContain('"version": 2') + expect(stdout[0]).toContain('body text') + }) +}) + +describe('put', () => { + it('round-trips a cat→put fixpoint with a matching base hash', async () => { + const gateway = new FakeGateway({ title: 'Title', body: 'original' }) + const baseHash = editorBaseHash({ title: 'Title', body: 'original\n' }) + const result = await runWith( + putEditorPage({ + pageId, + buffer: '# Title\n\nedited body\n', + baseHash, + force: false, + }), + gateway, + ) + expect(result._tag).toBe('Right') + if (result._tag === 'Right') { + expect(result.right.bodyWritten).toBe(true) + expect(result.right.titleWritten).toBe(false) + } + expect(gateway.state.body).toBe('edited body\n') + expect(gateway.markdownCalls).toHaveLength(1) + }) + + it('writes the title through the typed API when it changed (body first, title last)', async () => { + const gateway = new FakeGateway({ title: 'Old', body: 'b' }) + const baseHash = editorBaseHash({ title: 'Old', body: 'b\n' }) + const result = await runWith( + putEditorPage({ pageId, buffer: '# New\n\nb\n', baseHash, force: false }), + gateway, + ) + expect(result._tag).toBe('Right') + expect(gateway.state.title).toBe('New') + expect(gateway.metadataCalls).toHaveLength(1) + }) + + it('refuses a stale base hash with NmdConflictError (exit 7)', async () => { + const gateway = new FakeGateway({ title: 'T', body: 'current' }) + const result = await runWith( + putEditorPage({ + pageId, + buffer: '# T\n\nnew\n', + baseHash: editorBaseHash({ title: 'T', body: 'STALE\n' }), + force: false, + }), + gateway, + ) + expect(result._tag).toBe('Left') + if 
(result._tag === 'Left') expect(result.left._tag).toBe('NmdConflictError') + // Nothing written on a guard refusal. + expect(gateway.markdownCalls).toHaveLength(0) + }) + + it('refuses a missing title H1 (NmdInvalidDocumentError → exit 5)', async () => { + const gateway = new FakeGateway({ title: 'T', body: 'b' }) + const result = await runWith( + putEditorPage({ pageId, buffer: 'no heading here\n', force: true }), + gateway, + ) + expect(result._tag).toBe('Left') + if (result._tag === 'Left') expect(result.left._tag).toBe('NmdInvalidDocumentError') + }) + + it('--force bypasses the stale-base guard (concurrency-only)', async () => { + const gateway = new FakeGateway({ title: 'T', body: 'current' }) + const result = await runWith( + putEditorPage({ pageId, buffer: '# T\n\nforced\n', force: true }), + gateway, + ) + expect(result._tag).toBe('Right') + expect(gateway.state.body).toBe('forced\n') + }) + + it('--force overwrites a remote that changed mid-flight (last-writer-wins, not a crash)', async () => { + const gateway = new FakeGateway({ title: 'T', body: 'observed' }) + // After the initial observe pull, a concurrent writer changes the body. + // A verified guard would *die* here; force must overwrite instead. + gateway.concurrentlyChangeBodyOnce('someone-elses-edit') + const result = await runWith( + putEditorPage({ pageId, buffer: '# T\n\nmy forced body\n', force: true }), + gateway, + ) + expect(result._tag).toBe('Right') + if (result._tag === 'Right') expect(result.right.bodyWritten).toBe(true) + expect(gateway.state.body).toBe('my forced body\n') + }) + + it('reports a partial write (body landed, title failed) as NmdPartialWriteError → exit 10', async () => { + const gateway = new FakeGateway({ title: 'Old', body: 'b' }) + gateway.failTitleWrite = true + const baseHash = editorBaseHash({ title: 'Old', body: 'b\n' }) + const result = await runWith( + // Title changes → triggers the title write, which is forced to fail. 
+ putEditorPage({ pageId, buffer: '# New\n\nedited\n', baseHash, force: false }), + gateway, + ) + expect(result._tag).toBe('Left') + if (result._tag === 'Left') { + expect(result.left._tag).toBe('NmdPartialWriteError') + if (result.left._tag === 'NmdPartialWriteError') { + expect(result.left.body_written).toBe(true) + expect(result.left.title_written).toBe(false) + } + } + // The body write did land; the title did not. + expect(gateway.state.body).toBe('edited\n') + expect(gateway.state.title).toBe('Old') + }) + + it('refuses a lossy page even with --force (correctness, not concurrency)', async () => { + const gateway = new FakeGateway({ title: 'T', body: 'b', completeness: lossyCompleteness }) + const result = await runWith( + putEditorPage({ pageId, buffer: '# T\n\nx\n', force: true }), + gateway, + ) + expect(result._tag).toBe('Left') + if (result._tag === 'Left') expect(result.left._tag).toBe('NmdRemoteBodyLossyError') + }) +}) diff --git a/packages/@overeng/notion-md/src/editor-edit.e2e.test.ts b/packages/@overeng/notion-md/src/editor-edit.e2e.test.ts new file mode 100644 index 000000000..4e24962f1 --- /dev/null +++ b/packages/@overeng/notion-md/src/editor-edit.e2e.test.ts @@ -0,0 +1,330 @@ +import { existsSync, mkdtempSync, readFileSync, rmSync } from 'node:fs' +import { tmpdir } from 'node:os' +import { dirname, join } from 'node:path' + +import { FileSystem } from '@effect/platform' +import { NodeContext } from '@effect/platform-node' +import { Effect, Layer } from 'effect' +import { describe, expect, it } from 'vitest' + +import type { BodyCompleteness } from '@overeng/notion-core' + +import { editEditorPage, editReadOnlyPage } from './editor-commands.ts' +import { + FakeGateway, + type FakeState, + harnessPageId as pageId, + pull, + scriptedEditor, +} from './editor-test-harness.ts' +import { NmdGatewayError } from './errors.ts' +import { normalizeMarkdownLineEndings } from './hash.ts' +import { NotionMdGateway, type NotionMdGatewayShape } from 
'./model.ts' +import { NmdStateStoreLive, type NmdStateStore } from './state-store.ts' + +const stateStoreLayer = NmdStateStoreLive.pipe(Layer.provide(NodeContext.layer)) + +const runEdit = <A, E>( + effect: Effect.Effect<A, E, NotionMdGateway | NmdStateStore | NodeContext.NodeContext>, + gateway: FakeGateway, +) => + Effect.either(effect).pipe( + Effect.provide(Layer.mergeAll(gateway.layer, stateStoreLayer, NodeContext.layer)), + Effect.runPromise, + ) + +describe('edit (ephemeral file-engine session)', () => { + it('round-trips a default-mode body edit through the engine and cleans up', async () => { + const gateway = new FakeGateway({ title: 'Doc', body: 'original line' }) + const result = await runEdit( + editEditorPage({ + pageId, + mode: 'default', + pageRef: pageId, + runEditor: scriptedEditor((buffer) => buffer.replace('original line', 'edited line')), + }), + gateway, + ) + expect(result._tag).toBe('Right') + if (result._tag === 'Right') expect(result.right.outcome).toBe('pushed') + expect(gateway.state.body).toBe('edited line\n') + }) + + it('splices a title edit through the typed page API', async () => { + const gateway = new FakeGateway({ title: 'Old Title', body: 'body' }) + const result = await runEdit( + editEditorPage({ + pageId, + mode: 'default', + pageRef: pageId, + runEditor: scriptedEditor((buffer) => buffer.replace('# Old Title', '# New Title')), + }), + gateway, + ) + expect(result._tag).toBe('Right') + expect(gateway.state.title).toBe('New Title') + }) + + it('no-ops on an unchanged buffer (nothing pushed)', async () => { + const gateway = new FakeGateway({ title: 'Doc', body: 'unchanged' }) + const result = await runEdit( + editEditorPage({ + pageId, + mode: 'default', + pageRef: pageId, + runEditor: scriptedEditor((buffer) => buffer), + }), + gateway, + ) + expect(result._tag).toBe('Right') + if (result._tag === 'Right') expect(result.right.outcome).toBe('noop') + expect(gateway.state.body).toBe('unchanged\n') + }) + + it('aborts with 
NmdEditorAbortedError (exit 8) on a non-zero editor exit; nothing pushed', async () => { + const gateway = new FakeGateway({ title: 'Doc', body: 'safe' }) + const result = await runEdit( + editEditorPage({ + pageId, + mode: 'default', + pageRef: pageId, + runEditor: scriptedEditor((buffer) => buffer.replace('safe', 'should-not-land'), 1), + }), + gateway, + ) + expect(result._tag).toBe('Left') + if (result._tag === 'Left') expect(result.left._tag).toBe('NmdEditorAbortedError') + expect(gateway.state.body).toBe('safe\n') + }) + + it('relocates a conflict to a durable <page>.conflict.md when the remote changed concurrently', async () => { + // Base body established by the ephemeral pull (pull #1). The editor changes + // the line locally; a concurrent remote writer changes the SAME line after + // pull #1, so the engine cannot 3-way merge → NmdConflictError with a + // roughdraft path, which `edit` relocates out of $TMPDIR to the cwd. + const gateway = new FakeGateway({ title: 'Doc', body: 'the original line' }) + gateway.switchRemoteBodyAfter(1, 'a totally different remote line') + + // Run in a throwaway cwd: the durable `<page>.conflict.md` is written + // relative to the process cwd (would otherwise land in the package root). + const cwd = mkdtempSync(join(tmpdir(), 'notion-md-conflict-')) + const previousCwd = process.cwd() + process.chdir(cwd) + try { + const result = await runEdit( + editEditorPage({ + pageId, + mode: 'default', + pageRef: pageId, + runEditor: scriptedEditor((buffer) => + buffer.replace('the original line', 'my local edit of that line'), + ), + }), + gateway, + ) + expect(result._tag).toBe('Right') + if (result._tag === 'Right') { + expect(result.right.outcome).toBe('conflict') + expect(result.right.conflictPath).toBe(`${pageId}.conflict.md`) + // The durable conflict file exists and carries all three bodies so the + // edit is recoverable (the $TMPDIR roughdraft is already reaped). 
+ const durable = readFileSync(join(cwd, `${pageId}.conflict.md`), 'utf8') + expect(durable).toContain('the original line') + expect(durable).toContain('my local edit of that line') + expect(durable).toContain('a totally different remote line') + } + } finally { + process.chdir(previousCwd) + rmSync(cwd, { recursive: true, force: true }) + } + }) + + it('refuses a lossy page at the ephemeral pull (exit 3)', async () => { + const gateway = new FakeGateway({ + title: 'Doc', + body: 'body', + completeness: { + _tag: 'lossy', + reasons: ['not_round_trip_safe_blocks'], + lossyBlockTypes: ['synced_block'], + }, + }) + const result = await runEdit( + editEditorPage({ + pageId, + mode: 'default', + pageRef: pageId, + runEditor: scriptedEditor((buffer) => `${buffer}\nmore`), + }), + gateway, + ) + expect(result._tag).toBe('Left') + if (result._tag === 'Left') expect(result.left._tag).toBe('NmdRemoteBodyLossyError') + }) +}) + +/** + * A read-only fake gateway: every write path (`updateMarkdown`, + * `updatePageMetadata`, `updatePageProperties`) `dieMessage`s, so any push/write + * attempt crashes the test outright — the strongest proof that `--read-only` + * never writes (stronger than asserting empty call arrays). + */ +class ReadOnlyFakeGateway { + readonly state: FakeState + pullCount = 0 + constructor(initial: { title: string; body: string; completeness?: BodyCompleteness }) { + this.state = { + title: initial.title, + body: normalizeMarkdownLineEndings(initial.body), + completeness: initial.completeness ?? 
{ _tag: 'complete' }, + } + } + + readonly layer = Layer.succeed(NotionMdGateway, { + pullPage: () => + Effect.sync(() => { + this.pullCount += 1 + return pull(this.state) + }), + updateMarkdown: () => Effect.dieMessage('read-only must never call updateMarkdown'), + updatePageProperties: () => Effect.dieMessage('read-only must never call updatePageProperties'), + retrieveDataSource: () => Effect.dieMessage('unexpected retrieveDataSource'), + updatePageMetadata: () => Effect.dieMessage('read-only must never call updatePageMetadata'), + listChildPages: () => Effect.succeed([]), + createPage: () => Effect.dieMessage('unexpected createPage'), + movePage: () => Effect.dieMessage('unexpected movePage'), + archivePage: () => Effect.dieMessage('unexpected archivePage'), + } satisfies NotionMdGatewayShape) +} + +/** A scripted editor that records the buffer it saw and rewrites it; tracks cleanup. */ +const recordingEditor = (opts: { + readonly transform?: (buffer: string) => string + readonly exitCode?: number +}) => { + const seen: { buffer?: string; filePath?: string } = {} + const run = (args: { + readonly filePath: string + }): Effect.Effect<number, NmdGatewayError, FileSystem.FileSystem> => + Effect.gen(function* () { + const fs = yield* FileSystem.FileSystem + const buffer = yield* fs.readFileString(args.filePath) + seen.buffer = buffer + seen.filePath = args.filePath + // Edit the buffer to prove the edits are discarded (never read back). + yield* fs.writeFileString( + args.filePath, + (opts.transform ?? ((b) => `${b}\nlocal edit`))(buffer), + ) + return opts.exitCode ?? 
0 + }).pipe( + Effect.mapError( + (cause) => + new NmdGatewayError({ operation: 'recording_editor', message: String(cause), cause }), + ), + ) + return { seen, run } +} + +const runReadOnly = <A, E>( + effect: Effect.Effect<A, E, NotionMdGateway | NodeContext.NodeContext>, + gateway: ReadOnlyFakeGateway, +) => + Effect.either(effect).pipe( + // Deliberately NO stateStoreLayer: read-only's narrow R never needs it. + Effect.provide(Layer.mergeAll(gateway.layer, NodeContext.layer)), + Effect.runPromise, + ) + +describe('edit --read-only (inspection-only session)', () => { + it('presents the body, discards edits, and never calls any write gateway method', async () => { + const gateway = new ReadOnlyFakeGateway({ title: 'Doc', body: 'original line' }) + const editor = recordingEditor({ transform: (b) => b.replace('original line', 'edited line') }) + const stderr: string[] = [] + const result = await runReadOnly( + editReadOnlyPage({ + pageId, + mode: 'default', + writeStderr: (line) => Effect.sync(() => void stderr.push(line)), + runEditor: editor.run, + }), + gateway, + ) + // No write method was hit (else the gateway would have died, surfacing as a defect). + expect(result._tag).toBe('Right') + if (result._tag === 'Right') expect(result.right.outcome).toBe('read-only') + // The editor saw the cat-style projection (`# title` + body). + expect(editor.seen.buffer).toBe('# Doc\n\noriginal line\n') + // Remote body is untouched; the local edit was discarded. 
+ expect(gateway.state.body).toBe('original line\n') + expect(stderr).toEqual(['read-only: changes were not synced']) + }) + + it('cleans up the scoped temp tree after the session', async () => { + const gateway = new ReadOnlyFakeGateway({ title: 'Doc', body: 'body' }) + const editor = recordingEditor({}) + const result = await runReadOnly( + editReadOnlyPage({ pageId, mode: 'default', runEditor: editor.run }), + gateway, + ) + expect(result._tag).toBe('Right') + // The buffer path lived under a scoped temp dir, reaped on scope close. + const dir = dirname(editor.seen.filePath ?? '') + expect(existsSync(dir)).toBe(false) + }) + + it('a non-zero editor exit is still a clean no-op (no abort, no push), exits read-only', async () => { + const gateway = new ReadOnlyFakeGateway({ title: 'Doc', body: 'safe' }) + const editor = recordingEditor({ exitCode: 1 }) + const stderr: string[] = [] + const result = await runReadOnly( + editReadOnlyPage({ + pageId, + mode: 'default', + writeStderr: (line) => Effect.sync(() => void stderr.push(line)), + runEditor: editor.run, + }), + gateway, + ) + expect(result._tag).toBe('Right') + if (result._tag === 'Right') expect(result.right.outcome).toBe('read-only') + expect(gateway.state.body).toBe('safe\n') + expect(stderr).toEqual(['read-only: changes were not synced']) + }) + + it('--read-only --frontmatter inspects the full envelope read-only, still no writes', async () => { + const gateway = new ReadOnlyFakeGateway({ title: 'Env', body: 'envelope body' }) + const editor = recordingEditor({}) + const result = await runReadOnly( + editReadOnlyPage({ pageId, mode: 'frontmatter', runEditor: editor.run }), + gateway, + ) + expect(result._tag).toBe('Right') + if (result._tag === 'Right') expect(result.right.outcome).toBe('read-only') + // The editor saw the full strict `.nmd` envelope, not the `# title` form. 
+ expect(editor.seen.buffer).toContain('---\n') + expect(editor.seen.buffer).toContain('"version": 2') + expect(editor.seen.buffer).toContain('envelope body') + }) + + it('refuses a lossy page at observe time (exit 3); the editor is never launched', async () => { + const gateway = new ReadOnlyFakeGateway({ + title: 'Doc', + body: 'body', + completeness: { + _tag: 'lossy', + reasons: ['not_round_trip_safe_blocks'], + lossyBlockTypes: ['synced_block'], + }, + }) + const editor = recordingEditor({}) + const result = await runReadOnly( + editReadOnlyPage({ pageId, mode: 'default', runEditor: editor.run }), + gateway, + ) + expect(result._tag).toBe('Left') + if (result._tag === 'Left') expect(result.left._tag).toBe('NmdRemoteBodyLossyError') + // Refused before any editor launch. + expect(editor.seen.buffer).toBeUndefined() + }) +}) diff --git a/packages/@overeng/notion-md/src/editor-observability.unit.test.ts b/packages/@overeng/notion-md/src/editor-observability.unit.test.ts new file mode 100644 index 000000000..d0578ddd3 --- /dev/null +++ b/packages/@overeng/notion-md/src/editor-observability.unit.test.ts @@ -0,0 +1,311 @@ +/** + * Group G span-assertion suite (R21–R24, R29). Drives the REAL instrumented + * `cat` / `put` / `edit` editor paths against a fake in-memory gateway (no + * secrets, no network) and asserts the emitted span shape via the otelite + * in-process capture bridge (`@overeng/utils-dev/otelite`), the same pattern the + * notion-effect-client `NotionHttp` span test uses. + * + * Two load-bearing checks: + * + * 1. Each top-level editor command emits its root span (`notion-md.cat` / + * `notion-md.put` / `notion-md.edit`) with the actually-emitted attributes, + * and `edit` wraps the engine's `notion-md.sync-page` / `status-page` / + * `push-page` spans as children of the one `notion-md.edit` root (decision + * 0017, R21). + * + * 2. 
R24 leak guard: no captured span attribute value carries a distinctive + * sentinel body, a signed-URL marker (`X-Amz-`/`Signature=`/`Expires=`), or a + * `Bearer ` token. The sentinel is pushed THROUGH the gateway so the assertion + * is meaningful (there is a real body in flight, not just "nothing to leak"). + * + * Attribute names asserted here are the ones the implementation actually emits + * (`notion_md.editor.mode`, `notion_md.put.body_written`, …). The spec's span + * table uses an `nmd.*` shorthand that was never implemented; the divergence is + * documented for a separate reconciliation rather than papered over with tests + * that would assert non-existent attributes. + */ + +import { FileSystem } from '@effect/platform' +import { NodeContext } from '@effect/platform-node' +import { expect, layer } from '@effect/vitest' +import { Effect, Layer } from 'effect' + +import type { BodyCompleteness } from '@overeng/notion-core' +import { + flushCaptureSpans, + makeOteliteCaptureLayer, + OteliteCapture, +} from '@overeng/utils-dev/otelite' + +import { + catEditorPage, + editEditorPage, + editReadOnlyPage, + putEditorPage, +} from './editor-commands.ts' +import { editorBaseHash } from './editor-surface.ts' +import { NmdGatewayError } from './errors.ts' +import { normalizeMarkdownLineEndings } from './hash.ts' +import { + NotionMdGateway, + type NotionMdGatewayShape, + type PullPageResult, + type RemotePageSnapshot, +} from './model.ts' +import { NmdStateStoreLive } from './state-store.ts' + +const exportInterval = 100 +const CaptureLayer = makeOteliteCaptureLayer({ exportInterval }) + +const pageId = '00000000-0000-4000-8000-000000000001' + +/** + * A distinctive body string. If it surfaces in any captured span attribute the + * R24 leak guard fails — bodies must never reach the telemetry. 
+ */ +const SENTINEL_BODY = 'SECRET-SENTINEL-BODY-do-not-leak-9f3a' + +const snapshot = (state: { title: string }): RemotePageSnapshot => ({ + id: pageId, + title: state.title, + title_property_key: 'title', + url: 'https://notion.so/page', + parent: { type: 'workspace', workspace: true }, + icon: null, + cover: null, + in_trash: false, + is_locked: false, + last_edited_time: '2026-06-15T12:00:00.000Z', + properties: {}, +}) + +const pull = (state: { + title: string + body: string + completeness?: BodyCompleteness +}): PullPageResult => ({ + page: snapshot(state), + markdown: { + markdown: normalizeMarkdownLineEndings(state.body), + truncated: false, + unknown_block_ids: [], + ...(state.completeness === undefined ? {} : { completeness: state.completeness }), + }, + storage: { _tag: 'self_contained', unsupported_blocks: [], files: [], comments: [] }, +}) + +/** A mutable fake gateway; its body can be switched after the first pull. */ +class FakeGateway { + readonly state: { title: string; body: string; completeness?: BodyCompleteness } + /** When set, the next pull adopts this body (a concurrent remote writer). */ + private switchBodyOnPull: { afterPull: number; body: string } | undefined + pullCount = 0 + + constructor(initial: { title: string; body: string; completeness?: BodyCompleteness }) { + this.state = { ...initial, body: normalizeMarkdownLineEndings(initial.body) } + } + + /** After `afterPull` pulls have completed, the remote body becomes `body`. 
*/ + switchRemoteBodyAfter(afterPull: number, body: string): void { + this.switchBodyOnPull = { afterPull, body: normalizeMarkdownLineEndings(body) } + } + + readonly layer = Layer.succeed(NotionMdGateway, { + pullPage: () => + Effect.sync(() => { + this.pullCount += 1 + if ( + this.switchBodyOnPull !== undefined && + this.pullCount > this.switchBodyOnPull.afterPull + ) { + this.state.body = this.switchBodyOnPull.body + this.switchBodyOnPull = undefined + } + return pull(this.state) + }), + updateMarkdown: ({ command }) => + Effect.sync(() => { + if (command._tag === 'replace_content') { + this.state.body = normalizeMarkdownLineEndings(command.markdown) + } + return { markdown: pull(this.state).markdown } + }), + updatePageProperties: () => Effect.dieMessage('unexpected updatePageProperties'), + retrieveDataSource: () => Effect.dieMessage('unexpected retrieveDataSource'), + updatePageMetadata: ({ metadata }) => + Effect.sync(() => { + if (metadata.title !== undefined) this.state.title = metadata.title.value + return snapshot(this.state) + }), + listChildPages: () => Effect.succeed([]), + createPage: () => Effect.dieMessage('unexpected createPage'), + movePage: () => Effect.dieMessage('unexpected movePage'), + archivePage: () => Effect.dieMessage('unexpected archivePage'), + } satisfies NotionMdGatewayShape) +} + +/** A scripted non-interactive editor used by the `edit` span test. 
*/ +const scriptedEditor = + (transform: (buffer: string) => string) => + (opts: { + readonly filePath: string + }): Effect.Effect<number, NmdGatewayError, FileSystem.FileSystem> => + Effect.gen(function* () { + const fs = yield* FileSystem.FileSystem + const buffer = yield* fs.readFileString(opts.filePath) + yield* fs.writeFileString(opts.filePath, transform(buffer)) + return 0 + }).pipe( + Effect.mapError( + (cause) => + new NmdGatewayError({ operation: 'scripted_editor', message: String(cause), cause }), + ), + ) + +const stateStoreLayer = NmdStateStoreLive.pipe(Layer.provide(NodeContext.layer)) + +/** R24 leak guard applied across every captured span attribute value. */ +const assertNoSensitiveAttrs = ( + spans: ReadonlyArray<{ readonly attrs: Readonly<Record<string, string>> }>, +): void => { + for (const span of spans) { + for (const [key, value] of Object.entries(span.attrs)) { + expect(value).not.toContain(SENTINEL_BODY) + expect(value).not.toContain('X-Amz-') + expect(value).not.toContain('Signature=') + expect(value).not.toContain('Expires=') + expect(value).not.toContain('Bearer ') + expect(key.toLowerCase()).not.toContain('authorization') + } + } +} + +// `excludeTestServices: true` runs on the REAL clock so the OTLP exporter's +// batch loop + the flush sleep tick against wall time. 
+layer(CaptureLayer, { excludeTestServices: true })('editor span shapes (Group G)', (it) => { + it.effect('notion-md.cat emits page_id + editor.mode + span.label, no body leak', () => + Effect.gen(function* () { + const cap = yield* OteliteCapture + const gateway = new FakeGateway({ title: 'Hello', body: SENTINEL_BODY }) + + yield* catEditorPage({ + pageId, + mode: 'default', + writeStdout: () => Effect.void, + writeStderr: () => Effect.void, + }).pipe(Effect.provide(gateway.layer)) + + yield* flushCaptureSpans({ exportInterval }) + + const catSpans = yield* cap.inspect({ signal: 'traces', name: 'notion-md.cat' }) + expect(catSpans).toHaveLength(1) + const span = catSpans[0]! + expect(span.attrs['notion_md.page_id']).toBe(pageId) + expect(span.attrs['notion_md.editor.mode']).toBe('default') + expect(span.attrs['span.label']).toBe(pageId.slice(0, 8)) + + assertNoSensitiveAttrs(yield* cap.inspect({ signal: 'traces' })) + }), + ) + + it.effect('notion-md.put emits force + body_written + title_written result attrs', () => + Effect.gen(function* () { + const cap = yield* OteliteCapture + const gateway = new FakeGateway({ title: 'Title', body: 'before' }) + const baseHash = editorBaseHash({ title: 'Title', body: 'before\n' }) + + const result = yield* putEditorPage({ + pageId, + // Title changes (→ title_written true) and the body becomes the sentinel. + buffer: `# New Title\n\n${SENTINEL_BODY}\n`, + baseHash, + force: false, + }).pipe(Effect.provide(gateway.layer)) + expect(result.bodyWritten).toBe(true) + expect(result.titleWritten).toBe(true) + + yield* flushCaptureSpans({ exportInterval }) + + const putSpans = yield* cap.inspect({ signal: 'traces', name: 'notion-md.put' }) + expect(putSpans).toHaveLength(1) + const span = putSpans[0]! 
+ expect(span.attrs['notion_md.page_id']).toBe(pageId) + expect(span.attrs['notion_md.put.force']).toBe('false') + expect(span.attrs['notion_md.put.body_written']).toBe('true') + expect(span.attrs['notion_md.put.title_written']).toBe('true') + expect(span.attrs['span.label']).toBe(pageId.slice(0, 8)) + + // R24: the sentinel body went through the gateway but must not be in a span. + assertNoSensitiveAttrs(yield* cap.inspect({ signal: 'traces' })) + }), + ) + + it.effect('notion-md.edit wraps the engine status/push spans and records outcome=pushed', () => + Effect.gen(function* () { + const cap = yield* OteliteCapture + const gateway = new FakeGateway({ title: 'Doc', body: 'original line' }) + + const result = yield* editEditorPage({ + pageId, + mode: 'default', + pageRef: pageId, + runEditor: scriptedEditor((buffer) => buffer.replace('original line', SENTINEL_BODY)), + }).pipe(Effect.provide(Layer.mergeAll(gateway.layer, stateStoreLayer, NodeContext.layer))) + expect(result.outcome).toBe('pushed') + + yield* flushCaptureSpans({ exportInterval }) + + const editSpans = yield* cap.inspect({ signal: 'traces', name: 'notion-md.edit' }) + expect(editSpans).toHaveLength(1) + const root = editSpans[0]! + expect(root.attrs['notion_md.page_id']).toBe(pageId) + expect(root.attrs['notion_md.editor.mode']).toBe('default') + expect(root.attrs['notion_md.edit.outcome']).toBe('pushed') + + // The engine spans the edit wraps (decision 0017). They share the edit's + // trace id, proving they are children of the one `notion-md.edit` root. 
+ const syncPage = yield* cap.inspect({ signal: 'traces', name: 'notion-md.sync-page' }) + expect(syncPage).toHaveLength(1) + expect(syncPage[0]!.trace_id).toBe(root.trace_id) + const statusPage = yield* cap.inspect({ signal: 'traces', name: 'notion-md.status-page' }) + expect(statusPage.length).toBeGreaterThanOrEqual(1) + expect(statusPage[0]!.trace_id).toBe(root.trace_id) + const pushPage = yield* cap.inspect({ signal: 'traces', name: 'notion-md.push-page' }) + expect(pushPage.length).toBeGreaterThanOrEqual(1) + expect(pushPage[0]!.trace_id).toBe(root.trace_id) + + // R24 across the whole tree, including the wrapped engine spans. + assertNoSensitiveAttrs(yield* cap.inspect({ signal: 'traces' })) + }), + ) + + it.effect('notion-md.edit --read-only records outcome=read-only and never pushes', () => + Effect.gen(function* () { + const cap = yield* OteliteCapture + const gateway = new FakeGateway({ title: 'Doc', body: SENTINEL_BODY }) + + const result = yield* editReadOnlyPage({ + pageId, + mode: 'default', + writeStderr: () => Effect.void, + // The editor "edits" the buffer, but read-only discards it. + runEditor: scriptedEditor((buffer) => `${buffer}\ndiscarded edit`), + }).pipe(Effect.provide(Layer.mergeAll(gateway.layer, stateStoreLayer, NodeContext.layer))) + expect(result.outcome).toBe('read-only') + // Read-only never calls updateMarkdown: the remote body is untouched. + expect(gateway.state.body).toBe(normalizeMarkdownLineEndings(SENTINEL_BODY)) + + yield* flushCaptureSpans({ exportInterval }) + + // The capture layer accumulates across the cases in this block, so select + // the read-only span by its outcome rather than asserting a global count. 
+ const editSpans = yield* cap.inspect({ signal: 'traces', name: 'notion-md.edit' }) + const root = editSpans.find((s) => s.attrs['notion_md.edit.outcome'] === 'read-only') + expect(root).toBeDefined() + expect(root!.attrs['notion_md.page_id']).toBe(pageId) + expect(root!.attrs['notion_md.editor.mode']).toBe('default') + + assertNoSensitiveAttrs(yield* cap.inspect({ signal: 'traces' })) + }), + ) +}) diff --git a/packages/@overeng/notion-md/src/editor-surface.ts b/packages/@overeng/notion-md/src/editor-surface.ts new file mode 100644 index 000000000..0fe603fc9 --- /dev/null +++ b/packages/@overeng/notion-md/src/editor-surface.ts @@ -0,0 +1,104 @@ +import { Effect } from 'effect' + +import type { Sha256Digest } from '@overeng/notion-effect-client' + +import { NmdInvalidDocumentError } from './errors.ts' +import { normalizeMarkdownLineEndings, sha256Digest } from './hash.ts' + +/** + * Shared title↔H1 splice and base-hash logic for the editor surfaces + * (`cat` / `put` / `edit`). + * + * Default mode presents a Notion page as a leading `# <title>` line, a blank + * line, then the body Markdown (decision 0001). The title is a *presentation* + * affordance only: on write it is transport-routed through the typed page API, + * never as a body block. The base hash covers the writable surface — title + + * body together (decisions 0001/0006) — so a title-only remote change still + * trips the guard. + * + * The exact serialized bytes are load-bearing: they are the cross-machine + * optimistic-concurrency token (`cat` prints the hash; `put` compares it) and + * the missing-title-H1 refusal anchor, so `serialize ∘ parse` must round-trip + * exactly (spec "Edge behavior"). + * + * The body arrives here already in the single canonical form (pull routes every + * body through `canonicalizeBlockMarkdown`, hosted-media URL canonicalization / + * Group B included — decision 0019). 
This module frames that canonical body + * verbatim and must NOT re-canonicalize it: the `# <title>` line is a + * presentation affordance, not body Markdown, and routing the title-framed + * buffer back through the Markdown parser would break the load-bearing H1 + * round-trip. Line-ending normalization (idempotent over a canonical body) is + * the only transform applied. + */ + +/** A default-mode editor document: the page title plus its Markdown body. */ +export interface TitleBodyDocument { + readonly title: string + readonly body: string +} + +/** + * Serialize a title + body to the canonical default-mode editor byte form. + * + * - titled + body: `# <title>\n\n<body>` (body ends in `\n`) + * - titled + empty body: `# <title>\n` + * - untitled (empty title) + empty body: `# \n` + * + * The body is line-ending normalized (CRLF→LF, trailing whitespace trimmed, a + * single trailing newline). A leading `# …` *inside* the body is ordinary + * content and is preserved verbatim — only line 1 is the title. + */ +export const serializeTitleBody = (doc: TitleBodyDocument): string => { + const body = normalizeMarkdownLineEndings(doc.body) + const bodyIsEmpty = body === '\n' + return bodyIsEmpty ? `# ${doc.title}\n` : `# ${doc.title}\n\n${body}` +} + +/** + * Parse a default-mode editor buffer back into title + body. + * + * Line 1 must be a `# ` ATX heading; its remainder (after `# `) is the title + * verbatim (empty → untitled). Everything after the first blank separator line + * is the body verbatim — including a body that starts with its own `# …` + * heading (the title/body H1 sharp edge: line 1 wins, the rest is body). A line + * 1 that is not a `# ` heading is refused (exit 5) rather than guessed, because + * silently emptying a title is property-level data loss (T03). 
+ */ +export const parseTitleBody = (opts: { + readonly buffer: string + readonly pageId?: string +}): Effect.Effect<TitleBodyDocument, NmdInvalidDocumentError> => + Effect.gen(function* () { + const normalized = opts.buffer.replace(/\r\n/g, '\n').replace(/\r/g, '\n') + const newlineIndex = normalized.indexOf('\n') + const firstLine = newlineIndex === -1 ? normalized : normalized.slice(0, newlineIndex) + + if (firstLine !== '#' && firstLine.startsWith('# ') === false) { + return yield* new NmdInvalidDocumentError({ + ...(opts.pageId === undefined ? {} : { page_id: opts.pageId }), + message: + 'Default-mode editor buffer must start with a `# <title>` heading on line 1; ' + + 'refusing to push because silently emptying the page title is property-level data loss. ' + + 'Add a `# Title` first line (use `# ` for an untitled page).', + }) + } + + // `# foo` → title `foo`; bare `#` or `# ` → untitled (empty title). + const title = firstLine === '#' ? '' : firstLine.slice(2) + + const rest = newlineIndex === -1 ? '' : normalized.slice(newlineIndex + 1) + // A single blank separator line between the title and the body is consumed. + const body = rest.startsWith('\n') === true ? rest.slice(1) : rest + + return { title, body: body === '' ? '' : normalizeMarkdownLineEndings(body) } + }) + +/** + * The editor base hash: `sha256` over the canonical default-mode serialization + * of title + body (decisions 0002/0006). This is the value `cat` prints to + * stderr and `put` compares against `--base-hash`. A client must reproduce it + * from the canonical body the first pull returned — never recompute it locally + * over a pre-canonical editable buffer (spec "Push Strategy and Canonical Base"). 
+ */ +export const editorBaseHash = (doc: TitleBodyDocument): Sha256Digest => + sha256Digest(serializeTitleBody(doc)) diff --git a/packages/@overeng/notion-md/src/editor-surface.unit.test.ts b/packages/@overeng/notion-md/src/editor-surface.unit.test.ts new file mode 100644 index 000000000..aa1f107ce --- /dev/null +++ b/packages/@overeng/notion-md/src/editor-surface.unit.test.ts @@ -0,0 +1,123 @@ +import { Effect } from 'effect' +import { describe, expect, it } from 'vitest' + +import { + editorBaseHash, + parseTitleBody, + serializeTitleBody, + type TitleBodyDocument, +} from './editor-surface.ts' + +const run = <A, E>(effect: Effect.Effect<A, E>): Promise<A> => Effect.runPromise(effect) + +describe('serializeTitleBody', () => { + it('titled + body: `# title` then a blank line then body', () => { + expect(serializeTitleBody({ title: 'Hello', body: 'world' })).toBe('# Hello\n\nworld\n') + }) + + it('titled + empty body: just the title line', () => { + expect(serializeTitleBody({ title: 'Hello', body: '' })).toBe('# Hello\n') + }) + + it('untitled + empty body is exactly `# \\n`', () => { + expect(serializeTitleBody({ title: '', body: '' })).toBe('# \n') + }) + + it('untitled + body keeps the empty title line', () => { + expect(serializeTitleBody({ title: '', body: 'body' })).toBe('# \n\nbody\n') + }) + + it('preserves a body that has its own leading H1 verbatim', () => { + expect(serializeTitleBody({ title: 'Title', body: '# Body Heading\n\ntext' })).toBe( + '# Title\n\n# Body Heading\n\ntext\n', + ) + }) +}) + +describe('parseTitleBody', () => { + it('splits a titled body', () => + run( + Effect.gen(function* () { + const doc = yield* parseTitleBody({ buffer: '# Hello\n\nworld\n' }) + expect(doc).toEqual({ title: 'Hello', body: 'world\n' }) + }), + )) + + it('parses an untitled page (bare `# `)', () => + run( + Effect.gen(function* () { + const doc = yield* parseTitleBody({ buffer: '# \n' }) + expect(doc).toEqual({ title: '', body: '' }) + }), + )) + + it('parses a 
bare `#` (no trailing space) as untitled', () => + run( + Effect.gen(function* () { + const doc = yield* parseTitleBody({ buffer: '#\n' }) + expect(doc).toEqual({ title: '', body: '' }) + }), + )) + + it('line 1 wins: a body with its own leading H1 keeps it as body content', () => + run( + Effect.gen(function* () { + const doc = yield* parseTitleBody({ buffer: '# Title\n\n# Body Heading\n\ntext\n' }) + expect(doc).toEqual({ title: 'Title', body: '# Body Heading\n\ntext\n' }) + }), + )) + + it('refuses a buffer whose line 1 is not a `# ` heading (exit 5)', () => + run( + Effect.gen(function* () { + const result = yield* Effect.either(parseTitleBody({ buffer: 'no heading\nbody\n' })) + expect(result._tag).toBe('Left') + if (result._tag === 'Left') expect(result.left._tag).toBe('NmdInvalidDocumentError') + }), + )) + + it('refuses a Setext-style heading (line 2 underline is not line-1 `# `)', () => + run( + Effect.gen(function* () { + const result = yield* Effect.either(parseTitleBody({ buffer: 'Title\n=====\n' })) + expect(result._tag).toBe('Left') + }), + )) +}) + +describe('serialize ∘ parse round-trips (cat→put fixpoint)', () => { + const cases: ReadonlyArray<TitleBodyDocument> = [ + { title: 'Hello', body: 'world' }, + { title: 'Hello', body: '' }, + { title: '', body: '' }, + { title: '', body: 'body' }, + { title: 'Title', body: '# Body Heading\n\ntext' }, + { title: 'Multi line', body: 'line 1\n\nline 2\n\n- a\n- b' }, + { title: 'Trailing ws', body: 'text ' }, + ] + + it.each(cases)('parse(serialize(%j)) is a fixpoint', (doc) => + run( + Effect.gen(function* () { + const serialized = serializeTitleBody(doc) + const parsed = yield* parseTitleBody({ buffer: serialized }) + // Re-serializing the parsed document is byte-identical: this is the + // idempotence the cat→put round-trip relies on. 
+ expect(serializeTitleBody(parsed)).toBe(serialized) + expect(editorBaseHash(parsed)).toBe(editorBaseHash(doc)) + }), + ), + ) +}) + +describe('editorBaseHash', () => { + it('covers title and body together (a title-only change moves the hash)', () => { + const a = editorBaseHash({ title: 'A', body: 'same' }) + const b = editorBaseHash({ title: 'B', body: 'same' }) + expect(a).not.toBe(b) + }) + + it('is a `sha256:`-prefixed digest', () => { + expect(editorBaseHash({ title: 'X', body: 'y' })).toMatch(/^sha256:[0-9a-f]{64}$/u) + }) +}) diff --git a/packages/@overeng/notion-md/src/editor-test-harness.ts b/packages/@overeng/notion-md/src/editor-test-harness.ts new file mode 100644 index 000000000..1b85f1d18 --- /dev/null +++ b/packages/@overeng/notion-md/src/editor-test-harness.ts @@ -0,0 +1,153 @@ +/** + * Shared fake-gateway test harness for the editor write path. Extracted from the + * `edit` e2e suite so `editor-edit.e2e.test.ts` and `progress.unit.test.ts` drive + * the engine through one fake instead of duplicating ~100 lines. Lives in a + * non-`.test.ts` module so vitest does not collect it as a suite. + */ +import { FileSystem } from '@effect/platform' +import { Effect, Layer } from 'effect' + +import type { BodyCompleteness } from '@overeng/notion-core' + +import { NmdGatewayError } from './errors.ts' +import { normalizeMarkdownLineEndings } from './hash.ts' +import { + NotionMdGateway, + type NotionMdGatewayShape, + type PullPageResult, + type RemotePageSnapshot, +} from './model.ts' + +/** Canonical page id used across the editor harness tests. */ +export const harnessPageId = '00000000-0000-4000-8000-000000000001' + +/** Mutable remote state the fake gateway reflects. */ +export interface FakeState { + title: string + body: string + completeness: BodyCompleteness +} + +/** Build a remote page snapshot from the fake state. 
*/ +export const snapshot = (s: FakeState): RemotePageSnapshot => ({ + id: harnessPageId, + title: s.title, + title_property_key: 'title', + url: 'https://notion.so/page', + parent: { type: 'workspace', workspace: true }, + icon: null, + cover: null, + in_trash: false, + is_locked: false, + last_edited_time: '2026-06-15T12:00:00.000Z', + properties: {}, +}) + +/** Build a pull result from the fake state. */ +export const pull = (s: FakeState): PullPageResult => ({ + page: snapshot(s), + markdown: { + markdown: normalizeMarkdownLineEndings(s.body), + truncated: false, + unknown_block_ids: [], + completeness: s.completeness, + }, + storage: { _tag: 'self_contained', unsupported_blocks: [], files: [], comments: [] }, +}) + +/** + * In-memory Notion gateway for editor write-path tests: reflects pulls/writes + * against mutable `state`, can simulate a concurrent remote writer + * (`switchRemoteBodyAfter`), and can fail the next `updateMarkdown` + * (`failUpdateMarkdownOnce`) to exercise the write-body error path. + */ +export class FakeGateway { + readonly state: FakeState + /** When set, the remote body switches to `body` after `afterPull` pulls. */ + private switchBodyOnPull: { afterPull: number; body: string } | undefined + /** When set, the next `updateMarkdown` fails with this gateway error. */ + private failUpdateMarkdown: NmdGatewayError | undefined + pullCount = 0 + constructor(initial: { title: string; body: string; completeness?: BodyCompleteness }) { + this.state = { + title: initial.title, + body: normalizeMarkdownLineEndings(initial.body), + completeness: initial.completeness ?? { _tag: 'complete' }, + } + } + + /** Simulate a concurrent remote writer: after `afterPull` pulls, body becomes `body`. 
*/ + // oxlint-disable-next-line overeng/named-args -- test fixture preserving the original e2e harness shape + switchRemoteBodyAfter(afterPull: number, body: string): void { + this.switchBodyOnPull = { afterPull, body: normalizeMarkdownLineEndings(body) } + } + + /** Make the next `updateMarkdown` (the body write) fail once, to test the failure stage. */ + failUpdateMarkdownOnce(operation = 'fake_update_markdown'): void { + this.failUpdateMarkdown = new NmdGatewayError({ + operation, + page_id: harnessPageId, + message: 'simulated updateMarkdown failure', + }) + } + + readonly layer = Layer.succeed(NotionMdGateway, { + pullPage: () => + Effect.sync(() => { + this.pullCount += 1 + if ( + this.switchBodyOnPull !== undefined && + this.pullCount > this.switchBodyOnPull.afterPull + ) { + this.state.body = this.switchBodyOnPull.body + this.switchBodyOnPull = undefined + } + return pull(this.state) + }), + updateMarkdown: ({ command }) => + Effect.suspend(() => { + if (this.failUpdateMarkdown !== undefined) { + const error = this.failUpdateMarkdown + this.failUpdateMarkdown = undefined + return Effect.fail(error) + } + if (command._tag === 'replace_content') { + this.state.body = normalizeMarkdownLineEndings(command.markdown) + } + return Effect.succeed({ markdown: pull(this.state).markdown }) + }), + updatePageProperties: () => Effect.dieMessage('unexpected updatePageProperties'), + retrieveDataSource: () => Effect.dieMessage('unexpected retrieveDataSource'), + updatePageMetadata: ({ metadata }) => + Effect.sync(() => { + if (metadata.title !== undefined) this.state.title = metadata.title.value + return snapshot(this.state) + }), + listChildPages: () => Effect.succeed([]), + createPage: () => Effect.dieMessage('unexpected createPage'), + movePage: () => Effect.dieMessage('unexpected movePage'), + archivePage: () => Effect.dieMessage('unexpected archivePage'), + } satisfies NotionMdGatewayShape) +} + +/* oxlint-disable overeng/named-args -- test fixtures preserve the 
original positional e2e harness shape */ + +/** A scripted non-interactive editor: rewrite the buffer file via a transform. */ +export const scriptedEditor = + (transform: (buffer: string) => string, exitCode = 0) => + (opts: { + readonly filePath: string + }): Effect.Effect<number, NmdGatewayError, FileSystem.FileSystem> => + Effect.gen(function* () { + const fs = yield* FileSystem.FileSystem + if (exitCode === 0) { + const buffer = yield* fs.readFileString(opts.filePath) + yield* fs.writeFileString(opts.filePath, transform(buffer)) + } + return exitCode + }).pipe( + Effect.mapError( + (cause) => + new NmdGatewayError({ operation: 'scripted_editor', message: String(cause), cause }), + ), + ) diff --git a/packages/@overeng/notion-md/src/errors.ts b/packages/@overeng/notion-md/src/errors.ts index b97ea99ff..57308e463 100644 --- a/packages/@overeng/notion-md/src/errors.ts +++ b/packages/@overeng/notion-md/src/errors.ts @@ -48,6 +48,8 @@ export class NmdGatewayError extends Schema.TaggedError<NmdGatewayError>()('NmdG operation: Schema.String, page_id: Schema.optional(Schema.String), block_id: Schema.optional(Schema.String), + /** Log-safe fingerprint of the active Notion integration token (see `notionTokenFingerprint`). */ + token_fingerprint: Schema.optional(Schema.String), message: Schema.String, cause: Schema.optional(Schema.Defect), }) {} @@ -64,6 +66,22 @@ export class NmdRemoteBodyLossyError extends Schema.TaggedError<NmdRemoteBodyLos }, ) {} +/** + * Raised when a data-source-backed page's writable property schema changed + * since the clean pull and a property write was attempted (`edit --frontmatter` + * / file `sync`; exit 6, R14, decision 0017). Distinct from the exit-7 + * value/body conflict and **not** `--force`-able — resolve by re-pulling. 
+ */ +export class NmdSchemaDriftError extends Schema.TaggedError<NmdSchemaDriftError>()( + 'NmdSchemaDriftError', + { + page_id: Schema.String, + data_source_id: Schema.String, + path: Schema.optional(Schema.String), + message: Schema.String, + }, +) {} + /** Raised when a command needs a Notion token and none was supplied. */ export class NmdTokenMissingError extends Schema.TaggedError<NmdTokenMissingError>()( 'NmdTokenMissingError', @@ -72,6 +90,69 @@ export class NmdTokenMissingError extends Schema.TaggedError<NmdTokenMissingErro }, ) {} +/** + * Raised when `<page>` is not a valid Notion id/URL, or the page does not exist + * (editor surfaces `cat`/`put`/`edit`; exit 4). + */ +export class NmdUnresolvablePageError extends Schema.TaggedError<NmdUnresolvablePageError>()( + 'NmdUnresolvablePageError', + { + page: Schema.String, + message: Schema.String, + cause: Schema.optional(Schema.Defect), + }, +) {} + +/** + * Raised when a default-mode editor buffer is missing its leading title H1, or a + * `--frontmatter` envelope is malformed (editor surfaces; exit 5). + */ +export class NmdInvalidDocumentError extends Schema.TaggedError<NmdInvalidDocumentError>()( + 'NmdInvalidDocumentError', + { + page_id: Schema.optional(Schema.String), + message: Schema.String, + }, +) {} + +/** Raised when `$EDITOR` exits non-zero during `edit`; nothing is pushed (exit 8). */ +export class NmdEditorAbortedError extends Schema.TaggedError<NmdEditorAbortedError>()( + 'NmdEditorAbortedError', + { + page_id: Schema.String, + editor: Schema.String, + exit_code: Schema.Number, + message: Schema.String, + }, +) {} + +/** + * Raised when the post-push `semanticEquivalent` gate rejects a `put` result + * (the remote may be mutated; re-`cat`; exit 9). 
+ */ +export class NmdPostPushGateError extends Schema.TaggedError<NmdPostPushGateError>()( + 'NmdPostPushGateError', + { + page_id: Schema.String, + message: Schema.String, + }, +) {} + +/** + * Raised when one of a `put`'s two writes (body, title) landed and the other + * failed; the page is in a mixed state (decision 0012; exit 10). + */ +export class NmdPartialWriteError extends Schema.TaggedError<NmdPartialWriteError>()( + 'NmdPartialWriteError', + { + page_id: Schema.String, + body_written: Schema.Boolean, + title_written: Schema.Boolean, + message: Schema.String, + cause: Schema.optional(Schema.Defect), + }, +) {} + /** Raised for invalid command-line arguments. */ export class NmdCliError extends Schema.TaggedError<NmdCliError>()('NmdCliError', { message: Schema.String, @@ -85,4 +166,5 @@ export type NmdError = | NmdFileSystemError | NmdGatewayError | NmdRemoteBodyLossyError + | NmdSchemaDriftError | NmdCliError diff --git a/packages/@overeng/notion-md/src/exit-codes.ts b/packages/@overeng/notion-md/src/exit-codes.ts new file mode 100644 index 000000000..5408a9b7b --- /dev/null +++ b/packages/@overeng/notion-md/src/exit-codes.ts @@ -0,0 +1,65 @@ +import { Cause, Exit } from 'effect' + +/** + * Editor-surface exit-code contract (VRS "Exit codes and error model"). + * + * Each expected tagged failure maps to a distinct process exit code so `cat` / + * `put` / `edit` are scriptable. The map is applied at the outermost runtime + * boundary (the `runMain` teardown), after every scope/finalizer has closed — + * never via `process.exit()` mid-effect, which would bypass `edit`'s temp-dir + * cleanup. 
+ * + * | Exit | Tag | + * | ---- | ------------------------------------------- | + * | 0 | success | + * | 1 | NmdGatewayError | + * | 2 | (CLI framework — bad flags/args) | + * | 3 | NmdRemoteBodyLossyError | + * | 4 | NmdUnresolvablePageError | + * | 5 | NmdInvalidDocumentError | + * | 6 | NmdSchemaDriftError | + * | 7 | NotionMdBodyConflictError / NmdConflictError | + * | 8 | NmdEditorAbortedError | + * | 9 | NmdPostPushGateError | + * | 10 | NmdPartialWriteError | + */ +export const EDITOR_EXIT_CODES: Readonly<Record<string, number>> = { + NmdGatewayError: 1, + NmdRemoteBodyLossyError: 3, + NmdUnresolvablePageError: 4, + NmdInvalidDocumentError: 5, + // Exit 6: data-source schema drift detected by the engine `schema_snapshot` + // comparison before a property write (`edit --frontmatter` / file `sync`; + // R14, decision 0017). Distinct from the exit-7 conflict — not `--force`-able. + NmdSchemaDriftError: 6, + NotionMdBodyConflictError: 7, + NmdConflictError: 7, + NmdEditorAbortedError: 8, + NmdPostPushGateError: 9, + NmdPartialWriteError: 10, +} + +const taggedExitCode = (value: unknown): number | undefined => { + if (typeof value !== 'object' || value === null || '_tag' in value === false) return undefined + const tag = (value as { readonly _tag?: unknown })._tag + return typeof tag === 'string' ? EDITOR_EXIT_CODES[tag] : undefined +} + +/** + * Map an `Exit` to a process exit code per the editor-surface contract. + * + * Success → 0. A tagged expected failure → its mapped code. Interruption (Ctrl+C) + * → 130. Any other failure (defect, unmapped error) → 1, matching the + * framework's default "something failed" code without masking the distinct + * editor codes. 
+ */ +export const editorExitCode = (exit: Exit.Exit<unknown, unknown>): number => { + if (Exit.isSuccess(exit) === true) return 0 + if (Cause.isInterruptedOnly(exit.cause) === true) return 130 + + for (const failure of Cause.failures(exit.cause)) { + const code = taggedExitCode(failure) + if (code !== undefined) return code + } + return 1 +} diff --git a/packages/@overeng/notion-md/src/gateway-token-fingerprint.unit.test.ts b/packages/@overeng/notion-md/src/gateway-token-fingerprint.unit.test.ts new file mode 100644 index 000000000..ec937091d --- /dev/null +++ b/packages/@overeng/notion-md/src/gateway-token-fingerprint.unit.test.ts @@ -0,0 +1,52 @@ +import { HttpClient, HttpClientResponse } from '@effect/platform' +import { Effect, Exit, Layer, Redacted } from 'effect' +import { describe, expect, it } from 'vitest' + +import { NotionConfig, notionTokenFingerprint } from '@overeng/notion-effect-client' + +import { NmdGatewayError } from './errors.ts' +import { NotionMdGatewayLive } from './live.ts' +import { NotionMdGateway } from './model.ts' + +/* Known non-secret token; the fingerprint is a hash so no bytes leak. */ +const token = Redacted.make('ntn_TESTONLYtoken') +const expectedFp = notionTokenFingerprint(token) + +/* HttpClient that always answers 404, forcing every gateway op to fail. 
*/ +const failingHttpClient = HttpClient.make((request) => + Effect.succeed( + HttpClientResponse.fromWeb( + request, + new Response( + JSON.stringify({ object: 'error', status: 404, code: 'object_not_found', message: 'nope' }), + { status: 404, headers: { 'content-type': 'application/json' } }, + ), + ), + ), +) + +const testLayer = NotionMdGatewayLive.pipe( + Layer.provide(Layer.succeed(NotionConfig, { authToken: token, retryEnabled: false })), + Layer.provide(Layer.succeed(HttpClient.HttpClient, failingHttpClient)), +) + +describe('NmdGatewayError token fingerprint', () => { + it('carries the integration token fingerprint in both message and field', async () => { + const exit = await Effect.gen(function* () { + const gateway = yield* NotionMdGateway + return yield* gateway.pullPage({ pageId: 'made-up-page-id' }) + }).pipe(Effect.provide(testLayer), Effect.runPromiseExit) + + expect(Exit.isFailure(exit)).toBe(true) + if (Exit.isFailure(exit) === false) return + const error = exit.cause._tag === 'Fail' ? exit.cause.error : undefined + expect(error).toBeInstanceOf(NmdGatewayError) + if (error instanceof NmdGatewayError === false) return + + expect(error.token_fingerprint).toBe(expectedFp) + expect(error.message).toContain(`[integration token ${expectedFp}]`) + /* Fingerprint is a hash: no secret token bytes appear in the surfaced error. */ + expect(error.message).not.toContain('TESTONLYtoken') + expect(expectedFp).toMatch(/^ntn_…#[0-9a-f]{8}$/u) + }) +}) diff --git a/packages/@overeng/notion-md/src/hash.ts b/packages/@overeng/notion-md/src/hash.ts index f37102b79..0d27c400d 100644 --- a/packages/@overeng/notion-md/src/hash.ts +++ b/packages/@overeng/notion-md/src/hash.ts @@ -5,10 +5,15 @@ import { sha256Hex } from '@overeng/utils' * Lightweight line-ending normalizer for body hashing and on-disk storage. * * Folds CRLF/CR to LF, trims trailing whitespace, ensures a final newline. 
- * Block-level *canonicalization* — paragraph unwrap, GFM rules, hyphen - * bullets — lives in `canonical-markdown.ts` and is applied separately at - * the Notion wire boundary (push send + post-push compare + pull receive). - * Two functions, two responsibilities: never collapse them. + * Block-level *canonicalization* — paragraph unwrap, GFM rules, hyphen bullets, + * tight lists — is `canonicalizeBlockMarkdown` in `@overeng/notion-effect-client` + * (beside the renderer it canonicalizes), applied at BOTH Notion wire boundaries: + * pull receive (`observeFromSnapshots` canonicalizes the rendered body at the + * source) and push send + post-push compare (decision 0019). This line-ending + * normalize is a *sub-step* of that canonical form, but is also used on its own + * for on-disk / title-frame / hash-prep where a full re-canonicalization would be + * wrong (it must preserve verbatim line/substring identity). Two responsibilities, + * one a step of the other: never substitute one for the other. */ export const normalizeMarkdownLineEndings = (markdown: string): string => markdown.replace(/\r\n/g, '\n').replace(/\r/g, '\n').replace(/\s+$/u, '') + '\n' @@ -16,3 +21,25 @@ export const normalizeMarkdownLineEndings = (markdown: string): string => /** Compute the canonical body hash used by `.nmd` conflict guards. */ export const sha256Digest = (value: string): Sha256Digest => `sha256:${sha256Hex(value)}` as Sha256Digest + +/** + * Drop block-level `<page>…</page>` child anchors from a body and collapse the + * gap they leave behind. + * + * Derived `<page>` anchors are re-emitted on every push and Notion auto-appends + * them, so they are not user content: change detection must ignore them, and the + * tree engine strips them before recording the on-disk / baseline body. 
Besides + * filtering the anchor lines we line-ending normalize and collapse the runs of + * blank lines an anchor leaves between two blocks (`\n{3,}` → `\n\n`), so the + * result is stable whether or not an anchor sat between blocks — the single + * definition both the sync change-detection compare and the tree baseline use. + */ +export const stripChildAnchors = (body: string): string => + normalizeMarkdownLineEndings( + body + .split('\n') + .filter((line) => /^\s*<page\b[^>]*>.*<\/page>\s*$/u.test(line) === false) + .join('\n') + .replace(/\n{3,}/gu, '\n\n') + .replace(/\n+$/u, '\n'), + ) diff --git a/packages/@overeng/notion-md/src/hash.unit.test.ts b/packages/@overeng/notion-md/src/hash.unit.test.ts new file mode 100644 index 000000000..0953f9c3f --- /dev/null +++ b/packages/@overeng/notion-md/src/hash.unit.test.ts @@ -0,0 +1,22 @@ +import { describe, expect, it } from '@effect/vitest' + +import { stripChildAnchors } from './hash.ts' + +describe('stripChildAnchors', () => { + it('drops anchor lines and collapses the blank-line gap they leave', () => { + // The previously divergent case: a `<page>` anchor between two paragraphs. + // The single (rich) definition collapses the resulting `\n\n\n` to `\n\n`, + // so the stripped body is stable whether or not an anchor sat between blocks. 
+ const body = 'Para A\n\n<page url="https://app.notion.com/p/x">Child</page>\n\nPara B\n' + expect(stripChildAnchors(body)).toBe('Para A\n\nPara B\n') + }) + + it('removes a trailing anchor block without leaving trailing blanks', () => { + const body = 'Body text\n\n<page url="https://app.notion.com/p/x">Child</page>\n' + expect(stripChildAnchors(body)).toBe('Body text\n') + }) + + it('leaves an anchor-free body unchanged aside from line-ending normalize', () => { + expect(stripChildAnchors('- a\n- b\n')).toBe('- a\n- b\n') + }) +}) diff --git a/packages/@overeng/notion-md/src/live.integration.test.ts b/packages/@overeng/notion-md/src/live.integration.test.ts index 8235e63f3..97cd8caf8 100644 --- a/packages/@overeng/notion-md/src/live.integration.test.ts +++ b/packages/@overeng/notion-md/src/live.integration.test.ts @@ -474,8 +474,12 @@ describe.skipIf(skipLive)('notion-md live integration', () => { }, ) - liveIt('guards unresolved unknown blocks when Notion exposes them', async () => { - await withScratchPage('unknown-block-guard', async (pageId) => { + // R38 / #785 (decisions 0016/0017): a not-round-trip-safe body block such as a + // bookmark renders to Markdown that Notion re-parses as a paragraph on push, so + // it is refused at the PULL (not preserved and guarded at push, which live + // testing proved silently corrupts). Refusal names the block class. 
+ liveIt('refuses a page with a not-round-trip-safe block at pull (R38)', async () => { + await withScratchPage('not-round-trip-safe-block', async (pageId) => { await runLive( NotionBlocks.append({ blockId: pageId, @@ -489,20 +493,10 @@ describe.skipIf(skipLive)('notion-md live integration', () => { ) await withTempDir(async (dir) => { - const path = join(dir, 'unknown.nmd') - const pulled = await runLive(pullPage({ pageId, outPath: path })) - const content = await readFile(path, 'utf8') - await writeFile(path, content.replace('Initial body', 'Local body')) - - expect(pulled.storage).toBe('self_contained') - const status = await runLive(statusPage({ path })) - if (status.unresolvedUnknownBlocks.length > 0) { - await expect(runLive(pushPage({ path }))).rejects.toThrow( - 'Page contains unresolved unknown Notion blocks', - ) - } else { - await expect(runLive(pushPage({ path }))).resolves.toMatchObject({ pushed: true }) - } + const path = join(dir, 'lossy.nmd') + await expect(runLive(pullPage({ pageId, outPath: path }))).rejects.toThrow('bookmark') + // No local base is written, so no later edit can silently destroy the block. 
+ await expect(readFile(path, 'utf8')).rejects.toThrow() }) }) }) diff --git a/packages/@overeng/notion-md/src/live.ts b/packages/@overeng/notion-md/src/live.ts index 18114bc7c..779be26ca 100644 --- a/packages/@overeng/notion-md/src/live.ts +++ b/packages/@overeng/notion-md/src/live.ts @@ -6,6 +6,8 @@ import { NotionBody, type NotionBodyObservation, NotionConfig, + NotionDataSources, + notionTokenFingerprint, NotionPages, type NmdStorage, type UpdateMarkdownOptions, @@ -127,36 +129,53 @@ const unknownPlaceholders = (markdown: string): readonly string[] => export const remoteMarkdownFromBodyObservation = ( body: NotionBodyObservation, ): RemoteMarkdownSnapshot => { + /* + * `observeFromSnapshots` renders the block tree and canonicalizes it once at + * the source (`body-observation.ts`), so `renderedMarkdown` is already the + * single canonical body form — the same bytes the evidence fingerprint, the + * fidelity classifier, hash, and push see (decision 0019, "agree by + * construction"). It is total on the pull path; a missing value is an + * invariant violation, not a recoverable state — fail as a defect rather than + * silently falling back to the endpoint Markdown (which runs headings together + * and drops inter-block blanks, the latent symptom-2 trap). + */ const renderedMarkdown = body.inventory.renderedMarkdown + if (renderedMarkdown === undefined) { + throw new Error( + `Body observation for page ${body.pageId} has no rendered Markdown; ` + + 'observeFromSnapshots must always render the block tree.', + ) + } return { - markdown: normalizeMarkdownLineEndings(renderedMarkdown ?? body.markdown.markdown), + markdown: renderedMarkdown, endpoint_markdown: normalizeMarkdownLineEndings(body.markdown.markdown), truncated: body.markdown.truncated, unknown_block_ids: body.markdown.unknownBlockIds, body_evidence: body.evidence, body_evidence_fingerprint: body.evidenceFingerprint, - completeness: - renderedMarkdown === undefined - ? 
{ - _tag: 'lossy', - reasons: ['rendered_markdown_unavailable'], - } - : body.completeness, + completeness: body.completeness, } } const mapGatewayError = - (opts: { readonly operation: string; readonly pageId?: string; readonly blockId?: string }) => + (opts: { + readonly operation: string + readonly tokenFp: string + readonly pageId?: string + readonly blockId?: string + }) => (cause: unknown): NmdGatewayError => new NmdGatewayError({ operation: opts.operation, page_id: opts.pageId, block_id: opts.blockId, + token_fingerprint: opts.tokenFp, cause, message: - opts.pageId === undefined + (opts.pageId === undefined ? `Notion gateway operation failed: ${opts.operation}` - : `Notion gateway operation failed for page ${opts.pageId}: ${opts.operation}`, + : `Notion gateway operation failed for page ${opts.pageId}: ${opts.operation}`) + + ` [integration token ${opts.tokenFp}]`, }) const toNotionUpdateMarkdownOptions = (opts: { @@ -205,6 +224,12 @@ export const NotionMdGatewayLive = Layer.effect( Effect.gen(function* () { const config = yield* NotionConfig const client = yield* HttpClient.HttpClient + /* + * Log-safe fingerprint of the active integration token. Surfaced on every + * gateway error so a user can tell *which* credential is in use (e.g. when + * a `secrets-run` token resolves to a different integration than expected). 
+ */ + const tokenFp = notionTokenFingerprint(config.authToken) const provideHttp = <A, E>( effect: Effect.Effect<A, E, NotionConfig | HttpClient.HttpClient>, ): Effect.Effect<A, E> => @@ -237,7 +262,7 @@ export const NotionMdGatewayLive = Layer.effect( }), } }).pipe( - Effect.mapError(mapGatewayError({ operation: 'pull_page', pageId })), + Effect.mapError(mapGatewayError({ operation: 'pull_page', tokenFp, pageId })), Observability.withOperation(Observability.GatewayPullPageSpan, { pageId }), ), updateMarkdown: ({ pageId, command, allowDeletingContent }) => @@ -273,7 +298,8 @@ export const NotionMdGatewayLive = Layer.effect( new NmdGatewayError({ operation: 'update_markdown', page_id: pageId, - message: `Notion gateway operation failed for page ${pageId}: update_markdown returned unexpected Markdown`, + token_fingerprint: tokenFp, + message: `Notion gateway operation failed for page ${pageId}: update_markdown returned unexpected Markdown [integration token ${tokenFp}]`, }), ), ), @@ -290,7 +316,7 @@ export const NotionMdGatewayLive = Layer.effect( Effect.mapError((cause) => cause instanceof NmdGatewayError ? 
cause - : mapGatewayError({ operation: 'update_markdown', pageId })(cause), + : mapGatewayError({ operation: 'update_markdown', tokenFp, pageId })(cause), ), Observability.withOperation(Observability.GatewayUpdateMarkdownSpan, { pageId, @@ -303,9 +329,24 @@ export const NotionMdGatewayLive = Layer.effect( updatePageProperties: ({ pageId, properties }) => provideHttp(NotionPages.update({ pageId, properties })).pipe( Effect.map(toRemotePage), - Effect.mapError(mapGatewayError({ operation: 'update_page_properties', pageId })), + Effect.mapError( + mapGatewayError({ operation: 'update_page_properties', tokenFp, pageId }), + ), Observability.withOperation(Observability.GatewayUpdatePagePropertiesSpan, { pageId }), ), + retrieveDataSource: ({ dataSourceId }) => + provideHttp(NotionDataSources.retrieve({ dataSourceId })).pipe( + Effect.map((dataSource) => ({ + dataSourceId: dataSource.id, + databaseId: + dataSource.parent.type === 'database_id' ? dataSource.parent.database_id : undefined, + properties: dataSource.properties, + })), + Effect.mapError(mapGatewayError({ operation: 'retrieve_data_source', tokenFp })), + Observability.withOperation(Observability.GatewayRetrieveDataSourceSpan, { + dataSourceId, + }), + ), updatePageMetadata: ({ pageId, metadata }) => provideHttp( NotionPages.update({ @@ -333,7 +374,7 @@ export const NotionMdGatewayLive = Layer.effect( }), ).pipe( Effect.map(toRemotePage), - Effect.mapError(mapGatewayError({ operation: 'update_page_metadata', pageId })), + Effect.mapError(mapGatewayError({ operation: 'update_page_metadata', tokenFp, pageId })), Observability.withOperation(Observability.GatewayUpdatePageMetadataSpan, { pageId, hasTitle: metadata.title !== undefined, @@ -354,7 +395,7 @@ export const NotionMdGatewayLive = Layer.effect( return childPage === undefined ? 
[] : [childPage] }), ), - Effect.mapError(mapGatewayError({ operation: 'list_child_pages', pageId })), + Effect.mapError(mapGatewayError({ operation: 'list_child_pages', tokenFp, pageId })), Observability.withOperation(Observability.GatewayListChildPagesSpan, { pageId }), ), createPage: ({ parentPageId, title, markdown }) => @@ -371,7 +412,9 @@ export const NotionMdGatewayLive = Layer.effect( }), ).pipe( Effect.map(toRemotePage), - Effect.mapError(mapGatewayError({ operation: 'create_page', pageId: parentPageId })), + Effect.mapError( + mapGatewayError({ operation: 'create_page', tokenFp, pageId: parentPageId }), + ), Observability.withOperation(Observability.GatewayCreatePageSpan, { parentPageId }), ), movePage: ({ pageId, parentPageId }) => @@ -379,13 +422,13 @@ export const NotionMdGatewayLive = Layer.effect( NotionPages.move({ pageId, parent: { type: 'page_id', page_id: parentPageId } }), ).pipe( Effect.map(toRemotePage), - Effect.mapError(mapGatewayError({ operation: 'move_page', pageId })), + Effect.mapError(mapGatewayError({ operation: 'move_page', tokenFp, pageId })), Observability.withOperation(Observability.GatewayMovePageSpan, { pageId }), ), archivePage: ({ pageId }) => provideHttp(NotionPages.update({ pageId, in_trash: true })).pipe( Effect.map(toRemotePage), - Effect.mapError(mapGatewayError({ operation: 'archive_page', pageId })), + Effect.mapError(mapGatewayError({ operation: 'archive_page', tokenFp, pageId })), Observability.withOperation(Observability.GatewayArchivePageSpan, { pageId }), ), } diff --git a/packages/@overeng/notion-md/src/live.unit.test.ts b/packages/@overeng/notion-md/src/live.unit.test.ts index 0fb08700b..407eac5b2 100644 --- a/packages/@overeng/notion-md/src/live.unit.test.ts +++ b/packages/@overeng/notion-md/src/live.unit.test.ts @@ -39,6 +39,14 @@ const evidenceFor = (input: { return { evidence, evidenceFingerprint: fingerprintBodyEvidence(evidence) } } +/* + * `observeFromSnapshots` now canonicalizes `renderedMarkdown` at the 
source + * (body-observation.ts), so these inputs are the already-canonical body and + * `remoteMarkdownFromBodyObservation` projects it through verbatim. The + * canonicalization behavior itself is locked in canonical-markdown.unit.test.ts + * and body-observation.unit.test.ts; here we assert the projection and that the + * endpoint Markdown is never adopted in place of the rendered body. + */ describe('remoteMarkdownFromBodyObservation', () => { it('adopts block-tree-rendered Markdown instead of endpoint Markdown', () => { const entries = [ @@ -70,13 +78,13 @@ describe('remoteMarkdownFromBodyObservation', () => { }, inventory: { entries, - renderedMarkdown: '## Section\n\nParagraph that the endpoint left adjacent\n\n---', + renderedMarkdown: '## Section\n\nParagraph that the endpoint left adjacent\n\n---\n', }, completeness: { _tag: 'complete' }, ...evidenceFor({ pageId: '00000000-0000-4000-8000-000000000001', endpointMarkdown: '## Section\nParagraph that the endpoint left adjacent\n---\n', - renderedMarkdown: '## Section\n\nParagraph that the endpoint left adjacent\n\n---', + renderedMarkdown: '## Section\n\nParagraph that the endpoint left adjacent\n\n---\n', entries, completeness: 'complete', }), @@ -92,38 +100,116 @@ describe('remoteMarkdownFromBodyObservation', () => { }) }) - it('fails closed when block-tree-rendered Markdown is unavailable', () => { - const entries = [] as const + it('projects the canonical (tight) list body through to the pull snapshot', () => { + const entries = [ + { + id: '00000000-0000-4000-8000-000000000002', + type: 'bulleted_list_item', + hasChildren: false, + inTrash: false, + }, + { + id: '00000000-0000-4000-8000-000000000003', + type: 'bulleted_list_item', + hasChildren: false, + inTrash: false, + }, + { + id: '00000000-0000-4000-8000-000000000004', + type: 'paragraph', + hasChildren: false, + inTrash: false, + }, + ] as const + // `observeFromSnapshots` already canonicalized the loose renderer output to + // this tight form; 
the projection passes it through, keeping the blank line + // before the trailing paragraph. + const renderedMarkdown = '- Bullet A\n- Bullet B\n\nA paragraph after the list.\n' const observation: NotionBodyObservation = { pageId: '00000000-0000-4000-8000-000000000001', markdown: { - markdown: 'Endpoint only', + markdown: '- Bullet A\n- Bullet B\nA paragraph after the list.\n', truncated: false, unknownBlockIds: [], }, - inventory: { + inventory: { entries, renderedMarkdown }, + completeness: { _tag: 'complete' }, + ...evidenceFor({ + pageId: '00000000-0000-4000-8000-000000000001', + endpointMarkdown: '- Bullet A\n- Bullet B\nA paragraph after the list.\n', + renderedMarkdown, entries, + completeness: 'complete', + }), + } + + expect(remoteMarkdownFromBodyObservation(observation)).toMatchObject({ + markdown: '- Bullet A\n- Bullet B\n\nA paragraph after the list.\n', + }) + }) + + it('never runs consecutive headings together on pull', () => { + // The endpoint Markdown drops inter-block blank lines (headings run + // together); the canonical rendered body must keep them blank-separated. + const entries = [ + { + id: '00000000-0000-4000-8000-000000000002', + type: 'heading_1', + hasChildren: false, + inTrash: false, }, + { + id: '00000000-0000-4000-8000-000000000003', + type: 'heading_2', + hasChildren: false, + inTrash: false, + }, + { + id: '00000000-0000-4000-8000-000000000004', + type: 'heading_3', + hasChildren: false, + inTrash: false, + }, + ] as const + const renderedMarkdown = '# H1\n\n## H2\n\n### H3\n' + const observation: NotionBodyObservation = { + pageId: '00000000-0000-4000-8000-000000000001', + // Endpoint shape with headings run together — must NOT leak through. 
+ markdown: { markdown: '# H1\n## H2\n### H3\n', truncated: false, unknownBlockIds: [] }, + inventory: { entries, renderedMarkdown }, completeness: { _tag: 'complete' }, ...evidenceFor({ pageId: '00000000-0000-4000-8000-000000000001', - endpointMarkdown: 'Endpoint only', - renderedMarkdown: '', + endpointMarkdown: '# H1\n## H2\n### H3\n', + renderedMarkdown, entries, completeness: 'complete', }), } expect(remoteMarkdownFromBodyObservation(observation)).toMatchObject({ - markdown: 'Endpoint only\n', - endpoint_markdown: 'Endpoint only\n', - truncated: false, - unknown_block_ids: [], - body_evidence_fingerprint: observation.evidenceFingerprint, - completeness: { - _tag: 'lossy', - reasons: ['rendered_markdown_unavailable'], - }, + markdown: '# H1\n\n## H2\n\n### H3\n', }) }) + + it('throws an invariant defect when block-tree-rendered Markdown is unavailable', () => { + const entries = [] as const + const observation: NotionBodyObservation = { + pageId: '00000000-0000-4000-8000-000000000001', + markdown: { markdown: 'Endpoint only', truncated: false, unknownBlockIds: [] }, + inventory: { entries }, + completeness: { _tag: 'complete' }, + ...evidenceFor({ + pageId: '00000000-0000-4000-8000-000000000001', + endpointMarkdown: 'Endpoint only', + renderedMarkdown: '', + entries, + completeness: 'complete', + }), + } + + expect(() => remoteMarkdownFromBodyObservation(observation)).toThrow( + /has no rendered Markdown/u, + ) + }) }) diff --git a/packages/@overeng/notion-md/src/merge.test.ts b/packages/@overeng/notion-md/src/merge.test.ts index cd41126f4..1604ee803 100644 --- a/packages/@overeng/notion-md/src/merge.test.ts +++ b/packages/@overeng/notion-md/src/merge.test.ts @@ -1,6 +1,7 @@ import { describe, expect, it } from '@effect/vitest' import * as fc from 'effect/FastCheck' +import { canonicalizeBlockMarkdown } from './canonical-markdown.ts' import { normalizeMarkdownLineEndings } from './hash.ts' import { planMarkdownUpdate, tryMergeMarkdownBodies } from './merge.ts' 
@@ -94,6 +95,28 @@ describe('notion-md merge planning', () => { }) }) + it('plans an update over a canonical base/remote and reconstructs the desired body', () => { + // Post-consolidation, base and remote come from pull → they are already the + // canonical (tight-list) form. The user's desired buffer is raw. The plan's + // `oldStr`/`newStr` are deliberately raw substrings Notion matches verbatim, + // so `desired` is NOT canonicalized — applying the plan to the canonical + // remote must reconstruct exactly the desired body (decision 0019, §3.3). + const base = canonicalizeBlockMarkdown('# Notes\n\n- alpha\n\n- beta\n') + const remote = base + expect(base).toBe('# Notes\n\n- alpha\n- beta\n') // canonical: tight list + const desired = '# Notes\n\n- alpha\n- gamma\n' + + const command = planMarkdownUpdate({ baseBody: base, remoteBody: remote, desiredBody: desired }) + expect(command).toEqual({ + _tag: 'update_content', + contentUpdates: [{ oldStr: 'bet', newStr: 'gamm' }], + expectedMarkdown: '# Notes\n\n- alpha\n- gamma\n', + }) + expect(applyMarkdownUpdate({ baseBody: base, remoteBody: remote, desiredBody: desired })).toBe( + normalizeMarkdownLineEndings(desired), + ) + }) + it.prop( 'keeps local body when remote equals the base snapshot', [fc.string({ maxLength: 80 }), fc.string({ maxLength: 80 })], diff --git a/packages/@overeng/notion-md/src/mod.ts b/packages/@overeng/notion-md/src/mod.ts index 66ffd6f80..58b605c4c 100644 --- a/packages/@overeng/notion-md/src/mod.ts +++ b/packages/@overeng/notion-md/src/mod.ts @@ -1,13 +1,33 @@ export { NmdCliError, NmdConflictError, + NmdEditorAbortedError, NmdFileSystemError, NmdFrontmatterError, NmdGatewayError, + NmdInvalidDocumentError, NmdObjectStoreError, + NmdPartialWriteError, + NmdPostPushGateError, + NmdRemoteBodyLossyError, + NmdSchemaDriftError, NmdTokenMissingError, + NmdUnresolvablePageError, } from './errors.ts' export type { NmdError } from './errors.ts' +export { catEditorPage, editEditorPage, putEditorPage 
} from './editor-commands.ts' +export type { + CatOptions, + CatResult, + EditOptions, + EditorMode, + EditResult, + PutOptions, + PutResult, +} from './editor-commands.ts' +export { editorBaseHash, parseTitleBody, serializeTitleBody } from './editor-surface.ts' +export type { TitleBodyDocument } from './editor-surface.ts' +export { EDITOR_EXIT_CODES, editorExitCode } from './exit-codes.ts' export { parseNmdFile, renderNmdFile } from './frontmatter.ts' export type { ParsedNmdFile } from './frontmatter.ts' export { normalizeMarkdownLineEndings, sha256Digest } from './hash.ts' @@ -56,7 +76,13 @@ export type { SyncPathOptions, SyncPathResult, } from './path.ts' -export { pullPage, statusPage, syncPage } from './sync.ts' +export { + buildFrontmatterV2, + pullPage, + statusPage, + syncPage, + syncPageReplacingBody, +} from './sync.ts' export type { PullOptions, PullResult, @@ -92,12 +118,15 @@ export { materializeBody, NotionMdBodyConflictError, observeRemoteBody, + observeRemoteEditorPage, readLocalBody, + replaceRemoteBodyForced, replaceRemoteBodyVerified, settleVerifiedBodyPush, } from './body-facade.ts' export type { NotionMdBodySnapshot, + NotionMdEditorSnapshot, NotionMdLocalBodySnapshot, NotionMdMaterializedBody, NotionMdSettledBodyPush, diff --git a/packages/@overeng/notion-md/src/model.ts b/packages/@overeng/notion-md/src/model.ts index df0df8108..9dcfe38ef 100644 --- a/packages/@overeng/notion-md/src/model.ts +++ b/packages/@overeng/notion-md/src/model.ts @@ -63,6 +63,18 @@ export interface RemoteMarkdownSnapshot { readonly body_evidence_fingerprint?: BodyEvidenceFingerprint } +/** + * Live property schema of a Notion data source, retrieved for schema-drift + * detection on data-source-backed pages (decision 0017, R14). In API + * 2026-03-11 the property schema lives on the data source, not the database. 
+ */ +export interface RemoteDataSourceSchema { + readonly dataSourceId: string + readonly databaseId: string | undefined + /** Raw property definitions keyed by display name (`{ id, name, type, … }`). */ + readonly properties: Record<string, unknown> +} + /** Complete remote page snapshot used by the sync engine. */ export interface PullPageResult { readonly page: RemotePageSnapshot @@ -135,6 +147,15 @@ export interface NotionMdGatewayShape { readonly pageId: string readonly properties: Record<string, unknown> }) => Effect.Effect<RemotePageSnapshot, NmdGatewayError> + /** + * Retrieve a data source's live property schema for schema-drift detection + * (`GET /v1/data_sources/{id}`; decision 0017, R14). Used only for + * data-source-backed pages at pull (to capture the `schema_snapshot`) and at + * push (to recompute it before a property write). + */ + readonly retrieveDataSource: (opts: { + readonly dataSourceId: string + }) => Effect.Effect<RemoteDataSourceSchema, NmdGatewayError> readonly updatePageMetadata: (opts: { readonly pageId: string readonly metadata: PageMetadataUpdate diff --git a/packages/@overeng/notion-md/src/observability.ts b/packages/@overeng/notion-md/src/observability.ts index 283538577..78829ca2c 100644 --- a/packages/@overeng/notion-md/src/observability.ts +++ b/packages/@overeng/notion-md/src/observability.ts @@ -22,6 +22,12 @@ export const parentPageAttrs = OtelAttrs.defineSync( }), ) +export const dataSourceAttrs = OtelAttrs.defineSync( + Schema.Struct({ + dataSourceId: Schema.String.pipe(OtelAttr.key({ key: 'notion_md.data_source_id' })), + }), +) + export const pathAttrs = OtelAttrs.defineSync( Schema.Struct({ basename: Schema.String.pipe(OtelAttr.key({ key: 'notion_md.path.basename' })), @@ -280,6 +286,12 @@ export const GatewayUpdatePagePropertiesSpan = OtelOperation.define({ label: ({ pageId }) => pageId.slice(0, 8), }) +export const GatewayRetrieveDataSourceSpan = OtelOperation.define({ + name: 
'notion-md.gateway.retrieve-data-source', + attributes: dataSourceAttrs, + label: ({ dataSourceId }) => dataSourceId.slice(0, 8), +}) + export const GatewayUpdatePageMetadataSpan = OtelOperation.define({ name: 'notion-md.gateway.update-page-metadata', attributes: metadataUpdateAttrs, @@ -310,6 +322,58 @@ export const GatewayArchivePageSpan = OtelOperation.define({ label: ({ pageId }) => pageId.slice(0, 8), }) +const editorModeSchema = Schema.Literal('default', 'frontmatter') + +/** Span for `notion-md cat` — read-only editor projection of a Notion page. */ +export const CatSpan = OtelOperation.define({ + name: 'notion-md.cat', + root: true, + schema: Schema.Struct({ + pageId: Schema.String.pipe(OtelAttr.key({ key: 'notion_md.page_id' })), + mode: editorModeSchema.pipe(OtelAttr.key({ key: 'notion_md.editor.mode' })), + }), + label: ({ pageId }) => pageId.slice(0, 8), +}) + +/** Span for `notion-md put` — guarded title+body write to a Notion page. */ +export const PutSpan = OtelOperation.define({ + name: 'notion-md.put', + root: true, + schema: Schema.Struct({ + pageId: Schema.String.pipe(OtelAttr.key({ key: 'notion_md.page_id' })), + force: Schema.Boolean.pipe(OtelAttr.key({ key: 'notion_md.put.force' })), + }), + label: ({ pageId }) => pageId.slice(0, 8), +}) + +/** Result annotations for a completed `notion-md put`. */ +export const putResultAttrs = OtelAttrs.defineSync( + Schema.Struct({ + bodyWritten: Schema.Boolean.pipe(OtelAttr.key({ key: 'notion_md.put.body_written' })), + titleWritten: Schema.Boolean.pipe(OtelAttr.key({ key: 'notion_md.put.title_written' })), + }), +) + +/** Span for `notion-md edit` — ephemeral file-engine editor session (wraps engine spans). 
*/ +export const EditSpan = OtelOperation.define({ + name: 'notion-md.edit', + root: true, + schema: Schema.Struct({ + pageId: Schema.String.pipe(OtelAttr.key({ key: 'notion_md.page_id' })), + mode: editorModeSchema.pipe(OtelAttr.key({ key: 'notion_md.editor.mode' })), + }), + label: ({ pageId }) => pageId.slice(0, 8), +}) + +/** Outcome annotation for a completed `notion-md edit` session. */ +export const editResultAttrs = OtelAttrs.defineSync( + Schema.Struct({ + outcome: Schema.Literal('pushed', 'noop', 'aborted', 'conflict', 'read-only').pipe( + OtelAttr.key({ key: 'notion_md.edit.outcome' }), + ), + }), +) + export const page = (pageId: string) => GatewayPullPageSpan.encodeSync({ pageId }) export const parentPage = (parentPageId: string) => diff --git a/packages/@overeng/notion-md/src/progress.ts b/packages/@overeng/notion-md/src/progress.ts new file mode 100644 index 000000000..7d956ab06 --- /dev/null +++ b/packages/@overeng/notion-md/src/progress.ts @@ -0,0 +1,129 @@ +import { Context, Effect, Layer, Option } from 'effect' + +/** + * Stable, CLI-facing stage vocabulary for a write-path sync (R43). These ids are + * a presentation contract distinct from the OTEL span names (which stay on the + * `notion_md.*` namespace, decision 0018); a stage may map to several spans or + * none. + */ +export type ProgressStageId = 'observe' | 'write-body' | 'write-title' | 'settle' + +/** A single staged transition the engine emits to the reporter. */ +export interface ProgressStage { + readonly id: ProgressStageId + readonly label: string + readonly message?: string +} + +/** + * The render seam (R45): the engine emits purpose-tagged stage transitions; the + * provided Layer decides whether/how to render. Every method returns + * `Effect.Effect<void>` so a renderer can do I/O, and the no-op/absent Layer + * makes the whole surface zero-cost and behavior-neutral. 
+ */ +export interface ProgressReporterShape { + readonly stageActive: (stage: ProgressStage) => Effect.Effect<void> + readonly stageSucceed: (stage: ProgressStage) => Effect.Effect<void> + readonly stageSkip: (stage: ProgressStage) => Effect.Effect<void> + readonly stageFail: (stage: ProgressStage) => Effect.Effect<void> + /** Free-form stderr note for the "+ warn" outcomes (drift auto-merge / conflict). */ + readonly note: (message: string) => Effect.Effect<void> +} + +/** Render seam for staged write-path sync progress (decision 0018, R43–R45). */ +export class ProgressReporter extends Context.Tag('ProgressReporter')< + ProgressReporter, + ProgressReporterShape +>() {} + +/** + * Emit a reporter call without adding to the engine's `R`, and swallowing ALL + * failures and defects (R45): a render glitch can never change a result or exit + * code. `Effect.serviceOption` requires nothing in `R` (returns + * `Option<Service>`), so the engine stays render-agnostic; when no Layer is + * provided the emit is a silent no-op. + */ +const emit = (f: (r: ProgressReporterShape) => Effect.Effect<void>): Effect.Effect<void> => + Effect.serviceOption(ProgressReporter).pipe( + Effect.flatMap(Option.match({ onNone: () => Effect.void, onSome: f })), + Effect.catchAllCause(() => Effect.void), + ) + +/** Emit an `active` transition for a stage. */ +export const reportStageActive = (stage: ProgressStage): Effect.Effect<void> => + emit((r) => r.stageActive(stage)) + +/** Emit a `succeed` transition for a stage. */ +export const reportStageSucceed = (stage: ProgressStage): Effect.Effect<void> => + emit((r) => r.stageSucceed(stage)) + +/** Emit a `skip` transition for a stage (a no-op step the user should still see). */ +export const reportStageSkip = (stage: ProgressStage): Effect.Effect<void> => + emit((r) => r.stageSkip(stage)) + +/** Emit a `fail` transition for a stage. 
*/ +export const reportStageFail = (stage: ProgressStage): Effect.Effect<void> => + emit((r) => r.stageFail(stage)) + +/** Emit a free-form stderr note (the drift auto-merge / conflict warnings). */ +export const reportNote = (message: string): Effect.Effect<void> => emit((r) => r.note(message)) + +/** + * Wrap an effect as a single stage: emit `active` before it runs, `succeed` + * (with the optional done message) on success, and `fail` on failure (any + * cause, not only typed errors, since it hooks `tapErrorCause`). Preserves + * the wrapped effect's `A`/`E`/`R` exactly — the emit helpers are + * `Effect<void, never, never>`, so the engine's signatures do not change. + */ +// oxlint-disable-next-line overeng/named-args -- data-last Effect combinator (stage config + wrapped effect) +export const withStage = <A, E, R>( + stage: { readonly id: ProgressStageId; readonly label: string; readonly doneMessage?: string }, + eff: Effect.Effect<A, E, R>, +): Effect.Effect<A, E, R> => + reportStageActive(stage).pipe( + Effect.zipRight(eff), + Effect.tap(() => + reportStageSucceed( + stage.doneMessage === undefined ? stage : { ...stage, message: stage.doneMessage }, + ), + ), + Effect.tapErrorCause(() => reportStageFail(stage)), + ) + +/** Write a single line to stderr (no animated control sequences). */ +const writeLine = (line: string): Effect.Effect<void> => + Effect.sync(() => void process.stderr.write(`${line}\n`)) + +/** + * Live stderr-line renderer (decision 0018 "static line" rung): tasteful + * sequential lines, no cursor control. Deliberately NOT the animated + * `@overeng/tui-react` `TaskList` — `edit` returns from a full-screen editor + * that owned the TTY, and a mounting TUI would fight the terminal; sequential + * lines also sidestep the #787 module-load TDZ (no `createTuiApp`). The + * `ProgressReporter` Tag seam lets the animated `TaskList` Layer drop in later + * with no engine re-touch.
+ */ +export const ProgressReporterStderrLines: Layer.Layer<ProgressReporter> = Layer.succeed( + ProgressReporter, + { + stageActive: (stage) => writeLine(` · ${stage.label} …`), + stageSucceed: (stage) => + writeLine( + stage.message === undefined ? ` ✓ ${stage.label}` : ` ✓ ${stage.label} ${stage.message}`, + ), + stageSkip: (stage) => writeLine(` · ${stage.label} (skipped)`), + stageFail: (stage) => writeLine(` ✗ ${stage.label}`), + note: (message) => writeLine(`note: ${message}`), + } satisfies ProgressReporterShape, +) + +/** + * Explicit no-op Layer for tests/clarity. (Absence of a Layer already no-ops via + * `serviceOption`; this just makes the intent provable.) + */ +export const ProgressReporterNoop: Layer.Layer<ProgressReporter> = Layer.succeed(ProgressReporter, { + stageActive: () => Effect.void, + stageSucceed: () => Effect.void, + stageSkip: () => Effect.void, + stageFail: () => Effect.void, + note: () => Effect.void, +} satisfies ProgressReporterShape) diff --git a/packages/@overeng/notion-md/src/progress.unit.test.ts b/packages/@overeng/notion-md/src/progress.unit.test.ts new file mode 100644 index 000000000..ada3164d4 --- /dev/null +++ b/packages/@overeng/notion-md/src/progress.unit.test.ts @@ -0,0 +1,236 @@ +import { mkdtempSync, rmSync } from 'node:fs' +import { tmpdir } from 'node:os' +import { join } from 'node:path' + +import { NodeContext } from '@effect/platform-node' +import { Effect, Layer } from 'effect' +import { describe, expect, it } from 'vitest' + +import { editEditorPage } from './editor-commands.ts' +import { FakeGateway, harnessPageId as pageId, scriptedEditor } from './editor-test-harness.ts' +import type { NotionMdGateway } from './model.ts' +import { ProgressReporter, type ProgressReporterShape, type ProgressStage } from './progress.ts' +import { NmdStateStoreLive, type NmdStateStore } from './state-store.ts' + +const stateStoreLayer = NmdStateStoreLive.pipe(Layer.provide(NodeContext.layer)) + +/** A captured progress event: 
the method that fired and the stage/note payload. */ +type ProgressEvent = + | { readonly kind: 'active' | 'succeed' | 'skip' | 'fail'; readonly stage: ProgressStage } + | { readonly kind: 'note'; readonly message: string } + +/** A `ProgressReporter` Layer that records every emitted transition for assertions. */ +const capturingLayer = (sink: ProgressEvent[]): Layer.Layer<ProgressReporter> => + Layer.succeed(ProgressReporter, { + stageActive: (stage) => Effect.sync(() => void sink.push({ kind: 'active', stage })), + stageSucceed: (stage) => Effect.sync(() => void sink.push({ kind: 'succeed', stage })), + stageSkip: (stage) => Effect.sync(() => void sink.push({ kind: 'skip', stage })), + stageFail: (stage) => Effect.sync(() => void sink.push({ kind: 'fail', stage })), + note: (message) => Effect.sync(() => void sink.push({ kind: 'note', message })), + } satisfies ProgressReporterShape) + +/** A `ProgressReporter` Layer whose every method defects/fails — proves emit swallows it. */ +const hostileLayer: Layer.Layer<ProgressReporter> = Layer.succeed(ProgressReporter, { + stageActive: () => Effect.die(new Error('hostile stageActive')), + stageSucceed: () => Effect.fail(new Error('hostile stageSucceed') as never), + stageSkip: () => Effect.die(new Error('hostile stageSkip')), + stageFail: () => Effect.die(new Error('hostile stageFail')), + note: () => Effect.die(new Error('hostile note')), +} satisfies ProgressReporterShape) + +const runEdit = <A, E>( + effect: Effect.Effect<A, E, NotionMdGateway | NmdStateStore | NodeContext.NodeContext>, + gateway: FakeGateway, + progressLayer?: Layer.Layer<ProgressReporter>, +) => { + const base = Layer.mergeAll(gateway.layer, stateStoreLayer, NodeContext.layer) + const layer = progressLayer === undefined ? 
base : Layer.merge(base, progressLayer) + return Effect.either(effect).pipe( + Effect.provide(layer as Layer.Layer<NotionMdGateway | NmdStateStore | NodeContext.NodeContext>), + Effect.runPromise, + ) +} + +const ids = (events: ProgressEvent[]): string[] => + events.map((e) => (e.kind === 'note' ? `note` : `${e.stage.id}:${e.kind}`)) + +describe('progress (staged write-path sync indicator)', () => { + it('R45: the edit push result is identical with no / capturing / hostile reporter', async () => { + const run = (progressLayer?: Layer.Layer<ProgressReporter>) => { + const gateway = new FakeGateway({ title: 'Doc', body: 'original line' }) + return runEdit( + editEditorPage({ + pageId, + mode: 'default', + pageRef: pageId, + runEditor: scriptedEditor((b) => b.replace('original line', 'edited line')), + }), + gateway, + progressLayer, + ).then((result) => ({ result, body: gateway.state.body })) + } + + const captured: ProgressEvent[] = [] + const none = await run(undefined) + const capturing = await run(capturingLayer(captured)) + const hostile = await run(hostileLayer) + + // Byte-identical outcome + remote effect across all three reporter wirings. + for (const r of [none, capturing, hostile]) { + expect(r.result._tag).toBe('Right') + if (r.result._tag === 'Right') { + expect(r.result.right).toEqual({ pageId, outcome: 'pushed' }) + } + expect(r.body).toBe('edited line\n') + } + // The hostile reporter (die/fail) never surfaced as a defect or changed the result. 
+ expect(captured.length).toBeGreaterThan(0) + }) + + it('emits observe → write-body → write-title → settle for a changed-buffer push', async () => { + const gateway = new FakeGateway({ title: 'Doc', body: 'original line' }) + const events: ProgressEvent[] = [] + const result = await runEdit( + editEditorPage({ + pageId, + mode: 'default', + pageRef: pageId, + runEditor: scriptedEditor((b) => b.replace('original line', 'edited line')), + }), + gateway, + capturingLayer(events), + ) + expect(result._tag).toBe('Right') + // Body-only edit: title is unchanged, so write-title is a skip (no active/succeed). + expect(ids(events)).toEqual([ + 'observe:active', + 'observe:succeed', + 'write-body:active', + 'write-body:succeed', + 'write-title:skip', + 'settle:active', + 'settle:succeed', + ]) + }) + + it('emits write-title active+succeed when the title also changed', async () => { + const gateway = new FakeGateway({ title: 'Old', body: 'original line' }) + const events: ProgressEvent[] = [] + await runEdit( + editEditorPage({ + pageId, + mode: 'default', + pageRef: pageId, + runEditor: scriptedEditor((b) => + b.replace('# Old', '# New').replace('original line', 'edited line'), + ), + }), + gateway, + capturingLayer(events), + ) + expect(ids(events)).toEqual([ + 'observe:active', + 'observe:succeed', + 'write-body:active', + 'write-body:succeed', + 'write-title:active', + 'write-title:succeed', + 'settle:active', + 'settle:succeed', + ]) + }) + + it('warn: a remote-changed-but-auto-mergeable edit emits the auto-merge note', async () => { + // Two-line base; the editor changes line 1, a concurrent remote writer changes + // line 2 (a disjoint hunk) after the ephemeral pull → a clean 3-way auto-merge. 
+ const gateway = new FakeGateway({ title: 'Doc', body: 'line one\nline two' }) + gateway.switchRemoteBodyAfter(1, 'line one\nremote line two') + const events: ProgressEvent[] = [] + const result = await runEdit( + editEditorPage({ + pageId, + mode: 'default', + pageRef: pageId, + runEditor: scriptedEditor((b) => b.replace('line one', 'local line one')), + }), + gateway, + capturingLayer(events), + ) + expect(result._tag).toBe('Right') + if (result._tag === 'Right') expect(result.right).toEqual({ pageId, outcome: 'pushed' }) + const notes = events.filter((e) => e.kind === 'note') + expect(notes).toHaveLength(1) + if (notes[0]?.kind === 'note') { + expect(notes[0].message).toContain('auto-merged') + } + // The auto-merge body landed both hunks. + expect(gateway.state.body).toBe('local line one\nremote line two\n') + }) + + it('warn: a conflicting edit emits the conflict note and returns the exit-7 conflict outcome', async () => { + const gateway = new FakeGateway({ title: 'Doc', body: 'the original line' }) + // Remote changes the SAME line the editor edits → unmergeable → conflict. + gateway.switchRemoteBodyAfter(1, 'a totally different remote line') + const events: ProgressEvent[] = [] + // Run in a throwaway cwd: the durable `<page>.conflict.md` is written + // relative to the process cwd (would otherwise land in the package root). 
+ const cwd = mkdtempSync(join(tmpdir(), 'notion-md-progress-conflict-')) + const previousCwd = process.cwd() + process.chdir(cwd) + try { + const result = await runEdit( + editEditorPage({ + pageId, + mode: 'default', + pageRef: pageId, + runEditor: scriptedEditor((b) => + b.replace('the original line', 'my local edit of that line'), + ), + }), + gateway, + capturingLayer(events), + ) + expect(result._tag).toBe('Right') + if (result._tag === 'Right') expect(result.right.outcome).toBe('conflict') + } finally { + process.chdir(previousCwd) + rmSync(cwd, { recursive: true, force: true }) + } + const notes = events.filter((e) => e.kind === 'note') + expect(notes).toHaveLength(1) + if (notes[0]?.kind === 'note') { + expect(notes[0].message).toContain('conflict draft') + expect(notes[0].message).toContain(`${pageId}.conflict.md`) + } + // Conflict branch never reaches a body write → no write-body stage. + expect(ids(events).some((id) => id.startsWith('write-body'))).toBe(false) + }) + + it('emits write-body active+fail and propagates the original error when the body write fails', async () => { + const gateway = new FakeGateway({ title: 'Doc', body: 'original line' }) + gateway.failUpdateMarkdownOnce() + const events: ProgressEvent[] = [] + const result = await runEdit( + editEditorPage({ + pageId, + mode: 'default', + pageRef: pageId, + runEditor: scriptedEditor((b) => b.replace('original line', 'edited line')), + }), + gateway, + capturingLayer(events), + ) + // The wrapped stage failed: the engine error still propagates (Left), proving + // `withStage`'s fail branch is observation-only and never swallows the error. + expect(result._tag).toBe('Left') + if (result._tag === 'Left') expect(result.left._tag).toBe('NmdGatewayError') + expect(ids(events)).toEqual([ + 'observe:active', + 'observe:succeed', + 'write-body:active', + 'write-body:fail', + ]) + // The body write never landed. 
+ expect(gateway.state.body).toBe('original line\n') + }) +}) diff --git a/packages/@overeng/notion-md/src/schema-snapshot.ts b/packages/@overeng/notion-md/src/schema-snapshot.ts new file mode 100644 index 000000000..99d32f293 --- /dev/null +++ b/packages/@overeng/notion-md/src/schema-snapshot.ts @@ -0,0 +1,125 @@ +import { propertyWriteClassFromType } from '@overeng/notion-core' +import type { Sha256Digest } from '@overeng/notion-effect-client' + +import { sha256Digest } from './hash.ts' + +/** + * Schema-drift detection for data-source-backed pages (decision 0017, R14). + * + * The engine captures a `schema_snapshot` of the parent data source's + * **writable** property schema at pull (into the sidecar `data_source` + * binding) and recomputes it before a property write at push. On drift it + * refuses with `NmdSchemaDriftError` (exit 6) rather than risk Notion silently + * auto-creating a select option for an unknown value name. + * + * Why a writable-only, name-based projection (decision 0017, live-verified): + * - A computed-only schema change cannot affect a property *write*, so hashing + * only the writable subset keeps the guard from over-refusing on benign edits + * to formulas/rollups/etc. (consistent with the writable-projection guard, + * decision 0006). + * - Hash names, not ids: a rename is id-preserving, so id-hashing would + * silently miss a rename — exactly the drift that corrupts a value write. + * - Options are hashed for `select` / `multi_select` / `status` only, by name + * and sorted, because an unknown option name is silently auto-created on + * write (HTTP 200) — the one drift Notion does not reject on its own. + * - Excluded from the hash: property ids, option ids, colors, descriptions, + * status groups, all timestamps, `created_by` / `last_edited_by`, + * `request_id`, title/url/is_inline/is_locked — none affects a write and all + * are volatile across otherwise-unchanged reads. 
+ */ + +/** Property types whose option set participates in the schema snapshot. */ +const OPTION_PROPERTY_TYPES = new Set(['select', 'multi_select', 'status']) + +/** One writable property projected to its drift-sensitive structure. */ +interface WritableSchemaEntry { + readonly name: string + readonly type: string + /** Sorted option names for select/multi_select/status; `null` otherwise. */ + readonly options: readonly string[] | null +} + +const propertyDefType = (def: unknown): string | undefined => { + if (typeof def !== 'object' || def === null || 'type' in def === false) return undefined + const type = (def as { readonly type?: unknown }).type + return typeof type === 'string' ? type : undefined +} + +const propertyDefId = (def: unknown): string | undefined => { + if (typeof def !== 'object' || def === null || 'id' in def === false) return undefined + const id = (def as { readonly id?: unknown }).id + return typeof id === 'string' ? id : undefined +} + +/** Extract the sorted option *names* for an option-bearing property definition. */ +const optionNames = (opts: { readonly def: unknown; readonly type: string }): readonly string[] => { + const { def, type } = opts + if (typeof def !== 'object' || def === null) return [] + const config = (def as Record<string, unknown>)[type] + if (typeof config !== 'object' || config === null || 'options' in config === false) return [] + const options = (config as { readonly options?: unknown }).options + if (Array.isArray(options) === false) return [] + return options + .map((option) => + typeof option === 'object' && option !== null && 'name' in option === true + ? (option as { readonly name?: unknown }).name + : undefined, + ) + .filter((name): name is string => typeof name === 'string') + .toSorted() +} + +/** + * Project a data source's property schema to its writable, drift-sensitive + * canonical form: `{ name, type, sorted option names }` entries for writable + * properties only, sorted by property name. 
+ */ +export const canonicalWritableSchema = ( + properties: Record<string, unknown>, +): readonly WritableSchemaEntry[] => + Object.entries(properties) + .flatMap(([name, def]): readonly WritableSchemaEntry[] => { + const type = propertyDefType(def) + if (type === undefined || propertyWriteClassFromType(type) !== 'writable') return [] + return [ + { + name, + type, + options: OPTION_PROPERTY_TYPES.has(type) === true ? optionNames({ def, type }) : null, + }, + ] + }) + .toSorted((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0)) + +/** + * Hash a data source's writable property schema into the prefixed + * `Sha256Digest` stored as the sidecar `data_source.schema_hash`. + */ +export const writableSchemaHash = (properties: Record<string, unknown>): Sha256Digest => + sha256Digest(JSON.stringify(canonicalWritableSchema(properties))) + +/** All property name → id pairs, preserved for write targeting and diagnostics. */ +export const propertyIdMap = (properties: Record<string, unknown>): Record<string, string> => + Object.fromEntries( + Object.entries(properties).flatMap(([name, def]) => { + const id = propertyDefId(def) + return id === undefined ? [] : [[name, id]] + }), + ) + +/** Name of the title property (`type === 'title'`), used for the sidecar binding. */ +export const titlePropertyName = (properties: Record<string, unknown>): string | undefined => { + for (const [name, def] of Object.entries(properties)) { + if (propertyDefType(def) === 'title') return name + } + return undefined +} + +/** Property names that cannot be written back (computed / unsupported). */ +export const readOnlyPropertyNames = (properties: Record<string, unknown>): readonly string[] => + Object.entries(properties) + .flatMap(([name, def]) => { + const type = propertyDefType(def) + return type === undefined || propertyWriteClassFromType(type) === 'writable' ? 
[] : [name] + }) + .toSorted() diff --git a/packages/@overeng/notion-md/src/schema-snapshot.unit.test.ts b/packages/@overeng/notion-md/src/schema-snapshot.unit.test.ts new file mode 100644 index 000000000..2902401a6 --- /dev/null +++ b/packages/@overeng/notion-md/src/schema-snapshot.unit.test.ts @@ -0,0 +1,151 @@ +import { describe, expect, it } from '@effect/vitest' + +import { + canonicalWritableSchema, + propertyIdMap, + readOnlyPropertyNames, + titlePropertyName, + writableSchemaHash, +} from './schema-snapshot.ts' + +/** + * Reference property schema modeled on a live `GET /v1/data_sources/{id}` + * response: each property carries `{ id, name, type, [type]: config }`, options + * carry `{ id, name, color }`. Mixes writable, computed, and unsupported types + * so the writable-subset projection is exercised end to end. + */ +const baseSchema = (): Record<string, unknown> => ({ + Name: { id: 'title', name: 'Name', type: 'title', title: {} }, + Notes: { id: '%5CI%7DT', name: 'Notes', type: 'rich_text', rich_text: {} }, + Count: { id: 'm%3FIX', name: 'Count', type: 'number', number: { format: 'number' } }, + Done: { id: 'vV%3AO', name: 'Done', type: 'checkbox', checkbox: {} }, + Priority: { + id: 'm%7Bm%3C', + name: 'Priority', + type: 'select', + select: { + options: [ + { id: 'opt-low', name: 'Low', color: 'gray' }, + { id: 'opt-high', name: 'High', color: 'red' }, + ], + }, + }, + Tags: { + id: 'AY%3BG', + name: 'Tags', + type: 'multi_select', + multi_select: { + options: [ + { id: 'tag-a', name: 'a', color: 'blue' }, + { id: 'tag-b', name: 'b', color: 'green' }, + ], + }, + }, + // computed — excluded from the writable projection + Ref: { id: 'MQMg', name: 'Ref', type: 'unique_id', unique_id: { prefix: 'FP' } }, + Score: { id: 'NCs%7B', name: 'Score', type: 'formula', formula: { expression: '1 + 1' } }, +}) + +describe('canonicalWritableSchema', () => { + it('projects only writable properties, sorted by name, options by name', () => { + 
expect(canonicalWritableSchema(baseSchema())).toEqual([ + { name: 'Count', type: 'number', options: null }, + { name: 'Done', type: 'checkbox', options: null }, + { name: 'Name', type: 'title', options: null }, + { name: 'Notes', type: 'rich_text', options: null }, + { name: 'Priority', type: 'select', options: ['High', 'Low'] }, + { name: 'Tags', type: 'multi_select', options: ['a', 'b'] }, + ]) + }) + + it('excludes computed and unsupported properties', () => { + const names = canonicalWritableSchema(baseSchema()).map((entry) => entry.name) + expect(names).not.toContain('Ref') + expect(names).not.toContain('Score') + }) +}) + +describe('writableSchemaHash — drift sensitivity', () => { + const base = baseSchema() + const baseHash = writableSchemaHash(base) + + it('is a prefixed sha256 digest', () => { + expect(baseHash).toMatch(/^sha256:[0-9a-f]{64}$/u) + }) + + it('is stable across a benign option color-only change', () => { + const recolored = baseSchema() + const priority = recolored.Priority as { select: { options: { color: string }[] } } + priority.select.options[0]!.color = 'purple' + priority.select.options[1]!.color = 'yellow' + expect(writableSchemaHash(recolored)).toBe(baseHash) + }) + + it('is stable across an option id-only change (ids are not hashed)', () => { + const reid = baseSchema() + ;(reid.Tags as { multi_select: { options: { id: string }[] } }).multi_select.options[0]!.id = + 'tag-a-renamed-internally' + expect(writableSchemaHash(reid)).toBe(baseHash) + }) + + it('is stable across a computed-property change (writable subset only)', () => { + const changed = baseSchema() + ;(changed.Score as { formula: { expression: string } }).formula.expression = '2 * 2' + expect(writableSchemaHash(changed)).toBe(baseHash) + }) + + it('trips on adding a writable property', () => { + const added = baseSchema() + added.When = { id: 'OmBl', name: 'When', type: 'date', date: {} } + expect(writableSchemaHash(added)).not.toBe(baseHash) + }) + + it('trips on 
removing a writable property', () => { + const removed = baseSchema() + delete removed.Count + expect(writableSchemaHash(removed)).not.toBe(baseHash) + }) + + it('trips on renaming a property (rename is id-preserving)', () => { + const renamed = baseSchema() + const count = renamed.Count as { name: string } + delete renamed.Count + count.name = 'Total' + renamed.Total = count + expect(writableSchemaHash(renamed)).not.toBe(baseHash) + }) + + it('trips on retyping a property', () => { + const retyped = baseSchema() + retyped.Count = { id: 'm%3FIX', name: 'Count', type: 'rich_text', rich_text: {} } + expect(writableSchemaHash(retyped)).not.toBe(baseHash) + }) + + it('trips on adding a select option', () => { + const optAdded = baseSchema() + ;(optAdded.Priority as { select: { options: unknown[] } }).select.options.push({ + id: 'opt-mid', + name: 'Medium', + color: 'yellow', + }) + expect(writableSchemaHash(optAdded)).not.toBe(baseHash) + }) +}) + +describe('binding projections', () => { + it('maps every property name to its id', () => { + expect(propertyIdMap(baseSchema())).toMatchObject({ + Name: 'title', + Count: 'm%3FIX', + Score: 'NCs%7B', + }) + }) + + it('finds the title property name', () => { + expect(titlePropertyName(baseSchema())).toBe('Name') + }) + + it('lists computed/unsupported property names as read-only', () => { + expect(readOnlyPropertyNames(baseSchema())).toEqual(['Ref', 'Score']) + }) +}) diff --git a/packages/@overeng/notion-md/src/sync.e2e.test.ts b/packages/@overeng/notion-md/src/sync.e2e.test.ts index 70addec7d..e35e43ec9 100644 --- a/packages/@overeng/notion-md/src/sync.e2e.test.ts +++ b/packages/@overeng/notion-md/src/sync.e2e.test.ts @@ -7,7 +7,7 @@ import { NodeContext } from '@effect/platform-node' import { Deferred, Effect, Fiber, Layer } from 'effect' import { describe, expect, it } from 'vitest' -import type { BodyCompleteness } from '@overeng/notion-core' +import { classifyBodyCompleteness, type BodyCompleteness } from 
'@overeng/notion-core' import type { NmdPageState, NmdStorage, NmdSyncStateV1 } from '@overeng/notion-effect-client' import { resolveNmdTargets, runBatchWatch, syncMany } from './batch.ts' @@ -17,10 +17,16 @@ import { NmdFrontmatterError, NmdGatewayError, NmdObjectStoreError, + NmdSchemaDriftError, } from './errors.ts' import { parseNmdFile, renderNmdFile } from './frontmatter.ts' import { normalizeMarkdownLineEndings, sha256Digest } from './hash.ts' -import { NotionMdGateway, type MarkdownUpdateCommand, type PullPageResult } from './model.ts' +import { + NotionMdGateway, + type MarkdownUpdateCommand, + type PullPageResult, + type RemoteParent, +} from './model.ts' import { NmdStateStoreLive, objectPath, @@ -65,6 +71,13 @@ interface FakePage { readonly unknownBlockIds?: readonly string[] readonly completeness?: BodyCompleteness readonly lastEditedTime?: string + /** Remote parent; defaults to the shared `page_id` standalone parent. */ + readonly parent?: RemoteParent + /** + * When set, `retrieveDataSource` resolves this property schema for the page's + * `data_source_id` parent — drives schema-drift coverage (Group F, R14). + */ + readonly dataSourceSchema?: Record<string, unknown> } const unsupportedStorage = (payload: unknown = { url: 'https://www.notion.com/' }): NmdStorage => ({ @@ -113,6 +126,8 @@ const unsupportedStorage = (payload: unknown = { url: 'https://www.notion.com/' class FakeNotion { private readonly pages = new Map<string, Required<FakePage>>() + /** Live property schema per data_source_id; the drift check recomputes the schema hash against it on retrieve.
*/ + private readonly dataSourceSchemas = new Map<string, Record<string, unknown>>() private tick = 0 private afterPagePropertiesUpdate: (() => void) | undefined private afterNextPullPage: (() => void) | undefined @@ -141,9 +156,22 @@ class FakeNotion { inTrash: false, isLocked: false, lastEditedTime: '2026-05-22T12:00:00.000Z', + parent: { type: 'page_id', page_id: pageId }, + dataSourceSchema: {}, ...page, }) } + /* seed the live schema registry for data-source-backed pages */ + for (const page of pages) { + if (page.parent?.type === 'data_source_id' && page.dataSourceSchema !== undefined) { + this.dataSourceSchemas.set(page.parent.data_source_id, page.dataSourceSchema) + } + } + } + + /** Mutate a data source's live property schema to simulate remote drift. */ + setDataSourceSchema(dataSourceId: string, schema: Record<string, unknown>): void { + this.dataSourceSchemas.set(dataSourceId, schema) } readonly layer = Layer.succeed(NotionMdGateway, { @@ -237,6 +265,21 @@ class FakeNotion { afterUpdate?.() return this.toPullResult(next).page }), + retrieveDataSource: ({ dataSourceId }) => + Effect.sync(() => { + const schema = this.dataSourceSchemas.get(dataSourceId) + if (schema === undefined) { + throw new NmdGatewayError({ + operation: 'retrieve_data_source', + message: `Unknown fake data source: ${dataSourceId}`, + }) + } + return { + dataSourceId, + databaseId: undefined, + properties: schema, + } + }), updatePageMetadata: ({ pageId: id, metadata }) => Effect.sync(() => { const page = this.requirePage(id) @@ -278,6 +321,8 @@ class FakeNotion { properties: {}, unknownBlockIds: [], completeness: { _tag: 'complete' }, + parent: { type: 'page_id', page_id: parentPageId }, + dataSourceSchema: {}, }) this.pages.set(parentPageId, { ...parent, @@ -397,7 +442,7 @@ class FakeNotion { title: page.title, title_property_key: 'title', url: `https://www.notion.so/${page.pageId.replaceAll('-', '')}`, - parent: { type: 'page_id', page_id: pageId }, + parent: page.parent, icon: 
page.icon, cover: page.cover, in_trash: page.inTrash, @@ -730,6 +775,103 @@ describe('notion-md e2e prototype', () => { }) }) + // R38 / #785: a renderable-but-not-round-trip-safe block (table_of_contents, + // synced_block, bookmark, child_database, …) renders to Markdown that Notion + // re-parses as a paragraph on push. The classifier must flag it so the pull + // gate refuses the page rather than letting an unrelated edit silently destroy + // the block. Drive the verdict through the REAL classifier from a block + // inventory to prove the classifier→gate wiring, not just the gate. + it('refuses a page whose body contains a not-round-trip-safe block (R38)', async () => { + await withTempDir(async (dir) => { + const completeness = classifyBodyCompleteness({ + markdown: { markdown: '# Doc\n\n[TOC]\n\nProse', truncated: false, unknownBlockIds: [] }, + inventory: { + entries: [ + { + id: '00000000-0000-4000-8000-00000000a001', + type: 'heading_1', + hasChildren: false, + inTrash: false, + }, + { + id: '00000000-0000-4000-8000-00000000a002', + type: 'table_of_contents', + hasChildren: false, + inTrash: false, + }, + { + id: '00000000-0000-4000-8000-00000000a003', + type: 'paragraph', + hasChildren: false, + inTrash: false, + }, + ], + renderedMarkdown: '# Doc\n\n[TOC]\n\nProse', + }, + }) + expect(completeness).toEqual({ + _tag: 'lossy', + reasons: ['not_round_trip_safe_blocks'], + lossyBlockTypes: ['table_of_contents'], + }) + + const fake = new FakeNotion([ + { pageId, title: 'Doc', markdown: '# Doc\n\n[TOC]\n\nProse', completeness }, + ]) + const path = join(dir, 'toc.nmd') + + await expect(runWithFake(pullPage({ pageId, outPath: path }), fake)).rejects.toThrow( + 'table_of_contents', + ) + // Nothing was written: the page is refused before any local base exists, + // so no later edit can trigger the silent push-time destruction. 
+ await expect(readFile(path, 'utf8')).rejects.toThrow() + }) + }) + + // R30: a child_page block *in a single page's body* is refused (the single-page + // surface has no tree engine to manage it as a <page> anchor). Contrast with + // the tree-node tolerance covered in tree.unit.test.ts. + it('refuses a single page whose body contains a child_page block (R30)', async () => { + await withTempDir(async (dir) => { + const completeness = classifyBodyCompleteness({ + markdown: { markdown: '# Doc\n\nProse', truncated: false, unknownBlockIds: [] }, + inventory: { + entries: [ + { + id: '00000000-0000-4000-8000-00000000b001', + type: 'paragraph', + hasChildren: false, + inTrash: false, + }, + { + id: '00000000-0000-4000-8000-00000000b002', + type: 'child_page', + hasChildren: true, + inTrash: false, + }, + ], + renderedMarkdown: '# Doc\n\nProse', + }, + }) + expect(completeness).toEqual({ + _tag: 'lossy', + reasons: ['not_round_trip_safe_blocks'], + lossyBlockTypes: ['child_page'], + }) + + const fake = new FakeNotion([ + { pageId, title: 'Doc', markdown: '# Doc\n\nProse', completeness }, + ]) + const path = join(dir, 'child.nmd') + + await expect(runWithFake(pullPage({ pageId, outPath: path }), fake)).rejects.toThrow( + 'child_page', + ) + await expect(readFile(path, 'utf8')).rejects.toThrow() + }) + }) + it('batch sync reconciles independent local and remote edits across files', async () => { await withTempDir(async (dir) => { const fake = new FakeNotion([ @@ -1041,6 +1183,143 @@ describe('notion-md e2e prototype', () => { }) }) + it('refuses a property write when the data-source schema drifted since pull (exit 6, R14)', async () => { + await withTempDir(async (dir) => { + const dataSourceId = '00000000-0000-4000-8000-0000000000d5' + const schema = { + Name: { id: 'title', name: 'Name', type: 'title', title: {} }, + Done: { id: 'vV%3AO', name: 'Done', type: 'checkbox', checkbox: {} }, + Priority: { + id: 'm%7Bm%3C', + name: 'Priority', + type: 'select', + select: { 
options: [{ id: 'opt-low', name: 'Low', color: 'gray' }] }, + }, + } + const fake = new FakeNotion([ + { + pageId, + title: 'Row', + markdown: '# Row\n\nBody', + parent: { type: 'data_source_id', data_source_id: dataSourceId }, + dataSourceSchema: schema, + properties: { Done: { type: 'checkbox', checkbox: false } }, + }, + ]) + const path = join(dir, 'row.nmd') + + await runWithFake(pullPage({ pageId, outPath: path }), fake) + const pulledSync = await readSyncStateFile(path) + // the schema_snapshot was captured into the sidecar data_source binding + expect(pulledSync.data_source?.data_source_id).toBe(dataSourceId) + + const parsed = await parseFile(path) + await writeFile( + path, + renderNmdFile({ + frontmatter: { + notion_md: { + ...parsed.frontmatter.notion_md, + properties: { + ...parsed.frontmatter.notion_md.properties, + Done: { _tag: 'checkbox', value: true }, + }, + }, + }, + body: parsed.body, + }), + ) + + // remote schema drifts: a new select option appears after the clean pull + fake.setDataSourceSchema(dataSourceId, { + ...schema, + Priority: { + id: 'm%7Bm%3C', + name: 'Priority', + type: 'select', + select: { + options: [ + { id: 'opt-low', name: 'Low', color: 'gray' }, + { id: 'opt-high', name: 'High', color: 'red' }, + ], + }, + }, + }) + + const result = await runEitherWithFake(pushPage({ path }), fake) + + expect(result).toMatchObject({ + _tag: 'Left', + left: { _tag: 'NmdSchemaDriftError', page_id: pageId, data_source_id: dataSourceId, path }, + }) + if (result._tag !== 'Left') throw new Error('Expected pushPage to fail on schema drift') + expect(result.left).toBeInstanceOf(NmdSchemaDriftError) + // the property write was refused — remote stays at its pre-edit value + expect(fake.remoteProperties(pageId).Done).toEqual({ type: 'checkbox', checkbox: false }) + }) + }) + + it('allows a property write when only a benign (color-only) schema change occurred', async () => { + await withTempDir(async (dir) => { + const dataSourceId = 
'00000000-0000-4000-8000-0000000000d5' + const schema = { + Name: { id: 'title', name: 'Name', type: 'title', title: {} }, + Done: { id: 'vV%3AO', name: 'Done', type: 'checkbox', checkbox: {} }, + Priority: { + id: 'm%7Bm%3C', + name: 'Priority', + type: 'select', + select: { options: [{ id: 'opt-low', name: 'Low', color: 'gray' }] }, + }, + } + const fake = new FakeNotion([ + { + pageId, + title: 'Row', + markdown: '# Row\n\nBody', + parent: { type: 'data_source_id', data_source_id: dataSourceId }, + dataSourceSchema: schema, + properties: { Done: { type: 'checkbox', checkbox: false } }, + }, + ]) + const path = join(dir, 'row.nmd') + + await runWithFake(pullPage({ pageId, outPath: path }), fake) + const parsed = await parseFile(path) + await writeFile( + path, + renderNmdFile({ + frontmatter: { + notion_md: { + ...parsed.frontmatter.notion_md, + properties: { + ...parsed.frontmatter.notion_md.properties, + Done: { _tag: 'checkbox', value: true }, + }, + }, + }, + body: parsed.body, + }), + ) + + // benign drift: only an option color changed — the writable projection is unchanged + fake.setDataSourceSchema(dataSourceId, { + ...schema, + Priority: { + id: 'm%7Bm%3C', + name: 'Priority', + type: 'select', + select: { options: [{ id: 'opt-low', name: 'Low', color: 'purple' }] }, + }, + }) + + const pushed = await runWithFake(pushPage({ path }), fake) + + expect(pushed.pushed).toBe(true) + expect(fake.remoteProperties(pageId).Done).toEqual({ checkbox: true }) + }) + }) + it('refuses to refresh a property-only push over a concurrent local body edit', async () => { await withTempDir(async (dir) => { const fake = new FakeNotion([ diff --git a/packages/@overeng/notion-md/src/sync.ts b/packages/@overeng/notion-md/src/sync.ts index 3771235b4..75adde8a8 100644 --- a/packages/@overeng/notion-md/src/sync.ts +++ b/packages/@overeng/notion-md/src/sync.ts @@ -3,8 +3,10 @@ import { basename } from 'node:path' import type { FileSystem } from '@effect/platform' import { Effect, 
Option } from 'effect' +import { describeBodyLossyRefusal, tolerateTreeChildPages } from '@overeng/notion-core' import { NOTION_API_VERSION, + type NmdDataSourceBinding, type NmdFrontmatterV2, type NmdObjectRef, type NmdParentRef, @@ -19,10 +21,11 @@ import { NmdConflictError, NmdFrontmatterError, NmdRemoteBodyLossyError, + NmdSchemaDriftError, type NmdError, } from './errors.ts' import { parseNmdFile, renderNmdFile } from './frontmatter.ts' -import { normalizeMarkdownLineEndings, sha256Digest } from './hash.ts' +import { normalizeMarkdownLineEndings, sha256Digest, stripChildAnchors } from './hash.ts' import { planMarkdownUpdate, tryMergeMarkdownBodies } from './merge.ts' import { NotionMdGateway, @@ -34,6 +37,13 @@ import { type WritablePageIcon, } from './model.ts' import * as Observability from './observability.ts' +import { reportNote, reportStageSkip, withStage } from './progress.ts' +import { + propertyIdMap, + readOnlyPropertyNames, + titlePropertyName, + writableSchemaHash, +} from './schema-snapshot.ts' import { NmdStateStore, readBaseSnapshot, @@ -392,9 +402,18 @@ const assertRemoteMarkdownComplete = (opts: { readonly path?: string readonly pageId: string readonly markdown: RemoteMarkdownSnapshot + /** + * Set on tree-node call sites: tolerate the node's own `child_page` blocks + * (managed by the file tree engine as `<page>` anchors, R12/R30) while still + * refusing any other lossy block on the same page. Single-page surfaces leave + * this false so a child-page block in a single page's body is refused. + */ + readonly allowChildPageBlocks?: boolean }): Effect.Effect<void, NmdRemoteBodyLossyError> => { - const completeness = opts.markdown.completeness - if (completeness === undefined || completeness._tag === 'complete') return Effect.void + const raw = opts.markdown.completeness + if (raw === undefined || raw._tag === 'complete') return Effect.void + const completeness = opts.allowChildPageBlocks === true ? 
tolerateTreeChildPages(raw) : raw + if (completeness._tag === 'complete') return Effect.void return Effect.fail( new NmdRemoteBodyLossyError({ @@ -402,7 +421,11 @@ const assertRemoteMarkdownComplete = (opts: { page_id: opts.pageId, ...(opts.path === undefined ? {} : { path: opts.path }), reasons: [...completeness.reasons], - message: `Remote Markdown body for page ${opts.pageId} is lossy (${completeness.reasons.join(', ')}); refusing to treat it as a clean notion-md base`, + message: describeBodyLossyRefusal({ + pageId: opts.pageId, + completeness, + context: 'refusing to treat it as a clean notion-md base', + }), }), ) } @@ -492,7 +515,73 @@ ${fence} }) } -const buildFrontmatterV2 = (opts: { readonly page: RemotePageSnapshot }): NmdFrontmatterV2 => ({ +/** + * Capture a `schema_snapshot` of the parent data source for a + * data-source-backed page (decision 0017, R14). Retrieves the live property + * schema and projects it to the sidecar `data_source` binding — the base the + * push compares against to refuse a property write across schema drift. + * + * Standalone (non-data-source) pages return `null`: they have no data-source + * schema, so the drift check does not apply. + */ +const captureDataSourceBinding = (opts: { + readonly page: RemotePageSnapshot +}): Effect.Effect<NmdDataSourceBinding | null, NmdError, NotionMdGateway> => + Effect.gen(function* () { + if (opts.page.parent.type !== 'data_source_id') return null + const gateway = yield* NotionMdGateway + const dataSourceId = opts.page.parent.data_source_id + const schema = yield* gateway.retrieveDataSource({ dataSourceId }) + return { + database_id: schema.databaseId ?? 
dataSourceId, + data_source_id: dataSourceId, + schema_hash: writableSchemaHash(schema.properties), + title_property: titlePropertyName(schema.properties) ?? opts.page.title_property_key, + property_ids: propertyIdMap(schema.properties), + read_only_properties: readOnlyPropertyNames(schema.properties), + } + }) + +/** + * Refuse a property write when the parent data source's writable schema drifted + * since the clean pull (decision 0017, R14). Re-retrieves the live schema and + * compares the recomputed hash against the sidecar `schema_snapshot`; on drift + * fails with `NmdSchemaDriftError` (exit 6) — distinct from the exit-7 + * value/body conflict and not `--force`-able. Resolve by re-pulling. + * + * Standalone pages (`syncState.data_source === null`) skip the check. + */ +const assertSchemaUnchanged = (opts: { + readonly path: string + readonly pageId: string + readonly dataSource: NmdDataSourceBinding | null +}): Effect.Effect<void, NmdError, NotionMdGateway> => + Effect.gen(function* () { + const binding = opts.dataSource + if (binding === null) return + const gateway = yield* NotionMdGateway + const schema = yield* gateway.retrieveDataSource({ dataSourceId: binding.data_source_id }) + const liveHash = writableSchemaHash(schema.properties) + if (liveHash === binding.schema_hash) return + yield* Observability.annotateAttrs(Observability.pushDecisionAttrs, { + decision: 'schema_drift', + }) + return yield* new NmdSchemaDriftError({ + page_id: opts.pageId, + data_source_id: binding.data_source_id, + path: opts.path, + message: `Data-source schema changed since the last clean pull (data source ${binding.data_source_id}); refusing the property write so an unknown value is not silently auto-created. 
Re-pull the page to adopt the new schema, then re-apply your edit.`, + }) + }) + +/** + * Build the strict V2 `.nmd` frontmatter envelope for a remote page. Exported + * for the stateless `cat --frontmatter` envelope dump, which renders the full + * envelope without writing a `.notion-md/` store (decision 0017). + */ +export const buildFrontmatterV2 = (opts: { + readonly page: RemotePageSnapshot +}): NmdFrontmatterV2 => ({ notion_md: { version: 2, api_version: NOTION_API_VERSION, @@ -529,6 +618,12 @@ const buildSyncState = (opts: { * oracle. For single-page the fake/real round-trip makes these equal. */ readonly baselineBody: string + /** + * Schema snapshot of the parent data source for a data-source-backed page + * (decision 0017, R14), or `null` for a standalone page. Captured at pull and + * compared before a property write to refuse on schema drift (exit 6). + */ + readonly dataSource: NmdDataSourceBinding | null }): NmdSyncStateV1 => { const baseline = normalizeMarkdownLineEndings(opts.baselineBody) return { @@ -545,7 +640,7 @@ const buildSyncState = (opts: { }, storage: opts.storage, read_only_properties: readOnlyPropertyEchoes(opts.page.properties), - data_source: null, + data_source: opts.dataSource, } } @@ -562,7 +657,7 @@ const writeNmdWithStoragePolicy = (opts: { * compares composed-vs-composed; for single-page it equals `fileBody`. 
*/ readonly baselineBody: string -}): Effect.Effect<PullResult, NmdError, NmdStateStore> => +}): Effect.Effect<PullResult, NmdError, NotionMdGateway | NmdStateStore> => Effect.gen(function* () { yield* assertRemoteMarkdownComplete({ operation: 'write_clean_base', @@ -570,6 +665,7 @@ const writeNmdWithStoragePolicy = (opts: { pageId: opts.page.id, markdown: opts.markdown, }) + const dataSource = yield* captureDataSourceBinding({ page: opts.page }) const base = yield* writeBaseSnapshot({ path: opts.path, pageId: opts.page.id, @@ -582,6 +678,7 @@ const writeNmdWithStoragePolicy = (opts: { storage: opts.storage, base, baselineBody: opts.baselineBody, + dataSource, }) const decision = decideStorage(syncState) let storageObjectPath: string | undefined @@ -680,6 +777,7 @@ const establishSidecarFromRemote = (opts: { markdown: pulled.markdown, }) const baselineBody = normalizeMarkdownLineEndings(pulled.markdown.markdown) + const dataSource = yield* captureDataSourceBinding({ page: pulled.page }) const base = yield* writeBaseSnapshot({ path: opts.path, pageId: opts.pageId, @@ -693,6 +791,7 @@ const establishSidecarFromRemote = (opts: { storage: pulled.storage ?? emptyStorage(), base, baselineBody, + dataSource, }), }) }).pipe(Observability.withOperation(Observability.EstablishSidecarSpan, { pageId: opts.pageId })) @@ -774,20 +873,6 @@ const assertLocalBodyUnchanged = (opts: { }) }) -/** - * Strip block-level `<page url=...>...</page>` child anchors from a body. The - * anchor set is DERIVED and re-emitted on every parent push, and Notion itself - * auto-appends an anchor whenever a child is created — so it must not count as - * a "remote change" that would block the parent's push or trigger a 3-way - * merge (which would duplicate the anchor). Used only for change DETECTION; the - * push body always carries the full derived anchor set. 
- */ -const stripChildAnchors = (body: string): string => - body - .split('\n') - .filter((line) => /^\s*<page\b[^>]*>.*<\/page>\s*$/u.test(line) === false) - .join('\n') - const remoteBodyUnchangedForPush = (opts: { readonly remoteBody: string readonly baseBody: string @@ -987,6 +1072,7 @@ export const treeNodePersist = (opts: { path: opts.path, pageId: status.pageId, markdown: pulled.markdown, + allowChildPageBlocks: true, }) yield* assertLocalBodyUnchanged({ path: opts.path, @@ -1001,6 +1087,7 @@ export const treeNodePersist = (opts: { content: renderNmdFile({ frontmatter: opts.frontmatter, body: opts.bareBody }), }) /* sidecar + base snapshot live at the tree root, keyed by page id */ + const dataSource = yield* captureDataSourceBinding({ page: pulled.page }) const base = yield* writeBaseSnapshot({ path: opts.statePath, pageId: status.pageId, @@ -1014,6 +1101,7 @@ export const treeNodePersist = (opts: { storage: pulled.storage ?? emptyStorage(), base, baselineBody: pushedBody, + dataSource, }), }) }), @@ -1100,6 +1188,7 @@ export const pushGuarded = (opts: { path, pageId: status.pageId, markdown: remoteForStatus.markdown, + allowChildPageBlocks: options.replaceContent === true, }) if ( @@ -1153,6 +1242,11 @@ export const pushGuarded = (opts: { yield* gateway.updatePageMetadata({ pageId: status.pageId, metadata: metadataUpdate }) } if (status.localPropertiesChanged === true) { + yield* assertSchemaUnchanged({ + path, + pageId: status.pageId, + dataSource: local.syncState.data_source, + }) yield* gateway.updatePageProperties({ pageId: status.pageId, properties: yield* encodeWritableProperties({ @@ -1166,11 +1260,14 @@ export const pushGuarded = (opts: { * adopt the freshly re-pulled remote body as the new local baseline * (a pull), rather than re-asserting the stale desired body. 
*/ - yield* opts.persist.persist({ - pushedBody: local.desiredBody, - status, - adoptRemoteBody: true, - }) + yield* withStage( + { id: 'settle', label: 'settle', doneMessage: 'verified' }, + opts.persist.persist({ + pushedBody: local.desiredBody, + status, + adoptRemoteBody: true, + }), + ) return { path, pageId: status.pageId, pushed: true, status } } @@ -1189,13 +1286,21 @@ export const pushGuarded = (opts: { yield* Observability.annotateAttrs(Observability.pushMarkdownCommandAttrs, { markdownCommand: command._tag, }) - yield* gateway.updateMarkdown({ - pageId: status.pageId, - command, - allowDeletingContent: - options.allowDeletingUnknownBlocks === true || options.replaceContent === true, - }) + yield* withStage( + { id: 'write-body', label: 'write-body', doneMessage: 'replace_content' }, + gateway.updateMarkdown({ + pageId: status.pageId, + command, + allowDeletingContent: + options.allowDeletingUnknownBlocks === true || options.replaceContent === true, + }), + ) if (status.localPropertiesChanged === true) { + yield* assertSchemaUnchanged({ + path, + pageId: status.pageId, + dataSource: local.syncState.data_source, + }) yield* gateway.updatePageProperties({ pageId: status.pageId, properties: yield* encodeWritableProperties({ @@ -1205,9 +1310,20 @@ export const pushGuarded = (opts: { }) } if (hasPageMetadataUpdate(metadataUpdate) === true) { - yield* gateway.updatePageMetadata({ pageId: status.pageId, metadata: metadataUpdate }) + yield* withStage( + { id: 'write-title', label: 'write-title', doneMessage: 'title updated' }, + gateway.updatePageMetadata({ pageId: status.pageId, metadata: metadataUpdate }), + ) + } else { + yield* reportStageSkip({ id: 'write-title', label: 'write-title' }) } - yield* opts.persist.persist({ pushedBody: mergedBody, status }) + yield* withStage( + { id: 'settle', label: 'settle', doneMessage: 'verified' }, + opts.persist.persist({ pushedBody: mergedBody, status }), + ) + yield* reportNote( + 'remote changed since pull — 
auto-merged your edit with the upstream change', + ) return { path, pageId: status.pageId, pushed: true, status } } @@ -1232,61 +1348,70 @@ export const pushGuarded = (opts: { } if (status.localChanged === true) { - yield* Effect.gen(function* () { - const baseSnapshot = yield* readBaseSnapshot({ - path: statePath, - syncState: local.syncState, - }) - const remote = yield* gateway.pullPage({ pageId: status.pageId }) - yield* assertRemoteMarkdownComplete({ - operation: 'guarded_push_preflight', - path, - pageId: status.pageId, - markdown: remote.markdown, - }) - /* - * TOCTOU: the remote must not have changed since the status pull. - * Compare semantically against the baseline (canonicalization-invariant). - * For `replaceContent` tree pushes, ignore derived child anchors only; - * real user-authored body edits still block the full-body replace. - */ - if ( - options.force !== true && - remoteBodyUnchangedForPush({ - remoteBody: remote.markdown.markdown, - baseBody: baseSnapshot.body, - ignoreChildAnchors: options.replaceContent === true, - }) === false - ) { - return yield* new NmdConflictError({ + yield* withStage( + { id: 'write-body', label: 'write-body', doneMessage: 'replace_content' }, + Effect.gen(function* () { + const baseSnapshot = yield* readBaseSnapshot({ + path: statePath, + syncState: local.syncState, + }) + const remote = yield* gateway.pullPage({ pageId: status.pageId }) + yield* assertRemoteMarkdownComplete({ + operation: 'guarded_push_preflight', path, - page_id: status.pageId, - local_changed: status.localChanged, - remote_changed: true, - message: 'Remote page changed while preparing guarded Markdown push', + pageId: status.pageId, + markdown: remote.markdown, + allowChildPageBlocks: options.replaceContent === true, }) - } - const command = - options.force === true || options.replaceContent === true - ? 
({ _tag: 'replace_content', markdown: local.desiredBody } as const) - : planMarkdownUpdate({ - baseBody: baseSnapshot.body, - remoteBody: remote.markdown.markdown, - desiredBody: local.desiredBody, - }) - yield* Observability.annotateAttrs(Observability.pushDecisionMarkdownCommandAttrs, { - decision: options.force === true ? 'force_replace' : 'guarded_update', - markdownCommand: command._tag, - }) - yield* gateway.updateMarkdown({ - pageId: status.pageId, - command, - allowDeletingContent: - options.allowDeletingUnknownBlocks === true || options.replaceContent === true, - }) - }) + /* + * TOCTOU: the remote must not have changed since the status pull. + * Compare semantically against the baseline (canonicalization-invariant). + * For `replaceContent` tree pushes, ignore derived child anchors only; + * real user-authored body edits still block the full-body replace. + */ + if ( + options.force !== true && + remoteBodyUnchangedForPush({ + remoteBody: remote.markdown.markdown, + baseBody: baseSnapshot.body, + ignoreChildAnchors: options.replaceContent === true, + }) === false + ) { + return yield* new NmdConflictError({ + path, + page_id: status.pageId, + local_changed: status.localChanged, + remote_changed: true, + message: 'Remote page changed while preparing guarded Markdown push', + }) + } + const command = + options.force === true || options.replaceContent === true + ? ({ _tag: 'replace_content', markdown: local.desiredBody } as const) + : planMarkdownUpdate({ + baseBody: baseSnapshot.body, + remoteBody: remote.markdown.markdown, + desiredBody: local.desiredBody, + }) + yield* Observability.annotateAttrs(Observability.pushDecisionMarkdownCommandAttrs, { + decision: options.force === true ? 
'force_replace' : 'guarded_update', + markdownCommand: command._tag, + }) + yield* gateway.updateMarkdown({ + pageId: status.pageId, + command, + allowDeletingContent: + options.allowDeletingUnknownBlocks === true || options.replaceContent === true, + }) + }), + ) } if (status.localPropertiesChanged === true) { + yield* assertSchemaUnchanged({ + path, + pageId: status.pageId, + dataSource: local.syncState.data_source, + }) yield* gateway.updatePageProperties({ pageId: status.pageId, properties: yield* encodeWritableProperties({ @@ -1296,18 +1421,26 @@ export const pushGuarded = (opts: { }) } if (hasPageMetadataUpdate(metadataUpdate) === true) { - yield* gateway.updatePageMetadata({ pageId: status.pageId, metadata: metadataUpdate }) + yield* withStage( + { id: 'write-title', label: 'write-title', doneMessage: 'title updated' }, + gateway.updatePageMetadata({ pageId: status.pageId, metadata: metadataUpdate }), + ) + } else { + yield* reportStageSkip({ id: 'write-title', label: 'write-title' }) } /* * If only properties/metadata changed (the body was not pushed), adopt the * re-pulled remote body — it may have raced ahead during the property * update. When the body WAS pushed, round-trip the pushed body. 
*/ - yield* opts.persist.persist({ - pushedBody: local.desiredBody, - status, - adoptRemoteBody: status.localChanged === false, - }) + yield* withStage( + { id: 'settle', label: 'settle', doneMessage: 'verified' }, + opts.persist.persist({ + pushedBody: local.desiredBody, + status, + adoptRemoteBody: status.localChanged === false, + }), + ) return { path, pageId: status.pageId, pushed: true, status } }) @@ -1320,7 +1453,10 @@ export const pushPageWithPolicy = ( yield* assertSinglePageTarget(opts.path) const local = yield* readNmd(opts.path) const gateway = yield* NotionMdGateway - const remoteForStatus = yield* gateway.pullPage({ pageId: local.pageId }) + const remoteForStatus = yield* withStage( + { id: 'observe', label: 'observe', doneMessage: 'remote pulled' }, + gateway.pullPage({ pageId: local.pageId }), + ) return yield* pushGuarded({ local, remoteForStatus, @@ -1350,9 +1486,18 @@ export const pushPage = ( ): Effect.Effect<PushResult, NmdError, FileSystem.FileSystem | NotionMdGateway | NmdStateStore> => pushPageWithPolicy(opts) -/** Run one two-way reconciliation pass for a `.nmd` file. */ -export const syncPage = ( - opts: SyncOptions, +/** + * One two-way reconciliation pass for a `.nmd` file. + * + * `replaceContent` forces a full-body `replace_content` instead of the narrowest + * `update_content` search-replace. The `edit` editor surface sets it (decision + * 0017): every page `edit` accepts is fully representable (lossy pages are + * refused at the pull), so a full replace is safe and closes the targeted-update + * silent-partial-apply window for the single ephemeral session. The default + * file-`sync` path leaves it unset and keeps its targeted-update optimization. 
+ */ +const runSyncPass = ( + opts: SyncOptions & { readonly replaceContent?: boolean }, ): Effect.Effect<SyncResult, NmdError, FileSystem.FileSystem | NotionMdGateway | NmdStateStore> => Effect.gen(function* () { const status = yield* statusPage({ path: opts.path }) @@ -1362,7 +1507,10 @@ export const syncPage = ( status.localPageMetadataChanged === true || status.localPropertiesChanged === true ) { - const push = yield* pushPage(opts) + const push = + opts.replaceContent === true + ? yield* pushPageWithPolicy({ ...opts, replaceContent: true }) + : yield* pushPage(opts) return { _tag: 'pushed', path: opts.path, @@ -1389,7 +1537,32 @@ export const syncPage = ( pageId: status.pageId, status, } as const - }).pipe( + }) + +/** Run one two-way reconciliation pass for a `.nmd` file. */ +export const syncPage = ( + opts: SyncOptions, +): Effect.Effect<SyncResult, NmdError, FileSystem.FileSystem | NotionMdGateway | NmdStateStore> => + runSyncPass(opts).pipe( + Effect.tap((result) => + Observability.annotateAttrs(Observability.syncResultAttrs, { + pageId: result.pageId, + result: result._tag, + }), + ), + Observability.withOperation(Observability.SyncPageSpan, { basename: basename(opts.path) }), + ) + +/** + * `edit`-surface sync pass: forces a full-body `replace_content` (decision + * 0017). A thin wrapper over the same engine `syncPage` uses — not a second push + * path. Used by the ephemeral `edit` session after the spliced buffer is written + * back to the temp `.nmd`. 
+ */ +export const syncPageReplacingBody = ( + opts: SyncOptions, +): Effect.Effect<SyncResult, NmdError, FileSystem.FileSystem | NotionMdGateway | NmdStateStore> => + runSyncPass({ ...opts, replaceContent: true }).pipe( Effect.tap((result) => Observability.annotateAttrs(Observability.syncResultAttrs, { pageId: result.pageId, diff --git a/packages/@overeng/notion-md/src/tree.ts b/packages/@overeng/notion-md/src/tree.ts index 19fd0fcc7..dca04f517 100644 --- a/packages/@overeng/notion-md/src/tree.ts +++ b/packages/@overeng/notion-md/src/tree.ts @@ -3,6 +3,7 @@ import { basename, dirname, extname, join, relative, resolve } from 'node:path' import { FileSystem } from '@effect/platform' import { Effect, Schema } from 'effect' +import { describeBodyLossyRefusal, tolerateTreeChildPages } from '@overeng/notion-core' import { NOTION_API_VERSION, type NmdFrontmatterV2, @@ -20,7 +21,7 @@ import { type NmdError, } from './errors.ts' import { parseNmdFile, renderNmdFile } from './frontmatter.ts' -import { normalizeMarkdownLineEndings, sha256Digest } from './hash.ts' +import { normalizeMarkdownLineEndings, sha256Digest, stripChildAnchors } from './hash.ts' import { NotionMdGateway, type RemoteMarkdownSnapshot, type RemotePageSnapshot } from './model.ts' import { withOperation } from './observability.ts' import { @@ -101,8 +102,16 @@ const assertRemoteMarkdownComplete = (opts: { readonly pageId: string readonly markdown: RemoteMarkdownSnapshot }): Effect.Effect<void, NmdRemoteBodyLossyError> => { - const completeness = opts.markdown.completeness - if (completeness === undefined || completeness._tag === 'complete') return Effect.void + const raw = opts.markdown.completeness + if (raw === undefined || raw._tag === 'complete') return Effect.void + /* + * Tree nodes own their sub-page `child_page` blocks (re-emitted as `<page>` + * anchors and managed by the tree engine, R12/R30): tolerate those while + * still refusing any other lossy block (toc/synced/…) on the same page so + * 
#785 stays fixed on the tree path too. + */ + const completeness = tolerateTreeChildPages(raw) + if (completeness._tag === 'complete') return Effect.void return Effect.fail( new NmdRemoteBodyLossyError({ @@ -110,7 +119,11 @@ const assertRemoteMarkdownComplete = (opts: { page_id: opts.pageId, ...(opts.relPath === undefined ? {} : { path: opts.relPath }), reasons: [...completeness.reasons], - message: `Remote Markdown body for page ${opts.pageId} is lossy (${completeness.reasons.join(', ')}); refusing to treat it as a clean notion-md tree base`, + message: describeBodyLossyRefusal({ + pageId: opts.pageId, + completeness, + context: 'refusing to treat it as a clean notion-md tree base', + }), }), ) } @@ -513,17 +526,6 @@ const composeTreePushBody = (opts: { ), ) -/** Strip block-level child anchors; in a tree they are derived from hierarchy. */ -const stripChildAnchors = (body: string): string => - normalizeMarkdownLineEndings( - body - .split('\n') - .filter((line) => /^\s*<page\b[^>]*>.*<\/page>\s*$/u.test(line) === false) - .join('\n') - .replace(/\n{3,}/gu, '\n\n') - .replace(/\n+$/u, '\n'), - ) - /** Re-render a file with the real `page_id`/`url` bound in (keeps body + title). 
*/ const bindFrontmatter = (opts: { readonly frontmatter: NmdFrontmatterV2 diff --git a/packages/@overeng/notion-md/src/tree.unit.test.ts b/packages/@overeng/notion-md/src/tree.unit.test.ts index eaabe0531..950cf8f0b 100644 --- a/packages/@overeng/notion-md/src/tree.unit.test.ts +++ b/packages/@overeng/notion-md/src/tree.unit.test.ts @@ -6,7 +6,7 @@ import { NodeContext } from '@effect/platform-node' import { Effect, Layer } from 'effect' import { describe, expect, it } from 'vitest' -import type { BodyCompleteness } from '@overeng/notion-core' +import { classifyBodyCompleteness, type BodyCompleteness } from '@overeng/notion-core' import { NOTION_API_VERSION, type NmdPageState } from '@overeng/notion-effect-client' import { @@ -67,6 +67,11 @@ class FakeTreeNotion { this.lossyAfterNextUpdate.set(id, completeness) } + /** Set a page's current remote completeness verdict (test-only). */ + setRemoteCompleteness(id: string, completeness: BodyCompleteness): void { + this.require(id).completeness = completeness + } + childTitles(id: string): readonly string[] { return [...this.pages.entries()] .filter(([, page]) => page.parentId === id && page.inTrash === false) @@ -165,6 +170,7 @@ class FakeTreeNotion { } }), updatePageProperties: ({ pageId }) => Effect.sync(() => this.snapshot(pageId)), + retrieveDataSource: () => Effect.dieMessage('unexpected retrieveDataSource'), updatePageMetadata: ({ pageId }) => Effect.sync(() => this.snapshot(pageId)), listChildPages: ({ pageId }) => Effect.sync(() => @@ -857,6 +863,68 @@ describe('notion-md tree reconcile lifecycle', () => { }) }) + // R38/#785 + R12/R30: a tree PARENT legitimately contains child_page blocks + // for its sub-pages. The classifier flags child_page as not-round-trip-safe, + // but the tree gate must TOLERATE the node's own child pages (managed as + // <page> anchors) — otherwise every multi-page tree would be refused. 
+ const childPageVerdict = (extraTypes: readonly string[] = []) => + classifyBodyCompleteness({ + markdown: { markdown: 'Root body.', truncated: false, unknownBlockIds: [] }, + inventory: { + entries: [ + { id: 'b-1', type: 'paragraph', hasChildren: false, inTrash: false }, + { id: 'b-2', type: 'child_page', hasChildren: true, inTrash: false }, + ...extraTypes.map((type, i) => ({ + id: `b-x${i}`, + type, + hasChildren: false, + inTrash: false, + })), + ], + renderedMarkdown: 'Root body.', + }, + }) + + it('tolerates a tree node whose only lossy block is its own child_page (R12/R30)', async () => { + await withTempDir(async (dir) => { + const fake = new FakeTreeNotion() + await writeFile(join(dir, 'index.nmd'), unbound({ title: 'Root', body: 'Root body.' })) + await writeFile(join(dir, 'alpha.nmd'), unbound({ title: 'Alpha', body: 'Alpha body.' })) + + // First sync binds ids and establishes baselines. + await run(syncTree({ root: dir, rootPageId }), fake) + // Now the root's remote body classifies lossy *only* because it contains a + // child_page block (its sub-page). The tree gate must not refuse it. + const verdict = childPageVerdict() + expect(verdict).toEqual({ + _tag: 'lossy', + reasons: ['not_round_trip_safe_blocks'], + lossyBlockTypes: ['child_page'], + }) + fake.setRemoteCompleteness(rootPageId, verdict) + + // A subsequent sync must still succeed (no refusal). + const result = await run(syncTree({ root: dir }), fake) + expect(result.ops.length).toBeGreaterThan(0) + }) + }) + + it('still refuses a tree node that ALSO has a real lossy block (#785 stays fixed on the tree path)', async () => { + await withTempDir(async (dir) => { + const fake = new FakeTreeNotion() + await writeFile(join(dir, 'index.nmd'), unbound({ title: 'Root', body: 'Root body.' })) + await writeFile(join(dir, 'alpha.nmd'), unbound({ title: 'Alpha', body: 'Alpha body.' 
})) + + await run(syncTree({ root: dir, rootPageId }), fake) + // Root now has child_page (tolerated) AND a table_of_contents (must refuse). + fake.setRemoteCompleteness(rootPageId, childPageVerdict(['table_of_contents'])) + // Force a local change so the gate is reached on push. + await writeFile(join(dir, 'index.nmd'), unbound({ title: 'Root', body: 'Edited root body.' })) + + await expect(run(syncTree({ root: dir }), fake)).rejects.toThrow('table_of_contents') + }) + }) + it('is crash-idempotent: per-create id writeback prevents duplicate creation', async () => { await withTempDir(async (dir) => { const fake = new FakeTreeNotion() diff --git a/packages/@overeng/tui-stories/nix/build.nix b/packages/@overeng/tui-stories/nix/build.nix index afd32a9e9..79920f3ec 100644 --- a/packages/@overeng/tui-stories/nix/build.nix +++ b/packages/@overeng/tui-stories/nix/build.nix @@ -21,7 +21,7 @@ let # Managed by the repo FOD refresh workflow — do not edit manually. depsBuilds = { "." = { - hash = "sha256-g+iqVtWaBhbgnNkFW2QhDOmZJKoiB7K9YSoW/A1Ok5I="; + hash = "sha256-5W03Rz5JP7P4LqTZ8hkKpxjSGF1YDKskgoQ9Gu8vVlU="; }; }; nativeNodePackages = [ opentuiCoreNative ]; diff --git a/packages/@overeng/workflow-report/nix/build.nix b/packages/@overeng/workflow-report/nix/build.nix index e32fabcbe..2141575ec 100644 --- a/packages/@overeng/workflow-report/nix/build.nix +++ b/packages/@overeng/workflow-report/nix/build.nix @@ -19,7 +19,7 @@ let # Managed by the repo FOD refresh workflow — do not edit manually. depsBuilds = { "." 
= {
-      hash = "sha256-9QYiKCj1ByN7jxOtmZ+AJ32uxdWAJYbSmuhkCd5Cpi4=";
+      hash = "sha256-YCjA27qJi92WElqgNI+Cai+23Tlt5G7w6o4CrVmy4WY=";
     };
   };
   smokeTestArgs = [ "--help" ];
diff --git a/pnpm-lock.yaml b/pnpm-lock.yaml
index b471b6888..0ce948a23 100644
--- a/pnpm-lock.yaml
+++ b/pnpm-lock.yaml
@@ -920,22 +920,28 @@ importers:
   packages/@overeng/notion-effect-client:
     dependencies:
       '@effect/cluster':
-        specifier: ^0.58.2
+        specifier: 0.58.2
         version: 0.58.2(f335cd339cca8128b299febedcdd4641)
       '@effect/experimental':
-        specifier: ^0.60.0
+        specifier: 0.60.0
         version: 0.60.0(@effect/platform@0.96.1(patch_hash=08d6466db56675b7a32a3a3c64815a5b784f583b310b6758471a97d3db6edd32)(effect@3.21.2))(effect@3.21.2)(ioredis@5.6.1)
       '@effect/opentelemetry':
-        specifier: ^0.63.0
+        specifier: 0.63.0
         version: 0.63.0(@effect/platform@0.96.1(patch_hash=08d6466db56675b7a32a3a3c64815a5b784f583b310b6758471a97d3db6edd32)(effect@3.21.2))(@opentelemetry/api@1.9.0)(@opentelemetry/resources@2.7.1(@opentelemetry/api@1.9.0))(@opentelemetry/sdk-logs@0.218.0(@opentelemetry/api@1.9.0))(@opentelemetry/sdk-metrics@2.7.1(@opentelemetry/api@1.9.0))(@opentelemetry/sdk-trace-base@2.7.1(@opentelemetry/api@1.9.0))(@opentelemetry/sdk-trace-node@2.7.1(@opentelemetry/api@1.9.0))(@opentelemetry/sdk-trace-web@2.7.1(@opentelemetry/api@1.9.0))(@opentelemetry/semantic-conventions@1.41.1)(effect@3.21.2)
+      '@effect/platform':
+        specifier: 0.96.1
+        version: 0.96.1(patch_hash=08d6466db56675b7a32a3a3c64815a5b784f583b310b6758471a97d3db6edd32)(effect@3.21.2)
       '@effect/platform-node':
-        specifier: ^0.106.0
+        specifier: 0.106.0
         version: 0.106.0(@effect/cluster@0.58.2(f335cd339cca8128b299febedcdd4641))(@effect/platform@0.96.1(patch_hash=08d6466db56675b7a32a3a3c64815a5b784f583b310b6758471a97d3db6edd32)(effect@3.21.2))(@effect/rpc@0.75.1(@effect/platform@0.96.1(patch_hash=08d6466db56675b7a32a3a3c64815a5b784f583b310b6758471a97d3db6edd32)(effect@3.21.2))(effect@3.21.2))(@effect/sql@0.51.1(@effect/experimental@0.60.0(@effect/platform@0.96.1(patch_hash=08d6466db56675b7a32a3a3c64815a5b784f583b310b6758471a97d3db6edd32)(effect@3.21.2))(effect@3.21.2)(ioredis@5.6.1))(@effect/platform@0.96.1(patch_hash=08d6466db56675b7a32a3a3c64815a5b784f583b310b6758471a97d3db6edd32)(effect@3.21.2))(effect@3.21.2))(effect@3.21.2)
       '@effect/rpc':
-        specifier: ^0.75.1
+        specifier: 0.75.1
         version: 0.75.1(@effect/platform@0.96.1(patch_hash=08d6466db56675b7a32a3a3c64815a5b784f583b310b6758471a97d3db6edd32)(effect@3.21.2))(effect@3.21.2)
+      '@effect/vitest':
+        specifier: 0.29.0
+        version: 0.29.0(effect@3.21.2)(vitest@3.2.4(@types/debug@4.1.13)(@types/node@25.3.3)(happy-dom@18.0.1)(jiti@2.6.1)(lightningcss@1.30.2)(tsx@4.21.0)(yaml@2.8.3))
       '@effect/workflow':
-        specifier: ^0.18.0
+        specifier: 0.18.0
         version: 0.18.0(@effect/experimental@0.60.0(@effect/platform@0.96.1(patch_hash=08d6466db56675b7a32a3a3c64815a5b784f583b310b6758471a97d3db6edd32)(effect@3.21.2))(effect@3.21.2)(ioredis@5.6.1))(@effect/platform@0.96.1(patch_hash=08d6466db56675b7a32a3a3c64815a5b784f583b310b6758471a97d3db6edd32)(effect@3.21.2))(@effect/rpc@0.75.1(@effect/platform@0.96.1(patch_hash=08d6466db56675b7a32a3a3c64815a5b784f583b310b6758471a97d3db6edd32)(effect@3.21.2))(effect@3.21.2))(effect@3.21.2)
       '@overeng/content-address':
         specifier: workspace:^
@@ -949,34 +955,43 @@ importers:
       '@overeng/otel-contract':
         specifier: workspace:^
         version: link:../otel-contract
-      '@playwright/test':
-        specifier: ^1.59.1
-        version: 1.59.1
-    devDependencies:
-      '@effect/platform':
-        specifier: 0.96.1
-        version: 0.96.1(patch_hash=08d6466db56675b7a32a3a3c64815a5b784f583b310b6758471a97d3db6edd32)(effect@3.21.2)
-      '@effect/vitest':
-        specifier: 0.29.0
-        version: 0.29.0(effect@3.21.2)(vitest@3.2.4(@types/debug@4.1.13)(@types/node@25.3.3)(happy-dom@18.0.1)(jiti@2.6.1)(lightningcss@1.30.2)(tsx@4.21.0)(yaml@2.8.3))
       '@overeng/utils':
         specifier: workspace:^
         version: link:../utils
+      '@playwright/test':
+        specifier: 1.59.1
+        version: 1.59.1
+      effect:
+        specifier: 3.21.2
+        version: 3.21.2
+      remark-gfm:
+        specifier: 4.0.1
+        version: 4.0.1
+      remark-parse:
+        specifier: 11.0.0
+        version: 11.0.0
+      remark-stringify:
+        specifier: 11.0.0
+        version: 11.0.0
+      unified:
+        specifier: 11.0.5
+        version: 11.0.5
+      unist-util-visit:
+        specifier: 5.1.0
+        version: 5.1.0
+      vitest:
+        specifier: 3.2.4
+        version: 3.2.4(@types/debug@4.1.13)(@types/node@25.3.3)(happy-dom@18.0.1)(jiti@2.6.1)(lightningcss@1.30.2)(tsx@4.21.0)(yaml@2.8.3)
+    devDependencies:
       '@overeng/utils-dev':
         specifier: workspace:^
         version: link:../utils-dev
       '@types/node':
         specifier: 25.3.3
         version: 25.3.3
-      effect:
-        specifier: 3.21.2
-        version: 3.21.2
       typescript:
         specifier: 5.9.3
         version: 5.9.3
-      vitest:
-        specifier: 3.2.4
-        version: 3.2.4(@types/debug@4.1.13)(@types/node@25.3.3)(happy-dom@18.0.1)(jiti@2.6.1)(lightningcss@1.30.2)(tsx@4.21.0)(yaml@2.8.3)
 
   packages/@overeng/notion-effect-schema:
     dependencies:
@@ -1023,21 +1038,6 @@ importers:
       '@overeng/utils':
         specifier: workspace:^
         version: link:../utils
-      remark-gfm:
-        specifier: 4.0.1
-        version: 4.0.1
-      remark-parse:
-        specifier: 11.0.0
-        version: 11.0.0
-      remark-stringify:
-        specifier: 11.0.0
-        version: 11.0.0
-      unified:
-        specifier: 11.0.5
-        version: 11.0.5
-      unist-util-visit:
-        specifier: 5.1.0
-        version: 5.1.0
     devDependencies:
       '@effect-atom/atom':
         specifier: 0.5.3