docs: restructure as single source of truth with auto-sync#1530
📝 Walkthrough
This PR establishes documentation infrastructure and content for Wren Engine. It introduces a GitHub Actions workflow to automatically synchronize documentation changes from the main branch to a separate documentation website repository, along with supporting configuration and tooling. It also adds comprehensive documentation covering installation, core concepts, usage guides, and CLI references.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ Passed checks (3 passed)
Actionable comments posted: 3
🧹 Nitpick comments (5)
docs/concept/what_is_context.md (1)
3-3: Minor: Consider simplifying "in order to" → "to". Line 3 uses "in order to work", which could be simplified to "to work" for more concise writing. This is a stylistic preference, not an error.
✏️ Proposed simplification
-In Wren Engine, context is the structured business understanding an AI agent needs in order to work with data correctly.
+In Wren Engine, context is the structured business understanding an AI agent needs to work with data correctly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/concept/what_is_context.md` at line 3, Replace the phrase "in order to work with data correctly" with the more concise "to work with data correctly" in the sentence that begins "In Wren Engine, context is the structured business understanding..." so the line reads "...context is the structured business understanding an AI agent needs to work with data correctly." This is a simple stylistic change—locate that sentence in what_is_context.md and update the wording accordingly.
docs/concept/architecture.md (1)
99-117: Add language identifiers to fenced code blocks. The ASCII/text diagrams in code blocks should specify a language identifier to satisfy markdown linters and improve clarity. While they render correctly without it, adding `text` would be more explicit.
📝 Proposed fix to add language identifiers
For line 99:
-```
+```text
 User SQL (e.g. SELECT * FROM orders WHERE status = 'pending')
Apply the same pattern to the other three code blocks at lines 176, 188, and 199.
Also applies to: 176-184, 188-195, 199-204
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/concept/architecture.md` around lines 99 - 117, Add the language identifier `text` to each fenced code block that contains the ASCII/text diagrams so markdown linters are satisfied; specifically update the block starting with "User SQL (e.g. SELECT * FROM orders WHERE status = 'pending')" and the subsequent diagram block that begins with the CTE/WITH example (the block showing WITH "orders" AS (...)), plus the other three diagram blocks referenced in the comment, by changing ``` to ```text for those fenced code blocks (leave the diagram content unchanged).
scripts/sync-docs.sh (1)
82-85: Script may fail if PR already exists for this branch. Same issue as the workflow: `gh pr create` will error if a PR from `sync/engine-docs-${SHORT_SHA}` already exists (e.g., on script re-run). Consider checking for existing PRs first or using `--fill` with an existing PR check.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/sync-docs.sh` around lines 82 - 85, The PR creation step using gh pr create (the PR_URL assignment) will fail if a PR already exists for branch sync/engine-docs-${SHORT_SHA}; update the script to first check for an existing PR for that head (e.g., use gh pr list or gh pr view filtered by --head "sync/engine-docs-${SHORT_SHA}" or by searching title/branch) and if found capture its URL into PR_URL instead of calling gh pr create, otherwise call gh pr create as before; reference the existing symbols PR_URL, gh pr create, SHORT_SHA and TARGET_BRANCH when implementing this conditional flow.
.github/workflows/sync-docs.yml (2)
34-37: Directory list is duplicated in three places. The sync directories are hardcoded here, in `scripts/sync-docs.sh` (line 30), and declared in `docs/.sync.yml`. If the sync scope changes, all three must be updated in lockstep.
Consider extracting to a shared source (e.g., reading from `.sync.yml` via `yq`) or at minimum adding a comment cross-referencing the other locations.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/sync-docs.yml around lines 34 - 37, The hardcoded directory list (get_started, concept, guide, reference) is duplicated across .github/workflows/sync-docs.yml, scripts/sync-docs.sh and docs/.sync.yml; update the workflow to read the canonical list from docs/.sync.yml (e.g., parse with yq) or at minimum add a clear cross-reference comment pointing to scripts/sync-docs.sh and docs/.sync.yml; locate the loop in sync-docs.yml (the for dir in ...; do block) and replace the hardcoded list with a command that loads directories from docs/.sync.yml via yq (or add the comment) and ensure the same symbol names (get_started, concept, guide, reference) are used consistently.
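A minimal sketch of the `yq` option, assuming mikefarah `yq` v4 is available on the runner and that `docs/.sync.yml` lists the synced directories under a top-level `dirs:` key. Both the key name and the `load_sync_dirs` helper are hypothetical; adjust the path expression to the file's actual layout. The helper fails loudly when the list comes back empty, so a renamed key cannot silently turn the sync into a no-op.

```shell
# Hypothetical helper: reads the canonical sync list from docs/.sync.yml.
# Assumes mikefarah yq v4 syntax and a top-level `dirs:` sequence (both assumptions).
load_sync_dirs() {
  local config="${1:?usage: load_sync_dirs <path-to-.sync.yml>}"
  local out
  # Declared separately so a failing yq is not masked by `local`.
  out="$(yq '.dirs[]' "$config")" || return
  if [ -z "$out" ]; then
    echo "load_sync_dirs: no directories found in ${config}" >&2
    return 1
  fi
  printf '%s\n' "$out"
}

# Example use in the workflow step, replacing the hardcoded
# get_started/concept/guide/reference list:
#   dirs="$(load_sync_dirs docs/.sync.yml)" || exit 1
#   while IFS= read -r dir; do
#     echo "syncing ${dir}"
#   done <<< "$dirs"
```

If `scripts/sync-docs.sh` sourced the same helper, both the workflow and the script would read one list instead of keeping their own copies.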
51-61: Workflow may fail on re-run if branch already exists. If the workflow is re-triggered for the same commit (e.g., manual re-run), `git push origin "${BRANCH}"` will fail because the branch already exists on the remote. Similarly, `gh pr create` will fail if a PR from that branch is already open.
Consider adding guards:
🛠️ Proposed fix
 BRANCH="sync/engine-docs-${GITHUB_SHA::8}"
 git config user.name "github-actions[bot]"
 git config user.email "github-actions[bot]@users.noreply.github.com"
 git checkout -b "${BRANCH}"
 git add -A
 git commit -m "docs: sync from wren-engine@${GITHUB_SHA::8}"
-git push origin "${BRANCH}"
-gh pr create \
+git push origin "${BRANCH}" --force-with-lease
+
+# Skip PR creation if one already exists for this branch
+if ! gh pr list --head "${BRANCH}" --json number --jq '.[0].number' | grep -q .; then
+  gh pr create \
     --title "docs: sync Wren Engine docs from wren-engine" \
     --body "Auto-synced from [wren-engine@\`${GITHUB_SHA::8}\`](https://github.com/Canner/wren-engine/commit/${GITHUB_SHA})." \
     --base "${{ vars.DOCS_REPO_BRANCH }}"
+fi
Verify each finding against the current code and only fix it if needed. In @.github/workflows/sync-docs.yml around lines 51 - 61, The workflow will fail on re-run if the branch or PR already exists; update the job around the BRANCH variable and the git/gh steps to guard against duplicates by first checking for the remote branch and existing PR before pushing/creating: verify if remote branch "${BRANCH}" exists and skip or update it (instead of blindly running git push origin "${BRANCH}"), and before running gh pr create, check for an open PR from that branch (use gh pr view / gh pr list or the GitHub API) and only call gh pr create when no PR exists; ensure the logic still creates the branch and commits when needed and handles idempotent re-runs gracefully.
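One way to make the push step idempotent, sketched under the assumption that the `sync/engine-docs-<sha>` branch is owned by the bot and only ever holds synced docs: ask the remote whether the branch exists with `git ls-remote --exit-code`, which exits with status 2 when no ref matches, and force-push only when it does. The `push_sync_branch` name is hypothetical; the PR-existence check from the proposed fix would still follow it.

```shell
# Hypothetical helper for the workflow's push step.
# Usage: push_sync_branch <remote> <branch>
push_sync_branch() {
  local remote="$1" branch="$2" rc=0
  # --exit-code makes ls-remote exit 2 when no matching ref exists on the remote.
  git ls-remote --exit-code --heads "$remote" "refs/heads/${branch}" >/dev/null || rc=$?
  case "$rc" in
    0)
      # Re-run for the same commit: the bot-owned branch already exists, so overwrite it.
      git push --force "$remote" "HEAD:refs/heads/${branch}"
      ;;
    2)
      # First run: a plain push creates the branch.
      git push "$remote" "HEAD:refs/heads/${branch}"
      ;;
    *)
      echo "push_sync_branch: git ls-remote failed with status ${rc}" >&2
      return "$rc"
      ;;
  esac
}

# In the workflow step this would replace the plain push:
#   push_sync_branch origin "${BRANCH}"
```

Force-pushing is a deliberate choice here: the branch name is derived from the commit SHA, so a re-run should carry the same content, and overwriting avoids failing on a branch nobody edits by hand. If reviewers may push fixes to that branch, skipping the push when it already exists is the safer variant.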
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/guide/profiles.md`:
- Around line 87-93: The JSON examples include invalid `//` comments; remove the
inline `// Flat format` and `// Envelope format (auto-unwrapped)` from inside
the JSON fences, split them into two separate ```json``` code blocks (one
containing {"datasource":"postgres",...} and the other containing
{"datasource":"duckdb",...}) and add a short plain-text sentence after the
blocks stating "The first example shows the flat format; the second shows the
envelope format (auto-unwrapped)." to preserve the explanations; update the
section in profiles.md where the two commented JSON examples appear.
In `@docs/reference/cli.md`:
- Around line 65-71: Replace the invalid commented JSON examples under the "Flat
format" and "Envelope format (auto-unwrapped)" blocks by removing the bash-code
comments, marking the fences as ```json, and providing valid JSON objects (e.g.,
include full fields instead of "..." in the flat example such as "database",
"user", "password"); specifically update the block titled "Flat format" to a
proper JSON code fence with a complete object and update the "Envelope format
(auto-unwrapped)" block to a ```json fence containing
{"datasource":"duckdb","properties":{"url":"/data","format":"duckdb"}} so both
snippets are valid JSON and free of inline comments.
In `@scripts/sync-docs.sh`:
- Around line 53-56: Guard against empty variables before calling rm by
validating TARGET and each dir are non-empty and not "/" and then use safe
parameter expansion; for example, in the loop over SYNC_DIRS check [ -n
"${TARGET:-}" ] && [ -n "${dir:-}" ] and optionally [ "${TARGET}" != "/" ] and [
"${dir}" != "" ] before running rm, or replace the risky call with a safe
expansion like rm -rf -- "${TARGET:?}/${dir:?}" (and ensure REPO_ROOT is also
validated similarly) so rm cannot accidentally target the filesystem root;
update the loop that references SYNC_DIRS, TARGET, and REPO_ROOT accordingly.
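A defensive version of the clearing loop, as a minimal sketch: it assumes the loop can receive the target checkout path and the `SYNC_DIRS` entries as arguments, and the `clear_sync_dirs` name is hypothetical. It pairs the explicit checks the comment asks for with `${var:?}`, which makes the shell report an error instead of expanding an empty variable. Rejecting absolute paths and `..` entries, and validating every entry before deleting anything, are additions beyond the comment.

```shell
# Hypothetical helper; TARGET and SYNC_DIRS mirror the names in scripts/sync-docs.sh.
# Usage: clear_sync_dirs "$TARGET" "${SYNC_DIRS[@]}"
clear_sync_dirs() {
  local target="${1:-}" dir
  shift || true
  # Refuse an empty target, the filesystem root, or a path that is not a directory.
  if [ -z "$target" ] || [ "$target" = "/" ] || [ ! -d "$target" ]; then
    echo "clear_sync_dirs: refusing bad TARGET '${target}'" >&2
    return 1
  fi
  # Validate every entry before deleting anything.
  for dir in "$@"; do
    case "$dir" in
      "" | "." | /* | *..*)
        echo "clear_sync_dirs: refusing bad dir '${dir}'" >&2
        return 1
        ;;
    esac
  done
  for dir in "$@"; do
    # ${var:?} errors out instead of expanding to "", as a last line of defense.
    rm -rf -- "${target:?}/${dir:?}"
  done
}
```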
---
Nitpick comments:
In @.github/workflows/sync-docs.yml:
- Around line 34-37: The hardcoded directory list (get_started, concept, guide,
reference) is duplicated across .github/workflows/sync-docs.yml,
scripts/sync-docs.sh and docs/.sync.yml; update the workflow to read the
canonical list from docs/.sync.yml (e.g., parse with yq) or at minimum add a
clear cross-reference comment pointing to scripts/sync-docs.sh and
docs/.sync.yml; locate the loop in sync-docs.yml (the for dir in ...; do block)
and replace the hardcoded list with a command that loads directories from
docs/.sync.yml via yq (or add the comment) and ensure the same symbol names
(get_started, concept, guide, reference) are used consistently.
- Around line 51-61: The workflow will fail on re-run if the branch or PR
already exists; update the job around the BRANCH variable and the git/gh steps
to guard against duplicates by first checking for the remote branch and existing
PR before pushing/creating: verify if remote branch "${BRANCH}" exists and skip
or update it (instead of blindly running git push origin "${BRANCH}"), and
before running gh pr create, check for an open PR from that branch (use gh pr
view / gh pr list or the GitHub API) and only call gh pr create when no PR
exists; ensure the logic still creates the branch and commits when needed and
handles idempotent re-runs gracefully.
In `@docs/concept/architecture.md`:
- Around line 99-117: Add the language identifier `text` to each fenced code
block that contains the ASCII/text diagrams so markdown linters are satisfied;
specifically update the block starting with "User SQL (e.g. SELECT * FROM orders
WHERE status = 'pending')" and the subsequent diagram block that begins with the
CTE/WITH example (the block showing WITH "orders" AS (...)), plus the other
three diagram blocks referenced in the comment, by changing ``` to ```text for
those fenced code blocks (leave the diagram content unchanged).
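To locate any remaining bare fences before the linter does (markdownlint's MD040 rule flags fenced code blocks without a language), a rough awk-based check can list every opening fence that lacks an info string. This is a sketch with stated limits: it only understands triple-backtick fences, not `~~~` fences or longer backtick runs, and the `check_fence_langs` name is hypothetical.

```shell
# Hypothetical lint: print FILE:LINE for each opening ``` fence with no language.
# Usage: check_fence_langs docs/concept/architecture.md
check_fence_langs() {
  awk '
    FNR == 1 { in_fence = 0 }            # reset state at the start of each file
    /^[ \t]*```/ {
      info = $0
      sub(/^[ \t]*```/, "", info)        # strip indentation and the fence marker
      gsub(/[ \t]/, "", info)            # what remains is the info string, if any
      if (in_fence) {
        if (info == "") in_fence = 0     # a bare fence closes the open block
        next
      }
      if (info == "") printf "%s:%d: opening fence has no language\n", FILENAME, FNR
      in_fence = 1
    }
  ' "$@"
}
```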
In `@docs/concept/what_is_context.md`:
- Line 3: Replace the phrase "in order to work with data correctly" with the
more concise "to work with data correctly" in the sentence that begins "In Wren
Engine, context is the structured business understanding..." so the line reads
"...context is the structured business understanding an AI agent needs to work
with data correctly." This is a simple stylistic change—locate that sentence in
what_is_context.md and update the wording accordingly.
In `@scripts/sync-docs.sh`:
- Around line 82-85: The PR creation step using gh pr create (the PR_URL
assignment) will fail if a PR already exists for branch
sync/engine-docs-${SHORT_SHA}; update the script to first check for an existing
PR for that head (e.g., use gh pr list or gh pr view filtered by --head
"sync/engine-docs-${SHORT_SHA}" or by searching title/branch) and if found
capture its URL into PR_URL instead of calling gh pr create, otherwise call gh
pr create as before; reference the existing symbols PR_URL, gh pr create,
SHORT_SHA and TARGET_BRANCH when implementing this conditional flow.
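A minimal sketch of that conditional flow, assuming `gh` is authenticated for the doc website repo and runs from its checkout (otherwise `--repo` would be needed). `gh pr list --json url --jq` returns the URL of an open PR for the head branch, the `// empty` fallback makes the query print nothing when the list is empty, and `gh pr create` prints the new PR's URL on success. The `ensure_pr` name and its argument order are hypothetical.

```shell
# Hypothetical helper: print the URL of the open PR for BRANCH, creating one if needed.
# Usage: ensure_pr <branch> <base> <title> <body>
ensure_pr() {
  local branch="$1" base="$2" title="$3" body="$4" url
  # Reuse an existing open PR from this head branch, if there is one.
  url="$(gh pr list --head "$branch" --state open --json url --jq '.[0].url // empty')" || return
  if [ -z "$url" ]; then
    url="$(gh pr create --head "$branch" --base "$base" --title "$title" --body "$body")" || return
  fi
  printf '%s\n' "$url"
}

# In scripts/sync-docs.sh this would replace the direct gh pr create call:
#   PR_URL="$(ensure_pr "sync/engine-docs-${SHORT_SHA}" "$TARGET_BRANCH" "<title>" "<body>")"
```

`url` is declared before it is assigned so that a failing `gh` call propagates its exit status instead of being masked by `local`.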
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d91f4d8d-0fac-4401-985c-44dc78f14ef1
📒 Files selected for processing (20)
- .github/workflows/sync-docs.yml
- docs/.sync.yml
- docs/README.md
- docs/concept/architecture.md
- docs/concept/benefits_llm.md
- docs/concept/what_is_context.md
- docs/concept/what_is_mdl.md
- docs/get_started/connect.md
- docs/get_started/installation.md
- docs/get_started/quickstart.md
- docs/guide/memory.md
- docs/guide/modeling/model.md
- docs/guide/modeling/overview.md
- docs/guide/modeling/relation.md
- docs/guide/modeling/view.md
- docs/guide/modeling/wren_project.md
- docs/guide/profiles.md
- docs/reference/cli.md
- docs/reference/skills.md
- scripts/sync-docs.sh
Summary
- Restructure `docs/` to mirror the doc website layout (`get_started/`, `concept/`, `guide/`, `reference/`)
- … `.md` links (works in both GitHub preview and Docusaurus)
- Add `scripts/sync-docs.sh` for local manual sync via `gh` CLI
- Add `.github/workflows/sync-docs.yml`: auto-creates a PR on the doc website when docs change on `main`
- Doc website target is configured via repository variables (`DOCS_REPO`, `DOCS_REPO_BRANCH`), not hardcoded
Setup required
After merging, set the repository variables:
And ensure the `CROSS_REPO_TOKEN` secret has push + PR access to the doc website repo.
Test plan
- Run `scripts/sync-docs.sh` locally (dry-run) to confirm the diff is clean against the doc website
🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes
Documentation
Chores