Merged
66 changes: 28 additions & 38 deletions .claude/commands/review-translations.md
@@ -224,38 +224,39 @@ Read `.claude/translation-review/known-patterns.md` — this contains all issue

### Translation Glossary (AUTHORITATIVE SOURCE)

The EthGlossary API (`https://ethereum.org/api/glossary`) is the **authoritative source** for all Ethereum term translations across the entire pipeline. Community-voted glossary terms are not suggestions — they are the required translations.
**ETHGlossary** is the authoritative source for Ethereum term translations. Deviations are critical issues, not warnings.

**Fetch live from the API first, fall back to cache only if the API is unreachable:**
Resolve the base URL from the pipeline config (env var wins; default lives in `src/scripts/intl-pipeline/config.ts` under `GLOSSARY_API_URL`):

```bash
# Fetch live glossary
GLOSSARY_CACHE="$HOME/.claude/translation-review/fetch-translation-glossary.json"
GLOSSARY_URL="https://ethereum.org/api/glossary"

# Try live fetch first
if curl -sf "$GLOSSARY_URL" -o "$TMPDIR/glossary-live.json" 2>/dev/null; then
  # Update cache with fresh data
  cp "$TMPDIR/glossary-live.json" "$GLOSSARY_CACHE"
  echo "Glossary fetched live from API and cache updated."
else
  echo "WARNING: API unreachable, using cached glossary."
fi
GLOSSARY_API_URL="${GLOSSARY_API_URL:-$(grep -oE 'https://[^"]+/api/v[0-9]+' "$WORKTREE_PATH/src/scripts/intl-pipeline/config.ts" | head -1)}"
GLOSSARY_HOST="${GLOSSARY_API_URL%/api/*}"
```
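
The resolution order described here (env var first, then the default found in `config.ts`, then the host obtained by stripping the `/api/...` suffix) could be sketched in TypeScript roughly as follows. The helper names are hypothetical and are not part of the pipeline. The environment is passed in as a plain object so the sketch stays self-contained.

```typescript
// Hypothetical helpers; not taken from the pipeline source.
type Env = Record<string, string | undefined>

/** Env var wins; otherwise take the first `https://.../api/vN` match in the config source. */
function resolveGlossaryApiUrl(env: Env, configSource: string): string | undefined {
  const fromEnv = env.GLOSSARY_API_URL
  if (fromEnv) return fromEnv // an empty string counts as unset, like ${VAR:-default}
  const match = configSource.match(/https:\/\/[^"]+\/api\/v[0-9]+/)
  return match ? match[0] : undefined
}

/** Mirrors `${GLOSSARY_API_URL%/api/*}`: drop the shortest trailing `/api/...` suffix. */
function glossaryHost(apiUrl: string): string {
  const idx = apiUrl.lastIndexOf("/api/")
  return idx === -1 ? apiUrl : apiUrl.slice(0, idx)
}
```

Returning `undefined` when neither source yields a URL leaves the caller to decide whether that is fatal; the shell version would instead end up with an empty variable.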

Schema: `Array<{ string_term, translation_text, language_code, total_votes }>`.
Fetch `llms.txt` first as the canonical reference for endpoints and languages; if examples below disagree, llms.txt wins:

For each language being reviewed, extract relevant glossary terms:
```bash
curl -sf "$GLOSSARY_HOST/llms.txt" \
  -o "$TMPDIR/ethglossary-llms.txt" \
  && cp "$TMPDIR/ethglossary-llms.txt" "$HOME/.claude/translation-review/ethglossary-llms.txt"
```
Filter entries where language_code matches the target locale.
Sort by total_votes descending.
Include ALL terms for the language (not just top 50) — these are authoritative.
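
The filter-and-sort steps listed here, applied to the `Array<{ string_term, translation_text, language_code, total_votes }>` schema, might look like this in TypeScript. This is a minimal sketch: the function name is hypothetical, and `total_votes` is assumed to be numeric.

```typescript
// Field names follow the schema quoted in this file; the function name is hypothetical.
interface GlossaryEntry {
  string_term: string
  translation_text: string
  language_code: string
  total_votes: number // assumed numeric
}

/** Keep every entry for the target locale, highest-voted first. */
function termsForLanguage(entries: GlossaryEntry[], languageCode: string): GlossaryEntry[] {
  return entries
    .filter((entry) => entry.language_code === languageCode) // filter() copies, so the input stays untouched
    .sort((a, b) => b.total_votes - a.total_votes)
}
```

No truncation is applied, matching the instruction to include all terms for the language rather than a top-N slice.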

**Preferred — per-file filter** (`POST /filter`): returns only the glossary terms that appear in the English source, with translations sorted by occurrence. Avoids pulling hundreds of irrelevant terms into agent context.

```bash
ENGLISH_SOURCE=$(cat "$WORKTREE_PATH/public/content/{path}.md")
curl -sf -X POST "$GLOSSARY_API_URL/filter" \
  -H "Content-Type: application/json" \
  -d "$(jq -n --arg text "$ENGLISH_SOURCE" --arg lang "{LANGUAGE_CODE}" '{text: $text, language: $lang}')"
```

**Fallback — full language** when filtering per file is impractical or the endpoint is unreachable:

```bash
curl -sf "$GLOSSARY_API_URL/translations/{LANGUAGE_CODE}"
```
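
The preferred-then-fallback order for these two endpoints could be expressed in TypeScript along these lines. All names are hypothetical, the fetch function is injected rather than assumed, and the response is left as `unknown` because its shape is not specified here (`llms.txt` is the reference for that).

```typescript
// Hypothetical sketch; only the paths and the request body come from the curl examples.
interface GlossaryRequest {
  url: string
  init: { method: string; headers?: Record<string, string>; body?: string }
}

type FetchLike = (
  url: string,
  init: GlossaryRequest["init"]
) => Promise<{ ok: boolean; json(): Promise<unknown> }>

/** Preferred: POST /filter with the English source and target language. */
function buildFilterRequest(apiUrl: string, englishSource: string, languageCode: string): GlossaryRequest {
  return {
    url: `${apiUrl}/filter`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ text: englishSource, language: languageCode }),
    },
  }
}

/** Fallback: GET /translations/{LANGUAGE_CODE}. */
function buildFullLanguageRequest(apiUrl: string, languageCode: string): GlossaryRequest {
  return { url: `${apiUrl}/translations/${encodeURIComponent(languageCode)}`, init: { method: "GET" } }
}

/** Try the per-file filter first; use the full-language endpoint if it fails. */
async function fetchGlossary(
  apiUrl: string,
  englishSource: string,
  languageCode: string,
  fetchFn: FetchLike
): Promise<unknown> {
  const preferred = buildFilterRequest(apiUrl, englishSource, languageCode)
  try {
    const res = await fetchFn(preferred.url, preferred.init)
    if (res.ok) return await res.json()
  } catch {
    // Network failure: fall through to the full-language endpoint.
  }
  const fallback = buildFullLanguageRequest(apiUrl, languageCode)
  const res = await fetchFn(fallback.url, fallback.init)
  if (!res.ok) throw new Error(`Glossary fetch failed for ${languageCode}`)
  return res.json()
}
```

Keeping the request builders pure makes the endpoint choice easy to check without network access.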

**The glossary is used in every subsequent phase:**
- **Phase 3 (Review):** Agents treat glossary deviations as CRITICAL, not warnings
- **Phase 5 (Auto-Fix):** Glossary deviations are auto-corrected to the top-voted translation
- **Phase 8 (Knowledge Base):** New deviations discovered are logged for future reviews
Used in Phase 3 (review — deviations are CRITICAL), Phase 5 (auto-fix corrects to ETHGlossary translation), Phase 8 (new deviations logged).

### Per-Language Prior Findings
Check if `.claude/translation-review/per-language/{LANGUAGE_CODE}.md` exists. If so, read it and inject relevant prior findings into the agent prompt.
@@ -331,22 +332,11 @@ The community has voted on these translations for key Ethereum terms. Use these
- Review the entire current content of each file
- Compare against English source files from the worktree

## MANDATORY: Fetch Ethereum Glossary FIRST

**Before reviewing ANY translation, you MUST fetch the official Ethereum glossary for the language(s) being reviewed.** This is non-negotiable. The glossary contains community-approved translations for key terms.

```bash
# Fetch full glossary (all languages):
curl -s "https://ethereum.org/api/glossary/"

# Fetch glossary for a specific language (optional lang param, one at a time):
curl -s "https://ethereum.org/api/glossary/?lang=fr"
curl -s "https://ethereum.org/api/glossary/?lang=ja"
```
## MANDATORY: Use ETHGlossary for the target language

The glossary returns approved translations per language. Use these as the authority for how technical terms SHOULD be translated. Flag any deviations as warnings with "Glossary mismatch" in the issue column.
Use the ETHGlossary terms fetched in Phase 2 as the authority for technical term translations. Report deviations as **critical** issues (not warnings), with the current (wrong) translation and the expected (ETHGlossary) translation so Phase 5 can auto-fix them.

**If you skip the glossary, the entire review is invalid.**
**If you skip ETHGlossary, the entire review is invalid.**

## Review Checklist

@@ -733,6 +723,6 @@ ETH, Wei, Gwei, Gas
- Use `--model=sonnet` or `--model=haiku` for faster reviews
- Build verification is opt-in: `--build-local` for local scoped builds, `--netlify-check` for Netlify deploy preview checks
- If an agent exceeds context limits with Opus, fall back to Sonnet with Grep-based file inspection
- **EthGlossary API** (`https://ethereum.org/api/glossary`) is fetched live in Phase 2 and is the authoritative source for term translations across the entire pipeline — review (Phase 3), auto-fix (Phase 5), and knowledge base (Phase 8). The local cache at `~/.claude/translation-review/fetch-translation-glossary.json` is a fallback only.
- **ETHGlossary** is the authoritative source for term translations across review (Phase 3), auto-fix (Phase 5), and knowledge base (Phase 8). See Phase 2 for usage; `llms.txt` is the canonical endpoint reference.
- Knowledge base at `.claude/translation-review/` accumulates findings across reviews (committed to repo)
- `gh` CLI commands require `dangerouslyDisableSandbox: true` due to TLS certificate verification issues in sandbox mode
8 changes: 4 additions & 4 deletions AGENTS.md
@@ -119,10 +119,10 @@ pnpm events-import # Import community events

### Internationalization

- **25 languages** supported via Crowdin (canonical list: `i18n.config.json`)
- **RTL support** for Arabic, Urdu
- Translation files (JSON format) in `src/intl/[locale]/`
- Content translations managed through Crowdin platform
- **25 languages** supported (canonical list: `i18n.config.json`); **RTL support** for Arabic, Urdu
- JSON UI strings in `src/intl/[locale]/`; translated markdown content in `public/content/translations/[locale]/`
- Non-English markdown is propagated by the **intl-pipeline** (`src/scripts/intl-pipeline/`, entry `main.ts`). **Do not hand-propagate English changes into non-English files** -- let the pipeline run, or trigger `intl-pipeline.yml` with `stamp_only: true` if manifests must catch up urgently (e.g. unblocking a build). Hand-fixing a translation error is fine when the English side hasn't moved, since the manifest mapping stays valid. Spec: `tests/specs/PIPELINE-SPEC.md`.
- Glossary: base URL from `GLOSSARY_API_URL` env var; default in `src/scripts/intl-pipeline/config.ts`. ETHGlossary is authoritative for Ethereum term translations.

### Markdown Content

44 changes: 9 additions & 35 deletions src/scripts/intl-pipeline/FUTURE.md
@@ -6,53 +6,27 @@

## Pipeline Quality

### 1. Fix Comment Restoration Concatenation Bug
### 1. Deep JSON Validation

**Problem:** Translated code comments are concatenated with the original instead of replacing them. Example: `// **** REMOVE LIQUIDITY **** // **** ...Arabic... ****`

**Root cause:** `restoreComments()` in `lib/llm/code-block-extractor.ts` appends the translated comment to the existing line content instead of replacing. `translateCodeComments()` should use `strippedCode` (comments removed) as the base for restoration, not the original `block.content`.

**Complexity:** Low. ~5 line change.

### 2. Stronger Glossary Enforcement

**Problem:** High-frequency glossary terms like "mint" are translated inconsistently. The glossary is sent in the prompt but Gemini doesn't always adhere strictly.

**Proposed solution:**
- Post-translation pass that scans output for known English glossary terms that should have been translated, and flags or auto-corrects them
- Consider a validation step that compares glossary term frequency in source vs translation
- May overlap with existing sanitizer `fixKnownBrandGarbles` pattern -- extend to glossary terms

### 3. Transliteration During Translation

**Problem:** Gemini regresses on transliterations (author names, brand names like "Proto-danksharding") that the sanitizer then has to catch.

**Proposed solution:**
- Include transliteration banks directly in the translation prompt for non-Latin locales
- Add language-group-specific transliteration rules to `lib/llm/prompt-builder.ts`
- Ensure the translation prompt and sanitizer are aligned on the same transliteration bank

### 4. Deep JSON Validation

**Problem:** Current validation only checks top-level JSON keys. Nested namespaces can have dropped or renamed keys at depth > 1 without detection.
**Problem:** Current validation only checks top-level JSON keys (`validateTranslatedJson` in `lib/llm/output-validation.ts`). Nested namespaces can have dropped or renamed keys at depth > 1 without detection.

**Proposed solution:** Recursive key comparison that walks the full object tree, reporting missing/added/renamed keys at any depth.
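
A recursive key comparison of the kind proposed here could look roughly like the following. This is a sketch under stated assumptions, not the planned implementation: the names are hypothetical, arrays are treated as leaves, and a rename shows up as one missing key plus one added key.

```typescript
// Hypothetical sketch of a deep key diff; not taken from lib/llm/output-validation.ts.
type JsonValue = string | number | boolean | null | JsonValue[] | { [key: string]: JsonValue }

interface KeyDiff {
  missing: string[] // dotted paths present in the source but absent in the translation
  added: string[] // dotted paths present in the translation but absent in the source
}

function isPlainObject(value: JsonValue | undefined): value is { [key: string]: JsonValue } {
  return typeof value === "object" && value !== null && !Array.isArray(value)
}

function hasOwn(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key)
}

/** Walk both trees in parallel and collect key differences at any depth. */
function diffKeys(source: JsonValue, translated: JsonValue, path = ""): KeyDiff {
  const diff: KeyDiff = { missing: [], added: [] }
  // Leaves and type mismatches are out of scope for this sketch.
  if (!isPlainObject(source) || !isPlainObject(translated)) return diff

  for (const key of Object.keys(source)) {
    const keyPath = path ? `${path}.${key}` : key
    if (!hasOwn(translated, key)) {
      diff.missing.push(keyPath)
      continue
    }
    const child = diffKeys(source[key], translated[key], keyPath)
    diff.missing.push(...child.missing)
    diff.added.push(...child.added)
  }
  for (const key of Object.keys(translated)) {
    if (!hasOwn(source, key)) diff.added.push(path ? `${path}.${key}` : key)
  }
  return diff
}
```

Reporting dotted paths keeps the output readable for nested namespaces; pairing a missing path with an added one under the same parent is one possible way to flag a likely rename.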

---

## Pipeline Features

### 5. Split PRs (one PR per language)
### 2. Restore Split PRs (one PR per language) -- nice-to-have

**Problem:** Large multi-language runs produce a single massive PR that's hard to review.
**Problem:** Large multi-language runs produce a single massive PR that's hard to review. This previously worked via a `SPLIT_PRS` workflow input (commit `a52be9ddd9`) but was removed during the pipeline rewrite.

**Proposed solution:** A workflow input `split_prs` (boolean, default false) that creates a separate branch and PR per language.
**Nuance:** Today's orchestration assumes one `intl/pending-<base>` per base. Restoring split PRs means applying the full orchestration contract (base-into-pending merge, fail-fast, local-tree sync, temp branch, PR) independently per language -- e.g. `intl/pending-<base>-<lang>`. The old implementation predates this model, so it needs adaptation rather than cherry-pick.

---

## Automation

### 6. Auto-trigger Translations on Content Merge
### 3. Auto-trigger Translations on Content Merge

**Problem:** Content changes merged to dev currently require manual triggering of the translation pipeline.

@@ -61,7 +35,7 @@
- Automatically triggers the translation workflow for changed files
- Should respect a cooldown/batch window to avoid triggering on every small merge

### 7. Full-language Retroactive Cleanup
### 4. Full-language Retroactive Cleanup

**Problem:** Many languages were translated before current pipeline improvements. Those translations have the same class of issues found in Arabic (brand garbles, wrong compounds, etc.).

@@ -74,7 +48,7 @@

## Image Translation

### 8. Translate Text in Diagrams and Infographics
### 5. Translate Text in Diagrams and Infographics

**Problem:** Educational diagrams and infographics contain English text that remains untranslated, creating a jarring experience on otherwise fully translated pages.

@@ -89,7 +63,7 @@

## Package Extraction

### 9. Extract i18n Tooling into Standalone Packages
### 6. Extract i18n Tooling into Standalone Packages

**Problem:** Glossary, translation pipeline, and (future) image pipeline are embedded in the repo. Creates bloat and prevents reuse.

23 changes: 20 additions & 3 deletions src/scripts/intl-pipeline/lib/github/branches.ts
@@ -85,9 +85,26 @@ export const branchExists = async (branchName: string): Promise<boolean> => {
return res.ok
}

/**
 * Delete a branch on GitHub. Returns true if deleted or already absent.
 * Returns false with a warning on API failure. Never throws.
 */
export const deleteBranch = async (branchName: string): Promise<boolean> => {
  const url = `https://api.github.com/repos/${config.ghOrganization}/${config.ghRepo}/git/refs/heads/${branchName}`
  const res = await fetchWithRetry(url, {
    method: "DELETE",
    headers: gitHubBearerHeaders,
  })
  // 204: deleted, 422: ref does not exist
  if (res.ok || res.status === 422) return true
  const body = await res.text().catch(() => "")
  console.warn(`[branch] Delete ${branchName} failed (${res.status}): ${body}`)
  return false
}

/**
* Merge a base branch into a head branch via the GitHub API.
* Used to keep the staging branch up-to-date with dev.
* Used to keep the pending branch up-to-date with dev.
* Returns true if merge succeeded (or was already up-to-date).
*/
export const mergeBranchInto = async (
@@ -131,11 +148,11 @@ export const mergeBranchInto = async (
}

/**
* Ensure a staging branch exists and is up-to-date with its base.
* Ensure a pending branch exists and is up-to-date with its base.
* Creates the branch if it doesn't exist; merges base into it if it does.
* Returns the branch name.
*/
export const ensureStagingBranch = async (
export const ensurePendingBranch = async (
branchName: string,
baseBranch: string
): Promise<string> => {
71 changes: 65 additions & 6 deletions src/scripts/intl-pipeline/main.ts
@@ -23,7 +23,10 @@ import {
import i18nConfig from "../../../i18n.config.json"

import {
ensureStagingBranch,
branchExists,
createBranchFromSha,
deleteBranch,
ensurePendingBranch,
getBranchObject,
mergeBranchInto,
} from "./lib/github/branches"
@@ -541,7 +544,7 @@ async function runIncremental(

async function main() {
const startTime = Date.now()
logSection("Incremental Translation Pipeline v5")
logSection("Incremental Translation Pipeline")

if (!config.targetPaths.length) {
console.error("[ERROR] TARGET_PATH is required")
@@ -558,10 +561,58 @@ async function main() {
log(`Mode: ${config.mode}`)
log(`Concurrency: ${config.concurrency}`)

// Create temp working branch for crash safety
  // If the pending branch already exists (prior run against the same base),
  // use it as the baseline: merge current base into it first (fail-fast on
  // conflict), sync local working tree from it so drift detection reads the
  // latest stamped manifests, and branch the temp branch off of it.
  const pendingExists = await branchExists(targetBranch)
  let tempBranchSourceSha: string

  if (pendingExists) {
    log(`Pending branch exists: ${targetBranch}`)
    log(`Merging ${baseBranch} into ${targetBranch}...`)
    const merged = await mergeBranchInto(baseBranch, targetBranch)
    if (!merged) {
      throw new Error(
        `Cannot merge ${baseBranch} into ${targetBranch}. ` +
          `Either resolve conflicts on ${targetBranch} manually, or delete the branch and retry. ` +
          `Aborting before any translation work.`
      )
    }
    tempBranchSourceSha = (await getBranchObject(targetBranch)).sha

    // Force-update the local ref and check out pending's versions of the
    // manifest and content paths. This is destructive to any local edits in
    // those paths and is intended to run in CI (GitHub Actions) only, where
    // the working tree is ephemeral. The pipeline requires GEMINI_API_KEY
    // which is loaded from GH Secrets, so accidental local invocation is
    // unlikely, but edits in the listed paths will be clobbered if it happens.
    log(`Syncing local working tree from ${targetBranch}...`)
    execFileSync(
      "git",
      ["fetch", "origin", `+${targetBranch}:${targetBranch}`],
      { stdio: "inherit" }
    )
    execFileSync(
      "git",
      [
        "checkout",
        targetBranch,
        "--",
        ".manifests",
        "public/content",
        "src/intl",
      ],
      { stdio: "inherit" }
    )
  } else {
    tempBranchSourceSha = (await getBranchObject(baseBranch)).sha
  }

// Create temp working branch for crash safety (from pending if it exists, otherwise base)
const tempBranch = generateTempBranchName()
log(`Temp branch: ${tempBranch}`)
await ensureStagingBranch(tempBranch, baseBranch)
await createBranchFromSha(tempBranch, tempBranchSourceSha)
const baseBranchSha = (await getBranchObject(baseBranch)).sha
const committer = new SharedCommitter(tempBranch)
await committer.init()
@@ -711,19 +762,27 @@ async function main() {
}
}

// Merge temp branch into target branch
// Merge temp branch into pending, then clean up the temp.
// If pending didn't exist at the start, create it from base now.
if (committedFiles.length > 0 || hasCommits) {
log(`Merging ${tempBranch} -> ${targetBranch}`)
await ensureStagingBranch(targetBranch, baseBranch)
if (!pendingExists) {
await ensurePendingBranch(targetBranch, baseBranch)
}
const merged = await mergeBranchInto(tempBranch, targetBranch)
if (!merged) {
throw new Error(
`Failed to merge ${tempBranch} into ${targetBranch}. Temp branch preserved for manual resolution.`
)
}
log(`Merged successfully`)

// Clean up temp branch -- its work is now on pending
await deleteBranch(tempBranch)
} else {
log(`No changes to merge`)
// Nothing landed on the temp branch -- clean it up
await deleteBranch(tempBranch)
}

// Create or update PR unless skipped
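
The branch handling introduced in these `main.ts` hunks can be restated as a small decision table. The sketch below is an illustration only: the type and function names are hypothetical and do not exist in the pipeline, and it models decisions rather than performing any GitHub calls.

```typescript
// Hypothetical restatement of the orchestration in the hunks above; not pipeline code.
interface StartPlan {
  mergeBaseIntoPending: boolean // fail-fast merge before any translation work
  syncWorkingTreeFromPending: boolean // git fetch + checkout of manifest/content paths
  tempBranchSource: "pending" | "base"
}

interface FinishPlan {
  createPendingFromBase: boolean
  mergeTempIntoPending: boolean
  deleteTempBranch: boolean
}

/** What happens before translation starts, given whether pending already exists. */
function planStart(pendingExists: boolean): StartPlan {
  return {
    mergeBaseIntoPending: pendingExists,
    syncWorkingTreeFromPending: pendingExists,
    tempBranchSource: pendingExists ? "pending" : "base",
  }
}

/** What happens after translation, given whether anything landed on the temp branch. */
function planFinish(pendingExists: boolean, hasChanges: boolean): FinishPlan {
  return {
    createPendingFromBase: hasChanges && !pendingExists,
    mergeTempIntoPending: hasChanges,
    // The temp branch is removed on both paths, assuming the merge (if any) succeeded;
    // on a failed merge the code throws and the temp branch is preserved.
    deleteTempBranch: true,
  }
}
```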