i18n: translation pipeline, /videos #18063
Co-Authored-By: Gemini <gemini@google.com> Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
✅ Deploy Preview for ethereumorg ready!
🌐 Translation review started.
Translation Quality Review
PR: #18063 (i18n: translation pipeline, /videos)
Branch HEAD: 2df3ad60b4
Languages: ar, bn, cs, de, es, fr, hi, id, it, ja, ko, mr (12)
Files reviewed: 695 video transcripts (58 per language; fr has 57)
Date: 2026-04-29
Fixes: No fixes applied (review-only)
Summary by Language
| Language | Files | Quality Score | Issues |
|---|---|---|---|
| ar | 58 | 7.8/10 | 0 critical, ~8 warnings |
| bn | 58 | 8.6/10 | 0 critical, ~7 warnings |
| cs | 58 | 9.6/10 | 0 critical, 2 warnings |
| de | 58 | 9.0/10 | 0 critical, 5 warnings |
| es | 58 | 9.2/10 | 0 critical, 2 warnings |
| fr | 57 | 9.6/10 | 0 critical, 1 warning (missing file) |
| hi | 58 | 9.6/10 | 0 critical, 6 warnings |
| id | 58 | 9.6/10 | 0 critical, 2 warnings |
| it | 58 | 8.8/10 | 0 critical, 5 warnings |
| ja | 58 | 9.0/10 | 2 critical, 3 warnings |
| ko | 58 | 9.2/10 | 0 critical, 7 warnings |
| mr | 58 | 9.2/10 | 0 critical, 4 warnings |
Overall: 9.1/10 average across 12 languages. 2 critical issues, both in a single Japanese file.
Critical Issues
ja — public/content/translations/ja/videos/surveillance-silence-reclaiming-privacy/index.md
- Lines 161-162: Duplicate section heading. The English heading `Part two: let's save the world (25:22)` is followed immediately by the Japanese `第2部:世界を救おう (25:22)`. The Japanese heading anchor also points to a different later section. Keep one Japanese heading; align the anchor to the English source slug.
- Lines 164-390: ~50+ paragraph pairs ship the English source paragraph above its Japanese translation. The page will render English content twice. Remove all English paragraphs (Japanese translations beneath them are present and high quality).
- Line 392: Heading `#### 私たちが見たい未来について、私たちが下すべき選択 (46:56)` is missing the required `{#anchor-id}`. The Markdownlint pre-commit hook requires IDs on h1-h4 headings.
This single file appears to be a transcript that was partially edited before the translation pass completed — the other 57 Japanese files are clean.
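The missing-anchor finding above can be approximated locally before the pre-commit hook runs. The following is a minimal sketch, not the repository's markdownlint rule: the function name `headingsMissingIds` and both regexes are assumptions made for illustration, and fenced code blocks are not handled.

```typescript
// Hypothetical check: report 1-based line numbers of h1-h4 headings
// that do not end with a {#custom-id} block.
const HEADING = /^#{1,4}\s+.+$/; // h5 and deeper do not match: a fifth '#' is not whitespace
const CUSTOM_ID = /\{#[A-Za-z0-9_-]+\}\s*$/;

function headingsMissingIds(markdown: string): number[] {
  const missing: number[] = [];
  markdown.split("\n").forEach((line, index) => {
    if (HEADING.test(line) && !CUSTOM_ID.test(line)) {
      missing.push(index + 1);
    }
  });
  return missing;
}

const sample = [
  "# Title {#title}",
  "#### 私たちが見たい未来について、私たちが下すべき選択 (46:56)",
  "##### Deep heading without an ID",
].join("\n");

console.log(headingsMissingIds(sample));
```

Only the second line is reported here: the first already carries an ID and the third is an h5, which the review says is outside the hook's h1-h4 requirement.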
Warnings — highlights per language
- ar: Pervasive glossary deviation: `staking → التخزين` (174 occurrences) instead of glossary canonical `التحصيص`; `validator → المصادقة` (~60 occurrences) instead of `مُدقِّق`. Speaker labels left in Latin in several files (notably 36× `**Binji:**` in `stani-kulechov-building-aave`). `Ethereum Foundation` author rendered two ways. Brand-name body transliteration inconsistent (geth/Lido/Optimism/Solidity/TikTok left in Latin while transliteration bank exists).
- bn: Brand names left in Latin script in body (Twitter, TikTok, Optimism, Arbitrum, Lido, Coinbase, EigenLayer, Apple, Solidity, Cardano) despite Bengali bank having canonical transliterations. First-mention parenthetical pattern is good but secondary mentions revert to Latin.
- cs: 2 minor stylistic items (literal idioms in `restaking-explained` and `ai-agents-interview-luna`).
- de: "Proof of Authority" translated as "Autoritätsnachweis" in 2 files while `Proof-of-Stake`/`Proof-of-Work` are kept in English everywhere. Recommend keeping all three in English. Minor breadcrumb plural mismatch in `key-pair-eth-build`.
- es: Cross-file tu/usted variance (within-file consistency is fine).
- fr: `privacy-is-existential` is missing from `public/content/translations/fr/videos/` (58 EN vs 57 FR). Pipeline should propagate on next run. Minor capitalization variance.
- hi: 6 files retain English speaker tags (`Ryan Sean Adams:`, `Stani Kulechov:`, `Binji:`, `Domothy:`) while the majority of files transliterate them. Files: `blobspace-101-dencun`, `stani-kulechov-building-aave`, `ethereum-localism-global-protocols-local-power`, `ethereum-institutional-privacy-panel`, `real-state-of-l2s-bartek-kiepuszewski`, `ethereums-quantum-plan-justin-drake`.
- id: Minor tone register mixing in `zero-knowledge-proofs-5-levels` (justified by source) and `crypto-security-passwords` (long paragraph matches source).
- it: Inconsistent capitalization/translation of consensus terms: "Proof-of-Stake" in English vs "Prova di lavoro" / "prova di autorità" translated; pick one register. Cross-file tu/voi register varies.
- ja: Beyond the 2 criticals: brand-name handling mixed (some files transliterate to katakana, others keep Latin); both are acceptable per policy, just inconsistent.
- ko: "Finematics" rendered as `파이네매틱스` in `eip-4844-dencun-explained` vs the 7-file standard `파인매틱스`. "Ethereum Foundation" split: 8 files use `이더리움 재단` (idiomatic), 5 use `이더리움 파운데이션`; recommend unifying on `이더리움 재단`. A few `author:` fields left in Latin (`aantonop`, `Junion`, `Bankless`, `ETHBoulder`).
- mr: `topic` tags translated in some files and left in English in others. One brand "Finematics" rendered two ways. One Devanagari numeral `२` inside a tag value (`स्तर-२`), minor.
Detailed per-language scoring
Detailed per-language scoring tables (per-category notes per file) are posted as a follow-up comment to keep this Review body within parser limits.
Notes on this run
- No fixes applied — this review ran from GitHub Actions, which cannot commit auto-fixes back to the PR branch.
- Glossary deviations on `ar` reported as warnings, not critical: the spec calls glossary deviations critical, but the per-language KB (`ar.md`) records that multiple staking/validator translations exist in the wild. Treat the systemic `staking → التخزين` and `validator → المصادقة` as recommended fixes; running the auto-fix locally will surface and standardize them against the canonical glossary.
- `fr/privacy-is-existential` missing translation is a pipeline-side issue, not a translation-quality issue; the next intl-pipeline run should propagate it.
To apply fixes, run locally:
/review-translations --pr=18063 --fix
This will, in particular: clean up ja/surveillance-silence-reclaiming-privacy/index.md (remove interleaved English, fix duplicate heading + missing anchor) and standardize Arabic staking/validator to glossary canonical.
Reviewed by Claude Code via /review-translations CI workflow
Detailed per-language scoring (follow-up to translation review)
This comment supplements the Translation Quality Review above with per-category scoring tables. The summary table and critical issues are in the Review body.
ar (7.8/10)
Far better quality than the PR #17105 baseline (5.2). Frontmatter integrity is fully preserved (no translated keys, no broken
bn (8.6/10)
cs (9.6/10)
Professional, near-publication quality. Recommend merging without translation-side changes.
de (9.0/10)
es (9.2/10)
fr (9.6/10)
Only notable gap:
hi (9.6/10)
Zero Devanagari numerals, no domain transliteration, no frontmatter key translations, no untranslated paragraphs.
id (9.6/10)
it (8.8/10)
ja (9.0/10)
One file (
ko (9.2/10)
mr (9.2/10)
Co-Authored-By: Gemini <gemini@google.com>
@claude /review-translations --language=pl,pt-br,ru,sw,ta,te,tr,uk,ur,vi,zh-tw,zh
🌐 Translation review started.
Translation Quality Review
PR: #18063 (i18n: translation pipeline, /videos)
Branch HEAD: 858e8e017f
Languages: pl, pt-br, ru, sw, ta, te, tr, uk, ur, vi, zh-tw, zh (12)
Files reviewed: 696 video transcripts (58 per language)
Date: 2026-04-29
Fixes: No fixes applied (review-only)
Companion review to the earlier batch on this PR (ar/bn/cs/de/es/fr/hi/id/it/ja/ko/mr) -- covers the remaining 12 languages of the PR.
Summary by Language
| Language | Files | Quality Score | Issues |
|---|---|---|---|
| pl | 58 | 9.3/10 | 0 critical, ~5 warnings |
| pt-br | 58 | 9.4/10 | 0 critical, 5 warnings |
| ru | 58 | 8.8/10 | 0 critical, ~10 warnings |
| sw | 58 | 8.9/10 | 0 critical, ~10 warnings |
| ta | 58 | 7.7/10 | 56 critical (translated topic: slugs), ~5 warnings |
| te | 58 | 7.0/10 | 14+ critical (translated topic: slugs), ~6 warnings |
| tr | 58 | 9.7/10 | 0 critical, 1 warning |
| uk | 58 | 9.0/10 | 0 critical, ~25 warnings (brand-strategy stylistic) |
| ur | 58 | 7.7/10 | 9 critical (YAML-quote escaping, span tags inside heading IDs), ~26 warnings (Eastern-Arabic numerals) |
| vi | 58 | 9.0/10 | 0 critical, ~5 warnings |
| zh-tw | 58 | 8.8/10 | 0 critical, ~12 warnings |
| zh | 58 | 8.5/10 | 0 critical, ~20 warnings |
Overall: 8.7/10 average across 12 languages. Critical issues concentrated in Urdu (frontmatter YAML escaping, heading-ID corruption -- 6 files) and Tamil/Telugu (translated topic: taxonomy slugs -- pattern across most files).
Critical Issues
ur -- YAML / heading-ID structural defects (6 files)
- `ai-agents-interview-luna/index.md` line 2 -- Duplicated/recursive `title:` field with unescaped inner double-quote from inline `<span dir="ltr">` tag. The `title:` key appears twice, nested into itself.
- `devconnect-argentina-2025-recap/index.md` lines 2-3 -- Unescaped double-quotes inside double-quoted YAML title and description.
- `post-quantum-security-ethereum-roadmap/index.md` lines 2-3 -- Same pattern.
- `pectra-upgrade-overview/index.md` line 3 -- Same pattern.
- `ethereum-evolution-glamsterdam/index.md` line 60 -- Heading-ID anchor block contains literal `<span dir="ltr">2s</span>` inside the curly `{#...}` slug. Violates markdownlint custom-id rule.
- `how-to-make-a-guerilla-l2/index.md` line 51 -- Same heading-ID issue.
These are YAML/MDX parser-level bugs that may break the build for Urdu locale.
ta -- Frontmatter topic: slugs translated (56 of 58 files)
The English source uses kebab-case Latin slugs (community-stories, how-ethereum-works, roadmap-and-priorities, etc.) as taxonomy/filter keys. Tamil files translate them to Tamil script (e.g. தொகுதிச்சங்கிலி, வழிகாட்டி-வரைபடம், டெஸ்சி, etc.), which won't match any canonical taxonomy slug and risks fragmenting topic filtering. Pattern is uniform -- only 2 files retain Latin.
te -- Frontmatter topic: slugs translated (~14+ confirmed; pattern likely affects all 58)
Same defect as Tamil: topic slugs rendered in Telugu script (e.g. ఎథీరియం-ఎలా-పనిచేస్తుంది, నవీకరణలు, కమ్యూనిటీ, etc.). Some files mix Latin (eip-4844, dencun, dao) and Telugu in the same topic array.
Note on the topic-tag policy: The pl, zh-tw, and zh reviewers noted similar inconsistency at warning level. The codebase doc says concept tags are intentionally translated by Crowdin but also says frontmatter `tags` arrays must stay Latin. Recommend: clarify the policy for `topic:` arrays specifically and either revert to Latin slugs in ta/te or normalize across all locales.
(Detailed per-language warnings, scoring breakdowns, and per-file issue lists posted as a follow-up comment.)
To apply fixes, run locally:
/review-translations --pr=18063 --fix
Highest-value targeted runs: fix ur YAML quote escaping (4 files) + span tags inside heading-IDs (2 files); revert ta/te topic: arrays to the English Latin slugs from the source files; standardize Arabic-script numerals in ur body text to Western Arabic via numberFormat() policy; normalize sw rollup/mikusanyiko against glossary.
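As an illustration of the numeral standardization mentioned above, the sketch below maps Arabic-Indic (U+0660 to U+0669) and Extended Arabic-Indic (U+06F0 to U+06F9, the block Urdu typically uses) digits to ASCII digits. It is not the `numberFormat()` policy referenced in the review, whose behavior is not shown in this thread; the function name `toWesternDigits` is hypothetical.

```typescript
// Hypothetical helper: replace Arabic-script digits with ASCII 0-9.
// Callback form is used so the replacement is computed per matched digit.
function toWesternDigits(text: string): string {
  return text.replace(/[\u0660-\u0669\u06F0-\u06F9]/g, (digit) => {
    const code = digit.charCodeAt(0);
    // Pick the start of whichever digit block this character belongs to.
    const base = code >= 0x06f0 ? 0x06f0 : 0x0660;
    return String(code - base);
  });
}

// Extended Arabic-Indic digits for 2025, written with escapes for clarity.
console.log(toWesternDigits("\u06F2\u06F0\u06F2\u06F5"));
```

A real pass would also need to leave frontmatter and code spans alone; that scoping is omitted here.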
Reviewed by Claude Code via /review-translations CI workflow
Translation Review: Per-Language Warnings (PR #18063)
Follow-up detail to the Translation Quality Review covering pl, pt-br, ru, sw, ta, te, tr, uk, ur, vi, zh-tw, zh.
Warnings highlights per language
Per-language scoring breakdowns posted in a separate comment.
Translation Review: Per-Language Scoring Detail (PR #18063)
Follow-up to the Translation Quality Review: per-category scoring breakdown for each of the 12 languages.
pl: Overall 9.3/10
pt-br: Overall 9.4/10
ru: Overall 8.8/10
sw: Overall 8.9/10
ta: Overall 7.7/10
te: Overall 7.0/10
tr: Overall 9.7/10
uk: Overall 9.0/10
ur: Overall 7.7/10
vi: Overall 9.0/10
zh-tw: Overall 8.8/10
zh: Overall 8.5/10
Companion to the main review; supplemental detail does not need its own SHA marker.
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Lock videos frontmatter taxonomy/metadata fields (topic, uploadDate, duration, educationLevel, youtubeId, format) so they match English byte-for-byte. Update LLM prompt with explicit per-field policy, ban span dir wrappers in frontmatter (use LRI/PDI U+2066/U+2069 instead), and drop redundant heading-ID rules since Gemini never sees them after the normalizer's Pass 6 strips them.
Protect heading-ID anchor blocks {#...} from late RTL passes by adding them to RTL_SKIP_PATTERN. syncHeaderIdsWithEnglish was already copying clean English IDs into translated headers, but fixBareRtlValues was re-corrupting them afterward by wrapping fragments like 2s in span dir=ltr.
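A rough sketch of the skip-pattern idea, under stated assumptions: `SKIP_HEADING_ID` stands in for the relevant part of `RTL_SKIP_PATTERN`, and `wrapLatinRuns` stands in for `fixBareRtlValues`. Neither function body appears in this PR thread, so both are illustrative only.

```typescript
// Hypothetical stand-in for the heading-ID portion of RTL_SKIP_PATTERN.
// The capture group makes String.split keep the matched {#...} blocks.
const SKIP_HEADING_ID = /(\{#[^}]*\})/;

// Hypothetical stand-in for a late RTL pass: wrap bare Latin/digit runs
// in <span dir="ltr">, but leave skipped segments untouched.
function wrapLatinRuns(line: string): string {
  return line
    .split(SKIP_HEADING_ID)
    .map((segment, index) =>
      // With one capture group, odd indices are the captured {#...} blocks.
      index % 2 === 1
        ? segment
        : segment.replace(/[A-Za-z0-9]+/g, (run) => `<span dir="ltr">${run}</span>`)
    )
    .join("");
}

const heading = "## لیئر 2s کیا ہیں {#what-are-l2s}";
console.log(wrapLatinRuns(heading));
```

The `2s` in the heading text is wrapped, while the `{#what-are-l2s}` anchor passes through unchanged, which is the behavior the commit describes.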
Defensive fix for latent backreference bugs in five frontmatter editors (normalizeFrontmatterDates, syncButtonsFrontmatterFields, quoteFrontmatterNonAscii, fixDuplicateFrontmatterAuthor, fixFrontmatterLang). When the replacement string contained a $N sequence from user content (e.g. $17M), it was interpreted as a regex backreference. Switched all five to callback form.
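The failure mode is easy to reproduce in isolation. This is a minimal sketch with made-up frontmatter and a made-up regex, not the code of any of the five editors: in JavaScript, a `$17` in a replacement string falls back to `$1` followed by `7` when the pattern has fewer than 17 capture groups.

```typescript
// Illustrative input; the real editors and their regexes are not shown in this PR.
const frontmatter = 'title: "old"\nlang: ur';
const newTitle = '"Luna raised $17M"';
const TITLE_LINE = /^(title:\s*).*$/m;

// Buggy: string replacement. "$17" is parsed as backreference $1 plus "7",
// so the captured "title: " prefix is injected into the value.
const buggy = frontmatter.replace(TITLE_LINE, `$1${newTitle}`);

// Safe: callback form. The returned string is inserted verbatim.
const safe = frontmatter.replace(
  TITLE_LINE,
  (_match, prefix: string) => prefix + newTitle
);

console.log(buggy); // first line: title: "Luna raised title: 7M"
console.log(safe); // first line: title: "Luna raised $17M"
```

The nested `title:` in the buggy output resembles the recursive title damage reported for `ai-agents-interview-luna`.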
Document patterns 58-61 in sanitizer-test-research.md.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Re-run sanitizer over all 1391 translated video markdown files (24 non-English locales). 1355 files modified.
Lock the videos taxonomy/metadata fields (topic, uploadDate, duration, educationLevel, youtubeId, format) to match the English source byte-for-byte. Translatable fields (title, description, breadcrumb, lang) are preserved.
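The locking step can be pictured as copying a fixed set of fields from the English frontmatter over the translated one. The sketch below assumes frontmatter has already been parsed into key/raw-value string pairs; the `Frontmatter` type and the `lockFields` name are hypothetical, and only the field list comes from the commit message.

```typescript
// Field list taken from the commit message; everything else is illustrative.
const LOCKED_FIELDS = [
  "topic",
  "uploadDate",
  "duration",
  "educationLevel",
  "youtubeId",
  "format",
] as const;

// Assumed shape: each key maps to its raw, unparsed YAML value text.
type Frontmatter = Record<string, string>;

function lockFields(english: Frontmatter, translated: Frontmatter): Frontmatter {
  const result: Frontmatter = { ...translated };
  for (const field of LOCKED_FIELDS) {
    if (field in english) {
      // Copy the raw English text so the value matches byte-for-byte.
      result[field] = english[field];
    }
  }
  return result;
}

const en: Frontmatter = { title: '"Restaking explained"', topic: "[how-ethereum-works]" };
const ta: Frontmatter = { title: '"மறுபங்கீடு"', topic: "[தொகுதிச்சங்கிலி]" };
console.log(lockFields(en, ta));
```

Translatable fields such as `title` are left as translated, while `topic` reverts to the English slug.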
Strip span dir=ltr wrappers from heading-ID anchor blocks {#...} in RTL locales (ur). Convert legitimate span dir=ltr wrappers inside frontmatter values to U+2066 (LRI) / U+2069 (PDI) BiDi isolates so they no longer break YAML.
Hand-recover the recursive title damage in ai-agents-interview-luna for ur and ta where the LLM's $17M span had cascaded into duplicated frontmatter keys; the inner translated value is restored.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
The RTL markdown prompt now carries a dual BiDi policy: span dir=ltr for body content (MDX parses HTML, browser honors dir) and U+2066 (LRI) / U+2069 (PDI) for frontmatter values (the inner double-quote on a dir attribute terminates the outer YAML string and breaks the build). The test previously asserted markdown contained span and did NOT contain U+2066, which conflicted with the new frontmatter rule.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
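For the frontmatter half of that policy, the conversion could look roughly like the following. This is a sketch under assumptions: `spansToIsolates` is a hypothetical name, and the regex only handles the exact `<span dir="ltr">…</span>` form without nesting or extra attributes.

```typescript
const LRI = "\u2066"; // LEFT-TO-RIGHT ISOLATE
const PDI = "\u2069"; // POP DIRECTIONAL ISOLATE

// Hypothetical helper: swap span wrappers for BiDi isolate characters so the
// value no longer contains the double quotes from the dir attribute.
function spansToIsolates(value: string): string {
  return value.replace(
    /<span dir="ltr">(.*?)<\/span>/g,
    // Callback form, so "$" sequences in the inner text are inserted verbatim.
    (_match, inner: string) => `${LRI}${inner}${PDI}`
  );
}

const rawTitle = 'لونا <span dir="ltr">$17M</span> فنڈنگ';
console.log(JSON.stringify(spansToIsolates(rawTitle)));
```

After conversion the value contains no double quotes, so wrapping it in a double-quoted YAML string is safe.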

Automated Translations
This PR contains translations managed by the intl pipeline.
Run: 2026-04-28 22:39 UTC
public/content/videos/
Notes
public/content/videos/privacy-is-existential/index.md (fr) -- 3 attempts across model fallback chain, all hit finishReason=RECITATION. This is largely deterministic for a given prompt; a retry with the same input is unlikely to succeed.
To rerun the failed combination after a fix:
Run: 2026-04-29 06:34:16 UTC
1 task(s) failed:
public/content/videos/privacy-is-existential/index.md (fr): Gemini returned no content (finishReason=RECITATION). This file/language combination may be triggering content filters.
Rerun the failed combinations: