
i18n: translation pipeline, /videos #18063

Merged
wackerow merged 32 commits into dev from intl/pending-dev on Apr 30, 2026

wackerow (Member) commented Apr 29, 2026

Automated Translations

This PR contains translations managed by the intl pipeline.


Run: 2026-04-28 22:39 UTC

  • Languages: ar, fr*, bn, cs, de, es, hi, id, it, ja, ko, mr
  • Files: 638 (58 MD per language)
  • Source: public/content/videos/
  • Mode: auto
  • View workflow run

Notes
public/content/videos/privacy-is-existential/index.md (fr) -- 3 attempts across model fallback chain, all hit finishReason=RECITATION. This is largely deterministic for a given prompt; a retry with the same input is unlikely to succeed.
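
The note above implies a retry policy that treats RECITATION differently from transient failures. A minimal sketch of that decision in Python, assuming each attempt's finishReason is available as a string. The function name, the `max_attempts` default, and the set of non-retryable reasons are illustrative assumptions, not the pipeline's actual code:

```python
# Finish reasons assumed to be deterministic for a given prompt, so an
# identical retry is unlikely to help. Only RECITATION comes from the run notes.
NON_RETRYABLE = {"RECITATION"}


def should_retry(finish_reason: str, attempt: int, max_attempts: int = 3) -> bool:
    """Return True if another attempt with the same input is worth making."""
    if finish_reason in NON_RETRYABLE:
        # A changed prompt or a manual translation is needed instead.
        return False
    return attempt < max_attempts
```

Under this sketch a RECITATION result is surfaced immediately as a failed task rather than consuming the remaining attempts.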

To rerun the failed combination after a fix:

gh workflow run "Intl Pipeline" -f target_path="public/content/videos/privacy-is-existential/index.md" -f target_languages=fr

Run: 2026-04-29 06:34:16 UTC

  • Languages: ar, bn, cs, de, es, fr, hi, id, it, ja, ko, mr, pl, pt-br, ru, sw, ta, te, tr, uk, ur, vi, zh-tw, zh
  • Files: 696 (696 MD, 0 JSON)
  • Mode: auto
  • View workflow run

1 task(s) failed:

  • public/content/videos/privacy-is-existential/index.md (fr): Gemini returned no content (finishReason=RECITATION). This file/language combination may be triggering content filters.

Rerun the failed combinations:

gh workflow run "Intl Pipeline" -f target_path="public/content/videos/privacy-is-existential/index.md" -f target_languages="fr"
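
When several file/language combinations fail, the rerun commands can be generated rather than typed by hand. A small sketch, assuming failures are available as (path, language) pairs. The helper name is hypothetical, and it emits one command per pair because the run notes only show single-language reruns:

```python
import shlex

WORKFLOW = "Intl Pipeline"  # workflow name as it appears in the command above


def rerun_commands(failures: list[tuple[str, str]]) -> list[str]:
    """Build one `gh workflow run` command per failed (path, language) pair."""
    commands = []
    for path, lang in failures:
        argv = [
            "gh", "workflow", "run", WORKFLOW,
            "-f", f"target_path={path}",
            "-f", f"target_languages={lang}",
        ]
        # shlex.join quotes each argument safely for a POSIX shell.
        commands.append(shlex.join(argv))
    return commands
```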

myelinated-wackerow and others added 12 commits April 28, 2026 20:01
Co-Authored-By: Gemini <gemini@google.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
netlify Bot commented Apr 29, 2026

Deploy Preview for ethereumorg ready!

Name Link
🔨 Latest commit fe64419
🔍 Latest deploy log https://app.netlify.com/projects/ethereumorg/deploys/69f295deb43b510008c3fe1a
😎 Deploy Preview https://deploy-preview-18063.ethereum.it
Lighthouse
7 paths audited
Performance: 73 (🟢 up 24 from production)
Accessibility: 93 (no change from production)
Best Practices: 100 (no change from production)
SEO: 98 (🔴 down 1 from production)
PWA: 59 (no change from production)

github-actions Bot added the content 🖋️ (This involves copy additions or edits) and translation 🌍 (This is related to our Translation Program) labels Apr 29, 2026
github-actions Bot (Contributor) commented

🌐 Translation review started. View progress

claude Bot (Contributor) left a comment

test review body

claude Bot (Contributor) left a comment

Translation Quality Review

PR: #18063 (i18n: translation pipeline, /videos)
Branch HEAD: 2df3ad60b4
Languages: ar, bn, cs, de, es, fr, hi, id, it, ja, ko, mr (12)
Files reviewed: 695 video transcripts (58 per language; fr has 57)
Date: 2026-04-29
Fixes: No fixes applied (review-only)

Summary by Language

| Language | Files | Quality Score | Issues |
|---|---|---|---|
| ar | 58 | 7.8/10 | 0 critical, ~8 warnings |
| bn | 58 | 8.6/10 | 0 critical, ~7 warnings |
| cs | 58 | 9.6/10 | 0 critical, 2 warnings |
| de | 58 | 9.0/10 | 0 critical, 5 warnings |
| es | 58 | 9.2/10 | 0 critical, 2 warnings |
| fr | 57 | 9.6/10 | 0 critical, 1 warning (missing file) |
| hi | 58 | 9.6/10 | 0 critical, 6 warnings |
| id | 58 | 9.6/10 | 0 critical, 2 warnings |
| it | 58 | 8.8/10 | 0 critical, 5 warnings |
| ja | 58 | 9.0/10 | 2 critical, 3 warnings |
| ko | 58 | 9.2/10 | 0 critical, 7 warnings |
| mr | 58 | 9.2/10 | 0 critical, 4 warnings |

Overall: 9.1/10 average across 12 languages. 2 critical issues, both in a single Japanese file.

Critical Issues

ja — public/content/translations/ja/videos/surveillance-silence-reclaiming-privacy/index.md

  • Lines 161-162: Duplicate section heading. The English heading Part two: let's save the world (25:22) is followed immediately by the Japanese 第2部:世界を救おう (25:22). The Japanese heading anchor also points to a different, later section. Keep one Japanese heading; align the anchor to the English source slug.
  • Lines 164-390: 50+ paragraph pairs ship the English source paragraph above its Japanese translation, so the page will render the content twice. Remove all English paragraphs (the Japanese translations beneath them are present and high quality).
  • Line 392: Heading #### 私たちが見たい未来について、私たちが下すべき選択 (46:56) is missing the required {#anchor-id}. The markdownlint pre-commit hook requires IDs on h1-h4 headings.
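
The missing-anchor finding can be checked mechanically. A rough sketch, assuming anchors take the form `{#slug}` at the end of the heading line and that slugs use only ASCII letters, digits, hyphens, and underscores. The function name and the slug character set are assumptions; the repository's actual markdownlint rule may differ:

```python
import re

# h1-h4 headings only; five or more '#' characters do not match.
HEADING = re.compile(r"^#{1,4}\s+\S")
# Assumed anchor shape: {#slug} at the end of the line.
ANCHOR = re.compile(r"\{#[A-Za-z0-9_-]+\}\s*$")


def headings_missing_anchor(markdown: str) -> list[tuple[int, str]]:
    """Return (line number, line) for each h1-h4 heading without a trailing {#id}."""
    missing = []
    in_fence = False
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # ignore '#' lines inside fenced code
            continue
        if in_fence:
            continue
        if HEADING.match(line) and not ANCHOR.search(line):
            missing.append((lineno, line))
    return missing
```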

This single file appears to be a transcript that was partially edited before the translation pass completed — the other 57 Japanese files are clean.
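
Interleaved source paragraphs like these can be flagged with a simple script heuristic. The sketch below is an assumption-laden illustration, not the reviewer's method: it treats any sufficiently long paragraph with no hiragana, katakana, or CJK ideographs as probably untranslated, and the `min_words` threshold is arbitrary.

```python
import re

# Hiragana, katakana, and the main CJK ideograph block.
JAPANESE = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]")


def untranslated_paragraphs(body: str, min_words: int = 8) -> list[str]:
    """Return paragraphs that contain no Japanese script and look like prose."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", body) if p.strip()]
    flagged = []
    for paragraph in paragraphs:
        if paragraph.startswith("#"):
            continue  # headings are checked separately
        if JAPANESE.search(paragraph):
            continue
        # Short Latin-only lines (brand names, EIP numbers) are not flagged.
        if len(paragraph.split()) >= min_words:
            flagged.append(paragraph)
    return flagged
```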

Warnings — highlights per language

  • ar: Pervasive glossary deviation: staking → التخزين (174 occurrences) instead of glossary canonical التحصيص; validator → المصادقة (~60 occurrences) instead of مُدقِّق. Speaker labels left in Latin in several files (notably 36× **Binji:** in stani-kulechov-building-aave). Ethereum Foundation author rendered two ways. Brand-name body transliteration inconsistent (geth/Lido/Optimism/Solidity/TikTok left in Latin while transliteration bank exists).
  • bn: Brand names left in Latin script in body (Twitter, TikTok, Optimism, Arbitrum, Lido, Coinbase, EigenLayer, Apple, Solidity, Cardano) despite Bengali bank having canonical transliterations. First-mention parenthetical pattern is good but secondary mentions revert to Latin.
  • cs: 2 minor stylistic items (literal idioms in restaking-explained and ai-agents-interview-luna).
  • de: "Proof of Authority" translated as "Autoritätsnachweis" in 2 files while Proof-of-Stake / Proof-of-Work are kept in English everywhere. Recommend keeping all three in English. Minor breadcrumb plural mismatch in key-pair-eth-build.
  • es: Cross-file tu/usted variance (within-file consistency is fine).
  • fr: privacy-is-existential is missing from public/content/translations/fr/videos/ (58 EN vs 57 FR). Pipeline should propagate on next run. Minor capitalization variance.
  • hi: 6 files retain English speaker tags (Ryan Sean Adams:, Stani Kulechov:, Binji:, Domothy:) while majority files transliterate them. Files: blobspace-101-dencun, stani-kulechov-building-aave, ethereum-localism-global-protocols-local-power, ethereum-institutional-privacy-panel, real-state-of-l2s-bartek-kiepuszewski, ethereums-quantum-plan-justin-drake.
  • id: Minor tone register mixing in zero-knowledge-proofs-5-levels (justified by source) and crypto-security-passwords (long paragraph matches source).
  • it: Inconsistent capitalization/translation of consensus terms — "Proof-of-Stake" English vs "Prova di lavoro" / "prova di autorità" translated; pick one register. Cross-file tu/voi register varies.
  • ja: Beyond the 2 criticals: brand-name handling mixed (some files transliterate to katakana, others keep Latin) — both acceptable per policy, just inconsistent.
  • ko: "Finematics" rendered as 파이네매틱스 in eip-4844-dencun-explained vs the 7-file standard 파인매틱스. "Ethereum Foundation" split: 8 files use 이더리움 재단 (idiomatic), 5 use 이더리움 파운데이션; recommend unifying on 이더리움 재단. A few author: fields left in Latin (aantonop, Junion, Bankless, ETHBoulder).
  • mr: topic tags translated in some files and left in English in others. One brand "Finematics" rendered two ways. One Devanagari numeral inside a tag value (स्तर-२) — minor.

Detailed per-language scoring

Detailed per-language scoring tables (per-category notes per file) are posted as a follow-up comment to keep this Review body within parser limits.

Notes on this run

  • No fixes applied — this review ran from GitHub Actions, which cannot commit auto-fixes back to the PR branch.
  • Glossary deviations on ar reported as warnings, not critical: the spec calls glossary deviations critical, but the per-language KB (ar.md) records that multiple staking/validator translations exist in the wild. Treat the systemic staking → التخزين and validator → المصادقة as recommended fixes — running the auto-fix locally will surface and standardize them against the canonical glossary.
  • fr/privacy-is-existential missing translation is a pipeline-side issue, not a translation-quality issue — the next intl-pipeline run should propagate it.
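
Before standardizing glossary terms, it can help to count how often each non-canonical form appears. A minimal reporting-only sketch; the function name is hypothetical and the mapping must come from the real glossary. It deliberately does not rewrite text, on the assumption that a flagged form may be legitimate in some contexts and needs human review:

```python
def count_deviations(text: str, deviations: dict[str, str]) -> list[tuple[str, str, int]]:
    """Return (found form, canonical form, count) for each deviation present in text.

    `deviations` maps a non-canonical rendering to its glossary-canonical form.
    Matching is plain substring counting, so inflected forms are not detected.
    """
    report = []
    for found, canonical in deviations.items():
        count = text.count(found)
        if count:
            report.append((found, canonical, count))
    return report
```

For the Arabic case described above, the mapping would pair التخزين with التحصيص and المصادقة with مُدقِّق.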

To apply fixes, run locally:

/review-translations --pr=18063 --fix

This will, in particular: clean up ja/surveillance-silence-reclaiming-privacy/index.md (remove interleaved English, fix duplicate heading + missing anchor) and standardize Arabic staking/validator to glossary canonical.


Reviewed by Claude Code via /review-translations CI workflow

claude Bot (Contributor) commented Apr 29, 2026

Detailed per-language scoring (follow-up to translation review)

This comment supplements the Translation Quality Review above with per-category scoring tables. The summary table and critical issues are in the Review body.

ar (7.8/10)

Category Score Notes
Brand Name Preservation (transliteration policy) 6/10 Many transliterated correctly; speaker names like Binji and brands like geth/Lido/Optimism/Bankless/Solidity/TikTok left in Latin
Technical Accuracy 9/10 No PoS/PoW or validator/miner inversions, MEV not corrupted, oracle not corrupted, blocks not "barriers"
Semantic Fidelity 9/10 Faithful renderings, technical concepts conveyed accurately
Terminology Consistency 6/10 staking → التخزين not glossary التحصيص; validator mostly المصادقة not مُدقِّق
Tone/Register 9/10 Consistent MSA register, RTL spans used properly for ltr fragments

Far better quality than the PR #17105 baseline (5.2). Frontmatter integrity is fully preserved (no translated keys, no broken lang/youtubeId/domains, no Crowdin boilerplate). No critical semantic inversions or cross-script contamination. Main gaps are systematic glossary deviation and inconsistent transliteration policy application — sanitizer-fixable patterns rather than meaning-changing errors.

bn (8.6/10)

Category Score Notes
Brand Name Preservation (transliteration policy) 6.5/10 Many brand names left in Latin; bn.json bank exists with proper transliterations. First-mention parenthetical pattern is good, but secondary mentions revert to Latin.
Technical Accuracy 9.5/10 PoW/PoS, validator/miner, mainnet/testnet semantics intact
Semantic Fidelity 9.5/10 Spot-checked PoW vs PoS, restaking, proof-of-authority, Vitalik 30-min, Luna interview against EN — meaning preserved
Terminology Consistency 8.5/10 Core terms (ইথেরিয়াম, ভ্যালিডেটর, ট্রানজ্যাকশন, কনসেনসাস) consistent
Tone/Register 9/10 Natural fluent Bengali prose; appropriate formal/conversational balance

cs (9.6/10)

Category Score Notes
Brand Name Preservation 10/10 Solidity, Ethereum, Bitcoin, ETH, BTC, EVM, MetaMask, DeFi, NFT, ERC-20/721/1155, KZG, EIP-4844, Devcon, Optimism, Arbitrum, Lido, Coinbase, GitHub, TikTok, Beacon chain, Hyperledger Fabric, Cardano all preserved cleanly
Technical Accuracy 10/10 PoW/PoS, slashing/penalizace, validator, blob, calldata, rollup, hard fork/soft fork, weak subjectivity, dankshardingu, point-evaluation precompile all correct
Semantic Fidelity 10/10 No PoS/PoW inversions, no validator/miner swaps, no mainnet/testnet swaps
Terminology Consistency 9/10 Consistent use of "validátor", "staking/stakování", "blockchain", "důkaz prací (PoW)", "důkaz podílem (PoS)", "rollup"
Tone/Register 9/10 Polite/neutral register throughout; idioms localized appropriately

Professional, near-publication quality. Recommend merging without translation-side changes.

de (9.0/10)

Category Score Notes
Brand Name Preservation 10/10 Solidity, Vyper, MetaMask, Hardhat, Ethereum, Bitcoin, JavaScript all retained
Technical Accuracy 9/10 youtubeIds, hrefs, frontmatter keys all correct
Semantic Fidelity 9/10 No PoS/PoW or validator/miner inversions; "Härte"/"Zustand"/"Staat" used correctly per context
Terminology Consistency 8/10 PoA translated as "Autoritätsnachweis" while PoW/PoS kept English
Tone/Register 9/10 Each file internally consistent du-or-Sie; choice fits speaker register

es (9.2/10)

Category Score Notes
Brand Name Preservation 10/10 Solidity, Vyper, MetaMask, EigenLayer, Lido, Geth, Prysm, Lighthouse, Optimism, Arbitrum, OP Mainnet, Bitcoin, JavaScript all kept in Latin
Technical Accuracy 10/10 PoW/PoS, validator/miner, mainnet/testnet, L1/L2 correctly mapped; no semantic inversions
Semantic Fidelity 9/10 Faithful to source; idiomatic Spanish
Terminology Consistency 9/10 Stable: staking, validador, billetera, bifurcación dura, capa 1/2, prueba de trabajo/participación
Tone/Register 8/10 Mostly tú-form (232 hits) but 42 usted hits; cross-file register varies between videos

fr (9.6/10)

Category Score Notes
Brand Name Preservation 10/10 All Latin brand names kept verbatim (Solidity, Ethereum, Bitcoin, MetaMask, GitHub, JavaScript, etc.)
Technical Accuracy 9.5/10 EIP/ERC/blob/danksharding/staking/restaking handled correctly
Semantic Fidelity 10/10 No PoW/PoS, validator/miner, or mainnet/testnet inversions detected
Terminology Consistency 9/10 Stable: "chaîne de blocs", "preuve d'enjeu/travail", "contrats intelligents", "couche 1/2 (l1/l2)"
Tone/Register 9.5/10 "tu" used inside interview dialogue, "vous" in expository text — both context-appropriate

Only notable gap: privacy-is-existential is absent from public/content/translations/fr/videos/. Pipeline should propagate on next run.

hi (9.6/10)

Category Score Notes
Brand Name Preservation (transliteration policy) 9/10 Excellent transliteration overall (विटालिक ब्यूटेरिन, सॉलिडिटी-style); 6 files retain Latin speaker tags
Technical Accuracy 10/10 PoW/PoS, validator/miner, slashing, blobs all consistent and correct
Semantic Fidelity 10/10 No inversions, no mistranslated brand semantics, paragraphs faithful to English
Terminology Consistency 9/10 "क्लाइंट" used consistently for client; PoS=प्रूफ-ऑफ़-स्टेक, PoW=प्रूफ-ऑफ-वर्क uniformly
Tone/Register 10/10 Formal आप used throughout; respectful register

Zero Devanagari numerals, no domain transliteration, no frontmatter key translations, no untranslated paragraphs.

id (9.6/10)

Category Score Notes
Brand Name Preservation 10/10 Zero translated brand names; "Retheum" verified intentional
Technical Accuracy 9.5/10 "Bukti Kerja"/"Bukti Kepemilikan" used consistently; technical terms preserved appropriately
Semantic Fidelity 10/10 No PoW↔PoS, validator↔miner, or mainnet↔testnet inversions
Terminology Consistency 9.5/10 "rantai blok"/"kontrak pintar"/"penambang" consistent across files
Tone/Register 9/10 Predominantly formal "Anda"; informal "kamu" appears only in conversational interviews/child-targeted content (justified by source)

it (8.8/10)

Category Score Notes
Brand Name Preservation 10/10 Solidity, Vyper, MetaMask, GitHub, Twitter, Mainnet all preserved verbatim
Technical Accuracy 9/10 EIP-4844, PeerDAS, RIP-7212, blob, rollup, slashing, validator, restaking all correct
Semantic Fidelity 10/10 No PoS/PoW or validator/miner inversions; mainnet/testnet preserved
Terminology Consistency 7/10 Mixed: "Proof-of-Stake" English vs "Prova di lavoro" translated; inconsistent capitalization
Tone/Register 8/10 Mostly internally consistent, but "tu" vs "voi" varies between files

ja (9.0/10)

Category Score Notes
Brand Name Preservation 9/10 Strong katakana use; mixed Latin/katakana acceptable in tech context
Technical Accuracy 9/10 PoW/PoS, validator/miner, mainnet/testnet, EIP-4844, restaking all correct
Semantic Fidelity 8/10 Excellent across sample, but surveillance-silence-reclaiming-privacy ships English alongside translation
Terminology Consistency 9/10 Core terms (プルーフ・オブ・ステーク, バリデータ, ロールアップ, レイヤー2, スラッシング, リステーキング) consistent
Tone/Register 10/10 Polite です/ます consistently throughout

One file (surveillance-silence-reclaiming-privacy) needs cleanup before merge. The other 57 files are merge-ready.

ko (9.2/10)

Category Score Notes
Brand Name Preservation 8.5/10 Strong overall (transliteration + Latin parens like 비탈릭 부테린(Vitalik Buterin)); 1 Finematics typo, a few mixed-Latin author fields
Technical Accuracy 10/10 PoW/PoS, validator/miner, mainnet/testnet all correct; KZG, blob, rollup, slashing terminology accurate
Semantic Fidelity 10/10 Translations faithful to English source
Terminology Consistency 7.5/10 EF rendered two ways; minor brand variation
Tone/Register 10/10 Consistent formal -니다/-습니다 register

mr (9.2/10)

Category Score Notes
Brand Name Preservation (transliteration policy) 9/10 Brands transliterated correctly with Latin original in parens (Optimism/Coinbase pattern); minor Finematics inconsistency
Technical Accuracy 10/10 PoS/PoW, validator/miner, L1/L2, EIP-4844, blob, DA all preserved correctly
Semantic Fidelity 10/10 Sampled paragraphs match English closely; no untranslated paragraphs
Terminology Consistency 8/10 Ethereum consistently "इथेरियम"; topic-tag policy varies file to file
Tone/Register 9/10 Formal तुम्ही/आपण used consistently across interview-format files

wackerow and others added 9 commits April 28, 2026 21:38
Co-Authored-By: Gemini <gemini@google.com>
wackerow and others added 6 commits April 28, 2026 23:30
Co-Authored-By: Gemini <gemini@google.com>
wackerow (Member, Author) commented

@claude /review-translations --language=pl,pt-br,ru,sw,ta,te,tr,uk,ur,vi,zh-tw,zh

github-actions Bot (Contributor) commented

🌐 Translation review started. View progress

claude Bot (Contributor) left a comment

Test body submission - verifying gh pr review path works.

claude Bot (Contributor) left a comment

Body via stdin pipe test

claude Bot (Contributor) left a comment

Line 1

Heading inside printf

Para.

claude Bot (Contributor) left a comment

Translation Quality Review

PR: #18063 (i18n: translation pipeline, /videos)
Branch HEAD: 858e8e017f
Languages: pl, pt-br, ru, sw, ta, te, tr, uk, ur, vi, zh-tw, zh (12)
Files reviewed: 696 video transcripts (58 per language)
Date: 2026-04-29
Fixes: No fixes applied (review-only)

Companion review to the earlier batch on this PR (ar/bn/cs/de/es/fr/hi/id/it/ja/ko/mr) -- covers the remaining 12 languages of the PR.

Summary by Language

| Language | Files | Quality Score | Issues |
|---|---|---|---|
| pl | 58 | 9.3/10 | 0 critical, ~5 warnings |
| pt-br | 58 | 9.4/10 | 0 critical, 5 warnings |
| ru | 58 | 8.8/10 | 0 critical, ~10 warnings |
| sw | 58 | 8.9/10 | 0 critical, ~10 warnings |
| ta | 58 | 7.7/10 | 56 critical (translated topic: slugs), ~5 warnings |
| te | 58 | 7.0/10 | 14+ critical (translated topic: slugs), ~6 warnings |
| tr | 58 | 9.7/10 | 0 critical, 1 warning |
| uk | 58 | 9.0/10 | 0 critical, ~25 warnings (brand-strategy stylistic) |
| ur | 58 | 7.7/10 | 9 critical (YAML-quote escaping, span tags inside heading IDs), ~26 warnings (Eastern-Arabic numerals) |
| vi | 58 | 9.0/10 | 0 critical, ~5 warnings |
| zh-tw | 58 | 8.8/10 | 0 critical, ~12 warnings |
| zh | 58 | 8.5/10 | 0 critical, ~20 warnings |

Overall: 8.7/10 average across 12 languages. Critical issues concentrated in Urdu (frontmatter YAML escaping, heading-ID corruption -- 6 files) and Tamil/Telugu (translated topic: taxonomy slugs -- pattern across most files).

Critical Issues

ur -- YAML / heading-ID structural defects (6 files)

  • ai-agents-interview-luna/index.md line 2 -- Duplicated/recursive title: field with unescaped inner double-quote from inline <span dir="ltr"> tag. The title: key appears twice nested into itself.
  • devconnect-argentina-2025-recap/index.md lines 2-3 -- Unescaped double-quotes inside double-quoted YAML title and description.
  • post-quantum-security-ethereum-roadmap/index.md lines 2-3 -- Same pattern.
  • pectra-upgrade-overview/index.md line 3 -- Same pattern.
  • ethereum-evolution-glamsterdam/index.md line 60 -- Heading-ID anchor block contains literal <span dir="ltr">2s</span> inside the curly {#...} slug. Violates markdownlint custom-id rule.
  • how-to-make-a-guerilla-l2/index.md line 51 -- Same heading-ID issue.

These are YAML/MDX parser-level bugs that may break the build for Urdu locale.
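
Both defect classes can be caught with line-level checks before a build. The sketch below is a heuristic under stated assumptions: frontmatter values are single-line double-quoted scalars, heading IDs use `{#slug}` with ASCII letters, digits, hyphens, and underscores, and both function names are hypothetical. It does not replace a real YAML parser, and a quote preceded by an escaped backslash would be missed.

```python
import re

# key: "value" on a single line; the greedy group captures up to the last quote.
KEY_VALUE = re.compile(r'^([A-Za-z_][\w-]*):\s*"(.*)"\s*$')
# A double quote that is not preceded by a backslash.
UNESCAPED_QUOTE = re.compile(r'(?<!\\)"')
# Trailing {#...} block on a heading line.
HEADING_ID = re.compile(r"^#{1,6}\s.*\{#([^}]*)\}\s*$")
VALID_SLUG = re.compile(r"[A-Za-z0-9_-]+")


def unescaped_quotes(frontmatter: str) -> list[tuple[int, str]]:
    """Return (line number, key) for double-quoted values with bare inner quotes."""
    bad = []
    for lineno, line in enumerate(frontmatter.splitlines(), start=1):
        match = KEY_VALUE.match(line)
        if match and UNESCAPED_QUOTE.search(match.group(2)):
            bad.append((lineno, match.group(1)))
    return bad


def heading_ids_with_markup(markdown: str) -> list[tuple[int, str]]:
    """Return (line number, id) for heading IDs containing anything but slug characters."""
    bad = []
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        match = HEADING_ID.match(line)
        if match and not VALID_SLUG.fullmatch(match.group(1)):
            bad.append((lineno, match.group(1)))
    return bad
```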

ta -- Frontmatter topic: slugs translated (56 of 58 files)
The English source uses kebab-case Latin slugs (community-stories, how-ethereum-works, roadmap-and-priorities, etc.) as taxonomy/filter keys. Tamil files translate them to Tamil script (e.g. தொகுதிச்சங்கிலி, வழிகாட்டி-வரைபடம், டெஸ்சி), which won't match any canonical taxonomy slug and risks fragmenting topic filtering. Pattern is uniform -- only 2 files retain Latin.

te -- Frontmatter topic: slugs translated (~14+ confirmed; pattern likely affects all 58)
Same defect as Tamil: topic slugs rendered in Telugu script (e.g. ఎథీరియం-ఎలా-పనిచేస్తుంది, నవీకరణలు, కమ్యూనిటీ, etc.). Some files mix Latin (eip-4844, dencun, dao) and Telugu in the same topic array.

Note on the topic-tag policy: The pl, zh-tw, and zh reviewers noted similar inconsistency at warning level. The codebase doc says concept tags are intentionally translated by Crowdin but also says frontmatter tags arrays must stay Latin. Recommend: clarify the policy for topic: arrays specifically and either revert to Latin slugs in ta/te or normalize across all locales.
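
If the policy settles on Latin slugs, translated entries are easy to detect. A minimal sketch, assuming the topic values have already been parsed into a list of strings and that a valid slug is lowercase ASCII kebab-case, as in the English source examples. The function name is hypothetical:

```python
import re

# Lowercase ASCII letters and digits, separated by single hyphens.
KEBAB_SLUG = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def non_latin_slugs(topics: list[str]) -> list[str]:
    """Return the topic entries that are not lowercase ASCII kebab-case slugs."""
    return [topic for topic in topics if not KEBAB_SLUG.fullmatch(topic)]
```

The same check also flags slugs written with spaces or capital letters, which covers the formatting drift noted at warning level in other locales.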

(Detailed per-language warnings, scoring breakdowns, and per-file issue lists posted as a follow-up comment.)


To apply fixes, run locally:

/review-translations --pr=18063 --fix

Highest-value targeted runs: fix ur YAML quote escaping (4 files) + span tags inside heading-IDs (2 files); revert ta/te topic: arrays to the English Latin slugs from the source files; standardize Arabic-script numerals in ur body text to Western Arabic via numberFormat() policy; normalize sw rollup/mikusanyiko against glossary.


Reviewed by Claude Code via /review-translations CI workflow

claude Bot (Contributor) commented Apr 29, 2026

Translation Review — Per-Language Warnings (PR #18063)

Follow-up detail to the Translation Quality Review covering pl, pt-br, ru, sw, ta, te, tr, uk, ur, vi, zh-tw, zh.

Warnings highlights per language

  • pl: Frontmatter topic: slug formatting inconsistent (kebab-case vs spaces vs English left untranslated in devcon-mumbai-coming-2026). English role tags **Teen:** / **College Student:** left in English in zero-knowledge-proofs-5-levels while dialogue is Polish.
  • pt-br: Gender of "Ethereum" varies (a/o Ethereum) across 3 files (ethereum-things-i-like-mariano-conti, devconnect-buenos-aires-promo, understanding-consensus-mechanisms); "optimistic rollups" left untranslated in layer-2-scaling-explained while elsewhere translated; "Wei" capitalized inconsistently in transactions-eth-build.
  • ru: Typo протоколалами → протоколами in zero-knowledge-proofs-5-levels line 138. Brand renderings: "Finematics" appears as Finematics / Файнматикс / Финематикс (3 forms across 7 files); "Bankless" as Bankless / Бэнклесс. Solidity/MetaMask/Lido/Aave/Optimism/Arbitrum kept Latin (acceptable; bank prefers Cyrillic).
  • sw: Glossary deviation: rollup rendered as mikusanyiko (gathering/collection) across layer-2-scaling-explained, blobspace-101-dencun, eip-4844-dencun-explained, restaking-explained, eigenlayer-permissionless-features — conflicts with glossary rollup → rollup. Same files mix mikusanyiko and rollup (e.g. blobspace-101-dencun lines 19/51). Typo tokani → tokeni in ethereum-staking-withdrawals line 102 (also amana should be dhamana). blockchain left in English in ethereum-basics-intro lines 17/23/33 and decentralized-identity-explained line 3 (glossary: mnyororo wa vitalu).
  • ta: Beyond topic-tags critical: "Ethereum" rendered three ways (எத்திரியம் 45 files / எத்தீரியம் 23 files / எத்தேரியம் 5 files; glossary primary is the third form). "blockchain" two ways. "consensus" five ways. "Ethereum Foundation" three ways across files. Stray Bengali word প্রচুর in desci-movement-juan-benet/index.md line 41.
  • te: Beyond topic-tags critical: "Ethereum" transliteration deviates from glossary (uses ఎథీరియం vs glossary ఎథెరియం). "ETHBoulder" rendered both ఎథ్‌బోల్డర్ and ఎత్‌బోల్డర్ within same files. Out-of-file: te.json transliteration bank has two Japanese-katakana entries (Infura → インフラ, Snapshot → スナップショット) and Gnosis missing initial consonant (నోసిస్ should be గ్నోసిస్).
  • tr: Single minor — sabitcoin used in smart-contracts-code-is-law line 79 vs glossary İstikrarlı coin. None of the prior known critical patterns (katillik, MeFi, Markette, ethererum, EHT/BSL transpositions, PoW/PoS inversion) appear.
  • uk: Brand-strategy choice: Cyrillic for protocols (Етеріум/Біткоїн), Latin for tools/code (Solidity, Geth, Remix, Etherscan, GitHub) — internally consistent, deviates from transliteration bank. Author frontmatter sometimes transliterated (ЕТХБоулдер, Бінанс Академі, Веб3Прайвасі Нау, Фундація Ethereum) while inline body keeps Latin — inconsistency.
  • ur: Beyond the 9 criticals: 26 files mix Eastern Arabic-Indic numerals (۰–۹) with Latin digits (e.g. لیئر ۲ (l2)); contradicts the codebase numberFormat() policy. Brand inconsistency: Devconnect Latin in devconnect-argentina-2025-recap vs ڈی کنیکٹ in devconnect-buenos-aires-promo; Prysm = پرزم (file: ethereum-evolution-glamsterdam) vs glossary پریزم.
  • vi: "blockchain" left in English across 3 files (ethereum-basics-intro lines 17/33/39, decentralized-identity-explained line 3, crypto-security-passwords line 25); glossary requires chuỗi khối. Mixed bản cuộn (rollup) style. (Significant improvement over the 7.2/10 for Vietnamese in PR #17176, "i18n: automated Crowdin translation import (vi)".)
  • zh-tw: No simplified-Chinese contamination. WIRED and When Shift Happens transliterated in author frontmatter while body keeps Latin — inconsistency. Finematics rendered both Latin and 芬尼馬蒂克斯 across files. MetaMask glossary mismatch (梅塔馬斯克 in desci-movement-juan-benet:73 and security-through-obscurity-microdots:46 vs canonical 麥塔馬斯克). Uniswap deviates from glossary 優尼互換 (uses 尤尼斯瓦普 in 7 files).
  • zh: No traditional-Chinese contamination. Frontmatter author: field heavily transliterated and inconsistent: Bankless appears as Bankless / 班克莱斯 / 班克莱斯 (Bankless) across three files; Finematics has 3 distinct Simplified spellings across 7 files; WIRED/Microsoft Security/Tech in Asia/When Shift Happens transliterated in author: despite body keeping Latin. Uniswap mostly 尤尼斯瓦普 (glossary: 优尼互换 or Latin). Rollup mixed with 汇总 within same file.
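
The mixed-numeral warning for ur lends itself to a mechanical check and fix. The sketch below is an illustration only: it is not the codebase's numberFormat() helper, and the function names are hypothetical. It maps the Extended Arabic-Indic digits (۰–۹, U+06F0 to U+06F9) to Western digits and reports text that mixes both sets.

```python
# Map Extended Arabic-Indic digits (U+06F0..U+06F9) to ASCII digits.
EASTERN_TO_WESTERN = {0x06F0 + i: str(i) for i in range(10)}


def normalize_digits(text: str) -> str:
    """Replace Extended Arabic-Indic digits with their Western equivalents."""
    return text.translate(EASTERN_TO_WESTERN)


def has_mixed_digits(text: str) -> bool:
    """Return True if text contains both Extended Arabic-Indic and Western digits."""
    has_eastern = any(0x06F0 <= ord(ch) <= 0x06F9 for ch in text)
    has_western = any(ch in "0123456789" for ch in text)
    return has_eastern and has_western
```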

Per-language scoring breakdowns posted in a separate comment.

claude Bot (Contributor) commented Apr 29, 2026

Translation Review — Per-Language Scoring Detail (PR #18063)

Follow-up to the Translation Quality Review — per-category scoring breakdown for each of the 12 languages.

pl — Overall 9.3/10

Category Score Notes
Brand Name Preservation 10/10 All brands kept Latin (Ethereum, Bitcoin, MetaMask, Lido, Aave, Optimism, Arbitrum, OP Mainnet, zkSync, Uniswap, EigenLayer, Coinbase, JPMorgan, KZG, EVM, ETH.BUILD).
Technical Accuracy 9.5/10 PoS/PoW semantics correct everywhere; rollups, blob/danksharding, Beacon Chain, validator economics rendered faithfully.
Semantic Fidelity 9.5/10 No semantic inversions; idiomatic re-wordings preserve original intent.
Terminology Consistency 8.0/10 Core terms used consistently. Topic-slug formatting inconsistent across the corpus.
Tone/Register 9.5/10 Register matches source across tutorial/podcast/explainer formats.

pt-br — Overall 9.4/10

Category Score Notes
Brand Name Preservation 10/10 All brands correctly preserved in English. Glosses provided for transliterated terms.
Technical Accuracy 9.5/10 PoW/PoS, rollups, blob transactions, EIPs, sharding, KZG, MEV, validator/staker mechanics correctly conveyed.
Semantic Fidelity 9.5/10 Faithful and idiomatic. Speaker labels and structural markers preserved.
Terminology Consistency 8.5/10 Strong on glossary terms; minor optimistic rollups and Ethereum-gender inconsistency.
Tone/Register 9.5/10 Consistent você-form Brazilian register; no European Portuguese forms.

ru — Overall 8.8/10

Category Score Notes
Brand Name Preservation 7.5/10 Domains/URLs/tickers all Latin (good); Finematics 3-way and Bankless 3-way inconsistency.
Technical Accuracy 9.5/10 EIP/EVM/PoW/PoS/SNARK/STARK/KZG/BLS preserved. Concepts rendered accurately.
Semantic Fidelity 9.5/10 English content faithfully rendered; one typo doesn't affect meaning.
Terminology Consistency 8.0/10 Glossary terms used consistently in body; inconsistency on author/source brand names.
Tone/Register 9.5/10 Formal/neutral register; idiomatic Russian.

sw — Overall 8.9/10

Category Score Notes
Brand Name Preservation 9.5/10 Bitcoin, Ethereum, Solidity, MetaMask, Optimism, Arbitrum, Uniswap, OpenSea, Lido, Rocket Pool, district0x, Finematics, Bankless, Etherscan all preserved.
Technical Accuracy 8.8/10 Concepts conveyed accurately. PoW/PoS, validator, slashing, hashing, nonces, blobs all correctly explained.
Semantic Fidelity 9.2/10 Long Bankless/Justin Drake debates rendered accurately.
Terminology Consistency 7.5/10 Main weakness: rollup → mikusanyiko glossary deviation across most files; mixed within same file.
Tone/Register 9.3/10 Reads naturally; appropriate informal register for podcast transcripts.

ta — Overall 7.7/10

| Category | Score | Notes |
| --- | --- | --- |
| Brand Name Preservation | 7.5/10 | Domains intact; tickers intact; brand names usually given Tamil + parenthetical Latin. Author renders inconsistently. |
| Technical Accuracy | 8.5/10 | PoW/PoS, 51% attack, slashing, sharding, blob, EIP-4844 explanations faithful. |
| Semantic Fidelity | 8.5/10 | Full transcripts tracked English line-by-line; no untranslated chunks beyond one stray Bengali word. |
| Terminology Consistency | 5.5/10 | "Ethereum" 3 ways, "blockchain" 2 ways, "PoS" 2 ways, "consensus" 5 ways. Topic tags translated. |
| Tone/Register | 8.5/10 | Reads naturally; Western Arabic numerals throughout. |

te — Overall 7.0/10

| Category | Score | Notes |
| --- | --- | --- |
| Brand Name Preservation | 7.5/10 | Most brands transliterated reasonably; ETHBoulder/EthBoulder forms drift; "Ethereum" deviates from glossary glyph. |
| Technical Accuracy | 9.0/10 | Strong handling of blob, calldata, state delta, PoS/PoW, EVM, KZG, slashing, restaking, sharding. "state" correctly rendered as స్థితి (computational), not governmental. |
| Semantic Fidelity | 8.5/10 | Translations track English source closely; meanings preserved. |
| Terminology Consistency | 6.5/10 | "Ethereum" form deviates from glossary; ETHBoulder forms drift; mixed Latin/Telugu in topic arrays. |
| Tone/Register | 9.0/10 | Appropriate technical/educational register; Western Arabic numerals throughout. |

tr — Overall 9.7/10

| Category | Score | Notes |
| --- | --- | --- |
| Brand Name Preservation | 10/10 | Solidity, Ethereum, EVM, Vyper, Bitcoin, Uniswap, MakerDAO, Lido, Optimism, Arbitrum, zkSync, Geth, Prysm, Lighthouse, Nethermind, Reth all kept verbatim. None of the prior known issues (katillik/MeFi/Markette/ethererum) present. |
| Technical Accuracy | 9.7/10 | PoW/PoS arguments preserved with correct attribution. EIP numbers, ECDSA, KZG, XMSS, SHA-256 correct. No EHT/BSL/ECDAS transpositions. |
| Semantic Fidelity | 9.7/10 | "client" → "istemci" (software) and "müşteri" (customer) correctly distinguished by context. |
| Terminology Consistency | 9.5/10 | Glossary applied consistently: Ana Ağ, istemci, Hisse Kanıtı (PoS), İş Kanıtı (PoW), akıllı sözleşme, blokzincir, doğrulayıcı. |
| Tone/Register | 9.7/10 | Natural, fluent Turkish; technical register matches source. |

uk — Overall 9.0/10

| Category | Score | Notes |
| --- | --- | --- |
| Brand Name Preservation | 8.5/10 | Domains/URLs/code all Latin. Brand strategy: Cyrillic for protocols, Latin for tools/code — internally consistent, deviates from bank. Author frontmatter sometimes transliterates while body Latin — inconsistency. |
| Technical Accuracy | 9.5/10 | Heading IDs preserved exactly. Code blocks intact. EIP-4844, ERC-8004, BLS, ZK, SNARK/STARK/PLONK, EVM accurate. |
| Semantic Fidelity | 9.5/10 | Strong, idiomatic Ukrainian renderings. Glossary terms consistently parenthesized. |
| Terminology Consistency | 8.0/10 | Within-document consistency good; cross-document brand handling drifts. Glossary alignment full match for core Ethereum terms. |
| Tone/Register | 9.5/10 | Natural conversational Ukrainian; distinguishes casual vs formal/technical register. |

ur — Overall 7.7/10

| Category | Score | Notes |
| --- | --- | --- |
| Brand Name Preservation | 7.5/10 | Most brand names handled with Urdu+Latin gloss pattern. Domains 100% Latin. Some glossary deviations (پرزم vs پریزم) and Devconnect transliteration inconsistency. |
| Technical Accuracy | 7.0/10 | Translation of core terms generally accurate. Two files have heading-IDs with embedded <span> tags — structural defect. |
| Semantic Fidelity | 8.5/10 | Long-form sections faithfully convey meaning; no significant omissions or untranslated chunks. |
| Terminology Consistency | 6.5/10 | Eastern vs Western numerals mixed within same file (26 files); brand transliteration mixed; topic-tag policy uneven. |
| Tone/Register | 9.0/10 | Tone consistently formal-educational; honorifics and verb agreement correct. |

vi — Overall 9.0/10

| Category | Score | Notes |
| --- | --- | --- |
| Brand Name Preservation | 9.5/10 | Brand names preserved correctly throughout. No diacritic-borrowed forms. |
| Technical Accuracy | 9.0/10 | Strong handling of complex technical content (PoS/PoW, EIP, EigenLayer restaking, BLS/KZG, XMSS, Falcon, danksharding). |
| Semantic Fidelity | 9.2/10 | Translations follow source faithfully; idiomatic Vietnamese. No truncations. |
| Terminology Consistency | 8.0/10 | Glossary largely respected. Notable misses: "blockchain" left untranslated in a few places. |
| Tone/Register | 9.3/10 | Register matches source; pronoun choice appropriate per speaker context. |

zh-tw — Overall 8.8/10

| Category | Score | Notes |
| --- | --- | --- |
| Brand Name Preservation | 7.5/10 | WIRED and When Shift Happens transliterated in frontmatter while body keeps Latin; Finematics/Uniswap/MetaMask/Chainlink diverge from glossary. |
| Technical Accuracy | 9.5/10 | DeFi, EIP-4844, ZK proofs, restaking, PoW, DAO hack, blockchain demo correctly rendered. |
| Semantic Fidelity | 9.5/10 | Sentence-level comparison shows faithful, natural prose. |
| Terminology Consistency | 8.0/10 | Within-file consistency good; cross-file drift on brand names. |
| Tone/Register | 9.5/10 | Reads as native Traditional Chinese (Taiwan); colloquial register matches conversational interview content. |

zh — Overall 8.5/10

| Category | Score | Notes |
| --- | --- | --- |
| Brand Name Preservation | 6.5/10 | Body retains Latin brands well. Frontmatter author: heavily transliterated and inconsistent — Bankless 3 forms, Finematics 3 forms. |
| Technical Accuracy | 9.5/10 | Excellent — protocol concepts (PBS, KZG, blobs, Verkle, MaxEB, slashing, restaking, fraud proofs) translated accurately. |
| Semantic Fidelity | 9.5/10 | Faithful long-form interview/explainer content; tone preserved. |
| Terminology Consistency | 7.5/10 | Glossary terms used consistently in body (区块链, 智能合约, 验证者, 质押, 节点, L2, Gas). Deviations on Uniswap, Rollup, Finematics. |
| Tone/Register | 9.5/10 | Natural Mandarin throughout. |

Companion to the main review — supplemental detail does not need its own SHA marker.

myelinated-wackerow and others added 3 commits April 29, 2026 13:46
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Lock videos frontmatter taxonomy/metadata fields (topic, uploadDate, duration, educationLevel, youtubeId, format) so they match English byte-for-byte. Update LLM prompt with explicit per-field policy, ban span dir wrappers in frontmatter (use LRI/PDI U+2066/U+2069 instead), and drop redundant heading-ID rules since Gemini never sees them after the normalizer's Pass 6 strips them.

Protect heading-ID anchor blocks {#...} from late RTL passes by adding them to RTL_SKIP_PATTERN. syncHeaderIdsWithEnglish was already copying clean English IDs into translated headers, but fixBareRtlValues was re-corrupting them afterward by wrapping fragments like 2s in span dir=ltr.
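The ordering problem described here can be illustrated with a minimal sketch. It assumes the late pass wraps bare Latin/digit runs in span dir=ltr; the two regexes and the function name `wrapLtrRuns` are hypothetical stand-ins, not the pipeline's actual RTL_SKIP_PATTERN or fixBareRtlValues.

```typescript
// Hypothetical stand-in for the skip pattern: heading-ID anchor blocks such as {#intro-2s}.
// The capturing group makes split() keep the matched blocks in the result array.
const HEADING_ID_BLOCK = /(\{#[^}]*\})/

// Hypothetical definition of a "bare LTR run": ASCII letters/digits, dots, hyphens.
const LTR_RUN = /[A-Za-z0-9][A-Za-z0-9.-]*/g

// Wrap LTR runs in <span dir="ltr">, but leave heading-ID blocks untouched.
function wrapLtrRuns(line: string): string {
  return line
    .split(HEADING_ID_BLOCK)
    .map((segment, i) =>
      // Odd indices are the captured {#...} blocks: pass them through as-is.
      i % 2 === 1
        ? segment
        : segment.replace(LTR_RUN, (run) => `<span dir="ltr">${run}</span>`)
    )
    .join("")
}

// The "2s" in the heading text is wrapped; the "2s" inside {#intro-2s} is not.
console.log(wrapLtrRuns("## عنوان 2s {#intro-2s}"))
```

Splitting on the protected pattern first keeps the wrapping logic unaware of anchors, so a later pass cannot re-corrupt an ID that an earlier pass already synced.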

Defensive fix for latent backreference bugs in five frontmatter editors (normalizeFrontmatterDates, syncButtonsFrontmatterFields, quoteFrontmatterNonAscii, fixDuplicateFrontmatterAuthor, fixFrontmatterLang). When the replacement string contained a dollar-N sequence from user content (e.g. dollar-17M), it was interpreted as a regex backreference. Switched all five to callback form.
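A minimal sketch of this class of bug, assuming a frontmatter editor that rewrites a single `title:` line. The helper names, the regex, and the sample title are hypothetical and are not taken from the five functions listed.

```typescript
// Hypothetical pattern: capture the "title: " prefix, replace the rest of the line.
const TITLE_LINE = /^(title:\s*).*$/m

// Buggy form: the value is spliced into a replacement *string*, so the "$1"
// inside "$17M" is expanded as capture group 1 followed by a literal "7M".
function setTitleUnsafe(frontmatter: string, value: string): string {
  return frontmatter.replace(TITLE_LINE, `$1${value}`)
}

// Callback form: the replacer's return value is inserted verbatim,
// so dollar sequences in user content are never interpreted.
function setTitleSafe(frontmatter: string, value: string): string {
  return frontmatter.replace(TITLE_LINE, (_match, prefix: string) => prefix + value)
}

const frontmatter = 'title: "old"\nlang: ur'
const newTitle = '"Raised $17M"' // illustrative value only

console.log(setTitleUnsafe(frontmatter, newTitle))
// first line becomes: title: "Raised title: 7M"
console.log(setTitleSafe(frontmatter, newTitle))
// first line becomes: title: "Raised $17M"
```

In the unsafe variant the key itself is re-inserted into the value, which is consistent with the duplicated-key damage that a dollar-17M title can produce.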

Document patterns 58-61 in sanitizer-test-research.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Re-run sanitizer over all 1391 translated video markdown files (24 non-English locales). 1355 files modified.

Lock the videos taxonomy/metadata fields (topic, uploadDate, duration, educationLevel, youtubeId, format) to match the English source byte-for-byte. Translatable fields (title, description, breadcrumb, lang) are preserved.
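The field lock can be sketched as follows. This is an illustration under a simplifying assumption that every locked field sits on a single `key: value` line; the function name `lockFields` is hypothetical, and the real sanitizer may parse YAML or handle multi-line values differently.

```typescript
// Field names are taken from the commit message above.
const LOCKED_FIELDS = [
  "topic",
  "uploadDate",
  "duration",
  "educationLevel",
  "youtubeId",
  "format",
]

// Copy each locked field's line from the English frontmatter into the
// translated frontmatter. Assumes single-line values (a simplification).
function lockFields(english: string, translated: string): string {
  let out = translated
  for (const key of LOCKED_FIELDS) {
    const line = new RegExp(`^${key}:.*$`, "m")
    const source = english.match(line)
    if (!source) continue
    // Callback form so "$" in the English value is never read as a backreference.
    out = out.replace(line, () => source[0])
  }
  return out
}

const en = 'title: "Rollups"\ntopic: [scaling]\nduration: "12:30"'
const bn = 'title: "রোলআপ"\ntopic: [স্কেলিং]\nduration: "১২:৩০"'
console.log(lockFields(en, bn))
```

Translatable fields such as title are left alone because they are not in the locked list.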

Strip span dir=ltr wrappers from heading-ID anchor blocks {#...} in RTL locales (ur). Convert legitimate span dir=ltr wrappers inside frontmatter values to U+2066 (LRI) / U+2069 (PDI) BiDi isolates so they no longer break YAML.
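The span-to-isolate conversion can be sketched as below. The regex and the function name `spansToIsolates` are assumptions for illustration; only the code points U+2066 (LRI) and U+2069 (PDI) come from the commit message.

```typescript
const LRI = "\u2066" // LEFT-TO-RIGHT ISOLATE
const PDI = "\u2069" // POP DIRECTIONAL ISOLATE

// Hypothetical matcher for <span dir="ltr">...</span> (quotes around ltr optional).
const LTR_SPAN = /<span dir="?ltr"?>(.*?)<\/span>/g

// Replace each LTR span wrapper with invisible BiDi isolate characters,
// which carry no quotes and therefore cannot terminate a quoted YAML string.
function spansToIsolates(value: string): string {
  return value.replace(LTR_SPAN, (_match, inner: string) => LRI + inner + PDI)
}

const converted = spansToIsolates('قیمت <span dir="ltr">ETH 2.0</span>')
console.log(JSON.stringify(converted)) // the isolates are invisible when printed raw
```

The isolates give the same directional hint inside a frontmatter value, while body content can keep the HTML wrapper.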

Hand-recover the recursive title damage in ai-agents-interview-luna for ur and ta where the LLM's $17M span had cascaded into duplicated frontmatter keys; the inner translated value is restored.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
@github-actions github-actions Bot added the documentation 📖 Change or add documentation label Apr 29, 2026
myelinated-wackerow and others added 2 commits April 29, 2026 15:58
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
The RTL markdown prompt now carries a dual BiDi policy: span dir=ltr for body content (MDX parses HTML, browser honors dir) and U+2066 (LRI) / U+2069 (PDI) for frontmatter values (the inner double-quote on a dir attribute terminates the outer YAML string and breaks the build). The test previously asserted markdown contained span and did NOT contain U+2066, which conflicted with the new frontmatter rule.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
@wackerow wackerow merged commit 9f430e8 into dev Apr 30, 2026
9 checks passed
@wackerow wackerow deleted the intl/pending-dev branch April 30, 2026 00:09
@wackerow wackerow mentioned this pull request May 1, 2026

Labels

- content 🖋️: This involves copy additions or edits
- documentation 📖: Change or add documentation
- translation 🌍: This is related to our Translation Program


2 participants