---
title: "Czech (cs) Crowdin Translation Import Review (PR #17547)"
date: 2026-02-25
category: translation-review
tags:
- i18n
- translations
- crowdin
- czech
- mdx
- sanitizer
- acronyms
- urls
- glossary
- gemini
- content-staleness
severity: medium
component: "i18n translation pipeline (Czech/cs)"
related_prs:
- 17547
- 17653
- 17654
resolution_type: code-fix
confidence: medium
quality_score: 7.2
---

# Czech (cs) Translation Review -- PR #17547

**PR:** #17547 -- i18n(cs): automated Crowdin translation import (part 01 of 13)
**Date:** 2026-02-25
**Files reviewed:** 28 of 38 (10 orphaned at stale Crowdin paths)
**Overall quality score:** 7.2/10

## Context

PR #17547 is an automated Crowdin import of Czech translations for ethereum.org. The translations were generated by Gemini AI, not human translators. This is part 01 of a 13-part Czech import series.

The review was performed using the `review-translations-local` skill, which runs: worktree setup -> sanitizer -> AI review agents -> auto-fix -> build verification.

### Related PRs

- **#17653** (`fix-review-translations`): Source of sanitizer scripts (`sanitize-pr.ts`, `post_import_sanitize.ts`). Not yet merged to dev at time of review.
- **#17654** (`add-sanitizer-tests`): Adds unit tests and a workflow for the sanitizer. Not yet merged.

### Related Documentation

- [Turkish PR #17182 review](./crowdin-import-review-turkish-pr-17182.md) -- first review case study, 7.7/10
- [Japanese PR #17132 review](./crowdin-import-review-japanese-pr-17132.md) -- 5 sanitizer bugs found
- [Japanese quality scoring](./japanese-translation-quality-review-fix-process-pr-17132.md) -- 306 files, 4.89/5.0
- [Agent calibration (Czech Part 07)](../integration-issues/crowdin-import-review-agent-calibration.md) -- false positive reduction

## Quality Scores

| Category | Score | Notes |
|----------|-------|-------|
| Brand Name Preservation | 8.5/10 | Consistently kept in English. Minor redundant "Machine" in EVM (fixed). |
| Technical Accuracy | 6.5/10 | Multiple files outdated vs English source. Outdated URLs (fixed). |
| Semantic Fidelity | 7.0/10 | Good where content is current. Staleness degrades fidelity. |
| Terminology Consistency | 6.0/10 | Glossary deviations: beacon chain, staking, gas. MEV/ZKP acronyms (fixed). |
| Tone/Register | 8.0/10 | Consistent formal/neutral vy-form. |
| **Overall** | **7.2/10** | |

## Solution

### Root Causes

1. **Gemini AI translation**: The translations were generated by Gemini, not by human translators. This explains the systematic glossary deviations: Gemini localized terms that the community glossary keeps in English or translates differently.
2. **Crowdin path mapping drift**: Crowdin's file path mapping has not been updated to match the content restructuring on the dev branch, which left 10 files at orphaned paths.
3. **Translation staleness**: The English source files for account-abstraction and future-proofing were substantially rewritten, but the Czech translations still reflect the old content.

### Commit 1 -- Critical Fixes (9 fixes, 4 files)

| # | File | Fix |
|---|------|-----|
| 1 | `wrapped-eth/index.md` | Fixed broken MDX link bracket: `[ERC-20 tokeny(/glossary/#erc-20)` -> `[ERC-20 tokeny](/glossary/#erc-20)` |
| 2-4 | `community/research/index.md` | Fixed MEV acronym (3x): MEH -> MEV |
| 5 | `roadmap/account-abstraction/index.md` | Fixed bundlebear URL: `/overview/all` -> `/erc4337-overview/all` |
| 6 | `roadmap/account-abstraction/index.md` | Fixed erc4337.io URL: `www.erc4337.io` -> `docs.erc4337.io` |
| 7 | `roadmap/account-abstraction/index.md` | Removed redundant "Machine": "Virtualniho stroje Etherea Machine (EVM)" -> "Virtualniho stroje Etherea (EVM)" |
| 8-9 | `roadmap/scaling/index.md` | Fixed l2beat URLs (2x): `/tvl` -> `/tvs`, added `/scaling/summary` |

### Commit 2 -- Simple Warnings (6 fixes, 5 files)

| # | File | Fix |
|---|------|-----|
| 1 | `bridges/index.md` | Typo: "zahruje" -> "zahrnuje" (missing 'n') |
| 2 | `community/grants/index.md` | Typo: "Ethereuu" -> "Ethereu" (double 'u') |
| 3 | `community/get-involved/index.md` | Removed duplicate "Zapojte se" from heading |
| 4 | `roadmap/user-experience/index.md` | Typo: "trval" -> "trvat" (infinitive form) |
| 5-6 | `community/research/index.md` | Acronym: DNZ -> ZKP (2x) |

## Key Decisions

### DNZ vs ZKP Acronym Decision

**Decision:** Changed Czech-localized DNZ (Dukazy s Nulovymi Znalostmi) to universal ZKP.

**Reasoning (moderate confidence):**
- English source uses ZKP
- Heading anchor `{#cryptography--zkp}` creates a mismatch with DNZ in the heading text
- ZKP is universally used in the crypto ecosystem, even in non-English communities (similar to DeFi, NFT, DAO staying as-is)
- Consistent with the MEV decision (did not keep localized MEH)

**Counter-argument:** DNZ is a natural Czech abbreviation and could be more immediately comprehensible to Czech readers. However, the anchor mismatch is a concrete technical issue, and ecosystem convention tips the balance.

### Glossary Deviations -- Deferred Enforcement

**Decision:** Did NOT fix pervasive glossary deviations. Documented for future PR.

**The deviations:**

| English Term | Community Glossary | Gemini Translation | Used In |
|---|---|---|---|
| staking | staking (keep English) | uzamceni | adding-staking-products, beacon-chain |
| beacon chain | beacon chain (keep English) | Retezova vazba | beacon-chain |
| gas / gas fee | transakcni poplatek | palivo | account-abstraction, adding-wallets, research |
| staker | staker (keep English) | uzamykatel | beacon-chain |

**Key context:** The glossary was provided by Czech-speaking community members via Crowdin votes. The translations were generated by Gemini AI. We trust the community glossary as authoritative.

**Why deferred:**
1. Czech is highly inflected -- "staking" as a loanword would need different case forms (staking/stakingu/stakingem/stakingovy etc.) and a Czech speaker is needed to verify correctness
2. Mass find-replace on inflected forms risks introducing grammatical errors
3. The deviations are internally consistent within files (Gemini was at least self-consistent)
4. This needs a dedicated PR with Czech speaker involvement

**Action needed:** Future PR should enforce glossary terms with Czech speaker review of grammatical cases.
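
Before that PR, a report-only scan could help size the work without touching any inflected form. The sketch below is a minimal illustration under stated assumptions: `findGlossaryDeviations` and the `RULES` stems are hypothetical, hand-derived from the table above rather than read from the glossary file, and matching is done on diacritic-stripped text.

```typescript
// Hypothetical report-only scan: finds inflected forms of deviating terms
// by stem and never rewrites the text.
interface DeviationRule {
  stem: string // assumed stem of the Gemini rendering, diacritics stripped
  glossaryTerm: string // term the community glossary asks for
}

interface DeviationHit {
  line: number // 1-based line number
  word: string // matched word, lowercased and diacritic-stripped
  glossaryTerm: string
}

// Stems are assumptions taken from the deviations table, not from the glossary file.
const RULES: DeviationRule[] = [
  { stem: "uzamcen", glossaryTerm: "staking" },
  { stem: "uzamykatel", glossaryTerm: "staker" },
  { stem: "paliv", glossaryTerm: "transakcni poplatek" },
]

// Lowercase and strip combining marks so accented and plain spellings compare equal.
function fold(text: string): string {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase()
}

function findGlossaryDeviations(
  content: string,
  rules: DeviationRule[] = RULES
): DeviationHit[] {
  const hits: DeviationHit[] = []
  content.split("\n").forEach((lineText, index) => {
    // After folding, Czech words consist of a-z only, so split on anything else
    for (const word of fold(lineText).split(/[^a-z]+/)) {
      if (!word) continue
      for (const rule of rules) {
        if (word.startsWith(rule.stem)) {
          hits.push({ line: index + 1, word, glossaryTerm: rule.glossaryTerm })
        }
      }
    }
  })
  return hits
}
```

Stem matching will also flag unrelated words that share a stem, so the output is a worklist for a Czech speaker, not a replacement plan.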

### Content Staleness -- Deferred Retranslation

**Decision:** Did NOT fix 3 files with significantly outdated content. Requires full retranslation.

| File | Staleness Issue |
|------|----------------|
| `roadmap/account-abstraction/index.md` | English rewritten: EIP-2771/2938/3074 sections removed, EIP-7702 added. Czech still has all old content. |
| `roadmap/future-proofing/index.md` | English restructured into "Recent Changes Implemented" / "Ongoing goals" subsections with specific EIP references. Czech has old paragraph format. |
| `roadmap/scaling/index.md:50-52` | Proto-Danksharding is described in the future tense ("budou implementovany") even though it went live in March 2024. |

**Action needed:** These files should be flagged for retranslation in subsequent parts of the 13-part Czech import, or as a separate content update.

## Pipeline Issues

### Sanitizer Crash on Orphaned Crowdin Paths

**Symptom:** `sanitize-pr.ts` crashed with `ENOENT` when processing files from the PR.

**Root cause:** The PR contains files at stale Crowdin paths (e.g., `cs/account-abstraction/index.md`) that don't exist in the worktree after merging dev (content was restructured to `cs/roadmap/account-abstraction/index.md`). The sanitizer fetches the file list from the GitHub API, builds absolute paths, and tries to `readFileSync` -- crashing on the first missing file with no graceful recovery.

**Orphaned files (10 of 38):**
- `cs/account-abstraction/` -> should be `cs/roadmap/account-abstraction/`
- `cs/code-of-conduct/` -> should be `cs/community/code-of-conduct/`
- `cs/danksharding/` -> should be `cs/roadmap/danksharding/`
- `cs/developers/docs/wrapped-eth/` -> should be `cs/wrapped-eth/`
- `cs/dvt/` -> should be `cs/staking/dvt/`
- `cs/scaling/` -> ambiguous (2 English candidates)
- `cs/statelessness/` -> should be `cs/roadmap/statelessness/`
- `cs/support/` -> should be `cs/community/support/`
- `cs/user-experience/` -> should be `cs/roadmap/user-experience/`
- `cs/withdrawals/` -> should be `cs/staking/withdrawals/`
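
The mapping above follows a simple pattern: the last directory segment is unchanged and only its parent moved. A minimal sketch of that lookup, assuming a list of current English content directories is available, might look like the following. `resolveOrphan` and `slugOf` are hypothetical helpers, not part of `sanitize-pr.ts`.

```typescript
// Hypothetical helper: propose a current location for an orphaned Crowdin path
// by matching its last directory segment against the English content tree.
type Resolution =
  | { kind: "resolved"; target: string }
  | { kind: "ambiguous"; candidates: string[] }
  | { kind: "unresolved" }

// "cs/dvt/index.md" -> "dvt"; "cs/dvt/" -> "dvt"
function slugOf(filePath: string): string {
  const segments = filePath.split("/").filter(Boolean)
  const last = segments[segments.length - 1] ?? ""
  // If the last segment looks like a file name, compare its parent directory
  return last.includes(".") ? segments[segments.length - 2] ?? "" : last
}

function resolveOrphan(
  orphanPath: string,
  englishDirs: string[], // e.g. "staking/dvt", relative to the content root
  locale: string
): Resolution {
  const slug = slugOf(orphanPath)
  const candidates = englishDirs.filter((dir) => slugOf(dir) === slug)
  if (candidates.length === 1) {
    return { kind: "resolved", target: `${locale}/${candidates[0]}` }
  }
  // More than one match needs a human decision; never guess
  if (candidates.length > 1) return { kind: "ambiguous", candidates }
  return { kind: "unresolved" }
}
```

With `"staking/dvt"` in `englishDirs`, `resolveOrphan("cs/dvt/", englishDirs, "cs")` resolves to `cs/staking/dvt`. A slug with two matching directories, as with `cs/scaling/`, comes back as `ambiguous` instead of being moved automatically.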

**Fix needed in sanitize-pr.ts:** Add `fs.existsSync` check before passing files to `runSanitizer`. Filter out missing files, log them as warnings, and continue processing the files that do exist:

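```typescript
import * as fs from "fs"
import * as path from "path"

// Paths listed in the PR but missing from the worktree
const orphanPaths: string[] = []

// `translationFiles` and `ROOT` come from the surrounding script
const filesWithContent = translationFiles
  .map((relPath) => ({ path: path.join(ROOT, relPath), content: "" }))
  .filter(({ path: absPath }) => {
    if (!fs.existsSync(absPath)) {
      // Skip and record the file instead of letting readFileSync throw ENOENT
      console.warn(`[sanitize-pr] SKIP (not on disk): ${absPath}`)
      orphanPaths.push(absPath)
      return false
    }
    return true
  })
```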

**Tracked in:** PR #17654 (add-sanitizer-tests) should add a test case for this.

### Build Verification Skipped

The `NEXT_PUBLIC_BUILD_LOCALES=en,cs pnpm build` run was killed with exit code 137 (out of memory) in the constrained review environment. The applied fixes are all simple text replacements and are unlikely to introduce MDX compilation errors, but this remains unverified. Build verification should be re-run in CI.

## Prevention

### 1. Sanitizer Resilience

- Add `fs.existsSync` guard in `sanitize-pr.ts` before dispatching to `runSanitizer`
- Populate the existing `orphanWarnings` return field instead of crashing
- Add CI pre-flight check that worktree is clean and on correct branch

### 2. Acronym Protection

- Extend `TICKER_CORRECTIONS` in `post_import_sanitize.ts` to include semantic acronyms: MEV, ZKP, EIP, ERC, DAO, NFT, DeFi, L1, L2, PoW, PoS, EVM, ECDSA, BLS
- Add explicit `DO NOT TRANSLATE` block to Gemini prompt in `gemini.ts`
- Add acronym diff check to review agent checklist
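
The acronym diff check in the last bullet could be sketched as follows. This is an illustration only: `findMissingAcronyms` is a hypothetical name, and the list simply repeats the acronyms proposed above rather than the actual contents of `TICKER_CORRECTIONS`.

```typescript
// Acronyms proposed for protection in this section
const PROTECTED_ACRONYMS = [
  "MEV", "ZKP", "EIP", "ERC", "DAO", "NFT", "DeFi",
  "L1", "L2", "PoW", "PoS", "EVM", "ECDSA", "BLS",
]

function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")
}

// Case-sensitive whole-token match, so "ERC" matches "ERC-20" but not "ERCx"
function containsAcronym(text: string, acronym: string): boolean {
  return new RegExp(`\\b${escapeRegExp(acronym)}\\b`).test(text)
}

// Returns acronyms that appear in the English source but not in the translation
function findMissingAcronyms(
  english: string,
  translation: string,
  acronyms: string[] = PROTECTED_ACRONYMS
): string[] {
  return acronyms.filter(
    (acronym) =>
      containsAcronym(english, acronym) && !containsAcronym(translation, acronym)
  )
}
```

Run against the pre-fix research page, a check like this would have reported both MEV (rendered as MEH) and ZKP (rendered as DNZ). It only detects absence, so deciding whether a localized acronym is acceptable stays with the reviewer.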

### 3. URL Drift Detection

- Add `checkUrlDrift` function comparing English source URLs against translation URLs
- Maintain a redirect map (`src/data/i18n/url-redirects.json`) for known URL changes
- Auto-replace stale URLs during sanitizer pass
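
A first version of `checkUrlDrift` could be a plain set comparison. The sketch below is one possible shape, not the function as it exists anywhere: the report type and the URL regex are assumptions, and the redirect map is left out.

```typescript
interface UrlDriftReport {
  missingFromTranslation: string[] // in the English source, absent from the translation
  extraInTranslation: string[] // in the translation, absent from the English source
}

// Collect absolute http(s) URLs, trimming sentence punctuation at the end
function extractUrls(markdown: string): Set<string> {
  const matches = markdown.match(/https?:\/\/[^\s)<>"']+/g) ?? []
  return new Set(matches.map((url) => url.replace(/[.,;:!?]+$/, "")))
}

function checkUrlDrift(english: string, translation: string): UrlDriftReport {
  const sourceUrls = extractUrls(english)
  const translatedUrls = extractUrls(translation)
  return {
    missingFromTranslation: Array.from(sourceUrls).filter(
      (url) => !translatedUrls.has(url)
    ),
    extraInTranslation: Array.from(translatedUrls).filter(
      (url) => !sourceUrls.has(url)
    ),
  }
}
```

A URL that shows up only in the translation is a candidate for staleness, such as the `www.erc4337.io` link fixed in Commit 1. Translations may legitimately link to localized resources, so the result is better treated as a warning than as input for blind replacement.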

### 4. Glossary Compliance

- Make glossary injection mandatory (not optional) in `buildTranslationPrompt`
- Add post-translation glossary compliance check before committing
- Log warnings when `fetchGlossaryEntries` returns zero results for covered languages

### 5. Content Staleness Detection

- Compute staleness score (heading count diff, word count ratio) during sanitizer run
- Add optional `last-synced-en-hash` frontmatter field for tracking
- Flag files where English source changed >30% since last translation
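
The heading-count and word-count signals from the first bullet can be computed without any history. The following is a rough sketch: `stalenessSignal` is a hypothetical name and the thresholds are illustrative defaults, not tuned values from the pipeline.

```typescript
interface StalenessSignal {
  headingDiff: number // absolute difference in heading counts
  wordRatio: number // translated word count divided by English word count
  flagged: boolean
}

// Count ATX headings, ignoring lines inside fenced code blocks
function countHeadings(markdown: string): number {
  let inFence = false
  let count = 0
  for (const line of markdown.split("\n")) {
    if (/^\s*`{3}/.test(line)) {
      inFence = !inFence
      continue
    }
    if (!inFence && /^#{1,6}\s/.test(line)) count++
  }
  return count
}

function countWords(markdown: string): number {
  return markdown.split(/\s+/).filter(Boolean).length
}

// Thresholds are assumptions for illustration and would need calibration per language
function stalenessSignal(
  english: string,
  translation: string,
  maxHeadingDiff = 0,
  minRatio = 0.6,
  maxRatio = 1.6
): StalenessSignal {
  const headingDiff = Math.abs(countHeadings(english) - countHeadings(translation))
  const englishWords = countWords(english)
  const wordRatio = englishWords === 0 ? 0 : countWords(translation) / englishWords
  const flagged =
    headingDiff > maxHeadingDiff || wordRatio < minRatio || wordRatio > maxRatio
  return { headingDiff, wordRatio, flagged }
}
```

These are structural proxies only. They would likely have caught the future-proofing restructure through the heading mismatch, while a rewrite that keeps the same outline would still need the hash-based tracking from the second bullet.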

### 6. Suggested Test Cases for PR #17654

- Orphaned path handling: `runSanitizer` should skip missing files, not crash
- Acronym protection: Detect MEV/ZKP absence in translation when present in English
- Ticker transposition: EHT -> ETH fix outside code blocks
- Brand name tag restoration: MetaMask casing
- MDX link syntax: Detect missing brackets
- Glossary compliance: Flag non-approved term usage
- Content staleness: Heading count mismatch warning
- URL drift: Detect URLs in English absent from translation
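
As one concrete example, the missing-bracket case could be tested against a small detector like the one below. This is a sketch under assumptions: `findBrokenLinkBrackets` is a hypothetical function, and the pattern covers only the failure seen in `wrapped-eth/index.md`, where the closing `]` before the link target was dropped.

```typescript
interface BrokenLink {
  line: number // 1-based line number
  snippet: string // matched text, e.g. "[label(/path)"
}

// "[" followed by label text with no "]", then a parenthesized target that
// looks like a path, anchor, or URL. Restricting the target avoids flagging
// ordinary parentheticals such as "[value (optional)]".
const BROKEN_LINK = /\[[^\]\n()]*\((?:\/|#|\.{1,2}\/|https?:)[^)\s]*\)/g

function findBrokenLinkBrackets(markdown: string): BrokenLink[] {
  const results: BrokenLink[] = []
  markdown.split("\n").forEach((text, index) => {
    for (const snippet of text.match(BROKEN_LINK) ?? []) {
      results.push({ line: index + 1, snippet })
    }
  })
  return results
}
```

The detector does not skip fenced code blocks, so a test suite would probably want a fixture that confirms how code samples containing similar text are handled.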

## Cross-References

- [Turkish review (PR #17182)](./crowdin-import-review-turkish-pr-17182.md) -- 7.7/10, 34 critical, 56 warnings
- [Japanese review (PR #17132)](./crowdin-import-review-japanese-pr-17132.md) -- sanitizer bugs found
- [Japanese quality scoring](./japanese-translation-quality-review-fix-process-pr-17132.md) -- 4.89/5.0
- [Agent calibration](../integration-issues/crowdin-import-review-agent-calibration.md) -- false positive reduction
- Sanitizer source: PR #17653 (`fix-review-translations` branch)
- Sanitizer tests: PR #17654 (`add-sanitizer-tests` branch)
- Community glossary: `~/.claude/translation-review/fetch-translation-glossary.json`