fix(i18n): sanitizer fixes and brand protection #17653

Merged
wackerow merged 15 commits into dev from fix-review-translations on Feb 26, 2026

@myelinated-wackerow
Collaborator

Summary

  • Fix 5 sanitizer bugs found during Japanese translation review (href substitution, brand tag casing, code-fence awareness, MDX escape handling, orphaned tag logic)
  • Add MDX angle bracket escaping for Crowdin translation artifacts
  • Wire up sanitizer checks and expand brand name protection
  • Add PR-scoped sanitizer script for targeted runs
  • Add orphan file detection for translated files with no English counterpart
  • Add franc-min language detection for untranslated content warnings
  • Add cross-script contamination detection (e.g. Cyrillic in Japanese files)
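
The cross-script check in the last bullet can be pictured as a per-locale scan for characters from scripts that should not appear in that language. The sketch below is only an illustration: the `FOREIGN_SCRIPTS` table, the `findCrossScriptContamination` name, and the chosen Unicode ranges are assumptions, not the sanitizer's actual code.

```typescript
// Hypothetical sketch of cross-script contamination detection.
type ScriptRange = { script: string; pattern: RegExp };
type ScriptHit = { line: number; script: string; sample: string };

// Illustrative subset only; the PR mentions coverage for 20+ locales.
const FOREIGN_SCRIPTS: Record<string, ScriptRange[]> = {
  ja: [
    { script: "Cyrillic", pattern: /[\u0400-\u04FF]+/ },
    { script: "Arabic", pattern: /[\u0600-\u06FF]+/ },
  ],
  ru: [
    { script: "Hiragana", pattern: /[\u3040-\u309F]+/ },
    { script: "Katakana", pattern: /[\u30A0-\u30FF]+/ },
  ],
};

function findCrossScriptContamination(content: string, locale: string): ScriptHit[] {
  const ranges = FOREIGN_SCRIPTS[locale] ?? [];
  const hits: ScriptHit[] = [];
  content.split("\n").forEach((text, index) => {
    for (const { script, pattern } of ranges) {
      // Report the first run of foreign-script characters on each line.
      const match = pattern.exec(text);
      if (match) hits.push({ line: index + 1, script, sample: match[0] });
    }
  });
  return hits;
}
```

A check like this would only produce warnings, since a foreign-script word can be legitimate (for example, a quoted proper noun).
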

Test plan

  • Sanitizer runs cleanly against ja, tr, zh, es translation files
  • No MDX compilation errors on affected pages
  • Brand names preserved in frontmatter tags
  • Translated hrefs flagged as warnings (not auto-fixed)
  • Reviewer: spot-check a few translation files for correctness

Documentation

  • docs/solutions/integration-issues/post-import-sanitizer-bugs-found-japanese-review.md -- bug analysis
  • docs/solutions/integration-issues/crowdin-file-path-mapping-and-review-workflow.md -- workflow docs
  • docs/solutions/translation-review/scaling-translation-review-pipeline.md -- scaling strategy

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>

wackerow and others added 15 commits February 21, 2026 12:51
ESM-only trigram language detection library used by the post-import sanitizer to detect untranslated paragraphs in translation files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds ticker transposition fixes (EHT→ETH, BSL→BLS, ECDSA), frontmatter tag syncing from English source, expanded brand name list with auto-fix for tags, cross-script contamination detection for 20+ locales, MDX angle bracket escaping, orphaned closing tag removal, and franc-min-powered untranslated paragraph detection. Makes runSanitizer async to support dynamic ESM import of franc-min.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
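
As a rough illustration of the ticker transposition fixes, the sketch below corrects the two transpositions named in this commit and skips fenced code blocks, as a later commit in this PR describes. The `TICKER_FIXES` map and the line-based fence tracking are assumptions for the example, not the actual `fixTickerTranspositions` implementation.

```typescript
// Hypothetical corrections map; only the pairs named in the commit message.
const TICKER_FIXES: Record<string, string> = { EHT: "ETH", BSL: "BLS" };

function fixTickerTranspositions(content: string): string {
  let inFence = false;
  return content
    .split("\n")
    .map((line) => {
      // Toggle fence state on ``` or ~~~ lines and leave them untouched.
      if (/^\s*(```|~~~)/.test(line)) {
        inFence = !inFence;
        return line;
      }
      if (inFence) return line; // never rewrite code
      return line.replace(/\b(EHT|BSL)\b/g, (found) => TICKER_FIXES[found] ?? found);
    })
    .join("\n");
}
```
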
Compound engineering document capturing the full brainstorm, 3-phase pipeline strategy, prevention matrix, and knowledge compounding approach for scaling review of 21 translation PRs across 24 languages.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replaces fixCount-based issue reporting with actual content comparison so transforms only log when content genuinely changes. Adds block-scoped href replacement to prevent cross-block interference when the same href appears in multiple blocks. Detects displaced hrefs that are globally valid but in the wrong block.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
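
The switch from counting fixes to comparing content can be sketched as follows. The `Transform` type and `runTransforms` function are hypothetical names used for illustration; the idea is simply that a transform is reported only when its output differs from its input.

```typescript
// Hypothetical pipeline shape: each transform maps content to content.
type Transform = { name: string; apply: (content: string) => string };

function runTransforms(
  content: string,
  transforms: Transform[]
): { content: string; changedBy: string[] } {
  const changedBy: string[] = [];
  let current = content;
  for (const transform of transforms) {
    const next = transform.apply(current);
    // Log a transform only if it genuinely changed the content.
    if (next !== current) changedBy.push(transform.name);
    current = next;
  }
  return { content: current, changedBy };
}
```

With this shape, a transform that matches a pattern but rewrites it to the same text no longer produces a spurious report entry.
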
Flag translated files that have no English source at the expected path. When a single match is found by filename, suggests the correct location. Reports ambiguous cases with candidate count.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
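
A minimal sketch of the orphan check described above, assuming both path lists are relative to their content roots so that a translated file and its English source share the same relative path. The `OrphanReport` type and `findOrphanFiles` function are illustrative names, not the ones used in `post_import_sanitize.ts`.

```typescript
type OrphanReport =
  | { file: string; kind: "suggestion"; suggestedPath: string }
  | { file: string; kind: "ambiguous"; candidateCount: number }
  | { file: string; kind: "no-match" };

const fileName = (path: string): string => path.slice(path.lastIndexOf("/") + 1);

function findOrphanFiles(translatedPaths: string[], englishPaths: string[]): OrphanReport[] {
  const reports: OrphanReport[] = [];
  for (const file of translatedPaths) {
    // A translated file is fine if English has a file at the same relative path.
    if (englishPaths.indexOf(file) !== -1) continue;
    const candidates = englishPaths.filter((p) => fileName(p) === fileName(file));
    if (candidates.length === 1) {
      reports.push({ file, kind: "suggestion", suggestedPath: candidates[0] });
    } else if (candidates.length > 1) {
      reports.push({ file, kind: "ambiguous", candidateCount: candidates.length });
    } else {
      reports.push({ file, kind: "no-match" });
    }
  }
  return reports;
}
```
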
Documents root causes and fixes for misplaced translation files, the worktree-based review workflow, sanitizer enhancements, and automation permission requirements.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add sanitize-pr.ts to run the sanitizer on only files changed in a PR diff (via gh API), replacing ad-hoc TARGET_LANGUAGES scoping. Update post_import_sanitize.ts: replace syncFrontmatterTags with brand-only tag fixing, add orphan file detection with suggested correct paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
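
To illustrate PR-scoped runs, the sketch below filters a newline-separated list of changed paths (for example, the output of `gh pr diff <number> --name-only`) down to translated markdown files. The `public/content/translations/<locale>/` layout and the `selectTranslationFiles` name are assumptions for this example; how `sanitize-pr.ts` actually obtains and filters the list is not shown in this PR description.

```typescript
type TranslationFile = { locale: string; path: string };

function selectTranslationFiles(nameOnlyDiff: string, locales?: string[]): TranslationFile[] {
  // Assumed layout: public/content/translations/<locale>/<...>.md
  const pattern = /^public\/content\/translations\/([^/]+)\/.+\.md$/;
  const selected: TranslationFile[] = [];
  for (const raw of nameOnlyDiff.split("\n")) {
    const path = raw.trim();
    const match = pattern.exec(path);
    if (!match) continue; // not a translated markdown file
    const locale = match[1];
    if (locales && locales.indexOf(locale) === -1) continue; // optional locale filter
    selected.push({ locale, path });
  }
  return selected;
}
```
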
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1. fixTranslatedHrefs: convert to warn-only — block-positional alignment
   is unreliable (Crowdin adds/removes blank lines, shifting paragraph
   indices and causing incorrect href substitutions across unrelated
   paragraphs). Href fixes left to AI review agents with semantic context.

2. fixBrandTags: use canonical casing from PROTECTED_BRAND_NAMES instead
   of copying English source values (which may be lowercase). Switch to
   targeted replacement to preserve original YAML formatting (multi-line
   arrays, spacing, quoting style).

3. fixTickerTranspositions: remove KECCAK→Keccak from corrections map
   (KECCAK is a valid all-caps form in code). Add code-fence skipping so
   ticker corrections don't modify content inside code blocks.

4. removeOrphanedClosingTags: add code-block/code-span awareness using
   the same split pattern as escapeMdxAngleBrackets, so tags inside
   backticks (e.g. `</strong>`) are not stripped.

5. removeOrphanedClosingTags: fix removal order — keep first N closers
   (paired with openers) and remove trailing excess, instead of removing
   the first N matches which strips correctly-paired tags.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
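
Fixes 4 and 5 above can be sketched together: split the text so that fenced code blocks and inline code spans are left alone, count openers in the remaining prose, then keep the first N closers and drop the excess. This is a simplified illustration with an assumed signature (one simple tag name at a time), not the actual `removeOrphanedClosingTags` code.

```typescript
function removeOrphanedClosingTags(text: string, tag: string): string {
  // The capturing group keeps code segments in the result at odd indices.
  const segments = text.split(/(```[\s\S]*?```|`[^`\n]*`)/);
  const opener = new RegExp("<" + tag + "(\\s[^>]*)?>", "g");
  const closer = new RegExp("</" + tag + ">", "g");

  // Count openers in prose segments only (even indices).
  let openerCount = 0;
  segments.forEach((segment, i) => {
    if (i % 2 === 0) openerCount += (segment.match(opener) || []).length;
  });

  // Keep the first openerCount closers; remove any that follow.
  let keptClosers = 0;
  return segments
    .map((segment, i) =>
      i % 2 === 1
        ? segment
        : segment.replace(closer, (found) => (keptClosers++ < openerCount ? found : ""))
    )
    .join("");
}
```

For example, `<strong>a</strong> b</strong>` loses only the trailing closer, while a `</strong>` written inside backticks is untouched.
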
Documents 5 correctness bugs found in post_import_sanitize.ts during
Japanese translation review of PR #17132. Covers root causes, fixes,
prevention strategies, and testing recommendations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Expand escapeMdxAngleBrackets to catch bare <> and </> fragments in
prose (Crowdin drops backticks around these during translation).

Add restoreDroppedBackslashEscapes to detect \< patterns in English
source and restore missing backslash escapes in translations (Crowdin
strips these in table cells, e.g. \<= becomes <= and \<Storage becomes
<Storage, both of which break MDX compilation).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
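
A rough sketch of the backslash restoration idea: collect the tokens that follow `\<` in the English source, then re-escape bare occurrences of the same `<token` in the translation. The token pattern and the whole-file scope are simplifying assumptions; the real `restoreDroppedBackslashEscapes` may scope matches more narrowly (for example, per table cell).

```typescript
function restoreDroppedBackslashEscapes(english: string, translated: string): string {
  // Collect tokens escaped in English, e.g. "=" from "\<=" or "Storage" from "\<Storage".
  const tokens: string[] = [];
  const escaped = /\\<([A-Za-z]+|=)/g;
  let match: RegExpExecArray | null;
  while ((match = escaped.exec(english)) !== null) {
    if (tokens.indexOf(match[1]) === -1) tokens.push(match[1]);
  }

  let result = translated;
  for (const token of tokens) {
    const bare = new RegExp("<" + token, "g");
    const snapshot = result;
    // Add a backslash only where one is not already present.
    result = snapshot.replace(bare, (found: string, offset: number) =>
      offset > 0 && snapshot[offset - 1] === "\\" ? found : "\\" + found
    );
  }
  return result;
}
```
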
- Wire fixEscapedBoldAndItalic into pipeline (fixes \*\*text\*\* from Crowdin)
- Wire warnPunctuationOnlyHeadings into pipeline (detects dropped headings)
- Wire warnCodeFenceContentDrift into pipeline (detects translated code blocks)
- Add 9 Ethereum client names to PROTECTED_BRAND_NAMES
- Remove unused extractHrefsFromBlock (block-level href approach abandoned)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
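
For the escaped bold and italic case, a minimal sketch might look like the following. The regular expressions are assumptions for illustration: they require non-space characters just inside the markers so that standalone escaped asterisks (such as `5 \* 3 \* 2`) are left alone. The actual `fixEscapedBoldAndItalic` may use different rules.

```typescript
function fixEscapedBoldAndItalic(content: string): string {
  return content
    // \*\*text\*\* becomes **text**
    .replace(/\\\*\\\*(\S(?:[^*\n]*?\S)?)\\\*\\\*/g, "**$1**")
    // \*text\* becomes *text*
    .replace(/\\\*(\S(?:[^*\n]*?\S)?)\\\*/g, "*$1*");
}
```
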
- Update "60+ languages" to "25 languages" (actual count from i18n.config.json)
- Add reference to i18n.config.json as canonical language list
- Fix RTL language list (Arabic, Urdu — no Hebrew in active config)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
@netlify

netlify Bot commented Feb 25, 2026

Deploy Preview for ethereumorg ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 5c7d324 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/ethereumorg/deploys/699e9110e578c400077e6bc4 |
| 😎 Deploy Preview | https://deploy-preview-17653.ethereum.it |

Lighthouse
7 paths audited
Performance: 59 (🔴 down 1 from production)
Accessibility: 94 (🟢 up 1 from production)
Best Practices: 100 (no change from production)
SEO: 99 (no change from production)
PWA: 59 (no change from production)
View the detailed breakdown and full score reports

@github-actions github-actions Bot added the dependencies 📦, documentation 📖, and tooling 🔧 labels Feb 25, 2026
Member

@wackerow wackerow left a comment

| File | Notes |
| --- | --- |
| docs/solutions/* | /workflows:compound documentation |
| src/scripts/i18n/lib/workflows/sanitization.ts | Simple async/await patch |
| src/scripts/i18n/post_import_sanitize.ts | Iterative adjustments to sanitizer script |
| src/scripts/i18n/sanitize-pr.ts | New orchestrator to run sanitizer on list of files |
| AGENTS.md | Simple patch to number of languages for better context |
| package.json, pnpm-lock.yaml | franc-min devDep for language detection to flag incomplete/missing translations |

@wackerow
Member

@pettinarip Going to pull this and the unit testing setup in. I'm already using the changes in this branch in translation PR reviews, but pulling it into dev will reduce friction, since I won't need to copy them over from this branch.

Same goes for the unit tests PR (#17654), which I can't really use yet when doing reviews. The earlier I get this into dev, the sooner I can use the write-test-first flow when issues arise in translation reviews.

Please let me know if you spot any issues and we can iterate from here or revert as needed.

@wackerow wackerow merged commit 499205f into dev Feb 26, 2026
8 checks passed
@wackerow wackerow deleted the fix-review-translations branch February 26, 2026 22:43
@pettinarip pettinarip mentioned this pull request Feb 27, 2026

2 participants