fix(i18n): sanitizer fixes and brand protection #17653
Merged
ESM-only trigram language detection library used by the post-import sanitizer to detect untranslated paragraphs in translation files. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
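As a rough illustration of how a trigram detector could be used to flag untranslated paragraphs, the sketch below takes the detector as a parameter. The helper name `findUntranslatedParagraphs`, the `minLength` threshold, and the blank-line paragraph split are assumptions made for this example, not the sanitizer's actual code. In real use the `detect` argument would be franc-min's `franc` export, which returns an ISO 639-3 code such as `eng`, or `und` when it cannot decide.

```typescript
// Detector signature modelled on franc-min's `franc`: text in, ISO 639-3 code out.
type DetectLanguage = (text: string) => string

interface UntranslatedHit {
  paragraphIndex: number
  preview: string
}

// Hypothetical helper: flags paragraphs of a translated file that still look English.
function findUntranslatedParagraphs(
  content: string,
  detect: DetectLanguage,
  minLength = 80 // assumed threshold; short strings are unreliable for trigram detection
): UntranslatedHit[] {
  const hits: UntranslatedHit[] = []
  const paragraphs = content.split(/\n\s*\n/)

  paragraphs.forEach((paragraph, paragraphIndex) => {
    const text = paragraph.trim()
    if (text.length < minLength) return // too short to judge
    if (text.startsWith("```")) return // code is expected to stay in English
    if (detect(text) === "eng") {
      hits.push({ paragraphIndex, preview: text.slice(0, 60) })
    }
  })

  return hits
}
```

Because franc-min is ESM-only, a CommonJS-style script would typically obtain the detector with `const { franc } = await import("franc-min")` and pass `franc` as `detect`.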
Adds ticker transposition fixes (EHT→ETH, BSL→BLS, ECDSA), frontmatter tag syncing from English source, expanded brand name list with auto-fix for tags, cross-script contamination detection for 20+ locales, MDX angle bracket escaping, orphaned closing tag removal, and franc-min-powered untranslated paragraph detection. Makes runSanitizer async to support dynamic ESM import of franc-min. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
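A minimal sketch of the ticker transposition fix, using only the two corrections named above (EHT→ETH, BSL→BLS). The function name `fixTickerTranspositionsSketch` and the line-by-line fence tracking are assumptions for illustration. Fenced code is left untouched here, which matches the code-fence skipping described in a later commit on this PR.

```typescript
// Corrections taken from the commit message; the real map may contain more entries.
const TICKER_CORRECTIONS: Record<string, string> = {
  EHT: "ETH",
  BSL: "BLS",
}

// Hypothetical sketch: fix transposed tickers in prose, leaving fenced code alone.
function fixTickerTranspositionsSketch(content: string): string {
  const pattern = new RegExp(
    `\\b(${Object.keys(TICKER_CORRECTIONS).join("|")})\\b`,
    "g"
  )
  let insideFence = false

  return content
    .split("\n")
    .map((line) => {
      // A line starting with ``` or ~~~ opens or closes a fenced block.
      if (/^\s*(```|~~~)/.test(line)) {
        insideFence = !insideFence
        return line
      }
      if (insideFence) return line
      return line.replace(pattern, (match) => TICKER_CORRECTIONS[match] ?? match)
    })
    .join("\n")
}
```

The word-boundary anchors keep the replacement from touching longer identifiers that merely contain one of the transposed strings.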
Compound engineering document capturing the full brainstorm, 3-phase pipeline strategy, prevention matrix, and knowledge compounding approach for scaling review of 21 translation PRs across 24 languages. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replaces fixCount-based issue reporting with actual content comparison so transforms only log when content genuinely changes. Adds block-scoped href replacement to prevent cross-block interference when the same href appears in multiple blocks. Detects displaced hrefs that are globally valid but in the wrong block. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
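The content-comparison idea can be sketched as follows. The `NamedTransform` shape and the `applyTransforms` helper are hypothetical names for this example. The point is only that an issue is logged when a transform's output actually differs from its input, rather than when an internal counter says a fix was attempted.

```typescript
interface NamedTransform {
  name: string
  apply: (content: string) => string
}

interface TransformResult {
  content: string
  issues: string[]
}

// Hypothetical sketch: run transforms in order and report only genuine changes.
function applyTransforms(
  content: string,
  transforms: NamedTransform[]
): TransformResult {
  const issues: string[] = []
  let current = content

  for (const transform of transforms) {
    const next = transform.apply(current)
    // Compare actual content instead of trusting a fix counter.
    if (next !== current) {
      issues.push(transform.name)
      current = next
    }
  }

  return { content: current, issues }
}
```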
Flag translated files that have no English source at the expected path. When a single match is found by filename, suggests the correct location. Reports ambiguous cases with candidate count. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
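One way the orphan check could work is sketched below. It assumes paths are already relative to the content root and that "match by filename" means comparing the last path segment. Both are assumptions for this example, as are the `OrphanReport` type and the `checkOrphan` name.

```typescript
type OrphanReport =
  | { kind: "ok" }
  | { kind: "suggest"; suggestedPath: string }
  | { kind: "ambiguous"; candidateCount: number }
  | { kind: "missing" }

// Last path segment, e.g. "opcodes.md" for "developers/docs/evm/opcodes.md".
function baseName(path: string): string {
  return path.split("/").pop() ?? path
}

// Hypothetical sketch: check whether a translated file has an English source
// at the same relative path, and suggest a location when exactly one
// English file shares its filename.
function checkOrphan(relativePath: string, englishPaths: string[]): OrphanReport {
  if (englishPaths.includes(relativePath)) return { kind: "ok" }

  const fileName = baseName(relativePath)
  const candidates = englishPaths.filter((p) => baseName(p) === fileName)

  if (candidates.length === 1) {
    return { kind: "suggest", suggestedPath: candidates[0] }
  }
  if (candidates.length > 1) {
    return { kind: "ambiguous", candidateCount: candidates.length }
  }
  return { kind: "missing" }
}
```

Common filenames such as `index.md` will usually land in the ambiguous branch, which is why that case reports only a candidate count instead of guessing.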
Documents root causes and fixes for misplaced translation files, the worktree-based review workflow, sanitizer enhancements, and automation permission requirements. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add sanitize-pr.ts to run the sanitizer on only files changed in a PR diff (via gh API), replacing ad-hoc TARGET_LANGUAGES scoping. Update post_import_sanitize.ts: replace syncFrontmatterTags with brand-only tag fixing, add orphan file detection with suggested correct paths. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1. fixTranslatedHrefs: convert to warn-only. Block-positional alignment is unreliable (Crowdin adds/removes blank lines, shifting paragraph indices and causing incorrect href substitutions across unrelated paragraphs). Href fixes are left to AI review agents with semantic context.
2. fixBrandTags: use canonical casing from PROTECTED_BRAND_NAMES instead of copying English source values (which may be lowercase). Switch to targeted replacement to preserve original YAML formatting (multi-line arrays, spacing, quoting style).
3. fixTickerTranspositions: remove KECCAK→Keccak from corrections map (KECCAK is a valid all-caps form in code). Add code-fence skipping so ticker corrections don't modify content inside code blocks.
4. removeOrphanedClosingTags: add code-block/code-span awareness using the same split pattern as escapeMdxAngleBrackets, so tags inside backticks (e.g. `</strong>`) are not stripped.
5. removeOrphanedClosingTags: fix removal order. Keep the first N closers (paired with openers) and remove the trailing excess, instead of removing the first N matches, which strips correctly-paired tags.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
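Items 4 and 5 can be illustrated together. The sketch below is an assumption about how such a pass might look, not the code in post_import_sanitize.ts: the split pattern, the function name, and the single-tag signature are all made up for this example. It splits the content into prose and code segments, counts openers in prose only, keeps the first N closers, and drops any closers beyond that.

```typescript
// Splits content into alternating prose / code segments. Because the pattern has
// one capture group, odd indices of the split result hold fenced blocks or code spans.
const CODE_SEGMENT = /(```[\s\S]*?```|`[^`\n]+`)/

// Hypothetical sketch: remove excess closing tags for one simple tag name
// (e.g. "strong"), ignoring anything inside backticks.
function removeOrphanedClosingTagsSketch(content: string, tagName: string): string {
  const parts = content.split(CODE_SEGMENT)
  const openPattern = new RegExp(`<${tagName}(\\s[^>]*)?>`, "g")
  const closePattern = new RegExp(`</${tagName}>`, "g")

  // Count openers in prose segments only.
  let openers = 0
  parts.forEach((part, index) => {
    if (index % 2 === 0) openers += (part.match(openPattern) ?? []).length
  })

  // Keep the first `openers` closers; drop the trailing excess.
  let closersSeen = 0
  return parts
    .map((part, index) => {
      if (index % 2 === 1) return part // code segment: leave untouched
      return part.replace(closePattern, (match) => {
        closersSeen += 1
        return closersSeen <= openers ? match : ""
      })
    })
    .join("")
}
```

Keeping the earliest closers is a heuristic: it assumes orphaned closers tend to appear after the correctly paired ones, which is the behavior the commit message describes.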
Documents 5 correctness bugs found in post_import_sanitize.ts during Japanese translation review of PR #17132. Covers root causes, fixes, prevention strategies, and testing recommendations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
Expand escapeMdxAngleBrackets to catch bare <> and </> fragments in prose (Crowdin drops backticks around these during translation). Add restoreDroppedBackslashEscapes to detect \< patterns in English source and restore missing backslash escapes in translations (Crowdin strips these in table cells, e.g. \<= becomes <= and \<Storage becomes <Storage, both of which break MDX compilation). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
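A rough sketch of the backslash-restoration idea follows. It collects the tokens that appear after `\<` in the English source and re-escapes the same tokens in the translation when the backslash is missing. The function name, the token pattern, and the decision to ignore code spans are simplifying assumptions for this example.

```typescript
// Escape characters that are special inside a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")
}

// Hypothetical sketch: restore `\<` escapes that exist in the English source
// but were dropped from the translation (e.g. `\<=` became `<=`).
function restoreDroppedBackslashEscapesSketch(
  english: string,
  translation: string
): string {
  // Tokens that follow an escaped `<` in English: either "=" or an identifier.
  const tokens = new Set<string>()
  for (const match of english.matchAll(/\\<(=|[A-Za-z]\w*)/g)) {
    tokens.add(match[1])
  }

  let result = translation
  for (const token of tokens) {
    const boundary = /\w$/.test(token) ? "\\b" : ""
    // Group 1 captures an existing backslash, so escaped text is left as-is.
    const pattern = new RegExp(`(\\\\?)<${escapeRegExp(token)}${boundary}`, "g")
    result = result.replace(pattern, (match: string, slash: string) =>
      slash ? match : `\\${match}`
    )
  }
  return result
}
```

Driving the fix from the English source keeps it conservative: a `<` in the translation is only escaped when the same token was escaped in the original.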
- Wire fixEscapedBoldAndItalic into pipeline (fixes \*\*text\*\* from Crowdin)
- Wire warnPunctuationOnlyHeadings into pipeline (detects dropped headings)
- Wire warnCodeFenceContentDrift into pipeline (detects translated code blocks)
- Add 9 Ethereum client names to PROTECTED_BRAND_NAMES
- Remove unused extractHrefsFromBlock (block-level href approach abandoned)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
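For the escaped-bold case, a minimal sketch might look like the following. It handles only the `\*\*text\*\*` form named in the commit message. The italic case and any code-block awareness are omitted, and the function name is an assumption for this example.

```typescript
// Hypothetical sketch: turn Crowdin's `\*\*text\*\*` back into `**text**`.
// Only the bold form is handled; single escaped asterisks are left alone
// because they may be intentional literal asterisks.
function fixEscapedBoldSketch(content: string): string {
  return content.replace(/\\\*\\\*(.+?)\\\*\\\*/g, "**$1**")
}
```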
- Update "60+ languages" to "25 languages" (actual count from i18n.config.json)
- Add reference to i18n.config.json as canonical language list
- Fix RTL language list (Arabic, Urdu; no Hebrew in active config)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>
wackerow (Member) approved these changes on Feb 25, 2026 and left a comment:
| File | Notes |
|---|---|
| `docs/solutions/*` | `/workflows:compound` documentation |
| `src/scripts/i18n/lib/workflows/sanitization.ts` | Simple async/await patch |
| `src/scripts/i18n/post_import_sanitize.ts` | Iterative adjustments to sanitizer script |
| `src/scripts/i18n/sanitize-pr.ts` | New orchestrator to run the sanitizer on a list of files |
| `AGENTS.md` | Simple patch to number of languages for better context |
| `package.json`, `pnpm-lock.yaml` | `franc-min` devDep for language detection to flag incomplete/missing translations |
@pettinarip Going to pull this and the unit testing setup in. I'm already using the changes in this branch in translation PR reviews, but pulling it into dev will reduce friction since I won't need to copy them over from this branch. The same goes for the unit tests PR (#17654), which I can't really access yet when doing reviews. The earlier I get this into dev, the sooner I can use the write-test-first flow when issues arise in translation reviews. Please let me know if you spot any issues and we can iterate from here or revert as needed.

Documentation
- docs/solutions/integration-issues/post-import-sanitizer-bugs-found-japanese-review.md -- bug analysis
- docs/solutions/integration-issues/crowdin-file-path-mapping-and-review-workflow.md -- workflow docs
- docs/solutions/translation-review/scaling-translation-review-pipeline.md -- scaling strategy

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: wackerow <54227730+wackerow@users.noreply.github.com>