Merged
41 commits
b7de8b5
i18n(vi): Crowdin translations
wackerow Jan 27, 2026
cdadcbc
i18n(vi): JSX attribute translations
wackerow Jan 27, 2026
16fd8d8
i18n: post-import sanitization
wackerow Jan 27, 2026
1578e27
Merge branch 'dev' into i18n/import/2026-01-27T15-06-08-vi
wackerow Feb 4, 2026
1060490
fix(i18n): run sanitizer on vi translations
minimalsm Feb 13, 2026
a7d320e
Merge branch 'dev' into merge-dev-17176
minimalsm Feb 13, 2026
0ce78b2
fix(i18n): correct critical Vietnamese translation errors
wackerow Feb 17, 2026
9964d6f
fix(i18n): resolve MDX syntax errors in Vietnamese translation files
wackerow Feb 17, 2026
852d711
docs: add translation review documentation for Vietnamese
wackerow Feb 17, 2026
9db5cf9
fix(i18n): restore missing and malformed hrefs in Vietnamese translat…
wackerow Feb 18, 2026
e4fb77e
docs: add solution doc for translation href sync issues
wackerow Feb 18, 2026
8d50802
Merge branch 'dev' into i18n/import/2026-01-27T15-06-08-vi
wackerow Feb 18, 2026
5b05d0c
Merge branch 'dev' into i18n/import/2026-01-27T15-06-08-vi
wackerow Feb 19, 2026
3cca730
fix(i18n): missing vi string translations
wackerow Feb 19, 2026
6477abc
i18n: attempt to patch strings
wackerow Feb 20, 2026
72f9fd6
fix(i18n): fix Gemini's mistakes
wackerow Feb 20, 2026
a55cd84
fix(i18n): correct remaining vi diacritical and terminology errors
wackerow Feb 20, 2026
22c2e05
Merge branch 'dev' into i18n/import/2026-01-27T15-06-08-vi
wackerow Feb 20, 2026
460916f
docs: extend vi translation-review retrospective
wackerow Feb 20, 2026
0ea1124
chore: remove unused releasesData export and CoinGecko console.warn
pettinarip Feb 27, 2026
cd5744e
Add new bounty hunter 'Evgeny Legerov' - Low 1000 points
0xMushow Feb 27, 2026
dd112f4
merge: resolve conflict with dev in page-developers-tutorials.json
myelinated-wackerow Feb 27, 2026
c4ce028
Merge pull request #17680 from ethereum/chore/cleanup-unused-exports
wackerow Feb 27, 2026
e94b46b
Merge pull request #17683 from 0xMushow/patch-3
wackerow Feb 27, 2026
dd44c06
fix(i18n): review vi translations PR #17176
myelinated-wackerow Feb 27, 2026
e6fa158
fix(i18n): fix backslash-escape double-encoding
myelinated-wackerow Feb 28, 2026
2f7bfa4
docs: compound backslash-escape fix
myelinated-wackerow Feb 28, 2026
805515d
Merge pull request #17176 from ethereum/i18n/import/2026-01-27T15-06-…
wackerow Feb 28, 2026
30c462d
fix(bug-bounty): comment out BugBountyBanner component
fredrik0x Feb 28, 2026
150af4a
fix(bug-bounty): replace email contact with Google Form submission
fredrik0x Feb 28, 2026
e821d66
content(bug-bounty): include EF grantees in bounty program eligibilit…
fredrik0x Feb 28, 2026
d8da36c
refactor(bug-bounty): remove outdated submission details and card sev…
fredrik0x Feb 28, 2026
1f083f5
fix(bug-bounty): wrap leaderboards in Flex for side-by-side layout
fredrik0x Feb 28, 2026
6669e5a
fix(bug-bounty): reorder page sections for better content flow
fredrik0x Feb 28, 2026
5f96586
refactor(bug-bounty): modernize page layout with styled list components
fredrik0x Feb 28, 2026
df2c7ac
fix(bug-bounty): move leaderboard anchor id to section container
fredrik0x Feb 28, 2026
57f0abf
refactor(BugBountyCards): remove unused sub-headers and text sections
fredrik0x Feb 28, 2026
e6aef50
refactor(bug-bounty): reorder sections and update layout styling
fredrik0x Feb 28, 2026
45d73e7
fix(bug-bounty): update card styles and fix grammar in footnote
fredrik0x Feb 28, 2026
9aa95eb
text update
fredrik0x Mar 2, 2026
f02d3ba
Merge branch 'master' into new-bounty
pettinarip Mar 2, 2026
365 changes: 157 additions & 208 deletions app/[locale]/bug-bounty/page.tsx

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion app/[locale]/stablecoins/page.tsx
@@ -105,7 +105,7 @@ async function Page({ params }: { params: PageParams }) {
     .map(({ id, ...rest }) => {
       const coinMarketData = stablecoinsData.find((coin) => coin.id === id)
       if (!coinMarketData) {
-        console.warn("CoinGecko stablecoin data not found:", id)
+        // CoinGecko data may not include all configured stablecoins
         return null
       }
       return { ...coinMarketData, ...rest }
157 changes: 157 additions & 0 deletions docs/solutions/integration-issues/translation-href-sync-issues.md
@@ -0,0 +1,157 @@
---
title: "Translation href Sync Issues - Links Corrupted During Crowdin Translation"
date: "2026-02-17"
category: "integration-issues"
tags:
- translation
- i18n
- crowdin
- link-integrity
- glossary
- html-structure
- json-translations
component: "src/intl/ translation JSON files"
severity: "high"
symptoms:
- "Glossary links rendering as plain text instead of clickable anchors"
- "Crowdin numbered placeholders (<0>, <1>) appearing in rendered content"
- "Links pointing to wrong glossary entries"
- "Duplicate nested <a> tags causing malformed HTML"
- "Extra links present in translations that don't exist in English"
---

# Translation href Sync Issues

## Problem

Translation PRs imported from Crowdin frequently contain corrupted `<a href="...">` tags in JSON translation files (`src/intl/{locale}/*.json`). The canonical English JSON files embed HTML links inside translation string values (e.g., `<a href="/glossary/#validator">validator</a>`). Translators on Crowdin introduce five categories of errors:

1. **Placeholder substitution**: `<a href="/glossary/#defi">DeFi</a>` becomes `<0>DeFi</0>` (Crowdin numbered placeholder)
2. **Link removal**: `<a href="/glossary/#key">keys</a>` becomes plain text `khoa`
3. **Wrong targets**: `<a href="/glossary/#node">Node</a>` becomes `<a href="/glossary/#validator">tuy chon</a>`
4. **Nested/duplicate tags**: `<a href="..."><a href="...">text</a>`
5. **Extra links added**: Links present in translation but absent in English canonical

## First Occurrence

PR #17176 (Vietnamese translations) - 13 href issues across 4 files:
- `src/intl/vi/page-roadmap.json` (6 issues)
- `src/intl/vi/page-staking.json` (6 issues)
- `src/intl/vi/glossary-tooltip.json` (1 issue)
- `src/intl/vi/glossary.json` (1 issue)

## Investigation

### Step 1: Identify changed files
```bash
git diff dev --name-only -- 'src/intl/vi/**/*.json'
```

### Step 2: Automated comparison script
For each changed JSON file, flatten the JSON, extract all `href="..."` values from both the English (`src/intl/en/`) and translated versions, and compare using symmetric set difference:

```python
import json, re, os

def extract_urls(value):
    return re.findall(r'href="([^"]*)"', value)

def flatten(data, prefix=''):
    items = {}
    if isinstance(data, dict):
        for k, v in data.items():
            nk = f'{prefix}.{k}' if prefix else k
            if isinstance(v, (dict, list)):
                items.update(flatten(v, nk))
            elif isinstance(v, str):
                items[nk] = v
    elif isinstance(data, list):
        for i, v in enumerate(data):
            nk = f'{prefix}[{i}]'
            if isinstance(v, (dict, list)):
                items.update(flatten(v, nk))
            elif isinstance(v, str):
                items[nk] = v
    return items

# For each file, compare EN vs translated href sets per key
# Also check for: nested <a> tags, Crowdin placeholders (<0>, <1>)
```

### Step 3: Cross-check patterns
- Nested anchors: `re.search(r'<a [^>]*><a [^>]*>', value)`
- Crowdin placeholders where EN has real links: `re.search(r'<\d+>', vi_value)` when `re.search(r'<a href=', en_value)` is true
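
The comparison from Step 2 and the pattern checks from Step 3 can be combined into one per-key check. What follows is a minimal sketch, not the script used during the review; `find_href_issues` and its issue labels are hypothetical names introduced here for illustration.

```python
import re

HREF_RE = re.compile(r'href="([^"]*)"')
NESTED_ANCHOR_RE = re.compile(r'<a [^>]*><a [^>]*>')
PLACEHOLDER_RE = re.compile(r'<\d+>')


def find_href_issues(en_value, tr_value):
    """Return a list of issue descriptions for one translation key (hypothetical helper)."""
    issues = []
    en_hrefs = set(HREF_RE.findall(en_value))
    tr_hrefs = set(HREF_RE.findall(tr_value))

    # Symmetric difference, split by direction so the report says what to fix.
    for href in sorted(en_hrefs - tr_hrefs):
        issues.append(f"missing href: {href}")
    for href in sorted(tr_hrefs - en_hrefs):
        issues.append(f"extra href: {href}")

    if NESTED_ANCHOR_RE.search(tr_value):
        issues.append("nested <a> tags")

    # A numbered placeholder only counts as an error when English has a real link.
    if PLACEHOLDER_RE.search(tr_value) and "<a href=" in en_value:
        issues.append("Crowdin placeholder in place of link")

    return issues


en = 'used in <a href="/glossary/#defi">DeFi</a> and sold'
vi = 'được sử dụng trong <0>DeFi</0> và bán đi'
print(find_href_issues(en, vi))
```

A wrong-target link shows up as one missing and one extra href for the same key. Because the comparison uses sets, it does not notice a link that is duplicated with the same target, so a separate anchor count check is still worthwhile.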

## Root Cause

1. **Crowdin editor behavior**: When translators restructure sentences, Crowdin converts `<a href="...">` tags into numbered placeholders (`<0>`, `<1>`) automatically
2. **Translator misunderstanding**: Translators don't realize HTML href values must remain unchanged
3. **Copy-paste errors**: Manual editing creates duplicate/nested anchor tags
4. **No JSON href validation**: The post-import sanitizer (`src/scripts/i18n/post_import_sanitize.ts`) validates hrefs in Markdown files but performs zero href checking on JSON translation values

## Solution

For each affected key:

1. Read the English canonical value from `src/intl/en/[file].json`
2. Read the translated value from `src/intl/{locale}/[file].json`
3. Restore the exact `<a href="...">` structure from English while keeping translated display text
4. Remove any extra links not present in English
5. Fix nested `<a>` tags by removing duplicates
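
For the placeholder-substitution case, step 3 can be partly automated. The sketch below is hypothetical and rests on two assumptions that should be checked against real Crowdin output: that anchors are the only tags in the English string, and that placeholder `<N>` corresponds to the N-th anchor (0-based) in English order. `restore_placeholders` is a name introduced here, not an existing sanitizer function.

```python
import re

ANCHOR_OPEN_RE = re.compile(r'<a href="[^"]*">')
PLACEHOLDER_PAIR_RE = re.compile(r'<(\d+)>(.*?)</\1>', re.DOTALL)


def restore_placeholders(en_value, tr_value):
    """Swap <N>text</N> for the N-th English opening anchor plus </a>.

    Assumption: index N maps to the N-th anchor in the English value.
    Placeholders with no matching English anchor are left unchanged.
    """
    en_open_tags = ANCHOR_OPEN_RE.findall(en_value)

    def replace(match):
        index = int(match.group(1))
        if index >= len(en_open_tags):
            return match.group(0)  # nothing to map to; leave for manual review
        return f"{en_open_tags[index]}{match.group(2)}</a>"

    return PLACEHOLDER_PAIR_RE.sub(replace, tr_value)


en = 'used in <a href="/glossary/#defi">DeFi</a> and sold'
vi = 'được sử dụng trong <0>DeFi</0> và bán đi'
print(restore_placeholders(en, vi))
```

Removed links and wrong targets cannot be repaired this way, because the translated text alone does not say which words the link should wrap; those still need a manual pass.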

### Example fix

**English** (`page-staking.json`):
```json
"page-staking-section-comparison-pools-rewards-li3": "Liquidity tokens can be held in your own wallet, used in <a href=\"/glossary/#defi\">DeFi</a> and sold..."
```

**Vietnamese BEFORE** (link removed):
```json
"page-staking-section-comparison-pools-rewards-li3": "Token thanh khoản được lưu trữ trong ví riêng của bạn, được sử dụng trong DeFi và bán đi..."
```

**Vietnamese AFTER** (link restored):
```json
"page-staking-section-comparison-pools-rewards-li3": "Token thanh khoản được lưu trữ trong ví riêng của bạn, được sử dụng trong <a href=\"/glossary/#defi\">DeFi</a> và bán đi..."
```

## Prevention

### Priority 1: Extend the sanitizer for JSON href validation

The post-import sanitizer at `src/scripts/i18n/post_import_sanitize.ts` already has robust href validation for Markdown (`fixTranslatedHrefs`, lines 232-401). The `processJsonFile` function (lines 1273-1306) only does BOM normalization, smart quote replacement, and JSON parse validation. It performs zero href checking.

Add a `validateJsonHrefs` step to `processJsonFile` that:
- Loads the corresponding English JSON file
- Extracts `href="..."` values from both EN and translated strings per key
- Flags missing, extra, wrong, nested, or placeholder hrefs
- Auto-fixes unambiguous cases (single mismatch per key)

### Priority 2: CI validation gate

Add a GitHub Actions check on PRs touching `src/intl/` that fails when href count mismatches, Crowdin placeholders, or nested anchors are detected. This should be a required status check on `dev` branch protection.
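
The script such a workflow step might call could look like the sketch below. It is an illustration under stated assumptions, not an existing file in the repository: `check_locale` and `main` are hypothetical names, and the sketch assumes each JSON file is a flat object of string values with an identically named English counterpart.

```python
import json
import re
from pathlib import Path

ANCHOR_RE = re.compile(r'<a href="')
PLACEHOLDER_RE = re.compile(r'<\d+>')
NESTED_RE = re.compile(r'<a [^>]*><a [^>]*>')


def check_locale(en_dir, locale_dir):
    """Return failure messages for every JSON file in one locale directory."""
    failures = []
    for tr_path in sorted(Path(locale_dir).glob("*.json")):
        en_path = Path(en_dir) / tr_path.name
        if not en_path.exists():
            continue  # no English counterpart to compare against
        en_data = json.loads(en_path.read_text(encoding="utf-8"))
        tr_data = json.loads(tr_path.read_text(encoding="utf-8"))
        for key, tr_value in tr_data.items():
            en_value = en_data.get(key)
            if not isinstance(tr_value, str) or not isinstance(en_value, str):
                continue  # this sketch only handles flat string values
            where = f"{tr_path.name}:{key}"
            if len(ANCHOR_RE.findall(en_value)) != len(ANCHOR_RE.findall(tr_value)):
                failures.append(f"{where}: href count mismatch")
            if PLACEHOLDER_RE.search(tr_value) and ANCHOR_RE.search(en_value):
                failures.append(f"{where}: Crowdin placeholder")
            if NESTED_RE.search(tr_value):
                failures.append(f"{where}: nested anchors")
    return failures


def main(argv):
    """Return a process exit code: 1 when any check fails, otherwise 0."""
    problems = check_locale(argv[0], argv[1])
    for line in problems:
        print(line)
    return 1 if problems else 0
```

A workflow step would pass the English and locale directories (for example `src/intl/en` and `src/intl/vi`) and hand the return value of `main` to `sys.exit`, so a non-zero exit fails the required status check.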

### Priority 3: Crowdin configuration

- Set JSON files to treat `<a href="...">` as protected tag pairs
- Enable built-in "Tags mismatch" and "Broken URLs" QA checks
- Add custom placeholder patterns for `href="[^"]*"` as non-editable tokens

### Priority 4: Reviewer checklist

When reviewing any PR touching `src/intl/`:
- [ ] Anchor tag count parity per JSON key (EN vs translated)
- [ ] No Crowdin numbered placeholders in output
- [ ] No nested `<a>` tags
- [ ] All `href="..."` values unchanged from English
- [ ] No extra or missing links vs English

## Related Files

- `src/scripts/i18n/post_import_sanitize.ts` - Post-import sanitizer (needs JSON href support)
- `src/scripts/i18n/lib/workflows/sanitization.ts` - Sanitization workflow runner
- `.claude/commands/review-translations.md` - Translation review slash command
- `.github/workflows/claude-review-translations.yml` - CI translation review workflow
- `docs/header-ids.md` - Related: header IDs must also not be translated
- `docs/solutions/translation-review/crowdin-import-review-vietnamese-pr-17176.md` - Full PR #17176 review post-mortem
@@ -0,0 +1,154 @@
---
title: Fix double-escaping of backslash-escaped angle brackets in MDX sanitizer
date: 2026-02-28
category: logic-errors
component: Translation post-import sanitizer
tags:
- regex
- mdx
- escaping
- i18n
- sanitizer
- translation
- angle-brackets
severity: high
recurring: true
languages_affected:
- vi
- cs
- fr
- ru
files_modified:
- src/scripts/i18n/post_import_sanitize.ts
- tests/unit/sanitizer/standalone-fixes.spec.ts
- public/content/translations/cs/developers/tutorials/how-to-mint-an-nft/index.md
- public/content/translations/fr/developers/tutorials/how-to-mint-an-nft/index.md
- public/content/translations/ru/developers/docs/networking-layer/portal-network/index.md
- public/content/translations/ru/developers/tutorials/reverse-engineering-a-contract/index.md
- public/content/translations/vi/developers/docs/networking-layer/portal-network/index.md
- public/content/translations/vi/developers/tutorials/how-to-mint-an-nft/index.md
- public/content/translations/vi/developers/tutorials/reverse-engineering-a-contract/index.md
---

# Fix double-escaping of backslash-escaped angle brackets in MDX sanitizer

## Problem Symptom

Translated markdown files contained `\&lt;1 GB RAM` instead of the correct `\<1 GB RAM`. This rendered as literal `\&lt;` in the browser rather than the intended `<` character. The pattern appeared across multiple languages (vi, cs, fr, ru) in files like `portal-network/index.md`, `how-to-mint-an-nft/index.md`, and `reverse-engineering-a-contract/index.md`.

The English source uses `\<1` as a valid MDX backslash escape for `<` before digits. After running the sanitizer, translations gained the extra `&lt;` entity, producing a double-escape.

## Root Cause Analysis

The `escapeMdxAngleBrackets` function in `src/scripts/i18n/post_import_sanitize.ts` (line 1558) used the following regex:

```typescript
// BUGGY:
parts[i] = parts[i].replace(/(?<!&lt|&)<(\d)/g, (_, digit) => {
  fixCount++
  return `&lt;${digit}`
})
```

The negative lookbehind `(?<!&lt|&)` excluded:
- `&lt` -- already HTML-entity-escaped (e.g., `&lt;1`)
- `&` -- ampersand prefix

It did **NOT** exclude `\` (backslash). When MDX content contained a valid backslash escape like `\<1 GB RAM`, the regex matched the `<` because `\` was not in the lookbehind. The replacement transformed `\<1` into `\&lt;1` -- a double-escape.

This bug was introduced because backslash-escaping of `<` is a less common MDX pattern than entity-escaping. The original regex only anticipated the two most common preceding characters (`&lt` and `&`) but missed the third valid escape prefix.

## Working Solution

### The Fix

**File:** `src/scripts/i18n/post_import_sanitize.ts` (line 1558)

Added `\\` to the negative lookbehind:

```typescript
// BEFORE (buggy):
parts[i] = parts[i].replace(/(?<!&lt|&)<(\d)/g, (_, digit) => {

// AFTER (fixed):
parts[i] = parts[i].replace(/(?<!&lt|&|\\)<(\d)/g, (_, digit) => {
```

The `\\` in the regex source represents a literal `\` character in the lookbehind, so the pattern now reads: "match `<` followed by a digit, but NOT if preceded by `&lt`, `&`, or `\`".
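
The effect of the extra lookbehind alternative can be reproduced outside the sanitizer. The sketch below is a Python port for illustration only, not the TypeScript implementation. Python's `re` module requires fixed-width lookbehinds, so each alternative is written as its own `(?<!...)` assertion; `escape_angle_brackets` and the two pattern names are hypothetical.

```python
import re

# Port of the buggy pattern: skips "<" only after "&lt" or "&".
BUGGY_RE = re.compile(r'(?<!&lt)(?<!&)<(\d)')
# Port of the fixed pattern: additionally skips "<" after a backslash.
FIXED_RE = re.compile(r'(?<!&lt)(?<!&)(?<!\\)<(\d)')


def escape_angle_brackets(text, pattern):
    """Escape "<" before a digit; return (new_text, fix_count)."""
    return pattern.subn(lambda m: f"&lt;{m.group(1)}", text)


sample = r"devices (\<1 GB RAM, \<100 MB disk space)"
print(escape_angle_brackets(sample, BUGGY_RE))  # both escapes get double-encoded
print(escape_angle_brackets(sample, FIXED_RE))  # text unchanged, zero fixes
```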

### Tests Added

Two new unit tests in `tests/unit/sanitizer/standalone-fixes.spec.ts`:

```typescript
test("does not escape < that is already backslash-escaped", () => {
  const input =
    "Accessible to resource-constrained devices (\\<1 GB RAM, \\<100 MB disk space, 1 CPU)"
  const { content, fixCount } = escapeMdxAngleBrackets(input)
  expect(content).toBe(input)
  expect(fixCount).toBe(0)
})

test("does not escape backslash-escaped < before single digit", () => {
  const input = "do the same in \\<10 minutes"
  const { content, fixCount } = escapeMdxAngleBrackets(input)
  expect(content).toBe(input)
  expect(fixCount).toBe(0)
})
```

All 131 tests pass (129 existing + 2 new).

### Translation Files Repaired

Reverted `\&lt;` back to `\<` in 7 files across 4 languages:

| Language | File |
|----------|------|
| cs | `developers/tutorials/how-to-mint-an-nft/index.md` |
| fr | `developers/tutorials/how-to-mint-an-nft/index.md` |
| ru | `developers/docs/networking-layer/portal-network/index.md` |
| ru | `developers/tutorials/reverse-engineering-a-contract/index.md` |
| vi | `developers/docs/networking-layer/portal-network/index.md` |
| vi | `developers/tutorials/how-to-mint-an-nft/index.md` |
| vi | `developers/tutorials/reverse-engineering-a-contract/index.md` |

## Prevention Strategies

### 1. Lookbehind Completeness Checklist

For every negative lookbehind `(?<!...)` in the sanitizer, verify coverage of **all** escape character families:
- `\` (backslash -- markdown/MDX escape)
- `&` and `&lt;`, `&amp;`, `&#` (HTML entities)
- `` ` `` (backtick -- code context)

### 2. Edge Case Test Matrix

Test these input patterns for any angle bracket escaping function:

| Input | Expected | Risk |
|-------|----------|------|
| `\<1` | unchanged | Backslash escape |
| `&lt;1` | unchanged | Entity escape |
| `&<1` | unchanged | Ampersand prefix |
| `<1` | `&lt;1` | Bare angle bracket |
| `\<10\<20` | unchanged | Multiple escapes |
| `\\<1` | `\\&lt;1` | Double backslash (literal `\` + bare `<`) |
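
The matrix lends itself to a table-driven check. One caveat worth verifying against the real sanitizer: a single-character `\` lookbehind treats `\\<1` as escaped and leaves it unchanged, so satisfying the last row requires counting the backslashes that precede `<`. The sketch below is a hypothetical Python illustration of that parity rule, not the sanitizer's implementation.

```python
import re

# Capture any run of backslashes directly before "<digit".
CANDIDATE_RE = re.compile(r'(\\*)<(\d)')


def escape_angle_brackets(text):
    """Escape bare "<" before a digit, honoring backslash parity."""
    def replace(match):
        backslashes, digit = match.groups()
        preceded_by_entity = text[:match.start()].endswith(("&", "&lt"))
        # An odd number of backslashes means the "<" itself is escaped.
        if len(backslashes) % 2 == 1 or (not backslashes and preceded_by_entity):
            return match.group(0)
        return f"{backslashes}&lt;{digit}"

    return CANDIDATE_RE.sub(replace, text)


# (input, expected) pairs copied from the matrix above.
MATRIX = [
    (r"\<1", r"\<1"),
    ("&lt;1", "&lt;1"),
    ("&<1", "&<1"),
    ("<1", "&lt;1"),
    (r"\<10\<20", r"\<10\<20"),
    (r"\\<1", r"\\&lt;1"),
]

for given, expected in MATRIX:
    assert escape_angle_brackets(given) == expected, given
```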

### 3. Process Improvements

- **Dry-run diff viewer:** Run sanitizer in dry-run mode before committing, showing before/after for every file so reviewers can spot double-escaping visually.
- **Regex audit:** Periodically grep for `(?<!` in sanitizer files and verify each lookbehind covers backslash escapes.
- **Cross-language regression:** When a pattern is found in one language, scan all other languages for the same pattern before closing the issue.

## Related Documentation

- [Crowdin Translation Sanitizer MDX Fence Bugs](../build-errors/crowdin-translation-sanitizer-mdx-fence-bugs.md) -- Patterns 12-15 covering `escapeMdxAngleBrackets` bugs
- [Post-Import Sanitizer Regex Bugs: Whitespace Handling](./post-import-sanitizer-regex-bugs-whitespace-handling.md) -- Sibling regex bugs in the same sanitizer
- [Sanitizer Test Research](../integration-issues/sanitizer-test-research.md) -- Comprehensive pattern catalog (Patterns 1-16)
- [Known Patterns](~/.claude/translation-review/known-patterns.md) -- Pattern 6: Double-Escaping in MDX

## Commits

- `e6fa15813e` -- fix(i18n): fix backslash-escape double-encoding (regex fix + 2 tests + 7 file repairs)
- `dd44c06a36` -- fix(i18n): review vi translations PR #17176 (bulk Vietnamese translation fixes)