[FOR DISCUSSION] feat: add automated detection of translated URL paths#16939

Closed
minimalsm wants to merge 4 commits into dev from feat/url-validation-automation

@minimalsm
Contributor

@minimalsm minimalsm commented Dec 18, 2025

Summary

Add validation script and CI workflow to detect when translators accidentally translate URL paths instead of just content text.

This prevents 404 errors caused by translated URLs like:

  • /abstecken/ instead of /staking/ (German)
  • /gluais/ instead of /glossary/ (Irish)
  • /decentrale-identiteit/ instead of /decentralized-identity/ (Dutch)
  • /ciseal-2/ instead of /layer-2/ (Irish)

Features

  • Validation script (src/scripts/validateTranslatedUrls.ts):

    • Scans translation markdown and JSON files for internal links
    • Validates links against known valid English paths
    • Uses fuzzy matching (Levenshtein distance) to suggest corrections
    • Supports --fix flag for auto-correction of high-confidence matches (≥70%)
    • Supports --json flag for CI-friendly output
  • CI workflow (.github/workflows/validate-translations.yml):

    • Runs on PRs touching public/content/translations/** or src/intl/**
    • Runs after Crowdin CI workflow completes
    • Fails build if translated URLs are detected
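
The fuzzy-matching step described above can be sketched as follows. This is a minimal illustration, not the code in src/scripts/validateTranslatedUrls.ts: the function names, the `Suggestion` shape, and the similarity formula (1 minus distance divided by the longer length) are assumptions. Only the use of Levenshtein distance and the 70% auto-fix threshold come from the description.

```typescript
// Hypothetical helpers for illustration; the PR's real implementation may differ.

/** Levenshtein distance via dynamic programming with two rolling rows. */
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j)
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i]
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost // substitution
      )
    }
    prev = curr
  }
  return prev[b.length]
}

/** Assumed similarity measure: 1 - distance / length of the longer string. */
function similarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length)
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest
}

interface Suggestion {
  path: string
  confidence: number
  autoFixable: boolean
}

/** Returns the closest known English path, or null if none are known. */
function suggestPath(
  link: string,
  validPaths: string[],
  threshold = 0.7 // the PR describes auto-fixing matches at >= 70%
): Suggestion | null {
  let best: Suggestion | null = null
  for (const path of validPaths) {
    const confidence = similarity(link.toLowerCase(), path.toLowerCase())
    if (best === null || confidence > best.confidence) {
      best = { path, confidence, autoFixable: confidence >= threshold }
    }
  }
  return best
}
```

Under these assumptions, `suggestPath("/Glossar/", ["/staking/", "/glossary/"])` picks `/glossary/` with a confidence of about 0.9. A fully translated path such as /abstecken/ scores much lower against /staking/, which would leave it for manual review.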

Usage

pnpm validate-urls          # Report errors
pnpm validate-urls --fix    # Auto-fix high-confidence errors
pnpm validate-urls --json   # Output as JSON

Current Findings

Running the script on the current codebase finds 69 errors (28 auto-fixable), including:

  • Case sensitivity issues: /Developers/Docs/... → /developers/docs/...
  • Translated glossary paths: /woordenlijst (Dutch), /Glossar (German), /glossario (Italian)
  • Translated roadmap paths: /routekaart/opschalen (Dutch), /treochlár/danksharding (Irish)
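
The case-sensitivity findings are the simplest category to detect, since lowercasing the link yields a known path. A minimal sketch, assuming valid English paths are stored in lowercase; `Finding` and `classifyLink` are hypothetical names, not taken from the PR:

```typescript
// Hypothetical classification of a single internal link.
type Finding =
  | { kind: "valid" }
  | { kind: "case-mismatch"; suggestion: string }
  | { kind: "unknown" }

function classifyLink(link: string, validPaths: ReadonlySet<string>): Finding {
  // Exact match: nothing to report.
  if (validPaths.has(link)) return { kind: "valid" }

  // Assumption: every valid English path is lowercase, so a link that only
  // differs in letter case can be corrected with full confidence.
  const lowered = link.toLowerCase()
  if (validPaths.has(lowered)) {
    return { kind: "case-mismatch", suggestion: lowered }
  }

  // Anything else (for example a translated slug) needs fuzzy matching or review.
  return { kind: "unknown" }
}
```

With a set containing `/staking/pools`, the link `/Staking/pools` is classified as a case mismatch, while `/woordenlijst` falls through to `unknown`.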

Example Fix PR

See PR #16940 for a demonstration of using pnpm validate-urls --fix to automatically correct these issues.

Test Plan

  • Script runs successfully on existing translations
  • Detects known translated URL patterns
  • Auto-fix works for high-confidence matches
  • CI workflow triggers correctly

minimalsm and others added 2 commits December 17, 2025 23:15
Translations should use English URL paths, not translated paths.

Fixed URLs:
- Irish (ga): /ciseal-2/ → /layer-2/, /pobal/deontais/ → /community/grants/,
  /gluais/#* → /glossary/#*, /treochlár/scálú/ → /roadmap/scaling/
- Dutch (nl): /decentrale-identiteit/ → /decentralized-identity/,
  /geschiedenis/#paris → /history/#paris
- Turkish (tr): /developers/docs/consensus-mekanizmalar/pow →
  /developers/docs/consensus-mechanisms/pow
- Hausa (ha): /kamus/#* → /glossary/#*

Fixes multiple 404 errors reported by AHREFS SEO audit.

Add validation script and CI workflow to detect when translators
accidentally translate URL paths instead of just content text.

Features:
- Scans translation markdown and JSON files for internal links
- Validates links against known valid English paths
- Uses fuzzy matching (Levenshtein) to suggest corrections
- Supports --fix flag for auto-correction of high-confidence matches
- Supports --json flag for CI-friendly output
- CI runs on PRs touching translations and after Crowdin imports

This prevents 404 errors caused by translated URLs like:
- /abstecken/ instead of /staking/ (German)
- /gluais/ instead of /glossary/ (Irish)
- /decentrale-identiteit/ instead of /decentralized-identity/ (Dutch)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@github-actions github-actions Bot added the labels content 🖋️ (This involves copy additions or edits), dependencies 📦 (Changes related to project dependencies), tooling 🔧 (Changes related to tooling of the project), and translation 🌍 (This is related to our Translation Program) Dec 18, 2025
minimalsm added a commit that referenced this pull request Dec 18, 2025
Applied automatic fixes using `pnpm validate-urls --fix`.

Fixes include:
- Case sensitivity: /Developers/Docs/... → /developers/docs/...
- Case sensitivity: /Staking/pools → /staking/pools
- Translated paths: Various localized URL paths corrected

Files fixed:
- de: scaling docs, standards docs, staking products
- es: smart contracts deploying, developers guide
- it: javascript programming docs
- ja: beacon chain, glossary
- ms: defi, staking
- nl: dencun roadmap

This PR demonstrates the automated URL validation in action.
See PR #16939 for the validation tool.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@minimalsm minimalsm changed the title from "feat: add automated detection of translated URL paths" to "[FOR DISCUSSION] feat: add automated detection of translated URL paths" Dec 18, 2025
@minimalsm
Contributor Author

Example of Script in Action

PR #16940 demonstrates the pnpm validate-urls --fix command automatically correcting 12 files across 6 languages.

Fixes applied include:

  • Case sensitivity: /Developers/Docs/Scaling/... → /developers/docs/scaling/...
  • Case sensitivity: /Staking/pools → /staking/pools
  • Various localized paths corrected in German, Spanish, Italian, Japanese, Malay, and Dutch translations

@netlify

netlify Bot commented Dec 18, 2025

Deploy Preview for ethereumorg ready!

Name Link
🔨 Latest commit b385a62
🔍 Latest deploy log https://app.netlify.com/projects/ethereumorg/deploys/6943d46310884e00085806c3
😎 Deploy Preview https://deploy-preview-16939--ethereumorg.netlify.app
Lighthouse
7 paths audited
Performance: 41 (🔴 down 11 from production)
Accessibility: 94 (no change from production)
Best Practices: 92 (🔴 down 8 from production)
SEO: 100 (🟢 up 1 from production)
PWA: 59 (no change from production)

Performance:
- Add incremental validation in CI (only validates changed files)
- Optimize fuzzy matching with candidate filtering (90-95% reduction)
- Full validation now runs in ~3s instead of estimated 10-30min
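
One way candidate filtering can cut the fuzzy-matching work is a length pre-filter. The commit does not say which filter it uses, so this is only a sketch under stated assumptions: similarity is defined as 1 minus distance divided by the longer length, and `filterCandidates` is a hypothetical name. It relies on the fact that the Levenshtein distance between two strings is never smaller than the difference of their lengths.

```typescript
// Hypothetical pre-filter: drop candidates that cannot reach the threshold.
function filterCandidates(
  link: string,
  validPaths: string[],
  threshold = 0.7
): string[] {
  return validPaths.filter((path) => {
    const longest = Math.max(link.length, path.length)
    // To reach `threshold`, the distance may be at most (1 - threshold) * longest.
    const maxDistance = (1 - threshold) * longest
    // The length difference is a lower bound on the Levenshtein distance,
    // so a candidate exceeding the budget can be skipped without computing it.
    return Math.abs(link.length - path.length) <= maxDistance
  })
}
```

Only the surviving candidates need the full distance computation. The bound applies to matches at or above the threshold, so lower-confidence suggestions would need a looser filter.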

Security:
- Add ReDoS protection with line length and match count limits
- Add path traversal protection with filename sanitization
- Add URL length limits to prevent memory exhaustion
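
The line-length, match-count, and URL-length limits could look roughly like the sketch below. The limit values, the regular expression, and the name `extractInternalLinks` are all assumptions made for illustration; the commit only states that such limits exist.

```typescript
// Assumed limits; the actual values used in the PR are not stated.
const MAX_LINE_LENGTH = 10000
const MAX_MATCHES_PER_LINE = 100
const MAX_URL_LENGTH = 2048

/** Collects internal markdown link targets such as "/staking/" from "[x](/staking/)". */
function extractInternalLinks(markdown: string): string[] {
  const links: string[] = []
  for (const line of markdown.split("\n")) {
    // Skip pathologically long lines instead of running the regex on them.
    if (line.length > MAX_LINE_LENGTH) continue

    // One negated character class, no nested quantifiers; "(?!\/)" skips
    // protocol-relative URLs like "//example.com".
    const pattern = /\]\((\/(?!\/)[^)\s]*)\)/g
    let count = 0
    let match: RegExpExecArray | null
    while ((match = pattern.exec(line)) !== null) {
      count += 1
      if (count > MAX_MATCHES_PER_LINE) break
      const url = match[1]
      if (url.length <= MAX_URL_LENGTH) links.push(url)
    }
  }
  return links
}
```

External links such as `[x](https://example.com)` are ignored because the captured target must start with a single `/`.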

Data Integrity:
- Add atomic file modifications with rollback on failure
- Fix string replacement to handle ALL occurrences (global regex)
- Add JSON syntax validation after fixes
- Escape special regex characters in URLs
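
The replacement-related items above (escaping regex characters, replacing every occurrence, re-validating JSON) can be sketched as below. Function names are hypothetical and the PR's actual code is not shown on this page. The sketch matches the bad URL as a plain substring; a real fixer would probably anchor the match on link syntax to avoid touching longer paths that merely contain it.

```typescript
/** Escapes characters that have special meaning in a regular expression. */
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")
}

/** Replaces every occurrence of badUrl in content with goodUrl. */
function replaceAllLinks(content: string, badUrl: string, goodUrl: string): string {
  const pattern = new RegExp(escapeRegExp(badUrl), "g")
  // A function replacer keeps any "$" in goodUrl from being read as a
  // replacement pattern such as "$&" or "$1".
  return content.replace(pattern, () => goodUrl)
}

/** Returns true if text still parses as JSON after a fix has been applied. */
function isValidJson(text: string): boolean {
  try {
    JSON.parse(text)
    return true
  } catch {
    return false
  }
}
```

For a JSON translation file, the fixed content would be checked with `isValidJson` before being written back, and the original kept if the check fails.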

Code Quality:
- Consolidate duplicate link extraction into single function
- Add workflow success check for Crowdin CI trigger
- Add 15-minute timeout to CI workflow
@github-actions
Contributor

github-actions Bot commented Feb 5, 2026

This issue is stale because it has been open 30 days with no activity.

@github-actions github-actions Bot added the Status: Stale label (This issue is stale because it has been open 30 days with no activity.) Feb 5, 2026
@wackerow
Member

Closing out in favor of the latest intl-pipeline

@wackerow wackerow closed this Apr 23, 2026
@github-actions github-actions Bot added the abandoned label (This has been abandoned or will not be implemented) Apr 23, 2026