[FOR DISCUSSION] feat: add automated detection of translated URL paths #16939
Closed
minimalsm wants to merge 4 commits into
Translations should use English URL paths, not translated paths.

Fixed URLs:
- Irish (ga): /ciseal-2/ → /layer-2/, /pobal/deontais/ → /community/grants/, /gluais/#* → /glossary/#*, /treochlár/scálú/ → /roadmap/scaling/
- Dutch (nl): /decentrale-identiteit/ → /decentralized-identity/, /geschiedenis/#paris → /history/#paris
- Turkish (tr): /developers/docs/consensus-mekanizmalar/pow → /developers/docs/consensus-mechanisms/pow
- Hausa (ha): /kamus/#* → /glossary/#*

Fixes multiple 404 errors reported by AHREFS SEO audit.
Add validation script and CI workflow to detect when translators accidentally translate URL paths instead of just content text.

Features:
- Scans translation markdown and JSON files for internal links
- Validates links against known valid English paths
- Uses fuzzy matching (Levenshtein) to suggest corrections
- Supports --fix flag for auto-correction of high-confidence matches
- Supports --json flag for CI-friendly output
- CI runs on PRs touching translations and after Crowdin imports

This prevents 404 errors caused by translated URLs like:
- /abstecken/ instead of /staking/ (German)
- /gluais/ instead of /glossary/ (Irish)
- /decentrale-identiteit/ instead of /decentralized-identity/ (Dutch)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
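
For readers unfamiliar with the fuzzy-matching feature listed in this commit, here is a minimal sketch of how a Levenshtein-based suggestion could work. It is not the code from `validateTranslatedUrls.ts`: the names `levenshtein`, `similarity`, `suggestPath`, and `Suggestion` are hypothetical, and the normalization (one minus distance over the longer length) is an assumption about how a percentage score might be derived.

```typescript
// Illustrative sketch only; all names here are hypothetical.

// Classic dynamic-programming Levenshtein distance, keeping two rows.
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumed normalization: 1 means identical, values near 0 mean unrelated.
function similarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

interface Suggestion {
  path: string;
  score: number;
}

// Returns the closest known English path, or null if the list is empty.
function suggestPath(link: string, validPaths: string[]): Suggestion | null {
  let best: Suggestion | null = null;
  for (const path of validPaths) {
    const score = similarity(link.toLowerCase(), path.toLowerCase());
    if (best === null || score > best.score) {
      best = { path, score };
    }
  }
  return best;
}
```

A caller would compare `score` against a confidence threshold before auto-correcting. Edit distance alone may miss translations that share few characters with the English path, so low-scoring suggestions are better surfaced for manual review than applied automatically.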
minimalsm added a commit that referenced this pull request on Dec 18, 2025
Applied automatic fixes using `pnpm validate-urls --fix`.

Fixes include:
- Case sensitivity: /Developers/Docs/... → /developers/docs/...
- Case sensitivity: /Staking/pools → /staking/pools
- Translated paths: Various localized URL paths corrected

Files fixed:
- de: scaling docs, standards docs, staking products
- es: smart contracts deploying, developers guide
- it: javascript programming docs
- ja: beacon chain, glossary
- ms: defi, staking
- nl: dencun roadmap

This PR demonstrates the automated URL validation in action. See PR #16939 for the validation tool.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
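
The case-sensitivity fixes in this commit are the simplest class to detect, since the link matches a known path once both are lowercased. A rough sketch of that check follows, assuming the valid English paths are already available as a list. `buildCaseIndex` and `fixCase` are hypothetical names, not functions from the actual script.

```typescript
// Illustrative sketch only; names are hypothetical.

// Map each lowercased valid path to its canonical spelling.
function buildCaseIndex(validPaths: string[]): Map<string, string> {
  const index = new Map<string, string>();
  for (const path of validPaths) {
    index.set(path.toLowerCase(), path);
  }
  return index;
}

// Returns the canonical path when the link differs from it only by case,
// or null when the link is already correct or is not a known path at all.
function fixCase(link: string, index: Map<string, string>): string | null {
  const canonical = index.get(link.toLowerCase());
  return canonical !== undefined && canonical !== link ? canonical : null;
}
```

Because a case-only mismatch has exactly one candidate, it could be corrected with full confidence, unlike fuzzy matches on translated paths.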
Example of Script in Action
PR #16940 demonstrates the fixes applied.
Performance:
- Add incremental validation in CI (only validates changed files)
- Optimize fuzzy matching with candidate filtering (90-95% reduction)
- Full validation now runs in ~3s instead of estimated 10-30min

Security:
- Add ReDoS protection with line length and match count limits
- Add path traversal protection with filename sanitization
- Add URL length limits to prevent memory exhaustion

Data Integrity:
- Add atomic file modifications with rollback on failure
- Fix string replacement to handle ALL occurrences (global regex)
- Add JSON syntax validation after fixes
- Escape special regex characters in URLs

Code Quality:
- Consolidate duplicate link extraction into single function
- Add workflow success check for Crowdin CI trigger
- Add 15-minute timeout to CI workflow
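
Three of the data-integrity items above (global replacement, regex escaping, and JSON validation after fixes) fit together naturally. The sketch below shows one way they might be combined. It is an assumption-laden illustration rather than the PR's code: `escapeRegExp`, `replaceAllUrls`, and `isValidJson` are hypothetical helper names.

```typescript
// Illustrative sketch only; names are hypothetical.

// Escape characters that have special meaning in a regular expression,
// so a URL such as "/a.b?c/" is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace every occurrence of badUrl, not just the first one.
function replaceAllUrls(content: string, badUrl: string, goodUrl: string): string {
  const pattern = new RegExp(escapeRegExp(badUrl), "g");
  // A function replacer keeps "$" sequences in goodUrl from being
  // interpreted as replacement patterns.
  return content.replace(pattern, () => goodUrl);
}

// Check that a fixed JSON file still parses before it is written back.
function isValidJson(text: string): boolean {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}
```

Note that this version replaces plain substrings. A stricter variant would match only inside link syntax, so that a path appearing as a prefix of a longer, valid path is left alone.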
This issue is stale because it has been open 30 days with no activity.
Closing out in favor of the latest intl-pipeline.

Summary
Add validation script and CI workflow to detect when translators accidentally translate URL paths instead of just content text.
This prevents 404 errors caused by translated URLs like:
- `/abstecken/` instead of `/staking/` (German)
- `/gluais/` instead of `/glossary/` (Irish)
- `/decentrale-identiteit/` instead of `/decentralized-identity/` (Dutch)
- `/ciseal-2/` instead of `/layer-2/` (Irish)

Features
Validation script (`src/scripts/validateTranslatedUrls.ts`):
- `--fix` flag for auto-correction of high-confidence matches (≥70%)
- `--json` flag for CI-friendly output

CI workflow (`.github/workflows/validate-translations.yml`):
- Runs on PRs touching `public/content/translations/**` or `src/intl/**`

Usage
Current Findings
Running the script on the current codebase finds 69 errors (28 auto-fixable), including:
- `/Developers/Docs/...` → `/developers/docs/...`
- `/woordenlijst` (Dutch), `/Glossar` (German), `/glossario` (Italian)
- `/routekaart/opschalen` (Dutch), `/treochlár/danksharding` (Irish)

Example Fix PR
See PR #16940 for a demonstration of using `pnpm validate-urls --fix` to automatically correct these issues.

Test Plan