feat(editor): broaden language detection in web code editor #37619
The CodeMirror language registry only ships a narrow set of extensions and filenames per language, so common config and DSL files (.gitconfig, Brewfile, Vagrantfile, Containerfile, Cargo.lock, *.gemspec, *.tcc, Snakefile, etc.) render as plain text in the file editor.

Pull authoritative extension/filename data from github-linguist via a new `make generate-codemirror-languages` script, write a curated subset to `assets/codemirror-languages.json`, and wire it into the editor as overrides on top of `@codemirror/language-data`. A small set of manual entries fills gaps that Linguist classifies under separate languages (.editorconfig, .gitconfig, .npmrc) or doesn't list at all (*.conf, Snakefile, Containerfile.*, Dockerfile.*, Makefile.am, BSDmakefile).

The derived data structures are memoised at module scope so the work runs once per page session, and the JSON moves into the dynamic `importCodemirror()` chunk so it doesn't bloat the entry bundle.

Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
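The module-scope memoisation described above could look roughly like the following. This is a minimal standalone sketch, not the PR's code: the `LanguageRule` shape, `getLanguageIndex`, and the sample rules are hypothetical names chosen for illustration, and the real module builds CodeMirror `LanguageDescription` objects rather than plain maps.

```typescript
// Hypothetical shape of one generated rule (assumed, not the PR's actual JSON schema).
type LanguageRule = {name: string; extensions: string[]; filenames: string[]};

type LanguageIndex = {
  byExtension: Map<string, string>;
  byFilename: Map<string, string>;
};

// Module-scope cache: populated on first use, reused for the rest of the page session.
let cachedIndex: LanguageIndex | undefined;
let buildCount = 0; // only present to demonstrate that the build runs once

function getLanguageIndex(rules: LanguageRule[]): LanguageIndex {
  if (cachedIndex) return cachedIndex;
  buildCount++;
  const byExtension = new Map<string, string>();
  const byFilename = new Map<string, string>();
  for (const rule of rules) {
    for (const ext of rule.extensions) {
      // The first language to claim an extension keeps it.
      if (!byExtension.has(ext)) byExtension.set(ext, rule.name);
    }
    for (const filename of rule.filenames) {
      if (!byFilename.has(filename)) byFilename.set(filename, rule.name);
    }
  }
  cachedIndex = {byExtension, byFilename};
  return cachedIndex;
}

// Illustrative sample data only.
const sampleRules: LanguageRule[] = [
  {name: 'Ruby', extensions: ['rb', 'gemspec'], filenames: ['Brewfile', 'Vagrantfile']},
  {name: 'TOML', extensions: ['toml'], filenames: ['Cargo.lock']},
];

const firstIndex = getLanguageIndex(sampleRules);
const secondIndex = getLanguageIndex(sampleRules); // served from the cache
```

Calling `getLanguageIndex` twice returns the same object, so the derived maps are only computed on the first editor initialisation.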
The dynamic `import(...json, {with: {type: 'json'}})` form left a
literal `?import` URL in the entry chunk under rolldown, which the
backend then served with the wrong MIME type. Static import bundles
the data inline correctly.
Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
61 of the 67 entries had identical keys and values; replace the verbose `Record<string, string>` with a list where bare strings imply no rename and tuples carry the few that do. Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
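A sketch of the compact list format this commit describes, assuming a bare string means "same name in both registries" and a tuple means `[linguistName, codemirrorName]`. The names `NameEntry`, `languageNames`, and `toRenameMap` are hypothetical, and the entries are illustrative rather than the actual list.

```typescript
// A bare string implies no rename; a tuple carries [linguistName, codemirrorName].
type NameEntry = string | [linguistName: string, codemirrorName: string];

// Illustrative entries only. 'Protocol Buffer' -> 'ProtoBuf' is the rename
// named in a later commit of this PR; the rest are placeholders.
const languageNames: NameEntry[] = [
  'Go',
  'Python',
  ['Protocol Buffer', 'ProtoBuf'],
];

// Expand the compact list into the verbose mapping it replaces.
function toRenameMap(entries: NameEntry[]): Map<string, string> {
  const map = new Map<string, string>();
  for (const entry of entries) {
    const from = typeof entry === 'string' ? entry : entry[0];
    const to = typeof entry === 'string' ? entry : entry[1];
    map.set(from, to);
  }
  return map;
}

const nameMap = toRenameMap(languageNames);
console.log(nameMap.get('Go')); // Go
console.log(nameMap.get('Protocol Buffer')); // ProtoBuf
```

The expanded map is equivalent to the old `Record<string, string>`, but the source list only spells out the names that actually differ.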
Drop the hand-curated 67-name list; intersect Linguist's full set against language-data's exported names instead. A small renames map covers the six places where the two registries disagree on naming. Adds 34 niche languages we hadn't curated (Cypher, Eiffel, Erlang, Mathematica, etc.) at +4 KB on the JSON. Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
wxiaoguang
left a comment
Better to have some tests to cover cm.language.LanguageDescription.matchFilename after loading custom rules.
- Move `no-restricted-globals`/`-properties` rules to the web_src block; the underlying restrictions only apply to browser code, so the eslint-disable on `fetch` in the build script can go away.
- Extract `buildLanguageDescriptions` so the assembled list is callable from tests, and add coverage for matchFilename across the extended Linguist + manual rules.

Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
Both points addressed in 1bff063.
Pull request overview
This PR improves CodeMirror syntax highlighting in Gitea’s web code editor by expanding language detection rules using a generated subset of GitHub Linguist’s languages.yml.
Changes:
- Add a generated `assets/codemirror-languages.json` file (and a generator script/Makefile target) to broaden filename/extension-to-language mapping.
- Refactor the editor module to build language descriptions from the generated dataset, while keeping custom entries (Markdown, Dockerfile, etc.).
- Add a Vitest test that validates extended filename matching behavior.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| web_src/js/modules/codeeditor/main.ts | Builds CodeMirror LanguageDescription list from generated Linguist-derived data and wires it into editor initialization. |
| web_src/js/modules/codeeditor/main.test.ts | Adds coverage for extended matchFilename language detection rules. |
| tools/generate-codemirror-languages.ts | Adds a Node script to generate the JSON mapping from Linguist languages.yml. |
| Makefile | Adds generate-codemirror-languages target to run the generator. |
| eslint.config.ts | Moves restricted-globals/properties rules into the web_src/**/* override. |
| assets/codemirror-languages.json | Adds the generated language mapping consumed by the editor. |
@wxiaoguang feature is right here I think
What's the difference between "enhancement" and "feature"? In my mind:
I’d say an enhancement is basically a feature applied to something that already exists.
So every enhancement is technically a feature, but not every feature is an enhancement.
Then why mix them together? If every enhancement is a feature, then why should "enhancement" exist? Either completely drop "enhancement", or distinguish it from "feature".
I thought I'd label some PRs for the old release system, because 1.27 will have a mix of non-conventional commits and so will be harder to sort through. For 1.28 we should build the release notes from the conventional prefix alone. And yes, I will apply Copilot feedback, stop pestering me about them 😆. I always resolve all reviews.
Pull these inherently ambiguous extensions into a single global set and apply during generation so no language claims them. Previously .inc was excluded from C++/PHP only, so HTML/Pascal/SQL still picked it up at runtime; now it falls through to plain text. Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
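As a rough sketch of the global exclusion described above: a single set of ambiguous extensions is applied to every language at generation time, so none of them can claim the extension at runtime. Only `.inc` is taken from the commit message; the type, function name, and sample extension lists are assumptions for illustration.

```typescript
type LinguistEntry = {name: string; extensions: string[]};

// '.inc' is the example from the commit message; the real set may contain more.
const ambiguousExtensions = new Set<string>(['.inc']);

// Strip ambiguous extensions from every language, not just selected ones.
function stripAmbiguous(entries: LinguistEntry[]): LinguistEntry[] {
  return entries.map((entry) => ({
    ...entry,
    extensions: entry.extensions.filter((ext) => !ambiguousExtensions.has(ext)),
  }));
}

// Illustrative input; the extension lists are abbreviated.
const rawEntries: LinguistEntry[] = [
  {name: 'PHP', extensions: ['.php', '.inc']},
  {name: 'Pascal', extensions: ['.pas', '.inc']},
  {name: 'SQL', extensions: ['.sql', '.inc']},
];

const cleanedEntries = stripAmbiguous(rawEntries);
const anyoneClaimsInc = cleanedEntries.some((entry) => entry.extensions.includes('.inc'));
console.log(anyoneClaimsInc); // false
```

Because no entry lists `.inc` after generation, a lookup for `foo.inc` finds no language and the editor falls back to plain text.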
Drop entries that are either no-ops in current Linguist data or redundant with the array-order resolution of overlapping extensions:
- JavaScript: 21 historical aliases that are all JS syntax anyway
- Shell .csh: Linguist moved the extension to Tcsh-only
- XML .tmpl: Linguist only assigns it to Go Template
- XML .jsproj: only XML claims it, and it really is XML/MSBuild
- XML .ts/.tsx: TypeScript/TSX win on alphabetical order anyway

Tests updated to assert .csproj, .jsproj highlight as XML and .spec still routes to RPM Spec.

Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
Build a name+alias lookup from @codemirror/language-data and resolve each Linguist name through it case-insensitively. That covers 5 of the previous 6 manual renames (COBOL, Diff, INI, Less, TeX); only Protocol Buffer → ProtoBuf needs a hand entry. Pulls in three more languages (JSON-LD, Q, plus a deduped JSON5→JSON merge) for free. Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
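The name+alias resolution could be sketched as below. This is a standalone model under stated assumptions: the `CmLanguage` entries are illustrative stand-ins for what `@codemirror/language-data` exports (their exact names and aliases are not verified here), and `buildLookup`/`resolveName` are hypothetical helper names.

```typescript
type CmLanguage = {name: string; alias: string[]};

// Illustrative stand-ins for language-data entries; names and aliases are assumed.
const cmLanguages: CmLanguage[] = [
  {name: 'LESS', alias: []},
  {name: 'diff', alias: []},
  {name: 'Properties files', alias: ['ini', 'properties']},
  {name: 'ProtoBuf', alias: ['protobuf']},
];

// The one hand-written rename the commit message says is still needed.
const manualRenames = new Map<string, string>([['Protocol Buffer', 'ProtoBuf']]);

// Index every name and alias in lower case, pointing at the canonical name.
function buildLookup(languages: CmLanguage[]): Map<string, string> {
  const lookup = new Map<string, string>();
  for (const lang of languages) {
    for (const key of [lang.name, ...lang.alias]) {
      const lower = key.toLowerCase();
      if (!lookup.has(lower)) lookup.set(lower, lang.name);
    }
  }
  return lookup;
}

// Resolve a Linguist name to a CodeMirror name, or undefined when unsupported.
function resolveName(linguistName: string, lookup: Map<string, string>): string | undefined {
  const candidate = manualRenames.get(linguistName) ?? linguistName;
  return lookup.get(candidate.toLowerCase());
}

const lookup = buildLookup(cmLanguages);
console.log(resolveName('Less', lookup)); // LESS
console.log(resolveName('INI', lookup)); // Properties files
console.log(resolveName('Protocol Buffer', lookup)); // ProtoBuf
```

Languages that resolve to `undefined` have no CodeMirror counterpart and would simply be left out of the generated JSON.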
- Move manualFilenames/manualExtensions/handledByCustomEntry from the runtime into the script as extraFilenames/extraExtensions/skipNames. Re-running `make generate-codemirror-languages` is now the sole update path; the runtime layer is purely data-driven.
- Drop the linguistLanguages.find() lookups for Markdown/Dockerfile and the dead fallback chains; those names no longer appear in the JSON, so the runtime hardcodes their custom extensions directly.
- Inline main() into top-level await in the script and drop the try/catch + process.exit dance; Node already exits 1 on unhandled top-level rejection.
- Drop redundant Array.from(new Set(...)) wrappers; Linguist input is already deduped.
- Reduce the test from ~70 expects to one representative per routing decision (filename regex, ext lookup, custom override, ambiguous fallthrough, script-side extras, excludeExt redirect).

Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
Three changes from review feedback:
1. The generator was letting Linguist's wider extension lists override CodeMirror's curated ownership. PLSQL's Linguist extensions include `.sql`, and after alphabetical sort PLSQL preceded SQL in the array, so `.sql` files routed to PLSQL highlighting. Build a cmOriginalExtOwner map from language-data and drop any extension from a Linguist entry whose CM original owner is a different language. Test assertions added for `.sql`, `.h`, `.mm`.
2. Switch the `assets/codemirror-languages.json` import from static to the dynamic `importCodemirror()` Promise.all so the 14 KB no longer ships in the entry chunk.
3. Whitelist `/assets/codemirror-languages.json` in the dev-mode vite proxy. Same workaround already exists for `/assets/emoji.json`; without it the dynamic JSON import 404s in dev because Gitea's router intercepts `/assets/*` before vite gets a chance.

Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
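The ownership filter from the first item can be modelled as follows. This is a simplified sketch, not the generator's code: `cmOriginalExtOwner` is the map name used in the commit message, while `buildExtOwner`, `dropForeignExtensions`, `firstMatch`, and the abbreviated extension lists are assumptions made for illustration.

```typescript
type Lang = {name: string; extensions: string[]};

// Abbreviated stand-in for CodeMirror's curated extension lists (assumed data).
const cmOriginal: Lang[] = [
  {name: 'PLSQL', extensions: ['pls']},
  {name: 'SQL', extensions: ['sql']},
];

// Abbreviated stand-in for Linguist data, already sorted alphabetically.
const linguistEntries: Lang[] = [
  {name: 'PLSQL', extensions: ['pls', 'sql', 'pkb']},
  {name: 'SQL', extensions: ['sql', 'ddl']},
];

// Map each extension to the language that owns it in CodeMirror's own data.
function buildExtOwner(langs: Lang[]): Map<string, string> {
  const owner = new Map<string, string>();
  for (const lang of langs) {
    for (const ext of lang.extensions) {
      if (!owner.has(ext)) owner.set(ext, lang.name);
    }
  }
  return owner;
}

// Drop any extension whose original CodeMirror owner is a different language.
function dropForeignExtensions(entries: Lang[], owner: Map<string, string>): Lang[] {
  return entries.map((entry) => ({
    ...entry,
    extensions: entry.extensions.filter((ext) => {
      const original = owner.get(ext);
      return original === undefined || original === entry.name;
    }),
  }));
}

// Array-order resolution: the first entry listing the extension wins.
function firstMatch(entries: Lang[], ext: string): string | undefined {
  return entries.find((entry) => entry.extensions.includes(ext))?.name;
}

const cmOriginalExtOwner = buildExtOwner(cmOriginal);
const filteredEntries = dropForeignExtensions(linguistEntries, cmOriginalExtOwner);

console.log(firstMatch(linguistEntries, 'sql')); // PLSQL (the bug)
console.log(firstMatch(filteredEntries, 'sql')); // SQL (after filtering)
```

Extensions that CodeMirror never assigned to anyone, such as `pkb` in this sample, pass through untouched, so Linguist can still widen coverage without stealing extensions that already have an owner.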
A few tweaks pushed. The JSON file is now dynamically imported only when needed, which required an exception. The "BAD DESIGN" in vitedev.go could be fixed by renaming the repo
Signed-off-by: silverwind <me@silverwind.io>
Still the question: #37619 (comment)
an enhancement can be seen as a feature imho.
So:
Because next changelog should make use of
need some consensus about labels first before we touch them
Maybe it needs to get some consensus now. Otherwise, there will be a mess when new PRs still have inconsistent labels.
The question is what is the difference between an enhancement and a feature?
I think I have been clear:
* main:
  fix(deps): update dependency mermaid to v11.15.0 [security], add e2e test (go-gitea#37662)
  ci: Also lint json5 files (go-gitea#37659)
  fix(templates): avoid misleading compare message when branches lack merge base (go-gitea#37651)
  fix(deps): update npm dependencies (go-gitea#37647)
  refactor: routing info middleware (go-gitea#37653)
  chore(deps): update action dependencies (major) (go-gitea#37638)
  fix(deps): update go dependencies (major) (go-gitea#37639)
  ci(renovate): update Go import paths on major bumps (go-gitea#37641)
  fix(packages): Add label for private and internal package and fix composor package source permission check (go-gitea#37610)
  refactor: replace Fomantic search module with first-party code (go-gitea#37443)
  fix(deps): update npm dependencies (go-gitea#37636)
  fix(deps): update module code.gitea.io/sdk/gitea to v0.25.0 (go-gitea#37637)
  feat(api): add last_sync to repository API (go-gitea#37566)
  test(e2e): run playwright via container (go-gitea#37300)
  feat(editor): broaden language detection in web code editor (go-gitea#37619)
  refactor(log): replace log.Critical with log.Error (go-gitea#37624)
  fix: "run as root" check (go-gitea#37622)
  fix: improve actions status icons and texts (go-gitea#37206)

# Conflicts:
#	pnpm-workspace.yaml


Use https://github.com/github-linguist/linguist/blob/main/lib/linguist/languages.yml to substantially improve syntax highlighting in CodeMirror. The file is generated on demand only.
This PR was written with the help of Claude Opus 4.7