
feat(editor): broaden language detection in web code editor #37619

Merged
silverwind merged 13 commits into go-gitea:main from silverwind:codeeditor-language-detection
May 10, 2026

Conversation

@silverwind
Member

@silverwind silverwind commented May 9, 2026

Use https://github.com/github-linguist/linguist/blob/main/lib/linguist/languages.yml to substantially improve syntax highlighting in CodeMirror. The file is generated on demand only.


This PR was written with the help of Claude Opus 4.7

silverwind and others added 2 commits May 9, 2026 06:44
The CodeMirror language registry only ships a narrow set of extensions
and filenames per language, so common config and DSL files (.gitconfig,
Brewfile, Vagrantfile, Containerfile, Cargo.lock, *.gemspec, *.tcc,
Snakefile, etc.) render as plain text in the file editor.

Pull authoritative extension/filename data from github-linguist via a
new `make generate-codemirror-languages` script, write a curated subset
to `assets/codemirror-languages.json`, and wire it into the editor as
overrides on top of `@codemirror/language-data`. A small set of manual
entries fills gaps Linguist classifies under separate languages
(.editorconfig, .gitconfig, .npmrc) or doesn't list at all (*.conf,
Snakefile, Containerfile.*, Dockerfile.*, Makefile.am, BSDmakefile).

The derived data structures are memoised at module scope so the work
runs once per page session, and the JSON moves into the dynamic
`importCodemirror()` chunk so it doesn't bloat the entry bundle.

Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
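
A minimal sketch of the override-plus-memoisation idea from the commit message above, assuming a simplified entry shape. `LangEntry`, `baseEntries`, `overrides`, `buildCount` and `getMergedEntries` are hypothetical names for illustration; the actual code works with `LanguageDescription` objects from `@codemirror/language-data`, and the sample data is illustrative only.

```typescript
// Hypothetical, simplified registry entry; not the real LanguageDescription type.
interface LangEntry {
  name: string;
  extensions: string[];
  filenames: string[];
}

// Stand-in for the narrow data that ships with the base registry.
const baseEntries: LangEntry[] = [
  {name: 'Ruby', extensions: ['rb'], filenames: []},
  {name: 'TOML', extensions: ['toml'], filenames: []},
];

// Stand-in for the generated JSON: wider lists keyed by language name.
const overrides: Record<string, {extensions?: string[]; filenames?: string[]}> = {
  Ruby: {extensions: ['rb', 'gemspec'], filenames: ['Brewfile', 'Vagrantfile']},
  TOML: {extensions: ['toml'], filenames: ['Cargo.lock']},
};

let buildCount = 0; // only here to make the memoisation observable
let cached: LangEntry[] | undefined; // module-scope cache, filled on first use

function getMergedEntries(): LangEntry[] {
  if (cached) return cached;
  buildCount++;
  cached = baseEntries.map((entry) => {
    const extra = overrides[entry.name];
    if (!extra) return entry; // no override: keep the base entry untouched
    return {
      name: entry.name,
      extensions: extra.extensions ?? entry.extensions,
      filenames: extra.filenames ?? entry.filenames,
    };
  });
  return cached;
}
```

Repeated calls to `getMergedEntries()` return the same array, so the merge work is paid once per page session.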
The dynamic `import(...json, {with: {type: 'json'}})` form left a
literal `?import` URL in the entry chunk under rolldown, which the
backend then served with the wrong MIME type. Static import bundles
the data inline correctly.

Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label May 9, 2026
silverwind and others added 3 commits May 9, 2026 06:49
Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
61 of the 67 entries had identical keys and values; replace the
verbose `Record<string, string>` with a list where bare strings imply
no rename and tuples carry the few that do.

Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
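
The compact list format described above could look roughly like this. It is a sketch only: `NameEntry`, `sampleNames` and `toRenameMap` are hypothetical names, and the entries are illustrative rather than the list from the PR.

```typescript
// A bare string means "same name in both registries";
// a tuple is [linguistName, codemirrorName].
type NameEntry = string | [string, string];

// Illustrative entries only.
const sampleNames: NameEntry[] = ['Go', 'Python', ['Protocol Buffer', 'ProtoBuf']];

// Expand the compact list back into an explicit source-to-target map.
function toRenameMap(entries: NameEntry[]): Map<string, string> {
  const map = new Map<string, string>();
  for (const entry of entries) {
    if (typeof entry === 'string') {
      map.set(entry, entry); // no rename
    } else {
      map.set(entry[0], entry[1]); // explicit rename
    }
  }
  return map;
}
```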
Drop the hand-curated 67-name list; intersect Linguist's full set
against language-data's exported names instead. A small renames map
covers the six places where the two registries disagree on naming.
Adds 34 niche languages we hadn't curated (Cypher, Eiffel, Erlang,
Mathematica, etc.) at +4 KB on the JSON.

Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
@silverwind silverwind added the type/enhancement An improvement of existing functionality label May 9, 2026
Comment thread tools/generate-codemirror-languages.ts Outdated
Contributor

@wxiaoguang wxiaoguang left a comment

Better to have some tests to cover cm.language.LanguageDescription.matchFilename after loading custom rules.

@GiteaBot GiteaBot added lgtm/need 1 This PR needs approval from one additional maintainer to be merged. and removed lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. labels May 9, 2026
- Move `no-restricted-globals`/`-properties` rules to the web_src
  block; the underlying restrictions only apply to browser code, so
  the eslint-disable on `fetch` in the build script can go away.
- Extract `buildLanguageDescriptions` so the assembled list is
  callable from tests, and add coverage for matchFilename across the
  extended Linguist + manual rules.

Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
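
To illustrate what such tests exercise, here is a self-contained stand-in for filename matching. It assumes filename patterns are tried before extensions and that the first matching entry wins; `Desc`, `matchFilenameSketch` and `sampleDescs` are hypothetical and do not come from the PR or from `@codemirror/language`.

```typescript
// Simplified description: a name, a list of extensions, an optional filename pattern.
interface Desc {
  name: string;
  extensions: string[];
  filename?: RegExp;
}

function matchFilenameSketch(descs: Desc[], filename: string): Desc | null {
  // Pass 1: whole-filename patterns.
  for (const d of descs) {
    if (d.filename && d.filename.test(filename)) return d;
  }
  // Pass 2: the text after the last dot, first owner in array order wins.
  const m = /\.([^.]+)$/.exec(filename);
  const ext = m ? m[1] : undefined;
  if (ext !== undefined) {
    for (const d of descs) {
      if (d.extensions.indexOf(ext) > -1) return d;
    }
  }
  return null; // falls through to plain text
}

// Illustrative data only.
const sampleDescs: Desc[] = [
  {name: 'Dockerfile', extensions: [], filename: /^(Dockerfile|Containerfile)(\..+)?$/},
  {name: 'XML', extensions: ['xml', 'csproj', 'jsproj']},
  {name: 'RPM Spec', extensions: ['spec']},
];
```

With this sample data, `Dockerfile.dev` resolves through the filename pattern, `app.csproj` through the extension list, and a name with no known extension returns `null`.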
@silverwind
Member Author

Both points addressed in 1bff063.

Contributor

Copilot AI left a comment

Pull request overview

This PR improves CodeMirror syntax highlighting in Gitea’s web code editor by expanding language detection rules using a generated subset of GitHub Linguist’s languages.yml.

Changes:

  • Add a generated assets/codemirror-languages.json file (and a generator script/Makefile target) to broaden filename/extension-to-language mapping.
  • Refactor the editor module to build language descriptions from the generated dataset, while keeping custom entries (Markdown, Dockerfile, etc.).
  • Add a Vitest test that validates extended filename matching behavior.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
| File | Description |
| --- | --- |
| web_src/js/modules/codeeditor/main.ts | Builds CodeMirror LanguageDescription list from generated Linguist-derived data and wires it into editor initialization. |
| web_src/js/modules/codeeditor/main.test.ts | Adds coverage for extended matchFilename language detection rules. |
| tools/generate-codemirror-languages.ts | Adds a Node script to generate the JSON mapping from Linguist languages.yml. |
| Makefile | Adds generate-codemirror-languages target to run the generator. |
| eslint.config.ts | Moves restricted-globals/properties rules into the web_src/**/* override. |
| assets/codemirror-languages.json | Adds the generated language mapping consumed by the editor. |

Comment thread tools/generate-codemirror-languages.ts
Comment thread tools/generate-codemirror-languages.ts Outdated
@bircni
Member

bircni commented May 9, 2026

@wxiaoguang feature is right here I think

@wxiaoguang
Contributor

@wxiaoguang feature is right here I think

What's the difference between "enhancement" and "feature"?

In my mind:

  • feature: add something that didn't exist, e.g.: there was no code highlighting, now code highlighting is supported.
  • enhancement: focus on improving something which already exists, e.g.: add more highlighting support.

@bircni
Member

bircni commented May 9, 2026

What's the difference between "enhancement" and "feature"?

In my mind:

  • feature: add something that didn't exist, e.g.: there was no code highlighting, now code highlighting is supported.
  • enhancement: focus on improving something which already exists, e.g.: add more highlighting support.

I’d say an enhancement is basically a feature applied to something existing already.

  • feature = new capability
  • enhancement = extends/improves an existing capability

So every enhancement is technically a feature, but not every feature is an enhancement.

Comment thread tools/generate-codemirror-languages.ts
@GiteaBot GiteaBot added lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. and removed lgtm/need 1 This PR needs approval from one additional maintainer to be merged. labels May 9, 2026
@wxiaoguang
Contributor

wxiaoguang commented May 9, 2026

So every enhancement is technically a feature, but not every feature is an enhancement.

Then why mix them together? If every enhancement is a feature, then why should "enhancement" exist?

Either: completely drop "enhancement", or distinguish it from "feature".

@silverwind
Member Author

silverwind commented May 9, 2026

I thought I'd label some PRs for the old release system, because 1.27 will have a mix of non-conventional commits and so will be harder to sort through. For 1.28 we should build the release notes from the conventional prefix alone.

And yes, I will apply the Copilot feedback, stop pestering me about it 😆. I always resolve all reviews.

@silverwind silverwind marked this pull request as draft May 9, 2026 12:16
silverwind and others added 3 commits May 10, 2026 01:01
Pull these inherently ambiguous extensions into a single global set
and apply during generation so no language claims them. Previously
.inc was excluded from C++/PHP only, so HTML/Pascal/SQL still picked
it up at runtime; now it falls through to plain text.

Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
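
A rough sketch of the global exclusion described above, under assumed data shapes. `ambiguousExtensions`, `LinguistEntry`, `stripAmbiguous` and `ambiguousInput` are hypothetical names. Only `.inc` is named in the commit message as an ambiguous extension, and the per-language extension lists here are illustrative.

```typescript
// Extensions that no language is allowed to claim.
const ambiguousExtensions = new Set<string>(['inc']);

interface LinguistEntry {
  name: string;
  extensions: string[];
}

// Apply the exclusion to every language at generation time, not per language.
function stripAmbiguous(entries: LinguistEntry[]): LinguistEntry[] {
  return entries.map((e) => ({
    name: e.name,
    extensions: e.extensions.filter((ext) => !ambiguousExtensions.has(ext)),
  }));
}

// Illustrative input: several languages listing the same ambiguous extension.
const ambiguousInput: LinguistEntry[] = [
  {name: 'PHP', extensions: ['php', 'inc']},
  {name: 'Pascal', extensions: ['pas', 'inc']},
  {name: 'SQL', extensions: ['sql', 'inc']},
];
```

Because the filter runs over every entry, a file such as `foo.inc` ends up with no owner and falls through to plain text.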
Drop entries that are either no-ops in current Linguist data or
redundant with the array-order resolution of overlapping extensions:

- JavaScript: 21 historical aliases that are all JS syntax anyway
- Shell .csh: Linguist moved the extension to Tcsh-only
- XML .tmpl: Linguist only assigns it to Go Template
- XML .jsproj: only XML claims it, and it really is XML/MSBuild
- XML .ts/.tsx: TypeScript/TSX win on alphabetical order anyway

Tests updated to assert .csproj, .jsproj highlight as XML and .spec
still routes to RPM Spec.

Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
Build a name+alias lookup from @codemirror/language-data and resolve
each Linguist name through it case-insensitively. That covers 5 of
the previous 6 manual renames (COBOL, Diff, INI, Less, TeX); only
Protocol Buffer → ProtoBuf needs a hand entry. Pulls in three more
languages (JSON-LD, Q, plus a deduped JSON5→JSON merge) for free.

Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
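
The name resolution above might be sketched as follows. `CmLanguage`, `cmLanguages`, `buildNameLookup`, `manualRenames` and `resolveName` are hypothetical, and the sample names and aliases are stand-ins that may not match the real `@codemirror/language-data` entries exactly.

```typescript
interface CmLanguage {
  name: string;
  alias: string[];
}

// Illustrative stand-ins for registry entries.
const cmLanguages: CmLanguage[] = [
  {name: 'LESS', alias: []},
  {name: 'LaTeX', alias: ['tex']},
  {name: 'ProtoBuf', alias: []},
];

// Lower-cased name or alias -> canonical registry name.
function buildNameLookup(langs: CmLanguage[]): Map<string, string> {
  const lookup = new Map<string, string>();
  for (const lang of langs) {
    lookup.set(lang.name.toLowerCase(), lang.name);
    for (const alias of lang.alias) {
      const key = alias.toLowerCase();
      if (!lookup.has(key)) lookup.set(key, lang.name);
    }
  }
  return lookup;
}

// The one rename the commit message says still needs a hand entry.
const manualRenames = new Map<string, string>([['Protocol Buffer', 'ProtoBuf']]);

function resolveName(linguistName: string, lookup: Map<string, string>): string | undefined {
  const candidate = manualRenames.get(linguistName) ?? linguistName;
  return lookup.get(candidate.toLowerCase());
}
```

Names that resolve to `undefined` have no counterpart in the registry and would simply be left out of the generated data.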
silverwind and others added 2 commits May 10, 2026 06:02
- Move manualFilenames/manualExtensions/handledByCustomEntry from the
  runtime into the script as extraFilenames/extraExtensions/skipNames.
  Re-running `make generate-codemirror-languages` is now the sole
  update path; the runtime layer is purely data-driven.
- Drop the linguistLanguages.find() lookups for Markdown/Dockerfile
  and the dead fallback chains; those names no longer appear in the
  JSON, so the runtime hardcodes their custom extensions directly.
- Inline main() into top-level await in the script and drop the
  try/catch + process.exit dance — Node already exits 1 on unhandled
  top-level rejection.
- Drop redundant Array.from(new Set(...)) wrappers; Linguist input is
  already deduped.
- Reduce the test from ~70 expects to one representative per routing
  decision (filename regex, ext lookup, custom override, ambiguous
  fallthrough, script-side extras, excludeExt redirect).

Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
Three changes from review feedback:

1. The generator was letting Linguist's wider extension lists override
   CodeMirror's curated ownership. PLSQL's Linguist extensions include
   `.sql`, and after alphabetical sort PLSQL preceded SQL in the array,
   so `.sql` files routed to PLSQL highlighting. Build a
   cmOriginalExtOwner map from language-data and drop any extension
   from a Linguist entry whose CM original owner is a different
   language. Test assertions added for `.sql`, `.h`, `.mm`.

2. Switch the `assets/codemirror-languages.json` import from static to
   the dynamic `importCodemirror()` Promise.all so the 14 KB no longer
   ships in the entry chunk.

3. Whitelist `/assets/codemirror-languages.json` in the dev-mode vite
   proxy. Same workaround already exists for `/assets/emoji.json`;
   without it the dynamic JSON import 404s in dev because Gitea's
   router intercepts `/assets/*` before vite gets a chance.

Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
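
The ownership rule from point 1 can be sketched like this, with made-up data shapes. `ExtEntry`, `cmCurated`, `linguistWide`, `buildOriginalOwner` and `dropForeignExtensions` are hypothetical; only the `cmOriginalExtOwner` idea and the `.sql` case come from the commit message, and the other extensions are illustrative.

```typescript
interface ExtEntry {
  name: string;
  extensions: string[];
}

// Stand-in for CodeMirror's curated data: who originally owns each extension.
const cmCurated: ExtEntry[] = [
  {name: 'SQL', extensions: ['sql']},
  {name: 'PLSQL', extensions: ['pls']},
];

// Stand-in for Linguist's wider lists, sorted alphabetically (PLSQL before SQL).
const linguistWide: ExtEntry[] = [
  {name: 'PLSQL', extensions: ['pls', 'plsql', 'sql']},
  {name: 'SQL', extensions: ['sql', 'ddl']},
];

// extension -> name of the first curated language that lists it
function buildOriginalOwner(curated: ExtEntry[]): Map<string, string> {
  const owner = new Map<string, string>();
  for (const lang of curated) {
    for (const ext of lang.extensions) {
      if (!owner.has(ext)) owner.set(ext, lang.name);
    }
  }
  return owner;
}

// Keep an extension only if nobody owned it originally, or this language did.
function dropForeignExtensions(entries: ExtEntry[], owner: Map<string, string>): ExtEntry[] {
  return entries.map((e) => ({
    name: e.name,
    extensions: e.extensions.filter((ext) => {
      const original = owner.get(ext);
      return original === undefined || original === e.name;
    }),
  }));
}
```

With this data, PLSQL loses `sql` but keeps `pls` and `plsql`, so array order no longer decides who highlights `.sql` files.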
@silverwind silverwind marked this pull request as ready for review May 10, 2026 04:21
@silverwind
Member Author

A few tweaks pushed: the JSON file is now dynamically imported only when needed, which required an exception in vitedev.sh to work.

The "BAD DESIGN" in vitedev.go could be fixed by renaming the repo assets dir to something else. The problem is the name collision with the same-named web path, but I'll defer that to later.

Comment thread Makefile Outdated
Signed-off-by: silverwind <me@silverwind.io>
@silverwind silverwind enabled auto-merge (squash) May 10, 2026 04:25
@silverwind silverwind merged commit a615988 into go-gitea:main May 10, 2026
21 checks passed
@silverwind silverwind deleted the codeeditor-language-detection branch May 10, 2026 04:51
@GiteaBot GiteaBot added this to the 1.27.0 milestone May 10, 2026
@wxiaoguang
Contributor

Still the question: #37619 (comment)

Either: completely drop "enhancement", or distinguish it from "feature".

@silverwind silverwind removed the type/enhancement An improvement of existing functionality label May 10, 2026
@silverwind
Member Author

an enhancement can be seen as a feature imho.

@wxiaoguang
Contributor

an enhancement can be seen as a feature imho.

So:

  • why not just add "feature" label to the PR?
  • why not completely remove the "enhancement" label from the labels?

@silverwind
Member Author

why not just add "feature" label to the PR?

Because the next changelog should make use of the feat: prefix

why not completely remove the "enhancement" label from the labels?

need some consensus about labels first before we touch them

@wxiaoguang
Contributor

why not completely remove the "enhancement" label from the labels?

need some consensus about labels first before we touch them

Maybe it needs to get some consensus now.

Otherwise, there will be a mess when new PRs still have inconsistent labels.

@bircni
Member

bircni commented May 10, 2026

The question is what is the difference between an enhancement and a feature?
Is it really that easy to divide?

@wxiaoguang
Contributor

The question is what is the difference between an enhancement and a feature? Is it really that easy to divide?

I think I have been clear:

zjjhot added a commit to zjjhot/gitea that referenced this pull request May 12, 2026
* main:
  fix(deps): update dependency mermaid to v11.15.0 [security], add e2e test (go-gitea#37662)
  ci: Also lint json5 files (go-gitea#37659)
  fix(templates): avoid misleading compare message when branches lack merge base (go-gitea#37651)
  fix(deps): update npm dependencies (go-gitea#37647)
  refactor: routing info middleware (go-gitea#37653)
  chore(deps): update action dependencies (major) (go-gitea#37638)
  fix(deps): update go dependencies (major) (go-gitea#37639)
  ci(renovate): update Go import paths on major bumps (go-gitea#37641)
  fix(packages): Add label for private and internal package and fix composor package source permission check (go-gitea#37610)
  refactor: replace Fomantic search module with first-party code (go-gitea#37443)
  fix(deps): update npm dependencies (go-gitea#37636)
  fix(deps): update module code.gitea.io/sdk/gitea to v0.25.0 (go-gitea#37637)
  feat(api): add last_sync to repository API (go-gitea#37566)
  test(e2e): run playwright via container (go-gitea#37300)
  feat(editor): broaden language detection in web code editor (go-gitea#37619)
  refactor(log): replace log.Critical with log.Error (go-gitea#37624)
  fix: "run as root" check (go-gitea#37622)
  fix: improve actions status icons and texts (go-gitea#37206)

# Conflicts:
#	pnpm-workspace.yaml

Labels

lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. topic/code-linting

5 participants