AutoCorrect


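🎯 AutoCorrect's vision is to provide a standardized solution for correcting copy, one that can be applied in all kinds of scenarios (for example: writing books, documentation, content publishing, project source code...) and lets users easily produce and correct standardized, professional copy.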

AutoCorrect is a linter and formatter written in Rust. It either corrects copy automatically or checks it and suggests fixes: for text that mixes CJK (Chinese, Japanese, Korean) with English, it adds the correct spaces, corrects words, and tries to correct punctuation in a safe way.

Like ESLint, RuboCop, and gofmt, AutoCorrect can check source code and print the result as a colorized diff with suggested corrections. You can integrate it into CI (GitLab CI, GitHub Action, Travis CI...) so that problematic copy in a project is detected and a consistent style is enforced.

It supports many types of source files: it recognizes the file type from the file name and locates the strings and comments to correct.

The approach first appeared in 2013 in the Ruby China project, and its rules have been refined step by step since then. Accuracy is now high (with very few exceptional cases), so you can rely on it to assist with automatic correction.


Features

  • Add spacing between CJK (Chinese, Japanese, Korean) and English words.
  • Convert punctuation to full-width near CJK text.
  • Convert punctuation to half-width in English content.
  • (Experimental) Spellcheck and correct words with your own dictionary.
  • Lint and output the result as a diff or JSON, so you can integrate it anywhere (GitLab CI, GitHub Action, VS Code, Vim, Emacs...).
  • Use .gitignore or .autocorrectignore to skip files you do not want checked.
  • Support for more than 28 file types (Markdown, JSON, YAML, JavaScript, HTML...), using an AST parser so that only strings and comments are checked.
  • LSP server: autocorrect-lsp
  • Cross-platform: Linux, macOS, Windows, and WebAssembly, plus native SDKs for programming (Node.js, JavaScript in the browser, Ruby, Python, Java).
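
The spacing feature at the top of this list can be illustrated with a minimal Python sketch. This is not AutoCorrect's implementation (the tool is written in Rust and parses each file type). The character ranges and the add_spacing name are assumptions made for this example, and only the most common CJK blocks are covered.

```python
import re

# Assumption: a simplified set of CJK ranges (radicals, kana, CJK ideographs, Hangul).
CJK = r"\u2e80-\u2eff\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af"
LATIN = r"A-Za-z0-9"

CJK_THEN_LATIN = re.compile(f"([{CJK}])([{LATIN}])")
LATIN_THEN_CJK = re.compile(f"([{LATIN}])([{CJK}])")


def add_spacing(text: str) -> str:
    """Insert one space at every CJK/Latin boundary that lacks one."""
    text = CJK_THEN_LATIN.sub(r"\1 \2", text)
    return LATIN_THEN_CJK.sub(r"\1 \2", text)


print(add_spacing("你好Hello世界"))  # 你好 Hello 世界
```

Running the function on text that is already spaced leaves it unchanged, which is the property a formatter needs in order to be safe to run repeatedly.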

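Typical use cases

  • Writing books and documentation, or publishing news and media content, in Markdown, AsciiDoc, HTML, and similar document formats, to keep copy standardized and professional (examples: the MDN project, 少数派).
  • Integrating with CI environments such as GitLab CI, GitHub Action, and Travis CI to check projects automatically.
  • Integrating with static site generators such as Docusaurus, Hexo, Hugo, Jekyll, and Gatsby to format content automatically at build time.
  • Integrating into applications through the language SDKs, formatting website content when it is stored or rendered to raise site quality (for example: Ruby China, V2EX, Longbridge).
  • As a plugin for VS Code and IntelliJ Platform IDEs (supported) or Vim and Emacs (not yet implemented), checking copy (Linter & Formatter) with hints (Annotator, Diagnostic) driven by LintResult.
  • As a WebAssembly-based browser extension for Chrome, Safari, and others that works on any website (not yet implemented).
  • Integrating into WYSIWYG editors, for example ProseMirror, CKEditor, Slate, Draft.js, Tiptap, Monaco Editor, and CodeMirror.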

Installation

Install on macOS

You can install it via Homebrew:

$ brew install autocorrect

Install on Windows

You can install it via Scoop:

$ scoop install autocorrect

Or install it with this script on a Unix-like system:

$ curl -sSL https://git.io/JcGER | sh

After that, the autocorrect command is available.

$ autocorrect -V
AutoCorrect 2.4.0

Or install it from NPM:

$ yarn add autocorrect-node
$ yarn autocorrect -V

Upgrade

Since: 1.9.0

AutoCorrect can upgrade itself with the autocorrect update command.

$ autocorrect update

NOTE: This command asks for your password because it installs the binary into the /usr/local/bin directory.

Usage

Use in CLI

$ autocorrect text.txt
你好 Hello 世界

$ echo "hello世界" | autocorrect --stdin
hello 世界

$ autocorrect --fix text.txt
$ autocorrect --fix zh-CN.yml
$ autocorrect --fix

Lint

$ autocorrect --lint --format json text.txt

$ autocorrect --lint text.txt
Error: 1, Warning: 0

text.txt:1:3
-你好Hello世界
+你好 Hello 世界

You can also lint multiple files:

$ autocorrect --lint

How to lint all changed files in Git:

$ git diff --name-only | xargs autocorrect --lint
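
Depending on the xargs implementation, an empty diff may still run autocorrect --lint once without any file arguments. The following Python sketch shows one way to guard against that. The function names are hypothetical, and it uses only the --lint flag shown above. It assumes git and autocorrect are on the PATH.

```python
import subprocess
from typing import List, Optional


def build_lint_command(changed_files: List[str]) -> Optional[List[str]]:
    """Return an argv list for `autocorrect --lint`, or None when nothing changed."""
    files = [name for name in changed_files if name]
    if not files:
        return None
    return ["autocorrect", "--lint", *files]


def lint_changed_files() -> int:
    """Lint the files reported by `git diff --name-only`; return the exit code."""
    diff = subprocess.run(
        ["git", "diff", "--name-only"],
        capture_output=True, text=True, check=True,
    )
    command = build_lint_command(diff.stdout.splitlines())
    if command is None:
        return 0  # nothing changed, nothing to lint
    return subprocess.run(command).returncode
```

Calling lint_changed_files() from a pre-commit hook or CI step would then skip the lint run entirely when the working tree has no changes.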

Use in NPM

Since: 2.7.0

AutoCorrect is published on NPM with CLI support. If you want to use it in a frontend or Node.js project, you can install the autocorrect-node package without installing the AutoCorrect binary.

cd your-project
yarn add autocorrect-node

Now you can run the yarn autocorrect command in your project. It behaves the same as the autocorrect command.

$ yarn autocorrect -h

More docs: autocorrect-node/README.md

Configuration

Default config: .autocorrect.default

$ autocorrect init
AutoCorrect init config: .autocorrectrc

NOTE: If the download fails, try again with the autocorrect init --local command.

Now the .autocorrectrc file has been created.

.autocorrectrc can be written in YAML or JSON format.

Config file example:

# yaml-language-server: $schema=https://huacnlee.github.io/autocorrect/schema.json
# Config rules
rules:
  # Auto add spacing between CJK (Chinese, Japanese, Korean) and English words.
  # 0 - off, 1 - error, 2 - warning
  space-word: 1
  # Add space between some punctuations.
  space-punctuation: 1
  # Add space between brackets (), [] when near the CJK.
  space-bracket: 1
  # Add space between ``, when near the CJK.
  space-backticks: 1
  # Add space between dash `-`
  space-dash: 0
  # Convert to fullwidth.
  fullwidth: 1
  # To remove space near the fullwidth.
  no-space-fullwidth: 1
  # Fullwidth alphanumeric characters to halfwidth.
  halfwidth-word: 1
  # Fullwidth punctuations to halfwidth in english.
  halfwidth-punctuation: 1
  # Spellcheck
  spellcheck: 2
# Enable or disable in a specific context
context:
  # Enable or disable to format codeblock in Markdown or AsciiDoc etc.
  codeblock: 1
textRules:
  # Config special rules for some texts
  # For example, to make "Hello你好" only a warning and to ignore "Hi你好":
  # "Hello你好": 2
  # "Hi你好": 0
fileTypes:
  # Configure file associations; your config takes priority over the defaults.
  # "rb": ruby
  # "Rakefile": ruby
  # "*.js": javascript
  # ".mdx": markdown
spellcheck:
  # Words to correct (case-insensitive) with Spellcheck
  words:
    - GitHub
    - App Store
    # This corrects "appstore" to "App Store"
    - AppStore = App Store
    - Git
    - Node.js
    - nodejs = Node.js
    - VIM
    - DNS
    - HTTP
    - SSL
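
To make the words syntax concrete, here is a minimal Python sketch of how such a list could be turned into a case-insensitive lookup. It is an illustration only: parse_words and correct_words are hypothetical names, and the whole-word matching rule is an assumption of this sketch, not a description of how AutoCorrect's spellcheck matches text.

```python
import re


def parse_words(entries):
    """Map lowercase lookup keys to their preferred spelling."""
    mapping = {}
    for entry in entries:
        if "=" in entry:
            # "AppStore = App Store": left side is matched, right side is the fix.
            wrong, right = (part.strip() for part in entry.split("=", 1))
        else:
            # A bare word only fixes its own casing.
            wrong = right = entry.strip()
        mapping[wrong.lower()] = right
    return mapping


def correct_words(text, mapping):
    """Replace whole-word, case-insensitive matches with the preferred spelling."""
    if not mapping:
        return text
    keys = sorted(mapping, key=len, reverse=True)  # try longer keys first
    pattern = re.compile(
        r"(?<![A-Za-z0-9])(" + "|".join(re.escape(k) for k in keys) + r")(?![A-Za-z0-9])",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: mapping.get(m.group(1).lower(), m.group(1)), text)


words = ["GitHub", "App Store", "AppStore = App Store", "Node.js", "nodejs = Node.js"]
mapping = parse_words(words)
print(correct_words("open github in the appstore with nodejs", mapping))
# open GitHub in the App Store with Node.js
```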

Ignore option

Since: 2.2.0

If you want certain words or texts to be ignored when formatting or linting, the textRules config can help.

For example, we want:

  • Hello世界 - To just give a warning.
  • Hi你好 - To ignore.

You can configure:

textRules:
  Hello世界: 2
  Hi你好: 0

After that, AutoCorrect processes these texts according to your textRules.
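
The override behaviour can be pictured with a small Python sketch. It uses the severity levels from the config comments (0 - off, 1 - error, 2 - warning). The resolve_severity name and the substring matching are assumptions of this sketch. AutoCorrect's real matching rules may differ.

```python
OFF, ERROR, WARNING = 0, 1, 2


def resolve_severity(text, text_rules, default=ERROR):
    """Return the severity for a text, letting a matching textRules entry win."""
    for needle, level in text_rules.items():
        if needle in text:  # assumption: substring match
            return level
    return default


text_rules = {"Hello世界": WARNING, "Hi你好": OFF}

for sample in ["Hello世界", "Hi你好", "你好Hello"]:
    print(sample, resolve_severity(sample, text_rules))
```

Texts without a matching entry fall back to the rule's own level, which is 1 (error) for space-word in the example config above.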

Ignore files

Use .autocorrectignore to ignore files

Sometimes you may want to skip particular files that you do not want checked.

By default, files matched by .gitignore rules are ignored.

You can also use .autocorrectignore to ignore other files; its format is the same as .gitignore.

Disable by inline comment

If you just want to disable some specific lines in a file, you can write an autocorrect-disable comment. When AutoCorrect encounters a comment that includes it, checking is disabled from that point on.

You can then use autocorrect-enable to turn it back on.

For example, in JavaScript:

function hello() {
  // autocorrect-disable
  console.log("现在这行开始autocorrect会暂时禁用");
  console.log("这行也是disable的状态");
  // autocorrect-enable
  let a = "现在起autocorrect回到了启用的状态";
}

The output will be:

function hello() {
  // autocorrect-disable
  console.log("现在这行开始autocorrect会暂时禁用");
  console.log("这行也是disable的状态");
  // autocorrect-enable
  let a = "现在起 autocorrect 回到了启用的状态";
}
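
The toggle behaviour shown above can be sketched in a few lines of Python. This is a simplified line-based illustration, not AutoCorrect's implementation: it applies a spacing rule to whole lines instead of parsing strings and comments, and the CJK ranges and function names are assumptions of this sketch.

```python
import re

# Assumption: a simplified set of CJK ranges; Latin means ASCII letters and digits.
CJK = r"\u2e80-\u2eff\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af"
LATIN = r"A-Za-z0-9"
CJK_THEN_LATIN = re.compile(f"([{CJK}])([{LATIN}])")
LATIN_THEN_CJK = re.compile(f"([{LATIN}])([{CJK}])")


def add_spacing(line):
    """Insert a space at CJK/Latin boundaries."""
    line = CJK_THEN_LATIN.sub(r"\1 \2", line)
    return LATIN_THEN_CJK.sub(r"\1 \2", line)


def format_lines(source):
    """Format line by line, skipping everything between disable and enable."""
    enabled = True
    result = []
    for line in source.splitlines():
        if "autocorrect-disable" in line:
            enabled = False  # the directive line itself is left untouched
        elif "autocorrect-enable" in line:
            enabled = True
        elif enabled:
            line = add_spacing(line)
        result.append(line)
    return "\n".join(result)


print(format_lines('// autocorrect-disable\n"禁用disable"\n// autocorrect-enable\n"启用enable"'))
```

Only the last line of that small input gains a space, because the state flag is off while the second line is processed.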

Disable some rules

Since: 2.0

You can use autocorrect-disable <rule> in a comment to disable some rules.

For rule names, see Configuration.

function hello() {
  // autocorrect-disable space-word
  console.log("现在这行开始autocorrect会暂时禁用.");
  // autocorrect-disable fullwidth, space-word
  console.log("这行也是disable的状态.");
  // autocorrect-enable
  let a = "现在起autocorrect回到了启用的状态.";
}

This produces:

function hello() {
  // autocorrect-disable space-word
  console.log("现在这行开始autocorrect会暂时禁用。");
  // autocorrect-disable fullwidth, space-word
  console.log("这行也是disable的状态.");
  // autocorrect-enable
  let a = "现在起 autocorrect 回到了启用的状态。";
}
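
As an illustration of the comment syntax, the following Python sketch extracts the action and the rule names from such a comment. The parse_directive name and the regular expression are assumptions made for this example. They are not taken from AutoCorrect's source.

```python
import re

# Matches "autocorrect-disable" or "autocorrect-enable", then an optional rule list.
DIRECTIVE = re.compile(r"autocorrect-(disable|enable)\b\s*([\w\-,\s]*)")


def parse_directive(comment):
    """Return (action, rules), or None; an empty rule list means "all rules"."""
    match = DIRECTIVE.search(comment)
    if match is None:
        return None
    rules = [name.strip() for name in match.group(2).split(",") if name.strip()]
    return match.group(1), rules


for comment in [
    "// autocorrect-disable space-word",
    "// autocorrect-disable fullwidth, space-word",
    "// autocorrect-enable",
]:
    print(parse_directive(comment))
```

A formatter could keep the returned rule names in a set and skip just those rules until the next autocorrect-enable comment.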

VS Code Extension

Install Extension

https://marketplace.visualstudio.com/items?itemName=huacnlee.autocorrect


Intellij Platform Plugin


https://github.com/huacnlee/autocorrect-idea-plugin

GitHub Action

https://github.com/huacnlee/autocorrect-action

Add this to your .github/workflows/ci.yml:

steps:
  - name: Check source code
    uses: actions/checkout@v4

  - name: AutoCorrect
    uses: huacnlee/autocorrect-action@main

GitLab CI

Add this to your .gitlab-ci.yml to run the check with the huacnlee/autocorrect Docker image:

autocorrect:
  stage: build
  image: huacnlee/autocorrect:latest
  script:
    - autocorrect --lint
  # Enable allow_failure if you want.
  # allow_failure: true

Work with ReviewDog

Since: 2.8.0

AutoCorrect can work with reviewdog, so you can use it in CI/CD. Reviewdog posts a comment on your PR with AutoCorrect's suggested changes, and the PR author can then easily accept the suggestions.

Use the --format rdjson option to output the lint results in a format that reviewdog supports.

autocorrect --lint --format rdjson | reviewdog -f=rdjson -reporter=github-pr-review

huacnlee/autocorrect-action can help you set this up in GitHub Action.

Use for programming

AutoCorrect can also be used from many programming languages.

Benchmark

MacBook Pro (13-inch, Apple M3, 2023)

Use make bench to run benchmark tests.

See autocorrect/src/benches/example.rs for details.

format_050              time:   [4.9991 µs 5.0175 µs 5.0382 µs]
format_100              time:   [8.7714 µs 8.8236 µs 8.8896 µs]
format_400              time:   [23.535 µs 23.591 µs 23.666 µs]
format_html             time:   [332.87 µs 334.00 µs 335.37 µs]
halfwidth_english       time:   [1.2051 µs 1.2079 µs 1.2110 µs]
format_json             time:   [54.019 µs 54.345 µs 54.855 µs]
format_javascript       time:   [176.61 µs 181.64 µs 187.20 µs]
format_json_2k          time:   [9.3245 ms 9.3768 ms 9.4390 ms]
format_jupyter          time:   [200.77 µs 204.93 µs 210.91 µs]
format_markdown         time:   [1.2216 ms 1.2246 ms 1.2283 ms]

spellcheck_50           time:   [1.2098 µs 1.2162 µs 1.2234 µs]
spellcheck_100          time:   [2.2592 µs 2.3049 µs 2.3861 µs]
spellcheck_400          time:   [7.7480 µs 7.9111 µs 8.1764 µs]

lint_markdown           time:   [1.2704 ms 1.2883 ms 1.3173 ms]
lint_json               time:   [58.696 µs 60.847 µs 63.484 µs]
lint_html               time:   [448.53 µs 486.95 µs 534.01 µs]
lint_javascript         time:   [177.00 µs 177.88 µs 178.69 µs]
lint_yaml               time:   [378.35 µs 382.30 µs 387.85 µs]
lint_to_json            time:   [1.2629 ms 1.2689 ms 1.2769 ms]
lint_to_diff            time:   [1.3255 ms 1.3288 ms 1.3327 ms]

Real world benchmark

Tested on the MDN Translated Content project, which has about 30K files.

~/work/translated-content $ autocorrect --fix
AutoCorrect spend time: 8402.538ms

Other Extensions

Other implementations from the community.

User cases

License

This project is released under the MIT license.