Papers, competitions, and tools related to Chinese text correction.
Another repository of text-correction papers:
https://github.com/nghuyong/text-correction-papers
- Rethinking the Roles of Large Language Models in Chinese Grammatical Error Correction
- GEE! Grammar Error Explanation with Large Language Models
- GrammarGPT: Exploring Open-Source LLMs for Native Chinese Grammatical Error Correction with Supervised Fine-Tuning
- Evaluating the Capability of Large-scale Language Models on Chinese Grammatical Error Correction Task
- On the (In)Effectiveness of Large Language Models for Chinese Text Correction
- Rich Semantic Knowledge Enhanced Large Language Models for Few-shot Chinese Spell Checking
- Eval-GCSC: A New Metric for Evaluating ChatGPT’s Performance in Chinese Spelling Correction
- C-LLM: Learn to Check Chinese Spelling Errors Character by Character
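This task usually involves only substitutions, with no insertion or deletion of characters, so the input and output sentences are generally the same length.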
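Because Chinese spelling correction is substitution-only, a source/target pair can be compared position by position. The snippet below is a minimal sketch of that idea; the function name `csc_edits` and the example sentence are made up for illustration and do not come from any dataset or paper listed here.

```python
def csc_edits(source: str, target: str) -> list[tuple[int, str, str]]:
    """Return (position, wrong_char, corrected_char) for each substituted character."""
    # Assumption: a spelling-correction pair is equal-length, as described above.
    if len(source) != len(target):
        raise ValueError("expected source and target of equal length")
    return [
        (i, src_char, tgt_char)
        for i, (src_char, tgt_char) in enumerate(zip(source, target))
        if src_char != tgt_char
    ]


# Illustrative pair (hypothetical): one character is misspelled.
print(csc_edits("我今天很高心", "我今天很高兴"))
```

The equal-length check is what distinguishes this setting from grammatical error correction, where a simple position-wise comparison no longer works.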
-
Chinese Spelling Correction: A Comprehensive Survey of Progress, Challenges, and Opportunities
https://arxiv.org/pdf/2502.11508
2025 -
Towards Better Chinese Spelling Check for Search Engines: A New Dataset and Strong Baseline
WSDM 2024
https://github.com/AlipaySEQ/AlipaySEQ -
An Empirical Investigation of Domain Adaptation Ability for Chinese Spelling Check Models
https://arxiv.org/pdf/2401.14630
2024 -
Mitigating Catastrophic Forgetting in Multi-domain Chinese Spelling Correction by Multi-stage Knowledge Transfer Framework
https://arxiv.org/pdf/2402.11422
2024 -
Rethinking Masked Language Modeling for Chinese Spelling Correction
ACL 2023
https://github.com/gingasan/lemon -
An Error-Guided Correction Model for Chinese Spelling Error Correction
https://arxiv.org/pdf/2301.06323.pdf
https://github.com/ruisun1/Mask-Predict-main
2023 -
MDCSpell: A Multi-task Detector-Corrector Framework for Chinese Spelling Correction
https://aclanthology.org/2022.findings-acl.98.pdf
ACL 2022 -
SDCL: Self-Distillation Contrastive Learning for Chinese Spell Checking
https://arxiv.org/pdf/2210.17168.pdf
2022 -
The Past Mistake is the Future Wisdom: Error-driven Contrastive Probability Optimization for Chinese Spell Checking
https://openreview.net/pdf?id=DW8WNS97jP5
2022 -
Sparsity Regularization for Chinese Spelling Check
https://openreview.net/pdf?id=lMQ2TTkQo51
2022 -
A Chinese Spelling Check Framework Based on Reverse Contrastive Learning
https://arxiv.org/pdf/2210.13823
2022 -
Improving Chinese Spelling Check by Character Pronunciation Prediction: The Effects of Adaptivity and Granularity
https://arxiv.org/pdf/2210.10996
EMNLP 2022 -
Learning from the Dictionary: Heterogeneous Knowledge Guided Fine-tuning for Chinese Spell Checking
https://arxiv.org/pdf/2210.10320
EMNLP 2022 -
uChecker: Masked Pretrained Language Models as Unsupervised Chinese Spelling Checkers
https://arxiv.org/pdf/2209.07068
COLING 2022 -
Contextual Similarity is More Valuable than Character Similarity: An Empirical Study for Chinese Spell Checking
https://arxiv.org/pdf/2207.09217
2022 -
General and Domain Adaptive Chinese Spelling Check with Error Consistent Pretraining
https://arxiv.org/pdf/2203.10929
2022 -
The Past Mistake is the Future Wisdom: Error-driven Contrastive Probability Optimization for Chinese Spell Checking
https://arxiv.org/pdf/2203.00991
ACL 2022 -
Read, Listen, and See: Leveraging Multimodal Information Helps Chinese Spell Checking
https://arxiv.org/pdf/2105.12306
ACL 2021 -
DCSpell: A Detector-Corrector Framework for Chinese Spelling Error Correction
https://dl.acm.org/doi/10.1145/3404835.3463050
SIGIR 2021 -
Correcting Chinese Spelling Errors with Phonetic Pre-training
https://aclanthology.org/2021.findings-acl.198.pdf
ACL 2021 -
PLOME: Pre-trained with Misspelled Knowledge for Chinese Spelling Correction
https://aclanthology.org/2021.acl-long.233/
ACL 2021 -
PHMOSpell: Phonological and Morphological Knowledge Guided Chinese Spelling Check
https://aclanthology.org/2021.acl-long.464/
ACL 2021 -
Exploration and Exploitation: Two Ways to Improve Chinese Spelling Correction Models
https://arxiv.org/abs/2105.14813
ACL 2021 -
Think Twice: A Post-Processing Approach for the Chinese Spelling Error Correction
https://pdfs.semanticscholar.org/1464/b75885f3090335d52d7852375008b6d3b721.pdf?_ga=2.5037562.853549088.1668343765-1085817804.1668343765
Applied Sciences 2021 -
Dynamic Connected Networks for Chinese Spelling Check
https://aclanthology.org/2021.findings-acl.216/
ACL 2021 -
Global Attention Decoder for Chinese Spelling Error Correction
https://aclanthology.org/2021.findings-acl.122/
ACL 2021 -
SpellBERT: A Lightweight Pretrained Model for Chinese Spelling Check
https://aclanthology.org/2021.emnlp-main.287/
EMNLP 2021 -
Domain-shift Conditioning using Adaptable Filtering via Hierarchical Embeddings for Robust Chinese Spell Check
https://arxiv.org/pdf/2008.12281
IEEE/ACM TASLP -
SpellGCN: Incorporating Phonological and Visual Similarities into Language Models for Chinese Spelling Check
https://arxiv.org/pdf/2004.14166
ACL 2020 -
Spelling Error Correction with Soft-Masked BERT
https://arxiv.org/pdf/2005.07421
ACL 2020 -
Chunk-based Chinese Spelling Check with Global Optimization
https://aclanthology.org/2020.findings-emnlp.184.pdf
EMNLP 2020 -
Confusionset-guided Pointer Networks for Chinese Spelling Check
https://aclanthology.org/P19-1578/
ACL 2019 -
FASPell: A Fast, Adaptable, Simple, Powerful Chinese Spell Checker Based On DAE-Decoder Paradigm
https://aclanthology.org/D19-5522.pdf
EMNLP 2019 -
Context-Sensitive Malicious Spelling Error Correction
https://arxiv.org/abs/1901.07688
WWW 2019 -
A Hybrid Approach to Automatic Corpus Generation for Chinese Spelling Check
https://aclanthology.org/D18-1273/
EMNLP 2018 -
Spelling Correction as a Foreign Language
https://arxiv.org/pdf/1705.07371.pdf
2017
- SIGHAN Bake-off 2013: http://ir.itc.ntnu.edu.tw/lre/sighan7csc.html
- SIGHAN Bake-off 2014: http://ir.itc.ntnu.edu.tw/lre/clp14csc.html
- SIGHAN Bake-off 2015: http://ir.itc.ntnu.edu.tw/lre/sighan8csc.html
- Wang271K
CGEC (Chinese grammatical error correction) may insert or delete characters and words.
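Since grammatical corrections can change sentence length, comparing a pair requires an alignment rather than a position-wise check. The snippet below is a rough sketch that uses Python's standard `difflib.SequenceMatcher` to list the insertions, deletions, and replacements between two strings. The helper name `cgec_edits` and the example sentences are hypothetical, and character-level diffing is a simplification: the evaluation tools used by the papers below define their own edit extraction.

```python
from difflib import SequenceMatcher


def cgec_edits(source: str, target: str) -> list[tuple[str, str, str]]:
    """Return (operation, source_span, target_span) for every non-matching span."""
    # autojunk=False disables difflib's popularity heuristic for long inputs.
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    return [
        (tag, source[i1:i2], target[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Illustrative pairs (hypothetical): one missing character, one redundant character.
print(cgec_edits("我去学校", "我去了学校"))
print(cgec_edits("我昨天去了学校了", "我昨天去了学校"))
```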
-
Loss-Aware Curriculum Learning for Chinese Grammatical Error Correction
https://arxiv.org/pdf/2501.00334
ICASSP 2025 -
Learning from Mistakes: Self-correct Adversarial Training for Chinese Unnatural Text Correction
https://arxiv.org/pdf/2412.17279
AAAI 2025 -
Detection-Correction Structure via General Language Model for Grammatical Error Correction
ACL 2024
https://arxiv.org/pdf/2405.17804 -
LM-Combiner: A Contextual Rewriting Model for Chinese Grammatical Error Correction
COLING 2024
https://arxiv.org/pdf/2403.17413.pdf
https://github.com/wyxstriker/LM-Combiner -
Alirector: Alignment-Enhanced Chinese Grammatical Error Corrector
https://arxiv.org/pdf/2402.04601.pdf
2024 -
FlaCGEC: A Chinese Grammatical Error Correction Dataset with Fine-grained Linguistic Annotation
https://arxiv.org/pdf/2311.04906.pdf
2024 -
TLM: Token-Level Masking for Transformers
https://arxiv.org/pdf/2310.18738.pdf
EMNLP 2023
https://github.com/Young1993/tlm -
Improving Seq2Seq Grammatical Error Correction via Decoding Interventions
https://arxiv.org/pdf/2310.14534.pdf
EMNLP 2023
https://github.com/Jacob-Zhou/gecdi -
MixEdit: Revisiting Data Augmentation and Beyond for Grammatical Error Correction
https://arxiv.org/pdf/2310.11671.pdf
EMNLP 2023
https://github.com/THUKElab/MixEdit -
GrammarGPT: Exploring Open-Source LLMs for Native Chinese Grammatical Error Correction with Supervised Fine-Tuning
2023
https://arxiv.org/pdf/2307.13923v1.pdf
https://github.com/FreedomIntelligence/GrammarGPT -
Progressive Multi-task Learning Framework for Chinese Text Error Correction
2023
https://arxiv.org/pdf/2306.17447v1.pdf -
Grammatical Error Correction: A Survey of the State of the Art
2023
https://arxiv.org/pdf/2211.05166v3.pdf -
An Analysis of GPT-3’s Performance in Grammatical Error Correction
2023
https://arxiv.org/pdf/2303.14342v1.pdf -
Pretraining Chinese BERT for Detecting Word Insertion and Deletion Errors
https://arxiv.org/pdf/2204.12052.pdf
2022 -
Linguistic Rules-Based Corpus Generation for Native Chinese Grammatical Error Correction
https://arxiv.org/pdf/2210.10442.pdf
EMNLP 2022
https://github.com/masr2000/CLG-CGEC -
From Spelling to Grammar: A New Framework for Chinese Grammatical Error Correction
https://arxiv.org/pdf/2211.01625
2022 -
Focus Is What You Need For Chinese Grammatical Error Correction
https://arxiv.org/pdf/2210.12692
2022 -
SynGEC: Syntax-Enhanced Grammatical Error Correction with a Tailored GEC-Oriented Parser
https://arxiv.org/pdf/2210.12484
EMNLP 2022
-
FCGEC: Fine-Grained Corpus for Chinese Grammatical Error Correction
https://arxiv.org/pdf/2210.12364
EMNLP 2022 -
Chinese grammatical error correction based on knowledge distillation
https://arxiv.org/pdf/2208.00351
2022 -
Mining Error Templates for Grammatical Error Correction
https://arxiv.org/pdf/2206.11569
2022 -
Sequence-to-Action: Grammatical Error Correction with Action Guided Sequence Generation
https://arxiv.org/pdf/2205.10884
AAAI 2022 -
A New Evaluation Method: Evaluation Data and Metrics for Chinese Grammar Error Correction
https://arxiv.org/pdf/2205.00217
2022 -
MuCGEC: a Multi-Reference Multi-Source Evaluation Dataset for Chinese Grammatical Error Correction
https://arxiv.org/pdf/2204.10994
NAACL 2022 -
"Is Whole Word Masking Always Better for Chinese BERT?": Probing on Chinese Grammatical Error Correction
https://arxiv.org/pdf/2203.00286
ACL 2022 -
Instantaneous Grammatical Error Correction with Shallow Aggressive Decoding
https://arxiv.org/pdf/2106.04970
ACL 2021 -
Tail-to-Tail Non-Autoregressive Sequence Prediction for Chinese Grammatical Error Correction
https://arxiv.org/pdf/2106.01609
ACL 2021 -
Chinese Grammatical Correction Using BERT-based Pre-trained Model
https://arxiv.org/pdf/2011.02093
AACL-IJCNLP 2020 -
Improving the Efficiency of Grammatical Error Correction with Erroneous Span Detection and Correction
https://arxiv.org/pdf/2010.03260
EMNLP 2020
- CSED: A Chinese Semantic Error Diagnosis Corpus
2023
https://arxiv.org/pdf/2305.05183v1.pdf
- Pre-Training with Syntactic Structure Prediction for Chinese Semantic Error Recognition
https://openreview.net/pdf?id=Qm_Z1UNDPN
2022
- BART based semantic correction for Mandarin automatic speech recognition system
https://arxiv.org/pdf/2104.05507.pdf
2021
- Improving Pre-trained Language Models with Syntactic Dependency Prediction Task for Chinese Semantic Error Recognition
https://arxiv.org/pdf/2204.07464.pdf
2022
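- 2021 CAAI Innovation and Entrepreneurship Competition
- CTC2021
- CCL 2022 Chinese Grammatical Error Correction Evaluation
- Intelligent Text Proofreading Competition - PaddlePaddle AI Studio (baidu.com)
- CGED shared tasks from previous years
- CCL 2022 Chinese Learner Text Correction Evaluation: solutions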
-
GECToR -- Grammatical Error Correction: Tag, Not Rewrite
https://arxiv.org/pdf/2005.12592v2.pdf
ACL 2020
Although this paper targets English, it has been used in many Chinese competitions. -
Of course, there is also seq2seq-based text correction.
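The "tag, not rewrite" idea behind GECToR can be illustrated with a small sketch: instead of generating the corrected sentence token by token as a seq2seq model does, a tagger assigns one edit tag per source token and the tags are then applied. The tag names below follow GECToR's basic transformations (`$KEEP`, `$DELETE`, `$APPEND_t`, `$REPLACE_t`); the function `apply_tags` and the example inputs are hypothetical. Tag prediction itself, the paper's additional g-transformations, and its iterative refinement are all omitted.

```python
def apply_tags(tokens: list[str], tags: list[str]) -> list[str]:
    """Apply one edit tag per source token and return the corrected tokens."""
    if len(tokens) != len(tags):
        raise ValueError("expected exactly one tag per token")
    output: list[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "$KEEP":
            output.append(token)
        elif tag == "$DELETE":
            continue  # drop the token
        elif tag.startswith("$REPLACE_"):
            output.append(tag[len("$REPLACE_"):])  # substitute the token
        elif tag.startswith("$APPEND_"):
            output.extend([token, tag[len("$APPEND_"):]])  # keep it, then add a token
        else:
            raise ValueError(f"unknown tag: {tag}")
    return output


# Hand-written tags for illustration; a real system would predict them with a model.
tokens = list("我去学校了了")
tags = ["$KEEP", "$APPEND_了", "$KEEP", "$KEEP", "$DELETE", "$DELETE"]
print("".join(apply_tags(tokens, tags)))
```

One reason tagging is attractive in competitions is that most tokens receive `$KEEP`, so the model only has to decide where a sentence changes rather than regenerate all of it.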
-
Some other repositories are also quite comprehensive:
-
The best open-source correction project of 2024
https://github.com/TW-NLP/ChineseErrorCorrector