Commit 239db3b

Improve test
1 parent 4d711c5 commit 239db3b

7 files changed, +123 -85 lines changed

.github/workflows/build.yml

+31
@@ -0,0 +1,31 @@
+name: Go
+on: [push, pull_request]
+jobs:
+  build:
+    name: Build
+    runs-on: ubuntu-latest
+    steps:
+      - name: Set up Go 1.17
+        uses: actions/setup-go@v1
+        with:
+          go-version: 1.17
+        id: go
+
+      - name: Check out code into the Go module directory
+        uses: actions/checkout@v1
+
+      - name: Get dependencies
+        run: |
+          go get -v -t -d ./...
+
+      - name: Test
+        run: go test
+  lint:
+    name: AutoCorrect
+    runs-on: ubuntu-latest
+    steps:
+      - name: Check source code
+        uses: actions/checkout@main
+
+      - name: AutoCorrect
+        uses: huacnlee/autocorrect-action@main

README.md

+32 -31
@@ -1,41 +1,36 @@
-# gocc - Golang version OpenCC
-[![GoDoc](https://godoc.org/github.com/liuzl/gocc?status.svg)](https://godoc.org/github.com/liuzl/gocc)[![Go Report Card](https://goreportcard.com/badge/github.com/liuzl/gocc)](https://goreportcard.com/report/github.com/liuzl/gocc)
-## Introduction 介紹
-gocc is a golang port of OpenCC([Open Chinese Convert 開放中文轉換](https://github.com/BYVoid/OpenCC/)) which is a project for conversion between Traditional and Simplified Chinese developed by [BYVoid](https://www.byvoid.com/).
+# OpenCC for Go
 
-gocc stands for "**Go**lang version Open**CC**", it is a total rewrite version of OpenCC in Go. It just borrows the dict files and config files of OpenCC, so it may not produce the same output with the original OpenCC.
+[![Go](https://github.com/griffinqiu/opencc/workflows/Go/badge.svg)](https://github.com/griffinqiu/opencc/actions?query=workflow%3AGo)
+
+This is a Go version of OpenCC([Open Chinese Convert 開放中文轉換](https://github.com/BYVoid/OpenCC/)) which is a project for conversion between Traditional and Simplified Chinese developed by [BYVoid](https://www.byvoid.com/).
+
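+This project implements OpenCC in native Go using OpenCC's official dictionaries, which avoids depending on a C library in the runtime environment. It also uses Go's embed feature to pack the dictionaries directly into the Go binary at compile time, for a better development and deployment experience.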
+
+## Installation
 
-## Installation 安裝
-### 1, golang package
-```sh
-go get github.com/liuzl/gocc
-```
-### 2, Command Line
 ```sh
-git clone https://github.com/liuzl/gocc
-cd gocc/cmd
-make install
-gocc --help
-echo "我们是工农子弟兵" | gocc
-#我們是工農子弟兵
+go get github.com/griffinqiu/opencc
 ```
 
-## Usage 使用
+## Usage
+
 ```go
 package main
 
 import (
 	"fmt"
 	"log"
-
-	"github.com/liuzl/gocc"
+
+	"github.com/griffinqiu/opencc"
 )
 
 func main() {
-	s2t, err := gocc.New("s2t")
+	s2t, err := opencc.New("s2t")
 	if err != nil {
 		log.Fatal(err)
 	}
+
+
 	in := `自然语言处理是人工智能领域中的一个重要方向。`
 	out, err := s2t.Convert(in)
 	if err != nil {
@@ -46,14 +41,20 @@ func main() {
 	//自然語言處理是人工智能領域中的一個重要方向。
 }
 ```
+
 ## Conversions
-* `s2t` Simplified Chinese to Traditional Chinese
-* `t2s` Traditional Chinese to Simplified Chinese
-* `s2tw` Simplified Chinese to Traditional Chinese (Taiwan Standard)
-* `tw2s` Traditional Chinese (Taiwan Standard) to Simplified Chinese
-* `s2hk` Simplified Chinese to Traditional Chinese (Hong Kong Standard)
-* `hk2s` Traditional Chinese (Hong Kong Standard) to Simplified Chinese
-* `s2twp` Simplified Chinese to Traditional Chinese (Taiwan Standard) with Taiwanese idiom
-* `tw2sp` Traditional Chinese (Taiwan Standard) to Simplified Chinese with Mainland Chinese idiom
-* `t2tw` Traditional Chinese (OpenCC Standard) to Taiwan Standard
-* `t2hk` Traditional Chinese (OpenCC Standard) to Hong Kong Standard
+
+- `s2t` Simplified Chinese to Traditional Chinese
+- `t2s` Traditional Chinese to Simplified Chinese
+- `s2tw` Simplified Chinese to Traditional Chinese (Taiwan Standard)
+- `tw2s` Traditional Chinese (Taiwan Standard) to Simplified Chinese
+- `s2hk` Simplified Chinese to Traditional Chinese (Hong Kong Standard)
+- `hk2s` Traditional Chinese (Hong Kong Standard) to Simplified Chinese
+- `s2twp` Simplified Chinese to Traditional Chinese (Taiwan Standard) with Taiwanese idiom
+- `tw2sp` Traditional Chinese (Taiwan Standard) to Simplified Chinese with Mainland Chinese idiom
+- `t2tw` Traditional Chinese (OpenCC Standard) to Taiwan Standard
+- `t2hk` Traditional Chinese (OpenCC Standard) to Hong Kong Standard
+
+## License
+
+Apache License

config/s2hk.json

+24 -16
@@ -7,21 +7,29 @@
       "file": "STPhrases.txt"
     }
   },
-  "conversion_chain": [{
-    "dict": {
-      "type": "group",
-      "dicts": [{
-        "type": "txt",
-        "file": "STPhrases.txt"
-      }, {
-        "type": "txt",
-        "file": "STCharacters.txt"
-      }]
-    }
-  }, {
-    "dict": {
-      "type": "txt",
-      "file": "HKVariants.txt"
+  "conversion_chain": [
+    {
+      "dict": {
+        "type": "group",
+        "dicts": [
+          {
+            "type": "txt",
+            "file": "STPhrases.txt"
+          },
+          {
+            "type": "txt",
+            "file": "STCharacters.txt"
+          }
+        ]
+      }
+    },
+    {
+      "dict": [
+        {
+          "type": "txt",
+          "file": "HKVariants.txt"
+        }
+      ]
     }
-  }]
+  ]
 }

go.mod

+1 -4
@@ -5,9 +5,6 @@ go 1.16
 require (
 	github.com/adamzy/cedar-go v0.0.0-20170805034717-80a9c64b256d // indirect
 	github.com/eknkc/basex v1.0.1 // indirect
-	github.com/liuzl/cedar-go v0.0.0-20170805034717-80a9c64b256d
+	github.com/liuzl/cedar-go v0.0.0-20170805034717-80a9c64b256d // indirect
 	github.com/liuzl/da v0.0.0-20180704015230-14771aad5b1d
-	github.com/liuzl/gocc v0.0.0-20200216023908-f8cb162baf44
-	github.com/liuzl/goutil v0.0.0-20210628080224-310b49755b5f
-	github.com/stretchr/testify v1.7.0 // indirect
 )

go.sum

-10
@@ -1,22 +1,12 @@
-github.com/adamzy/cedar-go v0.0.0-20170805034717-80a9c64b256d h1:ir/IFJU5xbja5UaBEQLjcvn7aAU01nqU/NUyOBEU+ew=
 github.com/adamzy/cedar-go v0.0.0-20170805034717-80a9c64b256d/go.mod h1:PRWNwWq0yifz6XDPZu48aSld8BWwBfr2JKB2bGWiEd4=
-github.com/davecgh/go-spew v1.1.0 h1:ZDRjVQ15GmhC3fiQ8ni8+OwkZQO4DARzQgrnXU1Liz8=
 github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
-github.com/eknkc/basex v1.0.1 h1:TcyAkqh4oJXgV3WYyL4KEfCMk9W8oJCpmx1bo+jVgKY=
 github.com/eknkc/basex v1.0.1/go.mod h1:k/F/exNEHFdbs3ZHuasoP2E7zeWwZblG84Y7Z59vQRo=
 github.com/liuzl/cedar-go v0.0.0-20170805034717-80a9c64b256d h1:qSmEGTgjkESUX5kPMSGJ4pcBUtYVDdkNzMrjQyvRvp0=
 github.com/liuzl/cedar-go v0.0.0-20170805034717-80a9c64b256d/go.mod h1:x7SghIWwLVcJObXbjK7S2ENsT1cAcdJcPl7dRaSFog0=
 github.com/liuzl/da v0.0.0-20180704015230-14771aad5b1d h1:hTRDIpJ1FjS9ULJuEzu69n3qTgc18eI+ztw/pJv47hs=
 github.com/liuzl/da v0.0.0-20180704015230-14771aad5b1d/go.mod h1:7xD3p0XnHvJFQ3t/stEJd877CSIMkH/fACVWen5pYnc=
-github.com/liuzl/gocc v0.0.0-20200216023908-f8cb162baf44 h1:dS9TABScMvCthx7hMQ9yJ2gbkTMPKmr4GyZqxPGvIf0=
-github.com/liuzl/gocc v0.0.0-20200216023908-f8cb162baf44/go.mod h1:WOmJMC+oEvNwrENQx2G4TAh2FNRTSciRT/8MSlkBzLk=
-github.com/liuzl/goutil v0.0.0-20210628080224-310b49755b5f h1:VNfXzlrtFsryyprTrt3JhXG31DlWMpsW7Y9CXxB0wOE=
-github.com/liuzl/goutil v0.0.0-20210628080224-310b49755b5f/go.mod h1:bs5NOyZVxtkxzxw2wWhglhguMUE7PesJQ3TcaFNxWcU=
-github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
 github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
 github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
-github.com/stretchr/testify v1.7.0 h1:nwc3DEeHmmLAfoZucVR881uASk0Mfjw8xYJ99tb5CcY=
 github.com/stretchr/testify v1.7.0/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg=
 gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
-gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c h1:dUUwHk2QECo/6vqA44rthZ8ie2QXMNeKRTHCNY2nXvo=
 gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=

opencc.go

+2 -4
@@ -23,17 +23,15 @@ var df embed.FS
 
 var (
 	// Dir is the parent dir for config and dictionary
-	Dir       = flag.String("dir", defaultDir(), "dict dir")
-	configDir = "config"
-	dictDir   = "dictionary"
+	Dir = flag.String("dir", defaultDir(), "dict dir")
 )
 
 func defaultDir() string {
 	if runtime.GOOS == "windows" {
 		return `C:\gocc\`
 	}
 	if goPath, ok := os.LookupEnv("GOPATH"); ok {
-		return goPath + "/src/github.com/liuzl/gocc/"
+		return goPath + "/src/github.com/griffinqiu/opencc/"
 	} else {
 		return `/usr/local/share/gocc/`
 	}
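
Note on `defaultDir`: it reads `runtime.GOOS` and the `GOPATH` environment variable directly, so its three branches are awkward to exercise in a test. A minimal sketch of a testable variant follows. `resolveDir` and its parameters are hypothetical and not part of the package; the returned paths are copied from the diff.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// resolveDir is a hypothetical, testable variant of defaultDir: the OS name
// and the environment lookup are passed in instead of being read globally.
// The branch order and the returned paths follow the diff.
func resolveDir(goos string, lookupEnv func(string) (string, bool)) string {
	if goos == "windows" {
		return `C:\gocc\`
	}
	if goPath, ok := lookupEnv("GOPATH"); ok {
		return goPath + "/src/github.com/griffinqiu/opencc/"
	}
	return `/usr/local/share/gocc/`
}

func main() {
	// Same inputs the original function reads at start-up.
	fmt.Println(resolveDir(runtime.GOOS, os.LookupEnv))
}
```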

opencc_test.go

+33 -20
@@ -4,38 +4,51 @@ import (
 	"testing"
 )
 
-func TestConvert(t *testing.T) {
-	cases := []string{
-		`我们是工农子弟兵`,
-		`从正数第x行到倒数第y行,截取多行输出文本的部分内容`,
-		`2017年中国住房租赁市场租金规模约为1.3万亿元`,
-		`香煙(英語:Cigarette),為煙草製品的一種。滑鼠是一種很常見及常用的電腦輸入設備。`,
-		`香菸(英語:Cigarette),為菸草製品的一種。記憶體是一種很常見及常用的電腦輸入裝置。`,
-		`乾隆爷是谁的干爷爷?乾爷爷吗?`,
-	}
+func assertCases(t *testing.T, s2t *OpenCC, cases map[string]string) {
+	t.Helper()
 
-	for k := range conversions {
-		s2t, err := New(k)
+	for k, v := range cases {
+		str, err := s2t.Convert(k)
 		if err != nil {
-			t.Errorf("New %s error:%+v", k, err)
+			t.Error(err)
 		}
-		t.Logf("%+v", s2t.DictChains)
-
-		for _, c := range cases {
-			str, err := s2t.Convert(c)
-			if err != nil {
-				t.Error(err)
-			}
-			t.Logf("\n%s:original\n%s:%s", c, str, k)
+		if str != v {
+			t.Errorf("%s:%s", k, str)
 		}
 	}
 }
 
+func TestConvert_s2t(t *testing.T) {
+	cases := map[string]string{
+		`我们是工农子弟兵`: `我們是工農子弟兵`,
+		`从正数第 x 行到倒数第 y 行,截取多行输出文本的部分内容`: `從正數第 x 行到倒數第 y 行,截取多行輸出文本的部分內容`,
+		`2017 年中国住房租赁市场租金规模约为 1.3 万亿元`: `2017 年中國住房租賃市場租金規模約爲 1.3 萬億元`,
+		`香煙(英語:Cigarette),為煙草製品的一種。滑鼠是一種很常見及常用的電腦輸入設備。`: `香煙(英語:Cigarette),為煙草製品的一種。滑鼠是一種很常見及常用的電腦輸入設備。`,
+		`香菸(英語:Cigarette),為菸草製品的一種。記憶體是一種很常見及常用的電腦輸入裝置。`: `香菸(英語:Cigarette),為菸草製品的一種。記憶體是一種很常見及常用的電腦輸入裝置。`,
+		`乾隆爷是谁的干爷爷?乾爷爷吗?`: `乾隆爺是誰的幹爺爺?乾爺爺嗎?`,
+		`2021 年汽车零部件板块市值涨幅跑输乘用车板块,估值相对滞涨,主要由于市场对零部件行业存两大担忧:大宗商品、运费上涨致利润承压;全球芯片紧缺致下游排产低于预期。`: `2021 年汽車零部件板塊市值漲幅跑輸乘用車板塊,估值相對滯漲,主要由於市場對零部件行業存兩大擔憂:大宗商品、運費上漲致利潤承壓;全球芯片緊缺致下游排產低於預期。`,
+	}
+
+	s2t, _ := New("s2t")
+
+	assertCases(t, s2t, cases)
+}
+
+func TestConvert_s2hk(t *testing.T) {
+	cases := map[string]string{}
+
+	s2t, _ := New("s2hk")
+
+	assertCases(t, s2t, cases)
+}
+
 func BenchmarkConvert(b *testing.B) {
 	s2t, err := New("s2t")
 	if err != nil {
 		b.Fatal(err)
 	}
+
+	// 10621 ns/op in Apple M1
 	for n := 0; n < b.N; n++ {
 		in := `自然语言处理是人工智能领域中的一个重要方向。`
 		out, err := s2t.Convert(in)
