Allow duplicate particles of different types #6

Closed
takahashim opened this issue Feb 21, 2016 · 10 comments · Fixed by #7

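@takahashim
Contributor

Given the sentence "ターミナルで「test」と入力すると", the rule reports the error 「一文に二回以上利用されている助詞 "と" がみつかりました」 (the particle "と" is used two or more times in one sentence). However, the first "と" is a case particle (格助詞) and the second "と" is a conjunctive particle (接続助詞). I would like duplicates to be allowed in cases like this.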

@azu azu added the Type: Bug Bug or Bug fixes label Feb 22, 2016
@azu
Member

azu commented Feb 22, 2016

It looks like we should compare down to the part-of-speech subcategory (pos_detail_1).
(It would be nice if sample sentences for test cases could be added more easily...)

$ npm i -g kuromoji-cli
$ kuromoji "ターミナルで「test」と入力すると"
[
    {
        "word_id": 434620,
        "word_type": "KNOWN",
        "word_position": 1,
        "surface_form": "ターミナル",
        "pos": "名詞",
        "pos_detail_1": "一般",
        "pos_detail_2": "*",
        "pos_detail_3": "*",
        "conjugated_type": "*",
        "conjugated_form": "*",
        "basic_form": "ターミナル",
        "reading": "ターミナル",
        "pronunciation": "ターミナル"
    },
    {
        "word_id": 2594250,
        "word_type": "KNOWN",
        "word_position": 6,
        "surface_form": "で",
        "pos": "助詞",
        "pos_detail_1": "格助詞",
        "pos_detail_2": "一般",
        "pos_detail_3": "*",
        "conjugated_type": "*",
        "conjugated_form": "*",
        "basic_form": "で",
        "reading": "デ",
        "pronunciation": "デ"
    },
    {
        "word_id": 2613610,
        "word_type": "KNOWN",
        "word_position": 7,
        "surface_form": "「",
        "pos": "記号",
        "pos_detail_1": "括弧開",
        "pos_detail_2": "*",
        "pos_detail_3": "*",
        "conjugated_type": "*",
        "conjugated_form": "*",
        "basic_form": "「",
        "reading": "「",
        "pronunciation": "「"
    },
    {
        "word_id": 120,
        "word_type": "UNKNOWN",
        "word_position": 8,
        "surface_form": "test",
        "pos": "名詞",
        "pos_detail_1": "固有名詞",
        "pos_detail_2": "組織",
        "pos_detail_3": "*",
        "conjugated_type": "*",
        "conjugated_form": "*",
        "basic_form": "*"
    },
    {
        "word_id": 2611700,
        "word_type": "KNOWN",
        "word_position": 12,
        "surface_form": "」",
        "pos": "記号",
        "pos_detail_1": "括弧閉",
        "pos_detail_2": "*",
        "pos_detail_3": "*",
        "conjugated_type": "*",
        "conjugated_form": "*",
        "basic_form": "」",
        "reading": "」",
        "pronunciation": "」"
    },
    {
        "word_id": 2595020,
        "word_type": "KNOWN",
        "word_position": 13,
        "surface_form": "と",
        "pos": "助詞",
        "pos_detail_1": "格助詞",
        "pos_detail_2": "引用",
        "pos_detail_3": "*",
        "conjugated_type": "*",
        "conjugated_form": "*",
        "basic_form": "と",
        "reading": "ト",
        "pronunciation": "ト"
    },
    {
        "word_id": 2567130,
        "word_type": "KNOWN",
        "word_position": 14,
        "surface_form": "入力",
        "pos": "名詞",
        "pos_detail_1": "サ変接続",
        "pos_detail_2": "*",
        "pos_detail_3": "*",
        "conjugated_type": "*",
        "conjugated_form": "*",
        "basic_form": "入力",
        "reading": "ニュウリョク",
        "pronunciation": "ニューリョク"
    },
    {
        "word_id": 3168910,
        "word_type": "KNOWN",
        "word_position": 16,
        "surface_form": "する",
        "pos": "動詞",
        "pos_detail_1": "自立",
        "pos_detail_2": "*",
        "pos_detail_3": "*",
        "conjugated_type": "サ変・スル",
        "conjugated_form": "基本形",
        "basic_form": "する",
        "reading": "スル",
        "pronunciation": "スル"
    },
    {
        "word_id": 2594810,
        "word_type": "KNOWN",
        "word_position": 18,
        "surface_form": "と",
        "pos": "助詞",
        "pos_detail_1": "接続助詞",
        "pos_detail_2": "*",
        "pos_detail_3": "*",
        "conjugated_type": "*",
        "conjugated_form": "*",
        "basic_form": "と",
        "reading": "ト",
        "pronunciation": "ト"
    }
]

azu added a commit that referenced this issue Feb 22, 2016
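 Compare particles using their part of speech down to subcategory 1 (pos_detail_1).

 > ターミナルで「test」**と**入力する**と**、画面に表示されます。

 The first "と" is a case particle (格助詞) and the second "と" is a conjunctive particle (接続助詞), so they are treated as different particles and no error is reported.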

 fix #6
@azu
Member

azu commented Feb 22, 2016

I have opened a PR with the implementation: #7.

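@azu
Member

azu commented Feb 22, 2016

The implementation simply adds each token's pos_detail_1 to the key used for comparison.
That said, pos_detail_1, pos_detail_2, and pos_detail_3 look like slots that are filled in order (1, 2, 3) whenever a finer classification exists, so there may be cases where looking only at pos_detail_1 gives odd results. (I assume the fill order is fixed, so using only 1 should be fine, but if we also want to look at 2 and 3, the key would probably need to become an order-independent, Map-like structure.)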
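
The keyed comparison described in this comment can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from #7: the `Token` interface, `particleKey`, and `findDoubledParticles` are hypothetical names, the key format is made up, and the token list is written by hand from the sentence discussed in this issue (only the fields used here, with brackets omitted).

```typescript
// Hypothetical subset of a kuromoji token; only the fields used below.
interface Token {
  surface_form: string;
  pos: string;
  pos_detail_1: string;
}

// Build the comparison key from the surface form and subcategory 1.
// Assumption: a plain string key; the real rule may build it differently.
function particleKey(token: Token): string {
  return `${token.surface_form}:${token.pos_detail_1}`;
}

// Return the keys of particles (助詞) that occur two or more times.
function findDoubledParticles(tokens: Token[]): string[] {
  const counts: Record<string, number> = {};
  for (const token of tokens) {
    if (token.pos !== "助詞") continue; // only particles are compared
    const key = particleKey(token);
    counts[key] = (counts[key] ?? 0) + 1;
  }
  return Object.keys(counts).filter((key) => (counts[key] ?? 0) >= 2);
}

// Tokens for "ターミナルで「test」と入力すると", written by hand.
const sample: Token[] = [
  { surface_form: "ターミナル", pos: "名詞", pos_detail_1: "一般" },
  { surface_form: "で", pos: "助詞", pos_detail_1: "格助詞" },
  { surface_form: "test", pos: "名詞", pos_detail_1: "固有名詞" },
  { surface_form: "と", pos: "助詞", pos_detail_1: "格助詞" },
  { surface_form: "入力", pos: "名詞", pos_detail_1: "サ変接続" },
  { surface_form: "する", pos: "動詞", pos_detail_1: "自立" },
  { surface_form: "と", pos: "助詞", pos_detail_1: "接続助詞" },
];

// The two "と" tokens get different keys, so nothing is reported here.
console.log(findDoubledParticles(sample));
```

Keying on the surface form alone would count both "と" tokens under one key; adding pos_detail_1 to the key is what separates the case particle from the conjunctive particle.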

@takahashim
Contributor Author

Thank you for #7!
Apparently pos_detail_1 in kuromoji.js comes from MeCab's IPADIC (http://stp-the-wld.blogspot.jp/2015/01/javascriptkuromojijs.html ), which in turn appears to be based on the IPA part-of-speech system.

If that is correct, looking only at pos_detail_1 should be enough for particles.

@azu
Member

azu commented Feb 22, 2016

@takahashim I see. Thank you.

@azu azu self-assigned this Feb 22, 2016
@azu azu closed this as completed in #7 Feb 22, 2016
@azu
Member

azu commented Feb 22, 2016

Merged and released as 3.2.0.

@takahashim
Contributor Author

Thank you very much!

@naskya

naskya commented Nov 3, 2022

$A \coloneqq B + C$ と置くと、 $f(A) = 0$ が成り立つ。

When I wrote a sentence like the one above, the "と置くと" part was still flagged, so I am noting it here for reference.
(I think expressions like this come up often.)

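@azu
Member

azu commented Nov 3, 2022

@naskya This may have a different cause, so it would help if you could open a new issue.

I suspect the math notation may be involved, so a reproduction in plain text would help.

B+Cと置くと、A=Bが成り立つ。

With this text, subcategory 1 differs between the two particles (conjunctive particle and case particle), so I could not reproduce the error.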
https://azu.github.io/morpheme-match/?text=B+C%E3%81%A8%E7%BD%AE%E3%81%8F%E3%81%A8%E3%80%81A=B%E3%81%8C%E6%88%90%E3%82%8A%E7%AB%8B%E3%81%A4%E3%80%82

`$A \coloneqq B + C$ と置くと、 $f(A) = 0$ が成り立つ。` could not be reproduced in a test case either.
[
    {
        "word_id": 80,
        "word_type": "UNKNOWN",
        "word_position": 1,
        "surface_form": "$",
        "pos": "名詞",
        "pos_detail_1": "サ変接続",
        "pos_detail_2": "*",
        "pos_detail_3": "*",
        "conjugated_type": "*",
        "conjugated_form": "*",
        "basic_form": "*"
    },
    {
        "word_id": 120,
        "word_type": "UNKNOWN",
        "word_position": 2,
        "surface_form": "A",
        "pos": "名詞",
        "pos_detail_1": "固有名詞",
        "pos_detail_2": "組織",
        "pos_detail_3": "*",
        "conjugated_type": "*",
        "conjugated_form": "*",
        "basic_form": "*"
    },
    {
        "word_id": 10,
        "word_type": "UNKNOWN",
        "word_position": 3,
        "surface_form": " ",
        "pos": "記号",
        "pos_detail_1": "空白",
        "pos_detail_2": "*",
        "pos_detail_3": "*",
        "conjugated_type": "*",
        "conjugated_form": "*",
        "basic_form": "*"
    },
    {
        "word_id": 80,
        "word_type": "UNKNOWN",
        "word_position": 4,
        "surface_form": "\\",
        "pos": "名詞",
        "pos_detail_1": "サ変接続",
        "pos_detail_2": "*",
        "pos_detail_3": "*",
        "conjugated_type": "*",
        "conjugated_form": "*",
        "basic_form": "*"
    },
    {
        "word_id": 120,
        "word_type": "UNKNOWN",
        "word_position": 5,
        "surface_form": "coloneqq",
        "pos": "名詞",
        "pos_detail_1": "固有名詞",
        "pos_detail_2": "組織",
        "pos_detail_3": "*",
        "conjugated_type": "*",
        "conjugated_form": "*",
        "basic_form": "*"
    },
    {
        "word_id": 10,
        "word_type": "UNKNOWN",
        "word_position": 13,
        "surface_form": " ",
        "pos": "記号",
        "pos_detail_1": "空白",
        "pos_detail_2": "*",
        "pos_detail_3": "*",
        "conjugated_type": "*",
        "conjugated_form": "*",
        "basic_form": "*"
    },
    {
        "word_id": 120,
        "word_type": "UNKNOWN",
        "word_position": 14,
        "surface_form": "B",
        "pos": "名詞",
        "pos_detail_1": "固有名詞",
        "pos_detail_2": "組織",
        "pos_detail_3": "*",
        "conjugated_type": "*",
        "conjugated_form": "*",
        "basic_form": "*"
    },
    {
        "word_id": 10,
        "word_type": "UNKNOWN",
        "word_position": 15,
        "surface_form": " ",
        "pos": "記号",
        "pos_detail_1": "空白",
        "pos_detail_2": "*",
        "pos_detail_3": "*",
        "conjugated_type": "*",
        "conjugated_form": "*",
        "basic_form": "*"
    },
    {
        "word_id": 80,
        "word_type": "UNKNOWN",
        "word_position": 16,
        "surface_form": "+",
        "pos": "名詞",
        "pos_detail_1": "サ変接続",
        "pos_detail_2": "*",
        "pos_detail_3": "*",
        "conjugated_type": "*",
        "conjugated_form": "*",
        "basic_form": "*"
    },
    {
        "word_id": 10,
        "word_type": "UNKNOWN",
        "word_position": 17,
        "surface_form": " ",
        "pos": "記号",
        "pos_detail_1": "空白",
        "pos_detail_2": "*",
        "pos_detail_3": "*",
        "conjugated_type": "*",
        "conjugated_form": "*",
        "basic_form": "*"
    },
    {
        "word_id": 120,
        "word_type": "UNKNOWN",
        "word_position": 18,
        "surface_form": "C",
        "pos": "名詞",
        "pos_detail_1": "固有名詞",
        "pos_detail_2": "組織",
        "pos_detail_3": "*",
        "conjugated_type": "*",
        "conjugated_form": "*",
        "basic_form": "*"
    },
    {
        "word_id": 80,
        "word_type": "UNKNOWN",
        "word_position": 19,
        "surface_form": "$",
        "pos": "名詞",
        "pos_detail_1": "サ変接続",
        "pos_detail_2": "*",
        "pos_detail_3": "*",
        "conjugated_type": "*",
        "conjugated_form": "*",
        "basic_form": "*"
    },
    {
        "word_id": 10,
        "word_type": "UNKNOWN",
        "word_position": 20,
        "surface_form": " ",
        "pos": "記号",
        "pos_detail_1": "空白",
        "pos_detail_2": "*",
        "pos_detail_3": "*",
        "conjugated_type": "*",
        "conjugated_form": "*",
        "basic_form": "*"
    },
    {
        "word_id": 92760,
        "word_type": "KNOWN",
        "word_position": 21,
        "surface_form": "と",
        "pos": "助詞",
        "pos_detail_1": "格助詞",
        "pos_detail_2": "引用",
        "pos_detail_3": "*",
        "conjugated_type": "*",
        "conjugated_form": "*",
        "basic_form": "と",
        "reading": "ト",
        "pronunciation": "ト"
    },
    {
        "word_id": 3830190,
        "word_type": "KNOWN",
        "word_position": 22,
        "surface_form": "置く",
        "pos": "動詞",
        "pos_detail_1": "自立",
        "pos_detail_2": "*",
        "pos_detail_3": "*",
        "conjugated_type": "五段・カ行イ音便",
        "conjugated_form": "基本形",
        "basic_form": "置く",
        "reading": "オク",
        "pronunciation": "オク"
    },
    {
        "word_id": 92550,
        "word_type": "KNOWN",
        "word_position": 24,
        "surface_form": "と",
        "pos": "助詞",
        "pos_detail_1": "接続助詞",
        "pos_detail_2": "*",
        "pos_detail_3": "*",
        "conjugated_type": "*",
        "conjugated_form": "*",
        "basic_form": "と",
        "reading": "ト",
        "pronunciation": "ト"
    },
    {
        "word_id": 90910,
        "word_type": "KNOWN",
        "word_position": 25,
        "surface_form": "、",
        "pos": "記号",
        "pos_detail_1": "読点",
        "pos_detail_2": "*",
        "pos_detail_3": "*",
        "conjugated_type": "*",
        "conjugated_form": "*",
        "basic_form": "、",
        "reading": "、",
        "pronunciation": "、"
    },
    {
        "word_id": 10,
        "word_type": "UNKNOWN",
        "word_position": 26,
        "surface_form": " ",
        "pos": "記号",
        "pos_detail_1": "空白",
        "pos_detail_2": "*",
        "pos_detail_3": "*",
        "conjugated_type": "*",
        "conjugated_form": "*",
        "basic_form": "*"
    },
    {
        "word_id": 80,
        "word_type": "UNKNOWN",
        "word_position": 27,
        "surface_form": "$",
        "pos": "名詞",
        "pos_detail_1": "サ変接続",
        "pos_detail_2": "*",
        "pos_detail_3": "*",
        "conjugated_type": "*",
        "conjugated_form": "*",
        "basic_form": "*"
    },
    {
        "word_id": 100,
        "word_type": "UNKNOWN",
        "word_position": 28,
        "surface_form": "f",
        "pos": "名詞",
        "pos_detail_1": "一般",
        "pos_detail_2": "*",
        "pos_detail_3": "*",
        "conjugated_type": "*",
        "conjugated_form": "*",
        "basic_form": "*"
    },
    {
        "word_id": 80,
        "word_type": "UNKNOWN",
        "word_position": 29,
        "surface_form": "(",
        "pos": "名詞",
        "pos_detail_1": "サ変接続",
        "pos_detail_2": "*",
        "pos_detail_3": "*",
        "conjugated_type": "*",
        "conjugated_form": "*",
        "basic_form": "*"
    },
    {
        "word_id": 100,
        "word_type": "UNKNOWN",
        "word_position": 30,
        "surface_form": "A",
        "pos": "名詞",
        "pos_detail_1": "一般",
        "pos_detail_2": "*",
        "pos_detail_3": "*",
        "conjugated_type": "*",
        "conjugated_form": "*",
        "basic_form": "*"
    },
    {
        "word_id": 80,
        "word_type": "UNKNOWN",
        "word_position": 31,
        "surface_form": ")",
        "pos": "名詞",
        "pos_detail_1": "サ変接続",
        "pos_detail_2": "*",
        "pos_detail_3": "*",
        "conjugated_type": "*",
        "conjugated_form": "*",
        "basic_form": "*"
    },
    {
        "word_id": 10,
        "word_type": "UNKNOWN",
        "word_position": 32,
        "surface_form": " ",
        "pos": "記号",
        "pos_detail_1": "空白",
        "pos_detail_2": "*",
        "pos_detail_3": "*",
        "conjugated_type": "*",
        "conjugated_form": "*",
        "basic_form": "*"
    },
    {
        "word_id": 80,
        "word_type": "UNKNOWN",
        "word_position": 33,
        "surface_form": "=",
        "pos": "名詞",
        "pos_detail_1": "サ変接続",
        "pos_detail_2": "*",
        "pos_detail_3": "*",
        "conjugated_type": "*",
        "conjugated_form": "*",
        "basic_form": "*"
    },
    {
        "word_id": 10,
        "word_type": "UNKNOWN",
        "word_position": 34,
        "surface_form": " ",
        "pos": "記号",
        "pos_detail_1": "空白",
        "pos_detail_2": "*",
        "pos_detail_3": "*",
        "conjugated_type": "*",
        "conjugated_form": "*",
        "basic_form": "*"
    },
    {
        "word_id": 90,
        "word_type": "UNKNOWN",
        "word_position": 35,
        "surface_form": "0",
        "pos": "名詞",
        "pos_detail_1": "数",
        "pos_detail_2": "*",
        "pos_detail_3": "*",
        "conjugated_type": "*",
        "conjugated_form": "*",
        "basic_form": "*"
    },
    {
        "word_id": 80,
        "word_type": "UNKNOWN",
        "word_position": 36,
        "surface_form": "$",
        "pos": "名詞",
        "pos_detail_1": "サ変接続",
        "pos_detail_2": "*",
        "pos_detail_3": "*",
        "conjugated_type": "*",
        "conjugated_form": "*",
        "basic_form": "*"
    },
    {
        "word_id": 10,
        "word_type": "UNKNOWN",
        "word_position": 37,
        "surface_form": " ",
        "pos": "記号",
        "pos_detail_1": "空白",
        "pos_detail_2": "*",
        "pos_detail_3": "*",
        "conjugated_type": "*",
        "conjugated_form": "*",
        "basic_form": "*"
    },
    {
        "word_id": 92920,
        "word_type": "KNOWN",
        "word_position": 38,
        "surface_form": "が",
        "pos": "助詞",
        "pos_detail_1": "格助詞",
        "pos_detail_2": "一般",
        "pos_detail_3": "*",
        "conjugated_type": "*",
        "conjugated_form": "*",
        "basic_form": "が",
        "reading": "ガ",
        "pronunciation": "ガ"
    },
    {
        "word_id": 2844470,
        "word_type": "KNOWN",
        "word_position": 39,
        "surface_form": "成り立つ",
        "pos": "動詞",
        "pos_detail_1": "自立",
        "pos_detail_2": "*",
        "pos_detail_3": "*",
        "conjugated_type": "五段・タ行",
        "conjugated_form": "基本形",
        "basic_form": "成り立つ",
        "reading": "ナリタツ",
        "pronunciation": "ナリタツ"
    },
    {
        "word_id": 90940,
        "word_type": "KNOWN",
        "word_position": 43,
        "surface_form": "。",
        "pos": "記号",
        "pos_detail_1": "句点",
        "pos_detail_2": "*",
        "pos_detail_3": "*",
        "conjugated_type": "*",
        "conjugated_form": "*",
        "basic_form": "。",
        "reading": "。",
        "pronunciation": "。"
    }
]
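
To read a long token dump like this one more quickly, the particles can be grouped by surface form with their subcategories listed side by side. The sketch below is only an illustration: `ParticleInfo` and `groupParticleSubtypes` are hypothetical names, and the token list is a hand-written excerpt (particles and verbs only) of the sentence "と置くと、 ... が成り立つ。", with surface forms taken from the input text.

```typescript
// Hypothetical subset of a kuromoji token; only the fields used below.
interface ParticleInfo {
  surface_form: string;
  pos: string;
  pos_detail_1: string;
}

// For each particle surface form, collect the subcategories (pos_detail_1)
// in the order in which they appear in the sentence.
function groupParticleSubtypes(tokens: ParticleInfo[]): Record<string, string[]> {
  const groups: Record<string, string[]> = {};
  for (const token of tokens) {
    if (token.pos !== "助詞") continue; // skip everything that is not a particle
    const list = groups[token.surface_form] ?? [];
    list.push(token.pos_detail_1);
    groups[token.surface_form] = list;
  }
  return groups;
}

// Hand-written excerpt of the tokens for the LaTeX sentence.
const latexSentence: ParticleInfo[] = [
  { surface_form: "と", pos: "助詞", pos_detail_1: "格助詞" },
  { surface_form: "置く", pos: "動詞", pos_detail_1: "自立" },
  { surface_form: "と", pos: "助詞", pos_detail_1: "接続助詞" },
  { surface_form: "が", pos: "助詞", pos_detail_1: "格助詞" },
  { surface_form: "成り立つ", pos: "動詞", pos_detail_1: "自立" },
];

// "と" maps to two different subcategories, "が" to a single one.
console.log(groupParticleSubtypes(latexSentence));
```

A surface form whose list contains the same subcategory twice is the kind of case the rule is meant to flag; here neither "と" nor "が" does.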

@naskya

naskya commented Nov 4, 2022

@azu Sorry. I wrote a careless comment when I was not thinking clearly. It was not a valid report, so I retract it.

I normally use textlint only to proofread LaTeX documents, and I had forgotten that I use it together with textlint-plugin-latex2e (in a sense, a non-standard textlint setup).

As you pointed out, linting this sentence without textlint-plugin-latex2e produced no error, so this does not appear to be a bug in no-doubled-joshi.

From now on, if I find a suspicious case, I will prepare a reproduction in plain text. Thank you.
