How can I keep the "+" signs from "c++软件工程师" (C++ software engineer) in the tokenization result? Why does the pinyin filter strip out symbols? #291

Open
Maskvvv opened this issue Jul 2, 2023 · 0 comments

Maskvvv commented Jul 2, 2023

GET /_analyze
{
  "tokenizer": "keyword", 
  "filter": [
    {
      "type": "pinyin",
      "keep_original": false,
      "keep_first_letter": false,
      "keep_full_pinyin": true,
      "none_chinese_pinyin_tokenize": true,
      "ignore_pinyin_offset": false
    }
  ],
  "text": [
    "c++软件工程师"
  ]
}

Result:

{
  "tokens" : [
    {
      "token" : "c",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "c",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "ruan",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "jian",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "gong",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "word",
      "position" : 4
    },
    {
      "token" : "cheng",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "word",
      "position" : 5
    },
    {
      "token" : "shi",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "word",
      "position" : 6
    }
  ]
}
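
To make the observation concrete, here is a minimal Python sketch that checks which ASCII symbols from the input never show up in any returned token. It only inspects the response pasted above; it does not call Elasticsearch, and the helper name `missing_ascii_symbols` is hypothetical, not part of the pinyin plugin.

```python
# Token strings copied from the _analyze response above.
tokens = ["c", "c", "ruan", "jian", "gong", "cheng", "shi"]
text = "c++软件工程师"


def missing_ascii_symbols(text, tokens):
    """Return the ASCII non-alphanumeric characters of `text` that appear in no token.

    Hypothetical helper for illustration only; Chinese characters are skipped
    because they are expected to be replaced by their pinyin.
    """
    emitted = set("".join(tokens))
    return [
        ch
        for ch in text
        if ch.isascii() and not ch.isalnum() and ch not in emitted
    ]


print(missing_ascii_symbols(text, tokens))
```

For the response above this reports both "+" characters as missing, which is the behavior this issue is asking about.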