check cross more
yaofei.sun committed Jan 12, 2016
1 parent 5c325d5 commit d7ebfbe
Showing 1 changed file with 4 additions and 2 deletions.
6 changes: 4 additions & 2 deletions src/main/java/org/wltea/analyzer/core/LexemePath.java
@@ -122,8 +122,10 @@ Lexeme removeTail(){
      * @return
      */
     boolean checkCross(Lexeme lexeme){
-        return (lexeme.getBegin() >= this.pathBegin && lexeme.getBegin() < this.pathEnd)
-                || (this.pathBegin >= lexeme.getBegin() && this.pathBegin < lexeme.getBegin()+ lexeme.getLength());
+        int start = this.getPathBegin() < lexeme.getBeginPosition() ? this.getPathBegin() : lexeme.getBegin();
+        int end = this.getPathEnd() > lexeme.getEndPosition() ? this.getPathEnd() : lexeme.getEndPosition();
+
+        return (end - start) <= (this.getPathLength() + lexeme.getLength());
     }
 
     int getPathBegin() {

1 comment on commit d7ebfbe

@sunyaofei

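The original way of checking for a cross was incomplete. The idea behind the new code is that when two segments cross, the total length they span is less than or equal to the sum of the two segments' lengths. This changes the segmentation results in smart mode.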
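To make the difference concrete, here is a minimal sketch of both conditions on plain integer offsets, treating each segment as a half-open range `[begin, end)`. The class and method names (`CrossCheckSketch`, `overlapsStrictly`, `crossesBySpan`) are hypothetical and are not part of IK Analyzer. The sketch also ignores the lexeme offset that the real `Lexeme` position getters take into account.

```java
// Hypothetical sketch: these names are illustrative, not IK Analyzer API.
final class CrossCheckSketch {

    // Shape of the removed condition: true only when the two half-open
    // ranges [aBegin, aEnd) and [bBegin, bEnd) share at least one position.
    static boolean overlapsStrictly(int aBegin, int aEnd, int bBegin, int bEnd) {
        return (bBegin >= aBegin && bBegin < aEnd)
            || (aBegin >= bBegin && aBegin < bEnd);
    }

    // Shape of the added condition: the span covered by both ranges is
    // no longer than the sum of their individual lengths.
    static boolean crossesBySpan(int aBegin, int aEnd, int bBegin, int bEnd) {
        int start = Math.min(aBegin, bBegin);
        int end = Math.max(aEnd, bEnd);
        return (end - start) <= (aEnd - aBegin) + (bEnd - bBegin);
    }

    public static void main(String[] args) {
        // Ranges 0-3 and 2-4 share position 2.
        System.out.println(overlapsStrictly(0, 3, 2, 4) + " " + crossesBySpan(0, 3, 2, 4));
        // Ranges 2-4 and 4-6 only touch at position 4.
        System.out.println(overlapsStrictly(2, 4, 4, 6) + " " + crossesBySpan(2, 4, 4, 6));
        // Ranges 0-3 and 4-6 have a gap between them.
        System.out.println(overlapsStrictly(0, 3, 4, 6) + " " + crossesBySpan(0, 3, 4, 6));
    }
}
```

For non-empty ranges the two conditions agree except when one range ends exactly where the other begins: because of the `<=`, the span-based check also reports touching ranges as crossing.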

Test case:
Input: 孙要求是正确

Custom dictionary:
孙要
孙要求
要求
求是
正确
———————————————————————not smart before bug fix
0 - 3 : 孙要求 | CN_WORD
0 - 2 : 孙要 | CN_WORD
0 - 1 : 孙 | CN_WORD
1 - 3 : 要求 | CN_WORD
2 - 4 : 求是 | CN_WORD
4 - 6 : 正确 | CN_WORD
————————————————————————smart before bug fix
0 - 2 : 孙要 | CN_WORD
2 - 4 : 求是 | CN_WORD
4 - 6 : 正确 | CN_WORD

———————————————————————not smart after bug fix
0 - 3 : 孙要求 | CN_WORD
0 - 2 : 孙要 | CN_WORD
0 - 1 : 孙 | CN_WORD
1 - 3 : 要求 | CN_WORD
2 - 4 : 求是 | CN_WORD
4 - 6 : 正确 | CN_WORD
————————————————————————smart after bug fix
0 - 3 : 孙要求 | CN_WORD
4 - 6 : 正确 | CN_WORD
