HIGHLIGHT() highlights a hindi word wrong #1413

Open
githubmanticore opened this issue Sep 1, 2023 · 1 comment

When I try to highlight the Hindi word हिंदी in the text english text हिंदी पाठ, the last character of the word is left out of the highlighted snippet:

➜  ~ docker stop manticore; docker run --name manticore -p9306:9306 -p9308:9308 --rm -d manticoresearch/manticore:dev && \  
docker exec -it manticore mysql && \  
docker stop manticore  
  
Error response from daemon: No such container: manticore  
983dc20eb06f5812e4cc0f4e91a348fad13fc31636a737748c7dacf643e95de2  
MySQL [(none)]> create table t(f text);  
MySQL [(none)]> insert into t values(0,'english text हिंदी पाठ');  
MySQL [(none)]> select highlight() from t where match('हिंदी');  
+-----------------------------------------------+
| highlight()                                   |
+-----------------------------------------------+
| english text <b>हिंद</b>ी पाठ                 |
+-----------------------------------------------+
  
  
MySQL [(none)]> show status like '%version%';  
+---------------+-------------------------------+
| Counter       | Value                         |
+---------------+-------------------------------+
| version       | 3.5.1 84c8db5e@200817 release |
| mysql_version | 3.5.1 84c8db5e@200817 release |
+---------------+-------------------------------+

It also looks strange that the row with the snippet is shorter than expected (3 spaces are missing at the end).
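
For anyone triaging this, a small Python sketch (not part of Manticore) that lists the code points of the query word may help. It assumes the word is spelled with the five code points written as escapes below, and the `describe` helper is a hypothetical name used only for illustration. The character left outside `<b>...</b>` is the last one, which is a combining vowel sign rather than a letter. Whether that is the actual cause of the bug is a guess, not something confirmed here.

```python
import unicodedata

# Assumed spelling of the query word from the report (ह ि ं द ी),
# written with escapes so the exact code points are unambiguous.
word = "\u0939\u093f\u0902\u0926\u0940"


def describe(text):
    """Return (code point, general category, name) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


rows = describe(word)
for code_point, category, name in rows:
    print(code_point, category, name)

# Categories starting with "M" are combining marks (Mn, Mc, Me).
marks = [cp for cp, category, _ in rows if category.startswith("M")]
print("combining marks:", marks)
```

If combining marks are counted when padding the result table in the client, that might also be related to the missing trailing spaces, but this is equally unverified.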

dmorgoon commented Sep 1, 2023

Not sure whether it is related, but highlighting in general behaves unexpectedly with some Unicode symbols/code points:

create table test_idx2 (t text);
-- note the é below
insert into test_idx2 values ('includé');
select highlight() as h from test_idx2 where match('include');
+------------------+
| h                |
+------------------+
| <b>include</b>́  |
+------------------+
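
The output above suggests the inserted `é` was in decomposed form (`e` followed by U+0301 COMBINING ACUTE ACCENT), since the accent ends up alone after `</b>`. That is an assumption drawn from the output, not something verified against the original input. A minimal Python sketch of the difference between the two forms:

```python
import unicodedata

# Assumed decomposed input: plain "e" followed by a combining acute accent.
decomposed = "include\u0301"

# NFC composes the pair into the single code point U+00E9 where possible.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), [f"U+{ord(ch):04X}" for ch in decomposed])
print(len(composed), [f"U+{ord(ch):04X}" for ch in composed])

# The two strings render the same but compare as different.
print(decomposed == composed)
```

Normalizing text to NFC before inserting it might sidestep the split accent in this particular case. How the composed `é` is then matched against `include` would depend on the table's character set settings, which this sketch does not cover.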
