Highlight very slow performance (x28) on remote index and returning empty values when fieldname without quotes specified in HIGHLIGHT({around=5},msg) #1158

Open
popalot2 opened this issue Jun 7, 2023 · 2 comments

Comments

@popalot2

popalot2 commented Jun 7, 2023

Describe the bug

  1. When using HIGHLIGHT on a remote index (a wrapper around a local index), performance is ~28x worse than on the
    same local index.
  2. When using HIGHLIGHT({around=5},msg) (field name without quotes) on a remote index, an empty string is returned.

To Reproduce

  1. Create the index; adjust path to the correct location
source src_highlight_performance_hit  
{  
  type = csvpipe  
  csvpipe_command= awk 'BEGIN {srand(); for (i = 1; i <= 10000; i++) {printf int(rand() * 10000000) ",2,3,aaa,Led Zeppelin "; system("head -n 2000 /usr/share/dict/words | shuf| head -n 1000|tr -c \"[:alnum:]\" \" \""); print ""}}'  
  csvpipe_attr_uint=f1  
  csvpipe_attr_uint=f2  
  csvpipe_field=s1  
  csvpipe_field = msg  
}  
  
index idx_highligh_performance_hit  
{  
  stored_fields = msg  
  source = src_highlight_performance_hit  
  path = /var/manticore/idx_highligh_performance_hit  
  
  index_exact_words=1  
  min_prefix_len = 0  
  
  docstore_compression = none  
  
  ngram_len = 1  
  html_strip = 1  
}  
  
index idx_highligh_performance_hit_local  
{  
        local = idx_highligh_performance_hit  
        type = distributed  
}  
  
  
index idx_highligh_performance_hit_remote  
{  
        agent = 127.0.0.1:9312:idx_highligh_performance_hit  
        type = distributed  
}  
  
  2. Prepare the index
indexer  idx_highligh_performance_hit --rotate  
  3. Run the SQL; observe that the highlight is empty when using HIGHLIGHT({around=5},msg) on the remote index
#works ok  
mysql --protocol=tcp -h localhost -P 9306 -u aaa -ss -e "SELECT id ,HIGHLIGHT({around=5},msg) h, RAND() r FROM idx_highligh_performance_hit  WHERE match('Led') ORDER BY r DESC LIMIT  0,1"  
  
#highlight empty from remote index, no error  
mysql --protocol=tcp -h localhost -P 9306 -u aaa -ss -e "SELECT id ,HIGHLIGHT({around=5},msg) h, RAND() r FROM idx_highligh_performance_hit_remote  WHERE match('Led') ORDER BY r DESC LIMIT  0,1"  
  
#works ok  
mysql --protocol=tcp -h localhost -P 9306 -u aaa -ss -e "SELECT id ,HIGHLIGHT({around=5},'msg') h, RAND() r FROM idx_highligh_performance_hit  WHERE match('Led') ORDER BY r DESC LIMIT  0,1"  
mysql --protocol=tcp -h localhost -P 9306 -u aaa -ss -e "SELECT id ,HIGHLIGHT({around=5},'msg') h, RAND() r FROM idx_highligh_performance_hit  WHERE match('Led') ORDER BY r DESC LIMIT  0,1"  

results:

mysql --protocol=tcp -h localhost -P 9306 -u aaa -ss -e "SELECT id ,HIGHLIGHT({around=5},msg) h, RAND() r FROM idx_highligh_performance_hit  WHERE match('Led') ORDER BY r DESC LIMIT  0,1"  
567002  <b>Led</b> Zeppelin absconce abyssopelagic absolutista absurdness  ...  0.99935228  
  
mysql --protocol=tcp -h localhost -P 9306 -u aaa -ss -e "SELECT id ,HIGHLIGHT({around=5},msg) h, RAND() r FROM idx_highligh_performance_hit_remote  WHERE match('Led') ORDER BY r DESC LIMIT  0,1"  
698081          0.99990314  
  
mysql --protocol=tcp -h localhost -P 9306 -u aaa -ss -e "SELECT id ,HIGHLIGHT({around=5},'msg') h, RAND() r FROM idx_highligh_performance_hit  WHERE match('Led') ORDER BY r DESC LIMIT  0,1"  
467910  <b>Led</b> Zeppelin abasement abaised acatharsy ablative  ...   0.99792570  
  
mysql --protocol=tcp -h localhost -P 9306 -u aaa -ss -e "SELECT id ,HIGHLIGHT({around=5},'msg') h, RAND() r FROM idx_highligh_performance_hit  WHERE match('Led') ORDER BY r DESC LIMIT  0,1"  
57870  <b>Led</b> Zeppelin abluted abrogable acana abolisher  ...      0.99980712  
  4. Prepare the test SQL
awk 'BEGIN {for (i = 1; i <= 300; i++) { print "SELECT id ,HIGHLIGHT(), RAND() r FROM idx_highligh_performance_hit_local  WHERE match(\047Led\047) ORDER BY r DESC LIMIT  0,30 OPTION threads=1; ";} }' > queries_local.txt  
awk 'BEGIN {for (i = 1; i <= 300; i++) { print "SELECT id ,HIGHLIGHT(), RAND() r FROM idx_highligh_performance_hit_remote  WHERE match(\047Led\047) ORDER BY r DESC LIMIT  0,30  OPTION threads=1;";} }' > queries_remote.txt  
awk 'BEGIN {for (i = 1; i <= 300; i++) { print "SELECT id ,RAND() r FROM idx_highligh_performance_hit_local  WHERE match(\047Led\047) ORDER BY r DESC LIMIT  0,30 OPTION threads=1; ";} }' > queries_local_no_highlight.txt  
awk 'BEGIN {for (i = 1; i <= 300; i++) { print "SELECT id ,RAND() r FROM idx_highligh_performance_hit_remote  WHERE match(\047Led\047) ORDER BY r DESC LIMIT  0,30  OPTION threads=1;";} }' > queries_remote_no_highlight.txt  
  
  5. Run the test SQL to measure performance
time mysql --protocol=tcp -h localhost -P 9306 -u aaa -ss < queries_local.txt > res_queries_local.txt  
  
time mysql --protocol=tcp -h localhost -P 9306 -u aaa -ss < queries_remote.txt > res_queries_remote.txt  
  
time mysql --protocol=tcp -h localhost -P 9306 -u aaa -ss < queries_local_no_highlight.txt > res_queries_local_no_highlight.txt  
  
time mysql --protocol=tcp -h localhost -P 9306 -u aaa -ss < queries_remote_no_highlight.txt > res_queries_remote_no_highlight.txt  

Observe that HIGHLIGHT on the remote index performs ~28x worse than on the same local index.
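The four query files from the "prepare test SQL" step differ only in the index name and the select list, so they can also be produced by one small helper. This is a sketch, not part of the original report: the function name `gen_queries` and its arguments are made up for illustration, and it assumes the search term is meant to be quoted as `'Led'`, as in the single-query examples earlier.

```shell
#!/bin/sh
# gen_queries INDEX SELECT_LIST [COUNT]
# Hypothetical helper: prints COUNT copies (default 300) of the benchmark
# query against INDEX, using SELECT_LIST as the column list.
gen_queries() {
    index=$1
    select_list=$2
    count=${3:-300}
    awk -v idx="$index" -v sel="$select_list" -v n="$count" 'BEGIN {
        for (i = 1; i <= n; i++) {
            # \047 is the octal escape for a single quote inside an awk string
            printf "SELECT %s FROM %s WHERE match(\047Led\047) ORDER BY r DESC LIMIT 0,30 OPTION threads=1;\n", sel, idx
        }
    }'
}

# Same file names as in the steps above.
gen_queries idx_highligh_performance_hit_local  "id, HIGHLIGHT(), RAND() r" > queries_local.txt
gen_queries idx_highligh_performance_hit_remote "id, HIGHLIGHT(), RAND() r" > queries_remote.txt
gen_queries idx_highligh_performance_hit_local  "id, RAND() r" > queries_local_no_highlight.txt
gen_queries idx_highligh_performance_hit_remote "id, RAND() r" > queries_remote_no_highlight.txt
```

Keeping the query text in one place makes it harder for the local and remote variants to drift apart, which matters when the only intended difference between two runs is the index.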

results

time mysql --protocol=tcp -h localhost -P 9306 -u aaa -ss < queries_local.txt > res_queries_local.txt  
real    0m1.876s  
user    0m0.011s  
sys     0m0.006s  
  
time mysql --protocol=tcp -h localhost -P 9306 -u aaa -ss < queries_remote.txt > res_queries_remote.txt  
real    0m52.648s  
user    0m0.011s  
sys     0m0.008s  

When HIGHLIGHT is not used, performance is slightly worse on the remote index (as expected).

time mysql --protocol=tcp -h localhost -P 9306 -u aaa -ss < queries_local_no_highlight.txt > res_queries_local_no_highlight.txt  
real    0m0.295s  
user    0m0.005s  
sys     0m0.005s  
  
time mysql --protocol=tcp -h localhost -P 9306 -u aaa -ss < queries_remote_no_highlight.txt > res_queries_remote_no_highlight.txt  
real    0m0.442s  
user    0m0.006s  
sys     0m0.004s  
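For reference, the slowdown factors follow directly from the `real` timings above. The sketch below only redoes that arithmetic; the helper names `ratio` and `per_query_ms` are illustrative, and subtracting the no-highlight run as a baseline is an assumption about how to isolate the cost of HIGHLIGHT(), not something measured separately.

```shell
#!/bin/sh
# ratio A B: print A / B rounded to one decimal place.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# per_query_ms TOTAL BASELINE N: print (TOTAL - BASELINE) / N in milliseconds,
# i.e. the extra time per query once the no-highlight run is subtracted.
per_query_ms() {
    awk -v t="$1" -v b="$2" -v n="$3" 'BEGIN { printf "%.1f\n", (t - b) * 1000 / n }'
}

ratio 52.648 1.876            # remote vs. local with HIGHLIGHT():    28.1
ratio 0.442 0.295             # remote vs. local without HIGHLIGHT(): 1.5
per_query_ms 1.876 0.295 300  # extra ms per query, local:  5.3
per_query_ms 52.648 0.442 300 # extra ms per query, remote: 174.0
```

Under that assumption, HIGHLIGHT() adds roughly 5 ms per query on the local index and roughly 174 ms per query on the remote one, while the same queries without HIGHLIGHT() are only about 1.5x slower remotely.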

Expected behavior
HIGHLIGHT({around=5},msg) should return the highlighted text on a remote index, as it does on a local one.
HIGHLIGHT on a remote index should perform much closer to the same query on the local index.

Describe the environment:
Manticore 6.0.5 844b1ae@230606 dev (columnar 2.0.5 d593e0d@230529) (secondary 2.0.5 d593e0d@230529)
Linux Alma-87-amd64-base 5.14.0-284.11.1.el9_2.x86_64 #1 SMP PREEMPT_DYNAMIC Tue May 9 05:49:00 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux

Messages from log files:
Not applicable

Additional context

@klirichek
Contributor

klirichek commented Jun 8, 2023

The related issue is #568

@sanikolaev
Collaborator

As discussed on the dev call of Jun 8, 2023, highlight() does not work via lazy fetching. It makes sense to implement that.
