Query Results Inconsistent Across Nodes for an index #2779

Open
5 tasks
mohdmsl opened this issue Nov 22, 2024 · 15 comments
Labels: bug, waiting (waiting for the original poster, in most cases, or something else)

Comments

mohdmsl commented Nov 22, 2024

Bug Description:

Set up a Manticore cluster with three nodes (search-01, search-02, search-03) and synchronize the index lisdocument_20241122_2991.

Ensure all nodes have the same total document count in the index:
SELECT COUNT(*) FROM lisdocument_20241122_2991;

Run the following query on each node:

SELECT COUNT(*)
FROM lisdocument_20241122_2991
WHERE title_len_chars > 0
  AND text_len_chars > 0
  AND type IN ('PROGRAM_ELEMENT','PLANNED_PROGRAM','PROJECT','ITEM');

search-01:

mysql> SELECT COUNT(*) FROM lisdocument_20241122_2991 
where title_len_chars > 0 AND text_len_chars > 0 
AND type IN ('PROGRAM_ELEMENT','PLANNED_PROGRAM','PROJECT','ITEM');

+----------+
| count(*) |
+----------+
|   928088 |
+----------+
1 row in set (0.02 sec)
--- 1 out of 1 results in 17ms ---

search-02:

mysql> SELECT COUNT(*) FROM lisdocument_20241122_2991 
where title_len_chars > 0 AND text_len_chars > 0 
AND type IN ('PROGRAM_ELEMENT','PLANNED_PROGRAM','PROJECT','ITEM');
+----------+
| count(*) |
+----------+
|   928469 |
+----------+
1 row in set (0.02 sec)
--- 1 out of 1 results in 19ms ---

search-03:

mysql> SELECT COUNT(*) FROM lisdocument_20241122_2991 where title_len_chars > 0 AND text_len_chars > 0 AND type IN ('PROGRAM_ELEMENT','PLANNED_PROGRAM','PROJECT','ITEM');

+----------+
| count(*) |
+----------+
|   928183 |
+----------+
1 row in set (0.02 sec)
--- 1 out of 1 results in 21ms ---
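
To make the size of the mismatch easier to see, here is a minimal sketch that works only with the three filtered counts copied from the sessions above. The helper names `count_spread` and `deltas_from` are made up for illustration and are not part of Manticore or any client library.

```python
# Filtered COUNT(*) results copied from the three sessions above.
observed = {
    "search-01": 928088,
    "search-02": 928469,
    "search-03": 928183,
}


def count_spread(counts):
    """Return (lowest node, highest node, difference between their counts)."""
    low = min(counts, key=counts.get)
    high = max(counts, key=counts.get)
    return low, high, counts[high] - counts[low]


def deltas_from(counts, baseline):
    """Return each node's count minus the baseline node's count."""
    base = counts[baseline]
    return {node: value - base for node, value in counts.items()}


low, high, spread = count_spread(observed)
print(f"lowest={low} highest={high} spread={spread}")
print(deltas_from(observed, "search-01"))
```

With these numbers, search-02 returns 381 more rows than search-01, and search-03 returns 95 more than search-01.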

Additional Information:

The total document count in the index is consistent across nodes, but filtered counts are not:

  1. SELECT COUNT(*) FROM lisdocument_20241122_2991; returns the same result on all nodes.
  2. Queries with filters (like the one above) return a different count on each node.
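
The check above can be repeated with a small script. This is only a sketch under assumptions: it assumes the third-party PyMySQL package is installed, that each node accepts MySQL-protocol connections on port 9306 (Manticore's default; adjust if your setup differs), and that the node names resolve as hostnames. `run_count` and `compare_nodes` are hypothetical helper names, not Manticore APIs.

```python
TOTAL_QUERY = "SELECT COUNT(*) FROM lisdocument_20241122_2991"
FILTERED_QUERY = (
    "SELECT COUNT(*) FROM lisdocument_20241122_2991 "
    "WHERE title_len_chars > 0 AND text_len_chars > 0 "
    "AND type IN ('PROGRAM_ELEMENT','PLANNED_PROGRAM','PROJECT','ITEM')"
)


def run_count(host, query, port=9306):
    """Run one COUNT(*) query on a node and return the integer result.

    Assumption: PyMySQL is available and the node listens on `port`.
    """
    import pymysql  # third-party; imported lazily so the rest works without it

    conn = pymysql.connect(host=host, port=port)
    try:
        with conn.cursor() as cur:
            cur.execute(query)
            return int(cur.fetchone()[0])
    finally:
        conn.close()


def compare_nodes(hosts, run=run_count):
    """Run the total and filtered counts on every host and flag disagreement."""
    report = {}
    for label, query in (("total", TOTAL_QUERY), ("filtered", FILTERED_QUERY)):
        counts = {host: run(host, query) for host in hosts}
        report[label] = {
            "counts": counts,
            "consistent": len(set(counts.values())) == 1,
        }
    return report


# Example call (hostnames are assumptions):
# print(compare_nodes(["search-01", "search-02", "search-03"]))
```

The `run` parameter is injectable so the comparison logic can be exercised without a live cluster. If the behavior described in this report reproduces, `report["total"]["consistent"]` would be true while `report["filtered"]["consistent"]` would be false.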

Schema of the index:

CREATE TABLE lisdocument_20241122_2991 (
id bigint,
title text,
`text` text,
full_text_key_phrases text,
source_document_id string attribute,
document_id string attribute,
version float,
`type` string attribute,
`timestamp` bigint,
source string attribute,
title_len_chars integer,
text_len_chars integer,
full_text string attribute,
full_text_len_chars integer,
full_text_pdf string attribute,
full_text_pdf_len_chars bigint,
creation_date bigint,
modified_date bigint,
publish_year integer,
documents string attribute,
parent_documents string attribute,
relations string attribute,
mentions string attribute,
entities string attribute,
tags string attribute,
vector float_vector knn_type='hnsw' knn_dims='768' hnsw_similarity='COSINE'
) min_prefix_len='3' min_infix_len='3' index_exact_words='1' engine='columnar' blend_chars='+,&' morphology='lemmatize_en_all, libstemmer_en' stopwords_unstemmed='1' stopwords='/var/lib/data/manticore/lisdocument_20241122_2991/en' rt_mem_limit='536870912'
1 row in set (0.01 sec)

Cluster State:
[Screenshot: cluster state, 2024-11-22 at 8:05:19 PM]

mysql> SHOW STATUS;
+---------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Counter                                           | Value                                                                                                                                                                                                                                                                                                               |
+---------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| uptime                                            | 167005                                                                                                                                                                                                                                                                                                              |
| connections                                       | 59514                                                                                                                                                                                                                                                                                                               |
| maxed_out                                         | 0                                                                                                                                                                                                                                                                                                                   |
| version                                           | 6.3.6 593045790@24080214 (columnar 2.3.0 88a01c3@24052206) (secondary 2.3.0 88a01c3@24052206) (knn 2.3.0 88a01c3@24052206) (buddy v2.3.12)                                                                                                                                                                          |
| mysql_version                                     | 5.0.37                                                                                                                                                                                                                                                                                                              |
| command_search                                    | 1624                                                                                                                                                                                                                                                                                                                |
| command_excerpt                                   | 0                                                                                                                                                                                                                                                                                                                   |
| command_update                                    | 0                                                                                                                                                                                                                                                                                                                   |
| command_keywords                                  | 0                                                                                                                                                                                                                                                                                                                   |
| command_persist                                   | 98                                                                                                                                                                                                                                                                                                                  |
| command_status                                    | 2986                                                                                                                                                                                                                                                                                                                |
| command_flushattrs                                | 0                                                                                                                                                                                                                                                                                                                   |
| command_sphinxql                                  | 0                                                                                                                                                                                                                                                                                                                   |
| command_ping                                      | 0                                                                                                                                                                                                                                                                                                                   |
| command_delete                                    | 0                                                                                                                                                                                                                                                                                                                   |
| command_set                                       | 82                                                                                                                                                                                                                                                                                                                  |
| command_insert                                    | 0                                                                                                                                                                                                                                                                                                                   |
| command_replace                                   | 3057599                                                                                                                                                                                                                                                                                                             |
| command_commit                                    | 778                                                                                                                                                                                                                                                                                                                 |
| command_suggest                                   | 0                                                                                                                                                                                                                                                                                                                   |
| command_json                                      | 0                                                                                                                                                                                                                                                                                                                   |
| command_callpq                                    | 0                                                                                                                                                                                                                                                                                                                   |
| command_cluster                                   | 118                                                                                                                                                                                                                                                                                                                 |
| command_getfield                                  | 0                                                                                                                                                                                                                                                                                                                   |
| agent_connect                                     | 1974                                                                                                                                                                                                                                                                                                                |
| agent_tfo                                         | 0                                                                                                                                                                                                                                                                                                                   |
| agent_retry                                       | 8                                                                                                                                                                                                                                                                                                                   |
| queries                                           | 9202                                                                                                                                                                                                                                                                                                                |
| dist_queries                                      | 0                                                                                                                                                                                                                                                                                                                   |
| workers_total                                     | 40                                                                                                                                                                                                                                                                                                                  |
| workers_active                                    | 6                                                                                                                                                                                                                                                                                                                   |
| workers_clients                                   | 5                                                                                                                                                                                                                                                                                                                   |
| workers_clients_vip                               | 0                                                                                                                                                                                                                                                                                                                   |
| work_queue_length                                 | 31                                                                                                                                                                                                                                                                                                                  |
| load                                              | 3.07 2.45 2.22                                                                                                                                                                                                                                                                                                      |
| load_primary                                      | 0.00 0.00 0.00                                                                                                                                                                                                                                                                                                      |
| load_secondary                                    | 0.00 0.00 0.00                                                                                                                                                                                                                                                                                                      |
| query_wall                                        | 599.533                                                                                                                                                                                                                                                                                                             |
| query_cpu                                         | OFF                                                                                                                                                                                                                                                                                                                 |
| dist_wall                                         | 0.000                                                                                                                                                                                                                                                                                                               |
| dist_local                                        | 0.000                                                                                                                                                                                                                                                                                                               |
| dist_wait                                         | 0.000                                                                                                                                                                                                                                                                                                               |
| query_reads                                       | OFF                                                                                                                                                                                                                                                                                                                 |
| query_readkb                                      | OFF                                                                                                                                                                                                                                                                                                                 |
| query_readtime                                    | OFF                                                                                                                                                                                                                                                                                                                 |
| avg_query_wall                                    | 0.065                                                                                                                                                                                                                                                                                                               |
| avg_query_cpu                                     | OFF                                                                                                                                                                                                                                                                                                                 |
| avg_dist_wall                                     | 0.000                                                                                                                                                                                                                                                                                                               |
| avg_dist_local                                    | 0.000                                                                                                                                                                                                                                                                                                               |
| avg_dist_wait                                     | 0.000                                                                                                                                                                                                                                                                                                               |
| avg_query_reads                                   | OFF                                                                                                                                                                                                                                                                                                                 |
| avg_query_readkb                                  | OFF                                                                                                                                                                                                                                                                                                                 |
| avg_query_readtime                                | OFF                                                                                                                                                                                                                                                                                                                 |
| qcache_max_bytes                                  | 16777216                                                                                                                                                                                                                                                                                                            |
| qcache_thresh_msec                                | 3000                                                                                                                                                                                                                                                                                                                |
| qcache_ttl_sec                                    | 60                                                                                                                                                                                                                                                                                                                  |
| qcache_cached_queries                             | 0                                                                                                                                                                                                                                                                                                                   |
| qcache_used_bytes                                 | 0                                                                                                                                                                                                                                                                                                                   |
| qcache_hits                                       | 0                                                                                                                                                                                                                                                                                                                   |
| cluster_name                                      | DMETRICS_FTS_1                                                                                                                                                                                                                                                                                                      |
| cluster_DMETRICS_FTS_1_state_uuid                 | 0aff7d43-a75b-11ef-9615-02b5ab4805fa                                                                                                                                                                                                                                                                                |
| cluster_DMETRICS_FTS_1_conf_id                    | 3                                                                                                                                                                                                                                                                                                                   |
| cluster_DMETRICS_FTS_1_status                     | primary                                                                                                                                                                                                                                                                                                             |
| cluster_DMETRICS_FTS_1_size                       | 3                                                                                                                                                                                                                                                                                                                   |
| cluster_DMETRICS_FTS_1_local_index                | 0                                                                                                                                                                                                                                                                                                                   |
| cluster_DMETRICS_FTS_1_node_state                 | synced                                                                                                                                                                                                                                                                                                              |
| cluster_DMETRICS_FTS_1_nodes_set                  | 172.20.4.12:9312,172.20.4.13:9312,172.20.4.14:9312                                                                                                                                                                                                                                                                  |
| cluster_DMETRICS_FTS_1_nodes_view                 | 172.20.4.12:9312,172.20.4.12:9322:replication,172.20.4.13:9312,172.20.4.13:9324:replication,172.20.4.14:9312,172.20.4.14:9322:replication                                                                                                                                                                           |
| cluster_DMETRICS_FTS_1_indexes_count              | 16                                                                                                                                                                                                                                                                                                                  |
| cluster_DMETRICS_FTS_1_indexes                    | kbentity,lisdocument,modelinfo,tag,lisdocument_20241122_2989,kbentity_20241122_2989,tag_20241122_2989,modelinfo_20241122_2989,lisdocument_20241122_2991,kbentity_20241122_2991,tag_20241122_2991,modelinfo_20241122_2991,lisdocument_20241122_2992,kbentity_20241122_2992,tag_20241122_2992,modelinfo_20241122_2992 |
| cluster_DMETRICS_FTS_1_local_state_uuid           | 0aff7d43-a75b-11ef-9615-02b5ab4805fa                                                                                                                                                                                                                                                                                |
| cluster_DMETRICS_FTS_1_protocol_version           | 9                                                                                                                                                                                                                                                                                                                   |
| cluster_DMETRICS_FTS_1_last_applied               | 1685                                                                                                                                                                                                                                                                                                                |
| cluster_DMETRICS_FTS_1_last_committed             | 1685                                                                                                                                                                                                                                                                                                                |
| cluster_DMETRICS_FTS_1_replicated                 | 790                                                                                                                                                                                                                                                                                                                 |
| cluster_DMETRICS_FTS_1_replicated_bytes           | 71611553464                                                                                                                                                                                                                                                                                                         |
| cluster_DMETRICS_FTS_1_repl_keys                  | 3047383                                                                                                                                                                                                                                                                                                             |
| cluster_DMETRICS_FTS_1_repl_keys_bytes            | 24404168                                                                                                                                                                                                                                                                                                            |
| cluster_DMETRICS_FTS_1_repl_data_bytes            | 2517254                                                                                                                                                                                                                                                                                                             |
| cluster_DMETRICS_FTS_1_repl_other_bytes           | 0                                                                                                                                                                                                                                                                                                                   |
| cluster_DMETRICS_FTS_1_received                   | 2138                                                                                                                                                                                                                                                                                                                |
| cluster_DMETRICS_FTS_1_received_bytes             | 79697355014                                                                                                                                                                                                                                                                                                         |
| cluster_DMETRICS_FTS_1_local_commits              | 776                                                                                                                                                                                                                                                                                                                 |
| cluster_DMETRICS_FTS_1_local_cert_failures        | 0                                                                                                                                                                                                                                                                                                                   |
| cluster_DMETRICS_FTS_1_local_replays              | 0                                                                                                                                                                                                                                                                                                                   |
| cluster_DMETRICS_FTS_1_local_send_queue           | 1                                                                                                                                                                                                                                                                                                                   |
| cluster_DMETRICS_FTS_1_local_send_queue_max       | 2                                                                                                                                                                                                                                                                                                                   |
| cluster_DMETRICS_FTS_1_local_send_queue_min       | 0                                                                                                                                                                                                                                                                                                                   |
| cluster_DMETRICS_FTS_1_local_send_queue_avg       | 0.00604839                                                                                                                                                                                                                                                                                                          |
| cluster_DMETRICS_FTS_1_local_recv_queue           | 0                                                                                                                                                                                                                                                                                                                   |
| cluster_DMETRICS_FTS_1_local_recv_queue_max       | 30                                                                                                                                                                                                                                                                                                                  |
| cluster_DMETRICS_FTS_1_local_recv_queue_min       | 0                                                                                                                                                                                                                                                                                                                   |
| cluster_DMETRICS_FTS_1_local_recv_queue_avg       | 0.94667912                                                                                                                                                                                                                                                                                                          |
| cluster_DMETRICS_FTS_1_local_cached_downto        | 1684                                                                                                                                                                                                                                                                                                                |
| cluster_DMETRICS_FTS_1_flow_control_paused_ns     | 0                                                                                                                                                                                                                                                                                                                   |
| cluster_DMETRICS_FTS_1_flow_control_paused        | 0.000000                                                                                                                                                                                                                                                                                                            |
| cluster_DMETRICS_FTS_1_flow_control_sent          | 0                                                                                                                                                                                                                                                                                                                   |
| cluster_DMETRICS_FTS_1_flow_control_recv          | 0                                                                                                                                                                                                                                                                                                                   |
| cluster_DMETRICS_FTS_1_flow_control_interval      | [ 173, 173 ]                                                                                                                                                                                                                                                                                                        |
| cluster_DMETRICS_FTS_1_flow_control_interval_low  | 173                                                                                                                                                                                                                                                                                                                 |
| cluster_DMETRICS_FTS_1_flow_control_interval_high | 173                                                                                                                                                                                                                                                                                                                 |
| cluster_DMETRICS_FTS_1_flow_control_status        | OFF                                                                                                                                                                                                                                                                                                                 |
| cluster_DMETRICS_FTS_1_cert_deps_distance         | 4.84094954                                                                                                                                                                                                                                                                                                          |
| cluster_DMETRICS_FTS_1_apply_oooe                 | 0.14065282                                                                                                                                                                                                                                                                                                          |
| cluster_DMETRICS_FTS_1_apply_oool                 | 0.000000                                                                                                                                                                                                                                                                                                            |
| cluster_DMETRICS_FTS_1_apply_window               | 1.14065278                                                                                                                                                                                                                                                                                                          |
| cluster_DMETRICS_FTS_1_commit_oooe                | 0.000000                                                                                                                                                                                                                                                                                                            |
| cluster_DMETRICS_FTS_1_commit_oool                | 0.000000                                                                                                                                                                                                                                                                                                            |
| cluster_DMETRICS_FTS_1_commit_window              | 1.01958454                                                                                                                                                                                                                                                                                                          |
| cluster_DMETRICS_FTS_1_local_state                | 4                                                                                                                                                                                                                                                                                                                   |
| cluster_DMETRICS_FTS_1_local_state_comment        | Synced                                                                                                                                                                                                                                                                                                              |
| cluster_DMETRICS_FTS_1_cert_index_size            | 12003                                                                                                                                                                                                                                                                                                               |
| cluster_DMETRICS_FTS_1_cert_bucket_count          | 172934                                                                                                                                                                                                                                                                                                              |
| cluster_DMETRICS_FTS_1_gcache_pool_size           | 502473776                                                                                                                                                                                                                                                                                                           |
| cluster_DMETRICS_FTS_1_causal_reads               | 0                                                                                                                                                                                                                                                                                                                   |
| cluster_DMETRICS_FTS_1_cert_interval              | 0.60415429                                                                                                                                                                                                                                                                                                          |
| cluster_DMETRICS_FTS_1_open_transactions          | 1                                                                                                                                                                                                                                                                                                                   |
| cluster_DMETRICS_FTS_1_open_connections           | 0                                                                                                                                                                                                                                                                                                                   |
| cluster_DMETRICS_FTS_1_ist_receive_status         |                                                                                                                                                                                                                                                                                                                     |
| cluster_DMETRICS_FTS_1_ist_receive_seqno_start    | 0                                                                                                                                                                                                                                                                                                                   |
| cluster_DMETRICS_FTS_1_ist_receive_seqno_current  | 0                                                                                                                                                                                                                                                                                                                   |
| cluster_DMETRICS_FTS_1_ist_receive_seqno_end      | 0                                                                                                                                                                                                                                                                                                                   |
| cluster_DMETRICS_FTS_1_incoming_addresses         | 172.20.4.12:9312,172.20.4.12:9322:replication,172.20.4.13:9312,172.20.4.13:9324:replication,172.20.4.14:9312,172.20.4.14:9322:replication                                                                                                                                                                           |
| cluster_DMETRICS_FTS_1_cluster_weight             | 3                                                                                                                                                                                                                                                                                                                   |
| cluster_DMETRICS_FTS_1_desync_count               | 0                                                                                                                                                                                                                                                                                                                   |
| cluster_DMETRICS_FTS_1_evs_delayed                |                                                                                                                                                                                                                                                                                                                     |
| cluster_DMETRICS_FTS_1_evs_evict_list             |                                                                                                                                                                                                                                                                                                                     |
| cluster_DMETRICS_FTS_1_evs_repl_latency           | 0.000357076/0.00142803/0.00510147/0.000483385/40349                                                                                                                                                                                                                                                                 |
| cluster_DMETRICS_FTS_1_evs_state                  | OPERATIONAL                                                                                                                                                                                                                                                                                                         |
| cluster_DMETRICS_FTS_1_gcomm_uuid                 | 0aff3c98-a75b-11ef-8d8f-6b04f95b2ac2                                                                                                                                                                                                                                                                                |
+---------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
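
For comparing this status output across the three nodes, a few of the counters can be checked programmatically. The following is only a sketch: it assumes the SHOW STATUS rows have already been fetched into a dict mapping counter name to string value, and the helper name `cluster_sync_issues` is hypothetical. It looks only at fields that appear in the table above.

```python
def cluster_sync_issues(status, cluster):
    """Return human-readable problems found in one node's cluster counters.

    `status` maps counter names (as printed by SHOW STATUS) to string values.
    `cluster` is the cluster name embedded in the counter names.
    """
    prefix = f"cluster_{cluster}_"

    def get(key):
        # Missing counters are treated as empty strings.
        return status.get(prefix + key, "")

    issues = []
    if get("status") != "primary":
        issues.append(f"status is {get('status')!r}, expected 'primary'")
    if get("node_state") != "synced":
        issues.append(f"node_state is {get('node_state')!r}, expected 'synced'")
    if get("last_applied") != get("last_committed"):
        issues.append(
            f"last_applied {get('last_applied')} != "
            f"last_committed {get('last_committed')}"
        )
    return issues


# Values copied from the status table above.
node = {
    "cluster_DMETRICS_FTS_1_status": "primary",
    "cluster_DMETRICS_FTS_1_node_state": "synced",
    "cluster_DMETRICS_FTS_1_last_applied": "1685",
    "cluster_DMETRICS_FTS_1_last_committed": "1685",
}
print(cluster_sync_issues(node, "DMETRICS_FTS_1"))
```

An empty list only means these counters look healthy on that node; it does not prove that the tables hold identical data.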

Manticore Search Version:

6.3.6

Operating System Version:

linux

Have you tried the latest development version?

None

Internal Checklist:

To be completed by the assignee. Check off tasks that have been completed or are not applicable.

  • Implementation completed
  • Tests developed
  • Documentation updated
  • Documentation reviewed
  • Changelog updated
@mohdmsl mohdmsl added the bug label Nov 22, 2024
@sanikolaev
Collaborator

Hello @mohdmsl

This is an interesting case. It's not clear why SELECT COUNT(*) is consistent, but SELECT COUNT(*) ... WHERE ... is not. How can we reproduce this issue for debugging?

@sanikolaev sanikolaev added the waiting Waiting for the original poster (in most cases) or something else label Nov 25, 2024
@sanikolaev
Collaborator

Also, it would help if you could share snapshots of the tables where the number of docs is the same but the WHERE queries return different results. That may make it easier to figure out what's going on.

@mohdmsl
Author

mohdmsl commented Dec 5, 2024

hi @sanikolaev
I have created scripts in this repository to reproduce the issue encountered with KNN search. The repository includes the schema, data, and scripts required to generate the results: repo.

The schema consists of document_id and vectors with a dimension of 768. The objective is to insert identical data into multiple indexes on a single instance and then compare the resulting outputs.

Queries used:
name: 'Basic Count'
query: select count(*) as total from lisdocument;

name: 'Vector Search'
query: SELECT count(*) as total FROM lisdocument
     WHERE  knn (vector, 100, (0.31006125,-0.0003656157,0.081459515,0.515771,0.014484076,-0.66133505,0.32237425,-0.55777,-0.386141,-0.023305604,0.11610652,-0.67190164,-0.08708035,0.08902938,-0.6518018,-0.051225036,-1.1461623,-0.41078997,0.17181408,-0.4735955,0.03203327,0.23325141,-0.9909842,-0.3320622,0.201117,0.6546775,-0.7582774,0.9626067,0.7869749,0.7003234,0.5161592,-0.47969586,1.1994319,0.28759456,-0.26597247,-0.22219379,0.33459783,-0.97726834,-0.4739887,0.43206725,0.01194832,-0.05791433,0.4811258,-0.8271685,0.3951732,0.24713722,0.11192199,1.0213652,0.19014879,-0.6778019,0.50749946,-1.049119,0.4575615,0.9339654,0.58290714,-0.097028166,-0.48701167,-1.0065265,-0.13599022,-0.40211567,-0.5544309,0.1297585,0.19523889,-0.77102447,0.64259815,-0.30415413,-0.23215108,0.67778605,0.618211,0.7600791,0.23732586,-0.7648567,0.062301278,0.13431896,0.66374356,0.40523198,0.46663165,0.51943946,-0.73183507,-0.57254523,0.50741565,-0.14492488,-0.026737308,-0.63081896,-0.041068904,0.43559575,0.1266691,0.68689376,0.013397118,0.53886193,0.20262197,0.29941437,-0.6999152,0.28189236,0.9438162,0.6245068,-0.2915007,-0.08586646,-0.39370182,0.6360048,-0.38291323,0.7843485,-0.54510206,-0.52077717,1.2116083,-0.12180229,0.13172503,-0.10871028,0.41399533,-0.47680008,0.9621055,-0.54643613,0.21518388,-0.54834914,-0.1906337,-0.09179794,-0.590677,-0.52437264,-0.7151974,-0.02695119,0.21971804,-0.024934726,0.05605529,0.44665658,-0.12020654,0.26943594,-0.10920261,0.20266409,0.012066222,0.19158714,-0.12457689,-0.4856409,0.16108908,-0.20201308,0.041395552,-0.34803805,-1.1102498,-0.74369335,-1.6364297,1.0657281,0.07174268,0.010433137,0.7891052,0.60267335,-0.2899389,0.4021704,-0.20868495,-0.003395542,-0.05169468,-0.17775702,0.21387385,0.0991693,-0.9152813,0.19724046,0.35551172,-0.4999967,-0.8879677,0.16530086,-1.1837602,-0.59362537,0.12462508,-0.34771776,0.9998891,0.8668287,-0.6882197,0.2541907,0.49837124,0.15233028,0.336384,-0.038960453,-0.7221209,-0.071910635,0.04892005,-0.21825159,0.16531892,-0.8357419,0
.5070645,0.1569018,0.313003,-0.17091025,-0.33268282,1.1826873,0.24373363,-0.15638427,0.27422035,0.39059705,-0.34179664,-0.35423273,-0.7033785,0.21909295,1.1341084,0.27519116,1.1161767,-0.56932676,-0.9429059,0.40243575,-0.5576759,-0.21010822,-0.4341065,1.2042234,-0.82608193,0.4289299,-0.14796726,-0.027835188,-0.26135963,-0.5276095,-0.88423705,0.04380098,-0.43928683,0.41062203,-0.34576598,0.13919765,-0.01392624,-0.5553707,-0.45019755,0.91757697,0.4029668,0.1176647,-0.15419589,0.73095477,0.40260544,0.43138027,0.21548833,-0.5267257,-0.30721363,-0.25096813,-0.6048828,0.7789119,0.15235308,0.17003383,0.8087661,-0.38634503,-0.29575384,0.3041153,-0.0025674396,-0.12190996,0.5652693,0.123609096,-0.41633,-0.075612135,0.5379093,0.47812945,-0.7582835,0.48872802,0.17062183,-0.41053674,0.24030015,0.43435127,0.40152794,0.006724952,-0.32310417,-0.35114315,-0.37304744,-0.53361934,-0.14794493,0.5957107,0.48854348,0.22275516,0.28737432,-0.55769044,0.19703211,-0.43264365,0.70282584,0.8103258,0.4166922,0.18818198,-0.6881402,-0.8271788,0.33227536,0.44830915,-0.21777764,-0.3315875,-0.18762241,-0.5730435,1.2522116,0.05436271,0.5053515,0.08252778,-0.7671772,-0.67160285,0.7309791,-0.6450694,-0.23141778,0.081859656,-0.31854483,-0.66024923,0.14364149,0.30650187,0.14491636,0.09296628,0.29341727,-0.28662416,-0.8733087,0.73710185,0.6832434,-1.0337256,-0.36529627,-1.185272,0.70611227,-0.21201886,-0.4519033,0.35380524,0.24411415,0.7839299,-1.0546187,-0.5108705,-0.040125087,0.0038510442,0.16296358,0.14024405,-0.29560274,-0.36276716,1.3071771,0.086237356,-0.791835,0.11178309,-0.12588637,-0.4115165,0.30388063,-0.51085454,-0.090283126,-1.5626551,0.53435034,-0.22315815,-0.31446335,0.48053166,0.64726245,-1.073468,0.56286776,0.10031773,0.39174485,0.5910364,-0.27479896,-0.25149876,-0.064312816,-0.07417552,0.8132626,0.07798847,0.25687593,-0.60261244,-0.45493522,0.54915696,-0.053107496,-1.2380457,-0.21066967,-0.1940282,-0.43841827,0.17368084,-0.14619267,-0.2789112,0.2749565,0.4712215,0.3926232,-0.32814994,-1.0
122043,-1.0696679,-0.16326195,-0.975733,0.39950857,0.19773667,-0.16035733,-0.058070622,-0.118425414,0.18218455,0.6603359,0.54957753,0.44137517,0.21397805,-0.22320518,-0.91974175,0.48958048,-0.084736824,-0.426293,0.8708147,-0.2713833,-0.39039102,0.30332446,-0.7585659,-0.12065592,-0.15349118,0.8288996,0.23025775,-0.35495567,0.21538107,0.86950296,-0.00998487,0.0017167316,0.120606296,0.4414424,0.4505997,0.829459,0.7318415,-0.8386215,0.2715878,0.65265125,-0.07971094,0.48587146,-0.5186518,0.55407375,0.2426369,0.21575326,-0.044891708,-1.2336338,-0.10642625,-0.5177275,-0.6177509,0.8570399,1.043228,-0.2713039,0.29890174,-0.47719136,0.09048428,0.81275654,0.22632912,-0.27022663,0.10290417,0.4140533,0.17572246,0.48987168,0.6703758,-0.46391836,0.075863644,15.550138,0.62152815,-0.04971984,0.10588257,0.17923146,0.15616268,-0.46352684,-0.08553317,-0.8809511,-0.13167156,0.88224876,-0.1254104,0.08551519,0.41633347,-0.39136684,-0.5745892,0.059034057,0.27425095,0.6071214,-1.021161,0.40765795,0.41800106,-0.2145601,0.020918911,0.60376966,0.48886603,0.23570696,-0.07289946,0.7897558,-0.38197488,0.8834917,-0.48737484,0.72564256,1.0258603,-0.7900168,-0.1589608,-0.5894068,-1.2211249,0.21977632,-0.032760903,-0.73974997,0.027299361,-0.40298098,-0.59684885,0.18560478,0.22460128,-0.25463927,0.6956338,0.04916962,-0.05426601,0.4170774,0.4209522,0.69340205,-0.5526944,0.13108079,0.028241986,0.27075502,0.058803745,-0.12544855,-0.6035113,-1.2496368,-0.6891652,-0.21734844,0.88274825,0.21550162,0.900132,0.23280652,0.022997925,0.070107326,0.2929837,-0.9167534,-0.44679806,0.6751843,0.26168746,-0.9102508,-0.53386253,-0.11613186,-0.33997464,-0.65545017,0.34133926,0.5415031,-0.8402537,-0.4420601,0.9791182,0.3558302,-1.3223187,0.58805984,-0.9981405,-1.0075716,-0.10522475,-0.4694484,-0.88297,-0.073731676,-0.40783596,-0.44489437,-0.35490802,0.5813389,-0.48236555,-0.8025031,0.14774305,-0.07733792,-0.04009475,-0.19666764,-0.13940135,0.56877077,0.89890414,-0.34569648,0.79776573,-0.091007896,-0.69942564,-1.1672308,0
.30055726,-0.84184074,-1.0750866,0.53956366,-0.9331158,-0.29834008,0.0052645435,0.44171295,-0.4195513,0.8877479,-0.058045257,0.05742934,-0.16210446,-1.0204247,0.44277644,0.13323916,-1.1805303,-0.21072462,0.12990586,-0.044168193,-1.2782251,-0.25332808,0.40048394,0.2744026,-0.6884139,0.75036937,-0.8379228,0.1622562,0.1323186,-0.027955009,-0.42298132,-0.17523514,-0.64205575,0.44626942,0.11984201,0.61406684,-0.5398223,0.13014658,1.4530864,0.072602026,-0.34999475,-0.7632572,-0.3052821,-0.3685159,-0.30019602,-0.26202905,0.065592654,0.1356203,-0.23066288,-0.06168315,1.18814,0.18889207,-0.496132,0.71889156,0.3918985,-0.1458961,0.12719785,-0.8727122,-0.33598074,-0.20535125,-0.48016322,-0.40368298,-0.5138646,-0.62812066,0.455146,-0.23726645,-0.35210988,0.6264089,-0.35540456,-0.8420332,-0.4773195,-0.043095402,0.5031531,0.31352416,-0.56513035,0.6253261,-0.22605664,-0.022685153,0.56507564,0.13309203,-0.2219282,-0.74726635,-1.0282854,0.097224504,-0.030685902,-0.053559277,-1.3350763,0.6840824,-0.21796289,0.54228777,-0.1751819,0.185303,-1.3930091,0.28261814,0.55813426,0.058485728,-0.2205351,0.5061636,-0.45415583,-0.41728288,0.21399207,0.10460134,-0.5251125,-0.05460708,0.12028494,-0.8888363,-0.5616349,1.0626166,-0.57732487,-0.71718526,-0.3703574,0.18429625,0.5347522,0.20444115,0.57214594,0.37266445,-1.2059611,0.8612643,0.48821098,-0.43664938,-0.82340217,0.67264736,0.22491321,-0.74951804,0.0871341,-0.27833295,0.46309146,-0.78231233,-0.21181802,0.42298067,-0.07849596,0.5937467,1.0871675,0.019744089,-0.7014219,0.20075509,-0.64872456,-0.6484867,-0.48085892,0.7048896,0.37797463,-0.19823772,0.009384685,-0.44905084,0.47751105,0.07683447,-0.7515542,-0.14501087,0.17405201,0.17612083,0.54188806,-0.2535289,0.3439073,-0.6030962,0.38111544,-0.651172,-0.96390456,-0.030525658,-0.26229542,-0.07045684,0.28682828,0.4695437,0.48653972,-0.021991393,-0.19629684,-0.104265794,0.72771174,1.1777867,-0.42338756,-0.28041443,0.9536649,1.531987,-0.29626778,0.4382578,-0.715345,-0.8170925,0.94427216,1.243451,0.17
655466,0.62224877,0.20158736,0.16209148,0.09094697,0.054510932,0.18709274,0.43218136,0.750771,0.52881354,-0.05870044,0.15825632,0.85584885,0.435063,0.10643143,0.95394486,1.6185488,-0.46994826,-0.0854546,-0.5466821,0.57927006,-0.428444,0.26448515,-0.6947298,1.218788,-0.14955972,0.18937686,0.12341008,-1.0414394,1.0318452,0.18007237,0.32144284,-0.41614947,-0.08107099,-0.24519788,-0.10084439,-0.5396372,0.36262137,-0.26675776,-0.21520412,0.14965774,-0.058672633,0.50249416,0.24154651,0.49916118,0.3448778,0.12129112,0.23797168,-1.1313815,-0.55008906,-0.7546291,-0.34192833,-1.1170193,-0.27139792,-1.0538054,-0.16249998,-0.90182513), 2000 )
     OPTION cutoff = 0, boolean_simplify = 1, max_matches = 1000 

Results: 5 DEC 2024


+---------------+--------------+--------------+--------------+--------------+--------------+--------------+
| Query Name    | lisdocument1 | lisdocument2 | lisdocument3 | lisdocument4 | lisdocument5 | lisdocument6 |
+---------------+--------------+--------------+--------------+--------------+--------------+--------------+
| Basic Count   |    100000    |    100000    |    100000    |    100000    |    100000    |    100000    |
| Vector Search |    10700     |    10600     |    10400     |    11200     |    11600     |    11500     |
+---------------+--------------+--------------+--------------+--------------+--------------+--------------+

Note: The above result was obtained by inserting 100 documents repeatedly in a loop 100 times.

@sanikolaev
Collaborator

sanikolaev commented Dec 6, 2024

Hello @mohdmsl

I also had to run pip install mysql-connector-python; you might want to add it to the requirements file.

Here's what I saw while running the load script:

Writing 100 to lisdocument6 records
Writing 100 to lisdocument6 records
Writing 100 to lisdocument6 records
Done writing 100 to 'lisdocument6' records
Done writing 100 to 'lisdocument6' records
Done writing 100 to 'lisdocument6' records
Done writing 100 to 'lisdocument6' records
Writing 100 to lisdocument6 records
Done writing 100 to 'lisdocument6' records
Writing 100 to lisdocument6 records
Done writing 100 to 'lisdocument6' records
Writing 100 to lisdocument6 records
Done writing 100 to 'lisdocument6' records
Writing 100 to lisdocument6 records
Done writing 100 to 'lisdocument6' records
Done writing 100 to 'lisdocument6' records
Writing 100 to lisdocument6 records
Writing 100 to lisdocument6 records
Done writing 100 to 'lisdocument6' records

but nothing was inserted into Manticore:

mysql> show tables;
+--------------+------+
| Index        | Type |
+--------------+------+
| lisdocument1 | rt   |
| lisdocument2 | rt   |
| lisdocument3 | rt   |
| lisdocument4 | rt   |
| lisdocument5 | rt   |
| lisdocument6 | rt   |
+--------------+------+
6 rows in set (0.00 sec)

mysql> select count(*) from lisdocument1;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)
--- 0 out of 0 results in 0ms ---

mysql> select count(*) from lisdocument6;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)
--- 0 out of 0 results in 0ms ---

perhaps because something was wrong with the HTTP JSON queries, since I saw this in the log:

manticore-local  | WARNING: conn 172.24.0.1:55350(5987), sock=20: invalid HTTP method
manticore-local  | WARNING: conn 172.24.0.1:55356(5988), sock=32: invalid HTTP method
manticore-local  | WARNING: conn 172.24.0.1:55362(5989), sock=20: invalid HTTP method
manticore-local  | WARNING: conn 172.24.0.1:55370(5990), sock=20: invalid HTTP method
manticore-local  | WARNING: conn 172.24.0.1:55386(5991), sock=20: invalid HTTP method
manticore-local  | WARNING: conn 172.24.0.1:55396(5992), sock=32: invalid HTTP method
manticore-local  | WARNING: conn 172.24.0.1:55398(5993), sock=33: invalid HTTP method
manticore-local  | WARNING: conn 172.24.0.1:55408(5994), sock=20: invalid HTTP method
manticore-local  | WARNING: conn 172.24.0.1:55424(5995), sock=32: invalid HTTP method
manticore-local  | WARNING: conn 172.24.0.1:55430(5996), sock=20: invalid HTTP method
manticore-local  | WARNING: conn 172.24.0.1:55436(5997), sock=33: invalid HTTP method
manticore-local  | WARNING: conn 172.24.0.1:55442(5998), sock=20: invalid HTTP method
manticore-local  | WARNING: conn 172.24.0.1:55444(5999), sock=20: invalid HTTP method

Please fix.

@mohdmsl
Author

mohdmsl commented Dec 6, 2024

Okay, I will check. Which Python version did you use?

@mohdmsl
Author

mohdmsl commented Dec 6, 2024

@sanikolaev
Fixed it. Could you please pull the latest changes and re-run docker compose?

@sanikolaev
Collaborator

Thanks. I got this:

snikolaev@dev2:~/KNN-Benchmarking-for-Manticore-Search/manticore_comparison$ python diff_comparator.py
Running server: http://localhost:9308 | Query ID: Basic Count | Table: lisdocument1
Running server: http://localhost:9308 | Query ID: Basic Count | Table: lisdocument2
Running server: http://localhost:9308 | Query ID: Basic Count | Table: lisdocument3
Running server: http://localhost:9308 | Query ID: Basic Count | Table: lisdocument4
Running server: http://localhost:9308 | Query ID: Basic Count | Table: lisdocument5
Running server: http://localhost:9308 | Query ID: Basic Count | Table: lisdocument6
Running server: http://localhost:9308 | Query ID: vector search  | Table: lisdocument1
Running server: http://localhost:9308 | Query ID: vector search  | Table: lisdocument2
Running server: http://localhost:9308 | Query ID: vector search  | Table: lisdocument3
Running server: http://localhost:9308 | Query ID: vector search  | Table: lisdocument4
Running server: http://localhost:9308 | Query ID: vector search  | Table: lisdocument5
Running server: http://localhost:9308 | Query ID: vector search  | Table: lisdocument6
+---------------+--------------+--------------+--------------+--------------+--------------+--------------+
| Query Name    | lisdocument1 | lisdocument2 | lisdocument3 | lisdocument4 | lisdocument5 | lisdocument6 |
+---------------+--------------+--------------+--------------+--------------+--------------+--------------+
| Basic Count   |    100000    |    100000    |    100000    |    100000    |    100000    |    100000    |
| vector search |    25900     |    26100     |    26500     |    26300     |    26800     |    26700     |
+---------------+--------------+--------------+--------------+--------------+--------------+--------------+

but I don't understand what it means and how it's related to what you said initially:

Set up a Manticore cluster with three nodes (search-01, search-02, search-03) and synchronize the index lisdocument_20241122_2991.

I.e. the problem was (as I understood it) that on different nodes you had different counts, but in your demo there's only one node and multiple tables. What conclusion should I draw from this?

@mohdmsl
Author

mohdmsl commented Dec 6, 2024

Yes, the original question was related to a multi-cluster setup. However, I attempted to reproduce the issue on a single-node cluster, as I noticed that even after a full load of data, queries return inconsistent results.

What I am observing is that after each data load, the query results vary, whereas my expectation is that the query results should remain consistent if data has not changed.

This behaviour creates a mess when I backfill the same data in different environments (dev/stage).

@sanikolaev
Collaborator

Thanks for the extra details. I've looked into the issue, and the main reason for the count difference is a slight variation in the number of documents stored in the disk chunks and the RAM chunk:

mysql> select chunk_id, indexed_documents from lisdocument1.@status;
+----------+-------------------+
| chunk_id | indexed_documents |
+----------+-------------------+
|        1 |             45700 |
|        0 |             28600 |
+----------+-------------------+
mysql> select chunk_id, indexed_documents from lisdocument2.@status;
+----------+-------------------+
| chunk_id | indexed_documents |
+----------+-------------------+
|        1 |             45500 |
|        0 |             28600 |
+----------+-------------------+
mysql> select chunk_id, indexed_documents from lisdocument3.@status;
+----------+-------------------+
| chunk_id | indexed_documents |
+----------+-------------------+
|        1 |             45100 |
|        0 |             28600 |
+----------+-------------------+

etc.

This happens because of the adaptive RAM chunk size (you can read more about the "rate" here: https://manual.manticoresearch.com/Creating_a_table/Local_tables/Plain_and_real-time_table_settings#rt_mem_limit).

While the tables might look identical, there are small differences in how data is stored. This can affect how certain queries are executed.
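The per-chunk counts above can be compared mechanically. The sketch below is a hypothetical helper (it is not part of Manticore or of the benchmark repository) that takes the rows returned by select chunk_id, indexed_documents from <table>.@status for each table and reports the chunks whose document counts are not identical everywhere.

```python
from typing import Dict, List, Tuple

# Rows as returned by: select chunk_id, indexed_documents from <table>.@status
ChunkRows = List[Tuple[int, int]]


def differing_chunks(layouts: Dict[str, ChunkRows]) -> Dict[int, Dict[str, int]]:
    """Return {chunk_id: {table: indexed_documents}} for chunks that differ between tables."""
    by_chunk: Dict[int, Dict[str, int]] = {}
    for table, rows in layouts.items():
        for chunk_id, docs in rows:
            by_chunk.setdefault(chunk_id, {})[table] = docs
    return {
        chunk_id: per_table
        for chunk_id, per_table in by_chunk.items()
        # A chunk differs if some table lacks it or the counts disagree.
        if len(per_table) != len(layouts) or len(set(per_table.values())) > 1
    }


# Values copied from the @status output above.
layouts = {
    "lisdocument1": [(1, 45700), (0, 28600)],
    "lisdocument2": [(1, 45500), (0, 28600)],
    "lisdocument3": [(1, 45100), (0, 28600)],
}
print(differing_chunks(layouts))
```

For the three tables shown, only chunk 1 is reported; chunk 0 holds 28600 documents in each of them.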

To eliminate any differences and make the count "accurate", you can merge everything into a single disk chunk. For example:

mysql> flush ramchunk lisdocument1; optimize table lisdocument1 option sync=1, cutoff=1;
mysql> flush ramchunk lisdocument2; optimize table lisdocument2 option sync=1, cutoff=1;

mysql> SELECT count(*) as total FROM lisdocument1 WHERE  knn (vector, 100, ... , 2000 ) OPTION cutoff = 0, boolean_simplify = 1, max_matches = 1000;
+-------+
| total |
+-------+
|   100 |
+-------+

mysql> SELECT count(*) as total FROM lisdocument2 WHERE  knn (vector, 100, ... , 2000 ) OPTION cutoff = 0, boolean_simplify = 1, max_matches = 1000;
+-------+
| total |
+-------+
|   100 |
+-------+
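When several tables need the same treatment, the two statements above can be issued from a script. This is a minimal sketch: compact_table is a hypothetical helper name, and cursor is assumed to be any DB-API cursor (for example one from mysql-connector-python) connected to Manticore's MySQL-protocol port. It sends exactly the two statements shown above, one per call.

```python
import re


def compact_table(cursor, table: str) -> None:
    """Flush the RAM chunk of `table`, then merge its disk chunks into one."""
    # Table names are interpolated into SQL, so accept plain identifiers only.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"unexpected table name: {table!r}")
    cursor.execute(f"flush ramchunk {table}")
    # sync=1 makes the call wait for the merge; cutoff=1 targets one disk chunk.
    cursor.execute(f"optimize table {table} option sync=1, cutoff=1")
```

Because sync=1 blocks until the merge finishes, a call on a large table may take a long time to return.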

The 20K+ count occurs because the k parameter does not affect the data stored in the RAM chunk, as explained in the documentation:
[screenshot of the relevant documentation passage]

It is recommended to use LIMIT instead. Regarding SELECT COUNT(*) ... WHERE knn(), it's important to note that such a query often doesn't make practical sense. This is because knn, by design, doesn't filter data — it simply applies an artificial cutoff defined by k.
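A back-of-the-envelope check of the 20K+ explanation, under assumptions that are not stated in the thread: each disk chunk contributes at most k matches, every document not listed in a disk chunk by @status sits in the RAM chunk and is counted in full, and the @status output reflects the same table state as the comparison run. The function name is hypothetical.

```python
from typing import Sequence


def estimate_knn_count(total_docs: int, disk_chunk_docs: Sequence[int], k: int) -> int:
    """Estimate COUNT(*) for a knn() query under the assumptions stated above."""
    ram_docs = total_docs - sum(disk_chunk_docs)  # assumed to be in the RAM chunk
    return ram_docs + sum(min(k, docs) for docs in disk_chunk_docs)


# Disk chunk sizes copied from the @status output above; k = 100 as in the query.
disk_chunks = {
    "lisdocument1": [45700, 28600],
    "lisdocument2": [45500, 28600],
    "lisdocument3": [45100, 28600],
}
for table, chunks in disk_chunks.items():
    print(table, estimate_knn_count(100_000, chunks, k=100))

# After flush ramchunk + optimize: one disk chunk, empty RAM chunk.
print("merged", estimate_knn_count(100_000, [100_000], k=100))
```

Under these assumptions the estimates are 25900, 26100 and 26500, which agree with the vector search row of the comparison table above, and 100 for the merged case, which agrees with the total returned after optimizing.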

However, I noticed something odd. For example, this query sometimes returned only 409 results, which is lower than expected, provided all the limits (k, limit, max_matches) are set to 100,000, matching the total document count in the table and the implicit cutoff is off. There’s also only one disk chunk and an empty RAM chunk. I’ll discuss this with the development team to understand what’s happening.

mysql> SELECT count(*) FROM lisdocument2 WHERE  knn (vector, 100000, (0.31006125,-0.0003656157,0.081459515,0.515771,0.014484076,-0.66133505,0.32237425,-0.55777,-0.386141,-0.023305604,0.11610652,-0.67190164,-0.08708035,0.08902938,-0.6518018,-0.051225036,-1.1461623,-0.41078997,0.17181408,-0.4735955,0.03203327,0.23325141,-0.9909842,-0.3320622,0.201117,0.6546775,-0.7582774,0.9626067,0.7869749,0.7003234,0.5161592,-0.47969586,1.1994319,0.28759456,-0.26597247,-0.22219379,0.33459783,-0.97726834,-0.4739887,0.43206725,0.01194832,-0.05791433,0.4811258,-0.8271685,0.3951732,0.24713722,0.11192199,1.0213652,0.19014879,-0.6778019,0.50749946,-1.049119,0.4575615,0.9339654,0.58290714,-0.097028166,-0.48701167,-1.0065265,-0.13599022,-0.40211567,-0.5544309,0.1297585,0.19523889,-0.77102447,0.64259815,-0.30415413,-0.23215108,0.67778605,0.618211,0.7600791,0.23732586,-0.7648567,0.062301278,0.13431896,0.66374356,0.40523198,0.46663165,0.51943946,-0.73183507,-0.57254523,0.50741565,-0.14492488,-0.026737308,-0.63081896,-0.041068904,0.43559575,0.1266691,0.68689376,0.013397118,0.53886193,0.20262197,0.29941437,-0.6999152,0.28189236,0.9438162,0.6245068,-0.2915007,-0.08586646,-0.39370182,0.6360048,-0.38291323,0.7843485,-0.54510206,-0.52077717,1.2116083,-0.12180229,0.13172503,-0.10871028,0.41399533,-0.47680008,0.9621055,-0.54643613,0.21518388,-0.54834914,-0.1906337,-0.09179794,-0.590677,-0.52437264,-0.7151974,-0.02695119,0.21971804,-0.024934726,0.05605529,0.44665658,-0.12020654,0.26943594,-0.10920261,0.20266409,0.012066222,0.19158714,-0.12457689,-0.4856409,0.16108908,-0.20201308,0.041395552,-0.34803805,-1.1102498,-0.74369335,-1.6364297,1.0657281,0.07174268,0.010433137,0.7891052,0.60267335,-0.2899389,0.4021704,-0.20868495,-0.003395542,-0.05169468,-0.17775702,0.21387385,0.0991693,-0.9152813,0.19724046,0.35551172,-0.4999967,-0.8879677,0.16530086,-1.1837602,-0.59362537,0.12462508,-0.34771776,0.9998891,0.8668287,-0.6882197,0.2541907,0.49837124,0.15233028,0.336384,-0.038960453,-0.7221209,-0.071910635,0.04892
005,-0.21825159,0.16531892,-0.8357419,0.5070645,0.1569018,0.313003,-0.17091025,-0.33268282,1.1826873,0.24373363,-0.15638427,0.27422035,0.39059705,-0.34179664,-0.35423273,-0.7033785,0.21909295,1.1341084,0.27519116,1.1161767,-0.56932676,-0.9429059,0.40243575,-0.5576759,-0.21010822,-0.4341065,1.2042234,-0.82608193,0.4289299,-0.14796726,-0.027835188,-0.26135963,-0.5276095,-0.88423705,0.04380098,-0.43928683,0.41062203,-0.34576598,0.13919765,-0.01392624,-0.5553707,-0.45019755,0.91757697,0.4029668,0.1176647,-0.15419589,0.73095477,0.40260544,0.43138027,0.21548833,-0.5267257,-0.30721363,-0.25096813,-0.6048828,0.7789119,0.15235308,0.17003383,0.8087661,-0.38634503,-0.29575384,0.3041153,-0.0025674396,-0.12190996,0.5652693,0.123609096,-0.41633,-0.075612135,0.5379093,0.47812945,-0.7582835,0.48872802,0.17062183,-0.41053674,0.24030015,0.43435127,0.40152794,0.006724952,-0.32310417,-0.35114315,-0.37304744,-0.53361934,-0.14794493,0.5957107,0.48854348,0.22275516,0.28737432,-0.55769044,0.19703211,-0.43264365,0.70282584,0.8103258,0.4166922,0.18818198,-0.6881402,-0.8271788,0.33227536,0.44830915,-0.21777764,-0.3315875,-0.18762241,-0.5730435,1.2522116,0.05436271,0.5053515,0.08252778,-0.7671772,-0.67160285,0.7309791,-0.6450694,-0.23141778,0.081859656,-0.31854483,-0.66024923,0.14364149,0.30650187,0.14491636,0.09296628,0.29341727,-0.28662416,-0.8733087,0.73710185,0.6832434,-1.0337256,-0.36529627,-1.185272,0.70611227,-0.21201886,-0.4519033,0.35380524,0.24411415,0.7839299,-1.0546187,-0.5108705,-0.040125087,0.0038510442,0.16296358,0.14024405,-0.29560274,-0.36276716,1.3071771,0.086237356,-0.791835,0.11178309,-0.12588637,-0.4115165,0.30388063,-0.51085454,-0.090283126,-1.5626551,0.53435034,-0.22315815,-0.31446335,0.48053166,0.64726245,-1.073468,0.56286776,0.10031773,0.39174485,0.5910364,-0.27479896,-0.25149876,-0.064312816,-0.07417552,0.8132626,0.07798847,0.25687593,-0.60261244,-0.45493522,0.54915696,-0.053107496,-1.2380457,-0.21066967,-0.1940282,-0.43841827,0.17368084,-0.14619267,-0.2789112,0.27495
65,0.4712215,0.3926232,-0.32814994,-1.0122043,-1.0696679,-0.16326195,-0.975733,0.39950857,0.19773667,-0.16035733,-0.058070622,-0.118425414,0.18218455,0.6603359,0.54957753,0.44137517,0.21397805,-0.22320518,-0.91974175,0.48958048,-0.084736824,-0.426293,0.8708147,-0.2713833,-0.39039102,0.30332446,-0.7585659,-0.12065592,-0.15349118,0.8288996,0.23025775,-0.35495567,0.21538107,0.86950296,-0.00998487,0.0017167316,0.120606296,0.4414424,0.4505997,0.829459,0.7318415,-0.8386215,0.2715878,0.65265125,-0.07971094,0.48587146,-0.5186518,0.55407375,0.2426369,0.21575326,-0.044891708,-1.2336338,-0.10642625,-0.5177275,-0.6177509,0.8570399,1.043228,-0.2713039,0.29890174,-0.47719136,0.09048428,0.81275654,0.22632912,-0.27022663,0.10290417,0.4140533,0.17572246,0.48987168,0.6703758,-0.46391836,0.075863644,15.550138,0.62152815,-0.04971984,0.10588257,0.17923146,0.15616268,-0.46352684,-0.08553317,-0.8809511,-0.13167156,0.88224876,-0.1254104,0.08551519,0.41633347,-0.39136684,-0.5745892,0.059034057,0.27425095,0.6071214,-1.021161,0.40765795,0.41800106,-0.2145601,0.020918911,0.60376966,0.48886603,0.23570696,-0.07289946,0.7897558,-0.38197488,0.8834917,-0.48737484,0.72564256,1.0258603,-0.7900168,-0.1589608,-0.5894068,-1.2211249,0.21977632,-0.032760903,-0.73974997,0.027299361,-0.40298098,-0.59684885,0.18560478,0.22460128,-0.25463927,0.6956338,0.04916962,-0.05426601,0.4170774,0.4209522,0.69340205,-0.5526944,0.13108079,0.028241986,0.27075502,0.058803745,-0.12544855,-0.6035113,-1.2496368,-0.6891652,-0.21734844,0.88274825,0.21550162,0.900132,0.23280652,0.022997925,0.070107326,0.2929837,-0.9167534,-0.44679806,0.6751843,0.26168746,-0.9102508,-0.53386253,-0.11613186,-0.33997464,-0.65545017,0.34133926,0.5415031,-0.8402537,-0.4420601,0.9791182,0.3558302,-1.3223187,0.58805984,-0.9981405,-1.0075716,-0.10522475,-0.4694484,-0.88297,-0.073731676,-0.40783596,-0.44489437,-0.35490802,0.5813389,-0.48236555,-0.8025031,0.14774305,-0.07733792,-0.04009475,-0.19666764,-0.13940135,0.56877077,0.89890414,-0.34569648,0.7977657
3,-0.091007896,-0.69942564,-1.1672308,0.30055726,-0.84184074,-1.0750866,0.53956366,-0.9331158,-0.29834008,0.0052645435,0.44171295,-0.4195513,0.8877479,-0.058045257,0.05742934,-0.16210446,-1.0204247,0.44277644,0.13323916,-1.1805303,-0.21072462,0.12990586,-0.044168193,-1.2782251,-0.25332808,0.40048394,0.2744026,-0.6884139,0.75036937,-0.8379228,0.1622562,0.1323186,-0.027955009,-0.42298132,-0.17523514,-0.64205575,0.44626942,0.11984201,0.61406684,-0.5398223,0.13014658,1.4530864,0.072602026,-0.34999475,-0.7632572,-0.3052821,-0.3685159,-0.30019602,-0.26202905,0.065592654,0.1356203,-0.23066288,-0.06168315,1.18814,0.18889207,-0.496132,0.71889156,0.3918985,-0.1458961,0.12719785,-0.8727122,-0.33598074,-0.20535125,-0.48016322,-0.40368298,-0.5138646,-0.62812066,0.455146,-0.23726645,-0.35210988,0.6264089,-0.35540456,-0.8420332,-0.4773195,-0.043095402,0.5031531,0.31352416,-0.56513035,0.6253261,-0.22605664,-0.022685153,0.56507564,0.13309203,-0.2219282,-0.74726635,-1.0282854,0.097224504,-0.030685902,-0.053559277,-1.3350763,0.6840824,-0.21796289,0.54228777,-0.1751819,0.185303,-1.3930091,0.28261814,0.55813426,0.058485728,-0.2205351,0.5061636,-0.45415583,-0.41728288,0.21399207,0.10460134,-0.5251125,-0.05460708,0.12028494,-0.8888363,-0.5616349,1.0626166,-0.57732487,-0.71718526,-0.3703574,0.18429625,0.5347522,0.20444115,0.57214594,0.37266445,-1.2059611,0.8612643,0.48821098,-0.43664938,-0.82340217,0.67264736,0.22491321,-0.74951804,0.0871341,-0.27833295,0.46309146,-0.78231233,-0.21181802,0.42298067,-0.07849596,0.5937467,1.0871675,0.019744089,-0.7014219,0.20075509,-0.64872456,-0.6484867,-0.48085892,0.7048896,0.37797463,-0.19823772,0.009384685,-0.44905084,0.47751105,0.07683447,-0.7515542,-0.14501087,0.17405201,0.17612083,0.54188806,-0.2535289,0.3439073,-0.6030962,0.38111544,-0.651172,-0.96390456,-0.030525658,-0.26229542,-0.07045684,0.28682828,0.4695437,0.48653972,-0.021991393,-0.19629684,-0.104265794,0.72771174,1.1777867,-0.42338756,-0.28041443,0.9536649,1.531987,-0.29626778,0.4382578,-0.715
345,-0.8170925,0.94427216,1.243451,0.17655466,0.62224877,0.20158736,0.16209148,0.09094697,0.054510932,0.18709274,0.43218136,0.750771,0.52881354,-0.05870044,0.15825632,0.85584885,0.435063,0.10643143,0.95394486,1.6185488,-0.46994826,-0.0854546,-0.5466821,0.57927006,-0.428444,0.26448515,-0.6947298,1.218788,-0.14955972,0.18937686,0.12341008,-1.0414394,1.0318452,0.18007237,0.32144284,-0.41614947,-0.08107099,-0.24519788,-0.10084439,-0.5396372,0.36262137,-0.26675776,-0.21520412,0.14965774,-0.058672633,0.50249416,0.24154651,0.49916118,0.3448778,0.12129112,0.23797168,-1.1313815,-0.55008906,-0.7546291,-0.34192833,-1.1170193,-0.27139792,-1.0538054,-0.16249998,-0.90182513)) limit 100000 OPTION cutoff = 0, boolean_simplify = 1, max_matches = 100000;
+----------+
| count(*) |
+----------+
|      409 |
+----------+

I'll create a separate issue about it.

As for your original query:

SELECT COUNT(*) FROM lisdocument_20241122_2991 
where title_len_chars > 0 AND text_len_chars > 0 
AND type IN ('PROGRAM_ELEMENT','PLANNED_PROGRAM','PROJECT','ITEM');

this is a different case. If you notice a count difference with a query like this, please share a test case so we can investigate further.

@mohdmsl
Author

mohdmsl commented Dec 9, 2024

Thanks for your answer @sanikolaev

  1. I will reduce the table to a single disk chunk and an empty RAM chunk. (I just wanted to know whether there are any major performance implications of having millions of records in a single disk chunk.)
  2. Please let me know once you have discussed the query that returns 409 results with your team.

@mohdmsl
Author

mohdmsl commented Dec 9, 2024

I added optimize_cutoff = '1' to the table schema with the expectation that it would maintain a maximum of one disk chunk at all times. However, when I load data, it initially creates multiple disk chunks, which are later merged. This merging process takes an exceptionally long time—in my case, it took over 3 hours for the disk chunks to consolidate into a single chunk.

The purpose of this parameter is to guarantee there is only one disk chunk at any given time, but that behavior is not being observed.

@sanikolaev
Collaborator

Yes, merging can take some time, which is why it usually happens in the background, so you don’t need to worry about it. But I’m still not sure I understand your goal. If it’s SELECT COUNT(*) ... WHERE KNN(), what’s the purpose? Could you explain it further? From what I understand, you might be able to skip this query altogether and simply use your k value or just SELECT COUNT(*) FROM table instead, depending on what you need. WHERE knn() doesn't do any filtering anyway.

@sanikolaev
Collaborator

I'll create a separate issue about it.

manticoresoftware/columnar#73

@mohdmsl
Author

mohdmsl commented Dec 16, 2024

We have a use case where we want to show the count of records having knn_dist() < 0.1. That's why I need this SELECT COUNT(*) ... WHERE KNN().

@sanikolaev
Collaborator

Alright, that changes things, so what you are looking for is just not supported yet. Here's an MRE:

mysql> drop table if exists t; create table t(v float_vector knn_type='hnsw' knn_dims='1' hnsw_similarity='l2'); insert into t values(1, (0.1)),(2, (0.6)); select count(*) from t where knn(v, 5, (0.3)) and knn_dist() < 0.05;
--------------
drop table if exists t
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
create table t(v float_vector knn_type='hnsw' knn_dims='1' hnsw_similarity='l2')
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
insert into t values(1, (0.1)),(2, (0.6))
--------------

Query OK, 2 rows affected (0.00 sec)

--------------
select count(*) from t where knn(v, 5, (0.3)) and knn_dist() < 0.05
--------------

ERROR 1064 (42000): P01: syntax error, unexpected '(' near '() < 0.05'

Feel free to create a separate feature request about it.
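Until such filtering is supported, one possible workaround (an assumption on my side, not something proposed in the thread) is to select knn_dist() in the result set and apply the threshold on the client. In the sketch below, build_knn_query and count_within are hypothetical helper names, and the sample rows are made up for illustration rather than taken from Manticore output.

```python
from typing import Iterable, Sequence, Tuple


def build_knn_query(table: str, field: str, k: int, vector: Sequence[float]) -> str:
    """Build a KNN query that returns each match together with its distance."""
    values = ", ".join(repr(float(x)) for x in vector)
    return f"SELECT id, knn_dist() FROM {table} WHERE knn({field}, {k}, ({values}))"


def count_within(rows: Iterable[Tuple[int, float]], max_dist: float) -> int:
    """Count (id, distance) rows whose distance is below max_dist."""
    return sum(1 for _doc_id, dist in rows if dist < max_dist)


print(build_knn_query("t", "v", 5, [0.3]))

# Hypothetical (id, distance) rows standing in for a real result set.
sample_rows = [(1, 0.04), (2, 0.09), (3, 0.25)]
print(count_within(sample_rows, 0.1))
```

The count obtained this way can never exceed the number of rows the KNN query returns, so k and the result limits still bound it.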
