Incorrect results when using NOT IN and IN operators in a query #1123
Comments
Could you issue the query against a single index and provide the one that produces the bad result?
mysql> SELECT id FROM mysql> SELECT id FROM |
Could you upload your index to our S3 as described in the "Uploading your data" topic of our manual (https://manual.manticoresearch.com)? I will check the issue locally with the data you provide.
Thank you for your prompt response to my issue. I have uploaded the index and config files to help you better understand and reproduce the problem. Let me know if you also need source |
Hit the same bug. I have a medium-size dataset (~55k docs, total size of all data files ~145 MiB) and selecting with
manticore versions not affected:
manticore versions affected:
Happens with both RT and non-RT indices:
Various attempts to raise memory limits did not change the outcome:
Removing aggregation returns empty result set:
DELETE queries are similarly affected in both directions, deleting either too few or too many documents. Edit: removed non-relevant SQL query part
I can reproduce the issue in 6.0.4 with secondary indexes ON:
snikolaev@dev2:~/issue-1123$ mysql -P9315 -h0
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 6.0.4 1a3a4ea82@230314 (secondary 2.0.4 5a49bd7@230306) git branch HEAD (no branch)
...
mysql> SELECT id FROM idx WHERE id NOT IN (142529) and ANY(tag_ids) IN (6719,97781,51017,981,25625) ORDER BY total_ctr DESC LIMIT 0, 21;
+--------+
| id     |
+--------+
| 142529 |
+--------+
1 row in set (0.00 sec)
mysql> set global secondary_indexes=0;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT id FROM idx WHERE id NOT IN (142529) and ANY(tag_ids) IN (6719,97781,51017,981,25625) ORDER BY total_ctr DESC LIMIT 0, 21;
+--------+
| id     |
+--------+
| 127718 |
| 352740 |
| 190169 |
| 152047 |
|  74303 |
|  27001 |
| 223883 |
|  72821 |
| 108069 |
|  21919 |
|  77188 |
|  13975 |
|  35102 |
| 258368 |
| 190551 |
|  63791 |
| 162306 |
|  64851 |
|  75328 |
|  67280 |
| 140792 |
+--------+
21 rows in set (0.01 sec)
but can't reproduce it in the latest dev version:
@Korkman @hrulik please check if the issue persists for you in the latest dev version - https://mnt.cr/nightly or https://hub.docker.com/layers/manticoresearch/manticore/dev/images/sha256-22694aef8296fa4a82dad933706c300e9afc1f8d2a3a1f42309a9eb0871ecfe2?context=explore
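For anyone re-testing, the on/off comparison from the console session can be scripted. This is only a minimal sketch: it assumes a DB-API 2.0 style cursor connected to searchd's MySQL port (which client library provides it is left open), and `compare_secondary_indexes` is a hypothetical helper name, not part of Manticore. The only statement it relies on is the `SET GLOBAL secondary_indexes` toggle already used in this thread.

```python
def compare_secondary_indexes(cursor, query):
    """Run `query` with secondary indexes on, then off, and return both id lists.

    `cursor` is assumed to be any DB-API 2.0 cursor (execute/fetchall).
    Hypothetical helper; the setting is left at 0 when the function returns.
    """
    results = {}
    for flag in (1, 0):
        # Same toggle as in the console session in this thread.
        cursor.execute(f"SET GLOBAL secondary_indexes={flag}")
        cursor.execute(query)
        # Keep only the first column (the document id) of each row.
        results[flag] = [row[0] for row in cursor.fetchall()]
    return results[1], results[0]
```

If the two returned lists differ for the same query, that points at the secondary index code path. Keep in mind that the setting is only a hint to the optimizer, so identical lists do not prove the index was actually used.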
➤ Ilya Kuznetsov commented: Turning secondary indexes on or off is just a hint for the query optimizer. It might decide to not use them even if they are available. Please provide |
➤ Ilya Kuznetsov commented: There are 4 different code paths to process docid filters, so it would be nice to see which one does not work as expected.
Nightly fixes the problem! Still testing but looks good.
6.0.4:
dev: 6.0.5-230526-f5cd92166:
Edit: added searchd -v
➤ Ilya Kuznetsov commented: So the issue is that 6.0.4 uses |
Thought so too, but I couldn't force index usage. It only uses the index when the query has IN instead of NOT IN.
➤ Ilya Kuznetsov commented: Latest dev version (3b62be2) has a lot more warnings about why trying to force a secondary index (or |
I am experiencing unexpected results when running a query with NOT IN and IN operators on Manticore Search. The same query works correctly on Sphinx 3. Here is the query I am running:
SELECT id FROM index, delta WHERE id NOT IN (142529) and tag_ids IN (6719,97781,51017,981,25625) GROUP BY id ORDER BY total_ctr DESC LIMIT 0, 21;
The expected result should not include the id 142529, however, the query returns the following result:
| id |
| 142529 |
| 410074 |
| 410077 |
| 410084 |
| 410085 |
| 410097 |
| 410111 |
I have tried several alternatives, such as using id <> 142529, but the results are still incorrect. I would like to understand the reason for this behavior and any possible fixes or workarounds.
Thanks in advance for your help.
There are no errors in the logs.
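The expectation in this report (no excluded id in the result) can be checked on the client side. A minimal sketch, where `not_in_violations` is a hypothetical helper name and the ids are the ones listed in this report:

```python
def not_in_violations(returned_ids, excluded_ids):
    """Return the ids that came back even though NOT IN should have excluded them."""
    excluded = set(excluded_ids)
    return [doc_id for doc_id in returned_ids if doc_id in excluded]


# Ids returned by the query in this report, and the id passed to NOT IN.
returned = [142529, 410074, 410077, 410084, 410085, 410097, 410111]
print(not_in_violations(returned, [142529]))  # prints [142529]
```

An empty list means the filter was honoured; any id in the output is a document the server should have filtered out.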