Different results for call keywords with/out 1 as stats #2267

Closed
5 tasks done
donhardman opened this issue May 31, 2024 · 1 comment

donhardman commented May 31, 2024

Bug Description:

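Passing '1 as stats' to CALL KEYWORDS produces different results than calling it without that option, which looks like a bug. Before FLUSH RAMCHUNK, both calls expand asto* to the same six words. After the flush, the call without stats returns only the unexpanded asto*.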

check.tar.gz

The attached archive contains a database dump. Restore it with the command mysql -h0 -P9306 < check.sql.

mysql> call keywords('asto*', 'check');
+------+-----------+----------------+
| qpos | tokenized | normalized     |
+------+-----------+----------------+
| 1    | asto*     | =astore        |
| 1    | asto*     | =astound       |
| 1    | asto*     | =astonished    |
| 1    | asto*     | =astonishment  |
| 1    | asto*     | =astonishingly |
| 1    | asto*     | =astoundingly  |
+------+-----------+----------------+
mysql> call keywords('asto*', 'check', 1 as stats);
+------+-----------+----------------+------+------+
| qpos | tokenized | normalized     | docs | hits |
+------+-----------+----------------+------+------+
| 1    | asto*     | =astore        | 1    | 1    |
| 1    | asto*     | =astound       | 1    | 1    |
| 1    | asto*     | =astonished    | 4    | 4    |
| 1    | asto*     | =astonishment  | 1    | 2    |
| 1    | asto*     | =astonishingly | 1    | 1    |
| 1    | asto*     | =astoundingly  | 1    | 1    |
+------+-----------+----------------+------+------+


mysql> flush ramchunk check;


mysql> call keywords('asto*', 'check');
+------+-----------+------------+
| qpos | tokenized | normalized |
+------+-----------+------------+
| 1    | asto*     | asto*      |
+------+-----------+------------+
mysql> call keywords('asto*', 'check', 1 as stats);
+------+-----------+----------------+------+------+
| qpos | tokenized | normalized     | docs | hits |
+------+-----------+----------------+------+------+
| 1    | asto*     | =astonished    | 4    | 4    |
| 1    | asto*     | =astonishment  | 1    | 2    |
| 1    | asto*     | =astonishingly | 1    | 1    |
| 1    | asto*     | =astore        | 1    | 1    |
| 1    | asto*     | =astound       | 1    | 1    |
| 1    | asto*     | =astoundingly  | 1    | 1    |
| 1    | asto*     | asto*          | 0    | 0    |
+------+-----------+----------------+------+------+
mysql>

Manticore Search Version:

Latest dev version

Operating System Version:

Ubuntu Jammy

Have you tried the latest development version?

None

Internal Checklist:

To be completed by the assignee. Check off tasks that have been completed or are not applicable.

  • Implementation completed
  • Tests developed
  • Documentation updated
  • Documentation reviewed
  • Changelog updated
@tomatolog

I fixed the issue in f3bb7e9.

The cause: when CALL KEYWORDS was called without 1 as stats, the code assumed there was no need to process the disk chunks, because the parent RT index already tokenizes the terms correctly, i.e. the tokenized and normalized values are the same whether they come from the RT index or from the disk chunks. That is why the code took a shortcut and exited without processing the disk chunks.

With the fix, when the RT index processes the terms from CALL KEYWORDS and finds a wildcard in any of them, it also goes down to the disk chunks, because each disk chunk can have its own dictionary and may expand the same wildcard term into different words.
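The shortcut and the fix described above can be modeled roughly as follows. This is a hypothetical Python sketch, not Manticore's C++ code: the names Chunk, call_keywords and the fixed flag are invented for illustration, the RAM chunk stands in for the RT index's own tokenization, and only a trailing * wildcard is handled.

```python
from dataclasses import dataclass, field


def has_wildcard(term: str) -> bool:
    # Assumption: only '*' is treated as a wildcard in this sketch.
    return "*" in term


@dataclass
class Chunk:
    # Hypothetical chunk dictionary: word -> (docs, hits).
    words: dict = field(default_factory=dict)

    def expand(self, term: str) -> list:
        """Return the dictionary words this chunk matches for `term`."""
        if not has_wildcard(term):
            return [term] if term in self.words else []
        prefix = term.rstrip("*")  # handles only a trailing '*', e.g. 'asto*'
        return sorted(w for w in self.words if w.startswith(prefix))


def call_keywords(term, ram_chunk, disk_chunks, stats=False, fixed=True):
    """Hypothetical model of CALL KEYWORDS on an RT table."""
    expanded = ram_chunk.expand(term)
    # Before the fix (fixed=False), disk chunks were visited only for stats.
    # After the fix, a wildcard term also forces a pass over the disk chunks,
    # since each disk chunk may have its own dictionary.
    visit_disk = stats or (fixed and has_wildcard(term))
    if visit_disk:
        for chunk in disk_chunks:
            for word in chunk.expand(term):
                if word not in expanded:
                    expanded.append(word)
    # If nothing matched, echo the term back unexpanded (like the 'asto*' row).
    return expanded or [term]


if __name__ == "__main__":
    disk = [Chunk({"astore": (1, 1), "astound": (1, 1), "other": (2, 3)})]
    ram = Chunk()  # empty RAM chunk, as after FLUSH RAMCHUNK
    print(call_keywords("asto*", ram, disk, fixed=False))  # ['asto*']
    print(call_keywords("asto*", ram, disk))  # ['astore', 'astound']
```

With fixed=False and stats=False, the sketch mirrors the post-flush symptom from the report: the RAM chunk is empty, the disk chunks are skipped, and asto* comes back unexpanded.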

sanikolaev pushed a commit that referenced this issue Jun 25, 2024
@sanikolaev sanikolaev added the rel::6.3.2 Released in 6.3.2 label Jun 26, 2024