Queries on taxi dataset are slow with ps=1 #1316

githubmanticore · 2023-08-02T03:55:40Z

Queries on a partial taxi dataset (6 indexes) run almost 2x slower with pseudo_sharding enabled (0.122 sec vs 0.065 sec for the query below):

mysql> set global pseudo_sharding=0; SELECT avg(total_amount) FROM taxi WHERE trip_distance = 5; show meta; 
Query OK, 0 rows affected (0.00 sec) 
 
+-------------------+ 
| avg(total_amount) | 
+-------------------+ 
|       17.94111949 | 
+-------------------+ 
1 row in set (0.07 sec) 
 
+----------------+-------------------------------------+ 
| Variable_name  | Value                               | 
+----------------+-------------------------------------+ 
| total          | 1                                   | 
| total_found    | 1                                   | 
| total_relation | eq                                  | 
| time           | 0.065                               | 
| index          | trip_distance:SecondaryIndex (100%) | 
+----------------+-------------------------------------+ 
5 rows in set (0.00 sec) 
 
mysql> set global pseudo_sharding=1; SELECT avg(total_amount) FROM taxi WHERE trip_distance = 5; show meta; 
Query OK, 0 rows affected (0.00 sec) 
 
+-------------------+ 
| avg(total_amount) | 
+-------------------+ 
|       17.94111949 | 
+-------------------+ 
1 row in set (0.12 sec) 
 
+----------------+----------------------------------------------------------------------+ 
| Variable_name  | Value                                                                | 
+----------------+----------------------------------------------------------------------+ 
| total          | 1                                                                    | 
| total_found    | 1                                                                    | 
| total_relation | eq                                                                   | 
| time           | 0.122                                                                | 
| index          | trip_distance:SecondaryIndex (18%), trip_distance:ColumnarScan (81%) | 
+----------------+----------------------------------------------------------------------+ 
5 rows in set (0.00 sec)

This needs to be fixed.
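With pseudo_sharding=1, the index line in show meta reports a mix of SecondaryIndex and ColumnarScan for the same filter. The sketch below is a minimal illustration of how a cost-based optimizer can turn a selectivity estimate into an access-path choice. It assumes a simple linear cost model. The function name and the cost constants are hypothetical and are not Manticore's real cost model, which the issue does not describe.

```python
# Hypothetical cost constants chosen only for illustration; they are NOT
# Manticore's real cost model, which the issue does not describe.
INDEX_COST_PER_MATCH = 10.0  # assumed cost to fetch one matching row via a secondary index
SCAN_COST_PER_ROW = 1.0      # assumed cost to test one row in a full columnar scan


def choose_access_path(estimated_matches: float, total_rows: int) -> str:
    """Pick the cheaper access path for a filter such as trip_distance = 5."""
    index_cost = estimated_matches * INDEX_COST_PER_MATCH
    scan_cost = total_rows * SCAN_COST_PER_ROW
    # Ties go to the secondary index.
    return "SecondaryIndex" if index_cost <= scan_cost else "ColumnarScan"


# Same filter, same number of rows: only the selectivity estimate differs.
total_rows = 100_000
print(choose_access_path(estimated_matches=2_000, total_rows=total_rows))   # accurate estimate
print(choose_access_path(estimated_matches=20_000, total_rows=total_rows))  # inflated estimate
```

The first call prints SecondaryIndex and the second prints ColumnarScan. Under this toy model, an inflated match estimate alone is enough to flip the plan to the slower path.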


githubmanticore · 2023-08-02T03:55:45Z

➤ Ilya Kuznetsov commented:

Incorrect CBO (cost-based optimizer) path selection was caused by 2 things:

- The CBO didn't account for the fact that the implicit group sorter is faster than the plain group sorter.
- Histograms sometimes gave VERY misleading estimates when the value distribution is uneven. Changed the max histogram size from 1k to 8k, which helped a lot (requires reindexing to take effect).

Fixed those issues in 92d722a
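To illustrate why a small histogram can mislead the optimizer on unevenly distributed data, here is a toy equi-width histogram. It is a hypothetical sketch. The class, the bucket counts and the data are made up for illustration, and Manticore's actual histogram layout and estimation logic may differ. It assumes integer values and a uniform distribution of values inside each bucket.

```python
from collections import Counter


class EquiWidthHistogram:
    """Toy equi-width histogram (hypothetical; NOT Manticore's implementation)."""

    def __init__(self, values, num_buckets):
        self.lo = min(values)
        self.hi = max(values)
        self.num_buckets = num_buckets
        # Each bucket covers the same span of integer values.
        self.width = (self.hi - self.lo + 1) / num_buckets
        self.counts = [0] * num_buckets
        for v in values:
            self.counts[self._bucket(v)] += 1

    def _bucket(self, v):
        # Clamp so the maximum value falls into the last bucket.
        return min(int((v - self.lo) / self.width), self.num_buckets - 1)

    def estimate_eq(self, v):
        """Estimate rows matching attr = v, assuming values are uniform inside a bucket."""
        if v < self.lo or v > self.hi:
            return 0.0
        return self.counts[self._bucket(v)] / self.width


# Uneven data: values 0..9 appear 1000 times each, values 10..999 appear once each.
values = [v for v in range(10) for _ in range(1000)] + list(range(10, 1000))
true_counts = Counter(values)

for buckets in (2, 100):
    hist = EquiWidthHistogram(values, buckets)
    for v in (5, 15):
        print(f"buckets={buckets} value={v} "
              f"estimate={hist.estimate_eq(v):.2f} actual={true_counts[v]}")
```

With 2 buckets, both lookups are estimated at about 21 rows, while the actual counts are 1000 and 1. One value is underestimated about 50x and the other overestimated about 21x. With 100 buckets, both estimates match exactly. This is the general reason a larger histogram helps when the distribution is skewed.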

githubmanticore added the bug label Aug 2, 2023

githubmanticore closed this as completed Aug 2, 2023

chenrui333 mentioned this issue Aug 4, 2023 in Homebrew/homebrew-core#138555 (manticoresearch 6.2.0, closed)

sanikolaev added the rel::6.2.0 label Sep 5, 2023
