Add MariaDB statistics support#19408
Conversation
c5df09c to
4e50d1f
Compare
4e50d1f to
8c1747a
Compare
Change `TestingMySqlServer.execute(String sql)` and
`TestingMySqlServer.execute(String sql)` to run in context of the
default database ('tpch'), simplifying usage.
Get table statistics for MariaDB tables.
8c1747a to
fcd57b9
Compare
|
( just rebased, after #19391 merged ) |
plugin/trino-mariadb/src/main/java/io/trino/plugin/mariadb/MariaDbClient.java
Show resolved
Hide resolved
| if (!columnIndexStatistics.nullable()) { | ||
| double knownNullFraction = columnStatisticsBuilder.build().getNullsFraction().getValue(); | ||
| if (knownNullFraction > 0) { | ||
| log.warn("Inconsistent statistics, null fraction for a column %s, %s, that is not nullable according to index statistics: %s", table, columnName, knownNullFraction); |
There was a problem hiding this comment.
did you mean "column %s.%s"? otherwise it would log foo, bar where foo is table and bar is column.
There was a problem hiding this comment.
i admit having copied this from MySQL's io.trino.plugin.mysql.MySqlClient#updateColumnStatisticsFromIndexStatistics
yes, we can improve both places
| // row count from INFORMATION_SCHEMA.TABLES may be very inaccurate | ||
| rowCount = max(rowCount, columnIndexStatistics.cardinality()); |
There was a problem hiding this comment.
the code seems to assume inaccuracy is always towards the lower values - is this a fair assumption to make?
i.e. rowCount is always lower than column index cardinality value?
There was a problem hiding this comment.
i don't think it is. I think the estimate may be well under-estimated and also over-estimated.
However, row count and NDV are related. We meed to do something (either way).
I copied the current approach from MySQL connector stats code (where it's probably where i implemented it earlier)
There was a problem hiding this comment.
yeah, looking at a few other places the logic is that overestimating stats is better than underestimation because underestimation can actually lead to objectively bad plans.
e.g. broadcast of a table that is too large to broadcast is probably worse than a hash join even if the table could be broadcast.
Marking as resolved.
| AND SUB_PART IS NULL -- ignore cases where only a column prefix is indexed | ||
| AND CARDINALITY IS NOT NULL -- CARDINALITY might be null (https://stackoverflow.com/a/42242729/65458) | ||
| AND CARDINALITY != 0 -- CARDINALITY is initially 0 until analyzed | ||
| GROUP BY COLUMN_NAME -- there might be multiple indexes on a column |
There was a problem hiding this comment.
do we need to port this to MySQL as well?
There was a problem hiding this comment.
this was copied from MySQL and slightly adapted.
i think here AND CARDINALITY != 0 line was added (based on observation with MariaDB)
i suspect MySQL benefits from AND CARDINALITY IS NOT NULL line and maybe it's redundant for mariadb. it is very hard to test this, so i retained old condition and just added a new one
someone would need to observe what mysql is doing to determine whether AND CARDINALITY != 0 would make sense there
...trino-mariadb/src/test/java/io/trino/plugin/mariadb/BaseMariaDbTableIndexStatisticsTest.java
Show resolved
Hide resolved
| getQueryRunner()::execute, | ||
| "test_numeric_corner_cases_", | ||
| ImmutableMap.<String, List<String>>builder() | ||
| // TODO Infinity and NaNs not supported by MySQL. Are they not supported in MariaDB as well? |
There was a problem hiding this comment.
I'd look at type mapping tests to find the answer.
There was a problem hiding this comment.
good point.
@homar please follow-up here. See all // TODO Infinity and NaNs not supported by MySQL occurrences in the code.
|
|
||
| @Test | ||
| @Override | ||
| public void testStatsWithPredicatePushdownWithStatsPrecalculationDisabled() |
There was a problem hiding this comment.
one case to test here should also be when multi-column indexes exist.
I'm thinking of case where cardinality of a multi-column index can be > actual row count. (I don't know what "CARDINALITY" refers to in MariaDB case so maybe this is an impractical situation to test for).
There was a problem hiding this comment.
I'm thinking of case where cardinality of a multi-column index can be > actual row count.
that's possible because cardinality is just an estimate.
one case to test here should also be when multi-column indexes exist.
good point, we should have a test for this.
For both MySQL and MariaDB.
@homar can you please follow-up?
Description
Get table statistics for MariaDB tables.