Fix the explain (TYPE IO) Exception when table is hive partition tabl…#12349
findepi merged 1 commit into trinodb:master
Please follow the guidelines for Git commits. Make the commit message header much shorter, wrap the message at 70-80 characters, and concentrate on making it more descriptive for others who will work with your changes. |
Suggested change:
- if (partitionColumns.isEmpty() && partitions.isEmpty()) {
+ if (partitions.isEmpty()) {
+     return partitionColumns.isEmpty() ? TupleDomain.none() : TupleDomain.all();
+ }
@albericgenius partitions.isEmpty() check is common in both if statements.
Please consider applying the suggested change.
Thanks, updated. I can see the improvement from your comments.
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveMetadata.java
Use ImmutableList.of() instead of defining an extra variable partitions, to keep the code smaller.
plugin/trino-hive/src/test/java/io/trino/plugin/hive/BaseHiveConnectorTest.java
plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestHiveMetadata.java
|
Please add in the description:
|
|
Please be more brief with implementation details in the commit message. |
Thanks for the notes, I added them :) |
|
|
@findepi and @findinpath |
|
@albericgenius generally it may take a while (a few hours/ a few days) until a maintainer merges the commit. Good job! @bitsondatadev we should probably document the PR process on https://github.com/trinodb/trino/blob/master/.github/DEVELOPMENT.md to avoid confusion. |
plugin/trino-hive/src/test/java/io/trino/plugin/hive/BaseHiveConnectorTest.java
if (partitionColumns.isEmpty()) {
    // not a partitioned table
    checkArgument(partitions.size() == 1 && UNPARTITIONED_ID.equals(getOnlyElement(partitions).getPartitionId()), "Unexpected partitions for a non-partitioned table: %s", partitions);
    return TupleDomain.all();
}
but then what remains would be (as it used to be)
if (partitions.isEmpty()) {
    return TupleDomain.none();
}
you return the opposite value. I don't understand why.
- Originally we returned TupleDomain.none() when partitions.isEmpty().
- But when a partitioned table has no data, IoPlanPrinter.parseConstraints throws IllegalArgumentException because the constraint is none.
- For a partitioned table with no data, partitions is empty and partitionColumns is not empty.
- I could be wrong; please point it out and I will update ASAP.
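The failure mode described in the list above can be modeled with a small, self-contained sketch. It does not use Trino's classes: `SimpleDomain`, `predicateFor`, `parseConstraints`, and `explainIoFails` are hypothetical stand-ins, and the precondition inside the real IoPlanPrinter.parseConstraints is assumed from the exception reported in this thread rather than copied from the source.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class EmptyPartitionedTableSketch {
    // Hypothetical stand-in for a tuple domain: NONE means no rows can match, ALL means no restriction.
    public enum SimpleDomain { NONE, ALL }

    // Models the original behavior discussed above: an empty partition list yields NONE.
    // partitionColumns is unused here; it is kept to show that the original check ignored it.
    public static SimpleDomain predicateFor(List<String> partitionColumns, List<String> partitions) {
        if (partitions.isEmpty()) {
            return SimpleDomain.NONE;
        }
        return SimpleDomain.ALL;
    }

    // Assumed precondition: the printer rejects a NONE constraint instead of representing it.
    public static String parseConstraints(SimpleDomain constraint) {
        if (constraint == SimpleDomain.NONE) {
            throw new IllegalArgumentException("constraint is none");
        }
        return "no column constraints";
    }

    // Returns true when printing the IO plan would fail for the given table layout.
    public static boolean explainIoFails(List<String> partitionColumns, List<String> partitions) {
        try {
            parseConstraints(predicateFor(partitionColumns, partitions));
            return false;
        }
        catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Partitioned table without any partitions: the sketch reports a failure.
        System.out.println(explainIoFails(Arrays.asList("ds"), Collections.<String>emptyList()));
        // Partitioned table with one partition: the sketch reports success.
        System.out.println(explainIoFails(Arrays.asList("ds"), Arrays.asList("ds=2022-05-01")));
    }
}
```

In this model the only way to avoid the exception without touching the printer is to change what `predicateFor` returns, which is the approach the first revision of this PR took.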
thanks for the explanation
"but in the case of no data of the PartitionedTable, we will throw IllegalArgumentException in IoPlanPrinter.parseConstraints. because the constraint is none."
does it mean we should fix that method instead?
e5d2a9f to e0a5f1b
test_io_explain -> test_io_explain_with_empty_partitioned_table
using same table name as in the other test will make tests fail when run concurrently
- Updated the table name.
- IoPlanPrinter.parseConstraints checks whether the constraint is none. That logic is correct.
- I think that if a partitioned table does not have any data, we should return TupleDomain.all() as the constraint. The only case in which partitions is empty and partitionColumns is not is a table with no data.
- Only when both partitionColumns and partitions are empty should we return TupleDomain.none().
- What are your thoughts?
"I think if a partitioned table do not have any data, we should return TupleDomain.all() as constraint."
TupleDomain.none() is also correct ("no rows match the filter"), and may be more useful ("more correct"), as it can allow pruning other parts of the query
- This implementation does not change the "no rows match the filter" outcome: because there is no data (partitions is empty and partitionColumns is not empty), even if we return TupleDomain.all(), no data matches the filter, so the result is the same.
- I agree with you that it is better to fix this inside IoPlanPrinter.parseConstraints, but I still do not know how to get partition information during planning. I will continue to think about it tomorrow. If you have free time, please give me some suggestions.
Thanks for your time
Alberic
"even we return TupleDomain.all(), there is no data match the filter, the result is same."
io.trino.spi.connector.ConnectorMetadata#getTableProperties's io.trino.spi.connector.ConnectorTableProperties#predicate is used to inform the planner and allows deriving filters for other tables.
For example, a query
SELECT * FROM some_table JOIN empty_partitioned_table ON ...
should be reduced to SELECT .. WHERE false because the planner realizes empty_partitioned_table has no rows (TupleDomain.none()).
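A rough sketch of why the reported predicate matters beyond the scan itself. This is a toy model, not the Trino planner: `Predicate`, `Table`, and `planInnerJoin` are hypothetical names, and only the inner-join case from the example query is modeled.

```java
public class NonePredicatePruningSketch {
    // Hypothetical stand-in for the table-level predicate a connector reports.
    public enum Predicate { NONE, ALL }

    public static final class Table {
        final String name;
        final Predicate predicate;

        public Table(String name, Predicate predicate) {
            this.name = name;
            this.predicate = predicate;
        }
    }

    // An inner join can produce rows only if both inputs can produce rows.
    // If either side reports NONE, the whole join collapses to an empty result.
    public static String planInnerJoin(Table left, Table right) {
        if (left.predicate == Predicate.NONE || right.predicate == Predicate.NONE) {
            return "Values[] (no rows)";
        }
        return "Join(Scan(" + left.name + "), Scan(" + right.name + "))";
    }

    public static void main(String[] args) {
        Table someTable = new Table("some_table", Predicate.ALL);
        // Reporting NONE lets the toy planner drop both scans.
        System.out.println(planInnerJoin(someTable, new Table("empty_partitioned_table", Predicate.NONE)));
        // Reporting ALL for the same empty table keeps the join and both scans in the plan.
        System.out.println(planInnerJoin(someTable, new Table("empty_partitioned_table", Predicate.ALL)));
    }
}
```

Both plans return the same (empty) result, which matches the author's observation, but only the NONE variant gives the planner enough information to skip reading some_table.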
035afed to 9a4b2d1
That's exactly what we would return if predicate is all, so it's probably not the right return value for the none case.
IoPlan.TableColumnInfo cannot currently represent a table with a NONE constraint.
In general, such a table should be eliminated from the query plan; this happens e.g. here
We could fix the optimizer so that it happens as well in the SELECT * FROM empty_partitioned_table case, and we probably should. However, this wouldn't eliminate the need to stop EXPLAIN (TYPE IO) from failing in such a case, since it's generally a possible situation. That's why we also deal with it on the page source level instead of failing there
So, back to our problem. We need to make IoPlan.TableColumnInfo able to represent a table scan without data, with a none constraint.
I would suggest replacing the Set<ColumnConstraint> columnConstraints field with
class Constraint {
    boolean isNone;
    Set<ColumnConstraint> columnConstraints;
}
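One way the suggested wrapper could look, as a standalone sketch rather than the code that was merged. Assumptions: column constraints are reduced to plain strings, the tuple domain is modeled as an `Optional<Map<...>>` where an empty `Optional` means none, and `IoConstraintSketch` and `parseConstraint` are illustrative names.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.Set;

public class IoConstraintSketch {
    // Wrapper that can express "this scan matches nothing" in addition to per-column constraints.
    public static final class Constraint {
        private final boolean none;
        private final Set<String> columnConstraints;

        public Constraint(boolean none, Set<String> columnConstraints) {
            Objects.requireNonNull(columnConstraints, "columnConstraints is null");
            this.none = none;
            // Copy the input so that later changes by the caller cannot alter this object.
            this.columnConstraints = Collections.unmodifiableSet(new LinkedHashSet<>(columnConstraints));
        }

        public boolean isNone() {
            return none;
        }

        public Set<String> getColumnConstraints() {
            return columnConstraints;
        }
    }

    // Simplified model: an empty Optional stands for a "none" domain,
    // an empty map stands for "all" (no column is restricted).
    public static Constraint parseConstraint(Optional<Map<String, String>> domains) {
        if (!domains.isPresent()) {
            // Represent the none case instead of throwing.
            return new Constraint(true, Collections.<String>emptySet());
        }
        Set<String> parsed = new LinkedHashSet<>();
        for (Map.Entry<String, String> entry : domains.get().entrySet()) {
            parsed.add(entry.getKey() + " " + entry.getValue());
        }
        return new Constraint(false, parsed);
    }

    public static void main(String[] args) {
        Constraint empty = parseConstraint(Optional.<Map<String, String>>empty());
        System.out.println(empty.isNone() + " " + empty.getColumnConstraints());

        Map<String, String> domains = new LinkedHashMap<>();
        domains.put("ds", "= '2022-05-01'");
        Constraint filtered = parseConstraint(Optional.of(domains));
        System.out.println(filtered.isNone() + " " + filtered.getColumnConstraints());
    }
}
```

Keeping the flag separate from the set means an "all" constraint (flag false, empty set) and a "none" constraint (flag true, empty set) stay distinguishable in the printed plan.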
@albericgenius please don't rebase, unless necessary. |
6fc32ec to 030f789
|
@findepi Thanks for your help and time. I am not sure whether my implementation matches your idea. It now affects some JDBC test cases because of the new Rule.
|
|
@albericgenius
However, i also see that we could choose NOT to fix the plan printer, and consider a plan with redundant TableScan left as "bogus" or "invalid". It's surely undesirable. @martint thoughts? |
d90cbf1 to 5254b78
A TableScan that produces no data is a perfectly valid plan, so that should be fixed in the plan printer. It might be undesirable from a performance perspective, but that's just an optimization. |
make a defensive copy to ensure that Constraint is immutable
Suggested change:
- this.columnConstraints = columnConstraints;
+ this.columnConstraints = ImmutableSet.copyOf(requireNonNull(columnConstraints, "columnConstraints is null"));
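A small illustration of what the defensive copy buys. The suggestion above uses Guava's ImmutableSet.copyOf; the sketch below swaps in JDK collections so it runs without dependencies, and `Aliasing`, `Copying`, and `sizesAfterCallerMutation` are hypothetical names.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class DefensiveCopySketch {
    // Stores the caller's set directly, so the caller can still change it afterwards.
    public static final class Aliasing {
        final Set<String> values;

        public Aliasing(Set<String> values) {
            this.values = values;
        }
    }

    // Copies the caller's set and exposes it read-only, so the object stays immutable.
    public static final class Copying {
        final Set<String> values;

        public Copying(Set<String> values) {
            this.values = Collections.unmodifiableSet(new HashSet<>(Objects.requireNonNull(values, "values is null")));
        }
    }

    // Builds both holders from the same set, mutates the set, and reports what each holder sees.
    public static int[] sizesAfterCallerMutation() {
        Set<String> input = new HashSet<>();
        input.add("a");
        Aliasing aliasing = new Aliasing(input);
        Copying copying = new Copying(input);
        input.add("b");
        return new int[] {aliasing.values.size(), copying.values.size()};
    }

    public static void main(String[] args) {
        int[] sizes = sizesAfterCallerMutation();
        // The aliasing holder observes the later mutation; the copying holder does not.
        System.out.println(sizes[0] + " " + sizes[1]);
    }
}
```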
The logic here should not depend on stats.
Stats are estimates, can be inaccurate and off.
i think we need to replace this check with an if
if (constraint.isNone()) {
    return new Constraint(true, ImmutableSet.of());
}
the indentation here is off and will change as soon as someone invokes reformatting
Suggested change:
- new IoPlanPrinter.Constraint(withoutData,
- ImmutableSet.of(
+ new IoPlanPrinter.Constraint(withoutData, ImmutableSet.of(
We know the expected result (the constraint here shouldn't be none), so
we don't need a variable and conditional initialization for that
as above -- replace with constant and inline
5254b78 to dd8ac8c
Currently, this can happen e.g. when a hive partitioned table is empty. Ideally, such a table scan should be eliminated from the plan, but plan printing should not rely on that, this would be just an optimization.
|
I applied some cosmetic changes (to the code & commit message) myself. Will merge once the build passes. Thank you for your contribution! |
Thanks for your time and your coaching. |
Description
Fix the EXPLAIN (TYPE IO) exception thrown when the table is an empty Hive partitioned table
Related issues, pull requests, and links
#10398
Documentation
(+) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
(+) No release notes entries required.
( ) Release notes entries required with the following suggested text: