Add support for compacting small files for Hive tables#9398
losipiuk wants to merge 42 commits into trinodb:master
Conversation
findepi
left a comment
Some initial thoughts.
Sorry for a bunch of low-level comments too
core/trino-spi/src/main/java/io/trino/spi/connector/ConnectorMetadata.java
core/trino-spi/src/main/java/io/trino/spi/connector/TableProcedureExecutionMode.java
core/trino-main/src/main/java/io/trino/sql/analyzer/StatementAnalyzer.java
    Scope tableScope = analyze(table, scope);
Is this equivalent to visitTable(table, scope)?
Seems this will re-resolve the table again. Can we reuse the tableScope (RelationType) creation without doing that?
Yeah. I think we could if we recorded in the scope what the type of the analyzed relation was (view/materialized view/table). But it does not seem we are recording that information.
We record that in io.trino.sql.analyzer.Analysis#registerTable
Would be good to reuse visitTable logic, since there is a lot going on here: tables, views, materialized views, redirections. Masks and filters -- we could pull them from Analysis too.
core/trino-main/src/main/java/io/trino/sql/analyzer/StatementAnalyzer.java
    node.getWhere().ifPresent(where -> analyzeWhere(node, tableScope, where));
    // analyze ORDER BY
ORDER BY is nice because it allows us to tap into distributed sort when rewriting data
however, plain ORDER BY may be too limiting
- total ordering may not be required; sorting subsets of data can be sufficient (per file, per certain amount of data, grouped execution)
- different ordering schemes (eg z-order) may come useful. we should think how we will model them (an expression?)
It is not used so far. And for local ordering (e.g. for Z-ordering) we can express the intention via WITH parameters, e.g. WITH (z_order_columns = ARRAY['a', 'b']).
Given that maybe we should drop support for ORDER BY for now?
Force-pushed 8cad0ce to f9ef708
alexjo2144
left a comment
Slowly making my way through the commits, but had a question. Have you thought about how these procedures will interact with Access Control?
Why does this need to be Concurrent?
It is modelled after AbstractPropertyManager. I think theoretically the map can be modified by multiple threads in parallel as connectors are registered/unregistered. Not sure if that is really the case now.
Not sure if that is really the case now.
it's currently not.
the connectors are loaded serially during server startup, but it's conceivable that this becomes parallel
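The thread-safety concern above can be sketched as follows: a minimal, hypothetical registry (the names are illustrative, not the actual Trino classes) that stays correct even if connector registration ever becomes parallel, by using ConcurrentHashMap.putIfAbsent to make double registration fail loudly.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a registry keyed by catalog name that tolerates
// parallel registration/unregistration of connectors.
public class ConnectorPropertyRegistry
{
    private final Map<String, Map<String, Object>> propertiesByCatalog = new ConcurrentHashMap<>();

    public void addProperties(String catalogName, Map<String, Object> properties)
    {
        // putIfAbsent is atomic, so a racing duplicate registration throws
        // instead of silently overwriting the earlier entry
        if (propertiesByCatalog.putIfAbsent(catalogName, Map.copyOf(properties)) != null) {
            throw new IllegalArgumentException("Properties already registered for catalog: " + catalogName);
        }
    }

    public Map<String, Object> getProperties(String catalogName)
    {
        Map<String, Object> properties = propertiesByCatalog.get(catalogName);
        if (properties == null) {
            throw new IllegalArgumentException("No properties registered for catalog: " + catalogName);
        }
        return properties;
    }

    public static void main(String[] args)
    {
        ConnectorPropertyRegistry registry = new ConnectorPropertyRegistry();
        registry.addProperties("hive", Map.of("file_size_threshold", "100MB"));
        System.out.println(registry.getProperties("hive"));
    }
}
```

Even if startup stays serial today, the atomic putIfAbsent costs nothing and keeps the class safe should loading become parallel later.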
Great question. I left it for later and totally forgot about that. I guess the most straightforward approach would be to add
Force-pushed f9ef708 to b58f58b
Added "Add access control for table procedures" commit.
findepi
left a comment
up to "Add support for table procedures SPI calls to Metadata"
core/trino-spi/src/main/java/io/trino/spi/connector/ConnectorMetadata.java
core/trino-spi/src/main/java/io/trino/spi/connector/TableProcedureMetadata.java
Seems like what you need is to make AbstractPropertyManager<K> (K being the key type) and add a public method in subclasses that converts the API arguments to the internal key.
Yeah. The usage would be somewhat less nice, but it will work. Do you want me to update the PR towards that?
this could be static and shared with AbstractPropertyManager
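The generification suggested above could look roughly like this. This is a hedged sketch with made-up names (AbstractKeyedPropertyManager, ProcedureKey, the *Sketch subclass), not the real AbstractPropertyManager code: the base class is generic over the key type, and the subclass exposes public methods that convert API-level arguments into that key.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Generic base keyed by K, as suggested above (illustrative name)
abstract class AbstractKeyedPropertyManager<K>
{
    private final Map<K, Map<String, Object>> properties = new ConcurrentHashMap<>();

    protected void doAddProperties(K key, Map<String, Object> value)
    {
        if (properties.putIfAbsent(key, Map.copyOf(value)) != null) {
            throw new IllegalArgumentException("Already registered: " + key);
        }
    }

    protected Map<String, Object> doGetProperties(K key)
    {
        return Objects.requireNonNull(properties.get(key), () -> "Not registered: " + key);
    }
}

// Hypothetical key: table procedures are identified per catalog and procedure
record ProcedureKey(String catalogName, String procedureName) {}

public class TableProceduresPropertyManagerSketch
        extends AbstractKeyedPropertyManager<ProcedureKey>
{
    // Public methods convert the API-level arguments to the internal key
    public void addProperties(String catalogName, String procedureName, Map<String, Object> value)
    {
        doAddProperties(new ProcedureKey(catalogName, procedureName), value);
    }

    public Map<String, Object> getProperties(String catalogName, String procedureName)
    {
        return doGetProperties(new ProcedureKey(catalogName, procedureName));
    }

    public static void main(String[] args)
    {
        TableProceduresPropertyManagerSketch manager = new TableProceduresPropertyManagerSketch();
        manager.addProperties("hive", "optimize", Map.of("file_size_threshold", "100MB"));
        System.out.println(manager.getProperties("hive", "optimize"));
    }
}
```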
core/trino-main/src/main/java/io/trino/metadata/TableProceduresPropertyManager.java
core/trino-main/src/main/java/io/trino/metadata/TableExecuteHandle.java
findepi
left a comment
"Add parser/analyzer support for ALTER TABLE ... EXECUTE"
core/trino-main/src/main/java/io/trino/sql/analyzer/Analysis.java
core/trino-main/src/main/java/io/trino/sql/analyzer/StatementAnalyzer.java
if you ignore indent (since it must be 0 here), add checkArgument(indent == 0, "...")
same for the others, preexisting ones (separate PR)
I think this is true now, but it is not obvious to me that it must always be true.
E.g. EXPLAIN ALTER ... does not bump indent, but it could do so.
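The suggested guard, sketched in plain Java (a hand-rolled checkArgument stands in for Guava's, and formatTopLevelStatement is a hypothetical stand-in for the formatter method being discussed):

```java
// Minimal sketch: fail fast if a formatter method that assumes
// top-level context is ever called with a non-zero indent.
public class IndentGuard
{
    // Stand-in for com.google.common.base.Preconditions.checkArgument
    static void checkArgument(boolean condition, String message)
    {
        if (!condition) {
            throw new IllegalArgumentException(message);
        }
    }

    // Hypothetical formatter entry point for a statement that must be top-level
    public static String formatTopLevelStatement(String sql, int indent)
    {
        checkArgument(indent == 0, "Unexpected indent: " + indent);
        return sql;
    }

    public static void main(String[] args)
    {
        System.out.println(formatTopLevelStatement("ALTER TABLE t EXECUTE optimize", 0));
    }
}
```

The upside is that if EXPLAIN ALTER ... ever starts bumping the indent, the assumption surfaces as an immediate failure instead of silently mis-formatted output.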
findepi
left a comment
"Add access control for table procedures"
core/trino-main/src/main/java/io/trino/security/AccessControl.java
core/trino-main/src/main/java/io/trino/sql/analyzer/StatementAnalyzer.java
core/trino-spi/src/main/java/io/trino/spi/security/AccessDeniedException.java
core/trino-spi/src/main/java/io/trino/spi/connector/ConnectorAccessControl.java
Force-pushed b58f58b to a6d68f5
I sent out the first batch of fixups. Let me know if you prefer to keep it this way for a while, or should I squash those in?
Force-pushed eae2195 to ec74a7f
doing it here seems wrong as it looks like we can have multiple SourcePartitionedSchedulers per single query. E.g. those can be created via FixedSourcePartitionedScheduler. Need a rework.
cc: @findepi
Force-pushed 39c46e2 to dd93cc4
core/trino-main/src/main/java/io/trino/metadata/TableProceduresRegistry.java
core/trino-spi/src/main/java/io/trino/spi/connector/TableProcedureMetadata.java
core/trino-main/src/main/java/io/trino/sql/analyzer/Analysis.java
is this used for https://trino.io/docs/current/sql/execute.html ?
can the two be mistaken?
I do not believe the EXECUTE you mentioned classifies as an update-statement, and in the code we never pass "EXECUTE" to io.trino.execution.QueryStateMachine#setUpdateType.
I can put something else here (not sure how much it matters), but EXECUTE was matching what we do for other ALTER ... statements. E.g. for ALTER TABLE ... ADD COLUMN we put ADD COLUMN as update type.
I am unsure about it. First, this is going to change (e.g. #7994), but it's OK to support less. Correctness first.
However, expressibility with a TupleDomain doesn't guarantee the predicate will be subsumed. For example, the Hive connector doesn't consume a TupleDomain over non-partition columns.
Thus it feels the check here, at this stage, doesn't guarantee anything and we need a different check in a different place. So I hope we can remove the check here.
Yeah - that is a fair point: even with the check here we depend on the connector actually consuming the predicate instead of ignoring it. And if it cannot for some reason, it should throw an exception.
I am not sure we can provide any engine-side validation that the connector behaves up to contract, to be honest.
If you kept the predicate in the form of Expression through the Optimizer phase, some optimizations could be applied and transform a not supported predicate into a supported one.
Good point. Though then I would need to structure the SPI very differently. The getTableHandleForExecute would not take the constraint parameter and we would depend on applyFilter to do the job. Yet then we would need some mechanism (a validation optimizer rule?) to verify that at the end there is no filter left between TableScanNode and TableWriterNode when we are executing ALTER TABLE EXECUTE.
It feels doable, though more complex, and I am not sure we would get a true benefit, given that the condition passed in the WHERE clause will most probably be simplistic (a conjunction of range predicates?).
@kasiafi do you feel strongly about that?
we would depend on applyFilter to do the job. Yet then we would need to have some mechanism (validation optimizer rule?) to verify that at the end there is no filter
If I understand correctly, we still need some validation that the whole constraint is consumed? Maybe it would be good to use existing mechanisms for pushing predicate / handling the non-accepted part?
However, if the Constraint is built here, I was thinking if this validation could go to the Analyzer.
If I understand correctly, we still need some validation that the whole constraint is consumed
We cannot really validate that. The contract is that the connector should throw an exception if the constraint cannot be consumed fully. We can discuss whether the contract is nice when it comes to SPI shape, but we cannot do anything if the connector does not obey it (does not consume, and does not throw). If I change the approach to depend on applyFilter we still cannot do any validation: if the connector does not consume the predicate, yet returns an empty remainingFilter in ConstraintApplicationResult, the engine will not know.
Maybe it would be good to use existing mechanisms for pushing predicate / handling the non-accepted part
It feels to me it would be a bit nicer, and (as you said) handle more predicate shapes. On the other hand, the logic of a single procedure would be even more spread around the codebase, and harder to follow. I would start with the proposed approach and refactor as a follow-up if we decide it is worth it.
However, if the Constraint is built here, I was thinking if this validation could go to the Analyzer
Not sure I fully understand what you suggest here. Let's chat on slack.
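The contract being debated here, that the connector must throw rather than silently ignore a predicate it cannot consume, might be sketched like this. The types are simplified stand-ins, not the real Trino SPI signatures, and the "only partition columns" rule mirrors the Hive limitation mentioned above.

```java
import java.util.Set;

public class ExecuteConstraintContract
{
    // Simplified stand-in for a TupleDomain-style constraint over column names
    record Constraint(Set<String> columns) {}

    // Stand-in connector entry point: pretend only partition columns can be consumed
    public static Set<String> getTableHandleForExecute(Set<String> partitionColumns, Constraint constraint)
    {
        if (!partitionColumns.containsAll(constraint.columns())) {
            // Contract: throw rather than silently ignore the predicate, since
            // ignoring it would rewrite more data than the user asked for
            throw new IllegalArgumentException("Unsupported WHERE: predicate over non-partition columns");
        }
        // Real code would return a table handle capturing the consumed constraint
        return constraint.columns();
    }

    public static void main(String[] args)
    {
        System.out.println(getTableHandleForExecute(Set.of("ds"), new Constraint(Set.of("ds"))));
    }
}
```

As the thread notes, the engine cannot verify this from the outside; the throw-on-unconsumable rule is the only guardrail.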
Force-pushed dd93cc4 to b3947f4
Could you use RelationPlanner to process the table and get the RelationPlan? The above mostly duplicates the code of RelationPlanner.visitTable().
It is not that straightforward. visitTable in RelationPlanner would take TableHandle to be used with TableScanNode from analysis. This is not what I want.
I want to use TableHandle from TableExecuteHandle.sourceTableHandle. Any suggestions how should i proceed?
- I guess I can modify the `analysis` object I have before calling out to `RelationPlanner` (or rather create a new modified one based on the one we got on the call to `LogicalPlanner.planStatement`).
- The other option would be to create a public helper `RelationPlanner.planTableWithHandle(Table table, TableHandle handle)` and make it share code with `RelationPlanner.visitTable`.

Leaning towards the latter. WDYT?
Oh ... current code is actually messed up.
I am using TableExecuteHandle.sourceTableHandle in the TableScanOperator but I am using ColumnHandles taken from analysis; and the two may not be compatible.
```java
for (Field field : scope.getRelationType().getAllFields()) {
    Symbol symbol = symbolAllocator.newSymbol(field);
    outputSymbolsBuilder.add(symbol);
    assignments.put(symbol, analysis.getColumn(field));
}
```

I need to wrap my head around it :/
Oh ... current code is actually messed up.
Or maybe that is not a problem. We already have pieces of code where we change the TableHandle in the TableScanNode but still use the old ColumnHandles (e.g. after applyFilter).
Another question: is it fine to assume here that the order of symbols in the plan I got for planning the TableScan for Table matches the order of ColumnHandles I got from ConnectorTableMetadata?
core/trino-main/src/main/java/io/trino/sql/planner/LogicalPlanner.java
findepi
left a comment
"Pass splits info to TableFinish operator for ALTER TABLE EXECUTE"
core/trino-main/src/main/java/io/trino/execution/SqlTaskManager.java
core/trino-main/src/main/java/io/trino/execution/TableExecuteContext.java
core/trino-main/src/main/java/io/trino/operator/TableFinishOperator.java
core/trino-main/src/main/java/io/trino/split/SampledSplitSource.java
core/trino-spi/src/main/java/io/trino/spi/connector/ConnectorMetadata.java
Since io.trino.spi.connector.ConnectorSplitSource#getTableExecuteSplitsInfo returns Optional, I think this should be Optional<List> too
The current contract is that, for the TableExecute flow, ConnectorSplitSource must return a non-empty Optional here. Hence, starting from TableExecuteContext, we do not have any Optionals. I think this is simpler this way.
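That contract can be enforced eagerly at the point where the engine-side context receives the value, so that nothing downstream has to carry an Optional. This is a hedged sketch with simplified stand-in types, not the actual TableExecuteContext code:

```java
import java.util.List;
import java.util.Optional;

// Illustrative stand-in for the engine-side TableExecuteContext
public class TableExecuteContextSketch
{
    private List<Object> splitsInfo;

    public void setSplitsInfo(Optional<List<Object>> splitsInfo)
    {
        // Contract: in the TableExecute flow the split source must report this,
        // so unwrap here and fail loudly if a connector violates the contract
        this.splitsInfo = splitsInfo.orElseThrow(
                () -> new IllegalStateException("Split source did not report tableExecuteSplitsInfo"));
    }

    public List<Object> getSplitsInfo()
    {
        if (splitsInfo == null) {
            throw new IllegalStateException("splitsInfo not set");
        }
        return splitsInfo;
    }

    public static void main(String[] args)
    {
        TableExecuteContextSketch context = new TableExecuteContextSketch();
        context.setSplitsInfo(Optional.of(List.of("split-1", "split-2")));
        System.out.println(context.getSplitsInfo());
    }
}
```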
findepi
left a comment
"Add tableExecuteSplitsInfo to FixedSplitSource"
core/trino-spi/src/main/java/io/trino/spi/connector/FixedSplitSource.java
Add support for compacting small files for non-transactional, non-bucketed Hive tables: ALTER TABLE xxxxx EXECUTE OPTIMIZE WITH (file_size_threshold = ...)
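The file selection implied by file_size_threshold might look roughly like this. The types and threshold handling are illustrative sketches, not the Hive connector's actual code: only files below the threshold are picked for rewriting.

```java
import java.util.List;
import java.util.stream.Collectors;

public class CompactionFileSelection
{
    // Illustrative stand-in for a file listed from the table's storage
    record DataFile(String path, long sizeInBytes) {}

    // Only files strictly below the threshold are rewritten; larger files are left alone
    public static List<DataFile> filesToCompact(List<DataFile> files, long fileSizeThresholdBytes)
    {
        return files.stream()
                .filter(file -> file.sizeInBytes() < fileSizeThresholdBytes)
                .collect(Collectors.toList());
    }

    public static void main(String[] args)
    {
        List<DataFile> files = List.of(
                new DataFile("part-0", 10_000),          // small, gets compacted
                new DataFile("part-1", 500_000_000L));   // large, left as is
        System.out.println(filesToCompact(files, 100_000_000L));
    }
}
```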
Force-pushed 8e53110 to 210b395
throwing is bad and not throwing isn't great either. this is irrecoverable by us and requires user intervention
i think throwing is a better idea than logging, as it at least ensures problem is surfaced to the person invoking the procedure (query)
Yeah - I very much would prefer to throw. Let me see. Maybe we can throw and still skip cleanup.
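The "throw and still skip cleanup" idea could be sketched as follows, with illustrative names only (not the actual operator code): mark cleanup as skipped before surfacing the failure, so the exception reaches the user while the files needed for manual recovery stay in place.

```java
// Illustrative sketch of finishing with a surfaced failure but skipped cleanup
public class FinishWithSkippedCleanup
{
    private boolean cleanupSkipped;

    public void finish(boolean irrecoverableFailure)
    {
        if (irrecoverableFailure) {
            // Flag first, so the exception still propagates to the caller
            // while the staged files are left for manual intervention
            cleanupSkipped = true;
            throw new IllegalStateException("OPTIMIZE failed; files left in place, manual intervention required");
        }
        // normal path would clean up staged files here
    }

    public boolean isCleanupSkipped()
    {
        return cleanupSkipped;
    }

    public static void main(String[] args)
    {
        FinishWithSkippedCleanup operation = new FinishWithSkippedCleanup();
        try {
            operation.finish(true);
        }
        catch (IllegalStateException expected) {
            System.out.println("cleanup skipped: " + operation.isCleanupSkipped());
        }
    }
}
```

This way the problem is surfaced to the person invoking the procedure, as argued above, without destroying the evidence needed to recover.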
PTAL now (last 2 commits)
Force-pushed cb30cf5 to 262ebf3
Replaced with: #9665
POC PR: High level review comments. No nit-picking at this phase please.
The PR adds support for the ALTER TABLE ... EXECUTE syntax.
On top of that, it adds support for compacting small files for non-transactional, non-bucketed Hive tables.
Fixes #9466