-
Notifications
You must be signed in to change notification settings - Fork 3.6k
Add support for compacting small files for Hive tables #9398
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Changes from all commits
Commits
Show all changes
42 commits
Select commit
Hold shift + click to select a range
43977c1
Add SPI for table procedures
losipiuk 1d19c7f
fixup! Add SPI for table procedures
losipiuk 2a6619c
fixup! Add SPI for table procedures
losipiuk 6985dbd
fixup! Add SPI for table procedures
losipiuk 30f646b
Add TableProceduresRegistry to Metadata
losipiuk 1db5b7d
fixup! Add TableProceduresRegistry to Metadata
losipiuk f153fbf
fixup! Add TableProceduresRegistry to Metadata
losipiuk 9a5b851
Generalize AbstractPropertyManager
losipiuk f2bd0a1
Add TableProceduresPropertyManager
losipiuk bf7790f
Add support for table procedures SPI calls to Metadata
losipiuk 14ed0ab
fixup! Add support for table procedures SPI calls to Metadata
losipiuk fd37a73
Add parser/analyzer support for ALTER TABLE ... EXECUTE
losipiuk 76ab884
fixup! Add parser/analyzer support for ALTER TABLE ... EXECUTE
losipiuk d08e204
fixup! Add parser/analyzer support for ALTER TABLE ... EXECUTE
losipiuk b83447f
fixup! Add parser/analyzer support for ALTER TABLE ... EXECUTE
losipiuk db68344
fixup! Add parser/analyzer support for ALTER TABLE ... EXECUTE
losipiuk 485f24d
Add access control for table procedures
losipiuk f407be1
fixup! Add access control for table procedures
losipiuk 9597bb2
Define QueryExecution for TableExecute statement
losipiuk 1074941
Add planner logic for TableExecute statement
losipiuk 1a09c45
fixup! Add planner logic for TableExecute statement
losipiuk 63b98da
Do not output rows count from ALTER TABLE EXECUTE
losipiuk 3d63c92
Pass splits info to TableFinish operator for ALTER TABLE EXECUTE
losipiuk 11d30ca
Prefer toImmutableList in HiveMetadata
losipiuk 898fa8c
Add Hive OPTIMIZE table procedure
losipiuk f56191c
fixup! Add TableProceduresRegistry to Metadata
losipiuk 77e91a3
fixup! Add SPI for table procedures
losipiuk 5138fff
fixup! Add parser/analyzer support for ALTER TABLE ... EXECUTE
losipiuk 3f1586a
fixup! Add planner logic for TableExecute statement
losipiuk e77c2bc
fixup! Add TableProceduresPropertyManager
losipiuk 62aa891
fixup! Add planner logic for TableExecute statement
losipiuk 04ad808
fixup! Pass splits info to TableFinish operator for ALTER TABLE EXECUTE
losipiuk d14457c
fixup! Pass splits info to TableFinish operator for ALTER TABLE EXECUTE
losipiuk b7c9b9b
fixup! Pass splits info to TableFinish operator for ALTER TABLE EXECUTE
losipiuk 2724556
fixup! Pass splits info to TableFinish operator for ALTER TABLE EXECUTE
losipiuk 21bc91c
fixup! Add Hive OPTIMIZE table procedure
losipiuk 012bb1a
fixup! Add SPI for table procedures
losipiuk a047359
Add tableExecuteSplitsInfo to FixedSplitSource
losipiuk 210b395
Allow filtering on partitions for Hive OPTIMIZE table procedure
losipiuk 5e43e3e
fixup! Add parser/analyzer support for ALTER TABLE ... EXECUTE
losipiuk 96d211f
Allow dropping declared write intentions in SemiTransactionalHiveMeta…
losipiuk 262ebf3
Add logic to limit chance for data loss in Hive OPTIMIZE table procedure
losipiuk File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
45 changes: 45 additions & 0 deletions
45
core/trino-main/src/main/java/io/trino/execution/TableExecuteContext.java
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,45 @@ | ||
| /* | ||
| * Licensed under the Apache License, Version 2.0 (the "License"); | ||
| * you may not use this file except in compliance with the License. | ||
| * You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, software | ||
| * distributed under the License is distributed on an "AS IS" BASIS, | ||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| * See the License for the specific language governing permissions and | ||
| * limitations under the License. | ||
| */ | ||
| package io.trino.execution; | ||
|
|
||
| import com.google.common.collect.ImmutableList; | ||
|
|
||
| import javax.annotation.concurrent.GuardedBy; | ||
|
|
||
| import java.util.List; | ||
|
|
||
| import static java.util.Objects.requireNonNull; | ||
|
|
||
| public class TableExecuteContext | ||
| { | ||
| @GuardedBy("this") | ||
| private List<Object> splitsInfo; | ||
|
|
||
| public synchronized void setSplitsInfo(List<Object> splitsInfo) | ||
| { | ||
| requireNonNull(splitsInfo, "splitsInfo is null"); | ||
| if (this.splitsInfo != null) { | ||
| throw new IllegalStateException("splitsInfo already set to " + this.splitsInfo); | ||
| } | ||
| this.splitsInfo = ImmutableList.copyOf(splitsInfo); | ||
| } | ||
|
|
||
| public synchronized List<Object> getSplitsInfo() | ||
| { | ||
| if (splitsInfo == null) { | ||
| throw new IllegalStateException("splitsInfo not set yet"); | ||
| } | ||
| return splitsInfo; | ||
| } | ||
| } |
49 changes: 49 additions & 0 deletions
49
core/trino-main/src/main/java/io/trino/execution/TableExecuteContextManager.java
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,49 @@ | ||
| /* | ||
| * Licensed under the Apache License, Version 2.0 (the "License"); | ||
| * you may not use this file except in compliance with the License. | ||
| * You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, software | ||
| * distributed under the License is distributed on an "AS IS" BASIS, | ||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| * See the License for the specific language governing permissions and | ||
| * limitations under the License. | ||
| */ | ||
| package io.trino.execution; | ||
|
|
||
| import io.trino.spi.QueryId; | ||
|
|
||
| import javax.annotation.concurrent.ThreadSafe; | ||
|
|
||
| import java.util.concurrent.ConcurrentHashMap; | ||
| import java.util.concurrent.ConcurrentMap; | ||
|
|
||
| @ThreadSafe | ||
| public class TableExecuteContextManager | ||
| { | ||
| private final ConcurrentMap<QueryId, TableExecuteContext> contexts = new ConcurrentHashMap<>(); | ||
|
|
||
| public void registerTableExecuteContextForQuery(QueryId queryId) | ||
| { | ||
| TableExecuteContext newContext = new TableExecuteContext(); | ||
| if (contexts.putIfAbsent(queryId, newContext) != null) { | ||
| throw new IllegalStateException("TableExecuteContext already registered for query " + queryId); | ||
| } | ||
| } | ||
|
|
||
| public void unregisterTableExecuteContextForQuery(QueryId queryId) | ||
| { | ||
| contexts.remove(queryId); | ||
| } | ||
|
|
||
| public TableExecuteContext getTableExecuteContextForQuery(QueryId queryId) | ||
| { | ||
| TableExecuteContext context = contexts.get(queryId); | ||
| if (context == null) { | ||
| throw new IllegalStateException("TableExecuteContext not registered for query " + queryId); | ||
| } | ||
| return context; | ||
| } | ||
| } |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a decision to be made whether table procedures should be statically or dynamically resolved.
for example, what should be the behavior when a procedure is applicable to certain relation types (eg only tables, or only certain type of tables)?
exposing it as if it existed might feel better and then throw at execute time "this is not applicable", but one could say the same about
$buckethidden column, or between-connectors user expectations (users may not appreciate the fact that hive supports different set of procedure names than iceberg connector).Also, it might be that we want to overload the name -- expose same procedure name, but with different properties or execution mode, or internal implementation. For example "compact table" is logically applicable to transactional and non-transactional tables, but is different operation internally.
Thus, i would suggest we don't register them here, and move the procedure name resolution from Connector into ConnectorMetadata
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had an offline converstation about this one. Let's leave it static for now.
tl;dr is:
compactfor Hive transactional table and for non-transactional table can do very different things). Yet we can achieve the same (similar) thing by doing conditional logic within the procedure implementation itself. Actually handling different table types within the procedure can be better, as we may construct nicer (non-generic) exception messages if the procedure is misused. Hence have better UX.