Add table redirection support to SPI and Hive's redirection to Iceberg#5160
Add table redirection support to SPI and Hive's redirection to Iceberg#5160lxynov wants to merge 4 commits intotrinodb:masterfrom
Conversation
presto-main/src/main/java/io/prestosql/execution/CommitTask.java
Outdated
Show resolved
Hide resolved
644c0f2 to
a5cf1a0
Compare
presto-main/src/main/java/io/prestosql/metadata/MetadataManager.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
@electrum, do we want to include the redirection details here? If so, maybe we should make a helper, or sub class for table redirection.
There was a problem hiding this comment.
It looks like redirection is always taking this form. Maybe we should just warn in the MetadataManager?
Thinking more about this, if we did the warn in the MetadataManager then we might not need to know that a redirection occurred to the method could return the original name if no redirection happens?
There was a problem hiding this comment.
I firstly tried to include redirection details but later found that the DefaultWarningCollector stores warnings in a map keyed on WarningCodes, so a redirection warning's message can be overwritten by another one. So we'll need to implement another WarningCollector if we want to include details here. If we don't want to do so, it makes sense to move the warning to MetadataManager. @dain @electrum I'd leave it to you to make a call.
There was a problem hiding this comment.
Interesting. Let's leave this alone for now.
presto-main/src/main/java/io/prestosql/execution/AddColumnTask.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/io/prestosql/metadata/MetadataManager.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/io/prestosql/execution/CommentTask.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/io/prestosql/execution/CreateTableTask.java
Outdated
Show resolved
Hide resolved
electrum
left a comment
There was a problem hiding this comment.
The warning calls everywhere makes the code really cluttered. They need to go inside MetadataManager.
Another issue is how to handle metadata operations like DESCRIBE, information_schema, etc., and how those relate to the low level ConnectorMetadata calls.
This PR is good for review purposes of the SPI, but before we merge it, we need to add another commit with the Iceberg connector changes, and full test coverage of all the affected areas (i.e. make sure all the DDL commands are tested with redirection). We may need to introduce redirection into another connector like Memory for testing purposes (e.g., we could redirect from one memory catalog to another).
|
We can introduce a replacement for Stream<ListColumnsResult> listColumns(ConnectorSession session, SchemaTablePrefix prefix);
class ListColumnsResult
{
SchemaTableName tableName;
Optional<CatalogSchemaTableName> redirection;
Optional<List<ColumnMetadata>> columns;
}The The engine guarantees that the |
|
We need a similar change for |
a5cf1a0 to
54aaaec
Compare
89ab49b to
a4aaaf5
Compare
electrum
left a comment
There was a problem hiding this comment.
The engine changes look good. I'm still reviewing the Iceberg commit
presto-main/src/main/java/io/prestosql/execution/AddColumnTask.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/io/prestosql/metadata/MetadataManager.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Add error code and include original name
throw new PrestoException(TABLE_REDIRECTION_LIMIT, format("Too many redirections (%s) for table: %s", MAX_TABLE_REDIRECTIONS, originalTableName));There was a problem hiding this comment.
Same comments as for redirectTable()
There was a problem hiding this comment.
We might be able to add a helper method in this class
QualifiedObjectName targetTable = getRedirectedTable(insert, insert.getTarget());
private QualifiedObjectName getRedirectedTable(Node node, QualifiedName name) { ... }There was a problem hiding this comment.
Hmm this helper method won't help reduce the amount of code so I didn't use it in the updated PR. Please let me know if you think it's still better to use it.
presto-spi/src/main/java/io/prestosql/spi/connector/ConnectorMetadata.java
Outdated
Show resolved
Hide resolved
presto-spi/src/main/java/io/prestosql/spi/connector/ConnectorMetadata.java
Outdated
Show resolved
Hide resolved
presto-spi/src/main/java/io/prestosql/spi/connector/StandardWarningCode.java
Outdated
Show resolved
Hide resolved
presto-spi/src/main/java/io/prestosql/spi/connector/ConnectorMetadata.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Maybe include the whole cycle?
Otherwise with a redirect like A -> B -> C -> D -> E -> D it will be hard to identity D and E loop
presto-main/src/main/java/io/prestosql/metadata/MetadataManager.java
Outdated
Show resolved
Hide resolved
presto-spi/src/main/java/io/prestosql/spi/connector/ConnectorMetadata.java
Outdated
Show resolved
Hide resolved
presto-hive/src/main/java/io/prestosql/plugin/hive/HiveMetadata.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
What's the point in returning map entries with Optional.empty?
Is not returning them equally correct?
the existing code doesn't seem to use those empty entries, so if there is a future intent, it requires documenting (eg via @apiNote)
There was a problem hiding this comment.
Optional.empty means the table is redirected. Since it may be in another catalog, MetadataManager::listTablePrivileges should redirect it firstly and then call ConnectorMetadata::listTablePrivileges
There was a problem hiding this comment.
Optional.empty means the table is redirected.
This i understand. But is it for? I didn't find any usage of those empty values, they are just filtered out . Should i search again?
There was a problem hiding this comment.
Hmm so in MetadataManager::listTablePrivileges:
public List<GrantInfo> listTablePrivileges(Session session, QualifiedTablePrefix prefix)
{
ImmutableList.Builder<GrantInfo> grantInfos = ImmutableList.builder();
ImmutableList.Builder<QualifiedObjectName> redirectedTables = ImmutableList.builder();
listTablePrivilegesWithRedirection(session, prefix).forEach((name, grantInfo) -> {
if (grantInfo.isPresent()) {
grantInfos.addAll(grantInfo.get());
}
else {
redirectedTables.add(new QualifiedObjectName(
prefix.getCatalogName(),
name.getSchemaName(),
name.getTableName()));
}
});
for (QualifiedObjectName redirectedTable : redirectedTables.build()) {
QualifiedObjectName finalTable = redirectTable(session, redirectedTable, WarningCollector.NOOP);
listTablePrivilegesWithRedirection(session, finalTable.asQualifiedTablePrefix()).values()
.forEach(tableGrantInfos -> tableGrantInfos.ifPresent(grantInfos::addAll));
}
return grantInfos.build();
}When the value is present, grant infos are added to grantInfos directly; otherwise the key is put into redirectedTables list. And for redirected tables, listTablePrivilegesWithRedirection is called again.
presto-hive/src/main/java/io/prestosql/plugin/hive/HiveMetadata.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
It seems the implementation of listTablePrivilegesWithRedirection method could be simplified.
Before suggesting that i have a question on the listTablePrivilegesWithRedirection return type. Added there.
There was a problem hiding this comment.
Why the change here?
Is listTableColumns deprecated and should not be called?
Or, we should test both: listTableColumns and listTableColumnsWithRedirection here?
(this comment applies to other test places as well)
There was a problem hiding this comment.
listTableColumns is removed from HiveMetadata
a4aaaf5 to
afce887
Compare
54159c8 to
e2d6896
Compare
|
@electrum @findepi Thanks for the reviews! I've updated the PR. The failing check is unrelated. We can look into cancelled checks later. @findepi I changed the info schema related SPI methods to return |
|
CI failed due to #5892 but not only |
...o-main/src/main/java/io/prestosql/connector/informationschema/InformationSchemaMetadata.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
I have a general-level concern that the requirement to call metadata.redirectTable before calling a any other method on metadata can be error prone in practice, as it seems to easy not to call the redirectTable and overlook the missing call also in the review process. I am sure you have thought about this very deeply already... if you happen to have any notes about alternatives considered, i would be curious to learn more.
There was a problem hiding this comment.
This concern does make sense. I had a similar feeling when I was implementing this PR. However, the table redirection approach seems to be an appropriate one to address the hive-iceberg migration issue. A few other approaches that have been discussed:
- Federation connector: Introduce a new connector which federates between Hive and Iceberg Connectors.
- Search path support
cc @electrum
There was a problem hiding this comment.
Thanks.
But also if we settle up on redirect, we could try to model the API so that there is a help from type system to make sure redirection is considered.
Like we have in functions -- function name, resolved function, etc.
There was a problem hiding this comment.
include "RenameRedirection" (or at least "redirect") in the toString.
Also consider replacing , with =>
There was a problem hiding this comment.
Included "Rename" as the pair reflects a rename relationship
There was a problem hiding this comment.
Add a check in getTableHandle and getTableHandleForStatisticsCollection like
verify(redirectTable(...).isEmpty(), "Table access should have been redirected for %s", tableName);
This will make sure (or improve probability that) if we miss to add metadata.redirectTable somewhere (in existing or new code), this will not lead to incorrect results or data corruption.
There was a problem hiding this comment.
Good idea. Added in them as well as some other methods.
presto-spi/src/main/java/io/prestosql/spi/connector/ConnectorMetadata.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Per general style guidelines, the .filter and .map above should be preceded with a new line.
I'd recommend to do this as a follow up though, not to incur bigger diff within this PR.
There was a problem hiding this comment.
Does this if buy as much?
regardless of the case, we call listTables(session, prefix).
then, in redirect case we call redirectTable on each item of the list, which is a very cheap operation if redirect is not enabled.
Thus, you could safely handle both cases with single code path.
There was a problem hiding this comment.
I don't understand clearly... Can you share a code snippet?
There was a problem hiding this comment.
@Override
public Stream<ListTablePrivilegesResult> listTablePrivilegesStream(ConnectorSession session, SchemaTablePrefix prefix)
{
ImmutableList.Builder<SchemaTableName> redirectedTables = ImmutableList.builder();
ImmutableList.Builder<SchemaTableName> notRedirectedTables = ImmutableList.builder();
listTables(session, prefix).forEach(table -> {
if (redirectTable(session, table).isPresent()) {
redirectedTables.add(table);
}
else {
notRedirectedTables.add(table);
}
});
return Stream.concat(
accessControlMetadata.listTablePrivileges(session, notRedirectedTables.build()).stream()
.map(grantInfo -> new ListTablePrivilegesResult(grantInfo.getSchemaTableName(), Optional.of(grantInfo))),
redirectedTables.build().stream()
.map(table -> new ListTablePrivilegesResult(table, Optional.empty())));
}There was a problem hiding this comment.
Are all these needed? i guess not, so please remove them, so the relevant configuration is easier to fish out
There was a problem hiding this comment.
Removed irrelevant configs.
There was a problem hiding this comment.
The Iceberg Connector is needed to test the redirection.
There was a problem hiding this comment.
Surely. I was commenting on the last line of the config only (iceberg.file-format=PARQUET)
There was a problem hiding this comment.
❤️ for a product test!
Did you create or plan to create a QueryRunner-based one too?
That wouldn't make the PT obsolete, but could be helpful for fine-grained testing or do debug things, should anything ever malfunction.
There was a problem hiding this comment.
Didn't create a QueryRunner-based one because we'll need to install both the Iceberg and Hive plugins in one query runner.
There was a problem hiding this comment.
Can you use both from presto-tests?
e2d6896 to
8d0014a
Compare
There was a problem hiding this comment.
make it final and perhaps @Immutable
29579c0 to
cedfa26
Compare
cedfa26 to
24c05fd
Compare
When enabled, Hive Connector would redirect a table access to an Iceberg catalog when the table is an Iceberg table.
24c05fd to
73147dc
Compare
phd3
left a comment
There was a problem hiding this comment.
preparatory commits look good. Reviewing main commits.
"Allow multiple warnings with the same warning code"
"Add WarningCollector to data definition task interface"
phd3
left a comment
There was a problem hiding this comment.
Reviewed commit Add table redirection methods to SPI.
I've the same high level concern as #5160 (comment). Did we have any further discussions or consensus on it?
| StringBuilder sb = new StringBuilder(); | ||
| visitedTableNames.forEach(name -> { | ||
| sb.append(name); | ||
| sb.append(" -> "); | ||
| }); | ||
| throw new TrinoException(TABLE_REDIRECTION_LIMIT, format("Too many table redirections (%d): %s", MAX_TABLE_REDIRECTIONS, sb.substring(0, sb.length() - 4))); |
There was a problem hiding this comment.
| StringBuilder sb = new StringBuilder(); | |
| visitedTableNames.forEach(name -> { | |
| sb.append(name); | |
| sb.append(" -> "); | |
| }); | |
| throw new TrinoException(TABLE_REDIRECTION_LIMIT, format("Too many table redirections (%d): %s", MAX_TABLE_REDIRECTIONS, sb.substring(0, sb.length() - 4))); | |
| String redirections = visitedTableNames.stream() | |
| .map(QualifiedObjectName::toString) | |
| .collect(Collectors.joining(" -> ")); | |
| throw new TrinoException(TABLE_REDIRECTION_LIMIT, format("Too many table redirections (%d): %s", MAX_TABLE_REDIRECTIONS, redirections)); |
| Identifier schemaName = (parts.size() > 1) ? parts.get(1) : new Identifier(objectName.getSchemaName()); | ||
| Identifier catalogName = (parts.size() > 2) ? parts.get(2) : new Identifier(objectName.getCatalogName()); | ||
|
|
There was a problem hiding this comment.
say c1.s1.t1 is redirected to c2.s2.t2. Then it seems that SHOW CREATE for view will return CREATE VIEW c1.s1.t1 .... but for table, it'll return CREATE TABLE c2.s2.t2. is that right? we may want to keep it consistent.
| sb.append(name); | ||
| sb.append(" -> "); | ||
| }); | ||
| throw new TrinoException(TABLE_REDIRECTION_LIMIT, format("Too many table rename redirections (%d): %s", MAX_TABLE_REDIRECTIONS, sb.substring(0, sb.length() - 4))); |
There was a problem hiding this comment.
same suggested change using Collectors.joining
|
|
||
| /** | ||
| * List the table privileges granted to the specified grantee for the tables that have the specified prefix considering the selected session role. | ||
| * For redirected tables, returns a {@link GetViewsResult} with an Optional.empty() field value in the stream. |
| if (isExistingRelation(session, objectName.get())) { | ||
| return ImmutableList.of(objectName.get()); | ||
| // Table cannot exist | ||
| if (objectName.get().getCatalogName().isEmpty() || objectName.get().getSchemaName().isEmpty() || objectName.get().getObjectName().isEmpty()) { |
There was a problem hiding this comment.
could you explain why we need this?
| // TODO: Add a functional warning collector | ||
| QualifiedObjectName tableName = redirectTable(session, objectName.get(), WarningCollector.NOOP); | ||
| if (isExistingRelation(session, tableName)) { | ||
| return ImmutableList.of(tableName); |
There was a problem hiding this comment.
this can cause some possible confusing behavior from SQL perspective, if A.T1 redirects to B.T2. May be just return original name?
> select table_schema, table_name from C.information_schema.tables where table_catalog ='C' and table_schema = 'A' and table_name='T1';
table_schema | table_name
---------------+---------------
B | T2
| @@ -1202,7 +1202,8 @@ protected Scope visitTable(Table table, Optional<Scope> scope) | |||
| // If materialized view is current, answer the query using the storage table | |||
| Optional<QualifiedName> storageName = getMaterializedViewStorageTableName(name); | |||
There was a problem hiding this comment.
nit: add a comment on this method saying that the caller should pass in the redirected table name
| * system tables, hidden columns, type support, etc., too. | ||
| * </li> | ||
| */ | ||
| default Optional<CatalogSchemaTableName> redirectTable(ConnectorSession session, SchemaTableName tableName) |
There was a problem hiding this comment.
may be we should also document applyTableScanRedirect to clarify the difference, but this can be done as followup.
| StringBuilder sb = new StringBuilder(); | ||
| visitedTableNames.forEach(name -> { | ||
| sb.append(name); | ||
| sb.append(" -> "); | ||
| }); | ||
| sb.append(tableName); | ||
| throw new TrinoException(TABLE_REDIRECTION_LOOP, "Table redirections form a loop: " + sb.toString()); |
There was a problem hiding this comment.
| StringBuilder sb = new StringBuilder(); | |
| visitedTableNames.forEach(name -> { | |
| sb.append(name); | |
| sb.append(" -> "); | |
| }); | |
| sb.append(tableName); | |
| throw new TrinoException(TABLE_REDIRECTION_LOOP, "Table redirections form a loop: " + sb.toString()); | |
| String redirectionChain = new StringBuilder() | |
| .append(visitedTableNames.stream() | |
| .map(QualifiedObjectName::toString) | |
| .collect(Collectors.joining(" -> "))) | |
| .append(" -> ") | |
| .append(tableName) | |
| .toString(); | |
| throw new TrinoException(TABLE_REDIRECTION_LOOP, "Table redirections form a loop: " + redirectionChain); |
(similarly in other method)
| * Redirects the target table or view name of a rename operation to another one which may or may not be in the same catalog. | ||
| * <p> | ||
| * This method is called before the engine tries to rename a table or view. Connectors are expected to return {@link Optional#empty()} if | ||
| * the redirection doesn't happen. The engine calls this method and {@link #redirectTable} to redirect both source and target tables in an |
There was a problem hiding this comment.
may be change the order here, since that's what engine's method is doing?
There was a problem hiding this comment.
Make this a Javadoc comment on the getter
There was a problem hiding this comment.
Make this a Javadoc comment on the getter
There was a problem hiding this comment.
Make all these result classes final
There was a problem hiding this comment.
Nit: wrap stream operations on separate line
return listTableColumns(session, prefix).entrySet().stream()
.map(entry -> new ListTableColumnsResult(entry.getKey(), Optional.of(entry.getValue())));Same for others
There was a problem hiding this comment.
Make this a Javadoc comment on the getter
There was a problem hiding this comment.
I think this can use join from String:
String redirections = join(" -> ", visitedTableNames);There was a problem hiding this comment.
Since we use a Set for visitedTableNames, it won't show the full chain, e.g., a -> b -> a. We could have a separate List<String> chain that we use for the error message:
chain.add(tableName);
if (!visitedTableNames.add(tableName)) {
break;
}There was a problem hiding this comment.
Was using forEach not possible?
There was a problem hiding this comment.
Let's introduce a method
verifyNotRedirected(session, tableName);|
Superseded by #7606 |
Add table redirection support to SPI
Added these methods to
ConnectorMetadata. Please review the Javadoc comments for intended engine behaviors.Add support to redirect from Hive to an Iceberg catalog
To enable this, we can either set Hive configs
and keep session properties as default, or change session properties to
Issue: #4442
Umbrella issue: #1324