Iceberg files distributed #4840

Closed

Parth-Brahmbhatt wants to merge 2 commits into trinodb:master from Parth-Brahmbhatt:presto-iceberg-files-distributed


Conversation

@Parth-Brahmbhatt (Member)

No description provided.

@cla-bot cla-bot bot added the cla-signed label Aug 14, 2020
@Parth-Brahmbhatt Parth-Brahmbhatt marked this pull request as draft August 14, 2020 21:43
@Parth-Brahmbhatt Parth-Brahmbhatt marked this pull request as ready for review August 18, 2020 21:51
@Parth-Brahmbhatt (Member, Author) commented:

@electrum when you get a chance, can you please review this?

@electrum (Member) left a comment:

Still reviewing. I'm wondering if there is a way we can reuse more of the Iceberg library code for system tables so that we don't have to reimplement everything.

public FilesTable(Schema schema, TypeManager typeManager)
{
return new FixedPageSource(buildPages(tableMetadata, session, icebergTable, snapshotId));
ImmutableList.Builder<IcebergColumnHandle> columnHandleBuilder = new ImmutableList.Builder<>();
@electrum (Member): This can chain:

ImmutableList.Builder<IcebergColumnHandle> columnHandleBuilder = ImmutableList.builder()
        .add(FILE_PATH)
        .add(FILE_FORMAT)
        ...
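The chained-builder style suggested above can be sketched without Guava: the JDK's Stream.Builder also returns the builder from add(), so the calls collapse into one expression. The column names below are illustrative only, not the PR's actual constants.

```java
import java.util.List;
import java.util.stream.Stream;

public class ChainedBuilderDemo
{
    public static List<String> buildColumns()
    {
        // Each add() returns the builder, so the calls chain into one expression,
        // mirroring Guava's ImmutableList.builder().add(...).add(...).build()
        return Stream.<String>builder()
                .add("file_path")
                .add("file_format")
                .add("record_count")
                .build()
                .toList();
    }

    public static void main(String[] args)
    {
        System.out.println(buildColumns());
    }
}
```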

columnHandleBuilder.add(KEY_METADATA);
columnHandleBuilder.add(SPLIT_OFFSETS);

List<Field> fields = Lists.newArrayList(
@electrum (Member): Use ImmutableList.of()
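The point of the suggestion is that a fixed set of fields should be an immutable list, not a mutable ArrayList from Lists.newArrayList. A minimal sketch, using the JDK's List.of as a stand-in for Guava's ImmutableList.of so it stays self-contained:

```java
import java.util.List;

public class ImmutableListDemo
{
    public static boolean rejectsMutation()
    {
        // List.of, like ImmutableList.of, produces a fixed, unmodifiable list
        List<String> fields = List.of("value_counts", "null_value_counts");
        try {
            fields.add("extra");  // structural changes are refused
            return false;
        }
        catch (UnsupportedOperationException e) {
            return true;          // mutation rejected, as intended
        }
    }

    public static void main(String[] args)
    {
        System.out.println(rejectsMutation());
    }
}
```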

columnBuilder.build()
.forEach(column -> {
final Type type = toPrestoType(column.type(), typeManager);
List<Field> boundFields = Lists.newArrayList(
@electrum (Member): ImmutableList.of()


columnBuilder.build()
.forEach(column -> {
final Type type = toPrestoType(column.type(), typeManager);
@electrum (Member): We don't add final to local variables.

new Field(Optional.of(VALUE_COUNTS), BIGINT),
new Field(Optional.of(NULL_VALUE_COUNTS), BIGINT));

final ImmutableList.Builder<Types.NestedField> columnBuilder = new ImmutableList.Builder<>();
@electrum (Member): This could be done as a stream:

schema.columns().stream()
        .flatMap(column -> handleNestedType(column, Optional.empty()))
        .map(column -> {
            ...
            return new IcebergColumnHandle(...);
        })
        .forEach(columnHandleBuilder::add);
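A runnable sketch of that stream rewrite, with a toy Column record standing in for Iceberg's NestedField and plain strings standing in for IcebergColumnHandle (neither is reproduced here): flatMap expands each nested column, then map converts each leaf to a handle.

```java
import java.util.List;
import java.util.stream.Stream;

public class StreamFlattenDemo
{
    // Stand-in for Iceberg's Types.NestedField
    public record Column(String name, List<Column> children)
    {
        public static Column leaf(String name)
        {
            return new Column(name, List.of());
        }
    }

    // Recursively expand a column into itself plus all nested children
    static Stream<Column> flatten(Column column)
    {
        return Stream.concat(
                Stream.of(column),
                column.children().stream().flatMap(StreamFlattenDemo::flatten));
    }

    public static List<String> toHandles(List<Column> schema)
    {
        return schema.stream()
                .flatMap(StreamFlattenDemo::flatten)
                .map(column -> "handle:" + column.name())  // stand-in for new IcebergColumnHandle(...)
                .toList();
    }

    public static void main(String[] args)
    {
        Column partition = new Column("partition", List.of(Column.leaf("bucket")));
        System.out.println(toHandles(List.of(Column.leaf("file_path"), partition)));
    }
}
```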
        

{
SchemaTableName table = tableHandle.getSchemaTableName();
TableIdentifier tableIdentifier = tableHandle.toTableIdentifier();
if (MetadataTableType.from(tableIdentifier.name()) != null && tableIdentifier.namespace().levels().length == 2) {
@electrum (Member): Why do we need to check the namespace levels length? When would it not be 2?


public static Table getIcebergTable(HiveMetastore metastore, HdfsEnvironment hdfsEnvironment, ConnectorSession session, IcebergTableHandle tableHandle)
{
SchemaTableName table = tableHandle.getSchemaTableName();
@electrum (Member): This could move inside the if block.


private static Table loadMetadataTable(MetadataTableType type, SchemaTableName table, HiveMetastore metastore, HdfsEnvironment hdfsEnvironment, HdfsContext hdfsContext, HiveIdentity identity)
{
if (type != null) {
@electrum (Member): We don't need to check since this method isn't called with null. It's the caller's responsibility to do the check.
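The convention being described can be sketched as follows: the callee states its non-null precondition with requireNonNull and fails fast, instead of silently branching on null. The type and method names below are simplified stand-ins for the PR's MetadataTableType and loadMetadataTable, not the real signatures.

```java
import static java.util.Objects.requireNonNull;

public class PreconditionDemo
{
    public enum MetadataTableType { FILES, PARTITIONS }

    public static String loadMetadataTable(MetadataTableType type, String table)
    {
        // Fail fast on a violated precondition rather than wrapping
        // the whole body in "if (type != null)"
        requireNonNull(type, "type is null");
        requireNonNull(table, "table is null");
        return type + ":" + table;
    }

    public static void main(String[] args)
    {
        System.out.println(loadMetadataTable(MetadataTableType.FILES, "orders"));
    }
}
```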

if (type != null) {
TableOperations ops = new HiveTableOperations(metastore, hdfsEnvironment, hdfsContext, identity, table.getSchemaName(), table.getTableName());
if (ops.current() == null) {
throw new NoSuchTableException("Table does not exist: " + table);
@electrum (Member): This should throw Presto's TableNotFoundException.

case PARTITIONS:
return new PartitionsTable(ops, baseTable);
default:
throw new NoSuchTableException("Unknown metadata table type: %s for %s", type, table);
@electrum (Member): This is a bug, so do:

throw new VerifyException(format("Unknown metadata table type [%s] for table: %s", type, table));
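The pattern being asked for is an "unreachable default": when the switch covers every known case, the default branch signals a programming bug, not a user-facing "table not found". A minimal JDK-only sketch, using IllegalStateException in place of Guava's VerifyException and a simplified enum in place of Iceberg's MetadataTableType:

```java
public class UnreachableDefaultDemo
{
    public enum MetadataTableType { FILES, PARTITIONS }

    public static String describe(MetadataTableType type)
    {
        switch (type) {
            case FILES:
                return "files table";
            case PARTITIONS:
                return "partitions table";
            default:
                // Reaching here means a new enum value was added without a case:
                // an internal bug, not a missing table
                throw new IllegalStateException(
                        String.format("Unknown metadata table type [%s]", type));
        }
    }

    public static void main(String[] args)
    {
        System.out.println(describe(MetadataTableType.PARTITIONS));
    }
}
```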

@colebow (Member) commented Oct 19, 2022:

👋 @Parth-Brahmbhatt - this PR is inactive and doesn't seem to be under development. If you'd like to continue work on this at any point in the future, feel free to re-open.

@colebow colebow closed this Oct 19, 2022