
Add support of creating iceberg table with existing data #14273

Closed
krvikash wants to merge 2 commits into trinodb:master from krvikash:iceberg-support-create-table-with-exiting-data

@krvikash
Contributor

Description

This draft change supports creating an iceberg table using existing metadata and data.

https://trino.io/docs/current/connector/delta-lake.html#creating-tables

TODO: Add test cases

Syntax:

CREATE TABLE iceberg.default.source (dummy bigint) WITH (location= 'hdfs://hadoop-master:9000/user/hive/warehouse/iceberg_src_1-d7f665f9ece54a99bd7046f89f754b28' );

The user has to provide a dummy column. That column is ignored, and the columns from the existing metadata are used to create the table.

Non-technical explanation

NA

Release notes

( ) This is not user-visible or docs only and no release notes are required.
(X) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Sep 23, 2022
@krvikash krvikash force-pushed the iceberg-support-create-table-with-exiting-data branch 2 times, most recently from 02fcfea to 5b2f382 Compare September 23, 2022 18:07
@krvikash krvikash changed the title Iceberg support create table with exiting data Add support of creating iceberg table with exiting data Sep 23, 2022
@krvikash krvikash force-pushed the iceberg-support-create-table-with-exiting-data branch from 5b2f382 to 9541ae7 Compare September 23, 2022 20:51
@krvikash
Contributor Author

There are a few test cases in trino-iceberg that create a table using "with location" where the location does not exist, so an empty table gets created. These test cases fail with this change.
E.g. io.trino.plugin.iceberg.BaseIcebergConnectorTest.testCleaningUpWithTableWithSpecifiedLocation

Current behaviour:

  1. If the location provided by the user does not exist --> An empty table gets created
  2. If the location provided by the user does exist --> An exception is thrown: "Cannot create a table on a non-empty location: %s, set 'iceberg.unique-table-location=true' in your Iceberg catalog properties to use unique table locations for every table."

What should the expected behaviour be for the above two points?
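
For reference, the two cases above can be sketched roughly as follows. This is only an illustration, not the connector's code: the class and method names are hypothetical, `java.nio.file` stands in for whatever file system abstraction the connector uses, and treating an existing but empty directory as usable is an assumption based on the wording of the error message.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical sketch of the current location check; not Trino's implementation.
final class LocationCheck
{
    private LocationCheck() {}

    // Returns normally when an empty table may be created at the location,
    // throws when the location already contains files.
    static void checkLocationIsUsable(Path location)
            throws IOException
    {
        // Case 1: the location does not exist, so an empty table can be created
        if (!Files.exists(location)) {
            return;
        }
        // Case 2: the location exists; reject it if it already holds any entry
        // (assumption: an existing but empty directory is accepted)
        try (Stream<Path> entries = Files.list(location)) {
            if (entries.findAny().isPresent()) {
                throw new IllegalStateException(String.format(
                        "Cannot create a table on a non-empty location: %s", location));
            }
        }
    }
}
```

With this change, case 2 would instead need to look for existing table metadata under the location before deciding whether to fail.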

@krvikash krvikash force-pushed the iceberg-support-create-table-with-exiting-data branch 2 times, most recently from a373515 to bdc8e86 Compare September 26, 2022 10:38
@krvikash krvikash changed the title Add support of creating iceberg table with exiting data Add support of creating iceberg table with existing data Sep 26, 2022
@krvikash krvikash force-pushed the iceberg-support-create-table-with-exiting-data branch from bdc8e86 to bf7995d Compare September 26, 2022 10:49
@krvikash krvikash force-pushed the iceberg-support-create-table-with-exiting-data branch from bf7995d to 7320e25 Compare September 26, 2022 13:36
Comment on lines +1519 to +1520
int version = parseVersion(fileEntry.path());
if (version > latestMetadataVersion) {
Member

There could be a tie here, for example if two writers tried updating the table at the same time. If we detect a tie for the highest version number, we should throw an exception.
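
A minimal sketch of what such a tie check could look like is below. It is not the code from this pull request: the class name, the regular expression, and the assumed `<version>-<uuid>.metadata.json` file naming are illustrative assumptions, and only `parseVersion` and the `version > latestMetadataVersion` comparison echo the reviewed lines.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; names and file naming convention are assumptions.
final class LatestMetadataFinder
{
    // Assumed naming: "<version>-<uuid>.metadata.json", e.g. "00002-abc.metadata.json"
    private static final Pattern METADATA_FILE =
            Pattern.compile("(?:^|/)(\\d+)-[^/]*\\.metadata\\.json$");

    private LatestMetadataFinder() {}

    static int parseVersion(String path)
    {
        Matcher matcher = METADATA_FILE.matcher(path);
        if (!matcher.find()) {
            throw new IllegalArgumentException("Not a metadata file: " + path);
        }
        return Integer.parseInt(matcher.group(1));
    }

    // Returns the path with the highest version, failing if that version is shared
    static String findLatest(List<String> paths)
    {
        int latestMetadataVersion = -1;
        String latestPath = null;
        boolean tie = false;
        for (String path : paths) {
            int version = parseVersion(path);
            if (version > latestMetadataVersion) {
                // New highest version: any earlier tie at a lower version is irrelevant
                latestMetadataVersion = version;
                latestPath = path;
                tie = false;
            }
            else if (version == latestMetadataVersion) {
                // A second file carries the current highest version
                tie = true;
            }
        }
        if (latestPath == null) {
            throw new IllegalStateException("No metadata files found");
        }
        if (tie) {
            throw new IllegalStateException(
                    "More than one metadata file has the latest version " + latestMetadataVersion);
        }
        return latestPath;
    }
}
```

Resetting the flag whenever a strictly higher version appears means that only a tie at the highest version causes a failure.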

Contributor Author

I will take care of this in the register_table procedure.

@krvikash
Contributor Author

Closing this pull request, as we have decided to use a register_table procedure to support table creation using existing data (please see #13552 (comment)). I will raise a new pull request adding register_table procedure support to Iceberg.

@krvikash krvikash closed this Sep 27, 2022
@krvikash
Contributor Author

Raised PR #14375 for the register_table procedure.
