Add register_table procedure support for delta table #14779

Merged
ebyhr merged 1 commit into trinodb:master from krvikash:delta-register_table-support
Nov 24, 2022

Conversation

@krvikash
Contributor

@krvikash krvikash commented Oct 26, 2022

Description

Fixes #13568

  1. Procedure call: delta.system.register_table(schema_name => 'testdb', table_name => 'table1', table_location => 's3://my-bucket/a/path/')
  2. By default, CREATE TABLE with (location = '***') no longer allows registering a table at an existing location.
  3. To keep registering tables through the CREATE TABLE statement, enable the delta.create-table-with-existing-location.enabled config property or the create_table_with_existing_location_enabled session property. This support will be removed permanently in a future release.
  4. The register_table procedure is disabled by default. Enable it with the delta.allow-register-table-procedure config property.
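As a rough illustration of the statements described in the list above, here is a minimal Python sketch that renders them as SQL strings. The helper names (`sql_literal`, `register_table_call`, `enable_legacy_create_table`) are hypothetical and not part of Trino or this PR. The catalog name `delta` and the session property name are taken from the description as written and may differ in the merged change.

```python
def sql_literal(value: str) -> str:
    """Render a Python string as a SQL string literal, doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def register_table_call(schema_name: str, table_name: str, table_location: str,
                        catalog: str = "delta") -> str:
    """Build the CALL statement with the named arguments from the description.

    Assumption: the Delta Lake catalog is named `delta`.
    """
    args = {
        "schema_name": schema_name,
        "table_name": table_name,
        "table_location": table_location,
    }
    rendered = ", ".join(f"{name} => {sql_literal(value)}" for name, value in args.items())
    return f"CALL {catalog}.system.register_table({rendered})"


def enable_legacy_create_table(
        catalog: str = "delta",
        prop: str = "create_table_with_existing_location_enabled") -> str:
    """Build a SET SESSION statement for the legacy CREATE TABLE behavior.

    Assumption: the property name is the one quoted in the description.
    """
    return f"SET SESSION {catalog}.{prop} = true"


if __name__ == "__main__":
    print(register_table_call("testdb", "table1", "s3://my-bucket/a/path/"))
    print(enable_legacy_create_table())
```

The procedure itself still has to be enabled in the catalog configuration before the generated CALL statement will be accepted.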

Non-technical explanation

NA

Release notes

( ) This is not user-visible or docs only and no release notes are required.
(X) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Delta Lake
* Allow registering existing table files in the metastore with the new
  [`register_table` procedure](delta-lake-register-table). ({issue}`13568`)
* Deprecate creating a new table with existing table content.
  This can be enabled using the `delta.legacy-create-table-with-existing-location.enabled`
  config property or `legacy_create_table_with_existing_location_enabled` session property. ({issue}`13568`)

@cla-bot cla-bot bot added the cla-signed label Oct 26, 2022
@krvikash krvikash force-pushed the delta-register_table-support branch 2 times, most recently from 05c7aaa to 5df0865 Compare October 27, 2022 10:20
@krvikash krvikash self-assigned this Oct 27, 2022
@krvikash krvikash force-pushed the delta-register_table-support branch 6 times, most recently from ce19fb2 to 81456e5 Compare October 28, 2022 21:09
@krvikash krvikash marked this pull request as ready for review October 28, 2022 21:39
@krvikash krvikash force-pushed the delta-register_table-support branch 8 times, most recently from ae2fba1 to 68cce18 Compare October 31, 2022 17:44
Member

@alexjo2144 alexjo2144 left a comment

Do you have a test that shows the legacy syntax can be re-enabled?

Member

You could also just create the table using the Hive or Iceberg connector, if you have one of those catalogs configured for the test:

CREATE TABLE hive.schema.table_name ...

Contributor Author

Just to confirm that I understand correctly: can we create a table using the Hive or Iceberg connector to verify the Delta test cases where the Delta logs are missing?

@krvikash
Contributor Author

krvikash commented Nov 1, 2022

> Do you have a test that shows the legacy syntax can be re-enabled?

No, not yet. I will add a test case for this.

@krvikash krvikash force-pushed the delta-register_table-support branch 3 times, most recently from d99b7d9 to bb666ee Compare November 2, 2022 05:57
@krvikash krvikash force-pushed the delta-register_table-support branch from c9bcd02 to fb11ca7 Compare November 14, 2022 11:15
@krvikash krvikash force-pushed the delta-register_table-support branch 2 times, most recently from c11fbc7 to 654cc96 Compare November 14, 2022 22:34
@krvikash
Contributor Author

rebased and addressed comments.

@krvikash krvikash force-pushed the delta-register_table-support branch from 654cc96 to 91db0bb Compare November 14, 2022 23:03
@krvikash krvikash force-pushed the delta-register_table-support branch 2 times, most recently from 8e39889 to b7aa101 Compare November 15, 2022 07:42
@krvikash krvikash force-pushed the delta-register_table-support branch from b7aa101 to 9f60dad Compare November 16, 2022 08:43
@krvikash krvikash force-pushed the delta-register_table-support branch 2 times, most recently from c4ad97c to 55bdd02 Compare November 17, 2022 10:22
@krvikash
Contributor Author

rebased and added BaseDeltaLakeConnectorSmokeTest#testRegisterTableWithTrailingSlashLocation

@krvikash krvikash force-pushed the delta-register_table-support branch from 55bdd02 to ba3308a Compare November 17, 2022 21:08
@findinpath
Contributor

LGTM % some minor test related improvements.

@ebyhr
Member

ebyhr commented Nov 24, 2022

CI hit #15173

@hangc0276

I registered a Delta table at an existing path while new data kept being written to that path. When I queried the table with SQL, the new data did not show up.

When I registered a new table with the same path, the new data could be queried.

Is this the expected behavior? @ebyhr @krvikash

@findinpath
Contributor

@hangc0276 please describe your scenario in a new GitHub issue; your question is not related to this PR.

To be taken into account when writing the new issue:

> ... and the new data kept writing to the path

Were the writes to the newly registered table happening through INSERT / UPDATE / MERGE operations via Trino/Spark, or rather at the file level?

Development

Successfully merging this pull request may close these issues.

Distinguish creation of new table and registering existing table operations in Delta Lake connector