@tomtongue
Contributor

Changes

Add the StorageDescriptor to the table that ALTER TABLE RENAME TO creates in the Glue Data Catalog when renaming an Iceberg table.

Current situation

Iceberg implements ALTER TABLE RENAME TO on Glue by creating a new table and then dropping the old one, because the Glue Data Catalog doesn't support renaming tables.

Currently, the StorageDescriptor is not copied to the new table when running ALTER TABLE RENAME TO with Iceberg, as the following example shows.

Example

Before the table is renamed:

➜  ~ aws glue get-table --database-name db --name iceberg_1636000905
{
    "Table": {
        "Name": "iceberg_1636000905",
        "DatabaseName": "db",
        "CreateTime": 1636000913.0,
        "UpdateTime": 1636000913.0,
        "Retention": 0,
        "StorageDescriptor": {
            "Columns": [
                {
                    "Name": "id",
                    "Type": "bigint",
                    "Parameters": {
                        "iceberg.field.id": "1",
                        "iceberg.field.optional": "true",
                        "iceberg.field.type.string": "bigint",
                        "iceberg.field.type.typeid": "LONG",
                        "iceberg.field.usage": "schema-column"
                    }
                },
                {
                    "Name": "data",
                    "Type": "string",
                    "Parameters": {
                        "iceberg.field.id": "2",
                        "iceberg.field.optional": "true",
                        "iceberg.field.type.string": "string",
                        "iceberg.field.type.typeid": "STRING",
                        "iceberg.field.usage": "schema-column"
                    }
                }
            ],
            "Location": "s3://.../iceberg_1636000905",
            // omitted ...
            "StoredAsSubDirectories": false
        },
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {
            "metadata_location": "s3://.../iceberg_1636000905/metadata/00000-18f06867-fa8b-44b3-b25d-424b88a3b683.metadata.json",
            "table_type": "ICEBERG"
        },
        "CreatedBy": "arn:aws:sts::account_id:role_name",
        "IsRegisteredWithLakeFormation": false,
        "CatalogId": "account_id",
        "IsRowFilteringEnabled": false
    }
}

After the table is renamed (the StorageDescriptor is missing):

➜  ~ aws glue get-table --database-name db --name iceberg_1636000905_rename
{
    "Table": {
        "Name": "iceberg_1636000905_rename",
        "DatabaseName": "db",
        "CreateTime": 1636001356.0,
        "UpdateTime": 1636001356.0,
        "Retention": 0,
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {
            "metadata_location": "s3://.../iceberg_1636000905/metadata/00000-18f06867-fa8b-44b3-b25d-424b88a3b683.metadata.json",
            "table_type": "ICEBERG"
        },
        "CreatedBy": "arn:aws:sts::account_id:role_name",
        "IsRegisteredWithLakeFormation": false,
        "CatalogId": "account_id",
        "IsRowFilteringEnabled": false
    }
}
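
To make the behavior above concrete, here is a minimal in-memory sketch of a rename done as create-then-drop. It is a toy model in Python, not the Iceberg or Glue API: `make_catalog`, `rename_table`, and the `copy_storage_descriptor` flag are hypothetical names used only for illustration, and the catalog is a plain dict trimmed to a few of the fields shown in the output above.

```python
import copy


def make_catalog():
    """Return a toy catalog holding one table, keyed by (database, name)."""
    return {
        ("db", "iceberg_1636000905"): {
            "Name": "iceberg_1636000905",
            "DatabaseName": "db",
            "StorageDescriptor": {
                "Columns": [
                    {"Name": "id", "Type": "bigint"},
                    {"Name": "data", "Type": "string"},
                ],
                "Location": "s3://.../iceberg_1636000905",
            },
            "TableType": "EXTERNAL_TABLE",
            "Parameters": {"table_type": "ICEBERG"},
        }
    }


def rename_table(catalog, database, old_name, new_name, copy_storage_descriptor):
    """Model a rename as 'create new table' followed by 'drop old table'."""
    old = catalog[(database, old_name)]
    # The new entry only carries what the rename logic explicitly copies.
    new = {
        "Name": new_name,
        "DatabaseName": database,
        "TableType": old["TableType"],
        "Parameters": copy.deepcopy(old["Parameters"]),
    }
    # Hypothetical switch: False models the current behavior, True the fix.
    if copy_storage_descriptor and "StorageDescriptor" in old:
        new["StorageDescriptor"] = copy.deepcopy(old["StorageDescriptor"])
    catalog[(database, new_name)] = new   # create the new table
    del catalog[(database, old_name)]     # drop the old table
    return new


without_copy = rename_table(
    make_catalog(), "db", "iceberg_1636000905", "iceberg_1636000905_rename",
    copy_storage_descriptor=False,
)
with_copy = rename_table(
    make_catalog(), "db", "iceberg_1636000905", "iceberg_1636000905_rename",
    copy_storage_descriptor=True,
)
print("StorageDescriptor" in without_copy)  # False
print("StorageDescriptor" in with_copy)     # True
```

Because the new entry is built from scratch, any field that is not copied explicitly is simply absent after the rename, which matches the missing StorageDescriptor in the output above.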

After this change

The Iceberg renameTable operation now fills in the StorageDescriptor of the new table.

Example

Before the table is renamed:

➜  ~ aws glue get-table --database-name db --name iceberg_1636005278
{
    "Table": {
        "Name": "iceberg_1636005278",
        "DatabaseName": "db",
        "CreateTime": 1636005285.0,
        "UpdateTime": 1636005285.0,
        "Retention": 0,
        "StorageDescriptor": {
            "Columns": [
                {
                    "Name": "id",
                    "Type": "bigint",
                    "Parameters": {
                        "iceberg.field.id": "1",
                        "iceberg.field.optional": "true",
                        "iceberg.field.type.string": "bigint",
                        "iceberg.field.type.typeid": "LONG",
                        "iceberg.field.usage": "schema-column"
                    }
                },
                {
                    "Name": "data",
                    "Type": "string",
                    "Parameters": {
                        "iceberg.field.id": "2",
                        "iceberg.field.optional": "true",
                        "iceberg.field.type.string": "string",
                        "iceberg.field.type.typeid": "STRING",
                        "iceberg.field.usage": "schema-column"
                    }
                }
            ],
            "Location": "s3://.../iceberg_1636005278",
            "Compressed": false,
            "NumberOfBuckets": 0,
            "SortColumns": [],
            "StoredAsSubDirectories": false
        },
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {
            "metadata_location": "s3://.../iceberg_1636005278/metadata/00000-1b5f7c9b-01ec-42a3-88da-f62227e74e74.metadata.json",
            "table_type": "ICEBERG"
        },
        "CreatedBy": "arn:aws:sts::account_id:role_name",
        "IsRegisteredWithLakeFormation": false,
        "CatalogId": "account_id",
        "IsRowFilteringEnabled": false
    }
}

After the table is renamed (the StorageDescriptor was copied from the original table):

➜  ~ aws glue get-table --database-name db --name iceberg_1636005278_rename
{
    "Table": {
        "Name": "iceberg_1636005278_rename",
        "DatabaseName": "db",
        "CreateTime": 1636005530.0,
        "UpdateTime": 1636005530.0,
        "Retention": 0,
        "StorageDescriptor": {
            "Columns": [
                {
                    "Name": "id",
                    "Type": "bigint",
                    "Parameters": {
                        "iceberg.field.id": "1",
                        "iceberg.field.optional": "true",
                        "iceberg.field.type.string": "bigint",
                        "iceberg.field.type.typeid": "LONG",
                        "iceberg.field.usage": "schema-column"
                    }
                },
                {
                    "Name": "data",
                    "Type": "string",
                    "Parameters": {
                        "iceberg.field.id": "2",
                        "iceberg.field.optional": "true",
                        "iceberg.field.type.string": "string",
                        "iceberg.field.type.typeid": "STRING",
                        "iceberg.field.usage": "schema-column"
                    }
                }
            ],
            "Location": "s3://.../iceberg_1636005278",
            "Compressed": false,
            "NumberOfBuckets": 0,
            "SortColumns": [],
            "StoredAsSubDirectories": false
        },
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {
            "metadata_location": "s3://.../iceberg_1636005278/metadata/00000-1b5f7c9b-01ec-42a3-88da-f62227e74e74.metadata.json",
            "table_type": "ICEBERG"
        },
        "CreatedBy": "arn:aws:sts::account_id:role_name",
        "IsRegisteredWithLakeFormation": false,
        "CatalogId": "account_id",
        "IsRowFilteringEnabled": false
    }
}
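
The comparison above can also be checked mechanically. The following is a small sketch, assuming the two `aws glue get-table` outputs have been saved as complete JSON text; `storage_descriptor_preserved` is a hypothetical helper name, and the sample strings are trimmed to a couple of fields from the output above.

```python
import json


def storage_descriptor_preserved(before_json, after_json):
    """Return True if the renamed table has the same StorageDescriptor as the original."""
    before = json.loads(before_json)["Table"]
    after = json.loads(after_json)["Table"]
    return (
        "StorageDescriptor" in before
        and before["StorageDescriptor"] == after.get("StorageDescriptor")
    )


# Trimmed stand-ins for the get-table output before and after the rename.
before = (
    '{"Table": {"Name": "iceberg_1636005278", "StorageDescriptor": '
    '{"Location": "s3://.../iceberg_1636005278", "Compressed": false}}}'
)
after_fixed = (
    '{"Table": {"Name": "iceberg_1636005278_rename", "StorageDescriptor": '
    '{"Location": "s3://.../iceberg_1636005278", "Compressed": false}}}'
)
after_missing = '{"Table": {"Name": "iceberg_1636005278_rename"}}'

print(storage_descriptor_preserved(before, after_fixed))    # True
print(storage_descriptor_preserved(before, after_missing))  # False
```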

Best regards,
Tom

@github-actions github-actions bot added the AWS label Nov 4, 2021
Contributor

@jackye1995 jackye1995 left a comment

nice catch! This looks good to me

@jackye1995
Contributor

@yyanyy could you also take a look?

Contributor

@yyanyy yyanyy left a comment

LGTM, thank you for the fix!

@jackye1995
Contributor

Cool, will merge after CI passes
