@manabery
Contributor

Issue

Hive 2.2.0 added the IS_REWRITE_ENABLED column to the TBLS table. Until Hive 3.0.0, the column does not accept NULL and has no default value.
As a result, the Hive Metastore migration tool fails to write to the TBLS table of a Hive 2.2.0-2.3.x metastore, because the column has no default value.
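
The version ranges described above can be restated as a small lookup. This is only an illustrative sketch: the function name `rewrite_column_state` and its return labels are hypothetical and are not part of the migration tool. It encodes nothing beyond the ranges stated in this description.

```python
def rewrite_column_state(hive_version: str) -> str:
    """Classify how TBLS.IS_REWRITE_ENABLED behaves for a given Hive version.

    Hypothetical helper: it only encodes the version ranges stated in this PR.
    """
    # Compare numerically, e.g. "2.3.9" -> (2, 3, 9).
    version = tuple(int(part) for part in hive_version.split(".")[:3])
    if version < (2, 2, 0):
        return "absent"  # column not yet added to TBLS
    if version < (3, 0, 0):
        return "not_null_without_default"  # the range affected by this issue
    return "not_null_with_default"


# The three versions exercised in the Tests section.
for v in ("2.1.1", "2.3.9", "3.1.3"):
    print(v, rewrite_column_state(v))
```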

Sample error message when importing a Glue metastore into a Hive 2.3.9 metastore:

2025-05-20 01:08:34,027 ERROR [main] glue.ProcessLauncher (Logging.scala:logError(73)): Error from Python:Traceback (most recent call last):  
File "/tmp/export_from_datacatalog.py", line 139, in <module>  
main()  
File "/tmp/export_from_datacatalog.py", line 135, in main  
connection=glue_context.extract_jdbc_conf(connection_name)  
File "/tmp/export_from_datacatalog.py", line 41, in datacatalog_migrate_to_hive_metastore  
hive_metastore.export_to_metastore()  
File "/tmp/localPyFiles-01ce0548-f3f3-4847-81b9-7462fb89df85/hive_metastore_migration_patched.py", line 1506, in export_to_metastore  
self.write_table(table_name="TBLS", df=self.ms_tbls)  
File "/tmp/localPyFiles-01ce0548-f3f3-4847-81b9-7462fb89df85/hive_metastore_migration_patched.py", line 1466, in write_table  
'driver': MYSQL_DRIVER_CLASS  
File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 1444, in jdbc  
self.mode(mode)._jwrite.jdbc(url, table, jprop)  
File "/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__  
answer, self.gateway_client, self.target_id, self.name)  
File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco  
return f(*a, **kw)  
File "/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value  
format(target_id, ".", name), value)  
py4j.protocol.Py4JJavaError: An error occurred while calling o1225.jdbc.  
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 140.0 failed 4 times, most recent failure: Lost task 1.3 in stage 140.0 (TID 1025) (172.31.49.177 executor 1): java.sql.BatchUpdateException: Field 'IS_REWRITE_ENABLED' doesn't have a default value

Fix

If the IS_REWRITE_ENABLED column exists in the TBLS table on the metastore, the tool sets a default value (0) for it.
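
The idea can be sketched as follows. This is not the patch's actual code: it uses an in-memory SQLite database as a stand-in for the MySQL/MariaDB metastore, plain dictionaries instead of Spark DataFrames, and the helper names (`table_columns`, `with_rewrite_default`, `write_rows`) are hypothetical. It assumes the default is supplied on the rows being written and only when the target table has the column.

```python
import sqlite3


def table_columns(conn, table):
    """Return the column names of `table` (SQLite stand-in for DESCRIBE)."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def with_rewrite_default(rows, target_columns, default=0):
    """Add IS_REWRITE_ENABLED=default to each row, but only if the target
    table actually has that column (older schemas do not)."""
    if "IS_REWRITE_ENABLED" not in target_columns:
        return [dict(row) for row in rows]
    return [
        {**row, "IS_REWRITE_ENABLED": row.get("IS_REWRITE_ENABLED", default)}
        for row in rows
    ]


def write_rows(conn, table, rows):
    """Insert each row dict; `table` is a trusted constant in this sketch."""
    for row in rows:
        cols = ", ".join(row)
        marks = ", ".join("?" for _ in row)
        conn.execute(
            f"INSERT INTO {table} ({cols}) VALUES ({marks})",
            tuple(row.values()),
        )


# Simplified Hive 2.2.0-2.3.x style table: NOT NULL column with no default.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TBLS ("
    "TBL_ID INTEGER PRIMARY KEY, "
    "TBL_NAME TEXT, "
    "IS_REWRITE_ENABLED INTEGER NOT NULL)"
)

rows = [{"TBL_ID": 1, "TBL_NAME": "orders"}]  # source rows lack the column
patched = with_rewrite_default(rows, table_columns(conn, "TBLS"))
write_rows(conn, "TBLS", patched)
```

Checking for the column rather than the Hive version keeps the same code path usable for schemas that lack IS_REWRITE_ENABLED, which matches the three scenarios in the Tests section.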

Tests

Hive 2.3: verify that the issue is resolved

Steps

  1. Set up a Hive 2.3.9 metastore on an AL2023 instance.
  2. Run the patched tool.

Result

Succeeded and tables were imported.

MariaDB [hive]> describe TBLS;
+--------------------+--------------+------+-----+---------+-------+
| Field              | Type         | Null | Key | Default | Extra |
+--------------------+--------------+------+-----+---------+-------+
| TBL_ID             | bigint(20)   | NO   | PRI | NULL    |       |
| CREATE_TIME        | int(11)      | NO   |     | NULL    |       |
| DB_ID              | bigint(20)   | YES  | MUL | NULL    |       |
| LAST_ACCESS_TIME   | int(11)      | NO   |     | NULL    |       |
| OWNER              | varchar(767) | YES  |     | NULL    |       |
| RETENTION          | int(11)      | NO   |     | NULL    |       |
| SD_ID              | bigint(20)   | YES  | MUL | NULL    |       |
| TBL_NAME           | varchar(256) | YES  | MUL | NULL    |       |
| TBL_TYPE           | varchar(128) | YES  |     | NULL    |       |
| VIEW_EXPANDED_TEXT | mediumtext   | YES  |     | NULL    |       |
| VIEW_ORIGINAL_TEXT | mediumtext   | YES  |     | NULL    |       |
| IS_REWRITE_ENABLED | bit(1)       | NO   |     | NULL    |       |
+--------------------+--------------+------+-----+---------+-------+
12 rows in set (0.001 sec)

Hive 2.1: verify that the tool still works when the TBLS table has no IS_REWRITE_ENABLED column

Steps

  1. Set up a Hive 2.1.1 metastore on an AL2023 instance.
  2. Run the patched tool.

Result

Succeeded and tables were imported.

MariaDB [hive]> describe TBLS;
+--------------------+--------------+------+-----+---------+-------+
| Field              | Type         | Null | Key | Default | Extra |
+--------------------+--------------+------+-----+---------+-------+
| TBL_ID             | bigint(20)   | NO   | PRI | NULL    |       |
| CREATE_TIME        | int(11)      | NO   |     | NULL    |       |
| DB_ID              | bigint(20)   | YES  | MUL | NULL    |       |
| LAST_ACCESS_TIME   | int(11)      | NO   |     | NULL    |       |
| OWNER              | varchar(767) | YES  |     | NULL    |       |
| RETENTION          | int(11)      | NO   |     | NULL    |       |
| SD_ID              | bigint(20)   | YES  | MUL | NULL    |       |
| TBL_NAME           | varchar(128) | YES  | MUL | NULL    |       |
| TBL_TYPE           | varchar(128) | YES  |     | NULL    |       |
| VIEW_EXPANDED_TEXT | mediumtext   | YES  |     | NULL    |       |
| VIEW_ORIGINAL_TEXT | mediumtext   | YES  |     | NULL    |       |
+--------------------+--------------+------+-----+---------+-------+
11 rows in set (0.001 sec)

Hive 3.x: regression test

Steps

  1. Set up a Hive 3.1.3 metastore on an AL2023 instance.
  2. Run the patched tool.

Result

Succeeded and tables were imported.

MariaDB [hive]> describe TBLS;
+--------------------+--------------+------+-----+---------+-------+
| Field              | Type         | Null | Key | Default | Extra |
+--------------------+--------------+------+-----+---------+-------+
| TBL_ID             | bigint(20)   | NO   | PRI | NULL    |       |
| CREATE_TIME        | int(11)      | NO   |     | NULL    |       |
| DB_ID              | bigint(20)   | YES  | MUL | NULL    |       |
| LAST_ACCESS_TIME   | int(11)      | NO   |     | NULL    |       |
| OWNER              | varchar(767) | YES  |     | NULL    |       |
| OWNER_TYPE         | varchar(10)  | YES  |     | NULL    |       |
| RETENTION          | int(11)      | NO   |     | NULL    |       |
| SD_ID              | bigint(20)   | YES  | MUL | NULL    |       |
| TBL_NAME           | varchar(256) | YES  | MUL | NULL    |       |
| TBL_TYPE           | varchar(128) | YES  |     | NULL    |       |
| VIEW_EXPANDED_TEXT | mediumtext   | YES  |     | NULL    |       |
| VIEW_ORIGINAL_TEXT | mediumtext   | YES  |     | NULL    |       |
| IS_REWRITE_ENABLED | bit(1)       | NO   |     | b'0'    |       |
+--------------------+--------------+------+-----+---------+-------+
13 rows in set (0.001 sec)

…ration tool

* Added a default value for the IS_REWRITE_ENABLED column if it's present in the TBLS table.
@moomindani moomindani merged commit f1d4142 into aws-samples:master May 20, 2025
@moomindani
Contributor

Thanks for your contribution, this PR looks perfect.

@moomindani moomindani self-assigned this May 20, 2025