Description
I am trying to migrate the AWS Glue Data Catalog to the Hive metastore of an EMR cluster (I use an external MySQL database as the Hive metastore).
I followed all the steps for migrating directly from AWS Glue to Hive, but the Glue ETL job fails with "'str' object has no attribute '_jdf'" when I run it. The full error message is below:
2021-11-11 09:33:53,573 ERROR [main] glue.ProcessLauncher (Logging.scala:logError(73)): Error from Python:Traceback (most recent call last):
File "/tmp/export_from_datacatalog.py", line 138, in <module>
main()
File "/tmp/export_from_datacatalog.py", line 134, in main
connection=glue_context.extract_jdbc_conf(connection_name)
File "/tmp/export_from_datacatalog.py", line 38, in datacatalog_migrate_to_hive_metastore
transform_databases_tables_partitions(sc, sql_context, hive_metastore, databases, tables, partitions)
File "/tmp/localPyFiles-3222c3b6-ae99-42e0-be66-ac44ed10e9ab/hive_metastore_migration.py", line 1445, in transform_databases_tables_partitions
.transform(hms=hive_metastore, databases=databases, tables=tables, partitions=partitions)
File "/tmp/localPyFiles-3222c3b6-ae99-42e0-be66-ac44ed10e9ab/hive_metastore_migration.py", line 1227, in transform
(ms_sds, ms_tbls, ms_partitions) = self.extract_sds(ms_tbls, ms_partitions)
File "/tmp/localPyFiles-3222c3b6-ae99-42e0-be66-ac44ed10e9ab/hive_metastore_migration.py", line 1018, in extract_sds
.drop_columns(['ID', 'type'])
File "/tmp/localPyFiles-3222c3b6-ae99-42e0-be66-ac44ed10e9ab/hive_metastore_migration.py", line 182, in drop_columns
df = df.drop(col)
File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 2519, in drop
jdf = self._jdf.drop(self._jseq(cols))
AttributeError: 'str' object has no attribute '_jdf'
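
Reading the last frame, the message seems to mean that `self` inside `DataFrame.drop` was a string rather than a DataFrame when `self._jdf` was evaluated. The sketch below reproduces the same message without Spark. It is only an illustration: `FakeDataFrame` and `reproduce_error` are hypothetical stand-ins, not PySpark or Glue code, and it does not identify why this happens in the migration script.

```python
class FakeDataFrame:
    """Hypothetical stand-in for pyspark.sql.DataFrame; not the real class."""

    def __init__(self, columns):
        # Stand-in for the handle to the underlying Java DataFrame.
        self._jdf = list(columns)

    def drop(self, *cols):
        # Reads self._jdf, as the real method does in the traceback.
        remaining = [c for c in self._jdf if c not in cols]
        return FakeDataFrame(remaining)


def reproduce_error():
    """Call drop without a bound instance, so the column name lands in `self`."""
    try:
        FakeDataFrame.drop("ID")  # self == "ID", cols == ()
    except AttributeError as exc:
        return str(exc)
    return None


message = reproduce_error()
print(message)
```

Called normally on an instance, `FakeDataFrame(["ID", "type", "name"]).drop("ID", "type")` works as expected, so the failure in the sketch comes only from `drop` receiving a string as `self`.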