```diff
@@ -98,8 +98,9 @@ public HiveSyncConfig(Properties props) {

   public HiveSyncConfig(Properties props, Configuration hadoopConf) {
     super(props, hadoopConf);
-    HiveConf hiveConf = hadoopConf instanceof HiveConf
-        ? (HiveConf) hadoopConf : new HiveConf(hadoopConf, HiveConf.class);
+    HiveConf hiveConf = new HiveConf();
+    // HiveConf needs to load Hadoop conf to allow instantiation via AWSGlueClientFactory
+    hiveConf.addResource(hadoopConf);
```
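For readers outside the review, here is a minimal sketch of what the new construction path does. The Glue property name and value below are illustrative assumptions (the typical EMR setting), not something this PR sets:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.conf.HiveConf;

public class HiveConfFromHadoopConfSketch {
  public static void main(String[] args) {
    // Simulate a Hadoop conf carrying a setting passed in via the Spark
    // session, e.g. the Glue metastore client factory (assumed key/value).
    Configuration hadoopConf = new Configuration();
    hadoopConf.set("hive.metastore.client.factory.class",
        "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory");

    // The PR's approach: build a plain HiveConf first (which loads
    // hive-site.xml and friends), then add the incoming Hadoop conf as an
    // extra resource on top of it.
    HiveConf hiveConf = new HiveConf();
    hiveConf.addResource(hadoopConf);

    // The property from the Hadoop conf is now visible through the HiveConf.
    System.out.println(hiveConf.get("hive.metastore.client.factory.class"));
  }
}
```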
Member:
Not so sure this is equivalent to holding the original hadoopConf, as it changes the order of addResource() calls during construction. We should be good only if we can verify the equivalence.

Member:

I think the ideal approach is to make the passed-in hiveConf load the Hadoop conf properly for AWSGlueClientFactory at the very beginning (when creating the hive sync config), so that nothing needs to be loaded at this point. cc @yihua

Contributor Author:

Loading the AWSGlueClientFactory property specifically should solve the issue on the AWS side, but it's possible that other configs/custom configs passed in via the Spark session won't be picked up by hard-coded logic. I still think loading the entire Hadoop conf here is the safer choice.

Also, this change alters the order in which resources are added to the hive conf. I've gone through the HiveConf constructor but didn't see any usage of resources during construction, so I think it shouldn't matter; maybe I've overlooked something?

An alternative solution would be to always pass hadoopConf to the HiveConf constructor. Wdyt?

Suggested change:

```diff
-    hiveConf.addResource(hadoopConf);
+    HiveConf hiveConf = new HiveConf(hadoopConf, HiveConf.class);
```
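A hedged aside on this suggestion: as far as I can tell, the two-argument HiveConf constructor copies the incoming configuration first and only then runs Hive's own initialization, so whether it behaves identically to addResource() hinges on that ordering, which is exactly the equivalence question raised above. A minimal sketch of the suggested shape in context (the ordering claim is an assumption; verify against the HiveConf source):

```java
// Suggested alternative: seed the HiveConf from hadoopConf at construction
// time instead of layering it on afterwards with addResource(). The copy
// is assumed to happen before HiveConf loads its own resources such as
// hive-site.xml (an assumption, not verified in this thread).
HiveConf hiveConf = new HiveConf(hadoopConf, HiveConf.class);
setHadoopConf(hiveConf);
validateParameters();
```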

Contributor:

> but it's possible that there are other configs/custom configs passed in via Spark session

Is this a classical way people pass around Hive options with Spark?

> An alternative solution would be always pass hadoopConf to HiveConf constructor

Does it introduce too much overhead then?

Contributor Author:

Hi @danny0405,

1. It's not the preferred way, but it might be the best way in cases like serverless applications, since those can't easily modify hive-site.xml.
2. There is overhead in loading the Hadoop conf into the Hive conf instead of casting it directly, but it's almost negligible imo.

Member:

@CTTY always creating a new HiveConf looks good to me.

```diff
     setHadoopConf(hiveConf);
     validateParameters();
   }
```