Skip to content

Document existing Hive properties#10831

Closed
findinpath wants to merge 7 commits intotrinodb:masterfrom
findinpath:docs/hive-properties
Closed

Document existing Hive properties#10831
findinpath wants to merge 7 commits intotrinodb:masterfrom
findinpath:docs/hive-properties

Conversation

@findinpath
Copy link
Contributor

No description provided.

@mosabua mosabua removed the request for review from rosewms January 27, 2022 22:49
@findinpath findinpath force-pushed the docs/hive-properties branch from 5d5cbca to dc8cf74 Compare January 28, 2022 06:59
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixes #6437

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently it just says "if you want faster, enable this", which isn't the whole truth
The documentation should say when this is ok to set and when it's not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@findepi can you help me come up with a better description here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe

Allow the metastore to assume that partition keys are normalized for non-string partition keys. This can lead to performance improvements for affected tables. Not that ..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to

Allow the metastore to assume that the values of partition
columns can be converted to string values. This can lead to
performance improvements in the queries which apply filters
on the partition columns.
Note that the partition keys of type timestamp do
not get canonicalized. Default is false.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently it just says "if you want faster, enable this", which isn't the whole truth
The documentation should say when this is ok to set and when it's not.

Comment on lines 627 to 628
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation should say when this is ok to set and when it's not.

I am missing knowledge in this area. For which types would it be a bad idea to set the parameter to true ?
Is there any automated test showcasing the limitations?

Copy link
Member

@mosabua mosabua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great addition.

@findinpath findinpath force-pushed the docs/hive-properties branch from 8f75a47 to e5803f7 Compare February 9, 2022 05:36
@findepi findepi requested review from mosabua and posulliv March 25, 2022 12:18
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and in other places below .. only one space character before code or between sentences

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this specific case true is part of of the "Default" column and not the description sentence.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that true .. does the HMS actually run the delete of the actual data? I thought it just manages the metadata...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of dealing with tables of type MANAGED_TABLE , when dropping the table, HMS is responsible of dropping the content of the table as well. Do note that AWS Glue metastore does not provide the same functionality when dropping tables.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe

Allow the metastore to assume that partition keys are normalized for non-string partition keys. This can lead to performance improvements for affected tables. Not that ..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

URL of the SOCKS proxy to use to for connecting to the Thrift Hive metastore.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

running. Defaults to

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same for the various other ones like that below

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related question not to be answered here...

how to you add access to that class.. drop the related jars into the Hive connector plugin folder?

If this is a common use case we might need to document specifically .. or generically somewhere

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private static AWSCredentialsProvider getCustomAWSCredentialsProvider(String providerClass)
{
try {
Object instance = Class.forName(providerClass).getConstructor().newInstance();
if (!(instance instanceof AWSCredentialsProvider)) {
throw new RuntimeException("Invalid credentials provider class: " + instance.getClass().getName());
}
return (AWSCredentialsProvider) instance;
}
catch (ReflectiveOperationException e) {
throw new RuntimeException(format("Error creating an instance of %s", providerClass), e);
}
}

I'm assuming that this setting was thought to deal with one of the many credential providers which come with the aws-java-sdk-core library.

If the class is not in the library aws-java-sdk-core then it would need to be probably added to hive's plugin folder.

Related PR: #1363

@findinpath findinpath requested a review from mosabua September 8, 2022 05:28
@mosabua
Copy link
Member

mosabua commented Oct 25, 2022

@findinpath will you pick this up again and progress it towards merge ..

Fyi @m57lyra is leading a plan to heavily refactor the hive connector docs.

@findinpath
Copy link
Contributor Author

@mosabua I think that this PR can be closed then. Sorry for keeping it open for so long. I lost in the meantime the context to the PR and would very likely need to start over from scratch.

@findinpath findinpath closed this Oct 25, 2022
@mosabua
Copy link
Member

mosabua commented Oct 25, 2022

@colebow can you harvest whatever is useful from this PR and still missing in our docs and do a lift and shift... it would be a shame to loose all the good work from @findinpath .. to keep it simple we could break it up into a whole bunch of small PR that can be merged quickly and that can be spread across the team

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

5 participants