Upgrade to Hive 3.0.0#15659

Merged
rschlussel merged 1 commit into prestodb:master from imjalpreet:HiveUpgrade
Mar 1, 2021

@imjalpreet
Member

@imjalpreet imjalpreet commented Feb 1, 2021

This PR includes the base changes required for upgrading to Hive 3. This will help in supporting and bringing a number of interesting features in Hive 3 to Presto in the future.

This depends on https://github.com/prestodb/presto-hive-apache/releases/tag/3.0.0-2 and the following PR in tempto (prestodb/tempto#272)

depends on https://github.com/facebookexternal/presto-facebook/pull/1414

@aweisberg
Contributor

If a commit introduces test failures, it's better to have the fixes in that same commit rather than in a separate one. Otherwise there are failing commits on master that interfere with bisecting.

This seems to have introduced another dependency-related failure. It's caught in the FB integration builds.

[WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireUpperBoundDeps failed with message:
Failed while enforcing RequireUpperBoundDeps. The error(s) are [
Require upper bound dependencies error for com.facebook.presto.hive:hive-apache:1.2.2-2 paths to dependency are:
+-com.facebook.presto:presto-namespace:0.248-SNAPSHOT
  +-com.facebook.presto:presto-hive-metastore:0.248-SNAPSHOT
    +-com.facebook.presto.hive:hive-apache:1.2.2-2 (managed) <-- com.facebook.presto.hive:hive-apache:3.0.0-2
, 
Require upper bound dependencies error for org.apache.thrift:libthrift:0.9.1 paths to dependency are:
+-com.facebook.presto:presto-namespace:0.248-SNAPSHOT
  +-com.facebook.presto:presto-hive-metastore:0.248-SNAPSHOT
    +-org.apache.thrift:libthrift:0.9.1 (managed) <-- org.apache.thrift:libthrift:0.9.3
]
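
For readers unfamiliar with the rule, the log above says that the managed (resolved) version of each artifact is lower than a version requested elsewhere in the dependency graph. The following is only a rough sketch of that check, not the Maven Enforcer implementation: the function names are hypothetical, and the version comparison looks at numeric components only, which is a simplification of Maven's real version ordering.

```python
import re


def version_key(version):
    # Simplification (assumption): compare numeric components only,
    # so qualifiers such as "-SNAPSHOT" are ignored.
    return tuple(int(part) for part in re.findall(r"\d+", version))


def upper_bound_violations(resolved, requested):
    """Return (artifact, resolved_version, highest_requested) for each
    artifact whose resolved version is below a requested version."""
    violations = []
    for artifact, version in resolved.items():
        highest = max(requested.get(artifact, [version]), key=version_key)
        if version_key(highest) > version_key(version):
            violations.append((artifact, version, highest))
    return violations


# Versions taken from the enforcer output above.
resolved = {
    "com.facebook.presto.hive:hive-apache": "1.2.2-2",
    "org.apache.thrift:libthrift": "0.9.1",
}
requested = {
    "com.facebook.presto.hive:hive-apache": ["3.0.0-2"],
    "org.apache.thrift:libthrift": ["0.9.3"],
}
violations = upper_bound_violations(resolved, requested)
```

Both artifacts are reported here because presto-namespace still manages the older versions while presto-hive-metastore now asks for the newer ones.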

@aweisberg aweisberg self-requested a review February 8, 2021 16:52
@aweisberg
Contributor

I'll try and get Tempto through since it is one of the dependencies.

@imjalpreet
Member Author

imjalpreet commented Feb 9, 2021

@aweisberg Yes, once the Tempto changes are through I will update this PR (and squash the commits) and make sure all the test cases pass before the code is reviewed.

Regarding the Facebook integration build failure, it looks like FB has some additional plugins, such as presto-namespace, that will also have to be modified for the Hive 3 upgrade. That plugin currently uses some dependencies that this PR upgrades, which is why we are seeing the errors. I am not sure whether presto-namespace is open-sourced.

@aweisberg
Contributor

Thanks, I missed that. I thought presto-namespace was one of the OSS repos. Still trying to figure out how to release Tempto.

@aweisberg
Contributor

So the issue is that we can't release Tempto because it's under io.prestodb which isn't controlled by the Linux Foundation.

We either need to find out who has the credentials for updating io.prestodb and get them to provide access or we need to refactor Tempto so we can upload it to com.facebook.presto.

I can ask around about updating io.prestodb but it could take a while, and I can't guarantee it will happen. I broached the subject of refactoring it under com.facebook.presto and so far no one has objected.

If you want to do the Tempto refactor, it would unblock this PR; otherwise we have to wait on getting the credentials back. Sorry we are getting blocked on something so basic.

@imjalpreet
Member Author

@aweisberg Thanks for looking into this. I can take up the refactoring task and raise another PR once it's ready.

@aweisberg
Contributor

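Great, so the package we want is com.facebook.prestodb.tempto, and `find ./ -type f -exec sed -i '' -e 's/io\.prestodb/com.facebook.prestodb/g' {} \;` can replace anything that the IntelliJ refactoring doesn't catch. (The dot is escaped so it only matches a literal `.`; the empty `''` after `-i` is the BSD/macOS form, GNU sed takes `-i` on its own.)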
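
If a portable alternative to the sed one-liner is useful, the same literal replacement can be sketched in Python. This is only an illustration: `rewrite_tree` is a hypothetical helper, skipping `.git` and non-UTF-8 files is my assumption about what is safe to touch, and, like the sed command, it rewrites file contents only and does not move source directories.

```python
import os

OLD = "io.prestodb"
NEW = "com.facebook.prestodb"


def rewrite_tree(root, old=OLD, new=NEW):
    """Replace the literal string `old` with `new` in every text file
    under `root` and return the sorted list of files that changed."""
    changed = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Assumption: never rewrite anything inside the git metadata.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as handle:
                    text = handle.read()
            except UnicodeDecodeError:
                continue  # treat as binary and leave untouched
            if old in text:
                with open(path, "w", encoding="utf-8") as handle:
                    # str.replace matches literally, so the "." is not a wildcard.
                    handle.write(text.replace(old, new))
                changed.append(path)
    return sorted(changed)
```

Calling `rewrite_tree(".")` from the repository root would mirror the command above. Running it twice is harmless because the new package name does not contain the old one.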

@aweisberg
Contributor

Asking Sonatype if we can get access to io.prestodb.tempto

@aweisberg
Contributor

aweisberg commented Feb 11, 2021

I was able to upload the snapshot artifacts. Can you rebase your PR and update it to use the 1.51-SNAPSHOT artifact and we will see if the tests complete?

@imjalpreet imjalpreet force-pushed the HiveUpgrade branch 2 times, most recently from 57acf75 to ce9dfbc Compare February 12, 2021 13:56
@imjalpreet
Member Author

@aweisberg I have made the requested changes. The product tests passed, but the tests for the presto-main module failed. That doesn't look related to the code changes in this PR; it looks like something crashed while the tests were running. If you can re-trigger that module, it would probably pass.

@imjalpreet imjalpreet marked this pull request as ready for review February 12, 2021 16:17
Contributor

@aweisberg aweisberg left a comment

Only two minor change requests. The next step is for me to deploy this on a cluster and shadow some traffic to sanity check. I don't want this to get reverted during the release process.

Contributor

Maybe keep the refactor from the Trino PR? trinodb/trino@238f4b6#diff-23fd4cecda2fe420dc9a4fea750e785439046225ed9f013dee3bc123f1c9c2e5R1202

Just to keep the two code bases more consistent.

Member Author

@aweisberg I have made the refactor consistent with Trino as you requested.

Contributor

Is there a reason we don't take the latest version of BloomFilter from Trino?

Member Author

@aweisberg We did not take the latest version because the idea was to make the changes gradually and understand their impact. The class used in Trino is forked from org.apache.orc.util.BloomFilter, whereas we have been referring to org.apache.hive.common.util.BloomFilter.

Let me check the impact and the changes required and get back.

Member Author

@aweisberg I reviewed the BloomFilter class used in Trino. As I mentioned earlier, one of the major changes is that they forked it from org.apache.orc.util.BloomFilter and then made some optimisations on top of that. The enhancements and optimisations include changes to the ORC bloom filter hash, support for ORC UTF8 bloom filters, support for the real type in ORC bloom filters, and a few more.

Would it be better to have separate PRs for these enhancements?

Contributor

I would have been fine with that as well :-) Thanks for doing it now. I am still trying to figure out why I can't publish Tempto. It really does seem to be uploading without error.

@aweisberg
Contributor

aweisberg commented Feb 17, 2021

It's released! At long last our national nightmare is over.

Bad news, there are some shading issues I think. FB integration is failing on duplicate-finder-maven-plugin which I think is finding legitimate issues even outside the FB integration. We should probably enable the duplicate finder on Presto Hive https://pastebin.com/AHxAnJNF

Still looking into this to make sure we aren't pulling in org.apache.parquet directly by mistake.

@imjalpreet
Member Author

> It's released! At long last our national nightmare is over.

That's great news.

> Bad news, there are some shading issues I think. FB integration is failing on duplicate-finder-maven-plugin which I think is finding legitimate issues even outside the FB integration. We should probably enable the duplicate finder on Presto Hive https://pastebin.com/AHxAnJNF

Sure, I will look into it as well.

@imjalpreet imjalpreet force-pushed the HiveUpgrade branch 2 times, most recently from 7f64dcc to 8456314 Compare February 25, 2021 14:01
Contributor

@aweisberg aweisberg left a comment

LGTM. I verified this on a cluster. Turns out the issue was that the Tempto 1.51-SNAPSHOT was probably just unmodified 1.50, since I was failing to update the artifacts. Switching to the 1.51 release artifact made the dependency issues go away.

@aweisberg aweisberg requested a review from a team February 25, 2021 15:06
Contributor

@rschlussel rschlussel left a comment

Just 2 questions, otherwise looks good.

Contributor

What's the reason for the changes for decimal?

Contributor

Constructor argument order is reversed in Hive 3. Trino also just switches the parameters around.

Contributor

why was this test removed?

Contributor

Short answer, because that is what Trino did trinodb/trino@238f4b6#diff-82a51d3cf239ee341a4eab63c8c88c82367b389a9652cbaa067631d131a60930

Long answer... I don't know. I'll bring it back and see what happens when I run it.

Contributor

The test still passes. So I guess we should keep it!

Contributor

@aweisberg aweisberg left a comment

@imjalpreet can you bring back TestShardWriter.testWriterZeroRows? I am not sure why it was removed in Trino, but I brought it back and the test still passes.

Fix TestOrcBatchPageSourceMemoryTracking unit test failures

Fork BloomFilter class from org.apache.hive.common.util.BloomFilter

Fix presto-orc plugin unit test failures

Co-authored-by: David Phillips <david@acz.org>
@imjalpreet
Member Author

imjalpreet commented Feb 25, 2021

@aweisberg I have reverted that test removal as requested. I will try to understand why it was removed in Trino.

Contributor

@aweisberg aweisberg left a comment

Thank you @imjalpreet

@rschlussel
Contributor

Ignoring the Travis failure since we're migrating away from Travis. Thanks for the contribution!

@rschlussel rschlussel merged commit fd32fea into prestodb:master Mar 1, 2021
@aweisberg
Contributor

I was mistaken the joda upgrade was in #15649, hah I should know!

@imjalpreet
Member Author

> I was mistaken the joda upgrade was in #15649, hah I should know!

Ah okay, no issues!

@aweisberg
Contributor

This did end up failing on at least one different issue. CONTAINS on an array was always returning false, even when it should have returned true, and the number of input rows counted by the aggregation for that query also didn't match 0.247.

I am trying to factor out a reproducer now.

@aweisberg
Contributor

Narrowing it down some: the issue with Hive 3 is that arrays of varchar come back in queries as an array containing a single varchar, which is a comma-separated list of the array contents. I haven't factored that out into a reproducer yet.

You can tell from the CLI because, instead of a space after the comma between entries, it's just one long run-on string.

@aweisberg
Contributor

Still can't reproduce, but I noticed that the format of the table that triggers this issue is TEXTFILE. I can create my own TEXTFILE table with an array in it and select it back out, but I don't see the values in the array being merged :-(

@aweisberg
Contributor

LOL https://forums.aws.amazon.com/thread.jspa?threadID=264065

@aweisberg
Contributor

aweisberg commented Mar 10, 2021

https://github.com/trinodb/trino/pull/1733/files
https://github.com/trinodb/trino/pull/1321/files

Release cut is Friday

@imjalpreet
Member Author

@aweisberg Thanks for the detailed investigation. A bug caused by correcting a misspelled property, that's a first. I will try to get the fix in before the release cut, most likely by this evening.
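
To make the failure mode concrete, here is a small sketch. It assumes the property in question is Hive's SerDe collection delimiter, which older Hive versions spelled `colelction.delim` and Hive 3 corrected to `collection.delim`; the thread itself does not name the property, and the helper names below are hypothetical. A table written with the old key has no delimiter under the new key, so the reader falls back to Hive's default delimiter (Ctrl-B) and the whole field comes back as one element.

```python
LEGACY_KEY = "colelction.delim"  # assumed pre-Hive-3 (misspelled) property name
FIXED_KEY = "collection.delim"   # assumed Hive 3 (corrected) property name
DEFAULT_DELIM = "\x02"           # Hive's default collection delimiter (Ctrl-B)


def collection_delimiter(serde_params, legacy_fallback=True):
    """Pick the array delimiter, optionally honoring the legacy key."""
    if FIXED_KEY in serde_params:
        return serde_params[FIXED_KEY]
    if legacy_fallback and LEGACY_KEY in serde_params:
        return serde_params[LEGACY_KEY]
    return DEFAULT_DELIM


def parse_array(field, serde_params, legacy_fallback=True):
    """Split one TEXTFILE field into array elements."""
    return field.split(collection_delimiter(serde_params, legacy_fallback))


# Table metadata created before Hive 3: only the misspelled key is set.
old_table = {LEGACY_KEY: ","}

broken = parse_array("a,b,c", old_table, legacy_fallback=False)
fixed = parse_array("a,b,c", old_table)
```

With `legacy_fallback=False` the result is a single element holding the whole comma-separated string, which matches the symptom described earlier in this thread and explains why CONTAINS never matched. A freshly created TEXTFILE table would use the new key, which is consistent with the reproducer not triggering the bug.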

@highker

highker commented Mar 11, 2021

Thanks @imjalpreet for putting this up. While working on the fix, please take a look at #15589. It depends on which PR merges first. We will need to add materialized view support for Hive 3.0.

@imjalpreet
Member Author

@highker Sure, I will have a look at the materialized view PR.

@aweisberg
Contributor

@highker I am not 100% following. Do we need to deliver both at the same time? If we only release the Hive 3 upgrade is that ok?

@highker

highker commented Mar 11, 2021

@aweisberg, @imjalpreet, for example, we need to model materialized views as managed tables today (#15589 (comment)). This can be changed after Hive 3.0

@arhimondr arhimondr mentioned this pull request Mar 12, 2021
13 tasks