Skip to content

Conversation

@guyco33
Copy link
Member

@guyco33 guyco33 commented Aug 4, 2022

Issue happens on hive2.3.
Sending empty stats will generate empty columns list that will return null value and will make this to fail on java.lang.NullPointerException: Cannot invoke "java.util.List.iterator()" because "stats" is null

On hive3 issue doesn't happen since empty column list result with empty list

On 387 the issue happens only when it's the first write before any other "mixed" table with both date and non-date columns - that's why it's hard to catch it, since most of tables contains both date and non-date columns.

On 388+ it seems to constantly happen.

Description

Is this change a fix, improvement, new feature, refactoring, or other?

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

How would you describe this change to a non-technical end user or system administrator?

Related issues, pull requests, and links

Documentation

( ) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
( ) Release notes entries required with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a product test? I confirmed we can reproduce the issue in EnvSinglenode.

@guyco33
Copy link
Member Author

guyco33 commented Aug 6, 2022

I wonder what change done after 387 that makes metastoreSupportDateStatistics not to survive between executions ...

@findepi
Copy link
Member

findepi commented Aug 8, 2022

I wonder what change done after 387 that makes metastoreSupportDateStatistics not to survive between executions ...

#12806 is the only candidate i could find.

If metastoreSupportsDateStatistics is reset, then ThriftHiveMetastore#chosen* fields are too, which probably incurs performance penalty as we keep retrying the exception-driven code path.

@dain, do you know whether #12806 change the lifecycle of ThriftHiveMetastore objects?

@findepi findepi added the bug label Aug 8, 2022
@findinpath
Copy link
Contributor

Please try making the commit comment a bit more expressive:

Fix write failure on hive table with only date columns

Issue happens on Apache hive 2.3.
On 387 the issue happens only when it's the first write before any other "mixed" table with both date and non-date columns.
On current release, it seems to constantly happen.
Avoid persisting empty column statistics

....

https://github.com/trinodb/trino/blob/master/.github/DEVELOPMENT.md#format-git-commit-messages

(and try to keep the length of the lines in the commit less than 80 chars)

@guyco33
Copy link
Member Author

guyco33 commented Aug 8, 2022

I approve that 388 behaves the same as 391 so the resetting issue of metastoreSupportsDateStatistics started on 388

@guyco33 guyco33 changed the title Fix write failure on hive table with only date columns Avoid writing to Hive an empty list of column stats because Hive 2.x fails Aug 8, 2022
@findepi findepi merged commit 760615f into trinodb:master Aug 9, 2022
@findepi
Copy link
Member

findepi commented Aug 9, 2022

This is a good change, but I am still concerned about #13502 (comment).
If metastoreSupportsDateStatistics is reset, the chosen* fields are likely reset as well. This means we have for-ever-exception-driven logic in thrift calls.

@guyco33 are you able to verify this?

cc @dain @electrum

@findepi findepi mentioned this pull request Aug 9, 2022
@github-actions github-actions bot added this to the 393 milestone Aug 9, 2022
@guyco33
Copy link
Member Author

guyco33 commented Aug 9, 2022

Thrift calls will reach the exception code only for hive Metastore that doesn't support date statistics - It should be for earlier hive versions before 1.2.0
I didn't find product tests for such earlier hive Metastore versions in EnvSinglenode

@guyco33
Copy link
Member Author

guyco33 commented Aug 10, 2022

It seems now that every table with both date and non-date columns will execute 2 Thrift calls while the code that test the value of metastoreSupportDateStatistics will never execute.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

4 participants