-
Notifications
You must be signed in to change notification settings - Fork 2.9k
Hive: Add timeout for TestHiveIcebergStorageHandlerWithEngine tests #2448
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Can we use as a global timeout rule in the class, and only override it for that single test method where more time is needed? |
I have not known about this before. Added timeout by a global rule |
|
Based on the logs the current issue is that the HMS thread pool is exhausted: This might, or might not be the root cause. |
|
We need the pool size change. Failed with the original size a second time |
massdosage
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, fun!
|
@RussellSpitzer: Could you please review? Thanks, |
RussellSpitzer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable to me :)
| public void start() { | ||
| // Create a copy of the HiveConf for the metastore | ||
| metastore.start(new HiveConf(hs2Conf)); | ||
| metastore.start(new HiveConf(hs2Conf), 10); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we know why the pool is exhausted? In the past, we had a few leaks in the Spark catalog code which led to this. It can be also a valid use case too if we simply need a larger pool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, worth creating an issue if we want to take a look later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or is it related to the recent change around how we manage the pool?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have 2 ideas:
- Maybe this was caused by the Hive: Configure catalog type on table level. #2129 where we start to handle / cache the catalogs differently
- Or we added some more queries and Hive creates / caches HMSClients and we might ended up using a different codepath.
Created #2474
|
Thanks @marton-bod, @massdosage and @RussellSpitzer for the review! |
In #2437 we seem to hit an issue where the tests are failing because of resource problems.
Github infra handles the timeouts in this case, but when there is a timeout the logs are not collected.
We should handle the timeouts on the tests itself. This is a good practice and in this case we might be able to squeeze out more information about the problems from the logs.
Used a 40s timeout in most tests (usually they run around 5-10 s on my local machine), and used 100s for longer tests (they run around 20-30 s on my local machine). Let's see how they behave on the CI infra.