Conversation

@pvary (Contributor) commented Apr 9, 2021:

In #2437 we seem to be hitting an issue where the tests fail because of resource problems.

GitHub infra handles the timeouts in this case, but when a timeout occurs, the logs are not collected.

We should handle the timeouts in the tests themselves. This is good practice, and in this case it might let us squeeze more information about the problems out of the logs.

I used a 40 s timeout in most tests (they usually run in 5-10 s on my local machine), and 100 s for the longer tests (they run in 20-30 s on my local machine). Let's see how they behave on the CI infra.

@github-actions bot added the MR label Apr 9, 2021
@marton-bod (Collaborator) commented:
Can we use

@Rule
public Timeout timeout = new Timeout(40000, TimeUnit.MILLISECONDS);

as a global timeout rule in the class, and only override it for that single test method where more time is needed?

@pvary (Contributor, Author) commented Apr 9, 2021:

> Can we use
>
> @Rule
> public Timeout timeout = new Timeout(40000, TimeUnit.MILLISECONDS);
>
> as a global timeout rule in the class, and only override it for that single test method where more time is needed?

I did not know about this before. Added the timeout via a global rule.
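For context, a global Timeout rule works by imposing a time limit on every test in the class; conceptually it amounts to running the test body with a deadline and failing once that deadline expires. Below is a minimal stdlib sketch of that mechanism. The `runWithTimeout` helper is hypothetical, written only to illustrate the idea, and is not JUnit's actual implementation:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutSketch {
  // Runs the test body on a worker thread and reports failure if it does
  // not finish within the limit, roughly what a global timeout rule does.
  static boolean runWithTimeout(Runnable testBody, long millis) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    Future<?> result = executor.submit(testBody);
    try {
      result.get(millis, TimeUnit.MILLISECONDS);
      return true;                    // finished in time
    } catch (TimeoutException e) {
      result.cancel(true);            // interrupt the stuck test thread
      return false;                   // report a timeout failure
    } catch (Exception e) {
      return false;                   // the body itself threw
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println("fast test passes: " + runWithTimeout(() -> { }, 1000));
    System.out.println("slow test fails:  " + runWithTimeout(() -> {
      try { Thread.sleep(5000); } catch (InterruptedException ignored) { }
    }, 100));
  }
}
```

The advantage over relying on the CI-level kill is that the test fails in-process, so the test runner can still collect and report the logs.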

@pvary (Contributor, Author) commented Apr 12, 2021:

Based on the logs, the current issue is that the HMS thread pool is exhausted:

StdErr [pool-33-thread-1] WARN org.apache.thrift.server.TThreadPoolServer - Task has been rejected by ExecutorService 10 times till timedout, reason: java.util.concurrent.RejectedExecutionException: Task org.apache.thrift.server.TThreadPoolServer$WorkerProcess@4c3ca2ea rejected from java.util.concurrent.ThreadPoolExecutor@7dcefa7e[Running, pool size = 5, active threads = 5, queued tasks = 0, completed tasks = 12]
StdErr [24fd6de1-23be-4f21-8aab-b48ebd01f071 Time-limited test] WARN hive.metastore - set_ugi() not successful, Likely cause: new client talking to old server. Continuing without it.
StdErr org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset
StdErr 	at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
StdErr 	at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
StdErr 	at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
StdErr 	at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
StdErr 	at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
StdErr 	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
StdErr 	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_set_ugi(ThriftHiveMetastore.java:4247)
StdErr 	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.set_ugi(ThriftHiveMetastore.java:4233)
StdErr 	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:498)
StdErr 	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:247)
StdErr 	at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:70)
StdErr 	at jdk.internal.reflect.GeneratedConstructorAccessor169.newInstance(Unknown Source)
StdErr 	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
StdErr 	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
StdErr 	at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1707)
StdErr 	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:83)
StdErr 	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:133)
StdErr 	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
StdErr 	at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3600)
StdErr 	at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3652)
StdErr 	at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3632)
StdErr 	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:493)
StdErr 	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
StdErr 	at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1295)
StdErr 	at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:204)
StdErr 	at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
StdErr 	at org.apache.hive.service.cli.operation.Operation.run(Operation.java:320)
StdErr 	at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
StdErr 	at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatement(HiveSessionImpl.java:500)
StdErr 	at org.apache.hive.service.cli.CLIService.executeStatement(CLIService.java:265)
StdErr 	at org.apache.iceberg.mr.hive.TestHiveShell.executeStatement(TestHiveShell.java:138)
StdErr 	at org.apache.iceberg.mr.hive.TestHiveIcebergStorageHandlerWithEngine.testJoinTablesSupportedTypes(TestHiveIcebergStorageHandlerWithEngine.java:266)

This might or might not be the root cause.
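The `RejectedExecutionException` above is standard `ThreadPoolExecutor` behavior once every worker is busy and the queue cannot absorb another task (note `pool size = 5, active threads = 5, queued tasks = 0` in the log). A stdlib sketch reproducing that failure mode with a fixed pool of 5; this is an analogy, not the actual TThreadPoolServer setup:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolExhaustionDemo {
  // Occupies every worker in a fixed-size pool (no task queue), then
  // submits one more task; returns true if that extra task is rejected.
  static boolean extraTaskRejected(int poolSize) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        poolSize, poolSize, 0L, TimeUnit.MILLISECONDS, new SynchronousQueue<>());
    CountDownLatch release = new CountDownLatch(1);
    try {
      for (int i = 0; i < poolSize; i++) {      // block all workers
        pool.execute(() -> {
          try { release.await(); } catch (InterruptedException ignored) { }
        });
      }
      pool.execute(() -> { });                  // one task too many
      return false;
    } catch (RejectedExecutionException e) {
      return true;                              // same failure mode as the log
    } finally {
      release.countDown();
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println("6th task rejected by pool of 5: " + extraTaskRejected(5));
  }
}
```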

@pvary (Contributor, Author) commented Apr 13, 2021:

We need the pool size change; the tests failed with the original pool size a second time.

@massdosage (Contributor) left a comment:
LGTM, fun!

@pvary (Contributor, Author) commented Apr 13, 2021:

@RussellSpitzer: Could you please review?

Thanks,
Peter

@RussellSpitzer (Member) left a comment:
Looks reasonable to me :)

  public void start() {
    // Create a copy of the HiveConf for the metastore
-   metastore.start(new HiveConf(hs2Conf));
+   metastore.start(new HiveConf(hs2Conf), 10);
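The diff raises the metastore's worker pool to 10 (assuming the second argument to `metastore.start` is the pool size, as the surrounding discussion suggests). In `ThreadPoolExecutor` terms, the effect is simply more headroom for concurrent clients. A stdlib sketch of the before/after behavior with, say, 6 concurrent connections:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolSizeDemo {
  // Returns true if `tasks` concurrent blocking tasks all fit into a
  // fixed pool of `poolSize` workers with no task queue.
  static boolean allAccepted(int poolSize, int tasks) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        poolSize, poolSize, 0L, TimeUnit.MILLISECONDS, new SynchronousQueue<>());
    CountDownLatch release = new CountDownLatch(1);
    try {
      for (int i = 0; i < tasks; i++) {
        pool.execute(() -> {
          try { release.await(); } catch (InterruptedException ignored) { }
        });
      }
      return true;
    } catch (RejectedExecutionException e) {
      return false;                   // pool exhausted, task rejected
    } finally {
      release.countDown();
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println("6 clients, pool of 5:  " + allAccepted(5, 6));   // false
    System.out.println("6 clients, pool of 10: " + allAccepted(10, 6));  // true
  }
}
```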
A Contributor commented:

Do we know why the pool is exhausted? In the past, we had a few leaks in the Spark catalog code which led to this. It can also be a valid use case if we simply need a larger pool.

A Contributor commented:

Maybe it's worth creating an issue if we want to take a look later.

A Contributor commented:

Or is it related to the recent change around how we manage the pool?

@pvary (Contributor, Author) commented:

I have two ideas:

  • Maybe this was caused by "Hive: Configure catalog type on table level" (#2129), where we started to handle/cache the catalogs differently.
  • Or we added some more queries, and since Hive creates/caches HMSClients, we might have ended up using a different code path.

Created #2474

@pvary (Contributor, Author) commented Apr 14, 2021:

Thanks @marton-bod, @massdosage and @RussellSpitzer for the review!
