
Improve startup time for tests using HiveMinioDataLake#14561

Closed
findepi wants to merge 1 commit into trinodb:master from
findepi:findepi/improve-startup-time-for-tests-using-hiveminiodatalake-c20056


@findepi
Member

findepi commented Oct 11, 2022

HiveMinioDataLake uses HiveHadoop only for its metastore service. Tests using MinIO don't use HDFS, so they shouldn't have to wait for it to start.

Startup is especially slow on Apple M1 chips. This change brings container startup time down from ~43s to ~13s.
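A minimal sketch of the idea, assuming (as the container log later in this thread suggests) that each service in the image runs as a supervisord program with its own file under `/etc/supervisord.d/`, and that the metastore's file is simply left in place. `HadoopEntrypoint` and `metastoreOnlyCommand` are hypothetical names, not the PR's code; the command mirrors what the log shows being run: delete the unneeded program configs, then `exec` the image's original entrypoint.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not the PR's actual code: builds a shell command that
// disables HiveServer2, YARN and HDFS before starting the image's entrypoint.
final class HadoopEntrypoint
{
    // supervisord program configs removed in the container log; the metastore's
    // config is assumed to live alongside them and is deliberately kept
    static final List<String> UNNEEDED_SERVICES = List.of(
            "hive-server2",
            "yarn-nodemanager",
            "yarn-resourcemanager",
            "hdfs-namenode",
            "hdfs-datanode");

    private HadoopEntrypoint() {}

    static String metastoreOnlyCommand(String originalEntrypoint)
    {
        // one "rm -v" per unneeded service, chained so a failure stops the script
        String removals = UNNEEDED_SERVICES.stream()
                .map(service -> "rm -v /etc/supervisord.d/" + service + ".conf")
                .collect(Collectors.joining(" && "));
        // set -x traces each command (the "+ rm -v ..." lines in the log);
        // exec makes the original entrypoint replace this shell as the main process
        return "set -xeu; " + removals + " && exec " + originalEntrypoint;
    }

    public static void main(String[] args)
    {
        System.out.println(metastoreOnlyCommand("/usr/local/hadoop-run.sh"));
    }
}
```

The resulting string could be passed to the container as `/bin/bash -c <command>` (for example through Testcontainers' `GenericContainer.withCommand(...)`); whether that interacts correctly with the image's own entrypoint depends on how the image is built, so treat that wiring as an assumption.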

findepi added the no-release-notes label Oct 11, 2022
cla-bot added the cla-signed label Oct 11, 2022
findepi force-pushed the findepi/improve-startup-time-for-tests-using-hiveminiodatalake-c20056 branch from 80cb374 to bd008fe October 11, 2022 08:32
findepi added the test label Oct 11, 2022
Member

aczajkowski left a comment


NICE 🎉

Member


NIT: maybe withHdfsAndHiveRuntime(false) or withHdfsAndHiveRuntimeDisabled()
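The naming suggestion above could look like this in a builder. This is a hypothetical sketch (`HiveHadoopSketch` is not the real `HiveHadoop` class and its builder may differ), showing the boolean form and the intention-revealing shorthand side by side, with the full runtime enabled by default so existing callers keep their current behavior.

```java
// Hypothetical builder illustrating the suggested method names; not the actual
// HiveHadoop API.
final class HiveHadoopSketch
{
    private final boolean hdfsAndHiveRuntimeEnabled;

    private HiveHadoopSketch(boolean hdfsAndHiveRuntimeEnabled)
    {
        this.hdfsAndHiveRuntimeEnabled = hdfsAndHiveRuntimeEnabled;
    }

    boolean isHdfsAndHiveRuntimeEnabled()
    {
        return hdfsAndHiveRuntimeEnabled;
    }

    static Builder builder()
    {
        return new Builder();
    }

    static final class Builder
    {
        // HDFS, YARN and HiveServer2 stay enabled unless a caller opts out
        private boolean hdfsAndHiveRuntimeEnabled = true;

        // boolean-parameter form from the review comment
        Builder withHdfsAndHiveRuntime(boolean enabled)
        {
            this.hdfsAndHiveRuntimeEnabled = enabled;
            return this;
        }

        // shorthand for the metastore-only case
        Builder withHdfsAndHiveRuntimeDisabled()
        {
            return withHdfsAndHiveRuntime(false);
        }

        HiveHadoopSketch build()
        {
            return new HiveHadoopSketch(hdfsAndHiveRuntimeEnabled);
        }
    }
}
```

Offering both is a judgment call: the boolean form is convenient when the value comes from a parameter, while `withHdfsAndHiveRuntimeDisabled()` reads better at a fixed call site such as `HiveMinioDataLake`.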

Contributor


What about using a dedicated HMS Docker image instead of stripping the services on the fly?

https://hub.docker.com/r/starburstdata/hive-metastore may be a viable alternative.

Contributor


There is also @nineinchnick's effort to add multi-arch support to the existing images; see trinodb/docker-images#143.

Not all images include multi-arch support yet, but hive3.1-hive does, so I guess we could easily use that image for the Iceberg and Delta tests.

Member Author


> What about using a dedicated HMS Docker image instead of stripping the services on the fly?

Good idea, if someone is going to maintain it.

We would need one for every Hive configuration used in these tests; see
`io.trino.plugin.hive.containers.HiveHadoop#DEFAULT_IMAGE` and `io.trino.plugin.hive.containers.HiveHadoop#HIVE3_IMAGE`.

@findinpath
Contributor

2022-10-11T07:05:02.285-0500 INFO hadoop              | Disabling HDF and Hive runtime in the container
2022-10-11T07:05:02.285-0500 INFO hadoop              | removed `/etc/supervisord.d/hive-server2.conf'
2022-10-11T07:05:02.286-0500 INFO hadoop              | removed `/etc/supervisord.d/yarn-nodemanager.conf'
2022-10-11T07:05:02.286-0500 INFO hadoop              | removed `/etc/supervisord.d/yarn-resourcemanager.conf'
2022-10-11T07:05:02.286-0500 INFO hadoop              | removed `/etc/supervisord.d/hdfs-namenode.conf'
2022-10-11T07:05:02.286-0500 INFO hadoop              | removed `/etc/supervisord.d/hdfs-datanode.conf'
2022-10-11T07:05:02.286-0500 INFO hadoop              | + rm -v /etc/supervisord.d/hive-server2.conf
2022-10-11T07:05:02.286-0500 INFO hadoop              | + rm -v /etc/supervisord.d/yarn-nodemanager.conf
2022-10-11T07:05:02.287-0500 INFO hadoop              | + rm -v /etc/supervisord.d/yarn-resourcemanager.conf
2022-10-11T07:05:02.287-0500 INFO hadoop              | + rm -v /etc/supervisord.d/hdfs-namenode.conf
2022-10-11T07:05:02.287-0500 INFO hadoop              | + rm -v /etc/supervisord.d/hdfs-datanode.conf
2022-10-11T07:05:02.287-0500 INFO hadoop              | + exec /usr/local/hadoop-run.sh
2022-10-11T07:05:02.287-0500 INFO hadoop              | /bin/bash: line 7: /usr/local/hadoop-run.sh: Permission denied
2022-10-11T07:05:02.356-0500 INFO hadoop              | (exited)
2022-10-11T07:05:02.534-0500 SEVERE Could not start container
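The trace shows `exec /usr/local/hadoop-run.sh` failing with `Permission denied`, which typically means the script lacks the execute bit for the container user. One possible workaround (an assumption, not necessarily what the PR ended up doing) is to run the script through the interpreter, since `bash script` needs only read permission on the file. `EntrypointInvocation` and `viaBash` are hypothetical names.

```java
import java.util.List;

// Hypothetical helper showing one possible workaround, not the PR's fix:
// run the entrypoint script through bash instead of executing it directly.
final class EntrypointInvocation
{
    private EntrypointInvocation() {}

    // Returns a command array of the form: /bin/bash -c "<setup>\nexec /bin/bash <script>".
    // "bash <script>" only needs read permission on the file, so a missing
    // execute bit no longer causes "Permission denied" (assumes a bash script).
    static List<String> viaBash(String setupCommands, String entrypointScript)
    {
        String script = setupCommands + "\nexec /bin/bash " + entrypointScript;
        return List.of("/bin/bash", "-c", script);
    }

    public static void main(String[] args)
    {
        viaBash("set -xeu", "/usr/local/hadoop-run.sh").forEach(System.out::println);
    }
}
```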

findepi force-pushed the findepi/improve-startup-time-for-tests-using-hiveminiodatalake-c20056 branch 4 times, most recently from 4cae5cb to 4ee3970 October 13, 2022 13:37
findepi self-assigned this Oct 14, 2022
findepi mentioned this pull request Oct 25, 2022
findepi force-pushed the findepi/improve-startup-time-for-tests-using-hiveminiodatalake-c20056 branch from 5feade6 to 676d601 October 25, 2022 08:05
@findepi
Member Author

findepi commented Oct 25, 2022

(just rebased to remove commits extracted to #14742)

findepi deleted the findepi/improve-startup-time-for-tests-using-hiveminiodatalake-c20056 branch December 5, 2022 22:30

Labels

cla-signed, no-release-notes, test


7 participants