
[HUDI-2785] Add Trino setup in Docker Demo#4300

Merged
codope merged 3 commits into apache:master from yihua:HUDI-2785-trino-docker-demo
Jan 14, 2022

@yihua
Contributor

yihua commented Dec 13, 2021

What is the purpose of the pull request

This PR adds a Trino setup to the Docker Demo so that users can try querying Hudi tables synced to Hive using the Trino query engine.

Brief change log

  • Adds a Docker build config for base_java11, used by the Trino-related containers, since the Trino server can only run on Java 11 and above
  • Adds Docker build configs for trinobase (base image for the Trino containers), trinocoordinator (for the Trino Coordinator container), and trinoworker (for the Trino Worker container)
  • Adjusts the sparkadhoc Docker config so that it can run the Trino CLI
  • Adds config for the Trino setup in docker/compose/docker-compose_hadoop284_hive233_spark244.yml

Verify this pull request

The new images can be built with Docker under $HUDI_DIR/docker/hoodie/hadoop using the commands below. The new images have been uploaded to Docker Hub.

docker build sparkadhoc -t apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkadhoc_2.4.4
docker build base_java11 -t apachehudi/hudi-hadoop_2.8.4-base-java11
docker build trinobase -t apachehudi/hudi-hadoop_2.8.4-trinobase_368
docker build trinocoordinator -t apachehudi/hudi-hadoop_2.8.4-trinocoordinator_368
docker build trinoworker -t apachehudi/hudi-hadoop_2.8.4-trinoworker_368
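
The same builds can be scripted. The following is a minimal sketch, not part of this PR: it assumes it is run from $HUDI_DIR/docker/hoodie/hadoop, and both the `build_images` function and the `DRY_RUN` switch are hypothetical names introduced for illustration. The directory names and image tags are copied from the commands above.

```shell
#!/usr/bin/env bash
# Sketch only: build_images and DRY_RUN are hypothetical, not part of the PR.
set -eu

# "build-context-directory image-tag" pairs, in the same order as the commands above.
IMAGES=(
  "sparkadhoc apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkadhoc_2.4.4"
  "base_java11 apachehudi/hudi-hadoop_2.8.4-base-java11"
  "trinobase apachehudi/hudi-hadoop_2.8.4-trinobase_368"
  "trinocoordinator apachehudi/hudi-hadoop_2.8.4-trinocoordinator_368"
  "trinoworker apachehudi/hudi-hadoop_2.8.4-trinoworker_368"
)

build_images() {
  local entry dir tag
  for entry in "${IMAGES[@]}"; do
    dir="${entry%% *}"   # text before the first space: the build context directory
    tag="${entry#* }"    # text after the first space: the image tag
    if [ "${DRY_RUN:-1}" = "1" ]; then
      # Default: only print the command that would be run.
      echo "docker build ${dir} -t ${tag}"
    else
      docker build "${dir}" -t "${tag}"
    fi
  done
}
```

Calling `build_images` prints the five commands; `DRY_RUN=0 build_images` would actually run them.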

The changes can be verified by running through the Docker Demo with additional Trino queries:

ethan@Ethans-MacBook-Pro ~/W/r/h/docker (HUDI-2785-trino-docker-demo) > docker exec -it adhoc-2 trino --server trino-coordinator-1:8091
trino> show catalogs;
 Catalog 
---------
 hive    
 system  
(2 rows)

Query 20220112_055038_00000_sac73, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
3.74 [0 rows, 0B] [0 rows/s, 0B/s]

trino> use hive.default;
USE
trino:default> show tables;
       Table        
--------------------
 stock_ticks_cow    
 stock_ticks_mor_ro 
 stock_ticks_mor_rt 
(3 rows)

Query 20220112_055050_00003_sac73, FINISHED, 2 nodes
Splits: 19 total, 19 done (100.00%)
1.84 [3 rows, 102B] [1 rows/s, 55B/s]

trino:default> select symbol, max(ts) from stock_ticks_cow group by symbol HAVING symbol = 'GOOG';
 symbol |        _col1        
--------+---------------------
 GOOG   | 2018-08-31 10:29:00 
(1 row)

Query 20220112_055101_00005_sac73, FINISHED, 1 node
Splits: 49 total, 49 done (100.00%)
4.08 [197 rows, 442KB] [48 rows/s, 108KB/s]

trino:default> select "_hoodie_commit_time", symbol, ts, volume, open, close from stock_ticks_cow where symbol = 'GOOG';
 _hoodie_commit_time | symbol |         ts          | volume |   open    |  close   
---------------------+--------+---------------------+--------+-----------+----------
 20220112054822108   | GOOG   | 2018-08-31 09:59:00 |   6330 |    1230.5 |  1230.02 
 20220112054822108   | GOOG   | 2018-08-31 10:29:00 |   3391 | 1230.1899 | 1230.085 
(2 rows)

Query 20220112_055113_00006_sac73, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0.40 [197 rows, 450KB] [487 rows/s, 1.09MB/s]

trino:default> select symbol, max(ts) from stock_ticks_mor_ro group by symbol HAVING symbol = 'GOOG';
 symbol |        _col1        
--------+---------------------
 GOOG   | 2018-08-31 10:29:00 
(1 row)

Query 20220112_055125_00007_sac73, FINISHED, 1 node
Splits: 49 total, 49 done (100.00%)
0.50 [197 rows, 442KB] [395 rows/s, 888KB/s]

trino:default> select "_hoodie_commit_time", symbol, ts, volume, open, close  from stock_ticks_mor_ro where  symbol = 'GOOG';
 _hoodie_commit_time | symbol |         ts          | volume |   open    |  close   
---------------------+--------+---------------------+--------+-----------+----------
 20220112054844841   | GOOG   | 2018-08-31 09:59:00 |   6330 |    1230.5 |  1230.02 
 20220112054844841   | GOOG   | 2018-08-31 10:29:00 |   3391 | 1230.1899 | 1230.085 
(2 rows)

Query 20220112_055136_00008_sac73, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0.49 [197 rows, 450KB] [404 rows/s, 924KB/s]

trino:default> exit

***** After second batch *****

ethan@Ethans-MacBook-Pro ~/W/r/h/docker (HUDI-2785-trino-docker-demo)> docker exec -it adhoc-2 trino --server trino-coordinator-1:8091
trino> use hive.default;
USE
trino:default> select symbol, max(ts) from stock_ticks_cow group by symbol HAVING symbol = 'GOOG';
 symbol |        _col1        
--------+---------------------
 GOOG   | 2018-08-31 10:59:00 
(1 row)

Query 20220112_055443_00012_sac73, FINISHED, 1 node
Splits: 49 total, 49 done (100.00%)
0.63 [197 rows, 442KB] [310 rows/s, 697KB/s]

trino:default> select "_hoodie_commit_time", symbol, ts, volume, open, close  from stock_ticks_cow where  symbol = 'GOOG';
 _hoodie_commit_time | symbol |         ts          | volume |   open    |  close   
---------------------+--------+---------------------+--------+-----------+----------
 20220112054822108   | GOOG   | 2018-08-31 09:59:00 |   6330 |    1230.5 |  1230.02 
 20220112055352654   | GOOG   | 2018-08-31 10:59:00 |   9021 | 1227.1993 | 1227.215 
(2 rows)

Query 20220112_055450_00013_sac73, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0.65 [197 rows, 450KB] [303 rows/s, 692KB/s]

trino:default> select symbol, max(ts) from stock_ticks_mor_ro group by symbol HAVING symbol = 'GOOG';
 symbol |        _col1        
--------+---------------------
 GOOG   | 2018-08-31 10:29:00 
(1 row)

Query 20220112_055500_00014_sac73, FINISHED, 1 node
Splits: 49 total, 49 done (100.00%)
0.59 [197 rows, 442KB] [336 rows/s, 756KB/s]

trino:default> select "_hoodie_commit_time", symbol, ts, volume, open, close  from stock_ticks_mor_ro where  symbol = 'GOOG';
 _hoodie_commit_time | symbol |         ts          | volume |   open    |  close   
---------------------+--------+---------------------+--------+-----------+----------
 20220112054844841   | GOOG   | 2018-08-31 09:59:00 |   6330 |    1230.5 |  1230.02 
 20220112054844841   | GOOG   | 2018-08-31 10:29:00 |   3391 | 1230.1899 | 1230.085 
(2 rows)

Query 20220112_055506_00015_sac73, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0.35 [197 rows, 450KB] [556 rows/s, 1.24MB/s]

trino:default> exit
ethan@Ethans-MacBook-Pro ~/W/r/h/docker (HUDI-2785-trino-docker-demo)> 
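
For repeated checks, the queries above could also be run non-interactively. The sketch below is an assumption-laden illustration rather than something this PR ships: `trino_query` and `DOCKER_BIN` are hypothetical names, while the container name, server address, and `hive.default` schema come from the session above. It relies on the Trino CLI's `--catalog`, `--schema`, and `--execute` options and drops `-it` because no terminal is needed.

```shell
#!/usr/bin/env bash
# Sketch only: trino_query and DOCKER_BIN are hypothetical, not part of the PR.
set -eu

# Values taken from the interactive session above.
ADHOC_CONTAINER="adhoc-2"
TRINO_SERVER="trino-coordinator-1:8091"

# Run a single SQL statement against hive.default through the Trino CLI
# inside the adhoc container. DOCKER_BIN can be overridden (for example
# with "echo") to inspect the command without running it.
trino_query() {
  local sql="$1"
  "${DOCKER_BIN:-docker}" exec "${ADHOC_CONTAINER}" trino \
    --server "${TRINO_SERVER}" \
    --catalog hive --schema default \
    --execute "${sql}"
}
```

For example, `trino_query "select symbol, max(ts) from stock_ticks_cow group by symbol HAVING symbol = 'GOOG'"` would issue the first aggregate query from the session, provided the demo containers are up.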

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

yihua force-pushed the HUDI-2785-trino-docker-demo branch from a31dff6 to 3d67017 on December 16, 2021 01:05
@vinothchandar
Member

Could we also get the docs updated

Comment thread docker/compose/docker-compose_hadoop284_hive233_spark244.yml Outdated
Comment thread docker/hoodie/hadoop/trinoworker/etc/catalog/jmx.properties Outdated
@yihua
Contributor Author

yihua commented Jan 12, 2022

Could we also get the docs updated

Yes, I'll follow up on this.

yihua force-pushed the HUDI-2785-trino-docker-demo branch from 3d67017 to aa03fbd on January 12, 2022 05:14
apache deleted a comment from hudi-bot Jan 13, 2022
@yihua
Contributor Author

yihua commented Jan 13, 2022

DOCS PR here: #4577

apache deleted a comment from hudi-bot Jan 13, 2022
yihua force-pushed the HUDI-2785-trino-docker-demo branch from 28c1a04 to b8ff466 on January 14, 2022 07:59
@hudi-bot
Collaborator

CI report:

Bot commands
@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

codope merged commit 53f75f8 into apache:master Jan 14, 2022
vinishjail97 mentioned this pull request Jan 24, 2022
5 tasks
vingov pushed a commit to vingov/hudi that referenced this pull request Jan 26, 2022
* [HUDI-2785] Add Trino setup in Docker Demo

* Update docker account and remove unnecessary configs

* Adjust sparkadhoc Dockerfile
liusenhua pushed a commit to liusenhua/hudi that referenced this pull request Mar 1, 2022
* [HUDI-2785] Add Trino setup in Docker Demo

* Update docker account and remove unnecessary configs

* Adjust sparkadhoc Dockerfile
vingov pushed a commit to vingov/hudi that referenced this pull request Apr 3, 2022
* [HUDI-2785] Add Trino setup in Docker Demo

* Update docker account and remove unnecessary configs

* Adjust sparkadhoc Dockerfile
4 participants