
Commit ad2a459

Add lakehouse connector
1 parent 0419b43 commit ad2a459

Showing 35 changed files with 4,078 additions and 0 deletions.

.github/workflows/ci.yml

Lines changed: 2 additions & 0 deletions
@@ -350,6 +350,7 @@ jobs:
 !:trino-ignite,
 !:trino-jdbc,
 !:trino-kafka,
+!:trino-lakehouse,
 !:trino-main,
 !:trino-mariadb,
 !:trino-memory,
@@ -470,6 +471,7 @@ jobs:
 - { modules: plugin/trino-iceberg, profile: minio-and-avro }
 - { modules: plugin/trino-ignite }
 - { modules: plugin/trino-kafka }
+- { modules: plugin/trino-lakehouse }
 - { modules: plugin/trino-mariadb }
 - { modules: plugin/trino-mongodb }
 - { modules: plugin/trino-mysql }

core/trino-server/src/main/provisio/trino.xml

Lines changed: 9 additions & 0 deletions
@@ -148,6 +148,15 @@
         </artifact>
     </artifactSet>

+    <artifactSet to="plugin/lakehouse">
+        <artifact id="${project.groupId}:trino-lakehouse:zip:${project.version}">
+            <unpack />
+        </artifact>
+        <artifact id="${project.groupId}:trino-hdfs:zip:${project.version}">
+            <unpack useRoot="true" />
+        </artifact>
+    </artifactSet>
+
     <artifactSet to="plugin/loki">
         <artifact id="${project.groupId}:trino-loki:zip:${project.version}">
             <unpack />

docs/src/main/sphinx/connector.md

Lines changed: 1 addition & 0 deletions
@@ -25,6 +25,7 @@ Iceberg <connector/iceberg>
 Ignite <connector/ignite>
 JMX <connector/jmx>
 Kafka <connector/kafka>
+Lakehouse <connector/lakehouse>
 Loki <connector/loki>
 MariaDB <connector/mariadb>
 Memory <connector/memory>

docs/src/main/sphinx/connector/lakehouse.md

Lines changed: 105 additions & 0 deletions

@@ -0,0 +1,105 @@
# Lakehouse connector

The Lakehouse connector provides a unified way to interact with data stored
in various table formats across different storage systems and metastore services.
This single connector allows you to query and write data seamlessly, regardless of
whether it's in Iceberg, Delta Lake, or Hudi table formats, or traditional Hive tables.

This connector offers flexible connectivity to popular metastore services including
AWS Glue and Hive Metastore. For data storage, it supports a wide range of options
including cloud storage services such as AWS S3, S3-compatible storage,
Google Cloud Storage (GCS), and Azure Blob Storage, as well as HDFS installations.

The connector combines the features of the
[Hive](/connector/hive), [Iceberg](/connector/iceberg),
[Delta Lake](/connector/delta-lake), and [Hudi](/connector/hudi)
connectors into a single connector. The configuration properties,
session properties, table properties, and behavior come from the underlying
connectors. Refer to the documentation of the underlying connectors
for the table formats that you are using.

## General configuration

To configure the Lakehouse connector, create a catalog properties file
`etc/catalog/example.properties` with the following content, replacing the
properties as appropriate:

```text
connector.name=lakehouse
```

You must configure an [AWS Glue or Hive metastore](/object-storage/metastores).
The `hive.metastore` property also configures the Iceberg catalog.
Do not specify `iceberg.catalog.type`.

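As a minimal sketch, a catalog file pointing at a Hive Metastore could look
like the following; the thrift URI is a hypothetical example, not part of this
commit:

```text
connector.name=lakehouse
# Hive Metastore; this same setting also configures the Iceberg catalog
hive.metastore=thrift
hive.metastore.uri=thrift://metastore.example.com:9083
```
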
You must select and configure one of the
[supported file systems](lakehouse-file-system-configuration).

## Configuration properties

The following configuration properties are available:

:::{list-table}
:widths: 30, 58, 12
:header-rows: 1

* - Property name
  - Description
  - Default
* - `lakehouse.table-type`
  - The default table type for newly created tables when the `type`
    table property is not specified. Possible values:
    * `HIVE`
    * `ICEBERG`
    * `DELTA`
  - `ICEBERG`
:::

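For example, to make tables created without an explicit `type` table property
default to Delta Lake, a sketch of the catalog file entry:

```text
# New tables default to Delta Lake instead of Iceberg
lakehouse.table-type=DELTA
```
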
(lakehouse-file-system-configuration)=
## File system access configuration

The connector supports accessing the following file systems:

* [](/object-storage/file-system-azure)
* [](/object-storage/file-system-gcs)
* [](/object-storage/file-system-s3)
* [](/object-storage/file-system-hdfs)

You must enable and configure the specific file system access.

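As an illustration, enabling the native S3 file system could look like this
sketch; the region value is an assumption for the example:

```text
# Enable native S3 file system support for the catalog
fs.native-s3.enabled=true
s3.region=us-east-1
```
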
## Examples

Create an Iceberg table:

```sql
CREATE TABLE iceberg_table (
    c1 INTEGER,
    c2 DATE,
    c3 DOUBLE
)
WITH (
    type = 'ICEBERG',
    format = 'PARQUET',
    partitioning = ARRAY['c1', 'c2'],
    sorted_by = ARRAY['c3']
);
```

87+
88+
Create a Hive table:
89+
90+
```sql
91+
CREATE TABLE hive_page_views (
92+
view_time TIMESTAMP,
93+
user_id BIGINT,
94+
page_url VARCHAR,
95+
ds DATE,
96+
country VARCHAR
97+
)
98+
WITH (
99+
type = 'HIVE',
100+
format = 'ORC',
101+
partitioned_by = ARRAY['ds', 'country'],
102+
bucketed_by = ARRAY['user_id'],
103+
bucket_count = 50
104+
)
105+
```
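The same pattern applies to the other table formats. A sketch of a Delta Lake
table follows; the table name, columns, and partitioning are assumptions for
illustration, using the Delta Lake connector's `partitioned_by` table property:

```sql
-- Hypothetical example table stored in the Delta Lake format
CREATE TABLE delta_events (
    event_time TIMESTAMP(3) WITH TIME ZONE,
    user_id BIGINT,
    event_type VARCHAR
)
WITH (
    type = 'DELTA',
    partitioned_by = ARRAY['event_type']
);
```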
