Add Delta Lake product tests #11565
Does MinIO need its own module, or can it go into one of the existing shared ones, like
@alexjo2144 Addressed the issue by creating the MinIO bucket directory on the fly while configuring MinIO via Testcontainers.
For now I am intentionally not fixing the Git conflicts, in order to keep the base of the current PR, which is used in a private downstream project.
This should go in the common/Minio file. It's not clear here what's special about the /data/ directory
Isn't /data where the MinIO Docker container stores buckets?
/data is the argument given to minio server to point to the directory where the buckets are physically stored.
The Docker container process is started with:
.withCommand("server", "--address", format("0.0.0.0:%d", MINIO_PORT), "/data")
I think that this customisation is specific to EnvSinglenodeDeltaLakeOss and not a common setting for MinioContainer.
We should at least have /data/ stored in a static class variable on Minio then
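A minimal sketch of that suggestion, assuming a hypothetical `MinioSettings` holder class: the names `MinioSettings`, `MINIO_DATA_DIR`, `serverCommand`, and `bucketDirectory` are illustrative and not taken from the PR, and the port value is an assumption. It only shows how a single constant could feed both the `server` command arguments and the bucket directory path.

```java
import java.util.List;

// Hypothetical helper; the names below are illustrative and not taken from the PR.
final class MinioSettings
{
    // Directory passed to `minio server`; buckets are stored as its subdirectories.
    static final String MINIO_DATA_DIR = "/data";
    // Assumed port value, for illustration only.
    static final int MINIO_PORT = 9080;

    private MinioSettings() {}

    // Arguments for the container command, mirroring the withCommand(...) call quoted in this thread.
    static List<String> serverCommand()
    {
        return List.of("server", "--address", String.format("0.0.0.0:%d", MINIO_PORT), MINIO_DATA_DIR);
    }

    // Path of the directory backing a bucket inside the container.
    static String bucketDirectory(String bucketName)
    {
        return MINIO_DATA_DIR + "/" + bucketName;
    }
}
```

With a shape like this, an environment class could pass `MinioSettings.serverCommand().toArray(new String[0])` to `withCommand` and create `MinioSettings.bucketDirectory(...)` before startup, so the meaning of /data is documented in one place.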
Change of plan. Fixing the conflicts.
Good question.
Allow the Delta Lake product tests to make use of the Delta Lake testing resources.
Rebased on
@alexjo2144 PTAL
The Delta Lake product tests can all be executed with the SuiteDeltaLake suite class.

The following product test environments are exposed:
- single-node-delta-lake-oss: used to test the compatibility of the Trino Delta Lake connector with Apache Spark with Delta OSS
- single-node-delta-lake-databricks: used to test the compatibility of the Trino Delta Lake connector with Delta Lake Databricks
- single-node-delta-lake-kerberized-hdfs: used to test the Delta Lake connector on top of a kerberized Hadoop environment
- single-node-minio-data-lake: lightweight environment that can be used to test the Lakehouse connectors with HMS & MinIO

The aim of the Delta Lake product tests is to ensure compatibility with both implementations of Delta Lake:
- Delta OSS
- Databricks Delta

These product tests were originally written for the Starburst Enterprise Delta Lake connector.

Co-authored by various engineers at Starburst Data:
Co-authored-by: Piotr Findeisen <piotr.findeisen@gmail.com>
Co-authored-by: Alex Jo <alex.jo@starburstdata.com>
Co-authored-by: Łukasz Osipiuk <lukasz@osipiuk.net>
Co-authored-by: Konrad Dziedzic <konraddziedzic@gmail.com>
Co-authored-by: Adam J. Shook <shook@datacatessen.com>
Co-authored-by: Mateusz Gajewski <mateusz.gajewski@gmail.com>
Co-authored-by: Gaurav Sehgal <gaurav.sehgal8297@gmail.com>
Co-authored-by: Raunaq Morarka <raunaqmorarka@gmail.com>
Co-authored-by: Ashhar Hasan <ashhar.hasan@starburstdata.com>
Co-authored-by: Michał Ślizak <michal.slizak+github@gmail.com>
Co-authored-by: Grzegorz Kokosiński <grzegorz@starburstdata.com>
Co-authored-by: Arkadiusz Czajkowski <arek@starburstdata.com>
Co-authored-by: Jacob I. Komissar <jacob.komissar@starburstdata.com>
Co-authored-by: Krzysztof Sobolewski <krzysztof.sobolewski@starburstdata.com>
Co-authored-by: Krzysztof Skrzypczynski <krzysztof.skrzypczynski@starburstdata.com>
Co-authored-by: Yuya Ebihara <yuya.ebihara@starburstdata.com>
Co-authored-by: Praveen Krishna <praveenkrishna@tutanota.com>
Co-authored-by: Karol Sobczak <napewnotrafi@gmail.com>
Co-authored-by: Sasha Sheikin <myminitrue@gmail.com>
Co-authored-by: Szymon Homa <szymon.homa@starburstdata.com>
Description
Expose Delta Lake product tests
TODOs:
Figure out how to set up a Databricks environment to be used by the Delta Lake connector tests.
Open issues:
- no auto-restart when querying a terminated cluster - can be solved by creating a new cluster via Databricks Clusters API v2
- no instance profiles functionality available on community clusters - this is a serious limitation because the Delta Lake connector tests create tables backed by AWS S3 buckets
- only one cluster can be created per Community account - can be solved by creating multiple accounts in order to test Databricks 9.1 LTS, 7.3 LTS
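For the first open issue, a rough sketch of what creating a cluster through the Clusters API could look like, assuming a workspace where the REST API and a personal access token are available (which may not hold for Community accounts). The class and method names are hypothetical; the request targets the `/api/2.0/clusters/create` endpoint with the `cluster_name`, `spark_version`, `node_type_id`, and `num_workers` fields, and the values passed in are placeholders to adjust. The request is only built here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper; class, method, and parameter names are illustrative.
final class DatabricksClusterRequests
{
    private DatabricksClusterRequests() {}

    // Builds (but does not send) a request asking Databricks to create a single-worker cluster.
    static HttpRequest createCluster(String workspaceUrl, String token, String clusterName, String sparkVersion, String nodeTypeId)
    {
        // Inputs are assumed not to need JSON escaping; a real implementation should use a JSON library.
        String body = String.format(
                "{\"cluster_name\": \"%s\", \"spark_version\": \"%s\", \"node_type_id\": \"%s\", \"num_workers\": 1}",
                clusterName, sparkVersion, nodeTypeId);
        return HttpRequest.newBuilder(URI.create(workspaceUrl + "/api/2.0/clusters/create"))
                .header("Authorization", "Bearer " + token)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

The resulting request could be sent with `java.net.http.HttpClient` before a test run whenever the previous cluster is found terminated.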
Tests
Delta Lake connector
This change contributes to ensuring accuracy of the functionality exposed by the Delta Lake connector.
Related issues, pull requests, and links
Documentation
(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
(x) No release notes entries required.
( ) Release notes entries required with the following suggested text: