Test Delta Lake connector against Databricks runtime#12896
ebyhr merged 10 commits into trinodb:master
This variable name can be a bit misleading.
I'd advocate for renaming the variable to `DATABRICKS104_TEST_JDBC_URL` and the class `EnvSinglenodeDeltaLakeDatabricks` to `EnvSinglenodeDeltaLakeDatabricks104`.
...cker/presto-product-tests/conf/environment/singlenode-delta-lake-databricks/delta.properties
Rename constant to match new meaning. Separate commit.
before the change, what did this code execute against?
This code was never executed before in the context of Trino.
This code has been open sourced along with the other product tests, but not used until now.
.github/workflows/ci.yml
If you take the `TRINO_` prefix off, AWS clients will pick them up automatically.
My intention was to avoid colliding with AWS credentials used by default for the CI runner for other product tests.
Do we have AWS credentials used by default for the CI?
We still may want to configure env for the containers only, and not the PTL, but it seems like a redundant complication to me.
I was under the impression that there are s3 / glue related product tests, but after looking for product test environments that have these needs, I didn't find any.
I can change the name of the variables, but they still need to be passed to the corresponding containers.
> I was under the impression that there are s3 / glue related product tests,

They should exist, but they don't yet: #5426
Once we have them, they will be invoked explicitly in ci.yml (my assumption)
and then they can set the same env vars (with potentially different values).
We would not have the ability to define one suite that exercises Hive+Glue and Delta+Databricks, but I don't know whether that will actually be a loss.

> I can change the name of the variables, but they still need to be passed to the corresponding containers.
of course
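The trade-off discussed in this thread (prefixed names on the CI runner, standard names inside the containers) could be sketched as follows. This is only an illustration, not code from the PR: the class `PrefixedEnv` and the variable names `TRINO_AWS_ACCESS_KEY_ID` / `TRINO_AWS_SECRET_ACCESS_KEY` are hypothetical. The un-prefixed names `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are the ones the AWS SDK default credentials chain reads from the environment.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the PR.
final class PrefixedEnv
{
    private PrefixedEnv() {}

    // Returns the entries of `env` whose key starts with `prefix`,
    // with the prefix removed from the key. Other entries are dropped.
    static Map<String, String> stripPrefix(Map<String, String> env, String prefix)
    {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : env.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                result.put(entry.getKey().substring(prefix.length()), entry.getValue());
            }
        }
        return result;
    }
}
```

Calling `PrefixedEnv.stripPrefix(System.getenv(), "TRINO_")` would yield a map that could be handed to a container as its environment, so the runner's own AWS credentials (if any) stay untouched.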
Test PR with secrets: #12973
#12817 is waiting for your review
Rebased on
When the product tests for the `trino-delta-lake` connector were open sourced, the tests dedicated to the Databricks runtime depended on a Hive Metastore Service. Using AWS Glue instead avoids managing an extra service when running product tests against the Databricks runtime, so the product tests are now adapted to use AWS Glue as the metastore for the Databricks runtime.
When trying to create tables on Databricks in the schema `extraordinary`, table creation failed with the message:
```
Error in SQL statement: IllegalArgumentException: Path must be absolute: test100-__PLACEHOLDER__
```
Specifying the location for the custom schema avoids this issue.
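A minimal sketch of what specifying the schema location could look like, assuming the schema is created with a Spark/Databricks-style `CREATE SCHEMA ... LOCATION` statement. The helper class `SchemaSql` and the bucket `s3://example-bucket` are hypothetical and not taken from the PR.

```java
// Hypothetical helper, not part of the PR.
final class SchemaSql
{
    private SchemaSql() {}

    // Builds a CREATE SCHEMA statement with an explicit location.
    // Rejects locations without a URI scheme, since a relative path is
    // what the "Path must be absolute" error complains about.
    static String createSchemaWithLocation(String schemaName, String location)
    {
        if (!location.contains("://")) {
            throw new IllegalArgumentException("Location must be an absolute URI: " + location);
        }
        return String.format("CREATE SCHEMA IF NOT EXISTS %s LOCATION '%s'", schemaName, location);
    }
}
```

For example, `SchemaSql.createSchemaWithLocation("extraordinary", "s3://example-bucket/extraordinary")` produces a statement whose tables would be placed under an absolute S3 path.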
The tests in the class TestHiveAndDeltaLakeRedirect run on both Delta OSS and Databricks infrastructure and not only on Databricks.
Avoid a test failure in a test that depends on the Databricks AWS S3 and AWS Glue environment. See trinodb#13017 for further details.
Tables created for testing purposes on AWS Glue / AWS S3 have to be removed eventually. Use the prefix "test_" so that the routine responsible for removing the test Glue tables can identify them more easily.
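The prefix convention makes the cleanup selection a simple name filter. The sketch below is illustrative only: the class `TestTableFilter` is hypothetical, and the actual cleanup routine in the PR may select tables differently.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the PR.
final class TestTableFilter
{
    // Prefix named in the commit message above.
    static final String TEST_TABLE_PREFIX = "test_";

    private TestTableFilter() {}

    // Keeps only the table names that carry the test prefix,
    // so that non-test tables are never selected for removal.
    static List<String> tablesToDrop(List<String> tableNames)
    {
        return tableNames.stream()
                .filter(name -> name.startsWith(TEST_TABLE_PREFIX))
                .collect(Collectors.toList());
    }
}
```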
Merged, thanks!
Delta Lake Databricks Product Tests infrastructure
Trino currently has Delta Lake product tests for Delta OSS.
This PR adds support for Delta Lake product tests against the Databricks environment as well.
The product test environment used to test Delta Lake connector functionality on top of Databricks relies on:
`SuiteDeltaLake` has been renamed to `SuiteDeltaLakeOss` and contains virtually no changes compared to the `master` branch.
`SuiteDeltaLakeDatabricks` contains the Delta Lake tests to be executed on top of Databricks infrastructure.
This suite needs special credentials in order to perform operations on top of Databricks:
A new job `delta-lake-databricks-pt` has been added to `ci.yml`.
This job is to be executed only when the credentials required by `SuiteDeltaLakeDatabricks` are provided.
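The "only when the credentials are provided" condition can be pictured as a presence check over the required variables. In the PR this gating lives in `ci.yml`; the Java sketch below only illustrates the logic, and the class `RequiredSecrets` as well as any variable names passed to it are hypothetical. It treats empty values as missing, on the assumption that unavailable CI secrets show up as empty strings.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the PR.
final class RequiredSecrets
{
    private RequiredSecrets() {}

    // Returns true only if every required name maps to a non-empty value.
    static boolean allPresent(Map<String, String> env, List<String> requiredNames)
    {
        for (String name : requiredNames) {
            String value = env.get(name);
            if (value == null || value.isEmpty()) {
                return false;
            }
        }
        return true;
    }
}
```

A suite launcher could call `RequiredSecrets.allPresent(System.getenv(), names)` and skip the Databricks suite when it returns `false`.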