# Add regtests for Spark client to test built jars #1402
**New file** — GitHub Actions workflow (`+61` lines):

```yaml
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#

name: Spark Client Regression Tests
on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]

jobs:
  regtest:
    runs-on: ubuntu-latest
    permissions:
      contents: read

    steps:
      - uses: actions/checkout@v4

      - name: Set up JDK 21
        uses: actions/setup-java@v4
        with:
          java-version: '21'
          distribution: 'temurin'

      - name: Fix permissions
        run: mkdir -p regtests/output && chmod 777 regtests/output && chmod 777 regtests/t_*/ref/*

      - name: Project build
        run: ./gradlew build

      - name: Image build
        run: |
          ./gradlew \
            :polaris-quarkus-server:assemble \
            :polaris-quarkus-server:quarkusAppPartsBuild --rerun \
            -Dquarkus.container-image.build=true

      - name: Regression Test
        env:
          AWS_ACCESS_KEY_ID: ${{secrets.AWS_ACCESS_KEY_ID}}
          AWS_SECRET_ACCESS_KEY: ${{secrets.AWS_SECRET_ACCESS_KEY}}
        run: |
          docker compose -f plugins/spark/v3.5/regtests/docker-compose.yml up --build --exit-code-from regtest
```
**New file** — Dockerfile for the Spark regression-test container (`+48` lines):

```dockerfile
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# ... (standard ASF license header, identical to the workflow file above)
#

FROM docker.io/apache/spark:3.5.5-java17
```
> **Reviewer:** We are running Java 21 in CI; why are we pulling Java 17 here, then?
>
> **Author:** This mainly follows how the existing regtests work today: the test pulls a published Spark image from https://hub.docker.com/r/apache/spark/tags, and for Spark 3.5.5 only Java 11 and Java 17 images are available.
>
> **Reviewer:** Let's then make the CI use 17?
>
> **Author:** Does it matter what the CI uses, since Spark runs inside Docker? The CI also does more than run the Spark container: it starts the server container, which builds the Polaris project and runs the Polaris server, and the Polaris server uses Java 21. Furthermore, I would like to stay consistent with how the current regtests CI looks today.
>
> **Reviewer:** Generally, the JVM used in CI is a stamp of approval that this works for that JVM version, both at runtime and compile time, and building for a given JVM version requires version-specific flags. If we can only assert the Polaris Spark client with JDK 17, then we should use only JDK 17. If we are saying the polaris-spark client will work on Java 21 (both build and runtime), then JDK 21 is fine, though I would like to understand how that is guaranteed when Spark itself doesn't do it. The previous tests were not building a Spark client of their own; they were only using it, asserting that Polaris works with 21, so 21 in CI made sense. At least that's my read of it.
>
> **Author:** This regression test starts two containers, one for the Polaris server and another for Spark. The Polaris server is built and run with Java 21; the CI builds the server image, so it needs Java 21 since that is what the server supports. The Spark image is downloaded directly from Docker Hub, and Docker manages its Java version and dependencies separately, so it does not need to match the Java version used in CI. The Iceberg CI setup is different: it consists mostly of integration tests, which build and run directly in the CI environment rather than in containers. Our integration tests all run with Java 21 in another CI today, so together these cover the Spark client working with both Java 17 and Java 21. We will need to add better CI support for different Java versions in the integration tests later.
>
> **Author:** Discussed offline; will add CI to run the regression tests on different Java versions, at least 11 and 17.
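The follow-up agreed to in this thread — running the regtests across several Java versions — could be sketched as a GitHub Actions build matrix over the `setup-java` step. This is a hypothetical sketch, not part of this PR; the version list and step layout are assumptions:

```yaml
# Hypothetical follow-up sketch (not in this PR): run the regtest job
# once per JDK version using a build matrix.
jobs:
  regtest:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false          # let all JDK variants finish even if one fails
      matrix:
        java-version: [ '11', '17', '21' ]   # assumed version list
    steps:
      - uses: actions/checkout@v4
      - name: Set up JDK ${{ matrix.java-version }}
        uses: actions/setup-java@v4
        with:
          java-version: ${{ matrix.java-version }}
          distribution: 'temurin'
      # ... remaining build and regression-test steps as in the workflow above
```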
The Dockerfile continues:

```dockerfile
ARG POLARIS_HOST=polaris
ENV POLARIS_HOST=$POLARIS_HOST
ENV SPARK_HOME=/opt/spark
ENV CURRENT_SCALA_VERSION='2.12'
ENV LANGUAGE='en_US:en'

USER root
RUN apt update
RUN apt-get install -y diffutils wget curl
RUN mkdir -p /home/spark && \
    chown -R spark /home/spark && \
    mkdir -p /tmp/polaris-regtests && \
    chown -R spark /tmp/polaris-regtests
RUN mkdir /opt/spark/conf && chmod -R 777 /opt/spark/conf

USER spark

WORKDIR /home/spark/polaris

COPY --chown=spark ./v3.5 /home/spark/polaris/v3.5

# /home/spark/regtests might not be writable in all situations, see https://github.com/apache/polaris/pull/205
USER root
RUN chmod -R go+rwx /home/spark/polaris
RUN chmod -R 777 ./v3.5/regtests
USER spark

ENTRYPOINT ["./v3.5/regtests/run.sh"]
```
**New file** — regression tests README (`+100` lines):

````markdown
<!--
 Licensed to the Apache Software Foundation (ASF) under one
 ... (standard ASF license header)
-->

# End-to-end regression tests

regtests provides basic end-to-end tests for Spark SQL using the Spark client jars.

Regression tests are either run in Docker, using docker-compose to orchestrate the tests, or
locally.

**NOTE** regtests are intended as lightweight testing to ensure the jars can be used to start
Spark and run basic SQL commands. Please use the integration tests for detailed testing.

## Prerequisites

It is recommended to clean the `regtests/output` directory before running tests. This can be done by
running:

```shell
rm -rf ./plugins/spark/v3.5/regtests/output && mkdir -p ./plugins/spark/v3.5/regtests/output && chmod -R 777 ./plugins/spark/v3.5/regtests/output
```

## Run Tests With Docker Compose

Tests can be run with docker-compose using the provided `./plugins/spark/v3.5/regtests/docker-compose.yml` file, as
follows:

```shell
./gradlew build
./gradlew \
  :polaris-quarkus-server:assemble \
  :polaris-quarkus-server:quarkusAppPartsBuild --rerun \
  -Dquarkus.container-image.build=true
docker compose -f ./plugins/spark/v3.5/regtests/docker-compose.yml up --build --exit-code-from regtest
```

In this setup, a Polaris container will be started in a docker-compose group, using the image
previously built by the Gradle build. Then another container, including a Spark SQL shell, will run
the tests. The exit code will be the same as the exit code of the Spark container.

**NOTE** Docker Compose only supports testing with Scala 2.12, because no Scala 2.13 image is
available for Spark 3.5. Scala 2.13 will be supported for Spark 4.0.

This is the flow used in CI and should be done locally before pushing to GitHub to ensure that no
environmental factors contribute to the outcome of the tests.

**Important**: if you are also using minikube, for example to test the Helm chart, you may need to
_unset_ the Docker environment that was pointing to the Minikube Docker daemon, otherwise the image
will be built by the Minikube Docker daemon and will not be available to the local Docker daemon.
This can be done by running, _before_ building the image and running the tests:

```shell
eval $(minikube -p minikube docker-env --unset)
```

## Run Tests Locally

Regression tests can be run locally as well, using the test harness. For local testing, both
Scala 2.12 and Scala 2.13 are supported.

Before you run the tests, make sure you build the project to generate the Spark client jars:

```shell
./gradlew build
```

In this setup, a Polaris server must be running on localhost:8181 before running tests. The simplest
way to do this is to run the Polaris server in a separate terminal window:

```shell
./gradlew run
```
````
> **Reviewer** (suggested change): summarize the local-run steps as a concise list:
>
> ```markdown
> - `./gradlew build` -- build the Spark Client jars.
> - `./gradlew run` -- start a Polaris server on localhost:8181.
> - `env POLARIS_HOST=localhost ./plugins/spark/v3.5/regtests/run.sh` -- run regtests
> ```
>
> **Author:** sg! updated
**New file** — `docker-compose.yml` for the regression tests (`+73` lines):

```yaml
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# ... (standard ASF license header, abbreviated here)
#

services:
  polaris:
    image: apache/polaris:latest
    ports:
      - "8181"
      - "8182"
    environment:
      AWS_REGION: us-west-2
      AWS_ACCESS_KEY_ID: $AWS_ACCESS_KEY_ID
      AWS_SECRET_ACCESS_KEY: $AWS_SECRET_ACCESS_KEY
      GOOGLE_APPLICATION_CREDENTIALS: $GOOGLE_APPLICATION_CREDENTIALS
      AZURE_TENANT_ID: $AZURE_TENANT_ID
      AZURE_CLIENT_ID: $AZURE_CLIENT_ID
      AZURE_CLIENT_SECRET: $AZURE_CLIENT_SECRET

      POLARIS_BOOTSTRAP_CREDENTIALS: POLARIS,root,secret
      quarkus.log.file.enable: "false"
      quarkus.otel.sdk.disabled: "true"
    volumes:
      - ./credentials:/tmp/credentials/
    healthcheck:
      test: ["CMD", "curl", "http://localhost:8182/q/health"]
      interval: 10s
      timeout: 10s
      retries: 5
  regtest:
    build:
      context: ../..
      dockerfile: v3.5/regtests/Dockerfile
      args:
        POLARIS_HOST: polaris
    depends_on:
      polaris:
        condition: service_healthy
    environment:
      AWS_TEST_ENABLED: $AWS_TEST_ENABLED
      AWS_STORAGE_BUCKET: $AWS_STORAGE_BUCKET
      AWS_ROLE_ARN: $AWS_ROLE_ARN
      AWS_TEST_BASE: $AWS_TEST_BASE
      GCS_TEST_ENABLED: $GCS_TEST_ENABLED
      GCS_TEST_BASE: $GCS_TEST_BASE
      GOOGLE_APPLICATION_CREDENTIALS: $GOOGLE_APPLICATION_CREDENTIALS
      AZURE_TEST_ENABLED: $AZURE_TEST_ENABLED
      AZURE_TENANT_ID: $AZURE_TENANT_ID
      AZURE_DFS_TEST_BASE: $AZURE_DFS_TEST_BASE
      AZURE_BLOB_TEST_BASE: $AZURE_BLOB_TEST_BASE
      AZURE_CLIENT_ID: $AZURE_CLIENT_ID
      AZURE_CLIENT_SECRET: $AZURE_CLIENT_SECRET
      AWS_CROSS_REGION_TEST_ENABLED: $AWS_CROSS_REGION_TEST_ENABLED
      AWS_CROSS_REGION_BUCKET: $AWS_CROSS_REGION_BUCKET
      AWS_ROLE_FOR_CROSS_REGION_BUCKET: $AWS_ROLE_FOR_CROSS_REGION_BUCKET
      AWS_REGION_FOR_CROSS_REGION_TEST: $AWS_REGION_FOR_CROSS_REGION_TEST
    volumes:
      - ./output:/tmp/polaris-regtests/
      - ./credentials:/tmp/credentials/
```
> **Reviewer:** [doubt] Why only AWS, not GCP / Azure?
>
> **Author:** No particular reason; I was mainly following how polaris/regtests works today. The main purpose of these regtests right now is to make sure Spark can start with the packaged jars and perform basic operations, so this should be enough for the current purpose. We will likely add more tests to cover more usages later.
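Since the compose file only forwards these variables, the cloud-backed test groups run only when the corresponding variables are exported in the environment beforehand. A minimal sketch for the AWS group, using the variable names from the compose file above with placeholder values (bucket, role, and base path here are hypothetical):

```shell
# Hypothetical values -- substitute a real bucket, role ARN, and base path.
export AWS_TEST_ENABLED=true
export AWS_STORAGE_BUCKET=my-regtest-bucket
export AWS_ROLE_ARN=arn:aws:iam::123456789012:role/my-regtest-role
export AWS_TEST_BASE=s3://my-regtest-bucket/polaris

echo "AWS tests enabled: $AWS_TEST_ENABLED"
```

With these set, `docker compose -f ./plugins/spark/v3.5/regtests/docker-compose.yml up --build --exit-code-from regtest` passes them through to the regtest container; the GCS and Azure groups work the same way via their respective `*_TEST_ENABLED` flags.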