Merged
Changes from 13 commits
61 changes: 61 additions & 0 deletions .github/workflows/spark_client_regtests.yml
@@ -0,0 +1,61 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#

name: Spark Client Regression Tests
on:
push:
branches: [ "main" ]
pull_request:
branches: [ "main" ]

jobs:
regtest:

runs-on: ubuntu-latest
permissions:
contents: read

steps:
- uses: actions/checkout@v4

- name: Set up JDK 21
uses: actions/setup-java@v4
with:
java-version: '21'
distribution: 'temurin'

- name: Fix permissions
run: mkdir -p regtests/output && chmod 777 regtests/output && chmod 777 regtests/t_*/ref/*

- name: Project build
run: ./gradlew build

- name: Image build
run: |
./gradlew \
:polaris-quarkus-server:assemble \
:polaris-quarkus-server:quarkusAppPartsBuild --rerun \
-Dquarkus.container-image.build=true

- name: Regression Test
env:
AWS_ACCESS_KEY_ID: ${{secrets.AWS_ACCESS_KEY_ID}}
AWS_SECRET_ACCESS_KEY: ${{secrets.AWS_SECRET_ACCESS_KEY}}
Contributor:

[doubt] why only AWS, not GCP / Azure?

Contributor Author:

No particular reason; I was mainly following how polaris/regtests works today. My main purpose for the regtests is to make sure Spark can start with the packaged jars and perform basic operations right now, so this should be enough for the current purpose. We are likely going to add more tests to cover more usages later.

run: |
docker compose -f plugins/spark/v3.5/regtests/docker-compose.yml up --build --exit-code-from regtest
Contributor:

Nit: an empty line at the end

Contributor Author:

added

2 changes: 2 additions & 0 deletions build.gradle.kts
@@ -125,6 +125,8 @@ tasks.named<RatTask>("rat").configure {

excludes.add("**/kotlin-compiler*")
excludes.add("**/build-logic/.kotlin/**")

excludes.add("plugins/**/*.ref")
}

// Pass environment variables:
48 changes: 48 additions & 0 deletions plugins/spark/v3.5/regtests/Dockerfile
@@ -0,0 +1,48 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#

FROM docker.io/apache/spark:3.5.5-java17
Contributor:

we are running java 21 in CI, why are we pulling java 17 here then?

Contributor Author:

This mainly follows how the existing regtests work today: it pulls an existing Spark image from the published Spark docker images at https://hub.docker.com/r/apache/spark/tags, and it seems we only have java 11 and java 17 for spark 3.5.5.

Contributor:

let's then make the CI use 17?

Contributor Author:

Does it matter what the CI uses, since we are starting Spark in docker? The CI does more than just run the Spark docker container: it also starts a container with the server, which builds the Polaris project and starts the Polaris server, and the Polaris server will use java 21. Furthermore, I would like to be consistent with how the current regtests CI looks today.

Contributor:

Generally the JVM in CI is marked as a stamp of approval that this works for that JVM version, both runtime- and compile-wise; building for a given JVM requires special flags per JVM version. Saying this from my past experience of upgrading Iceberg to JDK 17: apache/iceberg#7391

If we can only assert the polaris spark client with JDK 17, then we should have only JDK 17.

If we are saying the polaris-spark client will work on java 21 (both build and runtime), then I think it is fine to have JDK 21 (though I would like to understand more about how this is guaranteed, when Spark doesn't do that?).

The previous tests were not building a Spark client of their own; they were using it, just asserting that Polaris is OK to work with 21, so 21 in CI made sense.

At least that's my read of it.

Contributor Author (@gh-yzou, Apr 23, 2025):

This regression test starts two containers, one for the Polaris server and another one for Spark. The Polaris server build and run will all be with java 21; the CI does the server image build, so it needs to be with java 21 since that is what the server supports. The Spark image is downloaded directly from docker, and docker can manage the java version and dependencies separately, so it doesn't need to be consistent with the java version the CI uses. So we have our Polaris server container running with java 21 and our Spark container running with java 17; this is actually the closest setup to how the server/client env setup and jar release happen for a user.

The Iceberg CI setup is different: it is mostly integration tests, which build and run directly in the CI env, not in a docker container. Our integration tests all run with java 21 in another CI today, so we have tests covering the case that our Spark client works with both java 17 and java 21. We will need to add better CI support for different java versions with integration tests later.

Contributor Author:

Discussed offline; will add CI to run regression tests on different Java versions, at least 11 and 17.

ARG POLARIS_HOST=polaris
ENV POLARIS_HOST=$POLARIS_HOST
ENV SPARK_HOME=/opt/spark
ENV CURRENT_SCALA_VERSION='2.12'
ENV LANGUAGE='en_US:en'

USER root
RUN apt update
RUN apt-get install -y diffutils wget curl
RUN mkdir -p /home/spark && \
chown -R spark /home/spark && \
mkdir -p /tmp/polaris-regtests && \
chown -R spark /tmp/polaris-regtests
RUN mkdir /opt/spark/conf && chmod -R 777 /opt/spark/conf

USER spark

WORKDIR /home/spark/polaris

COPY --chown=spark ./v3.5 /home/spark/polaris/v3.5

# /home/spark/regtests might not be writable in all situations, see https://github.com/apache/polaris/pull/205
USER root
RUN chmod -R go+rwx /home/spark/polaris
RUN chmod -R 777 ./v3.5/regtests
USER spark

ENTRYPOINT ["./v3.5/regtests/run.sh"]
100 changes: 100 additions & 0 deletions plugins/spark/v3.5/regtests/README.md
@@ -0,0 +1,100 @@
<!--

Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.

-->

# End-to-end regression tests

The regtests provide basic end-to-end tests for Spark SQL using the Spark client jars.

Regression tests are either run in Docker, using docker-compose to orchestrate the tests, or
locally.

**NOTE** The regtests are intended as light-weight tests to ensure the jars can be used to start
Spark and run basic SQL commands. Please use the integration tests for detailed testing.
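
The "basic SQL commands" exercised here are plain Spark SQL statements. A hypothetical smoke-test sequence might look like the following (illustrative only; the catalog and namespace names are assumptions, not the actual test contents):

```sql
-- Hypothetical smoke test; the catalog name "polaris" is an assumption.
CREATE NAMESPACE IF NOT EXISTS polaris.smoke;
CREATE TABLE polaris.smoke.t1 (id BIGINT, data STRING);
INSERT INTO polaris.smoke.t1 VALUES (1, 'a');
SELECT * FROM polaris.smoke.t1;
DROP TABLE polaris.smoke.t1;
DROP NAMESPACE polaris.smoke;
```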

## Prerequisites

It is recommended to clean the `regtests/output` directory before running tests. This can be done by
running:

```shell
rm -rf ./plugins/spark/v3.5/regtests/output && mkdir -p ./plugins/spark/v3.5/regtests/output && chmod -R 777 ./plugins/spark/v3.5/regtests/output
```

## Run Tests With Docker Compose

Tests can be run with docker-compose using the provided `./plugins/spark/v3.5/regtests/docker-compose.yml` file, as
follows:

```shell
./gradlew build
./gradlew \
:polaris-quarkus-server:assemble \
:polaris-quarkus-server:quarkusAppPartsBuild --rerun \
-Dquarkus.container-image.build=true
docker compose -f ./plugins/spark/v3.5/regtests/docker-compose.yml up --build --exit-code-from regtest
```

In this setup, a Polaris container will be started in a docker-compose group, using the image
previously built by the Gradle build. Then another container, including a Spark SQL shell, will run
the tests. The exit code will be the same as the exit code of the Spark container.
**NOTE** Docker compose only supports testing with Scala 2.12, because no Scala 2.13 image is
available for Spark 3.5. Scala 2.13 will be supported with Spark 4.0.

This is the flow used in CI and should be done locally before pushing to GitHub to ensure that no
environmental factors contribute to the outcome of the tests.

**Important**: if you are also using minikube, for example to test the Helm chart, you may need to
_unset_ the Docker environment that was pointing to the Minikube Docker daemon, otherwise the image
will be built by the Minikube Docker daemon and will not be available to the local Docker daemon.
This can be done by running, _before_ building the image and running the tests:

```shell
eval $(minikube -p minikube docker-env --unset)
```

## Run Tests Locally

Regression tests can be run locally as well, using the test harness. For local testing, both
Scala 2.12 and Scala 2.13 are supported.

Before you run the test, make sure you build the project to generate the Spark client jars.
```shell
./gradlew build
```

In this setup, a Polaris server must be running on localhost:8181 before running tests. The simplest
way to do this is to run the Polaris server in a separate terminal window:

```shell
./gradlew run
```
Contributor:

Maybe we could simplify it like the following; we could still keep the "Note: xxx" part.

Suggested change:

- `./gradlew build` -- build the Spark Client jars.
- `./gradlew run` -- start a Polaris server on localhost:8181.
- `env POLARIS_HOST=localhost ./plugins/spark/v3.5/regtests/run.sh` -- run regtests

Contributor Author:

sg! updated


Note: the regression tests expect Polaris to run with certain options, e.g. with support for `FILE`
storage, default realm `POLARIS` and root credentials `root:secret`; if you run the above command,
this will be the case. If you run Polaris in a different way, make sure that Polaris is configured
appropriately.
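
For reference, a Spark SQL session reaches such a Polaris instance through catalog properties. A minimal sketch, assuming the Polaris Spark catalog class name and the standard Iceberg REST catalog options — the exact values `run.sh` configures may differ:

```
# Illustrative Spark conf; the property values below are assumptions,
# not necessarily what run.sh sets.
spark.sql.catalog.polaris=org.apache.polaris.spark.SparkCatalog
spark.sql.catalog.polaris.uri=http://localhost:8181/api/catalog
spark.sql.catalog.polaris.warehouse=my_catalog
spark.sql.catalog.polaris.credential=root:secret
spark.sql.catalog.polaris.scope=PRINCIPAL_ROLE:ALL
```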

Running the test harness will automatically run the idempotent setup script. From the root of the
project, just run:

```shell
env POLARIS_HOST=localhost ./plugins/spark/v3.5/regtests/run.sh
```
Empty file.
73 changes: 73 additions & 0 deletions plugins/spark/v3.5/regtests/docker-compose.yml
@@ -0,0 +1,73 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#

services:
polaris:
image: apache/polaris:latest
ports:
- "8181"
- "8182"
environment:
AWS_REGION: us-west-2
AWS_ACCESS_KEY_ID: $AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY: $AWS_SECRET_ACCESS_KEY
GOOGLE_APPLICATION_CREDENTIALS: $GOOGLE_APPLICATION_CREDENTIALS
AZURE_TENANT_ID: $AZURE_TENANT_ID
AZURE_CLIENT_ID: $AZURE_CLIENT_ID
AZURE_CLIENT_SECRET: $AZURE_CLIENT_SECRET
Contributor:

Are we using it?

Contributor Author (@gh-yzou, Apr 23, 2025):

I don't think we are using it today; this mostly follows how the regtests work today. We will use it later when I start adding cloud-specific test cases.

Contributor (@singhpk234, Apr 23, 2025):

let's add it when we add those tests then?

Contributor Author (@gh-yzou, Apr 23, 2025):

I removed those credentials for now; we will add them when I add cloud-specific tests. I also removed the credentials folder.

POLARIS_BOOTSTRAP_CREDENTIALS: POLARIS,root,secret
quarkus.log.file.enable: "false"
quarkus.otel.sdk.disabled: "true"
volumes:
- ./credentials:/tmp/credentials/
healthcheck:
test: ["CMD", "curl", "http://localhost:8182/q/health"]
interval: 10s
timeout: 10s
retries: 5
regtest:
build:
context: ../..
dockerfile: v3.5/regtests/Dockerfile
args:
POLARIS_HOST: polaris
depends_on:
polaris:
condition: service_healthy
environment:
AWS_TEST_ENABLED: $AWS_TEST_ENABLED
AWS_STORAGE_BUCKET: $AWS_STORAGE_BUCKET
AWS_ROLE_ARN: $AWS_ROLE_ARN
AWS_TEST_BASE: $AWS_TEST_BASE
GCS_TEST_ENABLED: $GCS_TEST_ENABLED
GCS_TEST_BASE: $GCS_TEST_BASE
GOOGLE_APPLICATION_CREDENTIALS: $GOOGLE_APPLICATION_CREDENTIALS
AZURE_TEST_ENABLED: $AZURE_TEST_ENABLED
AZURE_TENANT_ID: $AZURE_TENANT_ID
AZURE_DFS_TEST_BASE: $AZURE_DFS_TEST_BASE
AZURE_BLOB_TEST_BASE: $AZURE_BLOB_TEST_BASE
AZURE_CLIENT_ID: $AZURE_CLIENT_ID
AZURE_CLIENT_SECRET: $AZURE_CLIENT_SECRET
AWS_CROSS_REGION_TEST_ENABLED: $AWS_CROSS_REGION_TEST_ENABLED
AWS_CROSS_REGION_BUCKET: $AWS_CROSS_REGION_BUCKET
AWS_ROLE_FOR_CROSS_REGION_BUCKET: $AWS_ROLE_FOR_CROSS_REGION_BUCKET
AWS_REGION_FOR_CROSS_REGION_TEST: $AWS_REGION_FOR_CROSS_REGION_TEST
volumes:
- ./output:/tmp/polaris-regtests/
- ./credentials:/tmp/credentials/