infra: use spark connect to run pytests #2491
Changes from 6 commits: a85d2d7, 90e8902, d22d0e4, 23f4736, 1c1d75e, 40eb5d5, 2569263
Makefile

```diff
@@ -18,7 +18,7 @@
 # Configuration Variables
 # ========================

-PYTEST_ARGS ?= -v  # Override with e.g. PYTEST_ARGS="-vv --tb=short"
+PYTEST_ARGS ?= -v -x  # Override with e.g. PYTEST_ARGS="-vv --tb=short"
```
**Contributor (Author):** Added `-x` (`--exitfirst`) so the run stops at the first failing test instead of continuing, which also makes interrupting with Ctrl-C more responsive: https://docs.pytest.org/en/6.2.x/reference.html#command-line-flags
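For context, a minimal sketch (not part of the PR) of what `-x` changes; the test names are illustrative:

```python
# With -x, pytest aborts the session at the first failing test instead
# of running the remaining ones.
import pytest


def test_first():
    assert 1 + 1 == 3  # fails; under -x the session stops here


def test_second():
    assert True  # never executed when -x is passed


if __name__ == "__main__":
    # Equivalent to running `pytest -x -v` on this file.
    raise SystemExit(pytest.main(["-x", "-v", __file__]))
```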
```diff
 COVERAGE ?= 0  # Set COVERAGE=1 to enable coverage: make test COVERAGE=1
 COVERAGE_FAIL_UNDER ?= 85  # Minimum coverage % to pass: make coverage-report COVERAGE_FAIL_UNDER=70
 KEEP_COMPOSE ?= 0  # Set KEEP_COMPOSE=1 to keep containers after integration tests
```
```diff
@@ -37,7 +37,7 @@ endif
 ifeq ($(KEEP_COMPOSE),1)
 CLEANUP_COMMAND = echo "Keeping containers running for debugging (KEEP_COMPOSE=1)"
 else
-CLEANUP_COMMAND = docker compose -f dev/docker-compose-integration.yml down -v --remove-orphans 2>/dev/null || true
+CLEANUP_COMMAND = docker compose -f dev/docker-compose-integration.yml down -v --remove-orphans --timeout 0 2>/dev/null || true
```
**Contributor (Author):** `--timeout 0` stops containers immediately instead of waiting for a graceful shutdown, making `docker compose down` more responsive.
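As an illustration (not from the PR), the same cleanup could be driven from Python, e.g. in a session-scoped teardown; `compose_down` is a hypothetical helper and assumes Docker Compose v2 is on PATH:

```python
import subprocess


def compose_down(compose_file: str = "dev/docker-compose-integration.yml") -> None:
    # Mirrors the Makefile's CLEANUP_COMMAND.
    subprocess.run(
        [
            "docker", "compose", "-f", compose_file,
            "down", "-v", "--remove-orphans", "--timeout", "0",
        ],
        check=False,                # mirrors the Makefile's `|| true`
        stderr=subprocess.DEVNULL,  # mirrors `2>/dev/null`
    )
```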
```diff
 endif

 # ============
```
Dockerfile

```diff
@@ -36,11 +36,13 @@ ENV PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.9.7-src.zip:$
 RUN mkdir -p ${HADOOP_HOME} && mkdir -p ${SPARK_HOME} && mkdir -p /home/iceberg/spark-events
 WORKDIR ${SPARK_HOME}

 # Remember to also update `tests/conftest`'s spark setting
 ENV SPARK_VERSION=3.5.6
-ENV ICEBERG_SPARK_RUNTIME_VERSION=3.5_2.12
-ENV ICEBERG_VERSION=1.9.1
+ENV SCALA_VERSION=2.12
+ENV ICEBERG_SPARK_RUNTIME_VERSION=3.5_${SCALA_VERSION}
+ENV ICEBERG_VERSION=1.9.2
 ENV PYICEBERG_VERSION=0.10.0
 ENV HADOOP_VERSION=3.3.4
 ENV AWS_SDK_VERSION=1.12.753
```
Comment on lines +40 to +45

**Contributor (Author):** copied over from

**Contributor:** Nice, this is much better 👍
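A hypothetical sketch of the `tests/conftest` spark setting the Dockerfile comment refers to: a session-scoped fixture that connects through Spark Connect on the port this PR publishes, rather than launching a local Spark. The repo's actual fixture may differ; it assumes `pyspark[connect]` is installed on the client side:

```python
import pytest
from pyspark.sql import SparkSession


@pytest.fixture(scope="session")
def spark() -> SparkSession:
    # `sc://` is the Spark Connect URL scheme; 15002 is the port
    # published by docker-compose-integration.yml below.
    return SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
```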
```diff
@@ -59,15 +61,26 @@ RUN set -eux; \
     tar xzf "$FILE" --directory /opt/spark --strip-components 1; \
     rm -rf "$FILE"

+# Download Spark Connect server JAR
+RUN curl --retry 5 -s -L https://repo1.maven.org/maven2/org/apache/spark/spark-connect_${SCALA_VERSION}/${SPARK_VERSION}/spark-connect_${SCALA_VERSION}-${SPARK_VERSION}.jar \
+    -Lo /opt/spark/jars/spark-connect_${SCALA_VERSION}-${SPARK_VERSION}.jar
+
 # Download iceberg spark runtime
 RUN curl --retry 5 -s https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark-runtime-${ICEBERG_SPARK_RUNTIME_VERSION}/${ICEBERG_VERSION}/iceberg-spark-runtime-${ICEBERG_SPARK_RUNTIME_VERSION}-${ICEBERG_VERSION}.jar \
     -Lo /opt/spark/jars/iceberg-spark-runtime-${ICEBERG_SPARK_RUNTIME_VERSION}-${ICEBERG_VERSION}.jar

 # Download AWS bundle
 RUN curl --retry 5 -s https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-aws-bundle/${ICEBERG_VERSION}/iceberg-aws-bundle-${ICEBERG_VERSION}.jar \
     -Lo /opt/spark/jars/iceberg-aws-bundle-${ICEBERG_VERSION}.jar

+# Download hadoop-aws (required for S3 support)
+RUN curl --retry 5 -s https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/${HADOOP_VERSION}/hadoop-aws-${HADOOP_VERSION}.jar \
+    -Lo /opt/spark/jars/hadoop-aws-${HADOOP_VERSION}.jar
+
+# Download AWS SDK bundle
+RUN curl --retry 5 -s https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/${AWS_SDK_VERSION}/aws-java-sdk-bundle-${AWS_SDK_VERSION}.jar \
+    -Lo /opt/spark/jars/aws-java-sdk-bundle-${AWS_SDK_VERSION}.jar
+
 COPY spark-defaults.conf /opt/spark/conf
 ENV PATH="/opt/spark/sbin:/opt/spark/bin:${PATH}"
```
dev/docker-compose-integration.yml

```diff
@@ -26,15 +26,13 @@ services:
       - rest
       - hive
       - minio
     volumes:
       - ./warehouse:/home/iceberg/warehouse
     environment:
       - AWS_ACCESS_KEY_ID=admin
       - AWS_SECRET_ACCESS_KEY=password
       - AWS_REGION=us-east-1
     ports:
-      - 8888:8888
-      - 8080:8080
```
Comment on lines -36 to -37

**Contributor (Author):** Removed port 8888, which was previously used for notebooks.
```diff
+      - 15002:15002  # Spark Connect
+      - 4040:4040    # Spark UI
     links:
       - rest:rest
       - hive:hive
```
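An illustrative helper (not in the PR) for the published Spark Connect port: tests can block until it accepts TCP connections so they don't race the container startup. Host, port, and timeout values are assumptions based on the compose file above:

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 15002, timeout: float = 90.0) -> None:
    # Poll until the Spark Connect server accepts a TCP connection.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2):
                return  # server is up
        except OSError:
            time.sleep(1)
    raise TimeoutError(f"Spark Connect not reachable at {host}:{port} after {timeout}s")
```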
spark-defaults.conf

```diff
@@ -16,20 +16,35 @@
 #

 spark.sql.extensions                     org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions

+# Configure Iceberg REST catalog
 spark.sql.catalog.rest                   org.apache.iceberg.spark.SparkCatalog
 spark.sql.catalog.rest.type              rest
 spark.sql.catalog.rest.uri               http://rest:8181
 spark.sql.catalog.rest.io-impl           org.apache.iceberg.aws.s3.S3FileIO
 spark.sql.catalog.rest.warehouse         s3://warehouse/rest/
 spark.sql.catalog.rest.s3.endpoint       http://minio:9000
+spark.sql.catalog.rest.cache-enabled     false

+# Configure Iceberg Hive catalog
 spark.sql.catalog.hive                   org.apache.iceberg.spark.SparkCatalog
 spark.sql.catalog.hive.type              hive
-spark.sql.catalog.hive.uri               http://hive:9083
+spark.sql.catalog.hive.uri               thrift://hive:9083
 spark.sql.catalog.hive.io-impl           org.apache.iceberg.aws.s3.S3FileIO
 spark.sql.catalog.hive.warehouse         s3://warehouse/hive/
 spark.sql.catalog.hive.s3.endpoint       http://minio:9000

+# Configure Spark's default session catalog (spark_catalog) to use Iceberg backed by the Hive Metastore
+spark.sql.catalog.spark_catalog          org.apache.iceberg.spark.SparkSessionCatalog
+spark.sql.catalog.spark_catalog.type     hive
+spark.sql.catalog.spark_catalog.uri      thrift://hive:9083
+spark.hadoop.fs.s3a.endpoint             http://minio:9000
+spark.sql.catalogImplementation          hive
+spark.sql.warehouse.dir                  s3a://warehouse/hive/
```
Comment on lines +37 to +43

**Contributor (Author):** spark_catalog is primarily used by the

**Contributor:** It requires the
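A sketch of how the three catalogs defined above are addressable from a Spark Connect session, assuming the compose services are running; the namespace and table names are illustrative:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()

# REST catalog (the default per spark.sql.defaultCatalog, so the
# `rest.` prefix is optional):
spark.sql("CREATE NAMESPACE IF NOT EXISTS rest.examples")
spark.sql("CREATE TABLE IF NOT EXISTS rest.examples.t (id BIGINT) USING iceberg")

# Hive catalog, explicitly qualified:
spark.sql("SHOW NAMESPACES IN hive").show()

# Session catalog (SparkSessionCatalog backed by the Hive Metastore),
# which handles statements that don't name a catalog at all:
spark.sql("SHOW TABLES IN spark_catalog.default").show()
```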
```diff
 spark.sql.defaultCatalog                 rest

+# Configure Spark UI and event logging
 spark.ui.enabled                         true
 spark.eventLog.enabled                   true
 spark.eventLog.dir                       /home/iceberg/spark-events
 spark.history.fs.logDirectory            /home/iceberg/spark-events
-spark.sql.catalogImplementation          in-memory
```
**Comment (on the removed `spark.sql.catalogImplementation in-memory` line):** No longer needed, since we no longer run Spark and the metastore locally.