
Feature: JDK 17 Upgrade Support Series - Spark modules/pipelines #356

Closed
15 tasks done
cwoods-cpointe opened this issue Sep 20, 2024 · 4 comments · Fixed by #380
Labels: enhancement (New feature or request)

cwoods-cpointe commented Sep 20, 2024

Description

In #133 we modified the build-parent to use JDK 17. When building aiSSEMBLE with the updated build-parent, some modules do not build successfully. This is part of a series of tickets to fix the broken modules; this issue focuses on the Spark pipeline functionality.

DOD

  • The builds for the following modules work:
    • extensions-data-delivery
    • extensions-messaging-kafka
    • Docker modules
      • aissemble-spark
      • aissemble-spark-infrastructure
      • aissemble-spark-operator
    • Helm modules
      • aissemble-spark-operator-chart
      • aissemble-spark-application-chart
      • extensions-helm-spark-infrastructure
  • The MDA templates for the Spark pipelines are working
  • Write migrations for common issues. Weigh whether these changes are too destructive to apply in downstream projects. These migrations should be in a new group
  • Spark Docker images are built on JDK 17 or later
    • Update any base images that are not using JDK 17 (see the illustrative base-image sketch after this list)
  • Follow on: Spark and PySpark pipelines can be executed to completion
  • Follow on: the Kafka messaging module builds and functions (within the pipeline step coordination)
  • Follow on: the Spark persist types are tested and work: Hive and Delta Lake
  • *** Potential work to be split out of this ticket ***
    • Follow on: file store functionality
    • Follow on: pipeline step configuration
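
For the base-image item above, a minimal sketch of the kind of change involved. The image names and tags below are illustrative assumptions only, not the actual aiSSEMBLE Dockerfiles:
# Illustrative only: switch a Spark image from a JDK 11 base to a JDK 17 base.
# The real aissemble-spark Dockerfile and its base image may differ.
# Before (JDK 11 era base -- hypothetical):
#   FROM eclipse-temurin:11-jre-jammy
# After (JDK 17 base -- hypothetical):
FROM eclipse-temurin:17-jre-jammy
# The remaining Spark installation/configuration steps would follow unchanged.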

Test Strategy/Script

  • Build the aiSSEMBLE Java upgrade feature branch
    • Note that if the non-updated modules are failing, you can build just the Spark modules with:
mvn clean install -pl :build-parent,:aissemble-quarkus-bom,:bom-component,:aissemble-foundation-core-python,:extensions-data-delivery,:aissemble-extensions-data-delivery-spark-py,:extensions-data-delivery-spark,:extensions-data-delivery-spark-neo4j,:extensions-data-delivery-spark-postgres,:aissemble-spark,:aissemble-spark-infrastructure,:aissemble-spark-operator,:aissemble-spark-application-chart,:aissemble-spark-operator-chart,:extensions-helm-spark-infrastructure,:aissemble-hive-metastore-service-chart,:aissemble-spark-history-chart,:aissemble-thrift-server-chart -am
  • Generate a new downstream project using 1.10.0-SNAPSHOT
mvn archetype:generate -B -DarchetypeGroupId=com.boozallen.aissemble \
                          -DarchetypeArtifactId=foundation-archetype \
                          -DarchetypeVersion=1.10.0-SNAPSHOT \
                          -DartifactId=test-project \
                          -DgroupId=org.test \
                          -DprojectName='Test' \
                          -DprojectGitUrl=test.org/test-project \
&& cd test-project
  • Create two new pipeline MDAs: one Spark and one PySpark
  • Run mvn clean install and resolve all manual actions relating to Spark
    • Spark operator, spark application, spark-infrastructure, kafka, s3-local
  • Update the PySpark pyproject.toml to point to the local paths for foundation-core-python and extensions-data-delivery-spark-py. This is because we are on a feature branch and the artifacts are not being deployed:
aissemble-foundation-core-python = {path = "/your/path/to/aissemble/foundation/aissemble-foundation-core-python", develop = true}
aissemble-extensions-data-delivery-spark-py = {path = "/your/path/to/aissemble/extensions/extensions-data-delivery/aissemble-extensions-data-delivery-spark-py", develop = true}
aissemble-foundation-pdp-client-python = {path = "/your/path/to/aissemble/foundation/foundation-security/aissemble-foundation-pdp-client-python", develop = true}
aissemble-foundation-encryption-policy-python = {path = "/your/path/to/aissemble/foundation/foundation-encryption/aissemble-foundation-encryption-policy-python", develop = true}
aissemble-extensions-encryption-vault-python = {path = "/your/path/to/aissemble/extensions/extensions-encryption/aissemble-extensions-encryption-vault-python", develop = true}
  • Update the spark-operator and spark-infrastructure Helm chart dependencies to the locally built charts (a hedged sketch follows). This is because we are on a feature branch and the artifacts are not being deployed
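A hedged sketch of what the local chart dependency override might look like in the deploy module's Chart.yaml. The chart names, version, and file:// paths below are placeholders; take the actual names and versions from the generated project and point the paths at wherever the charts were built locally:
# Illustrative Chart.yaml dependencies -- adjust names, versions, and paths
dependencies:
  - name: aissemble-spark-operator-chart
    version: 1.10.0-SNAPSHOT
    repository: "file:///your/path/to/aissemble/extensions/extensions-helm/aissemble-spark-operator-chart"
  - name: aissemble-spark-infrastructure-chart   # assumed chart name -- verify against extensions-helm-spark-infrastructure
    version: 1.10.0-SNAPSHOT
    repository: "file:///your/path/to/aissemble/extensions/extensions-helm/extensions-helm-spark-infrastructure/aissemble-spark-infrastructure-chart"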
  • Deploy with tilt up
  • Verify all the Spark-related pods come up successfully (a quick kubectl spot check is shown below)
  • Note the pipelines are currently failing; they will be addressed in the follow-on work
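
One way to spot-check the Spark-related pods (plain kubectl usage, nothing aiSSEMBLE-specific):
# Watch for the spark operator/infrastructure/application pods to reach Running
kubectl get pods -A | grep -i spark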

Test 2 - spark migration

  • Create a 1.9.2 downstream project
mvn archetype:generate -B -DarchetypeGroupId=com.boozallen.aissemble \
                          -DarchetypeArtifactId=foundation-archetype \
                          -DarchetypeVersion=1.9.2 \
                          -DartifactId=test-project \
                          -DgroupId=org.test \
                          -DprojectName='Test' \
                          -DprojectGitUrl=test.org/test-project \
&& cd test-project
  • Create two new pipeline MDAs: one Spark and one PySpark
  • Build the project and add the two pipelines to the POM's modules so they are built
  • Once the pipelines are generated, update the sparkConf to include the old and new values
    • In the base-values.yaml add:
sparkApp:
  spec:
    sparkConf:
      spark.yarn.executor.failuresValidityInterval: "2h"
      spark.yarn.max.executor.failures: 10
  • In the dev-values.yaml add:
sparkApp:
  spec:
    sparkConf:
      spark.executor.failuresValidityInterval: "2h"
      spark.executor.maxNumFailures: 10
  • Update the aissemble version to 1.10.0-SNAPSHOT
  • Perform the baton migrations with mvn org.technologybrewery.baton:baton-maven-plugin:baton-migrate
  • Verify the base-values.yaml configs were updated to match the dev-values.yaml (a sketch of the expected result is shown after this list)
  • Verify the dev-values.yaml files were not modified
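
Based on the old/new key pairs in this test, the expected post-migration state of the base-values.yaml sparkConf would look roughly like this (a sketch inferred from the values above, not literal migration output):
sparkApp:
  spec:
    sparkConf:
      spark.executor.failuresValidityInterval: "2h"
      spark.executor.maxNumFailures: 10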

References/Additional Context

None

cwoods-cpointe added the enhancement (New feature or request) label Sep 20, 2024
cwoods-cpointe (author) commented:

Ran into a test issue with Spark needing to use an internal API that has been made inaccessible by default:

[ERROR]   Run 1: java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x68999068) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x68999068

Resolved by exporting the package in the Surefire plugin's argLine:
<argLine>--add-exports java.base/sun.nio.ch=ALL-UNNAMED</argLine>
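
For context, a minimal sketch of how that argLine fits into a maven-surefire-plugin configuration in the pom.xml (standard Surefire usage; no plugin version shown, as it is typically managed by the build-parent):
<!-- Illustrative pom.xml snippet: open java.base/sun.nio.ch to unnamed modules during tests -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <argLine>--add-exports java.base/sun.nio.ch=ALL-UNNAMED</argLine>
  </configuration>
</plugin>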

cwoods-cpointe (author) commented:

The Spark/Java support matrix shows that Spark 3.4.x supports Java 17, so the Docker images should be good - link

cwoods-cpointe (author) commented:

There is a conflict between our Jackson and Spark dependencies:
Scala module 2.14.2 requires Jackson Databind version >= 2.14.0 and < 2.15.0 - Found jackson-databind version 2.15.0.
To mitigate this, we should try bumping our Spark version (a hedged sketch of that change follows).
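
A hedged sketch of what that bump might look like in the build-parent's properties. The property name is hypothetical and the target version is illustrative; the actual property and the Spark line whose bundled jackson-module-scala accepts jackson-databind 2.15.x still need to be confirmed:
<!-- Illustrative only: raise the managed Spark version so its jackson-module-scala
     tolerates the jackson-databind 2.15.x already on our classpath -->
<properties>
  <version.spark>3.5.0</version.spark> <!-- hypothetical property name and version -->
</properties>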

cwoods-cpointe added a commit that referenced this issue Sep 26, 2024
 Also updated the version of pyasn1 to the latest compatible version
cwoods-cpointe added a commit that referenced this issue Sep 27, 2024
 #356 update deprecated mda scala reference

 356 Update mavenUtils to be compatable with jdk17

 #356 Create aissemble compatible quarkus bom

 Also updated the version of pyasn1 to the latest compatible version
@cwoods-cpointe cwoods-cpointe linked a pull request Sep 27, 2024 that will close this issue
cwoods-cpointe added a commit that referenced this issue Sep 27, 2024
356 update deprecated mda scala reference
356 Update mavenUtils to be compatable with jdk17
356 Create aissemble compatible quarkus bom
Also updated the version of pyasn1 to the latest compatible version

 #356 Create migration to update spark configs with version update
cwoods-cpointe added a commit that referenced this issue Sep 30, 2024
356 Update mavenUtils to be compatable with jdk17
356 Create aissemble compatible quarkus bom
356 Create migration to update spark configs with version update
cwoods-cpointe added a commit that referenced this issue Sep 30, 2024
 #356 Correct foundation core pyproject imports
ewilkins-csi pushed a commit that referenced this issue Oct 1, 2024
356 Update mavenUtils to be compatable with jdk17
356 Create aissemble compatible quarkus bom
356 Create migration to update spark configs with version update
ewilkins-csi pushed a commit that referenced this issue Oct 1, 2024
 #356 Correct foundation core pyproject imports
colinpalmer-pro (Contributor) commented:

Test Steps passed. Ticket Resolved.
