Add spark350emr shim layer [EMR] #10463

Closed
mimaomao wants to merge 1 commit into NVIDIA:branch-24.06 from mimaomao:mimaomao/spark350emr-dev

@mimaomao
Contributor

This PR adds a new shim layer, spark350emr, which supports running Spark RAPIDS on AWS EMR Spark 3.5.0.

Note: this PR is a new revision of the previous PR, rebased on branch-24.04. You can find more details about testing in that PR.

This PR targets to add a new shim layer spark350emr which supports
running Spark RAPIDS on AWS EMR Spark 3.5.0.

Signed-off-by: Maomao Min <mimaomao@amazon.com>
@gerashegalov
Collaborator

Somewhere in ./jenkins/version-def.sh or in .github/workflows/mvn-verify-check.yml, we need to exclude 350emr from the package-tests matrix on GitHub-hosted PR checks, because we presumably won't have access to the 3.5.0-amzn-0 dependencies there.

Error:  Failed to execute goal on project rapids-4-spark-emr-bom: Could not resolve dependencies for project com.nvidia:rapids-4-spark-emr-bom:pom:24.04.0-SNAPSHOT: The following artifacts could not be resolved: org.apache.spark:spark-sql_2.12:jar:3.5.0-amzn-0, org.apache.spark:spark-hive_2.12:jar:3.5.0-amzn-0: org.apache.spark:spark-sql_2.12:jar:3.5.0-amzn-0 was not found in https://repo1.maven.org/maven2 during a previous attempt. This failure was cached in the local repository and resolution is not reattempted until the update interval of central has elapsed or updates are forced -> [Help 1]

We will also need to produce a spark-rapids-private shim for 350emr.

cc @GaryShen2008 @sameerz
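
As a rough illustration of the exclusion suggested in this comment, the sketch below filters a list of shim build versions so that vendor-specific shims are skipped on hosted CI. It is not the project's actual CI logic: the list contents, the `hosted_ci_shims` helper, and the rule that a vendor shim is a numeric version followed by a lowercase suffix (such as `emr`) are all assumptions made for the example.

```python
import re

# Hypothetical list of shim build versions. The real list lives in the
# project's build configuration and is not reproduced here.
ALL_SHIMS = ["330", "341", "350", "350emr"]

# Assumption: a vendor-specific shim is a numeric version followed by a
# lowercase alphabetic suffix (for example "emr"). Its dependencies, such as
# 3.5.0-amzn-0, are presumed unavailable on Maven Central.
VENDOR_SHIM = re.compile(r"^\d+[a-z]+$")


def hosted_ci_shims(shims):
    """Return only the shims that a GitHub-hosted runner could resolve."""
    return [s for s in shims if not VENDOR_SHIM.match(s)]


print(hosted_ci_shims(ALL_SHIMS))  # prints ['330', '341', '350']
```

A pattern-based filter like this would also skip any other suffixed vendor shim, which may or may not be desired; an explicit exclude list is the more conservative alternative.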

@sameerz added the build and feature request labels Feb 23, 2024
@NvTimLiu
Collaborator

EMR pre-merge and nightly build and test will work like what we've done for the Databricks runtime.

We'll need separate CI jobs running on EMR.

@NvTimLiu changed the base branch from branch-24.04 to branch-24.06 April 15, 2024 03:31
@NvTimLiu
Collaborator

Retargeted to branch-24.06 for the next release, as we're in the middle of the v24.04 release. Please let me know if you have any concerns, thanks!

@sameerz
Collaborator

sameerz commented Jul 30, 2024

Closing until we can retarget to the latest branch.

@sameerz closed this Jul 30, 2024

Labels

build (Related to CI / CD or cleanly building), feature request (New feature or request)

4 participants