-
Notifications
You must be signed in to change notification settings - Fork 3k
Spark: refactor Spark modules to support multiple versions #3237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| } | ||
|
|
||
| project(':spark-common').name = 'iceberg-spark-common' | ||
| project(':spark').projectDir = file("spark/${SparkSrcVersion}/core") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
technically we want the name to be something like spark_3_0, which should match the jar we will publish in maven. But not sure if we should also include scala version here.
| throw new Exception("Expected java8 to build Spark 2.4, but got ${SparkSrcVersion}") | ||
| } | ||
|
|
||
| project(':spark-common').name = 'iceberg-spark-common' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed this to common to make spark the main directory containing all versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The common package should contain only code compilable across all versions. Then for every backwards incompatible version, we create a new source version directory, which will build against Spark versions until it becomes incompatible. So for example here, source version 3_0 is built against both 3.0.3 and 3.1.1. When we introduce 3_2, it will build against 3.2.x, and maybe even 3.3.x.
| implementation "com.github.ben-manes.caffeine:caffeine" | ||
|
|
||
| compileOnly "org.apache.avro:avro" | ||
| compileOnly("org.apache.spark:spark-hive_2.11") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ideally we want this to also build against the correct Scala and spark versions. But currently the build would not pass if building against Spark 3. Should fix that in the formal PR.
| include 'spark-extensions' | ||
| include 'spark-runtime' | ||
|
|
||
| var SparkSrcVersion = System.getProperty("sparkSrc") != null ? System.getProperty("sparkSrc") : '3_0' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
technically we don't have to expose this to users, and we can have a map of all build versions supported by a source version. But I am not sure yet which way is better.
|
Thanks for getting this started, @jackye1995! I think this is a good first step, but there are a few things we should consider. First, I agree that we probably want to parameterize the Scala version while we're doing this work. We are going to need to address it sooner or later and now seems like a good time to at least put in a parameter that is always 2.12 so that we can get a 2.13 build working later. Next, I think we need to think about package naming. Right now, we produce Since we probably want to be able to publish Jars for all Spark versions, I think it makes sense to be able to build the whole project with every Spark version. And since we will probably be embedding version strings in module names, I'm wondering if the conditional project logic in the main I went ahead and tried out updating this build like this and the result is #3256. |
|
Let me take a look at both PRs. |
This is a draft prototype for the approach 3 discussed in the mailing list to support multiple Spark versions.
The code is organized to the following format:
I introduce 3 system properties:
Then we can build Spark against different versions:
@aokolnychyi @RussellSpitzer @flyrain @szehon-ho @rdblue @openinx @stevenzwu