4 changes: 2 additions & 2 deletions .github/workflows/java-ci.yml
@@ -95,7 +95,7 @@ jobs:
    runs-on: ubuntu-22.04
    strategy:
      matrix:
-        jvm: [11, 17, 21]
+        jvm: [17, 21]
Contributor:
What's the reason for removing JDK 11 from this? We're releasing JDK 11 jars, so I think we should keep testing with JDK 11.

    steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-java@v4
@@ -108,7 +108,7 @@ jobs:
    runs-on: ubuntu-22.04
    strategy:
      matrix:
-        jvm: [11, 17, 21]
+        jvm: [17, 21]
    steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-java@v4
2 changes: 1 addition & 1 deletion .github/workflows/publish-snapshot.yml
@@ -41,4 +41,4 @@ jobs:
      - run: |
          ./gradlew printVersion
          ./gradlew -DallModules publishApachePublicationToMavenRepository -PmavenUser=${{ secrets.NEXUS_USER }} -PmavenPassword=${{ secrets.NEXUS_PW }}
-          ./gradlew -DflinkVersions= -DsparkVersions=3.4,3.5 -DscalaVersion=2.13 -DkafkaVersions=3 publishApachePublicationToMavenRepository -PmavenUser=${{ secrets.NEXUS_USER }} -PmavenPassword=${{ secrets.NEXUS_PW }}
+          ./gradlew -DflinkVersions= -DsparkVersions=3.4,3.5,4.0 -DscalaVersion=2.13 -DkafkaVersions=3 publishApachePublicationToMavenRepository -PmavenUser=${{ secrets.NEXUS_USER }} -PmavenPassword=${{ secrets.NEXUS_PW }}
6 changes: 5 additions & 1 deletion .github/workflows/spark-ci.yml
@@ -71,13 +71,17 @@ jobs:
    strategy:
      matrix:
        jvm: [11, 17, 21]
-        spark: ['3.4', '3.5']
+        spark: ['3.4', '3.5', '4.0']
        scala: ['2.12', '2.13']
        exclude:
          # Spark 3.5 is the first version not failing on Java 21 (https://issues.apache.org/jira/browse/SPARK-42369)
          # Full Java 21 support is coming in Spark 4 (https://issues.apache.org/jira/browse/SPARK-43831)
+          - jvm: 11
+            spark: '4.0'
          - jvm: 21
            spark: '3.4'
+          - spark: '4.0'
+            scala: '2.12'
    env:
      SPARK_LOCAL_IP: localhost
    steps:
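For reference, a rough local equivalent of the new Spark 4.0 matrix cell, for anyone reproducing it outside CI (a sketch assuming the -D version flags used elsewhere in this PR and Gradle's standard check task; the workflow's actual test steps are not shown in this hunk):

    ./gradlew -DsparkVersions=4.0 -DscalaVersion=2.13 :iceberg-spark:iceberg-spark-4.0_2.13:check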
2 changes: 2 additions & 0 deletions .gitignore
@@ -35,6 +35,8 @@ spark/v3.4/spark/benchmark/*
spark/v3.4/spark-extensions/benchmark/*
spark/v3.5/spark/benchmark/*
spark/v3.5/spark-extensions/benchmark/*
+spark/v4.0/spark/benchmark/*
+spark/v4.0/spark-extensions/benchmark/*
*/benchmark/*

__pycache__/
3 changes: 3 additions & 0 deletions build.gradle
@@ -120,6 +120,9 @@ allprojects {
  repositories {
    mavenCentral()
    mavenLocal()
+    maven {
+      url "https://repository.apache.org/content/repositories/orgapachespark-1484/"
+    }
Member:
Are we planning to merge this PR with RC5? (I ask because I saw that we recently merged and reverted #13006.)

I think it is not a good practice to depend on an RC on the main branch. Why don't we continue development in a separate branch until the official release is available?

Contributor (@amogh-jahagirdar, May 12, 2025):
> I think it is not a good practice to depend on an RC on the main branch. Why don't we continue development in a separate branch until the official release is available?

@ajantha-bhat It's true that we'd have a dependency on an RC on main, but there's a benefit to being able to develop on top of it for any new integrations (e.g. for V3) while the RC is still in progress, because we don't expect those dependent features to fundamentally change between the RC and the release. Keep in mind that defaultSparkVersion and any infra like benchmarking will still default to 3.5 until the official release.

The main challenge with continuing development in a separate branch is that an individual needs to keep rebasing and evaluating whether each intermediate change to 3.4/3.5 needs to be kept in sync with 4.0. Merging the initial integration instead means that every subsequent change to Iceberg-Spark puts it on the author of that change to keep 4.0 in sync, which is a narrower task, and that author will have much more context.

Combining that with the previous point about new integrations means we can safely and reasonably iterate on the 4.0 integration until the official release, rather than having an individual wait for all of that while rebasing and keeping things in sync. That feels worthwhile to me compared to the awkwardness of having an RC dependency on main in the short term.

Member:
I see. Thanks for the explanation.

I am neutral about this. If the community agrees to depend on an RC, that works for me. Just make sure it has more visibility.
Tagging @RussellSpitzer, @szehon-ho, @rdblue, @danielcweeks for more visibility/approvals.

Contributor:
Sounds good, I went ahead and added a few folks for reviews. Thanks!

Member:
Yes, as RC5 is getting closer to the Spark 4.0 release (only blockers are getting in now), I think it makes sense to work on Iceberg in parallel, as we don't anticipate any significant changes when bumping to the final RC.

  }
}

2 changes: 1 addition & 1 deletion gradle.properties
@@ -19,7 +19,7 @@ jmhIncludeRegex=.*
systemProp.defaultFlinkVersions=2.0
systemProp.knownFlinkVersions=1.19,1.20,2.0
systemProp.defaultSparkVersions=3.5
-systemProp.knownSparkVersions=3.4,3.5
+systemProp.knownSparkVersions=3.4,3.5,4.0
systemProp.defaultKafkaVersions=3
systemProp.knownKafkaVersions=3
systemProp.defaultScalaVersion=2.12
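Since defaultSparkVersions stays at 3.5, the 4.0 build remains opt-in. A minimal usage sketch for selecting it explicitly, assuming the same -DsparkVersions/-DscalaVersion system properties used in publish-snapshot.yml above:

    ./gradlew -DsparkVersions=4.0 -DscalaVersion=2.13 build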
4 changes: 4 additions & 0 deletions gradle/libs.versions.toml
@@ -24,6 +24,7 @@ activation = "1.1.1"
aliyun-sdk-oss = "3.10.2"
analyticsaccelerator = "1.0.0"
antlr = "4.9.3"
+antlr413 = "4.13.1" # For Spark 4.0 support
aircompressor = "0.27"
apiguardian = "1.1.2"
arrow = "15.0.2"
@@ -82,6 +83,7 @@ slf4j = "2.0.17"
snowflake-jdbc = "3.24.0"
spark34 = "3.4.4"
spark35 = "3.5.5"
+spark40 = "4.0.0"
sqlite-jdbc = "3.49.1.0"
testcontainers = "1.21.0"
tez08 = { strictly = "0.8.4"} # see rich version usage explanation above
@@ -93,6 +95,8 @@ aliyun-sdk-oss = { module = "com.aliyun.oss:aliyun-sdk-oss", version.ref = "aliyun-sdk-oss" }
analyticsaccelerator-s3 = { module = "software.amazon.s3.analyticsaccelerator:analyticsaccelerator-s3", version.ref = "analyticsaccelerator" }
antlr-antlr4 = { module = "org.antlr:antlr4", version.ref = "antlr" }
antlr-runtime = { module = "org.antlr:antlr4-runtime", version.ref = "antlr" }
+antlr-antlr413 = { module = "org.antlr:antlr4", version.ref = "antlr413" }
+antlr-runtime413 = { module = "org.antlr:antlr4-runtime", version.ref = "antlr413" }
arrow-memory-netty = { module = "org.apache.arrow:arrow-memory-netty", version.ref = "arrow" }
arrow-vector = { module = "org.apache.arrow:arrow-vector", version.ref = "arrow" }
avro-avro = { module = "org.apache.avro:avro", version.ref = "avro" }
@@ -282,6 +282,7 @@ private void initConf(HiveConf conf, int port, boolean directSql) {
    // Setting this to avoid thrift exception during running Iceberg tests outside Iceberg.
    conf.set(
        HiveConf.ConfVars.HIVE_IN_TEST.varname, HiveConf.ConfVars.HIVE_IN_TEST.getDefaultValue());
+    conf.set("datanucleus.connectionPoolingType", "DBCP");
  }

private static void setupMetastoreDB(String dbURL) throws SQLException, IOException {
5 changes: 5 additions & 0 deletions jmh.gradle
@@ -48,6 +48,11 @@ if (sparkVersions.contains("3.5")) {
  jmhProjects.add(project(":iceberg-spark:iceberg-spark-extensions-3.5_${scalaVersion}"))
}
+
+if (sparkVersions.contains("4.0")) {
+  jmhProjects.add(project(":iceberg-spark:iceberg-spark-4.0_2.13"))
+  jmhProjects.add(project(":iceberg-spark:iceberg-spark-extensions-4.0_2.13"))
+}

configure(jmhProjects) {
apply plugin: 'me.champeau.jmh'
apply plugin: 'io.morethan.jmhreport'
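With "4.0" present in sparkVersions, the two new modules join the JMH project set. A hypothetical invocation (the jmh task itself is contributed by the me.champeau.jmh plugin applied below; the module name is taken from the settings.gradle change in this PR):

    ./gradlew -DsparkVersions=4.0 :iceberg-spark:iceberg-spark-4.0_2.13:jmh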
12 changes: 12 additions & 0 deletions settings.gradle
@@ -163,6 +163,18 @@ if (sparkVersions.contains("3.5")) {
  project(":iceberg-spark:spark-runtime-3.5_${scalaVersion}").name = "iceberg-spark-runtime-3.5_${scalaVersion}"
}
+
+if (sparkVersions.contains("4.0")) {
+  include ":iceberg-spark:spark-4.0_2.13"
+  include ":iceberg-spark:spark-extensions-4.0_2.13"
+  include ":iceberg-spark:spark-runtime-4.0_2.13"
+  project(":iceberg-spark:spark-4.0_2.13").projectDir = file('spark/v4.0/spark')
+  project(":iceberg-spark:spark-4.0_2.13").name = "iceberg-spark-4.0_2.13"
+  project(":iceberg-spark:spark-extensions-4.0_2.13").projectDir = file('spark/v4.0/spark-extensions')
+  project(":iceberg-spark:spark-extensions-4.0_2.13").name = "iceberg-spark-extensions-4.0_2.13"
+  project(":iceberg-spark:spark-runtime-4.0_2.13").projectDir = file('spark/v4.0/spark-runtime')
+  project(":iceberg-spark:spark-runtime-4.0_2.13").name = "iceberg-spark-runtime-4.0_2.13"
+}

if (kafkaVersions.contains("3")) {
include 'kafka-connect'
project(':kafka-connect').name = 'iceberg-kafka-connect'
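Once included, the new projects are addressable by the names assigned above. For example, a sketch of building just the runtime jar (the -DsparkVersions=4.0 flag is required, since without it the projects are never included in the build):

    ./gradlew -DsparkVersions=4.0 :iceberg-spark:iceberg-spark-runtime-4.0_2.13:build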
4 changes: 4 additions & 0 deletions spark/build.gradle
@@ -27,3 +27,7 @@ if (sparkVersions.contains("3.4")) {
if (sparkVersions.contains("3.5")) {
  apply from: file("$projectDir/v3.5/build.gradle")
}
+
+if (sparkVersions.contains("4.0")) {
+  apply from: file("$projectDir/v4.0/build.gradle")
+}