-
Notifications
You must be signed in to change notification settings - Fork 9
Bazel migration for spark module #196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThis pull request involves significant updates to build configurations across multiple Bazel files. The changes primarily focus on dependency management, introducing new ScalaTest dependencies, updating library versions, and adding a new Changes
Possibly related PRs
Suggested Reviewers
Poem
Warning Review ran into problems🔥 ProblemsGitHub Actions: Resource not accessible by integration - https://docs.github.com/rest/actions/workflow-runs#list-workflow-runs-for-a-repository. Please grant the required permissions to the CodeRabbit GitHub App under the organization or repository settings. 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (5)
🚧 Files skipped from review as they are similar to previous changes (2)
⏰ Context from checks skipped due to timeout of 90000ms (6)
🔇 Additional comments (6)
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
tools/build_rules/jvm_binary.bzl (1)
24-25: Enhance error messageAdd file names to help debugging.
- fail("Cannot have scala and java sources in same jvm_binary") + fail("Cannot mix Scala and Java sources in same jvm_binary. Found: " + ", ".join(srcs))spark/BUILD.bazel (1)
90-115: Consider centralizing JVM flagsThese JVM flags could be moved to a central configuration to avoid duplication.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (8)
api/BUILD.bazel(1 hunks)online/BUILD.bazel(1 hunks)spark/BUILD.bazel(1 hunks)tools/build_rules/dependencies/maven_repository.bzl(1 hunks)tools/build_rules/dependencies/spark_repository.bzl(1 hunks)tools/build_rules/jvm_binary.bzl(1 hunks)tools/build_rules/prelude_bazel(1 hunks)tools/build_rules/spark/BUILD(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (5)
- GitHub Check: fetcher_spark_tests
- GitHub Check: other_spark_tests
- GitHub Check: mutation_spark_tests
- GitHub Check: join_spark_tests
- GitHub Check: no_spark_scala_tests
🔇 Additional comments (8)
tools/build_rules/prelude_bazel (1)
3-4: Verify migration of jar and scala_jar usages✅ Verification successful
Removal of jar and scala_jar from prelude_bazel is safe
- Rules are still available in common.bzl
- No other files were loading these rules through prelude_bazel
- Existing usages in spark/BUILD load directly from common.bzl
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Find any remaining references to jar and scala_jar rg "jar\(" --type bazel rg "scala_jar\(" --type bazelLength of output: 1647
Script:
#!/bin/bash # Find files loading from prelude_bazel rg 'load\("//tools/build_rules/prelude_bazel"' --type bazelLength of output: 61
Script:
#!/bin/bash # Broader search for prelude_bazel references rg "prelude_bazel" --type bazel cat WORKSPACELength of output: 3055
tools/build_rules/dependencies/spark_repository.bzl (1)
13-15: Verify Hive version downgrade compatibilityDowngrading Hive from 3.1.2 to 2.3.9 needs validation.
tools/build_rules/spark/BUILD (1)
77-77: LGTM: Using relative pathtools/build_rules/jvm_binary.bzl (1)
4-54: LGTM: Clean jvm_binary implementationapi/BUILD.bazel (1)
38-40: Verify ScalaTest dependency versionsEnsure these ScalaTest dependencies use compatible versions with existing test frameworks.
✅ Verification successful
ScalaTest dependencies are consistent across the codebase
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check for potential version conflicts in ScalaTest dependencies rg -l 'scalatest' | xargs rg 'scala_artifact.*scalatest.*"'Length of output: 1984
tools/build_rules/dependencies/maven_repository.bzl (1)
30-32: Verify Jackson version compatibilityThe Jackson upgrade to 2.15.2 is substantial. Ensure compatibility with Spark's Jackson version.
✅ Verification successful
Jackson version upgrade is properly managed
The upgrade to Jackson 2.15.2 is safe as json4s-jackson is explicitly pinned to Spark's version and Jackson versions are managed through dependency overrides.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check Spark's Jackson version compatibility rg -A 5 'jackson.*version' | grep -v '2.15.2'Length of output: 1040
Script:
#!/bin/bash # Check Spark's dependency on Jackson in their codebase gh api repos/apache/spark/pulls/45838 | jq -r '.body' | grep -i jacksonLength of output: 132
Script:
#!/bin/bash # Check dependency declarations for json4s rg "json4s.*jackson.*3\.7\.0-M11" -A 10Length of output: 2741
online/BUILD.bazel (1)
21-23: LGTM: Correct transition to scala_artifactAppropriate use of scala_artifact for Scala-specific dependencies.
spark/BUILD.bazel (1)
155-160: LGTM: Correct jvm_binary configurationThe assembly target is properly configured with the main class and runtime dependencies.
| deps = [ | ||
| ":lib", | ||
| ":test-lib", | ||
| "//aggregator:lib", | ||
| "//aggregator:test-lib", | ||
| "//api:lib", | ||
| "//api:thrift", | ||
| "//online:lib", | ||
| "//tools/build_rules/spark:spark-exec", | ||
| # Library | ||
| maven_artifact("com.google.code.gson:gson"), | ||
| maven_artifact("com.google.guava:guava"), | ||
| scala_artifact("org.rogach:scallop"), | ||
| maven_artifact("commons.io:commons-io"), | ||
| maven_artifact("commons.lang:commons-lang"), | ||
| scala_artifact("org.scala-lang.modules:scala-java8-compat"), | ||
| scala_artifact("com.fasterxml.jackson.module:jackson-module-scala"), | ||
| maven_artifact("org.slf4j:slf4j-api"), | ||
| maven_artifact("org.slf4j:slf4j-log4j12"), | ||
| scala_artifact("org.json4s:json4s-core"), | ||
| scala_artifact("org.json4s:json4s-jackson"), | ||
| scala_artifact("org.json4s:json4s-ast"), | ||
| maven_artifact("org.yaml:snakeyaml"), | ||
| # Testing | ||
| scala_artifact("org.scalatest:scalatest-matchers-core"), | ||
| scala_artifact("org.scalatest:scalatest-core"), | ||
| scala_artifact("org.scalatest:scalatest"), | ||
| scala_artifact("org.scalatest:scalatest-flatspec"), | ||
| scala_artifact("org.scalatest:scalatest-funsuite"), | ||
| scala_artifact("org.scalatest:scalatest-shouldmatchers"), | ||
| scala_artifact("org.scalactic:scalactic"), | ||
| scala_artifact("org.scalatestplus:mockito-3-4"), | ||
| maven_artifact("org.scalatest:scalatest-compatible"), | ||
| maven_artifact("junit:junit"), | ||
| maven_artifact("com.novocode:junit-interface"), | ||
| maven_artifact("org.mockito:mockito-core"), | ||
| ], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Reduce dependency duplication
test target duplicates many dependencies from test-lib. Consider inheriting them through the test-lib dependency.
- # Library
- maven_artifact("com.google.code.gson:gson"),
- maven_artifact("com.google.guava:guava"),
- scala_artifact("org.rogach:scallop"),
- maven_artifact("commons.io:commons-io"),
- maven_artifact("commons.lang:commons-lang"),
- scala_artifact("org.scala-lang.modules:scala-java8-compat"),
- scala_artifact("com.fasterxml.jackson.module:jackson-module-scala"),
- maven_artifact("org.slf4j:slf4j-api"),
- maven_artifact("org.slf4j:slf4j-log4j12"),
- scala_artifact("org.json4s:json4s-core"),
- scala_artifact("org.json4s:json4s-jackson"),
- scala_artifact("org.json4s:json4s-ast"),
- maven_artifact("org.yaml:snakeyaml"),
- # Testing
- scala_artifact("org.scalatest:scalatest-matchers-core"),
- scala_artifact("org.scalatest:scalatest-core"),
- scala_artifact("org.scalatest:scalatest"),
- scala_artifact("org.scalatest:scalatest-flatspec"),
- scala_artifact("org.scalatest:scalatest-funsuite"),
- scala_artifact("org.scalatest:scalatest-shouldmatchers"),
- scala_artifact("org.scalactic:scalactic"),
- scala_artifact("org.scalatestplus:mockito-3-4"),
- maven_artifact("org.scalatest:scalatest-compatible"),
- maven_artifact("junit:junit"),
- maven_artifact("com.novocode:junit-interface"),
- maven_artifact("org.mockito:mockito-core"),📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| deps = [ | |
| ":lib", | |
| ":test-lib", | |
| "//aggregator:lib", | |
| "//aggregator:test-lib", | |
| "//api:lib", | |
| "//api:thrift", | |
| "//online:lib", | |
| "//tools/build_rules/spark:spark-exec", | |
| # Library | |
| maven_artifact("com.google.code.gson:gson"), | |
| maven_artifact("com.google.guava:guava"), | |
| scala_artifact("org.rogach:scallop"), | |
| maven_artifact("commons.io:commons-io"), | |
| maven_artifact("commons.lang:commons-lang"), | |
| scala_artifact("org.scala-lang.modules:scala-java8-compat"), | |
| scala_artifact("com.fasterxml.jackson.module:jackson-module-scala"), | |
| maven_artifact("org.slf4j:slf4j-api"), | |
| maven_artifact("org.slf4j:slf4j-log4j12"), | |
| scala_artifact("org.json4s:json4s-core"), | |
| scala_artifact("org.json4s:json4s-jackson"), | |
| scala_artifact("org.json4s:json4s-ast"), | |
| maven_artifact("org.yaml:snakeyaml"), | |
| # Testing | |
| scala_artifact("org.scalatest:scalatest-matchers-core"), | |
| scala_artifact("org.scalatest:scalatest-core"), | |
| scala_artifact("org.scalatest:scalatest"), | |
| scala_artifact("org.scalatest:scalatest-flatspec"), | |
| scala_artifact("org.scalatest:scalatest-funsuite"), | |
| scala_artifact("org.scalatest:scalatest-shouldmatchers"), | |
| scala_artifact("org.scalactic:scalactic"), | |
| scala_artifact("org.scalatestplus:mockito-3-4"), | |
| maven_artifact("org.scalatest:scalatest-compatible"), | |
| maven_artifact("junit:junit"), | |
| maven_artifact("com.novocode:junit-interface"), | |
| maven_artifact("org.mockito:mockito-core"), | |
| ], | |
| deps = [ | |
| ":lib", | |
| ":test-lib", | |
| "//aggregator:lib", | |
| "//aggregator:test-lib", | |
| "//api:lib", | |
| "//api:thrift", | |
| "//online:lib", | |
| "//tools/build_rules/spark:spark-exec", | |
| ], |
| ], | ||
| ) | ||
|
|
||
| scala_test_suite( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iiuc - this will kick off all the tests in a single bazel / jvm process right? Might want to have targets per test (we sorta do this in our sbt github worklows today)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct it will kick off all the tests in the spark module in parallel. We can also run only a specific test if needed by specifying it's path from the test suite which should be good enough for the github workflows use case i think.
We would need to create a lot of boiler plate config specifying targets per test in BUILD file which might actually work out well in the long term when we have a script that auto generates config for the test targets but may be not worth doing it at this point. I know for the fact that at google we specify build targets per test file but the config is mostly auto generated :) Please let me know if you have any concerns with our current approach, happy to discuss more but i'm slightly more inclined towards test_suite approach at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kumar-zlai thanks so much for doing the migration! Could you share a command to use to generate the final jar so that we could use to quickly run some integration tests? Just wanna make sure the classpaths are correct thus far.
Sure, |
tchow-zlai
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM pending CI failures
## Summary Migrated spark module to bazel. Known issue with tests : JUnit and Scala tests using FunSuite are not getting identified in bazel scala_test_suite. Thanks to @nikhil-zlai for working on the script to modify our tests to use Scala FlatSpec api which seems to be working with bazel. Will test again after his changes and create a separate PR if additional changes are needed. Also, it's good to be consistent with the unit tests api going forward unless we really need others. ## Checklist - [ ] Added Unit Tests - [x] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes - **Dependencies** - Updated logging framework dependency to Log4j 2.x across various libraries. - Updated ScalaTest dependencies in API testing. - Migrated library dependencies from Maven to Scala artifacts. - Updated Jackson, JSON4s, and other library versions. - Added new dependencies for Guava, Netty, Delta Spark, and Kafka. - **Build Configuration** - Introduced new Scala library and test configurations. - Added JVM binary assembly configuration. - Refined Bazel build rule imports and loading. - **Dependency Management** - Removed several unused dependencies. - Simplified dependency references. - Updated runtime dependency paths. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary Migrated spark module to bazel. Known issue with tests : JUnit and Scala tests using FunSuite are not getting identified in bazel scala_test_suite. Thanks to @nikhil-zlai for working on the script to modify our tests to use Scala FlatSpec api which seems to be working with bazel. Will test again after his changes and create a separate PR if additional changes are needed. Also, it's good to be consistent with the unit tests api going forward unless we really need others. ## Checklist - [ ] Added Unit Tests - [x] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes - **Dependencies** - Updated logging framework dependency to Log4j 2.x across various libraries. - Updated ScalaTest dependencies in API testing. - Migrated library dependencies from Maven to Scala artifacts. - Updated Jackson, JSON4s, and other library versions. - Added new dependencies for Guava, Netty, Delta Spark, and Kafka. - **Build Configuration** - Introduced new Scala library and test configurations. - Added JVM binary assembly configuration. - Refined Bazel build rule imports and loading. - **Dependency Management** - Removed several unused dependencies. - Simplified dependency references. - Updated runtime dependency paths. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary Migrated spark module to bazel. Known issue with tests : JUnit and Scala tests using FunSuite are not getting identified in bazel scala_test_suite. Thanks to @nikhil-zlai for working on the script to modify our tests to use Scala FlatSpec api which seems to be working with bazel. Will test again after his changes and create a separate PR if additional changes are needed. Also, it's good to be consistent with the unit tests api going forward unless we really need others. ## Checklist - [ ] Added Unit Tests - [x] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes - **Dependencies** - Updated logging framework dependency to Log4j 2.x across various libraries. - Updated ScalaTest dependencies in API testing. - Migrated library dependencies from Maven to Scala artifacts. - Updated Jackson, JSON4s, and other library versions. - Added new dependencies for Guava, Netty, Delta Spark, and Kafka. - **Build Configuration** - Introduced new Scala library and test configurations. - Added JVM binary assembly configuration. - Refined Bazel build rule imports and loading. - **Dependency Management** - Removed several unused dependencies. - Simplified dependency references. - Updated runtime dependency paths. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary Migrated spark module to bazel. Known issue with tests : JUnit and Scala tests using FunSuite are not getting identified in bazel scala_test_suite. Thanks to @nikhil-zlai for working on the script to modify our tests to use Scala FlatSpec api which seems to be working with bazel. Will test again after his changes and create a separate PR if additional changes are needed. Also, it's good to be consistent with the unit tests api going forward unless we really need others. ## Checklist - [ ] Added Unit Tests - [x] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes - **Dependencies** - Updated logging framework dependency to Log4j 2.x across various libraries. - Updated ScalaTest dependencies in API testing. - Migrated library dependencies from Maven to Scala artifacts. - Updated Jackson, JSON4s, and other library versions. - Added new dependencies for Guava, Netty, Delta Spark, and Kafka. - **Build Configuration** - Introduced new Scala library and test configurations. - Added JVM binary assembly configuration. - Refined Bazel build rule imports and loading. - **Dependency Management** - Removed several unused dependencies. - Simplified dependency references. - Updated runtime dependency paths. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary Migrated spark module to bazel. Known issue with tests : JUnit and Scala tests using FunSuite are not getting identified in bazel scala_test_suite. Thanks to @nikhil-zlai for working on the script to modify our tests to use Scala FlatSpec api which seems to be working with bazel. Will test again after his changes and create a separate PR if additional changes are needed. Also, it's good to be consistent with the unit tests api going forward unless we really need others. ## Checklist - [ ] Added Unit Tests - [x] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes - **Dependencies** - Updated logging framework dependency to Log4j 2.x across various libraries. - Updated ScalaTest dependencies in API testing. - Migrated library dependencies from Maven to Scala artifacts. - Updated Jackson, JSON4s, and other library versions. - Added new dependencies for Guava, Netty, Delta Spark, and Kafka. - **Build Configuration** - Introduced new Scala library and test configurations. - Added JVM binary assembly configuration. - Refined Bazel build rule imports and loading. - **Dependency Management** - Removed several unused dependencies. - Simplified dependency references. - Updated runtime dependency paths. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary Migrated spark module to bazel. Known issue with tests : JUnit and Scala tests using FunSuite are not getting identified in bazel scala_test_suite. Thanks to @nikhil-zlai for working on the script to modify our tests to use Scala FlatSpec api which seems to be working with bazel. Will test again after his changes and create a separate PR if additional changes are needed. Also, it's good to be consistent with the unit tests api going forward unless we really need others. ## Cheour clientslist - [ ] Added Unit Tests - [x] Covered by existing CI - [ ] Integration tested - [ ] Documentation update <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes - **Dependencies** - Updated logging framework dependency to Log4j 2.x across various libraries. - Updated ScalaTest dependencies in API testing. - Migrated library dependencies from Maven to Scala artifacts. - Updated Jaour clientsson, JSON4s, and other library versions. - Added new dependencies for Guava, Netty, Delta Spark, and Kafka. - **Build Configuration** - Introduced new Scala library and test configurations. - Added JVM binary assembly configuration. - Refined Bazel build rule imports and loading. - **Dependency Management** - Removed several unused dependencies. - Simplified dependency references. - Updated runtime dependency paths. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Summary
Migrated spark module to bazel.
Known issue with tests : JUnit and Scala tests using FunSuite are not getting identified in bazel scala_test_suite. Thanks to @nikhil-zlai for working on the script to modify our tests to use Scala FlatSpec api which seems to be working with bazel. Will test again after his changes and create a separate PR if additional changes are needed. Also, it's good to be consistent with the unit tests api going forward unless we really need others.
Checklist
Summary by CodeRabbit
Release Notes
Dependencies
Build Configuration
Dependency Management