Allow users to include additional jars while deploying Flink apps #835
Conversation
Walkthrough
Support for passing multiple additional JAR URIs to Flink jobs was added across the Python and Scala components. The CLI, runner, and GCP submission layers now accept and validate an --additional-jars argument.
Sequence Diagram(s)
sequenceDiagram
participant User
participant CLI
participant Runner
participant GCPSubmitter
participant DataprocSubmitter
User->>CLI: run --additional-jars=uri1,uri2
CLI->>Runner: args["additional_jars"] = "uri1,uri2"
Runner->>GCPSubmitter: Pass additional_jars in user_args
GCPSubmitter->>DataprocSubmitter: Pass additional_jars in submissionProperties
DataprocSubmitter->>DataprocSubmitter: Build Flink job with all JAR URIs
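To make the flow above concrete, here is a minimal Python sketch of how the comma-separated value might be threaded from the CLI down to job submission; the function and key names are illustrative placeholders, not the actual Chronon code:

```python
def cli_run(additional_jars=None):
    """CLI layer: collect flags into the args dict handed to the runner."""
    args = {}
    if additional_jars:
        args["additional_jars"] = additional_jars  # e.g. "uri1,uri2"
    runner_run(args)


def runner_run(args):
    """Runner layer: forward the raw comma-separated value to the GCP submission layer."""
    submit_to_dataproc(user_args=args)


def submit_to_dataproc(user_args):
    """GCP layer: fan the value out into the full list of jar URIs for the Flink job."""
    jar_uris = ["gs://bucket/flink-job.jar"]  # main job jar (placeholder URI)
    extra = user_args.get("additional_jars")
    if extra:
        jar_uris += [uri for uri in extra.split(",") if uri]
    print("Flink job jarUris:", jar_uris)


cli_run("gs://bucket/jar1.jar,gs://bucket/jar2.jar")
```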
Actionable comments posted: 0
🔭 Outside diff range comments (1)
api/python/ai/chronon/repo/run.py (1)
235-235: ⚠️ Potential issue: Remove unused parameter from function signature.
The mock_source parameter exists in the function signature but the corresponding CLI option was removed.
     flink_state_uri,
-    mock_source,
+    additional_jars,
     validate,
🧰 Tools
🪛 Pylint (3.3.7)
[warning] 235-235: Unused argument 'mock_source'
(W0613)
🧹 Nitpick comments (2)
api/python/ai/chronon/repo/run.py (2)
96-104: Fix unused parameters and add docstring.
The validation function has unused ctx and param parameters, and lacks documentation.
-def validate_additional_jars(ctx, param, value):
+def validate_additional_jars(ctx, param, value):
+    """Validate that additional jar URIs start with gs:// or s3://."""
     if value:
         jars = value.split(',')
         for jar in jars:
             if not jar.startswith(('gs://', 's3://')):
                 raise click.BadParameter(
                     f"Additional jars must start with gs://, s3://: {jar}"
                 )
     return value
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 96-96: Missing function or method docstring
(C0116)
[warning] 96-96: Unused argument 'ctx'
(W0613)
[warning] 96-96: Unused argument 'param'
(W0613)
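As a quick illustration of the callback's behaviour, the snippet below exercises the suggested version of the validator standalone; passing None for the Click context and parameter is just for this sketch:

```python
import click


def validate_additional_jars(ctx, param, value):
    """Validate that additional jar URIs start with gs:// or s3://."""
    if value:
        for jar in value.split(','):
            if not jar.startswith(('gs://', 's3://')):
                raise click.BadParameter(
                    f"Additional jars must start with gs://, s3://: {jar}"
                )
    return value


# Valid comma-separated GCS/S3 URIs pass through unchanged.
print(validate_additional_jars(None, None, "gs://bucket/jar1.jar,s3://bucket/jar2.jar"))

# A local path is rejected with click.BadParameter.
try:
    validate_additional_jars(None, None, "/tmp/local.jar")
except click.BadParameter as err:
    print("rejected:", err)
```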
186-188: Fix line length issue.
The CLI option help text exceeds 100 characters.
-@click.option("--additional-jars",
-              help="Comma separated list of additional jar URIs to be included in the Flink job classpath (e.g. gs://bucket/jar1.jar,gs://bucket/jar2.jar).",
-              callback=validate_additional_jars)
+@click.option("--additional-jars",
+              help="Comma separated list of additional jar URIs for Flink job classpath "
+                   "(e.g. gs://bucket/jar1.jar,gs://bucket/jar2.jar).",
+              callback=validate_additional_jars)
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 187-187: Line too long (157/100)
(C0301)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- api/python/ai/chronon/repo/default_runner.py (1 hunks)
- api/python/ai/chronon/repo/gcp.py (2 hunks)
- api/python/ai/chronon/repo/run.py (2 hunks)
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocSubmitter.scala (4 hunks)
- cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/DataprocSubmitterTest.scala (3 hunks)
- spark/src/main/scala/ai/chronon/spark/submission/JobSubmitter.scala (2 hunks)
🧰 Additional context used
🪛 Pylint (3.3.7)
api/python/ai/chronon/repo/run.py
[convention] 96-96: Missing function or method docstring
(C0116)
[warning] 96-96: Unused argument 'ctx'
(W0613)
[warning] 96-96: Unused argument 'param'
(W0613)
[convention] 187-187: Line too long (157/100)
(C0301)
⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: cloud_aws_tests
- GitHub Check: cloud_gcp_tests
- GitHub Check: streaming_tests
- GitHub Check: analyzer_tests
- GitHub Check: join_tests
- GitHub Check: fetcher_tests
- GitHub Check: batch_tests
- GitHub Check: groupby_tests
- GitHub Check: spark_tests
🔇 Additional comments (11)
spark/src/main/scala/ai/chronon/spark/submission/JobSubmitter.scala (2)
143-143: LGTM - Constant addition follows existing patterns.
172-172: LGTM - CLI argument keyword follows naming convention.
api/python/ai/chronon/repo/default_runner.py (1)
40-40: LGTM - Consistent with existing variable initialization pattern.
api/python/ai/chronon/repo/gcp.py (2)
345-345: Verify mock_source removal is intentional.
The --mock-source flag was removed from flag_args. Ensure this removal aligns with the broader PR changes.
359-361: LGTM - Proper conditional inclusion of additional jars.
The conditional check prevents empty values from being added to user_args.
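A minimal sketch of that conditional pass-through, assuming hypothetical helper and key names (the real gcp.py code differs):

```python
def build_user_args(args):
    """Forward only flags that actually carry a value (illustrative sketch)."""
    user_args = {}
    additional_jars = args.get("additional_jars")
    if additional_jars:  # skip None/empty so no blank flag reaches the submitter
        user_args["--additional-jars"] = additional_jars
    return user_args


print(build_user_args({"additional_jars": ""}))                       # {}
print(build_user_args({"additional_jars": "gs://bucket/extra.jar"}))  # {'--additional-jars': 'gs://bucket/extra.jar'}
```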
cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/DataprocSubmitterTest.scala (3)
39-39: LGTM - Test updated for new jarUris array parameter.
99-99: LGTM - Consistent test parameter update.
116-116: LGTM - PubSub test correctly includes multiple jars in array.The test now properly includes both the main jar and PubSub connector jar in the jarUris array.
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/DataprocSubmitter.scala (3)
170-171: LGTM! Clean implementation of additional jars support.
The logic properly extracts additional jars and combines them with existing jar URIs into a unified array.
241-241: LGTM! Method signature updated consistently.
The jarUris parameter type change from a single string to Array[String] aligns with the new multi-jar support.
404-413: LGTM! Additional jars properly integrated into submission properties.
The implementation correctly extracts additional jars from CLI args and includes them in the properties map when present.
Summary
Users might need to provide custom schema serde providers via jars hosted on cloud storage. This PR lets them include those jars as part of submission so they are added to the Flink job's jar list.
Example test command (I confirmed the additional jar was included in the Flink job):
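The exact command was not captured here; purely as an illustration, an invocation might look like the sketch below. The entry point and any other required flags are placeholders; only the --additional-jars flag and its gs:// URI format come from this PR.

```python
import subprocess

# Hypothetical invocation: the entry point and other required flags are placeholders;
# only --additional-jars and its gs:// URI format are taken from this PR.
subprocess.run(
    [
        "python", "-m", "ai.chronon.repo.run",
        "--additional-jars", "gs://my-bucket/custom-serde-provider.jar",
        # ...other required run flags omitted...
    ],
    check=True,
)
```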