fix: cves - log4j, parquet-avro etc #1053
Conversation
Walkthrough: Updated Maven/Bazel dependency metadata: multiple dependency version bumps (notably Parquet 1.15.1, Avro 1.11.4), new parquet-format-structures and javax.annotation-api, removals of log4j/audience-annotations, a reload4j override, and assorted dependency graph adjustments.
Sequence Diagram(s): omitted — changes are dependency metadata only; no runtime control-flow to diagram.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 6
🧹 Nitpick comments (1)
tools/build_rules/dependencies/maven_repository.bzl (1)
233-235: Excluding log4j 1.x is good; also exclude SLF4J's log4j1 binding(s). This prevents multiple SLF4J bindings and accidental reintroduction of log4j 1 through those bindings.
"org.scala-lang:scala-reflect", # Exclude vulnerable log4j 1.2.17 - using reload4j 1.2.25 as a secure replacement "log4j:log4j", + "org.slf4j:slf4j-log4j12", + "org.slf4j:log4j-over-slf4j",
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- maven_install.json (22 hunks)
- tools/build_rules/dependencies/maven_repository.bzl (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: tchow-zlai
PR: zipline-ai/chronon#393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
- GitHub Check: cloud_gcp_tests
- GitHub Check: api_tests
- GitHub Check: cloud_aws_tests
- GitHub Check: online_tests
- GitHub Check: service_commons_tests
- GitHub Check: service_tests
- GitHub Check: aggregator_tests
- GitHub Check: python_tests
- GitHub Check: flink_tests
- GitHub Check: scala_compile_fmt_fix
- GitHub Check: groupby_tests
- GitHub Check: batch_tests
- GitHub Check: join_tests
- GitHub Check: analyzer_tests
- GitHub Check: fetcher_tests
- GitHub Check: streaming_tests
- GitHub Check: spark_tests
🔇 Additional comments (20)
tools/build_rules/dependencies/maven_repository.bzl (2)
95-95: Avro 1.11.4 bump looks good. Ensure downstream (e.g., fastserde 0.4.25, Flink/Spark Avro) remains compatible in CI.
98-104: Manual verification: ensure all Parquet artifacts resolve to 1.15.1. Automated queries couldn't confirm versions in maven_install.json. Please verify no parquet-* entries use a version other than 1.15.1:
• tools/build_rules/dependencies/maven_repository.bzl (lines 98–104) pin to 1.15.1
• File to check: maven_install.json
• Suggested command (the negative lookahead needs PCRE, so use grep -P rather than grep -E):
grep -P 'org.apache.parquet:parquet-[^:]+:(?!1.15.1)' maven_install.json
maven_install.json (18)
401-405: zstd-jni → 1.5.6-6.
LGTM. Verify no classifier/os-specific regressions.
2679-2683: Avro → 1.11.4 (CVE fixes).
Good. Watch for schema resolution/API changes; run I/O tests.
3453-3456: parquet-jackson → 1.15.1.
OK. Note new shaded package structure; verify shading conflicts don’t surface.
4531-4535: snappy-java → 1.1.10.7.
LGTM. Ensure native loader works in all target platforms.
6197-6197: grpc-alts trimmed to grpc-context.
Good reduction. Verify no ALTS usage paths rely on excluded artifacts.
7528-7530: New parquet-format-structures + javax.annotation-api.
Expected with 1.15.x. Ensure javax.annotation-api version aligns with rest.
18159-18159: New shaded.fastutil.io package.
Expected with parquet shading updates. No action.
18169-18169: New org.apache.parquet.conf package.
Matches parquet-conf module emergence. OK.
18192-18193: parquet-hadoop exposes org.apache.parquet.conf.
Consistent with 1.15.x layout. OK.
18211-18211: New wrapped.io util namespace.
Harmless shading addition. OK.
18220-18220: fastutil.io shaded path added.
As expected with upstream shading. OK.
18321-18323: Jackson shaded internals updated.
Fine; confirm no accidental dependency on shaded internals.
18350-18351: Databind util/internal shaded paths updated.
OK. Nothing to change.
7533-7541: Parquet-hadoop dep graph expanded – exclusions effective & compressors aligned
- Verified no log4j:log4j in maven_install.json
- Compressor artifacts present at unified versions:
• org.xerial.snappy:snappy-java 1.1.10.7
• com.github.luben:zstd-jni 1.5.6-6
• io.airlift:aircompressor 2.0.2
3-4: Lockfile update looks good — provenance confirmed
AUTOGENERATED banner is present, Avro bumped to 1.11.4, Parquet to 1.15.1; no legacy log4j:log4j entries in maven_install.json; override to ch.qos.reload4j:reload4j:1.2.25 verified in tools/build_rules/dependencies/maven_repository.bzl.
5011-5013: No guava/slf4j version conflicts detected
Verified maven_install.json contains a single com.google.guava:guava and a single org.slf4j:slf4j-api entry — classpath convergence is intact.
1416-1419: Ignore aircompressor version check – no upgrade detected
No occurrences of aircompressor in maven_install.json; no bump to 2.0.2 was applied. Disregard this suggestion. Likely an incorrect or invalid review comment.
1402-1405: commons-pool → 1.6: OK.
- Code still uses commons-pool2 API in api/src/main/scala/ai/chronon/api/SimpleObjectPool.scala (lines 3–4).
- Dependency org.apache.commons:commons-pool2:2.12.1 remains declared.
No further action needed.
| "jar": "c3ce722da920bfa886ee9f98028ecc9dfe3d199625cc14809c3c2fc8bcfa29ba", | ||
| "sources": "32dbc7386e61809d9edc30a6b6a818f59077415391ab0e7f48fb875da40cd936" | ||
| }, | ||
| "version": "1.13.1" | ||
| "version": "1.15.1" | ||
| }, | ||
| "org.apache.parquet:parquet-common": { | ||
| "shasums": { | ||
| "jar": "4a2d7a8eb395bbc8d202fa74b272e45374a1034b351028824ad46925366297ea", | ||
| "sources": "7e9762f617f60f8a05f4431bdeaa55076672402cf7a425fd70e355072e48232a" | ||
| "jar": "7e10373a59e4cba6efdf9f2c03da5a2b27c5fb4768cf7cb25d7905dc76b40272", | ||
| "sources": "1bac6586672f1ba87ba420186972ec45989207c81463aab7372a2857975e95fd" | ||
| }, | ||
| "version": "1.13.1" | ||
| "version": "1.15.1" | ||
| }, | ||
| "org.apache.parquet:parquet-encoding": { | ||
| "shasums": { | ||
| "jar": "0c921d275bbcc31552ee94d383ea38b2615110790d59f11ad07ffe5f3f0c23b3", | ||
| "sources": "aa5269df8aa587dd28250ae02f65594776d8762e53e7a7ffe8672770da851425" | ||
| "jar": "b6ee50330dd91ef0c7a7a0004400dea04e17748f2d69a13930b8ba9c9e314ad8", | ||
| "sources": "d7ab3f00dba85bd4d9aab6a6aedc747bfb294f09c7404f97cd7699005aeefd01" | ||
| }, | ||
| "version": "1.13.1" | ||
| "version": "1.15.1" | ||
| }, | ||
| "org.apache.parquet:parquet-format-structures": { | ||
| "shasums": { | ||
| "jar": "57b3e31e9502a1121dc481a3017fc2f20d1d6116da82dd58155a69b10b28e1e6", | ||
| "sources": "d6c733e33c2191d0a94d3f781303195168d9e9ffc7f055ee9059e27ccd9385b4" | ||
| "jar": "f5ba4d969a68eb5f5fab18a63feb4df66ea021e1f728b65834c1fe016b356c73", | ||
| "sources": "2a54c4ac657a69fd127b14d03ea359f460a5fc8cec8ddce3dc04356d624af431" | ||
| }, | ||
| "version": "1.13.1" | ||
| "version": "1.15.1" | ||
| }, | ||
| "org.apache.parquet:parquet-hadoop": { | ||
| "shasums": { | ||
| "jar": "5b86f191d0bbfe866d194bd1a336fa79d049d342fb7a05aefcf31ea266792a40", | ||
| "sources": "829f2d8e0543a18207a208a25d23beaed51dd2e84bfcc330b4585e1e49adf517" | ||
| "jar": "5c764cb6712776fa3f3b15431c3913f23610557da439ec43598094a217aa63d9", | ||
| "sources": "591dd98f1119380c112cfdf8164e73ee44e6265486f33cb356c5cdffb527f69f" | ||
| }, | ||
| "version": "1.13.1" | ||
| "version": "1.15.1" | ||
| }, |
💡 Verification agent
🧩 Analysis chain
Parquet core libs → 1.15.1.
Solid CVE-driven bump. Validate read/write compatibility on existing datasets.
🏁 Script executed:
#!/bin/bash
jq -r '.artifacts | to_entries | map(select(.key|startswith("org.apache.parquet:")))[] | "\(.key) \(.value.version)"' maven_install.json
Length of output: 445
Parquet hadoop-bundle version mismatch
The parquet-hadoop-bundle remains at 1.8.1 while all other Parquet modules are at 1.15.1. Please bump it to 1.15.1 and validate read/write compatibility on existing datasets.
- File: maven_install.json
- Lines: 3411–3443
- org.apache.parquet:parquet-hadoop-bundle 1.8.1
+ org.apache.parquet:parquet-hadoop-bundle 1.15.1
🤖 Prompt for AI Agents
In maven_install.json around lines 3411 to 3443 the parquet-hadoop-bundle
dependency is pinned to 1.8.1 while other Parquet modules use 1.15.1; update the
parquet-hadoop-bundle version to 1.15.1 to match the rest of the Parquet modules
and then run your build and test matrix to validate read/write compatibility
against existing datasets (run unit/integration tests and a sample read/write of
representative parquet files to confirm no runtime/serialization regressions).
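If the bundle is meant to stay in the dependency set, a hedged sketch of pinning it alongside the other Parquet pins in maven_repository.bzl follows; the PARQUET_VERSION constant and list name are assumptions, and only the coordinates and the 1.15.1 target come from the comment above.

# Sketch only; keeps parquet-hadoop-bundle on the same version as the rest of the Parquet family.
PARQUET_VERSION = "1.15.1"

parquet_artifacts = [
    "org.apache.parquet:parquet-hadoop:" + PARQUET_VERSION,
    "org.apache.parquet:parquet-common:" + PARQUET_VERSION,
    # Previously resolved to 1.8.1; bump to match the other Parquet modules
    "org.apache.parquet:parquet-hadoop-bundle:" + PARQUET_VERSION,
]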
| "io.netty:netty-common", | ||
| "org.apache.arrow:arrow-memory-core", | ||
| "org.apache.arrow:arrow-memory-netty-buffer-patch" | ||
| ], | ||
| "org.apache.arrow:arrow-memory-netty-buffer-patch": [ | ||
| "io.netty:netty-buffer", | ||
| "io.netty:netty-common", | ||
| "org.apache.arrow:arrow-memory-core", | ||
| "org.slf4j:slf4j-api" |
💡 Verification agent
🧩 Analysis chain
Arrow netty deps added.
Ensure a single Netty version across repo to avoid CVEs/class conflicts.
🏁 Script executed:
#!/bin/bash
jq -r '.artifacts | to_entries | map(select(.key|startswith("io.netty:")))[] | "\(.key) \(.value.version)"' maven_install.json | sort -u
Length of output: 1583
Unify Netty versions in maven_install.json
maven_install.json currently pulls io.netty artifacts at both 4.1.111.Final and 4.1.115.Final, which can lead to CVEs and class-loading conflicts. Please align all io.netty:* entries to a single version (e.g. 4.1.115.Final).
Modules at 4.1.115.Final:
- netty-buffer, netty-codec, netty-codec-http, netty-codec-http2
- netty-common, netty-handler, netty-resolver
- netty-transport, netty-transport-classes-epoll
- netty-transport-native-unix-common
Modules at 4.1.111.Final (need upgrade):
- netty-all, codec-dns, codec-haproxy, codec-memcache, codec-mqtt
- codec-redis, codec-smtp, codec-socks, codec-stomp, codec-xml
- handler-proxy, handler-ssl-ocsp, resolver-dns*, transport-classes-kqueue
- transport-native-epoll, transport-native-kqueue, transport-rxtx, transport-sctp, transport-udt
🤖 Prompt for AI Agents
In maven_install.json around lines 6797 to 6805, multiple io.netty artifacts are
pinned to 4.1.111.Final while others use 4.1.115.Final, which can cause CVEs and
class-loading conflicts; update all io.netty:* entries so they uniformly use
4.1.115.Final (replace versions for netty-all, codec-dns, codec-haproxy,
codec-memcache, codec-mqtt, codec-redis, codec-smtp, codec-socks, codec-stomp,
codec-xml, handler-proxy, handler-ssl-ocsp, resolver-dns*,
transport-classes-kqueue, transport-native-epoll, transport-native-kqueue,
transport-rxtx, transport-sctp, transport-udt) remove any duplicate/conflicting
entries, and regenerate/verify dependency lock file so the build resolves to the
single 4.1.115.Final version.
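One way to keep every io.netty module on the same version is to derive the pins from a single constant, as in the hedged Starlark sketch below; the module list is abbreviated from the comment above, and whether this repo's macro accepts plain "group:artifact:version" strings built this way is an assumption.

NETTY_VERSION = "4.1.115.Final"

NETTY_MODULES = [
    "netty-all",
    "netty-buffer",
    "netty-codec-dns",
    "netty-handler-proxy",
    "netty-resolver-dns",
    "netty-transport-native-epoll",
    # ... remaining io.netty modules listed in the comment above ...
]

# Expand into the coordinate strings the dependency list expects.
netty_artifacts = ["io.netty:%s:%s" % (m, NETTY_VERSION) for m in NETTY_MODULES]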
| "com.github.stephenc.findbugs:findbugs-annotations", | ||
| "junit:junit", | ||
| "log4j:log4j" | ||
| "junit:junit" |
hbase-annotations now pulls junit.
Risk: junit on runtime classpath if scope not respected. Prefer test-scope only.
Consider excluding junit at runtime via repo rules.
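If the build goes through rules_jvm_external (an assumption about this repo's wrapper), a per-artifact exclusion keeps junit out of hbase-annotations' transitive graph without touching test dependencies. The version string below is a placeholder, not taken from the lockfile.

load("@rules_jvm_external//:specs.bzl", "maven")

# Hypothetical per-artifact exclusion: junit stays available to test targets that
# declare it directly, but is no longer dragged in transitively by hbase-annotations.
hbase_annotations = maven.artifact(
    group = "org.apache.hbase",
    artifact = "hbase-annotations",
    version = "<pinned-version>",  # placeholder: use the version already pinned in maven_install.json
    exclusions = ["junit:junit"],
)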
| "org.apache.commons:commons-lang3" | ||
| ], | ||
| "org.apache.hive:hive-llap-common": [ | ||
| "org.apache.commons:commons-lang3", | ||
| "org.apache.curator:apache-curator:pom", | ||
| "org.apache.curator:curator-framework", | ||
| "org.apache.hive:hive-common", | ||
| "org.apache.hive:hive-llap-common", | ||
| "org.apache.hive:hive-serde", | ||
| "org.slf4j:slf4j-api" |
💡 Verification agent
🧩 Analysis chain
Hive LLAP deps adjusted (commons-lang3, hive-serde).
OK. Verify removed Curator isn’t implicitly required elsewhere.
🏁 Script executed:
#!/bin/bash
rg -n 'curator-framework|apache-curator' || true
Length of output: 1612
Curator still pulled into the build
- tools/build_rules/dependencies/maven_repository.bzl: line 110 includes "org.apache.curator:apache-curator:5.5.0"
- maven_install.json contains numerous curator entries (e.g. lines 2754, 2760, 2774, 6854–6858, 7310, 7342–7343, …)
If you intend to drop Curator, remove it from maven_repository.bzl and ensure no Hive LLAP or other modules transitively require it.
🤖 Prompt for AI Agents
In maven_install.json around lines 7368 (and in
tools/build_rules/dependencies/maven_repository.bzl at ~line 110) the Apache
Curator coordinates ("org.apache.curator:apache-curator:5.5.0") are still being
pulled into the build; remove the Curator entry from maven_repository.bzl, then
regenerate maven_install.json (or run the repository sync) so the lockfile no
longer contains curator artifacts; if any transitive consumers (e.g. Hive LLAP
or other modules) still require Curator, add explicit dependency exclusions or
overrides for those artifacts in maven_repository.bzl (or the relevant
dependency declarations) or replace them with the intended alternative, run
dependency resolution again, and verify no curator groupId/artifactId entries
remain in maven_install.json.
| "org.apache.hadoop:hadoop-client-api", | ||
| "org.slf4j:slf4j-api" |
💡 Verification agent
🧩 Analysis chain
orc-shims adds hadoop-client-api.
Check Hadoop dep alignment to project’s Hadoop version.
🏁 Script executed:
#!/bin/bash
jq -r '.artifacts["org.apache.hadoop:hadoop-client-api"].version' maven_install.jsonLength of output: 92
🏁 Script executed:
#!/bin/bash
# List all Hadoop-related artifact versions to check project’s Hadoop version
jq -r '.artifacts
| to_entries[]
| select(.key | test("^org\\.apache\\.hadoop:hadoop-"))
| "\(.key): \(.value.version)"' maven_install.jsonLength of output: 596
Align Hadoop dependency versions
Mixed Hadoop versions detected in maven_install.json after adding orc-shims’s hadoop-client-api:
- org.apache.hadoop:hadoop-client-api: 3.4.1
- org.apache.hadoop:hadoop-client-runtime: 3.3.4
- org.apache.hadoop:hadoop-common: 3.4.1
- org.apache.hadoop:hadoop-yarn-api: 3.4.1
- org.apache.hadoop:hadoop-yarn-common: 3.4.1
- org.apache.hadoop:hadoop-yarn-server-*: 2.7.2
Please consolidate all Hadoop artifacts to your project’s intended version (for example, bump everything to 3.4.1) or explicitly override/exclude the transitive Hadoop deps introduced by orc-shims to prevent runtime conflicts.
🤖 Prompt for AI Agents
In maven_install.json around lines 7512-7513 there are mixed Hadoop versions
(3.4.1, 3.3.4, 2.7.2) introduced by orc-shims; consolidate all Hadoop artifacts
to the project’s intended version (e.g., 3.4.1) by either adding explicit
overrides (dependencyManagement or direct dependencies pinned to 3.4.1) for all
org.apache.hadoop artifacts, or exclude the transitive Hadoop deps from
orc-shims and add the correct-version Hadoop artifacts as direct deps; after
making the change, regenerate the lockfile (or run the workspace sync/lockfile
update command your build uses) so maven_install.json reflects the unified
versions.
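A hedged sketch of the "pin everything explicitly" option follows; the constant and list name are assumptions, and only the artifact IDs and the 3.4.1 target come from the version listing above.

HADOOP_VERSION = "3.4.1"

hadoop_artifacts = [
    "org.apache.hadoop:hadoop-client-api:" + HADOOP_VERSION,
    "org.apache.hadoop:hadoop-client-runtime:" + HADOOP_VERSION,  # currently resolves to 3.3.4
    "org.apache.hadoop:hadoop-common:" + HADOOP_VERSION,
    "org.apache.hadoop:hadoop-yarn-api:" + HADOOP_VERSION,
    "org.apache.hadoop:hadoop-yarn-common:" + HADOOP_VERSION,
    # hadoop-yarn-server-* artifacts currently resolve to 2.7.2 and would need the same treatment
]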
    overrides = {
        # Force reload4j as a replacement for vulnerable log4j 1.2.17
        "log4j:log4j": "ch.qos.reload4j:reload4j:1.2.25",
    },
💡 Verification agent
🧩 Analysis chain
Reload4j override: verify it’s effective (no log4j1 present).
Confirm maven_install.json has reload4j and no log4j:log4j remnants. If override works, you may not need both exclude and override (choose one strategy) to simplify.
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Checking for log4j:log4j in maven_install.json (should be none):"
if rg -n 'log4j:log4j' maven_install.json; then
echo "Found log4j:log4j — override/exclude may not be effective." >&2
exit 1
else
echo "OK: no log4j:log4j found."
fi
echo
echo "Checking for reload4j 1.2.25 presence:"
if rg -n 'ch\.qos\.reload4j:reload4j:1\.2\.25' maven_install.json; then
echo "OK: reload4j present."
else
echo "reload4j not found — ensure override/artifact is applied." >&2
exit 1
fi
echo
echo "Sanity check for multiple SLF4J bindings (log4j2 + logback) in repo config:"
rg -n 'log4j-slf4j-impl|logback-classic' tools/build_rules/dependencies/maven_repository.bzl || true
echo "If both are present at runtime you may see multiple SLF4J bindings."Length of output: 5423
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Checking for direct log4j v1 artifact key:"
if rg -n '"log4j:log4j":' maven_install.json; then
echo "Found direct log4j:v1 entry → override not applied" >&2
exit 1
else
echo "OK: no direct log4j:v1 entry"
fi
echo
echo "Checking for reload4j override entry:"
if rg -n '"ch.qos.reload4j:reload4j:1.2.25":' maven_install.json; then
echo "OK: reload4j override applied"
else
echo "Missing reload4j override" >&2
exit 1
fi
Length of output: 452
Reload4j override not applied – fix required
maven_install.json is missing "ch.qos.reload4j:reload4j:1.2.25", so log4j 1.2 isn’t being replaced.
• tools/build_rules/dependencies/maven_repository.bzl (≈ lines 236–239): confirm the override key/value syntax.
• Regenerate your lockfile (re-run the repo's Maven pinning step; note that bazel run //:gazelle -- update-repos manages Go repositories and won't refresh maven_install.json) and verify "ch.qos.reload4j:reload4j:1.2.25" appears in maven_install.json.
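Whichever way the lockfile check comes out, an earlier comment suggests keeping a single strategy (override or exclude, not both). A minimal sketch of the override-only shape, reusing the dict already shown in maven_repository.bzl, is below; listing reload4j explicitly in the artifact list is an assumption intended to guarantee it lands in maven_install.json even if nothing else pulls it in.

artifacts = [
    "ch.qos.reload4j:reload4j:1.2.25",
    # ... existing artifacts ...
]

overrides = {
    # Force reload4j as a replacement for vulnerable log4j 1.2.17
    "log4j:log4j": "ch.qos.reload4j:reload4j:1.2.25",
}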
| "org.apache.avro:avro:1.11.4", | ||
| "com.linkedin.avroutil1:avro-fastserde:0.4.25", | ||
|
|
||
| # Parquet - Force upgrade to 1.15.1 to address CVE in 1.13.1 |
maybe double check with bazel query "deps(//spark:all)" and similar?
those queries are what i have been running (via claude) to see
Actionable comments posted: 1
🧹 Nitpick comments (1)
aggregator/src/test/scala/ai/chronon/aggregator/test/FrequentItemsTest.scala (1)
14-14: Import is redundant (optional). You already import the DSL via Matchers._. Keeping both is harmless; consider using just one style for consistency.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
aggregator/src/test/scala/ai/chronon/aggregator/test/FrequentItemsTest.scala (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-10-29T15:21:58.102Z
Learnt from: piyush-zlai
PR: zipline-ai/chronon#53
File: hub/app/controllers/TimeSeriesController.scala:224-224
Timestamp: 2024-10-29T15:21:58.102Z
Learning: In the mocked data implementation in `hub/app/controllers/TimeSeriesController.scala`, potential `NumberFormatException` exceptions due to parsing errors (e.g., when using `val featureId = name.split("_").last.toInt`) are acceptable and will be addressed when adding the concrete backend.
Applied to files:
aggregator/src/test/scala/ai/chronon/aggregator/test/FrequentItemsTest.scala
🧬 Code Graph Analysis (1)
aggregator/src/test/scala/ai/chronon/aggregator/test/FrequentItemsTest.scala (1)
aggregator/src/main/scala/ai/chronon/aggregator/base/SimpleAggregators.scala (44)
prepare (49-49, 69-69, 89-93, 137-137, 230-234, 264-268, 413-418, 499-503, 546-550, 587-587, 622-626, 660-670, 880-884), update (51-51, 71-71, 95-100, 140-144, 194-200, 236-239, 282-284, 338-338, 341-341, 345-345, 349-349, 352-352, 356-356, 360-360, 420-423, 505-508), finalize (55-55, 75-75, 107-107, 153-156, 246-249, 298-317, 437-463, 532-532, 574-574, 589-589, 636-636, 754-756, 759-761, 848-852, 895-897)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
- GitHub Check: streaming_tests
- GitHub Check: groupby_tests
- GitHub Check: analyzer_tests
- GitHub Check: join_tests
- GitHub Check: cloud_gcp_tests
- GitHub Check: batch_tests
- GitHub Check: cloud_aws_tests
- GitHub Check: fetcher_tests
- GitHub Check: scala_compile_fmt_fix
- GitHub Check: aggregator_tests
- GitHub Check: spark_tests
- GitHub Check: service_commons_tests
- GitHub Check: service_tests
- GitHub Check: online_tests
- GitHub Check: flink_tests
- GitHub Check: api_tests
- GitHub Check: python_tests
- GitHub Check: enforce_triggered_workflows
      for (i <- 0 until 900) {
        val k = 10
        val topFrequentItems = new FrequentItems[java.lang.Long](k)
        val frequentItemsIr = topFrequentItems.prepare(0)

        createSkewedData().foreach(i => topFrequentItems.update(frequentItemsIr, i))

        val topHistogram = topFrequentItems.finalize(frequentItemsIr)

-       math.abs(topHistogram.size() - k) <= 2 shouldBe true
+       (topHistogram.size() - k) should be < 3
        heavyHitterElems.foreach(elem => topHistogram.containsKey(elem.toString))
      }
🛠️ Refactor suggestion
Fix assertion logic and enforce heavy-hitter checks; tame flakiness/perf.
- Only upper-bound is checked; use abs(delta) <= 2.
- containsKey() results aren’t asserted; test may pass even when keys are missing.
- 900 iterations with unseeded RNG can be slow/flaky.
Proposed patch:
-for (i <- 0 until 900) {
+for (i <- 0 until 100) {
val k = 10
val topFrequentItems = new FrequentItems[java.lang.Long](k)
val frequentItemsIr = topFrequentItems.prepare(0)
createSkewedData().foreach(i => topFrequentItems.update(frequentItemsIr, i))
val topHistogram = topFrequentItems.finalize(frequentItemsIr)
- (topHistogram.size() - k) should be < 3
- heavyHitterElems.foreach(elem => topHistogram.containsKey(elem.toString))
+ math.abs(topHistogram.size() - k) should be <= 2
+ heavyHitterElems.foreach(elem => topHistogram.containsKey(elem.toString) shouldBe true)
}

Optionally, make the randomness deterministic:
// Outside this block:
private val rnd = new Random(42L)
// In createSkewedData:
rnd.shuffle(longTail ++ heavyHitters)
🤖 Prompt for AI Agents
In aggregator/src/test/scala/ai/chronon/aggregator/test/FrequentItemsTest.scala
around lines 186 to 197, the test only checks an upper bound on size, doesn't
assert containsKey results, and runs 900 unseeded iterations causing flakiness
and slowness; change the size assertion to assert math.abs(topHistogram.size() -
k) <= 2 (or equivalent) so both upper and lower bounds are enforced, add
explicit assertions for each heavyHitterElems element (e.g.,
heavyHitterElems.foreach(elem => topHistogram.containsKey(elem.toString)
shouldBe true) ) to fail when keys are missing, and reduce iterations (e.g.,
from 900 to a much smaller number) and make randomness deterministic by seeding
Random (create a private val rnd = new Random(42L) and use it in
createSkewedData with rnd.shuffle(...)) to tame flakiness and improve
test performance.
-   overrides = {},
+   overrides = {
+       # Force reload4j as a replacement for vulnerable log4j 1.2.17
+       "log4j:log4j": "ch.qos.reload4j:reload4j:1.2.25",
we should double check logging isn't broken on things like the fetcher svc
david-zlai left a comment:
canary tests pass
thanks a lot dude! |