
Conversation

@nikhil-zlai (Contributor) commented Aug 11, 2025

Summary

Checklist

  • Added Unit Tests
  • Covered by existing CI
  • Integration tested
  • Documentation update

Summary by CodeRabbit

  • Security

    • Removed legacy Log4j 1.x and replaced with Reload4j as a security mitigation.
  • Chores

    • Upgraded core formats and compression libs (Parquet 1.15.1, Avro 1.11.4, Snappy, Zstd, Aircompressor, Commons Pool) and added parquet format-structures and javax.annotation-api.
    • Streamlined transitive deps across gRPC, Netty/Arrow, Hive LLAP; cleaned up audience-annotations and other legacy entries.
  • Tests

    • Stabilized a frequent-items test by running repeated trials to verify consistency.

coderabbitai bot (Contributor) commented Aug 11, 2025

Walkthrough

Updated Maven/Bazel dependency metadata: multiple dependency version bumps (notably Parquet 1.15.1, Avro 1.11.4), new parquet-format-structures and javax.annotation-api, removals of log4j/audience-annotations, reload4j override, and assorted dependency graph adjustments.

Changes

  • Root metadata (maven_install.json): Updated __INPUT_ARTIFACTS_HASH and __RESOLVED_ARTIFACTS_HASH.
  • Core version bumps (maven_install.json): Upgraded com.github.luben:zstd-jni, org.xerial.snappy:snappy-java, io.airlift:aircompressor, and commons-pool:commons-pool (versions and shasums updated).
  • Avro bump (maven_install.json, tools/build_rules/dependencies/maven_repository.bzl): org.apache.avro:avro 1.11.3 → 1.11.4.
  • Parquet 1.15.1 rollout (maven_install.json, tools/build_rules/dependencies/maven_repository.bzl): Added/updated org.apache.parquet:{parquet-column,parquet-common,parquet-encoding,parquet-format-structures,parquet-hadoop,parquet-jackson} → 1.15.1; added top-level parquet-format-structures and javax.annotation:javax.annotation-api; expanded parquet-hadoop deps.
  • Logging/annotation removals and override (maven_install.json, tools/build_rules/dependencies/maven_repository.bzl): Removed log4j:log4j blocks and many org.apache.yetus:audience-annotations entries; added an excluded_artifacts entry for log4j:log4j and an override mapping to ch.qos.reload4j:reload4j:1.2.25.
  • gRPC / Netty / Arrow adjustments (maven_install.json): Reduced transitive deps for io.grpc:grpc-alts and io.grpc:grpc-netty-shaded; added netty/arrow-memory deps to the Arrow netty modules.
  • Hive LLAP and downstream changes (maven_install.json): org.apache.hive:hive-llap-client gained commons-lang3; hive-llap-common gained hive-serde and dropped curator entries; the dropwizard reporter gained hadoop-common and slf4j-api.
  • Test refactor (aggregator/src/test/scala/ai/chronon/aggregator/test/FrequentItemsTest.scala): Refactored the MostFrequentK test to run 900 independent trials, adjusted the assertion to check stability (< 3 difference), preserved the heavy-hitter checks, and added an explicit Matchers import.
  • Bazel maven repo rules (tools/build_rules/dependencies/maven_repository.bzl): Added the Parquet artifacts, bumped Avro, added excluded_artifacts for log4j, and added an overrides mapping to reload4j. (A consolidated sketch of these .bzl changes follows below.)
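
Taken together, the tools/build_rules/dependencies/maven_repository.bzl edits have roughly the shape sketched below. This is a hedged reconstruction assembled from the diff hunks quoted later in this review; the enclosing macro name and the javax.annotation-api version are assumptions, while the artifact coordinates, the excluded_artifacts entry, and the overrides mapping are taken from the PR.

# Sketch only: the enclosing wrapper macro and unrelated entries are assumed/omitted.
maven_repository(  # hypothetical wrapper defined in maven_repository.bzl
    artifacts = [
        # Avro 1.11.3 -> 1.11.4 (CVE fix)
        "org.apache.avro:avro:1.11.4",
        "com.linkedin.avroutil1:avro-fastserde:0.4.25",
        # Parquet - Force upgrade to 1.15.1 to address CVE in 1.13.1
        "org.apache.parquet:parquet-column:1.15.1",
        "org.apache.parquet:parquet-common:1.15.1",
        "org.apache.parquet:parquet-encoding:1.15.1",
        "org.apache.parquet:parquet-format-structures:1.15.1",
        "org.apache.parquet:parquet-hadoop:1.15.1",
        "org.apache.parquet:parquet-jackson:1.15.1",
        "javax.annotation:javax.annotation-api:1.3.2",  # version assumed, not stated in the PR
        # ... existing artifacts unchanged ...
    ],
    excluded_artifacts = [
        # Exclude vulnerable log4j 1.2.17 - using reload4j 1.2.25 as a secure replacement
        "log4j:log4j",
    ],
    overrides = {
        # Force reload4j as a replacement for vulnerable log4j 1.2.17
        "log4j:log4j": "ch.qos.reload4j:reload4j:1.2.25",
    },
)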

Sequence Diagram(s)

(omitted — changes are dependency metadata only; no runtime control-flow to diagram)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • kumar-zlai
  • piyush-zlai
  • tchow-zlai

Poem

Jars shuffled, hashes turned anew,
Parquet grew wings and Avro flew.
Log4j bowed, reload4j strides in,
Tests repeat where bugs grow thin.
Builds hum on — dependencies tuned. 🎉



coderabbitai bot (Contributor) left a comment

Actionable comments posted: 6

🧹 Nitpick comments (1)
tools/build_rules/dependencies/maven_repository.bzl (1)

233-235: Exclude log4j 1.x: good; also exclude SLF4J’s log4j1 binding(s).

Prevents multiple SLF4J bindings and accidental reintroduction of log4j1 via bindings.

     "org.scala-lang:scala-reflect",
     # Exclude vulnerable log4j 1.2.17 - using reload4j 1.2.25 as a secure replacement
     "log4j:log4j",
+    "org.slf4j:slf4j-log4j12",
+    "org.slf4j:log4j-over-slf4j",
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6fdbe80 and 16b23c6.

📒 Files selected for processing (2)
  • maven_install.json (22 hunks)
  • tools/build_rules/dependencies/maven_repository.bzl (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: tchow-zlai
PR: zipline-ai/chronon#393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
  • GitHub Check: cloud_gcp_tests
  • GitHub Check: api_tests
  • GitHub Check: cloud_aws_tests
  • GitHub Check: online_tests
  • GitHub Check: service_commons_tests
  • GitHub Check: service_tests
  • GitHub Check: aggregator_tests
  • GitHub Check: python_tests
  • GitHub Check: flink_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: groupby_tests
  • GitHub Check: batch_tests
  • GitHub Check: join_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: streaming_tests
  • GitHub Check: spark_tests
🔇 Additional comments (20)
tools/build_rules/dependencies/maven_repository.bzl (2)

95-95: Avro 1.11.4 bump looks good.

Ensure downstream (e.g., fastserde 0.4.25, Flink/Spark Avro) remains compatible in CI.


98-104: Manual verification: ensure all Parquet artifacts resolve to 1.15.1

Automated queries couldn’t confirm versions in maven_install.json. Please verify no parquet-* entries use a version other than 1.15.1:

• tools/build_rules/dependencies/maven_repository.bzl (lines 98–104) pin to 1.15.1
• File to check: maven_install.json
  • Suggested command (note: grep -E does not support the (?!…) lookahead; list versions with jq and filter instead, mirroring the verification script further below):
jq -r '.artifacts | to_entries[] | select(.key|startswith("org.apache.parquet:")) | "\(.key) \(.value.version)"' maven_install.json | grep -v '1\.15\.1'

maven_install.json (18)

401-405: zstd-jni → 1.5.6-6.
LGTM. Verify no classifier/os-specific regressions.


2679-2683: Avro → 1.11.4 (CVE fixes).
Good. Watch for schema resolution/API changes; run I/O tests.


3453-3456: parquet-jackson → 1.15.1.
OK. Note new shaded package structure; verify shading conflicts don’t surface.


4531-4535: snappy-java → 1.1.10.7.
LGTM. Ensure native loader works in all target platforms.


6197-6197: grpc-alts trimmed to grpc-context.
Good reduction. Verify no ALTS usage paths rely on excluded artifacts.


7528-7530: New parquet-format-structures + javax.annotation-api.
Expected with 1.15.x. Ensure javax.annotation-api version aligns with rest.


18159-18159: New shaded.fastutil.io package.
Expected with parquet shading updates. No action.


18169-18169: New org.apache.parquet.conf package.
Matches parquet-conf module emergence. OK.


18192-18193: parquet-hadoop exposes org.apache.parquet.conf.
Consistent with 1.15.x layout. OK.


18211-18211: New wrapped.io util namespace.
Harmless shading addition. OK.


18220-18220: fastutil.io shaded path added.
As expected with upstream shading. OK.


18321-18323: Jackson shaded internals updated.
Fine; confirm no accidental dependency on shaded internals.


18350-18351: Databind util/internal shaded paths updated.
OK. Nothing to change.


7533-7541: Parquet-hadoop dep graph expanded – exclusions effective & compressors aligned

  • Verified no log4j:log4j in maven_install.json
  • Compressor artifacts present at unified versions:
    • org.xerial.snappy:snappy-java 1.1.10.7
    • com.github.luben:zstd-jni 1.5.6-6
    • io.airlift:aircompressor 2.0.2

3-4: Lockfile update looks good — provenance confirmed
AUTOGENERATED banner is present, Avro bumped to 1.11.4, Parquet to 1.15.1; no legacy log4j:log4j entries in maven_install.json; override to ch.qos.reload4j:reload4j:1.2.25 verified in tools/build_rules/dependencies/maven_repository.bzl.


5011-5013: No guava/slf4j version conflicts detected
Verified maven_install.json contains a single com.google.guava:guava and a single org.slf4j:slf4j-api entry—classpath convergence is intact.


1416-1419: Ignore aircompressor version check – no upgrade detected
No occurrences of aircompressor in maven_install.json; no bump to 2.0.2 was applied. Disregard this suggestion.

Likely an incorrect or invalid review comment.


1402-1405: commons-pool → 1.6: OK.

  • Code still uses commons-pool2 API in api/src/main/scala/ai/chronon/api/SimpleObjectPool.scala (lines 3–4).
  • Dependency org.apache.commons:commons-pool2:2.12.1 remains declared.

No further action needed.

Comment on lines +3411 to 3443
"jar": "c3ce722da920bfa886ee9f98028ecc9dfe3d199625cc14809c3c2fc8bcfa29ba",
"sources": "32dbc7386e61809d9edc30a6b6a818f59077415391ab0e7f48fb875da40cd936"
},
"version": "1.13.1"
"version": "1.15.1"
},
"org.apache.parquet:parquet-common": {
"shasums": {
"jar": "4a2d7a8eb395bbc8d202fa74b272e45374a1034b351028824ad46925366297ea",
"sources": "7e9762f617f60f8a05f4431bdeaa55076672402cf7a425fd70e355072e48232a"
"jar": "7e10373a59e4cba6efdf9f2c03da5a2b27c5fb4768cf7cb25d7905dc76b40272",
"sources": "1bac6586672f1ba87ba420186972ec45989207c81463aab7372a2857975e95fd"
},
"version": "1.13.1"
"version": "1.15.1"
},
"org.apache.parquet:parquet-encoding": {
"shasums": {
"jar": "0c921d275bbcc31552ee94d383ea38b2615110790d59f11ad07ffe5f3f0c23b3",
"sources": "aa5269df8aa587dd28250ae02f65594776d8762e53e7a7ffe8672770da851425"
"jar": "b6ee50330dd91ef0c7a7a0004400dea04e17748f2d69a13930b8ba9c9e314ad8",
"sources": "d7ab3f00dba85bd4d9aab6a6aedc747bfb294f09c7404f97cd7699005aeefd01"
},
"version": "1.13.1"
"version": "1.15.1"
},
"org.apache.parquet:parquet-format-structures": {
"shasums": {
"jar": "57b3e31e9502a1121dc481a3017fc2f20d1d6116da82dd58155a69b10b28e1e6",
"sources": "d6c733e33c2191d0a94d3f781303195168d9e9ffc7f055ee9059e27ccd9385b4"
"jar": "f5ba4d969a68eb5f5fab18a63feb4df66ea021e1f728b65834c1fe016b356c73",
"sources": "2a54c4ac657a69fd127b14d03ea359f460a5fc8cec8ddce3dc04356d624af431"
},
"version": "1.13.1"
"version": "1.15.1"
},
"org.apache.parquet:parquet-hadoop": {
"shasums": {
"jar": "5b86f191d0bbfe866d194bd1a336fa79d049d342fb7a05aefcf31ea266792a40",
"sources": "829f2d8e0543a18207a208a25d23beaed51dd2e84bfcc330b4585e1e49adf517"
"jar": "5c764cb6712776fa3f3b15431c3913f23610557da439ec43598094a217aa63d9",
"sources": "591dd98f1119380c112cfdf8164e73ee44e6265486f33cb356c5cdffb527f69f"
},
"version": "1.13.1"
"version": "1.15.1"
},

💡 Verification agent

🧩 Analysis chain

Parquet core libs → 1.15.1.
Solid CVE-driven bump. Validate read/write compatibility on existing datasets.


🏁 Script executed:

#!/bin/bash
jq -r '.artifacts | to_entries | map(select(.key|startswith("org.apache.parquet:")))[] | "\(.key) \(.value.version)"' maven_install.json

Length of output: 445


Parquet hadoop-bundle version mismatch
The parquet-hadoop-bundle remains at 1.8.1 while all other Parquet modules are at 1.15.1. Please bump it to 1.15.1 and validate read/write compatibility on existing datasets.

  • File: maven_install.json
  • Lines: 3411–3443
- org.apache.parquet:parquet-hadoop-bundle 1.8.1
+ org.apache.parquet:parquet-hadoop-bundle 1.15.1
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"jar": "c3ce722da920bfa886ee9f98028ecc9dfe3d199625cc14809c3c2fc8bcfa29ba",
"sources": "32dbc7386e61809d9edc30a6b6a818f59077415391ab0e7f48fb875da40cd936"
},
"version": "1.13.1"
"version": "1.15.1"
},
"org.apache.parquet:parquet-common": {
"shasums": {
"jar": "4a2d7a8eb395bbc8d202fa74b272e45374a1034b351028824ad46925366297ea",
"sources": "7e9762f617f60f8a05f4431bdeaa55076672402cf7a425fd70e355072e48232a"
"jar": "7e10373a59e4cba6efdf9f2c03da5a2b27c5fb4768cf7cb25d7905dc76b40272",
"sources": "1bac6586672f1ba87ba420186972ec45989207c81463aab7372a2857975e95fd"
},
"version": "1.13.1"
"version": "1.15.1"
},
"org.apache.parquet:parquet-encoding": {
"shasums": {
"jar": "0c921d275bbcc31552ee94d383ea38b2615110790d59f11ad07ffe5f3f0c23b3",
"sources": "aa5269df8aa587dd28250ae02f65594776d8762e53e7a7ffe8672770da851425"
"jar": "b6ee50330dd91ef0c7a7a0004400dea04e17748f2d69a13930b8ba9c9e314ad8",
"sources": "d7ab3f00dba85bd4d9aab6a6aedc747bfb294f09c7404f97cd7699005aeefd01"
},
"version": "1.13.1"
"version": "1.15.1"
},
"org.apache.parquet:parquet-format-structures": {
"shasums": {
"jar": "57b3e31e9502a1121dc481a3017fc2f20d1d6116da82dd58155a69b10b28e1e6",
"sources": "d6c733e33c2191d0a94d3f781303195168d9e9ffc7f055ee9059e27ccd9385b4"
"jar": "f5ba4d969a68eb5f5fab18a63feb4df66ea021e1f728b65834c1fe016b356c73",
"sources": "2a54c4ac657a69fd127b14d03ea359f460a5fc8cec8ddce3dc04356d624af431"
},
"version": "1.13.1"
"version": "1.15.1"
},
"org.apache.parquet:parquet-hadoop": {
"shasums": {
"jar": "5b86f191d0bbfe866d194bd1a336fa79d049d342fb7a05aefcf31ea266792a40",
"sources": "829f2d8e0543a18207a208a25d23beaed51dd2e84bfcc330b4585e1e49adf517"
"jar": "5c764cb6712776fa3f3b15431c3913f23610557da439ec43598094a217aa63d9",
"sources": "591dd98f1119380c112cfdf8164e73ee44e6265486f33cb356c5cdffb527f69f"
},
"version": "1.13.1"
"version": "1.15.1"
},
- org.apache.parquet:parquet-hadoop-bundle 1.8.1
+ org.apache.parquet:parquet-hadoop-bundle 1.15.1
🤖 Prompt for AI Agents
In maven_install.json around lines 3411 to 3443 the parquet-hadoop-bundle
dependency is pinned to 1.8.1 while other Parquet modules use 1.15.1; update the
parquet-hadoop-bundle version to 1.15.1 to match the rest of the Parquet modules
and then run your build and test matrix to validate read/write compatibility
against existing datasets (run unit/integration tests and a sample read/write of
representative parquet files to confirm no runtime/serialization regressions).

Comment on lines +6797 to 6805
"io.netty:netty-common",
"org.apache.arrow:arrow-memory-core",
"org.apache.arrow:arrow-memory-netty-buffer-patch"
],
"org.apache.arrow:arrow-memory-netty-buffer-patch": [
"io.netty:netty-buffer",
"io.netty:netty-common",
"org.apache.arrow:arrow-memory-core",
"org.slf4j:slf4j-api"

💡 Verification agent

🧩 Analysis chain

Arrow netty deps added.
Ensure a single Netty version across repo to avoid CVEs/class conflicts.


🏁 Script executed:

#!/bin/bash
jq -r '.artifacts | to_entries | map(select(.key|startswith("io.netty:")))[] | "\(.key) \(.value.version)"' maven_install.json | sort -u

Length of output: 1583


Unify Netty versions in maven_install.json

maven_install.json currently pulls io.netty artifacts at both 4.1.111.Final and 4.1.115.Final, which can lead to CVEs and class-loading conflicts. Please align all io.netty:* entries to a single version (e.g. 4.1.115.Final).

Modules at 4.1.115.Final:

  • netty-buffer, netty-codec, netty-codec-http, netty-codec-http2
  • netty-common, netty-handler, netty-resolver
  • netty-transport, netty-transport-classes-epoll
  • netty-transport-native-unix-common

Modules at 4.1.111.Final (need upgrade):

  • netty-all, codec-dns, codec-haproxy, codec-memcache, codec-mqtt
  • codec-redis, codec-smtp, codec-socks, codec-stomp, codec-xml
  • handler-proxy, handler-ssl-ocsp, resolver-dns*, transport-classes-kqueue
  • transport-native-epoll, transport-native-kqueue, transport-rxtx, transport-sctp, transport-udt
🤖 Prompt for AI Agents
In maven_install.json around lines 6797 to 6805, multiple io.netty artifacts are
pinned to 4.1.111.Final while others use 4.1.115.Final, which can cause CVEs and
class-loading conflicts; update all io.netty:* entries so they uniformly use
4.1.115.Final (replace versions for netty-all, codec-dns, codec-haproxy,
codec-memcache, codec-mqtt, codec-redis, codec-smtp, codec-socks, codec-stomp,
codec-xml, handler-proxy, handler-ssl-ocsp, resolver-dns*,
transport-classes-kqueue, transport-native-epoll, transport-native-kqueue,
transport-rxtx, transport-sctp, transport-udt) remove any duplicate/conflicting
entries, and regenerate/verify dependency lock file so the build resolves to the
single 4.1.115.Final version.
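
To make that concrete, below is a minimal sketch of one way to force the alignment with rules_jvm_external, assuming the repo's maven_repository.bzl wrapper ultimately forwards these arguments to maven_install (the wrapper's real signature is not shown in this PR, and the repository name/list here are illustrative):

load("@rules_jvm_external//:defs.bzl", "maven_install")

maven_install(
    name = "maven",
    artifacts = [
        # Pin the lagging modules explicitly at the target version ...
        "io.netty:netty-all:4.1.115.Final",
        "io.netty:netty-codec-dns:4.1.115.Final",
        "io.netty:netty-handler-proxy:4.1.115.Final",
        "io.netty:netty-transport-native-epoll:4.1.115.Final",
        # ... and so on for the remaining 4.1.111.Final modules listed above ...
    ],
    # "pinned" makes top-level declared versions win over transitively resolved ones.
    version_conflict_policy = "pinned",
    repositories = ["https://repo1.maven.org/maven2"],
)

After editing, regenerate the lockfile so maven_install.json resolves every io.netty:* artifact to a single version.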

"com.github.stephenc.findbugs:findbugs-annotations",
"junit:junit",
"log4j:log4j"
"junit:junit"

⚠️ Potential issue

hbase-annotations now pulls junit.
Risk: junit on runtime classpath if scope not respected. Prefer test-scope only.

Consider excluding junit at runtime via repo rules.
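
One way to do that with rules_jvm_external is a per-artifact exclusion, sketched below. Whether the repo's maven_repository.bzl wrapper exposes this spec form is an assumption, and the hbase-annotations version shown is illustrative rather than taken from this PR:

load("@rules_jvm_external//:defs.bzl", "maven_install")
load("@rules_jvm_external//:specs.bzl", "maven")

maven_install(
    name = "maven",
    artifacts = [
        # Keep hbase-annotations but drop its junit edge so junit stays test-scoped elsewhere.
        maven.artifact(
            group = "org.apache.hbase",
            artifact = "hbase-annotations",
            version = "2.4.9",  # illustrative version, not from this PR
            exclusions = ["junit:junit"],
        ),
        # ... other artifacts ...
    ],
    repositories = ["https://repo1.maven.org/maven2"],
)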

Comment on lines +7368 to 7374
"org.apache.commons:commons-lang3"
],
"org.apache.hive:hive-llap-common": [
"org.apache.commons:commons-lang3",
"org.apache.curator:apache-curator:pom",
"org.apache.curator:curator-framework",
"org.apache.hive:hive-common",
"org.apache.hive:hive-llap-common",
"org.apache.hive:hive-serde",
"org.slf4j:slf4j-api"

💡 Verification agent

🧩 Analysis chain

Hive LLAP deps adjusted (commons-lang3, hive-serde).
OK. Verify removed Curator isn’t implicitly required elsewhere.


🏁 Script executed:

#!/bin/bash
rg -n 'curator-framework|apache-curator' || true

Length of output: 1612


Curator still pulled into the build

  • tools/build_rules/dependencies/maven_repository.bzl: line 110 includes "org.apache.curator:apache-curator:5.5.0"
  • maven_install.json contains numerous curator entries (e.g. lines 2754, 2760, 2774, 6854–6858, 7310, 7342–7343, …)

If you intend to drop Curator, remove it from maven_repository.bzl and ensure no Hive LLAP or other modules transitively require it.

🤖 Prompt for AI Agents
In maven_install.json around lines 7368 (and in
tools/build_rules/dependencies/maven_repository.bzl at ~line 110) the Apache
Curator coordinates ("org.apache.curator:apache-curator:5.5.0") are still being
pulled into the build; remove the Curator entry from maven_repository.bzl, then
regenerate maven_install.json (or run the repository sync) so the lockfile no
longer contains curator artifacts; if any transitive consumers (e.g. Hive LLAP
or other modules) still require Curator, add explicit dependency exclusions or
overrides for those artifacts in maven_repository.bzl (or the relevant
dependency declarations) or replace them with the intended alternative, run
dependency resolution again, and verify no curator groupId/artifactId entries
remain in maven_install.json.

Comment on lines +7512 to 7513
"org.apache.hadoop:hadoop-client-api",
"org.slf4j:slf4j-api"

💡 Verification agent

🧩 Analysis chain

orc-shims adds hadoop-client-api.
Check Hadoop dep alignment to project’s Hadoop version.


🏁 Script executed:

#!/bin/bash
jq -r '.artifacts["org.apache.hadoop:hadoop-client-api"].version' maven_install.json

Length of output: 92


🏁 Script executed:

#!/bin/bash
# List all Hadoop-related artifact versions to check project’s Hadoop version
jq -r '.artifacts
  | to_entries[]
  | select(.key | test("^org\\.apache\\.hadoop:hadoop-"))
  | "\(.key): \(.value.version)"' maven_install.json

Length of output: 596


Align Hadoop dependency versions

Mixed Hadoop versions detected in maven_install.json after adding orc-shims’s hadoop-client-api:

  • org.apache.hadoop:hadoop-client-api: 3.4.1
  • org.apache.hadoop:hadoop-client-runtime: 3.3.4
  • org.apache.hadoop:hadoop-common: 3.4.1
  • org.apache.hadoop:hadoop-yarn-api: 3.4.1
  • org.apache.hadoop:hadoop-yarn-common: 3.4.1
  • org.apache.hadoop:hadoop-yarn-server-*: 2.7.2

Please consolidate all Hadoop artifacts to your project’s intended version (for example, bump everything to 3.4.1) or explicitly override/exclude the transitive Hadoop deps introduced by orc-shims to prevent runtime conflicts.

🤖 Prompt for AI Agents
In maven_install.json around lines 7512-7513 there are mixed Hadoop versions
(3.4.1, 3.3.4, 2.7.2) introduced by orc-shims; consolidate all Hadoop artifacts
to the project’s intended version (e.g., 3.4.1) by either adding explicit
overrides (dependencyManagement or direct dependencies pinned to 3.4.1) for all
org.apache.hadoop artifacts, or exclude the transitive Hadoop deps from
orc-shims and add the correct-version Hadoop artifacts as direct deps; after
making the change, regenerate the lockfile (or run the workspace sync/lockfile
update command your build uses) so maven_install.json reflects the unified
versions.

Comment on lines +236 to +239
overrides = {
# Force reload4j as a replacement for vulnerable log4j 1.2.17
"log4j:log4j": "ch.qos.reload4j:reload4j:1.2.25",
},

💡 Verification agent

🧩 Analysis chain

Reload4j override: verify it’s effective (no log4j1 present).

Confirm maven_install.json has reload4j and no log4j:log4j remnants. If override works, you may not need both exclude and override (choose one strategy) to simplify.
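
For reference, the two strategies read roughly as follows in maven_repository.bzl terms. The fragments are adapted from the diffs in this PR; declaring reload4j directly in Strategy A is an assumption about how the exclude-only route would supply the replacement, and only one strategy should be strictly necessary if the override takes effect:

# Strategy A: exclude log4j 1.x everywhere and depend on reload4j directly.
excluded_artifacts = [
    "log4j:log4j",
],
artifacts = [
    "ch.qos.reload4j:reload4j:1.2.25",
    # ...
],

# Strategy B: keep the log4j coordinate but map it to reload4j via the override.
overrides = {
    "log4j:log4j": "ch.qos.reload4j:reload4j:1.2.25",
},

Either way, the slf4j-log4j12 / log4j-over-slf4j bindings flagged in the earlier nitpick are still worth checking.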


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Checking for log4j:log4j in maven_install.json (should be none):"
if rg -n 'log4j:log4j' maven_install.json; then
  echo "Found log4j:log4j — override/exclude may not be effective." >&2
  exit 1
else
  echo "OK: no log4j:log4j found."
fi

echo
echo "Checking for reload4j 1.2.25 presence:"
if rg -n 'ch\.qos\.reload4j:reload4j:1\.2\.25' maven_install.json; then
  echo "OK: reload4j present."
else
  echo "reload4j not found — ensure override/artifact is applied." >&2
  exit 1
fi

echo
echo "Sanity check for multiple SLF4J bindings (log4j2 + logback) in repo config:"
rg -n 'log4j-slf4j-impl|logback-classic' tools/build_rules/dependencies/maven_repository.bzl || true
echo "If both are present at runtime you may see multiple SLF4J bindings."

Length of output: 5423


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Checking for direct log4j v1 artifact key:"
if rg -n '"log4j:log4j":' maven_install.json; then
  echo "Found direct log4j:v1 entry → override not applied" >&2
  exit 1
else
  echo "OK: no direct log4j:v1 entry"
fi

echo
echo "Checking for reload4j override entry:"
if rg -n '"ch.qos.reload4j:reload4j:1.2.25":' maven_install.json; then
  echo "OK: reload4j override applied"
else
  echo "Missing reload4j override" >&2
  exit 1
fi

Length of output: 452


Reload4j override not applied – fix required

maven_install.json is missing "ch.qos.reload4j:reload4j:1.2.25", so log4j 1.2 isn’t being replaced.

• tools/build_rules/dependencies/maven_repository.bzl (≈ lines 236–239): confirm the override key/value syntax.
• Regenerate your lockfile (with rules_jvm_external this is typically REPIN=1 bazel run @unpinned_<repo>//:pin, e.g. @unpinned_maven//:pin) and verify "ch.qos.reload4j:reload4j:1.2.25" appears in maven_install.json.

"org.apache.avro:avro:1.11.4",
"com.linkedin.avroutil1:avro-fastserde:0.4.25",

# Parquet - Force upgrade to 1.15.1 to address CVE in 1.13.1
A collaborator commented:

maybe double check with bazel query "deps(//spark:all)" and similar?

The PR author replied:

Those queries are what I have been running (via Claude) to check this.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
aggregator/src/test/scala/ai/chronon/aggregator/test/FrequentItemsTest.scala (1)

14-14: Import is redundant (optional).

You already import the DSL via Matchers._. Keeping both is harmless; consider using just one style for consistency.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 16b23c6 and 02c1b00.

📒 Files selected for processing (1)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/FrequentItemsTest.scala (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-10-29T15:21:58.102Z
Learnt from: piyush-zlai
PR: zipline-ai/chronon#53
File: hub/app/controllers/TimeSeriesController.scala:224-224
Timestamp: 2024-10-29T15:21:58.102Z
Learning: In the mocked data implementation in `hub/app/controllers/TimeSeriesController.scala`, potential `NumberFormatException` exceptions due to parsing errors (e.g., when using `val featureId = name.split("_").last.toInt`) are acceptable and will be addressed when adding the concrete backend.

Applied to files:

  • aggregator/src/test/scala/ai/chronon/aggregator/test/FrequentItemsTest.scala
🧬 Code Graph Analysis (1)
aggregator/src/test/scala/ai/chronon/aggregator/test/FrequentItemsTest.scala (1)
aggregator/src/main/scala/ai/chronon/aggregator/base/SimpleAggregators.scala (44)
  • prepare (49-49)
  • prepare (69-69)
  • prepare (89-93)
  • prepare (137-137)
  • prepare (230-234)
  • prepare (264-268)
  • prepare (413-418)
  • prepare (499-503)
  • prepare (546-550)
  • prepare (587-587)
  • prepare (622-626)
  • prepare (660-670)
  • prepare (880-884)
  • update (51-51)
  • update (71-71)
  • update (95-100)
  • update (140-144)
  • update (194-200)
  • update (236-239)
  • update (282-284)
  • update (338-338)
  • update (341-341)
  • update (345-345)
  • update (349-349)
  • update (352-352)
  • update (356-356)
  • update (360-360)
  • update (420-423)
  • update (505-508)
  • finalize (55-55)
  • finalize (75-75)
  • finalize (107-107)
  • finalize (153-156)
  • finalize (246-249)
  • finalize (298-317)
  • finalize (437-463)
  • finalize (532-532)
  • finalize (574-574)
  • finalize (589-589)
  • finalize (636-636)
  • finalize (754-756)
  • finalize (759-761)
  • finalize (848-852)
  • finalize (895-897)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
  • GitHub Check: streaming_tests
  • GitHub Check: groupby_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: join_tests
  • GitHub Check: cloud_gcp_tests
  • GitHub Check: batch_tests
  • GitHub Check: cloud_aws_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: aggregator_tests
  • GitHub Check: spark_tests
  • GitHub Check: service_commons_tests
  • GitHub Check: service_tests
  • GitHub Check: online_tests
  • GitHub Check: flink_tests
  • GitHub Check: api_tests
  • GitHub Check: python_tests
  • GitHub Check: enforce_triggered_workflows

Comment on lines +186 to +197
for (i <- 0 until 900) {
  val k = 10
  val topFrequentItems = new FrequentItems[java.lang.Long](k)
  val frequentItemsIr = topFrequentItems.prepare(0)

  createSkewedData().foreach(i => topFrequentItems.update(frequentItemsIr, i))

  val topHistogram = topFrequentItems.finalize(frequentItemsIr)

  // previously: math.abs(topHistogram.size() - k) <= 2 shouldBe true
  (topHistogram.size() - k) should be < 3
  heavyHitterElems.foreach(elem => topHistogram.containsKey(elem.toString))
}

🛠️ Refactor suggestion

⚠️ Potential issue

Fix assertion logic and enforce heavy-hitter checks; tame flakiness/perf.

  • Only upper-bound is checked; use abs(delta) <= 2.
  • containsKey() results aren’t asserted; test may pass even when keys are missing.
  • 900 iterations with unseeded RNG can be slow/flaky.

Proposed patch:

-for (i <- 0 until 900) {
+for (i <- 0 until 100) {
   val k = 10
   val topFrequentItems = new FrequentItems[java.lang.Long](k)
   val frequentItemsIr = topFrequentItems.prepare(0)

   createSkewedData().foreach(i => topFrequentItems.update(frequentItemsIr, i))

   val topHistogram = topFrequentItems.finalize(frequentItemsIr)

-  (topHistogram.size() - k) should be < 3
-  heavyHitterElems.foreach(elem => topHistogram.containsKey(elem.toString))
+  math.abs(topHistogram.size() - k) should be <= 2
+  heavyHitterElems.foreach(elem => topHistogram.containsKey(elem.toString) shouldBe true)
 }

Optionally, make the randomness deterministic:

// Outside this block (requires import scala.util.Random):
private val rnd = new Random(42L)
// In createSkewedData, shuffle with the seeded instance:
rnd.shuffle(longTail ++ heavyHitters)
🤖 Prompt for AI Agents
In aggregator/src/test/scala/ai/chronon/aggregator/test/FrequentItemsTest.scala
around lines 186 to 197, the test only checks an upper bound on size, doesn't
assert containsKey results, and runs 900 unseeded iterations causing flakiness
and slowness; change the size assertion to assert math.abs(topHistogram.size() -
k) <= 2 (or equivalent) so both upper and lower bounds are enforced, add
explicit assertions for each heavyHitterElems element (e.g.,
heavyHitterElems.foreach(elem => topHistogram.containsKey(elem.toString)
shouldBe true) ) to fail when keys are missing, and reduce iterations (e.g.,
from 900 to a much smaller number) and make randomness deterministic by seeding
Random (create a private val rnd = new Random(42L) and use it in
createSkewedData with Random.shuffle(...)(rnd)) to tame flakiness and improve
test performance.

overrides = {},
overrides = {
# Force reload4j as a replacement for vulnerable log4j 1.2.17
"log4j:log4j": "ch.qos.reload4j:reload4j:1.2.25",
A contributor commented:

we should double check logging isn't broken on things like the fetcher svc

@david-zlai (Contributor) left a comment

canary tests pass

@nikhil-zlai merged commit 516d6d0 into main on Aug 12, 2025; 22 checks passed.
@nikhil-zlai deleted the nikhil/vulnerabilities branch on August 12, 2025 at 21:25.
@nikhil-zlai (Contributor Author) replied:

> canary tests pass

thanks a lot dude!
