
Conversation

@nikhil-zlai (Contributor) commented Aug 11, 2025

Summary

Checklist

  • Added Unit Tests
  • Covered by existing CI
  • Integration tested
  • Documentation update

Summary by CodeRabbit

  • Security

    • Removed legacy Log4j 1.x and replaced with Reload4j as a security mitigation.
  • Chores

    • Upgraded core formats and compression libs (Parquet 1.15.1, Avro 1.11.4, Snappy, Zstd, Aircompressor, Commons Pool) and added parquet format-structures and javax.annotation-api.
    • Streamlined transitive deps across gRPC, Netty/Arrow, Hive LLAP; cleaned up audience-annotations and other legacy entries.
  • Tests

    • Stabilized a frequent-items test by running repeated trials to verify consistency.

coderabbitai bot (Contributor) commented Aug 11, 2025

Walkthrough

Updated Maven/Bazel dependency metadata: multiple dependency version bumps (notably Parquet 1.15.1, Avro 1.11.4), new parquet-format-structures and javax.annotation-api, removals of log4j/audience-annotations, reload4j override, and assorted dependency graph adjustments.

Changes

  • Root metadata (maven_install.json): Updated __INPUT_ARTIFACTS_HASH and __RESOLVED_ARTIFACTS_HASH.
  • Core version bumps (maven_install.json): Upgraded com.github.luben:zstd-jni, org.xerial.snappy:snappy-java, io.airlift:aircompressor, and commons-pool:commons-pool (versions and shasums updated).
  • Avro bump (maven_install.json, tools/build_rules/dependencies/maven_repository.bzl): org.apache.avro:avro 1.11.3 → 1.11.4.
  • Parquet 1.15.1 rollout (maven_install.json, tools/build_rules/dependencies/maven_repository.bzl): Added/updated org.apache.parquet:{parquet-column,parquet-common,parquet-encoding,parquet-format-structures,parquet-hadoop,parquet-jackson} → 1.15.1; added top-level parquet-format-structures and javax.annotation:javax.annotation-api; expanded parquet-hadoop deps.
  • Logging/annotation removals and override (maven_install.json, tools/build_rules/dependencies/maven_repository.bzl): Removed log4j:log4j blocks and many org.apache.yetus:audience-annotations entries; added an excluded_artifacts entry for log4j:log4j and an override mapping to ch.qos.reload4j:reload4j:1.2.25.
  • gRPC / Netty / Arrow adjustments (maven_install.json): Reduced transitive deps for io.grpc:grpc-alts and io.grpc:grpc-netty-shaded; added netty/arrow-memory deps to the Arrow netty modules.
  • Hive LLAP and downstream changes (maven_install.json): org.apache.hive:hive-llap-client gained commons-lang3; hive-llap-common gained hive-serde and dropped curator entries; the dropwizard reporter gained hadoop-common and slf4j-api.
  • Test refactor (aggregator/src/test/scala/ai/chronon/aggregator/test/FrequentItemsTest.scala): Refactored the MostFrequentK test to run 900 independent trials, adjusted the assertion to check stability (< 3 difference), preserved the heavy-hitter checks, and added an explicit Matchers import.
  • Bazel maven repo rules (tools/build_rules/dependencies/maven_repository.bzl): Added the Parquet artifacts, bumped Avro, added excluded_artifacts for log4j, and added an overrides mapping to reload4j. (A consolidated sketch of these .bzl changes follows below.)
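
Taken together, the tools/build_rules/dependencies/maven_repository.bzl edits have roughly the shape sketched below. This is a hedged reconstruction assembled from the diff hunks quoted later in this review; the enclosing macro name and the javax.annotation-api version are assumptions, while the artifact coordinates, the excluded_artifacts entry, and the overrides mapping are taken from the PR.

# Sketch only: the enclosing wrapper macro and unrelated entries are assumed/omitted.
maven_repository(  # hypothetical wrapper defined in maven_repository.bzl
    artifacts = [
        # Avro 1.11.3 -> 1.11.4 (CVE fix)
        "org.apache.avro:avro:1.11.4",
        "com.linkedin.avroutil1:avro-fastserde:0.4.25",
        # Parquet - Force upgrade to 1.15.1 to address CVE in 1.13.1
        "org.apache.parquet:parquet-column:1.15.1",
        "org.apache.parquet:parquet-common:1.15.1",
        "org.apache.parquet:parquet-encoding:1.15.1",
        "org.apache.parquet:parquet-format-structures:1.15.1",
        "org.apache.parquet:parquet-hadoop:1.15.1",
        "org.apache.parquet:parquet-jackson:1.15.1",
        "javax.annotation:javax.annotation-api:1.3.2",  # version assumed, not stated in the PR
        # ... existing artifacts unchanged ...
    ],
    excluded_artifacts = [
        # Exclude vulnerable log4j 1.2.17 - using reload4j 1.2.25 as a secure replacement
        "log4j:log4j",
    ],
    overrides = {
        # Force reload4j as a replacement for vulnerable log4j 1.2.17
        "log4j:log4j": "ch.qos.reload4j:reload4j:1.2.25",
    },
)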

Sequence Diagram(s)

(omitted — changes are dependency metadata only; no runtime control-flow to diagram)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • kumar-zlai
  • piyush-zlai
  • tchow-zlai

Poem

Jars shuffled, hashes turned anew,
Parquet grew wings and Avro flew.
Log4j bowed, reload4j strides in,
Tests repeat where bugs grow thin.
Builds hum on — dependencies tuned. 🎉



coderabbitai bot (Contributor) left a comment

Actionable comments posted: 6

🧹 Nitpick comments (1)
tools/build_rules/dependencies/maven_repository.bzl (1)

233-235: Exclude log4j 1.x: good; also exclude SLF4J’s log4j1 binding(s).

Prevents multiple SLF4J bindings and accidental reintroduction of log4j1 via bindings.

     "org.scala-lang:scala-reflect",
     # Exclude vulnerable log4j 1.2.17 - using reload4j 1.2.25 as a secure replacement
     "log4j:log4j",
+    "org.slf4j:slf4j-log4j12",
+    "org.slf4j:log4j-over-slf4j",
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6fdbe80 and 16b23c6.

📒 Files selected for processing (2)
  • maven_install.json (22 hunks)
  • tools/build_rules/dependencies/maven_repository.bzl (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: tchow-zlai
PR: zipline-ai/chronon#393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
  • GitHub Check: cloud_gcp_tests
  • GitHub Check: api_tests
  • GitHub Check: cloud_aws_tests
  • GitHub Check: online_tests
  • GitHub Check: service_commons_tests
  • GitHub Check: service_tests
  • GitHub Check: aggregator_tests
  • GitHub Check: python_tests
  • GitHub Check: flink_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: groupby_tests
  • GitHub Check: batch_tests
  • GitHub Check: join_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: streaming_tests
  • GitHub Check: spark_tests
🔇 Additional comments (20)
tools/build_rules/dependencies/maven_repository.bzl (2)

95-95: Avro 1.11.4 bump looks good.

Ensure downstream (e.g., fastserde 0.4.25, Flink/Spark Avro) remains compatible in CI.


98-104: Manual verification: ensure all Parquet artifacts resolve to 1.15.1

Automated queries couldn’t confirm versions in maven_install.json. Please verify no parquet-* entries use a version other than 1.15.1:

• tools/build_rules/dependencies/maven_repository.bzl (lines 98–104) pin to 1.15.1
• File to check: maven_install.json
  • Suggested command (note: grep -E does not support the (?!…) lookahead; list versions with jq and filter instead, mirroring the verification script further below):
jq -r '.artifacts | to_entries[] | select(.key|startswith("org.apache.parquet:")) | "\(.key) \(.value.version)"' maven_install.json | grep -v '1\.15\.1'

maven_install.json (18)

401-405: zstd-jni → 1.5.6-6.
LGTM. Verify no classifier/os-specific regressions.


2679-2683: Avro → 1.11.4 (CVE fixes).
Good. Watch for schema resolution/API changes; run I/O tests.


3453-3456: parquet-jackson → 1.15.1.
OK. Note new shaded package structure; verify shading conflicts don’t surface.


4531-4535: snappy-java → 1.1.10.7.
LGTM. Ensure native loader works in all target platforms.


6197-6197: grpc-alts trimmed to grpc-context.
Good reduction. Verify no ALTS usage paths rely on excluded artifacts.


7528-7530: New parquet-format-structures + javax.annotation-api.
Expected with 1.15.x. Ensure javax.annotation-api version aligns with rest.


18159-18159: New shaded.fastutil.io package.
Expected with parquet shading updates. No action.


18169-18169: New org.apache.parquet.conf package.
Matches parquet-conf module emergence. OK.


18192-18193: parquet-hadoop exposes org.apache.parquet.conf.
Consistent with 1.15.x layout. OK.


18211-18211: New wrapped.io util namespace.
Harmless shading addition. OK.


18220-18220: fastutil.io shaded path added.
As expected with upstream shading. OK.


18321-18323: Jackson shaded internals updated.
Fine; confirm no accidental dependency on shaded internals.


18350-18351: Databind util/internal shaded paths updated.
OK. Nothing to change.


7533-7541: Parquet-hadoop dep graph expanded – exclusions effective & compressors aligned

  • Verified no log4j:log4j in maven_install.json
  • Compressor artifacts present at unified versions:
    • org.xerial.snappy:snappy-java 1.1.10.7
    • com.github.luben:zstd-jni 1.5.6-6
    • io.airlift:aircompressor 2.0.2

3-4: Lockfile update looks good — provenance confirmed
AUTOGENERATED banner is present, Avro bumped to 1.11.4, Parquet to 1.15.1; no legacy log4j:log4j entries in maven_install.json; override to ch.qos.reload4j:reload4j:1.2.25 verified in tools/build_rules/dependencies/maven_repository.bzl.


5011-5013: No guava/slf4j version conflicts detected
Verified maven_install.json contains a single com.google.guava:guava and a single org.slf4j:slf4j-api entry—classpath convergence is intact.


1416-1419: Ignore aircompressor version check – no upgrade detected
No occurrences of aircompressor in maven_install.json; no bump to 2.0.2 was applied. Disregard this suggestion.

Likely an incorrect or invalid review comment.


1402-1405: commons-pool → 1.6: OK.

  • Code still uses commons-pool2 API in api/src/main/scala/ai/chronon/api/SimpleObjectPool.scala (lines 3–4).
  • Dependency org.apache.commons:commons-pool2:2.12.1 remains declared.

No further action needed.

Comment on lines +3411 to 3443
"jar": "c3ce722da920bfa886ee9f98028ecc9dfe3d199625cc14809c3c2fc8bcfa29ba",
"sources": "32dbc7386e61809d9edc30a6b6a818f59077415391ab0e7f48fb875da40cd936"
},
"version": "1.13.1"
"version": "1.15.1"
},
"org.apache.parquet:parquet-common": {
"shasums": {
"jar": "4a2d7a8eb395bbc8d202fa74b272e45374a1034b351028824ad46925366297ea",
"sources": "7e9762f617f60f8a05f4431bdeaa55076672402cf7a425fd70e355072e48232a"
"jar": "7e10373a59e4cba6efdf9f2c03da5a2b27c5fb4768cf7cb25d7905dc76b40272",
"sources": "1bac6586672f1ba87ba420186972ec45989207c81463aab7372a2857975e95fd"
},
"version": "1.13.1"
"version": "1.15.1"
},
"org.apache.parquet:parquet-encoding": {
"shasums": {
"jar": "0c921d275bbcc31552ee94d383ea38b2615110790d59f11ad07ffe5f3f0c23b3",
"sources": "aa5269df8aa587dd28250ae02f65594776d8762e53e7a7ffe8672770da851425"
"jar": "b6ee50330dd91ef0c7a7a0004400dea04e17748f2d69a13930b8ba9c9e314ad8",
"sources": "d7ab3f00dba85bd4d9aab6a6aedc747bfb294f09c7404f97cd7699005aeefd01"
},
"version": "1.13.1"
"version": "1.15.1"
},
"org.apache.parquet:parquet-format-structures": {
"shasums": {
"jar": "57b3e31e9502a1121dc481a3017fc2f20d1d6116da82dd58155a69b10b28e1e6",
"sources": "d6c733e33c2191d0a94d3f781303195168d9e9ffc7f055ee9059e27ccd9385b4"
"jar": "f5ba4d969a68eb5f5fab18a63feb4df66ea021e1f728b65834c1fe016b356c73",
"sources": "2a54c4ac657a69fd127b14d03ea359f460a5fc8cec8ddce3dc04356d624af431"
},
"version": "1.13.1"
"version": "1.15.1"
},
"org.apache.parquet:parquet-hadoop": {
"shasums": {
"jar": "5b86f191d0bbfe866d194bd1a336fa79d049d342fb7a05aefcf31ea266792a40",
"sources": "829f2d8e0543a18207a208a25d23beaed51dd2e84bfcc330b4585e1e49adf517"
"jar": "5c764cb6712776fa3f3b15431c3913f23610557da439ec43598094a217aa63d9",
"sources": "591dd98f1119380c112cfdf8164e73ee44e6265486f33cb356c5cdffb527f69f"
},
"version": "1.13.1"
"version": "1.15.1"
},

💡 Verification agent

🧩 Analysis chain

Parquet core libs → 1.15.1.
Solid CVE-driven bump. Validate read/write compatibility on existing datasets.


🏁 Script executed:

#!/bin/bash
jq -r '.artifacts | to_entries | map(select(.key|startswith("org.apache.parquet:")))[] | "\(.key) \(.value.version)"' maven_install.json

Length of output: 445


Parquet hadoop-bundle version mismatch
The parquet-hadoop-bundle remains at 1.8.1 while all other Parquet modules are at 1.15.1. Please bump it to 1.15.1 and validate read/write compatibility on existing datasets.

  • File: maven_install.json
  • Lines: 3411–3443
- org.apache.parquet:parquet-hadoop-bundle 1.8.1
+ org.apache.parquet:parquet-hadoop-bundle 1.15.1
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"jar": "c3ce722da920bfa886ee9f98028ecc9dfe3d199625cc14809c3c2fc8bcfa29ba",
"sources": "32dbc7386e61809d9edc30a6b6a818f59077415391ab0e7f48fb875da40cd936"
},
"version": "1.13.1"
"version": "1.15.1"
},
"org.apache.parquet:parquet-common": {
"shasums": {
"jar": "4a2d7a8eb395bbc8d202fa74b272e45374a1034b351028824ad46925366297ea",
"sources": "7e9762f617f60f8a05f4431bdeaa55076672402cf7a425fd70e355072e48232a"
"jar": "7e10373a59e4cba6efdf9f2c03da5a2b27c5fb4768cf7cb25d7905dc76b40272",
"sources": "1bac6586672f1ba87ba420186972ec45989207c81463aab7372a2857975e95fd"
},
"version": "1.13.1"
"version": "1.15.1"
},
"org.apache.parquet:parquet-encoding": {
"shasums": {
"jar": "0c921d275bbcc31552ee94d383ea38b2615110790d59f11ad07ffe5f3f0c23b3",
"sources": "aa5269df8aa587dd28250ae02f65594776d8762e53e7a7ffe8672770da851425"
"jar": "b6ee50330dd91ef0c7a7a0004400dea04e17748f2d69a13930b8ba9c9e314ad8",
"sources": "d7ab3f00dba85bd4d9aab6a6aedc747bfb294f09c7404f97cd7699005aeefd01"
},
"version": "1.13.1"
"version": "1.15.1"
},
"org.apache.parquet:parquet-format-structures": {
"shasums": {
"jar": "57b3e31e9502a1121dc481a3017fc2f20d1d6116da82dd58155a69b10b28e1e6",
"sources": "d6c733e33c2191d0a94d3f781303195168d9e9ffc7f055ee9059e27ccd9385b4"
"jar": "f5ba4d969a68eb5f5fab18a63feb4df66ea021e1f728b65834c1fe016b356c73",
"sources": "2a54c4ac657a69fd127b14d03ea359f460a5fc8cec8ddce3dc04356d624af431"
},
"version": "1.13.1"
"version": "1.15.1"
},
"org.apache.parquet:parquet-hadoop": {
"shasums": {
"jar": "5b86f191d0bbfe866d194bd1a336fa79d049d342fb7a05aefcf31ea266792a40",
"sources": "829f2d8e0543a18207a208a25d23beaed51dd2e84bfcc330b4585e1e49adf517"
"jar": "5c764cb6712776fa3f3b15431c3913f23610557da439ec43598094a217aa63d9",
"sources": "591dd98f1119380c112cfdf8164e73ee44e6265486f33cb356c5cdffb527f69f"
},
"version": "1.13.1"
"version": "1.15.1"
},
- org.apache.parquet:parquet-hadoop-bundle 1.8.1
+ org.apache.parquet:parquet-hadoop-bundle 1.15.1
🤖 Prompt for AI Agents
In maven_install.json around lines 3411 to 3443 the parquet-hadoop-bundle
dependency is pinned to 1.8.1 while other Parquet modules use 1.15.1; update the
parquet-hadoop-bundle version to 1.15.1 to match the rest of the Parquet modules
and then run your build and test matrix to validate read/write compatibility
against existing datasets (run unit/integration tests and a sample read/write of
representative parquet files to confirm no runtime/serialization regressions).

Comment on lines +6797 to 6805
"io.netty:netty-common",
"org.apache.arrow:arrow-memory-core",
"org.apache.arrow:arrow-memory-netty-buffer-patch"
],
"org.apache.arrow:arrow-memory-netty-buffer-patch": [
"io.netty:netty-buffer",
"io.netty:netty-common",
"org.apache.arrow:arrow-memory-core",
"org.slf4j:slf4j-api"

💡 Verification agent

🧩 Analysis chain

Arrow netty deps added.
Ensure a single Netty version across repo to avoid CVEs/class conflicts.


🏁 Script executed:

#!/bin/bash
jq -r '.artifacts | to_entries | map(select(.key|startswith("io.netty:")))[] | "\(.key) \(.value.version)"' maven_install.json | sort -u

Length of output: 1583


Unify Netty versions in maven_install.json

maven_install.json currently pulls io.netty artifacts at both 4.1.111.Final and 4.1.115.Final, which can lead to CVEs and class-loading conflicts. Please align all io.netty:* entries to a single version (e.g. 4.1.115.Final).

Modules at 4.1.115.Final:

  • netty-buffer, netty-codec, netty-codec-http, netty-codec-http2
  • netty-common, netty-handler, netty-resolver
  • netty-transport, netty-transport-classes-epoll
  • netty-transport-native-unix-common

Modules at 4.1.111.Final (need upgrade):

  • netty-all, codec-dns, codec-haproxy, codec-memcache, codec-mqtt
  • codec-redis, codec-smtp, codec-socks, codec-stomp, codec-xml
  • handler-proxy, handler-ssl-ocsp, resolver-dns*, transport-classes-kqueue
  • transport-native-epoll, transport-native-kqueue, transport-rxtx, transport-sctp, transport-udt
🤖 Prompt for AI Agents
In maven_install.json around lines 6797 to 6805, multiple io.netty artifacts are
pinned to 4.1.111.Final while others use 4.1.115.Final, which can cause CVEs and
class-loading conflicts; update all io.netty:* entries so they uniformly use
4.1.115.Final (replace versions for netty-all, codec-dns, codec-haproxy,
codec-memcache, codec-mqtt, codec-redis, codec-smtp, codec-socks, codec-stomp,
codec-xml, handler-proxy, handler-ssl-ocsp, resolver-dns*,
transport-classes-kqueue, transport-native-epoll, transport-native-kqueue,
transport-rxtx, transport-sctp, transport-udt) remove any duplicate/conflicting
entries, and regenerate/verify dependency lock file so the build resolves to the
single 4.1.115.Final version.
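
To make that concrete, below is a minimal sketch of one way to force the alignment with rules_jvm_external, assuming the repo's maven_repository.bzl wrapper ultimately forwards these arguments to maven_install (the wrapper's real signature is not shown in this PR, and the repository name/list here are illustrative):

load("@rules_jvm_external//:defs.bzl", "maven_install")

maven_install(
    name = "maven",
    artifacts = [
        # Pin the lagging modules explicitly at the target version ...
        "io.netty:netty-all:4.1.115.Final",
        "io.netty:netty-codec-dns:4.1.115.Final",
        "io.netty:netty-handler-proxy:4.1.115.Final",
        "io.netty:netty-transport-native-epoll:4.1.115.Final",
        # ... and so on for the remaining 4.1.111.Final modules listed above ...
    ],
    # "pinned" makes top-level declared versions win over transitively resolved ones.
    version_conflict_policy = "pinned",
    repositories = ["https://repo1.maven.org/maven2"],
)

After editing, regenerate the lockfile so maven_install.json resolves every io.netty:* artifact to a single version.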

"com.github.stephenc.findbugs:findbugs-annotations",
"junit:junit",
"log4j:log4j"
"junit:junit"

⚠️ Potential issue

hbase-annotations now pulls junit.
Risk: junit on runtime classpath if scope not respected. Prefer test-scope only.

Consider excluding junit at runtime via repo rules.
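
One way to do that with rules_jvm_external is a per-artifact exclusion, sketched below. Whether the repo's maven_repository.bzl wrapper exposes this spec form is an assumption, and the hbase-annotations version shown is illustrative rather than taken from this PR:

load("@rules_jvm_external//:defs.bzl", "maven_install")
load("@rules_jvm_external//:specs.bzl", "maven")

maven_install(
    name = "maven",
    artifacts = [
        # Keep hbase-annotations but drop its junit edge so junit stays test-scoped elsewhere.
        maven.artifact(
            group = "org.apache.hbase",
            artifact = "hbase-annotations",
            version = "2.4.9",  # illustrative version, not from this PR
            exclusions = ["junit:junit"],
        ),
        # ... other artifacts ...
    ],
    repositories = ["https://repo1.maven.org/maven2"],
)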

Comment on lines +7368 to 7374
"org.apache.commons:commons-lang3"
],
"org.apache.hive:hive-llap-common": [
"org.apache.commons:commons-lang3",
"org.apache.curator:apache-curator:pom",
"org.apache.curator:curator-framework",
"org.apache.hive:hive-common",
"org.apache.hive:hive-llap-common",
"org.apache.hive:hive-serde",
"org.slf4j:slf4j-api"

💡 Verification agent

🧩 Analysis chain

Hive LLAP deps adjusted (commons-lang3, hive-serde).
OK. Verify removed Curator isn’t implicitly required elsewhere.


🏁 Script executed:

#!/bin/bash
rg -n 'curator-framework|apache-curator' || true

Length of output: 1612


Curator still pulled into the build

  • tools/build_rules/dependencies/maven_repository.bzl: line 110 includes "org.apache.curator:apache-curator:5.5.0"
  • maven_install.json contains numerous curator entries (e.g. lines 2754, 2760, 2774, 6854–6858, 7310, 7342–7343, …)

If you intend to drop Curator, remove it from maven_repository.bzl and ensure no Hive LLAP or other modules transitively require it.

🤖 Prompt for AI Agents
In maven_install.json around lines 7368 (and in
tools/build_rules/dependencies/maven_repository.bzl at ~line 110) the Apache
Curator coordinates ("org.apache.curator:apache-curator:5.5.0") are still being
pulled into the build; remove the Curator entry from maven_repository.bzl, then
regenerate maven_install.json (or run the repository sync) so the lockfile no
longer contains curator artifacts; if any transitive consumers (e.g. Hive LLAP
or other modules) still require Curator, add explicit dependency exclusions or
overrides for those artifacts in maven_repository.bzl (or the relevant
dependency declarations) or replace them with the intended alternative, run
dependency resolution again, and verify no curator groupId/artifactId entries
remain in maven_install.json.

Comment on lines +7512 to 7513
"org.apache.hadoop:hadoop-client-api",
"org.slf4j:slf4j-api"

💡 Verification agent

🧩 Analysis chain

orc-shims adds hadoop-client-api.
Check Hadoop dep alignment to project’s Hadoop version.


🏁 Script executed:

#!/bin/bash
jq -r '.artifacts["org.apache.hadoop:hadoop-client-api"].version' maven_install.json

Length of output: 92


🏁 Script executed:

#!/bin/bash
# List all Hadoop-related artifact versions to check project’s Hadoop version
jq -r '.artifacts
  | to_entries[]
  | select(.key | test("^org\\.apache\\.hadoop:hadoop-"))
  | "\(.key): \(.value.version)"' maven_install.json

Length of output: 596


Align Hadoop dependency versions

Mixed Hadoop versions detected in maven_install.json after adding orc-shims’s hadoop-client-api:

  • org.apache.hadoop:hadoop-client-api: 3.4.1
  • org.apache.hadoop:hadoop-client-runtime: 3.3.4
  • org.apache.hadoop:hadoop-common: 3.4.1
  • org.apache.hadoop:hadoop-yarn-api: 3.4.1
  • org.apache.hadoop:hadoop-yarn-common: 3.4.1
  • org.apache.hadoop:hadoop-yarn-server-*: 2.7.2

Please consolidate all Hadoop artifacts to your project’s intended version (for example, bump everything to 3.4.1) or explicitly override/exclude the transitive Hadoop deps introduced by orc-shims to prevent runtime conflicts.

🤖 Prompt for AI Agents
In maven_install.json around lines 7512-7513 there are mixed Hadoop versions
(3.4.1, 3.3.4, 2.7.2) introduced by orc-shims; consolidate all Hadoop artifacts
to the project’s intended version (e.g., 3.4.1) by either adding explicit
overrides (dependencyManagement or direct dependencies pinned to 3.4.1) for all
org.apache.hadoop artifacts, or exclude the transitive Hadoop deps from
orc-shims and add the correct-version Hadoop artifacts as direct deps; after
making the change, regenerate the lockfile (or run the workspace sync/lockfile
update command your build uses) so maven_install.json reflects the unified
versions.

Comment on lines +236 to +239
overrides = {
# Force reload4j as a replacement for vulnerable log4j 1.2.17
"log4j:log4j": "ch.qos.reload4j:reload4j:1.2.25",
},

💡 Verification agent

🧩 Analysis chain

Reload4j override: verify it’s effective (no log4j1 present).

Confirm maven_install.json has reload4j and no log4j:log4j remnants. If override works, you may not need both exclude and override (choose one strategy) to simplify.
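
For reference, the two strategies read roughly as follows in maven_repository.bzl terms. The fragments are adapted from the diffs in this PR; declaring reload4j directly in Strategy A is an assumption about how the exclude-only route would supply the replacement, and only one strategy should be strictly necessary if the override takes effect:

# Strategy A: exclude log4j 1.x everywhere and depend on reload4j directly.
excluded_artifacts = [
    "log4j:log4j",
],
artifacts = [
    "ch.qos.reload4j:reload4j:1.2.25",
    # ...
],

# Strategy B: keep the log4j coordinate but map it to reload4j via the override.
overrides = {
    "log4j:log4j": "ch.qos.reload4j:reload4j:1.2.25",
},

Either way, the slf4j-log4j12 / log4j-over-slf4j bindings flagged in the earlier nitpick are still worth checking.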


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Checking for log4j:log4j in maven_install.json (should be none):"
if rg -n 'log4j:log4j' maven_install.json; then
  echo "Found log4j:log4j — override/exclude may not be effective." >&2
  exit 1
else
  echo "OK: no log4j:log4j found."
fi

echo
echo "Checking for reload4j 1.2.25 presence:"
if rg -n 'ch\.qos\.reload4j:reload4j:1\.2\.25' maven_install.json; then
  echo "OK: reload4j present."
else
  echo "reload4j not found — ensure override/artifact is applied." >&2
  exit 1
fi

echo
echo "Sanity check for multiple SLF4J bindings (log4j2 + logback) in repo config:"
rg -n 'log4j-slf4j-impl|logback-classic' tools/build_rules/dependencies/maven_repository.bzl || true
echo "If both are present at runtime you may see multiple SLF4J bindings."

Length of output: 5423


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Checking for direct log4j v1 artifact key:"
if rg -n '"log4j:log4j":' maven_install.json; then
  echo "Found direct log4j:v1 entry → override not applied" >&2
  exit 1
else
  echo "OK: no direct log4j:v1 entry"
fi

echo
echo "Checking for reload4j override entry:"
if rg -n '"ch.qos.reload4j:reload4j:1.2.25":' maven_install.json; then
  echo "OK: reload4j override applied"
else
  echo "Missing reload4j override" >&2
  exit 1
fi

Length of output: 452


Reload4j override not applied – fix required

maven_install.json is missing "ch.qos.reload4j:reload4j:1.2.25", so log4j 1.2 isn’t being replaced.

• tools/build_rules/dependencies/maven_repository.bzl (≈ lines 236–239): confirm the override key/value syntax.
• Regenerate your lockfile (with rules_jvm_external this is typically REPIN=1 bazel run @unpinned_<repo>//:pin, e.g. @unpinned_maven//:pin) and verify "ch.qos.reload4j:reload4j:1.2.25" appears in maven_install.json.

"org.apache.avro:avro:1.11.4",
"com.linkedin.avroutil1:avro-fastserde:0.4.25",

# Parquet - Force upgrade to 1.15.1 to address CVE in 1.13.1
A collaborator commented:

maybe double check with bazel query "deps(//spark:all)" and similar?

The PR author replied:

Those queries are what I have been running (via Claude) to check this.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
aggregator/src/test/scala/ai/chronon/aggregator/test/FrequentItemsTest.scala (1)

14-14: Import is redundant (optional).

You already import the DSL via Matchers._. Keeping both is harmless; consider using just one style for consistency.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 16b23c6 and 02c1b00.

📒 Files selected for processing (1)
  • aggregator/src/test/scala/ai/chronon/aggregator/test/FrequentItemsTest.scala (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-10-29T15:21:58.102Z
Learnt from: piyush-zlai
PR: zipline-ai/chronon#53
File: hub/app/controllers/TimeSeriesController.scala:224-224
Timestamp: 2024-10-29T15:21:58.102Z
Learning: In the mocked data implementation in `hub/app/controllers/TimeSeriesController.scala`, potential `NumberFormatException` exceptions due to parsing errors (e.g., when using `val featureId = name.split("_").last.toInt`) are acceptable and will be addressed when adding the concrete backend.

Applied to files:

  • aggregator/src/test/scala/ai/chronon/aggregator/test/FrequentItemsTest.scala
🧬 Code Graph Analysis (1)
aggregator/src/test/scala/ai/chronon/aggregator/test/FrequentItemsTest.scala (1)
aggregator/src/main/scala/ai/chronon/aggregator/base/SimpleAggregators.scala (44)
  • prepare (49-49)
  • prepare (69-69)
  • prepare (89-93)
  • prepare (137-137)
  • prepare (230-234)
  • prepare (264-268)
  • prepare (413-418)
  • prepare (499-503)
  • prepare (546-550)
  • prepare (587-587)
  • prepare (622-626)
  • prepare (660-670)
  • prepare (880-884)
  • update (51-51)
  • update (71-71)
  • update (95-100)
  • update (140-144)
  • update (194-200)
  • update (236-239)
  • update (282-284)
  • update (338-338)
  • update (341-341)
  • update (345-345)
  • update (349-349)
  • update (352-352)
  • update (356-356)
  • update (360-360)
  • update (420-423)
  • update (505-508)
  • finalize (55-55)
  • finalize (75-75)
  • finalize (107-107)
  • finalize (153-156)
  • finalize (246-249)
  • finalize (298-317)
  • finalize (437-463)
  • finalize (532-532)
  • finalize (574-574)
  • finalize (589-589)
  • finalize (636-636)
  • finalize (754-756)
  • finalize (759-761)
  • finalize (848-852)
  • finalize (895-897)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
  • GitHub Check: streaming_tests
  • GitHub Check: groupby_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: join_tests
  • GitHub Check: cloud_gcp_tests
  • GitHub Check: batch_tests
  • GitHub Check: cloud_aws_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: aggregator_tests
  • GitHub Check: spark_tests
  • GitHub Check: service_commons_tests
  • GitHub Check: service_tests
  • GitHub Check: online_tests
  • GitHub Check: flink_tests
  • GitHub Check: api_tests
  • GitHub Check: python_tests
  • GitHub Check: enforce_triggered_workflows

Comment on lines +186 to +197
for (i <- 0 until 900) {
  val k = 10
  val topFrequentItems = new FrequentItems[java.lang.Long](k)
  val frequentItemsIr = topFrequentItems.prepare(0)

  createSkewedData().foreach(i => topFrequentItems.update(frequentItemsIr, i))

  val topHistogram = topFrequentItems.finalize(frequentItemsIr)

  // previously: math.abs(topHistogram.size() - k) <= 2 shouldBe true
  (topHistogram.size() - k) should be < 3
  heavyHitterElems.foreach(elem => topHistogram.containsKey(elem.toString))
}

🛠️ Refactor suggestion

⚠️ Potential issue

Fix assertion logic and enforce heavy-hitter checks; tame flakiness/perf.

  • Only upper-bound is checked; use abs(delta) <= 2.
  • containsKey() results aren’t asserted; test may pass even when keys are missing.
  • 900 iterations with unseeded RNG can be slow/flaky.

Proposed patch:

-for (i <- 0 until 900) {
+for (i <- 0 until 100) {
   val k = 10
   val topFrequentItems = new FrequentItems[java.lang.Long](k)
   val frequentItemsIr = topFrequentItems.prepare(0)

   createSkewedData().foreach(i => topFrequentItems.update(frequentItemsIr, i))

   val topHistogram = topFrequentItems.finalize(frequentItemsIr)

-  (topHistogram.size() - k) should be < 3
-  heavyHitterElems.foreach(elem => topHistogram.containsKey(elem.toString))
+  math.abs(topHistogram.size() - k) should be <= 2
+  heavyHitterElems.foreach(elem => topHistogram.containsKey(elem.toString) shouldBe true)
 }

Optionally, make the randomness deterministic:

// Outside this block (requires import scala.util.Random):
private val rnd = new Random(42L)
// In createSkewedData, shuffle with the seeded instance:
rnd.shuffle(longTail ++ heavyHitters)
🤖 Prompt for AI Agents
In aggregator/src/test/scala/ai/chronon/aggregator/test/FrequentItemsTest.scala
around lines 186 to 197, the test only checks an upper bound on size, doesn't
assert containsKey results, and runs 900 unseeded iterations causing flakiness
and slowness; change the size assertion to assert math.abs(topHistogram.size() -
k) <= 2 (or equivalent) so both upper and lower bounds are enforced, add
explicit assertions for each heavyHitterElems element (e.g.,
heavyHitterElems.foreach(elem => topHistogram.containsKey(elem.toString)
shouldBe true) ) to fail when keys are missing, and reduce iterations (e.g.,
from 900 to a much smaller number) and make randomness deterministic by seeding
Random (create a private val rnd = new Random(42L) and use it in
createSkewedData with Random.shuffle(...)(rnd)) to tame flakiness and improve
test performance.

overrides = {},
overrides = {
# Force reload4j as a replacement for vulnerable log4j 1.2.17
"log4j:log4j": "ch.qos.reload4j:reload4j:1.2.25",
A contributor commented:

we should double check logging isn't broken on things like the fetcher svc

@david-zlai (Contributor) left a comment

canary tests pass

@nikhil-zlai merged commit 516d6d0 into main on Aug 12, 2025; 22 checks passed.
@nikhil-zlai deleted the nikhil/vulnerabilities branch on August 12, 2025 at 21:25.
@nikhil-zlai (Contributor Author) replied:

> canary tests pass

thanks a lot dude!
