@david-zlai david-zlai commented Feb 19, 2025

Summary

^^^

Tested on the etsy laptop.

Checklist

  • Added Unit Tests
  • Covered by existing CI
  • Integration tested
  • Documentation update

Summary by CodeRabbit

  • Bug Fixes
    • Improved error handling to explicitly report when configuration values are missing.
  • New Features
    • Introduced standardized constants for various configuration types, ensuring consistent key naming.
  • Refactor
    • Unified metadata processing by using direct metadata names instead of file paths.
    • Enhanced type safety in configuration options for clearer and more reliable behavior.
  • Tests
    • Updated test cases and parameters to reflect the improved metadata and configuration handling.

coderabbitai bot commented Feb 19, 2025

Walkthrough

This pull request refines several modules across online, spark, and API components. It enhances error handling in the KVStore, improves configuration handling by introducing stricter type checks and a new helper case class, and standardizes key name generation by shifting from file path to direct metadata name usage. Additionally, constants and helper methods have been added to support uniformity, and tests have been updated accordingly.

Changes

Files and change summary:

  • online/src/main/scala/ai/chronon/online/Api.scala
    • Enhanced error handling in KVStore.getString to return a Failure if no values are found.
  • online/src/main/scala/ai/chronon/online/{FetcherMain.scala,MetadataDirWalker.scala,MetadataEndPoint.scala,MetadataStore.scala}
    • Improved configuration handling: updated confType usage, refactored metadata extraction with tuple returns, introduced ConfPathOrName, and switched from file path to metadata name in parameter usage.
  • spark/src/main/scala/ai/chronon/spark/{Driver.scala,api/Extensions.scala,api/Constants.scala}
    • Updated configuration option descriptions, added new keyword constants, and introduced key name generation methods via extensions.
  • online/src/main/scala/ai/chronon/online/{Fetcher.scala,stats/DriftStore.scala}
    • Changed join name retrieval from nameToFilePath to direct metadata name for clarity.
  • spark/src/main/scala/ai/chronon/spark/{scripts/ObservabilityDemo.scala,stats/drift/Summarizer.scala,test/{SchemaEvolutionTest.scala,analyzer/DerivationTest.scala,bootstrap/LogBootstrapTest.scala,fetcher/{ChainingFetcherTest.scala,FetcherTest.scala},stats/drift/DriftTest.scala}}
    • Updated test and script request constructions to use metadata name instead of file path, ensuring consistency across the codebase.

Possibly related PRs

Suggested reviewers

  • piyush-zlai

Poem

In code we craft a clearer tale,
With checks that never let us fail.
Metadata sings its proper name,
Through tuples, keywords, and constants—no blame.
CodeRabbit’s work shines ever bright,
A joyful fix in lines so light!

Warning

Review ran into problems

🔥 Problems

GitHub Actions and Pipeline Checks: Resource not accessible by integration - https://docs.github.com/rest/actions/workflow-runs#list-workflow-runs-for-a-repository.

Please grant the required permissions to the CodeRabbit GitHub App under the organization or repository settings.


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 59b36a3 and ae91478.

📒 Files selected for processing (1)
  • spark/src/main/scala/ai/chronon/spark/LogFlattenerJob.scala (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • spark/src/main/scala/ai/chronon/spark/LogFlattenerJob.scala
⏰ Context from checks skipped due to timeout of 90000ms (16)
  • GitHub Check: streaming_tests
  • GitHub Check: streaming_tests
  • GitHub Check: join_tests
  • GitHub Check: groupby_tests
  • GitHub Check: groupby_tests
  • GitHub Check: join_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: spark_tests
  • GitHub Check: non_spark_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: non_spark_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: spark_tests
  • GitHub Check: enforce_triggered_workflows

 new MetadataEndPoint[Conf](
-  extractFn = (path, conf) => (path.confPathToKey, ThriftJsonCodec.toJsonStr(conf)),
+  // extractFn = (path, conf) => (path.confPathToKey, ThriftJsonCodec.toJsonStr(conf)),
+  extractFn = (metadataName, conf) => (metadataName, ThriftJsonCodec.toJsonStr(conf)),
Contributor Author

Changing this to metadataName because metadataName is sourced from the actual conf:

{
  "metaData": {
    "name": "quickstart.training_set.v1",
....

Previously, path.confPathToKey derived the key from the conf path. It's better to be consistent and use the name in the actual conf (not the conf path).
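
The difference between the two key sources can be sketched as follows. This is a minimal, self-contained illustration rather than the Chronon code: `Conf`, `keyFromPath`, and `keyFromName` are hypothetical stand-ins, and `keyFromPath` only assumes that a path-based key keeps the last three path segments.

```scala
// Hypothetical stand-in for a compiled conf; only the metadata name matters here.
final case class Conf(metaDataName: String)

object KeySources {
  // Assumption: a path-based key keeps the last three path segments,
  // so it changes whenever the file is read from a different location.
  def keyFromPath(path: String): String =
    path.split("/").takeRight(3).mkString("/")

  // A name-based key is read from the conf itself and ignores the file location.
  def keyFromName(conf: Conf): String = conf.metaDataName
}
```

With the name-based variant, the same conf yields the same key no matter which directory it was loaded from.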

 val startNs = System.nanoTime
 val requests = Seq(Fetcher.Request(featureName, keyMap, args.atMillis.toOption))
-val resultFuture = if (args.confType() == "join") {
+val resultFuture = if (args.confType() == "joins") {
Contributor Author

This change shows up in many places.

confType is sourced from run.py, where it is set from the directories that Chronon expects, such as joins and group_bys.

It is also used in this ROUTES map: https://github.com/zipline-ai/chronon/blob/main/api/py/ai/chronon/repo/run.py#L94-L121

I'm making this plural here to be consistent.

Contributor

Should we make this ConfType an object, like Constants, with Joins, GroupBys, etc.?
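
One way such an object could look is sketched below. This is only an illustration of the suggestion, not code from the PR: the `ConfType` hierarchy, `dirName`, and `fromString` are hypothetical names.

```scala
// Hypothetical sketch: each conf type carries the plural directory name used on the CLI.
sealed abstract class ConfType(val dirName: String)

object ConfType {
  case object Joins extends ConfType("joins")
  case object GroupBys extends ConfType("group_bys")

  val all: Seq[ConfType] = Seq(Joins, GroupBys)

  // Parses a CLI string; returns None for unknown values such as the singular "join".
  def fromString(s: String): Option[ConfType] = all.find(_.dirName == s)
}
```

Comparing against `ConfType.Joins` instead of a string literal would make a singular/plural mismatch a parse-time miss rather than a silently false branch.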

@david-zlai david-zlai force-pushed the davidhan/debug_fetch branch 4 times, most recently from 4c4d6a1 to 614cb4b Compare February 19, 2025 03:03
Comment on lines 86 to 88
case 0 => {
  Failure(new RuntimeException(s"No values returned from KVStore. " +
    s"Request for key=${key} in dataset=${dataset} failed"))
}
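
The idea behind this branch, returning an explicit Failure instead of failing later on an empty result, can be sketched in isolation. The helper below is hypothetical (`KvLookup.firstValue` is not part of the KVStore API) and matches on `headOption` rather than on the size of the response.

```scala
import scala.util.{Failure, Success, Try}

object KvLookup {
  // Hypothetical helper: wraps the values returned for one key in a Try,
  // reporting an explicit error when the store returned nothing.
  def firstValue(values: Seq[String], key: String, dataset: String): Try[String] =
    values.headOption match {
      case Some(v) => Success(v)
      case None =>
        Failure(new RuntimeException(
          s"No values returned from KVStore. Request for key=$key in dataset=$dataset failed"))
    }
}
```

Callers can then pattern match on the Try and surface the key and dataset in the error message.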

@david-zlai david-zlai changed the title Davidhan/debug fetch Fixes to make fetch Join work Feb 19, 2025
@david-zlai david-zlai changed the title Fixes to make fetch Join work Fixes to make fetch Join work in CLI Feb 19, 2025
@david-zlai david-zlai marked this pull request as ready for review February 19, 2025 03:05
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
online/src/main/scala/ai/chronon/online/FetcherMain.scala (1)

43-43: ⚠️ Potential issue

Fix inconsistent default value.

The default value "join" doesn't match the plural form used in choices ("joins", "group_bys").

-    choice(Seq("joins", "group_bys"), required = false, descr = "the type of conf to fetch", default = Some("join"))
+    choice(Seq("joins", "group_bys"), required = false, descr = "the type of conf to fetch", default = Some("joins"))
🧹 Nitpick comments (3)
online/src/main/scala/ai/chronon/online/MetadataDirWalker.scala (1)

127-128: Remove commented out code.

The old implementation is tracked in version control and doesn't need to be preserved in comments.

-//                  .extractFn(filePath, conf.asInstanceOf[api.Model])
-                  .extractFn(metadataName.get, conf.asInstanceOf[api.Model])
+                  .extractFn(metadataName.get, conf.asInstanceOf[api.Model])
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala (2)

129-130: Enhance log message structure.

Add more context to the log message for better traceability.

-      logger.info("Querying for dataset: " + request.dataset)
+      logger.info(s"BigTable query: dataset=${request.dataset}")

152-152: Good use of StandardCharsets.UTF_8.

The explicit charset is good practice. Consider structuring the log message better.

-          logger.info(s"Querying key: " + new String(baseRowKey, StandardCharsets.UTF_8))
+          logger.info(s"BigTable query: rowKey=${new String(baseRowKey, StandardCharsets.UTF_8)}")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 1bf782e and 96e985b.

📒 Files selected for processing (7)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala (3 hunks)
  • online/src/main/scala/ai/chronon/online/Api.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/FetcherMain.scala (3 hunks)
  • online/src/main/scala/ai/chronon/online/MetadataDirWalker.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/MetadataEndPoint.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/MetadataStore.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/Driver.scala (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • spark/src/main/scala/ai/chronon/spark/Driver.scala
⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: join_tests
  • GitHub Check: non_spark_tests
  • GitHub Check: streaming_tests
  • GitHub Check: groupby_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: spark_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (14)
online/src/main/scala/ai/chronon/online/MetadataEndPoint.scala (1)

54-54: LGTM! Good parameter renaming.

Parameter name better reflects the data source.

online/src/main/scala/ai/chronon/online/MetadataStore.scala (1)

181-183: LGTM!

Enhanced error handling to catch and log all potential errors.

online/src/main/scala/ai/chronon/online/MetadataDirWalker.scala (5)

81-101: LGTM! Improved metadata name handling.

The tuple destructuring and metadata name extraction from configuration enhances consistency.


103-103: LGTM! Better null safety.

The additional check for metadataName.isDefined improves robustness.


81-101: Well-structured tuple return value.

The change to return both config and metadata name improves error handling consistency.


103-103: Good defensive programming.

Checking both optConf and metadataName prevents potential null pointer exceptions.


112-128: Consistent use of metadata name.

The change from filePath to metadataName.get aligns with the new error handling structure.

online/src/main/scala/ai/chronon/online/FetcherMain.scala (3)

42-43: LGTM! Better type safety with choice.

Restricting confType to valid values prevents runtime errors.


189-189: LGTM! Better parameter naming.

Using named parameter improves code clarity.


189-203: Improved code clarity.

Named parameter and consistent plural form enhance readability.

online/src/main/scala/ai/chronon/online/Api.scala (2)

85-91: LGTM! Better error handling for empty results.

The explicit check for empty values with clear error message improves debugging.


85-91: Better error handling for empty results.

Explicit handling of empty values with informative error messages improves debugging.

cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigTableKVStoreImpl.scala (2)

40-40: LGTM! Added required import.

StandardCharsets import supports the new logging functionality.


129-130: LGTM! Enhanced logging.

Added logging improves visibility of dataset queries and key lookups.

Also applies to: 152-152

lazy val getJoinConf: TTLCache[String, Try[JoinOps]] = new TTLCache[String, Try[JoinOps]](
{ name =>
val startTimeMs = System.currentTimeMillis()
val result = getConf[Join](s"joins/$name")
Contributor Author

I made this change only because the existing code is not compatible with how run.py and the Dataproc submitter currently work.

To explain: when we do a metadata upload for a join, we:

  • First take the join conf file, e.g. production/joins/quickstart/training_set.v1, and upload it to GCS at gs://zipline-warehouse-canary/metadata/production/joins/quickstart/training_set.v1
  • Next, add that as a fileUri to Dataproc so that Dataproc copies the GCS file over to the Spark working directory
    image

Dataproc copies the file, but only the file training_set.v1 itself, without the production/joins/quickstart/ prefix. From what I can see, Dataproc does not preserve any part of the original path, so the training_set.v1 file ends up directly in the working directory.
(Chronon would traditionally have expected production/joins/quickstart/training_set.v1 and then called confPathToKey, returning joins/quickstart/training_set.v1.)

  • Later in the code, upload the join conf to BigTable's CHRONON_METADATA, where the key is set based on the path of the conf. Since the path of the conf from Dataproc is just training_set.v1, it looks like this in BigTable:
    image

This breaks when we ultimately try to fetch the join, because this line of code hardcodes s"joins/$name".
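
The mismatch described above can be reproduced with a small in-memory model. Everything here is a hypothetical stand-in: `keyFromPath` assumes the path-based key keeps the last three path segments, `fileNameOnly` imitates the file copy that drops the directory prefix, and a `Map` plays the role of the metadata dataset.

```scala
object StrippedPathDemo {
  // Assumption: a path-based key keeps the last three path segments.
  def keyFromPath(path: String): String =
    path.split("/").takeRight(3).mkString("/")

  // Imitates the file copy described above: only the file name survives.
  def fileNameOnly(path: String): String = path.split("/").last

  val originalPath = "production/joins/quickstart/training_set.v1"

  // Key written at upload time, computed from the stripped path.
  val uploadedKey: String = keyFromPath(fileNameOnly(originalPath))

  // In-memory stand-in for the metadata dataset.
  val store: Map[String, String] = Map(uploadedKey -> "<join conf json>")

  // Fetch-side lookup that hardcodes the "joins/" prefix.
  def lookup(name: String): Option[String] = store.get(s"joins/$name")
}
```

Because the uploaded key has lost its `joins/` prefix, any lookup that prepends it misses, whatever the join name is.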

@nikhil-zlai nikhil-zlai left a comment

I think we should match the Python import path everywhere, even in the compiled files:

production/joins.team.file.var

In the CLI fetch we no longer need the type; just get joins.team.file.var.

Metadata upload should write to the key joins.team.file.var.

wdyt?

.filter(Filters.FILTERS.family().exactMatch(ColumnFamilyString))
.filter(Filters.FILTERS.qualifier().exactMatch(ColumnFamilyQualifierString))

logger.info("Querying for dataset: " + request.dataset)
Contributor

Let's not log in the hot path; this will kill the disk on the box.

Contributor Author

Oh crap, I forgot this is the hot path. Removing.

case _ =>
// for non-timeseries data, we just look up based on the base row key
val baseRowKey = buildRowKey(request.keyBytes, request.dataset)
logger.info(s"Querying key: " + new String(baseRowKey, StandardCharsets.UTF_8))
Contributor

same

Contributor Author

removing

@piyush-zlai piyush-zlai left a comment

Couple of minor comments

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
online/src/main/scala/ai/chronon/online/ConfKeywordConstants.scala (1)

4-7: Add scaladoc for each constant.

Document the purpose and valid usage of each configuration key.

 case object ConfKeywordConstants {
+  /** Configuration key for join definitions */
   val JoinKeyword = "joins"
+  /** Configuration key for group by definitions */
   val GroupByKeyword = "group_bys"
+  /** Configuration key for staging query definitions */
   val StagingQueryKeyword = "staging_queries"
+  /** Configuration key for model definitions */
   val ModelKeyword = "models"
 }
online/src/main/scala/ai/chronon/online/Extensions.scala (1)

57-79: Extract common key name generation logic.

Identical pattern repeated across classes.

+  private def generateKeyName(keyword: String, name: String): String =
+    s"$keyword/" + name

   implicit class JoinOps(join: Join) {
-    def keyNameForKvStore: String = {
-      s"$JoinKeyword/" + join.metaData.name
-    }
+    def keyNameForKvStore: String = generateKeyName(JoinKeyword, join.metaData.name)
   }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 83606ed and 17dde77.

📒 Files selected for processing (7)
  • online/src/main/scala/ai/chronon/online/Api.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/ConfKeywordConstants.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/Extensions.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/FetcherMain.scala (3 hunks)
  • online/src/main/scala/ai/chronon/online/MetadataDirWalker.scala (3 hunks)
  • online/src/main/scala/ai/chronon/online/MetadataStore.scala (6 hunks)
  • spark/src/main/scala/ai/chronon/spark/Driver.scala (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • online/src/main/scala/ai/chronon/online/MetadataDirWalker.scala
  • online/src/main/scala/ai/chronon/online/FetcherMain.scala
  • spark/src/main/scala/ai/chronon/spark/Driver.scala
  • online/src/main/scala/ai/chronon/online/Api.scala
⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: streaming_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: spark_tests
  • GitHub Check: join_tests
  • GitHub Check: groupby_tests
  • GitHub Check: non_spark_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (7)
online/src/main/scala/ai/chronon/online/ConfKeywordConstants.scala (1)

3-8: LGTM! Good use of case object for type-safe constants.

online/src/main/scala/ai/chronon/online/MetadataStore.scala (5)

27-27: LGTM! Well-structured configuration handling.

The new case class provides type-safe configuration handling with proper validation.

Also applies to: 45-59


82-105: LGTM! Improved type safety.

Pattern matching for configuration types enhances type safety and maintainability.


152-152: LGTM! Fixed join configuration key handling.

The change correctly handles join configuration keys in the dataproc environment.


208-210: LGTM! Enhanced error handling.

Added broader error logging while maintaining existing behavior.


173-176: LGTM! Consistent key generation.

Key generation now aligns with the new configuration handling approach.

online/src/main/scala/ai/chronon/online/Extensions.scala (1)

20-22: LGTM: Imports are well-organized.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
api/src/main/scala/ai/chronon/api/Extensions.scala (1)

446-448: Consider extracting common key name generation logic.

All keyNameForKvStore implementations follow the same pattern. Consider extracting to a common trait or utility function.

+trait KeyNameGenerator {
+  def metaData: MetaData
+  def keywordType: String
+  def keyNameForKvStore: String = s"$keywordType/" + metaData.name
+}

Also applies to: 834-836, 1230-1234, 1236-1240

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 17dde77 and a2a7c5e.

📒 Files selected for processing (7)
  • api/src/main/scala/ai/chronon/api/ConfKeywordConstants.scala (1 hunks)
  • api/src/main/scala/ai/chronon/api/Extensions.scala (4 hunks)
  • online/src/main/scala/ai/chronon/online/Extensions.scala (2 hunks)
  • online/src/main/scala/ai/chronon/online/FetcherMain.scala (4 hunks)
  • online/src/main/scala/ai/chronon/online/MetadataDirWalker.scala (3 hunks)
  • online/src/main/scala/ai/chronon/online/MetadataStore.scala (6 hunks)
  • spark/src/main/scala/ai/chronon/spark/Driver.scala (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • online/src/main/scala/ai/chronon/online/Extensions.scala
  • online/src/main/scala/ai/chronon/online/MetadataStore.scala
  • online/src/main/scala/ai/chronon/online/MetadataDirWalker.scala
  • online/src/main/scala/ai/chronon/online/FetcherMain.scala
  • spark/src/main/scala/ai/chronon/spark/Driver.scala
🔇 Additional comments (2)
api/src/main/scala/ai/chronon/api/ConfKeywordConstants.scala (1)

1-8: LGTM!

Well-structured constants using case object pattern.

api/src/main/scala/ai/chronon/api/Extensions.scala (1)

19-22: LGTM!

Clean import organization for the new constants.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
api/src/main/scala/ai/chronon/api/Extensions.scala (1)

448-450: Consider consolidating duplicate code using a trait.

The keyNameForKvStore method is implemented identically across multiple classes. This could be refactored into a trait to reduce code duplication.

+trait KeyNameForKvStore {
+  def metaData: MetaData
+  def keywordType: String
+  def keyNameForKvStore: String = Extensions._keyNameForKvStore(metaData, keywordType)
+}

-implicit class GroupByOps(groupBy: GroupBy) extends GroupBy(groupBy) {
+implicit class GroupByOps(groupBy: GroupBy) extends GroupBy(groupBy) with KeyNameForKvStore {
+  override def keywordType: String = GroupByKeyword
-  def keyNameForKvStore: String = {
-    _keyNameForKvStore(groupBy.metaData, GroupByKeyword)
-  }
}

-implicit class JoinOps(val join: Join) extends Serializable {
+implicit class JoinOps(val join: Join) extends Serializable with KeyNameForKvStore {
+  override def keywordType: String = JoinKeyword
-  def keyNameForKvStore: String = {
-    _keyNameForKvStore(join.metaData, JoinKeyword)
-  }
}

-implicit class StagingQueryOps(stagingQuery: StagingQuery) {
+implicit class StagingQueryOps(stagingQuery: StagingQuery) extends KeyNameForKvStore {
+  override def keywordType: String = StagingQueryKeyword
-  def keyNameForKvStore: String = {
-    _keyNameForKvStore(stagingQuery.metaData, StagingQueryKeyword)
-  }
}

-implicit class ModelOps(model: Model) {
+implicit class ModelOps(model: Model) extends KeyNameForKvStore {
+  override def keywordType: String = ModelKeyword
-  def keyNameForKvStore: String = {
-    _keyNameForKvStore(model.metaData, ModelKeyword)
-  }
}

Also applies to: 836-838, 1232-1236, 1238-1242

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between a7398e3 and 7699ae2.

📒 Files selected for processing (14)
  • api/src/main/scala/ai/chronon/api/Extensions.scala (5 hunks)
  • online/src/main/scala/ai/chronon/online/Fetcher.scala (1 hunks)
  • online/src/main/scala/ai/chronon/online/MetadataStore.scala (6 hunks)
  • online/src/main/scala/ai/chronon/online/stats/DriftStore.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/LogFlattenerJob.scala (3 hunks)
  • spark/src/main/scala/ai/chronon/spark/scripts/ObservabilityDemo.scala (2 hunks)
  • spark/src/main/scala/ai/chronon/spark/stats/ConsistencyJob.scala (1 hunks)
  • spark/src/main/scala/ai/chronon/spark/stats/drift/Summarizer.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/SchemaEvolutionTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/analyzer/DerivationTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/bootstrap/LogBootstrapTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/ChainingFetcherTest.scala (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTest.scala (3 hunks)
  • spark/src/test/scala/ai/chronon/spark/test/stats/drift/DriftTest.scala (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (13)
  • spark/src/test/scala/ai/chronon/spark/test/SchemaEvolutionTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/bootstrap/LogBootstrapTest.scala
  • spark/src/test/scala/ai/chronon/spark/test/stats/drift/DriftTest.scala
  • spark/src/main/scala/ai/chronon/spark/LogFlattenerJob.scala
  • online/src/main/scala/ai/chronon/online/stats/DriftStore.scala
  • spark/src/main/scala/ai/chronon/spark/stats/ConsistencyJob.scala
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/ChainingFetcherTest.scala
  • spark/src/main/scala/ai/chronon/spark/scripts/ObservabilityDemo.scala
  • spark/src/test/scala/ai/chronon/spark/test/analyzer/DerivationTest.scala
  • online/src/main/scala/ai/chronon/online/Fetcher.scala
  • spark/src/main/scala/ai/chronon/spark/stats/drift/Summarizer.scala
  • online/src/main/scala/ai/chronon/online/MetadataStore.scala
  • spark/src/test/scala/ai/chronon/spark/test/fetcher/FetcherTest.scala
⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: streaming_tests
  • GitHub Check: analyzer_tests
  • GitHub Check: spark_tests
  • GitHub Check: join_tests
  • GitHub Check: fetcher_tests
  • GitHub Check: groupby_tests
  • GitHub Check: non_spark_tests
  • GitHub Check: scala_compile_fmt_fix
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (2)
api/src/main/scala/ai/chronon/api/Extensions.scala (2)

19-22: LGTM: Imports are well-organized.

The imports are correctly grouped and follow a logical order.


46-48: LGTM: Well-structured helper method.

The private helper method provides a consistent way to generate key names across different components.

@david-zlai david-zlai changed the title Fixes to make fetch Join work in CLI Fixes to make fetch Join work in CLI including use name over nameToFilePath and replacing / to . in MetaData names Feb 21, 2025
val metrics: Metrics.Context = Metrics.Context(Metrics.Environment.JoinLogFlatten, joinConf)

private def getUnfilledRanges(inputTable: String, outputTable: String): Seq[PartitionRange] = {
val partitionName: String = joinConf.metaData.nameToFilePath.replace("/", "%2F")
Contributor Author

This one line killed me. I'm not sure why we did this, but I think it's because later on, when we run SHOW PARTITIONS from Hive and collect the partition values, values stored with / in them get encoded into %2F. So I guess we wanted to be consistent and added this hack.

I'm deleting this line and instead making sure we just use metaData.name, which should not contain any /.

image
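
The effect of the encoding can be illustrated with standard percent-encoding. This is only a sketch: `java.net.URLEncoder` is used as a stand-in and is not claimed to match Hive's partition escaping exactly, and both sample names are hypothetical.

```scala
import java.net.URLEncoder

object PartitionNameDemo {
  // Percent-encodes a value; "/" becomes "%2F", while "." and "_" are left alone.
  def encode(value: String): String = URLEncoder.encode(value, "UTF-8")

  val pathStyle = "quickstart/training_set.v1" // hypothetical path-style name
  val dotStyle = "quickstart.training_set.v1"  // hypothetical dot-style name
}
```

A name without slashes round-trips unchanged, so no manual replace("/", "%2F") is needed to keep stored and queried partition values in sync.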

@david-zlai david-zlai merged commit 1125593 into main Feb 21, 2025
20 checks passed
@david-zlai david-zlai deleted the davidhan/debug_fetch branch February 21, 2025 22:38
@ken-zlai ken-zlai mentioned this pull request Feb 25, 2025
4 tasks
ken-zlai added a commit that referenced this pull request Feb 25, 2025
## Summary
#398 updated the module path from `"/"` to `"."`, but not all code was
migrated to the new convention, causing frontend API calls to fail when
retrieving joins.

@david-zlai – Can you review the code to ensure it fully aligns with the
new convention?
@sean-zlai – Can you tear down all Docker images and rebuild on this
branch to confirm observability works as expected?

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Refactor**
- Streamlined how configuration names are handled in observability
views. Names are now displayed as originally provided without extra
formatting, ensuring a consistent and straightforward presentation. The
fallback label remains “Unknown” when a name is not available.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
kumar-zlai pushed a commit that referenced this pull request Feb 26, 2025
…FilePath and replacing `/` to `.` in MetaData names (#398)

## Summary

^^^

Tested on the etsy laptop.

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update



## Summary by CodeRabbit

- **Bug Fixes**
  - Improved error handling to explicitly report when configuration values
    are missing.
- **New Features**
  - Introduced standardized constants for various configuration types,
    ensuring consistent key naming.
- **Refactor**
  - Unified metadata processing by using direct metadata names instead of
    file paths.
  - Enhanced type safety in configuration options for clearer and more
    reliable behavior.
- **Tests**
  - Updated test cases and parameters to reflect the improved metadata and
    configuration handling.
kumar-zlai pushed a commit that referenced this pull request Feb 26, 2025
kumar-zlai pushed a commit that referenced this pull request Apr 25, 2025
…FilePath and replacing `/` to `.` in MetaData names (#398)

kumar-zlai pushed a commit that referenced this pull request Apr 25, 2025
kumar-zlai pushed a commit that referenced this pull request Apr 29, 2025
…FilePath and replacing `/` to `.` in MetaData names (#398)

kumar-zlai pushed a commit that referenced this pull request Apr 29, 2025
chewy-zlai pushed a commit that referenced this pull request May 15, 2025
…FilePath and replacing `/` to `.` in MetaData names (#398)

chewy-zlai pushed a commit that referenced this pull request May 15, 2025
chewy-zlai pushed a commit that referenced this pull request May 15, 2025
…FilePath and replacing `/` to `.` in MetaData names (#398)

chewy-zlai pushed a commit that referenced this pull request May 15, 2025
chewy-zlai pushed a commit that referenced this pull request May 16, 2025
…FilePath and replacing `/` to `.` in MetaData names (#398)

chewy-zlai pushed a commit that referenced this pull request May 16, 2025
4 participants