observability script for demo #87
Conversation
Walkthrough
The pull request introduces several changes across multiple files, primarily focused on enhancing logging configurations, improving error handling, and restructuring package declarations. Key modifications include adding a logging level configuration for Spark sessions, introducing timing mechanisms in scripts, and updating method signatures to require explicit parameters. Additionally, several classes and methods have undergone package restructuring for better organization. The changes do not alter the core functionality but aim to improve clarity, robustness, and testing capabilities.
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (20)
spark/src/test/scala/ai/chronon/spark/test/SchemaEvolutionUtils.scala (1)
Line range hint 25-36: Good use of dependency injection pattern
The addition of the mockApi parameter makes the dependencies explicit and improves testability. The implementation properly utilizes the MockApi instance for accessing required properties.

spark/src/test/scala/ai/chronon/spark/test/LocalDataLoaderTest.scala (2)
Line range hint 49-63: Consider enhancing test coverage with negative scenarios.
While the happy path is well tested, consider adding test cases for:
- Invalid CSV file format
- Missing required columns
- Empty files
- Files with special characters in names
Would you like me to provide example test cases for these scenarios?
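For illustration only, a minimal sketch of one such negative-scenario test is below. The loader entry point (LocalDataLoader.loadDataFileAsTable), its import path, and the fixture path are assumptions for the sketch and should be adjusted to the actual test utilities:

import java.io.File

import ai.chronon.spark.LocalDataLoader // import path assumed
import ai.chronon.spark.SparkSessionBuilder
import org.junit.Test

class LocalDataLoaderNegativeTest {
  // local SparkSession, mirroring the pattern used by the existing tests
  private val spark = SparkSessionBuilder.build("LocalDataLoaderNegativeTest", local = true)

  @Test
  def malformedCsvShouldFailToLoad(): Unit = {
    // Hypothetical fixture and loader call; swap in the real LocalDataLoader API.
    val malformed = new File("src/test/resources/local_data_csv/malformed.csv")
    val attempt = scala.util.Try {
      LocalDataLoader.loadDataFileAsTable(malformed, spark, "test_db.malformed_table")
    }
    assert(attempt.isFailure, "Loading a malformed CSV file should surface an error")
  }
}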
Line range hint 65-78: Consider adding validation for column data types.
The test verifies column names and row count but doesn't validate the data types of the columns. Consider adding assertions to verify that the loaded data maintains the expected schema.
Example addition:

val expectedSchema = spark.sql(s"SELECT * FROM $nameSpaceAndTable").schema
assert(expectedSchema("id_listing_view_event").dataType == IntegerType)
assert(expectedSchema("ds").dataType == DateType)

docker-init/start.sh (4)
3-10: Consider using the SECONDS bash variable for timing
While the current implementation works correctly, using the built-in SECONDS variable would be more idiomatic in bash:

-start_time=$(date +%s)
+SECONDS=0
 if ! python3.8 generate_anomalous_data.py; then
     echo "Error: Failed to generate anomalous data" >&2
     exit 1
 else
-    end_time=$(date +%s)
-    elapsed_time=$((end_time - start_time))
-    echo "Anomalous data generated successfully! Took $elapsed_time seconds."
+    echo "Anomalous data generated successfully! Took $SECONDS seconds."
 fi
32-35: Add timing measurement for consistency
For consistency with other operations, consider adding timing measurement to the metadata loading step.

 echo "Loading metadata.."
+SECONDS=0
 if ! java -Dlog4j.configurationFile=log4j.properties -cp $SPARK_JAR:$CLASSPATH ai.chronon.spark.Driver metadata-upload --conf-path=/chronon_sample/production/ --online-jar=$CLOUD_AWS_JAR --online-class=$ONLINE_CLASS; then
     echo "Error: Failed to load metadata into DynamoDB" >&2
     exit 1
 fi
-echo "Metadata load completed successfully!"
+echo "Metadata load completed successfully! Took $SECONDS seconds."
Line range hint 40-46: Add timing measurement for consistency
For consistency with other operations, consider adding timing measurement to the DynamoDB initialization step.

 echo "Initializing DynamoDB Table .."
+SECONDS=0
 if ! output=$(java -Dlog4j.configurationFile=log4j.properties -cp $SPARK_JAR:$CLASSPATH ai.chronon.spark.Driver create-summary-dataset \
     --online-jar=$CLOUD_AWS_JAR \
     --online-class=$ONLINE_CLASS 2>&1); then
     echo "Error: Failed to bring up DynamoDB table" >&2
     echo "Java command output: $output" >&2
     exit 1
 fi
-echo "DynamoDB Table created successfully!"
+echo "DynamoDB Table created successfully! Took $SECONDS seconds."
Line range hint 50-66: Refactor timing implementation and Java options
Several improvements can be made to this section:

- Use SECONDS for consistency with the earlier suggestion
- Java module options are duplicated with the JAVA_OPTS export at the end
- The long command line could be more readable

-start_time=$(date +%s)
+SECONDS=0
 if ! java -Dlog4j.configurationFile=log4j.properties \
-    --add-opens=java.base/sun.nio.ch=ALL-UNNAMED \
-    --add-opens=java.base/sun.security.action=ALL-UNNAMED \
     -cp $SPARK_JAR:$CLASSPATH ai.chronon.spark.Driver summarize-and-upload \
     --online-jar=$CLOUD_AWS_JAR \
     --online-class=$ONLINE_CLASS \
     --parquet-path="$(pwd)/drift_data" \
     --conf-path=/chronon_sample/production/ \
     --time-column=transaction_time; then
     echo "Error: Failed to load summary data into DynamoDB" >&2
     exit 1
 else
-    end_time=$(date +%s)
-    elapsed_time=$((end_time - start_time))
-    echo "Summary load completed successfully! Took $elapsed_time seconds."
+    echo "Summary load completed successfully! Took $SECONDS seconds."
 fi

Consider moving the Java options to a variable at the start of the script:

# At the start of the script
JAVA_MODULE_OPTIONS="--add-opens=java.base/java.lang=ALL-UNNAMED \
    --add-opens=java.base/java.util=ALL-UNNAMED \
    --add-opens=java.base/sun.nio.ch=ALL-UNNAMED \
    --add-opens=java.base/sun.security.action=ALL-UNNAMED"

spark/src/test/scala/ai/chronon/spark/test/JavaFetcherTest.java (1)
45-47: Well-structured test infrastructure enhancement!
The introduction of MockApi and InMemoryKvStore improves test isolation and maintainability. The initialization chain is logically structured:
- TableUtils with SparkSession
- InMemoryKvStore with TableUtils
- MockApi with KvStore
- JavaFetcher from MockApi
Consider documenting the test infrastructure setup pattern in the test package's README.md to help other contributors follow this pattern consistently.
spark/src/test/scala/ai/chronon/spark/test/StreamingTest.scala (1)
Line range hint 39-43: Consider improving the configuration and resource management.
Several improvements could enhance the robustness and maintainability of this method:
- Extract "StreamingTest" as a constant to avoid duplication
- Consider making the local mode configurable for different test environments
- Consider caching the TableUtils instance instead of creating a new one in each lambda call
Here's a suggested improvement:
 object StreamingTest {
+  private val TestName = "StreamingTest"
+  private val LocalMode = true
+  @volatile private var tableUtilsInstance: TableUtils = _
+
   def buildInMemoryKvStore(): InMemoryKvStore = {
-    InMemoryKvStore.build("StreamingTest",
-                          { () => TableUtils(SparkSessionBuilder.build("StreamingTest", local = true)) })
+    InMemoryKvStore.build(TestName, { () =>
+      if (tableUtilsInstance == null) {
+        synchronized {
+          if (tableUtilsInstance == null) {
+            tableUtilsInstance = TableUtils(SparkSessionBuilder.build(TestName, local = LocalMode))
+          }
+        }
+      }
+      tableUtilsInstance
+    })
   }
 }

spark/src/main/scala/ai/chronon/spark/SparkSessionBuilder.scala (1)
48-50: LGTM! Good optimization for resource usage.
The conditional Hive support is well implemented. This change allows for more efficient Spark sessions when Hive support isn't needed, potentially reducing memory footprint and initialization time.
Consider documenting the performance benefits in the project's performance tuning guide, as this parameter can be useful for optimizing resource usage in scenarios where Hive support is unnecessary.
spark/src/test/scala/ai/chronon/spark/test/ExternalSourcesTest.scala (1)
Line range hint 33-186: Consider improving test maintainability.
While the test is comprehensive and well-structured, consider these improvements:
- Extract the timeout duration to a constant
- Break down the long test method into smaller, focused test methods
- Use named constants for test data ranges instead of magic numbers
Example refactor for the timeout:
+  private val FetchTimeout = Duration(10, SECONDS)
   val responsesF = fetcher.fetchJoin(requests)
-  val responses = Await.result(responsesF, Duration(10, SECONDS))
+  val responses = Await.result(responsesF, FetchTimeout)

online/src/main/scala/ai/chronon/online/stats/DriftStore.scala (1)
Line range hint 119-123: Improve error handling and add instrumentation.
The current error handling approach has a few issues:
Using printStackTrace is not suitable for production as it:
- Writes directly to stderr
- Doesn't integrate with the logging system
- Makes it difficult to track and monitor errors
The TODO comment indicates missing instrumentation for failures
Consider replacing with proper logging and metrics:
-      // TODO instrument failures
-      case Failure(exception) => exception.printStackTrace(); None
+      case Failure(exception) =>
+        logger.error("Failed to process drift summary response", exception)
+        metrics.incrementCounter("drift.summary.failures")
+        None

spark/src/test/scala/ai/chronon/spark/test/OnlineUtils.scala (2)
Line range hint 201-203: Technical debt: Investigate and document test harness quirk
The TODO comment indicates uncertainty about the dropDsOnWrite parameter's purpose. This technical debt should be addressed by:
- Investigating why this parameter is necessary
- Documenting the findings
- Refactoring the test harness to remove the quirk if possible
Would you like me to help create a GitHub issue to track this investigation?
Line range hint 146-149: Improve async operation handling
The current implementation uses a fixed Thread.sleep(5000) to handle async operations, which is fragile and could lead to flaky tests. Consider:
- Implementing a proper async wait mechanism with timeout
- Adding error handling for async operations
- Using a more reliable approach like polling or callbacks
Example improvement:
def awaitAsyncCompletion(maxWaitMs: Long = 5000, pollIntervalMs: Long = 100): Unit = {
  val endTime = System.currentTimeMillis() + maxWaitMs
  while (System.currentTimeMillis() < endTime && !isAsyncWorkComplete) {
    Thread.sleep(pollIntervalMs)
  }
  if (!isAsyncWorkComplete) {
    throw new TimeoutException(s"Async operations did not complete within ${maxWaitMs}ms")
  }
}

spark/src/main/scala/ai/chronon/spark/stats/drift/scripts/PrepareData.scala (6)
163-167: Add documentation for the fraud patterns being generated.
Consider adding scaladoc comments explaining (a sketch follows this list):
- The types of fraud patterns this generates
- The rationale behind the chosen window sizes
- Expected usage scenarios
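For example, a scaladoc sketch along these lines could capture the intent; the wording and the specific patterns mentioned are placeholders, not taken from the implementation:

/**
  * Generates synthetic fraud-detection join output with injected anomalies.
  *
  * Intended for exercising drift detection in the demo: for example, gradual shifts in
  * transaction-amount distributions and sudden spikes in categorical null rates.
  * Window sizes and noise levels are chosen for visual clarity in the demo rather than
  * realism; see the individual parameters for tuning.
  */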
163-167: Add parameter validation for numerical inputs.
Consider adding validation for:

- baseValue should be non-negative
- amplitude should be non-negative
- noiseLevel should be non-negative

 def timeToValue(t: LocalTime, baseValue: Double, amplitude: Double, noiseLevel: Double, scale: Double = 1.0): java.lang.Double = {
+  require(baseValue >= 0, "baseValue must be non-negative")
+  require(amplitude >= 0, "amplitude must be non-negative")
+  require(noiseLevel >= 0, "noiseLevel must be non-negative")
   if (scale == 0) null
   else {
Line range hint 345-354: Improve progress reporting frequency.
The current implementation logs every 100,000 rows, which might be too infrequent for smaller datasets. Consider:
- Making the logging frequency configurable
- Using a percentage-based approach
-    if (i % 100000 == 0) {
+    if (i % Math.max(numSamples / 10, 1000) == 0) {
-      println(s"Generated $i/$numSamples rows of data.")
+      println(s"Generated $i/$numSamples rows of data (${(i.toDouble/numSamples*100).toInt}%)")
     }
379-389: Consider batch processing for better memory management.
The current implementation builds the entire dataset in memory. For large datasets, consider (see the sketch after this list):
- Processing in batches
- Using Spark's RDD API more effectively
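A rough sketch of the batching idea is below; the row schema, helper name, and output table are placeholders rather than PrepareData's actual definitions:

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Illustrative only: generate and persist rows in fixed-size batches instead of
// materializing the full dataset in driver memory.
def writeInBatches(spark: SparkSession, numSamples: Int, outputTable: String): Unit = {
  val schema = StructType(Seq(StructField("id", LongType, nullable = false))) // placeholder schema
  val batchSize = 100000
  (0 until numSamples).grouped(batchSize).foreach { batch =>
    val rows = batch.map(i => Row(i.toLong)) // placeholder row generation
    val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
    df.write.mode("append").saveAsTable(outputTable)
  }
}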
Line range hint 391-433: Externalize configuration values.
Consider moving hardcoded values to configuration:
- Country lists
- Language codes
- Random value ranges
This would make the data generation more flexible and maintainable.
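One possible shape for that configuration, with illustrative field names and defaults that are not taken from the script:

// Hypothetical configuration container for the data generator.
case class DataGenConfig(
    countries: Seq[String] = Seq("US", "CA", "GB", "DE"),
    languageCodes: Seq[String] = Seq("en-US", "en-GB", "fr-FR"),
    amountRange: (Double, Double) = (1.0, 5000.0)
)

// The generator would then accept the config instead of hardcoding the values, e.g.
// generateFraudSampleData(numSamples, startDs, endDs, table, cfg = DataGenConfig()).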
477-477: Enhance path handling robustness.
Consider adding (a sketch follows this list):
- Path normalization
- Directory creation if needed
- File existence checks
- Platform-specific path separator handling
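A small sketch of what that could look like using java.nio.file; the parameter name is an assumption:

import java.nio.file.{Files, Path, Paths}

// Illustrative: normalize the configured path, create it if missing, and fail fast
// if it exists but is not a directory. Platform-specific separators are handled by NIO.
def resolveOutputDir(parquetPath: String): Path = {
  val dir = Paths.get(parquetPath).toAbsolutePath.normalize()
  if (!Files.exists(dir)) {
    Files.createDirectories(dir)
  }
  require(Files.isDirectory(dir), s"$dir exists but is not a directory")
  dir
}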
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (23)
- docker-init/generate_anomalous_data.py (1 hunks)
- docker-init/start.sh (4 hunks)
- online/src/main/scala/ai/chronon/online/stats/DriftStore.scala (1 hunks)
- spark/src/main/scala/ai/chronon/spark/Driver.scala (1 hunks)
- spark/src/main/scala/ai/chronon/spark/SparkSessionBuilder.scala (2 hunks)
- spark/src/main/scala/ai/chronon/spark/stats/drift/scripts/PrepareData.scala (15 hunks)
- spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala (2 hunks)
- spark/src/main/scala/ai/chronon/spark/utils/InMemoryStream.scala (1 hunks)
- spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/ChainingFetcherTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/ExternalSourcesTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/FetcherTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/GroupByUploadTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/JavaFetcherTest.java (2 hunks)
- spark/src/test/scala/ai/chronon/spark/test/LocalDataLoaderTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/LocalTableExporterTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/OnlineUtils.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/SchemaEvolutionTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/SchemaEvolutionUtils.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/StreamingTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/bootstrap/DerivationTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/bootstrap/LogBootstrapTest.scala (1 hunks)
- spark/src/test/scala/ai/chronon/spark/test/stats/drift/DriftTest.scala (2 hunks)
✅ Files skipped from review due to trivial changes (8)
- docker-init/generate_anomalous_data.py
- spark/src/main/scala/ai/chronon/spark/utils/InMemoryKvStore.scala
- spark/src/main/scala/ai/chronon/spark/utils/InMemoryStream.scala
- spark/src/main/scala/ai/chronon/spark/utils/MockApi.scala
- spark/src/test/scala/ai/chronon/spark/test/ChainingFetcherTest.scala
- spark/src/test/scala/ai/chronon/spark/test/SchemaEvolutionTest.scala
- spark/src/test/scala/ai/chronon/spark/test/bootstrap/DerivationTest.scala
- spark/src/test/scala/ai/chronon/spark/test/bootstrap/LogBootstrapTest.scala
🔇 Additional comments (21)
spark/src/test/scala/ai/chronon/spark/test/GroupByUploadTest.scala (1)
28-28: Import Statement Added Appropriately
The addition of import ai.chronon.spark.utils.MockApi is appropriate and necessary for utilizing MockApi within the GroupByUploadTest class. This import allows for effective simulation of API interactions in your tests, enhancing test coverage and reliability.
spark/src/test/scala/ai/chronon/spark/test/SchemaEvolutionUtils.scala (2)
22-22: LGTM: Import statement is correctly structured
The import statement for MockApi follows proper Scala conventions and aligns with the package restructuring.
Line range hint 25-36: Verify test coverage and caller updates
Since this is a test utility method with a signature change, we should verify that all callers have been updated and tests are passing.
✅ Verification successful
Let me analyze the search results. I can see that runLogSchemaGroupBy is called in four test files:
- LogBootstrapTest.scala
- DerivationTest.scala
- SchemaEvolutionTest.scala
- FetcherTest.scala
Let's verify if all these calls are using the correct number of arguments by checking their implementations:
All callers have been updated with the correct parameters
All four callers of runLogSchemaGroupBy are using the method with three parameters (mockApi, date string, backfill date string), which matches the method signature in SchemaEvolutionUtils. The test coverage appears complete as all usages have been properly updated.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for all usages of runLogSchemaGroupBy to ensure they've been updated
# with the new parameter
echo "Searching for method calls..."
rg -t scala "runLogSchemaGroupBy\(" --no-heading
# Look for potential test files that might use this utility
echo -e "\nSearching for related test files..."
fd -e scala -t f "Test.scala$" spark/src/test
Length of output: 3446
Script:
#!/bin/bash
# Check the implementation of each caller
echo "LogBootstrapTest.scala implementation:"
rg -A 2 -B 2 "runLogSchemaGroupBy" spark/src/test/scala/ai/chronon/spark/test/bootstrap/LogBootstrapTest.scala
echo -e "\nDerivationTest.scala implementation:"
rg -A 2 -B 2 "runLogSchemaGroupBy" spark/src/test/scala/ai/chronon/spark/test/bootstrap/DerivationTest.scala
echo -e "\nSchemaEvolutionTest.scala implementation:"
rg -A 2 -B 2 "runLogSchemaGroupBy" spark/src/test/scala/ai/chronon/spark/test/SchemaEvolutionTest.scala
echo -e "\nFetcherTest.scala implementation:"
rg -A 2 -B 2 "runLogSchemaGroupBy" spark/src/test/scala/ai/chronon/spark/test/FetcherTest.scala
Length of output: 2044
spark/src/test/scala/ai/chronon/spark/test/LocalDataLoaderTest.scala (1)
38-38: LGTM! Verify consistent parameter naming across the codebase.
The explicit parameter naming improves code readability. Let's verify this naming convention is consistently applied across other SparkSessionBuilder usages.
✅ Verification successful
Parameter naming is consistently applied across the codebase
The search results show that all usages of SparkSessionBuilder.build across the codebase consistently use explicit parameter naming, including:
- local = true
- localWarehouseLocation = Some(...)
- enforceKryoSerializer = ...
- additionalConfig = Some(...)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for other SparkSessionBuilder.build calls to ensure consistent parameter naming
# Search for SparkSessionBuilder.build calls
rg -t scala "SparkSessionBuilder\.build" -A 3
Length of output: 20476
spark/src/test/scala/ai/chronon/spark/test/JavaFetcherTest.java (2)
25-26: LGTM! Clean import additions.
The new imports for test utilities are properly organized and necessary for the enhanced testing infrastructure.
44-44: Verify SparkSessionBuilder usage across test files.
The SparkSessionBuilder.build() method signature has been updated with an additional boolean parameter. Let's ensure all test files are updated consistently.
✅ Verification successful
Let me analyze the SparkSessionBuilder.build usage across the codebase. I notice there are different method signatures being used:
- Two parameters: build(namespace, local = true)
- Three parameters: build(namespace, local = true, localWarehouseLocation = Some(path))
- Six parameters (in JavaFetcherTest): build(namespace, true, true, scala.Option.apply(null), scala.Option.apply(null), true)
Let's verify the actual method signatures in SparkSessionBuilder to understand if this is a valid overload:
Let's try a different approach to find the SparkSessionBuilder definition:
Let's see the complete method signature to understand all parameters:
The SparkSessionBuilder.build usage in JavaFetcherTest is correct
The method signature in SparkSessionBuilder matches exactly with how it's being used in JavaFetcherTest:
def build(name: String,
          local: Boolean = false,
          hiveSupport: Boolean = true,
          localWarehouseLocation: Option[String] = None,
          additionalConfig: Option[Map[String, String]] = None,
          enforceKryoSerializer: Boolean = true)

The Java code correctly calls this method with all six parameters:

SparkSessionBuilder.build(namespace, true, true, scala.Option.apply(null), scala.Option.apply(null), true);

All other usages in the codebase are also correct as they rely on the default parameter values. No issues found.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for all SparkSessionBuilder.build calls to verify consistent parameter usage
rg -t scala -t java "SparkSessionBuilder\.build\(" -A 1
Length of output: 11473
Script:
#!/bin/bash
# Find SparkSessionBuilder class definition and its build methods
ast-grep --pattern 'object SparkSessionBuilder {
$$$
def build($$$) {
$$$
}
$$$
}'
Length of output: 96
Script:
#!/bin/bash
# Search for SparkSessionBuilder definition using ripgrep
rg -t scala "object SparkSessionBuilder" -A 10
Length of output: 1300
Script:
#!/bin/bash
# Get the complete build method signature
rg -t scala "def build" -A 10 spark/src/main/scala/ai/chronon/spark/SparkSessionBuilder.scala
Length of output: 1165
spark/src/test/scala/ai/chronon/spark/test/StreamingTest.scala (2)
30-30: LGTM! Package restructuring looks good.
The import statement correctly reflects the movement of InMemoryKvStore from the test package to utils, suggesting its broader utility across the codebase.
Line range hint 114-117: Address potential resource management and configuration issues.
Several concerns in the test implementation:
- Multiple instances of InMemoryKvStore are created, which could lead to resource leaks
- The timeout value (10000ms) is hardcoded
- Potential concurrency issues when using multiple store instances
Let's verify the usage of InMemoryKvStore across tests:
Consider these improvements:
- val inMemoryKvStore = buildInMemoryKvStore()
- val metadataStore = new MetadataStore(inMemoryKvStore, timeoutMillis = 10000)
+ private val DefaultTimeout = 10000
+ val store = buildInMemoryKvStore()
+ try {
+ val metadataStore = new MetadataStore(store, timeoutMillis = DefaultTimeout)
+ // ... existing code ...
+ } finally {
+ // Add cleanup if store implements AutoCloseable
+ store.close()
+ }spark/src/test/scala/ai/chronon/spark/test/LocalTableExporterTest.scala (1)
44-44: LGTM! Parameter name change improves clarity.
The change from local to localWarehouseLocation makes the parameter's purpose more explicit and self-documenting.
Let's verify that this parameter naming is consistent across the codebase:
✅ Verification successful
Parameter localWarehouseLocation is only used in LocalTableExporterTest while other tests use only local
The search results show that:
- All test files consistently use local = true
- Only LocalTableExporterTest uses the additional localWarehouseLocation parameter
- This suggests that localWarehouseLocation is a new parameter specific to table export functionality
Parameter usage is correct and consistent with the implementation
The codebase analysis reveals that:
- The SparkSessionBuilder.build method correctly defines both local and localWarehouseLocation as optional parameters
- localWarehouseLocation is specifically used for customizing the warehouse directory path
- LocalTableExporterTest is the only test that needs to specify a custom warehouse location, which makes sense as it deals with table exports
- Other tests can safely use the default warehouse location by omitting this parameter
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for any remaining uses of the old parameter name
# and verify consistent usage of the new parameter name
# Search for SparkSessionBuilder.build calls
rg -t scala "SparkSessionBuilder\.build.*local\s*=" --no-filename
# Search for the new parameter usage
rg -t scala "SparkSessionBuilder\.build.*localWarehouseLocation\s*=" --no-filename
Length of output: 3415
Script:
#!/bin/bash
# Find SparkSessionBuilder definition and implementation
ast-grep --pattern 'object SparkSessionBuilder {
$$$
def build($$$) {
$$$
}
$$$
}'
Length of output: 96
Script:
#!/bin/bash
# Try to find SparkSessionBuilder implementation using ripgrep
rg -t scala "object SparkSessionBuilder" -A 20
Length of output: 2407
spark/src/test/scala/ai/chronon/spark/test/stats/drift/DriftTest.scala (2)
16-18: LGTM: Import paths correctly updated
The import statements have been properly updated to reflect the package restructuring.
92-92: Verify all getSummaries calls are updated
The getSummaries method signature has been updated to require an explicit columnPrefix parameter. While this instance has been updated correctly with None, we should verify other usages.
✅ Verification successful
All getSummaries calls are properly updated
Based on the verification results, all calls to getSummaries are correctly updated with the columnPrefix parameter:
- In DriftTest.scala: uses None as shown in the original change
- In DriftStore.scala: all calls pass the columnPrefix variable, which is properly defined in the scope
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for any remaining calls to getSummaries that might need updating
rg -l "getSummaries.*Some.*Some.*[^None]" --type scala
Length of output: 190
Script:
#!/bin/bash
# Let's examine the actual calls to getSummaries in these files
rg "getSummaries.*Some.*Some" --type scala -A 2
Length of output: 1598
Script:
#!/bin/bash
# Let's check the method definition to understand the expected parameters
ast-grep --pattern 'def getSummaries' --type scala
Length of output: 232
Script:
#!/bin/bash
# Let's try to find the method definition using ripgrep instead
rg "def getSummaries.*\(" --type scala -A 2
Length of output: 698
Script:
#!/bin/bash
# Let's see the complete method signature including the columnPrefix parameter
rg "def getSummaries.*\(" --type scala -B 1 -A 4
Length of output: 1379
spark/src/main/scala/ai/chronon/spark/SparkSessionBuilder.scala (1)
37-37: Add ScalaDoc for the new parameter.
Consider adding parameter documentation to explain the purpose and impact of hiveSupport.
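For example (wording is a suggestion, not the author's):

/** Builds a SparkSession for Chronon jobs.
  *
  * @param hiveSupport when true (the default), enables Hive support on the session so
  *                    Hive-backed tables can be read and written; pass false for
  *                    lightweight sessions that do not need the Hive metastore.
  */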
Let's verify the impact of this signature change:
✅ Verification successful
The signature change has no breaking impact on existing code.
All existing calls to SparkSessionBuilder.build() in the codebase use named parameters, and the new hiveSupport parameter has a default value of true. The codebase shows consistent usage patterns:
- Most calls use local = true as a named parameter
- Some calls include additional named parameters like localWarehouseLocation and enforceKryoSerializer
- No instances of positional parameter usage found
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for existing calls to build() that might be affected
rg "SparkSessionBuilder\.build\(" --type scala -A 3
Length of output: 20102
spark/src/test/scala/ai/chronon/spark/test/ExternalSourcesTest.scala (1)
22-22: LGTM! Package restructuring looks good.
The import change reflects a good architectural decision to move MockApi to the utils package, which is a more appropriate location for shared testing utilities.
Let's verify that the package restructuring was done consistently:
✅ Verification successful
Package restructuring has been consistently applied across the codebase
The verification confirms that:
- The MockApi class is correctly located in the ai.chronon.spark.utils package
- All imports (10 files in total) consistently use the new package path ai.chronon.spark.utils.MockApi
- No references to the old package path remain in the codebase
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that MockApi is properly moved and there are no lingering references to the old package
# Check if MockApi exists in the new location
fd -e scala "MockApi.scala" -x cat {} | grep -l "package ai.chronon.spark.utils"
# Check for any remaining references to the old package path
rg "ai.chronon.spark.test.MockApi"
Length of output: 134
Script:
#!/bin/bash
# Let's try a different approach to verify the package restructuring
# First, let's find MockApi class in the codebase
rg -l "class MockApi"
# Then, let's find all imports of MockApi
rg -l "import.*MockApi"
# Finally, let's see the actual content of files containing MockApi
fd -e scala "MockApi.scala" -x cat {}
Length of output: 7952
Script:
#!/bin/bash
# Let's check the actual imports in the files to verify they've been updated correctly
rg -A 1 "import.*MockApi" --type scala
# Also check Java files separately as they might have different import syntax
rg -A 1 "import.*MockApi" --type java
Length of output: 2386
online/src/main/scala/ai/chronon/online/stats/DriftStore.scala (1)
77-77: LGTM! Verify all callers are updated.
The removal of the default value for columnPrefix makes the API more explicit, which is a good practice. The internal usage through getSummariesForRange maintains backward compatibility by providing the default.
Let's verify that all callers have been updated to provide the columnPrefix parameter explicitly:
✅ Verification successful
All callers have been updated to provide the columnPrefix parameter explicitly
The verification shows that all callers of getSummaries are providing the columnPrefix parameter explicitly:
- In DriftStore.scala: all internal calls pass through the columnPrefix parameter correctly
- In DriftTest.scala: the test explicitly provides None as the parameter
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for direct calls to getSummaries to ensure they provide the columnPrefix parameter
ast-grep --pattern 'getSummaries($$$, $$$, $$$)'
# Search for any remaining references that might need updating
rg -A 2 'getSummaries\('
Length of output: 2460
spark/src/test/scala/ai/chronon/spark/test/OnlineUtils.scala (2)
34-36: LGTM: Well-organized test utility imports
The new imports from ai.chronon.spark.utils package appropriately introduce the necessary test utilities for in-memory operations.
Line range hint 134-134: Clarify deprecation timeline for putStreaming method
The TODO comment indicates that putStreaming should be deprecated, but there's no clear timeline or migration plan. Consider:
- Adding a @Deprecated annotation with an explanation (see the sketch below)
- Documenting the migration path to putStreamingNew
- Setting a timeline for removal
spark/src/main/scala/ai/chronon/spark/stats/drift/scripts/PrepareData.scala (1)
Line range hint 1-34: LGTM! Package reorganization improves code organization.
The move from test package to scripts package better reflects the utility nature of this code. The import organization is clean and well-structured.
spark/src/test/scala/ai/chronon/spark/test/FetcherTest.scala (3)
38-38: LGTM!
The import statement for MockApi is correctly placed and follows the project's import organization pattern.
Line range hint 479-481: LGTM! Good enhancement to test isolation.
The addition of the namespace parameter and MockApi initialization improves test isolation and makes the helper method more flexible.
Line range hint 673-677: LGTM! Well-structured test isolation.
Each test case uses a unique namespace, following testing best practices by ensuring proper isolation between test cases. This prevents potential interference and makes tests more reliable.
Also applies to: 679-684, 686-690
spark/src/main/scala/ai/chronon/spark/Driver.scala (1)
150-152: LGTM! Parameter naming improves code clarity.
The explicit naming of the localWarehouseLocation parameter enhances code readability and maintainability.
Let's verify consistent parameter naming across the codebase:
✅ Verification successful
Let's analyze the SparkSessionBuilder.build method calls across the codebase to verify parameter naming consistency.
Based on the analysis of the codebase, I can now provide the final response:
Parameter naming is consistent with method signature, but some calls could be improved
From the verification results:
- The SparkSessionBuilder.build method has localWarehouseLocation as an optional named parameter
- Most test files use minimal parameters (name and local only), which is fine as they use default values
- Only a few files explicitly use localWarehouseLocation: LocalTableExporterTest.scala, LocalDataLoaderTest.scala, and Driver.scala (the file under review)
The explicit parameter naming in the reviewed change aligns with the method signature and improves readability where the parameter is actually used.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for consistent parameter naming in SparkSessionBuilder.build calls
# Expect: All calls should use named parameters for localWarehouseLocation
# Search for all SparkSessionBuilder.build method calls
ast-grep --pattern 'SparkSessionBuilder.build($$$)'
Length of output: 8701
Script:
#!/bin/bash
# Let's examine the SparkSessionBuilder class to understand the method signature
ast-grep --pattern 'def build($$$)'
Length of output: 37
Script:
#!/bin/bash
# Let's try to find the SparkSessionBuilder class definition and its build method
rg -A 10 "object SparkSessionBuilder"
Length of output: 1291
Script:
#!/bin/bash
# Let's see the complete method signature of build
rg -A 20 "def build\(name: String," spark/src/main/scala/ai/chronon/spark/SparkSessionBuilder.scala
Length of output: 1028
#!/bin/bash
Add environment variable verification
The script uses several environment variables (SPARK_JAR, CLOUD_AWS_JAR, ONLINE_CLASS) but doesn't verify their presence at the start.
Add this at the beginning of the script:
# Verify required environment variables
for var in SPARK_JAR CLOUD_AWS_JAR ONLINE_CLASS; do
if [[ -z "${!var}" ]]; then
echo "Error: Required environment variable $var is not set" >&2
exit 1
fi
done
spark/src/main/scala/ai/chronon/spark/stats/drift/scripts/PrepareData.scala
Actionable comments posted: 15
🧹 Outside diff range and nitpick comments (9)
docker-init/demo/Dockerfile (4)
16-18: Document the user ID reference
Add a comment to clarify that user ID 185 refers to the spark user for better maintainability.

 RUN mkdir -p /opt/chronon/jars && \
-    chown -R 185:185 /opt/chronon && \
+    # 185 is the spark user ID in the base image
+    chown -R 185:185 /opt/chronon && \
     chmod 755 /opt/chronon/jars
21-22: Optimize Dockerfile layers
Consider moving these ENV commands to the Java installation RUN block to reduce the number of layers.

 RUN apt-get update && \
     ... && \
-    update-alternatives --set java /usr/lib/jvm/java-17-amazon-corretto/bin/java
+    update-alternatives --set java /usr/lib/jvm/java-17-amazon-corretto/bin/java && \
+    echo "export JAVA_HOME=/usr/lib/jvm/java-17-amazon-corretto" >> /etc/environment && \
+    echo "export PATH=\$PATH:\$JAVA_HOME/bin" >> /etc/environment
28-31: Consolidate classpath configurations
The current setup has redundant classpath definitions. Consider consolidating them into a single environment variable.

-ENV SPARK_CLASSPATH="/opt/spark/jars/*"
-ENV SPARK_DIST_CLASSPATH="/opt/spark/jars/*"
-ENV SPARK_EXTRA_CLASSPATH="/opt/spark/jars/*:/opt/chronon/jars/*"
-ENV HADOOP_CLASSPATH="/opt/spark/jars/*"
+ENV SPARK_CLASSPATH="/opt/spark/jars/*:/opt/chronon/jars/*"
+ENV SPARK_DIST_CLASSPATH="${SPARK_CLASSPATH}"
+ENV SPARK_EXTRA_CLASSPATH="${SPARK_CLASSPATH}"
+ENV HADOOP_CLASSPATH="/opt/spark/jars/*"
33-33: Consider using a more appropriate entrypoint
Using tail -f /dev/null is a workaround to keep the container running. Consider implementing a proper entrypoint script that handles signals and container lifecycle properly.
Example entrypoint script:

#!/bin/bash
trap "exit" TERM INT
while true; do
    sleep 1
done

spark/src/main/scala/ai/chronon/spark/scripts/ObservabilityDemo.scala (3)
29-35: Consider enhancing the Time utility method.
While functional, the timing utility could be improved:
- Add try-catch block to handle exceptions in the timed block
- Consider using System.nanoTime() for more precise measurements
- Make color output configurable for environments where it might not be supported
-  def Time(message: String)(block: => Unit): Unit = {
+  def Time[T](message: String)(block: => T): T = {
     println(s"$message..".yellow)
-    val start = System.currentTimeMillis()
+    val start = System.nanoTime()
     try {
-      block
+      val result = block
+      val end = System.nanoTime()
+      val durationMs = (end - start) / 1_000_000.0
+      println(s"$message took $durationMs ms".green)
+      result
     } catch {
+      case e: Exception =>
+        println(s"$message failed: ${e.getMessage}".red)
+        throw e
     } finally {
-      val end = System.currentTimeMillis()
-      println(s"$message took ${end - start} ms".green)
     }
   }
56-60: Enhance summarization configuration and validation.
The summarization needs improvements:

- Make the useLogs parameter configurable
- Add validation of computation success

 Time("Summarizing data") {
   // compute summary table and packed table (for uploading)
-  Summarizer.compute(join.metaData, ds = endDs, useLogs = true)
+  val useLogs = spark.conf.getBoolean("chronon.summarizer.use_logs", defaultValue = true)
+  val summary = Summarizer.compute(join.metaData, ds = endDs, useLogs = useLogs)
+  require(summary != null, "Summary computation failed")
 }
86-86: Address the TODO comment about drift store integration.
The TODO indicates missing integration work that should be tracked.
Would you like me to create a GitHub issue to track the drift store integration task?
spark/src/main/scala/ai/chronon/spark/TableUtils.scala (2)
79-82: Fix inconsistency between comment and actual threshold value
The comment states "default threshold is 100K rows" but the actual default value in the code is 1000000 (1M). Please update either the comment or the value to maintain consistency.

-  // default threshold is 100K rows
+  // default threshold is 1M rows
Line range hint 1-824: Well-structured table utilities with robust error handling
The overall implementation of TableUtils is well-designed with:
- Proper error handling for table operations
- Configurable parameters with sensible defaults
- Robust schema evolution support
- Type-safe operations
- Comprehensive documentation
However, consider adding unit tests for critical paths, especially around:
- Schema evolution logic
- Partition management
- Error handling scenarios
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (6)
- docker-init/demo/Dockerfile (1 hunks)
- docker-init/demo/README.md (1 hunks)
- docker-init/demo/build.sh (1 hunks)
- docker-init/demo/run.sh (1 hunks)
- spark/src/main/scala/ai/chronon/spark/TableUtils.scala (3 hunks)
- spark/src/main/scala/ai/chronon/spark/scripts/ObservabilityDemo.scala (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- docker-init/demo/README.md
🧰 Additional context used
🪛 Shellcheck (0.9.0-1)
docker-init/demo/build.sh
[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.
(SC2148)
docker-init/demo/run.sh
[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.
(SC2148)
🔇 Additional comments (1)
spark/src/main/scala/ai/chronon/spark/scripts/ObservabilityDemo.scala (1)
1-26: LGTM! Well-organized imports.
The imports are logically grouped and all appear to be utilized within the implementation.
docker-init/demo/run.sh
docker stop spark-app
docker rm spark-app
🛠️ Refactor suggestion
Add error handling for Docker commands
The Docker stop and remove commands should handle cases where the container doesn't exist or can't be stopped.
Replace with this more robust implementation:
-docker stop spark-app
-docker rm spark-app
+if docker ps -a | grep -q spark-app; then
+ docker stop spark-app || echo "Failed to stop container"
+ docker rm spark-app || echo "Failed to remove container"
+fi📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| docker stop spark-app | |
| docker rm spark-app | |
| if docker ps -a | grep -q spark-app; then | |
| docker stop spark-app || echo "Failed to stop container" | |
| docker rm spark-app || echo "Failed to remove container" | |
| fi |
docker run -d \
  --name spark-app \
  -v $HOME/repos/chronon/spark/target/scala-2.12:/opt/chronon/jars \
  obs
Improve volume mount path handling
The volume mount path assumes a specific directory structure which could cause issues in different environments.
Consider:
- Making the path configurable via environment variable
- Validating the path exists before mounting
+CHRONON_JAR_PATH="${CHRONON_JAR_PATH:-$HOME/repos/chronon/spark/target/scala-2.12}"
+
+if [ ! -d "$CHRONON_JAR_PATH" ]; then
+ echo "Error: JAR directory not found: $CHRONON_JAR_PATH"
+ exit 1
+fi
+
docker run -d \
--name spark-app \
- -v $HOME/repos/chronon/spark/target/scala-2.12:/opt/chronon/jars \
+ -v "$CHRONON_JAR_PATH":/opt/chronon/jars \
   obs
| println(s"Repartitioning and writing into table $tableName".yellow) | ||
| repartitionAndWrite(finalizedDf, tableName, saveMode, stats, sortByCols) |
Use logger instead of println for repartitioning status
For critical operations like repartitioning and writing data, it's important to maintain proper logging through the logging framework rather than using println.
- println(s"Repartitioning and writing into table $tableName".yellow)
+ logger.info(s"Repartitioning and writing into table $tableName")Committable suggestion skipped: line range outside the PR's diff.
| println(s"Table $tableName already exists, skipping creation") | ||
| case e: Exception => | ||
| logger.error(s"Failed to create table $tableName", e) | ||
| println(s"Failed to create table $tableName", e) | ||
| throw e | ||
| } |
Revert println statements back to logger calls
Using println for logging is not recommended in production code as it:
- Bypasses the logging framework's configuration
- Makes it harder to manage and redirect logs
- Reduces observability in production environments
- println(s"Table $tableName already exists, skipping creation")
+ logger.info(s"Table $tableName already exists, skipping creation")
- println(s"Failed to create table $tableName", e)
+ logger.error(s"Failed to create table $tableName", e)Committable suggestion skipped: line range outside the PR's diff.
piyush-zlai
left a comment
Trying to figure out the best way to wire the InMem store with the appropriate data - I'm thinking it might be worth adding a controller that exposes a bulkPut endpoint that you can call in the summary uploader - the backend of that controller will be the in-mem store. This way we have the spark jvm computing stuff and throwing it over the wall to the play jvm with the in mem kv store
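To make the idea concrete, here is a rough, self-contained sketch of such a bulkPut endpoint. It uses the JDK's built-in HTTP server instead of Play, and the route, payload format, and storage shape are all placeholders rather than the project's actual API:

import java.net.InetSocketAddress
import java.util.concurrent.ConcurrentHashMap

import com.sun.net.httpserver.{HttpExchange, HttpHandler, HttpServer}

// The Spark job would POST serialized summaries here; the serving process keeps them
// in an in-memory map. Payload is newline-delimited "key=value" pairs for illustration.
object BulkPutServerSketch {
  private val store = new ConcurrentHashMap[String, String]()

  def main(args: Array[String]): Unit = {
    val server = HttpServer.create(new InetSocketAddress(9000), 0)
    server.createContext("/api/v1/bulkPut", new HttpHandler {
      override def handle(exchange: HttpExchange): Unit = {
        val body = scala.io.Source.fromInputStream(exchange.getRequestBody).mkString
        body.split("\n").filter(_.contains("=")).foreach { line =>
          val Array(k, v) = line.split("=", 2)
          store.put(k, v)
        }
        val response = "ok".getBytes("UTF-8")
        exchange.sendResponseHeaders(200, response.length)
        exchange.getResponseBody.write(response)
        exchange.close()
      }
    })
    server.start()
  }
}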
  .appName(name)
  .enableHiveSupport()

if (hiveSupport) baseBuilder = baseBuilder.enableHiveSupport()
is this needed as part of this PR? I don't see it in use - can we drop for a follow up?
| println(s"Table $tableName already exists, skipping creation") | ||
| case e: Exception => | ||
| logger.error(s"Failed to create table $tableName", e) | ||
| println(s"Failed to create table $tableName", e) |
revert
  // so that an exception will be thrown below
  dfRearranged
}
println(s"Repartitioning and writing into table $tableName".yellow)
revert
spark/src/main/scala/ai/chronon/spark/scripts/ObservabilityDemo.scala
| // test drift store methods | ||
| val driftStore = new DriftStore(api.genKvStore) | ||
|
|
||
| // TODO: Wire up drift store into hub and create an endpoint |
hmm wiring up to kick off play from within the JVM process might end up being painful (it's typically triggered via the command line to launch play.core.server.ProdServerStart with the appropriate params / jars etc)
piyush-zlai
left a comment
Aside from log4j.properties the rest is minor clean up stuff. Stamping to unblock for now and I'll build on top of this with the http bulkPut code on my side
--driver-class-path "/opt/spark/jars/*:/opt/chronon/jars/*" \
--conf "spark.driver.host=localhost" \
--conf "spark.driver.bindAddress=0.0.0.0" \
--class ai.chronon.spark.scripts.ObservabilityDemo \
are we running the in-memory kvstore on the spark driver? curious how that'd work in a cluster-mode setup or is that out of scope for the demo?
yeah basically only for the demo. we don't plan to launch this.
spark/src/main/scala/ai/chronon/spark/scripts/ObservabilityDemo.scala
object ObservabilityDemo {

  def Time(message: String)(block: => Unit): Unit = {
    println(s"$message..".yellow)
nit: change to log message.
  def Time(message: String)(block: => Unit): Unit = {
    println(s"$message..".yellow)
    val start = System.currentTimeMillis()
    block
do we need this to be threadsafe or no
not for this demo i think
    val start = System.currentTimeMillis()
    block
    val end = System.currentTimeMillis()
    println(s"$message took ${end - start} ms".green)
nit: change to log
done
Actionable comments posted: 7
🧹 Outside diff range and nitpick comments (4)
docker-init/demo/run.sh (1)
24-33: Consider cluster mode compatibility
Based on the conversation about the in-memory kvstore running on the Spark driver, this setup might not work in cluster mode. While it's noted as "only for demo", it would be helpful to document this limitation.
Consider adding a comment in the script:
 # Submit with increased memory
+# Note: This demo uses driver-local kvstore and is not suitable for cluster mode
 docker exec spark-app \

spark/src/main/scala/ai/chronon/spark/scripts/ObservabilityDemo.scala (3)
132-132: Extract timeout configuration and use consistent duration creation.
Hardcoded timeout values should be configurable and use a consistent creation pattern.

+  private val DefaultTimeout: Duration = Duration(10, TimeUnit.SECONDS)
+
   def main(args: Array[String]): Unit = {
     // ... existing code ...
-    val summaries = Await.result(summariesFuture, Duration.create(10, TimeUnit.SECONDS))
+    val summaries = Await.result(summariesFuture, DefaultTimeout)
     // ... in drift series fetching ...
-    driftSeries = Await.result(driftSeriesFuture.get, Duration.create(10, TimeUnit.SECONDS))
+    driftSeries = Await.result(driftSeriesFuture.get, DefaultTimeout)
     // ... in summary series fetching ...
-    summarySeries = Await.result(summarySeriesFuture.get, Duration.create(10, TimeUnit.SECONDS))
+    summarySeries = Await.result(summarySeriesFuture.get, DefaultTimeout)

Also applies to: 145-145, 169-169
183-185: Use logger consistently instead of println.
Replace println with logger calls for consistent logging throughout the code.

-    println(s"""summary ptile totals: $summaryTotals
-               |summary ptile nulls: $summaryNulls
-               |""".stripMargin)
+    logger.info(s"""summary ptile totals: $summaryTotals
+                   |summary ptile nulls: $summaryNulls
+                   |""".stripMargin)
121-121: Address the TODO comment about wiring up the drift store.
The TODO comment indicates missing functionality that needs to be implemented.
Would you like me to help create a GitHub issue to track the implementation of wiring up the drift store into the hub and creating an endpoint?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (6)
- docker-init/demo/Dockerfile (1 hunks)
- docker-init/demo/run.sh (1 hunks)
- docker-init/start.sh (3 hunks)
- spark/src/main/resources/logback.xml (1 hunks)
- spark/src/main/scala/ai/chronon/spark/TableUtils.scala (1 hunks)
- spark/src/main/scala/ai/chronon/spark/scripts/ObservabilityDemo.scala (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- spark/src/main/resources/logback.xml
🚧 Files skipped from review as they are similar to previous changes (3)
- docker-init/demo/Dockerfile
- docker-init/start.sh
- spark/src/main/scala/ai/chronon/spark/TableUtils.scala
docker-init/demo/run.sh
# Run new container
docker run -d \
  --name spark-app \
  -v "$CHRONON_JAR_PATH":/opt/chronon/jars:/opt/chronon/jars \
Fix duplicate path in volume mount
There's a duplicate path in the volume mount specification which would cause Docker to fail.
- -v "$CHRONON_JAR_PATH":/opt/chronon/jars:/opt/chronon/jars \
+ -v "$CHRONON_JAR_PATH":/opt/chronon/jars \📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| -v "$CHRONON_JAR_PATH":/opt/chronon/jars:/opt/chronon/jars \ | |
| -v "$CHRONON_JAR_PATH":/opt/chronon/jars \ |
| --conf "spark.driver.host=localhost" \ | ||
| --conf "spark.driver.bindAddress=0.0.0.0" \ |
💡 Codebase verification
Network configuration needs to be updated for Docker Compose environment
The current network configuration in run.sh using localhost and 0.0.0.0 is incompatible with the Docker Compose setup. The compose file shows that:
- A custom network temporal-network is defined
- Spark services are running in the compose environment
- The Spark master is accessible via the hostname spark (not localhost)
The Spark configuration should be updated to:
- Replace spark.driver.host=localhost with spark.driver.host=spark
- Keep spark.driver.bindAddress=0.0.0.0, as it allows accepting connections from any network interface
Verify network configuration for container networking
The current network configuration might not work correctly in all Docker network configurations. The localhost and 0.0.0.0 bindings need verification.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if custom Docker networks are used
docker network ls --format "{{.Name}}" | grep -q "spark" && {
echo "Custom Spark network exists. Current localhost binding might not work."
}
# Check container's network mode
docker inspect spark-app --format '{{.HostConfig.NetworkMode}}'
Length of output: 316
Script:
#!/bin/bash
# Let's analyze the Docker and Spark configuration in the codebase
# Search for Docker network configurations
rg -l "docker.*network"
# Search for Spark network configurations
rg "spark\.(driver|executor)\.(host|bindAddress)" -A 2
# Look for Docker Compose files that might define networks
fd -e yml -e yaml | xargs grep -l "networks:"
# Check if there are any other Spark configuration files
fd -e conf -e properties | grep -i spark
Length of output: 922
Script:
#!/bin/bash
# Let's check the network configuration in compose file and other Spark settings
cat docker-init/compose.yaml
# Check Spark default configurations
cat quickstart/conf/spark-defaults.conf
# Check the SparkSessionBuilder implementation for network settings
cat spark/src/main/scala/ai/chronon/spark/SparkSessionBuilder.scala
Length of output: 11114
object ObservabilityDemo {
  @transient lazy val logger: Logger = LoggerFactory.getLogger(getClass)

  def Time(message: String)(block: => Unit): Unit = {
🛠️ Refactor suggestion
Rename Time method to follow Scala naming conventions.
Method names in Scala should start with lowercase letters.
- def Time(message: String)(block: => Unit): Unit = {
+  def time(message: String)(block: => Unit): Unit = {
  def Time(message: String)(block: => Unit): Unit = {
    logger.info(s"$message..".yellow)
    val start = System.currentTimeMillis()
    block
    val end = System.currentTimeMillis()
    logger.info(s"$message took ${end - start} ms".green)
  }
🛠️ Refactor suggestion
Improve timing implementation for thread safety and precision.
The current implementation has two issues:
- It's not thread-safe: log lines from blocks timed on different threads can interleave.
- `System.currentTimeMillis()` is less precise than `System.nanoTime()` for duration measurements.
- def Time(message: String)(block: => Unit): Unit = {
+ def time(message: String)(block: => Unit): Unit = {
logger.info(s"$message..".yellow)
- val start = System.currentTimeMillis()
+ val start = System.nanoTime()
block
- val end = System.currentTimeMillis()
- logger.info(s"$message took ${end - start} ms".green)
+ val end = System.nanoTime()
+ val durationMs = TimeUnit.NANOSECONDS.toMillis(end - start)
+ logger.info(s"$message took $durationMs ms".green)
}For thread safety, consider using a synchronized block or atomic operations if concurrent timing is needed.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def Time(message: String)(block: => Unit): Unit = { | |
| logger.info(s"$message..".yellow) | |
| val start = System.currentTimeMillis() | |
| block | |
| val end = System.currentTimeMillis() | |
| logger.info(s"$message took ${end - start} ms".green) | |
| } | |
| def time(message: String)(block: => Unit): Unit = { | |
| logger.info(s"$message..".yellow) | |
| val start = System.nanoTime() | |
| block | |
| val end = System.nanoTime() | |
| val durationMs = TimeUnit.NANOSECONDS.toMillis(end - start) | |
| logger.info(s"$message took $durationMs ms".green) | |
| } |
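If blocks are ever timed from more than one thread, one way to act on the synchronized-block note above is to measure with `nanoTime` and emit a single log line under a private lock. This is a hedged sketch, not part of the PR; `ConcurrentTiming` and `timed` are illustrative names:

```scala
import java.util.concurrent.TimeUnit
import org.slf4j.{Logger, LoggerFactory}

object ConcurrentTiming {
  private val logger: Logger = LoggerFactory.getLogger(getClass)
  private val logLock = new Object

  // Measures with nanoTime and logs one line per timed block, serialized on
  // logLock so output from concurrently timed blocks cannot interleave.
  def timed(message: String)(block: => Unit): Unit = {
    val start = System.nanoTime()
    block
    val elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start)
    logLock.synchronized {
      logger.info(s"$message took $elapsedMs ms")
    }
  }
}
```

Only the log emission is synchronized, so concurrently timed blocks still run in parallel; the lock just keeps their messages whole.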
| def main(args: Array[String]): Unit = { | ||
|
|
||
| val config = new Conf(args) | ||
| val startDs = config.startDs() | ||
| val endDs = config.endDs() | ||
| val rowCount = config.rowCount() | ||
| val namespace = config.namespace() | ||
|
|
||
| val spark = SparkSessionBuilder.build(namespace, local = true) | ||
| implicit val tableUtils: TableUtils = TableUtils(spark) | ||
| tableUtils.createDatabase(namespace) | ||
|
|
||
| // generate anomalous data (join output) | ||
| val prepareData = PrepareData(namespace) | ||
| val join = prepareData.generateAnomalousFraudJoin | ||
|
|
||
| Time("Preparing data") { | ||
| val df = prepareData.generateFraudSampleData(rowCount, startDs, endDs, join.metaData.loggedTable) | ||
| df.show(10, truncate = false) | ||
| } | ||
|
|
||
| Time("Summarizing data") { | ||
| // compute summary table and packed table (for uploading) | ||
| Summarizer.compute(join.metaData, ds = endDs, useLogs = true) | ||
| } | ||
|
|
||
| val packedTable = join.metaData.packedSummaryTable | ||
| // mock api impl for online fetching and uploading | ||
| val kvStoreFunc: () => KVStore = () => { | ||
| // cannot reuse the variable - or serialization error | ||
| val result = InMemoryKvStore.build(namespace, () => null) | ||
| result | ||
| } | ||
| val api = new MockApi(kvStoreFunc, namespace) | ||
|
|
||
| // create necessary tables in kvstore | ||
| val kvStore = api.genKvStore | ||
| kvStore.create(Constants.MetadataDataset) | ||
| kvStore.create(Constants.TiledSummaryDataset) | ||
|
|
||
| // upload join conf | ||
| api.buildFetcher().putJoinConf(join) | ||
|
|
||
| Time("Uploading summaries") { | ||
| val uploader = new SummaryUploader(tableUtils.loadTable(packedTable), api) | ||
| uploader.run() | ||
| } | ||
|
|
||
| // test drift store methods | ||
| val driftStore = new DriftStore(api.genKvStore) | ||
|
|
||
| // TODO: Wire up drift store into hub and create an endpoint | ||
|
|
||
| // fetch keys | ||
| val tileKeys = driftStore.tileKeysForJoin(join) | ||
| val tileKeysSimple = tileKeys.mapValues(_.map(_.column).toSeq) | ||
| tileKeysSimple.foreach { case (k, v) => logger.info(s"$k -> [${v.mkString(", ")}]") } | ||
|
|
||
| // fetch summaries | ||
| val startMs = PartitionSpec.daily.epochMillis(startDs) | ||
| val endMs = PartitionSpec.daily.epochMillis(endDs) | ||
| val summariesFuture = driftStore.getSummaries(join, Some(startMs), Some(endMs), None) | ||
| val summaries = Await.result(summariesFuture, Duration.create(10, TimeUnit.SECONDS)) | ||
| logger.info(summaries.toString()) | ||
|
|
||
| var driftSeries: Seq[TileDriftSeries] = null | ||
| // fetch drift series | ||
| Time("Fetching drift series") { | ||
| val driftSeriesFuture = driftStore.getDriftSeries( | ||
| join.metaData.nameToFilePath, | ||
| DriftMetric.JENSEN_SHANNON, | ||
| lookBack = new Window(7, chronon.api.TimeUnit.DAYS), | ||
| startMs, | ||
| endMs | ||
| ) | ||
| driftSeries = Await.result(driftSeriesFuture.get, Duration.create(10, TimeUnit.SECONDS)) | ||
| } | ||
|
|
||
| val (nulls, totals) = driftSeries.iterator.foldLeft(0 -> 0) { | ||
| case ((nulls, total), s) => | ||
| val currentNulls = s.getPercentileDriftSeries.iterator().toScala.count(_ == null) | ||
| val currentCount = s.getPercentileDriftSeries.size() | ||
| (nulls + currentNulls, total + currentCount) | ||
| } | ||
|
|
||
| logger.info(s"""drift totals: $totals | ||
| |drift nulls: $nulls | ||
| |""".stripMargin.red) | ||
|
|
||
| logger.info("Drift series fetched successfully".green) | ||
|
|
||
| var summarySeries: Seq[TileSummarySeries] = null | ||
|
|
||
| Time("Fetching summary series") { | ||
| val summarySeriesFuture = driftStore.getSummarySeries( | ||
| join.metaData.nameToFilePath, | ||
| startMs, | ||
| endMs | ||
| ) | ||
| summarySeries = Await.result(summarySeriesFuture.get, Duration.create(10, TimeUnit.SECONDS)) | ||
| } | ||
|
|
||
| val (summaryNulls, summaryTotals) = summarySeries.iterator.foldLeft(0 -> 0) { | ||
| case ((nulls, total), s) => | ||
| if (s.getPercentiles == null) { | ||
| (nulls + 1) -> (total + 1) | ||
| } else { | ||
| val currentNulls = s.getPercentiles.iterator().toScala.count(_ == null) | ||
| val currentCount = s.getPercentiles.size() | ||
| (nulls + currentNulls, total + currentCount) | ||
| } | ||
| } | ||
|
|
||
| println(s"""summary ptile totals: $summaryTotals | ||
| |summary ptile nulls: $summaryNulls | ||
| |""".stripMargin) | ||
|
|
||
| logger.info("Summary series fetched successfully".green) | ||
|
|
||
| spark.stop() | ||
| } |
🛠️ Refactor suggestion
Add proper resource cleanup using try-finally.
Ensure resources are properly cleaned up even if an error occurs.
def main(args: Array[String]): Unit = {
+ var spark: SparkSession = null
+ try {
val config = new Conf(args)
// ... existing configuration setup ...
- val spark = SparkSessionBuilder.build(namespace, local = true)
+ spark = SparkSessionBuilder.build(namespace, local = true)
// ... rest of the code ...
- spark.stop()
+ } catch {
+ case e: Exception =>
+ logger.error("Failed to run observability demo", e)
+ throw e
+ } finally {
+ if (spark != null) {
+ try {
+ spark.stop()
+ } catch {
+ case e: Exception =>
+ logger.warn("Failed to stop Spark session", e)
+ }
+ }
+ }
}Committable suggestion skipped: line range outside the PR's diff.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (1)
spark/src/main/scala/ai/chronon/spark/scripts/ObservabilityDemo.scala (1)
121-121: Track and implement the drift store integration TODO. The TODO comment indicates missing functionality for drift store integration. This should be tracked and implemented.
Would you like me to help create a GitHub issue to track this task? I can provide a detailed implementation plan considering the JVM process integration challenges mentioned in the previous review.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- docker-init/demo/run.sh (1 hunks)
- spark/src/main/scala/ai/chronon/spark/scripts/ObservabilityDemo.scala (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- docker-init/demo/run.sh
🔇 Additional comments (4)
spark/src/main/scala/ai/chronon/spark/scripts/ObservabilityDemo.scala (4)
1-30: LGTM! Imports are well-organized and necessary.
34-40: Skipping comment as issues are already addressed in previous reviews.
82-89: Skipping comment as issues are already addressed in previous reviews.
96-109: Skipping comment as issues are already addressed in previous reviews.
| val startMs = PartitionSpec.daily.epochMillis(startDs) | ||
| val endMs = PartitionSpec.daily.epochMillis(endDs) | ||
| val summariesFuture = driftStore.getSummaries(join, Some(startMs), Some(endMs), None) | ||
| val summaries = Await.result(summariesFuture, Duration.create(10, TimeUnit.SECONDS)) |
🛠️ Refactor suggestion
Make timeout values configurable.
Multiple operations use hardcoded 10-second timeouts. These should be configurable to accommodate different environments and network conditions.
+ val defaultTimeout = Duration.create(
+ config.timeout.getOrElse(10),
+ TimeUnit.SECONDS
+ )
+
- val summaries = Await.result(summariesFuture, Duration.create(10, TimeUnit.SECONDS))
+ val summaries = Await.result(summariesFuture, defaultTimeout)
- driftSeries = Await.result(driftSeriesFuture.get, Duration.create(10, TimeUnit.SECONDS))
+ driftSeries = Await.result(driftSeriesFuture.get, defaultTimeout)
- summarySeries = Await.result(summarySeriesFuture.get, Duration.create(10, TimeUnit.SECONDS))
+ summarySeries = Await.result(summarySeriesFuture.get, defaultTimeout)Add timeout configuration to the Conf class:
val timeout: ScallopOption[Int] = opt[Int](
name = "timeout",
default = Some(10),
descr = "Timeout in seconds for async operations"
)
Also applies to: 145-145, 169-169
|
|
||
| val endDs: ScallopOption[String] = opt[String]( | ||
| name = "end-ds", | ||
| default = Some("2023-02-30"), |
Fix invalid default end date.
The default end date "2023-02-30" is invalid as February never has 30 days. This could cause runtime issues.
- default = Some("2023-02-30"),
+ default = Some("2023-02-28"),
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| default = Some("2023-02-30"), | |
| default = Some("2023-02-28"), |
| spark.stop() | ||
| System.exit(0) | ||
| } |
Improve cleanup process.
The cleanup process has two issues:
- No error handling for spark.stop()
- Using System.exit(0) might prevent proper cleanup of resources
- spark.stop()
- System.exit(0)
+ try {
+ spark.stop()
+ } catch {
+ case e: Exception =>
+ logger.error("Failed to stop Spark session", e)
+ throw e
+ } finally {
+ // Allow natural program termination instead of forcing exit
+ }Committable suggestion skipped: line range outside the PR's diff.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (2)
spark/src/main/scala/ai/chronon/spark/scripts/ObservabilityDemo.scala (2)
132-132: Make timeout values configurable. Multiple operations use hardcoded 10-second timeouts. These should be configurable to accommodate different environments and network conditions.
+ val defaultTimeout = Duration.create(
+   config.timeout.getOrElse(10),
+   TimeUnit.SECONDS
+ )
- val summaries = Await.result(summariesFuture, Duration.create(10, TimeUnit.SECONDS))
+ val summaries = Await.result(summariesFuture, defaultTimeout)
- driftSeries = Await.result(driftSeriesFuture.get, Duration.create(10, TimeUnit.SECONDS))
+ driftSeries = Await.result(driftSeriesFuture.get, defaultTimeout)
- summarySeries = Await.result(summarySeriesFuture.get, Duration.create(10, TimeUnit.SECONDS))
+ summarySeries = Await.result(summarySeriesFuture.get, defaultTimeout)
Also applies to: 145-145, 169-169
121-121: Address TODO comment about wiring up drift store. The TODO comment indicates missing functionality for integrating the drift store with the hub.
Would you like me to help create a GitHub issue to track this task and provide implementation guidance?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- spark/src/main/scala/ai/chronon/spark/scripts/ObservabilityDemo.scala (1 hunks)
🔇 Additional comments (2)
spark/src/main/scala/ai/chronon/spark/scripts/ObservabilityDemo.scala (2)
189-191:
Improve cleanup process.
The cleanup process needs proper error handling and should avoid using System.exit(0).
- spark.stop()
- System.exit(0)
+ try {
+ spark.stop()
+ } catch {
+ case e: Exception =>
+ logger.error("Failed to stop Spark session", e)
+ throw e
+ }Likely invalid or redundant comment.
49-53:
Fix invalid default end date.
The default end date "2023-02-30" is invalid as February never has 30 days.
val endDs: ScallopOption[String] = opt[String](
name = "end-ds",
- default = Some("2023-02-30"),
+ default = Some("2023-02-28"),
descr = "End date in YYYY-MM-DD format"
)Likely invalid or redundant comment.
| var driftSeries: Seq[TileDriftSeries] = null | ||
| // fetch drift series |
🛠️ Refactor suggestion
Replace mutable state with immutable values.
Use `val` instead of `var` to prevent accidental modifications and improve code clarity.
- var driftSeries: Seq[TileDriftSeries] = null
+ val driftSeries: Seq[TileDriftSeries] = time("Fetching drift series") {
+ val driftSeriesFuture = driftStore.getDriftSeries(
+ // ... existing parameters ...
+ )
+ Await.result(driftSeriesFuture.get, defaultTimeout)
+ }
- var summarySeries: Seq[TileSummarySeries] = null
+ val summarySeries: Seq[TileSummarySeries] = time("Fetching summary series") {
+ // ... move the fetching logic here ...
+ }Also applies to: 161-161
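For the `val`-based refactor above to type-check, the timing helper has to return the block's result rather than `Unit`. A minimal generic sketch; `TimedOps`, `timed`, and the `println` logging are placeholders, not the demo's actual helper:

```scala
import java.util.concurrent.TimeUnit

object TimedOps {
  // Runs the block, reports its duration, and returns whatever the block
  // produced, so call sites can bind the result directly with `val`.
  def timed[T](message: String)(block: => T): T = {
    val start = System.nanoTime()
    val result = block
    val elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start)
    println(s"$message took $elapsedMs ms") // the demo object would use logger.info
    result
  }
}
```

With a helper like this, `driftSeries` and `summarySeries` become plain `val`s bound directly to the `Await.result` expressions, and the null/total folds that follow work unchanged.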
| def time(message: String)(block: => Unit): Unit = { | ||
| logger.info(s"$message..".yellow) | ||
| val start = System.currentTimeMillis() | ||
| block | ||
| val end = System.currentTimeMillis() | ||
| logger.info(s"$message took ${end - start} ms".green) | ||
| } |
🛠️ Refactor suggestion
Improve timing implementation for precision and naming convention.
- Use `System.nanoTime()` instead of `System.currentTimeMillis()` for more precise duration measurements
- Follow Scala naming conventions by using lowercase for method names
- def time(message: String)(block: => Unit): Unit = {
+ def time(message: String)(block: => Unit): Unit = {
logger.info(s"$message..".yellow)
- val start = System.currentTimeMillis()
+ val start = System.nanoTime()
block
- val end = System.currentTimeMillis()
- logger.info(s"$message took ${end - start} ms".green)
+ val end = System.nanoTime()
+ val durationMs = TimeUnit.NANOSECONDS.toMillis(end - start)
+ logger.info(s"$message took $durationMs ms".green)
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def time(message: String)(block: => Unit): Unit = { | |
| logger.info(s"$message..".yellow) | |
| val start = System.currentTimeMillis() | |
| block | |
| val end = System.currentTimeMillis() | |
| logger.info(s"$message took ${end - start} ms".green) | |
| } | |
| def time(message: String)(block: => Unit): Unit = { | |
| logger.info(s"$message..".yellow) | |
| val start = System.nanoTime() | |
| block | |
| val end = System.nanoTime() | |
| val durationMs = TimeUnit.NANOSECONDS.toMillis(end - start) | |
| logger.info(s"$message took $durationMs ms".green) | |
| } |
| class Conf(arguments: Seq[String]) extends ScallopConf(arguments) { | ||
| val startDs: ScallopOption[String] = opt[String]( | ||
| name = "start-ds", | ||
| default = Some("2023-01-01"), | ||
| descr = "Start date in YYYY-MM-DD format" | ||
| ) | ||
|
|
||
| val endDs: ScallopOption[String] = opt[String]( | ||
| name = "end-ds", | ||
| default = Some("2023-02-30"), | ||
| descr = "End date in YYYY-MM-DD format" | ||
| ) | ||
|
|
||
| val rowCount: ScallopOption[Int] = opt[Int]( | ||
| name = "row-count", | ||
| default = Some(700000), | ||
| descr = "Number of rows to generate" | ||
| ) | ||
|
|
||
| val namespace: ScallopOption[String] = opt[String]( | ||
| name = "namespace", | ||
| default = Some("observability_demo"), | ||
| descr = "Namespace for the demo" | ||
| ) | ||
|
|
||
| verify() | ||
| } |
🛠️ Refactor suggestion
Add input validation for configuration parameters.
The configuration parameters need validation to ensure:
- Valid date formats
- Positive row count
- End date is after start date
class Conf(arguments: Seq[String]) extends ScallopConf(arguments) {
+ private def isValidDate(date: String): Boolean = {
+ try {
+ PartitionSpec.daily.isValid(date)
+ } catch {
+ case _: Exception => false
+ }
+ }
+
val startDs: ScallopOption[String] = opt[String](
name = "start-ds",
default = Some("2023-01-01"),
descr = "Start date in YYYY-MM-DD format"
- )
+ ).validate(date => isValidDate(date))
val endDs: ScallopOption[String] = opt[String](
name = "end-ds",
default = Some("2023-02-28"),
descr = "End date in YYYY-MM-DD format"
- )
+ ).validate(date => isValidDate(date))
val rowCount: ScallopOption[Int] = opt[Int](
name = "row-count",
default = Some(700000),
descr = "Number of rows to generate"
- )
+ ).validate(count => count > 0)
+ validate((conf: Conf) => {
+ val start = conf.startDs()
+ val end = conf.endDs()
+ PartitionSpec.daily.epochMillis(start) < PartitionSpec.daily.epochMillis(end)
+ }) { "End date must be after start date" }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| class Conf(arguments: Seq[String]) extends ScallopConf(arguments) { | |
| val startDs: ScallopOption[String] = opt[String]( | |
| name = "start-ds", | |
| default = Some("2023-01-01"), | |
| descr = "Start date in YYYY-MM-DD format" | |
| ) | |
| val endDs: ScallopOption[String] = opt[String]( | |
| name = "end-ds", | |
| default = Some("2023-02-30"), | |
| descr = "End date in YYYY-MM-DD format" | |
| ) | |
| val rowCount: ScallopOption[Int] = opt[Int]( | |
| name = "row-count", | |
| default = Some(700000), | |
| descr = "Number of rows to generate" | |
| ) | |
| val namespace: ScallopOption[String] = opt[String]( | |
| name = "namespace", | |
| default = Some("observability_demo"), | |
| descr = "Namespace for the demo" | |
| ) | |
| verify() | |
| } | |
| class Conf(arguments: Seq[String]) extends ScallopConf(arguments) { | |
| private def isValidDate(date: String): Boolean = { | |
| try { | |
| PartitionSpec.daily.isValid(date) | |
| } catch { | |
| case _: Exception => false | |
| } | |
| } | |
| val startDs: ScallopOption[String] = opt[String]( | |
| name = "start-ds", | |
| default = Some("2023-01-01"), | |
| descr = "Start date in YYYY-MM-DD format" | |
| ).validate(date => isValidDate(date)) | |
| val endDs: ScallopOption[String] = opt[String]( | |
| name = "end-ds", | |
| default = Some("2023-02-28"), | |
| descr = "End date in YYYY-MM-DD format" | |
| ).validate(date => isValidDate(date)) | |
| val rowCount: ScallopOption[Int] = opt[Int]( | |
| name = "row-count", | |
| default = Some(700000), | |
| descr = "Number of rows to generate" | |
| ).validate(count => count > 0) | |
| val namespace: ScallopOption[String] = opt[String]( | |
| name = "namespace", | |
| default = Some("observability_demo"), | |
| descr = "Namespace for the demo" | |
| ) | |
| validate((conf: Conf) => { | |
| val start = conf.startDs() | |
| val end = conf.endDs() | |
| PartitionSpec.daily.epochMillis(start) < PartitionSpec.daily.epochMillis(end) | |
| }) { "End date must be after start date" } | |
| verify() | |
| } |
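The `PartitionSpec.daily.isValid` call in the suggestion above is written as if such a method exists; if it does not, the date checks themselves can be expressed with plain `java.time`, independent of Chronon's and Scallop's APIs. A hedged sketch with an illustrative `DateChecks` name:

```scala
import java.time.LocalDate
import scala.util.Try

object DateChecks {
  // LocalDate.parse uses the strict ISO_LOCAL_DATE format, so impossible
  // dates such as "2023-02-30" are rejected rather than silently adjusted.
  def isValidDate(s: String): Boolean = Try(LocalDate.parse(s)).isSuccess

  // True when `end` is strictly after `start`; assumes both strings parse.
  def endAfterStart(start: String, end: String): Boolean =
    LocalDate.parse(end).isAfter(LocalDate.parse(start))
}
```

These predicates could then back whatever validation hooks Scallop exposes, alongside a simple `rowCount > 0` check.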
…ontend (#95)

## Summary

Builds on a couple of the summary computation PRs and data generation to wire things up so that Hub can serve them.

* Yanked out mock data based endpoints (model perf / drift, join & feature skew) - decided it would be confusing to have a mix of mock and generated data so we just have the generated data served
* Dropped a few of the scripts introduced in #87. We bring up our containers the way and we have a script `load_summaries.sh` that we can trigger that leverages the existing app container to load data.
* DDB ingestion was taking too long and we were dropping a lot of data due to rejected execution exceptions. To unblock for now, we've gone with an approach of making a bulk put HTTP call from the ObservabilityDemo app -> Hub and Hub utilizing a InMemoryKV store to persist and serve up features.
* Added an endpoint to serve the join that are configured as we've switched from the model based world.

There's still an issue to resolve around fetching individual feature series data. Once I resolve that, we can switch this PR out of wip mode.

To test / run: start up our docker containers:
```
$ docker-compose -f docker-init/compose.yaml up --build
...
```
In a different term load data:
```
$ ./docker-init/demo/load_summaries.sh
Done uploading summaries! 🥳
```
You can now curl join & feature time series data.

Join drift (null ratios)
```
curl -X GET 'http://localhost:9000/api/v1/join/risk.user_transactions.txn_join/timeseries?startTs=1673308800000&endTs=1674172800000&metricType=drift&metrics=null&offset=10h&algorithm=psi'
```
Join drift (value drift)
```
curl -X GET 'http://localhost:9000/api/v1/join/risk.user_transactions.txn_join/timeseries?startTs=1673308800000&endTs=1674172800000&metricType=drift&metrics=value&offset=10h&algorithm=psi'
```
Feature drift:
```
curl -X GET 'http://localhost:9000/api/v1/join/risk.user_transactions.txn_join/feature/dim_user_account_type/timeseries?startTs=1673308800000&endTs=1674172800000&metricType=drift&metrics=value&offset=1D&algorithm=psi&granularity=aggregates'
```
Feature summaries:
```
curl -X GET 'http://localhost:9000/api/v1/join/risk.user_transactions.txn_join/feature/dim_user_account_type/timeseries?startTs=1673308800000&endTs=1674172800000&metricType=drift&metrics=value&offset=1D&algorithm=psi&granularity=percentile'
```
Join metadata
```
curl -X GET 'http://localhost:9000/api/v1/joins'
curl -X GET 'http://localhost:9000/api/v1/join/risk.user_transactions.txn_join'
```

## Checklist
- [X] Added Unit Tests
- [ ] Covered by existing CI
- [X] Integration tested
- [ ] Documentation update

## Summary by CodeRabbit

## Release Notes

- **New Features**
  - Introduced a new `JoinController` for managing joins with pagination support.
  - Added functionality for an in-memory key-value store with bulk data upload capabilities.
  - Implemented observability demo data loading within a Spark application.
  - Added a new `HTTPKVStore` class for remote key-value store interactions over HTTP.
- **Improvements**
  - Enhanced the `ModelController` and `SearchController` to align with the new join data structure.
  - Updated the `TimeSeriesController` to support asynchronous operations and improved error handling.
  - Refined dependency management in the build configuration for better clarity and maintainability.
  - Updated API routes to include new endpoints for listing and retrieving joins.
  - Updated configuration to replace the `DynamoDBModule` with `ModelStoreModule`, adding `InMemoryKVStoreModule` and `DriftStoreModule`.
- **Documentation**
  - Revised README instructions for Docker container setup and demo data loading.
  - Updated API routes documentation to reflect new endpoints for joins and in-memory data operations.
- **Bug Fixes**
  - Resolved issues related to error handling in various controllers and improved logging for better traceability.

---------

Co-authored-by: nikhil-zlai <[email protected]>

## Summary

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

## Summary by CodeRabbit

- **New Features**
  - Enhanced logging configuration for Spark sessions to reduce verbosity.
  - Improved timing and error handling in the data generation script.
  - New method introduced for alternative streaming data handling in `OnlineUtils`.
  - Added a demonstration object for observability features in Spark applications.
  - New configuration file for structured logging setup.
- **Bug Fixes**
  - Adjusted method signatures to ensure clarity and correct parameter usage in various classes.
- **Documentation**
  - Updated import statements to reflect package restructuring for better organization.
  - Added instructions for building and executing the project in the README.
- **Tests**
  - Integrated `MockApi` into various test classes to enhance testing capabilities and simulate API interactions.
  - Enhanced test coverage by utilizing the `MockApi` for more robust testing scenarios.