Add DynamoDB kv store implementation #33
Conversation
Walkthrough
The changes in this pull request involve modifications to several files to enhance the project's functionality, particularly with the integration of Amazon DynamoDB. A new project definition for `cloud_aws` is introduced.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant KVStore
    participant DynamoDB
    User->>KVStore: create(dataset, props)
    KVStore->>DynamoDB: Create table with dataset
    DynamoDB-->>KVStore: Table created
    KVStore-->>User: Success
    User->>KVStore: multiGet(requests)
    KVStore->>DynamoDB: Retrieve items
    DynamoDB-->>KVStore: Return items
    KVStore-->>User: Items retrieved
    User->>KVStore: multiPut(keyValueDatasets)
    KVStore->>DynamoDB: Batch write items
    DynamoDB-->>KVStore: Write success
    KVStore-->>User: Items stored
```
Force-pushed: 78aa430 → 2f5c5e8
Actionable comments posted: 13
🧹 Outside diff range and nitpick comments (11)
online/src/main/scala/ai/chronon/online/Metrics.scala (1)
42-42: LGTM! Consider future metric implementations.
The addition of the `KVStore` constant to the `Environment` enumeration is appropriate and aligns with the PR objective of implementing a DynamoDB key-value store. The naming convention is consistent with other entries in the enumeration. As this is part of a WIP PR, here are some considerations for future implementation (a usage sketch follows the list):
- Ensure that you implement corresponding metrics using this new environment constant when you add the actual key-value store operations.
- Consider adding comments to document the purpose and usage of this new environment, especially if it introduces any unique metrics or behaviors compared to existing environments.
- Update any relevant documentation or README files to reflect this new metric environment.
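For instance, a store implementation might tag its metrics along these lines. This is a minimal sketch: the `Metrics.Context` construction shown is an assumption, while the `increment`/`withSuffix` calls mirror usage cited later in this review.

```scala
import ai.chronon.online.Metrics
import ai.chronon.online.Metrics.Environment

// Assumed constructor: a metrics context scoped to the new KVStore environment.
val metricsContext: Metrics.Context = Metrics.Context(Environment.KVStore)

// Per-operation counters, e.g. suffixed by the store operation being measured.
metricsContext.withSuffix("multiput").increment("requests")
metricsContext.increment("create.failures")
```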
build.sbt (2)
181-190: LGTM with suggestions: cloud_aws project structure
The new `cloud_aws` project is well-structured and includes appropriate dependencies for DynamoDB implementation. However, consider the following suggestions:
If JSON handling is needed in the main code, consider moving Circe dependencies to compile scope:
```scala
libraryDependencies ++= Seq(
  "io.circe" %% "circe-core" % circeVersion,
  "io.circe" %% "circe-generic" % circeVersion,
  "io.circe" %% "circe-parser" % circeVersion
)
```

Specify a version for the AWS SDK to ensure consistency across builds:

```scala
val awsSdkVersion = "2.25.35"
libraryDependencies += "software.amazon.awssdk" % "dynamodb" % awsSdkVersion
```

Consider adding ScalaTest for unit testing:

```scala
libraryDependencies += "org.scalatest" %% "scalatest" % "3.2.19" % Test
```

These changes will enhance the project's maintainability and testing capabilities.
Line range hint 1-256: Summary: DynamoDB KV store implementation progress
The changes to `build.sbt` successfully integrate the new `cloud_aws` project for DynamoDB implementation. This aligns well with the PR objectives of adding a key-value store using DynamoDB. The project structure and dependencies are appropriate for the task. However, to fully meet the PR objectives, please consider the following next steps:
- Implement the actual DynamoDB key-value store logic in the `cloud_aws` project.
- Add unit tests as mentioned in your PR checklist.
- Ensure integration with existing CI processes.
- Perform integration testing.
- Update relevant documentation.
These steps will help complete the implementation and make the PR ready for final review.
cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala (3)
59-59: Consider using a dynamic port for the DynamoDB server
Hardcoding the DynamoDB server port to `8000` may lead to port conflicts if the port is already in use. Using a dynamically assigned port can prevent such conflicts and make the tests more robust. Modify the setup to use an available port:
```diff
- server = ServerRunner.createServerFromCommandLineArgs(Array("-inMemory", "-port", "8000"))
+ val port = 8000 // You can use a method to find an available port dynamically
+ server = ServerRunner.createServerFromCommandLineArgs(Array("-inMemory", "-port", port.toString))
```

Alternatively, implement a utility to find an available port at runtime.
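One minimal sketch of such a utility, using the JDK's `ServerSocket` to let the OS pick a free ephemeral port (the helper name is hypothetical):

```scala
import java.net.ServerSocket

// Bind to port 0 so the OS assigns a free port, then release it
// for the local DynamoDB server to claim.
def findAvailablePort(): Int = {
  val socket = new ServerSocket(0)
  try socket.getLocalPort
  finally socket.close()
}

// Usage in the test setup:
// val port = findAvailablePort()
// server = ServerRunner.createServerFromCommandLineArgs(
//   Array("-inMemory", "-port", port.toString))
```

Note there is a small race window between closing the probe socket and the server binding the port, which is usually acceptable in tests.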
12-16: Use a consistent testing framework
Mixing JUnit annotations with ScalaTest matchers can lead to confusion and potential integration issues. Consider using a single testing framework consistently throughout the test class to ensure clarity and maintainability.
Option 1: Use ScalaTest annotations and fixtures.
```diff
-import org.junit.After
-import org.junit.Before
-import org.junit.Test
+import org.scalatest.BeforeAndAfter
+import org.scalatest.funsuite.AnyFunSuite

-class DynamoDBKVStoreTest {
+class DynamoDBKVStoreTest extends AnyFunSuite with BeforeAndAfter {

-  @Before
+  before {

-  @After
+  after {

-  @Test
-  def testCreatePKeyOnlyTable(): Unit = {
+  test("testCreatePKeyOnlyTable") {
```

Option 2: Use JUnit matchers instead of ScalaTest matchers.
```diff
-import org.scalatest.matchers.must.Matchers.be
-import org.scalatest.matchers.should.Matchers.convertToAnyShouldWrapper
+import org.junit.Assert._
...
-  putResults.length shouldBe putReqs.length
+  assertEquals(putReqs.length, putResults.length)
```

Choose the framework that aligns best with your project's standards.
115-115: Review the use of fixed timeouts with `Await.result`
Using a fixed timeout of `1.second` in `Await.result` may cause tests to become flaky if operations take longer than expected, especially under heavy load or on slower machines. Consider increasing the timeout or using asynchronous testing constructs provided by the testing framework. Options to address this:
- Increase the timeout duration to a higher value, such as `5.seconds` or `10.seconds`.
- Use ScalaTest's `AsyncFunSuite` or similar to handle asynchronous code without blocking.
- Refactor the code to avoid using `Await.result` where possible.

Example of increasing the timeout:
```diff
- val putResults = Await.result(kvStore.multiPut(putReqs), 1.second)
+ val putResults = Await.result(kvStore.multiPut(putReqs), 5.seconds)
```

Also applies to: 149-149, 180-180
online/src/main/scala/ai/chronon/online/Api.scala (2)
57-58: Add Scaladoc comments for `ListRequest` and `ListResponse`
To enhance code documentation and maintainability, consider adding Scaladoc comments to the new `ListRequest` and `ListResponse` case classes. This will help other developers understand their purpose and how to use them.
71-73: Clarify the default implementation of the `list` method
The `list` method currently throws a `NotImplementedError`. If subclasses are expected to override this method, consider adding an `@throws` annotation or a comment indicating that this method should be implemented by subclasses. This will provide clearer guidance to developers implementing the trait.
cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (3)
280-295: Enhance exception messages for better debugging
The exception messages in `extractTimedValues` and `extractListValues` are generic and may not provide sufficient context during debugging. Include the dataset name or additional context in the exception messages:
```diff
 if (responseMap.isEmpty)
-  throw new Exception("Empty response returned from DynamoDB")
+  throw new Exception(s"Empty response returned from DynamoDB for dataset: ${request.dataset}")
 // Similarly for other exceptions
```

Also applies to: 297-310
99-101: Implement retry logic for transient DynamoDB exceptions
When a `ProvisionedThroughputExceededException` occurs, the code logs the error but doesn't attempt to retry the operation, which could be successful on a subsequent attempt due to capacity recovery. Implement retry logic to handle transient exceptions:
```scala
import akka.pattern.retry
import scala.concurrent.duration._
import scala.concurrent.{Await, ExecutionContext, Future}

def withRetry[T](operation: => T, retries: Int = 3): T = {
  implicit val ec: ExecutionContext = ExecutionContext.global
  // akka.pattern.retry re-runs the Future-producing thunk up to `retries` times.
  Await.result(retry(() => Future(operation), attempts = retries), 30.seconds)
}
```

Apply this pattern to DynamoDB operations where transient failures are expected.
Also applies to: 139-151, 165-177, 216-228, 256-268
186-186: Avoid magic numbers; define a constant for default list limit
The default list limit is hardcoded as `100`, which can be unclear and error-prone. Define a constant for the default list limit:
```diff
 val listLimit =
-  request.props.get(DynamoDBKVStoreConstants.listLimit).map(p => p.asInstanceOf[Int]).getOrElse(100)
+  request.props.get(DynamoDBKVStoreConstants.listLimit).map(p => p.asInstanceOf[Int]).getOrElse(defaultListLimit)

 // Add to DynamoDBKVStoreConstants
+ val defaultListLimit = 100
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (6)
- .gitignore (1 hunks)
- build.sbt (2 hunks)
- cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (1 hunks)
- cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala (1 hunks)
- online/src/main/scala/ai/chronon/online/Api.scala (3 hunks)
- online/src/main/scala/ai/chronon/online/Metrics.scala (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- .gitignore
🔇 Additional comments (4)
build.sbt (1)
54-54: LGTM: Appropriate inclusion of cloud_aws project
The addition of `cloud_aws` to the root project's aggregation is consistent with the PR objectives of implementing DynamoDB support. This change maintains the existing structure and correctly integrates the new AWS-related project.
online/src/main/scala/ai/chronon/online/Api.scala (1)
23-24: Imports for `ListRequest` and `ListResponse` are appropriate
The added import statements for `ListRequest` and `ListResponse` are necessary for the new functionality introduced.
cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (2)
329-329: Confirm default end timestamp behavior in queries
In the `buildQueryRequest` method, `endTs` defaults to the current system time if not provided. This could lead to unintended results if the client's clock is skewed. Ensure that defaulting to `System.currentTimeMillis()` aligns with the expected behavior and doesn't introduce time-related bugs. If necessary, consider requiring `endTsMillis` to be explicitly provided.
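Concretely, the fallback under discussion is roughly (a sketch, assuming `endTsMillis` is the `Option[Long]` field described elsewhere in this review):

```scala
// Fall back to "now" only when the caller supplies no upper bound.
val endTs = request.endTsMillis.getOrElse(System.currentTimeMillis())
```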
55-57: Address the TODO for tuning capacity units
The default read and write capacity units are set to `10L`, and there is a TODO comment to tune these values. Review the expected read and write throughput for your application and adjust `defaultReadCapacityUnits` and `defaultWriteCapacityUnits` accordingly to optimize cost and performance.
Actionable comments posted: 6
🧹 Outside diff range and nitpick comments (1)
cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (1)
281-285: Define 'valueBytes' as a constant
The string `"valueBytes"` is used multiple times in the code. Defining it as a constant in `DynamoDBKVStoreConstants` can prevent typos and make future modifications easier. Example:
```scala
// In DynamoDBKVStoreConstants
val valueBytesColumn = "valueBytes"
```

And update the code accordingly:
- "valueBytes" -> AttributeValue.builder.b(SdkBytes.fromByteArray(valueBytes)).build + valueBytesColumn -> AttributeValue.builder.b(SdkBytes.fromByteArray(valueBytes)).build
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (1 hunks)
- cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala
🧰 Additional context used
📓 Learnings (1)
cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (3)
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:29-30, 2024-10-07T14:37:55.680Z): In the codebase, the `KVStore` implementation provides an implicit `ExecutionContext` in scope, so it's unnecessary to import another.
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:67-111, 2024-10-07T15:04:30.069Z): In the `DynamoDBKVStoreImpl` class, the `props` parameter is kept as `Map[String, Any]` to handle binary properties without serialization, allowing flexibility in storing different types of data.
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175, 2024-10-07T15:09:51.567Z): Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (2)
34-59: LGTM: Well-organized constants, but TODO needs addressing
The `DynamoDBKVStoreConstants` object is well-structured and provides clear, descriptive constants for DynamoDB configuration. This approach enhances maintainability. However, there's a TODO comment on lines 56-58 regarding tuning the default capacity units. It would be beneficial to address this before finalizing the implementation.
Would you like assistance in determining appropriate default values for read and write capacity units?
231-235: Improve error handling for unimplemented bulkPut method
While it's good to document unimplemented methods, consider providing a more informative error message instead of using `???`. This will help developers understand that the feature is planned but not yet implemented. Consider updating the implementation to:
```scala
override def bulkPut(sourceOfflineTable: String, destinationOnlineDataSet: String, partition: String): Unit = {
  throw new NotImplementedError(
    "bulkPut is not yet implemented for DynamoDB. This feature is planned for future development.")
}
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (1 hunks)
🧰 Additional context used
📓 Learnings (1)
cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (5)
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260, 2024-10-07T15:49:27.648Z): In `DynamoDBKVStoreImpl.scala`, refactoring methods like `extractTimedValues` and `extractListValues` to eliminate code duplication is discouraged if it would make the code more convoluted.
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:223-242, 2024-10-07T15:39:43.507Z): In `DynamoDBKVStoreImpl.scala`, handling `ProvisionedThroughputExceededException` covers the scenarios for `ThrottlingException` in the `handleDynamoDbOperation` method.
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:29-30, 2024-10-07T14:37:55.680Z): In the codebase, the `KVStore` implementation provides an implicit `ExecutionContext` in scope, so it's unnecessary to import another.
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:67-111, 2024-10-07T15:04:30.069Z): In the `DynamoDBKVStoreImpl` class, the `props` parameter is kept as `Map[String, Any]` to handle binary properties without serialization, allowing flexibility in storing different types of data.
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175, 2024-10-07T15:09:51.567Z): Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.
🔇 Additional comments (8)
cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (8)
1-32: LGTM: Imports and package declaration are well-organized
The imports are comprehensive and correctly grouped, covering all necessary AWS SDK components and Scala utilities required for the implementation.
61-65: LGTM: Well-structured class declaration with good practices
The `DynamoDBKVStoreImpl` class is well-defined, extending the `KVStore` interface and using dependency injection for the `DynamoDbClient`. The inclusion of a metrics context demonstrates good observability practices.
66-123: LGTM: Robust implementation of create method
The `create` method is well-implemented with proper error handling, metrics tracking, and use of a waiter to ensure table creation is complete. It correctly handles different property types and provides informative logging.
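For reference, the waiter pattern described here looks roughly like the following with the AWS SDK v2. This is a sketch: the key column and capacity values stand in for the constants discussed in this review, and props handling and error handling are omitted.

```scala
import software.amazon.awssdk.services.dynamodb.DynamoDbClient
import software.amazon.awssdk.services.dynamodb.model._

def createTable(client: DynamoDbClient, dataset: String): Unit = {
  val request = CreateTableRequest.builder
    .tableName(dataset)
    .keySchema(
      KeySchemaElement.builder.attributeName("keyBytes").keyType(KeyType.HASH).build)
    .attributeDefinitions(
      AttributeDefinition.builder.attributeName("keyBytes").attributeType(ScalarAttributeType.B).build)
    .provisionedThroughput(
      ProvisionedThroughput.builder.readCapacityUnits(10L).writeCapacityUnits(10L).build)
    .build
  client.createTable(request)
  // Block until DynamoDB reports the table as ACTIVE before returning.
  client.waiter().waitUntilTableExists(DescribeTableRequest.builder.tableName(dataset).build)
}
```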
125-165: LGTM: Well-implemented multiGet method with proper concurrency
The `multiGet` method efficiently handles both simple get requests and query requests. It makes good use of `Future` for asynchronous processing and leverages helper methods for building requests and extracting values.
167-202: LGTM: Well-implemented list method with pagination support
The `list` method effectively implements pagination using DynamoDB's scan operation. It correctly handles continuation keys for subsequent requests and uses `Future` for asynchronous processing. The default limit of 100 is a reasonable choice.
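The continuation-key handling being praised follows the standard DynamoDB scan-pagination pattern; a self-contained sketch (the helper name and the plumbing of the key through request props are assumptions):

```scala
import scala.collection.JavaConverters._
import software.amazon.awssdk.services.dynamodb.DynamoDbClient
import software.amazon.awssdk.services.dynamodb.model.{AttributeValue, ScanRequest}

// Returns one page of items plus the continuation key for the next page, if any.
def scanPage(client: DynamoDbClient,
             dataset: String,
             limit: Int,
             startKey: Option[Map[String, AttributeValue]])
    : (Seq[Map[String, AttributeValue]], Option[Map[String, AttributeValue]]) = {
  val builder = ScanRequest.builder.tableName(dataset).limit(limit)
  // Resume from the previous page's continuation key when one was returned.
  startKey.foreach(k => builder.exclusiveStartKey(k.asJava))
  val response = client.scan(builder.build)
  val items = response.items.asScala.map(_.asScala.toMap).toSeq
  // A non-empty lastEvaluatedKey signals that more pages remain.
  val next =
    if (response.hasLastEvaluatedKey) Some(response.lastEvaluatedKey.asScala.toMap) else None
  (items, next)
}
```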
204-229: LGTM: Efficient implementation of multiPut method
The `multiPut` method effectively handles multiple put operations using `Future` for asynchronous processing. It correctly uses `Future.sequence` for concurrent processing of all put requests and leverages a helper method for handling DynamoDB operations.
237-322: LGTM: Well-structured and reusable helper methods
The private helper methods are well-organized and promote code reuse. The `handleDynamoDbOperation` method provides comprehensive error handling, while the methods for extracting values and building requests are clear and focused. This structure enhances the overall maintainability of the code.
1-323: Overall: Excellent implementation of DynamoDB-based key-value store
This implementation of a DynamoDB-based key-value store is well-structured, efficient, and adheres to best practices. Key strengths include:
- Proper use of asynchronous processing with Future.
- Comprehensive error handling and metrics tracking.
- Well-organized constants and helper methods promoting maintainability.
- Efficient implementation of core operations (create, multiGet, list, multiPut).
Minor improvements could be made by addressing the TODO comments and providing a more informative error for the unimplemented bulkPut method. Overall, this is a solid foundation for a DynamoDB-based key-value store.
Force-pushed: 52addef → 26347b9
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (4)
online/src/main/scala/ai/chronon/online/Api.scala (3)
56-58: New case classes added for listing functionality
The addition of `ListRequest` and `ListResponse` case classes is a good design choice for implementing the new listing functionality. These classes provide a flexible structure for both input parameters and return values. Consider adding scaladoc comments to these new case classes to explain their purpose and the meaning of each field, especially the `props` and `resultProps` maps. This will improve code maintainability and make it easier for other developers to understand and use these classes correctly.
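A sketch of what that scaladoc could look like. The exact field lists are assumptions inferred from how this review describes `props` and `resultProps`, and `ListValue` is a hypothetical stand-in for the real value type:

```scala
import scala.util.Try

// Hypothetical value shape, for illustration only.
case class ListValue(keyBytes: Array[Byte], valueBytes: Array[Byte])

/** Request to list entries of a dataset.
  *
  * @param dataset name of the dataset (table) to list
  * @param props   implementation-specific options, e.g. a page-size limit
  *                and a continuation key for pagination
  */
case class ListRequest(dataset: String, props: Map[String, Any])

/** Response to a [[ListRequest]].
  *
  * @param request     the originating request
  * @param values      the retrieved entries
  * @param resultProps result metadata, e.g. the continuation key to feed
  *                    into a follow-up request
  */
case class ListResponse(request: ListRequest, values: Try[Seq[ListValue]], resultProps: Map[String, Any])
```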
69-70: New create method added with properties support
The addition of the `create(dataset: String, props: Map[String, Any])` method enhances the `KVStore` trait by allowing the creation of datasets with associated properties. The default implementation maintains backward compatibility by calling the existing `create(dataset)` method. Consider adding a scaladoc comment to this new method explaining its purpose and the potential use of the `props` parameter. While the default implementation doesn't use `props`, it's important to document that subclasses (like `DynamoDBKVStoreImpl`) may utilize this parameter for specific implementations.
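For example, the documented default could read as follows; the signature and delegation are taken from this review, while the doc text itself is only a suggestion:

```scala
trait KVStore {
  def create(dataset: String): Unit

  /** Creates the given dataset with implementation-specific options.
    *
    * @param props creation options such as capacity units or sort-key
    *              configuration. Ignored by this default implementation;
    *              stores like DynamoDBKVStoreImpl override this method to
    *              honor the supplied properties.
    */
  def create(dataset: String, props: Map[String, Any]): Unit = create(dataset)
}
```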
71-74: New list method added to KVStore trait
The addition of the `list(request: ListRequest)` method to the `KVStore` trait is a good step towards implementing listing functionality. The default implementation throwing a `NotImplementedError` ensures that subclasses must provide their own implementation if they support listing. Consider adding a scaladoc comment to this new method explaining its purpose and noting that it's not implemented by default. This will help other developers understand that they need to override this method in their KVStore implementations if they want to support listing functionality.
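A possible documented form; the `Future[ListResponse]` return type is an assumption, the `NotImplementedError` default comes from this review, and the stub types are placeholders for the classes sketched above:

```scala
import scala.concurrent.Future

// Placeholder types, for illustration only.
case class ListRequest(dataset: String, props: Map[String, Any])
case class ListResponse(request: ListRequest, resultProps: Map[String, Any])

trait KVStore {
  /** Lists entries of the given dataset.
    *
    * Not implemented by default; stores that support listing (such as
    * DynamoDBKVStoreImpl) must override this method.
    */
  def list(request: ListRequest): Future[ListResponse] =
    throw new NotImplementedError("list is not supported by this KVStore")
}
```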
cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (1)
34-59: Consider enhancing `DynamoDBKVStoreConstants` with ScalaDoc and more specific types.
While the constants are well-defined, adding ScalaDoc comments would improve documentation. Also, consider using more specific types for some constants:
```scala
object DynamoDBKVStoreConstants {

  /** Read capacity units to configure DynamoDB table with */
  val readCapacityUnits: String = "read-capacity"

  /** Write capacity units to configure DynamoDB table with */
  val writeCapacityUnits: String = "write-capacity"

  /** Indicates if this table is meant to be time sorted in Dynamo */
  val isTimedSorted: String = "is-time-sorted"

  /** Limit of max number of entries to return in a list call */
  val listLimit: String = "limit"

  /** Continuation key to help with list pagination */
  val continuationKey: String = "continuation-key"

  /** Name of the partition key column to use */
  val partitionKeyColumn: String = "keyBytes"

  /** Name of the time sort key column to use */
  val sortKeyColumn: String = "timestamp"

  // TODO: tune these
  val defaultReadCapacityUnits: Long = 10L
  val defaultWriteCapacityUnits: Long = 10L
}
```

This change improves code readability and maintainability.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (6)
- .gitignore (1 hunks)
- build.sbt (2 hunks)
- cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (1 hunks)
- cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala (1 hunks)
- online/src/main/scala/ai/chronon/online/Api.scala (3 hunks)
- online/src/main/scala/ai/chronon/online/Metrics.scala (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- online/src/main/scala/ai/chronon/online/Metrics.scala
🚧 Files skipped from review as they are similar to previous changes (2)
- .gitignore
- cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala
🧰 Additional context used
📓 Learnings (2)
cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (5)
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260, 2024-10-07T15:49:27.648Z): In `DynamoDBKVStoreImpl.scala`, refactoring methods like `extractTimedValues` and `extractListValues` to eliminate code duplication is discouraged if it would make the code more convoluted.
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:223-242, 2024-10-07T15:39:43.507Z): In `DynamoDBKVStoreImpl.scala`, handling `ProvisionedThroughputExceededException` covers the scenarios for `ThrottlingException` in the `handleDynamoDbOperation` method.
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:29-30, 2024-10-07T14:37:55.680Z): In the codebase, the `KVStore` implementation provides an implicit `ExecutionContext` in scope, so it's unnecessary to import another.
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:67-111, 2024-10-07T15:04:30.069Z): In the `DynamoDBKVStoreImpl` class, the `props` parameter is kept as `Map[String, Any]` to handle binary properties without serialization, allowing flexibility in storing different types of data.
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175, 2024-10-07T15:09:51.567Z): Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.
online/src/main/scala/ai/chronon/online/Api.scala (3)
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:67-111, 2024-10-07T15:04:30.069Z): In the `DynamoDBKVStoreImpl` class, the `props` parameter is kept as `Map[String, Any]` to handle binary properties without serialization, allowing flexibility in storing different types of data.
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File online/src/main/scala/ai/chronon/online/Api.scala:69-69, 2024-10-07T15:21:50.787Z): In the `KVStore` trait located at `online/src/main/scala/ai/chronon/online/Api.scala`, the default implementation of the `create` method (`def create(dataset: String, props: Map[String, Any]): Unit = create(dataset)`) doesn't leverage the `props` parameter, but subclasses like `DynamoDBKVStoreImpl` use the `props` parameter in their overridden implementations.
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File online/src/main/scala/ai/chronon/online/Api.scala:46-50, 2024-10-07T15:17:18.494Z): When adding new parameters with default values to Scala case classes like `GetRequest`, existing usages don't need updating if backward compatibility is intended.
🔇 Additional comments (7)
build.sbt (4)
Line range hint 1-1: LGTM: Scala version update
The update to Scala version 2.12.18 is a good practice for keeping dependencies current. This minor version update is unlikely to introduce breaking changes and is consistent with the version specified in the `inThisBuild` section.
Line range hint 1-190: Overall assessment: build.sbt changes
The modifications to `build.sbt` effectively integrate the new `cloud_aws` project into the build configuration. The Scala version update and root project aggregation changes are straightforward and beneficial. The `cloud_aws` project definition is well-structured, though there are suggestions regarding dependency scopes and inclusion. To further improve the configuration:
- Review the necessity of Spark dependencies in the `cloud_aws` project.
- Consider adjusting the scope of Circe dependencies if they're needed in the main code.
- Run the suggested verification commands to ensure proper integration and dependency management.
- Run the suggested verification commands to ensure proper integration and dependency management.
These changes provide a solid foundation for the DynamoDB key-value store implementation. Great job on maintaining build consistency while introducing new functionality!
54-54: LGTM: Root project aggregation updated
The addition of `cloud_aws` to the root project aggregation is consistent with the PR objectives and ensures the new AWS-related functionality will be included in the build process. To ensure the new project is properly integrated, you may want to run the following command:
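For example, something along these lines (a plausible form of the check; assumes `sbt` is on the PATH):

```sh
sbt "project cloud_aws" "show name"
```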
This command will attempt to switch to the `cloud_aws` project and display its name. If successful, it confirms that the project is properly integrated into the build.
181-190: LGTM with suggestions: cloud_aws project definition
The new `cloud_aws` project definition aligns well with the PR objectives. Good job including DynamoDBLocal for testing purposes. However, there are a couple of points to consider:
- Circe dependencies are currently scoped to "test". If JSON handling is required in the main code, consider removing the test scope.
- Spark dependencies are included, but it's not clear if they're necessary for a DynamoDB implementation. If they're not used, consider removing them to keep the project lean.
To ensure the dependencies are correctly added and to check for any conflicts, you can run:
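For example (assuming sbt 1.4+, where a `dependencyTree` task is available out of the box):

```sh
sbt "project cloud_aws" dependencyTree
```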
This will help verify that all required dependencies are present and there are no version conflicts.
online/src/main/scala/ai/chronon/online/Api.scala (2)
23-24: New imports added for ListRequest and ListResponse
The new imports for `ListRequest` and `ListResponse` are correctly placed and correspond to the new case classes added to the `KVStore` object. This change is consistent with the introduction of the new listing functionality.
46-50: Approved: Enhanced GetRequest with endTsMillis parameter
The addition of the `endTsMillis` parameter to `GetRequest` improves the functionality by allowing range scans. The default value of `None` maintains backward compatibility, so existing usages don't need to be updated. This change is well-documented with an explanatory comment.
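The described change amounts to roughly the following; `endTsMillis: Option[Long] = None` is what the review states, while the other fields are assumptions for illustration:

```scala
case class GetRequest(keyBytes: Array[Byte],
                      dataset: String,
                      // hypothetical pre-existing lower bound
                      afterTsMillis: Option[Long] = None,
                      // new: optional upper bound enabling range scans
                      endTsMillis: Option[Long] = None)
```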
cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (1)
1-32: LGTM: Imports and package declaration look good.
The imports are well-organized and cover all the necessary dependencies for DynamoDB operations and Scala utilities. The package declaration is correct for the project structure.
Force-pushed: 26347b9 → 3c6711b
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (2)
cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (2)
58-60: Consider addressing the TODO comment for tuning default capacity units.
The TODO comment suggests that the default read and write capacity units may need tuning. It would be beneficial to revisit these values based on expected usage patterns and performance requirements.
227-231: Create a GitHub issue to track the implementation of bulkPut.
The bulkPut method is currently unimplemented, with a TODO comment explaining the requirements. To ensure this doesn't get overlooked, it would be beneficial to create a GitHub issue to track this task.
Would you like me to create a GitHub issue for implementing the bulkPut method? The issue could include the following points:
- Implement bulkPut for DynamoDB store
- Transform Parquet data to Amazon's Ion format
- Implement table swapping mechanism (as bulkLoad only writes to new tables)
- Consider performance implications and best practices for bulk loading in DynamoDB
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (1 hunks)
🧰 Additional context used
📓 Learnings (1)
cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (9)
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:29-30, 2024-10-08T16:18:45.669Z): In the codebase, the `KVStore` implementation provides an implicit `ExecutionContext` in scope, so it's unnecessary to import another.
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:29-30, 2024-10-07T14:37:55.680Z): In the codebase, the `KVStore` implementation provides an implicit `ExecutionContext` in scope, so it's unnecessary to import another.
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260, 2024-10-07T15:49:27.648Z): In `DynamoDBKVStoreImpl.scala`, refactoring methods like `extractTimedValues` and `extractListValues` to eliminate code duplication is discouraged if it would make the code more convoluted.
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260, 2024-10-08T16:18:45.669Z): In `DynamoDBKVStoreImpl.scala`, refactoring methods like `extractTimedValues` and `extractListValues` to eliminate code duplication is discouraged if it would make the code more convoluted.
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:167-229, 2024-10-14T14:05:23.413Z): Leveraging DynamoDB batch writes for performance optimization in `multiPut` will be addressed in a future review.
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:167-229, 2024-10-14T14:05:23.413Z): Errors and logging are handled at a granular level in `handleDynamoDbOperation` by passing the metrics context and updating relevant metric values based on the invoking operation.
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:67-111, 2024-10-07T15:04:30.069Z): In the `DynamoDBKVStoreImpl` class, the `props` parameter is kept as `Map[String, Any]` to handle binary properties without serialization, allowing flexibility in storing different types of data.
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175, 2024-10-08T16:18:45.669Z): Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175, 2024-10-07T15:09:51.567Z): Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.
🔇 Additional comments (2)
cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (2)
63-116: LGTM! The create method implementation is robust and well-structured.
The create method properly handles different scenarios, including error cases, and uses a waiter to ensure the table is fully created. The code also demonstrates good practices in logging and metrics tracking.
1-329: Overall, the DynamoDBKVStoreImpl is well-implemented and robust.
The implementation provides a comprehensive interface for interacting with DynamoDB as a key-value store. It demonstrates good practices in error handling, asynchronous operations, and code organization. The use of constants, helper methods, and clear method implementations contributes to the overall quality of the code.
Some minor improvements could be made:
- Implementing batch writes for the multiPut method to improve performance.
- Refining error handling in some methods to use more functional approaches.
- Extracting some logic into separate methods for improved readability.
These suggestions aside, the current implementation is solid and ready for use, with clear TODO comments for future enhancements.
Force-pushed: aefe14a → e9f3b38
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (8)
build.sbt (2)
181-190: LGTM: cloud_aws project definition
The new `cloud_aws` project definition is well-structured and aligns with the PR objectives. It includes appropriate dependencies for working with DynamoDB and handling JSON data. The inclusion of DynamoDBLocal for testing is a good practice. Consider moving the Circe dependencies to compile scope if they're needed in the main code, not just for testing. If they're only used in tests, the current configuration is correct.
Line range hint 13-13: LGTM: Version updates for Scala and Spark
The updates to Scala (2.12.18) and Spark (3.5.1) versions are appropriate and align with the project's goal of using recent stable versions. These minor version updates should provide bug fixes and performance improvements without introducing breaking changes.
Consider updating the comment on line 7 to reflect the current date, as it currently mentions "July 24" which might be outdated.
Also applies to: 24-24
cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (6)
36-61: Constants are well-defined, but default capacity units may need tuning.
The `DynamoDBKVStoreConstants` object provides a clear set of constants for configuring DynamoDB tables and operations. The naming and documentation are good, promoting code readability and maintainability. However, there's a TODO comment regarding tuning the default read and write capacity units:

```scala
// TODO: tune these
val defaultReadCapacityUnits = 10L
val defaultWriteCapacityUnits = 10L
```

Consider benchmarking the application's performance with these default values and adjust them if necessary. Once tuned, remove the TODO comment.
63-116: Solid implementation of table creation with good error handling.
The `create` methods in `DynamoDBKVStoreImpl` are well-implemented, providing flexibility in table creation with or without additional properties. The code handles various scenarios, including existing tables and potential errors during creation. Good practices observed:
- Use of a waiter to ensure table creation is complete.
- Proper error handling and logging.
- Flexible configuration of read/write capacity units and sort keys.
Consider adding a specific catch block for `ResourceInUseException` to differentiate between table already existing and other creation errors. This would allow for more precise metric tracking:

```scala
catch {
  case _: ResourceInUseException =>
    logger.info(s"Table: $dataset already exists")
    metricsContext.increment("create.already_exists")
  case e: Exception =>
    logger.error(s"Error creating Dynamodb table: $dataset", e)
    metricsContext.increment("create.failures")
    throw e
}
```
118-158: Well-implemented `multiGet` method with proper concurrency.
The `multiGet` method effectively handles both simple get requests and query requests based on timestamps. The use of `Future` for asynchronous processing is appropriate and allows for concurrent execution of requests. Positive aspects:
- Clear separation of get and query requests.
- Proper use of `Future` for concurrent processing.
- Utilization of helper methods for building requests and extracting results.
To improve readability, consider extracting the logic for creating get and query requests into separate methods:
```scala
private def createGetItemRequest(req: KVStore.GetRequest): GetItemRequest = {
  val keyAttributeMap = primaryKeyMap(req.keyBytes)
  GetItemRequest.builder.key(keyAttributeMap.asJava).tableName(req.dataset).build
}

private def createQueryRequest(req: KVStore.GetRequest): QueryRequest = buildQueryRequest(req)

// Usage in multiGet
val getItemRequestPairs = getLookups.map(req => (req, createGetItemRequest(req)))
val queryRequestPairs = queryLookups.map(req => (req, createQueryRequest(req)))
```

This refactoring would make the main `multiGet` method more concise and easier to follow.
160-196: Effective implementation of paginated listing with proper error handling.
The `list` method provides a solid implementation for paginated retrieval of items from a DynamoDB table. It correctly handles continuation keys and uses the `ScanRequest` API appropriately. Positive aspects:
- Proper handling of pagination using continuation keys.
- Use of `Future` for asynchronous processing.
- Correct error handling through `handleDynamoDbOperation`.
listLimitparameter:val listLimit = request.props.get(DynamoDBKVStoreConstants.listLimit) match { case Some(value: Int) if value > 0 => value case Some(value: String) => Try(value.toInt).filter(_ > 0).getOrElse(100) case _ => 100 }This change ensures that only positive integer values are used for the limit and provides more robust parsing of string values.
198-225: Functional `multiPut` implementation, but potential for performance improvement.
The `multiPut` method correctly implements multiple put operations for DynamoDB. It handles each put request individually and uses proper error handling through the `handleDynamoDbOperation` method. Positive aspects:
- Correct implementation of individual put operations.
- Proper use of `Future` for concurrent processing.
- Logging of the number of rows being processed.
Consider implementing batch writes using DynamoDB's `BatchWriteItem` operation for improved performance, especially when dealing with a large number of items:

```scala
import software.amazon.awssdk.services.dynamodb.model.{BatchWriteItemRequest, WriteRequest, PutRequest => DynamoPutRequest}

def batchedMultiPut(keyValueDatasets: Seq[KVStore.PutRequest]): Future[Seq[Boolean]] = {
  val batchSize = 25 // DynamoDB allows up to 25 items per batch write
  val batches = keyValueDatasets.grouped(batchSize).toSeq
  Future.traverse(batches) { batch =>
    Future {
      val writeRequests = batch.map { req =>
        val item = buildAttributeMap(req.keyBytes, req.valueBytes) ++
          req.tsMillis.map(ts => Map(sortKeyColumn -> AttributeValue.builder.n(ts.toString).build)).getOrElse(Map.empty)
        WriteRequest.builder().putRequest(DynamoPutRequest.builder().item(item.asJava).build()).build()
      }
      val batchWriteRequest = BatchWriteItemRequest.builder()
        .requestItems(Map(batch.head.dataset -> writeRequests.asJava).asJava)
        .build()
      handleDynamoDbOperation(metricsContext.withSuffix("batchedmultiput"), batch.head.dataset) {
        dynamoDbClient.batchWriteItem(batchWriteRequest)
      }
      // Handle partial success and return a sequence of boolean values
      // indicating success for each item in the batch
    }
  }.map(_.flatten)
}
```

This approach would likely provide better performance for large numbers of put operations.
227-329: Well-structured helper methods, but `bulkPut` needs implementation.
The private helper methods in this class are well-implemented and promote code reuse. They handle various aspects of DynamoDB operations, from building requests to extracting values from responses.
Positive aspects:
- Centralized error handling and metrics in `handleDynamoDbOperation`.
- Clear and concise methods for building attribute maps and requests.
- Proper extraction of values from DynamoDB responses.
The `bulkPut` method is currently not implemented:

```scala
override def bulkPut(sourceOfflineTable: String, destinationOnlineDataSet: String, partition: String): Unit = ???
```

This method seems crucial for bulk data loading operations. Would you like assistance in implementing this method or creating a GitHub issue to track this task?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (6)
- .gitignore (1 hunks)
- build.sbt (2 hunks)
- cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (1 hunks)
- cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala (1 hunks)
- online/src/main/scala/ai/chronon/online/Api.scala (3 hunks)
- online/src/main/scala/ai/chronon/online/Metrics.scala (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- .gitignore
- cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala
- online/src/main/scala/ai/chronon/online/Api.scala
- online/src/main/scala/ai/chronon/online/Metrics.scala
🧰 Additional context used
📓 Learnings (1)
cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (9)
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:167-229, 2024-10-14T14:05:23.413Z): Leveraging DynamoDB batch writes for performance optimization in `multiPut` will be addressed in a future review.
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:29-30, 2024-10-08T16:18:45.669Z): In the codebase, the `KVStore` implementation provides an implicit `ExecutionContext` in scope, so it's unnecessary to import another.
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:29-30, 2024-10-07T14:37:55.680Z): In the codebase, the `KVStore` implementation provides an implicit `ExecutionContext` in scope, so it's unnecessary to import another.
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260, 2024-10-07T15:49:27.648Z): In `DynamoDBKVStoreImpl.scala`, refactoring methods like `extractTimedValues` and `extractListValues` to eliminate code duplication is discouraged if it would make the code more convoluted.
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260, 2024-10-08T16:18:45.669Z): In `DynamoDBKVStoreImpl.scala`, refactoring methods like `extractTimedValues` and `extractListValues` to eliminate code duplication is discouraged if it would make the code more convoluted.
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:167-229, 2024-10-14T14:05:23.413Z): Errors and logging are handled at a granular level in `handleDynamoDbOperation` by passing the metrics context and updating relevant metric values based on the invoking operation.
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:67-111, 2024-10-07T15:04:30.069Z): In the `DynamoDBKVStoreImpl` class, the `props` parameter is kept as `Map[String, Any]` to handle binary properties without serialization, allowing flexibility in storing different types of data.
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175, 2024-10-08T16:18:45.669Z): Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175, 2024-10-07T15:09:51.567Z): Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.
🔇 Additional comments (4)
build.sbt (2)
54-54: LGTM: cloud_aws project added to root aggregate
The addition of `cloud_aws` to the root project's aggregate list is consistent with the PR objectives. This change ensures that the new DynamoDB key-value store implementation will be included in the build process along with other existing projects.
Line range hint 1-254: Overall assessment: LGTM
The changes to the `build.sbt` file are well-aligned with the PR objectives of adding a DynamoDB key-value store implementation. The new `cloud_aws` project is properly integrated into the build, and the dependency management is appropriate. The version updates for Scala and Spark are reasonable. The minor suggestions provided earlier can be addressed to further improve the configuration, but they don't impede the functionality or correctness of the build.
cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (2)
1-34: LGTM: Imports and package declaration are appropriate.
The imports cover all necessary classes from the project's online package, AWS SDK for DynamoDB, and Scala standard libraries. There are no unnecessary imports, which is good for maintaining a clean and efficient codebase.
1-329: Overall, a robust and well-implemented DynamoDB-based key-value store.
This implementation of `DynamoDBKVStoreImpl` provides a comprehensive set of operations for interacting with DynamoDB as a key-value store. The code is well-structured, with good error handling, proper use of the AWS SDK, and appropriate concurrency management using `Future`. Key strengths:
- Clear separation of concerns with well-defined methods for different operations.
- Effective use of helper methods to promote code reuse and readability.
- Proper error handling and metrics tracking through the `handleDynamoDbOperation` method.
- Flexible configuration options for table creation and querying.
Areas for potential improvement:
- Performance optimization for `multiPut` using batch writes.
- Implementation of the `bulkPut` method for bulk data loading.
- Minor refinements in error handling and parameter parsing.
Overall, this is a solid foundation for a DynamoDB-based key-value store, with room for some performance enhancements and feature completions.
chewy-zlai left a comment
Overall, LGTM
chewy-zlai left a comment
(meant to approve)
Force-pushed: e9f3b38 → a99acd8
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (2)
cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (2)
36-61: Consider load testing to tune default capacity units.
The `DynamoDBKVStoreConstants` object is well-defined with necessary configuration options. However, there's a TODO comment about tuning the default read and write capacity units:

```scala
// TODO: tune these
val defaultReadCapacityUnits = 10L
val defaultWriteCapacityUnits = 10L
```

Consider conducting load tests to determine appropriate values for these defaults based on your expected workload.
63-116: Create methods look good, consider minor refactoring for clarity.
The implementation of the `create` methods is solid, with proper error handling and metric tracking. The method for extracting capacity units from props is well-implemented. For improved readability, consider extracting the table creation logic into a separate private method. This would make the `create` method more concise and easier to understand at a glance.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (1 hunks)
🧰 Additional context used
📓 Learnings (1)
cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (9)
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:167-229, 2024-10-14T14:05:23.413Z): Leveraging DynamoDB batch writes for performance optimization in `multiPut` will be addressed in a future review.
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:29-30, 2024-10-08T16:18:45.669Z): In the codebase, the `KVStore` implementation provides an implicit `ExecutionContext` in scope, so it's unnecessary to import another.
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:29-30, 2024-10-07T14:37:55.680Z): In the codebase, the `KVStore` implementation provides an implicit `ExecutionContext` in scope, so it's unnecessary to import another.
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260, 2024-10-07T15:49:27.648Z): In `DynamoDBKVStoreImpl.scala`, refactoring methods like `extractTimedValues` and `extractListValues` to eliminate code duplication is discouraged if it would make the code more convoluted.
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:245-260, 2024-10-08T16:18:45.669Z): In `DynamoDBKVStoreImpl.scala`, refactoring methods like `extractTimedValues` and `extractListValues` to eliminate code duplication is discouraged if it would make the code more convoluted.
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:167-229, 2024-10-14T14:05:23.413Z): Errors and logging are handled at a granular level in `handleDynamoDbOperation` by passing the metrics context and updating relevant metric values based on the invoking operation.
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:67-111, 2024-10-07T15:04:30.069Z): In the `DynamoDBKVStoreImpl` class, the `props` parameter is kept as `Map[String, Any]` to handle binary properties without serialization, allowing flexibility in storing different types of data.
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175, 2024-10-08T16:18:45.669Z): Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.
- Learnt from piyush-zlai (PR zipline-ai/chronon#33, File cloud_aws/src/test/scala/ai/chronon/integrations/aws/DynamoDBKVStoreTest.scala:175-175, 2024-10-07T15:09:51.567Z): Hardcoding future timestamps in tests within `DynamoDBKVStoreTest.scala` is acceptable when data is generated and queried within the same time range, ensuring the tests remain valid over time.
🔇 Additional comments (3)
cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala (3)
1-34: Imports and package declaration look good. The imports cover all necessary classes and utilities for DynamoDB operations and Scala concurrent programming. There are no unused imports, and the package declaration is appropriate.
1-333: Overall, a solid implementation with room for optimization. The `DynamoDBKVStoreImpl` class provides a comprehensive implementation of a DynamoDB-based key-value store. The code is well-structured, follows good practices, and covers all necessary operations.

Main strengths:
- Thorough implementation of create, multiGet, list, and multiPut operations.
- Good use of Scala's Future for asynchronous processing.
- Proper error handling and metric tracking in most areas.
Areas for potential improvement:
- Performance optimization in the `multiPut` method by implementing batch writes.
- More consistent error handling across helper methods, particularly in value extraction (a minimal sketch follows after this comment).
- Minor refactoring for improved readability and code reuse.
These suggested improvements are mostly optimizations and refinements. The current implementation is solid and should work well for most use cases.
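On the value-extraction error-handling point above, here is a minimal sketch of one option, assuming the store's decoders receive raw DynamoDB items (the helper name `safeExtract` and the `valueColumn` parameter are illustrative, not from the PR):

```scala
import scala.util.{Failure, Success, Try}
import software.amazon.awssdk.services.dynamodb.model.AttributeValue

// Hypothetical decoder: pull the value bytes out of a DynamoDB item,
// skipping entries that are missing or malformed instead of failing the batch.
def safeExtract(item: java.util.Map[String, AttributeValue],
                valueColumn: String): Option[Array[Byte]] =
  Try(item.get(valueColumn).b().asByteArray()) match {
    case Success(bytes) => Some(bytes)
    case Failure(_) =>
      // A real implementation would log via the store's logger and bump an error metric here.
      None
  }
```

Applying this uniformly across the extraction helpers would give every read path the same skip-and-log behavior without the structural refactor the earlier learnings discourage.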
201-228: 🛠️ Refactor suggestion
Consider implementing batch writes for improved performance.
The current implementation of `multiPut` uses individual put operations for each item. While this approach works, it may not be the most efficient for large numbers of items. DynamoDB supports batch write operations, which can significantly improve performance.

Consider implementing a batched version of the `multiPut` method using DynamoDB's BatchWriteItem operation. This would involve:
- Grouping the requests into batches of up to 25 items (DynamoDB's limit for batch writes).
- Creating BatchWriteItemRequest objects for each batch.
- Handling partial success scenarios, where some items in a batch might fail to write.
Here's a skeleton of how this could look:
```scala
import software.amazon.awssdk.services.dynamodb.model.{BatchWriteItemRequest, WriteRequest, PutRequest => DynamoPutRequest}

def batchedMultiPut(keyValueDatasets: Seq[KVStore.PutRequest]): Future[Seq[Boolean]] = {
  val batchSize = 25
  val batches = keyValueDatasets.grouped(batchSize).toSeq
  Future.traverse(batches) { batch =>
    Future {
      val writeRequests = batch.map { req =>
        val item = buildAttributeMap(req.keyBytes, req.valueBytes) ++
          req.tsMillis.map(ts => Map(sortKeyColumn -> AttributeValue.builder.n(ts.toString).build)).getOrElse(Map.empty)
        WriteRequest.builder().putRequest(DynamoPutRequest.builder().item(item.asJava).build()).build()
      }
      val batchWriteRequest = BatchWriteItemRequest.builder()
        .requestItems(Map(batch.head.dataset -> writeRequests.asJava).asJava)
        .build()
      handleDynamoDbOperation(metricsContext.withSuffix("batchedmultiput"), batch.head.dataset) {
        dynamoDbClient.batchWriteItem(batchWriteRequest)
      }
      // Handle partial success and return a sequence of boolean values
      // indicating success for each item in the batch
    }
  }.map(_.flatten)
}
```

This approach would likely provide better performance for large numbers of put operations.
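One gap the skeleton leaves open is partial success: `BatchWriteItem` can return unprocessed items when writes are throttled. A minimal sketch of retrying those, assuming direct access to the `dynamoDbClient` from the skeleton (the retry count and sleep are placeholder choices):

```scala
import scala.annotation.tailrec
import software.amazon.awssdk.services.dynamodb.model.{BatchWriteItemRequest, BatchWriteItemResponse}

// Re-submit any unprocessed items a few times before declaring the batch failed.
@tailrec
def drainUnprocessed(response: BatchWriteItemResponse, attemptsLeft: Int = 3): Boolean = {
  val unprocessed = response.unprocessedItems()
  if (unprocessed.isEmpty) true       // everything landed
  else if (attemptsLeft == 0) false   // give up; caller marks the batch as failed
  else {
    Thread.sleep(100)                 // crude backoff; prefer jittered exponential delay
    val retry = BatchWriteItemRequest.builder().requestItems(unprocessed).build()
    drainUnprocessed(dynamoDbClient.batchWriteItem(retry), attemptsLeft - 1)
  }
}
```

Per-item success flags would additionally require mapping the leftover `WriteRequest`s back to their originating put requests.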
⛔ Skipped due to learnings
Learnt from: piyush-zlai PR: zipline-ai/chronon#33 File: cloud_aws/src/main/scala/ai/chronon/integrations/aws/DynamoDBKVStoreImpl.scala:167-229 Timestamp: 2024-10-14T14:05:23.413Z Learning: Leveraging DynamoDB batch writes for performance optimization in `multiPut` will be addressed in a future review.
## Summary

PR overview - [Video Link](https://drive.google.com/file/d/1Rei6upL2OiAls2jX7mCERaJGai3Noo1j/view?usp=drive_link)

This PR builds on #33 and #43. We register the relevant model / join / groupby / staging query artifacts at the 'app' docker container startup by using the MetadataUploader and hitting the Dynamo endpoints using the KV store. We also extend the API to stop returning mocked data for the list and search calls and start returning real registered models + enriched responses (so the model object includes details on the Joins, GroupBys and features). There were a few broken pieces along the way that I fixed while working through the integration (e.g. the metadata walker code was missing handling models, the api.thrift enum for model type needed to start at index 0, etc.).

## Checklist
- [x] Added Unit Tests
- [x] Covered by existing CI
- [x] Integration tested
- [ ] Documentation update

Bringing up the container and curling:

```
$ curl http://localhost:9000/api/v1/search?term=1&limit=20
{"offset":0,"items":[{"name":"risk.transaction_model.v1","join":{"name":"risk.user_transactions.txn_join","joinFeatures":[],"groupBys":[{"name":"risk.transaction_events.txn_group_by_user","features":["transaction_amount_count_1h","transaction_amount_count_1d","transaction_amount_count_30d","transaction_amount_count_365d","transaction_amount_sum_1h"]},{"name":"risk.transaction_events.txn_group_by_merchant","features":["transaction_amount_count_1h","transaction_amount_count_1d","transaction_amount_count_30d","transaction_amount_count_365d","transaction_amount_sum_1h"]},{"name":"risk.user_data.user_group_by","features":["account_age","account_balance","credit_score","number_of_devices","country","account_type","preferred_language"]},{"name":"risk.merchant_data.merchant_group_by","features":["account_age","zipcode","is_big_merchant","country","account_type","preferred_language"]}]},"online":false,"production":false,"team":"risk","modelType":"XGBoost"}]}
```

## Summary by CodeRabbit

- **New Features**
  - Introduced a new `DynamoDBMonitoringStore` for improved model data retrieval.
  - Enhanced `Model` structure to include joins and groupings for better data organization.
  - Expanded `parseTeam` method to handle `Model` instances in metadata processing.
  - Updated `ModelController` and `SearchController` to utilize real data from `DynamoDBMonitoringStore`.
  - Introduced new methods for managing datasets in the `MetadataStore`.
  - Improved handling of list values in the DynamoDB key-value store implementation.
  - Added support for AWS services in the application through new Docker configurations.
  - Enhanced the application's Docker setup for better build and runtime environments.
  - Modified the application's metadata loading process to include a new section for DynamoDB.
- **Bug Fixes**
  - Corrected handling of `Either` types in test cases to prevent runtime errors.
- **Documentation**
  - Updated configuration files to support new DynamoDB module.
- **Tests**
  - Added comprehensive unit tests for `DynamoDBKVStoreImpl` and `DynamoDBMonitoringStore`.
  - Enhanced test coverage for `ModelController` and `SearchController` with mocked dependencies.
  - Introduced new tests for the functionality of `DynamoDBMonitoringStore`.
Summary
Add a DynamoDB KV store implementation that we can leverage for our Stripe monitoring PoC. This PR only includes the KV store implementation + tests. Wiring this up to our Play service to create the relevant tables + start storing the actual model metadata + time series data will be part of a follow up.
Our existing KV store API covers some aspects of what we need on the monitoring front but has a few limitations:

* No list API - we need this to enumerate all the datasets in tables like the Model table. (We can squeeze this into the get call, but it feels a bit contrived as we need to use an arbitrary fixed partition key (e.g. "1") and then issue a range query to get the entries.) Leveraging Dynamo's scan with a separate 'list' KV store API felt cleaner there.
* Create doesn't take params - we do need to provide params to our Dynamo create call where we say whether the table is 'sorted' or not, what the read / write capacity units are, etc. I've extended the create API to include these params; a sketch of the extended surface follows below.

Open to feedback here to improve / tweak this. We can consider the KV store API to be a bit in flux, and once we integrate a couple more cloud provider KV stores (Bigtable, Cosmos DB) we'll get to something more generic and settled.
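The extension described above looks roughly like this (a sketch reconstructed from this description, not the verbatim trait; the field names on the list types are illustrative):

```scala
import scala.concurrent.Future

object KVStore {
  // New request/response types for the scan-backed list call.
  case class ListRequest(dataset: String,
                         props: Map[String, Any] = Map.empty) // e.g. page size, continuation key
  case class ListValue(keyBytes: Array[Byte], valueBytes: Array[Byte])
  case class ListResponse(request: ListRequest,
                          values: Seq[ListValue],
                          resultProps: Map[String, Any]) // e.g. cursor for the next page
}

trait KVStore {
  // create now accepts props, e.g. whether the table is time-sorted
  // and its read / write capacity units.
  def create(dataset: String, props: Map[String, Any]): Unit

  // New list API for enumerating all entries in a dataset.
  def list(request: KVStore.ListRequest): Future[KVStore.ListResponse]
}
```

Keeping `props` as `Map[String, Any]` mirrors the choice noted in the review learnings: it lets each backend accept its own table-level knobs without a serialization scheme.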
Checklist

- [x] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
Summary by CodeRabbit
Release Notes
New Features

- Introduced a new key-value store interface using Amazon DynamoDB, allowing for data creation, retrieval, and pagination.
- Added support for listing datasets with new request and response types in the KVStore API.
- Enhanced data retrieval capabilities with optional time range parameters.
- Expanded metrics tracking with a new environment type for the key-value store.
Bug Fixes

- Improved error handling during table creation and data operations.
Tests

- Implemented a comprehensive test suite for the new DynamoDB key-value store functionalities, including table creation, data insertion, pagination, and time series data handling.
Chores

- Updated build configuration to integrate AWS support while preserving existing structures.
- Added a new entry to the .gitignore file for generated test files.
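Putting the pieces together, a minimal usage sketch of the new store (the constructor, prop keys, and request signatures here are assumptions for illustration; consult the PR's test suite for the exact API):

```scala
import scala.concurrent.Await
import scala.concurrent.duration._
import software.amazon.awssdk.services.dynamodb.DynamoDbClient

object DynamoKVDemo extends App {
  // Assumes AWS credentials and region are configured in the environment.
  val store = new DynamoDBKVStoreImpl(DynamoDbClient.create())

  // Create a table, passing Dynamo-specific props via the extended create API.
  store.create("models", Map("read-capacity" -> 10L, "write-capacity" -> 10L))

  // Write one key/value pair, then read it back.
  val put = KVStore.PutRequest("model-1".getBytes, "metadata-blob".getBytes, "models", None)
  Await.result(store.multiPut(Seq(put)), 10.seconds)

  val get = KVStore.GetRequest("model-1".getBytes, "models")
  println(Await.result(store.multiGet(Seq(get)), 10.seconds))
}
```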