-
-
Notifications
You must be signed in to change notification settings - Fork 145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhance GetDocuments API by adding bulk retrieval #931
Conversation
WalkthroughThe new method Changes
Assessment against linked issues
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (8)
- server/backend/database/database.go (1 hunks)
- server/backend/database/memory/database.go (1 hunks)
- server/backend/database/memory/database_test.go (1 hunks)
- server/backend/database/mongo/client.go (1 hunks)
- server/backend/database/mongo/client_test.go (1 hunks)
- server/backend/database/testcases/testcases.go (1 hunks)
- server/documents/documents.go (2 hunks)
- test/sharding/mongo_client_test.go (1 hunks)
Additional comments not posted (8)
server/backend/database/memory/database_test.go (1)
51-53
: Approval of new test case addition.The addition of the
RunFindDocInfosByKeys test
is aligned with the PR's objectives to enhance theGetDocuments
API performance. This test ensures the new bulk retrieval method works as expected in the memory database implementation.server/backend/database/mongo/client_test.go (1)
66-68
: Approval of new test case addition.The addition of the
RunFindDocInfosByKeys test
is aligned with the PR's objectives to enhance theGetDocuments
API performance. This test ensures the new bulk retrieval method works as expected in the MongoDB database implementation.test/sharding/mongo_client_test.go (1)
75-77
: Approval of new test case addition.The addition of the
RunFindDocInfosByKeys test
is aligned with the PR's objectives to enhance theGetDocuments
API performance. This test ensures the new bulk retrieval method works as expected in the MongoDB client with sharded database configuration.server/documents/documents.go (2)
100-100
: Approval of updatedGetDocumentSummary
function.The simplification of the document retrieval process in
GetDocumentSummary
by usingFindDocInfoByKey
is a positive change, enhancing the efficiency and maintainability of the code.
123-148
: Approval of updatedGetDocumentSummaries
function.The update to
GetDocumentSummaries
to useFindDocInfosByKeys
for bulk document retrieval is a significant improvement. This change effectively addresses the N+1 problem and enhances the performance of the API.server/backend/database/database.go (1)
167-172
: New methodFindDocInfosByKeys
added toDatabase
interfaceThe addition of
FindDocInfosByKeys
to theDatabase
interface is a key enhancement for supporting bulk document retrieval. The method signature correctly takes a context, a project ID, and a slice of document keys, which is consistent with the interface's pattern for similar methods.
- Correctness: The method signature is correct and aligns with Go's conventions for interfaces.
- Performance: This method supports bulk operations, which should improve performance as noted in the PR objectives.
- Maintainability: The method is clearly defined and fits well with the existing structure of the interface.
server/backend/database/mongo/client.go (1)
766-793
: Review of the new methodFindDocInfosByKeys
.This method aims to fetch multiple documents based on their keys, which aligns with the PR's objective to enhance performance by reducing the number of database queries. Here are a few observations and suggestions:
- Error Handling: The method correctly handles potential errors from the MongoDB operations, which is crucial for robustness.
- Efficiency: Using the
$in
operator with the keys array is efficient for fetching multiple documents in a single query.- Filter Construction: The method constructs a filter to exclude documents marked as removed, which is a good practice for data integrity.
However, consider the following improvements:
- Logging: Adding logging before and after the MongoDB operations could help in debugging and monitoring the performance of this method.
- Testing: Ensure that there are comprehensive tests covering various scenarios, including cases with large numbers of keys, no keys, and keys that do not match any documents.
Overall, the implementation looks solid and should contribute positively to the system's performance.
server/backend/database/testcases/testcases.go (1)
96-129
: Review ofRunFindDocInfosByKeysTest
Function
Context Setup: The function correctly sets up the test context and activates a client. This is a standard setup for database-related tests.
Document Creation Simulation: The function simulates the creation of documents by attempting to find document information for a set of keys. However, it does not actually create any documents but only checks if they can be retrieved, assuming they exist. This might be misleading as the name suggests creation but it only checks existence.
Bulk Retrieval and Validation: The bulk retrieval using
FindDocInfosByKeys
is correctly implemented. The function checks if the keys of the retrieved documents match the expected keys usingassert.ElementsMatch
, which is appropriate for unordered comparisons.Length Check: The function also checks if the number of retrieved documents matches the number of requested keys using
assert.Len
. This is a good practice to ensure that no documents are missing or unexpectedly added.Error Handling: The function properly checks for errors after each database operation, which is crucial for identifying issues early in the test.
Test Isolation: Each test run is isolated using
t.Run
, which is good for separating test cases and identifying which specific test fails if there are multiple failures.Suggestions:
- Consider actually creating the documents in the database before trying to retrieve them. This would make the test more comprehensive and realistic.
- Add more detailed assertions to check the contents of the retrieved documents, not just their keys.
Although the logic you mentioned is related to CRDT, it seems okay to understand it roughly and focus on query optimization. |
@sejongk |
Sure. If you have any suggestions about this, please let me know. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (2)
- server/backend/database/memory/database.go (1 hunks)
- server/backend/database/testcases/testcases.go (1 hunks)
Files skipped from review as they are similar to previous changes (2)
- server/backend/database/memory/database.go
- server/backend/database/testcases/testcases.go
DocumentSummaries, which is in the response from Document List Page in CodePair, which uses this API, only uses the https://www.figma.com/design/OYc1Cr0nvFuBnWZxhscfDk/Code-Pair?node-id=42-101&t=lCXENp1HuDnFAkwq-0 Therefore, how about adding an option( |
Thanks for your suggestion. I believe this suggested method is somewhat related to #597. |
I have reviewed all the comments provided. Currently, I have completed the implementation of bulk query methods for However, I am facing some issues and need help with the following:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files ignored due to path filters (1)
api/yorkie/v1/admin.pb.go
is excluded by!**/*.pb.go
Files selected for processing (6)
- api/docs/yorkie/v1/admin.openapi.yaml (56 hunks)
- api/docs/yorkie/v1/resources.openapi.yaml (64 hunks)
- api/docs/yorkie/v1/yorkie.openapi.yaml (49 hunks)
- api/yorkie/v1/admin.proto (1 hunks)
- server/documents/documents.go (2 hunks)
- server/rpc/admin_server.go (1 hunks)
Files skipped from review due to trivial changes (3)
- api/docs/yorkie/v1/admin.openapi.yaml
- api/docs/yorkie/v1/resources.openapi.yaml
- api/docs/yorkie/v1/yorkie.openapi.yaml
Files skipped from review as they are similar to previous changes (1)
- server/documents/documents.go
Additional comments not posted (2)
api/yorkie/v1/admin.proto (1)
120-120
: LGTM! Theinclude_snapshot
field is correctly added.The new field
include_snapshot
is correctly defined and integrated within theGetDocumentsRequest
message.server/rpc/admin_server.go (1)
228-228
: LGTM! But verify the function usage in the codebase.The code changes are approved.
However, ensure that all function calls to
GetDocuments
match the new signature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution. 👍
Based on the discussions with @hackerwins and @sejongk regarding the comment ideas above, we have decided to implement the option to include or exclude snapshots in the API request. As a result, the GetDocuments API request specification has changed, which can be reviewed in this commit. Consequently, by adding the |
There was an issue with the updatedAt of a document showing another document updatedAt. Specifically, the updatedAt in the document list was being reversed and reflecting a different document's value. This issue occured during the bulk retrieval of document lists using yorkie-team/yorkie#931. In this process, there was no guarantee that the order of the keys passed to the DB query matches the order of the documents in the query result.
There was an issue with the updatedAt of a document showing another document updatedAt. Specifically, the updatedAt in the document list was being reversed and reflecting a different document's value. This issue occured during the bulk retrieval of document lists using yorkie-team/yorkie#931. In this process, there was no guarantee that the order of the keys passed to the DB query matches the order of the documents in the query result.
What this PR does / why we need it:
This PR implements a bulk retrieval operation for the GetDocuments API to enhance performance.
The specific tasks accomplished include:
FindDocInfosByKeys()
method to the Database interfaceFindDocInfosByKeys()
logic inmongo.Client
andmemory.DB
respectivelyWhile the query to retrieve DocInfos has been reduced from N times to once when calling the GetDocuments API, there still remains an issue where
packs.BuildDocumentForServerSeq()
is called N times.However, this logic seems to be related to CRDT or logical clock functionalities, which I do not fully understand yet, so I could not work on it. Therefore, I did not remove the TODO comment regarding the N+1 issue.
Which issue(s) this PR fixes:
Fixes #921
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
Additional documentation:
Checklist:
Summary by CodeRabbit
New Features
Bug Fixes
Documentation