Adding a new returnAndBulkUpsert API in the doc store. #18
This will help clients build more interesting use cases and also cut down a round trip to the server in cases where they need to immediately get back the upserted documents.
Codecov Report
@@ Coverage Diff @@
## main #18 +/- ##
============================================
- Coverage 69.58% 69.44% -0.14%
- Complexity 154 160 +6
============================================
Files 11 11
Lines 674 707 +33
Branches 72 73 +1
============================================
+ Hits 469 491 +22
- Misses 164 174 +10
- Partials 41 42 +1
I am usually reluctant to accept such additions, as they force synchronous processing, which isn't good under high load. Could you please describe some of the use cases you have in mind?
…On Tue, 15 Dec 2020, 03:21 Buchi Reddy Busi Reddy, ***@***.***> wrote:
This will help the clients to build more interesting use cases and also cut down a round trip to the server in the cases where they need to immediately get back the upserted documents.
Commit Summary
- Adding a new API to bulkUpsertAndGet the documents in the doc store.
File Changes
- *M* document-store/src/integrationTest/java/org/hypertrace/core/documentstore/mongo/MongoDocStoreTest.java <https://github.com/hypertrace/document-store/pull/18/files#diff-fab258440918057cd0bc0d16483437cc7dbfb88df8e6256b4efb55a763bb9712> (22)
- *M* document-store/src/integrationTest/java/org/hypertrace/core/documentstore/postgres/PostgresDocStoreTest.java <https://github.com/hypertrace/document-store/pull/18/files#diff-a7415a191ab5ed0893f1d0116760c767adc1290ec45b6647837d027e0fa4d31b> (44)
- *M* document-store/src/main/java/org/hypertrace/core/documentstore/Collection.java <https://github.com/hypertrace/document-store/pull/18/files#diff-9ca60955d4574c5272bb8c34261cffdf9508edb8e2e8bb52f209524dac6ea9d6> (6)
- *M* document-store/src/main/java/org/hypertrace/core/documentstore/mongo/MongoCollection.java <https://github.com/hypertrace/document-store/pull/18/files#diff-a4f7c7eb0b57790b80ac31a65d3504b760caf04c168b42c33da9f9364f2f73e7> (62)
- *M* document-store/src/main/java/org/hypertrace/core/documentstore/postgres/PostgresCollection.java <https://github.com/hypertrace/document-store/pull/18/files#diff-3df0c046f864e7bc937f5d243417a382655bced2f3c5f70e80ec469012ea61bb> (84)
Patch Links:
- https://github.com/hypertrace/document-store/pull/18.patch
- https://github.com/hypertrace/document-store/pull/18.diff
@jcchavezs updated the description with more details.
    .collect(Collectors.joining(", "));

    String space = " ";
    String query = new StringBuilder("SELECT * FROM")
This seems like a shortcut to SQL injection, no?
Hmm, thanks for pointing that out. Let me check and think about it more.
I think the SQLi is being fixed as part of https://github.com/hypertrace/document-store/pull/16/files
I'll merge this, then update the other PR and fix the SQLi as part of that. If the other PR is merged before mine, I can update and fix it.
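For reference, the usual way to avoid the injection risk raised in this thread is to bind the keys as parameters instead of concatenating them into the SQL text. The following is a minimal sketch, not the code from either PR: the class name `KeyLookupQuery`, the `id` column, and the identifier whitelist are assumptions made for illustration. Table names cannot be bound as parameters, so the sketch validates them against a conservative pattern.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper; names and the "id" column are assumptions, not the project's API.
final class KeyLookupQuery {
  // Identifiers cannot be bound as parameters, so only plain names are accepted.
  private static final Pattern SAFE_IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

  // Builds the SQL text with one "?" placeholder per key; no key value ends up in the string.
  static String build(String tableName, int keyCount) {
    if (!SAFE_IDENTIFIER.matcher(tableName).matches()) {
      throw new IllegalArgumentException("Unsafe table name: " + tableName);
    }
    if (keyCount < 1) {
      throw new IllegalArgumentException("At least one key is required");
    }
    String placeholders = String.join(", ", Collections.nCopies(keyCount, "?"));
    return "SELECT * FROM " + tableName + " WHERE id IN (" + placeholders + ")";
  }

  // Prepares the statement and binds each key; JDBC parameter indexes start at 1.
  static PreparedStatement prepare(Connection connection, String tableName, List<String> keys)
      throws SQLException {
    PreparedStatement statement = connection.prepareStatement(build(tableName, keys.size()));
    for (int i = 0; i < keys.size(); i++) {
      statement.setString(i + 1, keys.get(i));
    }
    return statement;
  }
}
```

Because the keys travel as bound parameters, a key containing quotes or SQL keywords is treated as data rather than as part of the statement.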
document-store/src/main/java/org/hypertrace/core/documentstore/mongo/MongoCollection.java
    // Now go ahead and do the bulk upsert.
    BulkWriteResult result = bulkUpsertImpl(documents);
    LOGGER.debug(result.toString());
I am not sure about the performance of this. Not a Javaer here, but it seems that whether or not debug logging is enabled, we still turn the result into a string? cc @kotharironak
👍 We should always let the logger do the stringification for us so that we don't pay this cost unless the message is actually needed. That means either wrapping the call in an if or, more gracefully IMO, doing LOGGER.debug("{}", result);
Fixing in a new PR. Thanks!
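The cost difference discussed in this thread can be seen in a small, dependency-free sketch. It uses `java.util.logging` rather than the SLF4J-style logger in the PR, so `fine` stands in for `debug`. The class `LazyLoggingDemo` and the counting `ExpensiveResult` type are hypothetical and exist only to make the `toString()` calls observable.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical demo class; it only counts how often toString() runs.
final class LazyLoggingDemo {
  static final AtomicInteger TO_STRING_CALLS = new AtomicInteger();

  // Stand-in for a result object whose string form is costly to build.
  static final class ExpensiveResult {
    @Override
    public String toString() {
      TO_STRING_CALLS.incrementAndGet();
      return "ExpensiveResult{...}";
    }
  }

  // Eager style: the argument is stringified before the logger checks its level.
  static int eagerCalls() {
    Logger logger = Logger.getLogger("demo.eager");
    logger.setLevel(Level.INFO); // FINE (debug-level) messages are disabled
    TO_STRING_CALLS.set(0);
    logger.fine(new ExpensiveResult().toString());
    return TO_STRING_CALLS.get();
  }

  // Lazy style: the logger decides whether the string is ever built.
  static int lazyCalls() {
    Logger logger = Logger.getLogger("demo.lazy");
    logger.setLevel(Level.INFO);
    TO_STRING_CALLS.set(0);
    ExpensiveResult result = new ExpensiveResult();
    logger.fine(() -> result.toString());  // supplier runs only if FINE is loggable
    logger.log(Level.FINE, "{0}", result); // parameter is formatted only if the record is published
    return TO_STRING_CALLS.get();
  }
}
```

With the level set to INFO, the eager variant still pays for one `toString()` call while the lazy variants pay for none, which is the same effect the suggested `LOGGER.debug("{}", result)` aims for.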
        }
      };
    } catch (JsonProcessingException e) {
      LOGGER.error("Error during bulk upsert for documents:{}", documents, e);
This is more of a general comment. I am usually in favour of either logging the error and handling it or bubbling up the exception, but not both, because doing both usually floods the logs. Also, do we want to print the full set of documents in the logs? What about privacy concerns and efficient use of log storage? I don't think dumping the failing documents into the logs is actionable either.
Raised #25
      };
    } catch (JsonProcessingException e) {
      LOGGER.error("Error during bulk upsert for documents:{}", documents, e);
      throw new IOException("Error during bulk upsert.");
Not passing the previous exception makes us lose all the context of this error. Is there any reason for not doing it?
I was thinking the API should mask the implementation-specific exception details, but this is actually a library, so I'll fix it.
Raised #25
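A minimal sketch of the pattern suggested in the two threads above: rethrow with the original exception attached as the cause, and describe the batch by its size instead of dumping the documents. It is not the change made in #25. `ChainedFailureDemo` and `SerializationException` are hypothetical stand-ins, the latter for a checked serialization failure such as `JsonProcessingException`.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical demo class; not the project's implementation.
final class ChainedFailureDemo {
  // Stand-in for an implementation-specific checked exception.
  static final class SerializationException extends Exception {
    SerializationException(String message) {
      super(message);
    }
  }

  // Pretend serializer: fails on a null document.
  static void serialize(String document) throws SerializationException {
    if (document == null) {
      throw new SerializationException("document is null");
    }
  }

  static void bulkUpsert(List<String> documents) throws IOException {
    try {
      for (String document : documents) {
        serialize(document);
      }
    } catch (SerializationException e) {
      // Bubble up only: no duplicate log line, no document contents in the message,
      // and the original exception is kept as the cause so its stack trace survives.
      throw new IOException(
          "Error during bulk upsert of " + documents.size() + " documents.", e);
    }
  }
}
```

Callers can then inspect `getCause()` or let their own logging print the full chain once, at the place where the error is finally handled.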
      return new BasicDBObject(ID_KEY, key.toString());
    }

    private BasicDBObject selectionCriteriaForKeys(Set<Key> keys) {
Can't this be private? A simple inspection tells me it can, but I am not 100% sure. Though ID_KEY is static.
I think you meant to ask about static, right? Fixing it.
Yeah, I meant static, sorry.
Raised #25
    .collect(Collectors.joining(", "));

    String space = " ";
    String query = new StringBuilder("SELECT * FROM")
I am usually reluctant to use select *. Do we really need to fetch everything? Can't we make the selected fields explicit so that schema changes don't silently affect this kind of method?
@jcchavezs We do need the full document here, since the API requires us to return all the fields of the document's current copy.
At this layer, we don't know which fields are actually present in the document.
@buchi-busireddy I am sorry I am late to the party, but I left you some comments.
Hi @buchi-busireddy, I wonder if you had time to put up some of the fixes we talked about in this PR post-merge.
@jcchavezs Thanks for the reminder, and sorry that I missed raising the PR. I do have the changes locally and will raise it shortly.
This will help clients build more interesting use cases and also cut down a round trip to the server.
One use case where this has even become essential resembles change data capture (CDC): a client wants to upsert a batch of docs but needs to know what changed in those docs compared to their previous versions so that it can optimize some processing. CDC at the doc store level might be helpful, but it is much more heavyweight and doesn't currently have enough use cases to justify bringing it in.
Also, it's known that the new API is a sync API and could take a bit longer than other APIs, so it should be used wisely based on the use case.
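The change-detection use case described above can be sketched against an in-memory stand-in. Everything here is hypothetical: `InMemoryDocStore`, its `returnAndBulkUpsert` signature, and the use of plain strings for keys and documents are assumptions for illustration and do not reflect the actual `Collection` API. The sketch assumes the call returns the copies that existed before the upsert.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for a document collection.
final class InMemoryDocStore {
  private final Map<String, String> docs = new LinkedHashMap<>();

  // Assumed semantics: return the pre-upsert copies of the given keys, then apply the upsert.
  // Keys that did not exist before are simply absent from the returned map.
  Map<String, String> returnAndBulkUpsert(Map<String, String> updates) {
    Map<String, String> previous = new LinkedHashMap<>();
    for (String key : updates.keySet()) {
      if (docs.containsKey(key)) {
        previous.put(key, docs.get(key));
      }
    }
    docs.putAll(updates);
    return previous;
  }

  // Client-side diff: keys whose document is new or differs from its previous copy.
  static Set<String> changedKeys(Map<String, String> previous, Map<String, String> updates) {
    Set<String> changed = new TreeSet<>();
    for (Map.Entry<String, String> entry : updates.entrySet()) {
      if (!Objects.equals(previous.get(entry.getKey()), entry.getValue())) {
        changed.add(entry.getKey());
      }
    }
    return changed;
  }
}
```

A client would call `returnAndBulkUpsert` once and pass the returned map to `changedKeys`, so unchanged documents can skip any follow-up processing without a second read from the store.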