
Conversation

@buchi-busireddy (Contributor) commented Dec 15, 2020

This will help clients build more interesting use cases and also cut down a round trip to the server.

A use case where this has even become essential is a change-data-capture style one: a client wants to upsert a bunch of docs, but it needs to know what changed in those docs compared to their previous versions so that it can optimize some processing. CDC (change data capture) at the doc-store level might be helpful, but that is much more heavyweight and doesn't currently have enough use cases to justify bringing it in.

Also, it's known that the new API is a synchronous API and could take a bit longer than the other APIs, so it should be used judiciously based on the use case.

This will help clients build more interesting use cases and also cut down a round trip to the server in cases where they need to immediately get back the upserted documents.
codecov bot commented Dec 15, 2020

Codecov Report

Merging #18 (37ba364) into main (426e1c6) will decrease coverage by 0.13%.
The diff coverage is 80.70%.


@@             Coverage Diff              @@
##               main      #18      +/-   ##
============================================
- Coverage     69.58%   69.44%   -0.14%     
- Complexity      154      160       +6     
============================================
  Files            11       11              
  Lines           674      707      +33     
  Branches         72       73       +1     
============================================
+ Hits            469      491      +22     
- Misses          164      174      +10     
- Partials         41       42       +1     
Flag        | Coverage Δ               | Complexity Δ
integration | 57.14% <78.94%> (+0.46%) | 0.00 <12.00> (ø)
unit        | 30.97% <1.75%> (-1.52%)  | 0.00 <12.00> (ø)

Flags with carried forward coverage won't be shown.

Impacted Files                                        | Coverage Δ                  | Complexity Δ
...ore/documentstore/postgres/PostgresCollection.java | 72.45% <76.47%> (-1.02%) ⬇️ | 53.00 <6.00> (+3.00)
...race/core/documentstore/mongo/MongoCollection.java | 69.74% <86.95%> (+0.51%)    | 41.00 <6.00> (+3.00)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@buchi-busireddy buchi-busireddy marked this pull request as ready for review December 15, 2020 02:23
@jcchavezs (Contributor) commented Dec 15, 2020 via email

@buchi-busireddy buchi-busireddy changed the title Adding a new API to bulkUpsertAndGet the documents in the doc store. Adding a new returnAndBulkUpsert API in the doc store. Dec 15, 2020
@buchi-busireddy (Contributor, Author)

> I am usually reluctant to such additions, as it forces sync processing, which isn't good for high load. Could you please describe some of the use cases you have in mind?

@jcchavezs updated the description with more details.

.collect(Collectors.joining(", "));

String space = " ";
String query = new StringBuilder("SELECT * FROM")
Contributor:

This seems like a shortcut to SQL injection, no?

Contributor Author:

Hmm, thanks for pointing that out. Let me check and think it over.

Contributor Author:

I think the SQLi is being fixed as part of https://github.com/hypertrace/document-store/pull/16/files
I'll merge this, then update the other PR and fix the SQLi as part of that. If the other PR is merged before mine, I can update and fix it here.
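For context, the usual fix is to build the SQL with JDBC `?` placeholders and bind the values via PreparedStatement, so user-supplied keys are never concatenated into the query string. A minimal sketch (the helper name and table are hypothetical, not this PR's actual code; note the table name itself cannot be a placeholder and must come from trusted code):

```java
import java.util.Collections;

public class SafeQuery {
  // Hypothetical helper: builds "SELECT * FROM <table> WHERE id IN (?, ?, ...)"
  // with one JDBC placeholder per key. The key values are later bound with
  // PreparedStatement.setString(i, key), never concatenated into the SQL,
  // so they cannot alter the query structure.
  static String selectByIdsSql(String table, int keyCount) {
    String placeholders = String.join(", ", Collections.nCopies(keyCount, "?"));
    return "SELECT * FROM " + table + " WHERE id IN (" + placeholders + ")";
  }

  public static void main(String[] args) {
    System.out.println(selectByIdsSql("mycollection", 3));
    // prints: SELECT * FROM mycollection WHERE id IN (?, ?, ?)
  }
}
```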

tim-mwangi previously approved these changes Dec 17, 2020
@buchi-busireddy buchi-busireddy merged commit cd6c5a9 into main Dec 17, 2020
@buchi-busireddy buchi-busireddy deleted the feature/bulk_upsert_and_return branch December 17, 2020 17:04

// Now go ahead and do the bulk upsert.
BulkWriteResult result = bulkUpsertImpl(documents);
LOGGER.debug(result.toString());
@jcchavezs (Contributor), Dec 17, 2020:

I am not sure about the performance of this. Not a Javaer here, but it seems that whether logger debug is enabled or not, we still turn it into a string? cc @kotharironak

Contributor:

👍 We should always let the logger do the stringification for us so we don't eat this cost unless the message is needed. That means wrapping it in an if, or, IMO more gracefully, doing LOGGER.debug("{}", result);
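The point above can be shown with a self-contained sketch (using java.util.logging from the JDK rather than the SLF4J logger this PR uses, so it runs standalone): the eager form pays for the stringification even when the level is disabled, while the deferred form skips it.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LazyLogging {
  private static final Logger LOGGER = Logger.getLogger(LazyLogging.class.getName());
  static int toStringCalls = 0;

  static String expensiveToString() {
    toStringCalls++; // count how often the "stringification" actually runs
    return "big result dump";
  }

  public static void main(String[] args) {
    LOGGER.setLevel(Level.INFO); // debug-level (FINE) logging is disabled

    // Eager: the argument is evaluated at the call site, then discarded.
    LOGGER.fine(expensiveToString());

    // Deferred: the Supplier is invoked only if FINE is actually loggable.
    LOGGER.fine(() -> expensiveToString());

    System.out.println(toStringCalls); // prints: 1 (only the eager call ran)
  }
}
```

SLF4J's `LOGGER.debug("{}", result)` achieves the same deferral: `result.toString()` is postponed until the logger has confirmed the level is enabled.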

Contributor Author:

Fixing in a new PR. Thanks!

Contributor Author:

#25

}
};
} catch (JsonProcessingException e) {
LOGGER.error("Error during bulk upsert for documents:{}", documents, e);
Contributor:

This is more of a general comment. I am usually in favour of either logging the error and handling it, or bubbling up the exception, but not both, because doing both usually floods the logs. Also, do we want to print the full set of documents in the logs? What about privacy concerns, and efficient use of log storage? I don't think dumping the failing documents into the logs is actionable either.

Contributor Author:

Raised #25

};
} catch (JsonProcessingException e) {
LOGGER.error("Error during bulk upsert for documents:{}", documents, e);
throw new IOException("Error during bulk upsert.");
Contributor:

Not passing the previous exception makes us lose all the context on this error. Is there any reason for not doing it?

Contributor Author:

I was thinking the API should mask the implementation-specific exception details, but this is actually a library, so I'll fix it.
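A minimal sketch of the fix being described (the exception types here are stand-ins, not the PR's actual code): passing the caught exception as the constructor's cause keeps the original stack trace and message reachable for callers.

```java
import java.io.IOException;

public class CauseChaining {
  public static void main(String[] args) {
    try {
      try {
        // Stand-in for the JsonProcessingException thrown by the driver layer.
        throw new RuntimeException("driver-level failure");
      } catch (RuntimeException e) {
        // Passing `e` as the cause preserves the underlying context;
        // `new IOException("Error during bulk upsert.")` alone would drop it.
        throw new IOException("Error during bulk upsert.", e);
      }
    } catch (IOException wrapped) {
      // The original error is still reachable through getCause().
      System.out.println(wrapped.getCause().getMessage()); // prints: driver-level failure
    }
  }
}
```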

Contributor Author:

Raised #25

return new BasicDBObject(ID_KEY, key.toString());
}

private BasicDBObject selectionCriteriaForKeys(Set<Key> keys) {
Contributor:

Can't this be private? A simple inspection tells me it can, but I am not 100% sure. Though ID_KEY is static.

Contributor Author:

I think you meant to ask about static, right? Fixing it.

Contributor:

yeah I meant static, sorry.

Contributor Author:

Raised #25

.collect(Collectors.joining(", "));

String space = " ";
String query = new StringBuilder("SELECT * FROM")
Contributor:

I am usually reluctant about SELECT *. Do we really need to fetch everything? Can't we make the fields explicit, so that schema changes don't silently affect this kind of method?

Contributor Author:

@jcchavezs The need here is indeed for the full document, since the API requires returning all the fields of the document's current copy.
At this layer, we don't actually know which fields are present in the document.

@jcchavezs (Contributor):

@buchi-busireddy I am sorry I am late to the party but I left you some comments.

@jcchavezs (Contributor):

Hi @buchi-busireddy, I wonder if you had time to put up some of the fixes we talked about in this PR post merge.

@buchi-busireddy (Contributor, Author):

@jcchavezs Thanks for the reminder, and sorry that I missed raising the PR. I have the changes locally and will raise it shortly.


6 participants