CDRIVER-4363 add client bulk write#1590
Conversation
This is a drive-by fix to fix the referenced payload numbers.
Otherwise, if the type mismatches, the test runner aborts with an error: "expected int64, int32, or double [...]" that does not identify the field.
To do exact BSON matches
To ease reading error message
include some tests of basic usage and libmongoc-specific behavior. Specification and prose tests test driver-agnostic behavior.
To support `w=majority`
137db61 to
b031041
Compare
src/libmongoc/tests/unified/result.c
Outdated
| if (write_errors) { | ||
| if (write_errors && !result->write_errors) { |
There was a problem hiding this comment.
I believe the write_errors in the nested if is redundant.
| bson_oid_init (&oid, NULL); | ||
| BSON_ASSERT (BSON_APPEND_OID (&tmp, "_id", &oid)); | ||
| BSON_ASSERT (bson_concat (&tmp, document)); | ||
| BSON_ASSERT (bson_append_document (&op, "document", 8, &tmp)); |
There was a problem hiding this comment.
Rather than hardcoding a string length, I find it more maintainable to use something like strlen("document") instead. The compiler should optimize away the call to strlen. See this example on godbolt https://godbolt.org/z/bhzoddYKo.
There was a problem hiding this comment.
PHP internals often does sizeof(<str literal>) - 1, but strlen() is certainly more concise if it gets compiled the same.
There also used to be a PHP macro for expanding a string literal into two arguments (the string and length), but it had a cryptic name and casual readers might not realize it expands and assume a parameter is missing. If that sounds useful, readability could probably be addressed with a more verbose name (e.g. STR_AND_LEN()). The main benefit is avoiding duplication (be it the original string or its length).
There was a problem hiding this comment.
The compiler should optimize away the call to strlen.
Neat, thank you for the example. Updated to use the BSON_APPEND_* macros to remove the hard coded lengths. The macros add a strlen call.
| } | ||
|
|
||
| mongoc_bulkwritereturn_t | ||
| mongoc_bulkwrite_execute (mongoc_bulkwrite_t *self, const mongoc_bulkwriteopts_t *opts) |
There was a problem hiding this comment.
This function is nearly 600 lines long. Is it possible to break this function up into smaller functions? The general rule of thumb that I use is to make functions that fit in a single screen size without needing to scroll (approximately 80 lines).
There was a problem hiding this comment.
Updated to split into two new helpers: _bulkwritereturn_apply_result and _bulkwritereturn_apply_reply.
An attempt to more aggressively split did not improve readability IMO. I think separate functions are helpful when it is "easy" to reason about what state is changed. Result handling seemed like an improvement, since only the mongoc_bulkwritereturn_t is updated.
Though this function is still long, block scopes with comments were used to logically group. Code folding can collapse blocks when reading:
| tf->maxMessageSizeBytes = sl.maxMessageSizeBytes; | ||
| tf->maxBsonObjectSize = sl.maxBsonObjectSize; | ||
|
|
||
| const int32_t opsBytes = tf->maxMessageSizeBytes - 1122; |
There was a problem hiding this comment.
Please comment the meaning of 1122 or create a constant with a meaningful name and assign this value to it. Please do the same for other locations where magic numbers are used, e.g., 57, 217, etc.
There was a problem hiding this comment.
The values are copied from the prose test. The prose test describes the calculation.
Added a comment referring to the prose tests.
Replace with existing `mongoc_optional_t`.
Find once and store the offset.
Array filters do not apply to replace. Remove to match spec.
The return was not being set. Callers can determine if the write was acknowledged by checking if `mongoc_bulkwritereturn_t::res` is NULL.
Only select a new stream if another batch is needed.
Reduces repeated assignment
Without the error propagation change, the comparison of int to string aborts (rather than returning an error).
Fix updateResults and deleteResults to return values as int64 to match spec.
Use `lookup_as_int64` for `ok` field. Server's `BulkWriteReplyItem` appends as double. Support other numerics for future proofing.
kevinAlbs
left a comment
There was a problem hiding this comment.
Latest changes verified with this patch build.
| bson_oid_init (&oid, NULL); | ||
| BSON_ASSERT (BSON_APPEND_OID (&tmp, "_id", &oid)); | ||
| BSON_ASSERT (bson_concat (&tmp, document)); | ||
| BSON_ASSERT (bson_append_document (&op, "document", 8, &tmp)); |
There was a problem hiding this comment.
The compiler should optimize away the call to strlen.
Neat, thank you for the example. Updated to use the BSON_APPEND_* macros to remove the hard coded lengths. The macros add a strlen call.
| tf->maxMessageSizeBytes = sl.maxMessageSizeBytes; | ||
| tf->maxBsonObjectSize = sl.maxBsonObjectSize; | ||
|
|
||
| const int32_t opsBytes = tf->maxMessageSizeBytes - 1122; |
There was a problem hiding this comment.
The values are copied from the prose test. The prose test describes the calculation.
Added a comment referring to the prose tests.
jmikola
left a comment
There was a problem hiding this comment.
Reviewed requested changes from my previous review. Note that I did not diligently examine the test changes (just the public API we'd end up using in PHPC).
| mongoc_bulkwriteopts_set_let (mongoc_bulkwriteopts_t *self, const bson_t *let) | ||
| { | ||
| BSON_ASSERT_PARAM (self); | ||
| BSON_ASSERT (let || true); |
There was a problem hiding this comment.
I'd rather not introduce more instances of this funny line. I think we need something like:
#define BSON_OPTIONAL_PARAM (x) (void)(x)There was a problem hiding this comment.
Added BSON_OPTIONAL_PARAM.

Summary
This PR implements the
MongoClient.bulkWriteAPI proposed in: mongodb/specifications#1534RST documentation was omitted from this PR in case the API changes during review. I plan to add RST documentation in a future PR.
Commits are separated to ease filtering the JSON test files and minor drive-by improvements. See the bulk-write.md file in mongodb/specifications#1534 for the expected behavior.
Testing
JSON files are copied from the mongodb/specifications#1534. These were reviewed in the spec PR and I do not expect need another review. test-mongoc-crud.c includes the prose tests from the specification. test-mongoc-bulkwrite.c includes libmongoc-specific tests.
Background
Terms
The following terms are used in this PR description:
MongoCollection.bulkWriterefers to the existing API from the CRUD specification. The C driver implements in terms ofmongoc_bulk_operation_t.MongoClient.bulkWriterefers to the new API in the new Bulk Write specification. This PR implements in terms of the newmongoc_bulkwrite_thandle.MongoCollection.bulkWritesplits write operations by type, and sends separateinsert,update, anddeleteserver commands.MongoCollection.bulkWritewrites to only one collection.MongoClient.bulkWriteuses thebulkWriteserver command. ThebulkWritecommand supports insert, update, and delete operations for multiple namespaces. ThebulkWritecommand is introduced in server 8.0.MongoClient.bulkWritecan write to multiple collections.Both bulk write APIs require splitting large writes into separate commands to stay within server size limits. E.g. a call to
MongoClient.bulkWritewith many writes may result in multiplebulkWritecommands sent.Rationale
The following describes rationale behind some decisions in this PR:
Models
The specification defines model classes for each write. Here is the
DeleteOneModeldefinition (with comments removed for brevity):This PR represents the required fields in arguments, and groups optional fields in an opaque
optsstruct:I thought this was a reasonable compromise between matching the spec and existing C driver patterns. If the spec later extends
DeleteOneModel,mongoc_bulkwrite_deleteoneopts_tcan be extended.The existing C driver implementation of
MongoCollection.bulkWriteusesbson_tfor options:However, I think a typed
mongoc_bulkwrite_deleteoneopts_tis preferable tobson_t. It may ease development with compile time type checks and auto complete.Naming
This PR uses a slightly different API naming pattern. Parts are logically separated by an underscore. Example:
mongoc_bulkwriteopts_set_bypassdocumentvalidation.This differs slightly from the established pattern of separating each word. Example in current API:
mongoc_bulk_operation_set_bypass_document_validation.I find the logical separation easier to read.
Validation
The existing C driver implementation of
MongoCollection.bulkWriteincludes default client-side validation:The default flags for writes are all the same:
UTF-8 validation appears broken. Passing invalid UTF-8 for string values does not result in error. See: CDRIVER-4448. The default validation only enables
BSON_VALIDATE_EMPTY_KEYSto check there are no empty keys (e.g.{"": "foo"}).CDRIVER-3731 is a request to configure validation at a broader scope. Quoting jeroen/mongolite#206:
The C driver benchmarks disable the default validation. I re-enabled the validation and reran the benchmark. This resulted in significant performance drops in the base comparison:
Since UTF-8 validation is already broken, there are requests to disable, validation negatively impacts performance, and it is not a spec requirement, this PR excludes default validation in
MongoClient.bulkWrite.