[GRPC] Add SMILE/CBOR/YAML document format support to Bulk GRPC endpoint#19744
karenyrx merged 6 commits into opensearch-project:main
❌ Gradle check result for 25991d7: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Signed-off-by: Karen X <[email protected]>
❌ Gradle check result for 4ce3caf: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
This isn't a one-way door, right? If we ever find that the autodetected type is wrong, we have the option of adding a document_type field that will bypass the autodetection.
msfroh left a comment
Nice, simple improvement!
❌ Gradle check result for 41bfa33: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
❌ Gradle check result for 9ad3d33: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #19744      +/-   ##
============================================
- Coverage     73.15%   73.09%   -0.06%
+ Complexity    70958    70940      -18
============================================
  Files          5736     5736
  Lines        324734   324743       +9
  Branches      46979    46980       +1
============================================
- Hits         237548   237380     -168
- Misses        68031    68252     +221
+ Partials      19155    19111      -44
☔ View full report in Codecov by Sentry.
❌ Gradle check result for 7499a2e: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
❕ Gradle check result for 7499a2e: UNSTABLE Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
…int (opensearch-project#19744) * [GRPC] Add SMILE/CBOR/YAML document format support to Bulk API Signed-off-by: Karen X <[email protected]> * update UTs Signed-off-by: Karen X <[email protected]> * add code cov Signed-off-by: Karen X <[email protected]> --------- Signed-off-by: Karen X <[email protected]>
Description
This PR adds auto-detection capability to the gRPC Bulk API to support ingestion of all OpenSearch XContent document formats (CBOR, SMILE, and YAML), not just JSON.
The main motivation is to improve performance via binary formats (CBOR, SMILE). A secondary reason is to maintain feature parity with the HTTP APIs.
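To make the idea of content-based auto-detection concrete, here is a minimal sketch in Java. It is not the OpenSearch implementation: the class FormatSniffer, its Format enum, and the detect method are hypothetical names invented for illustration, and the heuristics are deliberately simplified. It assumes only well-known leading bytes: the SMILE header ":)\n", the YAML document start marker "---", the CBOR self-describe tag 0xD9 0xD9 0xF7 or a CBOR map initial byte (0xA0 to 0xBF), and a leading "{" for JSON.

```java
/**
 * Hypothetical, simplified sketch of detecting a document format from its
 * leading bytes. This is NOT the OpenSearch detection logic; names and
 * heuristics here are assumptions made for illustration only.
 */
final class FormatSniffer {
    enum Format { JSON, SMILE, YAML, CBOR, UNKNOWN }

    private FormatSniffer() {}

    static Format detect(byte[] doc) {
        if (doc == null || doc.length == 0) {
            return Format.UNKNOWN;
        }
        int b0 = doc[0] & 0xFF;

        if (doc.length >= 3) {
            int b1 = doc[1] & 0xFF;
            int b2 = doc[2] & 0xFF;
            // SMILE documents begin with the header bytes ':' ')' '\n'.
            if (b0 == ':' && b1 == ')' && b2 == '\n') {
                return Format.SMILE;
            }
            // YAML documents may begin with the document start marker "---".
            if (b0 == '-' && b1 == '-' && b2 == '-') {
                return Format.YAML;
            }
            // CBOR may begin with the self-describe tag 0xD9 0xD9 0xF7.
            if (b0 == 0xD9 && b1 == 0xD9 && b2 == 0xF7) {
                return Format.CBOR;
            }
        }
        // A CBOR map (major type 5) has an initial byte in 0xA0..0xBF.
        if (b0 >= 0xA0 && b0 <= 0xBF) {
            return Format.CBOR;
        }
        // JSON object: first non-whitespace byte is '{'.
        for (byte b : doc) {
            if (b == ' ' || b == '\t' || b == '\n' || b == '\r') {
                continue;
            }
            return b == '{' ? Format.JSON : Format.UNKNOWN;
        }
        return Format.UNKNOWN;
    }
}
```

A caller would pass the raw bytes of a single document, for example FormatSniffer.detect(docBytes), and fall back to a default or reject the request when the result is UNKNOWN. A real detector has to handle more cases than this sketch does, such as YAML without a start marker.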
Differences: REST Bulk vs gRPC Bulk API
Some differences compared to the HTTP side are:
- The HTTP Bulk API relies on a newline delimiter (\n) to parse the NDJSON format, where each line represents either an action metadata object or a document. (JSON uses \n and SMILE uses 0xFF as the delimiter between documents, but CBOR and YAML have no such delimiters.) Thus HTTP Bulk using NDJSON cannot support CBOR/YAML. gRPC avoids this because it uses Protobufs with explicit message boundaries (the bulk_request_body[] array), eliminating the need for stream separators.
- On the HTTP side, a content type (e.g. application/json, application/smile) must be provided to determine the format of the request. The gRPC request parser uses MediaTypeRegistry.mediaTypeFromBytes to auto-detect the document format. An alternative considered was to add a "document_type" field to the protobuf request so that the user could set the format explicitly, but this did not seem necessary.
Test Plan
Note: SMILE cannot be tested via a grpcurl command because a SMILE document has no plaintext (non-binary) representation. Unit tests confirm that SMILE format detection and setting work.
Related Issues
Partially resolves #19311
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.