Add support for gRPC transport #938
Conversation
Force-pushed 6e2b8f9 to a304630
KNN query needs further investigation. Flame graphs suggest the gRPC/protobuf operation is not utilizing derived fields and is fetching the KNN document source. This should not be the case, and it accounts for ~10% of CPU time. Will update the above numbers after enabling derived source for gRPC/protobuf. (Benchmark tables attached: REST API KNN query, gRPC/protobuf KNN query.)
Note: The previous latency benchmarks are not accurate. We were measuring serialization time for the gRPC client but not the REST client. The gRPC request start/end timers have been moved into an interceptor so that we measure time closer to the network. Throughput calculations remain unchanged.
Force-pushed fb62472 to cf2a1a9
Profiling gRPC vs REST benchmarks, I'm noticing an additional issue in this implementation. Benchmark results attached here: in these tests, throughput for gRPC index append is worse than REST, even though we previously found the opposite. The difference seems to come down to connection pooling. Looking at connections opened on the relevant ports by OSB, we observe REST opening 8 connections, while gRPC opens only 4, and the gRPC connections additionally vary wildly in throughput. The main issue with the gRPC implementation here, I believe, is sharing a single gRPC channel across workers. To better mirror REST client behavior, each worker should have, at least, its own distinct channel instance.
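A per-worker channel could be sketched roughly like this (illustrative names, not this PR's code; the channel-arg string is gRPC core's documented `grpc.use_local_subchannel_pool`, and the gRPC module is injected purely to keep the sketch dependency-free):

```python
# Each OSB worker gets its own channel rather than sharing one.
WORKER_CHANNEL_OPTIONS = [
    # Opt out of the process-global subchannel pool so each channel opens
    # its own underlying connections, mirroring independent REST clients.
    ("grpc.use_local_subchannel_pool", 1),
]

def make_worker_channel(target, grpc_aio):
    """Create a distinct channel for one worker.

    `grpc_aio` would normally be `from grpc import aio`; it is passed in
    here so the sketch runs without a hard grpcio dependency.
    """
    return grpc_aio.insecure_channel(target, options=WORKER_CHANNEL_OPTIONS)
```

Each worker calling `make_worker_channel` then holds its own connections, so OSB's client concurrency is reflected on the wire the way it is for REST.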
Force-pushed 783e238 to 68f523b
Updated latency benchmarks for bulk, with the noted performance regression fixed. Confirming network throughput is now equal between REST and gRPC; latency is additionally as we expect based on flame graphs. (Flame graphs attached: REST, gRPC.)
Full flame graphs attached: REST index append, gRPC index append.
gkamat
left a comment
Please add appropriate unit tests and integ tests for this feature.
Some additional logging lines may be helpful for debugging.
Please convert to a regular PR when ready to be merged.
Resolved review threads (outdated) on:
- osbenchmark/worker_coordinator/proto_helpers/ProtoKNNQueryHelper.py
- osbenchmark/worker_coordinator/proto_helpers/ProtoQueryHelper.py (four threads)
- Extends opensearchpy client with an additional async gRPC stub supplier.
- Adds config options for specifying the gRPC server host endpoint.
- Adds new operation types for gRPC/protobuf operations.
- Adds runner implementations for the above operations, executing requests over gRPC.
- Introduces proto helpers to implement conversion logic of params -> protobuf.

Signed-off-by: Finn Carroll <[email protected]>
Force-pushed c68e05c to fdddc75
Hi @gkamat, it looks like the integration tests for OSB run against a few select versions and source OpenSearch from the published release .tar. I expect the gRPC APIs in this PR will largely be compatible with OpenSearch versions 3.3 and later. I'm wondering if I need to wait for 3.3 to be published to introduce ITs for these changes?
Sharing benchmarking results for KNN query over gRPC.

(REST) prod-queries: service time and throughput results attached.
(gRPC) grpc-prod-queries: service time and throughput results attached.

Next steps

While we observe little difference in service time between REST and gRPC, there is a dramatic improvement in throughput. This could be a reflection of client-side benefits, as throughput measures operations per second of wall-clock time, or some other network benefit of HTTP/2. While the serialization performance of the server improves when using gRPC, we can see from flame graphs that this optimization is not fully realized on this workload, as we spend the majority of our time within the query phase; about 5% of CPU is spent processing request/response pairs for the client/server connection. Next steps include root-cause investigation into the throughput gains we see with gRPC, as well as varying the dataset to determine which types of workloads benefit most from the improved performance of the client/server transport layer.
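As a toy illustration of why throughput can improve while service time stays flat (numbers are made up and `expected_throughput` is not an OSB function; this is just a closed-loop, Little's-law style estimate):

```python
def expected_throughput(clients, mean_op_seconds):
    """Closed-loop estimate: each client issues its next operation as soon
    as the previous one completes, so ops/sec ~= clients / mean op time."""
    return clients / mean_op_seconds

# Identical 10 ms per-operation service time; if HTTP/2 multiplexing or
# additional connections double the effective in-flight concurrency,
# throughput doubles even though service time is unchanged.
rest_ops = expected_throughput(clients=8, mean_op_seconds=0.010)
grpc_ops = expected_throughput(clients=16, mean_op_seconds=0.010)
```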
Force-pushed 2d66855 to dda8c03
gkamat
left a comment
None of the suggestions need to go in right away, perhaps in a future PR.
> Sub channels manage the underlying connection with the server. When the global sub channel pool is used gRPC will
> re-use sub channels and their underlying connections which does not appropriately reflect a multi client scenario.
Probably could be clearer: using local subchannels permits additional connections to be created each with their own pools, which can improve performance.
```python
for _, term_value in value.items():
    terms[key].append(term_value)
```
`terms[key].extend(value.values())` might be simpler here.
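For what it's worth, the two forms produce identical results; a standalone check with stand-in data (not the PR's actual variables):

```python
from collections import defaultdict

value = {"a": 1, "b": 2, "c": 3}

# Loop form from the PR:
terms_loop = defaultdict(list)
for _, term_value in value.items():
    terms_loop["field"].append(term_value)

# Suggested form:
terms_ext = defaultdict(list)
terms_ext["field"].extend(value.values())

assert terms_loop == terms_ext  # both give {'field': [1, 2, 3]}
```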
```python
raise Exception("Error parsing query - Term query contains multiple terms: " + str(query))

# Term query body gives field/value as lists
term_field = next(iter(term.keys()))
```
nit: `next(iter(term))` should suffice.
```python
@staticmethod
def build_proto_request(params):
    body = params.get("body")
    size = body.get("size") if "size" in body else None
```
`size = body.get("size")` should suffice.
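`dict.get` already returns `None` when the key is absent, so the conditional adds nothing; a standalone illustration:

```python
body = {"query": {"match_all": {}}}

# The two expressions are equivalent when "size" is missing...
size_verbose = body.get("size") if "size" in body else None
size_simple = body.get("size")
assert size_verbose is None and size_simple is None

# ...and when it is present.
body["size"] = 100
assert body.get("size") == 100
```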
```python
self.assertEqual(result.request_body.query.term.field, "log.file.path")
self.assertEqual(result.request_body.query.term.value.string, "/var/log/messages/birdknight")

def test_build_proto_request_term_query_multi_field_fails(self):
```
Perhaps another test for the error case where there are multiple keys?
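Such a test might look roughly like this sketch (standalone: `parse_term_query` is a stand-in for the helper under review, and `ValueError` stands in for whatever exception type the project actually raises):

```python
import unittest

def parse_term_query(term):
    """Stand-in for the helper: reject term queries with more than one field."""
    if len(term) > 1:
        raise ValueError("Term query contains multiple terms: " + str(term))
    field = next(iter(term))
    return field, term[field]

class TestTermQueryErrors(unittest.TestCase):
    def test_multiple_keys_raises(self):
        # A term query body with two fields should be rejected.
        term = {"log.file.path": "/var/log/messages", "status": "200"}
        with self.assertRaises(ValueError):
            parse_term_query(term)
```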








Description
Changes provide support for gRPC transport operations in OSB:
Note that these changes depend on several upstream changes:
Running gRPC benchmarks
To run OSB against the gRPC endpoint, see the transport-grpc module settings for enabling this endpoint: https://github.com/opensearch-project/OpenSearch/tree/main/modules/transport-grpc
Clone the OSB workloads feature branch and provide it with --workload-path: opensearch-project/opensearch-benchmark-workloads#689
With example output:
Issues Resolved
N/A
Testing
- [ ] New functionality includes testing

Tested manually.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.