Commit 27e751c

Optimize gRPC Response Parsing Performance (#553)
## Problem

The current implementation uses `json_format.MessageToDict` to convert entire protobuf messages to dictionaries when parsing gRPC responses. This is a significant CPU bottleneck when processing large numbers of vectors, as reported in PR #537, where users experienced ~100 vectors/second throughput.

The `MessageToDict` conversion is expensive because it:

1. Walks the entire protobuf message through descriptor-based reflection
2. Builds a JSON-style Python dictionary from every value it visits
3. Does this for every field, even when we only need specific fields

Additionally, several other performance issues were identified:

- Metadata conversion using `MessageToDict` on `Struct` messages
- Inefficient list construction (append vs pre-allocation)
- Unnecessary dict creation for `SparseValues` parsing
- Response header processing overhead

## Solution

Optimized all gRPC response parsing functions in `pinecone/grpc/utils.py` to directly access protobuf fields instead of converting entire messages to dictionaries. This approach:

1. **Directly accesses protobuf fields**: Uses `response.vectors`, `response.matches`, `response.namespace`, etc. directly
2. **Optimized metadata conversion**: Created `_struct_to_dict()` helper that directly accesses `Struct` fields (~1.5-2x faster than `MessageToDict`)
3. **Pre-allocates lists**: Uses `[None] * len()` for known-size lists (~6.5% improvement)
4. **Direct SparseValues creation**: Creates `SparseValues` objects directly instead of going through dict conversion (~410x faster)
5. **Caches protobuf attributes**: Stores repeated attribute accesses in local variables
6. **Optimized response info extraction**: Improved `extract_response_info()` performance with module-level constants and early returns
7. **Maintains backward compatibility**: Output format remains identical to the previous implementation

## Performance Impact

Performance testing of the response parsing functions shows significant improvements across all optimized functions.

## Changes

### Modified Files

- `pinecone/grpc/utils.py`: Optimized 9 response parsing functions with direct protobuf field access
  - Added `_struct_to_dict()` helper for optimized metadata conversion (~1.5-2x faster)
  - Pre-allocated lists where size is known (~6.5% improvement)
  - Direct `SparseValues` creation (removed dict conversion overhead)
  - Cached protobuf message attributes
  - Removed dead code paths (dict fallback in `parse_usage`)
- `pinecone/grpc/index_grpc.py`: Updated to pass protobuf messages directly to parse functions
- `pinecone/grpc/resources/vector_grpc.py`: Updated to pass protobuf messages directly to parse functions
- `pinecone/utils/response_info.py`: Optimized `extract_response_info()` with module-level constants and early returns
- `tests/perf/test_fetch_response_optimization.py`: New performance tests for fetch response parsing
- `tests/perf/test_query_response_optimization.py`: New performance tests for query response parsing
- `tests/perf/test_other_parse_methods.py`: New performance tests for all other parse methods
- `tests/perf/test_grpc_parsing_perf.py`: Extended with additional benchmarks

### Technical Details

**Core Optimizations**:

1. **`_struct_to_dict()` Helper Function**:
   - Directly accesses protobuf `Struct` and `Value` fields
   - Handles all value types (null, number, string, bool, struct, list)
   - Recursively processes nested structures
   - ~1.5-2x faster than `json_format.MessageToDict` for metadata conversion
2. **List Pre-allocation**:
   - `parse_query_response`: Pre-allocates `matches` list with `[None] * len(matches_proto)`
   - `parse_list_namespaces_response`: Pre-allocates `namespaces` list
   - ~6.5% performance improvement over append-based construction
3. **Direct SparseValues Creation**:
   - Replaced `parse_sparse_values(dict)` with direct `SparseValues(indices=..., values=...)` creation
   - ~410x faster (avoids dict creation and conversion overhead)

## Testing

- All existing unit tests pass (224 tests in `tests/unit_grpc`)
- Comprehensive pytest benchmark tests added for all optimized functions:
  - `test_fetch_response_optimization.py`: Tests for fetch response with varying metadata sizes
  - `test_query_response_optimization.py`: Tests for query response with varying match counts, dimensions, metadata sizes, and sparse vectors
  - `test_other_parse_methods.py`: Tests for all other parse methods (fetch_by_metadata, list_namespaces, stats, upsert, update, namespace_description)
- Mypy type checking passes with and without grpc extras (with types extras)
- No breaking changes: output format remains identical

## Related

This addresses the performance issue reported in PR #537, implementing a similar optimization approach but adapted for the current codebase structure. All parse methods have been optimized with comprehensive performance testing to verify improvements.
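To illustrate the direct-access idea behind `_struct_to_dict()`, here is a minimal sketch. It is not the code from this commit: the names `struct_to_dict_sketch` and `value_to_python` are hypothetical, and the sketch relies only on the standard `google.protobuf.struct_pb2` well-known types (`Struct`, `Value`, `ListValue`).

```python
from google.protobuf import struct_pb2


def value_to_python(value: struct_pb2.Value):
    """Convert one protobuf Value by reading its populated oneof field directly."""
    kind = value.WhichOneof("kind")
    if kind is None or kind == "null_value":
        return None
    if kind == "number_value":
        return value.number_value
    if kind == "string_value":
        return value.string_value
    if kind == "bool_value":
        return value.bool_value
    if kind == "struct_value":
        # Nested structs are handled recursively.
        return struct_to_dict_sketch(value.struct_value)
    if kind == "list_value":
        return [value_to_python(item) for item in value.list_value.values]
    raise ValueError(f"Unexpected Value kind: {kind}")


def struct_to_dict_sketch(struct: struct_pb2.Struct) -> dict:
    """Hypothetical stand-in for a helper like `_struct_to_dict()`."""
    return {key: value_to_python(val) for key, val in struct.fields.items()}


# Build a sample metadata Struct and convert it without json_format.
metadata = struct_pb2.Struct()
metadata.update(
    {
        "genre": "drama",
        "year": 2020,
        "tags": ["a", "b"],
        "nested": {"ok": True},
        "missing": None,
    }
)
converted = struct_to_dict_sketch(metadata)
```

Note that `Struct` stores every number as a double, so `year` comes back as `2020.0`; a real helper may need extra handling if callers expect integers.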
1 parent 53082f1 commit 27e751c

9 files changed: +1389 -173 lines changed


pinecone/grpc/index_grpc.py

Lines changed: 13 additions & 12 deletions
@@ -3,7 +3,6 @@
 import logging
 from typing import List, Any, Iterable, cast, Literal, Iterator, TYPE_CHECKING
 
-from google.protobuf import json_format
 
 from pinecone.utils.tqdm import tqdm
 from pinecone.utils import require_kwargs
@@ -15,6 +14,7 @@
     parse_fetch_response,
     parse_fetch_by_metadata_response,
     parse_query_response,
+    query_response_to_dict,
     parse_stats_response,
     parse_upsert_response,
     parse_update_response,
@@ -41,6 +41,7 @@
 from pinecone.core.grpc.protos.db_data_2025_10_pb2 import (
     Vector as GRPCVector,
     QueryVector as GRPCQueryVector,
+    QueryResponse as ProtoQueryResponse,
     UpsertRequest,
     DeleteRequest,
     QueryRequest,
@@ -501,13 +502,13 @@ def _query(
         include_metadata: bool | None = None,
         sparse_vector: (SparseValues | GRPCSparseValues | SparseVectorTypedDict) | None = None,
         **kwargs,
-    ) -> tuple[dict[str, Any], dict[str, str] | None]:
+    ) -> tuple[ProtoQueryResponse, dict[str, str] | None]:
         """
-        Low-level query method that returns raw JSON dict and initial metadata without parsing.
+        Low-level query method that returns protobuf Message and initial metadata without parsing.
         Used internally by query() and query_namespaces() for performance.
 
         Returns:
-            Tuple of (json_dict, initial_metadata). initial_metadata may be None.
+            Tuple of (protobuf_message, initial_metadata). initial_metadata may be None.
         """
         if vector is not None and id is not None:
             raise ValueError("Cannot specify both `id` and `vector`")
@@ -535,7 +536,7 @@ def _query(
 
         timeout = kwargs.pop("timeout", None)
         response, initial_metadata = self.runner.run(self.stub.Query, request, timeout=timeout)
-        return json_format.MessageToDict(response), initial_metadata
+        return response, initial_metadata
 
     def query(
         self,
@@ -626,8 +627,8 @@ def query(
                 future, result_transformer=parse_query_response, timeout=timeout
             )
         else:
-            # For sync requests, use _query to get raw dict and metadata, then parse it
-            json_response, initial_metadata = self._query(
+            # For sync requests, use _query to get protobuf Message and metadata, then parse it
+            response, initial_metadata = self._query(
                 vector=vector,
                 id=id,
                 namespace=namespace,
@@ -640,7 +641,7 @@ def query(
                 **kwargs,
             )
             return parse_query_response(
-                json_response, _check_type=False, initial_metadata=initial_metadata
+                response, _check_type=False, initial_metadata=initial_metadata
             )
 
     def query_namespaces(
@@ -681,8 +682,9 @@ def query_namespaces(
 
         only_futures = cast(Iterable[Future], futures)
         for response in as_completed(only_futures):
-            json_response, _ = response.result() # Ignore initial_metadata for query_namespaces
-            # Pass raw dict directly to aggregator - no parsing needed
+            proto_response, _ = response.result() # Ignore initial_metadata for query_namespaces
+            # Convert protobuf Message to dict format for aggregator using optimized helper
+            json_response = query_response_to_dict(proto_response)
             aggregator.add_results(json_response)
 
         final_results = aggregator.get_results()
@@ -946,8 +948,7 @@ def describe_index_stats(
 
         request = DescribeIndexStatsRequest(**args_dict)
         response, _ = self.runner.run(self.stub.DescribeIndexStats, request, timeout=timeout)
-        json_response = json_format.MessageToDict(response)
-        return parse_stats_response(json_response)
+        return parse_stats_response(response)
 
     @require_kwargs
     def create_namespace(
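
The body of `query_response_to_dict` is not part of the hunks shown for this file. The following sketch shows roughly how a helper of that shape could combine direct field access, attribute caching, and list pre-allocation as described in the commit message. Everything here is an assumption for illustration: `FakeMatch` and `FakeQueryResponse` are plain dataclasses standing in for the generated protobuf classes, and `query_response_to_dict_sketch` is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeMatch:
    """Stand-in for a protobuf match message; not the real generated class."""

    id: str
    score: float
    values: list[float] = field(default_factory=list)


@dataclass
class FakeQueryResponse:
    """Stand-in for a protobuf QueryResponse; not the real generated class."""

    matches: list[FakeMatch]
    namespace: str = ""


def query_response_to_dict_sketch(response: FakeQueryResponse) -> dict[str, Any]:
    # Cache the repeated field in a local so the attribute is looked up once.
    matches_proto = response.matches
    # Pre-allocate the result list, then fill it by index instead of appending.
    matches: list[Any] = [None] * len(matches_proto)
    for i, match in enumerate(matches_proto):
        matches[i] = {
            "id": match.id,
            "score": match.score,
            "values": list(match.values),
        }
    return {"matches": matches, "namespace": response.namespace}


response = FakeQueryResponse(
    matches=[FakeMatch("a", 0.9, [0.1, 0.2]), FakeMatch("b", 0.5)],
    namespace="ns1",
)
result = query_response_to_dict_sketch(response)
```

Only the fields the caller needs are read, which is the point of replacing a whole-message conversion with field-by-field access.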

pinecone/grpc/resources/vector_grpc.py

Lines changed: 13 additions & 12 deletions
@@ -3,7 +3,6 @@
 import logging
 from typing import Any, Iterable, cast, Literal
 
-from google.protobuf import json_format
 
 from pinecone.utils.tqdm import tqdm
 from concurrent.futures import as_completed, Future
@@ -13,6 +12,7 @@
     parse_fetch_response,
     parse_fetch_by_metadata_response,
     parse_query_response,
+    query_response_to_dict,
     parse_stats_response,
     parse_upsert_response,
     parse_update_response,
@@ -32,6 +32,7 @@
 from pinecone.db_control.models.list_response import ListResponse as SimpleListResponse, Pagination
 from pinecone.core.grpc.protos.db_data_2025_10_pb2 import (
     Vector as GRPCVector,
+    QueryResponse as ProtoQueryResponse,
     UpsertRequest,
     DeleteRequest,
     QueryRequest,
@@ -444,13 +445,13 @@ def _query(
         include_metadata: bool | None = None,
         sparse_vector: (SparseValues | GRPCSparseValues | SparseVectorTypedDict) | None = None,
         **kwargs,
-    ) -> tuple[dict[str, Any], dict[str, str] | None]:
+    ) -> tuple[ProtoQueryResponse, dict[str, str] | None]:
         """
-        Low-level query method that returns raw JSON dict and initial metadata without parsing.
+        Low-level query method that returns protobuf Message and initial metadata without parsing.
         Used internally by query() and query_namespaces() for performance.
 
         Returns:
-            Tuple of (json_dict, initial_metadata). initial_metadata may be None.
+            Tuple of (protobuf_message, initial_metadata). initial_metadata may be None.
         """
         if vector is not None and id is not None:
             raise ValueError("Cannot specify both `id` and `vector`")
@@ -478,7 +479,7 @@ def _query(
 
         timeout = kwargs.pop("timeout", None)
         response, initial_metadata = self._runner.run(self._stub.Query, request, timeout=timeout)
-        return json_format.MessageToDict(response), initial_metadata
+        return response, initial_metadata
 
     def query(
         self,
@@ -569,8 +570,8 @@ def query(
                 future, result_transformer=parse_query_response, timeout=timeout
             )
         else:
-            # For sync requests, use _query to get raw dict and metadata, then parse it
-            json_response, initial_metadata = self._query(
+            # For sync requests, use _query to get protobuf Message and metadata, then parse it
+            response, initial_metadata = self._query(
                 vector=vector,
                 id=id,
                 namespace=namespace,
@@ -583,7 +584,7 @@ def query(
                 **kwargs,
             )
             return parse_query_response(
-                json_response, _check_type=False, initial_metadata=initial_metadata
+                response, _check_type=False, initial_metadata=initial_metadata
             )
 
     def query_namespaces(
@@ -658,8 +659,9 @@ def query_namespaces(
 
         only_futures = cast(Iterable[Future], futures)
         for response in as_completed(only_futures):
-            json_response, _ = response.result() # Ignore initial_metadata for query_namespaces
-            # Pass raw dict directly to aggregator - no parsing needed
+            proto_response, _ = response.result() # Ignore initial_metadata for query_namespaces
+            # Convert protobuf Message to dict format for aggregator using optimized helper
+            json_response = query_response_to_dict(proto_response)
             aggregator.add_results(json_response)
 
         final_results = aggregator.get_results()
@@ -853,5 +855,4 @@ def describe_index_stats(
 
         request = DescribeIndexStatsRequest(**args_dict)
         response, _ = self._runner.run(self._stub.DescribeIndexStats, request, timeout=timeout)
-        json_response = json_format.MessageToDict(response)
-        return parse_stats_response(json_response)
+        return parse_stats_response(response)
