-
Notifications
You must be signed in to change notification settings - Fork 5.5k
Add support for custom connector-provided serialization codecs (OVERVIEW) #26026
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Reviewer's GuideEnables optional connector-provided binary serialization codecs for connector handle objects in the JSON protocol, controlled via a new feature flag, while maintaining existing typed JSON serialization as fallback. Class diagram for updated AbstractTypedJacksonModule serialization logicclassDiagram
class AbstractTypedJacksonModule {
<<abstract>>
+TYPE_PROPERTY : String
+DATA_PROPERTY : String
+AbstractTypedJacksonModule(baseClass, nameResolver, classResolver, binarySerializationEnabled, codecExtractor)
}
class CodecSerializer {
+CodecSerializer(nameResolver, classResolver, codecExtractor)
+serialize(value, jsonGenerator, provider)
+serializeWithType(value, gen, serializers, typeSer)
}
class CodecDeserializer {
+CodecDeserializer(classResolver, codecExtractor)
+deserialize(parser, context)
+deserializeWithType(p, ctxt, typeDeserializer)
}
AbstractTypedJacksonModule --> CodecSerializer : uses
AbstractTypedJacksonModule --> CodecDeserializer : uses
class InternalTypeSerializer {
+serialize(value, jsonGenerator, provider)
}
class InternalTypeDeserializer {
+deserialize(parser, context)
}
AbstractTypedJacksonModule --> InternalTypeSerializer : uses (legacy)
AbstractTypedJacksonModule --> InternalTypeDeserializer : uses (legacy)
class ConnectorCodecProvider {
+getConnectorTableHandleCodec()
+getColumnHandleCodec()
+getConnectorPartitioningHandleCodec()
+getConnectorIndexHandleCodec()
+getConnectorDeleteTableHandleCodec()
+getConnectorInsertTableHandleCodec()
+getConnectorOutputTableHandleCodec()
+getConnectorSplitCodec()
+getConnectorTransactionHandleCodec()
}
class ConnectorCodec {
+serialize(value)
+deserialize(data)
}
ConnectorCodecProvider --> ConnectorCodec : provides
CodecSerializer --> ConnectorCodecProvider : queries
CodecDeserializer --> ConnectorCodecProvider : queries
CodecSerializer --> ConnectorCodec : uses
CodecDeserializer --> ConnectorCodec : uses
Class diagram for updated Jackson modules for handle typesclassDiagram
class IndexHandleJacksonModule {
+IndexHandleJacksonModule(handleResolver, connectorManagerProvider, featuresConfig)
}
class ColumnHandleJacksonModule {
+ColumnHandleJacksonModule(handleResolver, connectorManagerProvider, featuresConfig)
}
class DeleteTableHandleJacksonModule {
+DeleteTableHandleJacksonModule(handleResolver, connectorManagerProvider, featuresConfig)
}
class InsertTableHandleJacksonModule {
+InsertTableHandleJacksonModule(handleResolver, connectorManagerProvider, featuresConfig)
}
class OutputTableHandleJacksonModule {
+OutputTableHandleJacksonModule(handleResolver, connectorManagerProvider, featuresConfig)
}
class PartitioningHandleJacksonModule {
+PartitioningHandleJacksonModule(handleResolver, connectorManagerProvider, featuresConfig)
}
class SplitJacksonModule {
+SplitJacksonModule(handleResolver, connectorManagerProvider, featuresConfig)
}
class TableHandleJacksonModule {
+TableHandleJacksonModule(handleResolver, connectorManagerProvider, featuresConfig)
}
class TableLayoutHandleJacksonModule {
+TableLayoutHandleJacksonModule(handleResolver, connectorManagerProvider, featuresConfig)
}
class TransactionHandleJacksonModule {
+TransactionHandleJacksonModule(handleResolver, connectorManagerProvider, featuresConfig)
}
IndexHandleJacksonModule --|> AbstractTypedJacksonModule
ColumnHandleJacksonModule --|> AbstractTypedJacksonModule
DeleteTableHandleJacksonModule --|> AbstractTypedJacksonModule
InsertTableHandleJacksonModule --|> AbstractTypedJacksonModule
OutputTableHandleJacksonModule --|> AbstractTypedJacksonModule
PartitioningHandleJacksonModule --|> AbstractTypedJacksonModule
SplitJacksonModule --|> AbstractTypedJacksonModule
TableHandleJacksonModule --|> AbstractTypedJacksonModule
TableLayoutHandleJacksonModule --|> AbstractTypedJacksonModule
TransactionHandleJacksonModule --|> AbstractTypedJacksonModule
Class diagram for ConnectorManager and ConnectorCodecProvider relationshipclassDiagram
class ConnectorManager {
+getConnectorCodecProvider(connectorId)
}
class ConnectorCodecProvider {
+getConnectorTableHandleCodec()
+getColumnHandleCodec()
+getConnectorPartitioningHandleCodec()
+getConnectorIndexHandleCodec()
+getConnectorDeleteTableHandleCodec()
+getConnectorInsertTableHandleCodec()
+getConnectorOutputTableHandleCodec()
+getConnectorSplitCodec()
+getConnectorTransactionHandleCodec()
}
ConnectorManager --> ConnectorCodecProvider : returns Optional
Class diagram for FeaturesConfig with new propertyclassDiagram
class FeaturesConfig {
-useConnectorProvidedSerializationCodecs : boolean
+isUseConnectorProvidedSerializationCodecs()
+setUseConnectorProvidedSerializationCodecs(boolean)
}
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- Add documentation for the new
use-connector-provided-serialization-codecsproperty in admin/properties.rst so it shows up in the reference guide. - The TPC-DS tests are using the non-existent property
binary-connector-serialization-over-json-enabled; update them to use the actualuse-connector-provided-serialization-codecskey. - In BinaryAwareSerializer you use a Guava Cache plus ExecutionException handling; consider replacing it with a simple ConcurrentHashMap<Class<?>, JsonSerializer> to simplify caching and remove exception wrapping.
Prompt for AI Agents
Please address the comments from this code review: ## Overall Comments - Add documentation for the new `use-connector-provided-serialization-codecs` property in admin/properties.rst so it shows up in the reference guide. - The TPC-DS tests are using the non-existent property `binary-connector-serialization-over-json-enabled`; update them to use the actual `use-connector-provided-serialization-codecs` key. - In BinaryAwareSerializer you use a Guava Cache plus ExecutionException handling; consider replacing it with a simple ConcurrentHashMap<Class<?>, JsonSerializer<Object>> to simplify caching and remove exception wrapping. ## Individual Comments ### Comment 1 <location> `presto-main-base/src/main/java/com/facebook/presto/metadata/AbstractTypedJacksonModule.java:225` </location> <code_context> + } + } + // @data field present but no codec available or internal handle + throw new IOException("Type " + connectorIdString + " has binary data (@data field) but no codec available to deserialize it"); + } + </code_context> <issue_to_address> Error message could be improved for troubleshooting. Consider adding connector ID and expected codec type to the error message for easier debugging. </issue_to_address> <suggested_fix> <<<<<<< SEARCH // @data field present but no codec available or internal handle throw new IOException("Type " + connectorIdString + " has binary data (@data field) but no codec available to deserialize it"); ======= // @data field present but no codec available or internal handle String expectedCodecType = "unknown"; if (!connectorIdString.startsWith("$")) { ConnectorId connectorId = new ConnectorId(connectorIdString); Optional<ConnectorCodec<T>> codec = codecExtractor.apply(connectorId); if (codec.isPresent()) { expectedCodecType = codec.get().getClass().getName(); } } throw new IOException("Type " + connectorIdString + " has binary data (@data field) but no codec available to deserialize it. " + "Connector ID: " + connectorIdString + ", Expected codec type: " + expectedCodecType); >>>>>>> REPLACE </suggested_fix> ### Comment 2 <location> `presto-docs/src/main/sphinx/admin/properties.rst:580` </location> <code_context> +* **Type:** ``boolean`` +* **Default value:** ``false`` + +Enables use of custom connector-provided serialization codecs for handles. +This feature allows connectors to use their own serialization format for +handle objects (such as table handles, column handles, and splits) instead </code_context> <issue_to_address> Consider adding 'the' for grammatical correctness: 'Enables the use of custom connector-provided serialization codecs for handles.' This change will make the documentation clearer and more grammatically correct. </issue_to_address> <suggested_fix> <<<<<<< SEARCH Enables use of custom connector-provided serialization codecs for handles. ======= Enables the use of custom connector-provided serialization codecs for handles. >>>>>>> REPLACE </suggested_fix>
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| // @data field present but no codec available or internal handle | ||
| throw new IOException("Type " + connectorIdString + " has binary data (@data field) but no codec available to deserialize it"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Error message could be improved for troubleshooting.
Consider adding connector ID and expected codec type to the error message for easier debugging.
| // @data field present but no codec available or internal handle | |
| throw new IOException("Type " + connectorIdString + " has binary data (@data field) but no codec available to deserialize it"); | |
| // @data field present but no codec available or internal handle | |
| String expectedCodecType = "unknown"; | |
| if (!connectorIdString.startsWith("$")) { | |
| ConnectorId connectorId = new ConnectorId(connectorIdString); | |
| Optional<ConnectorCodec<T>> codec = codecExtractor.apply(connectorId); | |
| if (codec.isPresent()) { | |
| expectedCodecType = codec.get().getClass().getName(); | |
| } | |
| } | |
| throw new IOException("Type " + connectorIdString + " has binary data (@data field) but no codec available to deserialize it. " | |
| + "Connector ID: " + connectorIdString + ", Expected codec type: " + expectedCodecType); |
| * **Type:** ``boolean`` | ||
| * **Default value:** ``false`` | ||
|
|
||
| Enables use of custom connector-provided serialization codecs for handles. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (typo): Consider adding 'the' for grammatical correctness: 'Enables the use of custom connector-provided serialization codecs for handles.'
This change will make the documentation clearer and more grammatically correct.
| Enables use of custom connector-provided serialization codecs for handles. | |
| Enables the use of custom connector-provided serialization codecs for handles. |
1f23e6e to
05b75f1
Compare
|
@sourcery-ai review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
steveburnett
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one nit of formatting.
| network has high latency or if there are many nodes in the cluster. | ||
|
|
||
| ``use-connector-provided-serialization-codecs`` | ||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
7aa52ff to
c8e417e
Compare
| super(baseClass.getSimpleName() + "Module", Version.unknownVersion()); | ||
|
|
||
| TypeIdResolver typeResolver = new InternalTypeResolver<>(nameResolver, classResolver); | ||
| requireNonNull(baseClass, "baseClass is null"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check the null before using it at line 69?
| this.blockEncodingSerde = requireNonNull(blockEncodingSerde, "blockEncodingSerde is null"); | ||
| this.connectorSystemConfig = () -> featuresConfig.isNativeExecutionEnabled(); | ||
| this.connectorCodecManager = requireNonNull(connectorCodecManager, "connectorThriftCodecManager is null"); | ||
| this.tupleDomainJsonCodec = requireNonNull(tupleDomainCodec, "tupleDomainCodec is null"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my own learning, why treat tupledomain as a special case and why having its serde in connector context instead of providing its serde from connector codec provider? Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TupleDomain values are serialized as blocks, which don't have a generic means to be serialized. The runtime must provide that code.
Add support for binary deserialization of connector handles through the ConnectorProtocol interface. This enables connectors to provide custom binary deserialization alongside the existing JSON support.
Add binary deserialization for TPCH connector handles in C++ to match the Java serialization format. Includes TpchPrestoToVeloxConnector for converting protocol objects to Velox representations.
|
|
||
| handle->_type = "tpch"; | ||
|
|
||
| proto = handle; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::move()
…ion codecs (#26257) ## Description Add support for custom connector-provided serialization codecs ## Motivation and Context This will allow plugin connectors to be written in C++. RFC: prestodb/rfcs#49 End to end changes migrating TPCH to the new framework: #26026 ## Impact No immediate impact ## Test Plan Included tests ## Contributor checklist - [ ] Please make sure your submission complies with our [contributing guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md), in particular [code style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style) and [commit standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards). - [ ] PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced. - [ ] Documented new properties (with its default value), SQL syntax, functions, or other functionality. - [ ] If release notes are required, they follow the [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines). - [ ] Adequate tests were added if applicable. - [ ] CI passed. ## Release Notes ``` == NO RELEASE NOTE == ```
Description
Add support for custom connector-provided serialization codecs. This will allow the JSON protocol to also use custom codecs for connector data structures, which will make it possible to add dynamic C++ connectors.
Motivation and Context
Dynamically registered C++ connectors
Impact
When the
use-connector-provided-serialization-codecsproperty is enabled, then the JSON serialization is altered to serialize the connector data structs using the connector's provided codecs.Test Plan
Tests are included, end to end tests added for TPC-DS.
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.