Conversation

@tdcmeehan
Contributor

Description

Add support for custom connector-provided serialization codecs. This will allow the JSON protocol to also use custom codecs for connector data structures, which will make it possible to add dynamic C++ connectors.

Motivation and Context

Dynamically registered C++ connectors

Impact

When the use-connector-provided-serialization-codecs property is enabled, JSON serialization uses the connector's provided codecs to serialize connector data structures.

Test Plan

Tests are included; end-to-end tests were added for TPC-DS.

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

* Add support for custom connector-provided serialization codecs for handle objects.
  See :doc:`/admin/properties` for the ``use-connector-provided-serialization-codecs`` configuration property.

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Sep 11, 2025
@prestodb-ci prestodb-ci requested review from a team, bibith4 and wanglinsong and removed request for a team September 11, 2025 21:27
@sourcery-ai
Contributor

sourcery-ai bot commented Sep 11, 2025

Reviewer's Guide

Enables optional connector-provided binary serialization codecs for connector handle objects in the JSON protocol, controlled via a new feature flag, while maintaining existing typed JSON serialization as fallback.
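
The flag-plus-fallback behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `HandleCodec`, `CodecDispatchSketch`, and `serializeHandle` are hypothetical names, the `@type` property name and the `value` field in the fallback branch are assumptions, and the legacy typed JSON path is reduced to a plain string.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Optional;

// Hypothetical stand-in for a connector-provided codec; not the real SPI interface.
interface HandleCodec<T> {
    byte[] serialize(T value);
}

final class CodecDispatchSketch {
    private CodecDispatchSketch() {}

    // Takes the codec path only when the flag is on AND the connector supplies a codec;
    // otherwise falls back to a placeholder for the legacy typed JSON output.
    static <T> String serializeHandle(T handle, String connectorId, boolean codecsEnabled, Optional<HandleCodec<T>> codec) {
        if (codecsEnabled && codec.isPresent()) {
            // Codec bytes are Base64-encoded so they can travel inside a JSON string.
            String data = Base64.getEncoder().encodeToString(codec.get().serialize(handle));
            return "{\"@type\":\"" + connectorId + "\",\"@data\":\"" + data + "\"}";
        }
        // Assumed placeholder shape; the real legacy path writes the handle's own fields.
        return "{\"@type\":\"" + connectorId + "\",\"value\":\"" + handle + "\"}";
    }

    public static void main(String[] args) {
        HandleCodec<String> codec = value -> value.getBytes(StandardCharsets.UTF_8);
        System.out.println(serializeHandle("orders", "tpch", true, Optional.of(codec)));
        System.out.println(serializeHandle("orders", "tpch", true, Optional.empty()));
        System.out.println(serializeHandle("orders", "tpch", false, Optional.of(codec)));
    }
}
```

Only the first call in `main` takes the codec path; the other two show the fallback when either the codec or the flag is missing.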

Class diagram for updated AbstractTypedJacksonModule serialization logic

classDiagram
    class AbstractTypedJacksonModule {
        <<abstract>>
        +TYPE_PROPERTY : String
        +DATA_PROPERTY : String
        +AbstractTypedJacksonModule(baseClass, nameResolver, classResolver, binarySerializationEnabled, codecExtractor)
    }
    class CodecSerializer {
        +CodecSerializer(nameResolver, classResolver, codecExtractor)
        +serialize(value, jsonGenerator, provider)
        +serializeWithType(value, gen, serializers, typeSer)
    }
    class CodecDeserializer {
        +CodecDeserializer(classResolver, codecExtractor)
        +deserialize(parser, context)
        +deserializeWithType(p, ctxt, typeDeserializer)
    }
    AbstractTypedJacksonModule --> CodecSerializer : uses
    AbstractTypedJacksonModule --> CodecDeserializer : uses
    class InternalTypeSerializer {
        +serialize(value, jsonGenerator, provider)
    }
    class InternalTypeDeserializer {
        +deserialize(parser, context)
    }
    AbstractTypedJacksonModule --> InternalTypeSerializer : uses (legacy)
    AbstractTypedJacksonModule --> InternalTypeDeserializer : uses (legacy)
    class ConnectorCodecProvider {
        +getConnectorTableHandleCodec()
        +getColumnHandleCodec()
        +getConnectorPartitioningHandleCodec()
        +getConnectorIndexHandleCodec()
        +getConnectorDeleteTableHandleCodec()
        +getConnectorInsertTableHandleCodec()
        +getConnectorOutputTableHandleCodec()
        +getConnectorSplitCodec()
        +getConnectorTransactionHandleCodec()
    }
    class ConnectorCodec {
        +serialize(value)
        +deserialize(data)
    }
    ConnectorCodecProvider --> ConnectorCodec : provides
    CodecSerializer --> ConnectorCodecProvider : queries
    CodecDeserializer --> ConnectorCodecProvider : queries
    CodecSerializer --> ConnectorCodec : uses
    CodecDeserializer --> ConnectorCodec : uses
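
As a rough reading of the diagram above, a connector could opt in to custom codecs per handle type. The sketch below uses simplified stand-ins for `ConnectorCodec` and `ConnectorCodecProvider`: the `Optional` return types, the `String` handle type, and the `ExampleCodecProvider` and `CodecProviderSketch` classes are assumptions for illustration, and the real SPI signatures may differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Simplified stand-in for the SPI codec type; the real signatures may differ.
interface ConnectorCodec<T> {
    byte[] serialize(T value);

    T deserialize(byte[] data);
}

// Assumed shape: every getter defaults to "no codec", so connectors opt in per handle type.
interface ConnectorCodecProvider {
    default Optional<ConnectorCodec<String>> getConnectorTableHandleCodec() {
        return Optional.empty();
    }

    default Optional<ConnectorCodec<String>> getConnectorSplitCodec() {
        return Optional.empty();
    }
}

// Hypothetical connector that customizes only table handles (modelled as strings here).
final class ExampleCodecProvider implements ConnectorCodecProvider {
    @Override
    public Optional<ConnectorCodec<String>> getConnectorTableHandleCodec() {
        return Optional.of(new ConnectorCodec<String>() {
            @Override
            public byte[] serialize(String value) {
                return value.getBytes(StandardCharsets.UTF_8);
            }

            @Override
            public String deserialize(byte[] data) {
                return new String(data, StandardCharsets.UTF_8);
            }
        });
    }
}

final class CodecProviderSketch {
    public static void main(String[] args) {
        ConnectorCodecProvider provider = new ExampleCodecProvider();
        ConnectorCodec<String> codec = provider.getConnectorTableHandleCodec().orElseThrow(IllegalStateException::new);
        // Round-trips the handle through the custom codec.
        System.out.println(codec.deserialize(codec.serialize("tpch:orders")));
        // Splits were not customized, so the default "no codec" answer applies.
        System.out.println(provider.getConnectorSplitCodec().isPresent());
    }
}
```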

Class diagram for updated Jackson modules for handle types

classDiagram
    class IndexHandleJacksonModule {
        +IndexHandleJacksonModule(handleResolver, connectorManagerProvider, featuresConfig)
    }
    class ColumnHandleJacksonModule {
        +ColumnHandleJacksonModule(handleResolver, connectorManagerProvider, featuresConfig)
    }
    class DeleteTableHandleJacksonModule {
        +DeleteTableHandleJacksonModule(handleResolver, connectorManagerProvider, featuresConfig)
    }
    class InsertTableHandleJacksonModule {
        +InsertTableHandleJacksonModule(handleResolver, connectorManagerProvider, featuresConfig)
    }
    class OutputTableHandleJacksonModule {
        +OutputTableHandleJacksonModule(handleResolver, connectorManagerProvider, featuresConfig)
    }
    class PartitioningHandleJacksonModule {
        +PartitioningHandleJacksonModule(handleResolver, connectorManagerProvider, featuresConfig)
    }
    class SplitJacksonModule {
        +SplitJacksonModule(handleResolver, connectorManagerProvider, featuresConfig)
    }
    class TableHandleJacksonModule {
        +TableHandleJacksonModule(handleResolver, connectorManagerProvider, featuresConfig)
    }
    class TableLayoutHandleJacksonModule {
        +TableLayoutHandleJacksonModule(handleResolver, connectorManagerProvider, featuresConfig)
    }
    class TransactionHandleJacksonModule {
        +TransactionHandleJacksonModule(handleResolver, connectorManagerProvider, featuresConfig)
    }
    IndexHandleJacksonModule --|> AbstractTypedJacksonModule
    ColumnHandleJacksonModule --|> AbstractTypedJacksonModule
    DeleteTableHandleJacksonModule --|> AbstractTypedJacksonModule
    InsertTableHandleJacksonModule --|> AbstractTypedJacksonModule
    OutputTableHandleJacksonModule --|> AbstractTypedJacksonModule
    PartitioningHandleJacksonModule --|> AbstractTypedJacksonModule
    SplitJacksonModule --|> AbstractTypedJacksonModule
    TableHandleJacksonModule --|> AbstractTypedJacksonModule
    TableLayoutHandleJacksonModule --|> AbstractTypedJacksonModule
    TransactionHandleJacksonModule --|> AbstractTypedJacksonModule

Class diagram for ConnectorManager and ConnectorCodecProvider relationship

classDiagram
    class ConnectorManager {
        +getConnectorCodecProvider(connectorId)
    }
    class ConnectorCodecProvider {
        +getConnectorTableHandleCodec()
        +getColumnHandleCodec()
        +getConnectorPartitioningHandleCodec()
        +getConnectorIndexHandleCodec()
        +getConnectorDeleteTableHandleCodec()
        +getConnectorInsertTableHandleCodec()
        +getConnectorOutputTableHandleCodec()
        +getConnectorSplitCodec()
        +getConnectorTransactionHandleCodec()
    }
    ConnectorManager --> ConnectorCodecProvider : returns Optional

Class diagram for FeaturesConfig with new property

classDiagram
    class FeaturesConfig {
        -useConnectorProvidedSerializationCodecs : boolean
        +isUseConnectorProvidedSerializationCodecs()
        +setUseConnectorProvidedSerializationCodecs(boolean)
    }

File-Level Changes

Change Details Files
Extend AbstractTypedJacksonModule to support binary serialization and custom codecs
  • Add binarySerializationEnabled flag and codecExtractor to constructor
  • Register CodecSerializer and CodecDeserializer when flag enabled
  • Implement new CodecSerializer for Base64 encoding of connector data
  • Implement CodecDeserializer to decode Base64 and delegate to connector codecs
  • Retain legacy InternalTypeSerializer/Deserializer fallback
presto-main-base/src/main/java/com/facebook/presto/metadata/AbstractTypedJacksonModule.java
Update handle-specific Jackson modules to wire in connector codecs
  • Inject Provider<ConnectorManager> and FeaturesConfig into constructors
  • Pass binarySerializationEnabled flag and connectorId→codec lookup lambda to super()
  • Disable custom codec path for FunctionHandleJacksonModule
presto-main-base/src/main/java/com/facebook/presto/index/IndexHandleJacksonModule.java
presto-main-base/src/main/java/com/facebook/presto/metadata/ColumnHandleJacksonModule.java
presto-main-base/src/main/java/com/facebook/presto/metadata/DeleteTableHandleJacksonModule.java
presto-main-base/src/main/java/com/facebook/presto/metadata/InsertTableHandleJacksonModule.java
presto-main-base/src/main/java/com/facebook/presto/metadata/OutputTableHandleJacksonModule.java
presto-main-base/src/main/java/com/facebook/presto/metadata/PartitioningHandleJacksonModule.java
presto-main-base/src/main/java/com/facebook/presto/metadata/SplitJacksonModule.java
presto-main-base/src/main/java/com/facebook/presto/metadata/TableHandleJacksonModule.java
presto-main-base/src/main/java/com/facebook/presto/metadata/TableLayoutHandleJacksonModule.java
presto-main-base/src/main/java/com/facebook/presto/metadata/TransactionHandleJacksonModule.java
presto-main-base/src/main/java/com/facebook/presto/metadata/FunctionHandleJacksonModule.java
Introduce SPI support for connector-provided codecs and feature flag
  • Add default get*Codec methods for all handle types in ConnectorCodecProvider
  • Expose getConnectorCodecProvider in ConnectorManager
  • Add use-connector-provided-serialization-codecs property with @Config and accessor in FeaturesConfig
presto-spi/src/main/java/com/facebook/presto/spi/connector/ConnectorCodecProvider.java
presto-main-base/src/main/java/com/facebook/presto/connector/ConnectorManager.java
presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/FeaturesConfig.java
Revise testing infrastructure to support new serialization mode
  • Replace HandleJsonModule with TestingHandleJsonModule and bind FeaturesConfig
  • Add comprehensive TestAbstractTypedJacksonModule covering binary and legacy flows
  • Enable use-connector-provided-serialization-codecs in TPC-DS and other tests
presto-main-base/src/test/java/com/facebook/presto/metadata/TestingHandleJsonModule.java
presto-main-base/src/test/java/com/facebook/presto/metadata/TestAbstractTypedJacksonModule.java
multiple modified test files under presto-main-base/src/test and presto-tpcds/src/test
Add documentation for use-connector-provided-serialization-codecs property
  • Document new configuration property in admin properties
presto-docs/src/main/sphinx/admin/properties.rst
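
The deserialization side listed above (decode Base64, then delegate to the connector's codec) might look roughly like this. It is a sketch under stated assumptions: `CodecDeserializeSketch` and `decodeData` are hypothetical names, codecs are modelled as plain `Function<byte[], String>` values rather than `ConnectorCodec` instances, and the `$` prefix check for internal handles follows the snippet quoted in the review comments below.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

final class CodecDeserializeSketch {
    private CodecDeserializeSketch() {}

    // codecLookup plays the role of the connectorId -> codec lambda passed to super().
    static String decodeData(String type, String base64Data, Function<String, Optional<Function<byte[], String>>> codecLookup)
            throws IOException {
        // Ids starting with "$" are treated as internal handles, which have no connector codec.
        if (!type.startsWith("$")) {
            Optional<Function<byte[], String>> codec = codecLookup.apply(type);
            if (codec.isPresent()) {
                return codec.get().apply(Base64.getDecoder().decode(base64Data));
            }
        }
        throw new IOException("Type " + type + " has binary data (@data field) but no codec available to deserialize it");
    }

    public static void main(String[] args) throws IOException {
        Map<String, Function<byte[], String>> codecs = new HashMap<>();
        codecs.put("tpch", bytes -> new String(bytes, StandardCharsets.UTF_8));
        Function<String, Optional<Function<byte[], String>>> lookup = id -> Optional.ofNullable(codecs.get(id));

        // "b3JkZXJz" is the Base64 encoding of the UTF-8 bytes of "orders".
        System.out.println(decodeData("tpch", "b3JkZXJz", lookup));
    }
}
```

A payload whose connector has no registered codec, or whose type id marks an internal handle, ends in the `IOException` rather than a silent fallback, because the `@data` bytes cannot be interpreted without the codec.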

Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey there - I've reviewed your changes - here's some feedback:

  • Add documentation for the new use-connector-provided-serialization-codecs property in admin/properties.rst so it shows up in the reference guide.
  • The TPC-DS tests are using the non-existent property binary-connector-serialization-over-json-enabled; update them to use the actual use-connector-provided-serialization-codecs key.
  • In BinaryAwareSerializer you use a Guava Cache plus ExecutionException handling; consider replacing it with a simple ConcurrentHashMap<Class<?>, JsonSerializer<Object>> to simplify caching and remove exception wrapping.
    Prompt for AI Agents
    Please address the comments from this code review:
    ## Overall Comments
    - Add documentation for the new `use-connector-provided-serialization-codecs` property in admin/properties.rst so it shows up in the reference guide.
    - The TPC-DS tests are using the non-existent property `binary-connector-serialization-over-json-enabled`; update them to use the actual `use-connector-provided-serialization-codecs` key.
    - In BinaryAwareSerializer you use a Guava Cache plus ExecutionException handling; consider replacing it with a simple ConcurrentHashMap<Class<?>, JsonSerializer<Object>> to simplify caching and remove exception wrapping.
    
    ## Individual Comments
    
    ### Comment 1
    <location> `presto-main-base/src/main/java/com/facebook/presto/metadata/AbstractTypedJacksonModule.java:225` </location>
    <code_context>
    +                        }
    +                    }
    +                    // @data field present but no codec available or internal handle
    +                    throw new IOException("Type " + connectorIdString + " has binary data (@data field) but no codec available to deserialize it");
    +                }
    +
    </code_context>
    
    <issue_to_address>
    Error message could be improved for troubleshooting.
    
    Consider adding connector ID and expected codec type to the error message for easier debugging.
    </issue_to_address>
    
    <suggested_fix>
    <<<<<<< SEARCH
                        // @data field present but no codec available or internal handle
                        throw new IOException("Type " + connectorIdString + " has binary data (@data field) but no codec available to deserialize it");
    =======
                        // @data field present but no codec available or internal handle
                        String expectedCodecType = "unknown";
                        if (!connectorIdString.startsWith("$")) {
                            ConnectorId connectorId = new ConnectorId(connectorIdString);
                            Optional<ConnectorCodec<T>> codec = codecExtractor.apply(connectorId);
                            if (codec.isPresent()) {
                                expectedCodecType = codec.get().getClass().getName();
                            }
                        }
                        throw new IOException("Type " + connectorIdString + " has binary data (@data field) but no codec available to deserialize it. "
                            + "Connector ID: " + connectorIdString + ", Expected codec type: " + expectedCodecType);
    >>>>>>> REPLACE
    
    </suggested_fix>
    
    ### Comment 2
    <location> `presto-docs/src/main/sphinx/admin/properties.rst:580` </location>
    <code_context>
    +* **Type:** ``boolean``
    +* **Default value:** ``false``
    +
    +Enables use of custom connector-provided serialization codecs for handles. 
    +This feature allows connectors to use their own serialization format for
    +handle objects (such as table handles, column handles, and splits) instead
    </code_context>
    
    <issue_to_address>
    Consider adding 'the' for grammatical correctness: 'Enables the use of custom connector-provided serialization codecs for handles.'
    
    This change will make the documentation clearer and more grammatically correct.
    </issue_to_address>
    
    <suggested_fix>
    <<<<<<< SEARCH
    Enables use of custom connector-provided serialization codecs for handles. 
    =======
    Enables the use of custom connector-provided serialization codecs for handles. 
    >>>>>>> REPLACE
    
    </suggested_fix>

    Sourcery is free for open source - if you like our reviews please consider sharing them ✨
    Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
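
The ConcurrentHashMap suggestion in the overall comments could be sketched as below. The internals of `BinaryAwareSerializer` are not shown in this conversation, so `SerializerCacheSketch` is a hypothetical class and the type parameter `S` stands in for `JsonSerializer<Object>`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical per-class cache; S stands in for JsonSerializer<Object>.
final class SerializerCacheSketch<S> {
    private final ConcurrentMap<Class<?>, S> cache = new ConcurrentHashMap<>();
    private final Function<Class<?>, S> factory;

    SerializerCacheSketch(Function<Class<?>, S> factory) {
        this.factory = factory;
    }

    // computeIfAbsent creates the value at most once per key and declares no checked
    // exception, unlike Guava's Cache.get(key, loader), which throws ExecutionException.
    S get(Class<?> type) {
        return cache.computeIfAbsent(type, factory);
    }

    public static void main(String[] args) {
        AtomicInteger created = new AtomicInteger();
        SerializerCacheSketch<String> cache = new SerializerCacheSketch<>(type -> {
            created.incrementAndGet();
            return "serializer-for-" + type.getSimpleName();
        });
        System.out.println(cache.get(String.class));
        System.out.println(cache.get(String.class));
        System.out.println("factory calls: " + created.get());
    }
}
```

One caveat with this approach: the factory must not return null and must not modify the same map while it runs, and the map never evicts entries, which is acceptable only if the set of handle classes is small and bounded.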

Comment on lines 224 to 225
// @data field present but no codec available or internal handle
throw new IOException("Type " + connectorIdString + " has binary data (@data field) but no codec available to deserialize it");
Contributor

suggestion: Error message could be improved for troubleshooting.

Consider adding connector ID and expected codec type to the error message for easier debugging.

Suggested change
// @data field present but no codec available or internal handle
throw new IOException("Type " + connectorIdString + " has binary data (@data field) but no codec available to deserialize it");
// @data field present but no codec available or internal handle
String expectedCodecType = "unknown";
if (!connectorIdString.startsWith("$")) {
    ConnectorId connectorId = new ConnectorId(connectorIdString);
    Optional<ConnectorCodec<T>> codec = codecExtractor.apply(connectorId);
    if (codec.isPresent()) {
        expectedCodecType = codec.get().getClass().getName();
    }
}
throw new IOException("Type " + connectorIdString + " has binary data (@data field) but no codec available to deserialize it. "
        + "Connector ID: " + connectorIdString + ", Expected codec type: " + expectedCodecType);

* **Type:** ``boolean``
* **Default value:** ``false``

Enables use of custom connector-provided serialization codecs for handles.
Contributor

suggestion (typo): Consider adding 'the' for grammatical correctness: 'Enables the use of custom connector-provided serialization codecs for handles.'

This change will make the documentation clearer and more grammatically correct.

Suggested change
Enables use of custom connector-provided serialization codecs for handles.
Enables the use of custom connector-provided serialization codecs for handles.

@tdcmeehan
Contributor Author

@sourcery-ai review

Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey there - I've reviewed your changes and they look great!


Contributor

@steveburnett steveburnett left a comment

Just one nit of formatting.

network has high latency or if there are many nodes in the cluster.

``use-connector-provided-serialization-codecs``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Contributor

Suggested change
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@tdcmeehan tdcmeehan requested a review from a team as a code owner September 17, 2025 19:14
@tdcmeehan tdcmeehan force-pushed the dyncon branch 3 times, most recently from 7aa52ff to c8e417e Compare September 17, 2025 19:50
super(baseClass.getSimpleName() + "Module", Version.unknownVersion());

TypeIdResolver typeResolver = new InternalTypeResolver<>(nameResolver, classResolver);
requireNonNull(baseClass, "baseClass is null");
Contributor

Should the null check happen before baseClass is used at line 69?
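
One way to address this, given that the `super(...)` call comes first in the constructor and already dereferences `baseClass`, is to validate inline within the `super(...)` argument. The sketch below is illustrative only: `NamedModule` is a hypothetical stand-in for the real Jackson base class, and `TypedModuleSketch` is not the PR's class.

```java
import static java.util.Objects.requireNonNull;

// Hypothetical stand-in for the module base class that takes a name.
class NamedModule {
    final String name;

    NamedModule(String name) {
        this.name = name;
    }
}

final class TypedModuleSketch extends NamedModule {
    TypedModuleSketch(Class<?> baseClass) {
        // requireNonNull returns its argument, so the check runs before getSimpleName()
        // and a null baseClass fails with the descriptive message instead of a bare NPE.
        super(requireNonNull(baseClass, "baseClass is null").getSimpleName() + "Module");
    }

    public static void main(String[] args) {
        System.out.println(new TypedModuleSketch(String.class).name);
        try {
            new TypedModuleSketch(null);
        }
        catch (NullPointerException e) {
            System.out.println(e.getMessage());
        }
    }
}
```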

this.blockEncodingSerde = requireNonNull(blockEncodingSerde, "blockEncodingSerde is null");
this.connectorSystemConfig = () -> featuresConfig.isNativeExecutionEnabled();
this.connectorCodecManager = requireNonNull(connectorCodecManager, "connectorCodecManager is null");
this.tupleDomainJsonCodec = requireNonNull(tupleDomainCodec, "tupleDomainCodec is null");
Contributor

For my own learning: why treat TupleDomain as a special case, and why keep its serde in the connector context instead of providing it through the connector codec provider? Thanks.

Contributor Author

TupleDomain values are serialized as blocks, which don't have a generic means to be serialized. The runtime must provide that code.

Add support for binary deserialization of connector handles through the
ConnectorProtocol interface. This enables connectors to provide custom
binary deserialization alongside the existing JSON support.

Add binary deserialization for TPCH connector handles in C++ to match
the Java serialization format. Includes TpchPrestoToVeloxConnector
for converting protocol objects to Velox representations.
@tdcmeehan tdcmeehan changed the title Add support for custom connector-provided serialization codecs Add support for custom connector-provided serialization codecs (OVERVIEW) Oct 8, 2025
@tdcmeehan tdcmeehan marked this pull request as draft October 9, 2025 14:57

handle->_type = "tpch";

proto = handle;
Collaborator

std::move()

tdcmeehan added a commit that referenced this pull request Nov 5, 2025
…ion codecs (#26257)

## Description
Add support for custom connector-provided serialization codecs

## Motivation and Context
This will allow plugin connectors to be written in C++.

RFC: prestodb/rfcs#49
End to end changes migrating TPCH to the new framework:
#26026

## Impact
No immediate impact

## Test Plan
Included tests

## Contributor checklist

- [ ] Please make sure your submission complies with our [contributing
guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md),
in particular [code
style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style)
and [commit
standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards).
- [ ] PR description addresses the issue accurately and concisely. If
the change is non-trivial, a GitHub Issue is referenced.
- [ ] Documented new properties (with its default value), SQL syntax,
functions, or other functionality.
- [ ] If release notes are required, they follow the [release notes
guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines).
- [ ] Adequate tests were added if applicable.
- [ ] CI passed.

## Release Notes

```
== NO RELEASE NOTE ==
```
Labels

from:IBM PR from IBM

5 participants