refactor(protocol): Add new generic thrift toolkit module for connectors#26259
refactor(protocol): Add new generic thrift toolkit module for connectors#26259tdcmeehan merged 1 commit intoprestodb:masterfrom
Conversation
Reviewer's GuideThis PR extracts and centralizes Thrift serialization logic into a new reusable connector toolkit module, extends SPI and server components to support ColumnHandle codecs, and updates the TPCDS connector and project poms to integrate the new toolkit. Class diagram for new ThriftCodecProvider and GenericThriftCodecclassDiagram
class ThriftCodecProvider {
-ThriftCodecManager thriftCodecManager
-Optional<Type> connectorSplitType
-Optional<Type> connectorTransactionHandle
-Optional<Type> connectorTableLayoutHandle
-Optional<Type> connectorTableHandle
-Optional<Type> connectorOutputTableHandle
-Optional<Type> connectorInsertTableHandle
-Optional<Type> connectorDeleteTableHandle
-Optional<Type> connectorColumnHandle
+getConnectorSplitCodec()
+getConnectorTransactionHandleCodec()
+getConnectorTableLayoutHandleCodec()
+getConnectorTableHandleCodec()
+getConnectorOutputTableHandleCodec()
+getConnectorInsertTableHandleCodec()
+getConnectorDeleteTableHandleCodec()
+getColumnHandleCodec()
+getThriftCodecManager()
}
class ThriftCodecProvider.Builder {
-ThriftCodecManager thriftCodecManager
-Optional<Type> connectorSplitType
-Optional<Type> connectorTransactionHandle
-Optional<Type> connectorTableLayoutHandle
-Optional<Type> connectorTableHandle
-Optional<Type> connectorOutputTableHandle
-Optional<Type> connectorInsertTableHandle
-Optional<Type> connectorDeleteTableHandle
-Optional<Type> connectorColumnHandle
+setThriftCodecManager()
+setConnectorSplitType()
+setConnectorTransactionHandle()
+setConnectorTableLayoutHandle()
+setConnectorTableHandle()
+setConnectorOutputTableHandle()
+setConnectorInsertTableHandle()
+setConnectorDeleteTableHandle()
+setConnectorColumnHandle()
+build()
}
ThriftCodecProvider.Builder --> ThriftCodecProvider
ThriftCodecProvider --> GenericThriftCodec
class GenericThriftCodec<T> {
-ThriftCodec<T> thriftCodec
+serialize(T value)
+deserialize(byte[] bytes)
}
GenericThriftCodec <|.. ConnectorCodec
ThriftCodecProvider ..> ThriftCodecManager
GenericThriftCodec ..> ThriftCodecManager
Class diagram for updated ConnectorCodecManager and ConnectorCodecProviderclassDiagram
class ConnectorCodecManager {
-Map<String, ConnectorCodecProvider> connectorCodecProviders
+getTableHandleCodec(String connectorId)
+getColumnHandleCodec(String connectorId)
}
class ConnectorCodecProvider {
+getConnectorTableHandleCodec()
+getColumnHandleCodec()
}
ConnectorCodecManager --> ConnectorCodecProvider
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
d88dfc6 to
e84d692
Compare
There was a problem hiding this comment.
Hey there - I've reviewed your changes - here's some feedback:
- Replace the
assert (byteBuffer.position() == 0)in ColumnHandleThriftCodec with explicit validation and error handling, since assertions may be disabled in production. - Consider tightening GenericThriftCodec’s constructor to accept Class directly instead of a generic Type to eliminate the runtime cast and provide a clearer API.
- ThriftCodecProvider.Builder exposes one setter per handle type; if more codecs are needed in the future consider consolidating into a registration map or varargs-based API to reduce boilerplate.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Replace the `assert (byteBuffer.position() == 0)` in ColumnHandleThriftCodec with explicit validation and error handling, since assertions may be disabled in production.
- Consider tightening GenericThriftCodec’s constructor to accept Class<T> directly instead of a generic Type to eliminate the runtime cast and provide a clearer API.
- ThriftCodecProvider.Builder exposes one setter per handle type; if more codecs are needed in the future consider consolidating into a registration map or varargs-based API to reduce boilerplate.
## Individual Comments
### Comment 1
<location> `presto-main-base/src/main/java/com/facebook/presto/server/thrift/ColumnHandleThriftCodec.java:63-65` </location>
<code_context>
+ public ColumnHandle readConcreteValue(String connectorId, TProtocolReader reader)
+ throws Exception
+ {
+ ByteBuffer byteBuffer = reader.readBinary();
+ assert (byteBuffer.position() == 0);
+ byte[] bytes = byteBuffer.array();
+ return connectorCodecManager.getColumnHandleCodec(connectorId).map(codec -> codec.deserialize(bytes)).orElse(null);
+ }
</code_context>
<issue_to_address>
**issue (bug_risk):** Potential risk with ByteBuffer.array() usage.
Directly accessing byteBuffer.array() may cause issues if the buffer is not array-backed or contains extra data. To ensure correct deserialization, use byteBuffer.get() with byteBuffer.remaining() to extract only the relevant bytes.
</issue_to_address>
### Comment 2
<location> `presto-main-base/src/main/java/com/facebook/presto/server/thrift/ColumnHandleThriftCodec.java:74` </location>
<code_context>
+ throws Exception
+ {
+ requireNonNull(value, "value is null");
+ writer.writeBinary(ByteBuffer.wrap(connectorCodecManager.getColumnHandleCodec(connectorId).map(codec -> codec.serialize(value)).orElseThrow(() -> new IllegalArgumentException("Can not serialize " + value))));
+ }
+
</code_context>
<issue_to_address>
**suggestion:** Exception message could be more informative.
Consider including connectorId and value type in the exception to improve debugging.
```suggestion
writer.writeBinary(ByteBuffer.wrap(
connectorCodecManager.getColumnHandleCodec(connectorId)
.map(codec -> codec.serialize(value))
.orElseThrow(() -> new IllegalArgumentException(
String.format(
"Can not serialize value '%s' of type '%s' for connectorId '%s'",
value,
value.getClass().getName(),
connectorId
)
))
));
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| ByteBuffer byteBuffer = reader.readBinary(); | ||
| assert (byteBuffer.position() == 0); | ||
| byte[] bytes = byteBuffer.array(); |
There was a problem hiding this comment.
issue (bug_risk): Potential risk with ByteBuffer.array() usage.
Directly accessing byteBuffer.array() may cause issues if the buffer is not array-backed or contains extra data. To ensure correct deserialization, use byteBuffer.get() with byteBuffer.remaining() to extract only the relevant bytes.
034dd56 to
8069b21
Compare
29258b8 to
41830d5
Compare
| throws Exception | ||
| { | ||
| ByteBuffer byteBuffer = reader.readBinary(); | ||
| assert (byteBuffer.position() == 0); |
There was a problem hiding this comment.
Let's remove assert keyword and use checkArgument
| ByteBuffer byteBuffer = reader.readBinary(); | ||
| assert (byteBuffer.position() == 0); | ||
| byte[] bytes = byteBuffer.array(); | ||
| return connectorCodecManager.getColumnHandleCodec(connectorId).map(codec -> codec.deserialize(bytes)).orElse(null); |
There was a problem hiding this comment.
I believe this should just throw to fail fast.
| </parent> | ||
|
|
||
| <artifactId>presto-thrift-connector-toolkit</artifactId> | ||
| <name>presto-thrift-toolkit</name> |
There was a problem hiding this comment.
| <name>presto-thrift-toolkit</name> | |
| <name>presto-thrift-connector-toolkit</name> |
|
How would the C++ side generate the same Thrift IDL? |
41830d5 to
28f281e
Compare
61615ed to
aa1112d
Compare
aa1112d to
03bd54c
Compare
03bd54c to
b385622
Compare
|
@tdcmeehan I modified the PR - the thrift IDL hasn't changed now since I removed the Column Handle codec so no C++ changes are needed. |
963655c to
c08f4dc
Compare
a0515fd to
3714272
Compare
3714272 to
63712e0
Compare
| { | ||
| private final ThriftCodec<T> thriftCodec; | ||
|
|
||
| public GenericThriftCodec(ThriftCodecManager codecManager, Type javaType, Class<T> expectedType) |
There was a problem hiding this comment.
Why not just take Class<T> directly instead of Type? The current signature requires callers to pass both javaType and expectedType when they're effectively the same thing.
| private Optional<Type> connectorOutputTableHandle = Optional.empty(); | ||
| private Optional<Type> connectorInsertTableHandle = Optional.empty(); | ||
| private Optional<Type> connectorDeleteTableHandle = Optional.empty(); | ||
| private Optional<Type> connectorColumnHandle = Optional.empty(); |
There was a problem hiding this comment.
This field is defined but never used - no setter in the Builder and no corresponding getConnectorColumnHandleCodec() method. Can you add it?
There was a problem hiding this comment.
I removed the column handle codec since it'll require Prestissimo changes & isn't required yet - removed this reference
| <version>0.297-SNAPSHOT</version> | ||
| </parent> | ||
|
|
||
| <artifactId>presto-thrift-connector-toolkit</artifactId> |
There was a problem hiding this comment.
Since we'll be sharing these among modules, could we add some basic unit tests that aren't tied to any particular module?
| return this; | ||
| } | ||
|
|
||
| public ThriftCodecProvider build() |
There was a problem hiding this comment.
Consider validating that thriftCodecManager is set before building - it's required but could be null if the caller forgets to call setThriftCodecManager().
a722092 to
1800369
Compare
|
Thanks a lot! Please rebase to pick up the new tests |
1800369 to
1496f17
Compare
|
Should this change be documented in the Developer Guide section of the Presto documentation? Maybe in a new and different PR, and perhaps Connectors would be a good place for the doc if it doesn't need its own page. |
Description
The thrift implementation for the tpcds connector can be generalized so that it can be reused.
Motivation and Context
We can use this new interface in other connectors to add support for thrift serialization
Impact
None
Test Plan
The current thrift tpcds UTs
Contributor checklist
Release Notes