Arrow Flight Connector Template#24427
Conversation
| <groupId>org.apache.arrow</groupId> | ||
| <artifactId>arrow-jdbc</artifactId> | ||
| <version>${dep.arrow.version}</version> | ||
| <scope>test</scope> |
There was a problem hiding this comment.
moved this to test dependency
| <dependency> | ||
| <groupId>org.apache.arrow</groupId> | ||
| <artifactId>arrow-memory-netty</artifactId> | ||
| <scope>runtime</scope> |
There was a problem hiding this comment.
This is needed as a runtime dependency
| public Block buildBlockFromFieldVector(FieldVector vector, Type type, DictionaryProvider dictionaryProvider) | ||
| { | ||
| // Use Arrow dictionary to create a DictionaryBlock | ||
| if (dictionaryProvider != null && vector.getField().getDictionary() != null) { |
There was a problem hiding this comment.
A DictionaryBlock is created only if a dictionary provider is supplied (by the FlightStream) and the Arrow Field contains a dictionary
| // TODO: need method handles to construct MapType | ||
| throw new UnsupportedOperationException("MapType is currently unsupported"); | ||
| } | ||
| case Struct: { |
There was a problem hiding this comment.
Added Struct -> RowType
| return BooleanType.BOOLEAN; | ||
| case Time: | ||
| return TimeType.TIME; | ||
| case List: { |
There was a problem hiding this comment.
Added List type -> ArrayType
| Field entryField = children.get(0); | ||
| checkArgument(entryField.getChildren().size() == 2, "Arrow Map entries expected to have 2 child Fields, got: " + children.size()); | ||
| // TODO: need method handles to construct MapType | ||
| throw new UnsupportedOperationException("MapType is currently unsupported"); |
There was a problem hiding this comment.
I could not implement the MapType because need to figure out how to define the MethodHandles. Any ideas for this @tdcmeehan ?
| } | ||
| } | ||
|
|
||
| private void assignBlockFromValueVector(ValueVector vector, Type type, BlockBuilder builder, int startIndex, int endIndex) |
There was a problem hiding this comment.
Modified this to work with nested data types
| } | ||
| } | ||
|
|
||
| public void assignBlockFromListVector(ListVector vector, Type type, BlockBuilder builder, int startIndex, int endIndex) |
There was a problem hiding this comment.
Added this to extract values directly from the element vector
| } | ||
| } | ||
|
|
||
| public void assignBlockFromStructVector(StructVector vector, Type type, BlockBuilder builder, int startIndex, int endIndex) |
There was a problem hiding this comment.
Added this to extract values from all child vectors of the RowType
| this.metadata = requireNonNull(metadata, "Metadata is null"); | ||
| this.splitManager = requireNonNull(splitManager, "splitManager is null"); | ||
| this.pageSourceProvider = requireNonNull(pageSourceProvider, "pageSourceProvider is null"); | ||
| this.connectorAllocator = requireNonNull(connectorAllocator, "connectorAllocator is null"); |
There was a problem hiding this comment.
Changed this to use a BufferAllocator instance. The connector will own this allocator and close it on shutdown. Flight clients created by the connector create a child allocator from this that is closed with the client.
|
|
||
| ArrowErrorCode(int code, ErrorType type) | ||
| { | ||
| errorCode = new ErrorCode(code + 0x0510_0000, name(), type); |
There was a problem hiding this comment.
changed error code range to be unique
| public void configure(Binder binder) | ||
| { | ||
| configBinder(binder).bindConfig(ArrowFlightConfig.class); | ||
| binder.bind(BufferAllocator.class).to(RootAllocator.class).in(Scopes.SINGLETON); |
There was a problem hiding this comment.
Added this as the default RootAllocator instance, will be owned by the connector
| this.columnHandles = requireNonNull(columnHandles, "columnHandles is null"); | ||
| requireNonNull(clientHandler, "clientHandler is null"); | ||
| this.arrowBlockBuilder = requireNonNull(arrowBlockBuilder, "arrowBlockBuilder is null"); | ||
| this.flightStreamAndClient = clientHandler.getFlightStream(split, connectorSession); |
There was a problem hiding this comment.
Cleaned up the stream usage and closing here
| @Override | ||
| public Page getNextPage() | ||
| { | ||
| logger.debug("Reading next Arrow record batch"); |
There was a problem hiding this comment.
Added some debug logs
| implements ConnectorTableHandle | ||
| { | ||
| private final String schema; | ||
| private final String table; |
There was a problem hiding this comment.
There was previously discussion about making these optional #23032 (review)
| import static java.nio.file.Files.newInputStream; | ||
| import static java.util.Objects.requireNonNull; | ||
|
|
||
| public abstract class BaseArrowFlightClientHandler |
There was a problem hiding this comment.
Renamed this to better indicate that it is a wrapper around the client and not an instance of the client, was confusing
|
|
||
| public BaseArrowFlightClientHandler(BufferAllocator allocator, ArrowFlightConfig config) | ||
| { | ||
| this.allocator = requireNonNull(allocator, "allocator is null"); |
There was a problem hiding this comment.
Changed allocator usage here, it no longer creates a root allocator for each client instance. Instead an allocator is required in the constructor
| } | ||
| } | ||
|
|
||
| protected ClientClosingFlightStream getFlightStream(ArrowSplit split, ConnectorSession connectorSession) |
There was a problem hiding this comment.
Removed the Ticket parameter. Since it is in the ArrowSplit already, we can just deserialize once
b530fd2 to
2d81e93
Compare
| public class TestingArrowFlightPlugin | ||
| extends ArrowPlugin | ||
| { | ||
| public TestingArrowFlightPlugin() |
There was a problem hiding this comment.
this is part of the testing connector, should go into the testingConnector package
|
|
||
| public class TestingH2DatabaseSetup | ||
| { | ||
| private static final Logger logger = Logger.get(TestingH2DatabaseSetup.class); |
There was a problem hiding this comment.
this is not a part of testing connector, can be moved to outside package
| public class TestingArrowProducer | ||
| implements FlightProducer | ||
| { | ||
| private final BufferAllocator allocator; |
There was a problem hiding this comment.
A connector implementation does not need this right? can be moved to outside package.
There was a problem hiding this comment.
I grouped everything needed by the connector into this package, including the server stuff. It seems like it would be a little clearer to move all testing server related classes into it's own package testingServer. I'll update the docs to explain this too.
| - ``ArrowBlockBuilder.java`` | ||
| This class builds Presto blocks from Arrow vectors. Extend this class if needed and override ``getPrestoTypeFromArrowField`` method, if any customizations are needed for the conversion of Arrow vector to Presto type. A binding for this class should be created in the ``Module`` for the plugin. | ||
|
|
||
| A reference implementation of the presto-base-arrow-flight module in the test folder of this module uses a locally started Flight server to fetch data from an H2 database. The classes ``TestingArrowBlockBuilder``, ``TestingArrowFlightClientHandler``, ``TestingArrowFlightPlugin``, ``TestingArrowFlightRequest``, ``TestingArrowFlightResponse``, ``TestingArrowModule`` and ``TestingArrowQueryBuilder`` make up the reference implementation. |
There was a problem hiding this comment.
Doc can refer to the testingConnector package to indicate the reference connector implementation
| public void close() | ||
| { | ||
| if (flightStreamAndClient.hasRoot()) { | ||
| completed = true; |
There was a problem hiding this comment.
Why set completed to true only when flight stream has root?
There was a problem hiding this comment.
This is how it was before, but looking into it further, it's not needed here. when flightStreamAndClient.next() returns false, that will set the completed flag, indicating that no more pages will be produced. I'll take this out.
59e58f4 to
9986850
Compare
| public ArrowSplit( | ||
| @JsonProperty("schemaName") @Nullable String schemaName, | ||
| @JsonProperty("tableName") String tableName, | ||
| @JsonProperty("flightEndpoint") byte[] flightEndpoint) |
There was a problem hiding this comment.
Let's call it flightEndpointBytes to indicate it's a serialized representation of the endpoint.
There was a problem hiding this comment.
Yeah, sounds better
| } | ||
|
|
||
| logger.info("Executing query: " + query); | ||
| query = query.toUpperCase(); // Optionally, to maintain consistency |
There was a problem hiding this comment.
This comment doesn't appear to mean anything?
There was a problem hiding this comment.
I'm not really sure. I'll remove the comment and keep the uppercase conversion, in case that is preferred by the h2 driver.
This adds the Arrow Flight base module as a template to build connectors that use a Flight service to transfer data with Presto in Arrow format. A concrete Arrow Flight connector implementation will extend the BaseArrowFlightClientHandler that can connect to a Flight service that handles the required RPC calls from the client handler. An example connector implementation is provided in the unit testing. RFC: https://github.com/prestodb/rfcs/blob/main/RFC-0004-arrow-flight-connector.md Co-authored-by: sai bhaskar reddy <sai.bhaskar.reddy.sabbasani1@ibm.com> Co-authored-by: SthuthiGhosh9400 <Sthuthi.Ghosh@ibm.com> Co-authored-by: lithinwxd <Lithin.Purushothaman@ibm.com> Co-authored-by: Steve Burnett <burnett@pobox.com> Co-authored-by: elbinpallimalilibm <elbin.pallimalil@ibm.com> Co-authored-by: Tim Meehan <tim@timdmeehan.com>
9986850 to
79c00d5
Compare
This adds the Arrow Flight base module as a template to build connectors that use a Flight service to transfer data with Presto in Arrow format. A concrete Arrow Flight connector implementation will extend the BaseArrowFlightClientHandler that can connect to a Flight service that handles the required RPC calls from the client handler. An example connector implementation is provided in the unit testing. RFC: https://github.com/prestodb/rfcs/blob/main/RFC-0004-arrow-flight-connector.md Co-authored-by: sai bhaskar reddy <sai.bhaskar.reddy.sabbasani1@ibm.com> Co-authored-by: SthuthiGhosh9400 <Sthuthi.Ghosh@ibm.com> Co-authored-by: lithinwxd <Lithin.Purushothaman@ibm.com> Co-authored-by: Steve Burnett <burnett@pobox.com> Co-authored-by: elbinpallimalilibm <elbin.pallimalil@ibm.com> Co-authored-by: Tim Meehan <tim@timdmeehan.com>
This adds the Arrow Flight base module as a template to build connectors that use a Flight service to transfer data with Presto in Arrow format. A concrete Arrow Flight connector implementation will extend the BaseArrowFlightClientHandler that can connect to a Flight service that handles the required RPC calls from the client handler. An example connector implementation is provided in the unit testing. RFC: https://github.com/prestodb/rfcs/blob/main/RFC-0004-arrow-flight-connector.md Co-authored-by: sai bhaskar reddy <sai.bhaskar.reddy.sabbasani1@ibm.com> Co-authored-by: SthuthiGhosh9400 <Sthuthi.Ghosh@ibm.com> Co-authored-by: lithinwxd <Lithin.Purushothaman@ibm.com> Co-authored-by: Steve Burnett <burnett@pobox.com> Co-authored-by: elbinpallimalilibm <elbin.pallimalil@ibm.com> Co-authored-by: Tim Meehan <tim@timdmeehan.com>
Description
This adds the Arrow Flight base module as a template to build connectors that use a Flight service to transfer data with Presto in Arrow format. A concrete Arrow Flight connector implementation will extend the BaseArrowFlightClientHandler that can connect to a Flight service that handles the required RPC calls from the client handler. An example connector implementation is provided in the unit testing.
Supersedes: #23032
RFC: https://github.com/prestodb/rfcs/blob/main/RFC-0004-arrow-flight-connector.md
Motivation and Context
Adds the ability to easily create an Arrow Flight connector that can connect with any data source supported by the Flight service and transfer data efficiently using the Arrow format.
Impact
Adding new base-arrow-flight module.
Test Plan
Unit tests added with testing Arrow Flight connector implementation. These include extension of com.facebook.presto.tests.AbstractTestQueries that test general queries over Flight. Additional roundtrip testing of all data types supported by the ArrowBlockBuilder is done. An ArrowFlightQueryRunner is also provided that can be run standalone for manual query testing.
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.