Conversation
|
Consider adding documentation for the new connector. Suggest revising the release note entry to follow the Release Note Guidelines: |
tdcmeehan
left a comment
There was a problem hiding this comment.
It seems like there's a hardcoded presumption that the underlying datasource accepts a SQL query. Can you remove this from the PR? The service may not accept SQL.
| public List<Field> getColumnsList(String schema, String table, ConnectorSession connectorSession) | ||
| { | ||
| try { | ||
| String dbSpecificSchemaName = getDBSpecificSchemaName(config, schema); |
There was a problem hiding this comment.
Avoid "DB" references as it might not be an underlying DB
There was a problem hiding this comment.
I have removed "DB" references
| @JsonProperty("jdbcType") int jdbcType, | ||
| @JsonProperty("jdbcTypeName") String jdbcTypeName, |
There was a problem hiding this comment.
We can't have references to JDBC in this connector.
There was a problem hiding this comment.
I have removed the JDBC references
|
Suggest changing the release note entry as follows: The documentation for the Arrow Flight Connector appears to be added in #23212, so it doesn't need to be mentioned in this release note. |
BryanCutler
left a comment
There was a problem hiding this comment.
I took a quick pass and had a few comments. I'll take another look at the rest a little later.
presto-base-arrow-flight/pom.xml
Outdated
| <air.main.basedir>${project.parent.basedir}</air.main.basedir> | ||
| <grpc.version>1.53.0</grpc.version> | ||
| <dep.okhttp.version>4.10.0</dep.okhttp.version> | ||
| <arrow.version>11.0.0</arrow.version> |
There was a problem hiding this comment.
This is a fairly old version of Arrow, can you use a more recent one?
There was a problem hiding this comment.
@BryanCutler I have updated to the latest Arrow version
presto-base-arrow-flight/src/main/java/com/facebook/plugin/arrow/ArrowColumnHandle.java
Outdated
| return; | ||
| } | ||
| try { | ||
| RootAllocator allocator = new RootAllocator(Long.MAX_VALUE); |
There was a problem hiding this comment.
You would usually create a RootAllocator as a class member, and it should be closed when it is no longer used
There was a problem hiding this comment.
@BryanCutler I have made changes to close the allocator
There was a problem hiding this comment.
We are not re-using the flight client. The root allocator will be closed when ArrowFlightClient is closed or auto closed.
There was a problem hiding this comment.
Where is this RootAllocator closed, then? The FlightClient creates a child allocator that is closed along with the client; closing the client will not close the root
There was a problem hiding this comment.
You wouldn't normally create a RootAllocator for each client, it could be reused as each client will internally make a child allocator. It's ok to clean that up later, since you are closing it here.
There was a problem hiding this comment.
How would we determine when the root allocator should be closed? Should we keep track of open flight clients and close root allocator when there are no flight clients open for some duration of time?
There was a problem hiding this comment.
I would create 1 RootAllocator for the connector and then close it during Connector.shutdown(). The FlightClient creates its own child allocator that is closed with the client, so the root won't be keeping any buffers directly.
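The lifecycle suggested here can be sketched with plain Java stand-ins. `SketchAllocator`, `SketchFlightClient`, and `SketchConnector` are hypothetical types that only mimic the parent/child ownership described in this thread; they are not the Arrow API, and the connector's real shutdown hook may look different.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an allocator: it only tracks live children.
final class SketchAllocator implements AutoCloseable {
    private final SketchAllocator parent;
    private final AtomicInteger liveChildren = new AtomicInteger();
    private boolean closed;

    SketchAllocator(SketchAllocator parent) {
        this.parent = parent;
        if (parent != null) {
            parent.liveChildren.incrementAndGet();
        }
    }

    SketchAllocator newChild() {
        return new SketchAllocator(this);
    }

    int liveChildren() {
        return liveChildren.get();
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        if (closed) {
            return;
        }
        // Closing a parent while children are open would indicate a leak.
        if (liveChildren.get() != 0) {
            throw new IllegalStateException("children still open: " + liveChildren.get());
        }
        closed = true;
        if (parent != null) {
            parent.liveChildren.decrementAndGet();
        }
    }
}

// Hypothetical client: owns a child allocator and closes it with itself.
final class SketchFlightClient implements AutoCloseable {
    private final SketchAllocator child;

    SketchFlightClient(SketchAllocator root) {
        this.child = root.newChild();
    }

    @Override
    public void close() {
        child.close();
    }
}

// Hypothetical connector: one root allocator for its whole lifetime.
final class SketchConnector {
    private final SketchAllocator root = new SketchAllocator(null);

    SketchFlightClient newClient() {
        return new SketchFlightClient(root);
    }

    SketchAllocator root() {
        return root;
    }

    void shutdown() {
        root.close();
    }
}
```

With this shape, nothing needs to track open clients over time: each client releases its own child, and the root is closed once, at shutdown.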
presto-base-arrow-flight/src/main/java/com/facebook/plugin/arrow/ArrowFlightClientHandler.java
Outdated
| trustedCertificate.get().close(); | ||
| } | ||
| shutdownTimer(); | ||
| isClientClosed.set(true); |
There was a problem hiding this comment.
Do the calls to getClient() and close() need to be thread-safe?
There was a problem hiding this comment.
@BryanCutler I have addressed the comments. Can you please review the changes?
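If getClient() and close() can be called from different threads, one simple option is to guard both with the same lock. The sketch below assumes a lazily created client; `SketchClientHandler` and its factory parameter are hypothetical names, not the handler in this PR.

```java
import java.util.function.Supplier;

// Hypothetical handler: lazy creation and close share one lock.
final class SketchClientHandler<C extends AutoCloseable> implements AutoCloseable {
    private final Supplier<C> factory;
    private C client;        // guarded by "this"
    private boolean closed;  // guarded by "this"

    SketchClientHandler(Supplier<C> factory) {
        this.factory = factory;
    }

    // Creates the client on first use; fails once the handler is closed.
    synchronized C getClient() {
        if (closed) {
            throw new IllegalStateException("handler is closed");
        }
        if (client == null) {
            client = factory.get();
        }
        return client;
    }

    // Idempotent: a second call does nothing.
    @Override
    public synchronized void close() {
        if (closed) {
            return;
        }
        closed = true;
        if (client != null) {
            try {
                client.close();
            }
            catch (Exception e) {
                throw new IllegalStateException("failed to close client", e);
            }
        }
    }
}
```

A single lock keeps the closed flag and the client reference consistent with each other, which separate atomic fields alone would not guarantee.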
|
|
a69b92e to
7bd4ec9
Compare
| import static java.util.Objects.requireNonNull; | ||
| import static java.util.stream.Collectors.joining; | ||
|
|
||
| public class ArrowQueryBuilder |
There was a problem hiding this comment.
This might better belong in a submodule that depends on this module which implements the Flight SQL spec. I don't think it belongs here.
There was a problem hiding this comment.
@tdcmeehan I have addressed the comments. Can you please review the changes?
04cdc53 to
824175b
Compare
|
@tdcmeehan @BryanCutler @steveburnett |
6d9d75b to
11bada5
Compare
steveburnett
left a comment
There was a problem hiding this comment.
Thanks for the docs! Minor suggestions, mostly formatting.
A local doc build returns the following warning:
/Users/steveburnett/Documents/GitHub/presto/presto-docs/src/main/sphinx/connector/base-arrow-flight.rst: WARNING: document isn't included in any toctree
To address this warning,
- Add connector/base-arrow-flight to https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/connector.rst so the new page is present on the Connector page in the Presto documentation.
- Add the new line in alphabetic order for the new page title Arrow-flight Connector.
|
Suggest update of the release note to include the PR number in both entries, and to link to the new doc from the release note. |
3b73ee3 to
c4e77f1
Compare
steveburnett
left a comment
There was a problem hiding this comment.
Thanks for including the doc from #23212 here! A few minor suggestions, looks good.
a2ed31a to
734fa2b
Compare
| --------------------------------------------------------------------------------- | ||
| To create a plugin extending the base-arrow-module, you need to implement certain abstract methods that are specific to your use case. Below are the required classes and their purposes: | ||
|
|
||
| * ``AbstractArrowFlightClientHandler.java`` |
There was a problem hiding this comment.
@lithinwxd Can you update these instructions to match the latest code changes?
3cb87b6 to
ed3d407
Compare
| --------------------------------------------------------------------------------- | ||
| To create a plugin extending the presto-base-arrow-flight module, you need to implement certain abstract methods that are specific to your use case. Below are the required classes and their purposes: | ||
|
|
||
| - ``BaseArrowFlightClient.java`` |
There was a problem hiding this comment.
@steveburnett We have updated the documentation to match the latest code base. Can you review?
f763a93 to
3dcded8
Compare
steveburnett
left a comment
There was a problem hiding this comment.
LGTM! (docs)
Pull updated branch, new local doc build, looks good. Thanks!
| } | ||
| else { | ||
| String value = new String(vector.get(i), StandardCharsets.UTF_8); | ||
| type.writeSlice(builder, Slices.utf8Slice(value)); |
There was a problem hiding this comment.
Can you create a Slice directly from the raw UTF-8 bytes from vector.get(i) using Slice wrappedBuffer(byte... array)? That would be more efficient than copying to a String first.
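The cost difference can be illustrated with the JDK alone. In this sketch `java.nio.ByteBuffer.wrap` is used only as a stand-in for the wrapping call mentioned above, and `Utf8WrapSketch` is a hypothetical class name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class Utf8WrapSketch {
    // Round trip: decodes to a String, then re-encodes into a new array.
    static byte[] viaString(byte[] utf8) {
        String value = new String(utf8, StandardCharsets.UTF_8);
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Wraps the existing array without copying; the view shares the caller's bytes.
    static ByteBuffer wrapped(byte[] utf8) {
        return ByteBuffer.wrap(utf8);
    }
}
```

Wrapping shares the array, so it is only safe when the bytes are not modified afterwards.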
| elementBuilder.appendNull(); | ||
| } | ||
| else { | ||
| appendValueToBuilder(elementType, elementBuilder, value); |
There was a problem hiding this comment.
Can you just call buildBlockFromValueVector with the element vector instead of making a new function to append Objects?
There was a problem hiding this comment.
buildBlockFromValueVector gives a block, but here we want to append multiple values to the same block.
There was a problem hiding this comment.
There is a lot of additional code to deal with writing elements and it's poor performance to call reader.readObject() for each element of the list.
I think what you could do is make BlockBuilder an argument instead of creating it in each function, then build() after calling the function
void buildBlockFromValueVector(ValueVector vector, Type type, BlockBuilder builder)
then in buildBlockFromListVector you would call this function with the element vector and elementBuilder, for example
buildBlockFromValueVector(vector.getDataVector(), elementType, elementBuilder);
There was a problem hiding this comment.
vector.getDataVector() gives all the values from the list, not just the element vector. Are you aware of a way to get the element vector?
There was a problem hiding this comment.
You can look here for an example of accessing elements within a ListVector https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java#L525
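The point of confusion above is that a list column keeps every element in one flattened child vector, with per-row offsets marking where each list starts and ends. A minimal sketch of reading one row's elements into a shared builder, using plain arrays in place of Arrow vectors (`ListOffsetsSketch` and its parameters are hypothetical):

```java
import java.util.List;

final class ListOffsetsSketch {
    // Appends the elements of list `index` to a shared builder.
    // offsets[index] (inclusive) to offsets[index + 1] (exclusive) bound
    // that list's slice of the flattened values.
    static void appendList(String[] flatValues, int[] offsets, int index, List<String> builder) {
        for (int i = offsets[index]; i < offsets[index + 1]; i++) {
            builder.add(flatValues[i]);
        }
    }
}
```

With offsets {0, 2, 2, 5}, row 0 covers positions 0 and 1, row 1 is empty, and row 2 covers positions 2 through 4, so the flattened vector never has to be split per row.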
|
|
||
| import static java.util.Objects.requireNonNull; | ||
|
|
||
| public class ClientClosingFlightStream |
There was a problem hiding this comment.
I don't think this class is necessary. You shouldn't lump a client and stream together, just keep them separate.
There was a problem hiding this comment.
@tdcmeehan Is it okay with you to decouple the client and stream?
There was a problem hiding this comment.
Since we need to close the client and stream together, it's convenient for encapsulation to put this in one type (otherwise the user of the BaseFlightClient needs to supply a client, which needs to be closed separately). What about a more generic AutoCloseableSupplier or something along those lines?
There was a problem hiding this comment.
I made this comment because the Flight Server dictates how many streams a requested flight will have, and it could be multiple. The client would normally process all streams, closing each one independently. I guess if we are assuming there will always be 1 stream only for the given client, this is ok. But if that ever changes or the client needs to perform some additional actions, then this wouldn't be useful.
There was a problem hiding this comment.
This remains true. Because streams are opened on a per-split basis, the concurrency is being managed by the Presto evaluation engine, and we can safely make the presumption that there will be 1 client per split, 1 stream per client per split.
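The generic holder floated earlier in this thread could be sketched as follows. `AutoCloseableSupplierSketch` is a hypothetical name and shape, assuming exactly one stream per client as stated above.

```java
import java.util.function.Supplier;

// Hypothetical holder: supplies a value and, on close, closes the value
// first and then the owner that produced it.
final class AutoCloseableSupplierSketch<T extends AutoCloseable> implements Supplier<T>, AutoCloseable {
    private final T value;
    private final AutoCloseable owner;

    AutoCloseableSupplierSketch(T value, AutoCloseable owner) {
        this.value = value;
        this.owner = owner;
    }

    @Override
    public T get() {
        return value;
    }

    @Override
    public void close() throws Exception {
        try {
            value.close();
        }
        finally {
            // The owner is closed even if closing the value fails.
            owner.close();
        }
    }
}
```

If a flight ever returned several streams for one client, this pairing would no longer fit, which is the concern raised in the review.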
dcf2b14 to
034c5d7
Compare
Co-authored-by: sai bhaskar reddy <sai.bhaskar.reddy.sabbasani1@ibm.com> Co-authored-by: SthuthiGhosh9400 <Sthuthi.Ghosh@ibm.com> Co-authored-by: lithinwxd <Lithin.Purushothaman@ibm.com> Co-authored-by: Steve Burnett <burnett@pobox.com> Co-authored-by: elbinpallimalilibm <elbin.pallimalil@ibm.com> Co-authored-by: Tim Meehan <tim@timdmeehan.com>
034c5d7 to
2bcce6b
Compare
|
Hi @sabbasani @elbinpallimalilibm based on the discussion we had with Shweta and Sandhya, we are gonna take over this work, @BryanCutler will help move this PR over the finishing line. We keep everyone who worked on this PR as co-authors. Thank you. cc @tdcmeehan |
| @JsonProperty("schema") String schema, | ||
| @JsonProperty("table") String table) | ||
| { | ||
| this.schema = schema; |
There was a problem hiding this comment.
@BryanCutler We are testing table valued functions with an extension of this connector and there is a use case where the PTVF won't have a table and schema associated with it. So we need to consider such a use case when designing the table handle for this connector.
There was a problem hiding this comment.
Could you explain this use case more? I don't quite understand how a TableHandle would not have a table.
There was a problem hiding this comment.
DM'd you the details.
| @JsonProperty("schema") String schema, | ||
| @JsonProperty("table") String table) | ||
| { | ||
| this.schema = schema; |
There was a problem hiding this comment.
Use Optional and provide appropriate constructor argument validation for schema and table.
E.g.:
private final Optional<String> schema;
and
this.schema = requireNonNull(schema, "schema is null");
There was a problem hiding this comment.
To reiterate, the ArrowConnector implementation controls the value of schema and table here, so it could set them to a unique identifier even if they are not necessarily needed as you say. I'm not sure it's correct to make these properties optional, and I don't see that done in any other connector. @tdcmeehan can correct me if I'm mistaken.
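For reference, the Optional-based validation suggested earlier in this thread would look roughly like the sketch below. `SketchTableHandle` is a hypothetical class, and whether these properties should be optional at all is the open question in the reply above.

```java
import java.util.Optional;

import static java.util.Objects.requireNonNull;

// Hypothetical handle: both fields may be empty but never null.
final class SketchTableHandle {
    private final Optional<String> schema;
    private final Optional<String> table;

    SketchTableHandle(Optional<String> schema, Optional<String> table) {
        this.schema = requireNonNull(schema, "schema is null");
        this.table = requireNonNull(table, "table is null");
    }

    Optional<String> getSchema() {
        return schema;
    }

    Optional<String> getTable() {
        return table;
    }
}
```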
|
Continuing this PR at #24427 |
| <dependency> | ||
| <groupId>io.perfmark</groupId> | ||
| <artifactId>perfmark-api</artifactId> | ||
| <version>${perfmark-api.version}</version> |
There was a problem hiding this comment.
@elbinpallimalilibm the managed dependencies for perfmark-api and error_prone_annotations don't seem to be needed anymore, was there a specific reason they were added?
There was a problem hiding this comment.
We were seeing upper bound errors on build without mentioning the dependencies in the POM. Is that error not happening now?
There was a problem hiding this comment.
Not seeing any errors now, looks like error prone version has been upgraded in root pom
|
#24427 has been merged, will close this one |

