[native] Add Apache Arrow Flight Connector #24504
Thanks for the release note! New release note guidelines. Please remove the manual PR link in the following format from the release note entries for this PR. I have updated the Release Notes Guidelines to remove the examples of manually adding the PR link.
c9b9df2 to 2418e65
steveburnett left a comment
Thanks for the doc! One nit of formatting.
    The `tls_certs` directory contains placeholder TLS certificates generated for unit testing the Arrow Flight Connector with TLS enabled. These certificates are not intended for production use and should only be used in the context of unit tests.

    ### Generating TLS Certificates
    To create the TLS certificates and keys inside the `tls_certs` folder, run the following command:
Add a blank line after this line to separate the text from the command. Without the blank line, it looks like this:
    }
    }

    void FlightDataSource::addSplit(std::shared_ptr<ConnectorSplit> split) {
Did some minor cleanup in this method to simplify use of the serialized FlightEndpoint.
    # limitations under the License.
    find_package(Arrow REQUIRED)
    find_package(PkgConfig REQUIRED)
    pkg_check_modules(ARROW_FLIGHT REQUIRED IMPORTED_TARGET GLOBAL arrow-flight)
I needed to add GLOBAL for using PkgConfig to prevent an -larrow-flight linker error. This seems like a better general solution to reference the libraries, see https://stackoverflow.com/questions/29191855/what-is-the-proper-way-to-use-pkg-config-from-cmake for more details
Why choose PkgConfig approach and not how Velox finds Arrow? We should use one approach for consistency.
@Rijin-N had it building with PkgConfig, and one of the first things I tried was changing this to the standard approach. I could not get it to find the flight libraries in my environment though. The only way I could get it to work was to revert back to PkgConfig and add GLOBAL.
@Rijin-N do you remember why PkgConfig was used?
@BryanCutler I also tried to use the Velox way of finding the arrow-flight libraries, but it did not work. That's why we decided to use PkgConfig.
@majetideepak I went back and tried this again, using find_package like this in arrow_flight/CMakeLists.txt:
    find_package(Arrow REQUIRED)
    find_package(ArrowFlight REQUIRED)
I get this CMake error:
    CMake Error at ../working/deps-install/lib64/cmake/ArrowFlight/ArrowFlightConfig.cmake:64 (arrow_keep_backward_compatibility):
      Unknown CMake command "arrow_keep_backward_compatibility".
    Call Stack (most recent call first):
      presto_cpp/main/connectors/arrow_flight/CMakeLists.txt:13 (find_package)
The error is raised on the find_package(ArrowFlight REQUIRED) line. arrow_keep_backward_compatibility is defined in ArrowConfig.cmake, and I'm not sure why it can't be found, but this isn't a problem when using PkgConfig.
OK, I tried changing the Arrow package to find_package(Arrow REQUIRED CONFIG) and it seems to work with that, so I'll update.
    return config_->get<bool>(kServerVerify, true);
    }

    std::optional<std::string> FlightConfig::serverSslCertificate() {
This was folly::Optional, changed to be consistent
    @@ -0,0 +1,251 @@
    /*
Added these tests to verify adding a custom authenticator works
    target_link_libraries(
      presto_flight_connector_infra_test presto_protocol
      presto_flight_connector_test_lib GTest::gtest GTest::gtest_main ${GLOG})
Added G stuff due to linker error
    virtual arrow::flight::Location getServerLocation();

    virtual void setFlightServerOptions(
        arrow::flight::FlightServerOptions* serverOptions) {}
Server options now specified by subclasses
f6eefb6 to 4b83c44
steveburnett left a comment
LGTM! (docs)
    #include "velox/common/base/Exceptions.h"

    namespace facebook::presto::connector::arrow_flight::auth {
    namespace {
changed this to be in an anonymous namespace, instead of static
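For readers comparing the two styles, here is a minimal sketch of a file-local helper placed in an unnamed namespace rather than declared `static`. The namespace `demo::auth` and both function names are hypothetical and do not come from the PR.

```cpp
#include <string>

namespace demo::auth {
namespace {

// Unnamed namespace: the helper has internal linkage, so it is only
// visible inside this translation unit (the same effect as `static`).
std::string addBearerPrefix(const std::string& token) {
  return "Bearer " + token;
}

} // namespace

// Externally visible function that relies on the file-local helper.
std::string makeAuthorizationValue(const std::string& token) {
  return addBearerPrefix(token);
}

} // namespace demo::auth
```

Either form keeps the helper out of the public symbol table; the unnamed namespace also works for types, which `static` does not.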
    const protocol::ConnectorSplit* const connectorSplit,
    const protocol::SplitContext* splitContext) const {
    const protocol::SplitContext* splitContext,
    const std::map<std::string, std::string>& extraCredentials) const {
@Rijin-N @elbinpallimalilibm is there a specific requirement to use these extraCredentials to authenticate a Flight client? I would think most users would prefer to authenticate the client with a token in the header, rather than use this. The Java Arrow Connector does not have this either.
@BryanCutler the base Presto (Java) Flight Connector is a template and not a full connector, and elects not to do authentication at all. The idea is that the authentication mechanism is largely dependent on the specific Flight server, so the whole business of authentication should be dealt with by the specialized connectors.
The problem we need to solve is that different Presto users may have different sets of access permissions on the connected Arrow Flight server. In this case we need to "pass through" the Presto user's authentication to the connected Flight Server, which means we somehow need to send the credentials from the Presto coordinator to the Presto workers.
In Java, there is an Identity API which we can use through connectorSession.getIdentity().getExtraCredentials().
IBM's specialization of the base Flight connector uses this API to pass the user's token to the workers. This isn't possible in Prestissimo because Velox connectors have no concept of Identity at all, so we need to pass the token in the split.
This issue provides more context: #22849
There's also a corresponding Velox issue: facebookincubator/velox#10107
Thanks for the background @ashkrisk. The related issues you linked propose adding an Identity data structure that all Prestissimo connectors can access, but that is so far unresolved. I think this is a better solution than simply passing through a map, so we should hold off on adding these extraCredentials until we can get the right API with an Identity. WDYT?
cc @majetideepak @aditi-pandit
@BryanCutler we need this to get the multi-user setup working (IBM's flavor of the Flight connector relies on it). You can leave this off for now as long as you're able to maintain the diff in IBM code until there's a better solution in OSS.
@ashkrisk we need to make sure we have a discussion and proper design before adding APIs like this, since this also affects other connectors. I believe it was mentioned there could be a security concern if the credentials are added to the split, which is then serialized.
@BryanCutler do we need to set up a meeting to talk about this?
I discussed with @majetideepak and @aditi-pandit , and have a possible way forward that I'm looking into, and I'll report back
@BryanCutler With regard to security, any method used to transfer the credentials from the coordinator to the worker requires serialization at some point. This includes the Identity framework available in Java. To ensure security, you need encrypted connections between all Presto nodes.
Obviously, it's better to have dedicated infra for this, so hopefully you can get together and work something out.
The plan is to incorporate #22859 which will add the extraCredentials to the connector session properties. This will be available to the Flight data source when it is processing the split and can use them for authentication. I'll try to add a test for this if I'm able to.
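As a rough illustration of that plan, the sketch below looks up a token in a map standing in for the connector session properties and turns it into a bearer header. The `SessionProperties` alias, the `flight.token` key, and `makeAuthHeader` are all hypothetical; the real property names and session API are whatever #22859 ends up providing.

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for connector session properties.
using SessionProperties = std::map<std::string, std::string>;

// Returns an ("authorization", "Bearer <token>") pair when the assumed
// credential key is present and non-empty, otherwise std::nullopt.
std::optional<std::pair<std::string, std::string>> makeAuthHeader(
    const SessionProperties& props,
    const std::string& tokenKey = "flight.token") {
  auto it = props.find(tokenKey);
  if (it == props.end() || it->second.empty()) {
    return std::nullopt;
  }
  return std::make_pair(std::string("authorization"), "Bearer " + it->second);
}
```

Returning `std::nullopt` for a missing token leaves it to the caller to decide whether an unauthenticated request is acceptable.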
4b83c44 to 49a3234
    }

    bool FlightConfig::serverVerify() {
      return config_->get<bool>(kServerVerify, false);
Changed default to "false" to be consistent with Java
We probably want the default to be true for both Java and C++, otherwise by default there's no protection against server impersonation. Having this off is good for testing, but in production it's much better to leave it on unless there's a VERY good reason not to.
Good point, the default behavior for the Flight client is to raise an error if using TLS and a certificate is provided but verify is disabled, and you need to explicitly disable it when using TLS.
I didn't catch that the Java connector defaults the config to NULL, which translates to true. I think that's confusing and we should not make it nullable. I'll change Java and here to default to true.
updated the Java side in #24518 and changed the default here too
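The effect of a secure default can be sketched with a tiny stand-in config class: when the key is absent, verification stays on, and it is only disabled when explicitly set to false. `SimpleConfig` and the key string `arrow-flight.server.verify` are assumptions made for illustration; only the `kServerVerify` identifier and `serverVerify` name appear in the diff.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical minimal config holder; not the Velox config API.
class SimpleConfig {
 public:
  explicit SimpleConfig(std::map<std::string, std::string> values)
      : values_(std::move(values)) {}

  // Returns defaultValue when the key is missing, otherwise parses "true".
  bool getBool(const std::string& key, bool defaultValue) const {
    auto it = values_.find(key);
    if (it == values_.end()) {
      return defaultValue;
    }
    return it->second == "true";
  }

 private:
  std::map<std::string, std::string> values_;
};

// Assumed key string, for illustration only.
inline constexpr const char* kServerVerify = "arrow-flight.server.verify";

// Verification defaults to true so an unset config is still protected
// against server impersonation.
bool serverVerify(const SimpleConfig& config) {
  return config.getBool(kServerVerify, true);
}
```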
aditi-pandit left a comment
Thanks @BryanCutler. Did a quick round of the code.
    - presto-main/src/main/java/com/facebook/presto/connector/system/SystemTableLayoutHandle.java
    - presto-main/src/main/java/com/facebook/presto/connector/system/SystemTransactionHandle.java
    - presto-spi/src/main/java/com/facebook/presto/spi/function/AggregationFunctionMetadata.java
    - presto-function-namespace-managers-common/src/main/java/com/facebook/presto/functionNamespace/JsonBasedUdfFunctionMetadata.java
Why did JsonBasedUdfFunctionMetadata get dropped ?
must have been my mistake, I'll put it back
    namespace facebook::presto::connector {

    void registerAllPrestoConnectors() {
Nit : Rename registerOptionalConnectors ?
registerConnectors will do as well.
We should also move the existing connector registrations here.
We can do away with the connector namespace since the function name has Connectors.
The file names can also be Registration.h/.cpp since they are in the connectors folder.
presto-native-execution/presto_cpp/main/connectors/Registration.h
Let's move the SystemConnector here as well.
ok sounds good
> Let's move the SystemConnector here as well.
@majetideepak should the SystemConnector be under namespace facebook::presto::connector like ArrowFlightConnector?
Could you clarify if you mean the source files SystemConnector.cpp/h or just the registration?
edit: confirmed moving sources for connectors also
    std::unique_ptr<velox::connector::ColumnHandle>
    ArrowPrestoToVeloxConnector::toVeloxColumnHandle(
        const protocol::ColumnHandle* column,
        const TypeParser& typeParser) const {
Wrap unused parameters in comments, for example /*typeParser*/.
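A minimal sketch of that convention, using a hypothetical `TypeParser` stand-in and function name rather than the actual Presto types:

```cpp
#include <string>

// Hypothetical stand-in for the real TypeParser class.
struct TypeParser {};

// The parameter name is commented out: the signature still documents what
// the argument is, and the compiler does not warn about an unused parameter.
std::string toColumnName(
    const std::string& column,
    const TypeParser& /*typeParser*/) {
  return column;
}
```

This matters when a build treats warnings as errors, since an unused named parameter can then fail the build.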
    std::unique_ptr<TestFlightServer> TestFlightServerTest::server;

    TEST_F(TestFlightServerTest, basicTest) {
Don't understand the purpose of this test. It doesn't seem to be related to the Connector.
I'm not too sure either, it's part of the presto_flight_connector_infra_test tests, which is separate. It starts a server, adds data, then makes its own client to get the flight stream, without using the connector. So it seems to be testing that the test server can serve Flight clients correctly, but that is also covered by the regular connector tests. @Rijin-N could you comment on this test?
The goal here is to run a simple test to be confident that the Test Flight Server is working correctly. Now, any failures that happen in the actual connector tests can be safely assumed to be a problem with the connector and not with the server, potentially making things easier to debug.
This is like the difference between testing each component vs having only integration tests.
    {1, 12, 2, std::numeric_limits<int64_t>::max()})}));

    auto idVec = makeFlatVector<int64_t>(
        {1, 12, 2, std::numeric_limits<int64_t>::max()});
Abstract a variable for {1, 12, 2, std::numeric_limits<int64_t>::max()} so that it can be used in the input and result vector.
    namespace {

    namespace velox = facebook::velox;
Haven't seen this style of shorthand for namespaces in Prestissimo code. Can these be removed ?
    "missing columnHandle for column 'value'");
    }

    TEST_F(FlightConnectorTest, dataSourceTest) {
test names shouldn't have the word "Test" in them. Please fix the names.
0272b7b to 00ace62
Rebased and added additional tests for multiple batches and splits.
aditi-pandit left a comment
@BryanCutler : The code is looking good. I have a few minor comments, but overall I'm satisfied with the code. Will give an approval once Deepak approves all the build-related changes.
    namespace facebook::presto::test {

    template <typename T>
    std::shared_ptr<arrow::Array> makeNumericArray(
Nit : Add an alias for std::shared_ptr<arrow::Array> to reuse
using ArrowArrayPtr = std::shared_ptr<arrow::Array>;
    const arrow::ArrayVector& arrays) {
    VELOX_CHECK_EQ(names.size(), arrays.size());

    auto nrows = (!arrays.empty()) ? (arrays[0]->length()) : 0;
Nit : Rename variable to numRows.
    static const std::string kFlightConnectorId = "test-flight";

    class ArrowFlightConnectorTestBase
Why is this class abstracted ? This class can be combined with the child class below.
I think it was done to have a test base class that doesn't start up a server, but it's not used, so I'll combine them.
Note that in IBM code we extend this class again to run an integration test against an actual Flight Server - this would be more difficult if the two classes are combined.
@ashkrisk we can't add additional classes in open source for the sole purpose of internal testing. The base class without the server only registers the connector, which could easily be done in the internal integration tests.
+1.
    std::mutex mutex_;
    };

    class TestingAuthenticator : public Authenticator {
Nit : Names TestAuthenticator and TestAuthenticatorFactory are better than usage of verbs like Testing in the name.
    -B _build/release \
    -GNinja \
    -DTREAT_WARNINGS_AS_ERRORS=1 \
    -DENABLE_ALL_WARNINGS=1 \
Arrow flight is behind the flag but it isn't turned on here - PRESTO_ENABLE_ARROW_FLIGHT_CONNECTOR
But I doubt we need a complete new job in the first place.
Turn on the flag in the prestocpp-linux-build-and-unit-test.yml and rename the test java file TestPrestoNativeArrowFlightQueries.java. Then it is picked up automatically.
We should add the build flag also to the prestocpp-linux-build.yml
> Arrow flight is behind the flag but it isn't turned on here
I do have the flag -DPRESTO_ENABLE_ARROW_FLIGHT_CONNECTOR=ON enabled here
> rename the test java file TestPrestoNativeArrowFlightQueries.java. Then it is picked up automatically.
This is only if the test is moved to presto-native-execution, right?
    @@ -0,0 +1,427 @@
    /*
These are native queries and so should be in the test suite in presto-native-execution - see my other comment on how to change the workflow to get them run regularly without a new job.
@czentgr : I gave this a thought as well ... But then felt that having native as well as java connector tests in the connector module would actually be a good approach. Both implementations cohesively work and test together.
Are there particular reservations about this ?
All of the Java Flight stuff is under the presto-base-arrow-flight module and only tested as part of the arrow-flight-tests workflow. I also thought about adding the native tests to presto-native-execution, but a lot of things would be needed from presto-base-arrow-flight, which adds a lot to presto-native-execution for just a few tests. So it seemed better to keep it compartmentalized in its own module.
3b3ccfa to a607747
The native Arrow Flight connector can be used to connect to any Arrow Flight enabled Data Source. The metadata layer is handled by the Presto coordinator and does not need to be re-implemented in C++. Any Java connector that inherits from `presto-base-arrow-flight` can use this connector as its counterpart for the Prestissimo layer.
Different Arrow-Flight enabled data sources can differ in authentication styles. A plugin-style interface is provided to handle such cases with custom authentication code by extending `arrow_flight::auth::Authenticator`.
RFC: https://github.com/prestodb/rfcs/blob/main/RFC-0004-arrow-flight-connector.md#prestissimo-implementation
Co-authored-by: Ashwin Kumar
Co-authored-by: Rijin-N
Co-authored-by: Nischay Yadav
This adds e2e tests using the ArrowFlightQueryRunner to run native workers and test queries against an H2 Flight Producer data source. The CI workflow has been updated with additional steps to build a native Presto server with the Arrow Flight connector enabled, and to run the e2e Java tests with queries taken from AbstractTestNativeGeneralQueries that are compatible with the Flight data source.
a607747 to d03d4ea
majetideepak left a comment
All my comments have been addressed. Thanks @BryanCutler
    add_subdirectory(auth)

    add_library(presto_flight_connector_utils INTERFACE Macros.h)
nit: this is only possible from CMake 3.19. Prestissimo minimum CMake version requirement is 3.10. We should bump this up to 3.28 and align with Velox.
    }
    }

    public static Map<String, String> getNativeWorkerSystemProperties()
@BryanCutler : Can we use the functions from PrestoNativeQueryRunnerUtils for getting the properties and external worker launcher instead of repeating the code here ?
That would mean adding a dependency on presto-native-execution, but that should be fine right ?
I thought about that too. The problem is that PrestoNativeQueryRunnerUtils.getExternalWorkerLauncher is set up to use the hive catalog, among other things, so it can't be used as-is. Some refactoring would be needed to make it usable by both runners. But there really isn't much code that is copied, since here we are setting specific properties for the native Arrow connector, so I don't know if it's really worth trying to reuse this function.
@BryanCutler : I see... That's fair for getExternalWorkerLauncher.
How about getNativeWorkerSystemProperties ? It would be good to use the one in PrestoNativeQueryRunnerUtils so that we don't have to keep the 2 methods in sync.
We could do that as a follow up though.
aditi-pandit left a comment
Thanks @BryanCutler


Description
Add Prestissimo support for Apache Arrow Flight connectors.
The native Arrow Flight connector can be used to connect to any Arrow Flight enabled Data Source. The metadata layer is handled by the Presto coordinator and does not need to be re-implemented in C++. Any Java connector that inherits from `presto-base-arrow-flight` can use this connector as its counterpart for the Prestissimo layer.
Different Arrow-Flight enabled data sources can differ in authentication styles. A plugin-style interface is provided to handle such cases with custom authentication code by extending `arrow_flight::auth::Authenticator`.
Motivation and Context
RFC: https://github.com/prestodb/rfcs/blob/main/RFC-0004-arrow-flight-connector.md#prestissimo-implementation
Continues from #24082
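To make the plugin-style authentication idea in the description more concrete, here is a minimal, self-contained sketch of how such an interface could be shaped. It is not the PR's actual `arrow_flight::auth::Authenticator` API: the `sketch` namespace, the `Headers` alias, the method signature, and the registry are all hypothetical, and plain string maps stand in for Arrow Flight call headers.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

namespace sketch {

// Hypothetical stand-in for the call headers sent to a Flight server.
using Headers = std::map<std::string, std::string>;

// Hypothetical plugin interface: each implementation decides how to
// decorate outgoing calls with credentials.
class Authenticator {
 public:
  virtual ~Authenticator() = default;
  virtual void authenticate(Headers& headers) const = 0;
};

// Default behavior: no authentication at all.
class NoOpAuthenticator : public Authenticator {
 public:
  void authenticate(Headers& /*headers*/) const override {}
};

// Example custom plugin: attaches a fixed bearer token.
class StaticTokenAuthenticator : public Authenticator {
 public:
  explicit StaticTokenAuthenticator(std::string token)
      : token_(std::move(token)) {}

  void authenticate(Headers& headers) const override {
    headers["authorization"] = "Bearer " + token_;
  }

 private:
  std::string token_;
};

using AuthenticatorFactory = std::function<std::unique_ptr<Authenticator>()>;

// Hypothetical registry that maps a configured name to a factory.
class AuthenticatorRegistry {
 public:
  void registerFactory(const std::string& name, AuthenticatorFactory factory) {
    factories_[name] = std::move(factory);
  }

  std::unique_ptr<Authenticator> create(const std::string& name) const {
    auto it = factories_.find(name);
    if (it == factories_.end()) {
      throw std::invalid_argument("unknown authenticator: " + name);
    }
    return it->second();
  }

 private:
  std::map<std::string, AuthenticatorFactory> factories_;
};

} // namespace sketch
```

A specialized connector would register its own factory under a name and select it through configuration, which keeps server-specific authentication code out of the base connector.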
Impact
Arrow Flight based connector will be supported in Prestissimo.
Test Plan
Unit Tests set up a testing Arrow Flight server and exchange data using the new connector.
Contributor checklist
Release Notes