-
Notifications
You must be signed in to change notification settings - Fork 5.5k
[native] Add Arrow Flight Connector #24082
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
This PR requires a velox support PR facebookincubator/velox#11588, to install Arrow Flight dependencies. |
|
Saved that user @Rijin-N is from IBM |
presto-native-execution/Makefile
Outdated
| CMAKE_FLAGS += -DENABLE_ALL_WARNINGS=${ENABLE_WALL} | ||
| CMAKE_FLAGS += -DCMAKE_PREFIX_PATH=$(CMAKE_PREFIX_PATH) | ||
| CMAKE_FLAGS += -DCMAKE_BUILD_TYPE=$(BUILD_TYPE) | ||
| CMAKE_FLAGS += -DPRESTO_ENABLE_ARROW_FLIGHT_CONNECTOR=$(PRESTO_ENABLE_ARROW_FLIGHT_CONNECTOR) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should stop adding these and instead use EXTRA_CMAKE_FLAGS externally when invoking make
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
presto-native-execution/presto_cpp/main/connectors/arrow_flight/ArrowFlightConnector.cpp
Outdated
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/connectors/arrow_flight/ArrowFlightConnector.cpp
Outdated
Show resolved
Hide resolved
|
|
||
| CallOptionsAddHeaders callOptsAddHeaders{}; | ||
| FlightCallOptions& callOpts = callOptsAddHeaders; | ||
| AddCallHeaders& headerWriter = callOptsAddHeaders; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can just add headers directly to the FlightCallOptions, you don't need to use the AddCallHeaders interface
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea here was to have an append-only interface for adding headers to the FlightCallOptions.
Without this, the Authenticator interface would need to accept a mutable reference to FlightCallOptions instead, which theoretically allows the authenticator to modify other parts of the FlightCallOptions, which is not desirable.
It's also less confusing when creating Authenticator implementations, since you don't need to worry about which parts of the FlightCallOptions should or should not be changed.
presto-native-execution/presto_cpp/main/connectors/arrow_flight/ArrowFlightConnector.cpp
Outdated
Show resolved
Hide resolved
aditi-pandit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some common themes
i) Please add a copyright header to all the new files added in this PR.
ii) Please follow the test case naming as per conventions in https://github.com/facebookincubator/velox/blob/main/CODING_STYLE.md#naming-conventions
| set_property(TARGET presto_server_lib PROPERTY JOB_POOL_LINK | ||
| presto_link_job_pool) | ||
|
|
||
| if(PRESTO_ENABLE_ARROW_FLIGHT_CONNECTOR) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These lines should be in presto_cpp/main/connectors/arrow_flight/CMakeLists.txt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| std::make_unique<IcebergPrestoToVeloxConnector>("iceberg")); | ||
| registerPrestoToVeloxConnector( | ||
| std::make_unique<TpchPrestoToVeloxConnector>("tpch")); | ||
| #ifdef PRESTO_ENABLE_ARROW_FLIGHT_CONNECTOR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous entries are for the base Velox connectors that we will mostly add by default.
Like in other parts of this code, we should add this registration in ArrowFlightConnector.cpp/.hpp if possible.
Another option is to add a ConnectorRegistration.h(cpp) file in presto_cpp/main/connectors folder that includes all the registrations of connectors in its sub-directories and only call that function from here.
e.g say for Windowfunctions in prestosql there is https://github.com/facebookincubator/velox/blob/main/velox/functions/prestosql/window/WindowFunctionsRegistration.h that has all individual function registrations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| return std::make_unique<protocol::tpch::TpchConnectorProtocol>(); | ||
| } | ||
|
|
||
| #ifdef PRESTO_ENABLE_ARROW_FLIGHT_CONNECTOR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic should be moved to a file in presto_cpp/main/arrow_flight (ArrowPrestoToVeloxConnector.h/.cpp). The registration logic should be part of ConnectorRegistration file proposed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| for (const auto& protocolSplit : source.splits) { | ||
| auto split = toVeloxSplit(protocolSplit); | ||
| if (split.hasConnectorSplit()) { | ||
| // Since the extra credential data is coming from TaskUpdateRequest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There shouldn't be such special case code in TaskManager.cpp.
The more appropriate way to do this is to add the extraCredentials as a parameter in the toVeloxSplit logic and move this specific logic over to the ArrowToPrestoVeloxConnector::toVeloxSplit method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
|
||
| std::unique_ptr<StaticFlightServer> StaticFlightServerTest::server; | ||
|
|
||
| TEST_F(StaticFlightServerTest, ServerTest) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test names start with lower case. Maybe you can call this test "basic" inline with other tests in the code-base.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| std::unique_ptr<StaticFlightServer> StaticFlightServerTest::server; | ||
|
|
||
| TEST_F(StaticFlightServerTest, ServerTest) { | ||
| auto sampleTable = makeArrowTable( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its important to add other types as well... Would be great to demo all the scalar and complex types that are supported by this API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The StaticFlightServer is just a thin wrapper around the basic Flight Server provided by the Arrow Flight SDK. Testing this server against specific datatypes is something better reserved for the unit tests in the Arrow C++ SDK. The test here is just to make sure that our wrapper is working and we're able to perform simple operations.
Note that the StaticFlightServer isn't actually used in the Flight connector code - it is only used in other unit tests for validating the Flight connector against a "real" server.
That said, we do test for all the different types supported by the Flight connector in ArrowFlightConnectorDataTypeTest.cpp. These test the full flow of data from the StaticFlightServer to the connector and out. We're testing with ints, reals, string types, date and time types etc.
| {FlightConfig::kServerVerify, "true"}})) {} | ||
| }; | ||
|
|
||
| TEST_F(FlightConnectorTlsNoCertTest, TlsNoCertTest) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change test names to start with lowercase letters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| FlightConnectorTlsNoCertTest() | ||
| : FlightConnectorTlsTestBase(std::make_shared<velox::config::ConfigBase>( | ||
| std::unordered_map<std::string, std::string>{ | ||
| {FlightConfig::kServerVerify, "true"}})) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this is always a bad config ? Can we error this at the outset itself ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't necessarily a bad config. We could be dealing with servers which use certificates signed by trusted CAs, making additional certificates unnecessary.
In our case, this is a purely negative test because our test server is set up with a self-signed certificate, and it's impractical to acquire a certificate signed by a trusted CA just for this purpose.
| .assertResults(makeRowVector({decimalVecBigInt})); | ||
| } | ||
|
|
||
| TEST_F(FlightConnectorDataTypeTest, GeneralTest) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename "allTypes"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
e51be47 to
9f48cec
Compare
The native Arrow Flight connector can be used to connect to any Arrow Flight enabled Data Source. The metadata layer is handled by the Presto coordinator and does not need to be re-implemented in C++. Any Java connector that inherits from `presto-base-arrow-flight` can use this connector as it's counterpart for the Prestissimo layer. Different Arrow-Flight enabled data sources can differ in authentication styles. A plugin-style interface is provided to handle such cases with custom authentication code. RFC: https://github.com/prestodb/rfcs/blob/main/RFC-0004-arrow-flight-connector.md#prestissimo-implementation Co-authored-by: Ashwin Kumar <[email protected]> Co-authored-by: Rijin-N <[email protected]> Co-authored-by: Nischay Yadav <[email protected]>
9f48cec to
bd02070
Compare
| } | ||
|
|
||
| RowVectorPtr FlightDataSource::projectOutputColumns( | ||
| std::shared_ptr<arrow::RecordBatch> input) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be const&.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| using namespace velox; | ||
| using namespace velox::connector; | ||
|
|
||
| // wrapper for CallOptions which doesn't add any members variables |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: when leaving comments make sure to fllow the comment guidelines
- must be complete sentence (period at end and usually capitalization)
etc
This occurs multiple times in a few files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| uint64_t completedRows_ = 0; | ||
| uint64_t completedBytes_ = 0; | ||
| std::shared_ptr<auth::Authenticator> authenticator_; | ||
| velox::memory::MemoryPool* pool_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: can be velox::memory::MemoryPool* const pool_;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| explicit ArrowFlightConnectorFactory(const char* name) | ||
| : ConnectorFactory{name} {} | ||
|
|
||
| ArrowFlightConnectorFactory(const char* name, const char* authenticatorName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use std::string_view instead of const char* for the authenticatorName?
Why do we need this constructor in the first place? It doesn't seem to be used at all.
The authenticator comes from the FlightConfig. Assuming there is a different Authenticator implementation why would it not be named/configured in the FlightConfig and instead be hard coded at registration time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this constructor in the first place?
The idea here is to be able to register different connector names which use the same Flight connector code, but use different Authenticators. This would avoid the need to have two separate config properties to fully qualify the constructor; instead of specifying connector.name and arrow-flight.authenticator.name, you simply specify connector.name and are good to go.
This would tie in a little better with the Java version, where authentication needs to be handled by subclasses of presto-base-arrow-flight, each of which are independent connectors.
Also note that the config arrow-flight.authenticator.name is not present in the base Java Flight connector. A concrete connector extension would need to introduce this config explicitly to be able to switch between between C++ Authenticators within the same connector.
https://github.com/prestodb/presto/pull/23032/files#diff-9b8ea7947728279eacfeb69fc0492cb2390c3dd65897bc9148ecf1d021d37332
| target_link_libraries( | ||
| presto_flight_connector | ||
| velox_connector | ||
| arrow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is arrow referring to here? It doesn't look like a target but something that cam from PkgConfig? How does this differ to ${ArrowFlight_LIBRARIES}?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Removed arrow, Actually ${ArrowFlight_LIBRARIES} are enough here.
|
|
||
| #define AFC_REGISTER_AUTH_FACTORY(factory) \ | ||
| namespace { \ | ||
| static bool FB_ANONYMOUS_VARIABLE(g_ConnectorFactory) = ::facebook::presto:: \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not understanding the reason for keeping this static bool around - to indicate a successful registration of NoOpAuthenticatorFactory. Once the factory is registered it is in the list. What's the point of having the static bool of the fact that it was registered stored?
The check that the registration was successful was already done (via VELOX_CHECK). Nothing is checking this bool again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This variable is not meant to be used anywhere, which is why it's marked as an FB_ANONYMOUS_VARIABLE. This only reason this variable exists is to ensure that it is statically initialized at runtime, and thus ensure that the Authenticator in question is statically registered.
If C++ had static blocks similar to Java, we could have used that instead.
| : Connector{id}, | ||
| flightConfig_{std::make_shared<FlightConfig>(config)}, | ||
| authenticator_{ | ||
| // auth::getAuthenticatorFactory(config_->authenticatorName()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
| .assertResults(makeRowVector({boolVec})); | ||
| } | ||
|
|
||
| TEST_F(FlightConnectorDataTypeTest, IntType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better name: integerTypes.
Please see the other comment on naming for the tests (lowerCase) and maybe fix some of the names along the way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| protected: | ||
| explicit FlightConnectorTestBase( | ||
| std::shared_ptr<velox::config::ConfigBase> config) | ||
| : config_{config} {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::move(config)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| install_abseil | ||
| install_grpc | ||
|
|
||
| wget_and_untar https://archive.apache.org/dist/arrow/arrow-${ARROW_VERSION}/apache-arrow-${ARROW_VERSION}.tar.gz arrow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets use the source from github https://github.com/apache/arrow/archive/apache-arrow-15.0.0.tar.gz. apache.org can have issues with downloads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
steveburnett
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not find any documentation for me to review.
Should "Arrow Flight connector" be added to the Supported Use Cases topic in the Presto C++ documentation?
|
Hi @majetideepak @BryanCutler @aditi-pandit @czentgr , We have addressed or responded with our reasoning to all the comments you provided. Could you please review and let us know if there is anything else that needs to be done. Thanks. Note: This format checker is working fine locally for us. We're not sure why it is failing in this PR. |
steveburnett
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the doc! Some suggestions for the data/README.md text.
presto-native-execution/presto_cpp/main/connectors/arrow_flight/tests/data/README.md
Outdated
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/connectors/arrow_flight/tests/data/README.md
Outdated
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/connectors/arrow_flight/tests/data/README.md
Outdated
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/connectors/arrow_flight/tests/data/README.md
Outdated
Show resolved
Hide resolved
czentgr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please rebase too.
|
|
||
| std::optional<RowVectorPtr> FlightDataSource::next( | ||
| uint64_t size, | ||
| velox::ContinueFuture& future) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and in the declaration comment out the unused variable name with new name unused to only leave the type signature.
/* unused */
Doesn't seem like it is used so lets indicate it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| explicit ArrowFlightConnectorFactory( | ||
| const char* name, | ||
| const char* authenticatorName = nullptr) | ||
| : ConnectorFactory{name}, authenticatorName_{authenticatorName} {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Lets use () instead of {} for the initializer list for consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
Hi @Rijin-N @ashkrisk based on the discussion we had with Shweta and Sandhya, we are gonna take over this work, @BryanCutler will help move this PR over the finishing line. We keep everyone who worked on this PR as co-authors. Thank you. cc @tdcmeehan |
|
Hi @ethanyzhang, We have addressed all the comments so far. You can take over from here. |
steveburnett
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! (docs)
Used "View file" in GitHub to view the .md files as they will be published. Looks good, thank you!
|
Superseded by #24504 |
Description
Add Prestissimo support for Arrow Flight connectors.
The native Arrow Flight connector can be used to connect to any Arrow Flight enabled Data Source. The metadata layer is handled by the Presto coordinator and does not need to be re-implemented in C++. Any Java connector that inherits from
presto-base-arrow-flightcan use this connector as it's counterpart for the Prestissimo layer.Different Arrow-Flight enabled data sources can differ in authentication styles. A plugin-style interface is provided to handle such cases with custom authentication code.
Motivation and Context
RFC: https://github.com/prestodb/rfcs/blob/main/RFC-0004-arrow-flight-connector.md#prestissimo-implementation
Impact
Arrow Flight based connectors can be used with Prestissimo
Test Plan
Unit Tests set up a mock arrow flight server and exchange data using the new connector.
Contributor checklist
a GitHub Issuean RFC is referenced.Release Notes
Please follow release notes guidelines and fill in the release notes below.