-
Notifications
You must be signed in to change notification settings - Fork 1.5k
refactor: Introduce ConnectorObjectFactory and HiveObjectFactory #13798
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,27 @@ | ||
| /* | ||
| * Copyright (c) Facebook, Inc. and its affiliates. | ||
| * | ||
| * Licensed under the Apache License, Version 2.0 (the "License"); | ||
| * you may not use this file except in compliance with the License. | ||
| * You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, software | ||
| * distributed under the License is distributed on an "AS IS" BASIS, | ||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| * See the License for the specific language governing permissions and | ||
| * limitations under the License. | ||
| */ | ||
| #pragma once | ||
|
|
||
| namespace facebook::velox::connector { | ||
|
|
||
| // TODO: Add a demo connector plugin | ||
|
|
||
| constexpr const char* kFuzzerConnectorName = "fuzzer"; | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why do we have these listed here? Do we expect to add all connectors here?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
@mbasmanova We need the connector names in the common layer, because the ConnectorFactory or Connector registries are all keyed by the connector name. All users in outside modules shall be talking with the connector common layer only, instead of refering to the name in specific connectors like I believe this is the same idea why dwio::common::FileFormat was introduced. With the FileFormat enum in dwio::common, users outside of dwio module can just call If you don't like putting them in another file ConnectorNames.h, we can put them as an enum in Connector.h.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @mbasmanova Btw I made refactor: Introduce ConnectorNames.h in velox/connectors a separate commit in https://github.com/facebookincubator/velox/pull/14687/commits. It can also be a simple PR by itself. Do you want me to do that? |
||
| constexpr const char* kHiveConnectorName = "hive"; | ||
| constexpr const char* kIcebergConnectorName = "iceberg"; | ||
| constexpr const char* kTpchConnectorName = "tpch"; | ||
|
|
||
| } // namespace facebook::velox::connector | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,61 @@ | ||
| /* | ||
| * Copyright (c) Facebook, Inc. and its affiliates. | ||
| * | ||
| * Licensed under the Apache License, Version 2.0 (the "License"); | ||
| * you may not use this file except in compliance with the License. | ||
| * You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, software | ||
| * distributed under the License is distributed on an "AS IS" BASIS, | ||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| * See the License for the specific language governing permissions and | ||
| * limitations under the License. | ||
| */ | ||
|
|
||
| #include "velox/connectors/ConnectorObjectFactory.h" | ||
|
|
||
| namespace facebook::velox::connector { | ||
|
|
||
| ConnectorObjectFactory::~ConnectorObjectFactory() = default; | ||
|
|
||
| std::unordered_map<std::string, std::shared_ptr<ConnectorObjectFactory>>& | ||
| connectorObjectFactories() { | ||
| static std:: | ||
| unordered_map<std::string, std::shared_ptr<ConnectorObjectFactory>> | ||
| factories; | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. factories -> kFactories |
||
| return factories; | ||
| } | ||
|
|
||
| bool registerConnectorObjectFactory( | ||
| std::shared_ptr<ConnectorObjectFactory> factory) { | ||
| bool ok = | ||
| connectorObjectFactories().insert({factory->name(), factory}).second; | ||
| VELOX_CHECK( | ||
| ok, | ||
| "ConnectorObjectFactory with name '{}' is already registered", | ||
| factory->name()); | ||
| return true; | ||
| } | ||
|
|
||
| bool hasConnectorObjectFactory(const std::string& connectorName) { | ||
| return connectorObjectFactories().count(connectorName) == 1; | ||
| } | ||
|
|
||
| bool unregisterConnectorObjectFactory(const std::string& connectorName) { | ||
| auto count = connectorObjectFactories().erase(connectorName); | ||
| return count == 1; | ||
| } | ||
|
|
||
| std::shared_ptr<ConnectorObjectFactory> getConnectorObjectFactory( | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Are these missing from the .h file?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @mbasmanova Thanks for the nice catch. I'm so sorry that this commit was indeed incomplete, and I have added the missing signatures in the .h in this PR. The reason why they were missing was that I was thinking to replace this commit with misc: Enhance ConnectorFactory interface , which does not need a separate ConnectorObjectFactory. Could you please take a look to see which way is preferred? If you prefer the other commit, I can pick it over here. |
||
| const std::string& connectorName) { | ||
| auto it = connectorObjectFactories().find(connectorName); | ||
| VELOX_CHECK( | ||
| it != connectorObjectFactories().end(), | ||
| "ConnectorObjectFactory with name '{}' not registered", | ||
| connectorName); | ||
| return it->second; | ||
| } | ||
|
|
||
| } // namespace facebook::velox::connector | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,91 @@ | ||
| /* | ||
| * Copyright (c) Facebook, Inc. and its affiliates. | ||
| * | ||
| * Licensed under the Apache License, Version 2.0 (the "License"); | ||
| * you may not use this file except in compliance with the License. | ||
| * You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, software | ||
| * distributed under the License is distributed on an "AS IS" BASIS, | ||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| * See the License for the specific language governing permissions and | ||
| * limitations under the License. | ||
| */ | ||
| #pragma once | ||
|
|
||
| #include <shared_mutex> | ||
| #include <string> | ||
|
|
||
| #include "velox/connectors/Connector.h" | ||
|
|
||
| namespace facebook::velox::connector { | ||
|
|
||
| class ConnectorObjectFactory { | ||
| public: | ||
| ConnectorObjectFactory(const std::string& name) : name_(name) {} | ||
|
|
||
| virtual ~ConnectorObjectFactory(); | ||
|
|
||
| const std::string& name() const { | ||
| return name_; | ||
| } | ||
|
|
||
| virtual std::shared_ptr<ConnectorSplit> makeSplit( | ||
| const std::string& connectorId, | ||
| const folly::dynamic& options = {}) const { | ||
| VELOX_UNSUPPORTED("ConnectorSplit not supported by connector", connectorId); | ||
| } | ||
|
|
||
| virtual std::shared_ptr<connector::ColumnHandle> makeColumnHandle( | ||
| const std::string& connectorId, | ||
| const std::string& name, | ||
| const TypePtr& type, | ||
| const folly::dynamic& options = {}) const { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @yingsu00 Ying, if the goal is to make tests build with hive disabled, then, why can't we just fix cmake to not build tests that depend on hive when hive is disabled? Are you concerned that this will disable virtually all tests? Why is this a problem? Do you believe that many tests do not need to depend on Hive? If so, can they be re-written to use Values nodes instead of table scan? I know that at least some depends do have hard dependency on Hive because they check filter pushdown behavior. I don't think we can expect all connectors to support that. Using folly::dynamic to specify connector properties in tests is not ideal. Wondering if that what @Yuhta referred to as "dynamic dispatch". It would be easy to miss a property or mistype its name or value and the error will only appear at runtime. How are you thinking about this? Can we provide an example test? |
||
| VELOX_UNSUPPORTED( | ||
| "connector::ColumnHandle not supported by connector", connectorId); | ||
| } | ||
|
|
||
| virtual std::shared_ptr<ConnectorTableHandle> makeTableHandle( | ||
| const std::string& connectorId, | ||
| const std::string& tableName, | ||
| std::vector<ColumnHandlePtr> columnHandles, | ||
| const folly::dynamic& options) const { | ||
| VELOX_UNSUPPORTED( | ||
| "ConnectorTableHandle not supported by connector", connectorId); | ||
| } | ||
|
|
||
| virtual std::shared_ptr<ConnectorInsertTableHandle> makeInsertTableHandle( | ||
| const std::string& connectorId, | ||
| std::vector<ColumnHandlePtr> inputColumns, | ||
| std::shared_ptr<const ConnectorLocationHandle> locationHandle, | ||
| const folly::dynamic& options = {}) const { | ||
| VELOX_UNSUPPORTED( | ||
| "ConnectorInsertTableHandle not supported by connector", connectorId); | ||
| } | ||
|
|
||
| virtual std::shared_ptr<ConnectorLocationHandle> makeLocationHandle( | ||
| const std::string& connectorId, | ||
| ConnectorLocationHandle::TableType tableType = | ||
| ConnectorLocationHandle::TableType::kNew, | ||
| const folly::dynamic& options = {}) const { | ||
| VELOX_UNSUPPORTED( | ||
| "ConnectorLocationHandle not supported by connector", connectorId); | ||
| } | ||
|
|
||
| private: | ||
| const std::string name_; | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There is no ctor. How will this be set? |
||
| }; | ||
|
|
||
| bool registerConnectorObjectFactory( | ||
| std::shared_ptr<ConnectorObjectFactory> factory); | ||
|
|
||
| bool hasConnectorObjectFactory(const std::string& connectorName); | ||
|
|
||
| bool unregisterConnectorObjectFactory(const std::string& connectorName); | ||
|
|
||
| std::shared_ptr<ConnectorObjectFactory> getConnectorObjectFactory( | ||
| const std::string& connectorName); | ||
|
|
||
| } // namespace facebook::velox::connector | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -17,6 +17,7 @@ | |
|
|
||
| #include "velox/common/compression/Compression.h" | ||
| #include "velox/connectors/Connector.h" | ||
| #include "velox/connectors/ConnectorNames.h" | ||
| #include "velox/connectors/hive/HiveConfig.h" | ||
| #include "velox/connectors/hive/PartitionIdGenerator.h" | ||
| #include "velox/connectors/hive/TableHandle.h" | ||
|
|
@@ -31,24 +32,18 @@ class LocationHandle; | |
| using LocationHandlePtr = std::shared_ptr<const LocationHandle>; | ||
|
|
||
| /// Location related properties of the Hive table to be written. | ||
| class LocationHandle : public ISerializable { | ||
| class LocationHandle : public ConnectorLocationHandle { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. unrelated change? |
||
| public: | ||
| enum class TableType { | ||
| /// Write to a new table to be created. | ||
| kNew, | ||
| /// Write to an existing table. | ||
| kExisting, | ||
| }; | ||
|
|
||
| LocationHandle( | ||
| std::string targetPath, | ||
| std::string writePath, | ||
| TableType tableType, | ||
| std::string targetFileName = "") | ||
| : targetPath_(std::move(targetPath)), | ||
| std::string targetFileName = "", | ||
| const std::string& connectorId = kHiveConnectorName) | ||
| : ConnectorLocationHandle(connectorId, tableType), | ||
| targetPath_(std::move(targetPath)), | ||
| targetFileName_(std::move(targetFileName)), | ||
| writePath_(std::move(writePath)), | ||
| tableType_(tableType) {} | ||
| writePath_(std::move(writePath)) {} | ||
|
|
||
| const std::string& targetPath() const { | ||
| return targetPath_; | ||
|
|
@@ -62,16 +57,12 @@ class LocationHandle : public ISerializable { | |
| return writePath_; | ||
| } | ||
|
|
||
| TableType tableType() const { | ||
| return tableType_; | ||
| } | ||
| std::string toString() const override; | ||
|
|
||
| std::string toString() const; | ||
| folly::dynamic serialize() const override; | ||
|
|
||
| static void registerSerDe(); | ||
|
|
||
| folly::dynamic serialize() const override; | ||
|
|
||
| static LocationHandlePtr create(const folly::dynamic& obj); | ||
|
|
||
| static const std::string tableTypeName(LocationHandle::TableType type); | ||
|
|
@@ -85,8 +76,6 @@ class LocationHandle : public ISerializable { | |
| const std::string targetFileName_; | ||
| // Staging directory path. | ||
| const std::string writePath_; | ||
| // Whether the table to be written is new, already existing or temporary. | ||
| const TableType tableType_; | ||
| }; | ||
|
|
||
| class HiveSortingColumn : public ISerializable { | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not add = default right here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like an inline {} implementation for a virtual destructor, =default() in headers is also an inline definition of the destructor. With an inline virtual destructor, every translation unit that includes the header can emit its own destructor body, vtable, RTTI, etc. This is bad for a widely used common interface, because it may be consumed by multiple connectors or plugins, and each of them would contain one copy of the destructor and vtable. This means any change in the base ConnectorLocationHandle destructor would cause all connectors to be re-compiled. It could also make the build time slower, becasue even if the linker discards duplicates, intermediate .o files are bigger and link times are slower. It may introduce more problems if we enable dynamic linking in the future, like cross .so library cast may fail.
If we have the destructor definition in the cpp file, then changes to its implementation won't cause the header to change, and all downstream TUs don't need to be re-compiled, but just re-linked. It also forces a single vtable and type_info even across library boundaries. Even though we are doing static linking now, it would help improve link time.
So it's better to put all virtual destructor definitions in Connector.cpp. I intended to do it if this PR can get merged. There are some other minor cleanups needed in Connector.h/.cpp, e.g. some structs have toString(), some not. Some structs has connectorId_, some not. Then it'll be desirable to make the interface versioned but that's a future topic.