[Velox] Table writer 2: Add write protocol interface#2845
gggrace14 wants to merge 2 commits into facebookincubator:main
✅ Deploy Preview for meta-velox canceled.
@gggrace14 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
91537bb to 284a9fd
3f78584 to 27f1fe7
mbasmanova left a comment
@gggrace14 Some questions and comments.
velox/connectors/WriteProtocol.h
Outdated
naming: getCommitStrategy -> commitStrategy
velox/connectors/WriteProtocol.h
Outdated
Any particular reason this API doesn't take CommitStrategy? It would be more natural to specify both CommitStrategy and WriteProtocol when registering as opposed to fetching CommitStrategy from the protocol.
Okay. This function used to take CommitStrategy together with the WriteProtocol. Changing it back to that more natural way.
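For readers following the thread, the registration shape being discussed could look roughly like the sketch below: the caller supplies both the CommitStrategy and a factory for the WriteProtocol, and the registry never asks the protocol for its strategy. Everything here is an assumption made for illustration (the enum values, `registerWriteProtocol`, `getWriteProtocol`, `NoCommitProtocol`, and the factory type); it is not copied from the PR diff.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-ins; names and enum values are illustrative only.
enum class CommitStrategy { kNoCommit, kTaskCommit };

class WriteProtocol {
 public:
  virtual ~WriteProtocol() = default;
  virtual std::string name() const = 0;
};

using WriteProtocolFactory = std::function<std::shared_ptr<WriteProtocol>()>;

// Function-local static avoids static-initialization-order problems.
inline std::map<CommitStrategy, WriteProtocolFactory>& registry() {
  static std::map<CommitStrategy, WriteProtocolFactory> factories;
  return factories;
}

// The caller names the strategy explicitly. Returns false if a protocol is
// already registered for that strategy; the existing entry is kept.
inline bool registerWriteProtocol(
    CommitStrategy strategy,
    WriteProtocolFactory factory) {
  assert(factory != nullptr);
  return registry().emplace(strategy, std::move(factory)).second;
}

// Creates a protocol instance for the strategy, or throws if none exists.
inline std::shared_ptr<WriteProtocol> getWriteProtocol(CommitStrategy strategy) {
  auto it = registry().find(strategy);
  if (it == registry().end()) {
    throw std::out_of_range("No WriteProtocol registered for this CommitStrategy");
  }
  return it->second();
}

// Minimal concrete protocol used only to exercise the registry.
class NoCommitProtocol : public WriteProtocol {
 public:
  std::string name() const override {
    return "no-commit";
  }
};
```

Keeping the strategy as an explicit argument means a protocol class does not need a strategy accessor just to be registered, which is the design point raised in the review comment.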
velox/connectors/WriteProtocol.h
Outdated
No need to define this again. The base implementation is the same.
I still see this method here. I assume it can be removed, no?
velox/connectors/WriteProtocol.h
Outdated
Would you document that it is valid to return nullptr?
velox/connectors/hive/CMakeLists.txt
Outdated
Any particular reason not to include HiveWriteProtocol.cpp in velox_hive_connector? Why "OBJECT"?
Revising. I'd like to include it in velox_hive_connector, which is easier. I had just seen that HivePartitionFunction.cpp was in a separate item.
velox/connectors/CMakeLists.txt
Outdated
Any particular reason not to add WriteProtocol.cpp to velox_connector?
Revising. Adding it to velox_connector makes it easier.
Would you document this constructor and explain when the caller should specify writeFileName and writeDirectory? Should there be some checks based on updateMode?
It would really be nice to document this constructor. Otherwise, it will be hard for future readers / users of the API to understand how to use it properly.
Sorry, missed this comment previously. Moving the documentation from the data members to this constructor. The relation between the write and target locations and updateMode might not be straightforward; it is determined by a concrete implementation of WriteProtocol::getWriterParameters(). So we might not want to have checks on updateMode.
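As an illustration of the kind of constructor documentation requested here, a sketch follows. Only updateMode, writeFileName, writeDirectory, and the notion of a target location come from the thread; the class name, the enum values, the parameter order, and the rule that an empty write location falls back to the target are all assumptions made for this example.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical sketch; enum values and defaulting rules are assumptions.
enum class UpdateMode { kNew, kAppend, kOverwrite };

class WriterParametersSketch {
 public:
  /// @param updateMode How the write relates to data already at the target.
  /// @param targetDirectory Directory the file lives in once committed.
  /// @param targetFileName File name once committed. Must not be empty.
  /// @param writeDirectory Directory written to before commit. Leave empty
  /// to write straight into targetDirectory.
  /// @param writeFileName File name used before commit. Leave empty to use
  /// targetFileName.
  /// No check ties updateMode to the write/target pair: the concrete
  /// protocol that builds these parameters decides that relation.
  WriterParametersSketch(
      UpdateMode updateMode,
      std::string targetDirectory,
      std::string targetFileName,
      std::string writeDirectory = "",
      std::string writeFileName = "")
      : updateMode_(updateMode),
        targetDirectory_(std::move(targetDirectory)),
        targetFileName_(std::move(targetFileName)),
        writeDirectory_(
            writeDirectory.empty() ? targetDirectory_
                                   : std::move(writeDirectory)),
        writeFileName_(
            writeFileName.empty() ? targetFileName_
                                  : std::move(writeFileName)) {
    assert(!targetFileName_.empty());
  }

  UpdateMode updateMode() const {
    return updateMode_;
  }

  // Full path the writer writes to before commit.
  std::string writePath() const {
    return writeDirectory_ + "/" + writeFileName_;
  }

  // Full path the file should have after commit.
  std::string targetPath() const {
    return targetDirectory_ + "/" + targetFileName_;
  }

 private:
  // Declaration order matters: the write members are initialized from the
  // target members above them.
  UpdateMode updateMode_;
  std::string targetDirectory_;
  std::string targetFileName_;
  std::string writeDirectory_;
  std::string writeFileName_;
};
```

Putting the explanation on the constructor, as the reply proposes, keeps the "when to specify which argument" guidance at the one place callers have to read.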
velox/exec/TableWriter.h
Outdated
Why do we need to expose these?
b827fee to 3fd5827
mbasmanova left a comment
@gggrace14 Looks good to me % some questions.
Nice. Would you add a comment explaining this config?
a3f72b8 to 697e52d
Pass LocationHandle to HiveInsertTableHandle, and pass HiveInsertTableHandle instead of an actual write file name to HiveDataSink. This lets Velox callers pass more info to the HiveDataSink used by TableWriter, and so gets TableWriter ready for more flexible write and commit strategies.
Allow multiple writers in HiveDataSink, to make HiveDataSink and TableWriter ready for partitioned table writing. For now we keep only one writer.
Also make HiveDataSink generate a random file name internally, rather than relying on Velox callers to pass in a file name. Users could extend this behavior with WriteProtocols, support for which will come next.
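To make the "generate a random file name internally" step concrete, here is one possible sketch using only the C++ standard library. The helper name `makeRandomFileName`, the hex alphabet, and the default length are assumptions for illustration; the commit message does not say how HiveDataSink actually builds the name.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>

// Hypothetical helper: returns a random lowercase-hex name of `length`
// characters. Not the mechanism used by the PR, which the source does not show.
inline std::string makeRandomFileName(std::size_t length = 32) {
  assert(length > 0);
  static const char kHexDigits[] = "0123456789abcdef";
  // thread_local so concurrent writers do not share generator state.
  thread_local std::mt19937_64 generator{std::random_device{}()};
  std::uniform_int_distribution<int> digit(0, 15);

  std::string name;
  name.reserve(length);
  for (std::size_t i = 0; i < length; ++i) {
    name.push_back(kHexDigits[digit(generator)]);
  }
  return name;
}
```

A sink holding several writers could call such a helper once per writer, so callers no longer need to supply names up front.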
Add WriteProtocol interface to allow systems to implement different write and commit behaviors, including write & target directories and file names, commit actions and output, etc. Support registering WriteProtocols by CommitStrategy. Add two base implementations of WriteProtocols for Hive connector. getWriterParameters() is where table writer can get required parameters including write & target directories and file names. commit() can be extended to perform commit actions.
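The interface described in this commit message can be pictured with the simplified sketch below: getWriterParameters() tells the table writer where to write and where the file should end up, and commit() is the extension point for commit actions and their output. The struct layout, the method signatures, the two class names, and the ".tmp." prefix are assumptions for illustration and may differ from the actual code in the PR. No file system calls are made; commit() only reports what it would do.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified types; real signatures in the PR may differ.
struct WriterParameters {
  std::string writeDirectory;
  std::string writeFileName;
  std::string targetDirectory;
  std::string targetFileName;
};

class WriteProtocol {
 public:
  virtual ~WriteProtocol() = default;

  // Tells the table writer where to write and where the file should end up.
  virtual WriterParameters getWriterParameters(
      const std::string& targetDirectory,
      const std::string& fileName) const = 0;

  // Extension point for commit actions. The base version does nothing and
  // reports no output.
  virtual std::vector<std::string> commit(
      const std::vector<WriterParameters>& /*written*/) {
    return {};
  }
};

// Writes straight to the target, so there is nothing to do at commit time.
class NoCommitSketch : public WriteProtocol {
 public:
  WriterParameters getWriterParameters(
      const std::string& targetDirectory,
      const std::string& fileName) const override {
    return {targetDirectory, fileName, targetDirectory, fileName};
  }
};

// Writes to a temporary name and reports the renames a commit would perform.
class TaskCommitSketch : public WriteProtocol {
 public:
  WriterParameters getWriterParameters(
      const std::string& targetDirectory,
      const std::string& fileName) const override {
    return {targetDirectory, ".tmp." + fileName, targetDirectory, fileName};
  }

  std::vector<std::string> commit(
      const std::vector<WriterParameters>& written) override {
    std::vector<std::string> renames;
    for (const auto& params : written) {
      assert(!params.targetFileName.empty());
      renames.push_back(
          params.writeDirectory + "/" + params.writeFileName + " -> " +
          params.targetDirectory + "/" + params.targetFileName);
    }
    return renames;
  }
};
```

Separating "where to write" from "what to do at commit" is what lets a system swap commit behavior without touching the table writer itself.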
697e52d to 80a4a97
Summary: Without the OBJECT label on velox_hive_connector, the presto_server build fails with errors such as undefined reference to HiveTableHandle::HiveTableHandle(). #2897 added OBJECT, but according to the change history the next PR, #2845, removed it. The #2845 PR page shows no change to OBJECT, so the removal was likely caused by a file merge of CMakeLists.txt.
Pull Request resolved: #3094
Reviewed By: kgpai, mbasmanova
Differential Revision: D41031123
Pulled By: gggrace14
fbshipit-source-id: f50b53b01e5ab2296cd7cb8ee826cd41cf5f4916