Skip to content

[Velox] Table writer 2: Add write protocol interface#2845

Closed
gggrace14 wants to merge 2 commits intofacebookincubator:mainfrom
gggrace14:write_protocol
Closed

[Velox] Table writer 2: Add write protocol interface#2845
gggrace14 wants to merge 2 commits intofacebookincubator:mainfrom
gggrace14:write_protocol

Conversation

@gggrace14
Copy link
Copy Markdown
Contributor

Add WriteProtocol interface to allow systems to implement different
write and commit behaviors, including write & target directories
and file names, commit actions and output, etc. Support registering
WriteProtocols by CommitStrategy.

Add two base implementations of WriteProtocols for Hive connector.
getWriterParameters() is where table writer can get required parameters
including write & target directories and file names.
commit() can be extended to perform commit actions.

@netlify
Copy link
Copy Markdown

netlify bot commented Oct 14, 2022

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 80a4a97
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/635a084e9049d70008a70201

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 14, 2022
@facebook-github-bot
Copy link
Copy Markdown
Contributor

@gggrace14 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@gggrace14 gggrace14 requested a review from mbasmanova October 14, 2022 09:21
@gggrace14 gggrace14 marked this pull request as ready for review October 14, 2022 09:21
@gggrace14 gggrace14 requested a review from spershin October 14, 2022 09:21
@gggrace14 gggrace14 force-pushed the write_protocol branch 2 times, most recently from 91537bb to 284a9fd Compare October 14, 2022 19:41
@gggrace14 gggrace14 changed the title [Velox] Add write protocol interface [Velox] Table writer 2: Add write protocol interface Oct 14, 2022
@gggrace14 gggrace14 force-pushed the write_protocol branch 6 times, most recently from 3f78584 to 27f1fe7 Compare October 19, 2022 07:43
Copy link
Copy Markdown
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gggrace14 Some questions and comments.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason not to add WriteProtocol.cpp to velox_connector?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming: getCommitStrategy -> commitStrategy

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason this API doesn't take CommitStrategy? It would be more natural to specify both CommitStrategy and WriteProtocol when registering as opposed to fetching CommitStrategy from the protocol.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, used to pass CommitStrategy to this func together wi/ the WriteProtocol. Changing it back to the more natural way.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to define this again. The base implementation is the same.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still see this method here. I assume it can be removed, no?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you document that it is valid to return nullptr?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason not to include HiveWriteProtocol.cpp into velox_hive_connector ? Why "OBJECT"?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revising. I'd like to include it into velox_hive_connector, which is easier. Just saw HivePartitionFunction.cpp was in a separate item

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason not to add WriteProtocol.cpp to velox_connector?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revising. Adding it to velox_connector makes it easier.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you document this constructor and explain when the caller should specify writeFileName and writeDirectory? Should there be some checks based on updateMode?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would really be nice to document this constructor. Otherwise, it will be hard for future readers / users of the API to understand how to use it properly.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, missed this comment previously. Moving the documentation from data members to this constructor. Relations between write & target and updateMode might not be straightforward, which is determined by a concrete implementation of WriteProtocol::getWriterParameters(). So might not want to have checks on the updateMode.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to expose these?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing

@gggrace14 gggrace14 force-pushed the write_protocol branch 3 times, most recently from b827fee to 3fd5827 Compare October 21, 2022 03:39
Copy link
Copy Markdown
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gggrace14 Looks good to me % some questions.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Would you add a comment explaining this config?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: used -> be used

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would really be nice to document this constructor. Otherwise, it will be hard for future readers / users of the API to understand how to use it properly.

@gggrace14 gggrace14 force-pushed the write_protocol branch 3 times, most recently from a3f72b8 to 697e52d Compare October 27, 2022 02:47
Pass LocationHandle to HiveInsertTableHandle, and
pass HiveInsertTableHandle instead of an actual write file name
to HiveDataSink. This allows Velox callers to pass more info
to HiveDataSink that is used by TableWriter. Thus this gets
TableWriter ready for more flexible write and commit strategies.

Allow multiple writers in HiveDataSink, to make HiveDataSink
and TableWriter ready for partitioned table writing. For now
we only keep one writer.

Also make HiveDataSink generate a random file name interally,
rather than relying Velox callers to pass in a file name. Users
could extend this behavior with WriteProtocols, whose support
will come in next.
Add WriteProtocol interface to allow systems to implement different
write and commit behaviors, including write & target directories
and file names, commit actions and output, etc. Support registering
WriteProtocols by CommitStrategy.

Add two base implementations of WriteProtocols for Hive connector.
getWriterParameters() is where table writer can get required parameters
including write & target directories and file names.
commit() can be extended to perform commit actions.
@gggrace14 gggrace14 deleted the write_protocol branch October 27, 2022 23:39
gggrace14 added a commit to gggrace14/velox that referenced this pull request Nov 4, 2022
Without OBJECT label for velox_hive_conector, build of presto_server
sees error like undefined reference to
HiveTableHandle::HiveTableHandle().

facebookincubator#2897 added OBJECT,
but the next PR facebookincubator#2845
removed it according to change history.

Actually PR facebookincubator#2845
did not touch OBJECT according to the PR page. It is likely due
to file merge of CMakeList.txt.
facebook-github-bot pushed a commit that referenced this pull request Nov 4, 2022
Summary:
Without OBJECT label for velox_hive_conector, build of presto_server sees error like undefined reference to
HiveTableHandle::HiveTableHandle().

#2897 added OBJECT, but the next PR #2845 removed it according to change history.

Actually PR #2845 did not touch OBJECT according to the PR page. It is likely due to file merge of CMakeList.txt.

Pull Request resolved: #3094

Reviewed By: kgpai, mbasmanova

Differential Revision: D41031123

Pulled By: gggrace14

fbshipit-source-id: f50b53b01e5ab2296cd7cb8ee826cd41cf5f4916
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants