Add documentation for Hudi connector #13753
@findinpath @electrum Please also review the documentation.
This documentation is primarily intended for Trino users, not Trino contributors.
Please consider adding the design details of the connector to a README.md in the trino-hudi module.
Feel free to add the details directly to the trino-hudi README rather than linking to an external document, because some of the design details may evolve in the months and years to come.
Makes sense. I'll add it to the README.
These details are superfluous.
I used the Delta Lake connector documentation for inspiration: https://trino.io/docs/current/connector/delta-lake.html#requirements
We are speaking here about the metadata columns, not regular table columns. Please add a note on what those metadata columns are. Do reconsider handling metadata columns in the Hudi connector the same way as in Hive (https://trino.io/docs/current/connector/hive.html#special-columns). Let's strive for consistency within Trino rather than keeping the status quo of Hudi's approach, which exposes metadata columns as regular columns by default.
Fair enough. It's not a blocker for us. Let me think a bit more about this. I also want to discuss this internally with the team.
Could you please add a link to the Hudi sync tool?
How are non-Hudi tables hidden?
This may cause an unwanted performance drop when listing tables, because each table needs to be checked for its format one by one.
Actually, not yet. I'll remove this part.
Please add a Supported file types section, as in the Hive connector documentation:
https://trino.io/docs/current/connector/hive.html#supported-file-types
Would you care to mention any known limitations?
For example, the Hive connector has a Hive 3 related limitations section: https://trino.io/docs/current/connector/hive.html#hive-3-related-limitations
Let's not talk about limitations. We generally only document what works, not all the stuff that doesn't work.
Feel free to add a more detailed description here.
Not everybody is familiar with the copy-on-write (CoW) and merge-on-read (MoR) concepts.
Also, snapshot queries and read optimized queries are very Hudi-specific.
I linked a Hudi concepts doc. Let me know if you would prefer that I add a few sentences here.
BTW, please also describe the exposed virtual tables table_name_rt and table_name_ro and what they are.
Could you please add a section on Hudi's compatibility with Hive?
Specifically, document how the Hive connector can read a Hudi table, if possible.
mosabua left a comment
Good start. A couple of small things to update, and then this might be close to going live.
Suggested change:
- example ``etc/catalog/hudi.properties``, that references the ``hudi``
+ example ``etc/catalog/example.properties``, that references the ``hudi``
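A minimal sketch of such a catalog file follows. It assumes the Hudi connector reads table metadata from a Hive metastore reachable over Thrift; the host name example.net and port 9083 are placeholders, not values from this thread.

```properties
# etc/catalog/example.properties
# Select the Hudi connector for this catalog.
connector.name=hudi
# Placeholder metastore address (assumption); replace with your Hive metastore URI.
hive.metastore.uri=thrift://example.net:9083
```

Trino takes the catalog name from the file name, so this file defines a catalog called example rather than hudi, which keeps the connector name and the catalog name visibly distinct.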
To create a catalog that uses the Hudi connector,
Suggested change:
- are accessed using index.Only applicable to Parquet file format.
+ are accessed using the index. Only applicable to Parquet file format.
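The suggested wording appears to describe a catalog property that switches Parquet column access between column names and column indexes. As a hedged sketch, assuming that property is hudi.parquet.use-column-names (the name does not appear in this thread), switching to index-based access would look like:

```properties
# Assumed property name: access Parquet columns by position (index)
# instead of by the column names stored in the file.
hudi.parquet.use-column-names=false
```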
Reading Hudi tables with the Hive connector
can also be accessed with a catalog using the Hive connector.
Don't link to mvnrepository. Instead, use the official Maven repository or the official Maven repository search.
@codope Please let us know if you need any help to proceed further with this. Ideally, we would like to get the documentation merged soon, before the Trino Community Broadcast episode about Hudi.
@mosabua Thanks for reviewing, and sorry for the late response. I am on call until this Monday. I will address the comments and update the PR by Tuesday this week.
@mosabua I have addressed your feedback and updated the PR.
mosabua left a comment
Some more minor changes, but essentially this is close to ready for a first merge.
Suggested change:
- tables synced to Hive metastore.
+ tables with relevant metadata in a Hive metastore.
or even
- tables synced to Hive metastore.
+ tables.
Suggested change:
- To use Hudi connector, you need:
+ To use the Hudi connector, you need:
Reword as in other connectors, for example https://trino.io/docs/current/connector/postgresql.html#sql-support, and confirm that the linked commands work. Otherwise, we have to create a list of individual statements.
Suggested change:
- Merge On Read Read Optimized Queries
+ Merge on read Read optimized queries
Not sure what this sentence says.
I have rephrased the sentence. I am referring to the table in the Hudi quickstart documentation, because Hudi users are familiar with this table. Anyone landing here directly, without having tried Hudi, can follow the quickstart link to first create a Hudi table, sync its metadata to the Hive metastore, and then run these queries.
@mosabua I have updated the PR. Please take another pass.
Thanks @mosabua! I have rebased and squashed.
Tested locally.
Description
Documentation for the new connector.
Hudi connector
Add documentation for the Hudi connector.
Related issues, pull requests, and links
Fixes #14429
Documentation
( ) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
( ) No release notes entries required.
( ) Release notes entries required with the following suggested text: