Docs: Add flink iceberg connector #3085
Conversation
site/docs/flink-connector.md
Outdated
> !!! Note
>     The underlying catalog database (`hive_db` in the above example) will be created automatically if it does not exist when writing records into the flink table, same thing with the underlying catalog table (`hive_iceberg_table` in the above example).
> same thing with the underlying catalog table (`hive_iceberg_table` in the above example)
is this part redundant? the SQL is to create a new table. this is the intention and probably doesn't need to be called out as a note
Okay, let's remove this line.
site/docs/flink-connector.md
Outdated
> In flink, the SQL `CREATE TABLE test (..) WITH ('connector'='iceberg', ...)` will create an flink table in current flink catalog (use [GenericInMemoryCatalog](https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/table/catalogs/#genericinmemorycatalog) by default),
> which is just map to the underlying iceberg table instead of maintaining iceberg table.
>
> To create flink table backend iceberg table in flink SQL by using syntax `CREATE TABLE test (..) WITH ('connector'='iceberg', ...)`, flink iceberg connector provides the following table properties:
> To create flink table backend iceberg table
This part reads weird
OK, let's make this more clear. Thanks.
site/docs/flink-connector.md
Outdated
> * `connector`: Use the constant `iceberg`.
> * `catalog-name`: User-specified catalog name.
> * `catalog-type`: The optional values are:
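The `catalog-type` discussion below can be illustrated with a sketch of a DDL that selects a backend explicitly; the catalog name and warehouse path here are placeholder values, not from the PR:

```sql
-- Hypothetical sketch: selecting the Hadoop catalog backend explicitly.
-- 'hadoop_prod' and the warehouse path are placeholder values.
CREATE TABLE flink_table (
    id   BIGINT,
    data STRING
) WITH (
    'connector'='iceberg',
    'catalog-name'='hadoop_prod',
    'catalog-type'='hadoop',
    'warehouse'='hdfs://nn:8020/warehouse/path'
);
```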
can we call out the default value?
In the following hive part, we describe the default behavior:

> `hive`: The hive metastore catalog. Use `hive` by default if we don't specify any value for `catalog-type`.

We can move it to the head of this part if you think it's necessary!
site/mkdocs.yml
Outdated
> - Flink: flink.md
> - Flink:
>     - Getting Started: flink.md
>     - Iceberg Connector: flink-connector.md
Iceberg Connector -> Flink Connector?
site/docs/flink-connector.md
Outdated
> * `custom`: The customized catalog, see [custom catalog](./custom-catalog.md) for more details.
> * `catalog-database`: The iceberg database name in the backend catalog, use the current flink database name by default.
> * `catalog-table`: The iceberg table name in the backend catalog.
> * `catalog-table`: The iceberg table name in the backend catalog. Default to use the `<table-name>` in the flink DDL `CREATE TABLE <table-name> (..) WITH ('connector'='iceberg', ...)`.
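Putting these properties together, a DDL along the following lines would map a Flink table to a backend Iceberg table. Since `catalog-table` is omitted, the backend table name would default to the Flink table name (`flink_table` here); the catalog name, URI, and warehouse path are placeholders:

```sql
-- Sketch: 'catalog-table' is omitted, so the backend Iceberg table name
-- defaults to the Flink table name ('flink_table' here).
-- Catalog name, URI, and warehouse path are placeholder values.
CREATE TABLE flink_table (
    id   BIGINT,
    data STRING
) WITH (
    'connector'='iceberg',
    'catalog-name'='hive_prod',
    'catalog-database'='hive_db',
    'uri'='thrift://localhost:9083',
    'warehouse'='hdfs://nn:8020/warehouse/path'
);
```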
Is this property needed at all with CREATE TABLE <table-name>? I assume <table-name> is required in the SQL syntax.
Okay, I've made this more clear in the updated PR.
stevenzwu
left a comment
left one more comment/question. otherwise, it looks good to me
@rdblue, @jackye1995, would you like to double check? I think I need at least one Apache Iceberg committer to approve this PR before merging, thanks.
site/docs/flink-connector.md
Outdated
> - limitations under the License.
> -->
>
> Apache Iceberg supports creating flink table directly without creating the explicit flink catalog in flink SQL in [#2666](https://github.com/apache/iceberg/pull/2666). That means we can just create an iceberg table by specifying `'connector'='iceberg'` table option in flink SQL which is similar to usage in the flink official [document](https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/connectors/table/overview/).
should it be the other way? Flink supports creating Iceberg tables directly ...
Also I think we don't need to mention PR links in the documentation (at least I did not see anything referenced in other places)
nit: make sure all "Flink" are capitalized.
Okay, all sounds reasonable to me !
site/docs/flink-connector.md
Outdated
> Before executing the following SQL, please make sure you've configured the flink SQL client correctly according to the quick start [document](./flink.md).
>
> The following SQL will create an flink table in the current flink catalog, which maps to the iceberg table `default_database.iceberg_table` managed in iceberg catalog.
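The DDL itself is elided from the quoted hunk; it presumably resembles the following sketch, where `catalog-database` and `catalog-table` fall back to their defaults (`default_database` and the Flink table name). Connection details here are placeholders:

```sql
-- Sketch of a DDL mapping a Flink table to default_database.iceberg_table
-- in a Hive-backed Iceberg catalog; connection details are placeholders.
CREATE TABLE iceberg_table (
    id   BIGINT,
    data STRING
) WITH (
    'connector'='iceberg',
    'catalog-name'='hive_prod',
    'uri'='thrift://localhost:9083',
    'warehouse'='hdfs://nn:8020/warehouse/path'
);
```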
nit: a Flink table
site/docs/flink-connector.md
Outdated
> If you want to create a flink table mapping to a different iceberg table managed in hive catalog (such as `hive_db.hive_iceberg_table` in hive), then you can create flink table as following:
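A sketch of such a DDL, with `catalog-database` and `catalog-table` pointing explicitly at the Hive-managed table; the catalog name, URI, and warehouse path are placeholder values:

```sql
-- Sketch: the Flink table name differs from the backend Iceberg table,
-- so 'catalog-database' and 'catalog-table' are set explicitly.
-- Connection details are placeholder values.
CREATE TABLE flink_table (
    id   BIGINT,
    data STRING
) WITH (
    'connector'='iceberg',
    'catalog-name'='hive_prod',
    'catalog-database'='hive_db',
    'catalog-table'='hive_iceberg_table',
    'uri'='thrift://localhost:9083',
    'warehouse'='hdfs://nn:8020/warehouse/path'
);
```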
nit: capitalize "Hive"
site/docs/flink-connector.md
Outdated
> !!! Note
>     The underlying catalog database (`hive_db` in the above example) will be created automatically if it does not exist when writing records into the flink table.
> when writing records into the flink table.
so it won't create the database when creating the table, but only after the first INSERT?
Yes, that's the correct behavior. Because `CREATE TABLE test (..) WITH ('connector'='iceberg', ...)` actually creates a table in the in-memory catalog by default, it doesn't do anything in the underlying filesystem. The database & table will be created only when records are about to be written to the table.
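This lazy-creation behavior can be sketched as the following sequence (names and connection details are placeholder values):

```sql
-- Registers only a mapping in Flink's (in-memory) catalog; nothing is
-- created in the underlying warehouse at this point.
CREATE TABLE flink_table (
    id   BIGINT,
    data STRING
) WITH (
    'connector'='iceberg',
    'catalog-name'='hive_prod',
    'catalog-database'='hive_db',
    'uri'='thrift://localhost:9083'
);

-- The backing database and Iceberg table are created lazily, on first write.
INSERT INTO flink_table VALUES (1, 'a');
```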
site/docs/flink-connector.md
Outdated
> Please refer to [AWS](./aws.md#catalogs), [Nessie](./nessie.md), [JDBC](./jdbc.md) catalog for more details.
I think we can simply say "check sections under the Integrations tab for all custom catalogs" to avoid listing all the pages.
site/docs/flink-connector.md
Outdated
> Please refer to [document](./flink.md) for queries and writes.
nit: For more details, please refer to the Iceberg Flink document.
> - Time Travel: spark-queries/#time-travel
> - Flink: flink.md
> - Flink:
>     - Getting Started: flink.md
I am thinking we might want to break down the flink documentation like Spark (no need to do in this PR), just to make it more consistent between the 2 and easier to navigate. What do you think?
Yes, I think it's necessary to break the flink docs into more pages so that people can find the right content on a separate page. We will also introduce more features for the iceberg flink integration work, such as the flink table maintenance API, the FLIP-27 unified source/sink, etc.
I'd prefer to publish a separate PR to address the break-down; let this one focus on adding the iceberg flink connector.
jackye1995
left a comment
overall looks good to me, just a small typo
site/docs/flink-connector.md
Outdated
> Apache Flink supports creating Iceberg table directly without creating the explicit Flink catalog in Flink SQL. That means we can just create an iceberg table by specifying `'connector'='iceberg'` table option in Flink SQL which is similar to usage in the Flink official [document](https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/connectors/table/overview/).
>
> In Flink, the SQL `CREATE TABLE test (..) WITH ('connector'='iceberg', ...)` will create an Flink table in current Flink catalog (use [GenericInMemoryCatalog](https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/table/catalogs/#genericinmemorycatalog) by default),
nit: a Flink table
Get this merged. Thanks @stevenzwu & @jackye1995 for reviewing!
Add document for the PR #2666. I decided to prepare a document for this because I've seen people asking how to use the flink `'connector'='iceberg'` option to create an iceberg table in #3079.