
Conversation

@openinx (Member) commented Sep 7, 2021

Add a document for PR #2666. I decided to prepare a document for this because I've seen people asking in #3079 how to use the Flink `'connector'='iceberg'` option to create an Iceberg table.

```

!!! Note
The underlying catalog database (`hive_db` in the above example) will be created automatically if it does not exist when writing records into the flink table, same thing with the underlying catalog table (`hive_iceberg_table` in the above example).
Contributor

same thing with the underlying catalog table (hive_iceberg_table in the above example

Is this part redundant? The SQL is creating a new table; that is the intention and probably doesn't need to be called out in a note.

Member Author

Okay, let's remove this line.

In flink, the SQL `CREATE TABLE test (..) WITH ('connector'='iceberg', ...)` will create an flink table in current flink catalog (use [GenericInMemoryCatalog](https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/table/catalogs/#genericinmemorycatalog) by default),
which is just map to the underlying iceberg table instead of maintaining iceberg table.

To create flink table backend iceberg table in flink SQL by using syntax `CREATE TABLE test (..) WITH ('connector'='iceberg', ...)`, flink iceberg connector provides the following table properties:
Contributor

To create flink table backend iceberg table

This part reads weird

Member Author

OK, let's make this more clear. Thanks.
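The behavior discussed in this thread can be sketched in Flink SQL. This is a minimal illustration, not the PR's exact example; the catalog name, metastore URI, and warehouse path are made up for the sketch:

```sql
-- Runs in the Flink SQL client. The Flink table is created in the current
-- Flink catalog (GenericInMemoryCatalog by default) and only maps to the
-- underlying Iceberg table; it does not maintain the Iceberg table itself.
CREATE TABLE flink_table (
    id   BIGINT,
    data STRING
) WITH (
    'connector' = 'iceberg',
    'catalog-name' = 'hive_prod',                  -- illustrative catalog name
    'uri' = 'thrift://localhost:9083',             -- illustrative Hive metastore URI
    'warehouse' = 'hdfs://nn:8020/warehouse/path'  -- illustrative warehouse path
);
```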


* `connector`: Use the constant `iceberg`.
* `catalog-name`: User-specified catalog name.
* `catalog-type`: The optional values are:
Contributor

can we call out the default value?

Member Author

In the following hive part, we describe the default behavior:

hive: The hive metastore catalog. Use hive by default if we don't specify any value for catalog-type.

We can move it to the head of this part if you think it's necessary!

site/mkdocs.yml Outdated
- Flink: flink.md
- Flink:
- Getting Started: flink.md
- Iceberg Connector: flink-connector.md
Contributor

Iceberg Connector -> Flink Connector?

@openinx openinx added the flink label Sep 8, 2021
* `custom`: The customized catalog, see [custom catalog](./custom-catalog.md) for more details.
* `catalog-database`: The iceberg database name in the backend catalog, use the current flink database name by default.
* `catalog-table`: The iceberg table name in the backend catalog.
* `catalog-table`: The iceberg table name in the backend catalog. Default to use the `<table-name>` in the flink DDL `CREATE TABLE <table-name> (..) WITH ('connector'='iceberg', ...)`.
Contributor

Is this property needed at all with CREATE TABLE <table-name>? I assume <table-name> is required in the SQL syntax.

Member Author

Okay, I've made this more clear in the updated PR.
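For reference, a hedged sketch combining the table properties discussed above (the database and table names are made up for the example, and the `uri` property is an assumption about the Hive metastore setup, not something stated in this thread). The defaults in the comments come from the property descriptions in the reviewed doc:

```sql
CREATE TABLE flink_table (
    id   BIGINT,
    data STRING
) WITH (
    'connector' = 'iceberg',
    'catalog-name' = 'hive_prod',            -- user-specified catalog name
    'catalog-type' = 'hive',                 -- defaults to 'hive' if omitted
    'catalog-database' = 'hive_db',          -- defaults to the current Flink database
    'catalog-table' = 'hive_iceberg_table',  -- defaults to the Flink table name
    'uri' = 'thrift://localhost:9083'        -- illustrative Hive metastore URI
);
```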

@stevenzwu (Contributor) left a comment

left one more comment/question. otherwise, it looks good to me

@openinx (Member Author) commented Sep 14, 2021

@rdblue, @jackye1995, would you like to double-check? I think I need at least one Apache Iceberg committer to approve this PR before merging. Thanks!

- limitations under the License.
-->

Apache Iceberg supports creating flink table directly without creating the explicit flink catalog in flink SQL in [#2666](https://github.com/apache/iceberg/pull/2666). That means we can just create an iceberg table by specifying `'connector'='iceberg'` table option in flink SQL which is similar to usage in the flink official [document](https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/connectors/table/overview/).
Contributor

should it be the other way? Flink supports creating Iceberg tables directly ...

Contributor

Also I think we don't need to mention PR links in the documentation (at least I did not see anything referenced in other places)

Contributor

nit: make sure all "Flink" are capitalized.

Member Author

Okay, all sounds reasonable to me !


Before executing the following SQL, please make sure you've configured the flink SQL client correctly according to the quick start [document](./flink.md).

The following SQL will create an flink table in the current flink catalog, which maps to the iceberg table `default_database.iceberg_table` managed in iceberg catalog.
Contributor

nit: a Flink table

);
```

If you want to create a flink table mapping to a different iceberg table managed in hive catalog (such as `hive_db.hive_iceberg_table` in hive), then you can create flink table as following:
Contributor

nit: capitalize "Hive"

```

!!! Note
The underlying catalog database (`hive_db` in the above example) will be created automatically if it does not exist when writing records into the flink table.
Contributor

when writing records into the flink table.

so it won't create the database when creating the table, but only after the first INSERT?

Member Author

Yes, that's the correct behavior. Because `CREATE TABLE test (..) WITH ('connector'='iceberg', ...)` actually creates a table in the in-memory catalog by default, it doesn't do anything in the underlying file system. The database and table will be created only when records are about to be written to the table.
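A small sketch of that lazy-creation behavior, with illustrative statements (the column list and properties are elided as in the doc's own examples):

```sql
-- CREATE TABLE only registers the mapping in the in-memory Flink catalog;
-- nothing is created in the underlying warehouse yet.
CREATE TABLE flink_table (..) WITH ('connector'='iceberg', ...);

-- The underlying Iceberg database and table are created on the first write.
INSERT INTO flink_table VALUES (1, 'a');
```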

);
```

Please refer to [AWS](./aws.md#catalogs), [Nessie](./nessie.md), [JDBC](./jdbc.md) catalog for more details.
Contributor

I think we can simply say "check sections under the Integrations tab for all custom catalogs" to avoid listing all the pages.

3 rows in set
```

Please refer to [document](./flink.md) for queries and writes. No newline at end of file
Contributor

nit: For more details, please refer to the Iceberg Flink document.

- Time Travel: spark-queries/#time-travel
- Flink: flink.md
- Flink:
- Getting Started: flink.md
Contributor

I am thinking we might want to break down the flink documentation like Spark (no need to do in this PR), just to make it more consistent between the 2 and easier to navigate. What do you think?

Member Author

Yes, I think it's necessary to break the Flink docs into more pages so that people can find the right content on a separate page. We will also introduce more features for the Iceberg Flink integration, such as a Flink table maintenance API, the FLIP-27 unified source/sink, etc.

I would prefer to publish a separate PR to address the break-down work, and let this one focus on adding the Iceberg Flink connector.

@jackye1995 (Contributor) left a comment

overall looks good to me, just a small typo


Apache Flink supports creating Iceberg table directly without creating the explicit Flink catalog in Flink SQL. That means we can just create an iceberg table by specifying `'connector'='iceberg'` table option in Flink SQL which is similar to usage in the Flink official [document](https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/connectors/table/overview/).

In Flink, the SQL `CREATE TABLE test (..) WITH ('connector'='iceberg', ...)` will create an Flink table in current Flink catalog (use [GenericInMemoryCatalog](https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/table/catalogs/#genericinmemorycatalog) by default),
Contributor

nit: a Flink table

@openinx openinx merged commit 036a374 into apache:master Sep 22, 2021
@openinx (Member Author) commented Sep 22, 2021

Got this merged. Thanks @stevenzwu & @jackye1995 for reviewing!
