Adding documentation for metadata tables #3159

BACtaki · 2021-09-20T18:59:50Z

Issue: #757

jackye1995

thanks for working on this!

jackye1995 · 2021-09-20T19:11:13Z

site/docs/metadata.md

@@ -0,0 +1,134 @@
+# Metadata Tables
+
+This page describes the internal metadata tables maintained by Iceberg. Please refer to [definitions page](terms.md)


nit: prefer breaking lines at the end of a full sentence.

nit: the definitions page

jackye1995 · 2021-09-20T19:13:29Z

site/docs/metadata.md

+
+## Table Schema
+
+### <a id="AllDataFilesTable"></a> 1. `AllDataFilesTable`


I think we should use the actual names of the tables, instead of the class name, like files, manifests, entries, etc.

What is the use of this HTML tag?

I agree with Jack on the section names.

jackye1995 · 2021-09-20T19:14:46Z

site/docs/spec.md

 | _optional_ | _optional_ | **`parent-snapshot-id`** | The snapshot ID of the snapshot's parent. Omitted for any snapshot with no parent |
 |            | _required_ | **`sequence-number`**    | A monotonically increasing long that tracks the order of changes to a table |
 | _required_ | _required_ | **`timestamp-ms`**       | A timestamp when the snapshot was created, used for garbage collection and table inspection |
-| _optional_ | _required_ | **`manifest-list`**      | The location of a manifest list for this snapshot that tracks manifest files with additional meadata |


nice catch!

jackye1995 · 2021-09-20T19:16:07Z

site/docs/spark-queries.md

 +----------------------------------------------------------------------+--------+-------------------+---------------------+------------------------+---------------------------+--------------------------+---------------------------------+
 ```

+### Metadata Table Schema


Not sure what other people think, but I would prefer having all of the schemas in that metadata.md page. We can then remove this section from the Spark page and only provide a SQL example of the syntax for querying the table, plus a link to that page.

I don't see much value in having these schemas here. Seems like they'll just get out of date.

jackye1995 · 2021-09-20T19:16:26Z

site/docs/metadata.md

+### <a id="AllDataFilesTable"></a> 1. `AllDataFilesTable`
+
+| Column name           | Required  | Data type         | Description |
+|-----------------------|-----------|-------------------|-------------|


missing a Column ID column.

Do we need IDs? I'm not sure those are valuable to users.

rdblue · 2021-09-22T17:54:57Z

site/docs/metadata.md

+| equality_ids          |           | `list<int>`       | Equality comparison field IDs
+| sort_order_id         |           | int               | Sort order ID
+
+### <a id="AllEntriesTable"></a> 2. `AllEntriesTable`


Anchors are generated automatically, so no need to add them in the markdown.

rdblue · 2021-09-22T18:00:09Z

site/docs/metadata.md

@@ -0,0 +1,134 @@
+# Metadata Tables


What's the purpose of this documentation? Much of this is already covered in the Spark queries page, which this links to for docs on how to query the metadata tables.

Who is the intended audience for these docs?

Because this is not a Spark only feature, the intention is to make it a top level documentation, and have proper tables for the metadata table schema instead of showing Java code.

There is a pretty good argument against adding this, though. Documenting what tables exist without an engine is confusing to anyone looking for how to view metadata tables. Just saying that these tables exist and what their schemas are doesn't help a user coming to the docs.

If this were part of a document on the API that explained how to use the metadata tables and was targeted at engine developers, I think it would be more valuable. But I think placing it under tables will just cause confusion.

Yes I agree that is a confusion point. So do you recommend having one section for each engine around system tables? My current thought is to have this page introducing the Iceberg schema, linking to related sections in each engine page for examples of using system tables.

One of my intentions in adding it was that this is a very important feature that does not exist in other similar products, and it provides huge benefits for users building data management capabilities around such information. Tables like manifests and files also support optimizations like predicate pushdown and file pruning, which essentially solves the big metadata issue. So I feel it's a pity to hide such information too deeply.

Yeah, I think for now we should include this for each engine. They do expose the tables differently (like Trino's $ syntax) so I don't see a lot of value in splitting docs into common and engine-specific. That just makes it harder to find what you're looking for.
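For readers following the thread, here is a minimal sketch of the syntax difference mentioned above. It is an illustration only: the helper name `metadata_table_query` and the catalog, schema, and table names are hypothetical and are not part of this PR or of any Iceberg API. It assumes the Spark form of addressing a metadata table as an extra name part after the table, and the Trino form of appending `$<name>` inside a quoted table identifier.

```python
def metadata_table_query(engine, catalog, schema, table, metadata_table):
    """Build a SELECT against an Iceberg metadata table (hypothetical helper)."""
    if engine == "spark":
        # Spark: the metadata table is an extra name part after the table name.
        return f"SELECT * FROM {catalog}.{schema}.{table}.{metadata_table}"
    if engine == "trino":
        # Trino: append $<name> to the table name; the identifier must be quoted.
        return f'SELECT * FROM {catalog}.{schema}."{table}${metadata_table}"'
    raise ValueError(f"unsupported engine: {engine}")


# Example names (prod, iceberg, db, events) are placeholders.
print(metadata_table_query("spark", "prod", "db", "events", "files"))
print(metadata_table_query("trino", "iceberg", "db", "events", "files"))
```

The same logical table (`files`) ends up with a different identifier in each engine, which is the reason given above for documenting metadata tables per engine.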

rdblue · 2021-09-22T18:01:21Z

site/mkdocs.yml

    - Performance: performance.md
    - Reliability: reliability.md
+    - Schemas: schemas.md
+    - Table evolution: evolution.md


Please revert the reordering here.

rdblue · 2021-09-26T23:15:06Z

Sounds like we agree that we would rather document this in each engine to avoid confusion. I'm going to close this PR, but if there is still disagreement feel free to reopen it.

BACtaki added 3 commits March 4, 2021 14:33

Adding documentation for metadata tables

48c9383

WIP

2c35514

Merge branch 'master' into metadata_docs

792b552

github-actions bot added the docs label Sep 20, 2021

BACtaki mentioned this pull request Sep 20, 2021

Document all metadata tables #757

Closed

jackye1995 reviewed Sep 20, 2021

rdblue reviewed Sep 22, 2021

rdblue closed this Sep 26, 2021

szehon-ho mentioned this pull request Oct 22, 2021

Add data_file.spec_id to metadata tables #3015

Merged
