apache · JanKaul · Aug 21, 2024 · Aug 29, 2024 · Sep 5, 2024 · Sep 7, 2024
diff --git a/format/view-spec.md b/format/view-spec.md
@@ -42,12 +42,28 @@ An atomic swap of one view metadata file for another provides the basis for maki
 
 Writers create view metadata files optimistically, assuming that the current metadata location will not be changed before the writer's commit. Once a writer has created an update, it commits by swapping the view's metadata file pointer from the base location to the new location.
 
+### Materialized Views
+
+Materialized views are a type of view with precomputed results from the view query stored as a table.
+When queried, engines may return the precomputed data for the materialized view, shifting the cost of query execution to the precomputation step.
+
+Iceberg materialized views are implemented as a combination of an Iceberg view and an underlying Iceberg table, known as the storage table, which stores the precomputed data.
+The metadata for a materialized view extends the Iceberg view metadata, adding a pointer to the precomputed data and refresh information to determine if the data is still fresh. 
+The refresh information is composed of data about the so-called "source tables", which are the tables referenced in the query definition of the materialized view. 
+The storage table is in one of three states, "fresh", "stale", or "invalid", determined as follows:
+* **fresh** -- The `snapshot-id`s recorded by the last refresh operation match the current `snapshot-id`s of the source tables.
+* **stale** -- The `snapshot-id`s do not match, indicating that a refresh operation needs to be performed to capture the latest source table changes.
+* **invalid** -- The current `version-id` of the materialized view does not match the `view-version-id` recorded in the refresh state.
+
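As an illustration of the three states above, the following is a minimal sketch rather than part of the spec. The function name, its parameters, and the use of plain dictionaries keyed by source table UUID are all hypothetical. It also assumes that the version check takes precedence over the snapshot comparison, which the text above does not state explicitly.

```python
from typing import Dict


def storage_table_state(
    current_view_version_id: int,
    refresh_view_version_id: int,
    refreshed_snapshot_ids: Dict[str, int],
    current_snapshot_ids: Dict[str, int],
) -> str:
    """Classify a storage table as "fresh", "stale", or "invalid".

    Both dictionaries map a source table UUID to a snapshot ID (hypothetical layout).
    """
    # Assumption: a view version mismatch is checked first, so the
    # snapshot comparison only runs against a matching view version.
    if current_view_version_id != refresh_view_version_id:
        return "invalid"
    # Any difference in the recorded vs. current snapshot IDs means stale data.
    if refreshed_snapshot_ids != current_snapshot_ids:
        return "stale"
    return "fresh"
```

For example, calling it with matching version IDs and identical snapshot maps yields `"fresh"`, while a single changed snapshot ID yields `"stale"`.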
 ## Specification
 
 ### Terms
 
 * **Schema** -- Names and types of fields in a view.
 * **Version** -- The state of a view at some point in time.
+* **Storage table** -- Iceberg table that stores the precomputed data of the materialized view.
+* **Source table** -- A table reference that occurs in the query definition of the materialized view. The materialized view depends on the data from the source tables.
+* **Source view** -- A view reference that occurs in the query definition of the materialized view. The materialized view depends on the definitions from the source views.
 
 ### View Metadata
 
@@ -82,9 +98,12 @@ Each version in `versions` is a struct with the following fields:
 | _required_  | `representations`   | A list of [representations](#representations) for the view definition         |
 | _optional_  | `default-catalog`   | Catalog name to use when a reference in the SELECT does not contain a catalog |
 | _required_  | `default-namespace` | Namespace to use when a reference in the SELECT is a single identifier        |
+| _optional_  | `storage-table`     | A [storage table identifier](#storage-table-identifier) of the storage table |
 
 When `default-catalog` is `null` or not set, the catalog in which the view is stored must be used as the default catalog.
 
+When `storage-table` is `null` or not set, the entity is a common view; otherwise it is a materialized view.
+
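A reader of the metadata could apply this rule roughly as follows. This is only a sketch: it assumes the view version has already been parsed from JSON into a Python dictionary, and the function name is hypothetical.

```python
def is_materialized_view(version: dict) -> bool:
    """Return True when the view version carries a storage table identifier."""
    # dict.get returns None both when the key is absent and when it is JSON null.
    return version.get("storage-table") is not None
```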
 #### Summary
 
 Summary is a string to string map of metadata about a view version. Common metadata keys are documented here.
@@ -160,6 +179,57 @@ Each entry in `version-log` is a struct with the following fields:
 | _required_  | `timestamp-ms` | Timestamp when the view's `current-version-id` was updated (ms from epoch) |
 | _required_  | `version-id`   | ID that `current-version-id` was set to |
 
+#### Storage Table Identifier
+
+The table identifier for the storage table that stores the precomputed results.
+
+| Requirement | Field name     | Description |
+|-------------|----------------|-------------|
+| _optional_  | `catalog`      | A string specifying the name of the catalog. If `null` or not set, the catalog is the same as the view's catalog |
+| _required_  | `namespace`    | A list of strings for namespace levels |
+| _required_  | `name`         | A string specifying the name of the table |
+
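The catalog fallback in the table above can be sketched as below. The helper name and the tuple it returns are hypothetical, and the identifier is assumed to be an already-parsed dictionary.

```python
from typing import List, Optional, Tuple


def resolve_storage_table(identifier: dict, view_catalog: str) -> Tuple[str, List[str], str]:
    """Resolve a storage table identifier to (catalog, namespace levels, table name)."""
    catalog: Optional[str] = identifier.get("catalog")
    if catalog is None:
        # No catalog given: fall back to the catalog that holds the view.
        catalog = view_catalog
    return catalog, list(identifier["namespace"]), identifier["name"]
```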
+### Storage table metadata
+
+This section describes additional metadata for the storage table that supplements the regular table metadata and is required for materialized views.
+The property `refresh-state` is set in the table [snapshot summary](https://iceberg.apache.org/spec/#snapshots) and is used to determine the freshness of the precomputed data in the storage table.
+
+| Requirement | Field name      | Description |
+|-------------|-----------------|-------------|
+| _required_  | `refresh-state` | A [refresh state](#refresh-state) record stored as a JSON-encoded string | 
+
+#### Refresh state
+
+The refresh state record captures the state of all source tables and source views in the fully expanded query tree of the materialized view, including indirect references. Indirect references are the tables/views that are not directly referenced in the query but are nested within other views. The refresh state has the following fields:
+
+| Requirement | Field name     | Description |
+|-------------|----------------|-------------|
+| _required_  | `view-version-id`         | The `version-id` of the materialized view when the refresh operation was performed  | 
+| _required_  | `source-table-states`        | A list of [source table](#source-table) records for all tables that are directly or indirectly referenced in the materialized view query |
+| _required_  | `source-view-states`         | A list of [source view](#source-view) records for all views that are directly or indirectly referenced in the materialized view query |
+| _required_  | `refresh-start-timestamp-ms` | A timestamp of when the refresh operation was started |
+
+#### Source table
+
+A source table record captures the state of a source table at the time of the last refresh operation.
+
+| Requirement | Field name     | Description |
+|-------------|----------------|-------------|
+| _required_  | `uuid`         | The uuid of the source table | 
+| _required_  | `snapshot-id`  | The `snapshot-id` of the source table at the time of the last refresh operation |
+| _optional_  | `ref`          | Branch name of the source table being referenced in the view query |
+
+When `ref` is `null` or not set, it defaults to "main".
+
+#### Source view
+
+A source view record captures the state of a source view at the time of the last refresh operation.
+
+| Requirement | Field name     | Description |
+|-------------|----------------|-------------|
+| _required_  | `uuid`         | The uuid of the source view | 
+| _required_  | `version-id`   | The `version-id` of the source view at the time of the last refresh operation |
+
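To show how the records above fit together, here is a sketch of assembling a refresh state and storing it as a JSON-encoded string under the `refresh-state` summary key. The helper name and all UUIDs, IDs, and timestamps are made-up sample values, not taken from the spec.

```python
import json
from typing import Dict, List


def build_refresh_summary(
    view_version_id: int,
    source_tables: List[dict],
    source_views: List[dict],
    refresh_start_ms: int,
) -> Dict[str, str]:
    """Build the snapshot summary entry that carries the refresh state."""
    refresh_state = {
        "view-version-id": view_version_id,
        "source-table-states": source_tables,
        "source-view-states": source_views,
        "refresh-start-timestamp-ms": refresh_start_ms,
    }
    # The snapshot summary is a string-to-string map, so the record is JSON-encoded.
    return {"refresh-state": json.dumps(refresh_state)}


# Sample values below are purely illustrative.
summary = build_refresh_summary(
    view_version_id=3,
    source_tables=[
        {"uuid": "11111111-1111-1111-1111-111111111111", "snapshot-id": 1234, "ref": "main"}
    ],
    source_views=[
        {"uuid": "22222222-2222-2222-2222-222222222222", "version-id": 2}
    ],
    refresh_start_ms=1725000000000,
)
```

A reader would reverse the step with `json.loads(summary["refresh-state"])` before comparing the recorded IDs against the current state of the sources.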
 ## Appendix A: An Example
 
 The JSON metadata file format is described using an example below.