Add doc for table_changes function #17252
Merged
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -732,6 +732,154 @@ Fault-tolerant execution support | |
| The connector supports :doc:`/admin/fault-tolerant-execution` of query | ||
| processing. Read and write operations are both supported with any retry policy. | ||
|
|
||
|
|
||
| Table functions | ||
| --------------- | ||
|
|
||
| The connector provides the following table functions: | ||
|
|
||
| table_changes | ||
| ^^^^^^^^^^^^^ | ||
|
|
||
| Reads Change Data Feed (CDF) entries to expose row-level changes | ||
| between two versions of a Delta Lake table. When the ``change_data_feed_enabled`` | ||
| table property is set to ``true`` on a specific Delta Lake table, | ||
| the connector records change events for all data changes on the table. | ||
| The following query reads these changes: | ||
|
|
||
| .. code-block:: sql | ||
|
|
||
| SELECT | ||
| * | ||
| FROM | ||
| TABLE( | ||
| system.table_changes( | ||
| schema_name => 'test_schema', | ||
| table_name => 'tableName', | ||
| since_version => 0 | ||
| ) | ||
| ); | ||
|
|
||
| ``schema_name`` - type ``VARCHAR``, required, name of the schema for which the function is called | ||
|
|
||
| ``table_name`` - type ``VARCHAR``, required, name of the table for which the function is called | ||
|
|
||
| ``since_version`` - type ``BIGINT``, optional, version after which changes are shown; the bound is exclusive, so changes committed in this version itself are not returned | ||
|
|
||
| In addition to returning the columns present in the table, the function | ||
| returns the following values for each change event: | ||
|
|
||
| * ``_change_type`` | ||
| Gives the type of change that occurred. Possible values are ``insert``, | ||
| ``delete``, ``update_preimage`` and ``update_postimage``. | ||
|
|
||
| * ``_commit_version`` | ||
| Shows the table version for which the change occurred. | ||
|
|
||
| * ``_commit_timestamp`` | ||
| Represents the timestamp for the commit in which the specified change happened. | ||
|
|
||
| The following example shows typical usage: | ||
|
|
||
| Create table: | ||
|
|
||
| .. code-block:: sql | ||
|
|
||
| CREATE TABLE test_schema.pages (page_url VARCHAR, domain VARCHAR, views INTEGER) | ||
| WITH (change_data_feed_enabled = true); | ||
|
|
||
| Insert data: | ||
|
|
||
| .. code-block:: sql | ||
|
|
||
| INSERT INTO test_schema.pages | ||
| VALUES | ||
| ('url1', 'domain1', 1), | ||
| ('url2', 'domain2', 2), | ||
| ('url3', 'domain1', 3); | ||
| INSERT INTO test_schema.pages | ||
| VALUES | ||
| ('url4', 'domain1', 400), | ||
| ('url5', 'domain2', 500), | ||
| ('url6', 'domain3', 2); | ||
|
|
||
| Update data: | ||
|
|
||
| .. code-block:: sql | ||
|
|
||
| UPDATE test_schema.pages | ||
| SET domain = 'domain4' | ||
| WHERE views = 2; | ||
|
|
||
| Select changes: | ||
|
|
||
| .. code-block:: sql | ||
|
|
||
| SELECT | ||
| * | ||
| FROM | ||
| TABLE( | ||
| system.table_changes( | ||
| schema_name => 'test_schema', | ||
| table_name => 'pages', | ||
| since_version => 1 | ||
| ) | ||
| ) | ||
| ORDER BY _commit_version ASC; | ||
|
|
||
| The preceding sequence of SQL statements returns the following result: | ||
|
|
||
| .. code-block:: text | ||
|
|
||
| page_url | domain | views | _change_type | _commit_version | _commit_timestamp | ||
| url4 | domain1 | 400 | insert | 2 | 2023-03-10T21:22:23.000+0000 | ||
| url5 | domain2 | 500 | insert | 2 | 2023-03-10T21:22:23.000+0000 | ||
| url6 | domain3 | 2 | insert | 2 | 2023-03-10T21:22:23.000+0000 | ||
| url2 | domain2 | 2 | update_preimage | 3 | 2023-03-10T22:23:24.000+0000 | ||
| url2 | domain4 | 2 | update_postimage | 3 | 2023-03-10T22:23:24.000+0000 | ||
| url6 | domain3 | 2 | update_preimage | 3 | 2023-03-10T22:23:24.000+0000 | ||
| url6 | domain4 | 2 | update_postimage | 3 | 2023-03-10T22:23:24.000+0000 | ||
|
|
||
| The output shows which changes happened in which version. | ||
| For example, in version 3 two rows were modified: the first changed from | ||
| ``('url2', 'domain2', 2)`` to ``('url2', 'domain4', 2)`` and the second from | ||
| ``('url6', 'domain3', 2)`` to ``('url6', 'domain4', 2)``. | ||
|
|
||
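Because each update produces both an ``update_preimage`` and an ``update_postimage`` row, a consumer that only needs the new values can filter on ``_change_type``. The following is a minimal sketch that reuses the ``test_schema.pages`` table from the example above; the choice of ``since_version => 2`` and the selected columns are illustrative assumptions, not a recommended pattern.

```sql
-- Sketch: keep only the post-update state of rows changed after version 2.
-- Assumes the example table test_schema.pages created above.
SELECT
  page_url,
  domain,
  views,
  _commit_version
FROM
  TABLE(
    system.table_changes(
      schema_name => 'test_schema',
      table_name => 'pages',
      since_version => 2
    )
  )
WHERE _change_type = 'update_postimage'
ORDER BY _commit_version ASC;
```

With the example data, this should return only the ``url2`` and ``url6`` rows carrying ``domain4``, since those are the only ``update_postimage`` events recorded after version 2.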
| If ``since_version`` is not provided, the function produces change events | ||
| starting from when the table was created. | ||
|
|
||
| .. code-block:: sql | ||
|
|
||
| SELECT | ||
| * | ||
| FROM | ||
| TABLE( | ||
| system.table_changes( | ||
| schema_name => 'test_schema', | ||
| table_name => 'pages' | ||
| ) | ||
| ) | ||
| ORDER BY _commit_version ASC; | ||
|
|
||
| The preceding SQL statement returns the following result: | ||
|
|
||
| .. code-block:: text | ||
|
|
||
| page_url | domain | views | _change_type | _commit_version | _commit_timestamp | ||
| url1 | domain1 | 1 | insert | 1 | 2023-03-10T20:21:22.000+0000 | ||
| url2 | domain2 | 2 | insert | 1 | 2023-03-10T20:21:22.000+0000 | ||
| url3 | domain1 | 3 | insert | 1 | 2023-03-10T20:21:22.000+0000 | ||
| url4 | domain1 | 400 | insert | 2 | 2023-03-10T21:22:23.000+0000 | ||
| url5 | domain2 | 500 | insert | 2 | 2023-03-10T21:22:23.000+0000 | ||
| url6 | domain3 | 2 | insert | 2 | 2023-03-10T21:22:23.000+0000 | ||
| url2 | domain2 | 2 | update_preimage | 3 | 2023-03-10T22:23:24.000+0000 | ||
| url2 | domain4 | 2 | update_postimage | 3 | 2023-03-10T22:23:24.000+0000 | ||
| url6 | domain3 | 2 | update_preimage | 3 | 2023-03-10T22:23:24.000+0000 | ||
| url6 | domain4 | 2 | update_postimage | 3 | 2023-03-10T22:23:24.000+0000 | ||
|
|
||
| The result now includes the three inserts that occurred at version 1. They are | ||
| not returned by the previous statement, because ``since_version`` is exclusive and was set to 1. | ||
|
|
||
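The extra columns can also be aggregated like any other column. As a rough sketch, again assuming the example ``test_schema.pages`` table, the following query summarizes how many change events of each type were recorded per table version; the ``change_count`` alias is introduced here for illustration only.

```sql
-- Sketch: count change events per commit version and change type.
-- Omitting since_version reads changes from table creation onward.
SELECT
  _commit_version,
  _change_type,
  count(*) AS change_count
FROM
  TABLE(
    system.table_changes(
      schema_name => 'test_schema',
      table_name => 'pages'
    )
  )
GROUP BY _commit_version, _change_type
ORDER BY _commit_version ASC, _change_type ASC;
```

Based on the result shown above, this would report three inserts each for versions 1 and 2, and two ``update_preimage`` plus two ``update_postimage`` events for version 3.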
| Performance | ||
| ----------- | ||
|
|
||
|
|
||