Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
148 changes: 148 additions & 0 deletions docs/src/main/sphinx/connector/delta-lake.rst
Original file line number Diff line number Diff line change
Expand Up @@ -732,6 +732,154 @@ Fault-tolerant execution support
The connector supports :doc:`/admin/fault-tolerant-execution` of query
processing. Read and write operations are both supported with any retry policy.


Table functions
---------------

The connector provides the following table functions:

table_changes
^^^^^^^^^^^^^

Allows reading Change Data Feed (CDF) entries to expose row-level changes
between two versions of a Delta Lake table. When the ``change_data_feed_enabled``
table property is set to ``true`` on a specific Delta Lake table,
the connector records change events for all data changes on the table.
This is how these changes can be read:

.. code-block:: sql

SELECT
*
FROM
TABLE(
system.table_changes(
schema_name => 'test_schema',
table_name => 'tableName',
since_version => 0
)
);

``schema_name`` - type ``VARCHAR``, required, name of the schema for which the function is called

``table_name`` - type ``VARCHAR``, required, name of the table for which the function is called

``since_version`` - type ``BIGINT``, optional, version from which changes are shown, exclusive

In addition to returning the columns present in the table, the function
returns the following values for each change event:

* ``_change_type``
Gives the type of change that occurred. Possible values are ``insert``,
``delete``, ``update_preimage`` and ``update_postimage``.

* ``_commit_version``
Shows the table version for which the change occurred.

* ``_commit_timestamp``
Represents the timestamp for the commit in which the specified change happened.

This is how it would be normally used:

Create table:

.. code-block:: sql

CREATE TABLE test_schema.pages (page_url VARCHAR, domain VARCHAR, views INTEGER)
WITH (change_data_feed_enabled = true);

Insert data:

.. code-block:: sql

INSERT INTO test_schema.pages
VALUES
('url1', 'domain1', 1),
('url2', 'domain2', 2),
('url3', 'domain1', 3);
INSERT INTO test_schema.pages
VALUES
('url4', 'domain1', 400),
('url5', 'domain2', 500),
('url6', 'domain3', 2);

Update data:

.. code-block:: sql

UPDATE test_schema.pages
SET domain = 'domain4'
WHERE views = 2;

Select changes:

.. code-block:: sql

SELECT
*
FROM
TABLE(
system.table_changes(
schema_name => 'test_schema',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to add the schema to the queries above to be clear

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or add a "use" command first

table_name => 'pages',
since_version => 1
)
)
ORDER BY _commit_version ASC;

The preceding sequence of SQL statements returns the following result:

.. code-block:: text

page_url | domain | views | _change_type | _commit_version | _commit_timestamp
url4 | domain1 | 400 | insert | 2 | 2023-03-10T21:22:23.000+0000
url5 | domain2 | 500 | insert | 2 | 2023-03-10T21:22:23.000+0000
url6 | domain3 | 2 | insert | 2 | 2023-03-10T21:22:23.000+0000
url2 | domain2 | 2 | update_preimage | 3 | 2023-03-10T22:23:24.000+0000
url2 | domain4 | 2 | update_postimage | 3 | 2023-03-10T22:23:24.000+0000
url6 | domain3 | 2 | update_preimage | 3 | 2023-03-10T22:23:24.000+0000
url6 | domain4 | 2 | update_postimage | 3 | 2023-03-10T22:23:24.000+0000

The output shows what changes happen in which version.
For example in version 3 two rows were modified, first one changed from
``('url2', 'domain2', 2)`` into ``('url2', 'domain4', 2)`` and the second from
``('url6', 'domain2', 2)`` into ``('url6', 'domain4', 2)``.

If ``since_version`` is not provided the function produces change events
starting from when the table was created.

.. code-block:: sql

SELECT
*
FROM
TABLE(
system.table_changes(
schema_name => 'test_schema',
table_name => 'pages'
)
)
ORDER BY _commit_version ASC;

The preceding SQL statement returns the following result:

.. code-block:: text

page_url | domain | views | _change_type | _commit_version | _commit_timestamp
url1 | domain1 | 1 | insert | 1 | 2023-03-10T20:21:22.000+0000
url2 | domain2 | 2 | insert | 1 | 2023-03-10T20:21:22.000+0000
url3 | domain1 | 3 | insert | 1 | 2023-03-10T20:21:22.000+0000
url4 | domain1 | 400 | insert | 2 | 2023-03-10T21:22:23.000+0000
url5 | domain2 | 500 | insert | 2 | 2023-03-10T21:22:23.000+0000
url6 | domain3 | 2 | insert | 2 | 2023-03-10T21:22:23.000+0000
url2 | domain2 | 2 | update_preimage | 3 | 2023-03-10T22:23:24.000+0000
url2 | domain4 | 2 | update_postimage | 3 | 2023-03-10T22:23:24.000+0000
url6 | domain3 | 2 | update_preimage | 3 | 2023-03-10T22:23:24.000+0000
url6 | domain4 | 2 | update_postimage | 3 | 2023-03-10T22:23:24.000+0000

You can see changes that occurred at version 1 as three inserts. They are
not visible in the previous statement when ``since_version`` value was set to 1.

Performance
-----------

Expand Down