
Add delta history metadata table #15556

Merged
ebyhr merged 4 commits into trinodb:master from findinpath:findinpath/delta-history-table
Feb 15, 2023

@findinpath (Contributor) commented Dec 29, 2022

Description

Provide the ability to query the history of changes performed on a Delta Lake table.

The Databricks documentation for the DESCRIBE HISTORY command (https://docs.databricks.com/delta/history.html) was used as a reference.

As with Iceberg tables, the history can be retrieved with the query:

SELECT * FROM "table_name$history";

Fixes #15683

Additional context and related issues

Release notes

(x) Release notes are required, with the following suggested text:

# Delta Lake
* Add `$history` system table which can be queried to inspect Delta Lake table history. ({issue}`15683`)

cla-bot added the cla-signed label on Dec 29, 2022
findinpath force-pushed the findinpath/delta-history-table branch from 1e50935 to ec53b61 on December 29, 2022 16:13
findinpath added the release-notes and needs-docs labels on Dec 29, 2022
findinpath force-pushed the findinpath/delta-history-table branch from ec53b61 to 150865f on December 29, 2022 16:21
Contributor (author) commented:

The content of this class has been copied with small adaptations from trino-iceberg.

I'd be inclined to move the class into trino-spi, but I'm missing a builder class for a collection of pages.

findinpath force-pushed the findinpath/delta-history-table branch from 150865f to 5a43b14 on December 29, 2022 16:26

Contributor (author) commented:

Although this information is not currently used in Trino, there is some extra info that could be added here:

  • operationMetrics
  • userMetadata
  • engineInfo

Also, `operationParameters` seems to be a bit more complex than just a map of `<string, string>` entries.

Member commented:

Why the capital-B `Boolean`? If it may not be present, use `Optional<Boolean>`.

findinpath force-pushed the findinpath/delta-history-table branch from 5a43b14 to 9d596c6 on December 30, 2022 09:59
findinpath force-pushed the findinpath/delta-history-table branch from 9d596c6 to 4993652 on December 31, 2022 13:08
findinpath force-pushed the findinpath/delta-history-table branch 3 times, most recently from 5a2a8ab to bbe9c15 on January 4, 2023 10:53
findinpath requested reviews from findepi and homar on January 4, 2023 10:53

@ebyhr (Member) left a comment:

Could you update the docs and add a product test?

findinpath force-pushed the findinpath/delta-history-table branch 4 times, most recently from 25f3552 to 3dedaa5 on January 19, 2023 06:46

Contributor (author) commented:

The `job` column (like `notebook`) is probably not relevant in the context of Trino.

See https://docs.databricks.com/delta/history.html

Member commented:

Let's remove this column. We can add it later if needed.

github-actions added the docs label on Jan 19, 2023
findinpath force-pushed the findinpath/delta-history-table branch 2 times, most recently from c1e8a3c to 95299e2 on January 19, 2023 16:01
findinpath requested a review from ebyhr on January 20, 2023 05:01

@findinpath (Contributor, author) commented:

It would be nice if this had the same top-level schema as the Iceberg one, with all the connector-specific metadata placed in a nested row-type field.
@ebyhr @findepi, do you think it's worth the work? It would probably mean changing the schema of the Iceberg one too.

@ebyhr please don't merge this yet.

@alexjo2144 (Member) commented:

> please don't merge this yet.

I think my comment on the schema is something we can address as a follow-up.

@findinpath (Contributor, author) commented:

Rebasing on master to deal with the code conflicts.

findinpath force-pushed the findinpath/delta-history-table branch from b5189d1 to 7e07120 on February 7, 2023 13:11
findinpath requested a review from ebyhr on February 7, 2023 13:21

@findinpath (Contributor, author) commented:

Rebasing on master to deal with the code conflicts.

findinpath force-pushed the findinpath/delta-history-table branch from 7e07120 to a6ae7a7 on February 8, 2023 14:38

Member commented:

Similar to what @krvikash is working on for Iceberg, we should avoid materializing all the entries here and use a streaming approach instead.

Contributor (author) commented:

We could eventually do this as a follow-up, if it turns out to be needed.

Member commented:

This could be a follow-up, but being able to push down a predicate such as `WHERE version > x` seems pretty helpful.
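The streaming idea and the `version > x` pushdown can be illustrated with a minimal plain-Java sketch. This is a hypothetical illustration, not the connector's code: `HistoryScanSketch`, `HistoryEntry`, `history` and `readCommit` are made-up names, and `readCommit` stands in for reading and parsing one transaction log commit file. The lower bound means older commits are never read at all, and `Stream` laziness means entries are produced one at a time instead of being collected into a list first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongFunction;
import java.util.stream.LongStream;
import java.util.stream.Stream;

public class HistoryScanSketch
{
    // Hypothetical row type for one entry of the $history table.
    record HistoryEntry(long version, String operation) {}

    // Delta Lake names commit files by the 20-digit, zero-padded version,
    // e.g. version 3 -> 00000000000000000003.json under _delta_log/.
    static String commitFileName(long version)
    {
        return String.format("%020d.json", version);
    }

    // Lazily produces history entries, newest first, for versions in
    // (minVersionExclusive, latestVersion]. Versions at or below the bound
    // are never passed to readCommit, which is the saving a pushed-down
    // "WHERE version > x" predicate would provide. Use -1 for "no bound".
    static Stream<HistoryEntry> history(long latestVersion, long minVersionExclusive, LongFunction<HistoryEntry> readCommit)
    {
        long first = Math.max(minVersionExclusive + 1, 0);
        return LongStream.iterate(latestVersion, v -> v >= first, v -> v - 1)
                .mapToObj(readCommit);
    }

    public static void main(String[] args)
    {
        List<Long> reads = new ArrayList<>();
        // Fake reader standing in for parsing the commit file of each version;
        // it also records which versions were actually "read".
        LongFunction<HistoryEntry> fakeReader = version -> {
            reads.add(version);
            return new HistoryEntry(version, version == 0 ? "CREATE TABLE" : "WRITE");
        };
        history(5, 2, fakeReader).forEach(System.out::println);
        System.out.println("commit files read: " + reads);
    }
}
```

With a latest version of 5 and the predicate `version > 2`, only commits 5, 4 and 3 are handed to the reader; commits 0 to 2 are skipped entirely rather than read and filtered afterwards.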

@ebyhr (Member) commented Feb 10, 2023

/test-with-secrets sha=a6ae7a7ef5cb89e17fc24f955924f60c3531cfbb

@github-actions commented:

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/4143883144

@findinpath (Contributor, author) commented:

https://github.com/trinodb/trino/pull/15556/checks?check_run_id=11257287142

GitHub Actions / hive-tests (config-hdp3) with secrets
failed yesterday in 0s

I don't know what happened here.

@ebyhr, could you please run the build again (to be on the safe side)?

@ebyhr (Member) commented Feb 12, 2023

/test-with-secrets sha=a6ae7a7ef5cb89e17fc24f955924f60c3531cfbb

@github-actions commented:

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/4158604825

The transaction log files can be written by query engines (Trino included, at the time of this writing) that do not fill this property.
Previously, this property was serialized under the name `blindAppend`.
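The review suggestion to use `Optional<Boolean>` and the `isBlindAppend` property name from the linked issue can be combined in a minimal sketch. `CommitInfoSketch`, `CommitInfo` and `fromJson` are hypothetical names, and the input is assumed to be an already-parsed JSON object; this is not the connector's actual class or JSON binding. The point is that a missing key maps to `Optional.empty()`, so commits written by engines that do not fill the property stay distinguishable from an explicit `false`.

```java
import java.util.Map;
import java.util.Optional;

public class CommitInfoSketch
{
    // Hypothetical, simplified stand-in for a Delta Lake commitInfo entry.
    // Optional<Boolean> distinguishes "property not written" from an explicit false.
    record CommitInfo(long version, String operation, Optional<Boolean> isBlindAppend) {}

    // Builds a CommitInfo from an already-parsed JSON object.
    // Spark writes the property as "isBlindAppend"; engines that do not fill it
    // simply omit the key, which becomes Optional.empty().
    // (A non-boolean value would fail the cast; acceptable for a sketch.)
    static CommitInfo fromJson(long version, Map<String, Object> json)
    {
        String operation = (String) json.get("operation");
        Optional<Boolean> isBlindAppend = Optional.ofNullable((Boolean) json.get("isBlindAppend"));
        return new CommitInfo(version, operation, isBlindAppend);
    }

    public static void main(String[] args)
    {
        CommitInfo spark = fromJson(1, Map.of("operation", "WRITE", "isBlindAppend", true));
        CommitInfo other = fromJson(2, Map.of("operation", "WRITE"));
        System.out.println(spark.isBlindAppend()); // Optional[true]
        System.out.println(other.isBlindAppend()); // Optional.empty
    }
}
```

A nullable `Boolean` could carry the same information, but `Optional<Boolean>` makes the "may be absent" case explicit in the type, which is what the review comment asks for.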
findinpath force-pushed the findinpath/delta-history-table branch from a6ae7a7 to 96c8002 on February 13, 2023 08:39

@ebyhr (Member) commented Feb 13, 2023

/test-with-secrets sha=96c80020b151ae66b2c05bd2d64779aeb7b27bf0

@github-actions commented:

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/4162331409

findinpath force-pushed the findinpath/delta-history-table branch from 96c8002 to 2f05ab6 on February 14, 2023 11:47
ebyhr merged commit 0bc4292 into trinodb:master on Feb 15, 2023
ebyhr mentioned this pull request on Feb 15, 2023
github-actions added this to the 407 milestone on Feb 15, 2023

Development

Successfully merging this pull request may close these issues.

Delta Lake transaction log stores blindAppend field unlike Spark isBlindAppend

5 participants