Skip to content

Expire Snapshot and Remove Orphan files for Iceberg together with coordinator-only execute#10810

Merged
findepi merged 3 commits intotrinodb:masterfrom
homar:homar/vacuum_for_iceberg
Apr 21, 2022
Merged

Expire Snapshot and Remove Orphan files for Iceberg together with coordinator-only execute#10810
findepi merged 3 commits intotrinodb:masterfrom
homar:homar/vacuum_for_iceberg

Conversation

@homar
Copy link
Member

@homar homar commented Jan 26, 2022

Two new procedures are added to Iceberg connector:

  1. REMOVE_ORPHAN_FILES - removes files from table folder that don't belong to any tables and that are older than minimum retention
  2. EXPIRE_SNAPSHOTS - expires snapshots older than provided minimum retention

Trino lacked infrastructure to run a specific task on a single node tough the necessity for such a behaviour was predicted when table execute was introduced.
This PR adds necessary changes to run tasks on coordinator only.

@cla-bot cla-bot bot added the cla-signed label Jan 26, 2022
@homar homar requested review from findepi and losipiuk and removed request for losipiuk January 26, 2022 13:12
@homar homar force-pushed the homar/vacuum_for_iceberg branch 2 times, most recently from 8ff40d4 to 82c940f Compare January 27, 2022 19:51
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import directly

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "else" is redundant since the if block returns

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"else" is redundant

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Format intentionally empty methods like

public void finish() {}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check for null

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expireTimestamp

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

table.expireSnapshots()
        .expireOlderThan(expireTimestamp)
        .cleanExpiredFiles(true)
        .commit();

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move to previous line

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We normally do throws Exception for test methods that throw any checked exceptions

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is flaky, as it relies on the INSERT, SELECTs and EXECUTE all running in under 1 second. Or if we don't care, then there is no need to sleep or have a threshold greater than zero.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is not correct.
The sleep is there to make sure at least 1 second passed after we inserted first row.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you just drop the sleep and make retention 0?

@homar homar force-pushed the homar/vacuum_for_iceberg branch from 82c940f to d51f5a8 Compare January 28, 2022 09:20
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels to me output schema of all ALTER TABLE ... EXECUTE statements should be the same. So I would make it single BIGINT column and return no rows.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think we need status. Either operation completes correctly. Or we get an error, which is passed via exception.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for separate class - you can inline fields imo

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name is cumbersome. Maybe NonReadingTableExecuteNode?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

continued at #10810 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

node.getTarget does not have toStriong - but I suggest to drop the class altogether

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have analysis.getTableExecuteHandle().orElseThrow() assigned to variable one line above

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Remove only files older than threshold"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

continued at #10810 (comment)

Copy link
Member

@losipiuk losipiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass done

@findinpath findinpath self-requested a review January 28, 2022 14:45
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a check at the file level to see whether the vacuum operation was effective?

You could use $files metadata table (while the snapshot to be expired is still active) to gather the file(s) relevant for the snapshot.
You could use $snapshots / $history metadata tables to actually check that the snapshot was expired.
Subsequently you could check at the file level that the files you previously recorded are also removed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added counting physical files as it is more reliable

@findinpath
Copy link
Contributor

either here or in a follow-up PR it would be beneficial to have documentation for this newly introduced operation.

@homar homar force-pushed the homar/vacuum_for_iceberg branch from d51f5a8 to f2d9283 Compare January 31, 2022 12:22
@homar homar force-pushed the homar/vacuum_for_iceberg branch from f2d9283 to c991b8e Compare January 31, 2022 12:46
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am having a hard time to understand what readsData is about and why the method io.trino.spi.connector.TableProcedureExecutionMode#coordinatorOnly uses readsData set to false

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is removed already from here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make the constructor private and only have static factories. The boolean arguments are confusing, especially because certain combinations are not valid.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can add Javadoc to the coordinatorOnly() factory:

Table procedure that does not read any table data and only executes on the coordinator.
Such procedures are useful for custom DDL-type operations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the name distributedWithFilteringAndRepartitioning() confusing. Would supportsFiltering=false also do repartitioning? More generally, how does filtering relate to repartitioning?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

separate commit?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the field name still consistent with the type?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the javadoc change intentional?

@homar homar force-pushed the homar/vacuum_for_iceberg branch 2 times, most recently from 4e7ab2c to 9488255 Compare January 31, 2022 13:38
@homar
Copy link
Member Author

homar commented Jan 31, 2022

@losipiuk I addressed all the comments, improved existing test and added a new one

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

URI.getPath is bad conversion (lossy), is it file:// URI?

Path.of(tableDataDir) should suffice

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for local file system. Path is from java.nio. It doesn't work with protocol I've just checked

@homar homar force-pushed the homar/vacuum_for_iceberg branch 4 times, most recently from 112da80 to b94fca0 Compare April 9, 2022 12:08
@homar
Copy link
Member Author

homar commented Apr 9, 2022

I've added 2 tests: testTrinoReadsTrinoTableWithSparkDeletesAfterOptimizeAndCleanUp and testCleaningUpIcebergTableFailsForTableV2.
First checks if performing a delete doesn't break the table. It looks like fine. Tough I found that spark has some issues when queries are run both on spark and on trino - as this is not related to this change at all I created separate pr to show the problem here #11891.
Second test makes sure that neither EXPIRE_SNAPSHOT nor DELETE_OPRHAN_FILES works do any work on the table with iceberg version 2.
@findepi @alexjo2144 WDYT ?

@synhershko
Copy link

But does it make sense to let the coordinator (which should coordinate Trino tasks) execute actual data or metadata mutations? wouldn't it be more in-line with Trino's design to just create a task and assign it to a single worker node?

@homar
Copy link
Member Author

homar commented Apr 13, 2022

But does it make sense to let the coordinator (which should coordinate Trino tasks) execute actual data or metadata mutations? wouldn't it be more in-line with Trino's design to just create a task and assign it to a single worker node?

@synhershko Take a look at this comment #10810 (comment)

@homar homar force-pushed the homar/vacuum_for_iceberg branch 2 times, most recently from d8e4ebe to 38a19e6 Compare April 15, 2022 16:10
@findepi
Copy link
Member

findepi commented Apr 20, 2022

@homar please rebase, there are conflicts.

@homar homar force-pushed the homar/vacuum_for_iceberg branch from 38a19e6 to 4ac2ec9 Compare April 20, 2022 10:43
@homar
Copy link
Member Author

homar commented Apr 20, 2022

@findepi done

@findepi findepi requested a review from alexjo2144 April 20, 2022 12:27
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment why we decided to use URI.getPath only here. Perhaps a reference to iceberg issue.

@homar homar force-pushed the homar/vacuum_for_iceberg branch from 4ac2ec9 to 2c3c3d7 Compare April 20, 2022 12:44
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to update the name

homar added 3 commits April 20, 2022 20:59
This commit also adds com.starburstdata.presto.plugin.deltalake.procedure.Procedures#checkProcedureArgument
that makes it easier to validate input parameters and throw trino exception
@homar homar force-pushed the homar/vacuum_for_iceberg branch from 2c3c3d7 to 69e0b1e Compare April 20, 2022 19:03
@findepi findepi merged commit 1b7654d into trinodb:master Apr 21, 2022
@findepi findepi mentioned this pull request Apr 21, 2022
@github-actions github-actions bot added this to the 378 milestone Apr 21, 2022
@kekwan
Copy link
Contributor

kekwan commented May 17, 2022

Does expire_snapshots execute in parallel like the Spark version?

@homar
Copy link
Member Author

homar commented May 17, 2022

Does expire_snapshots execute in parallel like the Spark version?

Currently it is not parallel. It may change in the future

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

10 participants