Enable failure recovery for Iceberg connector#10622
Conversation
14f2a06 to
f4cb94d
Compare
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/IcebergQueryRunner.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
AbstractTest is an old naming convention, these days we call them Base...Test
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergFailureRecovery.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergFailureRecovery.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Sounds like you're working against testTableModification abstraction.
There was a problem hiding this comment.
Yeah - I guess we do not need this test at all here. It would make sense to document behaviour difference for Iceberg if test was part of superclass.
I will drop for now, but it should be moved to superclass and we should start using TestingConnectorBehavior in super class
There was a problem hiding this comment.
nit: grammar is off
why not enable it for now here?
when is it going to be removed?
There was a problem hiding this comment.
I will add it for now here. As extracting it out from Base class requires significant restructuring.
As for timeline: 🤷
While i agree we should implement such a procedure, failure recovery should not leave garbage behind, in situations when this can be avoided. Iceberg writes files with random names. The file name is determined here we should consider
cc @phd3 @alexjo2144 |
findepi
left a comment
There was a problem hiding this comment.
Current state of the PR - LGTM % comments.
#10622 (comment) fits well as a followup.
25bcc15 to
b36010f
Compare
@findepi I added some code for that. PTAL and tell me what you think. |
b36010f to
53a739a
Compare
findepi
left a comment
There was a problem hiding this comment.
"Cleanup extranous output files in Iceberg DML"
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPageSink.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
could the RetryMode be provided by the engine to the finish*() methods, so that we don't have to embed the info in handles?
There was a problem hiding this comment.
We could - but I would opt for what we do now. It feels more natural - as we know information upfront. Also it allows for rejecting the request sooner if given retry mode is not supported by a connector.
There was a problem hiding this comment.
allows for rejecting the request sooner if given retry mode is not supported by a connector.
i did not suggest not to provide it to begin methods.
There was a problem hiding this comment.
I would be asymetric vs other "stuff" we pass to "begin*" methods (layout, list of columns). I will leave it as is.
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
maybe add a comment that one query id be cannot be a prefix of another query id
There was a problem hiding this comment.
That is a good point actually.
Changed to
verify(!queryId.contains("-"), "queryId(%s) should not contain hyphens", queryId);
return fileName.startsWith(queryId + "-");
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/IcebergQueryRunner.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergWritableTableHandle.java
Outdated
Show resolved
Hide resolved
53a739a to
018609c
Compare
With query/task retries there is a chance that extra files, which does not make it to the snapshot are written to tables directory. While most of such cases should be cleaned up by writers on workers, there is a slim channce that some of those will survive query exection (e.g. if worker machine is killed). This commit adds pre-commit routine on coordinator which deletes what remained. This is still opportunistic and not 100% sure to delete everything as extra files may still be written after cleanup routine already completed, but we are trying our best. The remaining files does not imply query correctness.
018609c to
955d243
Compare
|
CI flake: #10631 |
Caveat:
With current logic it is possible that data file written during DML operation, which in the end is not part of the committed table snapshot, remains in the table directory on the distributed filesystem.
It should be a rare situation.
Orphaned files can be clean via
remove_orphan_filesrouting using Spark (https://iceberg.apache.org/#spark-procedures/).Eventuall we want to have similar routine in Trino (#10623)
fixes: #10253